VMware Cloud Community
alex777
Contributor

Problems with SAN?

Hello.

ESX 3.5 build 82663

SAN: FC Xyratex F5412E. HBA: QLogic.

I created two identical VMs: one on the Xyratex, the second on a local datastore.

When I copy a big (>1 GB) file into the guest OS (win2k3) on the Xyratex, the VM periodically hangs.

vmkernel log :

"Reset request on handle 8196 (28 outstanding commands)

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)VSCSI: 3019: Resetting handle 8196

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=625

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=784

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=620

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=797

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=633

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=623

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=896

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=627

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=629

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=635

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=619

"

When I copy the same file into the guest OS (win2k3) on the local datastore, or directly to the VMFS on the Xyratex (with mc or vmkfstools, not into the guest OS), everything works normally.
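To see how the aborts cluster in time, the vmkernel excerpt above can be tallied per syslog timestamp. This is only a rough sketch: the regular expression assumes lines shaped exactly like the ones quoted in this post (other builds may log differently), and the function name `count_aborts` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this thread (an assumption;
# other ESX builds or drivers may format these messages differently).
ABORT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+vmkernel:.*"
    r"qla24xx_abort_command\(\d+\): handle to abort=(?P<handle>\d+)"
)

def count_aborts(lines):
    """Return a Counter mapping syslog timestamp -> number of abort messages."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group("stamp")] += 1
    return counts

# Three lines copied from the excerpt above; only two are abort messages.
sample = [
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)VSCSI: 3019: Resetting handle 8196",
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=625",
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622",
]

for stamp, n in sorted(count_aborts(sample).items()):
    print(f"{stamp}: {n} aborted command(s)")
```

Run against the full log, a burst of aborts sharing one timestamp right after a "Reset request" line would match the pattern shown in the excerpt.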

11 Replies
paul-bogodynami
Contributor

Is the VM guest's disk a raw disk, or just a normal VMDK on the datastore?

What is the QLogic queue depth?

alex777
Contributor

A normal VMDK on VMFS 3 on the FC datastore.

paul-bogodynami
Contributor

What is the VMFS block size?

What happens if you SCP a file from datastore to datastore? Still slow?

christianZ
Champion

What happens when you clone a VM (on the SAN)?

What happens when you copy the file inside a VM?

Can you post here the /proc/vmware/interrupts file?

mike_laspina
Champion

Hello,

Please check the /var/log/vmkwarning file.

What other servers are using the SAN?

http://blog.laspina.ca/ vExpert 2009
alex777
Contributor

VMFS block size: 1 MB.

When I SCP a file from datastore to datastore or clone a VM (on the SAN), there are no errors in the vmkernel log, and the guest OS works normally (no slowness or hangs).

When I copy a big file inside a VM (on the SAN), the VM periodically hangs. In the vmkernel log:

"

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=784

Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=620

"

No other servers are using the SAN, only ESX.

We tried several identical Xyratex units (with SAS and SATA disks); the result was the same.

mike_laspina
Champion

There does appear to be some non-optimal IRQ sharing (shown in red in the original post): on vector 0x71 below, one qla2300 shares an interrupt with megasas.

0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300

0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300

Can you try changing the active path to the other QLA2300 (marked in green in the original post) and see if that changes the behavior?

Leafy911
Expert

Have you aligned the NTFS partition with your VMFS partition?

You can do this with the Diskpart.exe utility.

I have a similar issue at the moment and found that aligning NTFS to VMFS, and VMFS to the SAN LUN, made a noticeable improvement.

Regards

Leafy911

(Don't forget you receive points when you award points)
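The alignment check suggested above comes down to simple arithmetic: a partition is aligned when its starting byte offset is a whole multiple of the chosen boundary. A minimal sketch, assuming 512-byte sectors and a 64 KB boundary (the right boundary depends on the array's stripe size, which is not given in this thread); `is_aligned` is a made-up helper name.

```python
SECTOR_BYTES = 512  # assumed sector size

def is_aligned(start_sector, boundary_bytes=64 * 1024):
    """True if the partition's starting byte offset is a multiple of the boundary."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0

# Windows Server 2003 starts the first partition at sector 63 by default:
# 63 * 512 = 32256 bytes, which is not a multiple of 65536.
print(is_aligned(63))

# A partition starting at sector 128 begins at 128 * 512 = 65536 bytes.
print(is_aligned(128))
```

If the check fails for the guest's partition, the usual fix is to recreate the partition with an explicit offset in Diskpart before formatting, since an existing partition cannot simply be shifted in place.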

christianZ
Champion

0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300

0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300

0x81: 169357 0 0 0 COS irq 19 (PCI level), VMK vmnic1, VMK libata

0x89: 185369 0 0 0 COS irq 18 (PCI level), VMK vmnic0

Those are definitely not the optimal interrupts. Try to change them in the server BIOS; also check in /proc/interrupts which one is used by the console.

The IRQs should be spread across all CPUs.
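For anyone who wants to spot the sharing without reading the table by eye, here is a rough sketch that lists the vectors serving more than one VMkernel driver. It assumes lines formatted like the /proc/vmware/interrupts excerpt quoted above; the function names are made up for illustration, and it only counts `VMK` entries, so sharing with the console OS (`COS`) is not flagged.

```python
import re

def vmk_devices_by_vector(lines):
    """Map each interrupt vector (e.g. '0x71') to the VMK drivers listed on it."""
    result = {}
    for line in lines:
        vector, sep, rest = line.partition(":")
        vector = vector.strip()
        if not sep or not vector.startswith("0x"):
            continue  # skip headers or anything that is not a vector line
        result[vector] = re.findall(r"VMK (\w+)", rest)
    return result

def shared_vectors(lines):
    """Return only the vectors that more than one VMkernel driver is attached to."""
    return {v: d for v, d in vmk_devices_by_vector(lines).items() if len(d) > 1}

# The four lines quoted in this post.
sample = [
    "0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300",
    "0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300",
    "0x81: 169357 0 0 0 COS irq 19 (PCI level), VMK vmnic1, VMK libata",
    "0x89: 185369 0 0 0 COS irq 18 (PCI level), VMK vmnic0",
]

for vector, devices in shared_vectors(sample).items():
    print(vector, "shared by", ", ".join(devices))
```

On the lines above this reports 0x71 (megasas with qla2300) and 0x81 (vmnic1 with libata).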

alex777
Contributor

After I connected the second server to the Xyratex, messages like "qla24xx_abort_command(2): handle to abort=625" disappeared from the vmkernel log, but the problem with the guest OS is still the same.

I changed the IRQ for the QLogic. There is no IRQ sharing now.

I tried different HBA queue depths.

I tried aligning the NTFS partition.

The result is the same. When I write a lot of data to the guest OS, the VMs on the SAN hang (freeze), but the file keeps copying. When the copy finishes, the guest OS works normally.

alex777
Contributor

I think the problem is in the hard disk and/or NIC drivers in the guest OS. Is it possible to change the drivers that VMware provides to the guest OS?
