Hello.
ESX 3.5 build 82663
SAN: FC Xyratex F5412E. HBA: QLogic.
I created two identical VMs: one on the Xyratex, the second on a local datastore.
When I copy a big (>1 GB) file into the guest OS (Win2k3) on the Xyratex, the VM periodically hangs.
vmkernel log:
"Reset request on handle 8196 (28 outstanding commands)
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)VSCSI: 3019: Resetting handle 8196
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=625
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=784
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=620
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=797
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=633
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=623
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=896
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=627
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=629
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=635
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.911 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=619
"
When I copy the same file into the guest OS (Win2k3) on the local datastore, or directly to the VMFS on the Xyratex (with mc or vmkfstools, not through a guest OS), everything works normally.
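For anyone comparing runs, it can help to count how many commands get aborted in each burst rather than eyeballing the log. Below is a minimal sketch, assuming the log lines look exactly like the ones pasted above; the function name `count_aborts` and the regular expression are illustrative and not part of any VMware tool.

```python
import re
from collections import Counter

# Matches abort lines in the format pasted in this thread (an assumption:
# other driver versions may log differently).
ABORT_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]+) \S+ vmkernel: .*"
    r"qla24xx_abort_command\(\d+\): handle to abort=(\d+)"
)

def count_aborts(lines):
    """Return a Counter mapping syslog timestamp -> number of aborted commands."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Three lines copied from the log above: one reset line, two abort lines.
sample = [
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)VSCSI: 3019: Resetting handle 8196",
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=625",
    "Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622",
]

for stamp, aborted in count_aborts(sample).items():
    print(stamp, aborted)
```

Running the same function over the log before and after a change (queue depth, IRQ assignment, alignment) gives a rough way to see whether the abort bursts get smaller.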
Is the VM's disk a raw disk (RDM), or just a normal VMDK on a datastore?
What is the QLogic queue depth?
A normal VMDK on VMFS 3 on the FC datastore.
vmfs block size?
What happens if you SCP a file from datastore to datastore? Still slow?
What happens when you clone a vm (on san)?
What happens when you copy the file inside a vm?
Can you post here the /proc/vmware/interrupts file?
VMFS block size: 1 MB.
When I SCP a file from datastore to datastore or clone a VM (on the SAN), there are no errors in the vmkernel log, and the guest OS works normally (no slowness or hangs).
When I copy a big file inside a VM (on the SAN), the VM periodically hangs. In the vmkernel log:
"
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=622
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=784
Sep 24 11:23:35 vm5 vmkernel: 0:15:04:24.910 cpu1:1056)<6>qla24xx_abort_command(2): handle to abort=620
"
No other servers are using the SAN, only the ESX host.
We tried several units of the same Xyratex model (with SAS and SATA disks); the result is the same.
There does appear to be some irq sharing that is not optimal shown in red.
0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300
0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300
Can you try changing the active path to the other QLA2300 marked in green and see if that changes the behavior?
Have you aligned the NTFS partition with your VMFS partition?
You can do this with the Diskpart.exe utility.
I have a similar issue at the moment and found that aligning the NTFS to the VMFS to the SAN LUN made a noticeable improvement.
Regards
Leafy911
(Don't forget you receive points when you award points)
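A quick way to reason about the alignment suggestion above: a partition is aligned when its starting byte offset is a whole multiple of the boundary you care about. The sketch below assumes a 64 KB boundary, which is a commonly used value; the right boundary depends on the array's stripe size, so treat it as a placeholder. The helper name `is_aligned` is made up for illustration. Windows Server 2003 by default starts the first partition at sector 63, which is the case this check flags.

```python
SECTOR_BYTES = 512

def is_aligned(start_offset_bytes, boundary_bytes=64 * 1024):
    """True if the partition start falls exactly on the given boundary.

    boundary_bytes defaults to 64 KB, which is an assumption; use the
    stripe size recommended for your array.
    """
    return start_offset_bytes % boundary_bytes == 0

# Default Win2k3 layout: the partition starts at sector 63.
legacy_offset = 63 * SECTOR_BYTES   # 32256 bytes
# A partition created on a 64 KB boundary instead.
aligned_offset = 64 * 1024          # 65536 bytes

print(legacy_offset, is_aligned(legacy_offset))
print(aligned_offset, is_aligned(aligned_offset))
```

With the default offset, guest I/O can straddle two stripe units on the array, so a single guest write may turn into two back-end operations.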
0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300
0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300
0x81: 169357 0 0 0 COS irq 19 (PCI level), VMK vmnic1, VMK libata
0x89: 185369 0 0 0 COS irq 18 (PCI level), VMK vmnic0
Definitely not the optimal interrupts there. Try to change them in the server BIOS; also check in /proc/interrupts which one is used by the console.
The IRQs should be spread over all CPUs.
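To spot the sharing without reading the table by eye, the interrupt listing quoted above can be scanned for vectors claimed by more than one VMK device. This is a rough sketch that assumes the line format shown in this thread (vector, per-CPU counts, then a comma-separated device list); `shared_vmk_vectors` is a hypothetical helper, not an ESX command.

```python
import re

def shared_vmk_vectors(lines):
    """Return {vector: [devices]} for vectors with more than one VMK device."""
    shared = {}
    for line in lines:
        vector, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt line
        devices = re.findall(r"VMK\s+([\w.-]+)", rest)
        if len(devices) > 1:
            shared[vector.strip()] = devices
    return shared

# Lines copied from the /proc/vmware/interrupts output posted in this thread.
sample = [
    "0x71: 3095 816 1405 409 <COS irq 16 (PCI level)>, VMK megasas, VMK qla2300",
    "0x79: 198929 0 0 0 COS irq 17 (PCI level), VMK qla2300",
    "0x81: 169357 0 0 0 COS irq 19 (PCI level), VMK vmnic1, VMK libata",
    "0x89: 185369 0 0 0 COS irq 18 (PCI level), VMK vmnic0",
]

for vector, devices in shared_vmk_vectors(sample).items():
    print(vector, devices)
```

On the posted output this flags 0x71 (megasas and qla2300) and also 0x81 (vmnic1 and libata). It only looks at VMK devices sharing a vector with each other; sharing with the service console (the COS entries) would need a separate check.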
After I connected the second server to the Xyratex, messages like "qla24xx_abort_command(2): handle to abort=625" disappeared from the vmkernel log, but the problem with the guest OS is still the same.
I have changed the IRQ for the QLogic. Now there is no IRQ sharing.
Try a different HBA queue depth.
Try aligning the NTFS partition.
The same. When I write a lot of data to the guest OS, VMs on the SAN hang (freeze), but the file keeps copying. When the copy finishes, the guest OS works normally again.
I think the problem is in the hard disk and/or NIC drivers in the guest OS. Is it possible to change the drivers that VMware provides to the guest OS?