I'm having an odd issue where our ESX hosts are showing high device latency on their locally attached storage.
I'm not sure what normal latency is supposed to be, but check out the screenshot. Does anyone know what the service console could be consistently writing to disk?
The only thing hosted on vmhba0 is the service console.
So you don't have any third-party software installed in the service console, correct? What are your local array and drive types and speeds? What is your server model? What version of ESX? How much memory do you have assigned to the Service Console: the default, or did you increase it to 800 MB?
HP DL360 G6, two mirrored SCSI drives; unsure of the speed.
ESX 4.1
Service Console Memory
693796k used, 108388k free,
The drive speed could be important in this case. Does it say anything on the front of the drive?
Looks like it could be hot-plug SFF SAS or hot-plug SFF SATA, and there is also a choice of Entry/Efficiency, Base, or Performance configurations for the embedded HP Smart Array P410i controller. Any idea which was selected? Or can you go into the RAID card's BIOS and see how much cache memory, if any, it lists?
http://h18004.www1.hp.com/products/quickspecs/DS_00145/DS_00145.pdf
I would definitely also bump the SC memory up to 800 MB and reboot. It would also be good to understand what is writing to the local disk. Do you have any VMs running from local disk? Also, how did you configure your ESX partitions? Specifically the swap space, but the others too.
What did you mean about bumping the SC up to 800 MB and rebooting it?
My SC memory is already 800 MB.
-Service Console Memory
693796k used, 108388k free,
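As a quick sanity check on these figures, here is a minimal sketch assuming top reports memory in KiB. Used plus free comes to roughly 783 MiB, which looks consistent with an 800 MB Service Console allocation; the total top sees is presumably a little lower than what is assigned because the kernel keeps part of it for itself.

```shell
#!/bin/sh
# Values copied from the top output above (assumed to be in KiB).
used_kib=693796
free_kib=108388

# Total memory visible to the service console.
total_kib=$((used_kib + free_kib))

# Integer division by 1024 converts KiB to MiB (rounded down).
total_mib=$((total_kib / 1024))

echo "total: ${total_kib} KiB (~${total_mib} MiB)"
```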
Here is something I found that might be useful debugging info.
Not sure what these are:
ls -alh /vmfs/devices/disks/mpx*
-rw------- 1 root root 137G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0
-rw------- 1 root root 1.1G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:1
-rw------- 1 root root 110M Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:2
-rw------- 1 root root 136G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:3
-rw------- 1 root root 136G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:5
(Other than the first, the other ones could be the locally attached storage on the other hosts in the cluster.)
LSOF:
vmkload_a 3899 root 11u DIR 65,21 4096 2 /
vmkload_a 3899 root 12u DIR 65,21 4096 2 /
vmkload_a 3924 root 11u DIR 65,21 4096 2 /
vmkload_a 3924 root 12u DIR 65,21 4096 2 /
vmkload_a 4016 root 11u DIR 65,21 4096 2 /
vmkload_a 4016 root 12u DIR 65,21 4096 2 /
vmkload_a 4027 root 11u DIR 65,21 4096 2 /
vmkload_a 4027 root 12u DIR 65,21 4096 2 /
vmkload_a 4123 root 11u DIR 65,21 4096 2 /
vmkload_a 4123 root 12u DIR 65,21 4096 2 /
vmkload_a 4788 root 12u DIR 65,21 4096 2 /
vmkload_a 5199 root 12u DIR 65,21 4096 2 /
vmkload_a 5370 root 12u DIR 65,21 4096 2 /
vmkload_a 8357 root 12u DIR 65,21 4096 2 /
vmkload_a 10684 root 11u DIR 65,21 4096 2 /
vmkload_a 10684 root 12u DIR 65,21 4096 2 /
vmkload_a 13691 root 12u DIR 65,21 4096 2 /
vmkload_a 27533 root 12u DIR 65,21 4096 2 /
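The entries above are all directory handles (TYPE DIR) on `/`, which do not by themselves show writes. One rough way to narrow the list is to keep only regular files whose FD mode is `w` (write) or `u` (read/write). This is a sketch that assumes the column layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); lsof builds that add extra columns would need the field numbers adjusted, and `lsof_writers` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print COMMAND, PID and NAME for regular files that
# lsof reports as open for write (w) or read/write (u).
# Assumes the 9-column layout above: $4 = FD, $5 = TYPE.
# NAME is taken as the last field, so paths containing spaces are truncated.
lsof_writers() {
  awk 'NR > 1 && $5 == "REG" && $4 ~ /^[0-9]+[wu]/ { print $1, $2, $NF }'
}

# Usage on the service console (NR > 1 skips lsof's header line):
#   lsof 2>/dev/null | lsof_writers | sort | uniq -c | sort -rn
```

Processes that keep showing up here, such as log writers, would be the first candidates for the steady disk activity.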
This is the alert I'm getting from vFoglight:
ESX Host vmesxcl1-4.corp.navcan.ca Total Command Latency (time taken during the collection interval to process a SCSI command issued by the Guest OS to the virtual machine. The sum of kernelLatency and deviceLatency) to extent: 38.00 have exceeded the threshold of 35 ms. Virtual machines on this ESX Host may be experiencing performance problems. The following URL can be used to obtain alarm details.
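The alert's definition boils down to a sum and a comparison: total command latency = kernel latency + device latency, checked against a 35 ms threshold. A minimal sketch of that rule, assuming all values are in milliseconds; `check_latency` is a hypothetical helper, not part of vFoglight, and the kernel/device split in the example is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper illustrating the alert rule quoted above.
# Arguments (milliseconds): kernel latency, device latency, threshold.
# awk is used because POSIX sh arithmetic is integer-only.
check_latency() {
  awk -v k="$1" -v d="$2" -v t="$3" 'BEGIN {
    total = k + d
    status = (total > t + 0) ? "ALERT" : "OK"
    printf "total=%.2f ms threshold=%s ms %s\n", total, t, status
  }'
}

# Example with an assumed split of the 38 ms reading:
check_latency 0.5 37.5 35
```

If nearly all of the total comes from device latency, the time is being spent below the VMkernel, at the controller or the disks, rather than in ESX's own queuing.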
The screenshots in the first post show the device.
Drives: 2x SAS, dual-port, 10K RPM, 146 GB, mirrored.
I realize this is an old post, but in case someone looks here for help: I was having a similar disk latency issue on vmhba0 on two of my IBM x3650 M2 ESX 4.1 U1 hosts. The latency reported in VMware would never drop below 20 ms and would occasionally jump into the 30 ms to 40 ms range, even when there were no VMs running on the host. My issue turned out to be directly related to a failed cache backup battery on the MR10i ServeRAID controller. Once the battery was replaced, the average latency dropped to between 0 ms and 2 ms.