VMware Cloud Community
djkast33
Contributor

vmhba0 high device latency

I'm having an odd issue where our ESX hosts show high device latency on the locally attached storage.

I'm not sure what normal latency is supposed to be, but check out the screenshot. Does anyone know what the service console could be consistently writing to disk?

The only thing hosted on vmhba0 is the service console.

6 Replies
jcwuerfl
Hot Shot

So you don't have any third-party software installed in the service console, correct? What are your local array and drive types and speeds? What is your main system model? What version of ESX? How much memory do you have assigned to the Service Console: the default, or did you raise it to 800MB?

djkast33
Contributor

HP DL360 G6, 2 mirrored SCSI drives, unsure of speed

ESX 4.1

Service Console Memory

693796k used, 108388k free,

jcwuerfl
Hot Shot

The drive speed could be important in this case. Does it say anything on the front of the drive?

Looks like it could be hot-plug SFF SAS or hot-plug SFF SATA. You also have the choice of Entry/Efficiency, Base, or Performance models for the HP Smart Array P410i embedded controller. Any idea which was selected? Or can you go into the BIOS of the RAID card and see how much cache memory it lists, if any?

http://h18004.www1.hp.com/products/quickspecs/DS_00145/DS_00145.pdf

I would definitely bump it up to 800MB for the SC as well and reboot it. It would also be good to understand what is writing to the local disk. Do you have any VMs running from local disk? Also, how did you configure your ESX partitions, specifically the swap space, but the others too?

djkast33
Contributor

What did you mean by "would definitely bump it up to 800MB also for the SC and reboot it"?

My SC memory is already 800MB

-Service Console Memory

693796k used, 108388k free,
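As a quick sanity check on the figures above, the used and free values can be added up to see the total memory the service console reports. This is only a minimal sketch: it assumes the `k` suffix means KiB (as in typical `top` or `free` output), and the function name `total_mib` is made up for illustration.

```python
def total_mib(used_kib: int, free_kib: int) -> float:
    """Return total memory in MiB from used/free values given in KiB."""
    return (used_kib + free_kib) / 1024


# Values taken from the service console output quoted above (assumed KiB).
total = total_mib(693796, 108388)
print(f"Service console total: {total:.0f} MiB")
```

The two values sum to 802184k, which is consistent with the service console already being configured at roughly 800MB.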

Here is something I found that could be some useful debugging info?

Not sure what these are

# ls -alh /vmfs/devices/disks/mpx*
-rw------- 1 root root 137G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0
-rw------- 1 root root 1.1G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:1
-rw------- 1 root root 110M Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:2
-rw------- 1 root root 136G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:3
-rw------- 1 root root 136G Oct 7 11:34 /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0:5

(Other than the first, the others could be the locally attached storage on the other hosts in the cluster.)
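For anyone unsure how to read these names: in ESX 4.x they generally break down as adapter, channel, target and LUN, with an optional trailing number for the partition. If that reading holds here, all five entries would be the same local disk on this host (the whole device plus four partitions) rather than storage on other hosts. A minimal sketch of that breakdown follows; the regular expression and the `parse_device` helper are illustrative assumptions, not anything shipped with ESX.

```python
import re

# Assumed layout: mpx.<adapter>:C<channel>:T<target>:L<lun>[:<partition>]
DEVICE_RE = re.compile(
    r"^mpx\.(?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+)"
    r"(?::(?P<partition>\d+))?$"
)


def parse_device(name: str) -> dict:
    """Split a device name into its components (hypothetical helper)."""
    match = DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name}")
    fields = match.groupdict()
    partition = fields["partition"]
    return {
        "adapter": fields["adapter"],
        "channel": int(fields["channel"]),
        "target": int(fields["target"]),
        "lun": int(fields["lun"]),
        # None means the entry is the whole device, not a partition.
        "partition": int(partition) if partition is not None else None,
    }


# Device names copied from the ls output above.
names = [
    "mpx.vmhba0:C0:T0:L0",
    "mpx.vmhba0:C0:T0:L0:1",
    "mpx.vmhba0:C0:T0:L0:2",
    "mpx.vmhba0:C0:T0:L0:3",
    "mpx.vmhba0:C0:T0:L0:5",
]

for name in names:
    dev = parse_device(name)
    label = "whole disk" if dev["partition"] is None else f"partition {dev['partition']}"
    print(f"{dev['adapter']} C{dev['channel']} T{dev['target']} L{dev['lun']}: {label}")
```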

LSOF:

# lsof +d / | awk '$4 ~ /[0-9].*/'
vmkload_a 3899 root 11u DIR 65,21 4096 2 /
vmkload_a 3899 root 12u DIR 65,21 4096 2 /
vmkload_a 3924 root 11u DIR 65,21 4096 2 /
vmkload_a 3924 root 12u DIR 65,21 4096 2 /
vmkload_a 4016 root 11u DIR 65,21 4096 2 /
vmkload_a 4016 root 12u DIR 65,21 4096 2 /
vmkload_a 4027 root 11u DIR 65,21 4096 2 /
vmkload_a 4027 root 12u DIR 65,21 4096 2 /
vmkload_a 4123 root 11u DIR 65,21 4096 2 /
vmkload_a 4123 root 12u DIR 65,21 4096 2 /
vmkload_a 4788 root 12u DIR 65,21 4096 2 /
vmkload_a 5199 root 12u DIR 65,21 4096 2 /
vmkload_a 5370 root 12u DIR 65,21 4096 2 /
vmkload_a 8357 root 12u DIR 65,21 4096 2 /
vmkload_a 10684 root 11u DIR 65,21 4096 2 /
vmkload_a 10684 root 12u DIR 65,21 4096 2 /
vmkload_a 13691 root 12u DIR 65,21 4096 2 /
vmkload_a 27533 root 12u DIR 65,21 4096 2 /

This is the error I'm getting from vFoglight

ESX Host vmesxcl1-4.corp.navcan.ca Total Command Latency (time taken during the collection interval to process a SCSI command issued by the Guest OS to the virtual machine. The sum of kernelLatency and deviceLatency) to extent: 38.00 have exceeded the threshold of 35 ms. Virtual machines on this ESX Host may be experiencing performance problems. The following URL can be used to obtain alarm details.
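The alarm text describes total command latency as the sum of kernelLatency and deviceLatency, compared against a 35 ms threshold. A rough sketch of that check is below. The function names are made up, and the kernel/device split is hypothetical: the alarm only reports the 38.00 ms total, not its two components.

```python
THRESHOLD_MS = 35.0  # threshold quoted in the vFoglight alarm


def total_command_latency(kernel_ms: float, device_ms: float) -> float:
    """Total command latency as described in the alarm: kernel + device."""
    return kernel_ms + device_ms


def exceeds_threshold(kernel_ms: float, device_ms: float,
                      threshold_ms: float = THRESHOLD_MS) -> bool:
    """True when the summed latency is strictly above the threshold."""
    return total_command_latency(kernel_ms, device_ms) > threshold_ms


# Hypothetical split that adds up to the reported 38.00 ms total.
kernel_ms, device_ms = 1.0, 37.0
total_ms = total_command_latency(kernel_ms, device_ms)
print(f"total={total_ms:.2f} ms, alarm={exceeds_threshold(kernel_ms, device_ms)}")
```

Looking at the two components separately helps narrow things down: a total dominated by device latency points at the disk or controller rather than at the ESX kernel.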

The screenshot in the first post shows the device.

djkast33
Contributor

2x SAS dual-port 10k 146GB, mirrored

lservello
Contributor

I realize this is an old post, but just in case someone looks here for assistance: I was having a similar disk latency issue on vmhba0 on two of my IBM x3650 M2 ESX 4.1 U1 hosts. The latency reported in VMware would never drop below 20ms and would occasionally jump into the 30ms to 40ms range, even when there were no VMs running on the host. The issue turned out to be directly related to a failed cache backup battery on the MR10i ServeRAID controller. Once the battery was replaced, the average latency dropped to between 0ms and 2ms.
