VMware Cloud Community
Matthiker
Contributor
Contributor

esxtop output with huge number -- error?

I have 4 ESX 4.0 hosts running around 100 VMs with vCenter and have been experiencing performance issues for a couple of weeks.

All VMs are very slow during certain period about 1 or 2 hours every day, and it changes sometimes.

After having analysized with SAN tools, esxtop tools and some scripts inside VMs, I identified the problem. It's about High Disk Avg. Latency on a Dell PS6000E Storage.

the Latency pattern presented in esxtop tools matched SAN Utility tools, sometime with high IOPS, sometime with high MB Read/sec and MB Written/sec, but not easy to identify which VM or group of VMs caused the problem.

I wrote a script on every ESX host, collecting esxtop every 10 seconds for 1 hour whenever a High "Physical Disk\(vmhba36\)\\Average Driver MilliSec" detected.

And, now I found out when it becomes slow, the output of esxtop got something interesting every time. I awk the output for Human read, only taking several measurements.

Machine                Commands             Reads                Writes               Read MBs     Written MBs  Latency  ESX Host

VM-name1           5462836927972282368  5462836927972308992  5462836927972466688  5.20977e+12  5.20977e+12  17318.8  ESX01

VM-name2           3518437297766401024  3518437297766401024  3518437297766402048  3.35544e+12  3.35544e+12  12120.1  ESX02

helper                   7.04                         2334.75              2.89                   2331.87          0.01          103.46   ESX01

VM-name3           49458.6                   41820.9              7637.62             689.17            274.14       722.07   ESX03

VM-name4          17584.6                    11340                 6244.58              463.47           342            12301.6  ESX04

............

Does anyone see this before?

0 Kudos
8 Replies
Matthiker
Contributor
Contributor

Please see the attached picture for formatted detailed output.

0 Kudos
Matthiker
Contributor
Contributor

I also found the followin logs in vmkernel, generally it appears there, sometimes not.


Mar 31 08:45:59 ESXHOST01 vmkernel: 107:02:15:15.314 cpu4:4229)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4100010d1bc0) to NMP device "naa.6090a0684007fb15e7bb94010000f075" fai
led on physical path "vmhba36:C0:T0:L0" H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Mar 31 08:45:59 ESXHOST01 vmkernel: 107:02:15:15.314 cpu4:4229)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.6090a0684007fb15e7bb94010000f075" state in doubt; reque
sted fast path state update...
Mar 31 08:45:59 ESXHOST01 vmkernel: 107:02:15:15.314 cpu4:4229)ScsiDeviceIO: 747: Command 0x28 to device "naa.6090a0684007fb15e7bb94010000f075" failed H:0x5 D:0x0 P:0x0 Possible sens
e data: 0x2 0x3a 0x0.
Mar 31 08:45:59 ESXHOST01 vmkernel: 107:02:15:15.315 cpu3:4119)J3: 2158: Aborting txn 0x4100b711c010 due to failure pre-committing: Optimistic lock acquired by another host
Mar 31 08:46:01 ESXHOST01 vmkernel: 107:02:15:17.685 cpu7:4229)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4100010ef6c0) to NMP device "naa.6090a0684007fb15e7bb94010000f075" fai
led on physical path "vmhba36:C0:T0:L0" H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Mar 31 08:46:01 ESXHOST01 vmkernel: 107:02:15:17.685 cpu7:4229)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.6090a0684007fb15e7bb94010000f075" state in doubt; reque
sted fast path state update...
Mar 31 08:46:01 ESXHOST01 vmkernel: 107:02:15:17.685 cpu7:4229)ScsiDeviceIO: 747: Command 0x28 to device "naa.6090a0684007fb15e7bb94010000f075" failed H:0x5 D:0x0 P:0x0 Possible sens
e data: 0x2 0x3a 0x0.
Mar 31 08:46:01 ESXHOST01 vmkernel: 107:02:15:17.686 cpu3:4119)J3: 2158: Aborting txn 0x4100b711c010 due to failure pre-committing: Optimistic lock acquired by another host

0 Kudos
idle-jam
Immortal
Immortal

perhaps the value is too much to be display on screen. what not use software like esxtop to look in more graphical manner? http://labs.vmware.com/flings/esxplot

0 Kudos
Matthiker
Contributor
Contributor

Have tried esxplot before, this value looks like a straight line down from sky on the chart.

0 Kudos
Matthiker
Contributor
Contributor

The problem is not how to analysing the logs, but the huge number in the output of esxtop.

0 Kudos
Matthiker
Contributor
Contributor

nobody had encountered this?

0 Kudos
Matthiker
Contributor
Contributor

After having upgraded to ESX 4.1, everything is fine.

0 Kudos
Matthiker
Contributor
Contributor

DONE

0 Kudos