VMware Cloud Community
particleman68
Contributor
Contributor

frequent error message 'nmp_ThrottleLogForDevice:3178' in vmkernel.log

Dear Community,

I get the error messages below every 5 minutes in my vmkernel.log on a fresh ESXi6 insatll.

This is reproducible on several hp Proliant machines with different disks (Samsung SSD or original hp hard-disks) ,

different controllers P410, P410i, different host machines DL360G6, ML350G6.

Any combination of hardware show the same error logs.

After a few days of zero load the host system becomes inaccessible  and the guest filesystem switches to read-only.

Has anyone any idea or similar experiences ?

Any help is highly appreciated

kristian

from /var/log/vmkernel.log:

2011-05-05T01:33:23.951Z cpu2:33081)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d80702b00, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:33:23.954Z cpu2:32789)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d80702b00, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:38:23.954Z cpu3:32967)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d806e1a80, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:38:23.958Z cpu2:32789)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806e1a80, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:43:23.951Z cpu0:35848)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d80701d80, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:43:23.956Z cpu3:32790)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d80701d80, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:48:23.951Z cpu1:33244)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d806e4d80, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:48:23.954Z cpu0:32787)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806e4d80, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:53:23.952Z cpu1:36196)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d806f5dc0, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:53:23.955Z cpu2:32789)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806f5dc0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:54:00.557Z cpu0:33013)<4>hpsa 0000:04:00.0: Device:C0:B0:T0:L1 Command:0x85 CC:05/20/00 Illegal Request.

2011-05-05T01:54:00.557Z cpu1:32781)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x85 (0x439d806d16c0, 34212) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:54:00.568Z cpu1:32781)ScsiDeviceIO: 2646: Cmd(0x439d806d16c0) 0x4d, CmdSN 0x5 from world 34212 to dev "naa.600508b1001030364132303435300700" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2011-05-05T01:54:00.569Z cpu1:34212)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806d16c0, 34212) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:54:00.569Z cpu1:34212)ScsiDeviceIO: 2646: Cmd(0x439d806d16c0) 0x1a, CmdSN 0x6 from world 34212 to dev "naa.600508b1001030364132303435300700" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2011-05-05T01:54:00.583Z cpu0:33014)<4>hpsa 0000:04:00.0: Device:C0:B0:T0:L1 Command:0x85 CC:05/20/00 Illegal Request.

2011-05-05T01:54:00.583Z cpu1:33156)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x85 (0x439d806d16c0, 34212) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T01:58:23.953Z cpu3:32817)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d806e0a00, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T01:58:23.958Z cpu3:32790)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806e0a00, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2011-05-05T02:03:23.955Z cpu1:32769)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x12 (0x439d806d40c0, 0) to dev "naa.600508b1001030364132303435300700" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

2011-05-05T02:03:23.957Z cpu1:32788)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d806d40c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

Tags (3)
6 Replies
unsichtbare
Expert
Expert

Is the install to traditional HDD with RAID or flash-media.

Our best workaround is to direct both the log-files and the host scratch to one of our SANs (which also centralized logging). Truly,  I don't understand the relationship at work here, but it seems to prevent o the apparent lock-up of the ESXi Host.

+The Invisible Admin+ If you find me useful, follow my blog: http://johnborhek.com/
j_rodstein
Enthusiast
Enthusiast

I am having similar issues as I posted here:  Re: Host Storage Status Alerts in ESXi 6

I have a few iSCSI SANs setup for testing...all using Multipathing.  I am also getting this error on SSD drives in the ESXi hosts that are used for vFlash Cache.  I seem to be getting this error on all drives...not matter if local or iSCSI.  I did not have this problem in ESXi 5.5.  Only thing performed since the ESXi 6 install (did so fresh since my hosts had been in-place upgraded since 5.0) is update all of my drivers and firmware on the hosts to make sure I was in compliance with VMware's HCL.

I also have the one ESXi patch that has been released applied, although I had these issues before applying it.  Perhaps there is a bug somewhere?

0 Kudos
particleman68
Contributor
Contributor

current situation is:

After a complete re-install of ESXi 5.5 and having the complete driver and firmware landscape according to:

https://h20565.www2.hp.com/hpsc/swd/public/readIndex?sp4ts.oid=3884321&swLangOid=8&swEnvOid=4166

I still get those messages (see below), just less frequently (every 30 minutes) which is, as I have read so far, just a matter of setiings inside ESXi.

Using original hp hard-disks or Samsung 850 pro SSD doesn't make a difference.

Other hardware components were checked by having a complete mirror system ML350G6 which shows the same behaviour.

I have 2 more questions:

I'm planning to use the machine as a production database. Can these messages really be "safely ignored" ?

Is it normal that the true write performance in a CentOS 6 guest OS is only around 30MB/s in a RAID1 on a P410i smart array controller ?

dd if=/dev/zero of=./big.dat bs=1G count=1

1+0 records in

1+0 records out

1073741824 bytes (1,1 GB) copied, 52,1014 s, 20,6 MB/s

~ # grep -i fail /var/log/vmkernel.log

2015-05-13T06:40:36.087Z cpu4:32867)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x85 (0x412e8044d1c0, 34397) to dev "naa.600508b1001031363320202020200008" on path "vmhba1:C0:T0:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2015-05-13T06:40:36.087Z cpu4:32867)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x85, CmdSN 0xb1 from world 34397 to dev "naa.600508b1001031363320202020200008" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2015-05-13T06:40:36.098Z cpu4:33522)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x4d, CmdSN 0xb2 from world 34397 to dev "naa.600508b1001031363320202020200008" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2015-05-13T06:40:36.098Z cpu4:33522)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x1a, CmdSN 0xb3 from world 34397 to dev "naa.600508b1001031363320202020200008" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2015-05-13T06:40:36.123Z cpu4:33267)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x85 (0x412e8044d1c0, 34397) to dev "naa.600508b1001c2b8bf72b80c87bbd9218" on path "vmhba1:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2015-05-13T06:40:36.123Z cpu4:33267)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x85, CmdSN 0xb4 from world 34397 to dev "naa.600508b1001c2b8bf72b80c87bbd9218" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2015-05-13T06:40:36.134Z cpu4:32792)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x4d, CmdSN 0xb5 from world 34397 to dev "naa.600508b1001c2b8bf72b80c87bbd9218" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2015-05-13T06:40:36.134Z cpu6:35142)ScsiDeviceIO: 2338: Cmd(0x412e8044d1c0) 0x1a, CmdSN 0xb6 from world 34397 to dev "naa.600508b1001c2b8bf72b80c87bbd9218" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

0 Kudos
j_rodstein
Enthusiast
Enthusiast

You can use your server in production, but I would not ignore those errors.  I did not have them at the frequency I am receiving them (every 5 min on all disks...local or iSCSI) until ESXi 6.  That does not mean I may be missing a tweak I made in 5.5.  I would continue to pursue and open a case with Support if you have a contract in place.  I plan on doing the same once I get time.

0 Kudos
particleman68
Contributor
Contributor

I just wanted to share an interesting observation regarding disk performance.

After adding a Cache  module to the smart array (512MB FBWC) the disk write performance improved a lot.

Controller P410i with 512MB FBWC, RAID 1

true write performance measured by:

dd if=/dev/zero of=some.file bs=1G count=1

hp SAS HDD: 229MB/s

Samsung 850 pro SSD: 48MB/s

read performance:

hdparm -t /dev/sda

hp SAS HDD: 346 MB/s

Samsung 850 pro SSD: 380 MB/s

Obviously it is not well-invested money to use "Pro" SSD in a RAID 1 configuration.

Unfortunately, the error messages in the vmkernel.log still remain.

0 Kudos
Goat2048
Contributor
Contributor

I found some very useful information to interpret the error message here: Analyzing ESXi Log Files to Identify Storage and Multipathing Problems

Based on this article I will interpret you error message, hopefully this will help some others. In your message we find the code H:0x0 D:0x2 P:0x0, this means the error is related to the Device (H: host, 😧 device, P:plugin status). On the vmware site (vSphere Documentation Center) we find the information required to understand the code 0x2, what means CHECK CONDITION. In the article you also find some references to t10.org about how to interpret the Valid sense data: 0x5 0x24 0x0, which is also in the log message.

Based on the following ref: SCSI Sense Keys‌ we know that an ILLEGAL REQUEST was send to your storage device. But how bad is this?

I did some further research on were the request comes from. I did this on my own test environment. Every 30minutes I see the message in my log files. After some research I found that the world ID (Process ID) is in the log lines. In your logs the world ID is 34212. Using the ps command you can find the process responsible for sending the ILLEGAL REQUEST (ps | grep 34212). In my environment it was the SMARTD process. I think this is nothing to worry about. I expect the smartd proces to check statuses of your disks, which is not understand by your disks. Do your disks and RAID-controller support SMART? Especially RAID controllers do not forward the SMART requests I think?

This guy also writes about this issue: My way...: In vSphere ESXi vmkernel.log we see messages every 30 minutes: NMP: nmp_ThrottleLogForDev...

I hope this helps you!