VMware Cloud Community
supahted
Enthusiast

WARNING: NMP: nmp_DeviceRequestFastDeviceProbe

I am currently testing ESXi 4 by adding one ESXi 4 host to our VMware production cluster. The host is an HP BL460c G1 blade running ESXi 4 build 175625, connected to an HP EVA 6000 storage array. The ESXi 4 host seems to run fine, but I noticed the following kernel warnings in the system log:

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021b8480) to NMP device "naa.600508b4000554df00007000034a0000" failed on physical path "vmhba1:C0:T0:L11" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4000554df00007000034a0000" state in doubt; requested fast path state update...

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)ScsiDeviceIO: 747: Command 0x2a to device "naa.600508b4000554df00007000034a0000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

These warnings don't appear on our ESXi 3 hosts. They seem to have something to do with the multipath policies, but I don't understand the warning message. They are reported frequently on multiple LUNs. Does anybody know what these warnings mean?
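
For anyone who wants to pick these lines apart, here is a minimal Python sketch that splits an nmp_CompleteCommandForPath line into its fields. The regular expression is based only on the log lines quoted in this thread, and the field names (opcode, host, dev, plugin and so on) are labels chosen for illustration, not VMware terminology.

```python
import re

# Pattern for the "nmp_CompleteCommandForPath" lines quoted in this thread.
# The group names are illustrative labels, not official field names.
NMP_LINE = re.compile(
    r'Command (?P<opcode>0x[0-9a-fA-F]+) \(0x[0-9a-fA-F]+\) to NMP device '
    r'"(?P<device>[^"]+)" failed on physical path "(?P<path>[^"]+)" '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+) '
    r'(?P<sense_kind>Possible|Valid) sense data: '
    r'(?P<sense>0x[0-9a-fA-F]+ 0x[0-9a-fA-F]+ 0x[0-9a-fA-F]+)'
)


def parse_nmp_line(line):
    """Return the fields of one NMP failure line as a dict, or None."""
    match = NMP_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Split a path such as "vmhba1:C0:T0:L11" into adapter, channel, target, LUN.
    adapter, channel, target, lun = fields["path"].split(":")
    fields.update(adapter=adapter, channel=int(channel[1:]),
                  target=int(target[1:]), lun=int(lun[1:]))
    return fields


sample = (
    'Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)NMP: '
    'nmp_CompleteCommandForPath: Command 0x2a (0x4100021b8480) to NMP device '
    '"naa.600508b4000554df00007000034a0000" failed on physical path '
    '"vmhba1:C0:T0:L11" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.'
)
print(parse_nmp_line(sample))
```

The H, D and P values are the host, device and plugin status codes; having them as separate fields makes it easier to compare lines from different hosts.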

blog: http://vknowledge.wordpress.com/
59 Replies
Elgreco
Contributor

I'm experiencing the same error on ESX 4. On my 3.5 U4 hosts I didn't get the error.

Sep 18 09:59:06 esxbl03 vmkernel: 6:16:23:14.324 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000523fa00) to NMP device "naa.600508b4001076bc0000800000c60000" failed on physical path "vmhba2:C0:T1:L4" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Sep 18 09:59:06 esxbl03 vmkernel: 6:16:23:14.324 cpu7:4103)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4001076bc0000800000c60000" state in doubt; requested fast path state update...

Sep 18 09:59:06 esxbl3 vmkernel: 6:16:23:14.324 cpu7:4103)ScsiDeviceIO: 747: Command 0x2a to device "naa.600508b4001076bc0000800000c60000" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Is there any solution for this problem?

Elgreco
Contributor

I solved my problem by changing the multipathing (MPIO) policy to Fixed on the datastore.

hmolvig
Contributor

I just tried it - didn't help here...

RRaebiger
Contributor

Hello,

I have the same problem with ESXi 4. Our system:

DELL PowerEdge 1800 with SATA CERC RAID controller, 6 x 160 GB SATA, RAID 10.

1 x Machine LSI Logic, 2 Drives

1 x Machine BusLogic, 1 Drive

1 x Machine BusLogic, 1 Drive

1 x Machine IDE, 1 Drive

When starting another image with LSI Logic SCSI, the system hangs. I am now trying to use another machine with BusLogic SCSI. Could someone test or check whether this could be a problem with the SCSI controllers used?

Regards

Rainer Raebiger

_BoRiS_
Contributor

Same problem here

3 hosts running ESX 4 build 175625, storage Datacore SANmelody 3.01

Oct 7 14:18:15 srvesx3 vmkernel: 6:00:55:54.256 cpu5:4268)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60030d9056566f6c5341533200000000" state in doubt; requested fast path state update...

Oct 7 14:18:15 srvesx3 vmkernel: 6:00:55:54.256 cpu5:4268)ScsiDeviceIO: 747: Command 0x28 to device "naa.60030d9056566f6c5341533200000000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Oct 7 14:21:44 srvesx3 vmkernel: 6:00:59:23.076 cpu7:7497)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410006032080) to NMP device "naa.60030d9056566f6c5341533200000000" failed on physical path "vmhba1:C0:T1:L1" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Oct 7 14:21:44 srvesx3 vmkernel: 6:00:59:23.076 cpu7:7497)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60030d9056566f6c5341533200000000" state in doubt; requested fast path state update...

Oct 7 14:21:44 srvesx3 vmkernel: 6:00:59:23.076 cpu7:7497)ScsiDeviceIO: 747: Command 0x2a to device "naa.60030d9056566f6c5341533200000000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Same error on local storage (HP P400 controller):

Oct 7 14:34:05 srvesx3 vmkernel: 6:01:11:43.667 cpu6:4102)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x41000601fd40) to NMP device "mpx.vmhba2:C0:T1:L0" failed on physical path "vmhba2:C0:T1:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

Oct 7 14:34:05 srvesx3 vmkernel: 6:01:11:43.667 cpu6:4102)ScsiDeviceIO: 747: Command 0x12 to device "mpx.vmhba2:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

Oct 7 14:34:05 srvesx3 vmkernel: 6:01:11:43.868 cpu6:4102)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x41000613d000) to NMP device "mpx.vmhba2:C0:T0:L0" failed on physical path "vmhba2:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

Oct 7 14:34:05 srvesx3 vmkernel: 6:01:11:43.868 cpu6:4102)ScsiDeviceIO: 747: Command 0x12 to device "mpx.vmhba2:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
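
The local-storage lines differ from the SAN lines: they carry "Valid sense data", and the command is 0x12 rather than 0x2a or 0x28. Below is a rough Python sketch for decoding them, assuming the three sense values are the sense key, ASC and ASCQ in that order. The lookup tables are deliberately limited to the values quoted in this thread (names taken from the SCSI standards); the function name `describe` is made up for this example.

```python
# Lookup tables restricted to the values quoted in this thread. The names come
# from the SCSI standards; any other value is reported as "unknown".
OPCODES = {0x12: "INQUIRY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ADDITIONAL_SENSE = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE"}


def describe(opcode, device_status, sense):
    """Turn the hex strings from a log line into a readable summary."""
    # Assumption: the three values are sense key, ASC and ASCQ, in that order.
    key, asc, ascq = (int(part, 16) for part in sense.split())
    command = OPCODES.get(int(opcode, 16), "unknown command")
    status = DEVICE_STATUS.get(int(device_status, 16), "unknown status")
    summary = f"{command}: {status}"
    if status == "CHECK CONDITION":
        # Sense data only matters when the device returned CHECK CONDITION.
        sense_key = SENSE_KEYS.get(key, "unknown sense key")
        detail = ADDITIONAL_SENSE.get((asc, ascq), "unknown ASC/ASCQ")
        summary += f", {sense_key} / {detail}"
    return summary


# Local-storage line: Command 0x12, D:0x2, Valid sense data 0x5 0x20 0x0
print(describe("0x12", "0x2", "0x5 0x20 0x0"))
# SAN line: Command 0x2a, D:0x0, Possible sense data 0x0 0x0 0x0
print(describe("0x2a", "0x0", "0x0 0x0 0x0"))
```

Read this way, the local controller is rejecting a command it does not support, which is a different situation from the H:0x2 failures on the SAN LUNs, where the device status is 0x0 and no sense data is returned.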

Has anyone found a solution to this warning?

Regards,

Boris

dodell
Contributor

Just received the following email from VMware support:

"I'm writing you in regards to the service request 1432691811 concerning unresponsive VMs after un-presenting a LUN.

I'd like to inform you that our engineering has identified this issue and working on a fix. I will inform you again with additional information as soon as it becomes available."

Dmitri024
Contributor

We have 4 ESX 4.0 hosts (build 175625) and 3 ESXi 4.0 hosts (build 193498) connected to Datacore SANMelody 2.0.4 Update 1 storage, and we see the same problem.

I have opened an SR with VMware Support. The answer was: "Your configuration is not supported. You must update your Datacore SANMelody to version 3.0"

But I cannot upgrade our Datacore SANMelody to version 3 right now.

Boris, did you open an SR with VMware Support? What answer did you get?

Regards,

Dmitri

iceman76
Enthusiast

Dear Sir or Madam,

I am out of the office from 2 to 10 November 2009. I will receive your email but cannot process it. In urgent cases, please contact our technical hotline on 0611 780 3003.

Kind regards

Carsten Buchberger

RRaebiger
Contributor

Hi all,

We changed all SCSI controllers from LSI to BusLogic. Since then we have had no more problems.

Regards

Rainer Raebiger

_BoRiS_
Contributor

No, I didn't open an SR with VMware support.

And you, did you open a support request with Datacore?

Regards,

Boris

iceman76
Enthusiast

To solve our problem (see post from end of July), FalconStor had to fix its IPStor storage server. There was indeed a problem with a LUN 0 that was not presented correctly. After applying the patch we didn't see the error again.

sakacc
Enthusiast

I wrote Virtualgeek blog posts about a vSphere 4 (and vSphere 4 U1) condition that can create this state, along with two workarounds.

Not saying it is the root cause of the cases noted above, but to me it looks like it.

It is a known issue that VMs (obviously those NOT on the lost datastore) can become intermittently inaccessible when an APD (all paths dead) state is detected. This is commonly triggered by yanking LUNs before removing datastores and ESX devices, or by storage or FC/FCoE/iSCSI network issues.

You can see the post and workarounds here:

http://virtualgeek.typepad.com/virtual_geek/2009/12/an-important-vsphere-4-storage-bug-and-workaroun...

Chad Sakac, P.Eng. vExpert

EMC Corp

VP, VMware Technology Alliance

mellerbeck
Enthusiast

This issue has been driving me crazy. How do you cleanly 'remove' a datastore? I would do it the 'correct' way if I knew how!

kattrap
Contributor

This thread has gotten a bit muddled with people replying with different SCSI codes (although they all start with NMP). I'm seeing the same SCSI codes as Ted (H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0), and I've got a VM running Windows 2003 R2 / SQL Server 2005 SP3 (all fully patched) logging the following message in Event Viewer at the same times:

"SQL Server has encountered ** occurrence(s) of I/O requests taking longer than 15 seconds to complete on file in database (5). The OS file handle is 0x00000648. The offset of the latest long I/O is: 0x000000cb0f2000"

So I'm not seeing anything as horrible as what some customers are having (per Chad's blog post), but it is noticeable to guests that something's not quite right.

I'm running ESX 4 U1 and have had some shoddy support dealings with tier 1 VMware support in the past. This post is mainly to link the SQL I/O message back to a VMware storage bug. I'm hesitant to make any advanced config changes on the ESX box because of a few "busy" messages, but I'd love to hear other opinions.
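
Since the thread mixes several different status codes, one way to keep them apart is to tally the ScsiDeviceIO lines by their H/D/P and sense values. The Python sketch below does that for three lines quoted earlier in the thread plus one NMP line; the function name `count_signatures` and the idea of calling the tuple a "signature" are just for this example.

```python
import re
from collections import Counter

# Matches only the "ScsiDeviceIO" summary lines ('to device "..."'), so the
# matching "to NMP device" line for the same failure is not counted twice.
SIGNATURE = re.compile(
    r'to device "(?P<device>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-fA-F]+) D:(?P<d>0x[0-9a-fA-F]+) P:(?P<p>0x[0-9a-fA-F]+) '
    r'(?:Possible|Valid) sense data: '
    r'(?P<sense>0x[0-9a-fA-F]+ 0x[0-9a-fA-F]+ 0x[0-9a-fA-F]+)'
)


def count_signatures(lines):
    """Count failures per (H, D, P, sense data) combination."""
    counts = Counter()
    for line in lines:
        match = SIGNATURE.search(line)
        if match:
            key = (match["h"], match["d"], match["p"], match["sense"])
            counts[key] += 1
    return counts


lines = [
    'Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)NMP: '
    'nmp_CompleteCommandForPath: Command 0x2a (0x4100021b8480) to NMP device '
    '"naa.600508b4000554df00007000034a0000" failed on physical path '
    '"vmhba1:C0:T0:L11" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)ScsiDeviceIO: 747: '
    'Command 0x2a to device "naa.600508b4000554df00007000034a0000" failed '
    'H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Sep 18 09:59:06 esxbl3 vmkernel: 6:16:23:14.324 cpu7:4103)ScsiDeviceIO: 747: '
    'Command 0x2a to device "naa.600508b4001076bc0000800000c60000" failed '
    'H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Oct 7 14:34:05 srvesx3 vmkernel: 6:01:11:43.667 cpu6:4102)ScsiDeviceIO: 747: '
    'Command 0x12 to device "mpx.vmhba2:C0:T1:L0" failed '
    'H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.',
]

counts = count_signatures(lines)
for signature, total in counts.most_common():
    print(signature, total)
```

Run against a full vmkernel log, a tally like this would show quickly whether a host is seeing one failure pattern or several unrelated ones.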

RanCyyD
Contributor

Hi all!

Is there a resolution or a hotfix from VMware available so far?

We're getting the errors mentioned above when I try to back up a full VM via VCB (VMs on a NetApp).

Backing up the same VM on a self-built Openfiler causes no problems (also connected via iSCSI).

kattrap
Contributor

If you haven't already, patch with the most recent set of ESX4 patches. I'm still verifying if the below kb is the fix but so far I haven't seen the errors in my vmkernel logs this morning.

It would be awesome if someone else could reply as well.

-


Scratch that. I'm still getting the same NMP messages. I've now learned that not every datastore available to the ESX host is setting off the error.

RanCyyD
Contributor

@kattrap:

The hosts are all on the same patch level (latest patches applied), but we are still getting the error (only on the NetApp storage); on the Openfiler storage everything works fine.

As Morten Dalgaard wrote:

"The error, at least for me, also seems to be load related, as it happens more often when VCB backup is running. Actually it almost only occurs when VCB is running."

Seems the same here: the higher the load, the more the errors. (For example while restarting VMs)
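
To check the load theory against a backup schedule, the warnings can be bucketed by hour. This is only a sketch in Python: it assumes the syslog prefix looks like the lines quoted in this thread ("Oct 7 14:18:15 ..."), and the function name `warnings_per_hour` is invented for the example.

```python
import re
from collections import Counter

# Syslog prefix as it appears in the quoted lines, e.g. "Oct 7 14:18:15".
TIMESTAMP = re.compile(
    r'^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}'
)


def warnings_per_hour(lines):
    """Count "state in doubt" warnings per (month, day, hour) bucket."""
    buckets = Counter()
    for line in lines:
        # Only the WARNING lines are counted, one per failed command.
        if "nmp_DeviceRequestFastDeviceProbe" not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            key = (match["month"], int(match["day"]), int(match["hour"]))
            buckets[key] += 1
    return buckets


log_lines = [
    'Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: '
    'nmp_DeviceRequestFastDeviceProbe: NMP device '
    '"naa.600508b4000554df00007000034a0000" state in doubt; '
    'requested fast path state update...',
    'Oct 7 14:18:15 srvesx3 vmkernel: 6:00:55:54.256 cpu5:4268)WARNING: NMP: '
    'nmp_DeviceRequestFastDeviceProbe: NMP device '
    '"naa.60030d9056566f6c5341533200000000" state in doubt; '
    'requested fast path state update...',
    'Oct 7 14:21:44 srvesx3 vmkernel: 6:00:59:23.076 cpu7:7497)WARNING: NMP: '
    'nmp_DeviceRequestFastDeviceProbe: NMP device '
    '"naa.60030d9056566f6c5341533200000000" state in doubt; '
    'requested fast path state update...',
    'Oct 7 14:21:44 srvesx3 vmkernel: 6:00:59:23.076 cpu7:7497)ScsiDeviceIO: 747: '
    'Command 0x2a to device "naa.60030d9056566f6c5341533200000000" failed '
    'H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

buckets = warnings_per_hour(log_lines)
for bucket, total in sorted(buckets.items()):
    print(bucket, total)
```

If the busiest hours line up with the VCB backup window or with mass VM restarts, that would support the load explanation.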

Fascinating that open-source software works fine and a really expensive storage system doesn't...

Help and answers from VMware would be really appreciated!

Thanks!

hmolvig
Contributor

I also experienced severe issues with a NetApp FAS. Have you upgraded to ONTAP 7.3 or higher? That's required.

Henrik

RanCyyD
Contributor

Hi Henrik,

Thanks for your answer, I will ask our storage admin about this.

At the moment I am also trying the solution of removing the MPIO driver from the VCB...

Edit: Unfortunately no improvement...

seniornwb
Contributor

Has anyone found or gotten a solution to this issue from VMware yet? I am seeing this on my hosts too.

state in doubt, requested fast path state update,

A lot of these errors

Mar 20 07:07:55 LIC-VM16 vmkernel: 0:18:41:26.931 cpu16:4312)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100bb1ccd40) to NMP device "naa.600601600de11a00fa6e3b607a38dd11" failed on physical path "vmhba2:C0:T0:L103" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Mar 20 07:07:55 LIC-VM16 vmkernel: 0:18:41:26.931 cpu16:4312)ScsiDeviceIO: 747: Command 0x2a to device "naa.600601600de11a00fa6e3b607a38dd11" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

It mostly happens on one LUN at a time, so it seems the whole storage bus is not out, even though the error is host busy.

And last,

The virtual machines sometimes become unresponsive for a brief period of time.
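
To see whether the failures really stay on one LUN at a time, the NMP lines can be counted per physical path together with a readable host status. The Python sketch below does that. The host status names in the table are an assumption: they follow the Linux SCSI DID_* host codes, which is how H: values in vmkernel logs are commonly read, so check them against VMware's documentation for your build. The function name `failures_per_path` is invented for this example.

```python
import re
from collections import Counter

# Assumed mapping of the "H:" host status values, following the Linux SCSI
# DID_* host codes. Verify against VMware's documentation before relying on it.
HOST_STATUS = {
    0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
    0x4: "BAD_TARGET", 0x5: "ABORT", 0x6: "PARITY", 0x7: "ERROR",
    0x8: "RESET", 0x9: "BAD_INTR", 0xA: "PASSTHROUGH", 0xB: "SOFT_ERROR",
}

PATH_FAILURE = re.compile(
    r'failed on physical path "(?P<path>[^"]+)" H:(?P<host>0x[0-9a-fA-F]+)'
)


def failures_per_path(lines):
    """Count NMP failures per (physical path, host status name)."""
    counts = Counter()
    for line in lines:
        match = PATH_FAILURE.search(line)
        if match:
            name = HOST_STATUS.get(int(match["host"], 16), "UNKNOWN")
            counts[(match["path"], name)] += 1
    return counts


nmp_lines = [
    'Mar 20 07:07:55 LIC-VM16 vmkernel: 0:18:41:26.931 cpu16:4312)NMP: '
    'nmp_CompleteCommandForPath: Command 0x2a (0x4100bb1ccd40) to NMP device '
    '"naa.600601600de11a00fa6e3b607a38dd11" failed on physical path '
    '"vmhba2:C0:T0:L103" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Sep 18 09:59:06 esxbl03 vmkernel: 6:16:23:14.324 cpu7:4103)NMP: '
    'nmp_CompleteCommandForPath: Command 0x2a (0x41000523fa00) to NMP device '
    '"naa.600508b4001076bc0000800000c60000" failed on physical path '
    '"vmhba2:C0:T1:L4" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

path_counts = failures_per_path(nmp_lines)
for (path, status), total in path_counts.most_common():
    print(path, status, total)
```

Under that mapping, H:0x2 corresponds to the "host busy" reading above, while the H:0x8 lines posted earlier in the thread would be a reset, which is a different condition.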
