Highlighted
Enthusiast
Enthusiast

WARNING: NMP: nmp_DeviceRequestFastDeviceProbe

I am currently testing ESXi 4 by adding one ESXi 4 host to our VMware production cluster. The host is a HP BL460c G1 blade running ESXi 4 build 175625 connected to a HP EVA 6000 storage array. The ESXi 4 host seems to run fine but i noticed the following kernel warnings in the system log:

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021b8480) to NMP device "naa.600508b4000554df00007000034a0000" failed on physical path "vmhba1:C0:T0:L11" H:0x2 D:0x0 P:0x0 Possible sense data: Jul 18 17:00:27 0x0 0x0 0x0.

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4000554df00007000034a0000" state in doubt; requested fast path state update...

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)ScsiDeviceIO: 747: Command 0x2a to device "naa.600508b4000554df00007000034a0000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

These warnings don't appear on our ESXi 3 hosts. These warning seems something to do with the multipath policies but i don't understand the warning message. This warnings are reported frequently on multiple lun's. Does anybody knows what these warnings mean?

blog: http://vknowledge.wordpress.com/
Tags (1)
0 Kudos
59 Replies
Highlighted
Contributor
Contributor

I am experiencing the same problems here in my production environment after upgrading from VMware ESX 3.5-U2 to VMware ESX 4 (Build 164009). My hardware setup:

- HP ProLiant DL360 G5 and DL360 G6 servers,

- Emulex LPe11000 HBAs (dual channel, 4 Gbps FibreChannel),

- transtec Provigo 550 FibreChannel SAN (two controllers, two 4 Gbps paths each),

- Cisco MDS9124 switches.

The SAN is connected to the two FC-switches, with two paths per switch (one path per controller per switch). Every server is connected with one path per switch to the switches. I.e., VMware ESX detects four paths to every LUN on the SAN. The path policy in ESX is set to "Fixed". (I have tested path failover -- and it works.)

/var/log/vmkernel is full of messages like this one:

Jul 27 10:21:44 vmware05 vmkernel: 5:00:22:01.094 cpu1:4262)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021153c0) to NMP device "naa.60050cc000205a840000000000000023" failed on physical path "vmhba2:C0:T1:L0" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

Jul 27 10:21:44 vmware05 vmkernel: 5:00:22:01.094 cpu1:4262)ScsiDeviceIO: 747: Command 0x2a to device "naa.60050cc000205a840000000000000023" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

Any ideas?

Thanks in advance,

Josef.

0 Kudos
Highlighted
Contributor
Contributor

Hi,

I have seen the same messages on my ESX4 Servers. In my case I used Lefthand iSCS Storage. Everytime when I start a VCB Job on a new Volume this Events come up... (automount and scrub disabled) In my case, after i uninstalled the MPIO driver from the VCB Proxy everything works fine. But I had to recreate the VMFS Volumes...

Stefan

0 Kudos
Highlighted
Enthusiast
Enthusiast

I've just opened a VMware support case concerning this warning message. I will post the results to this discussion!

blog: http://vknowledge.wordpress.com/
0 Kudos
Highlighted
Contributor
Contributor

Great -- thanks a lot!

Josef.

0 Kudos
Highlighted
Contributor
Contributor

I'm seeing some similiar events. CX3-80 using FC and round robin. We removed one LUN, it causes the VMs on other LUNs to lose their network connectivity. Have a SR open now, but it looks like I'm going to have to call EMC.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000224a980) to NMP device "naa.60060160ed741a0024bd2deeffa0dd11" failed on physical path "vmhba2:C0:T1:L66" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a0024bd2deeffa0dd11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a0024bd2deeffa0dd11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021a02c0) to NMP device "naa.60060160ed741a00c2135fabd30cde11" failed on physical path "vmhba2:C0:T1:L53" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a00c2135fabd30cde11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a00c2135fabd30cde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

0 Kudos
Highlighted
Enthusiast
Enthusiast

After upgrading one of our ESX Hosts to vSphere we experienced the same problems. /var/log/vmkernel got filled up rapidly with this warning message. The system also got very slow, and it took ages to boot up.

I found out that ist has to to with the so called Lun 0 (LUNZ on EMC systems) and how it is handled from the vSphere on the one hand and presented by the storage system on the other hand.

Our scenario

Storage System 1 : Clariion CX

Storage System 2 : FalconStor IPStor

2 ESX 3.5 Hosts and one ESX 4 Host.

The Clariion presents a couple of Luns to all ESX-Host. The "old" hosts only see the luns which are assigned to them. The 4.0 host sees an additional Lun 0 (LUNZ). Usually this Lun is presented only if there is no Lun with the Host-ID 0 (i guess for scsi compatibility reasons). If you do an "esxcfg-scsidevs -l" you see this LUNZ-Lun, but it has the Flag "Is Pseudo: true".

The Clariion also had a real Lun 0 with vmfs on it, so we had no problems with it.

But on FalconStor the first lun with vmfs on it had the host id 1, so falconstor also presented a dummy lun 0, and that was the one which caused the trouble. ESX 4.0 did not recognize that ist was only a fake (the Flag "Is Pseudo" was false) and tried to get it under its control.....

So what we did was to create a small lun on Falconstor with only 10 MB and present it with the Host-ID 0 to the ESX-Hosts - no problems any longer.

As soon as Falconstor presents a Lun with HostID 0 it does not use its internal dummy Lun 0 any longer - in my opinion that is the correct method.

Maybe you have a similar issue.

Best Regards

Carsten

0 Kudos
Highlighted
Contributor
Contributor

VMware support is saying it is a storage issue, so I have opened a ticket with EMC. Waiting on a call back. VMware is telling me that vSphere is aware that LUN 0 is not an acutal LUN. The commands are reads and writes and are failing at the storage array.

It happened last week to us when we removed a few luns, but only a few VMs went off the net once for a few minutes. The same events showed up in the vkernel logs for every ESX server and all LUNs. Today we removed one LUN, and I didn't force a rescan. The VMs on other LUNs kept going off the network for a couple minutes over and over again. It finally stopped when I did a manual rescan. More events:

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000224a980) to NMP device "naa.60060160ed741a0024bd2deeffa0dd11" failed on physical path "vmhba2:C0:T1:L66" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a0024bd2deeffa0dd11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.259 cpu6:13355)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a0024bd2deeffa0dd11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021a02c0) to NMP device "naa.60060160ed741a00c2135fabd30cde11" failed on physical path "vmhba2:C0:T1:L53" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a00c2135fabd30cde11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.371 cpu6:14106)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a00c2135fabd30cde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.425 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410002266f40) to NMP device "naa.60060160ed741a004c71b05d5662de11" failed on physical path "vmhba2:C0:T1:L11" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.425 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a004c71b05d5662de11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.425 cpu6:13355)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a004c71b05d5662de11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.831 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000207e580) to NMP device "naa.60060160ed741a00caeba6dd155cde11" failed on physical path "vmhba2:C0:T0:L10" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.831 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a00caeba6dd155cde11" state in doubt; requested fast path state update...

Jul 27 14:01:18 fohapesx13 vmkernel: 31:23:37:10.831 cpu6:13355)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a00caeba6dd155cde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:19 fohapesx13 vmkernel: 31:23:37:12.128 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410002257480) to NMP device "naa.60060160ed741a007811dced5c5bde11" failed on physical path "vmhba2:C0:T1:L9" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:19 fohapesx13 vmkernel: 31:23:37:12.128 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a007811dced5c5bde11" state in doubt; requested fast path state update...

Jul 27 14:01:19 fohapesx13 vmkernel: 31:23:37:12.128 cpu6:13355)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a007811dced5c5bde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:21 fohapesx13 vmkernel: 31:23:37:13.362 cpu6:13203)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410002253780) to NMP device "naa.60060160ed741a0024bd2deeffa0dd11" failed on physical path "vmhba1:C0:T0:L66" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:21 fohapesx13 vmkernel: 31:23:37:13.362 cpu6:13203)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a0024bd2deeffa0dd11" state in doubt; requested fast path state update...

Jul 27 14:01:21 fohapesx13 vmkernel: 31:23:37:13.362 cpu6:13203)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a0024bd2deeffa0dd11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:33 fohapesx13 vmkernel: 31:23:37:25.461 cpu6:14123)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000219fac0) to NMP device "naa.60060160ed741a004c71b05d5662de11" failed on physical path "vmhba1:C0:T0:L11" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:33 fohapesx13 vmkernel: 31:23:37:25.461 cpu6:14123)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a004c71b05d5662de11" state in doubt; requested fast path state update...

Jul 27 14:01:33 fohapesx13 vmkernel: 31:23:37:25.461 cpu6:14123)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a004c71b05d5662de11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:35 fohapesx13 vmkernel: 31:23:37:27.674 cpu6:13355)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000202f600) to NMP device "naa.60060160ed741a00c2135fabd30cde11" failed on physical path "vmhba1:C0:T0:L53" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:01:35 fohapesx13 vmkernel: 31:23:37:27.674 cpu6:13355)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a00c2135fabd30cde11" state in doubt; requested fast path state update...

Jul 27 14:01:35 fohapesx13 vmkernel: 31:23:37:27.674 cpu6:13355)ScsiDeviceIO: 747: Command 0x28 to device "naa.60060160ed741a00c2135fabd30cde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:02:57 fohapesx13 vmkernel: 31:23:38:49.246 cpu7:11283)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000211dc40) to NMP device "naa.60060160ed741a00caeba6dd155cde11" failed on physical path "vmhba1:C0:T1:L10" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:02:57 fohapesx13 vmkernel: 31:23:38:49.246 cpu7:11283)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a00caeba6dd155cde11" state in doubt; requested fast path state update...

Jul 27 14:02:57 fohapesx13 vmkernel: 31:23:38:49.246 cpu7:11283)ScsiDeviceIO: 747: Command 0x2a to device "naa.60060160ed741a00caeba6dd155cde11" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:03:05 fohapesx13 vmkernel: 31:23:38:57.970 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x25 (0x4100021e0540) to NMP device "naa.60060160ed741a005281c721ef28de11" failed on physical path "vmhba1:C0:T1:L56" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Jul 27 14:03:05 fohapesx13 vmkernel: 31:23:38:57.970 cpu7:4103)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60060160ed741a005281c721ef28de11" state in doubt; requested fast path state update...

Is anyone else booting ESX locally, attached to a CLARiiON, not using LUN 0, and round robin?

Thanks.

0 Kudos
Highlighted
Contributor
Contributor

Some more details ...

Log-entries in /var/log/vmkernel are as described in the previous posting:

Jul 28 08:39:54 vmware05 vmkernel: 0:20:44:22.115 cpu1:4259)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020c7f40) to NMP device "naa.60050cc000205a840000000000000023" failed on physical path "vmhba2:C0:T1:L0" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

Jul 28 08:39:54 vmware05 vmkernel: 0:20:44:22.115 cpu1:4259)ScsiDeviceIO: 747: Command 0x2a to device "naa.60050cc000205a840000000000000023" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

In between these log entries there are also messages like these:

Jul 28 08:40:34 vmware05 vmkernel: 0:20:45:02.471 cpu1:4275)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba2:C0:T0:L6' failed. Already exists

Jul 28 08:40:34 vmware05 vmkernel: 0:20:45:02.471 cpu1:4275)VMWARE SCSI Id: Id for vmhba2:C0:T1:L6

Jul 28 08:40:34 vmware05 vmkernel: 0x46 0x46 0x45 0x30 0x35 0x41 0x38 0x34 0x50 0x52 0x4f 0x56 0x49 0x47

Jul 28 08:40:34 vmware05 vmkernel: 0:20:45:02.471 cpu1:4275)ScsiUid: 370: Existing device mpx.vmhba1:C0:T0:L6 already has uid vml.0103060000464645303541383450524f564947

Jul 28 08:40:34 vmware05 vmkernel: 0:20:45:02.471 cpu1:4275)ScsiDevice: 1734: Failing registration of device 'mpx.vmhba2:C0:T1:L6': failed to add legacy uid vml.0103060000464645303541383450524f564947on path vmhba2:C0:T1:L6: Already exists

About every five seconds the following log entries (three lines per "record") are written to /var/log/vmkwarning:

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.465 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba1:C0:T1:L6' failed. Already exists

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.466 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba2:C0:T0:L6' failed. Already exists

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.466 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba2:C0:T1:L6' failed. Already exists

Output of 'esxcfg-scsidevs -l' is for this device:

naa.60050cc000205a840000000000000023

Device Type: Direct-Access

Size: 1361358 MB

Display Name: transtec Fibre Channel Disk (naa.60050cc000205a840000000000000023)

Plugin: NMP

Console Device: /dev/sdf

Devfs Path: /vmfs/devices/disks/naa.60050cc000205a840000000000000023

Vendor: transtec Model: PROVIGO 550F Revis:

SCSI Level: 5 Is Pseudo: false Status: on

Is RDM Capable: true Is Removable: false

Is Local: false

Other Names:

vml.020000000060050cc000205a84000000000000002350524f564947

Josef.

0 Kudos
Highlighted
Contributor
Contributor

I have the same issue here in my production environment after upgrading from VMware ESX 3.5-U4 to VMware ESX 4 Build 175625.

Our hardware setup:

- 2 Dell PowerEdge 2850 servers upgraded to ESX 4.0

- 2 QLA2340 2Gb FC HBA

- 2 Dell PowerEdge 2950 servers upgraded to ESX 4.0

- 2 ISP2432 4GB FC HBA

- 3 Dell PowerEdge 2950 III serverswith ESXi 3.5U4 Embedded

- 2 ISP2432 4GB FC HBA

connected to the 2 Storage systems:

- Datacore Storage Server SANMelody

- EMC Clariion CX300

The each SAN Storage is connected to the two FC-switches, with one path per switch. Every server is connected with one path per switch to the switches. The VMware ESX detects two paths to every LUN on the each SAN Storage. The path policy in ESX is set to "Fixed".

The /var/log/vmkernell on all ESX 4.0 hosts have warnings like this:

Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.638 cpu3:4188)<6>qla2xxx 0000:0e:00.0: scsi(6:3:3): Abort command issued -- 1 24b
7d5e 2002.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410005198900) to N
MP device "naa.60030d90564d2d4e564d340000000000" failed on physical path "vmhba2:C0:T3:L3" H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600
30d90564d2d4e564d340000000000" state in doubt; requested fast path state update...
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)ScsiDeviceIO: 747: Command 0x2a to device "naa.60030d90564d2d4e564d
340000000000" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4188)<6>qla2xxx 0000:0e:00.0: scsi(6:3:5): Abort command issued -- 1 24b
7d59 2002.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41000507c240) to N
MP device "naa.60030d90564d2d4e564d360000000000" failed on physical path "vmhba2:C0:T3:L5" H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600
30d90564d2d4e564d360000000000" state in doubt; requested fast path state update...
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)ScsiDeviceIO: 747: Command 0x2a to device "naa.60030d90564d2d4e564d
360000000000" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4188)<6>qla2xxx 0000:0e:00.0: scsi(6:3:7): Abort command issued -- 1 24b
7d5c 2002.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410005029bc0) to N
MP device "naa.60030d90564d2d4e564d380000000000" failed on physical path "vmhba2:C0:T3:L7" H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600
30d90564d2d4e564d380000000000" state in doubt; requested fast path state update...
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.639 cpu3:4099)ScsiDeviceIO: 747: Command 0x2a to device "naa.60030d90564d2d4e564d
380000000000" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.640 cpu3:4188)<6>qla2xxx 0000:0e:00.0: scsi(6:3:4): Abort command issued -- 1 24b

7d5d 2002.

Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.640 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410005196600) to N

MP device "naa.60030d90564d2d4e564d350000000000" failed on physical path "vmhba2:C0:T3:L4" H:0x8 D:0x0 P:0x0 Possible sense

data: 0x0 0x0 0x0.

Jul 28 10:31:12 xxxx vmkernel: 6:23:58:54.640 cpu3:4099)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600

30d90564d2d4e564d350000000000" state in doubt; requested fast path state update...

Jul 28 10:31:12 vmkd vmkernel: 6:23:58:54.640 cpu3:4099)ScsiDeviceIO: 747: Command 0x2a to device "naa.60030d90564d2d4e564d

350000000000" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

-


These warnings don't appear on our ESXi 3 hosts. These warnings are reported frequently on multiple lun's, not only on lun 0, but only Datacore Storage Server SANMelody lun's affected. Also the ESX 4.0 hosts got slow.

Any ideas how to resolve this issue?

Best Regards,

Dmitri

0 Kudos
Highlighted
Contributor
Contributor

Opened a SR with EMC and they see no issues.

We are going to set up a new cluster and see if we can reproduce our issue. We suspect it has something to do with LUN 0, but we cannot confirm.

EMC CX3-80, local install, no LUN0, and using round robin. Anyone else with a similar config?

0 Kudos
Highlighted
Enthusiast
Enthusiast

My hardware setup is as following:

ESX cluster based on HP Blade Infrastructure:

  • HP BL460c G1 servers and HP BL460c G6 servers

  • 2 QLA 4Gb FC HBA per server

Connected to HP Storage system:

  • HP EVA 6000 XP Array

The SAN Storage is connected to the two FC-switches, with one path per switch. Every server is connected with one path per switch to the switches. The VMware ESX detects two paths to every LUN on the each SAN Storage. The path policy in ESX is set to "Fixed"

On the ESXi 4 servers the storage array controller is detected as lun 0 and recognized by ESXi 4 as type 'Array'. I am currently building a new VMware cluster to see if the warnings still do appear.

blog: http://vknowledge.wordpress.com/
0 Kudos
Highlighted
Contributor
Contributor

Hi all,

I tried to investigate the issue, had a conversation with our SAN vendor, and I think that I do, in fact, have some answers.

(1) nmp_CompleteCommandForPath ... Command 0x2a to NMP device failed on physical path ... Possible sense data 0x0 0x0 0x0:

(1a) Analysis:

Jul 28 08:39:54 vmware05 vmkernel: 0:20:44:22.115 cpu1:4259)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020c7f40) to NMP device "naa.60050cc000205a840000000000000023" failed on physical path "vmhba2:C0:T1:L0" H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

Jul 28 08:39:54 vmware05 vmkernel: 0:20:44:22.115 cpu1:4259)ScsiDeviceIO: 747: Command 0x2a to device "naa.60050cc000205a840000000000000023" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

The sense codes logged by VMware stand for "TASK SET FULL". Our SAN vendor told us that, at least for them, this is a known "issue". In fact, it is not even a real issue. The explanation is: The SAN's controller has a write cache (for each array). When a single host, for example, writes a lot of data to a single array, the write cache might be full, and other hosts might not be able to write to the write cache. Our SAN offers a setting for "overload management". When overload management is enabled the hosts that have to wait until the write cache is free will be sent the message "TASK SET FULL" by the SAN's controller. I.e., these hosts cannot write to the SAN at the moment and will have to wait. VMware waits and logs this event with the corresponding sense data for "TASK SET FULL" to /var/log/vmkernel.

(1b) Additional information:

There is a VMware Knowledge Base article on SCSI sense codes:

The log message above contains the following codes:

- H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0

The interesting section here is the code starting with "D" (D stands for "device status"). Device status 0x28 means for "TASK SET FULL".

(1c) Solution:

I re-configured our SAN. The write cache setting for each array was set to "maximum", and I reduced it to a fixed amount. Hence, the arrays really act independently from each other. (Otherwise a write cache congestion on one array can have an impact on other arrays). Moreover, I changed the "overload management setting" from:

- Enabled: Commands that can not be accepted before the response timeout will fail with the status TASK SET FULL (0x28).

to:

- Disabled: No target queue full timeout will be enforced. Commands will wait until they can be processed or are timed out in the transport layer.

Furthermore, I activated the option "Enable cache Writethrough operation when write cache is full." (I prefer slow write operations to the SAN to no write operations.) Smiley Wink

(1d) Note:

The log messages do not appear any longer. (At least at the moment.) However, the log messages did not appear in ESX 3.5-U2 anyway -- they only started appearing in ESX 4.0. So either ESX 4.0 handles SCSI write commands in a different way (rather unlikely) or ESX 4.0 simply logs more or increasingly detailed messages.

(2) nmp_RegisterDevice: Registration of NMP device failed:

(2a) Analysis:

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.465 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba1:C0:T1:L6' failed. Already exists

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.466 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba2:C0:T0:L6' failed. Already exists

Jul 26 01:34:43 vmware05 vmkernel: 3:15:35:02.466 cpu2:4418)WARNING: NMP: nmp_RegisterDevice: Registration of NMP device with primary uid 'mpx.vmhba2:C0:T1:L6' failed. Already exists

I have six LUNs on the SAN (LUN 0 through LUN 5). LUN 6 is the SAN's controller. So these error messages correspond to the SAN's controller and not to any of the datastores.

Unfortunately, I do not have an answer for this issue yet ... and /var/log/vmkernel is filling up rapidly -- at 26,000 lines or 4.5 MB per hour.

What I'd like to see is a ESX setting that lets me disable these messages for a given LUN.

Best regards,

Josef.

Highlighted
Enthusiast
Enthusiast

Hi all,

After contacting VMware support I'd like to share the information with you. The problem i am experiencing is a little bit different as Josef is experiencing. I am experiencing the following warning:

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)ScsiDeviceIO: 747: Command 0x2a to device "naa.600508b4000554df00007000034a0000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

The log message above contains the following codes:

failed H:0x2 D:0x0 P:0x0

The interesting section here is the code starting with "H" (H stands for "Host status"). Host status 0x2 means "HOST BUSY"

Vmware support gives the following explanation for this:

-


I checked with our bug database and as I had thought previously, H:0x2 D:0x0 P:0x0 translates to hba busy. The driver for whatever reason failed the i/o with a busy status. These can occur for any number of reasons. These failures are automatically retried by ESX.

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4000554df00007000034a0000" state in doubt; requested fast path state update..."

This messaging will initially indicate that a NMP command was not responsive on a device, thus the NMP plugin 'doubted' the sate of the lun, i.e was it busy, was it on a path, was it responsive. This could be a driver problem or spurious logging. A bug for this message has been logged, and as yet is not an issue, unless followed by failing I/O or VM failures.

-


So it looks like a bug, but as yet is not an issue. Hope this gives some clarification!

Regards,

Ted

blog: http://vknowledge.wordpress.com/
0 Kudos
Highlighted
Enthusiast
Enthusiast

I am also occasionally receiving the error you just mentioned supahted, specifically the 0x2 HOST BUSY error followed by the "state in doubt" warning on our EVA8000.

My immediate guess is that the error differs depending on the type of storage array you have, so HP EVA's may be giving out HOST BUSY errors, while other arrays may be giving out the TASK SET FULL error.

The error, at least for me, also seems to be load related, as it happens more often when VCB backup is running. Actually it almost only occurs when VCB is running.

Could this be related to SCSI reservation errors, eg, the resolution could be to split LUNs up into smaller LUNs?

Clarification from a VMware engineer/expert would be appreciated

[edit}

After reading the article mentioned above: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=289902 i am a bit surprised to see that the H:0x2 code actually suggests that the HBA is issuing the error. So is it the HBA which is "Host Busy" or is it simply timing out reads/writes to the storage array?

0 Kudos
Highlighted
Enthusiast
Enthusiast

Ich befinde mich vom 31.07.2009 bis einschließlich 14.08.2009 nicht im Hause. In dringenden Fällen wenden Sie sich bitte an technik@witcom.de oder an 0611 7803003

Mit freundlichen Grüssen

Carsten Buchberger

0 Kudos
Highlighted
Contributor
Contributor

We stood up a 3 node ESX test cluster connected to a CX3-80. We added 4 LUNs and put VMs on 3 of the LUNs. We set up continous pings to the VMs. Whenever we remove the one LUN without VMs, the other VMs will eventually "freeze" after a period of time. Pings will fail and the console of the VM will be a blank black screen.

We tired adding a LUN 0 and that didn't fix it. We changed the pathing policy from round robin to MRU. Still broken. We disabled the auto rescan. Still broken. The time of VM failure could be 5 minutes to 1 hour. This will keep reoccuring at different intervals. The only way to stop it is by manually rescanning on the ESX host.

Has anyone removed a lun on an ESX4 cluster and experienced the same thing?

Thanks.

0 Kudos
Highlighted
Contributor
Contributor

Hi

Our system have same problem:

- HP BL680

- Storage Hitachi

It takes long time when rescan adapter and resolve VMFS

0 Kudos
Highlighted
Contributor
Contributor

We have the same issues:

Sep 11 14:49:48 esx5 vmkernel: 0:15:30:14.830 cpu15:4111)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000b04cd80) to NMP device "naa.60a9800050334c646c4a50744f547334" failed on physical path "vmhba5:C0:T3:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

Sep 11 14:49:48 esx5 vmkernel: 0:15:30:14.830 cpu15:4111)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.60a9800050334c646c4a50744f547334" state in doubt; requested fast path state update...

Sep 11 14:49:48 esx5 vmkernel: 0:15:30:14.830 cpu15:4111)ScsiDeviceIO: 747: Command 0x28 to device "naa.60a9800050334c646c4a50744f547334" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.

We also had a VMFS failure on 1 directory (1 VM) - that no longer can be accessed. Say VMFS lock corrupt. This could be triggered by the above errors. We had no issues before upgrading to ESX4.

Any who can help here ? - What should I do...

Henrik

0 Kudos
Highlighted
Contributor
Contributor

Vielen Dank für Ihre E-Mail. Ich bin bis zum 21. September nicht im Büro. In dringenden Fällen wenden Sie sich bitte an unseren Hepldesk oder an Herrn Haselau Tel. 0541 9493-157

Thank you for E-Mail. I am currently out of the office and will return your message when I return September 21th, 2009. If you require immediate assistance, please email our Helpdesk or call at +49(0)41-9493-157

Mit freundlichen Grüßen

i.A. Stefan Ohlmeyer

Stellv. Leiter Technik

SIEVERS-SNC

Computer & Software GmbH & Co. KG

Ein Unternehmen der SIEVERS-GROUP

Hans-Wunderlich-Straße 8

49078 Osnabrück

Fon: +49 (541) 9493-160

Fax: +49 (541) 9493-260

E-Mail: sohlmeyer@sievers-group.com

Web: www.sievers-group.com

Pers. haftende Gesellschafterin:

SIEVERS-SNC Beteiligungs GmbH

Amtsgericht Osnabrück, HRB 19289

Geschäftsführer:

Dipl.-Kfm. Klaus Gerdes-Röben

Software-Ing. Marco Naber

Dipl.-Wirtschaftsing. Rüdiger Sievers

Amtsgericht Osnabrück, HRA 6465

0 Kudos