joshscott
Contributor

ESXi 4.1 losing and recovering connections to data stores constantly

So I have 3 ESXi 4.1 hosts running on an Intel Modular Server with the most current firmware installed. I have 3 data stores on the IMS itself and 2 more on a Promise vTrak e610s.

Throughout the day I am getting constant errors and informational warnings from vSphere that the ESXi hosts have "Lost connectivity to storage device ...". A moment later I get an informational message that it is active again. This seems to happen randomly, and it does not affect all of them simultaneously.

I have contacted both Promise and Intel and they say that their systems are running normally.

The largest impact seems to be on our database server, which is running on one of the hosts. As part of our month-end accounting processes we have to update a large number of tables, and when these errors start occurring the process can take up to 10x longer than normal.

I am at a bit of a loss as to what is going on here. There doesn't seem to be any reason why they aren't working. If anybody could shed some light on the situation I would be very grateful and probably buy you a beer (or 2) if you ever come to Dallas.

7 Replies
AWo
Immortal

Have you faced that behaviour (slowness, connection loss) right from the beginning, or has it started recently?

AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
joshscott
Contributor

We had an issue with it when I first connected to the Promise Array, but that was months ago. It has been stable since then. About 2 weeks ago it started up again. I suspected at first that it might be the Promise Array, but I can't think of why it would be having issues with the other 2 datastores if that were the case.

I have rebooted every piece of equipment, with no improvement.

Realitysoft
Enthusiast

Hi,

Has anything changed since it originally was an issue, such as guests being relocated or contention on the same partitions? Have you looked at ESXTOP and the disk stats to see if anything unusual is spiking?

You could try increasing the timeout value. Are the guest disks spread across different storage arrays?

What type of datastores are presented and what is the load, e.g. a 500GB LUN with 10 VMs located...

Thanks, Jim
joshscott
Contributor

esxtop doesn't show anything unusual, and I haven't migrated any of the VMs around on the datastores since the Promise array was first added to the mix.

As for the datastores themselves, they are as follows:

  • Datastore 1 = 1.98 TB (Located on the Promise Array) 11 Virtual Machines
  • Datastore 2 = 1.98 TB (Located on the Promise Array) 1 Virtual Machine
  • Datastore 3 = 808.75 GB (Located on the IMS) 2 Virtual Machines
  • Datastore 4 = 402.75 GB (Located on the IMS) 1 Virtual Machine
  • Datastore 5 = 402.75 GB (Located on the IMS) 5 Virtual Machines

I have a total of 20 VMs running across 3 ESXi hosts.
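As a quick way to see how that load is spread, the list above can be tallied per array. This is only a sketch: the array labels and VM counts are copied from the list, and the variable names are made up for illustration.

```python
from collections import defaultdict

# (datastore, array it lives on, number of VMs), copied from the list above
datastores = [
    ("Datastore 1", "Promise", 11),
    ("Datastore 2", "Promise", 1),
    ("Datastore 3", "IMS", 2),
    ("Datastore 4", "IMS", 1),
    ("Datastore 5", "IMS", 5),
]

# Sum the VM counts per array
vms_per_array = defaultdict(int)
for name, array, vm_count in datastores:
    vms_per_array[array] += vm_count

total_vms = sum(vms_per_array.values())

for array, count in vms_per_array.items():
    print(f"{array}: {count} of {total_vms} VMs")
```

On these numbers more than half of the VMs sit on a single Promise datastore, which may be worth keeping in mind when looking at per-LUN load.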

I haven't migrated the virtual machines around at all since the Promise Array was first installed. They have stayed put.

joshscott
Contributor

I took a look at the messages log on one of the ESXi hosts and it was full of the following:

Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bebb8240) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)ScsiDeviceIO: 1672: Command 0x2a to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bef66140) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.588 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bf9d6c40) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.588 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.589 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102befe0e40) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.589 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.591 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bf1b4640) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.591 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.592 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102be754940) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.592 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.631 cpu0:2687492)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027fa52540) to NMP device "eui.22e4000155d1d9a0" failed on physical path "vmhba0:C0:T0:L2" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.631 cpu0:2687492)ScsiDeviceIO: 1672: Command 0x2a to device "eui.22e4000155d1d9a0" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.637 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bfa72340) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.637 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.637 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bebb8240) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.637 cpu10:4106)ScsiDeviceIO: 1672: Command 0x2a to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.638 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bef66140) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.638 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.640 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bf9d6c40) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.640 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.640 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102befe0e40) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.640 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.643 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102bf1b4640) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.643 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.643 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4102be754940) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar  2 16:01:36 vmkernel: 12:18:56:34.643 cpu10:4106)ScsiDeviceIO: 1672: Command 0x28 to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

As far as I can tell, this means that I am getting a "Blank Check" error for my connected devices, though I am not exactly sure what a blank check error is or how to remedy it.
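A note on reading those lines: the H:, D: and P: fields are the host, device and plugin status, and the D: value is a SCSI status byte rather than a sense key. In the SCSI status table 0x08 is BUSY, while "Blank Check" is sense key 0x8, and the sense data in these entries is all zeros. Commands 0x28 and 0x2a are READ(10) and WRITE(10). The sketch below decodes the NMP lines and counts failures per path; it assumes the exact line format shown above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# SCSI status byte values (the "D:" field), per the SCSI Architecture Model
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Only the two opcodes that appear in the log above
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches the nmp_CompleteCommandForPath lines; ScsiDeviceIO lines are skipped
LINE_RE = re.compile(
    r'nmp_CompleteCommandForPath: Command (0x[0-9a-f]+) \(0x[0-9a-f]+\) '
    r'to NMP device "([^"]+)" failed on physical path "([^"]+)" '
    r'H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+)'
)

def decode(line):
    """Return the decoded fields of one NMP failure line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    opcode, device, path, host, dev, plugin = m.groups()
    opcode, dev = int(opcode, 16), int(dev, 16)
    return {
        "device": device,
        "path": path,
        "command": SCSI_OPCODES.get(opcode, hex(opcode)),
        "host_status": int(host, 16),
        "device_status": SCSI_STATUS.get(dev, hex(dev)),
        "plugin_status": int(plugin, 16),
    }

def summarize(lines):
    """Count failures per (path, command, device status)."""
    counts = Counter()
    for line in lines:
        rec = decode(line)
        if rec is not None:
            counts[(rec["path"], rec["command"], rec["device_status"])] += 1
    return counts

# Three lines copied from the log above
SAMPLE = [
    'Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4102bebb8240) to NMP device "eui.22eb000155cf49f5" failed on physical path "vmhba0:C0:T0:L1" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Mar  2 16:01:36 vmkernel: 12:18:56:34.587 cpu10:4106)ScsiDeviceIO: 1672: Command 0x2a to device "eui.22eb000155cf49f5" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    'Mar  2 16:01:36 vmkernel: 12:18:56:34.631 cpu0:2687492)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027fa52540) to NMP device "eui.22e4000155d1d9a0" failed on physical path "vmhba0:C0:T0:L2" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

for key, count in summarize(SAMPLE).items():
    print(key, count)
```

Run against the full messages log, a tally like this shows whether the BUSY responses cluster on one LUN or path, or are spread across everything behind vmhba0.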

Thorsten_Schnei
Hot Shot

Hi,

Did you find the cause of that problem? I'm currently seeing the same errors on one of my hosts, and only for one specific connection. Could it be a faulty cable?

Thanks

Thorsten

wamatha
Contributor

We had the same issue recurring every other day. Finally EMC said it is a known issue caused by the VNX FLARE code version 704, which we had on our arrays. We upgraded the VNX to the latest FLARE code version 716 2 days ago.

We are still monitoring the situation; hopefully this code upgrade works.
