VMware Cloud Community
Basildane
Contributor

ESXi host goes down after file I/O

ESXi 6.5 host with a Windows 2016 file server on it.

The file server is connected to an MD1200 PowerVault with an H800 controller.

Once I start copying files to the PowerVault, I start to get these errors in vmkernel.log after about 10 minutes.

    2018-08-13T12:51:23.055Z cpu10:67779)ScsiDeviceIO: 2948: Cmd(0x439d00453600) 0x1a, CmdSN 0x1041 from world 0 to dev "naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2018-08-13T13:04:36.605Z cpu13:65999)ScsiDeviceIO: 2948: Cmd(0x439d05934b40) 0x1a, CmdSN 0x10b6 from world 0 to dev "naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

If I continue copying files, the entire ESXi host becomes unstable. I lose all connectivity to the VMs. I can't even stop them from the command line. My only option is to cold-reboot the ESXi server.

This happens every time I copy files to the array.

The array is configured as a RAID-5 disk formatted with VMFS 6.

13 Replies
GayathriS
Expert

Hi

Could you please confirm that the storage model you are using is the Dell PowerVault MD1200?

Did you check whether it is compatible with ESXi before you configured it?

I ask because I don't see the model name you gave in the compatibility guide. Am I missing something here?

Below are the 3 models that I see for PowerVault:

pastedImage_0.png

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

regards

Gayathri

Basildane
Contributor

It is a Dell PowerVault MD1200.  It's one of the most popular storage cabinets on the planet.

The host is a Dell R710.

Basildane
Contributor

I just started a new test.

I am copying files just between folders on the Perc 6, Raid-5 local to the R710.

No MD1200 involved at all.

After 5 minutes of copying files I get more errors.

    2018-08-13T13:47:11.511Z cpu1:69357)ScsiDeviceIO: 2948: Cmd(0x439509551c40) 0x1a, CmdSN 0x1979 from world 0 to dev "naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

That LUN is the local array on the R710.

GayathriS
Expert

When I decode the SCSI sense code, it says: INVALID FIELD IN CDB

SCSI code decoder:

https://www.virten.net/vmware/esxi-scsi-sense-code-decoder/?host=0&device=2&plugin=0&sensekey=5&asc=...
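
The same decoding can be sketched offline. The snippet below is only an illustration, not a VMware tool: the function name `decode` and the lookup tables are made up for this example, and the tables hold just a handful of standard SCSI codes rather than the full lists. It pulls the opcode, the H/D/P status fields and the sense data out of a ScsiDeviceIO line like the ones posted in this thread.

```python
import re

# Small lookup tables; only a few well-known SCSI codes are included,
# anything else is reported as raw hex.
OPCODES = {0x00: "TEST UNIT READY", 0x12: "INQUIRY", 0x1A: "MODE SENSE(6)",
           0x28: "READ(10)", 0x2A: "WRITE(10)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
            (0x24, 0x00): "INVALID FIELD IN CDB"}

HEX = r"0x[0-9a-fA-F]+"
# Pattern written against the log lines quoted in this thread (assumption:
# other ESXi builds may format the line differently).
LINE_RE = re.compile(
    rf'Cmd\({HEX}\) (?P<op>{HEX}), .*?to dev "(?P<dev>[^"]+)" failed '
    rf'H:(?P<h>{HEX}) D:(?P<d>{HEX}) P:(?P<p>{HEX})'
    rf'(?: Valid sense data: (?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX}))?'
)


def decode(line):
    """Return a dict describing one ScsiDeviceIO failure line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Convert every captured hex field (all groups except the device name).
    num = {k: int(v, 16) for k, v in m.groupdict().items()
           if k != "dev" and v is not None}
    out = {
        "device": m.group("dev"),
        "opcode": OPCODES.get(num["op"], hex(num["op"])),
        "host_status": num["h"],
        "device_status": DEVICE_STATUS.get(num["d"], hex(num["d"])),
        "plugin_status": num["p"],
    }
    if "key" in num:  # sense data is only present on some failures
        out["sense_key"] = SENSE_KEYS.get(num["key"], hex(num["key"]))
        pair = (num["asc"], num["ascq"])
        out["additional_sense"] = ASC_ASCQ.get(pair, "ASC/ASCQ %#x/%#x" % pair)
    return out


SAMPLE = ('2018-08-13T12:51:23.055Z cpu10:67779)ScsiDeviceIO: 2948: '
          'Cmd(0x439d00453600) 0x1a, CmdSN 0x1041 from world 0 to dev '
          '"naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 '
          'Valid sense data: 0x5 0x24 0x0.')

if __name__ == "__main__":
    print(decode(SAMPLE))
```

For the lines in this thread, the failing command is 0x1a (MODE SENSE(6)), the device status 0x2 is CHECK CONDITION, and the sense data 0x5 0x24 0x0 is ILLEGAL REQUEST / INVALID FIELD IN CDB. In other words, the device is rejecting a mode query rather than failing a data read or write.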

I would like to see more of the logs.

Since you are also seeing this on the local drive:

--> check the logs for any other errors

--> check for and update to the latest firmware

--> finally, restart the management agents or the host, then verify whether you still see the same issue.

regards

gayathri

GayathriS
Expert

Hi

I checked whether the Dell R710 on which you installed ESXi 6.5 is compatible, and I don't see this server listed as compatible with ESXi 6.5.

Please see the image below:

pastedImage_0.png

This confirms that the Dell R710 is supported only up to ESXi 6.0 U3; ESXi 6.5 is not a supported version for this server.

regards

Gayathri

Basildane
Contributor

I received your PM and tried multiple times to respond, but the web page just hangs saying "loading".

To answer your question, yes, I have 6.5.0.

Thanks for all the information.

Lalegre
Virtuoso

Which is the model of your I/O Controller on the R710?

Which version of ESXi are you currently using?

Are you using the Dell customized ISO?

Run "esxcfg-scsidevs -a" on your ESXi host and look for the driver name, then run esxcli software vib list | grep "<name of the driver>" and give us the driver version.

Basildane
Contributor

Which is the model of your I/O Controller on the R710?

Perc 6 and H800

Which version of ESXi are you currently using?

ESXi 6.5.0, build 5969303, when the problem was occurring this weekend.

I have updated to build 8935087 and I am re-testing now.

Are you using the Dell customized ISO?

No.  I would like to hear more about this though.  I am using the default ISO install.

Run "esxcfg-scsidevs -a" on your ESXi host and look for the driver name, then run esxcli software vib list | grep "<name of the driver>" and give us the driver version.

[root@dozer:/var/log] esxcfg-scsidevs -a

vmhba0  megaraid_sas      link-n/a  unknown.vmhba0                          (0000:03:00.0) LSI / Symbios Logic Dell PERC 6/i Integrated

vmhba1  vmkata            link-n/a  sata.vmhba1                             (0000:00:1f.2) Intel Corporation 2 port SATA IDE Controller (ICH9)

vmhba2  megaraid_sas      link-n/a  unknown.vmhba2                          (0000:05:00.0) LSI / Symbios Logic Dell PERC H800 Adapter

vmhba64 vmkata            link-n/a  sata.vmhba64                            (0000:00:1f.2) Intel Corporation 2 port SATA IDE Controller (ICH9)

vmhba33 vmkusb            link-n/a  usb.vmhba33                             () USB

[root@dozer:/var/log] esxcli software vib list | grep "megaraid"

scsi-megaraid-mbox             2.20.5.1-6vmw.650.0.0.4564106         VMW     VMwareCertified   2018-03-27

scsi-megaraid-sas              6.603.55.00-2vmw.650.0.0.4564106      VMW     VMwareCertified   2018-03-27

scsi-megaraid2                 2.00.4-9vmw.650.0.0.4564106           VMW     VMwareCertified   2018-03-27

lsu-lsi-megaraid-sas-plugin    1.0.0-8vmw.650.1.26.5969303           VMware  VMwareCertified   2018-03-27

[root@dozer:/var/log]
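
For anyone repeating this check on a host with more adapters, the two-step lookup (adapter to driver, then driver to VIB version) can be scripted. This is a rough sketch that only parses pasted command output; the function names are hypothetical, and the rule that a VIB name ends with the driver name (with underscores turned into hyphens) is an assumption drawn from the output above, not a documented naming scheme.

```python
# Output pasted from the two commands in this post (whitespace trimmed).
SAMPLE_ADAPTERS = """\
vmhba0  megaraid_sas  link-n/a  unknown.vmhba0  (0000:03:00.0) LSI / Symbios Logic Dell PERC 6/i Integrated
vmhba1  vmkata        link-n/a  sata.vmhba1     (0000:00:1f.2) Intel Corporation 2 port SATA IDE Controller (ICH9)
vmhba2  megaraid_sas  link-n/a  unknown.vmhba2  (0000:05:00.0) LSI / Symbios Logic Dell PERC H800 Adapter
vmhba64 vmkata        link-n/a  sata.vmhba64    (0000:00:1f.2) Intel Corporation 2 port SATA IDE Controller (ICH9)
vmhba33 vmkusb        link-n/a  usb.vmhba33     () USB
"""

SAMPLE_VIBS = """\
scsi-megaraid-mbox           2.20.5.1-6vmw.650.0.0.4564106     VMW     VMwareCertified  2018-03-27
scsi-megaraid-sas            6.603.55.00-2vmw.650.0.0.4564106  VMW     VMwareCertified  2018-03-27
scsi-megaraid2               2.00.4-9vmw.650.0.0.4564106       VMW     VMwareCertified  2018-03-27
lsu-lsi-megaraid-sas-plugin  1.0.0-8vmw.650.1.26.5969303       VMware  VMwareCertified  2018-03-27
"""


def adapters_by_driver(adapter_output):
    """Group vmhba names by the driver shown in column 2 of 'esxcfg-scsidevs -a'."""
    groups = {}
    for line in adapter_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("vmhba"):
            groups.setdefault(fields[1], []).append(fields[0])
    return groups


def vibs_for_driver(vib_output, driver):
    """Return (name, version) for VIBs whose name ends with the driver name.

    Assumption: VIB names use hyphens where the driver uses underscores,
    e.g. driver 'megaraid_sas' -> VIB 'scsi-megaraid-sas'.
    """
    needle = driver.replace("_", "-")
    matches = []
    for line in vib_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].endswith(needle):
            matches.append((fields[0], fields[1]))
    return matches


if __name__ == "__main__":
    for driver, hbas in adapters_by_driver(SAMPLE_ADAPTERS).items():
        print(driver, hbas, vibs_for_driver(SAMPLE_VIBS, driver))
```

With the output above, both the PERC 6/i (vmhba0) and the H800 (vmhba2) map to the megaraid_sas driver, which corresponds to the scsi-megaraid-sas VIB at version 6.603.55.00-2vmw.650.0.0.4564106.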

Lalegre
Virtuoso

Which I/O controller are you using to present the datastore?

By customized ISO I mean an ISO built by the vendor in conjunction with VMware, with the recommended drivers for specific hardware. Since you have Dell hardware, here is the customized ISO: VMware ESXi 6.5 U1 | Dell Argentina

I also just realized that the R710 is not supported with ESXi 6.5 Update 1, as shown in the VMware Compatibility Guide. The last supported version is 6.0 Update 3. Keep that in mind as well; it could be the issue.

Basildane
Contributor

The PERC 6 holds the ESXi install and the VM datastore.

The data is being transferred to the MD1200 on the H800 controller.

As for "supported release", that just means they didn't test it.  It doesn't cause it to fail.

Today I updated that ESXi host to the latest build for 6.5.  The test is still running.  Nothing evident yet.

battybishop
Hot Shot

Also note that the H800 has not been tested and is not supported on 6.5.

pastedImage_0.png

If this is a production system, you won't get any support from VMware.

Lalegre
Virtuoso

As battybishop and I said, neither that server nor the I/O controller has been tested. I have commonly seen these errors caused by driver versions:

    2018-08-13T12:51:23.055Z cpu10:67779)ScsiDeviceIO: 2948: Cmd(0x439d00453600) 0x1a, CmdSN 0x1041 from world 0 to dev "naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2018-08-13T13:04:36.605Z cpu13:65999)ScsiDeviceIO: 2948: Cmd(0x439d05934b40) 0x1a, CmdSN 0x10b6 from world 0 to dev "naa.6782bcb04fcf0c00224bef011f55af3c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

You will not get support from VMware, but you can try upgrading your megaraid-sas driver to a newer version.

Personally I do not recommend doing this. Whatever keeps this hardware off the VMware Compatibility Guide could be the cause of your issue!

battybishop
Hot Shot

Agree totally. It's not always that the hardware has not been tested; often the hardware is old, there are no appropriate drivers to make it work, and it cannot support the later versions of ESXi.