nology
Contributor

vSAN permanent disk failure with SSD

Hi, I am having an issue with disk failures in vSAN, even though the Dell server says the disk is fine. I reboot and the issue goes away for a while, then comes back when I try to create VMs. It appears that only the SSD flash drives have the issue.

All servers and disks are new and reporting healthy. This is a fresh install of vSAN 6.

All servers and drives are on the HCL. The only issue is that VMware lists the wrong firmware for the SanDisk Optimus Ascend SSD; the version they list does not exist, as far as I can tell from SanDisk's website.

3 Dell R630 servers, all new

Each server has 1 SanDisk Optimus SSD and 2 Seagate 10k HDDs, all new

The vSAN network consists of 10GbE Intel X540 NICs in each server

Switches are Netgear XS712T

Any insight would be helpful.  Thanks

7 Replies
zdickinson
Expert

I had the same issue with another component. The firmware listed by VMware was not available on the manufacturer's website, and I had to contact the manufacturer to obtain the proper firmware. Not sure whether that would be Dell or SanDisk in your case, assuming the SanDisk was purchased OEM from Dell. Thank you, Zach.

hunor
Contributor

Hi! Were you able to find anything out? We have the same hardware and the same problem.

The only difference is the flash drive causing the trouble; ours is Intel, not SanDisk.

Dell suggested upgrading all firmware to the newest version, but the problem still keeps coming back.

strorageTech
VMware Employee

When a disk exhibits very high latency for a long time, vSAN marks the device as degraded, and this shows up as a disk failure.

The failed-disk status is held in memory and is not persistent across reboots, so when the server is rebooted the disk comes back up as healthy. If the device then starts showing very high latencies again, you will see this issue again. The best solution for your problem is to get the disk/firmware/driver combination right.
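
If you want to see what vSAN currently thinks of each disk before a reboot clears that in-memory state, the host API exposes it. Below is a minimal pyVmomi sketch; pyVmomi itself plus the hostname and credentials are placeholder assumptions, not something from this thread.

    # Minimal pyVmomi sketch: list the vSAN state of each disk on a host.
    # Hostname/credentials are placeholders; certificate checks are
    # disabled for lab use only.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only; use real certs in prod
    si = SmartConnect(host="esxi-host.example.com", user="root",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        # First host in the inventory, for brevity
        host = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True).view[0]
        vsan = host.configManager.vsanSystem
        for result in vsan.QueryDisksForVsan():
            # result.state is e.g. 'inUse' or 'ineligible'; result.error,
            # if set, explains why vSAN rejected or degraded the device
            print(result.disk.canonicalName, result.state,
                  result.error.msg if result.error else "")
    finally:
        Disconnect(si)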

paggarwa
Contributor

Hi,

Facing the same issue on an HP setup here: all logical drives (LDs) showed as failed in RAID 0. I deleted and re-created all the LDs and the error seemed to be gone, but once the ESXi host was put back into production the issue popped up again. We are using 5.5. Any suggestion/solution would be great.

Update: Got an update from HP; it seems the disk needs to be replaced even though it shows as healthy in iLO.

Regards,

Pranav

sicorspa
Contributor

Sorry to piggyback on your thread, but I have pretty much the same config: 3 hosts, the same 10G card, and a Netgear XS708E switch (the smaller one), and I get bad results on the multicast test of the health check plugin.

Have you ever tried it? What are your results?
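
In case it helps, a generic way to smoke-test multicast between two machines on the vSAN VLAN is a tiny Python sender/receiver like the sketch below. This is not the vSAN health check itself; the group and port are the vSAN 6.x agent-traffic defaults, assumed here purely for illustration.

    # Generic multicast smoke test: run "recv" on one host, "send" on
    # another on the same VLAN. Group/port are assumed vSAN 6.x agent
    # defaults, used only for illustration.
    import socket
    import struct
    import sys

    GROUP, PORT = "224.2.3.4", 23451

    def receive():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        # Join the multicast group on all interfaces
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        print(sock.recvfrom(1024))  # blocks until a datagram arrives

    def send():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(b"vsan-multicast-test", (GROUP, PORT))

    if __name__ == "__main__":
        receive() if sys.argv[1] == "recv" else send()

If the receiver never prints anything, the switch (e.g. IGMP snooping without a querier) is a likely suspect.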

Many thanks for your help

Manuel

joergriether
Hot Shot

On the 31st of July this year we were contacted by Dell, informing us of urgent firmware updates for some SSDs and some RAID controllers. I believe the main concern was the SSDs, because of some "hang" cases. So please use the latest Nautilus and update the drives. Attention: you HAVE TO use Nautilus in UEFI mode. If you installed ESXi 6 via BIOS, make sure to switch to UEFI before booting Nautilus, and even more importantly, switch back to BIOS before using ESXi again IF you installed ESXi via BIOS. For the drive firmware you can't use LC (Lifecycle Controller) at this time, even with the R730 series, so always have the latest Nautilus on hand after doing the usual SUU stuff.

http://www.dell.com/support/home/us/en/04/Drivers/DriversDetails?driverId=TTRG8
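
To double-check that the drives actually took the new firmware once you are back in ESXi, you can read the revision string the host reports for each device. A small pyVmomi sketch follows; the hostname and credentials are placeholders.

    # List vendor/model/firmware revision for every SCSI device the host
    # sees, so you can confirm the update applied. Placeholder credentials.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only
    si = SmartConnect(host="esxi-host.example.com", user="root",
                      pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        host = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True).view[0]
        for lun in host.config.storageDevice.scsiLun:
            # ScsiLun.revision carries the device firmware revision string
            print(lun.canonicalName, lun.vendor, lun.model, lun.revision)
    finally:
        Disconnect(si)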

Best,

Joerg

hunor
Contributor

We had the same issue. After upgrading the Dell firmware and patching ESXi, the problem seems to be solved.

For details, please see:

VMware KB: Using a Dell Perc H730 controller in an ESXi 5.5 or ESXi 6.0 host displays IO failures or...
