Hi All,
I have recently been having some issues with an LSI 9260-8i RAID adaptor running in a Dell T430 server. The card manages 4 disks in a RAID 6 array containing a single VMFS datastore. One of the VMDK files in the store is mounted as a drive on a Windows server VM and shared for backup purposes. When I write a reasonable amount of data to the drive (around 200MB for a machine backup), the adaptor seemingly disappears from the host, taking the datastore with it and causing the Windows VM to fail. Rebooting the host resolves the issue, and there doesn't seem to be a problem unless large amounts of data are written.
Anyhow, I decided to update the firmware on the Dell server and the LSI card, and, to be sure, I also installed a fresh copy of ESXi v6.7 (the Dell OEM build) so that ESXi was in a known reasonable state. Eventually everything was back up and running; however, when I went to open some folders on the datastore I ended up with a Pink Screen of Death (see attachment).
I am guessing that the adaptor failure has caused some datastore corruption. It appears to be confined to folders containing VM guests that I do not use a great deal, but if I can salvage anything, that would be great. If I do not navigate to those folders, ESXi seems happy.
Is there any way to repair file/folders on the VMFS datastore?
Can you run VOMA (see the VMware Knowledge Base article on it) and check whether there is any corruption? Is it a VMFS-5 datastore or VMFS-6? One caveat: the datastore should not have any active I/O while VOMA is running.
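For anyone following along, a VOMA check is run from the ESXi shell against the partition backing the datastore. A minimal sketch, assuming the `naa.*` identifier below is replaced with your own device:

```shell
# List datastore extents to find the backing device (note the naa.* name)
esxcli storage vmfs extent list

# Run a read-only metadata check against that device's partition.
# naa.xxxxxxxxxxxxxxxx:1 is a placeholder for your device:partition.
voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1
```

VOMA reports errors found in the VMFS metadata; in check mode it does not modify the volume.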
Cheers,
Supreet
Thanks, SupreetK. The VMFS volume is v5.81.
I will need to take down the Windows server to ensure that there is no active I/O. I see that I also need to unmount the VMFS volume entirely, so I will do that in a bit.
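For reference, unmounting and remounting can be done from the ESXi shell; "Datastore1" below is a placeholder for the datastore's label:

```shell
# Stop all I/O (power off / disconnect VMs) first.
# Unmount the datastore by its label.
esxcli storage filesystem unmount -l Datastore1

# Verify it now shows as unmounted
esxcli storage filesystem list

# Mount it again when done
esxcli storage filesystem mount -l Datastore1
```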
Cool, whenever you can afford the downtime. One caveat: if there is a significant amount of corruption, we might not be able to mount the datastore back. While it remains mounted, the corruption doesn't matter; so either move the Windows VM to a different volume or be prepared to restore it from backup should the datastore not mount back.
Cheers,
Supreet
Hi
VOMA will not help you.
Instead, read Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay
and create a dump as explained there. Call me via Skype when you have the dump.
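The linked procedure boils down to reading the metadata region at the start of the VMFS partition out to a file with `dd`. This is only a sketch; the device name, output path, and size are placeholders — follow the article for the exact parameters:

```shell
# Sketch only -- see the VM-Sickbay article for the exact parameters.
# Dump the first ~1.5 GB of the VMFS partition (the metadata region)
# to a different datastore. Device and output paths are placeholders.
dd if=/vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1 \
   of=/vmfs/volumes/OtherDatastore/vmfs-header.dump bs=1M count=1536
```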
Ulli
Hey SupreetK - well, I unmounted before you said not to. Thankfully, it re-mounted with no issue.
There is, however, a significant amount of corruption, and as continuum outlined, it looks like VOMA doesn't actually do anything (ran it twice and got the same output).
I will continue with the VM-Sickbay dump and see where we get.
Great! There is no corruption that continuum cannot fix.
Cheers,
Supreet
continuum - apologies for the rubbish posted earlier (now deleted); it might help if I were in the right terminal :smileyplain:. Been a long day...
OK, the resultant dump file is 1.5GB!!!
I tarred it up and it is a much more reasonable 1.5MB. I will post it to an online drive and share the link, referencing this thread.
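The compression step was just a gzipped tar of the dump (filename assumed from the earlier sketch):

```shell
# Compress the header dump before uploading (1.5 GB -> ~1.5 MB here,
# since VMFS metadata dumps are mostly zeroes and compress very well)
tar czf vmfs-header.dump.tgz vmfs-header.dump
```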
FYI: I do not believe in "fixing VMFS problems".
In most cases I prefer to extract the corrupt objects to another datastore.
Update:
The vmfs-header dump showed that the VMFS metadata for 2 or 3 directories was missing, but the rest was readable.
With that data it would have been possible to recover most of the VMs.
Unfortunately, Dell support suggested rolling back the firmware of the Dell server, which ultimately left the RAID completely inaccessible.
Right. Looks like I am going to have to buy a replacement RAID card to extract the data.
The botched firmware on the Dell host is another matter entirely.
When it rains, it pours.
More news on this one, continuum.
Dell server - Repaired. New motherboard!!!
LSI RAID card replaced. Seems OK: the card shows POST during server boot and the RAID 6 array is recognised. Hurrah!
Now I'm attempting to copy the files off the datastore on the RAID 6 array to another locally attached HDD. OH MY GOD, this is sloooooow (copying/moving via the GUI datastore browser). I expect there must be a quicker way to do this; based on the current transfer rate, this 2TB VMDK is going to take days to copy!! This is effectively just transferring a file from one locally attached disk (the LSI RAID) to another (a bog-standard SATA drive).
When your only problem is performance, you were very lucky 😉
Use vmkfstools for all copy operations of vmdks across datastores.
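A minimal sketch of cloning a VMDK with vmkfstools from the ESXi shell; the datastore and VM paths are placeholders, and `-d thin` is optional (it writes a thin-provisioned copy):

```shell
# Clone a VMDK to another datastore. Always point vmkfstools at the
# descriptor .vmdk, not the -flat.vmdk; it copies both parts itself.
# Paths below are placeholders for your own datastores and VM folder.
vmkfstools -i /vmfs/volumes/RAID6-DS/WinVM/WinVM.vmdk \
           /vmfs/volumes/Recovery/WinVM/WinVM.vmdk -d thin
```

Unlike the datastore browser, this runs entirely host-side and avoids the browser's slow buffered copy path.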