VMware Cloud Community
shanefletch
Contributor

RAID6 crashed, rebuilt array, now I'm trying to mount VMFS but struggling

Hi,

As background, I'm running ESXi 6.5.0 U1. My first datastore is a RAID1 of two SSDs for ESXi to live on (I know, I could have done that on USB sticks, but this is a work system). The second datastore consisted of 8x 1TB SSDs in RAID6. All disks are connected via an LSI MegaRAID SAS card. This ran fine for a few years, having been updated from 6.0 to 6.5 along the way.

After a public holiday, I returned to find the RAID6 showing 2 failed disks. I had those replaced, and during the rebuild a third disk failed. Using the RAID card's BIOS, I marked the two new disks "good", left the array to rebuild, replaced the third disk, left it to rebuild again, then rebooted into ESXi. At no point did I delete the datastore, any VMs, or the array. No surprises, the datastore doesn't mount automatically, but the volume does appear when I run "esxcli storage vmfs extent list" (with the correct volume name, extent number 0, partition 1).
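For completeness, the checks I ran from the ESXi shell looked roughly like this (the naa.* device ID is a placeholder, not my real disk):

esxcli storage vmfs extent list        # volume shows up here, extent 0, partition 1
esxcli storage filesystem list         # datastore is missing from the mounted filesystems
esxcli storage vmfs snapshot list      # check whether the volume is being treated as an unresolved snapshot
partedUtil getptbl /vmfs/devices/disks/naa.600xxxxxxxxxxxxx   # confirm the VMFS partition is still in the table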

I tried to check the metadata with "voma -m vmfs -f check -d" (full invocation sketched after the list below), which reported 10 total errors:

- resourcesPerCluster "0"
- clustersPerGroup "0"
- clusterGroupOffset "0"
- resourceSize "0"
- clusterGroupSize "0"
- bitsPerResource "0"
- version "0"
- signature "0"
- numAffinityInfoPerRC "0"
- numAffinityInfoPerRsrc "0"
- Failed to check sbc.sf
- "VOMA failed to check device : Invalid address"

The next step I wanted to try was a fix, but of course that isn't possible with VOMA v0.7; I'd have to update the host to 6.7 to get fix mode.
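As I understand it, on 6.7 the fix run would be the same invocation with the function switched (placeholder device ID again):

voma -m vmfs -f fix -d /vmfs/devices/disks/naa.600xxxxxxxxxxxxx:1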

What's the collective thought, is my datastore toast? Is it worth updating to 6.7 to run VOMA in fix mode? Should I just put the beast out of its misery, blow the array away, and start again, restoring whatever backups I've got?

gregsn
Enthusiast

>>What's the collective thought, is my datastore toast?

RAID6 can only sustain two concurrent disk failures without data loss. If a third disk fails before the rebuild completes, you are typically facing total data loss.

>>Is it worth updating to 6.7 to run VOMA in fix mode?

It's unlikely to make a difference, since you're facing a failure mode (a three-disk failure) that software generally cannot fix.

Depending on how far along the rebuild got before the third disk failed, you may be able to pull some data off the array using software-based data recovery tools on the portion that finished rebuilding. This can be a time-consuming process and may only yield fragments of usable data.
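If you go that route, a sensible first step is to take a raw image of the array so the recovery tools work on a copy rather than the damaged volume. A rough sketch from the ESXi shell (the device ID and output path are placeholders, and the target needs enough free space to hold the whole volume):

dd if=/vmfs/devices/disks/naa.600xxxxxxxxxxxxx of=/vmfs/volumes/otherDatastore/raid6-image.bin bs=1M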

>>Should I just put the beast out of its misery, blow the array away, and start again, restoring whatever backups I've got?

If you have known-good recent backups, and getting the system back online quickly is important, then restoring from backups is probably the best option. However, if you're concerned about data that hasn't been backed up and may still be on the damaged array, restore to a different storage server/NAS/etc. and then perform data recovery on the damaged array (ymmv).

continuum
Immortal

Read "Create a VMFS-Header-dump using an ESXi-Host in production" on VM-Sickbay.

If you send a dump like that, I can give you a good suggestion for the next steps.
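The article has the exact procedure; the general shape of the dump command is something like this (device ID, output path, and size here are placeholders - take the real values from the article):

dd if=/vmfs/devices/disks/naa.600xxxxxxxxxxxxx of=/vmfs/volumes/otherDatastore/vmfs-header.bin bs=1M count=1536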

Ulli


________________________________________________
Do you need support with a VMFS recovery problem? - send a message via Skype "sanbarrow"
I do not support Workstation 16 at this time ...

shanefletch
Contributor

Hi Continuum,

I've gathered the file; it's in my Google Drive.

The forum doesn't seem to be letting me send you a DM with the link - is there another way I can get it to you?

Regards,

Shane

continuum
Immortal

Sorry to hear that - is Skype not allowed in your country?


________________________________________________
Do you need support with a VMFS recovery problem? - send a message via Skype "sanbarrow"
I do not support Workstation 16 at this time ...
