VMware Cloud Community
HagenV
Contributor

HP ProLiant with vSphere 5.5: both drives in a RAID1 volume failed, backup available, how to recover the datastore?

Hi all,

after a power cycle, the two drives of a RAID1 volume no longer spin up. The vSphere Management Assistant and the VMware vCenter Server Appliance are still available because they are located on another volume. Before the power cycle, we created a backup of all datastores using Veeam.

After we replaced both drives, the server's RAID controller asked whether it should retain the volume using the new drives, which we confirmed.

Several VMs are now unavailable, depending on the datastore they reside on ("datastore1" holds the most critical machines and is still available, while "datastore2" has failed).

The vSphere Client still shows "datastore2", but with no content, while the datastore is missing from the storage configuration page of the physical host (a cluster with a single machine).

The RAID volume is visible under the storage adapters, but when I try to create a new datastore on it named "datastore2", I get an error message saying that the name already exists.

What I would like to do is:

- See my "datastore2" again in my server's configuration

- Copy back all images of saved VMs from my backup

- Get the server working again

However, I'm stuck. Can someone give me some hints on how to get out of this situation?

9 Replies
IRIX201110141
Champion

If you have old/unwanted references to datastore2, you have to remove them first before you can use the same name again. Otherwise, use "datastore2_new" temporarily and rename it later. Such references can come from:

- Registered VMs

- A configured virtual CD-ROM (ISO)

- A scratch or syslog location configured on the datastore

Regards,
Joerg

continuum
Immortal

>>How to recover datastore ?

Does the failed RAID 1 array contain two VMFS volumes (datastore1 and datastore2)?

Have you checked the integrity of the data on datastore 1 ?

Safest approach:

create new datastores 3 and 4 using additional hardware.

Then restore your backups to datastore 3 and 4.

When that is done you can use the VMs for production again.

Then decide whether you need to recover any data from the original datastores 1 and 2.

About datastore 2:

When a datastore suddenly appears to be empty, it is sometimes still possible to extract the VMs, but you should never use such a datastore without rebuilding and reformatting it.

If both datastores were located on the same failed array, you should also regard datastore1 as unreliable. If you have fresh, working backups, restore them rather than using the VMs that are left on datastore1.

Decision to make now:

do you trust your backups ?

If yes, then rebuild datastore1 and 2 completely (wipe the first 2 GB of each datastore with zeroes before you recreate them).

If not then create new datastores from additional hardware and restore your backups to the new datastores.

Worst case: you rebuild datastore1 and 2, restore the VMs to them, and then discover that a VM is missing or does not work as expected.

To avoid that you need to keep datastore1 and 2 in their current state until you know for sure that the restored backups are working.
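The 2 GB wipe mentioned above can be sketched with dd. The demo below writes to a scratch file so it is safe to try anywhere; on the host you would point of= at the actual device node (which is destructive, so double-check the target first):

```shell
# Demo against a scratch file; on a real host, of= would be the device node (DESTRUCTIVE).
TARGET="$(mktemp)"
# Write 4 MiB of zeroes for the demo; for the real 2 GB wipe use bs=1M count=2048.
dd if=/dev/zero of="$TARGET" bs=1M count=4 2>/dev/null
wc -c < "$TARGET"
```

The device path and the small count are demo assumptions; only the dd invocation pattern carries over to the host.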

I can help you with recovering VMs from datastore2, but as recovery results are always unpredictable, you should avoid any scenario that relies on them.


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

HagenV
Contributor

Thank you both for your quick replies.

>>Does the failed Raid 1 array contain 2 VMFS-volumes (datastore1 and datastore2 ) ?

No, datastore1 is stored on different drives, as is "datastore3" (which I didn't mention before). Each of our three datastores has its own physical drives (RAID1). They are of different sizes due to growing storage demands in the past.

>>Have you checked the integrity of the data on datastore 1 ?

All machines on datastore1 are running (including the vMA and the vCenter Server), except those that, for example, use a VMDK on datastore2. Machines on datastore3 are fine as well.

>>When a datastore suddenly appears to be empty...

It's empty because I already installed the new drives, which are blank. The old drives no longer spin up, 100% dead. I think there is nothing to restore from the old datastore2.

I can still see the VMs in the inventory of the vCenter Server however some of them (i.e. those on datastore2) are grey/italic.

I was hoping to rebuild datastore2 "under the hood" but if I understand you correctly, I will have to remove all VMs from the inventory which are on datastore2, then restore them from my backup to a new datastore and finally add them again to the inventory of the cluster. Is this correct?

Another problem is that the vCenter Server still lists datastore2, but I can't delete it (the menu item is greyed out).

When connecting with the vSphere Client directly to the server, it does not list datastore2. So something appears to be out of sync. I've attached two screenshots of the corresponding views.

continuum
Immortal

> I was hoping to rebuild datastore2 "under the hood" but if I understand you correctly,

> I will have to remove all VMs from the inventory which are on datastore2, then restore them from my backup

> to a new datastore and finally add them again to the inventory of the cluster. Is this correct?

Yes. When that is done, you should be able to remove datastore2.



HagenV
Contributor

After removing all VMs from the inventory that were associated with datastore2 in some way, I could delete datastore2. After this, I created a new "datastore2" on the new RAID volume and successfully copied back my backup (which took 22 hours). Furthermore, all VMs that were stored completely on datastore2 have been added back to the inventory and could be started successfully.

However, there are some machines that have their VMX file (and their boot VMDK) on datastore1 but a second, larger VMDK on datastore2. The VMDK on datastore2 has been restored, but an attempt to start one of these VMs fails because datastore2 now has a new volume ID (an error message along the lines of: access to /vmfs/volumes/4f3cc3c7-0704b478-0c49-e4115bde9495/vcs-db-oracle/CentOS6.2 x64 Base_1.vmdk not possible).

What's the right way to proceed with these machines? Patching their VMX files manually (i.e. replacing the volume ID with the correct one; how can it be determined?), or removing the disk from the VM configuration and adding it back (but wouldn't that lead to further conflicts within the guest operating system)?

Thanks for any hints in advance!

a_p_
Leadership

Assuming it's only a few VMs that need to be fixed, I'd simply edit the .vmx files, i.e. replace the datastore ID/path. Please note that manually editing the .vmx file requires the configuration to be re-read before powering on the VM. This can be done by following steps 2 and 3 in https://kb.vmware.com/s/article/1026043.
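A minimal sketch of that edit with sed. The UUIDs, file name, and path below are placeholders for illustration, and the demo runs against a throwaway copy in a temp directory rather than a live VM; on the host you would resolve the real new UUID first and edit the actual .vmx:

```shell
# All paths and UUIDs are placeholders; run against a scratch copy, not a live VM.
OLD_UUID="4f3cc3c7-0704b478-0c49-e4115bde9495"   # stale UUID from the error message
NEW_UUID="5a1b2c3d-0704b478-0c49-e4115bde9495"   # on the host: basename "$(readlink /vmfs/volumes/datastore2)"
WORKDIR="$(mktemp -d)"
# Sample .vmx line that still points at the old datastore UUID.
printf 'scsi0:1.fileName = "/vmfs/volumes/%s/vcs-db-oracle/data.vmdk"\n' "$OLD_UUID" > "$WORKDIR/vm.vmx"
cp "$WORKDIR/vm.vmx" "$WORKDIR/vm.vmx.bak"       # always keep a backup before editing
sed -i "s|$OLD_UUID|$NEW_UUID|g" "$WORKDIR/vm.vmx"
cat "$WORKDIR/vm.vmx"
```

After the edit, the VM's configuration still needs to be re-read as described above before powering on.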

Removing the virtual disk from the configuration and adding it again is also possible. However, there are two things to be aware of:

  • CAUTION: In case the VM has active snapshots (<vmname>-00000x.vmdk file names), you may not be able to re-attach these from the GUI. Attaching the base .vmdk in such a case may cause data loss, or corruption!
  • Removing and re-adding should be done in two steps, so that the SCSI ID remains the same. I.e. remove the virtual disk (ensure that "delete from disk" is NOT selected) from the VM's settings, and close the settings window. Then open the VM's settings again to add the virtual disk again.


André

a_p_
Leadership

... how can it be determined?

From the command line run cd /vmfs/volumes/<datastore-name>/vcs-db-oracle. The prompt will then show the datastore ID instead of its name.
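An equivalent one-liner, assuming the usual layout where /vmfs/volumes/&lt;name&gt; is a symlink to /vmfs/volumes/&lt;uuid&gt;, is to resolve the link directly. The sketch below simulates that layout in a temp directory (with a placeholder UUID) so it can run anywhere; on the host the command would simply be basename "$(readlink /vmfs/volumes/datastore2)":

```shell
# Simulate the /vmfs/volumes layout: <name> is a symlink to the <uuid> directory.
ROOT="$(mktemp -d)"
UUID="5a1b2c3d-0704b478-0c49-e4115bde9495"       # placeholder UUID for illustration
mkdir "$ROOT/$UUID"
ln -s "$ROOT/$UUID" "$ROOT/datastore2"
# Resolving the symlink yields the datastore's UUID directory name.
basename "$(readlink "$ROOT/datastore2")"
```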


André

IRIX201110141
Champion

If all parts of a VM are located in its home directory, VMware uses relative path addressing (there may be one exception, but we'll ignore that for now). If you place a virtual disk on another datastore, the addressing switches to absolute, which means something like /vmfs/volumes/&lt;uuid&gt;/folder/name.vmdk is used. The &lt;uuid&gt; is used instead of the friendly name because the name can be changed at any time and may contain characters that are invalid in a file system.

Because you created a new datastore, the system generated a new UUID, which has now broken all those references.
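To illustrate the difference, here are two hypothetical .vmx disk entries (not taken from the actual machine): the first uses relative addressing for a disk in the VM's home directory, the second absolute addressing for a disk on another datastore, tied to that datastore's UUID.

```
scsi0:0.fileName = "CentOS6.2 x64 Base.vmdk"
scsi0:1.fileName = "/vmfs/volumes/4f3cc3c7-0704b478-0c49-e4115bde9495/vcs-db-oracle/CentOS6.2 x64 Base_1.vmdk"
```

Only the second form breaks when the datastore is recreated with a new UUID.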

- Manually edit the *.vmx

- *** Edit the VM configuration through the GUI: remove the disk and add it again. Take note of the SCSI controller type and the SCSI ID.

*** Big warning when you're dealing with virtual disks and snapshots: IIRC, the datastore browser / virtual disk selection in the old vSphere Client only shows the base VMDK rather than all snapshots, and it also can't show which is the "latest" snapshot you would want to use.

Regards,
Joerg

HagenV
Contributor

Thanks to all!!!

This forum was really helpful; the system is up and running again.

This was the first time we really needed our raw backup for a disaster recovery, and though it took some time, everything worked in the end.

N.B.: I decided to download the VMX files, patch them, and then upload them again. After applying the patch, I added the VMs back to the inventory and they started as expected. Thankfully there were no snapshots. Special thanks to André, who helped me determine the new volume ID.
