One of the drives in my raid array failed a few days ago.
I had a hot spare in the array so the controller immediately began the rebuild process and all servers remained running throughout.
When the rebuild completed (I verified this via the raid array user interface and the log), I needed to shut the server down to remove the failed drive, since the chassis is not hot swappable. When I rebooted the server, the data store on the raid array was no longer there. I also verified through the raid controller interface that I had removed the proper (failed) drive and that the array was still in a Ready state when it came up.
In vSphere Client, when I click on the Add Storage... link, the server sees the hardware, but if I click Next it tells me that it will reformat the volume. See attached. I most definitely did not take that next step and reformat. I simply took the screenshot and backed out.
I found these instructions, but they are for a much older version of ESXi and I am not sure if they are correct for ESXi 6.0.0 338124.
VMware KB: Datastore missing after rebuilding the RAID disk/LUN
Are these the steps that I should follow?
If these are not the right instructions, can you point me to the version for ESXi 6.0.0 338124? I have been unable to locate anything.
Thanks
Sorry all, I just noticed the instructions for ESXi 6.0 in an embedded link in the document that I linked to. Doh!
If that proves to be the fix, I will mark this as answered.
On this page:
https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1011387
It says the following, but when I get to step 6, the VMFS label column is blank. I didn't go any further at that point because I am pretty new to VMware at this level and would rather not lose the data if I can avoid it. Yes, I have a backup in the cloud, but I would need to order a restore hard drive shipped to me. If I can recover without having to resort to that, it would be much preferred. Any additional guidance would be greatly appreciated.
Command line
The esxcli command is used on the command line.
# esxcli storage vmfs snapshot list
49d22e2e-996a0dea-b555-001f2960aed8
Volume Name: VMFS_1
VMFS UUID: 49d22e2e-996a0dea-b555-001f2960aed8
Can mount: true
Reason for un-mountability:
Can resignature: true
Reason for non-resignaturability:
Unresolved Extent Count: 1
To mount the volume persistently, keeping its existing signature:
# esxcli storage vmfs snapshot mount -l label|-u uuid
For example:
# esxcli storage vmfs snapshot mount -l "VMFS_1"
# esxcli storage vmfs snapshot mount -u "49d22e2e-996a0dea-b555-001f2960aed8"
To mount the volume non-persistently (-n), so that the mount does not survive a reboot:
# esxcli storage vmfs snapshot mount -n -l label|-u uuid
For example:
# esxcli storage vmfs snapshot mount -n -l "VMFS_1"
# esxcli storage vmfs snapshot mount -n -u "49d22e2e-996a0dea-b555-001f2960aed8"
To resignature the volume:
# esxcli storage vmfs snapshot resignature -l label|-u uuid
For example:
# esxcli storage vmfs snapshot resignature -l "VMFS_1"
# esxcli storage vmfs snapshot resignature -u "49d22e2e-996a0dea-b555-001f2960aed8"
# esxcfg-volume -M VMFS_UUID|label
For example:
# esxcfg-volume -M "VMFS_1"
# esxcfg-volume -M "49d22e2e-996a0dea-b555-001f2960aed8"
Note:
To view the datastores again in vCenter Server, you may have to perform a rescan of the storage adapters on all ESXi/ESX hosts that the datastore is presented to or a refresh of the storage view. If you are having trouble identifying the affected datastore, in the vSphere client, check the storage view of another ESX/ESXi host that still has the datastore mounted correctly. This will then allow you to correlate VMFS datastore name with NAA LUN identifier.
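When the label column is blank, the UUID is the identifier to pass to the -u option. As a rough sketch (not part of the KB article), the output of esxcli storage vmfs snapshot list can be condensed to one line per volume so the UUID and the two "Can ..." flags are easy to read. The function names summarize_snapshots and sample_output are made up for illustration, the sample text is the KB's example rather than output from the affected host, and the parsing assumes each field sits on a single "Name: value" line.

```shell
#!/bin/sh
# Condense `esxcli storage vmfs snapshot list` output to one line per volume.
# On a real host (assumption): esxcli storage vmfs snapshot list | summarize_snapshots
summarize_snapshots() {
    awk -F': *' '
        /^ *Volume Name:/     { name = ($2 == "" ? "(no label)" : $2) }
        /^ *VMFS UUID:/       { uuid = $2 }
        /^ *Can mount:/       { can_mount = $2 }
        /^ *Can resignature:/ {
            # All fields for this volume have been seen; print the summary.
            printf "%s %s mount=%s resignature=%s\n", uuid, name, can_mount, $2
        }
    '
}

# Sample input copied from the KB example above, not from a live system.
sample_output() {
cat <<'EOF'
49d22e2e-996a0dea-b555-001f2960aed8
   Volume Name: VMFS_1
   VMFS UUID: 49d22e2e-996a0dea-b555-001f2960aed8
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
EOF
}

sample_output | summarize_snapshots
```

A volume that shows mount=true can be mounted by UUID even when it has no label; nothing in this sketch mounts or changes anything.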
Here are the errors that I get when I try to start the server that is stored on the RaidArray:
Power On virtual machine:6 (No such device or address)
See the error stack for details on the cause of this problem.
Time: 2/29/2016 11:30:43 PM
Target: FS
ESXi: 192.168.1.250
Error Stack
Failed to start the virtual machine.
Cannot open the disk '/vmfs/volumes/52b4931f-de0c19aa-d2b4-001e673dab32/FS/FS_2.vmdk' or one of the snapshot disks it depends on.
6 (No such device or address)
Module Disk power on failed.
Cannot open the disk '/vmfs/volumes/52b4931f-de0c19aa-d2b4-001e673dab32/FS/FS_1.vmdk' or one of the snapshot disks it depends on.
6 (No such device or address)
Thanks
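The disk paths in the error stack embed the UUID of the datastore the VM expects, which can be compared with what the host currently has mounted (for example with esxcli storage filesystem list). A minimal sketch of pulling the UUID out of such a path follows; the helper name datastore_uuid is hypothetical, and the directory check only means something when run on the ESXi host itself.

```shell
#!/bin/sh
# Extract the datastore UUID from a /vmfs/volumes/<uuid>/... path.
datastore_uuid() {
    path=${1#/vmfs/volumes/}      # strip the fixed prefix
    printf '%s\n' "${path%%/*}"   # keep everything before the next slash
}

# Path taken from the error stack above.
disk='/vmfs/volumes/52b4931f-de0c19aa-d2b4-001e673dab32/FS/FS_2.vmdk'
uuid=$(datastore_uuid "$disk")
echo "Datastore UUID expected by the VM: $uuid"

# On the ESXi host, a missing directory here suggests the volume is not mounted.
if [ -d "/vmfs/volumes/$uuid" ]; then
    echo "volume directory present"
else
    echo "volume directory missing"
fi
```

If the UUID from the error matches one reported by esxcli storage vmfs snapshot list, that would point to the volume being detected as a snapshot rather than being gone.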
The datastore seems to still have a VMFS partition - so the raid-rebuild probably did not fail completely.
Often the vmdks can still be extracted with the help of a Linux system using vmfs-fuse.
If you want, I can have a closer look.
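For readers unfamiliar with vmfs-fuse, here is a rough dry-run sketch of what that extraction might look like on a Linux system with the vmfs-tools package installed. It only prints the commands instead of running them. The device /dev/sdb1, the mount point, and the destination directory are all hypothetical placeholders, the FS folder name comes from the error stack earlier in the thread, and whether vmfs-tools can read this particular volume is an assumption that would need checking.

```shell
#!/bin/sh
# Dry run: print the commands for copying a VM folder off a VMFS partition.
# Nothing is mounted or copied; review the plan before running it by hand.
plan_vmfs_copy() {
    device=$1       # VMFS partition as seen by Linux (hypothetical)
    mountpoint=$2   # empty directory to mount the volume on
    dest=$3         # where the rescued files should go
    echo "mkdir -p $mountpoint"
    echo "vmfs-fuse $device $mountpoint"
    echo "cp -a $mountpoint/FS $dest/"
    echo "fusermount -u $mountpoint"
}

plan_vmfs_copy /dev/sdb1 /mnt/vmfs /mnt/rescue
```

The real partition name would have to be identified first (for example with lsblk or fdisk -l), and the destination needs enough free space for the vmdk files.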
Hi continuum,
Thanks for the reply. I am encouraged by your optimism. The raid array shows that it was able to successfully rebuild the array. See attached screen-shots.
I mainly work on mid-range computer systems from IBM, where the raid arrays are extremely reliable. When they say that they rebuilt something, there is never a doubt, and there has never been an issue where the system was not able to reliably rebuild the array. Given that kind of expectation, if this array was not properly rebuilt, it sort of makes me wonder how the company that makes the card can get away with claiming that their card provides a RAID solution. But let me set my expectations aside and figure out how I can get this darn thing working again.
This is my personal VMware installation, at home, and I sometimes get to work on it in the evenings, after I get home from work, but I mostly work on it on the weekends when I have more time and my head is clear.
Would you be willing to assist me at one of those times?
If that is not possible, I will try to arrange for something during business hours as it would be great if I can get this going again.
I do have quite a bit of technical skill (30 yrs in software development) but have very little technical expertise with the bowels of VMware. Knowing that, if you think it won't waste too much of your time typing instructions through the thread, I would be open to trying that as well. In addition, perhaps the community would benefit from the solution at the same time.
Thanks again for your offer to help and I look forward to hearing from you.
Hi MyCroWave,
Don't want to get in the way of Continuum, as he is definitely the expert here, but thought I'd ask a validation question.
You mention that you tried to power the VM on - does this mean part of the VM was on other disks or that you have managed to mount the datastore now but the VM doesn't start?
Kind regards.
Hi ThompsG,
Yes, there were two data stores for the VM that were on the RAID array. The VM itself was stored on a different data store that was not in the raid array.
I spent about 48 hours over the past week, including this morning, trying to coax ESXi into recognizing the volumes, with no luck. Finally, I gave up and removed from the virtual machine the hard drives that were on the corrupt data store. Then, the virtual machine came up without issue.
Finally, since I have everything on those 2 volumes backed up to a cloud provider, I re-created the two data stores in the raid array and began the restore process. It is currently running and has an estimated 16 days left to go.
Hi MyCroWave,
Feel your pain - at least you have a recovery point but ouch with the restore time 🙂
To be honest, the number of people losing data with ESXi at the moment is quite scary. Perhaps it is just my imagination, but the trend seems to be increasing - or maybe it is just that more people are turning to the community for help?
Have a great day,
ThompsG