Here's what happened: the internal RAID 5 failed, two drives down. On reboot, the server came back with an error stating that the drives had previously failed but now appear operational, some data may have been lost, press F2 to accept the loss or F1 to continue with the logical drives offline. I hit F2, the machine rebooted and looped back to the same message. I tried again, and this time it started loading ESXi (4.0.1) but failed on a corrupt boot image. DOH!
I got a 4.1 U1 image, burned it, and used it to repair. During the repair it said I would have a fully functional host, but that some VMFS partitions may not be immediately available. They are in fact not there. I've tried multiple methods people have listed as working solutions; nothing has fixed it. I haven't had time to place a call to VMware yet, but that may be what needs to happen.
Any thoughts or opinions? Are the VMFS partitions completely gone since I lost two drives? Nothing (at least nothing that I know of) has been overwritten; I only repaired ESXi, albeit with a newer version. Did I screw myself by doing this?
Thanks
With RAID 5, losing two drives does mean the volume is gone - in my opinion you are out of luck -
It may be possible to recover, but it depends on how much damage occurred. VMware support may be able to help. Recovering from backup might be quicker.
....that's one of the problems: a backup solution was not in place. Quite possibly the worst day in a while. It started with waking up to the pump being out on the koi pond - lost 4 fish. Then the server takes a dive. /facepalm
Companies like Kroll Ontrack http://www.krollontrack.com/ can often recover things. Very expensive, though.
I am inclined to agree with David. Count it gone.
That's kinda what I was thinking..... oh well, I didn't lose anything too major, just a couple of XP boxes, a Minecraft server and, oh yeah, two years' worth of emails and docs that never got backed up on the physical box - and that drive was reallocated when we VMed it. Hah. Doh. OK, well, now I know: next time run at least weekly backups.
For future reference.
GhettoVCB backup script http://communities.vmware.com/docs/DOC-8760
ESXi Control http://blog.peacon.co.uk/wiki/Esxi-control.pl
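To give an idea of how ghettoVCB is typically scheduled: a rough cron entry, assuming the script is unpacked to /vmfs/volumes/datastore1/ghettoVCB (that path and the datastore name are placeholders - adjust to your setup, and check the script's own documentation for the flags):

```shell
# Hypothetical paths -- adjust to where the script and your VMs actually live.
# Weekly backup of all VMs on the host (Sunday 01:00), via the ESXi root crontab:
0 1 * * 0 /vmfs/volumes/datastore1/ghettoVCB/ghettoVCB.sh -a > /tmp/ghettoVCB.log 2>&1
```

Note that on ESXi the root crontab does not survive reboots by default, so the entry usually has to be re-added from a startup script.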
Hope the rest of your day gets better.
They are in fact not there. I've tried multiple methods people have listed as working solutions; nothing has fixed it.
Although I agree with the others that data loss is most likely with two lost disks, one question: do you see the VMFS partition when running fdisk -lu on the console?
André
~ # fdisk -ul
Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 441.2 GB, 441241845760 bytes
64 heads, 32 sectors/track, 420801 cylinders, total 861800480 sectors
Units = sectors of 1 * 512 = 512 bytes
   Device Boot                         Start       End    Blocks  Id System
/dev/disks/mpx.vmhba1:C0:T0:L0p1       8192   1843199    917504   5 Extended
/dev/disks/mpx.vmhba1:C0:T0:L0p4 *       32      8191      4080   4 FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T0:L0p5       8224    520191    255984   6 FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p6     520224   1032191    255984   6 FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p7    1032224   1257471    112624  fc VMKcore
/dev/disks/mpx.vmhba1:C0:T0:L0p8    1257504   1843199    292848   6 FAT16
Partition table entries are not in disk order
I don't believe they are there, no. 😕
Edit: Also, the disks were never replaced. The host was simply rebooted, the hardware diagnostics said they appeared operational, and then it asked me whether continuing with the data loss was acceptable. I figured one may have gone down and taken the one next to it with it, so only one was originally compromised; the only data that should have been lost at that point was the write cache. Oh well, I'm not sure what caused the failure, or why it says they are working again. I think I need a SAN, more disks, and a better (existing) backup solution, and that will fix this little mess.
From what I can see, the VMFS partition is missing. However, that does not necessarily mean the data is lost. If we/you are able to restore the partition table (re-create the VMFS partition) with the correct values, there might be a chance to access the data.
It's up to you to decide whether to call VMware support, where they have the engineers and the detailed knowledge, or to try it on your own. In the meantime I will fire up my test system to see whether I can find out what exactly is missing. However, I can't promise you anything.
Btw., which version/build of ESXi did you run before this happened, and which build did you use to run the repair?
André
I installed 3.5 initially, updated to 4.0.1 with a CD, then ran the repair with 4.1 U1...... I was not paying attention; I only realized I was at 4.0.1 when I found the update CD I had previously used.
You updated with a CD? Just to be sure, did you run ESXi or ESX?
If it was ESXi - which I really hope - we'll have to find out the correct partitioning (like in the examples at http://kb.vmware.com/kb/2002461) and then re-create partitions 2 and 3 (where partition 3 is the VMFS partition). The values from the KB should actually be OK (except for the typo in the end sector of partition 8). The VMFS partition is always the last partition on the disk, and its end sector usually matches the total sector count from fdisk -lu minus 1.
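As a sanity check for that last rule, the end sector can be derived from the "total ... sectors" line of the fdisk -lu output; the value below is the one from the listing earlier in this thread:

```shell
# Total sector count as reported by "fdisk -lu" for this disk
TOTAL_SECTORS=861800480
# The VMFS partition is the last one on the disk; its end sector is
# usually the total sector count minus 1
VMFS_END=$((TOTAL_SECTORS - 1))
echo "$VMFS_END"   # -> 861800479
```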
Please don't start doing anything before I have double-checked that the values are the same for ESXi 3.5. For now, just answer the question above.
André
ESXi. 3.5, updated to 4.0.1 with a CD, then repaired (after the boot image failed due to the drive failure) with a 4.1 U1 CD.
Sorry it took some time, but I needed to set up an ESXi 3.5 host to reproduce the issue. The bad news is that the partitioning differs between 3.5 and 4.x; the (hopefully) good news is that the area where the VMFS partition should be located was not overwritten, because partition 2 was not created.
Again, you do the following at your own risk, even though it worked on my test system! There is still time to call VMware support!
To re-create partition 3 (VMFS), do the following:
Run: fdisk /dev/disks/mpx.vmhba1\:C0\:T0\:L0
If this does not work, run esxcfg-scsidevs -c to find out whether the disk has a Linux device name (like /dev/sda), and run the fdisk command with that device name instead.
In fdisk enter the following commands:
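The actual keystroke list appears to have been dropped from the post. Based on the start sector and partition type given in this thread, the interactive fdisk dialog presumably looks something like the sketch below; the only values taken from the thread are the start sector 9922560 and type fb, everything else is the standard fdisk prompt sequence and should be verified against your own disk:

```
n        (new partition)
p        (primary)
3        (partition number)
9922560  (first sector)
<Enter>  (accept the default last sector, i.e. the end of the disk)
t        (change a partition's system id)
3        (partition number)
fb       (VMFS)
w        (write the table and exit)
```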
To verify the values for partition 3, run fdisk -lu again. The start value for partition 3 should be 9922560 and the System type VMFS.
Once done, rescan the vmhba in the vSphere Client (under Storage Adapters). If the values are correct, ESXi should detect the VMFS datastore.
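If the vSphere Client is not handy, the rescan can also be triggered from the console; on ESXi 4.x something like the following should work (the adapter name is taken from the fdisk listing earlier in the thread - substitute yours):

```shell
# Rescan the HBA so the new partition table is re-read
esxcfg-rescan vmhba1
# Ask the VMkernel to re-scan for VMFS volumes
vmkfstools -V
```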
Even if you are able to access the datastore at this point, I strongly recommend that you immediately back up the VMs and consider reinstalling the host.
Good Luck
André
Well, now I see it in my fdisk -ul list; however, when I rescan it does not show up. When I go to Add Storage there is a VMFS datastore listed, but I can't use it, and there are 3.8 GB of free space that it wants to use. Well, we tried, but I think in the end it's gone. :|
Thanks for your help.
Just for fun, I created a new VMFS partition using the Add Storage wizard. I see two VMFS partitions in fdisk, but only one shows up in the vSphere Client.... not sure how to get the other one to add, or whether I botched it and that's why I can't add it.
Hmmm, odd thing here: the new VMFS partition was created at 901 to 4845 and labeled p2, and the other one went from 4846 to the end. Now I can't get the wizard to list any available drives to add a VMFS store. Odd issues I'm seeing here; I think the best bet is to start over - new drives, make sure we get the suspect one out of there, and start fresh.
For ESX(i) to recognize a VMFS partition, the partition type needs to be "fb" and the partition needs to be formatted with the VMFS file system. If it does not show up when rescanning the vmhba, that could mean either that the start sector is not correct or that the file system on the partition is corrupt. However, to find out what's going on you would need to know the structure of the VMFS file system.
Btw., the reason you see the small VMFS partition is that when you created it through the wizard, it was formatted with the VMFS file system.
André