One of our ESXi 4.1 servers lost a drive in its RAID-1 boot array about a week ago. No biggie, bought a new drive, inserted the new drive today, and the array began rebuilding. The other drive failed during the rebuild. "F&%K!" I screamed.
Good news is the VMs are still running, as is ESXi, as far as I can tell. I suspect, though, that as soon as ESXi needs to read/write a log file or something, it's all going to come crashing down.
So the question is, since I don't have vMotion and I don't have another suitable host anyway, how long do I have before ESXi dies? Do I have enough time to get a new server (10 days)? Should I press a really powerful workstation into service quickly (overnight)? Or is it best to shut the VMs down now and deal with the outage?
A few bits of information. The VMs are stored entirely on a shared NFS array, not on the host's physical storage. The host has dual power supplies, each tied to a separate battery backup on its own circuit, so I don't think power will bring them down. The host is starved for RAM (thanks, Exchange) and was already planned to be replaced before year's end. The only thing running on the host's local storage is ESXi itself, which was loaded onto the boot array. Everything else is stored on the NFS systems.
If the VMs are stored on different storage from your ESXi install, then one option would be to install ESXi onto, say, a USB stick instead of a disk.
Once installed, configure ESXi as before, then browse the datastore and re-register the VMs.
I would check first that ESXi 4.1 and your hardware support booting ESXi from USB.
Also, your server will be okay until it is rebooted, as ESXi runs from memory.
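To give you an idea of what the re-import looks like once the fresh install is up: something along these lines should work from the ESXi console or SSH session. This is just a sketch; the NFS server, export path, datastore label, and VM paths below are placeholders, not your actual values, and these commands only run on an ESXi host, not a general-purpose shell.

```shell
# On the freshly installed ESXi host (placeholders throughout):

# Re-attach the existing NFS datastore (same export the old host used)
esxcfg-nas -a -o nfs-server.example.com -s /export/vmstore vmstore

# Register each VM by pointing vim-cmd at its .vmx file on the datastore
vim-cmd solo/registervm /vmfs/volumes/vmstore/exchange/exchange.vmx

# Confirm the VMs show up
vim-cmd vmsvc/getallvms
```

You can also do the same thing from the vSphere Client by browsing the datastore, right-clicking the .vmx file, and choosing "Add to Inventory."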
Just an idea:
Remove the new disk.
Go into the RAID BIOS and set the last disk that failed back to "good," then try booting (sometimes it works, but not always).
If that works, try rebuilding again with the new disk.
Of course this requires downtime, but from what I've been reading you'll need downtime anyway.
Since your VMs are running on an NFS datastore, you should be fine reinstalling onto other disks/USB or another server and adding the NFS datastore to the new host.