I just had an instance where all my storage was unavailable to my ESXi host for a short time. All machines recovered happily when the paths were restored and marched on, with the exception of one: the VCSA appliance. On booting, I'm greeted with 'Failed to start File System Check on /dev/.....' and I'm stuck in 'Emergency Mode'. My searching turns up info on VCSA 6 and possibly earlier, but VCSA 6.5 is a different animal. I don't seem to be able to interrupt the boot to modify the GRUB loader and boot into bash or anything, and if I run 'systemctl status systemd-fsck-root.service', I get 'fsck failed with error code 4' and 'Failed to start File System Check on /dev/disk/...'. I can't run fsck or e2fsck in this state... Basically, 'Emergency Mode' seems to be about as useless as an ice salesman at the South Pole.
Does anyone know how I can get my VCSA up and running again?
The fact that the VCSA is this fragile is rather poor... I have several Linux-based VMs running in my setup, and of them, this is the only one that blew up as a result; the rest, at most, needed a reboot to clear up. Windows? Didn't skip a beat. Every Windows machine picked up right where it left off.
Hi Cougar281, I just experienced this same thing. There is probably a more elegant way to accomplish this, but here's the process that worked for me. At the emergency mode prompt, I typed "fsck /dev/sda3" and answered yes to all questions about repairs, inodes and fixing issues. Once the prompt was back, I reset the VM and got the regular logon prompt. I then tried accessing the web interface as usual, but it gave a message saying /vsphere-client was not available. I went back to the console, logged in as root and did a clean reboot. Once back up, I checked the nifty vSphere admin page https://[vcsa-name].[domain].com:5480 and saw the health badges were green. Finally I logged in to the normal interface https://[vcsa-name].[domain].com/vsphere-client and all was well.
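For what it's worth, the steps above can be sketched as a short emergency-shell session. This is a sketch only: /dev/sda3 is the root partition on a stock appliance and may differ on yours, so the destructive lines are left commented out until you confirm the target against the device named in the boot error.

```shell
# Emergency-shell recovery sketch (assumes the missing storage is back).
# /dev/sda3 is the root partition on a stock VCSA deployment -- verify
# it matches the device in the "Failed to start File System Check" error.
DEV=/dev/sda3
echo "target device: $DEV"
# fsck -y "$DEV"    # -y answers yes to every repair prompt automatically
# reboot            # then check the health badges on https://<vcsa>:5480
```

Using `fsck -y` is equivalent to answering yes to each prompt by hand, which saves a lot of typing when there are many damaged inodes.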
I have experienced the same issue twice now. The first time I could just get away with running 'fsck /dev/sda3', and after the report everything was OK. But after the second occurrence I cannot log in any more: the emergency shell does not accept my root password. I'm now completely stuck.
On my test rig I often lost the connection to my datastore, and all VMs either came back or had no trouble with it at all. I gave the VCSA a try since I want to get away from the Windows vCenter.
With a storage loss on VCSA 6.5 I came across this "fsck failed with error code 4" issue. The best thing I have found is to continue into emergency mode by typing in your root password twice. Get into the shell and type: fsck /dev/mapper/log_vg-log. This will prompt you with a lot of questions; just answer yes to them. Once done, reboot and the VCSA should come back up.
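If you're not sure which logical volume took the damage, one approach is to walk all of the device-mapper volumes rather than guessing. This is a sketch assuming the stock VCSA LVM layout, where volumes show up under /dev/mapper with names like log_vg-log; the fsck line stays commented out so nothing is modified until you uncomment it in the emergency shell.

```shell
# Walk every VCSA device-mapper volume and report it; uncomment the
# fsck line to actually repair. The *_vg-* pattern matches stock VCSA
# volume names such as log_vg-log -- adjust it if your names differ.
checked=0
for dev in /dev/mapper/*_vg-*; do
    [ -e "$dev" ] || continue     # glob matched nothing; skip
    echo "would check: $dev"
    # fsck -y "$dev"              # -y answers yes to every prompt
    checked=$((checked + 1))
done
echo "volumes found: $checked"
```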
At the emergency shell prompt you don't need a password. Just type fsck /dev/sda3 and you should see the file system check start. This all assumes the underlying storage that went missing has been made available again.
As my VCSA 6.5 is still in my test lab, I played around a bit with short unavailability of my storage.
The VCSA is running on ESXi 5.5, and my storage is an NFS share served from a Windows server.
The VCSA 6.5 stops working every time I lose my storage through a simple reboot of the Windows server. In many cases a normal reboot of the VCSA was not possible, so I had to stop the VM and restart it again. In a few cases I had to do a manual file system check.
As a result, such a setup of the VCSA is not feasible. Every other VM (Linux or Windows, it doesn't matter) survived such an outage of the datastore.
Can't thank you guys enough... the fsck /dev/mapper/log_vg-log worked for me the second time around, whereas the first method against the disk device brought the errors up again. Back to the usual command prompt now, and background VC tasks are updating and happening as expected.
Same here. I had mine on shared storage, so I had to run fsck /dev/disk/by-partuuid/<UUID ID>.
This was a bit tricky to find, as you would think it would be under /dev/disk/by-uuid, but it isn't/wasn't.
Answer Y to all. Reboot.
The UUID was displayed in the error on boot. Good thing tab completion works in recovery mode.
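In case it helps the next person: the PARTUUID from the boot error can be resolved back to an actual block device before running fsck. A sketch, where the UUID below is a placeholder for the value from your own error message (tab completion in the recovery shell will fill in the real one):

```shell
# Resolve a /dev/disk/by-partuuid symlink to the underlying device.
# PART_UUID is a placeholder -- substitute the UUID shown in the
# boot error before using this.
PART_UUID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
LINK="/dev/disk/by-partuuid/$PART_UUID"
if [ -e "$LINK" ]; then
    DEV=$(readlink -f "$LINK")   # follow the symlink to the real device
    echo "resolves to: $DEV"
    # fsck -y "$DEV"             # uncomment in the recovery shell to repair
else
    echo "not found; list candidates with: ls /dev/disk/by-partuuid/"
fi
```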
I am resurrecting this thread since it seems that any storage issue causes corrupt file systems. Of course the steps above are good for correcting the issue, but I am curious whether this is the expected behavior, or whether there is a release that has "fixed" this?
I also just had this come up. vCenter lost its connection to the backend SAN, and all of the VMs recovered normally other than this appliance. I would expect this to operate at least as well as the old Windows server it used to be hosted on. Hopefully they will add some built-in recovery.