Cougar281
Enthusiast

VCSA 6.5 'Failed to start File System Check'

I just had an incident where all my storage was unavailable to my ESXi host for a short time. All machines recovered happily once the paths were restored, with one exception: the VCSA. On booting, I'm greeted with 'Failed to start File System Check on /dev/.....' and I'm stuck in 'Emergency Mode'. My searching turns up info for VCSA 6 and possibly earlier, but VCSA 6.5 is a different animal: I don't seem to be able to interrupt the boot to modify the GRUB loader and boot with bash or anything. If I run 'systemctl status systemd-fsck-root.service', I get 'fsck failed with error code 4' and 'Failed to start File System Check on /dev/disk/...'. I can't run fsck or e2fsck in this state... Basically, 'Emergency Mode' seems to be about as useless as an ice salesman at the South Pole.

Does anyone know how I can get my VCSA up and running again?

The fact that the VCSA is this fragile is rather poor... I have several Linux-based VMs running in my setup, and this is the only one that blew up as a result. The rest, at most, needed a reboot to clear up. Windows? Didn't skip a beat. Every Windows machine picked up right where it left off.
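
For reference, the 'error code 4' in the status output above is an fsck exit status. According to the fsck man page the status is a bitmask, and 4 means errors were left uncorrected. Below is a minimal sketch that decodes such a status; the function name decode_fsck_status is made up for illustration and is not part of the appliance.

```shell
#!/bin/sh
# Decode an fsck exit status into the meanings listed in the fsck man page.
# The status is a bitmask, so several conditions can be set at once.
# decode_fsck_status is a hypothetical helper name, not a VCSA tool.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" \
                "2:system should be rebooted" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:canceled by user request" \
                "128:shared-library error"; do
        bit=${pair%%:*}     # number before the colon
        text=${pair#*:}     # description after the colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# The status reported in the original post
decode_fsck_status 4
```

In other words, the boot-time check found damage it would not repair on its own, which would explain why a manual fsck run is needed.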

16 Replies
Mike_Gelhar
Enthusiast

Hi Cougar281, I just experienced this same thing. There is probably a more elegant way to accomplish this, but here's the process that worked for me. At the emergency mode prompt, I typed "fsck /dev/sda3" and then answered yes to all the questions about repairs, inodes and fixing issues. Once the prompt was back, I reset the VM and got the regular logon prompt. I then tried accessing the web interface as usual, but it gave a message saying /vsphere-client was not available. I went back to the console, logged in as root and did a clean reboot. Once it was back up, I checked the nifty vSphere admin page at https://[vcsa-name].[domain].com:5480 and saw the health badges were green. Finally, I logged in to the normal interface at https://[vcsa-name].[domain].com/vsphere-client and all was well.
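
The manual repair described above can be sketched roughly as follows. This is a dry run that only prints the commands so they can be reviewed before being typed at the emergency prompt. The -y flag (answer "yes" to every repair prompt) is a standard fsck/e2fsck option, but using it in place of answering each question by hand is an assumption, and print_repair_plan is a made-up helper name.

```shell
#!/bin/sh
# Print (do NOT run) the repair sequence for each device given.
# Assumption: -y stands in for answering "yes" to every prompt by hand.
# print_repair_plan is a hypothetical helper, not part of the VCSA.
print_repair_plan() {
    for dev in "$@"; do
        echo "fsck -y $dev"
    done
    # Restart the appliance once the checks are done
    echo "reboot"
}

# Device named in the post above
print_repair_plan /dev/sda3
```

Answering the prompts one at a time, as in the post, gives more control over what gets changed; -y trades that control for speed.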

AnsgarINTIS
Contributor

I have experienced the same issue twice now. The first time I got away with just running 'fsck /dev/sda3', and after the report everything was OK. But after the second occurrence I cannot log in any more. The emergency shell does not accept my root password. I'm now completely stuck.

On my test rig I have often lost the connection to my datastore, and all the VMs came back or didn't have any trouble at all. I gave the VCSA a try since I want to get away from the Windows vCenter.

Best regards

Ansgar

JRoseMHS
Contributor

After a storage loss on VCSA 6.5 I have come across this "fsck failed with error code 4" issue. The best thing I have found is to continue in emergency mode by typing in your root password twice. Get into the shell and type: fsck /dev/mapper/log_vg-log. This will prompt you with a lot of questions; just answer yes to them. Once done, reboot and the VCSA should come back up.

Mike_Gelhar
Enthusiast

At the emergency shell prompt you don't need a password. Just type fsck /dev/sda3 and you should see the file system check start. This all assumes the underlying storage that went missing has been made available again.

[Attached screenshot: 4-vcsa-file-system-check.JPG]

AnsgarINTIS
Contributor

As my VCSA 6.5 is still in my test lab, I played a bit with this short unavailability of my storage.

The VCSA is running on ESXi 5.5, and my storage is located on a Windows server and exported as an NFS share.

The VCSA 6.5 stops working every time I lose my storage through a simple reboot of the Windows server. In many cases a normal reboot of the VCSA was not possible, so I had to stop the VM and start it again. In a few cases I had to do a manual file system check.

As a result, such a VCSA setup is not feasible. Every other VM (Linux or Windows, it doesn't matter) survived such an outage of the datastore.

Regards

Ansgar

stewie2k
Contributor

Just want to say thanks. This sorted me out and got me working again when all other answers failed.

nitrobass24
Contributor

This worked for me.

Get into shell and type:  fsck /dev/mapper/log_vg-log

rkornson
Contributor

Hi,

Had the same issue here, also twice. Fixed it with the command e2fsck /dev/sda3 and let it fix everything. After rebooting the appliance, vCenter was up and running again.

Rgds,

Richard

aoden
Contributor

Thank you sir. I lost access to storage briefly and was faced with this issue on VCSA 6.5.0.5400. I ran:

fsck /dev/mapper/log_vg-log

This resolved my issue as well.

chavez885
Contributor

Worked for me too. This has happened twice now to my VCSA; hopefully the latest update fixes this crap.

iops
Contributor

Can't thank you guys enough... fsck /dev/mapper/log_vg-log worked for me the second time around, whereas the first method (running fsck against the disk) brought the errors up again. Back to the usual command prompt now, and background VC tasks are updating and happening as expected.

scale21
Enthusiast

Same. I had mine on shared storage, so I had to run fsck /dev/disk/by-partuuid/<UUID ID>

This was a bit tricky to find, as you would think it would be in /by-uuid, but it isn't (or wasn't).

Answer Y to all. Reboot.

The UUID was displayed in the error on boot. Good thing tab completion works in recovery mode :)
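
If tab completion isn't enough, a small sketch like the one below can show which device each by-partuuid entry points at. It assumes readlink -f is available in the recovery shell, which I have not verified on the appliance, and the function names are made up for illustration.

```shell
#!/bin/sh
# Resolve a /dev/disk/by-partuuid/<id> style symlink to its real device.
# Assumption: readlink -f exists in the recovery environment.
resolve_device() {
    readlink -f "$1"
}

# List every entry in a by-* directory together with its target.
# The directory defaults to /dev/disk/by-partuuid (hypothetical helper).
list_partuuid_targets() {
    dir=${1:-/dev/disk/by-partuuid}
    for link in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing
        [ -e "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(resolve_device "$link")"
    done
}

list_partuuid_targets
```

Matching the UUID from the boot error against this list should show which partition the failed check was complaining about.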

danielmgrinnell
Enthusiast

This FSCK worked great, good call! thx

Nodnarb
Enthusiast

fsck /dev/sda3 did it for me, too. Thanks!

hendersp3
Enthusiast

I am resurrecting this thread since it seems that any storage issue causes corrupt file systems. Of course the steps in this thread are good for correcting the issue, but I am curious: is this the expected behavior, or is there a release that has "fixed" this?

Thanks

shudon
Contributor

I also just had this come up. vCenter lost its connection to the backend SAN, and all of the VMs recovered normally other than this appliance. I would expect this to operate at least as well as the old Windows server it used to be hosted on. Hopefully they will add some built-in recovery.
