VMware Cloud Community
shrcol
Contributor

ESX boot problems after power failure

Morning all,

I think this may be a simple issue to resolve, but at the moment I am not sure how. We recently had a power cut that lasted longer than the UPS backup, so the two ESX servers dropped. One came back without issue; the other hasn't. When it's powered on it gets to the boot menu, and when the normal boot mode is selected it kernel panics with a disk error (device missing or similar; I can confirm the exact message if need be).

However, if I reboot it and select debug mode from the boot menu it starts fine, boots as normal and will run VMs. This leads me to believe that the disks, filesystem and OS are fine and the issue lies with the config, possibly damaged as a result of the unclean shutdown. I have done some looking around and read a post about repairing the boot sequence using the command esxcfg-boot -r, but I couldn't determine whether it was relevant to my situation.

Any advice greatly appreciated.


Accepted Solutions
athlon_crazy
Virtuoso

Verify the UUIDs in /boot/grub/grub.conf:

1) The root UUID of the "VMware ESX Server" boot entry (root=UUID=c087c715-534d-471e-a023-!@#$%*) is the same as that of the "VMware ESX Server (debug mode)" entry.

2) The root UUID of the "VMware ESX Server" entry (root=UUID=c087c715-534d-471e-a023-!@#$%*) is the same as the UUID of "/" in /etc/fstab.
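
The two checks above can be sketched in Python as follows. This is only a rough illustration, meant to be run on a workstation against copies of the two files; the function names and the sample UUIDs are made up for the example, and it assumes every boot entry uses the root=UUID= form on its kernel line.

```python
import re

# Matches the root=UUID=... argument on a GRUB kernel line.
UUID_RE = re.compile(r"root=UUID=(\S+)")


def grub_root_uuids(grub_text):
    """Map each GRUB menu title to the root UUID on its kernel line."""
    uuids = {}
    title = None
    for line in grub_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comments such as "#vmware:autogenerated esx"
        if line.startswith("title "):
            title = line[len("title "):].strip()
        elif line.startswith("kernel ") and title is not None:
            match = UUID_RE.search(line)
            if match:
                uuids[title] = match.group(1)
    return uuids


def fstab_root_uuid(fstab_text):
    """Return the UUID of the filesystem mounted at "/", or None."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        if fields[1] == "/" and fields[0].startswith("UUID="):
            return fields[0][len("UUID="):]
    return None


def mismatched_entries(grub_text, fstab_text):
    """Return the GRUB titles whose root UUID differs from the fstab "/" UUID."""
    expected = fstab_root_uuid(fstab_text)
    return [title for title, uuid in grub_root_uuids(grub_text).items()
            if uuid != expected]


# Hypothetical sample data, not taken from a real host.
SAMPLE_GRUB = """\
title VMware ESX Server
    root (hd0,0)
    kernel --no-mem-option /vmlinuz ro root=UUID=11111111-2222-3333-4444-555555555555 mem=272M
title VMware ESX Server (debug mode)
    root (hd0,0)
    kernel --no-mem-option /vmlinuz ro root=UUID=11111111-2222-3333-4444-555555555555 mem=272M debug
"""

SAMPLE_FSTAB = """\
UUID=11111111-2222-3333-4444-555555555555 / ext3 defaults 1 1
UUID=66666666-7777-8888-9999-000000000000 /boot ext3 defaults 1 2
"""

print(mismatched_entries(SAMPLE_GRUB, SAMPLE_FSTAB))
```

An empty list means every boot entry points at the same root filesystem as /etc/fstab. To check real files, read them with open(...).read() and pass the text in place of the samples.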

P.S. While you are figuring this out, have you thought about reinstalling ESX? It can be done within a few minutes.

vcbMC-1.0.6 Beta

vcbMC-1.0.7 Lite

http://www.no-x.org

19 Replies
shrcol
Contributor

Thanks for your quick response. I will take a look and update.

shrcol
Contributor

OK,

The UUIDs all matched for boot and debug mode in /boot/grub/grub.conf and also for the "/" entry in /etc/fstab. I guess that means the config is fine. You mentioned a reinstall of ESX on that machine; I guess that's a fairly straightforward process, as all of the config and machines are within the VirtualCenter box and on the SAN? Is there anything that needs to be backed up from the machine itself?

Thanks again.

athlon_crazy
Virtuoso

Yes, just back up your "/etc" folder as a reference for your fresh ESX installation.

By the way, would you mind posting your:

  • grub.conf

  • /etc/fstab

shrcol
Contributor

I don't mind posting the files, but I am not sure how I would get them on here. With the box in debug mode SSH seems to be disabled (root logon, in any case). I can get on using the console, but I can't think how to easily get the files off to a client machine to post them here. I guess they could be sent to the other machine using SCP, and then I could copy and paste them from a terminal session. My Linux is a bit rusty; does that sound feasible?

athlon_crazy
Virtuoso

I'm not sure whether you can scp, smbclient or otherwise copy those files in debug mode. However, you can try booting from any Linux live CD, mounting your ESX "/" partition, tarring the "/etc" folder and sending it to another Linux box with scp. The same goes for the grub.conf and fstab files.
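
As a rough sketch of the tar step, assuming the ESX root partition has already been mounted from the live CD and the live environment has Python available: the mount point, archive path and function name below are hypothetical, and the scp transfer is left out. The function returns the files it could not read, which is useful if the session lacks root permissions.

```python
import os
import tarfile


def archive_etc(mount_point, archive_path):
    """Pack <mount_point>/etc into a gzip tarball.

    Returns the archive-relative paths of files that could not be read,
    for example because of missing permissions.
    """
    etc_dir = os.path.join(mount_point, "etc")
    skipped = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(etc_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Store paths as etc/..., relative to the mounted root.
                arcname = os.path.relpath(path, mount_point)
                try:
                    tar.add(path, arcname=arcname, recursive=False)
                except OSError:
                    skipped.append(arcname)
    return skipped


# Hypothetical usage, with the ESX "/" partition mounted at /mnt/esxroot:
#   unreadable = archive_etc("/mnt/esxroot", "/tmp/esx-etc.tar.gz")
#   print(unreadable)
```

This walks only files, so empty directories are not stored; for a reference backup of the config that should not matter. The resulting tarball can then be copied to another box with scp as described above.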

shrcol
Contributor

I have some planned downtime for next week in which I can have a look at this further. Thanks for your help so far, once I have a look I will update the thread.

shrcol
Contributor

OK, managed to down the box and take it out of production to work on. The live disc worked a treat; I got a copy of /etc, grub.conf and fstab. The only thing is, the /etc folder copied but refused to send a number of files (permissions, I guess). Are there any specific ones that I should check were copied? Below are the fstab and grub.conf.

!!! fstab !!!

UUID=37f78e90-ae30-444d-8cd8-b90110ae846d  /           ext3         defaults               1 1
UUID=0b3968be-d9ff-4812-a36f-941e2f92a828  /boot       ext3         defaults               1 2
none                                       /dev/pts    devpts       gid=5,mode=620         0 0
none                                       /dev/shm    tmpfs        defaults               0 0
none                                       /proc       proc         defaults               0 0
UUID=e515da64-4bfd-4ba1-bf61-99c0572bcf8d  /var/log    ext3         defaults               1 2
UUID=dfc68819-e544-4ee4-a6a0-68360d734020  swap        swap         defaults               0 0
/dev/cdrom                                 /mnt/cdrom  udf,iso9660  noauto,owner,kudzu,ro  0 0

-

!!! grub.conf !!!

#vmware:configversion 1
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
#         all kernel and initrd paths are relative to /boot/, eg.
#         root (hd0,0)
#         kernel /vmlinuz-version ro root=/dev/sda2
#         initrd /initrd-version.img
#boot=/dev/sda
timeout=10
default=0
title VMware ESX Server
        #vmware:autogenerated esx
        root (hd0,0)
        uppermem 277504
        kernel --no-mem-option /vmlinuz-2.4.21-57.ELvmnix ro root=UUID=37f78e90-ae30-444d-8cd8-b90110ae846d mem=272M
        initrd /initrd-2.4.21-57.ELvmnix.img
title VMware ESX Server (debug mode)
        #vmware:autogenerated esx
        root (hd0,0)
        uppermem 277504
        kernel --no-mem-option /vmlinuz-2.4.21-57.ELvmnix ro root=UUID=37f78e90-ae30-444d-8cd8-b90110ae846d mem=272M console=ttyS0,115200 console=tty0 debug
        initrd /initrd-2.4.21-57.ELvmnix.img-dbg
title Service Console only (troubleshooting mode)
        #vmware:autogenerated esx
        root (hd0,0)
        uppermem 277504
        kernel --no-mem-option /vmlinuz-2.4.21-57.ELvmnix ro root=UUID=37f78e90-ae30-444d-8cd8-b90110ae846d mem=272M tblsht
        initrd /initrd-2.4.21-57.ELvmnix.img-sc

-

Thanks!

athlon_crazy
Virtuoso

I think the most relevant one is /etc/vmware/esx.conf, but use this file for reference only. Don't simply copy and paste it.






shrcol
Contributor

I don't have an /etc/vmware/esx.conf, but I do have a 'config' in the same folder. Perhaps it missed the file during the copy. I can retry it using 'sudo'; that should hopefully get the lot.
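
One way to see exactly which files a copy missed is to compare the two directory trees. A minimal sketch, assuming both the mounted original and the copy are reachable from the same machine; the function names and paths are hypothetical:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from_copy(source_root, copy_root):
    """List files present under source_root but absent under copy_root."""
    return sorted(relative_files(source_root) - relative_files(copy_root))


# Hypothetical usage:
#   print(missing_from_copy("/mnt/esxroot/etc", "/home/user/esx-backup/etc"))
```

Note that directories the current user cannot list are silently skipped by os.walk, so this is best run with the same privileges used for the retry.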

shrcol
Contributor

Just did the copy again using sudo and now have the complete /etc folder. The vmware/esx.conf is now present. I guess now's the time to reload the ESX install. It's part of a two-node setup controlled by VirtualCenter. Beyond the /etc folder, grub.conf and fstab from the failed machine, is there anything else that needs backing up, or anything that needs doing to the VC box, before I reload? I presume the reload process is basically the same as starting from scratch?

athlon_crazy
Virtuoso

I don't know what you mean by reload, but by doing a new installation and reconfiguring it based on your esx.conf you can get your new ESX host up and running within minutes instead of troubleshooting it.

shrcol
Contributor

By reload I mean re-install. Thanks, I will have a go at it now, just need to find the install media :)

shrcol
Contributor

Just finished the fresh install. All seems to have gone well. Booted from the disc and went through a fresh install. Once finished, I configured the networking and storage, and finally ran the update process. I just have one small issue. I created a new VM on the rebuilt machine as a test, and while it is created fine and will power on and off, the console won't connect. I have updated my VI Client to the latest version but it hasn't made a difference. I can still open the console on existing machines, just not the new one; I get a timeout error. The only thing that comes to mind is that the other ESX server is now on an older patch level than the new one. Any ideas?

shrcol
Contributor

OK, I moved the new machine with the non-working console to the other ESX server and it all works fine. Seems like a firewall issue on the rebuilt ESX host?

athlon_crazy
Virtuoso

What do you mean by the console not working? Is there an exact error message or a screenshot?






shrcol
Contributor

The console window appears, and after 30 seconds or so it displays a yellow banner saying: Cannot connect to host <fresh build esx server name> - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

athlon_crazy
Virtuoso

Try this KB






shrcol
Contributor

Sorry! Just rebooted the host (the first time since all the config changes were made) and it's now working. Many thanks for all your assistance getting this back working correctly.
