Failure to boot post Update Manager push to ESX 3.5 - opinion on root cause?
Is root cause grub.conf corruption or Memory settings?
Scenario:
vCenter 4.1.0 Bld 491557
VMware Update Mgr 4.1.0.6589
ESX 3.5 Bld 64607 patching to Bld 604481
ESX 3.5 node fails to boot with error:
Error 15 : File not found
ESX will not boot in any mode (regular, debug, troubleshooting) even after multiple hardware reboots and attempts in each mode.
VMware support was consulted for several hours and escalated to a Linux expert, who advised that the host was unrecoverable
unless it could boot into debug or troubleshooting mode from the boot menu. Multiple attempts to boot from the menu options
failed with the same Error 15. Support recommended a rebuild.
Support did not offer the "e" (edit) option for the boot command, which allows booting directly from a modified entry.
Resolution:
KBs reviewed: 10065, 1007908, 1004574, 1004104
I don't see the "edit" option in any of the reviewed KBs.
At the boot menu, highlight "Service Console only (troubleshooting mode)" and press "e" to edit the entry.
Modify the uppermem, kernel, and initrd entries as needed. I changed the memory from 272 to 800 and changed the "kernel" and
"initrd" strings to match the "/boot/grub/grub.conf" file of an ESX server that had been updated successfully. It was a
single-digit change reflecting the upgraded kernel and initrd (the fourth number in the version changed from "63" to "66").
Note: in the iLO console window the strings run off the display; use the left/right arrow keys as needed.
Very important! Press "b" in the edit window to boot using the modified boot entry.
root (hd0,0)
uppermem 818176
kernel --no-mem-option /vmlinuz-2.4.21-66.ELvmnix ro root=UUID=42aea862-3ca2-46f6-8dcb-8d5f7050eeee mem=800M tblsht
initrd /initrd-2.4.21-66.ELvmnix
This will boot into troubleshooting mode
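The two memory values in the entry above are related, so they should be changed together. GRUB's uppermem is given in KB and counts memory above the first 1 MB, so for mem=800M it works out to (800 - 1) * 1024 = 818176, which matches the entry. A minimal sketch of that arithmetic follows; the function name uppermem_kb is made up for illustration and is not a VMware tool.

```shell
#!/bin/sh
# Hypothetical helper: compute the GRUB "uppermem" value (in KB) that
# goes with a Service Console "mem=<N>M" kernel option.
# Assumption: uppermem = memory above the first 1 MB, expressed in KB.
uppermem_kb() {
    mem_mb=$1
    echo $(( (mem_mb - 1) * 1024 ))
}

uppermem_kb 800   # prints 818176, as in the entry above
uppermem_kb 272   # prints 277504
```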
Log on as root and run the esxcfg-boot commands below:
esxcfg-boot -p
esxcfg-boot -b
esxcfg-boot -r
Modify the Service Console memory via vCenter to 800 MB.
Reboot in normal mode; this should resolve the issue.
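Since Error 15 is "File not found", it may be worth confirming before that final reboot that every kernel and initrd named in grub.conf actually exists under /boot. The following is only a sketch under that assumption; check_grub_files is a hypothetical name, and the defaults assume the layout shown in this post (a separate /boot partition, so paths in grub.conf are relative to /boot).

```shell
#!/bin/sh
# Sketch: report kernel/initrd paths referenced by grub.conf that are
# missing under the boot directory. Not a VMware-supplied tool.
#   $1 = path to grub.conf (default /boot/grub/grub.conf)
#   $2 = directory the paths are relative to (default /boot)
check_grub_files() {
    conf=${1:-/boot/grub/grub.conf}
    bootdir=${2:-/boot}
    missing=0
    # Take the first token starting with "/" on each kernel/initrd line.
    # Commented lines are skipped because their first field is "#".
    for path in $(awk '$1 == "kernel" || $1 == "initrd" {
                           for (i = 2; i <= NF; i++)
                               if ($i ~ /^\//) { print $i; break }
                       }' "$conf"); do
        if [ ! -f "$bootdir$path" ]; then
            echo "MISSING: $bootdir$path"
            missing=1
        fi
    done
    # Exit status 0 means every referenced file was found.
    return $missing
}
```

Running check_grub_files with no arguments from the troubleshooting-mode shell would print one MISSING line per stale entry, for example an entry still pointing at the old "63" kernel after the patch installed "66".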
Additional info:
Disk space was not the issue.
df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p2  5.0G  1.5G  3.3G  32% /
/dev/cciss/c0d0p1  244M   32M  200M  14% /boot
/dev/cciss/c0d0p8  2.0G   33M  1.9G   2% /home
none               391M     0  391M   0% /dev/shm
/dev/cciss/c0d0p6  4.0G   33M  3.8G   1% /tmp
/dev/cciss/c0d0p7  4.0G   76M  3.7G   2% /var/log
esxcfg-boot
-h --help
-q --query boot|vmkmod
-p --update-pci
-b --update-boot
-d --rootdev UUID=
-a --kernelappend
-r --refresh-initrd
-g --regenerate-grub
Queries cannot be combined with each other or other options. Passing -p or -d enables -b even if it is not passed explicitly. -b
implies -g plus a new initrd creation. -b and -r are incompatible, but -g and -r can be combined.
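The combination rules in that help text can be sketched as a small check. This is my reading of the rules, not anything shipped with esxcfg-boot: flags_ok is a hypothetical name, and treating -p or -d together with -r as a conflict is an assumption that follows from "-p or -d enables -b" plus "-b and -r are incompatible".

```shell
#!/bin/sh
# Sketch of the option rules quoted above; not part of esxcfg-boot.
# Arguments are the command-line words; only words starting with "-"
# are counted as options, so option values are ignored.
flags_ok() {
    q=0; b=0; r=0; n=0
    for f in "$@"; do
        case $f in
            -q)       q=1; n=$((n + 1)) ;;
            -p|-d|-b) b=1; n=$((n + 1)) ;;  # -p and -d enable -b implicitly
            -r)       r=1; n=$((n + 1)) ;;
            -*)       n=$((n + 1)) ;;
        esac
    done
    # A query must be the only option on the command line.
    if [ "$q" -eq 1 ] && [ "$n" -gt 1 ]; then return 1; fi
    # -b (explicit or implied) cannot be combined with -r.
    if [ "$b" -eq 1 ] && [ "$r" -eq 1 ]; then return 1; fi
    return 0
}
```

Under this reading, running -p, -b, and -r as three separate commands (as in the resolution steps) passes each time, while "flags_ok -b -r" returns non-zero.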
[root@ESXserver11 /]# cat /boot/grub/grub.conf
#vmware:configversion 1
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/cciss/c0d0p2
# initrd /initrd-version.img
#boot=/dev/cciss/c0d0
timeout=10
default=0
title VMware ESX Server
#vmware:autogenerated esx
root (hd0,0)
uppermem 818176
kernel --no-mem-option /vmlinuz-2.4.21-66.ELvmnix ro root=UUID=42aea862-3ca2-46f6-8dcb-8d5f7050eeee mem=800M
initrd /initrd-2.4.21-66.ELvmnix.img
title VMware ESX Server (debug mode)
#vmware:autogenerated esx
root (hd0,0)
uppermem 818176
kernel --no-mem-option /vmlinuz-2.4.21-66.ELvmnix ro root=UUID=42aea862-3ca2-46f6-8dcb-8d5f7050eeee mem=800M console=ttyS0,115200 console=tty0 debug
initrd /initrd-2.4.21-66.ELvmnix.img-dbg
title Service Console only (troubleshooting mode)
#vmware:autogenerated esx
root (hd0,0)
uppermem 818176
kernel --no-mem-option /vmlinuz-2.4.21-66.ELvmnix ro root=UUID=42aea862-3ca2-46f6-8dcb-8d5f7050eeee mem=800M tblsht
initrd /initrd-2.4.21-66.ELvmnix.img-sc
Is root cause grub.conf corruption or Memory settings?
All points will be awarded.