I have two Dell M620 blade servers running ESXi 5.5, both connected to vCenter. Last night I tried to connect to the DCUI on host 2 through the DRAC, but the keyboard didn't respond (neither F2 nor F12). At that point I assumed it was a problem with the DRAC, so I went down and plugged a keyboard directly into the host, but the same thing happened. I then put the host into maintenance mode and rebooted it, but that didn't help either. The host came back out of maintenance mode, I was able to vMotion VMs back to it, and everything appears to be working fine. There are no errors or alerts in the vSphere client, and I'm stumped as to which logs to check or what to look for. There's nothing on Google, so it's either rare or stupidly obvious. Has anyone else experienced this before?
Hi,
If the issue were caused by the OS, there would be entries in the host logs or some other indication of why the host froze. If the host suddenly stopped responding and you can't find anything in the logs, I would recommend checking the hardware for issues. Run a CPU- or memory-intensive diagnostic test on the host hardware and update the BIOS to the latest version.
Regards,
Suresh
Welcome to the community,
As a workaround you can try connecting via SSH and running the DCUI from that session.
Just type dcui followed by Enter. To "quit" back to the console, press Ctrl+C.
If this works, I would suspect some issue with the blade enclosure. Do you experience this issue on the second blade server?
The first host is working fine, it's only the second host that has the problem. Last time it was known to be working was about 3 months ago when updates were run.
I was able to make the change I needed after finding this
http://www.virtuallyghetto.com/2011/07/how-to-add-splash-of-remote-color-to.html
However, I'm thinking this could be part of a bigger problem, so I'm trawling through the host logs at https://host-ip/host, but unfortunately they don't make a whole lot of sense to me.
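For anyone else wading through these logs, a rough Python sketch for pulling only the warning and alert lines out of a downloaded copy of vmkernel.log might help. It assumes the usual line shape of a timestamp, then cpuN:worldID), then a WARNING: or ALERT: tag; the function name is made up for illustration and the pattern may need adjusting for other log files.

```python
import re

# Assumed line shape: "<timestamp> cpu<N>:<world id>)<LEVEL>: <message>"
_FLAGGED = re.compile(r"^(\S+)\s+cpu\d+:\d+\)(WARNING|ALERT):\s*(.*)$")


def flagged_lines(log_text):
    """Return (timestamp, level, message) tuples for WARNING/ALERT lines."""
    hits = []
    for line in log_text.splitlines():
        match = _FLAGGED.match(line)
        if match:
            hits.append(match.groups())
    return hits
```

To use it, read the saved log into a string (for example with open("vmkernel.log", errors="replace").read()) and print each tuple that flagged_lines returns.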
OK, can you please post vmkernel.log for analysis? I would also be interested in the ESXTOP CPU stats output.
This behavior could also be caused by faulty hardware or just a driver/firmware issue.
In the meantime, verify your firmware versions and update to the latest if possible.
The simplest thing to try is to revert to the previous state by booting the host from the alternate bootbank (press Shift+R during the ESXi boot sequence)
and see whether anything changes (just to prove whether the later updates introduced some inconsistency).
The blade and blade chassis firmware was updated when ESXi on the hosts was updated, around November last year.
I've attached the ESXTOP CPU output and the vmkernel log. I renamed the servers to VMserver.
The ESXTOP CPU output is showing high CPU usage for dcui.213978
Before you ask: no, there aren't 2 keyboards connected lol
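If the ESXTOP output was captured in batch mode (esxtop -b) as CSV, a small Python sketch like the one below can rank the CPU groups by usage, so a process such as dcui stands out. It assumes the perfmon-style column names that batch mode normally writes, of the form \\host\Group Cpu(id:name)\% Used; the function name is hypothetical and the column match may need tweaking for your capture.

```python
import csv
import io


def top_cpu_groups(csv_text, limit=5):
    """Return the busiest (group, percent_used) pairs from the first sample row."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    sample = next(rows)
    usage = []
    for name, value in zip(header, sample):
        # Assumed column naming: \\host\Group Cpu(<id>:<name>)\% Used
        if "Group Cpu(" in name and name.endswith("\\% Used"):
            group = name.split("Group Cpu(", 1)[1].split(")", 1)[0]
            try:
                usage.append((group, float(value)))
            except ValueError:
                continue  # skip cells that are empty or not numeric
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:limit]
```

Only the first sample row is read, which is enough for a quick look; a longer capture would need the rows averaged or compared over time.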
DCUI – High CPU usage on ESXi | Tom Fojta's Blog
Regarding the DCUI CPU usage, have a look at this KB:
In the meantime I will try to dive into the logs...
Thanks for the help
The esx.conf.LOCK file doesn't exist. I assume this means there's no lock on the esx.conf file?
That means we are out of luck... ;-)
Unfortunately I'm not at the office right now to inspect your logs. What about rolling back to the previous image using the alternate bootbank?
It's on my to do list, I'll let you know how it goes.
When I revert to the alt bootbank image, does it become the primary bootbank? Do I lose the current primary bootbank?
If I am not mistaken, when you revert, the primary bootbank just becomes the secondary, which you can revert to again.
Anyway, before such an operation you can back up the host configuration. For more on that, see:
Is there any way to check what's in the alt bootbank, such as the ESXi version/build or even when it was created?
When you hit Shift+R you will enter the Hypervisor Recovery screen. Under Installed Hypervisors
you will see the current (default) build and the build of the second (alt) image.
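It may also be possible to check this without rebooting, from an SSH session. As a sketch, and assuming each bank keeps a boot.cfg file of key=value lines (normally /bootbank/boot.cfg and /altbootbank/boot.cfg) with a build entry, the following Python reads out the interesting fields. The function names are made up for illustration.

```python
def read_boot_cfg(text):
    """Parse the key=value lines of a boot.cfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def describe_bank(text):
    """Pick out the fields that identify the image stored in a bootbank."""
    cfg = read_boot_cfg(text)
    # "build", "updated" and "bootstate" are the keys assumed to be present;
    # any that are missing come back as None.
    return {key: cfg.get(key) for key in ("build", "updated", "bootstate")}
```

Calling describe_bank(open("/altbootbank/boot.cfg").read()) on the host would then show the build string of the alternate image; the file's modification time gives a rough idea of when it was written.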
I reverted the bootbank from 5.5.0-2.33.2068190 to 5.50-2068190, which didn't work, and I also couldn't get the host back into the cluster, so I ended up reinstalling and reconfiguring it.