VMware Cloud Community
gkern
Contributor

HOST went down

Hello:

Our ESX 3.0.1 host server went down yesterday afternoon for no apparent reason. There was no power spike, nobody tripped over any cord, and none of the other servers (W2K3) or network components had any problem. Since I was remote, I had one of my Power Users go into the server room, confirm that everything was connected, and then power that thing back on by hand. It came up just fine, and I was able to RDP to another box, launch the VI Client, and restart all my VMs.

(The box is an HP DL380 G5 with 4 GB of memory and several hundred gigs of disk.)

My question is: what tools are there, or what steps can I take, to find out why that machine went down, and what can I do to prevent a repeat episode?

Thanks very much.

5 Replies
oreeh
Immortal

Check the logs (/var/log) on the host, and also check the iLO and HP management agent logs.

esiebert7625
Immortal

Definitely sounds like a hardware or management agent issue. Do you have the HP Management agents for ESX installed? If so, what version? Log into the HP console (usually https://:2381) and see if there are any errors. Do you have any other third-party software installed on the Service Console? You can also check some of the ESX server logs, but you may not find much in them if it was a hardware issue.

How do I troubleshoot ESX server issues?

You can check several log files on the ESX server, depending on the problem you are experiencing. These include:

o Vmkernel - /var/log/vmkernel - Records activities related to the virtual machines and ESX server

o Vmkernel Warnings - /var/log/vmkwarning - Records activities with the virtual machines

o Vmkernel Summary - /var/log/vmksummary - Used to determine uptime and availability statistics for ESX Server; human-readable summary found in /var/log/vmksummary.txt

o ESX Server host agent log - /var/log/vmware/hostd.log - Contains information on the agent that manages and configures the ESX Server host and its virtual machines (Check the file date/time stamps to find the log file that is currently being written to.)

o Service Console - /var/log/messages - Contains all general log messages used to troubleshoot virtual machines or ESX Server

o Web Access - /var/log/vmware/webAccess - Records information on Web-based access to ESX Server

o Authentication log - /var/log/secure - Contains records of connections that require authentication, such as VMware daemons and actions initiated by the xinetd daemon.

o VirtualCenter agent - /var/log/vmware/vpx - Contains information on the agent that communicates with VirtualCenter

o Virtual Machines - vmware.log, in the same directory as the affected virtual machine’s configuration files - Contains information on why a virtual machine crashed or ended abnormally
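When a host goes down hard, the logs often hold no error at all. In that case the most useful clue can be the gap itself: the last line written before the crash and the first line written after the reboot. The following is a minimal sketch, not a VMware tool, that looks for such gaps in a syslog-style file such as a copy of /var/log/messages. It assumes the usual "Mmm dd HH:MM:SS" timestamp at the start of each line and Python 3 on a workstation where the log has been copied. The function names, the five-minute threshold, and the copied file name are all made up for illustration.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Return the timestamp at the start of a syslog-style line, or None.

    Syslog timestamps carry no year, so the caller has to supply one.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        # Continuation lines or anything without a leading timestamp.
        return None


def find_gaps(lines, year, min_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen, last_line) for each silence >= min_gap."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_time, stamp, prev_line.rstrip("\n")))
        prev_time, prev_line = stamp, line
    return gaps


if __name__ == "__main__":
    # Hypothetical file name: a copy of /var/log/messages from the host.
    with open("messages.copy", errors="replace") as handle:
        for last_seen, next_seen, last_line in find_gaps(handle, year=2007):
            print(f"silent from {last_seen} to {next_seen}")
            print(f"  last line before the gap: {last_line}")
```

A quiet host can produce long gaps on its own, so treat the output as a pointer to where to start reading rather than proof of a crash. The "last seen" time is also a good value to compare against the timestamps in the hardware logs.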


-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-

Thanks, Eric

Visit my website: http://vmware-land.com

-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-

gkern
Contributor

OK, I used nano from the command line to go through all those files. Nothing listed there indicated any kind of major problem. I might have overlooked something, of course, but nothing jumped out at me.

Suggestions about checking the VMs themselves are kind of secondary. The host itself went down hard, requiring a physical power-button reboot. I have no idea why, so I'm concerned that it might happen again.

It's a very basic config, with my datastore on the local disk set. I've got 4 light-duty, Citrix-related VMs running, and that's about it.

esiebert7625
Immortal

When you say the host went down hard, did you get a purple screen? Faulty memory is a typical cause of that. Do you have any hardware management agents installed on the server?

oreeh
Immortal

Check the logs available in iLO. If you have a hardware problem, there should be an indication somewhere.
