Hi all,
One of our ESX 3.5 hosts rebooted itself unexpectedly, and I was looking for a little help with what to look for to find the reason why. Being a Windows person, I'm not exactly familiar with Linux logs and which ones to check...
Thanks in advance
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
A very high-level place to start is the /var/log folder on that ESX host, where you can find all the logs. In my experience, when a machine reboots out of the blue, it is usually one of two things:
1. Someone rebooted the server by accident.
2. You have had some kind of hardware error, and I would start with memory first.
If you have any hardware agents like IBM Director or HP SIM, etc., take a look and see if there are any alerts or problems, and check the physical server for any yellow lights that point to a problem.
Steve Beaver
VMware Communities User Moderator
====
Co-Author of "VMware ESX Essentials in the Virtual Data Center"
(ISBN:1420070274) from Auerbach
*Virtualization is a journey, not a project.*
Thanks sbeaver for taking the time to respond, but it still doesn't help me. I know the log files are in /var/log, but which log files do I need to look at, as there is a plethora of logs? Also, there are no agents installed, so I can discount that right off the bat...
Thanks again
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
Have to agree with Steve here.
--Matt
Also take a look at the vmkernel logs.
Yeah, I had a look at the vmkernel logs, and conspicuously there are entries missing around the time of the event.
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
That makes me think hardware issue
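If you want to confirm a gap like that rather than eyeball it, here is a rough Python sketch that reports stretches where consecutive log entries are further apart than some threshold. It assumes the log uses the same syslog-style timestamp prefix as the entries quoted in this thread, and that you run it on a workstation against a copy of the log. The function name `find_gaps`, the 5-minute threshold, and the sample lines are all made up for illustration. Syslog timestamps carry no year, so you have to supply one. A gap only tells you nothing was logged; it does not prove a hardware fault by itself.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Yield (previous, next) timestamps where consecutive entries
    are further apart than `threshold`."""
    prev = None
    for line in lines:
        try:
            # The first 15 characters of a syslog line look like "Dec 12 09:26:28".
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None and stamp - prev > threshold:
            yield prev, stamp
        prev = stamp


if __name__ == "__main__":
    # Synthetic placeholder lines, NOT real ESX output.
    sample = [
        "Dec 12 09:00:00 host proc: placeholder",
        "Dec 12 09:01:00 host proc: placeholder",
        "Dec 12 09:30:00 host proc: placeholder",
    ]
    for before, after in find_gaps(sample, year=2000):
        print(f"no entries between {before} and {after}")
```

To use it on a real file, pass the open file object as `lines`, since iterating a file yields one line at a time.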
OK then, thanks for your help. Appreciate it.
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
Don't agree here.
This entry:
Dec 12 09:26:28 eduvm108 shutdown: shutting down for system halt
Dec 12 09:26:28 eduvm108 init: Switching to runlevel: 0
This is a controlled reboot request. Someone in your organization requested a controlled reboot of this host.
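For anyone who wants to search for this kind of entry without reading the whole log, here is a minimal Python sketch. It assumes syslog-style lines in the format quoted above (on the service console these would typically be in /var/log/messages, though that is my assumption, not something stated in this thread). The function name `find_controlled_shutdowns` and the patterns are illustrative only, and the sample lines are the ones already posted in this thread.

```python
import re

# Syslog-style line, e.g.
# "Dec 12 09:26:28 eduvm108 shutdown: shutting down for system halt"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Messages that point at a requested halt (runlevel 0) or reboot (runlevel 6).
CONTROLLED_RE = re.compile(r"shutting down for|Switching to runlevel: [06]\b")


def find_controlled_shutdowns(lines):
    """Return (timestamp, process, message) for each matching entry."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m["proc"] in ("shutdown", "init") and CONTROLLED_RE.search(m["msg"]):
            hits.append((m["stamp"], m["proc"], m["msg"]))
    return hits


if __name__ == "__main__":
    sample = [
        "Dec 12 09:26:28 eduvm108 shutdown: shutting down for system halt",
        "Dec 12 09:26:28 eduvm108 init: Switching to runlevel: 0",
        "Dec 12 09:26:50 eduvm108 vmware-hostd[2088]: Accepted password for user root from 127.0.0.1",
    ]
    for stamp, proc, msg in find_controlled_shutdowns(sample):
        print(stamp, proc, msg)
```

If a crash or power loss had taken the host down, you would not expect to find these entries before the boot messages, which is why their presence suggests a requested shutdown.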
--Matt
Well this is a bit frightening indeed! Because the root password is tightly controlled and only a few people should know it.
That would explain the "...root...127.0.0.1", which would mean someone had done it from iLO/RSA/console..?
May have to look at changing the root password.
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
Dec 12 09:26:50 eduvm108 vmware-hostd[2088]: Accepted password for user root from 127.0.0.1
^^^ that bit is normal on an ESX host when hostd is doing stuff.
It could have been initiated from the console or via the VIC.
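If you want to see which logins were recorded around that time, a small Python sketch like the one below can list the "Accepted password" entries and mark which came from the loopback address. It assumes every such line has the same shape as the one quoted above; the function name `accepted_logins` and the `local` flag are made up for illustration. As noted, loopback entries from hostd are normal, so the non-local ones are the more interesting ones to review.

```python
import re

# Matches lines shaped like the hostd entry quoted in this thread:
# "<timestamp> <host> <process>[<pid>]: Accepted password for user <user> from <source>"
ACCEPTED_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<proc>[^:\[\s]+)(?:\[\d+\])?: Accepted password for user "
    r"(?P<user>\S+) from (?P<source>\S+)"
)


def accepted_logins(lines):
    """Return one dict per 'Accepted password' entry, with a 'local' flag."""
    logins = []
    for line in lines:
        m = ACCEPTED_RE.match(line)
        if m:
            entry = m.groupdict()
            # True when the login came from the host itself (loopback).
            entry["local"] = entry["source"] == "127.0.0.1"
            logins.append(entry)
    return logins


if __name__ == "__main__":
    sample = [
        "Dec 12 09:26:50 eduvm108 vmware-hostd[2088]: Accepted password for user root from 127.0.0.1",
    ]
    for entry in accepted_logins(sample):
        print(entry["stamp"], entry["user"], entry["source"], "local" if entry["local"] else "remote")
```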
--Matt
Thanks to both of you for your help. I think I will be recommending a password change for our hosts.
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3