VMware Cloud Community
GravelberryPieK
Contributor

Multiple Server reboots

We have 4 PE R710s in a rack with other devices.  3 of the R710s are part of a VMware 3.5 cluster.  The 4th is a stand-alone VMware 4.0 ESXi (free version) machine.  On 7/22 around 4:07 all 4 rebooted.  Nothing else in the rack did, so there was no outside power issue.  All of the equipment plugs into a power strip inside the rack and then into the UPS.  The SAN showed no power issues.  The UPS showed no power issues.  3 other non-R710 machines without DRAC did not reboot.  I'm wondering if anyone has any thoughts on where to start looking.  I've already spoken to VMware support and reviewed all the logs.  From the VMware view, the systems were turned off and on.  I'm wondering if there could be something in the CPU, the iDRAC or elsewhere on this hardware that reboots a machine and leaves no logging?

3 Replies
marcelo_soares
Champion

Did you have a simple reboot or a PSOD? Check the /root directory of the hosts to verify whether they generated a dump. Also, the file most likely to show anything is /var/log/vmkernel (if you want to post some lines, fine).
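
To make it easier to post only the relevant lines, here is a minimal sketch that pulls the entries stamped within a few minutes of the reboot out of a log. It assumes the log uses the usual syslog-style "Mon DD HH:MM:SS" prefix with no year, so the year has to be supplied by the caller. The function name `lines_near` and the 10-minute default window are made up for illustration and are not part of any VMware tool.

```python
from datetime import datetime, timedelta


def lines_near(lines, when, minutes=10):
    """Return the log lines stamped within `minutes` of `when`.

    Assumes each line starts with a 15-character syslog-style
    timestamp such as "Jul 22 04:07:01". That prefix carries no
    year, so the year of `when` is used for every line.
    """
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(
                f"{when.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            # Not a timestamped line (continuation, blank, etc.): skip it.
            continue
        if abs(stamp - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits


# Example use on a host (the year is a placeholder, fill in the real one):
#   with open("/var/log/vmkernel") as f:
#       for hit in lines_near(f, datetime(YEAR, 7, 22, 4, 7)):
#           print(hit)
```

Running the same window against every host makes it easier to compare what each one logged just before it went down.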

Anyway, if support didn't find anything... Dell may be able to tell you whether there is any possibility of the DRAC doing this.

Marcelo Soares
firestartah
Virtuoso

Hi

If there's nothing in the logs apart from a standard shutdown, then it sounds like someone sent a reboot request to the machines via the DRAC card/interface. I would assume only a very select number of people have access, so someone must have been logged in doing something and pressed the wrong thing.

Gregg

GravelberryPieK
Contributor

All good thoughts, but there was no PSOD: there was no memory dump file and /var/log/vmkernel only shows a normal reboot. After reviewing the DRAC logs, there are no logins for months until the one I just made to check, so there was no accidental DRAC reboot caused by an administrator. It is very strange to see random servers on the same rack and UPS reboot like this.
