VMware Cloud Community
techmind
Contributor

ESXi Reboots

Hi,

I am unable to figure out why our ESXi 6.0.0 host randomly reboots. When that happens, the two virtual servers running on it disappear from the network, only to come back about 20 minutes later. It looks as though the host is crashing, rebooting, and automatically restarting the virtual servers. I have connected it to a new UPS (about a week old), which is plugged into a separate outlet from the rest of the gear (switches, routers, etc.). The ESXi host is the only thing connected to this new UPS, and a USB cable runs from the back of the UPS to the host. I have collected two sets of logs: one from about a month ago, which I'll post first, and one from today, which I'll post second:

First Set of Logs (about a month old):

[root@KETKHST001:~] cat /var/log/vmksummary.log | grep bootstop

2017-01-07T05:57:43Z bootstop: Host has booted

2017-01-20T14:15:15Z bootstop: Host has booted

2017-01-20T23:02:58Z bootstop: Host has booted

2017-02-03T22:19:14Z bootstop: Host has booted

2017-02-15T14:14:56Z bootstop: Host has booted

2017-02-18T15:13:32Z bootstop: Host has booted

2017-03-04T14:29:44Z bootstop: Host has booted

2017-03-18T13:46:02Z bootstop: Host has booted

2017-04-01T13:02:04Z bootstop: Host has booted

2017-04-15T12:18:18Z bootstop: Host has booted

2017-04-24T22:19:35Z bootstop: Host has booted

2017-05-08T21:35:46Z bootstop: Host has booted

2017-05-21T20:13:12Z bootstop: Host has booted

2017-05-26T11:13:53Z bootstop: Host has booted

2017-05-30T15:25:07Z bootstop: Host is rebooting

2017-05-30T16:02:22Z bootstop: Host has booted

2017-06-09T10:30:05Z bootstop: Host has booted

2017-06-15T20:07:00Z bootstop: Host has booted

[root@KETKHST001:~] esxcfg-advcfg -g /Misc/BlueSreenTimeout

Exception occured: Unable to find option BlueSreenTimeout

[root@KETKHST001:~] cat /var/log/hostd.log | grep poweroff

[root@KETKHST001:~] cat /var/log/hostd.log | grep shutdown

2017-06-15T20:06:48.559Z info hostd[FFCFDA80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created

2017-06-15T20:06:53.700Z info hostd[FFCFDA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x1f450560], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat

[root@KETKHST001:~] cat /var/log/hostd.log | grep reboot

2017-06-15T20:06:53.700Z info hostd[FFCFDA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x1f450560], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat

[root@KETKHST001:~]
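
In the output above, the TagExtractor entries match `grep shutdown` and `grep reboot` only because the rule's regex text contains `shutdownGuest` and `rebootGuest`; they do not describe a host event. A minimal Python sketch for dropping that noise offline, assuming hostd.log has been copied off the host (the function name `filter_hostd` and the abbreviated sample lines are illustrative, not part of any VMware tool):

```python
import re

# Matches the "sub=<name>" field of a hostd.log entry, e.g. sub=TagExtractor.
SUB_FIELD = re.compile(r"sub=([^\]\s]+)")

def filter_hostd(lines, keyword, skip_subs=("TagExtractor",)):
    """Return lines containing keyword, skipping the listed subsystems."""
    kept = []
    for line in lines:
        if keyword not in line:
            continue
        match = SUB_FIELD.search(line)
        if match and match.group(1) in skip_subs:
            continue  # rule-definition noise, not a host event
        kept.append(line)
    return kept

# Two entries from the thread; the second is abbreviated with "..." for brevity.
sample = [
    "2017-06-15T20:06:48.559Z info hostd[FFCFDA80] [Originator@6876 "
    "sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) "
    "ha-cli-handler-system-shutdown created",
    "2017-06-15T20:06:53.700Z info hostd[FFCFDA80] [Originator@6876 "
    "sub=TagExtractor] 8: Rule type=[...], "
    "regex=vim\\.VirtualMachine\\.(...|rebootGuest|...|shutdownGuest|...)",
]

for line in filter_hostd(sample, "shutdown"):
    print(line)
```

With the noise removed, only the `Solo.VmwareCLI` entry remains for "shutdown" and nothing remains for "reboot". Note that the remaining entry is stamped a few seconds before the "Host has booted" line in vmksummary.log, so it may be written when hostd starts rather than when a shutdown is requested; that reading is an inference from the timestamps, not something the log states.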

Second Set of Logs (happened today):

[root@KETKHST001:~] cat /var/log/vmksummary.log | grep bootstop

2017-01-07T05:57:43Z bootstop: Host has booted

2017-01-20T14:15:15Z bootstop: Host has booted

2017-01-20T23:02:58Z bootstop: Host has booted

2017-02-03T22:19:14Z bootstop: Host has booted

2017-02-15T14:14:56Z bootstop: Host has booted

2017-02-18T15:13:32Z bootstop: Host has booted

2017-03-04T14:29:44Z bootstop: Host has booted

2017-03-18T13:46:02Z bootstop: Host has booted

2017-04-01T13:02:04Z bootstop: Host has booted

2017-04-15T12:18:18Z bootstop: Host has booted

2017-04-24T22:19:35Z bootstop: Host has booted

2017-05-08T21:35:46Z bootstop: Host has booted

2017-05-21T20:13:12Z bootstop: Host has booted

2017-05-26T11:13:53Z bootstop: Host has booted

2017-05-30T15:25:07Z bootstop: Host is rebooting

2017-05-30T16:02:22Z bootstop: Host has booted

2017-06-09T10:30:05Z bootstop: Host has booted

2017-06-15T20:07:00Z bootstop: Host has booted

2017-06-23T20:01:55Z bootstop: Host is powering off

2017-06-23T20:09:05Z bootstop: Host has booted

2017-07-07T17:42:08Z bootstop: Host has booted <----- This happened today

2017-07-07T17:57:44Z bootstop: Host is rebooting

2017-07-07T18:12:47Z bootstop: Host has booted <----- After the automatic reboot of the ESXi host, one of the VMs got stuck in "powering on", so I had to force a shutdown of the host and reboot again

[root@KETKHST001:~] esxcfg-advcfg -g /Misc/BlueSreenTimeout

Exception occured: Unable to find option BlueSreenTimeout

[root@KETKHST001:~] cat /var/log/hostd.log | grep poweroff

[root@KETKHST001:~] cat /var/log/hostd.log | grep shutdown

2017-07-07T17:41:55.653Z info hostd[FFB8BA80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created

2017-07-07T17:42:01.377Z info hostd[FFB8BA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x28529e08], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat

2017-07-07T18:12:34.575Z info hostd[FFCC9A80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created

2017-07-07T18:12:39.303Z info hostd[FFCC9A80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x57821eb8], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat

[root@KETKHST001:~]

Can anything be gleaned from the information above? If not, what would you recommend as the next step? I'm not really familiar with ESXi, and I am not sure, for example, whether the system was set up to send its logs to a separate log server. Any information would be greatly appreciated.
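
One thing that can be read from the bootstop entries is which boots were preceded by a "Host is rebooting" or "Host is powering off" notice and which were not. A boot with no such notice before it may point to an abrupt restart (crash, reset, or power loss), though that is an interpretation rather than something the log states. A rough Python sketch, assuming the grep output has been saved as text (the function names are made up for illustration):

```python
from datetime import datetime

def parse_bootstop(lines):
    """Return (timestamp, message) pairs for each bootstop entry."""
    events = []
    for line in lines:
        if " bootstop: " not in line:
            continue
        stamp, _, message = line.partition(" bootstop: ")
        when = datetime.strptime(stamp.strip(), "%Y-%m-%dT%H:%M:%SZ")
        events.append((when, message.strip()))
    return events

def unannounced_boots(events):
    """Boots whose previous entry is not a reboot or power-off notice.

    The first entry is skipped because what preceded it is unknown.
    """
    notices = ("Host is rebooting", "Host is powering off")
    flagged = []
    previous = None
    for when, message in events:
        if (message.startswith("Host has booted")
                and previous is not None
                and not previous.startswith(notices)):
            flagged.append(when)
        previous = message
    return flagged

# The last six bootstop entries from the second set of logs.
sample = """\
2017-06-15T20:07:00Z bootstop: Host has booted
2017-06-23T20:01:55Z bootstop: Host is powering off
2017-06-23T20:09:05Z bootstop: Host has booted
2017-07-07T17:42:08Z bootstop: Host has booted
2017-07-07T17:57:44Z bootstop: Host is rebooting
2017-07-07T18:12:47Z bootstop: Host has booted
""".splitlines()

for when in unannounced_boots(parse_bootstop(sample)):
    print("boot without a prior notice:", when.isoformat())
```

On these six entries the sketch flags only the 2017-07-07T17:42:08 boot; the 20:09:05 and 18:12:47 boots each follow a logged power-off or reboot. Running it over the full list would show how many of the earlier boots fall into the same category.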

5 Replies
dekoshal
Hot Shot

Check this VMware KB article.

Determining why an ESXi/ESX host was powered off or restarted (1019238) | VMware KB

If you found this or any other answer helpful, please consider the use of the Correct or Helpful to award points.

Best Regards,

Deepak Koshal

CNE|CLA|CWMA|VCP4|VCP5|CCAH

vijayrana968
Virtuoso

If it is a managed UPS, check the UPS agent settings. Also check the system logs via ILO/DREC from a hardware perspective.

techmind
Contributor

Hi Deepak,

Thank you, but I've seen that doc; that's how I managed to extract those logs. I'm still using it as a reference.

techmind
Contributor

Hi vijayrana968,

I like the ILO idea, but I'm still trying to figure out why it only has port 22 open, with ports 80 and 443 closed. Is there a way to enable those ports from SSH? Either way, I'll be onsite next week with direct access to the server and will report back any new findings. BTW, is "DREC/Hardware Perspective" a feature of ILO or something that needs to be installed separately? As I mentioned earlier, I'm not really familiar with this type of environment.

Thank you

Dooti
Contributor

What is the make and model of the server?

Configuring ESXi coredump to file instead of partition (2077516)

Configuring a diagnostic coredump partition on an ESXi 5.x/6.x host (2004299) | VMware KB

Verify whether ASR is enabled on the host.

HP Automatic Server Recovery (ASR) - VMware Knowledge Base

https://kb.vmware.com/kb/1010842
