Hi,
I can't figure out why our ESXi 6.0.0 host randomly reboots. When it happens, the two virtual servers running on it disappear from the network and come back about 20 minutes later. It looks as though ESXi is crashing, rebooting, and automatically restarting the virtual servers. The host is connected to a new UPS (about a week old), which is plugged into a separate outlet from the rest of the gear (switches, routers, etc.). The ESXi host is the only thing connected to this UPS, and a USB cable runs from the back of the UPS to the host. I have collected two sets of logs: one from about a month ago, which I'll post first, and one from today, which I'll post second:
First Set of Logs (about a month old):
[root@KETKHST001:~] cat /var/log/vmksummary.log | grep bootstop
2017-01-07T05:57:43Z bootstop: Host has booted
2017-01-20T14:15:15Z bootstop: Host has booted
2017-01-20T23:02:58Z bootstop: Host has booted
2017-02-03T22:19:14Z bootstop: Host has booted
2017-02-15T14:14:56Z bootstop: Host has booted
2017-02-18T15:13:32Z bootstop: Host has booted
2017-03-04T14:29:44Z bootstop: Host has booted
2017-03-18T13:46:02Z bootstop: Host has booted
2017-04-01T13:02:04Z bootstop: Host has booted
2017-04-15T12:18:18Z bootstop: Host has booted
2017-04-24T22:19:35Z bootstop: Host has booted
2017-05-08T21:35:46Z bootstop: Host has booted
2017-05-21T20:13:12Z bootstop: Host has booted
2017-05-26T11:13:53Z bootstop: Host has booted
2017-05-30T15:25:07Z bootstop: Host is rebooting
2017-05-30T16:02:22Z bootstop: Host has booted
2017-06-09T10:30:05Z bootstop: Host has booted
2017-06-15T20:07:00Z bootstop: Host has booted
[root@KETKHST001:~] esxcfg-advcfg -g /Misc/BlueSreenTimeout
Exception occured: Unable to find option BlueSreenTimeout
[root@KETKHST001:~] cat /var/log/hostd.log | grep poweroff
[root@KETKHST001:~] cat /var/log/hostd.log | grep shutdown
2017-06-15T20:06:48.559Z info hostd[FFCFDA80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created
2017-06-15T20:06:53.700Z info hostd[FFCFDA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x1f450560], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat
[root@KETKHST001:~] cat /var/log/hostd.log | grep reboot
2017-06-15T20:06:53.700Z info hostd[FFCFDA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x1f450560], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat
[root@KETKHST001:~]
Second Set of Logs (happened today):
[root@KETKHST001:~] cat /var/log/vmksummary.log | grep bootstop
2017-01-07T05:57:43Z bootstop: Host has booted
2017-01-20T14:15:15Z bootstop: Host has booted
2017-01-20T23:02:58Z bootstop: Host has booted
2017-02-03T22:19:14Z bootstop: Host has booted
2017-02-15T14:14:56Z bootstop: Host has booted
2017-02-18T15:13:32Z bootstop: Host has booted
2017-03-04T14:29:44Z bootstop: Host has booted
2017-03-18T13:46:02Z bootstop: Host has booted
2017-04-01T13:02:04Z bootstop: Host has booted
2017-04-15T12:18:18Z bootstop: Host has booted
2017-04-24T22:19:35Z bootstop: Host has booted
2017-05-08T21:35:46Z bootstop: Host has booted
2017-05-21T20:13:12Z bootstop: Host has booted
2017-05-26T11:13:53Z bootstop: Host has booted
2017-05-30T15:25:07Z bootstop: Host is rebooting
2017-05-30T16:02:22Z bootstop: Host has booted
2017-06-09T10:30:05Z bootstop: Host has booted
2017-06-15T20:07:00Z bootstop: Host has booted
2017-06-23T20:01:55Z bootstop: Host is powering off
2017-06-23T20:09:05Z bootstop: Host has booted
2017-07-07T17:42:08Z bootstop: Host has booted <----- This happened today
2017-07-07T17:57:44Z bootstop: Host is rebooting
2017-07-07T18:12:47Z bootstop: Host has booted <----- After the automatic reboot of the ESXi host, one of the VMs got stuck in "powering on", so I had to force a shutdown of the host and reboot it again
[root@KETKHST001:~] esxcfg-advcfg -g /Misc/BlueSreenTimeout
Exception occured: Unable to find option BlueSreenTimeout
[root@KETKHST001:~] cat /var/log/hostd.log | grep poweroff
[root@KETKHST001:~] cat /var/log/hostd.log | grep shutdown
2017-07-07T17:41:55.653Z info hostd[FFB8BA80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created
2017-07-07T17:42:01.377Z info hostd[FFB8BA80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x28529e08], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat
2017-07-07T18:12:34.575Z info hostd[FFCC9A80] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created
2017-07-07T18:12:39.303Z info hostd[FFCC9A80] [Originator@6876 sub=TagExtractor] 8: Rule type=[N5Hostd6Common31MethodNameBasedTagExtractorRuleE:0x57821eb8], id=rule[VMFoundryOpRule], tag=IsVMFoundryOp, regex=vim\.VirtualMachine\.(acquireMksTicket|clone|createSecondary|createSnapshot|customize|disableSecondary|enableSecondary|makePrimary|migrate|mountToolsInstaller|powerOff|powerOn|rebootGuest|reconfigure|reload|reloadFromPath|relocate|removeAllSnapshots|turnOffFaultTolerance|recommendHostsForSecondaryVm|rename|reset|resetGuestInformation|retrieveScreenshot|revertToCurrentSnapshot|revertToSnapshot|screenshot|setScreenResolution|shutdownGuest|standbyGuest|startRecording|startReplaying|stopRecording|stopReplaying|suspend|terminate|terminateFaultTolerantVM|unmountToolsInstaller|unregister|upgradeTools|upgradeVirtualHardware) - Identifies Virtual Machine operat
[root@KETKHST001:~]
Can anything be gleaned from the information above? If not, what would you recommend as the next step? I'm not really familiar with ESXi, and I am not sure, for example, whether the system was set up to send its logs to a different log server. Any information would be greatly appreciated.
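One thing that can be read from the bootstop output above: a "Host has booted" line with no "Host is rebooting" or "Host is powering off" line before it suggests the host went down without an orderly shutdown (crash, power loss, or hard reset). The following is a minimal sketch that labels each boot accordingly. The function name flag_unclean_boots is made up for illustration, and it assumes the log lines have exactly the format shown in this post.

```shell
#!/bin/sh
# flag_unclean_boots (hypothetical helper): reads vmksummary "bootstop" lines
# on stdin and prints each boot timestamp followed by one of:
#   clean   - a "rebooting"/"powering off" entry was logged before this boot
#   unclean - no shutdown entry was logged since the previous boot
#   unknown - first boot in the input, nothing earlier to compare against
flag_unclean_boots() {
  awk '
    BEGIN { state = "unknown" }
    /bootstop: Host is (rebooting|powering off)/ { state = "clean"; next }
    /bootstop: Host has booted/ {
      print $1, state
      state = "unclean"   # stays unclean unless a shutdown entry follows
    }'
}

# Usage on the host (not run here):
#   grep bootstop /var/log/vmksummary.log | flag_unclean_boots
```

By this reading, most of the boots listed above would come out as "unclean". The hostd.log "ha-cli-handler-system-shutdown created" entries are each stamped a few seconds before a "Host has booted" entry, so they may simply be hostd registering its handlers at startup rather than evidence of a shutdown request.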
Check this VMware KB article.
Determining why an ESXi/ESX host was powered off or restarted (1019238) | VMware KB
If you found this or any other answer helpful, please consider marking it Correct or Helpful to award points.
Best Regards,
Deepak Koshal
CNE|CLA|CWMA|VCP4|VCP5|CCAH
If it is a managed UPS, check the UPS agent settings. Also check the system logs via ILO/DREC from a hardware perspective.
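A quick way to judge whether something scheduled (a UPS agent action or a periodic UPS self-test, for example) is a plausible suspect is to look at the spacing between boots. The sketch below prints the gap in days between consecutive "Host has booted" entries. The function name boot_gaps is hypothetical, and it assumes the timestamps are UTC in the YYYY-MM-DDThh:mm:ssZ form shown in the original post. The date arithmetic is done by hand so that it does not depend on a particular date or awk implementation.

```shell
#!/bin/sh
# boot_gaps (hypothetical helper): reads vmksummary "bootstop" lines on stdin
# and prints, for every boot after the first, the boot timestamp and the
# number of days elapsed since the previous boot.
boot_gaps() {
  awk '
    # Convert YYYY-MM-DDThh:mm:ssZ to seconds since 1970-01-01 (UTC),
    # using the standard days-from-civil-date calculation.
    function epoch(ts,    y, m, d, era, yoe, doy, doe, days) {
      y = substr(ts, 1, 4) + 0
      m = substr(ts, 6, 2) + 0
      d = substr(ts, 9, 2) + 0
      if (m <= 2) y--                       # treat Jan/Feb as end of prior year
      era = int(y / 400)
      yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
      days = era * 146097 + doe - 719468
      return days * 86400 + substr(ts, 12, 2) * 3600 \
             + substr(ts, 15, 2) * 60 + substr(ts, 18, 2)
    }
    /bootstop: Host has booted/ {
      t = epoch($1)
      if (prev) printf "%s %.2f\n", $1, (t - prev) / 86400
      prev = t
    }'
}

# Usage on the host (not run here):
#   grep bootstop /var/log/vmksummary.log | boot_gaps
```

In the log posted above, several consecutive boots are just under 14 days apart (for example 2017-03-04, 2017-03-18, 2017-04-01 and 2017-04-15). That regularity may be worth comparing against any schedule configured on the UPS or its agent, though it does not prove a cause by itself.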
Hi Deepak,
Thank you but I've seen that doc, that's how I managed to get those logs extracted. I'm still using it as a reference.
Hi vijayrana968,
I like the iLO idea, but I'm still trying to figure out why it only listens on port 22 and doesn't have port 80 or 443 open at all. Is there a way to enable those ports over SSH? Either way, I'll be onsite next week with direct access to the server and will report back any new findings. BTW, is "DREC/Hardware Perspective" a feature of iLO or something that needs to be installed separately? As I mentioned earlier, I'm not really familiar with this type of environment.
Thank you
What is the make and model of the server?
Configuring ESXi coredump to file instead of partition (2077516)
Configuring a diagnostic coredump partition on an ESXi 5.x/6.x host (2004299) | VMware KB
Verify whether ASR (Automatic Server Recovery) is enabled on the host.
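The KB articles above come down to checking whether the host has somewhere to write a core dump if it hits a purple diagnostic screen. A minimal sketch of the read-only checks, wrapped in a hypothetical function named show_crash_config, is below. Note that the option queried in the original post looks misspelled ("BlueSreenTimeout"); the setting is named BlueScreenTimeout. The syslog query is included because it should also show whether logs are forwarded to a remote host, which was one of the open questions. The exact output varies by ESXi build, so none is shown here.

```shell
#!/bin/sh
# show_crash_config (hypothetical helper): prints the settings that decide
# what the host does on a purple screen and where dumps and logs end up.
# Every command here only reads configuration; nothing is changed.
show_crash_config() {
  # Seconds the host waits on a purple screen before rebooting itself.
  # As far as I know 0 means "stay on the purple screen"; treat that as an
  # assumption and confirm it against the KB for your build.
  esxcfg-advcfg -g /Misc/BlueScreenTimeout

  # Active and configured diagnostic coredump partition, if any.
  esxcli system coredump partition get

  # Coredump files configured on a datastore, if any.
  esxcli system coredump file list

  # Local log directory and remote syslog host, if one is set.
  esxcli system syslog config get
}

# Usage over SSH on the ESXi host (not run here):
#   show_crash_config
```

If the timeout is non-zero and no dump target is active, a crash would look exactly like an unexplained reboot, so these values are worth collecting before the next occurrence.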