Hi All,
Our ESXi host rebooted suddenly. We logged a case with OEM-VMware support and got the reply from VMware that, since the logs are not available, they could not find the root cause.
Please let me know what could make ESXi reboot, since our customer is asking us for an RCA.
Please do the
Well, it's hard to help without any details. Anyway, it's quite unusual for a host to reboot on its own. If a severe error is detected by ESXi, it's supposed to halt the system and display a PSOD (Purple Screen of Diagnostics). With enterprise hardware, a reboot may be caused by the server itself if a hardware error is detected. HP, for example, calls this option ASR (Automatic Server Recovery). Such ASRs - as well as e.g. a power loss - are then logged in the server management log file.
André
Create a persistent scratch location according to http://kb.vmware.com/kb/1033696.
With a persistent scratch location, log files that would otherwise be located on the ramdisk are not lost when the server is rebooted.
"If persistent scratch space is not available, ESXi stores this temporary data on a ramdisk, which is constrained in space. This might be problematic in low-memory situations, but is not critical to the operation of ESXi. Information stored on a ramdisk does not persist across reboots, so troubleshooting information such as logs and core files could be lost. If a persistent scratch location on the host is not configured properly, you may experience intermittent issues due to lack of space for temporary files and the log files will not be updated."
VMware is right that they cannot find out the root cause if no logs are available.
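To see whether a host is currently exposed to this, you can look at where the scratch directory actually points. The following is only a rough sketch: the function name scratch_kind is made up for illustration, and it assumes that a non-persistent scratch resolves to the ramdisk path /tmp/scratch while a persistent one lives under /vmfs/volumes/.

```shell
# Sketch: decide whether a resolved scratch path survives a reboot.
# Assumption: a non-persistent scratch resolves to the ramdisk path
# /tmp/scratch, while a persistent one lives under /vmfs/volumes/.
# scratch_kind is a hypothetical helper, not an ESXi command.
scratch_kind() {
    case "$1" in
        /vmfs/volumes/*) echo "persistent" ;;
        /tmp/*)          echo "ramdisk (lost on reboot)" ;;
        *)               echo "unknown" ;;
    esac
}
```

On the host itself this could be called as scratch_kind "$(readlink -f /scratch)". If it reports the ramdisk, follow the KB above before the next incident.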
If a persistent scratch location was created and the issue re-occurs, you can start with this KB: http://kb.vmware.com/kb/1019238.
Then check the logs like syslog.log, vobd.log, vmkernel.log, vmkwarning.log and hostd.log.
Check the HW logs to see whether the server rebooted because of a HW issue (ASR - Automatic Server Recovery, NMI etc.).
Hi
To determine the reason for an abrupt shutdown or reboot of an ESX host:
# cat /var/log/vmksummary

If the host was deliberately shut down or rebooted, /var/log/vmksummary shows "Halting system..." before "Starting system...", similar to:
localhost logger: (1265803308) hb: vmk loaded, 1746.98, 1745.148, 0, 208167, 208167, 0, vmware-h-59580, sfcbd-7660, sfcbd-3524
localhost vmkhalt: (1268149354) Halting system...
localhost vmkhalt: (1268149486) Starting system...
localhost logger: (1268149540) loaded VMkernel

If the host hit a VMkernel error, the entries are similar to:
vsphere5 logger: (1251788469) hb: vmk loaded, 3597562.98, 3597450.113, 13, 164009, 164009, 356, vmware-h-79976, vpxa-54148, sfcbd-12600
vsphere5 vmkhalt: (1251797195) Starting system...
vsphere5 logger: (1251797206) VMkernel error
vsphere5 logger: (1251797261) loaded VMkernel

If the host went down abruptly (for example a power loss or hardware reset), "Starting system..." appears with no "Halting system..." before it:
localhost logger: (1265803308) hb: vmk loaded, 1746.98, 1745.148, 0, 208167, 208167, 0, vmware-h-59580, sfcbd-7660, sfcbd-3524
localhost vmkhalt: (1268149486) Starting system...
localhost logger: (1268149540) loaded VMkernel

If the power button was pressed, the /var/log/vmkernel log of the ESX host contains:
VMKAcpi: 1865: In PowerButton Helper
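If the vmksummary file is long, the pattern can be picked out with a small script rather than by eye. This is a rough sketch, not a VMware tool: the function name classify_boots is made up, and it assumes the entries keep the "vmkhalt: (epoch) ..." layout shown in the samples above.

```shell
# Sketch: classify each boot recorded in a vmksummary-style log.
# $1 is the path to the log file. Assumes the line layout shown in the
# samples above; classify_boots is a hypothetical helper name.
classify_boots() {
    awk '
        # Return the epoch timestamp found between the first "(" and ")".
        function epoch(line) {
            sub(/^[^(]*\(/, "", line)
            sub(/\).*$/, "", line)
            return line
        }
        /vmkhalt:.*Halting system/ { halted = 1; next }
        /vmkhalt:.*Starting system/ {
            if (halted) kind = "clean shutdown or reboot"
            else        kind = "abrupt reset (no halt logged)"
            print epoch($0) ": " kind
            halted = 0
            next
        }
        /logger:.*VMkernel error/ { print epoch($0) ": VMkernel error logged" }
    ' "$1"
}
```

On the host this would be run as classify_boots /var/log/vmksummary. Each "Starting system..." entry is reported with its timestamp, so an abrupt reset can be matched against the time of the incident and the hardware logs.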
Please check the IMM, iLO or DRAC event logs. They should at least show any hardware fault that did not trigger an alarm in vCenter. You can also check the firmware level of the system to dig deeper.
Regards
Rahul
It is difficult to reply to such a generic statement. Please provide more details: vmkernel logs, dumps (if configured), hardware logs etc.
Hi
Please check whether ESXi is configured to reboot automatically after a purple screen by executing this command:
esxcfg-advcfg -g /Misc/BlueScreenTimeout
If the value is anything other than 0, ESXi reboots automatically after the purple screen.
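If you want to script this check across several hosts, something like the following could interpret the result. It is only a sketch: psod_autoreboot is a made-up helper name, and it assumes that the command prints a single line ending in the numeric value (for example "Value of BlueScreenTimeout is 0") and that the value is a timeout in seconds.

```shell
# Sketch: interpret the output of "esxcfg-advcfg -g /Misc/BlueScreenTimeout".
# Assumptions: the output is one line ending in the numeric value, and the
# value is in seconds. psod_autoreboot is a hypothetical helper name.
psod_autoreboot() {
    value=${1##* }   # keep only the text after the last space
    case "$value" in
        ''|*[!0-9]*) echo "unrecognized output: $1"; return 2 ;;
        0) echo "auto-reboot disabled: host stays on the purple screen" ;;
        *) echo "auto-reboot enabled: host restarts ${value}s after a purple screen" ;;
    esac
}
```

On the host it would be called as psod_autoreboot "$(esxcfg-advcfg -g /Misc/BlueScreenTimeout)". If auto-reboot turns out to be enabled and no core dump or persistent log location exists, a purple screen can look like an unexplained reboot.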
Thanks
Sakthivel R