Hello, today my team reported that one of our PE1950 servers running ESXi 5.5 rebooted unexpectedly 4 times; after the 5th boot, it stopped rebooting.
I got the following summary of logs from the vmware-support tar file, from today (Nov 21, 2016, UTC-5):
1) vmwarning:
2016-11-21T17:35:24.738Z cpu0:33294)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)
2016-11-21T17:35:27.152Z cpu1:33176)WARNING: Team.etherswitch: TeamES_Activate:668: Failed to initialize beaconing on portset 'pps': Not implemented.
2016-11-21T17:35:41.264Z cpu1:33416)WARNING: ScsiScan: 1408: Failed to add path vmhba1:C0:T0:L0 : Not found
2016-11-21T17:35:41.268Z cpu1:33416)WARNING: ScsiScan: 1408: Failed to add path vmhba1:C0:T1:L0 : Not found
2016-11-21T17:35:45.708Z cpu3:33176)WARNING: NetDVS: 547: portAlias is NULL
2) vmkernel:
0:00:00:05.226 cpu0:32768)Device: 220: Registered driver 'chardevlayer' from 0
0:00:00:05.227 cpu0:32768)Init: 881: Vmkernel initialization done.
0:00:00:05.227 cpu0:32768)VMKernel loaded successfully.
2016-11-21T20:32:14.139Z cpu1:32791)ScsiCore: 130: Starting taskMgmt watchdog world 32791
2016-11-21T20:32:14.139Z cpu2:32792)ScsiCore: 130: Starting taskMgmt watchdog world 32792
2016-11-21T20:32:14.140Z cpu3:32938)ScsiCore: 63: Starting taskmgmt handler world 32938/1
2016-11-21T20:32:14.140Z cpu3:32879)VSCSI: 2606: Starting reset handler world 32879/1
2016-11-21T20:32:14.140Z cpu2:32880)VSCSI: 2800: Starting reset watchdog
3) syslog:
2016-11-21T21:17:59Z localcli: IpmiIfcSensorGetReading: Skipping System Software sensor 0x7
2016-11-21T21:17:59Z localcli: IpmiIfcSensorGetReading: Skipping System Software sensor 0x6
2016-11-21T21:17:59Z localcli: IpmiIfcSensorGetReading: Skipping System Software sensor 0x5
2016-11-21T21:17:59Z localcli: IpmiIfcSensorGetReading: Skipping System Software sensor 0x4
Searching on Google, I suspected that the issue could be related to this bug: ESXi 5.5 host server reboots unexpectedly followed by Uncorrectable Machine Check Exception error (2... However, I didn't find the specific UMCE error, so I don't know what could be causing this behavior.
I also suspect that the server could have shut down because of high room temperature, or because of error 1215 shown on the server screen, but that error only refers to a voltage level.
Any idea about the root cause of this behavior is greatly appreciated, since the ESXi logs are often inconclusive.
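In case it helps to pin down when each reboot happened, here is a minimal Python sketch that scans a vmkernel log for the "VMKernel loaded successfully." line seen in the excerpt above and reports the first wall-clock timestamp after it. The function name and the log path passed on the command line are my own assumptions, and I am assuming every boot writes that same marker line.

```python
import re
import sys

# Marker line taken from the vmkernel excerpt; assumed to appear once per boot.
BOOT_MARKER = "VMKernel loaded successfully."

# Wall-clock timestamps look like 2016-11-21T20:32:14.139Z; early boot lines
# use an uptime counter (0:00:00:05.227) and will not match this pattern.
WALL_CLOCK = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\s")


def find_boot_times(lines):
    """Return the first wall-clock timestamp seen after each boot marker.

    A boot with no timestamped line after it is reported as None.
    """
    boots = []
    awaiting = False  # True while a boot marker has no timestamp yet
    for line in lines:
        if BOOT_MARKER in line:
            if awaiting:
                boots.append(None)
            awaiting = True
            continue
        match = WALL_CLOCK.match(line)
        if awaiting and match:
            boots.append(match.group(1))
            awaiting = False
    if awaiting:
        boots.append(None)
    return boots


if __name__ == "__main__":
    # Usage (path is hypothetical): python find_boots.py vmkernel.log
    with open(sys.argv[1], errors="replace") as log_file:
        for number, stamp in enumerate(find_boot_times(log_file), start=1):
            print(f"boot {number}: {stamp}")
```

The boot times it prints could then be compared against the hardware event log or temperature readings for the same minutes.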
Regards.