9 Replies Latest reply on Dec 9, 2014 2:03 PM by HarryHolt

    ESXi 5.0 random reboot

    PeterT82 Lurker

      I have an ESXi 5.0 host that reboots randomly. It's on a Win 2008 R2 server and is running two virtual machines with Windows 7. Has anyone else had this problem? Can you point me in the right direction for solving this issue?

      Thank you

        • 1. Re: ESXi 5.0 random reboot
          JD Hot Shot
          vExpert

          Are you running ESXi on Workstation or in Hyper-V?

          Do you see any events in the logs (/var/log/messages, /var/log/vmksummary.log), such as "DCUI: reboot"?

          http://www.myitblog.in/
          • 2. Re: ESXi 5.0 random reboot
            HarryHolt Lurker

            Same issue here. It just started recently.

            The server has been busier than usual lately: I have cleared off a lot of test VMs and have done quite a bit of cloning and copying of VMDKs to and from the server. Other than that, there have been no changes to the ESXi software or its configuration.

            I've looked in the BIOS: thermal throttling is enabled, but there is no maximum-temperature shutdown option available or enabled there. I also changed the power-loss setting so that the BIOS leaves the computer off after any power interruption. It has rebooted a couple of times since then, so it does not seem to be power related.

            There is nothing in the logs indicating any kind of reboot request, just this in vmksummary.log:

            2014-08-05T00:00:01Z heartbeat: up 0d1h1m32s, 3 VMs; [[5904 vmx 827956kB] [7604 vmx 3065716kB] [7588 vmx 3178488kB]] [[5988 sfcb-hhrc 2%max] [5922 sfcb-vmware_bas 4%max] [5916 sfcb-pycim 16%max]]
            2014-08-05T00:40:06Z bootstop: Host is rebooting
            2014-08-05T00:42:51Z bootstop: Host has booted
            2014-08-05T00:43:30Z bootstop: Host is rebooting
            2014-08-05T00:46:04Z bootstop: Host has booted
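JD's suggestion to look for reboot events can be turned into a quick check against excerpts like the one above. The sketch below is a hypothetical helper (the name show_reboots is made up for illustration; it is not a VMware tool). It assumes the "heartbeat:" / "bootstop:" line format quoted in this thread and the ESXi 5.x location /var/log/vmksummary.log, and it only uses POSIX sh and awk, so it should also work in the ESXi shell, though that is untested. For each bootstop event it prints the last heartbeat logged before it, which brackets the window in which the host went down.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): for each "bootstop:" line in
# vmksummary.log, print the most recent "heartbeat:" line logged before it.
# The heartbeat is the last moment the host was known to be up, so the pair
# brackets when it went down. Assumes the line format quoted in this thread.
show_reboots() {
  log="${1:-/var/log/vmksummary.log}"        # ESXi 5.x default location
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
  awk '
    /heartbeat:/ { last = $0 }               # remember the latest heartbeat
    /bootstop:/ {
      if (last != "") print "last heartbeat: " last
      print "event:          " $0
      last = ""                              # print each heartbeat only once
    }
  ' "$log"
}

# Usage: show_reboots /var/log/vmksummary.log
```

On the excerpt above it would pair the 00:00:01Z heartbeat with the first "Host is rebooting" line and list the remaining bootstop lines on their own.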
            
            

             

            BTW, redirecting the logs to the VMFS storage seems to get reset on reboot; I have to redirect them again after each reboot.

            I found a VMware article that suggested checking messages.log, but no messages.log file is being created.

            I have not attempted to configure core dumps. Is that the next step?

            • 3. Re: ESXi 5.0 random reboot
              King_Robert Hot Shot

              Check your server resources; it might be happening because of insufficient resources on the server. How many times a day is it restarting?

              • 4. Re: ESXi 5.0 random reboot
                HarryHolt Lurker

                The server resources are fine; it's not stressed at all. Memory or CPU rarely goes over 50%, and in fact most of the times it restarted, the guests were mostly idle.

                It has restarted three times within a 24-hour period, but sometimes it doesn't restart for days; it hasn't restarted now for 5 days. I bought a new CPU cooler, as I noticed the CPU was running just over 65 °C the last couple of times I checked it. I'm waiting for the next random restart before I install it.

                • 5. Re: ESXi 5.0 random reboot
                  HarryHolt Lurker

                  Well, the server rebooted again this morning. This is what was in vmksummary.log:

                  2014-08-12T11:00:01Z heartbeat: up 5d19h41m0s, 3 VMs; [[5904 vmx 2097152kB] [310239 vmx 2900680kB] [77306 vmx 4119540kB]] [[5980 sfcb-hhrc 2%max] [5928 sfcb-vmware_bas 4%max] [5922 sfcb-pycim 16%max]]

                  2014-08-12T11:17:52Z bootstop: Host has booted

                  2014-08-12T12:00:01Z heartbeat: up 0d0h43m7s, 1 VM; [[5918 sfcb-pycim 13024kB] [4990 hostd-worker 40712kB] [5900 vmx 856708kB]] [[5976 sfcb-hhrc 2%max] [5924 sfcb-vmware_bas 4%max] [5918 sfcb-pycim 16%max]]
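The uptime field in the heartbeat lines narrows down when the host actually went down. In the excerpt above, the 12:00:01Z heartbeat reports up 0d0h43m7s (2,587 seconds), so the host started at about 11:16:54Z, while the 11:00:01Z heartbeat shows it was still running then; the reset therefore happened within those roughly 17 minutes, and "Host has booted" at 11:17:52Z marks when startup finished. Unlike the earlier excerpt, there is no "Host is rebooting" line, which is consistent with (though not proof of) an abrupt reset rather than an orderly reboot. The hypothetical helper below (boot_time_of_day is a made-up name) does that subtraction; it works on the UTC time of day only and assumes the d/h/m/s uptime format shown here.

```shell
#!/bin/sh
# Hypothetical helper: estimate when the host last started, given a heartbeat's
# UTC time of day (HH:MM:SS) and its "up" field (e.g. 0d0h43m7s).
# Only the time of day is returned; multi-day uptimes wrap around midnight.
boot_time_of_day() {
  echo "$1 $2" | awk '{
    split($1, t, ":")                        # t[1]=HH, t[2]=MM, t[3]=SS
    split($2, u, "[dhms]")                   # 0d0h43m7s -> 0, 0, 43, 7
    up = u[1] * 86400 + u[2] * 3600 + u[3] * 60 + u[4]
    s = (t[1] * 3600 + t[2] * 60 + t[3] - up) % 86400
    if (s < 0) s += 86400                    # wrapped past midnight
    printf "%02d:%02d:%02d\n", int(s / 3600), int(s % 3600 / 60), s % 60
  }'
}

boot_time_of_day 12:00:01 0d0h43m7s          # prints 11:16:54
```

For the 11:00:01Z heartbeat (up 5d19h41m0s) it gives 15:19:01, which is 6 August, i.e. the previous boot.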

                   

                  So there's no indication of why.

                  The last message in vmkernel.log was this:

                  2014-08-12T00:01:02.983Z cpu0:4155)WARNING: VFAT: 4346: Failed to flush file times: Stale file handle

                  But that was several hours before the restart. There was nothing else in any of the other logs around the reboot time.

                  I'm pretty stumped. I'm going to install the new CPU cooler just to throw that at the wall (it's a cheap try).

                  • 6. Re: ESXi 5.0 random reboot
                    IntellinetSC Lurker

                    The exact same thing just started happening to us in the last 1.5 months.

                    There are no events pointing to a reboot.

                    Resources are perfect.

                    We get a software monitor alert that the VMs are down.

                    By the time we remote in, the host is back up and we have to start the VMs back up manually.

                    Last night's log is pasted below. It reads from the bottom up, starting at 9:24 PM.

                    SFX-MAS is powered on info 11/5/2014 11:26:28 PM SFX-MAS root

                    SFX-MAS is starting info 11/5/2014 11:26:25 PM SFX-MAS root

                    SBS2011 is powered on info 11/5/2014 11:26:18 PM SBS2011 root

                    SBS2011 is starting info 11/5/2014 11:26:16 PM SBS2011 root

                    User root@68.115.251.146 logged in info 11/5/2014 11:25:57 PM root

                    User root@127.0.0.1 logged in info 11/5/2014 9:24:46 PM root

                    User root logged out info 11/5/2014 9:24:24 PM root

                    User root@127.0.0.1 logged in info 11/5/2014 9:24:24 PM root

                    VMware Host Agent started info 11/5/2014 9:24:24 PM localhost.localdomain

                    Host has booted. info 11/5/2014 9:24:23 PM localhost.localdomain

                    Physical NIC vmnic1 linkstate is up. info 11/5/2014 9:24:18 PM localhost.localdomain

                    Physical NIC vmnic1 linkstate is down. warning 11/5/2014 9:24:18 PM localhost.localdomain

                    Physical NIC vmnic1 linkstate is up. info 11/5/2014 9:24:18 PM localhost.localdomain

                    Physical NIC vmnic0 linkstate is up. info 11/5/2014 9:24:18 PM localhost.localdomain

                    Port vmk0 is now protected by Firewall. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set netDump succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set netDump succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set remoteSerialPort succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set vSPC succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set WOL succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set WOL succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set IKED succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set syslog succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set DVSSync succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set DHCPv6 succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set DVFilter succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set gdbserver succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set httpClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set ftpClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set HBR succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set HBR succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set NFC succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    One or more LVM devices have been discovered on this host. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set NFC succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set activeDirectoryAll succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'setrequired' for rule set vSphereClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set vSphereClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set vSphereClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set vMotion succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set vMotion succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set webAccess succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set webAccess succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set faultTolerance succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set faultTolerance succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set updateManager succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set vpxHeartbeats succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set vpxHeartbeats succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set iSCSI succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set CIMSLP succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set CIMSLP succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set CIMHttpsServer succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set CIMHttpsServer succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set CIMHttpServer succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set CIMHttpServer succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set ntpClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set ntpClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set snmp succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set snmp succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set dns succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set dns succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'enable' for rule set dhcp succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set dhcp succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set nfsClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set sshClient succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    Firewall configuration has changed. Operation 'add' for rule set sshServer succeeded. info 11/5/2014 9:24:17 PM localhost.localdomain

                    • 7. Re: ESXi 5.0 random reboot
                      Hot Shot

                      Can you check the vmkernel logs and see if anything is wrong there?

                      Thanks,

                      DJ

                      • 8. Re: ESXi 5.0 random reboot
                        Alistar Expert

                        Hi guys,

                        Can you please check vmkernel.log for machine check errors (MCEs) and post the output of this?

                        # grep MCE /var/log/vmkernel.log

                        A dump of vmkwarning.log would also be helpful.

                        We had hosts crashing under random loads as well, and eventually it pointed to a hardware failure. In my experience that is the case 99% of the time when ESXi crashes on you without any PSOD (Purple Screen of Death).

                        • 9. Re: ESXi 5.0 random reboot
                          HarryHolt Lurker

                          The only result from your suggested grep was:

                          TSC: 114497 cpu0:0)BootConfig: 89: mcaClearBanksOnMCE = TRUE

                          0:00:00:02.005 cpu0:4096)MCE: 2582: Detected 6 MCE banks. MCG_CAP MSR:0x106
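Both lines returned above are informational messages written at boot (the number of MCE banks detected and the mcaClearBanksOnMCE boot option), not reports of an actual machine check error. When repeating Alistar's check, a filter like the hypothetical one below (mce_suspects is a made-up name) drops those two known boot-time lines so that anything left over stands out. It is a sketch: the two exclusion patterns come from the output above, and the .gz branch assumes you may pass in rotated, gzip-compressed copies of the logs; rotated file names and locations vary, so pass whatever paths exist on your host.

```shell
#!/bin/sh
# Hypothetical helper: grep ESXi logs for "MCE" but drop the two informational
# boot-time lines quoted in this thread, so real machine check reports stand out.
mce_suspects() {
  for f in "$@"; do
    [ -r "$f" ] || continue                  # skip files that do not exist
    case "$f" in
      *.gz) zcat "$f" ;;                     # assumed: rotated logs are gzipped
      *)    cat "$f" ;;
    esac | grep 'MCE' |
      grep -v -e 'Detected [0-9]* MCE banks' -e 'mcaClearBanksOnMCE'
  done
}

# Usage: mce_suspects /var/log/vmkernel.log /var/log/vmkwarning.log
```

No output means nothing beyond those boot messages mentioned MCE.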

                           

                           

                          There was nothing relevant in the warning file, so I didn't post it; it's just a bunch of latency warnings (during backups) and a few stale file handles. None of it was written near the reboot times.

                          But I think my issue probably came down to an incompatibility between my UPS and the server's power supply. I moved the laser printer off the UPS to a different circuit, and I haven't had any random reboots since.