VMware Cloud Community
timw18
Enthusiast

ESX Server intermittently rebooting

I have an ESX Server with 4 VMs on it that keeps rebooting intermittently. This only started a few weeks ago. Are there any log files I could look at to see why the reboots are happening? I have VMotioned my VMs onto the other ESX box in the cluster while I investigate the issue.

29 Replies
Jae_Ellers
Virtuoso

vi /etc/ssh/sshd_config

change PermitRootLogin to yes

service sshd reload

Alternatively, you can use WinSCP to log in as root; it has a built-in editor you can use to view log files.
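
For anyone who wants to script the sshd_config change above instead of editing by hand in vi, here is a minimal sketch in Python. The function name set_permit_root_login is hypothetical (not from this thread), and the sketch only transforms text that you pass in; reading and writing /etc/ssh/sshd_config and reloading sshd are left to you. It assumes a reasonably recent Python, which may not be what the ESX service console ships with, so you may prefer to run it on a workstation against a copy of the file.

```python
import re

# Hypothetical helper, not part of any VMware or OpenSSH tooling.
# Matches an active or commented-out PermitRootLogin line.
_PERMIT_RE = re.compile(r"^[ \t]*#?[ \t]*PermitRootLogin\b.*$", re.MULTILINE)


def set_permit_root_login(config_text, value="yes"):
    """Return config_text with a 'PermitRootLogin <value>' line in place.

    The first existing PermitRootLogin line (commented or not) is replaced.
    If there is none, the directive is appended at the end.
    """
    new_line = "PermitRootLogin " + value
    if _PERMIT_RE.search(config_text):
        return _PERMIT_RE.sub(new_line, config_text, count=1)
    # No existing directive: append one, keeping the file newline-terminated.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + new_line + "\n"


if __name__ == "__main__":
    sample = "Port 22\nPermitRootLogin no\nProtocol 2\n"
    print(set_permit_root_login(sample))
```

After writing the result back to /etc/ssh/sshd_config you would still need the "service sshd reload" step from the post above for the change to take effect.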

wila
Immortal

Most of those logs are also available in the Virtual Infrastructure Client. Just click the "Admin" toolbar button and go to the "System Logs" tab.

From the new user account you can also get root access by using the su command.

su -

The dash is there to initialize the environment properly, so that the search paths are set up for root.

Message was edited by: wila, added the su part.

vmware_lic
Enthusiast

We had a similar situation with a DL585 G2 doing the same thing (rebooting for no reason).

I uninstalled Insight Manager version 7.6 and installed 7.7.0-115, and we have had no issues so far. It's been about 2 weeks.

ESX 3.0.1, build 32039

Thanks

timw18
Enthusiast

I think I have found the issue. Here is a copy of the vmkwarning log file:

Feb 27 23:15:51 MERCURY vmkernel: TSC: 10437255687 cpu0:0)WARNING: NUMA: 606: Memory is incorrectly balanced between the NUMA nodes of this system, which will lead to poor performance. See /proc/vmware/NUMA/hardware for details on your current memory configuration

Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1784: Manual switchover to path vmhba1:2:1 begins.

Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1110: Did not switchover to vmhba1:2:1. Check Unit Ready Command returned READY instead of NOT READY for standby controller .

Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1819: Manual switchover to vmhba1:2:1 completed successfully.

Feb 27 23:22:24 MERCURY vmkernel: 0:00:08:09.936 cpu3:1060)WARNING: Cow: 1089: COW file was not closed cleanly, doing checks

Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.846 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 27 usec

Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.857 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 31 usec

Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.878 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 43 usec

Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:01.971 cpu3:1062)WARNING: CpuSched: 7145: time went backwards by 161 usec

Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:01.974 cpu3:1059)WARNING: CpuSched: 7145: time went backwards by 183 usec

Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:02.028 cpu2:1061)WARNING: CpuSched: 7145: time went backwards by 447 usec
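
With a longer log it can help to see which subsystems are producing the warnings before reading line by line. The following is a rough sketch in Python that counts WARNING entries per subsystem. The regular expression is inferred only from the lines pasted above, so other ESX builds may format entries differently, and the names WARNING_RE and summarize_warnings are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt in this post only (an assumption):
# "<date> <host> vmkernel: <uptime or TSC> cpuN:M)WARNING: <Subsystem>: <number>: <message>"
WARNING_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+vmkernel:\s+"
    r".*?\)WARNING:\s+(?P<subsystem>\w+):\s+(?P<number>\d+):\s+(?P<message>.*)$"
)


def summarize_warnings(lines):
    """Count WARNING entries per subsystem (NUMA, SCSI, Cow, CpuSched, ...)."""
    counts = Counter()
    for raw in lines:
        match = WARNING_RE.match(raw.strip())
        if match:  # lines that do not fit the pattern are skipped
            counts[match.group("subsystem")] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE = [
    "Feb 27 23:15:51 MERCURY vmkernel: TSC: 10437255687 cpu0:0)WARNING: NUMA: 606: Memory is incorrectly balanced between the NUMA nodes of this system, which will lead to poor performance. See /proc/vmware/NUMA/hardware for details on your current memory configuration",
    "Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1784: Manual switchover to path vmhba1:2:1 begins.",
    "Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.846 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 27 usec",
    "Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:01.971 cpu3:1062)WARNING: CpuSched: 7145: time went backwards by 161 usec",
]

if __name__ == "__main__":
    for subsystem, count in summarize_warnings(SAMPLE).most_common():
        print(subsystem, count)
```

To run it against the real file, pass an open file object to summarize_warnings, for example open("/var/log/vmkwarning"), assuming that is where the log lives on your host.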

This is from the HP web site, in the notes for the HPSIM agent:

"Fixed an issue wherein the storage agents consumed excessive CPU time, potentially resulting in server reboots (ASRs). The CCISS device nodes are now kept open by default on all servers to workaround this issue."

timw18
Enthusiast

I have uninstalled the Insight Manager agent 7.5 and installed 7.7. Will now have to wait and see how the machine performs.

ramram77
Contributor

Hi,

I have the same problem with my VMs rebooting on 3.0.1.

Please see my posting:

http://www.vmware.com/community/thread.jspa?threadID=74582&tstart=30

When I contacted VMware support, they told me that there seems to be a bug in 3.0.1 with memory ballooning (bug ID 108195).

I could not find any details of bug ID 108195.

timw18
Enthusiast

The problem I have is not quite the same. My ESX server itself is rebooting, not the VMs; they seem to be fine. But you seem to have a very interesting problem.

Rob_Bohmann1
Expert

FYI, HP just released an update for BL25s to address memory errors specifically in VMware.

http://h18023.www1.hp.com/support/files/server/us/download/26720.html

Unfortunately, every download page I can find is packaged for a Windows install and not for VMware. If anyone finds a usable one, please respond to this.

timw18
Enthusiast

After updating the Insight Manager agent, I have not had the problem occur again. It has been a week now with no reboots.

timw18
Enthusiast

We have had no more problems with this, so I am counting it as resolved.
