I have an ESX server with 4 VMs on it that keeps rebooting itself intermittently. This only started a few weeks ago. Are there any log files I could look at to see why the reboot is happening? I have VMotioned my VMs onto the other ESX box in the cluster while I investigate the issue.
To allow root to log in over SSH, edit the SSH daemon configuration:
vi /etc/ssh/sshd_config
Change the PermitRootLogin setting to yes, save the file, then reload sshd:
service sshd reload
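If you prefer not to edit the file by hand, the change can be scripted. This is a minimal sketch, assuming the directive appears either as "PermitRootLogin no" or commented out ("#PermitRootLogin ..."); the function name set_permit_root_login is hypothetical and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Sketch: set "PermitRootLogin yes" in an sshd_config-style file.
# Assumption: the directive may be present as "PermitRootLogin no" or
# commented out ("#PermitRootLogin ..."); every such line is rewritten,
# and the directive is appended if it is missing entirely.
# set_permit_root_login is a hypothetical helper name.
set_permit_root_login() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1   # keep a backup of the original
    if grep -q '^[#[:space:]]*PermitRootLogin' "$cfg.bak"; then
        # Rewrite the existing (possibly commented-out) directive
        sed 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' \
            "$cfg.bak" > "$cfg"
    else
        # No directive present: append one
        echo 'PermitRootLogin yes' >> "$cfg"
    fi
}

# Usage, as root on the service console (not run automatically):
#   set_permit_root_login /etc/ssh/sshd_config && service sshd reload
```

To undo the change later, copy sshd_config.bak back over sshd_config and reload sshd again.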
Alternatively, you can use WinSCP to log in as root; it has a built-in editor you can use to view log files.
Most of those logs are also available in the Virtual Infrastructure Client. Just click the "Admin" toolbar button and go to the "System Logs" tab.
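Once you are logged in as root on the service console, a quick way to skim the logs is to print the tail of each one. This is a sketch under an assumption: it uses the usual ESX 3.x log names under /var/log (vmkwarning, vmkernel, vmksummary, messages), which may differ on your build; show_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the last few lines of each service console log that exists.
# Assumption: the usual ESX 3.x log names under /var/log (vmkwarning,
# vmkernel, vmksummary, messages); pass another directory to override.
# show_logs is a hypothetical helper name.
show_logs() {
    dir="${1:-/var/log}"
    count="${2:-50}"
    for name in vmkwarning vmkernel vmksummary messages; do
        f="$dir/$name"
        [ -r "$f" ] || continue   # skip logs that are missing or unreadable
        echo "===== $f ====="
        tail -n "$count" "$f"
    done
}

# Usage, as root:  show_logs /var/log 100 | less
```

Logs that are absent or unreadable are skipped silently rather than reported as errors.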
From the new user account you can also get root access by using the su command.
su -
The dash is there to initialize the environment properly so that the search paths are set up for root.
Message was edited by:
wila, added the su part.
We had a similar situation with a DL585 G2 doing the same thing (rebooting for no reason).
I uninstalled Insight Manager version 7.6 and installed 7.7.0-115, and we have had no issues so far; it has been about two weeks.
ESX 3.0.1, build 32039
Thanks
I think I have found the issue. Here is a copy of the vmkwarning log file:
Feb 27 23:15:51 MERCURY vmkernel: TSC: 10437255687 cpu0:0)WARNING: NUMA: 606: Memory is incorrectly balanced between the NUMA nodes of this system, which will lead to poor performance. See /proc/vmware/NUMA/hardware for details on your current memory configuration
Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1784: Manual switchover to path vmhba1:2:1 begins.
Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1110: Did not switchover to vmhba1:2:1. Check Unit Ready Command returned READY instead of NOT READY for standby controller .
Feb 27 23:16:29 MERCURY vmkernel: 0:00:00:34.688 cpu1:1038)WARNING: SCSI: 1819: Manual switchover to vmhba1:2:1 completed successfully.
Feb 27 23:22:24 MERCURY vmkernel: 0:00:08:09.936 cpu3:1060)WARNING: Cow: 1089: COW file was not closed cleanly, doing checks
Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.846 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 27 usec
Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.857 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 31 usec
Feb 27 23:31:54 MERCURY vmkernel: 0:00:17:40.878 cpu1:1069)WARNING: CpuSched: 7145: time went backwards by 43 usec
Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:01.971 cpu3:1062)WARNING: CpuSched: 7145: time went backwards by 161 usec
Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:01.974 cpu3:1059)WARNING: CpuSched: 7145: time went backwards by 183 usec
Feb 28 06:14:16 MERCURY vmkernel: 0:07:00:02.028 cpu2:1061)WARNING: CpuSched: 7145: time went backwards by 447 usec
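When the vmkwarning file is long, it can help to count the warnings per vmkernel subsystem before reading them line by line. This is a minimal sketch, assuming every entry contains "WARNING: <Subsystem>:" as in the excerpt above; summarize_warnings is a hypothetical helper name, and lines without that pattern are ignored.

```shell
#!/bin/sh
# Sketch: count vmkwarning entries per vmkernel subsystem, most frequent first.
# Assumption: each entry contains "WARNING: <Subsystem>: ..." as in the
# excerpt above; lines without that pattern are ignored.
# summarize_warnings is a hypothetical helper name.
summarize_warnings() {
    # Extract the word after "WARNING: ", then count and sort by frequency
    sed -n 's/.*WARNING: \([A-Za-z]*\):.*/\1/p' "${1:-/var/log/vmkwarning}" |
        sort | uniq -c | sort -rn
}

# Usage, as root:  summarize_warnings /var/log/vmkwarning
```

On the excerpt above this would put CpuSched (6 entries) at the top, followed by SCSI (3), then NUMA and Cow (1 each). The counts only show where to look; they do not by themselves explain the reboots.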
This is from the HP website, regarding the HP SIM agent:
Fixed an issue wherein the storage agents consumed excessive CPU time, potentially resulting in server reboots (ASRs). The CCISS device nodes are now kept open by default on all servers to work around this issue.
I have uninstalled Insight Manager agent 7.5 and installed 7.7. Now I will have to wait and see how the machine performs.
Hi,
I have the same problem with my VMs rebooting on 3.0.1.
Please see my post:
http://www.vmware.com/community/thread.jspa?threadID=74582&tstart=30
When I contacted VMware support, they told me that there is apparently a bug in 3.0.1 related to memory ballooning (bug ID 108195).
I could not find any details on bug ID 108195.
My problem is not quite the same: my ESX server is rebooting, not the VMs, which seem to be fine. But you do seem to have a very interesting problem.
FYI, HP just released an update for BL25s to address memory errors specifically in VMware.
http://h18023.www1.hp.com/support/files/server/us/download/26720.html
Unfortunately, every download I can find is packaged for a Windows install, not VMware. If anyone finds a usable one, please reply here.
Since updating the Insight Manager client, the problem has not occurred again. It has been a week now with no reboots.
We have had no more problems with this, so I am counting it as resolved.