VMware Cloud Community
davechubb
Contributor

Multiple Guests Crash

Hi all,

I have a single ESXi 5 host running a couple of VMs. I have recently experienced three crash-type situations, although I can still access the host. The symptoms are as follows:

All guests (a mixture of Windows and Linux) stop responding. I can log in to the host via the vSphere Client and see the unresponsive console windows. When I try to power off a VM, the task shows as completed but nothing happens to the VM and the console remains the same. The CPU graph shows that CPU usage for all guests has completely flatlined at zero.

The only way I have found to resolve this is to restart the host completely. Could I please have suggestions on which log files to check to identify the issue?

Many thanks,
Dave

4 Replies
vmroyale
Immortal

Hello and welcome to the communities.

Check out http://kb.vmware.com/kb/2004201 for the log file locations. I would also check in the VM's working location for the actual VM log files.

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com
davechubb
Contributor

Many thanks for the information on the logs.

The following is an extract from the final part of one guest vmware.log just before the crash.

2013-01-23T03:04:12.581Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2013-01-23T03:04:27.580Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2013-01-23T03:04:27.580Z| vmx| I120: GuestRpc: app toolbox's second ping timeout; assuming app is down
2013-01-23T03:04:27.581Z| vmx| I120: GuestRpc: Reinitializing Channel 0(toolbox)
2013-01-23T03:04:27.581Z| vmx| I120: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2013-01-23T03:04:27.581Z| vmx| I120: GuestRpc: Channel 0 reinitialized.
2013-01-23T03:04:27.581Z| vmx| I120: GuestRpc: Channel 0 reinitialized.
2013-01-23T03:04:56.321Z| vcpu-0| I120: GuestMsg: channel 0: wrong cookie, discarding message.
2013-01-23T03:04:56.321Z| vcpu-0| I120: GuestMsg: channel 0: wrong cookie, discarding message.
2013-01-23T03:04:57.323Z| vcpu-0| I120: GuestRpc: Channel 0, guest application toolbox.
2013-01-23T03:04:57.326Z| vcpu-0| I120: TOOLS autoupgrade protocol version 2
2013-01-23T03:04:57.327Z| vcpu-0| I120: Vix: [5860 mainDispatch.c:3787]: VMAutomationReportPowerStateChange: Reporting power state change (opcode=2, err=0).

Any suggestions on what the issue might be, please?

Many thanks

Dave
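
For anyone triaging a similar extract, here is a minimal Python sketch that pulls the timestamp, thread, and message out of vmware.log lines and lists when the Tools ping timeouts occurred. The line layout is inferred only from the extract above, and the names `LINE_RE`, `parse_line`, and `tools_timeouts` are hypothetical helpers, not part of any VMware tooling.

```python
import re
from datetime import datetime

# Layout assumed from the extract above: "<timestamp>| <thread>| <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+?)\|\s*(?P<thread>[^|]+)\|\s*(?P<level>\w+):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, thread, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts, m.group("thread").strip(), m.group("level"), m.group("msg")


def tools_timeouts(lines):
    """Collect the timestamps of every GuestRpcSendTimedOut message."""
    parsed = (parse_line(line) for line in lines)
    return [p[0] for p in parsed if p and p[3].startswith("GuestRpcSendTimedOut")]


# Two lines copied from the extract above.
sample = [
    "2013-01-23T03:04:12.581Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.",
    "2013-01-23T03:04:27.580Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.",
]

timeouts = tools_timeouts(sample)
for ts in timeouts:
    print(ts.isoformat())
```

Running the same function over the vmware.log of each guest would show whether the timeouts begin at roughly the same moment on all of them, which would point at the host rather than at any single guest.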

davechubb
Contributor

I have reviewed vmksummary.log and see that there is a large gap between the time of the guest crash and the time I restarted the host.

2013-01-23T04:00:01Z heartbeat: up 2d12h50m51s, 6 VMs; [[6122 vmx 1173372kB] [5837 vmx 1376664kB] [6019 vmx 3810092kB]] [[580
2013-01-23T08:33:03Z bootstop: Host has booted
2013-01-23T09:00:01Z heartbeat: up 0d0h27m57s, 6 VMs; [[6074 vmx 468852kB] [5791 vmx 799972kB] [5971 vmx 2609760kB]] [[5953 s

Would this suggest that ESXi crashed completely?

Which logs would help me find the cause?

Thanks

Dave
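
As a rough way to spot this kind of gap automatically, the sketch below parses the leading timestamp and event name from vmksummary.log lines and reports any pair of consecutive entries that are further apart than an expected interval. The one-hour interval is an assumption based on the on-the-hour heartbeats shown above, and `parse_entry` and `silent_gaps` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_entry(line):
    """Split '<timestamp> <event>: <details>' into (datetime, event name)."""
    stamp, _, rest = line.strip().partition(" ")
    event = rest.split(":", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"), event


def silent_gaps(lines, expected=timedelta(hours=1)):
    """Return (previous time, next time, next event) wherever entries are too far apart."""
    entries = [parse_entry(line) for line in lines if line.strip()]
    gaps = []
    for (prev_ts, _), (ts, event) in zip(entries, entries[1:]):
        if ts - prev_ts > expected:
            gaps.append((prev_ts, ts, event))
    return gaps


# Line prefixes copied from the extract above; the per-VM details are omitted.
sample = [
    "2013-01-23T04:00:01Z heartbeat: up 2d12h50m51s, 6 VMs;",
    "2013-01-23T08:33:03Z bootstop: Host has booted",
    "2013-01-23T09:00:01Z heartbeat: up 0d0h27m57s, 6 VMs;",
]

gaps = silent_gaps(sample)
for start, end, event in gaps:
    print(f"No entries from {start} until {end} ({event}), gap of {end - start}")
```

A gap that ends in a `bootstop` entry, as here, means nothing was written to vmksummary.log between the last heartbeat and the reboot, which is worth comparing against the time the guests stopped responding.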

zXi_Gamer
Virtuoso

2013-01-23T03:04:12.581Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2013-01-23T03:04:27.580Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2013-01-23T03:04:27.580Z| vmx| I120: GuestRpc: app toolbox's second ping timeout; assuming app is down
2013-01-23T03:04:27.581Z| vmx| I120: GuestRpc: Reinitializing Channel 0(toolbox)

These messages relate to the VMware Tools sync and heartbeat checks for the guests.

As I mentioned in your other post, it seems the VMs are trying to sync the Tools status, but either VMware Tools is not running in the guests or the host is unable to retrieve the Tools status from them. This can happen when the host is overcommitted or the VMs are under heavy load with high memory usage. Could you let us know the host's memory/CPU configuration and the memory/CPU configuration of each VM?
