VMware Cloud Community
Skidz
Contributor

Virtual machine unresponsive, no error messages

Hello all,

This one has me scratching my head:

I have a number of virtual machines (Windows 7 Enterprise) on three ESXi 5.5 hosts. These VMs are all on an NFS datastore (Windows Storage Server 2008 R2). I know I have some latency issues with this datastore, and we are working on a SAN solution in the near term. However, certain VMs are intermittently exhibiting strange behavior.

Essentially, every once in a while, they become completely unresponsive (can't log on, can't remotely manage them, nothing). In my vCenter console, I have no warnings or alerts for CPU, disk, or memory on these VMs, and in the Summary tab, VMware Tools shows as Running (Current), but the IP Address field is blank. This is the only indication I have that something is wrong with the VM. The only thing I can do at this point is power off the VM and then power it back on.

After the reboot, in the Event Viewer (System log), I find all kinds of events logged in the hours preceding the failure: some pertaining to disk (Event ID 51 is a popular one), and some pertaining to the Service Control Manager (Event ID 7000). I'm hoping that moving these VMs to the SAN datastore will solve the issue, but we're still months away from that at this point.

My biggest issue at this point is that I get no warning of these events. It would be nice if I could set up some kind of monitoring to let me know as soon as one of these events occurs, so I don't have a user overseas paging me at 4am because his VM is not responding...

If vCenter is no longer displaying the VM's IP address, this means that it is not getting the information back from VMware Tools. Is there a way to set a trigger on this condition that would at least generate an error message? Ideally, I'd like to be able to automate the power-off and restart cycle of a VM that enters this state...
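To make the condition concrete, here is a minimal Python sketch of the trigger logic only: powered on, Tools reporting as running, but no IP address, seen for several polls in a row. It is not a working vCenter integration. `GuestSample`, `looks_hung` and `HangDetector` are hypothetical names made up for illustration, and the string values are assumed to mirror what the vSphere API exposes under `runtime.powerState`, `guest.toolsRunningStatus` and `guest.ipAddress`. Fetching those values and actually alerting or resetting the VM is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GuestSample:
    """One polled observation of a VM (hypothetical container, not a vSphere type)."""
    name: str
    power_state: str            # assumed value, e.g. "poweredOn"
    tools_running: str          # assumed value, e.g. "guestToolsRunning"
    ip_address: Optional[str]   # None or "" when no IP is reported


def looks_hung(sample: GuestSample) -> bool:
    """True when the VM is powered on and Tools says running, but no IP is reported."""
    return (
        sample.power_state == "poweredOn"
        and sample.tools_running == "guestToolsRunning"
        and not sample.ip_address
    )


class HangDetector:
    """Flags a VM once it has looked hung for `threshold` consecutive polls."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self._streak: Dict[str, int] = {}

    def observe(self, samples: List[GuestSample]) -> List[str]:
        """Record one polling round; return VMs that just reached the threshold."""
        flagged = []
        for s in samples:
            if looks_hung(s):
                self._streak[s.name] = self._streak.get(s.name, 0) + 1
                # Flag exactly once per streak so a stuck VM does not re-alert
                # on every subsequent poll.
                if self._streak[s.name] == self.threshold:
                    flagged.append(s.name)
            else:
                # Any healthy-looking poll resets the streak.
                self._streak[s.name] = 0
        return flagged
```

Requiring several consecutive hits is a deliberate choice: a VM that is booting or renewing its lease can briefly show no IP, and a single blank reading should not page anyone or trigger a reset. The names returned by `observe` could then be passed to whatever sends the alert or performs the power cycle.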

Any ideas or tips welcome here 🙂
