VMware Cloud Community
BriGem
Contributor

host not responding - cpu utilization drops to zero

About once per day on average, at variable times, the CPU utilization drops to zero on some of my ESX 3.5.0 (build 153875) hosts. Sometimes this leads to a "host is not responding" error, which usually lasts about 10 seconds. When this occurs, the virtual machines disconnect, but a few seconds later they come back and everything returns to normal.

I was thinking it might be related to my swapfile settings: I store the swapfile in the datastore specified by the host (locally on the host) in order to preserve performance on my SAN. I'm considering changing that setting to store the swapfile in the same directory as the virtual machine as a possible resolution to the "host is not responding" issue.

Has anyone else seen this drop in the CPU performance graphs with a corresponding "host is not responding" error in the events?

4 Replies
marcelo_soares
Champion

Hmmmmm... I'm quite sure that the "drop" is due to the disconnection (you are not seeing the data because you cannot connect to the COS). Is this reproducible? Does it always happen at the same time of day? Do you have any third-party agents installed on the COS?

Marcelo Soares

VMware Certified Professional 310

Technical Support Engineer

Linux Server Senior Administrator

BriGem
Contributor

It is not really reproducible per se; it occurs randomly on each of the host servers at various times. I noticed that each time it occurs there are corresponding "changed resource allocation for (virtual machine name)" events on the host. I don't have any third-party agents installed on the COS; this is a plain vanilla installation.
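
One way to check that correlation more systematically is to export the host events and look for "not responding" entries that have a "changed resource allocation" entry close by in time. The following is only a minimal sketch: the function name, the (timestamp, message) tuple layout, the 60-second window, and the sample data are all hypothetical and do not come from any VMware API or from the actual logs of these hosts.

```python
from datetime import datetime, timedelta


def correlated_disconnects(events, window_seconds=60):
    """Return the disconnect events that have a resource-allocation
    change within window_seconds before or after them.

    events is a list of (datetime, message) tuples; this layout is an
    assumption made for the sketch, not a VMware export format.
    """
    window = timedelta(seconds=window_seconds)
    disconnects = [(t, m) for t, m in events if "not responding" in m.lower()]
    allocations = [
        t for t, m in events if "changed resource allocation" in m.lower()
    ]
    matched = []
    for t, m in disconnects:
        # Keep the disconnect if any allocation change is close enough in time.
        if any(abs(t - a) <= window for a in allocations):
            matched.append((t, m))
    return matched


# Hypothetical sample data, invented purely for illustration.
sample_events = [
    (datetime(2010, 1, 1, 10, 0, 0), "Changed resource allocation for vm01"),
    (datetime(2010, 1, 1, 10, 0, 20), "Host is not responding"),
    (datetime(2010, 1, 1, 15, 0, 0), "Host is not responding"),
]

for when, message in correlated_disconnects(sample_events):
    print(when.isoformat(), message)
```

If most disconnects turn up in the result, that supports the idea that the two event types are linked; it does not by itself say which one causes the other.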

I'm running an average of 7 virtual machines (Windows Server 2003) per host, and my average CPU utilization is 20 to 50% on each host; it does not appear to be high just prior to the dropouts. Also, it's rare that the phantom CPU drop to zero actually leads to disconnect errors in the events, but that seems to be the only time this issue really causes a problem, so I configured my cluster host connection state alarm to email me anytime a disconnect occurs.

Thanks,

Brian Smith

marcelo_soares
Champion

Brian,

Try this: http://kb.vmware.com/kb/1012575

I'm not sure if this is the cause (are these ESX or ESXi?), but maybe this will mitigate the problem. In fact, CIM starves the memory, so you experience the disconnect issues.

Try this and let me know what happens.

Marcelo Soares

VMware Certified Professional 310

Technical Support Engineer

Linux Server Senior Administrator

BriGem
Contributor

Thanks Marcelo,

The issue is occurring on ESX, not ESXi. However, over the last month I have had 2 instances where one of my hosts inexplicably crashed "softly": the host was not blue screened and did not fully crash, but the virtual machines stopped responding on the network. Normally when a host goes down for any reason, my virtual machines will scatter to the remaining hosts. In this case they were held "hostage". Once I manually restarted the host, everything returned to normal. Your KB link led me to believe these events are related. I am trying the setting on one of my hosts:

Click the Configuration tab. Click the Advanced Settings link. Navigate to the Misc category. Find the Misc.CimEnabled parameter in the list and change the value to 0.

I will let you know the results.

Thank you very much,

Brian Smith
