peetz
Leadership

PLEASE comment: Abnormally increased RAM usage of hostd in ESX 3.5 U3 (memory leak?)

Hello all,

Since we updated our ESX hosts from ESX 3.5 Update 2 to ESX 3.5 Update 3 (+ Dec 2008 patches), we have been seeing abnormally high RAM usage by the hostd management process on the hosts.

The symptom is that hostd hits its default hard memory limit of 200 MB and shuts itself down. This is indicated by the following error message in /var/log/vmware/hostd.log:

[2009-01-06 06:25:47.707 'Memory checker' 19123120 error] Current value 207184 exceeds hard limit 204800. Shutting down process.

The service is restarted automatically, but as a consequence the VirtualCenter agent also restarts and the host becomes temporarily unresponsive in VirtualCenter (shown as "not responding" or "disconnected").
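To see how often and how far hostd crosses its limits, the memory-checker lines can be pulled out of the log and converted to MB. This is a minimal sketch, assuming the values in the message are in KB (204800 KB is exactly the 200 MB default hard limit) and that soft-limit warnings use the same wording as the hard-limit message; `mem_checker_summary` is a hypothetical helper name, not a VMware tool.

```sh
#!/bin/sh
# Hypothetical helper (not a VMware tool): summarize hostd "Memory checker"
# lines read from stdin. Prints one line per event:
#   <date> <time> <soft|hard> <current value in MB> <limit in MB>
# Assumes the values in the message are in KB (204800 KB = 200 MB, the
# default hard limit) and that soft-limit warnings use the same wording.
mem_checker_summary() {
  awk '/Memory checker/ && /exceeds (soft|hard) limit/ {
    ts = substr($1, 2) " " $2    # drop the leading "[" from the date field
    for (i = 1; i < NF; i++) {
      if ($i == "value") cur = $(i + 1)
      if ($i == "limit") lim = $(i + 1)
      if ($i == "soft" || $i == "hard") kind = $i
    }
    sub(/\.$/, "", lim)          # drop the period that ends the sentence
    printf "%s %s %.1f %.1f\n", ts, kind, cur / 1024, lim / 1024
  }'
}
```

Usage: `mem_checker_summary < /var/log/vmware/hostd.log`. For the error message quoted above it prints `2009-01-06 06:25:47.707 hard 202.3 200.0`, i.e. hostd was about 2.3 MB over its limit when it shut down.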

To fix the problem we increased the hard memory limit of hostd by following the recommendations we found here:

http://communities.vmware.com/thread/140140

and here:

We increased the hard limit to 250 MB, but within several hours this limit was also reached on some hosts and the problem reappeared. So I suspect that there is some kind of memory leak in hostd, and we need to find its cause to solve the problem for good.

According to VMware support, all sorts of things might cause increased RAM usage by hostd, because many other processes and applications use it: the VirtualCenter agent, the HA agent, hardware management agents, and other external applications that use the web API to talk to the host.

We have several hosts that are configured identically and belong to the same cluster in VirtualCenter. These hosts (let's call them type A) all show the problem. However, there is one host that is not part of the cluster but is managed by the same VirtualCenter instance, and this one does NOT show the problem (let's call it type B). I hope to find the reason for hostd's increased RAM usage by comparing these two types of hosts:

  • Both types are installed with ESX 3.5 Update 3 (+ Dec 2008 patches) using the same automated build.

  • Both are HP hardware and have the HP Management agents version 8.1.1 installed.

  • Both servers are monitored by the same HP System Insight Manager server that queries the installed HP management agents.

  • Type A is an HP ProLiant DL585 (G1 or G2) with four dual-core AMD Opteron CPUs. Type B is an HP ProLiant DL360 G5 with two quad-core Intel Xeon CPUs.

  • Type A has SAN connections (using QLogic adapters) and uses two NFS datastores for ISO images. Type B uses only local hard disks for VM and ISO storage.

  • Type A is in a DRS- and HA-enabled cluster (with EVC enabled). Type B is stand-alone.

I'm trying to find the cause of the problem by process of elimination. I have already disabled HA on the cluster, and this did NOT fix the problem. Now I have stopped the HP agents on one host to see if that makes a difference (although I don't expect it to, since both types A and B run them).
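To make each elimination step measurable, it may help to log hostd's resident memory at a fixed interval before and after a change (such as disabling HA or stopping the HP agents) and compare the growth rates. This is a minimal sketch, assuming the service console's procps `ps` supports `-eo rss=,args=`; `sample_hostd_rss` and `rss_growth_kb_per_hour` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical sampler: print "<epoch seconds> <hostd RSS in KB>" every
# $1 seconds (default 300). Assumes procps ps, where "-eo rss=,args="
# prints RSS in KB and the full command line without a header row.
sample_hostd_rss() {
  interval=${1:-300}
  while :; do
    # Match the executable path in field 2 so the awk process is not counted.
    rss=$(ps -eo rss=,args= | awk '$2 ~ /\/vmware-hostd$/ { print $1; exit }')
    echo "$(date +%s) ${rss:-0}"
    sleep "$interval"
  done
}

# Read "<epoch> <rss_kb>" lines on stdin and print the average growth between
# the first and last sample in KB per hour. A hostd restart resets RSS, so
# feed it samples from a single hostd run only.
rss_growth_kb_per_hour() {
  awk 'NR == 1 { t0 = $1; r0 = $2 }
       { t1 = $1; r1 = $2 }
       END {
         if (NR < 2 || t1 == t0) { print "need at least two samples" > "/dev/stderr"; exit 1 }
         printf "%d\n", (r1 - r0) * 3600 / (t1 - t0)
       }'
}
```

For example, run `sample_hostd_rss 300 >> /tmp/hostd-rss.log &` on a type A host and later `rss_growth_kb_per_hour < /tmp/hostd-rss.log`. Two samples one hour apart of 65488 KB and 75728 KB would give 10240 KB per hour; comparing that figure with and without the HP agents running shows whether they matter.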

While I'm going down this path, I'd like some input from the community that might also lead to the cause of my problem:

  • Is anyone out there also experiencing high hostd RAM usage? What is your hard limit, and is your configuration comparable?

  • Is anyone out there running a configuration comparable to type A but NOT seeing this problem (I guess there are many ...)? What might be the difference that causes the problem?

  • Any other helpful comments? Does anyone know a way to increase the debug level of hostd? (I also asked VMware support about this, but have not received an answer yet.)

You can check RAM usage of hostd by using

ps waux | grep vmware-hostd

in the service console. It outputs something like

root 19285 0.4 8.1 81576 65488 ? S 07:44 0:37 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u

The sixth field (RSS, 65488 in this example) is the RAM usage; note that ps reports it in KB, not MB, so this hostd is using about 64 MB. You can also check /var/log/vmware/hostd.log for messages like "... exceeds soft limit ..." (warning only) and "... exceeds hard limit ..." (will cause a service restart).
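The check can be scripted so that it prints only hostd's PID and its RSS converted to MB. A plain `ps waux | grep vmware-hostd` also lists the grep command itself, so this sketch filters on the executable path instead. `hostd_rss_mb` is a hypothetical helper name, and it assumes the 11-column `ps waux` layout shown in the sample line.

```sh
#!/bin/sh
# Hypothetical helper: read "ps waux" output on stdin and print the PID and
# resident memory (RSS, column 6, which ps reports in KB) of vmware-hostd in MB.
# Matching on column 11 (the executable path) skips the "grep vmware-hostd"
# line that a plain "ps waux | grep vmware-hostd" also prints.
hostd_rss_mb() {
  awk '$11 ~ /\/vmware-hostd$/ { printf "%d %.1f\n", $2, $6 / 1024 }'
}

# Usage in the service console:
#   ps waux | hostd_rss_mb
# Quick interactive alternative; the brackets keep grep from matching itself:
#   ps waux | grep '[v]mware-hostd'
```

For the sample line above this prints `19285 64.0`, since 65488 KB is about 64 MB.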

I'll keep this thread updated with all the information I find out myself or receive from VMware support. Thank you for any contributions.

Andreas

added additional information, corrected formatting.

updated tags to include EMC_Controlcenter

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de