davlloyd
Hot Shot

Host not accepting tasks from VC after VC offline for period

We had an issue where the VC database was offline for about 6 hours. Once it was recovered, all hosts were reporting correctly through the console, so everything looked OK.

A problem surfaced, though, when tasks were run against some of the hosts. All but one of the hosts accepted the tasks as expected. Tasks executed against that one host failed with a general error and an "unable to communicate with guest" response. On connecting to the host, all looked well, although quiet. When we asked the management services to restart, the action took over 5 minutes to complete (normally only about 15 seconds).

The logs for that period show that, at the restart request, an out of memory exception occurred and the process was killed. We are currently running 2.01 patch 2 and looking at moving to 3.02 as an action.

Two questions come out of this:

- Does the management agent have a possible leak that can be amplified by repeated attempts to connect to a VC that is offline?

- How do you monitor for this issue, given that the host and VC both reported as OK and the problem only showed up when running tasks (inventory sync was OK)?

Anyone else had this joy?

Cheers

4 Replies
Texiwill
Leadership

Hello,

You may have to restart the hostd daemon on ESX.

Log in to the service console (SC):

run: service mgmt-vmware restart

Then look inside the file /var/log/vmware/hostd.log for any errors related to the restart.

I would also look at /var/log/vmware/esxcfg-firewall.log. Effectively, any log in that directory may hold the missing bits of information. I had the same problem, and the cause was a badly configured firewall XML file. Use the logs to determine which component is causing the issue.
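
To narrow down which log is worth reading first, something like the following might help. It is only a rough sketch: the function name scan_logs is made up for illustration, and the search pattern is an assumption about what the error lines look like, not something taken from the hostd or firewall log formats.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESX): report which *.log files in a
# directory contain error-like lines, and how many such lines each has.
# The default pattern is a guess; adjust it to what your logs really emit.
scan_logs() {
    log_dir="${1:-/var/log/vmware}"
    pattern="${2:-error|fail|panic|out of memory}"
    for f in "$log_dir"/*.log; do
        # Skip the unexpanded glob when the directory has no .log files.
        [ -f "$f" ] || continue
        # -E extended regex, -i ignore case, -c count matching lines.
        count=$(grep -Eic "$pattern" "$f")
        if [ "$count" -gt 0 ]; then
            printf '%s: %s matching lines\n' "$f" "$count"
        fi
    done
}
```

Running scan_logs with no arguments checks /var/log/vmware; pass another directory or pattern as the first and second argument to look elsewhere.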

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2022,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
davlloyd
Hot Shot

Sorry, I should have been clearer: the management services restart is 'service mgmt-vmware restart'. When I ran this, the process took over 5 minutes. According to the hostd log the restart was OK, but in the messages log an out of memory error occurs during the stop process.
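
For anyone wanting to check for the same symptom, a minimal sketch of pulling the out of memory lines from the messages log could look like this. The function name oom_events is hypothetical, and the assumption that the relevant lines contain the phrase "out of memory" (in any case) may not hold on every build.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a syslog-style file that
# mentions "out of memory" (case-insensitive). The phrase is an assumed
# marker for the kernel killing a process; verify it against your own log.
oom_events() {
    messages_file="${1:-/var/log/messages}"
    grep -i 'out of memory' "$messages_file"
}
```

Because grep exits non-zero when nothing matches, the function can also be used as a simple check from cron, for example to send a mail only when it prints something.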

bertdb
Virtuoso

Does "top" or "vmstat 5" report anything out of the ordinary?

What does cat /proc/swaps show? Is there a lot of swap used?
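
If eyeballing /proc/swaps gets tedious, a small sketch like the one below turns it into a single percentage. The function name swap_used_percent is made up for illustration, and it assumes the usual layout of a header line followed by rows whose third and fourth columns are Size and Used, with no spaces in the file names.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of swap in use (truncated to
# a whole number), summed across all swap areas.
# Assumes the standard /proc/swaps layout:
#   Filename  Type  Size  Used  Priority
swap_used_percent() {
    swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 { size += $3; used += $4 }
         END {
             if (size > 0) printf "%d\n", used * 100 / size
             else print 0
         }' "$swaps_file"
}
```

A value that stays high while hostd is idle would be one hint that something on the service console is holding on to memory.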

Texiwill
Leadership

Hello,

I would still investigate all the logs for any errors. As I stated, I had something similar happen to me: hostd was crashing, and it ended up being a firewall rule, but it could be some other XML file. Do you keep backup /etc/vmware/.xml or /etc/vmware//*.xml files in the same directory as the originals?

Best regards,

Edward
