Hi all,
One of our ESXi 6.0 hosts is showing as "Not responding" in the vCenter 6.0 console. The virtual machines on that host are running fine and we can still RDP into them.
I found that the hostd service is not running on the host. When I tried to restart hostd, it threw the following errors. Please help.
/sbin/watchdog.sh: line 342: can't fork
sh : you need to specify whom to kill
/usr/lib/vmware/hostd/bin/upgrade-configrules.sh: line 12: can't fork
Try running the command below, which restarts all services on the ESXi host.
Don't worry, it won't impact the VMs running on it.
services.sh restart
If it works, fine. Otherwise you will need to schedule an outage and reboot the ESXi host along with all of its VMs.
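If restarting every service feels too broad, a narrower option is to restart only the two management agents, hostd and vpxa, through their init scripts. This is just a sketch: it assumes the scripts live in /etc/init.d, and the function name restart_mgmt_agents and its optional directory argument are made up for illustration. If the host still cannot fork new processes, this will most likely fail the same way as the commands above.

```shell
#!/bin/sh
# Restart only the ESXi management agents (hostd and vpxa).
# The optional first argument overrides the init script directory;
# it is an illustrative assumption that defaults to /etc/init.d.
restart_mgmt_agents() {
    init_dir="${1:-/etc/init.d}"
    for agent in hostd vpxa; do
        script="$init_dir/$agent"
        if [ -x "$script" ]; then
            echo "Restarting $agent"
            # Report a failed restart but carry on with the next agent.
            "$script" restart || echo "WARNING: $agent restart failed"
        else
            echo "Skipping $agent: $script not found"
        fi
    done
}
```

Run it from an SSH or ESXi Shell session as `restart_mgmt_agents`, then check whether the host reconnects in vCenter.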
Thanks
vmwarediary.com
hostd is the management agent, so if it stopped responding and then failed to start or restart, it could show that error. Check the hostd log for the exact time when hostd stopped responding or tried to start.
VMs do not depend on the management network, so they should be fine. If hostd recovers (there is no set timeline for this), I have seen the host reconnect automatically. But if there are underlying storage issues and hostd cannot recover, you may have to plan downtime for the VMs and reboot the ESXi host.
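To find roughly when hostd went quiet, one approach is to pull the timestamp of the last entry it logged. A minimal sketch, assuming the log is at /var/log/hostd.log and that each entry starts with a timestamp followed by a space; the function name last_hostd_timestamp is hypothetical:

```shell
#!/bin/sh
# Print the timestamp of the last entry in the hostd log.
# Assumes entries begin with a timestamp (a digit in column 1)
# followed by a space; continuation lines are skipped.
last_hostd_timestamp() {
    log="${1:-/var/log/hostd.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep '^[0-9]' "$log" | tail -n 1 | cut -d' ' -f1
}
```

Calling `last_hostd_timestamp` with no argument reads the default path. Comparing that time against the moment the host went "Not responding" in vCenter can help narrow down what happened.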
Thanks,
MS
Welcome to Communities,
I would recommend reviewing the following KB articles to see whether any of their steps resolve the issue:
https://kb.vmware.com/s/article/1002849
https://kb.vmware.com/s/article/1003409
https://kb.vmware.com/s/article/1019082
and
https://kb.vmware.com/s/article/1004424
If none of those steps help, you will only be able to recover hostd and management connectivity by rebooting the ESXi host after gracefully shutting down all running VMs. Once the host is rebooted, collect logs to determine what caused the problem and raise a support request with VMware.
https://kb.vmware.com/s/article/653
https://kb.vmware.com/s/article/2069559
========================================================================================================================================================================
Not Responding
A host can become greyed out and shown as Not Responding because of an external factor that vCenter Server is unaware of. If a host is showing as Not Responding, vCenter Server no longer receives heartbeats from it.
This can happen for several reasons, all of which prevent heartbeats from the host reaching vCenter Server.
Some common reasons include:
A host can go from Not Responding back to a normal state if the underlying issue which brought the host to the Not Responding state is resolved. However, a host that is in the Disconnected state ceases to be monitored by vCenter Server and stays in that state regardless of the status of the underlying issue. After resolving the issue, the user must right-click on the host and select Connect to bring the host back to a normal state in vCenter Server.
Disconnected
Disconnected is a state initiated from the vCenter Server side and suspends vCenter Server host management, and thus all vCenter Server services ignore the host.
A disconnected host is one that has been explicitly disconnected by the user, or one whose license has expired. Disconnected hosts also require the user to manually reconnect the host.
Ultimately, a host is Disconnected due to one of these three reasons (two of which require manual intervention):
When a host becomes disconnected, it still remains in the vCenter Server inventory, but vCenter Server does not get any updates from the disconnected host, does not monitor it, and therefore has no knowledge of the health of that disconnected host.
vCenter Server takes a conservative approach when considering disconnected hosts. Virtual machines on a host that is not responding affect the admission control check for vSphere HA. vCenter Server does not include those virtual machines when computing the current failover level for HA, but assumes that any virtual machines running on a disconnected host will be failed over if the host fails. Because the status of the host is not known, and because vCenter Server is not communicating with that host, HA cannot use it as a guaranteed failover target. As part of disconnecting a host, vCenter Server disables HA on that host. The virtual machines on that host are therefore not failed over in the event of a host isolation. When the host becomes reconnected, the host becomes available for failover again.
Let me know if you need additional information or have any other questions that I can help with.