VMware Cloud Community
SATHISHVIJAY
Enthusiast

hostd not running

Guys !

One of our ESXi 6.0 hosts is showing as "Not responding" in the vCenter Server 6.0 console. The virtual machines residing on the ESXi host are running fine and we are able to RDP to them.

I found that the hostd service is not running on the host. When I tried to restart hostd, it threw the following errors. Kindly help.

/sbin/watchdog.sh: line 342: can't fork

sh : you need to specify whom to kill

/usr/lib/vmware/hostd/bin/upgrade-configrules.sh: line 12: can't fork

3 Replies
Jitu211003
Hot Shot

Try running the command below, which restarts all management services on the ESXi host.

Don't worry, it won't impact the VMs running on it.

services.sh restart

If that works, fine; otherwise you will need to schedule an outage and reboot the ESXi host along with all its VMs.
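A narrower alternative is to restart only the two management agents (hostd and vpxa) through their init scripts. The sketch below is a rough illustration, not a tested procedure: the function names agent_state and restart_mgmt_agents are hypothetical, and the exact wording of the status output ("... is running" / "... is not running") is an assumption. Note also that if the shell itself reports "can't fork", as in the original post, these commands are likely to fail in the same way.

```shell
# Classify the one-line output of "/etc/init.d/<agent> status".
# Assumption: the script prints "<agent> is running" or "<agent> is not running".
agent_state() {
    case "$1" in
        *"is not running"*) echo "stopped" ;;
        *"is running"*)     echo "running" ;;
        *)                  echo "unknown" ;;
    esac
}

# Restart only the management agents instead of every service.
# Intended for the ESXi shell only; the init scripts do not exist elsewhere.
restart_mgmt_agents() {
    for agent in hostd vpxa; do
        status=$("/etc/init.d/$agent" status 2>&1)
        echo "$agent before restart: $(agent_state "$status")"
        "/etc/init.d/$agent" restart
    done
}
```

On the host you would call restart_mgmt_agents from an SSH or console session; nothing runs until the function is invoked.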

Thanks

vmwarediary.com

msripada
Virtuoso

hostd is the ESXi management agent, so if it stopped responding and then failed to start or restart for any reason, it could produce that error. Check the hostd log for the exact time when hostd stopped responding or tried to start.

VMs do not depend on the management network, so they should be fine. If hostd recovers (there is no fixed timeline for this), I have seen the host reconnect automatically. But if there are underlying storage issues and hostd cannot recover, you may have to plan downtime for the VMs and reboot the ESXi host.
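To narrow down the time in the log, something like the following could help. It is only a sketch: hostd_log_matches is a hypothetical helper, /var/log/hostd.log is the usual log location on ESXi 6.0, and the default keyword "error" is just a starting guess, not a string hostd is guaranteed to log.

```shell
# Print the last N lines of a hostd log that contain a keyword (case-insensitive).
# Defaults (assumptions): /var/log/hostd.log, keyword "error", 20 lines.
hostd_log_matches() {
    logfile=${1:-/var/log/hostd.log}
    keyword=${2:-error}
    count=${3:-20}
    # Fail early with a message if the log cannot be read.
    [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; return 1; }
    grep -i -- "$keyword" "$logfile" | tail -n "$count"
}
```

For example, hostd_log_matches /var/log/hostd.log "can't fork" 5 would show the five most recent lines mentioning the fork failure, each with its timestamp.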

Thanks,

MS

VishShah
VMware Employee

Welcome to Communities,

I would recommend reviewing the following KB articles to check whether any of their steps help resolve the issue:

https://kb.vmware.com/s/article/1002849

https://kb.vmware.com/s/article/1003409

https://kb.vmware.com/s/article/1019082

&

https://kb.vmware.com/s/article/1004424

If none of those steps help, you will only be able to recover hostd and management connectivity by rebooting the ESXi host after gracefully shutting down all running VMs. Once the host is rebooted, collect logs to determine what caused the issue and raise a support request with VMware.

https://kb.vmware.com/s/article/653

https://kb.vmware.com/s/article/2069559

========================================================================================================================================================================

Not Responding

A host can become greyed out and shown as Not Responding because of an external factor that vCenter Server is unaware of. If a host is showing as Not Responding, vCenter Server no longer receives heartbeats from it.

This can happen for several reasons, all of which prevent vCenter Server from receiving heartbeats from the host.

Some common reasons include:

  • A network connectivity issue between the host and vCenter Server, such as UDP port 902 not open, a routing issue, bad cable, firewall rule, etc.
  • hostd is not running successfully on the host.
  • vpxa is not running successfully on the host.
  • The host has failed.
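The list above can be read as a rough triage order. The sketch below is one possible way to encode it, not an official procedure: not_responding_cause is a hypothetical function, and its three inputs are observations you collect yourself (a ping from vCenter Server, and the state of hostd and vpxa on the host).

```shell
# Map three observations to the most likely cause from the list above.
# Arguments (all gathered manually by the admin):
#   $1 = "yes"/"no"   host answers ping from vCenter Server
#   $2 = hostd state  ("running" or anything else)
#   $3 = vpxa state   ("running" or anything else)
not_responding_cause() {
    if [ "$1" != "yes" ]; then
        echo "host failed or network path broken"
    elif [ "$2" != "running" ]; then
        echo "hostd not running"
    elif [ "$3" != "running" ]; then
        echo "vpxa not running"
    else
        echo "check heartbeat path (UDP 902, firewall, routing)"
    fi
}
```

For the situation in the original post (VMs reachable, hostd down), not_responding_cause yes stopped running points at hostd.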

A host can go from Not Responding back to a normal state if the underlying issue which brought the host to the Not Responding state is resolved. However, a host that is in the Disconnected state ceases to be monitored by vCenter Server and stays in that state regardless of the status of the underlying issue. After resolving the issue, the user must right-click on the host and select Connect to bring the host back to a normal state in vCenter Server.

Disconnected

Disconnected is a state initiated from the vCenter Server side and suspends vCenter Server host management, and thus all vCenter Server services ignore the host.

A disconnected host is one that has been explicitly disconnected by the user, or whose license has expired. Disconnected hosts also require the user to manually reconnect the host.

Ultimately, a host becomes Disconnected for one of these three reasons (two of which involve manual user action):

  • A user right-clicks the host and selects Disconnect.
  • A user right-clicks a host that is listed as Not Responding and clicks Connect and that task fails.
  • The host license expires.

When a host becomes disconnected, it still remains in the vCenter Server inventory, but vCenter Server does not get any updates from the disconnected host, does not monitor it, and therefore has no knowledge of the health of that disconnected host.

vCenter Server takes a conservative approach when considering disconnected hosts. Virtual machines on a host that is not responding affect the admission control check for vSphere HA. vCenter Server does not include those virtual machines when computing the current failover level for HA, but assumes that any virtual machines running on a disconnected host will be failed over if the host fails. Because the status of the host is not known, and because vCenter Server is not communicating with that host, HA cannot use it as a guaranteed failover target. As part of disconnecting a host, vCenter Server disables HA on that host. The virtual machines on that host are therefore not failed over in the event of a host isolation. When the host becomes reconnected, the host becomes available for failover again.

Let me know if you need additional information or have any other questions that I can help with.

Regards Vishwajit Shah Skyline Support Moderator VCA-DCV | VCP5-DCV | VCP6.0; 6.5-DCV & CMA | VCA-DBT | VCAP60-DCV