Hi,
I have a host which will not reconnect to vCenter. When I attempt to stop and restart hostd I receive the following error:
watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
sh: cannot kill pid 2833: No such process
and of course hostd still shows as running.
I have VMs running on this host and, if possible, need to recover without affecting their current state.
Any suggestions?
Thanks
Hi,
you can try to kill the process and restart it using SSH or the local console.
Frank
Hi,
Thanks for the reply. I have already tried to restart the services via SSH. The response is posted within my initial query.
Also the .PID files do not exist.
Thanks
Mike
Have you tried to manually stop it with the kill command?
Yes - sh: cannot kill pid 2833: No such process
But is this the ID that you see with the ps command?
It is the ID returned from the following command, yes
~ # ps -g | grep hostd
8780146 8780146 nssquery 8780146 8853869 /usr/libexec/hostd/nssquery
2833 2833 hostd-worker 2833 2833 hostd
2985 2985 nssquery 2985 2833 /usr/libexec/hostd/nssquery
126057 2833 hostd-worker 2833 2833 hostd
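For anyone landing here in the same state, this is roughly the sequence I would try over SSH before giving up. This is a sketch, not an official procedure: the init script path is for ESXi 5.x, and PID 2833 is just the value from the ps output above.

```shell
# Try the init script first (ESXi 5.x path)
/etc/init.d/hostd stop

# If the script fails because the watchdog PID file is missing,
# kill the hostd parent process directly. In the ps output above,
# 2833 is the parent PID shared by the hostd workers.
kill 2833          # polite SIGTERM first
sleep 10
kill -9 2833       # force-kill if ps still lists it

# Then start hostd again and watch /var/log/hostd.log for progress
/etc/init.d/hostd start
```

Note that in this thread even `kill -9` returns "No such process" while ps still shows the PID, which usually means the process table entry is stale, often a process stuck in storage I/O that the kernel cannot reap. In that situation no amount of killing helps and a reboot (or, as below, a reinstall) is typically the only way out.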
Have you tried with 126057?
Same response I'm afraid
I will add that this appears to be the result of some iSCSI storage changes. The host still has access to the targets, but some of the LUNs were removed from the target. They had been unmounted etc., but following a rescan the host became disconnected.
Hi,
Try this kb Article http://kb.vmware.com/kb/1007261
Hi - I tried this, but the PID files do not exist.
Did you manage to kill the hostd process without reboot in the end?
I have this exact issue: no connectivity in vCenter to ESXi 5.0 hosts (latest patches installed), and when attempting to restart hostd it advises "cannot kill pid 2901: No such process".
I really cannot reboot these hosts; they are running approx. 40 customer VMs.
Did someone find a solution for this problem?
It's urgent!
In the end after speaking with VMware support we had no option but to re-install the host.
Our issue was caused by an incorrect procedure for removing iSCSI-based storage.
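For reference, the removal order VMware documents for unpresenting a LUN on ESXi 5.x (KB 2004605) looks roughly like the following. The datastore name and device ID below are placeholders, not values from this thread:

```shell
# 1. Unmount the datastore on every host that sees it
esxcli storage filesystem unmount -l MyDatastore   # "MyDatastore" is a placeholder label

# 2. Detach the backing device on every host BEFORE unpresenting it on the array
esxcli storage core device set --state=off -d naa.xxxxxxxxxxxxxxxx   # placeholder device ID

# 3. Only now unpresent / delete the LUN on the storage array itself

# 4. Rescan; the device disappears cleanly instead of leaving dead paths
esxcli storage core adapter rescan --all
```

Skipping step 2 and deleting the LUN on the array first is what tends to leave hostd hung on dead paths, which matches the symptoms described in this thread.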
Ok, you are making me nervous.
We are talking about 4 hosts and roughly 150 VMs.
We got new storage (a 1 TB LUN) formatted with VMFS5.
After migrating the VMs from the old LUN to the new one, we unmounted and deleted the old LUN via vCenter. After a rescan of the HBAs and LUNs, the hosts lost their connections 2-3 hours later.
I can see dead LUNs on the hosts (esxcfg-mpath -L | grep dead).
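To gauge how far the damage goes before deciding on a reboot, a few read-only checks may help. This assumes ESXi 5.x; none of these commands change host state:

```shell
# List all paths and filter for the dead ones
esxcfg-mpath -L | grep -i dead

# Show the devices the host still considers attached, with their status
esxcli storage core device list | grep -iE "Display Name|Status"

# Ask the host to re-evaluate all paths
esxcli storage core adapter rescan --all
```

If the dead paths persist after a rescan and hostd is already wedged, the outcome reported earlier in this thread (reinstalling the affected hosts) may unfortunately be the realistic endpoint.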