VMware Cloud Community
mhoward1977
Contributor

Disconnected host

Hi,

I have a host which will not reconnect to vCenter. When I attempt to stop and restart hostd, I receive the following error:

watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
sh: cannot kill pid 2833: No such process

and of course hostd still shows as running.

I have VMs running on this host and, if possible, need to recover without affecting their current state.

Any suggestions?

Thanks

15 Replies
AndreTheGiant
Immortal

You can try to kill the process and restart it using SSH or the local console

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100556...

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
mhoward1977
Contributor

Hi,

Thanks for the reply. I have already tried to restart the services via SSH. The response is posted in my initial query.

Also the .PID files do not exist.

Thanks

Mike

AndreTheGiant
Immortal

Have you tried to manually stop it with the kill command?

mhoward1977
Contributor

Yes - sh: cannot kill pid 2833: No such process

AndreTheGiant
Immortal

But is this the ID that you see with the ps command?

mhoward1977
Contributor

It is the ID returned from the following command, yes

~ # ps -g | grep hostd
8780146 8780146 nssquery            8780146 8853869  /usr/libexec/hostd/nssquery
2833 2833 hostd-worker        2833 2833  hostd
2985 2985 nssquery            2985 2833  /usr/libexec/hostd/nssquery
126057 2833 hostd-worker        2833 2833  hostd
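
For readers following along, here is a minimal sketch of how the listing above could be filtered down to the ID that a kill attempt would use. It assumes, and this is an assumption rather than something stated in the thread, that the second column of `ps -g` is the cartel (process) ID and the third is the world name. The function name `hostd_cartel_ids` is hypothetical.

```shell
# Sample copied from the `ps -g | grep hostd` output quoted in this thread.
ps_sample='8780146 8780146 nssquery            8780146 8853869  /usr/libexec/hostd/nssquery
2833 2833 hostd-worker        2833 2833  hostd
2985 2985 nssquery            2985 2833  /usr/libexec/hostd/nssquery
126057 2833 hostd-worker        2833 2833  hostd'

# Read ps output on stdin and print the unique second-column IDs
# of every row whose third column is "hostd-worker".
# Assumption: column 2 = cartel ID, column 3 = world name.
hostd_cartel_ids() {
    awk '$3 == "hostd-worker" { print $2 }' | sort -u
}

printf '%s\n' "$ps_sample" | hostd_cartel_ids
```

On a live host the same filter could be fed with `ps -g | hostd_cartel_ids`. With the sample above, both hostd-worker rows share the second-column value 2833, which would be consistent with a kill against 126057 giving the same response as one against 2833.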

AndreTheGiant
Immortal

Have you tried with 126057?

mhoward1977
Contributor

Same response, I'm afraid.

I will add that this appears to be the result of some iSCSI storage changes. The host still has access to the targets, but some of the LUNs were removed from the target. They had been unmounted etc., but following a rescan the host became disconnected.

BharatR
Hot Shot

Hi,


Try this KB article: http://kb.vmware.com/kb/1007261

Best regards, BharatR--VCP4-Certification #: 79230, If you find this information useful, please award points for "correct" or "helpful".
mhoward1977
Contributor

Hi, I tried this, but the PID files do not exist.

williamtaylor21
Contributor

Did you manage to kill the hostd process without a reboot in the end?

I have this exact issue: no connectivity in vCenter to ESXi 5.0 hosts (latest patches installed), and when I attempt to restart hostd it reports "cannot kill pid 2901: No such process".

I really cannot reboot these hosts; they are running approximately 40 customer VMs.

Micha123
Contributor

Did anyone find a solution for this problem?

It is urgent!

mhoward1977
Contributor

In the end, after speaking with VMware support, we had no option but to re-install the host.

Our issue was caused by following an incorrect procedure for removing iSCSI-based storage.
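
The thread does not spell out which procedure should have been followed. As a hedged sketch only: on ESXi 5.x the commonly documented order is to unmount the datastore, detach the device, remove the LUN on the array or target side, and only then rescan. The helper below just prints that sequence so it can be reviewed before anything is run. The function name `print_lun_removal_steps` and the example label and device ID are hypothetical placeholders.

```shell
# Print (do not execute) the removal sequence for one datastore and its device.
# Both arguments are placeholders supplied by the caller, not values from the thread.
print_lun_removal_steps() {
    label="$1"    # datastore label, e.g. old_datastore
    device="$2"   # device identifier, e.g. naa.example
    echo "esxcli storage filesystem unmount -l $label"
    echo "esxcli storage core device set --state=off -d $device"
    echo "# now remove the LUN on the storage array / iSCSI target"
    echo "esxcli storage core adapter rescan --all"
}

print_lun_removal_steps old_datastore naa.example
```

Printing rather than running keeps the sketch safe to try anywhere. Whether this order would have avoided the disconnect described here is not something the thread confirms.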

Micha123
Contributor

OK, you are making me nervous.
We are talking about 4 hosts and roughly 150 VMs.

We got new storage (1TB LUN) in the VMFS5 format.

After migrating the VMs from the old LUN to the new one, we unmounted and deleted the old LUN through vCenter. After a rescan of the HBAs and LUNs, the hosts lost their connections 2-3 hours later.

I see dead LUNs on the hosts (esxcfg-mpath -L | grep dead).
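
A minimal sketch of how the dead-path check above could be summarised per device. It assumes the compact layout of `esxcfg-mpath -L`, with the path state in the second field (as `state:dead`) and the device identifier in the third. The sample lines are made up for illustration and are not output from these hosts, and the function name `dead_devices` is hypothetical.

```shell
# Hypothetical sample in a one-line-per-path layout; NOT real output from the thread.
mpath_sample='vmhba33:C0:T0:L0 state:active naa.aaaa vmhba33 0 0 0 NMP active san
vmhba33:C0:T0:L1 state:dead naa.bbbb vmhba33 0 0 1 NMP dead san
vmhba33:C1:T0:L1 state:dead naa.bbbb vmhba33 1 0 1 NMP dead san'

# Read path listings on stdin and print "<device> <number of dead paths>".
# Assumption: field 2 is "state:<value>" and field 3 is the device ID.
dead_devices() {
    awk '$2 == "state:dead" { print $3 }' | sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' "$mpath_sample" | dead_devices
```

On a host this could be fed with `esxcfg-mpath -L | dead_devices`. A device whose paths are all dead after its LUN was removed is the situation both posters describe.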
