Re: Disconnected host

mhoward1977 · ‎07-05-2012

Hi,

I have a host which will not reconnect to vCenter. When I attempt to stop and restart hostd, I receive the following error:

watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
sh: cannot kill pid 2833: No such process

and of course hostd still shows as running.

I have VMs running on this host and, if possible, need to recover without affecting their current state.

Any suggestions?

Thanks

JimKnopf99 · ‎07-05-2012

Hi,

Take a look at these:

http://kb.vmware.com/selfservice/documentLinkInt.do?micrositeID=&popup=true&languageId=&externalID=1...

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100556...

Frank

If you find this information useful, please award points for "correct" or "helpful".

AndreTheGiant · ‎07-05-2012

You can try to kill the process and restart it using SSH or the local console:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100556...

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro

mhoward1977 · ‎07-05-2012

Hi,

Thanks for the reply. I have already tried to restart the services via SSH. The response is posted in my initial query.

Also the .PID files do not exist.

Thanks

Mike

AndreTheGiant · ‎07-05-2012

Have you tried to manually stop it with the kill command?

mhoward1977 · ‎07-05-2012

Yes - sh: cannot kill pid 2833: No such process

AndreTheGiant · ‎07-05-2012

But is this the ID that you see with the ps command?

mhoward1977 · ‎07-05-2012

Yes, it is the ID returned by the following command:

~ # ps -g | grep hostd
8780146 8780146 nssquery            8780146 8853869 /usr/libexec/hostd/nssquery
2833 2833 hostd-worker        2833 2833 hostd
2985 2985 nssquery            2985 2833 /usr/libexec/hostd/nssquery
126057 2833 hostd-worker        2833 2833 hostd
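
For reference, the listing above can be reduced to the distinct owning IDs of the hostd entries with a short shell sketch. It assumes (this is not stated in the thread) that the second column is the ID of the owning process and that the last field is the command. `hostd_ids` is a hypothetical helper name, and the sample lines are simply copied from the output above rather than read from a live host.

```shell
#!/bin/sh
# Sample lines copied from the ps output above (not live data).
sample='8780146 8780146 nssquery            8780146 8853869 /usr/libexec/hostd/nssquery
2833 2833 hostd-worker        2833 2833 hostd
2985 2985 nssquery            2985 2833 /usr/libexec/hostd/nssquery
126057 2833 hostd-worker        2833 2833 hostd'

# hostd_ids (hypothetical helper): read ps-style lines on stdin and print each
# distinct value of column 2 for rows whose last field is exactly "hostd".
# Assumption: column 2 is the owning process ID.
hostd_ids() {
    awk '$NF == "hostd" && !seen[$2]++ { print $2 }'
}

ids=$(printf '%s\n' "$sample" | hostd_ids)
echo "$ids"   # prints 2833
```

Under that assumption, both hostd-worker rows belong to the same owner (2833), which would be why trying 126057 separately is unlikely to behave differently.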

AndreTheGiant · ‎07-05-2012

Have you tried with 126057?

mhoward1977 · ‎07-05-2012

Same response, I'm afraid.

I will add that this appears to be the result of some iSCSI storage changes. The host still has access to the targets, but some of the LUNs were removed from the target. They had been unmounted etc., but following a rescan the host became disconnected.

BharatR · ‎07-05-2012

Hi,

Try this KB article: http://kb.vmware.com/kb/1007261

Best regards, BharatR (VCP4 Certification #: 79230)

mhoward1977 · ‎07-05-2012

Hi, I tried this, but the PID files do not exist.

williamtaylor21 · ‎07-18-2012

Did you manage to kill the hostd process without a reboot in the end?

I have this exact issue: no connectivity in vCenter to ESXi 5.0 hosts (latest patches installed), and when attempting to restart hostd it reports "cannot kill pid 2901: No such process".

I really cannot reboot these hosts; they are running approximately 40 customer VMs.

Micha123 · ‎10-24-2012

Has anyone found a solution to this problem?

It is urgent!

mhoward1977 · ‎10-24-2012

In the end, after speaking with VMware support, we had no option but to re-install the host.

Our issue was caused by an incorrect procedure for removing iSCSI-based storage.

Micha123 · ‎10-24-2012

OK, that makes me nervous.
We are talking about 4 hosts and roughly 150 VMs.

We got new storage (1TB LUN) with the VMFS5 format.

After migrating the VMs from the old LUN to the new one, we unmounted and deleted the old LUN through vCenter. After a rescan of the HBAs and LUNs, the hosts lost their connections 2-3 hours later.

I see dead LUNs on the host (esxcfg-mpath -L | grep dead).
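
A rough sketch of how that check could be wrapped so it reports a count instead of raw lines. `count_dead` is a hypothetical helper name. The only host command used is the `esxcfg-mpath -L` call quoted above, and the sketch assumes the host shell provides `command -v` and `grep -c`. It does nothing on a machine where `esxcfg-mpath` is not installed.

```shell
#!/bin/sh
# count_dead (hypothetical helper): count stdin lines that contain "dead".
count_dead() {
    grep -c 'dead' || true   # grep exits non-zero when the count is 0
}

# Only query the real tool when it exists (i.e. on the ESXi host itself).
if command -v esxcfg-mpath >/dev/null 2>&1; then
    n=$(esxcfg-mpath -L | count_dead)
    echo "lines mentioning dead: $n"
fi
```

This only matches the word "dead" anywhere on a line, exactly like the grep in the post, so it is a quick indicator rather than a parser of the path listing.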
