VMware Cloud Community
Jemimus
Contributor

Reviving hostd without rebooting the host

Hi all,

Due to a storage-presentation mistake I made, I have an ESXi 5.1 host that is now unmanageable and disconnected from vCenter.

All the VMs are still happily running, but I would like to avoid having to shut them all down if possible.

I have been trying my hardest to breathe some life back into the management agents.

No matter what I try, hostd won't shut down or restart properly; its status is always 'running' even though it is completely unresponsive.

~ # /etc/init.d/hostd stop
watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
hostd stopped.
~ # /etc/init.d/hostd status
hostd is running.

Stuff I have tried so far:

KB1003490 - /etc/init.d/hostd restart, /etc/init.d/vpxa restart, services.sh restart

KB1005566 - deleted vmware-hostd.PID and watchdog-hostd.PID, then tried to forcefully kill hostd with kill -9

KB1003494

KB1002849 - As far as I can see, no other process is keeping hostd busy, and the host is very quiet. hostd itself doesn't seem to be doing anything.
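The KB1005566 step (clearing stale PID files and force-killing hostd) can be sketched as a small POSIX shell function. This is a minimal sketch, not VMware's procedure: the function name force_kill_from_pidfile is hypothetical, and it assumes the PID file holds a single numeric PID. It also reports when a process survives SIGKILL, which is one possible explanation for the symptoms above: a process stuck waiting on I/O to a lost LUN may not exit until that I/O returns.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): force-kill the process named in a
# PID file, then remove the PID file. Assumes the file holds one numeric PID.
force_kill_from_pidfile() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "no PID file: $pidfile"
        return 1
    fi
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric contents.
    case "$pid" in
        ''|*[!0-9]*) echo "invalid PID in $pidfile"; return 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process exists
    # (and whether we may signal it, so run this as root).
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid"
        sleep 1
        if kill -0 "$pid" 2>/dev/null; then
            echo "PID $pid survived SIGKILL (possibly blocked on I/O)"
            return 2
        fi
        echo "killed PID $pid"
    else
        echo "PID $pid not running; stale PID file"
    fi
    rm -f "$pidfile"
}
```

For example, `force_kill_from_pidfile /var/run/vmware/vmware-hostd.PID`, assuming vmware-hostd.PID lives alongside watchdog-hostd.PID in /var/run/vmware/. A return code of 2 would suggest that killing hostd cannot work until the storage problem is resolved.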

Also tried:

Johnny's Random Tips: Restart hostd - "The real way". This involves deleting the files in /var/lib/vmware/hostd/stats/. Didn't help.

services.sh restart hangs when it gets to /etc/init.d/vpxa, so I generated the list of services myself and manually tried to restart all of the rest of them. That didn't really help either.

~ # /sbin/chkconfig -o | sed -n -e '1!G' -e 'h' -e '$p'
/etc/opt/init.d/vmware-fdm
/etc/init.d/xorg
/etc/init.d/wsman
/etc/init.d/snmpd
/etc/init.d/sfcbd
/etc/init.d/sfcbd-watchdog
/etc/init.d/vpxa
/etc/init.d/vobd
/etc/init.d/memscrubd
/etc/init.d/cdp
/etc/init.d/lacp
/etc/init.d/smartd
/etc/init.d/dcbd
/etc/init.d/slpd
/etc/init.d/vprobed
/etc/init.d/hostd
/etc/init.d/rhttpproxy
/etc/init.d/lbtd
/etc/init.d/storageRM
/etc/init.d/sensord
/etc/init.d/usbarbitrator
/etc/init.d/ntpd
/etc/init.d/lsassd
/etc/init.d/netlogond
/etc/init.d/DCUI
/etc/init.d/lwiod
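The sed expression in the chkconfig pipeline reverses its input line by line (the same effect as tac): `1!G` appends the hold space to every line except the first, `h` copies the result back into the hold space, and `$p` prints the accumulated, reversed text at the last line. The manual restart pass over that list can be sketched as below. The function names reverse_lines and restart_services and the SKIP and DRY_RUN variables are hypothetical illustrations, not ESXi options, and restarting services in this order is simply what the post describes, not a verified procedure.

```shell
#!/bin/sh
# Reverse line order using the same sed idiom as the chkconfig pipeline.
reverse_lines() {
    sed -n -e '1!G' -e 'h' -e '$p'
}

# Hypothetical sketch: restart each init script read from stdin, in order,
# skipping any path that matches the shell pattern in $SKIP (e.g. '*/vpxa',
# the script that hangs). With DRY_RUN=1, only print what would be run.
restart_services() {
    while IFS= read -r svc; do
        [ -z "$svc" ] && continue          # ignore blank lines
        case "$svc" in
            $SKIP) echo "skip $svc"; continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would run: $svc restart"
        else
            "$svc" restart || echo "failed: $svc" >&2
        fi
    done
}
```

A cautious first run could be `/sbin/chkconfig -o | reverse_lines | SKIP='*/vpxa' DRY_RUN=1 restart_services`, dropping DRY_RUN=1 once the printed order looks right.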

So anyway, I am plain out of ideas.

My next step is to arrange an organised shutdown of the VMs running on the host, so I can reboot it.

But perhaps someone here has more ideas or other things I can try.

2 Replies
chris2018pro
Enthusiast

Had the same issue and had to restart a 6.5 host. I couldn't find a solution without a reboot.

MBreidenbach0
Hot Shot

Older ESXi versions had real problems with All Paths Down (APD) scenarios: hostd would crash because it kept trying to access the lost LUN forever. Now there's a timeout setting for this (the Misc.APDTimeout advanced option).

You'll probably have to shut down the VMs and then restart the host.
