Hi,
When trying to run:
service mgmt-vmware restart
Stopping VMware ESX Server Management services:
VMware ESX Server Host Agent Watchdog
Starting VMware ESX Server Management services:
VMware ESX Server Host Agent (background)
Availability report startup (background)
The restart of the Server Host Agent fails on stopping... The host now appears disconnected in VirtualCenter, and I'm not able to reconnect: it says the host is not available on the network, or not contactable...
The VMs are still working, and a restart of the ESX server is not an easy option... How can I solve this? Any ideas?
A reboot is probably best, but you could try killing the vmware-hostd process manually.
Use ps to find the process IDs.
ps aux | grep hostd
root 1579 0.0 0.3 4268 964 ? S Apr15 0:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/vmware-hostd-support /usr/sbin/vmware-hostd -u
root 1608 0.9 19.4 88548 52364 ? S Apr15 18:10 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
kill -9 the watchdog, in this case PID 1579, and then vmware-hostd (PID 1608). Do another ps to check they're both gone, then try restarting mgmt-vmware.
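The sequence above can be sketched as a dry run (the PIDs in your environment will differ from the example output; this loop only prints what it would kill, so nothing happens until you swap the echo for a real kill -9 yourself):

```shell
# Dry-run sketch of the kill sequence: list the watchdog and hostd
# processes and print the PIDs that would be killed. The bracketed
# grep pattern ('[v]mware-...') keeps grep itself out of the results.
ps -e -o pid=,args= | grep '[v]mware-watchdog\|[v]mware-hostd' |
while read pid args; do
    echo "would kill -9 $pid"   # replace echo with: kill -9 "$pid"
done
echo "dry run complete"
# afterwards: service mgmt-vmware restart
```

Killing the watchdog before hostd matters: the watchdog's whole job is to respawn hostd, so in the other order it just restarts the process you killed.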
Alex
What we have had to do in the past is to kill the watchdog and hostd on the host and then restart them.
cd /var/run/vmware
and
cat vmware-hostd.PID
and
cat watchdog-hostd.PID
to get the PIDs of the services. Then run
kill -9 <PID>
for each of them. Be very careful that you type the correct PID, as you don't want to kill the wrong process!
Once this is done:
rm -f watchdog-hostd.PID
rm -f vmware-hostd.PID
Then restart the mgmt-vmware service.
All of our VMs continued to run and the Host came back into VC
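A guarded sketch of the PID-file procedure above, again as a dry run: it reports what it would kill instead of killing, and reports a missing .PID file rather than failing. Swap the echo for kill -9 "$pid" (and then remove the .PID files) once the numbers look right:

```shell
# Read each PID file under /var/run/vmware and report the PID it
# holds; a missing file is reported instead of treated as an error.
dir=/var/run/vmware
for f in watchdog-hostd.PID vmware-hostd.PID; do
    if [ -f "$dir/$f" ]; then
        pid=$(cat "$dir/$f")
        echo "would kill -9 $pid (from $f)"   # then: rm -f "$dir/$f"
    else
        echo "no PID file: $dir/$f"
    fi
done
# afterwards: service mgmt-vmware restart
```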
If it does not come back after restarting the service then you may need to remove the virtual center agent and add the host back into VC. Stop the vpxa service and remove the package as follows.
Get the full package name by typing:
rpm -qa | grep -i vmware-vpxa
This returns the agent package installed on the host. Then run the following to uninstall it:
rpm -e VMware-vpxa-2.0.x.xxxxx
(where 2.0.x.xxxxx is the package version found in the previous step)
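The query and erase steps can be combined. This sketch only prints the rpm -e command it would run (and copes with rpm or the package being absent) rather than uninstalling anything:

```shell
# Find the installed vpxa agent package, if any, and show the
# uninstall command described in the steps above.
pkg=$(rpm -qa 2>/dev/null | grep -i vmware-vpxa)
if [ -n "$pkg" ]; then
    echo "would run: rpm -e $pkg"   # run rpm -e yourself once verified
else
    echo "vpxa package not found (or rpm unavailable)"
fi
```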
Once you get the host back into VC you should probably put it into maintenance mode and VMotion the VMs off and reboot it as well.
Hi,
Output of command:
ps aux | grep hostd
root 24309 0.0 0.0 3684 676 pts/0 S 18:33 0:00 grep hostd
So now I should kill this process with a command such as:
kill -9 24309
Is this correct? This won't affect my running VMs, right?
But, I don't understand...
Result of command:
cat vmware-hostd.PID
2247[root@mpaesx4 vmware]#
Result of command:
cat watchdog-hostd.PID
cat: watchdog-hostd.PID: No such file or directory
So, for killing the process, should I use PID 2247 or PID 24309?
You should try a standard kill first to see if that works; if that still fails, do it with -9, which is kill with prejudice.
kill 24309
then if that fails,
kill -9 24309
Also FYI - hostd will probably have a parent process and a spawned child, so you should get back two processes.
Here is quick example:
[root@himalaya VMware]# ps -ef | grep hostd | grep -v grep
root 1511 1 0 Mar22 ? 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/vmware-hostd-support /usr/sbin/vmware-hostd -u
root 1521 1511 0 Mar22 ? 05:37:11 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
Here you'll want to kill both the parent and the child, then restart the service, which will auto-respawn the child process.
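Since the watchdog (parent) is what respawns hostd (child), the parent should go first. A sketch that lists both with their parent PIDs so the relationship is visible, again printing rather than killing:

```shell
# List hostd-related processes with PID and PPID so the watchdog
# parent can be identified and killed first. Dry run: prints only.
ps -e -o pid=,ppid=,args= | grep '[h]ostd' |
while read pid ppid args; do
    echo "pid=$pid ppid=$ppid cmd=$args"
done
echo "listing done"
# Kill the entry whose PID appears as the other's PPID first (the
# watchdog), then hostd; `service mgmt-vmware restart` respawns both.
```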
=========================================================================
William Lam
VMware vExpert 2009
VMware ESX/ESXi scripts and resources at:
That is your grep PID
go to the /var/run/vmware directory and run
cat vmware-hostd.PID
and so on, as mentioned above. It will not kill your VMs. Once you kill the processes and delete the .PID files, restarting the mgmt-vmware service will recreate the .PID files for you.
24309 is the PID of the grep command; 2247 is the hostd PID.
(sorry )
We have had to do this several times unfortunately.
When trying to kill process 2247 I get:
2247[root@mpaesx4 vmware]# kill 2247
-bash: kill: (2247) - No such process
Any other ideas?
I would delete vmware-hostd.PID and watchdog-hostd.PID and then restart:
service mgmt-vmware restart
This will recreate the two .PID files and the service should start up. If the host still cannot be added back to VC, you would then have to remove the vpxa agent and add the host back to VC.
Removed vmware-hostd.PID. There is no watchdog-hostd.PID file to remove!
The restart of the mgmt service ended up with the same result when trying to stop the ESX Server Host Agent...
FAILED...
Did it start? The stop would fail because the service was not started yet. If the services started back up, what do you get from:
service mgmt-vmware status
If it is up, try restarting it again. What do you have in /var/log/messages?
When we had this problem before, we had some "disk appears confused" events. Ours was a bad CD-ROM drive; we had to remove it to get the host working again.
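A small sketch for pulling the relevant recent lines out of the syslog (/var/log/messages is the ESX service console default; the "disk appears confused" string is the event mentioned above):

```shell
# Show recent hostd/watchdog and disk-error lines from the syslog,
# or say so if the log is not readable on this box.
log=/var/log/messages
if [ -r "$log" ]; then
    grep -i 'hostd\|disk appears confused' "$log" | tail -20
else
    echo "cannot read $log"
fi
echo "log check done"
```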
Any luck?
Same problem:
Stopping VMware ESX Server Management services:
VMware ESX Server Host Agent Watchdog
Starting VMware ESX Server Management services:
VMware ESX Server Host Agent (background)
Availability report startup (background)
Last entries on messages are:
Apr 16 19:09:31 mpaesx4 VMware[init]: Begin '/usr/sbin/vmware-hostd -u', min-uptime = 60, max-quick-failures = 5, max$
Apr 16 19:09:32 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 1 seconds (quick failure 1)
Apr 16 19:09:32 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:32 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25118 Segmentation fault (core dumped) setsid $
Apr 16 19:09:32 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:09:33 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25218 Segmentation fault (core dumped) setsid $
Apr 16 19:09:33 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 1 seconds (quick failure 2)
Apr 16 19:09:33 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:33 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:09:33 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25288 Segmentation fault (core dumped) setsid $
Apr 16 19:09:33 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 3)
Apr 16 19:09:33 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:34 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:09:34 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25358 Segmentation fault (core dumped) setsid $
Apr 16 19:09:34 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 4)
Apr 16 19:09:34 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:34 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:09:34 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25428 Segmentation fault (core dumped) setsid $
Apr 16 19:09:34 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 5)
Apr 16 19:09:34 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:35 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:09:35 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25498 Segmentation fault (core dumped) setsid $
Apr 16 19:09:35 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 6)
Apr 16 19:09:35 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:09:35 mpaesx4 watchdog-hostd: End '/usr/sbin/vmware-hostd -u', failure limit reached
Apr 16 19:24:42 mpaesx4 watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID not found
Apr 16 19:24:42 mpaesx4 watchdog-hostd: Unable to terminate watchdog: Can't find process
Apr 16 19:24:42 mpaesx4 watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID not found
Apr 16 19:24:42 mpaesx4 watchdog-hostd: Begin '/usr/sbin/vmware-hostd -u', min-uptime = 60, max-quick-failures = 5, m$
Apr 16 19:24:42 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:24:42 mpaesx4 VMware[init]: Begin '/usr/sbin/vmware-hostd -u', min-uptime = 60, max-quick-failures = 5, max$
Apr 16 19:24:42 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25625 Segmentation fault (core dumped) setsid $
Apr 16 19:24:42 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 1)
Apr 16 19:24:42 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:24:43 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:24:43 mpaesx4 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 2)
Apr 16 19:24:43 mpaesx4 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Apr 16 19:24:43 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25725 Segmentation fault (core dumped) setsid $
Apr 16 19:24:43 mpaesx4 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Apr 16 19:24:43 mpaesx4 VMware[init]: /usr/bin/vmware-watchdog: line 192: 25800 Segmentation fault (core dumped) setsid $
Still cannot connect to the esx host through Virtual Center.
What do you have in /var/log/vmware/hostd.log ?
To get it back into VC you will probably need to remove the vpxa and then add it back into VC. Have you tried to add it back into the VC by first removing it from Virtual Center? Or does it still show disconnected?
hostd.log shows this:
Hw info file: /etc/vmware/hostd/hwInfo.xml
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51526
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-515$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51527
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-515$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
Hw info file: /etc/vmware/hostd/hwInfo.xml
Hw info file: /etc/vmware/hostd/hwInfo.xml
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51567
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51567
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51568
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51568
Hw info file: /etc/vmware/hostd/hwInfo.xml
Hw info file: /etc/vmware/hostd/hwInfo.xml
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51595
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-515$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51596
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-515$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51606
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51606
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51607
[[2009-04-16 17:03:29.498 'TaskManager' 21154736 info] Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51607
Hw info file: /etc/vmware/hostd/hwInfo.xml
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51621
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51622
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51$
Hw info file: /etc/vmware/hostd/hwInfo.xml
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51633
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51633
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51634
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-516$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resourc$
Hw info file: /etc/vmware/hostd/hwInfo.xml
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51651
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-516$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51652
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-516$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51660
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-516$
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-51661
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-516$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
GetPropertyProvider failed for haTask-ha-root-pool-vim.Resour$
Hw info file: /etc/vmware/hostd/hwInfo.xml
I did not try to remove vpxa or remove the host from VirtualCenter, because I still haven't been able to get the mgmt services to restart properly... so I believe removing the host from VirtualCenter wouldn't help. Also, I'm still not sure how to stop the vpxa service and remove the agent from the host.
At this point I'm in exactly the same situation: when I try to restart the services I get a failure on the ESX Server Host Agent. It reports a failure on stopping but OK on starting (as I already mentioned). Although it says it started OK, VirtualCenter isn't able to contact the ESX host and reports that the management services are not responding...
Your VMs are running, correct? I will see what I can find tonight. Did anything change just before you started having this issue? Why were you restarting the mgmt service?
Yes, the VMs are running correctly...
Thanks so much for your help.
Nothing changed before the issue. We only had a problem with a VM (which is on another node of the cluster) that failed its VCB backup, stating that there was another operation pending. We had solved this issue before with a simple restart of the VMware mgmt services; that is why we were restarting the services on the hosts. All of the other nodes worked fine, but this one failed and its host became disconnected. Now I cannot get it to reconnect, because the mgmt services apparently are not restarting correctly on this host...
You shouldn't need to delete hostd; I don't think I ever have, certainly not recently. I'd look at other things first.
What's your build? ("vmware -v"). Have you got anything in /var/core? I'd also suggest trying "df -h": I've seen hostd get mucked up if there are too many problems with root, or /var filling up. Anything in /var/core or /var/crash can probably go.
What's your service console memory set to? Believe me, I wish I could shout it to the heavens: if you're still running 272, increase it to 800! :smileycool:
Now that I think about it, it might also be worth running "mount". I saw a service console the other day with a completely read-only filesystem; that was a reboot for sure.
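The checks suggested above, gathered into one pass. "vmware -v" exists only on an ESX service console, so it is guarded; the rest are standard Linux commands:

```shell
# Build number (ESX only), free space on / and /var, leftover core
# files, and any filesystems mounted read-only.
vmware -v 2>/dev/null || echo "vmware -v not available on this box"
df -h / /var
ls /var/core /var/crash 2>/dev/null
mount 2>/dev/null | grep '\bro\b'
echo "checks finished"
```

If /var/core is full of hostd cores, or "df -h" shows / or /var at 100%, clean those up before trying mgmt-vmware again; a read-only mount in the last check generally means a reboot.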