I was having a problem with not being able to connect the VIC to one of our servers (ESX 3.0.2), so I tried rebooting it. That didn't help. The server reboots, but I can't connect to it with the client. I can ssh to it, but none of the virtual servers have started.
It seems that vmware-hostd starts and dies immediately. If I run 'service mgmt-vmware restart', it dies again. The last entries in /var/log/vmware/host.d are:
2008-04-17 17:03:38.706 'ServiceSystem' 3076472960 verbose Invoking command /etc/init.d/ntpd status
2008-04-17 17:03:38.723 'ServiceSystem' 3076472960 verbose Command finished with status 0
2008-04-17 17:03:38.724 'FirewallSystem' 3076472960 verbose Loading firewall configuration file '/etc/vmware/firewall/services.xml.old'
2008-04-17 17:03:38.729 'App' 3076472960 panic Application error: File Format Exception: Node has both value and children
2008-04-17 17:03:38.730 'App' 3076472960 panic Backtrace generated:
This system is supposed to be supported by HP but they won't call back. Meanwhile, I've got 20 servers down and no idea how to fix this.
Anyone? Please?
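Remove the file /etc/vmware/firewall/services.xml.old.
It appears hostd loads every file in that folder, not just services.xml, so the old copy is still being read (that's the 'Loading firewall configuration file' line right before the panic). Delete that file and restart hostd.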
-KjB
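A minimal shell sketch of that check, assuming the ESX 3.x service console and assuming services.xml is the only file that should live in that directory (if a vendor agent has added its own files there, review the list rather than removing everything). list_firewall_extras is a hypothetical helper name, not a VMware tool:

```shell
#!/bin/sh
# Hypothetical helper: print every regular file in the firewall directory
# other than services.xml, i.e. candidates hostd may be loading by mistake.
# The directory is a parameter (default: the path from this thread) so it
# can be tried on a scratch copy first.
list_firewall_extras() {
    fwdir="${1:-/etc/vmware/firewall}"
    for f in "$fwdir"/*; do
        [ -f "$f" ] || continue                       # skip dirs / unmatched glob
        [ "$(basename "$f")" = services.xml ] || echo "$f"
    done
}

# After reviewing the output, move the stale file out of the directory
# (moving instead of deleting keeps a copy) and restart the management service:
#   mv /etc/vmware/firewall/services.xml.old /root/
#   service mgmt-vmware restart
```

Moving the file somewhere hostd doesn't scan has the same effect as deleting it, but leaves you something to compare against if the replacement services.xml turns out to be wrong.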
It sounds like you have some kind of drive/OS corruption. Are the VMs on local storage or on a SAN? Could you possibly move the VMs until you resolve the issue with the host?
Just started moving the VMs.
We should be able to get by for a while. I really need to figure out how to troubleshoot this.
Unfortunately, VMware got into production before anyone here really got to know it. We use the 'Just too late' training model.
Try checking whether the /var/log partition is full; that'll kill hostd.
vdf -h
/dev/sda6 1.4G 57M 1.3G 5% /var/log
No. I've been caught by a full log partition before, but not this time.
OK. VMs are successfully moving to a new server so I can start breathing right again.
This is no longer urgent, but I still need a fix (hopefully one that doesn't involve rebuilding the server).
That was it. I had renamed services.xml to services.xml.old and copied in a file from another server. Apparently the bad one was still being read. Once I deleted the bad one and restarted, vmware-hostd came up.
I'm not sure what happened to change these, but they had probably been like that for months.
Thanks!!
Ken