VMware Cloud Community
KenMac
Contributor

vmware-hostd dies immediately

I couldn't connect the VIC to one of our servers (ESX 3.0.2), so I tried rebooting it. That didn't help: the server reboots, but I still can't connect to it with the client. I can ssh to it, but none of the virtual servers have started.

It seems that vmware-hostd starts and dies immediately. If I run 'service mgmt-vmware restart', it dies again. The last entries in /var/log/vmware/hostd.log are:

2008-04-17 17:03:38.706 'ServiceSystem' 3076472960 verbose Invoking command /etc/init.d/ntpd status
2008-04-17 17:03:38.723 'ServiceSystem' 3076472960 verbose Command finished with status 0
2008-04-17 17:03:38.724 'FirewallSystem' 3076472960 verbose Loading firewall configuration file '/etc/vmware/firewall/services.xml.old'
2008-04-17 17:03:38.729 'App' 3076472960 panic Application error: File Format Exception: Node has both value and children
2008-04-17 17:03:38.730 'App' 3076472960 panic Backtrace generated:
eip 0x127a77e
eip 0x11d1369
eip 0x118b4b5
eip 0x11a2c42
eip 0x11a2400
eip 0x825043b
eip 0x824fe7e
eip 0x824c6d5
eip 0x82533f7
eip 0x82c4653
eip 0x82d1574
eip 0x82f94ad
eip 0x8305de4
eip 0x85179a
eip 0x8090e51

This system is supposed to be supported by HP but they won't call back. Meanwhile, I've got 20 servers down and no idea how to fix this.

Anyone? Please?


Accepted Solutions
kjb007
Immortal

Remove the file /etc/vmware/firewall/services.xml.old

It appears there's a loop that goes through that folder and tries to load everything in it. Delete that file and restart hostd.
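
For anyone who would rather move the file aside than delete it, here is a rough sketch of that cleanup. The function names and the list of backup suffixes (.old, .bak, .orig, ~) are my own assumptions, not anything hostd documents, and it relies on GNU find's -maxdepth.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ESX.

# List files directly inside a directory whose names look like leftover
# backups. The suffix list is an assumption; adjust it to your own habits.
list_stray_backups() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.old' -o -name '*.bak' -o -name '*.orig' -o -name '*~' \) | sort
}

# Move every stray backup out of the directory into a holding directory,
# so nothing is lost and the config folder holds only live files.
move_stray_backups() {
    dir="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    list_stray_backups "$dir" | while IFS= read -r f; do
        mv -- "$f" "$dest"/ && echo "moved $f"
    done
}
```

On the affected host that would be something like 'move_stray_backups /etc/vmware/firewall /root/firewall-backups' followed by 'service mgmt-vmware restart'. The holding directory is just an example path; anywhere outside /etc/vmware/firewall should do.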

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

7 Replies
danpalacios
Hot Shot

It sounds like you have some kind of drive/OS corruption. Are the VMs on local storage or on a SAN? Could you possibly move the VMs until you resolve the issue with the host?

KenMac
Contributor

Just started moving the VMs.

We should be able to get by for a while. Really need to figure out how to troubleshoot this.

Unfortunately, VMware got into production before anyone here really got to know it. We use the 'Just too late' training model.

Randy_B
Enthusiast

Try checking whether the /var/log partition is full. That'll kill hostd.

vdf -h
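
If you want that check as something scriptable, here is a minimal sketch. It uses plain 'df -P' rather than vdf, since the log partition is an ordinary filesystem. The function names and the 95% default threshold are my own assumptions, and it assumes the filesystem name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helpers for checking how full a partition is.

# Print the usage of the filesystem holding the given path as a bare
# integer (no % sign), taken from the Capacity column of 'df -P'.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) if usage is at or above the threshold.
# The threshold defaults to 95 if no second argument is given.
is_nearly_full() {
    pct=$(usage_percent "$1") || return 2
    [ -n "$pct" ] && [ "$pct" -ge "${2:-95}" ]
}
```

For example: 'is_nearly_full /var/log && echo "log partition is nearly full"'.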

KenMac
Contributor

/dev/sda6 1.4G 57M 1.3G 5% /var/log

No, I've been caught by a full log partition before. Not this time.

KenMac
Contributor

OK. VMs are successfully moving to a new server so I can start breathing right again.

This is no longer urgent, but I still need a fix (hopefully something other than rebuilding the server).

KenMac
Contributor

That was it. I had renamed services.xml to services.xml.old and copied in a file from another server. Apparently the bad one was still being read. Once I deleted the bad one and restarted, vmware-hostd came up.
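
In hindsight the log pointed straight at the culprit: the last firewall file loaded before the panic was the .old one. A rough sketch of pulling that out of the log automatically is below. The function name is made up, it only looks at the first panic in the file, and it assumes the log lines have the same shape as the ones I posted.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the last firewall configuration
# file that hostd reported loading before the first panic line in the log.
last_firewall_file_before_panic() {
    # First sed: keep everything up to and including the first panic line.
    # Second sed: extract the quoted path from each "Loading firewall
    # configuration file" line. tail keeps only the last one.
    sed '/ panic /q' "$1" \
        | sed -n "s/.*Loading firewall configuration file '\([^']*\)'.*/\1/p" \
        | tail -n 1
}
```

Run as 'last_firewall_file_before_panic /var/log/vmware/hostd.log' (use whatever path your hostd log actually lives at).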

I'm not sure what happened to change these, but they had probably been like that for months.

Thanks!!

Ken
