VMware Cloud Community
Seventh77
Enthusiast

Adding host to cluster: "Timed waiting for vpxa to start"

I have a cluster of 15 ESXi 5.0 hosts, with a 5.1 vCenter / Enterprise Plus license. This has been running well for quite some time, but two of my hosts were disconnected tonight and I am troubleshooting it now.

When I try to reconnect them, I get the error "A general system error has occurred: Timed waiting for vpxa to start". I did some searching and found that this is generally related to snapshots, but none of the VMs on either of the affected hosts have any snapshots at all.

I've tried:

- Restarting vCenter services

- Rebooting vCenter

- Restarting services on the hosts

- Warm reboot of hosts

- Hard/cold reboot of hosts

- Powering off all VMs on the hosts and entering maintenance mode

I've also confirmed that:

- DNS resolution works from the hosts to vCenter and vice versa

- Time is correct on vCenter and the hosts

- Network connectivity is good between vCenter and the hosts (all on the same switch)

However, nothing seems to work, and I still can't add these hosts back to my cluster. What's odd is that I can connect to them directly with the vSphere Client, but I just can't get them back into vCenter. I get through the usual prompts when adding a host (where it asks you to assign a license, etc.) and it sees the VMs on the host as I'm adding it, but it times out with this error after about 5 minutes.

Any insight would be very much appreciated.

marcelo_soares
Champion

Check:

/var/log/vpxa.log

/var/log/vmkernel.log

One of them should shed some light on the issue. Also check whether you have enough free space on the ramdisk (with df -h). If you see anything at 0% free, or even negative numbers, paste the output here.
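As a rough sketch of the free-space check, the helper below reads df-style output on stdin and prints any filesystem at or above a usage threshold. The function name flag_full and the 90% default are made up for illustration, and it assumes the usage column is the only field ending in "%" and that the mount point is the last field.

```shell
# Hypothetical helper: flag filesystems at or above a usage threshold.
# Reads df-style output on stdin; the threshold (percent) defaults to 90.
flag_full() {
    limit="${1:-90}"
    awk -v limit="$limit" '
        NR > 1 {                              # skip the header line
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {       # the Use% column
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= limit + 0) {
                        print $NF, $i         # mount point and usage
                    }
                }
            }
        }'
}

# Usage on the host (not run here):
#   df -h | flag_full 90
```

An empty result means nothing is over the threshold; any line printed is worth pasting into the thread.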

Question: is this a stock VMware install or an OEM image (Dell, HP, IBM)?

Marcelo Soares
dhanarajramesh

Is EVC enabled? If so, disable it and try again. Also check the host IP address in the vpxa config file. If it is correct, back up vpxa.cfg and try rebuilding it on the host. Thanks.

Seventh77
Enthusiast

Thanks for the replies. Here's the tail of vpxa.log on one of the two hosts that have the issue:

---

2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (111): VmSnapshot:CreateMoVm

2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (112): VmLayout:CreateMoVm

2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (113): VmStorage:CreateMoVm

2013-12-02T15:03:34.475Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no (114): VmAdded

2013-12-02T15:03:34.475Z [3E819B90 info 'Default' opID=WFU-4e735eb6] [VpxaMoHost::QueryOverheadEx] Found file backing info for device 2000 of type vim.vm.device.VirtualDisk, removing vpxd moref vim.Datastore:10.86.254.251:/vol/nfs_fas2020a before passing to hostd

2013-12-02T15:03:34.475Z [3E819B90 info 'Default' opID=WFU-4e735eb6] [VpxaMoHost::QueryOverheadEx] Found network backing info for device 4000 of type vim.vm.device.VirtualE1000, removing vpxd moref vim.Network:HaNetwork-INSOC-W-VLAN before passing to hostd

---

Here's vmkernel.log:

---

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 12, /vmfs/devices/char/vob/VM type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 13, /vmfs/devices/char/vob/External type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 14, /vmfs/devices/char/vob/iScsi type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 15, /vmfs/devices/char/vob/Migrate type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 16, /vmfs/devices/char/vob/PageReti type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 17, /vmfs/devices/char/vob/Visorfs type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 18, /vmfs/devices/char/vob/Hardware type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 19, /vmfs/devices/char/vob/Vfat type CHAR: Busy

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 3232: Unimplemented operation on 0x4100233874b0/SOCKET_UNIX_SERVER

2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 20, /var/run/vmware/vobd-user-ctx.s type SOCKET_UNIX_SERVER: Not implemented

---

Nothing is at 0% free on the ramdisk; there's plenty of space available. This is a standard VMware build, and as I said, these two hosts have worked fine for months now. They simply showed up as disconnected, and I'm trying to re-add them to the cluster.

I've tried it with EVC enabled and disabled, no change - same error.

Seventh77
Enthusiast

I don't see an IP address at all in /etc/vmware/hostd/config.xml, or even a field where it looks like it should be. How do I go about rebuilding it? I'll try anything at this point.

Seventh77
Enthusiast

Aha! I found this KB article which sorted it out:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203189...

It's worth mentioning (for anyone Googling this) that you'll need to chmod 777 /etc/vmware/vpxa/vpxa.cfg before you can edit it, and then chmod 444 it once finished. After restarting the vpxa service, I was able to add the host again.
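For anyone repeating this, the unlock, edit, relock sequence could be wrapped roughly as below. This is only a sketch: edit_readonly_cfg is a made-up helper name, it uses chmod u+w instead of the 777 mentioned above (a deliberate substitution, on the assumption that you are editing as the file's owner), and it keeps a .bak copy, which the post does not do. What to change inside vpxa.cfg comes from the KB article and is not reproduced here.

```shell
# Hypothetical helper: run a command against a read-only config file,
# then set the file back to read-only (444) as described in the post.
edit_readonly_cfg() {
    cfg="$1"
    [ "$#" -ge 2 ] || { echo "usage: edit_readonly_cfg FILE COMMAND [ARGS...]" >&2; return 2; }
    shift
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    cp -pf "$cfg" "$cfg.bak" || return 1   # keep a copy before touching it
    chmod u+w "$cfg" || return 1           # owner-writable rather than 777 (assumption)
    "$@" "$cfg"                            # run the command with the file as its last argument
    status=$?
    chmod 444 "$cfg"                       # restore read-only
    return "$status"
}

# Usage on the host (not run here). The init script path is an assumption
# about how the vpxa service is restarted on ESXi 5.x:
#   edit_readonly_cfg /etc/vmware/vpxa/vpxa.cfg vi
#   /etc/init.d/vpxa restart
```

The function returns the exit status of the edit command, so a failed edit is still visible after the permissions are restored.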

marcelo_soares
Champion

Good. Thanks for sharing. ;)

Marcelo Soares
Seventh77
Enthusiast

Update to this - I've now had another server in this same cluster go down with the exact same problem - so that makes two blades and one physical server all with the same issue (and the same fix).

While it's nice to know how to fix this - why is this happening? This is way too much downtime.

marcelo_soares
Champion

- Have you changed anything on the cluster?

- Added HA/DRS, created more VMs, etc?

- What is your VM growth rate per month?

- Have you changed log/statistics settings for vCenter?

- Can you check whether you have a lot of snapshots in the environment? (Over SSH, run: find /vmfs/volumes/ -name '*delta*')
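A small sketch of that snapshot check, with the pattern quoted so the shell hands it to find unexpanded. The helper names are invented for illustration, and the per-directory count is an extra convenience that the reply above does not ask for.

```shell
# Hypothetical helper: list snapshot delta files under a datastore root.
list_snapshot_deltas() {
    root="${1:-/vmfs/volumes}"
    # Single quotes stop the shell from expanding *delta* against the
    # current directory before find ever sees the pattern.
    find "$root/" -name '*delta*' 2>/dev/null
}

# Hypothetical helper: number of delta files per directory.
count_deltas_per_dir() {
    list_snapshot_deltas "$1" | sed 's|/[^/]*$||' | sort | uniq -c
}

# Usage on the host (not run here):
#   count_deltas_per_dir /vmfs/volumes
```

Directories with a high count are the VMs to look at first.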

Marcelo Soares
Seventh77
Enthusiast

I started a new thread for this, so that this one can stay as a complete/answered thread. I'll answer you there - thanks!

Multiple host disconnects with "failed to crossdup fd xxx" errors in vmkernel.log

Arador68
Contributor

When you exit the editor (assuming vi), put a ! after the write command (for example :wq!) and you won't have to chmod anything.
