I have a cluster of 15 ESXi 5.0 hosts, with a 5.1 vCenter / Enterprise Plus license. This has been running well for quite some time, but two of my hosts were disconnected tonight and I am troubleshooting it now.
When I go to reconnect them, I get an error saying that "A general system error has occurred: Timed waiting for vpxa to start". I did some searching and found that this was generally related to snapshots, but none of the VMs on either of my un-connectable hosts have any snapshots at all.
I've tried:
- Restarting vCenter services
- Rebooting vCenter
- Restarting services on the hosts
- Warm reboot of hosts
- Hard/cold reboot of hosts
- Powering off all VMs on hosts and entering maintenance mode
I've also confirmed:
- DNS is working between hosts and vCenter, and vice versa
- Time is correct on vCenter and hosts
- Network connectivity is good between vCenter and hosts (all on the same switch)
However, nothing has worked, and I still can't add these hosts back to my cluster. What's odd is that I can connect to them directly with vSphere, but I just can't get them back into my vCenter. I get through the usual prompts when adding a host (where it asks you to assign a license, etc.) and it sees the VMs on the host as I'm adding it, but it times out with this error after about 5 minutes.
Any insight would be very much appreciated.
Check:
/var/log/vpxa.log
/var/log/vmkernel.log
One of them should shed some light on the issue. Also check whether you have enough free space on the ramdisk (with df -h). If you see any filesystem at 0% free, or even negative numbers, paste the output here.
Question: is this a stock VMware install or an OEM version (Dell, HP, IBM)?
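If you'd rather not eyeball the table, here is a minimal sketch of that free-space check. It assumes df -h prints a header row followed by one row per filesystem with a Use% column; the sample text and the 90% threshold are made up for illustration and are not output from the hosts in this thread.

```python
def full_filesystems(df_output, threshold=90):
    """Return (mount point, use%) pairs at or above threshold, or negative.

    Assumes each data row contains '<N>%' followed by the mount point,
    which is how df -h normally lays out its columns.
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        for i, field in enumerate(fields):
            # Look for the percentage column, e.g. '97%' or '-3%'
            if field.endswith("%") and field[:-1].lstrip("-").isdigit():
                use = int(field[:-1])
                mount = " ".join(fields[i + 1:])
                if use >= threshold or use < 0:
                    flagged.append((mount, use))
                break
    return flagged


# Hypothetical sample, NOT real output from the hosts discussed here
SAMPLE = """Filesystem   Size   Used Available Use% Mounted on
visorfs      32.0M  31.9M    100.0K  100% /
vfat        249.7M 157.2M     92.5M   63% /vmfs/volumes/bootbank
"""

print(full_filesystems(SAMPLE))
```

Anything the function returns is a filesystem worth pasting here.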
Is EVC enabled? If so, disable it and try again. Also check the host IP address in the vpxa config file. If it is correct, back up the vpxa.cfg file on the host and try rebuilding it. Thanks.
Thanks for the replies. Here's a tail of vpxa.log on one of the two hosts that have the issue:
---
2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (111): VmSnapshot:CreateMoVm
2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (112): VmLayout:CreateMoVm
2013-12-02T15:03:34.474Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no to (113): VmStorage:CreateMoVm
2013-12-02T15:03:34.475Z [3E819B90 verbose 'Default' opID=WFU-4e735eb6] [VpxaInvtHost] Increment master gen. no (114): VmAdded
2013-12-02T15:03:34.475Z [3E819B90 info 'Default' opID=WFU-4e735eb6] [VpxaMoHost::QueryOverheadEx] Found file backing info for device 2000 of type vim.vm.device.VirtualDisk, removing vpxd moref vim.Datastore:10.86.254.251:/vol/nfs_fas2020a before passing to hostd
2013-12-02T15:03:34.475Z [3E819B90 info 'Default' opID=WFU-4e735eb6] [VpxaMoHost::QueryOverheadEx] Found network backing info for device 4000 of type vim.vm.device.VirtualE1000, removing vpxd moref vim.Network:HaNetwork-INSOC-W-VLAN before passing to hostd
---
Here's vmkernel.log:
---
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 12, /vmfs/devices/char/vob/VM type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 13, /vmfs/devices/char/vob/External type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 14, /vmfs/devices/char/vob/iScsi type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 15, /vmfs/devices/char/vob/Migrate type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 16, /vmfs/devices/char/vob/PageReti type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 17, /vmfs/devices/char/vob/Visorfs type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 18, /vmfs/devices/char/vob/Hardware type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 19, /vmfs/devices/char/vob/Vfat type CHAR: Busy
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 3232: Unimplemented operation on 0x4100233874b0/SOCKET_UNIX_SERVER
2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: Failed to crossdup fd 20, /var/run/vmware/vobd-user-ctx.s type SOCKET_UNIX_SERVER: Not implemented
---
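To see at a glance which objects the crossdup warnings refer to, the lines above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the format of the lines pasted above, not from any documented log format, and vmkernel.log may contain other layouts that it does not match.

```python
import re
from collections import Counter

# Pattern inferred from the warnings pasted above (an assumption)
CROSSDUP = re.compile(
    r"Failed to crossdup fd (?P<fd>\d+), (?P<path>\S+) "
    r"type (?P<type>\w+): (?P<reason>.+)$"
)


def summarize_crossdup(lines):
    """Count crossdup failures by (object type, reason)."""
    counts = Counter()
    for line in lines:
        match = CROSSDUP.search(line.rstrip("\n"))
        if match:
            counts[(match["type"], match["reason"].strip())] += 1
    return counts


sample = [
    "2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: "
    "Failed to crossdup fd 12, /vmfs/devices/char/vob/VM type CHAR: Busy",
    "2013-12-02T15:05:29.561Z cpu14:3200)WARNING: UserObj: 675: "
    "Failed to crossdup fd 20, /var/run/vmware/vobd-user-ctx.s "
    "type SOCKET_UNIX_SERVER: Not implemented",
]

for (obj_type, reason), n in summarize_crossdup(sample).items():
    print(obj_type, reason, n)
```

Passing an open file handle for /var/log/vmkernel.log instead of the sample list should work the same way, since the function only iterates over lines.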
No zeroes on the ramdisk, plenty of space open. This is a standard VMware build, and as I said these two hosts have worked fine for months now. They simply showed up disconnected, and I'm trying to re-add them to the cluster.
I've tried it with EVC enabled and disabled, no change - same error.
I don't see an IP address at all in /etc/vmware/hostd/config.xml, or even a field where it looks like it should be. How do I go about rebuilding it? I'll try anything at this point.
Aha! I found this KB article which sorted it out:
It's worth mentioning (for helpful Googling) that you'll need to chmod 777 /etc/vmware/vpxa/vpxa.cfg before you can edit it, and then chmod 444 it once you've finished. I then restarted the vpxa service and was able to add the host again.
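For anyone scripting this, the chmod-edit-chmod sequence described above can be wrapped so the file's original mode is always restored, even if the edit fails halfway. This is a sketch and not the KB procedure: temporarily_writable is a hypothetical helper name, it adds only owner write permission rather than 777, and what to change inside vpxa.cfg is left to the KB article.

```python
import os
import stat
from contextlib import contextmanager


@contextmanager
def temporarily_writable(path):
    """Add owner write permission to path, then restore the original mode."""
    original_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, original_mode | stat.S_IWUSR)
    try:
        yield path
    finally:
        # Runs even if the edit raises, so the file is never left writable
        os.chmod(path, original_mode)


# Example use (path taken from the post above; the edit itself is up to you):
#
# with temporarily_writable("/etc/vmware/vpxa/vpxa.cfg") as cfg:
#     ...  # make the change described in the KB article
```

Because the original mode is read before it is changed, whatever permissions the file started with are what it ends up with, so there is no need to remember to run chmod 444 afterwards.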
Good. Thanks for sharing.
Update to this - I've now had another server in this same cluster go down with the exact same problem - so that makes two blades and one physical server all with the same issue (and the same fix).
While it's nice to know how to fix this - why is this happening? This is way too much downtime.
- Have you changed anything on the cluster?
- Added HA/DRS, created more VMs, etc?
- What is your VM growth rate per month?
- Have you changed log/statistics settings for vCenter?
- Can you check whether you have a lot of snapshots in the environment? (Over SSH, run "find /vmfs/volumes/ -name '*delta*'"; the pattern is quoted so the shell doesn't expand it first.)
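The same search can be scripted if you want to post-process the results instead of reading a raw list. A rough sketch, assuming snapshot delta disks carry "delta" in their file names as the find pattern above implies; find_delta_files is just an illustrative name, and unlike find it reports files only, not directories.

```python
import fnmatch
import os


def find_delta_files(root="/vmfs/volumes", pattern="*delta*"):
    """Yield paths of files under root whose name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Prints nothing if the root does not exist or no files match
    for path in find_delta_files():
        print(path)
```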
I started a new thread for this, so that this one can stay as a complete/answered thread. I'll answer you there - thanks!
Multiple host disconnects with "failed to crossdup fd xxx" errors in vmkernel.log
When you exit the editor, put a ! after the write command (for example, :wq! in vi) and you won't have to chmod anything.
