VMware

This Question is Answered

7 Replies Last post: Jan 8, 2009 10:22 PM by birdontheridge  

HA Agent has an error - : cmd remove failed:
Posted: Jan 5, 2009 10:47 PM

birdontheridge (Novice, 4 posts since Aug 24, 2008)
Hi Guys,

Got a small two-host setup here running ESX 3i 3.5 servers connected over iSCSI to an MSA disk shelf. Both hosts are clustered with HA & DRS enabled, and all was well for a number of months after installation ... until Sunday night. Our UPS decided to have a hernia and stopped passing power to our server/comms cabinets, resulting in the inevitable instant power-off of all hardware. Great way to start the year off!

Anyway, 99% of all systems came back up without a worry once we bypassed the dodgy UPS (waiting for that to be fixed) but ever since we have had an "HA Agent has an error" message with the red alert symbol on one host only. The second host has no problems. I have tried a number of things to repair the HA error from the VI console including:

a) Right-clicking the dodgy host and selecting "Reconfigure for VMware HA" <-- This results in a ": cmd remove failed:" error in the events list for the host, just after the message "enabling HA agent"

b) Removing the dodgy host from the Cluster and then re-joining it

c) Dropping the dodgy host to maintenance mode and then re-enabling it

d) Restarting the dodgy host

OK, now my Linux skills can easily be classed as newbie, but I knew enough to get the following console info for troubleshooting:

1) I logged into the VMware OS and ran a Test Management Network. Gateway, 2x DNS servers and hostname resolution were all successful.

2) Based on some other threads I have seen here, I then logged into the console and did a test ping by hostname to the other host and the VirtualCenter server. Both were successful, so I doubt I have a DNS or hostname problem.

3) Did an Alt-F12 to view the logging screen and noticed something that may be related. Every 10 minutes or so I get the following entries:

WARNING: UserThread: 406: peer table full for sfcbd

WARNING: World: vm 204457: 910: init fn user failed with: Out of resources!

WARNING: World: vm 20457: 1775: WorldInit failed: trying to cleanup.

That may or may not be related, but I have a gut feeling it is.
Any ideas what i need to check next in order to get HA back functioning again on this host?

Thx

Mitch
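
For anyone who wants to repeat the name-resolution check from step 2 in a scripted way, here is a minimal Python sketch. It assumes it is run on a management workstation or the VirtualCenter server, so it only shows what that machine resolves, not what the ESX host itself sees. The function name `check_resolution` and the hostnames in the usage note are made up for illustration.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(
    hostnames: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved IPv4 address, or None on failure.

    `resolver` is injectable so the function can be exercised without
    touching real DNS; by default it uses the system resolver.
    """
    results: Dict[str, Optional[str]] = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are both OSError subclasses
            results[name] = None
    return results
```

Calling `check_resolution(["esx-host-2.example.local", "virtualcenter.example.local"])` (hypothetical names) returns a dictionary in which any entry with the value `None` did not resolve from the machine running the script.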

peetz (Expert, 272 posts since Sep 25, 2004)

Hi Mitch,

We had exactly the same error on one of our ESXi 3.5 hosts that boots off a USB key drive.

It turned out that the USB key drive was defective, causing inconsistencies in the internal file system (it's FAT, by the way) that in turn led to failures of the HA agent.

Try to rewrite your boot media. If you have an HP server with a USB key drive, there is a recovery CD available from HP that you can use for that. This helped us for a while, but the problem then reappeared because the key drive was defective.

Maybe in your case there is just a file corruption caused by the power outage that can be repaired by rewriting the media.

Best regards

Andreas

ronaldmendozaftb (Lurker, 2 posts since Nov 8, 2005)

Another option is to actually disable VMware HA/DRS on the cluster, then re-enable it.

peetz (Expert, 272 posts since Sep 25, 2004)

We had exactly the same drama here with the bad green keys, which were then replaced by good black ones ...

I definitely recommend rewriting the key instead of fiddling around with the HA agent installation. It's not that hard:

You can save the host's entire configuration (networking etc.) using the Remote CLI (esxcfg-cfgbackup.pl) and restore it later using the same command after you have rewritten the key. Then you only need to re-add the host to VirtualCenter and patch it to the current patch level.
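
As a rough illustration of the save-then-restore sequence described above, the Python sketch below only builds the two command lines and does not run anything. It assumes the Remote CLI's `-s <file>` (save) and `-l <file>` (load) options and the common `--server` / `--username` connection options; check the help output of your Remote CLI version before relying on them. The helper name `cfgbackup_command`, the host address and the backup file name are all hypothetical.

```python
from typing import List


def cfgbackup_command(
    server: str,
    username: str,
    backup_file: str,
    restore: bool = False,
    script: str = "esxcfg-cfgbackup.pl",
) -> List[str]:
    """Build the argument list for a config save (default) or restore.

    Assumption: -s saves the host configuration to backup_file and
    -l loads it back. No password is passed on the command line.
    """
    action = "-l" if restore else "-s"
    return [script, "--server", server, "--username", username, action, backup_file]


# Step 1 (before rewriting the key): save the configuration.
save_cmd = cfgbackup_command("192.0.2.10", "root", "esx-host-1.cfg")

# Step 2 (after rewriting the key): restore it with the same tool.
restore_cmd = cfgbackup_command("192.0.2.10", "root", "esx-host-1.cfg", restore=True)
```

Each list can be handed to `subprocess.run(...)` on a machine where the Remote CLI is installed, or simply joined with spaces and typed by hand.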

ircman (Lurker, 1 post since Jan 7, 2009)
Hi All!

I'm having the same error on the hosts that I want to add to my HA cluster on my VMware ESXi 3.5 Update 2 platform.

Any thoughts on this? (I'm using a local install and no USB disks.)

Regards,

Cedric.
