Freddie_Andreas
Contributor

HA Agent Error - VMap_SERVERNAME process failed to stop

We are getting the following message on one of our 3 ESX servers.

HA Agent on SERVERNAME in Cluster CLUSTERNAME has an error. Related events: Error detected on SERVERNAME in DATACENTERNAME "internalerror : Vmap_SERVERNAME process failed to stop."

Does anyone know what causes this and perhaps how to resolve it?

We have restarted the ESX server, put it into maintenance mode, and removed it from and re-added it to the cluster. The problem still occurs.

bertdb
Virtuoso

Look in the /opt/LGTO* subdirectories; the service console HA log files are there.
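
A minimal sketch for finding the most recently written files under those directories from the service console. Only the /opt/LGTO* path comes from the post; the helper name recent_files and the 24-hour window are assumptions for illustration, and the actual log file names depend on the installed version.

```shell
#!/bin/sh
# recent_files: hypothetical helper that lists regular files modified within
# the last 24 hours under the given directories.
recent_files() {
    find "$@" -type f -mtime -1 2>/dev/null
}

# Walk the HA-related directories and show what changed recently.
# The glob stays unexpanded when nothing matches, so skip non-directories.
for dir in /opt/LGTO*; do
    [ -d "$dir" ] || continue
    echo "== $dir"
    recent_files "$dir"
done
```

Running it right after a failed HA reconfiguration should narrow the search to the files the agent has just written.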

I suspect that your HA problem has its roots in DNS/hostname problems.

Can you paste the output of the following commands from all the ESX hosts involved?

hostname

host other-service-console-hostname (e.g. host server4.ourdomain.com)

ping other-service-console-hostname
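
The three checks above can be wrapped in a small script so they are easy to repeat on every host. This is only a sketch: the function name check_peer and the script layout are assumptions, and it tests nothing beyond the forward lookup and a single ping.

```shell
#!/bin/sh
# check_peer: run the forward DNS lookup and a single ping against one peer.
# Prints one status line per check and returns non-zero if either check fails.
check_peer() {
    peer="$1"
    rc=0
    if host "$peer" >/dev/null 2>&1; then
        echo "$peer: DNS lookup ok"
    else
        echo "$peer: DNS lookup FAILED"
        rc=1
    fi
    if ping -c 1 "$peer" >/dev/null 2>&1; then
        echo "$peer: ping ok"
    else
        echo "$peer: ping FAILED"
        rc=1
    fi
    return $rc
}

# Show what this host calls itself, then check every peer named on the command line.
if [ "$#" -gt 0 ]; then
    echo "local hostname: $(hostname)"
    for name in "$@"; do
        check_peer "$name"
    done
fi
```

Saved under a name of your choice and run on each ESX host with the other service console hostnames as arguments (e.g. server4.ourdomain.com), it gives output that can be pasted straight into a reply.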

ibomo
Enthusiast

Hi,

I have the same problem as stated in the forum topic (1 node out of 10).

I ran your commands and there seems to be no DNS problem. This ESX host previously worked fine with HA, and I have checked with the networking team that there is nothing wrong with DNS:

[user@P0MVMWESX05 user]$ hostname
P0MVMWESX05.bizkaiko.aldundia
[user@P0MVMWESX05 user]$ host p0mvmwesx09.bizkaiko.aldundia
p0mvmwesx09.bizkaiko.aldundia has address 172.27.241.59
[user@P0MVMWESX05 user]$ host p0mvmwesx03.bizkaiko.aldundia
p0mvmwesx03.bizkaiko.aldundia has address 172.27.241.53
[user@P0MVMWESX05 user]$ host p0mvmwesx05.bizkaiko.aldundia
p0mvmwesx05.bizkaiko.aldundia has address 172.27.241.55
[user@P0MVMWESX05 user]$ host p0mvmwesx01.bizkaiko.aldundia
p0mvmwesx01.bizkaiko.aldundia has address 172.27.241.51
[user@P0MVMWESX05 user]$ host p0mvmwesx03.bizkaiko.aldundia
p0mvmwesx03.bizkaiko.aldundia has address 172.27.241.53
[user@P0MVMWESX05 user]$ host p0mvmwesx04.bizkaiko.aldundia
p0mvmwesx04.bizkaiko.aldundia has address 172.27.241.54
[user@P0MVMWESX05 user]$ host p0mvmwesx05.bizkaiko.aldundia
p0mvmwesx05.bizkaiko.aldundia has address 172.27.241.55
[user@P0MVMWESX05 user]$ host p0mvmwesx06.bizkaiko.aldundia
p0mvmwesx06.bizkaiko.aldundia has address 172.27.241.56
[user@P0MVMWESX05 user]$ host p0mvmwesx07.bizkaiko.aldundia
p0mvmwesx07.bizkaiko.aldundia has address 172.27.241.57
[user@P0MVMWESX05 user]$ host p0mvmwesx08.bizkaiko.aldundia
p0mvmwesx08.bizkaiko.aldundia has address 172.27.241.58
[user@P0MVMWESX05 user]$ host p0mvmwesx09.bizkaiko.aldundia
p0mvmwesx09.bizkaiko.aldundia has address 172.27.241.59
[user@P0MVMWESX05 user]$ host p0mvmwesx010.bizkaiko.aldundia
p0mvmwesx010.bizkaiko.aldundia has address 172.27.241.60
[user@P0MVMWESX05 user]$

Any other ideas?

Thanks in advance for your help. Regards,

Inaki

EMC VMware Presales Specialist EMEA South
ibomo
Enthusiast

No luck.

I had to disable HA on the whole cluster. One of the nodes took ages to disable the service, and I do not know why.

Now that I have finished patching and rebooting all of the nodes in order, let's see if I am lucky and can reconfigure HA...

ibomo
Enthusiast

Again, I could not get it to work properly.

I had to disable HA and re-enable it. I do not know what is going on with HA, but if it behaves this way in a "controlled" situation, I do not want to know how it may behave under difficult circumstances.

We will see if it keeps working normally.

Faustina
Enthusiast

For HA issues after patching, try this procedure; it works every time:

1. rpm -ev Vmware-vpxa- (you can find out the module name by doing rpm -qa | grep LGTO)

4. disconnect host from VC

5. reconnect host to VC

6. disable HA in cluster

7. enable HA in cluster.
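
Step 1 leaves the package version off, so the exact names have to come from the RPM database. A small sketch for listing them, assuming the relevant packages contain "vpxa" or "LGTO" in their names as the post implies; the helper name ha_packages is hypothetical.

```shell
#!/bin/sh
# ha_packages: filter a package list (one name per line on stdin) down to
# the VirtualCenter agent and LGTO packages mentioned in the thread.
ha_packages() {
    grep -i -E 'vpxa|LGTO'
}

# Query the RPM database only where rpm is available (for example, on the
# ESX service console). Copy the full names from the output into rpm -ev.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | ha_packages || echo "no matching packages found"
fi
```

Pasting the full package name and version from this output avoids mistyping it in the rpm -ev command.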

ibomo
Enthusiast

Sweet!

Thanks for the response.

Is this documented in any guide? I would like to know whether I should have known how to do it...

Regards

admin
Immortal

What version of VC are you using? This probably has nothing to do with DNS, but rather with the cleanup of HA-related processes on reconfiguration.

ibomo
Enthusiast

VC 2.0.1 Patch 1... I know we have to update to Patch 2, but the customer has not yet found the time to schedule downtime for VC.

Is it risky to update VC 2.0.1 Patch 1 + SQL 2005 SP1 to Patch 2?

I know that some of my colleagues have done it successfully, but I just wanted to double-check.

Is it just my impression, or does VC 2.0.1 Patch 1 do many "funny" things regarding HA (reconfiguring too often, failing too often to reconfigure some of the nodes...)?

Thanks!

joskev
Contributor

You could create a dummy cluster with HA enabled, put the failed server in this dummy cluster, and reconfigure HA.

Normally this removes the old configuration and installs a clean one. After that, put the server back into maintenance mode and move it back to the original cluster. Exit maintenance mode and HA will be configured correctly.

In my config this works fine.

evilcraig
Contributor

I found this forum by searching when I had the same problem.

I solved mine using steps that are CLOSE to Faustina's.

Faustina's method didn't work when I followed it word for word.

I followed these steps and they worked 100% for me:

1. disable HA in cluster

On the sick server:

a. rpm -ev Vmware-vpxa- (you can find out the module name by doing rpm -qa | grep LGTO)

2. disconnect host from VC

3. reconnect host to VC

4. enable HA in cluster.

FYI: VC 2.0.1 Patch 2, 3 x ESX 3.0.1 servers with all patches up to and including 15/5/07

Faustina
Enthusiast

I do not see any difference between the steps above and the ones I gave.

ezimmerm
Enthusiast

Hey guys... I'm having the same problem. I've tried removing and re-adding the non-working server but still keep getting HA config errors.

internalerror vmap_hostname blah blah blah.

I'm going to try the directions above, but...

I just wanted to ask to be sure: turning off HA on the cluster will not cause any of the VMs to shut down, will it?

I think I'd have a heart attack if all the servers went down during the day.

Sorry about resurrecting an old thread but it's the newest one I can find.

jbeale
Contributor

The directions in the replies above fixed my HA issue. One tip: copy and paste the package names, including versions, from the grep output. Also, the VM module has to be removed before the agent.

Virtual_Jake
Contributor

I got this error and simply removed HA from the entire cluster, then added it back. The HA agent on the problem host installed properly after that. Thanks to all who responded, this helped me out!

MrElliot
Contributor

My problem is solved! I followed evilcraig's instructions word for word, except I needed to add a reboot (step 2b):

1. disable HA in cluster

On the sick server:

a. rpm -ev Vmware-vpxa- (you can find out the module name by doing rpm -qa | grep LGTO)

2. disconnect host from VC

2b. Reboot Host (Skipping a reboot caused my VC to fail reconnecting to the Host.)

3. reconnect host to VC

4. enable HA in cluster.

lensmanseye
Contributor

While this is an old thread, thought I would add details of our experience with this problem here (couldn't find many threads about this on Google).

We encountered this problem on VC 2.0.1 with two ESX 3.0.1 hosts (we have as yet been unable to update these out-of-date installations). Reconfiguring the problematic host for HA consistently met with the same problem. However, disabling and then enabling HA on the cluster quickly sorted this problem out. (We did not try the rpm uninstallation method.)

Mujina
Contributor

"disable HA in cluster" is done before uninstall
