VMware Cloud Community
VMdawg
Enthusiast

HA problems after redundant SVC node booted

4 hosts in a cluster: 2 x 3850 and 2 x 346. HA was enabled and working fine until I fat-fingered the power button on one of our SVC nodes (button sticks out like a nipple!). Fortunately for me we have 2 nodes; however, when it booted back up, HA had been disabled on this cluster and 2 VMs were powered off (which I realize is another issue altogether). When I try to re-enable HA I get the notorious "Insufficient resources to satisfy HA failover level on cluster...Unable to contact a primary HA agent in cluster."

Personally, whenever I have had hosts that I can't re-enable HA on, I VMotion everything off and reboot the host, or in some cases I have even had to blow away the cluster and start over. But since I've got 29 VMs on that cluster I really don't want to recreate it. Any ideas?

Accepted Solutions
dpomeroy
Champion

Not sure. At present I have HA disabled on all our systems, but while I was troubleshooting, VMware support told me I had to have PortFast enabled. That's interesting, because the documentation says it's recommended, not required.

So I'm just collecting info for when I get back to trying to get HA to work in our environment. I might just leave it off until "HA 2.0" comes out.

18 Replies
VMdawg
Enthusiast

P.S. We boot from SAN on almost everything.

admin
Immortal

Dan,

I've had this happen as well...so it's not just your setup. However, you know that SVC is not supported on VI3 yet, right? If you do a search for SVC and VI3 out on these forums...I believe that you'll find a thread that states that IBM will support you on a case by case basis...but no broad support for VI3 just yet.

Funny thing is...SVC is supported on ESX 2.5.x.

I know that this probably doesn't have anything to do with your HA problems.

One thing to double-check is your DNS settings on each ESX server, and make sure you have host records for each server on your DNS server. Make sure the Service Console can ping the default gateway as well.

Chris

VMdawg
Enthusiast

OK, I checked namesearch on each host and made sure I could ping the gateway, DNS, and the other hosts from ESX. I made sure all the hosts were in DNS as well. I'm wondering if, because of my SVC failure, a host in that cluster temporarily lost access to a vdisk mapping (which caused 2 VMs to go down) and that freaked out the cluster enough to cause this problem. It's the only thing I can think of, since it was working before. I bet if I dumped all those hosts in a new HA-enabled cluster they would work, but before I can do that I need to put a spare host in that cluster for migrations.
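
For anyone repeating the DNS and gateway checks across many hosts, here is a rough sketch of how they could be scripted. It is only an illustration: the host names and gateway address are made-up placeholders, it assumes Python 3 on an admin workstation (not the Service Console), and the ping flags are the Linux-style ones.

```python
import socket
import subprocess

# Hypothetical names; replace with your own ESX hosts and gateway.
ESX_HOSTS = ["esx01.example.com", "esx02.example.com"]
GATEWAY = "192.0.2.1"


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def ping(address, count=1, timeout_s=2):
    """Return True if ping exits with status 0 (Linux-style -c/-W flags assumed)."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout_s), address]
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:  # ping binary not found or not executable
        return False
    return result.returncode == 0


def report(hosts, gateway, lookup=socket.gethostbyname, pinger=ping):
    """Build one status line per host, plus a final line for the gateway."""
    lines = []
    for host in hosts:
        ip = resolve(host, lookup)
        if ip is None:
            lines.append(f"{host}: NO DNS RECORD")
        else:
            status = "reachable" if pinger(ip) else "no ping reply"
            lines.append(f"{host}: {ip} ({status})")
    gw_status = "reachable" if pinger(gateway) else "no ping reply"
    lines.append(f"gateway {gateway}: {gw_status}")
    return lines
```

Calling `report(ESX_HOSTS, GATEWAY)` and printing each returned line gives a quick overview. It only tests forward lookups and ICMP, so a clean report does not rule out other HA problems.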

Side note: one thing I have noticed with clustering is that it's great, but it's like putting all your eggs in 2 or 3 baskets, and once those baskets get maxed out (resources) you are limited in what you can do in a situation like this. Shuffling 29 VMs on 4 hosts to other hosts in order to build a new cluster is a pain!

dpomeroy
Champion

Did you do a "reconfigure for HA"?

Do you have Cisco switches?

VMdawg
Enthusiast

I have tried reconfiguring HA multiple times...always fails. We do use Cisco switches.

dpomeroy
Champion

Do you have Portfast enabled?

VMdawg
Enthusiast

Yes.

VMdawg
Enthusiast

Some ports are running BPDU filter and BPDU guard, but not all of them.

dpomeroy
Champion

Are all the ports for the ESX servers?

mreferre
Champion

So Don .... tell us that ALL these HA issues are due to something you nailed down to be around Cisco configurations so we stop beating this piece of sw ..... ;-)

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info

dpomeroy
Champion

Not sure. At present I have HA disabled on all our systems, but while I was troubleshooting, VMware support told me I had to have PortFast enabled. That's interesting, because the documentation says it's recommended, not required.

So I'm just collecting info for when I get back to trying to get HA to work in our environment. I might just leave it off until "HA 2.0" comes out.

VMdawg
Enthusiast

4 out of 25 hosts are on ports running both BPDU filter and BPDU guard. 2 of those 4 are hosts I am having the HA problem on, but the other 2 are in other HA clusters with no problems.

VMdawg
Enthusiast

Any idea when HA 2.0 is coming out?

dpomeroy
Champion

Nope. I'm just hoping there are some improvements in ESX 3.1; not sure when that comes out either.

VMdawg
Enthusiast

Me too. I had to stage all the VMs off the hosts in that bad cluster and create a new cluster to re-enable HA and DRS. It's working great now, just a lot of work!

mreferre
Champion

Had to do the same thing .... fortunately it was a lab environment ....

http://www.vmware.com/community/thread.jspa?threadID=79725

Certainly re-creating the whole cluster when something goes wrong is not a nice thing though .......

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info

Funtoosh
Enthusiast

Here is how I resolved my problem with HA. I got the error message "Unable to contact a primary HA agent in cluster XXXXX in XXXXX".

It's a 12-host cluster with 144 VMs.

  1. I tried configuring the HA agent on each individual host, and it did not work.

  2. I tried disabling HA on the cluster and re-enabling it; that did not work either.

  3. Then I looked at /opt/LGTOaam512/log/aam_config_util_listnodes.log, which tells you which host is your primary HA host (usually the first host added to the cluster becomes the primary).

  4. I reconfigured the HA agent on that primary host, and then the rest of them allowed me to reconfigure.
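
As a rough illustration of step 3, the log could be scanned with a few lines of Python instead of reading it by eye. This is only a sketch: the path is the one quoted above, but the line layout (node name in the first column, the word "Primary" in the second) is an assumption and may not match your log, so check a few lines manually first.

```python
# Path quoted in the post above; the line layout is an assumption.
LISTNODES_LOG = "/opt/LGTOaam512/log/aam_config_util_listnodes.log"


def primary_nodes(log_text):
    """Return node names from lines whose second column is 'Primary'.

    Assumes whitespace-separated lines of the form
    '<node> Primary <state...>'; duplicates are reported once,
    in order of first appearance.
    """
    nodes = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].lower() == "primary":
            if fields[0] not in nodes:
                nodes.append(fields[0])
    return nodes
```

To use it, read the file with `open(LISTNODES_LOG).read()` and pass the text to `primary_nodes()`; the hosts it returns are the ones to try reconfiguring first, as in step 4.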

Other people may have found a different solution, but I tried this and it worked right away.

If this helped, please award me the points :D

VMdawg
Enthusiast

I just ran into this problem again. I have 12 hosts in this farm, and when I dropped a new host in there HA was giving an internal HA:Vmap error. I disabled HA on the entire cluster and re-enabled it, and it worked. Hopefully they fix this bug in the newest release.
