ITRC_Architect
Contributor

How to force HA with Blades?

This question has been lingering, but now it is critical.

I have a disconnected blade.  It is acting up in an awful way.  However, it is also keeping hold of its machines, which also show up as disconnected in Virtual Center.  The whole thing stinks, and reboots are not helping.  Now, some of the guests are turned off and we can't get them back on because of the host problems, etc.

I want to force an HA so I can easily get back control of the virtual guests on other ESXi hosts.  Unfortunately, these are blade systems and pulling a network cable out isn't very easy.

How do people force an HA action on a misbehaving host when pulling out network cables is not an option?

Thank you.

7 Replies
weinstein5
Immortal

Ideally, if you are licensed for vMotion, you would vMotion the VMs to other blades: place the problem blade in Maintenance Mode, let the VMs migrate off, and once they are off, power down the suspect blade.

If vMotion is not available, just power down the blade.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
ITRC_Architect
Contributor

That is the problem.  Powered on, everything (hosts and guests) is disconnected and I can't vMotion.

Powered off, everything stays disconnected, even though we had HA turned on, etc...

I guess the box feels he was shut down correctly and therefore has no reason to give up his guests.  I don't know, but it is frustrating.

Troy_Clavell
Immortal

Can you ping the host in question?  What happens in vCenter if you right-click the host and choose "Connect"?  Have you tried restarting the management agents as well?

Once HA is configured there is no dependency on vCenter.  This is a host issue now.
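
On the ping question: a ping only shows that the host's network stack answers. A quick way to see whether the management services are listening is to probe their TCP ports. The following is a minimal sketch, assuming the usual ESXi defaults of TCP 443 and TCP 902; the function names are made up for illustration and nothing here is VMware tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_management_ports(host: str, ports=(443, 902)) -> dict:
    """Probe each port and report which ones accept connections.

    The default ports are an assumption (typical ESXi management ports).
    """
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_management_ports("esx-blade-07.example.local")` (a hypothetical hostname) returns a dict of port to True/False. A host that pings but refuses these ports points at the management agents rather than the network.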

PduPreez
VMware Employee

If HA is set up properly, the VMs should start on the other hosts in the event of host failure.

If this host is completely isolated, HA will try to start the VMs on the other hosts.

But if the VMs are still running or are locked by the crippled host, the HA startups will fail.

HA will try starting the VMs 6 times (5 times in ESXi 5).

So if HA has already used up its 6 attempts and you then shut down the host, HA will not try again.
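
The timing point above can be pictured with a toy model. This is only an illustration of the reasoning in this post, not how the HA agent is implemented: it assumes a fixed number of restart attempts and that an attempt succeeds only once the crippled host has released its lock on the VM. All names are invented for the sketch.

```python
from typing import Optional


def first_successful_attempt(max_attempts: int,
                             lock_released_before_attempt: Optional[int]) -> Optional[int]:
    """Return the 1-based attempt on which the restart succeeds, or None.

    lock_released_before_attempt is the first attempt that finds the VM
    unlocked (for example because the host was powered off via remote
    access). None means the lock is never released.
    """
    for attempt in range(1, max_attempts + 1):
        lock_held = (lock_released_before_attempt is None
                     or attempt < lock_released_before_attempt)
        if not lock_held:
            return attempt
    # Attempts exhausted: in this model HA gives up and a manual start is needed.
    return None


print(first_successful_attempt(6, 3))     # host powered off before attempt 3 -> 3
print(first_successful_attempt(6, 7))     # host powered off too late -> None
print(first_successful_attempt(6, None))  # lock never released -> None
```

In this model, powering the host off inside the retry window lets HA finish the job, while powering it off afterwards leaves the VMs for you to start by hand.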

What you can do is power off the host through remote access (iLO on HP hardware).

If you do this while HA is still retrying, the VMs should start on their own. If you do it after HA has given up, you should be able to start the VMs manually.

right click the Disconnected VM and select start or migrate to working host.

If this does not work, remove the VM from inventory and re-add it from the datastore.
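
The remove and re-add step can also be scripted. Below is a rough sketch, assuming the pyVmomi library and that you already hold connected `vim.VirtualMachine`, `vim.Folder`, `vim.ResourcePool` and `vim.HostSystem` objects for the stale VM and the healthy target. `UnregisterVM` and `RegisterVM_Task` are vSphere API methods; the helper `reregister_vm` and the datastore path in the usage note are hypothetical, and this has not been tried against a disconnected host.

```python
def reregister_vm(vm, folder, pool, host, vmx_path, name=None):
    """Drop the stale inventory entry, then register the same .vmx on a healthy host.

    vm       -- the stale VM object (expected: vim.VirtualMachine)
    folder   -- VM folder to register into (expected: vim.Folder)
    pool     -- resource pool on the healthy host or cluster
    host     -- healthy host that should own the VM
    vmx_path -- datastore path to the .vmx, supplied by the caller
    """
    vm_name = name or vm.name
    # Removes the VM from inventory only; files on the datastore are untouched.
    vm.UnregisterVM()
    # Returns a task object; the caller should wait for it to complete.
    return folder.RegisterVM_Task(path=vmx_path, name=vm_name,
                                  asTemplate=False, pool=pool, host=host)
```

A call would look like `reregister_vm(vm, folder, pool, host, "[datastore1] app01/app01.vmx")`. Registration will still fail if the crippled host holds a lock on the VM's files, so power that host off first.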

Hope this helps.

Please award points if you find this helpful/correct

ITRC_Architect
Contributor

We can ping the host.  We have rebooted.  etc...

We are on the phone with VMware who are trying to figure out the logs and are seeing strange errors.  This will get fixed, I have no doubt.

The point of this thread however, is "how do we force an HA with blades?"

Yesterday, while everything was disconnected, we just waited to do anything because the guests were still running and were critical.  However, last night we performed our reboot to try to clear things up.  Things did not clear up, but got worse because the ESXi 4.1 host still held onto the guests and we had minimal control over the guests.  The first thing we wanted to do was regain full control over the guests and then troubleshoot the ESXi problems on this host.

However, all the technicians are going the other way and trying to figure out why the ESXi host is misbehaving first.  That sucks.  I would really like to get back my guests and since we have / paid for HA with the ESXi plus licensing, how do I make that happen?

In all honesty, I have never seen HA work when we needed it to.  Over the last 4 years I have seen HA cause cluster problems when deploying HA agents to new hosts joining the cluster.  I have seen HA crap out and throw a lot of alarms, etc...  I have seen HA prevent guests from starting.  At some point, I want to see HA provide some benefit.  Like today, I want it to move my machines from this disconnected ESXi host so we can spend time figuring out what broke.

My last resort is simply to shut off the disconnected ESXi host and add the machines to a new host and boot them.  However, then I will get a lot of problems when this host restarts...

ITRC_Architect
Contributor

One question.  You made a statement that I would like to better understand:

"right click the Disconnected VM and select start or migrate to working host."

When our VMs are disconnected on a disconnected host, right-clicking does not provide options to migrate or start.  Your reply, however, indicated we should see those options?

As a note, our ESXi host just magically reconnected.  Everybody swears they didn't do anything.  We are putting it into maintenance mode now, so crisis is over and we can start running through the logs...

PduPreez
VMware Employee

One question.  You made a statement that I would like to better understand:

"right click the Disconnected VM and select start or migrate to working host."

This is me thinking out loud. I'm sure I've migrated a grayed-out VM before, but it could have been in an orphaned or some other state.

The other explanation is that after many hours of struggling I could have been delusional.

Regarding your post about HA and blades:

Blades should be no different than rack mounted servers for HA.

HA just does not like hosts that fail halfway: network only, storage only, or a hung state.

It caters more for complete host failures (normally accompanied by a loud bang).

That is why networking redundancy and storage redundancy are so important.

And just a note,

My last resort is simply to shut off the disconnected ESXi host and add the machines to a new host and boot them. However, then I will get a lot of problems when this host restarts...

Remove the disconnected host from the cluster as well until it is fixed. If this host comes up again it will have no impact on the VMs. Even if it tries to start the VMs it will fail, because they are locked by other hosts.

If HA is configured correctly and all dependencies like DNS, gateway, etc. are healthy, it works like a charm.

PS. HA in vSphere 5 has been completely rewritten to overcome restrictions and add functionality.

For example, HA in vSphere 5 has storage heartbeats as well.

Please award points if you find this helpful/correct