I was facing this issue in a customer deployment and I wanted to share it over here.
Without getting into the details of what we are doing in this space (I can go into them if you want), I wanted to share a very basic scenario that is a show-stopper for an HA / DR strategy we are implementing with a big customer.
Consider a cluster of 4 servers, each node running a bunch of VMs. If all of these servers crash at once (due to a power outage or similar) AND one or more of them does not come back up (e.g. due to a hardware failure or an ESX local disk corruption), the virtual machines that were running on the dead hosts will not be restarted on the surviving nodes. So if you have 4 hosts running 10 VMs each, and all of them crash with 3 surviving, only 30 virtual machines will come back up at the next reboot, while the 10 VMs hosted on the dead host stay down.
What do you think ? Would you be looking at this as a limitation or do you consider it to be broken ?
I will start by sharing my opinion: it's completely broken! I see no reason why an HA product would not bring all its managed objects back up and running regardless of the sequence of host failures. As far as I can tell, every other HA product provides the capability to bring its objects back up after a complete cluster failure.
Thoughts / Comments ?
The scenario you are picturing is different from the one I have envisioned, and YES, in that situation I would expect all VMs to come back on-line (at one point all VMs are already running on hosts B and C, so if you turn them off and on again all VMs would restart; in theory at least).
I haven't tested this though because it's not a scenario we are interested in.
We've had good success, as mentioned here already, with our small 2-server sandbox environment. Has anyone done any testing with larger configs (16 hosts)? We plan to create large HA/DRS clusters and will configure them to allow 1 or 2 host failures maximum. We don't have all our gear yet, but are in the midst of deploying the first 8. Something we'll have to look at more deeply for sure. We separate our farms 50/50 on different sides of the datacenter and PDUs to minimize risk as best we can.
I've been contemplating these same limitations. I've configured a 14 host cluster with HA/DRS.
After a network switch firmware update, all hosts were isolated and all VMs were shut down. Within 10 minutes the switch came back up, but HA stayed in an error state on all hosts. The network was live and running, yet all our VMs were down: a total service outage! I had to click the 'Reconfigure for HA' option in VirtualCenter to bring the hosts back, then power on the VMs.
The obvious "fix" for this is to leave VMs running on host isolation. But that strategy falters when an ESX host has a genuine network failure: its VMs stay running and their disks stay locked, so you lose service while you get to the console of the failed host and shut the VMs down.
To me, the obvious fix would be to put another NIC in each host, going to another switch, and make it part of the service console virtual switch, so you can do network maintenance without the hosts ever dropping off the network. Even in the event of a switch failure you're still up and running without issue.
Many of the scenarios some are expecting HA to handle are really weak design in other areas of the datacenter.
While I am all for additional features and functionality in HA, I don't expect it to gracefully handle complete network failures, large scale power outages, and other disasters that should be covered by other redundant infrastructure.
IMHO this is a bug.
- save state information on the shared storage (probably at configurable intervals)
- have the ability to add a secondary (backup) HA interface
- have the ability to add a second test IP
If someone from VMware reads this: take a look at the HA implementation of the Sidewinder firewall and adopt some of its features.
We plan to create a vSwitch with our console and VMotion networks trunked, both VLANs allowed, and set up as failover paths should we lose either connection. We still need to try it, but it should work.
>we've had good success as mentioned already here with our small 2
>server sandbox environment. Has anyone done any testing with larger
>configs (16 Hosts)? We plan to create large ha/drs clusters and will
>configure to allow 1 or 2 host failures maximum. Don't have all our gear
>yet; but are in the midst of deploying the first 8. Something we'll have to
>look at deeper for sure. We seperate our farms 50/50 on different sides
>of the datacenter and PDU's to minimize as best we can.
We were looking into this for the same customer. We were thinking about creating 12+ node clusters, but since this customer stretches the cluster across two buildings and since HA only allows a maximum of 4 host failures ..... we had to step back to 8 nodes per cluster.
Another "nice" limitation.
>While I am all for additional features and functionality in HA, I don't expect it to gracefully handle complete network failures, large scale power outages, and other disasters that should be covered by other redundant infrastructure.
I agree in principle, and I must admit that with my original post I wanted to push HA into realms it doesn't address (i.e. DR). However, what we would like to do is leverage standard HA software features and adapt them to our own requirements.
The problem seems (to me) to be that VMware HA (as of today) does not even support the minimal feature set of standard high-availability software (such as Microsoft Cluster or Veritas Cluster). What HA seems to be providing today is IP-loss detection with the assumption that the host has gone down ..... well ....... some folks in the community have written scripts that apparently do this much better.
Yes... but if HA didn't understand the problem, then it shouldn't have started ANY of the VMs rather than starting only a portion of them. Since HA *did* start the VMs on all but one host, that indicates that there's a problem in the HA logic.
HA didn't do anything in this situation, ESX by itself did.
The original scenario:
Consider a cluster of 4 servers, each node running a bunch of VMs. If all of these servers crash at once (due to a power outage or similar) AND one or more of them does not come back up (e.g. due to a hardware failure or an ESX local disk corruption), the virtual machines that were running on the dead hosts will not be restarted on the surviving nodes. So if you have 4 hosts running 10 VMs each, and all of them crash with 3 surviving, only 30 virtual machines will come back up at the next reboot, while the 10 VMs hosted on the dead host stay down.
For the purposes of discussion:
ESX hosts 1 to 4 (h1 ... h4)
Now, what you have to understand is that h1 *owns* h1a to h1j, h2 owns h2a to h2j, etc.
So then you power on the servers and h4 is all toasty-crunchy for some reason.
On h1-h3 it is ESX that is responsible for starting up the VMs, not HA. HA is only responsible for *restarting* VMs when it sees a host fail; since h4 failed before HA came up, HA doesn't know whether those VMs are offline on purpose or not, so it does nothing.
Given the cataclysmic nature of this sort of event, the fix is relatively simple, and HA is arguably doing "the right thing".
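To make the ownership argument concrete, here is a minimal Python sketch (not VMware code, just a model of the behavior described above), assuming each rebooted host only brings back the VMs it owns and HA ignores failures it never witnessed:

```python
# Hypothetical model of the restart behavior described above: a host that
# reboots starts its own VMs; HA never saw the dead host fail while HA was
# running, so it does nothing for that host's VMs.

def restart_after_cluster_outage(owners, surviving_hosts):
    """Return the VMs that come back up after a full-cluster crash.

    owners          -- dict mapping host name -> list of VMs it owns
    surviving_hosts -- set of hosts that successfully reboot
    """
    restarted = []
    for host, vms in owners.items():
        if host in surviving_hosts:
            # Rebooted host brings back its own VMs only.
            restarted.extend(vms)
    return sorted(restarted)

# Four hosts, ten VMs each (h1a..h1j on h1, and so on); h4 stays dead.
owners = {f"h{n}": [f"h{n}{chr(ord('a') + i)}" for i in range(10)]
          for n in range(1, 5)}
up = restart_after_cluster_outage(owners, surviving_hosts={"h1", "h2", "h3"})
print(len(up))  # 30 -- the 10 VMs owned by h4 stay down
```

This reproduces the 30-of-40 outcome from the original post: nothing in the model ever reassigns the dead host's VMs to a survivor.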
HA didn't do anything in this situation, ESX by itself did.
By default, ESX does not autostart any VMs. And since Massimo was the OP, I'm fairly confident the hosts were not configured to autostart VMs; any starting of VMs without manual intervention would have been a result of HA doing its thing.
The original scenario was of four hosts that all failed at once, with HA running. Upon restart, three of the four hosts came back, with the fourth staying down due to hardware failure or some other reason. On the three hosts that came back up, VMs were restarted (again: ESX does *not* automatically power on VMs). Since some of the VMs DID restart, one can only assume that HA was responsible for that action. It is my assertion that if HA were functioning "correctly" it would have restarted all of the VMs or none of them. Restarting SOME of the VMs is inconsistent, and therefore, in my opinion, wrong.
In an environment without HA configured, none of the VMs would have restarted - the administrator would have had to go in and manually start the VMs.
Ken is, obviously, correct. I did not configure any autostart on the ESX host. As I stated in the middle of this thread:
>I have thought about using autostart as well, but it is my understanding that autostart allows you to state which and how a given set of virtual machines can start on a given host (a very fixed relationship between a host and ITS vm's). Being the vm's now bound not to a specific host but rather to a cluster, I didn't find a very intuitive way to do the same thing cluster-wise. Well... actually I did find a way... and that was HA... that is (to me at least) HA should be doing for the cluster what autostart was doing for the host.
So in my opinion the autostart concept is now even broken in logic, since everything is moving cluster-wise while autostart is still bound to the legacy concept of the single server.
Interestingly enough, we found out just a few days ago that if you leave the VMs' restart priority set to medium (in the HA config), this is exactly what happens (i.e. the VMs on a server that comes back on-line get restarted, while the VMs previously active on a server that stays down remain down). If on the other hand you configure the VMs with a "high" restart priority in the HA config, they will ALL restart (which is what we wanted to achieve in the first place).
At this point one of the options was to leave all VMs off and have an external source / logic power them on in an order that suits their roles (i.e. first the DB, then the application VM, then the web server, etc.). This was especially necessary since you cannot express this ordering with the HA priority settings (i.e. you have to set all VMs to "high" priority just to get them all to restart at some point).
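The external power-on logic mentioned above could be sketched as follows; a minimal example in Python, where the tier names (`db`, `app`, `web`) and VM names are hypothetical and the function only computes the start order, leaving the actual power-on mechanism (script, API, manual) out of scope:

```python
# Hypothetical external ordering logic (not a VMware HA feature): given
# role tiers, produce the order in which VMs should be powered on so that
# databases come up before app servers, and those before web front-ends.

TIER_ORDER = ["db", "app", "web"]  # assumed role names for illustration

def power_on_sequence(vms_by_role):
    """Flatten {role: [vm, ...]} into a single ordered start list."""
    sequence = []
    for role in TIER_ORDER:
        # Sort within a tier only for deterministic output.
        sequence.extend(sorted(vms_by_role.get(role, [])))
    return sequence

seq = power_on_sequence({"web": ["web1", "web2"], "db": ["db1"], "app": ["app1"]})
print(seq)  # ['db1', 'app1', 'web1', 'web2']
```

The point is that HA's three-level priority setting cannot encode this kind of per-role dependency chain, which is why an external driver was considered.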
So what would you do if you do not want ANY VM to restart? Set the priority to "low", right? No sir... setting it to "low" gives you the same mixed result as "medium". Go figure ..........
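To summarize, the behavior we observed can be written down as a tiny Python sketch; note the rule it encodes is purely our empirical finding on these ESX hosts, not documented semantics:

```python
# Observed (not documented) behavior of the HA restart-priority setting
# after a full-cluster outage, per our lab tests:
#   "high"           -> all VMs restart
#   "medium", "low"  -> only VMs on hosts that came back restart

def observed_restarts(priority, vms_on_surviving, vms_on_dead):
    """Model of what we saw restart for a given priority setting."""
    if priority == "high":
        return vms_on_surviving + vms_on_dead   # everything comes back
    # "medium" and "low" behaved identically in our tests: the mixed result.
    return vms_on_surviving

print(observed_restarts("medium", ["h1a"], ["h4a"]))  # ['h1a']
print(observed_restarts("high", ["h1a"], ["h4a"]))    # ['h1a', 'h4a']
```

The surprise is the last branch: "low" does not mean "don't restart", it means the same partial restart as "medium".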
Perhaps we are doing stuff "on the border line" (are we?), but the reality is that what HA achieves seems to happen more by chance than by product design.
To me this feature is COMPLETELY broken, and not only because of what we are trying (unsuccessfully) to achieve ..... if you browse this forum and search for HA, you will find mostly negative comments and very little that is positive.
Let's face it .... VMware has been doing great stuff ..... HA is just not one of those.
Massimo - couple quick questions:
1. How do you have the admission control policy set for the HA cluster?
2. Are you using reservations (CPU/RAM) on the VMs in question?
3. What is the host failure tolerance setting for the cluster?
You've probably already answered these somewhere in this thread - I'm just too lazy to go hunt them down
1. Very relaxed...... set to power on all vm's no matter what.
2. No, we are not. We are not using any Resource Pools, nor are we setting any reservations for VMs. Perhaps some VMs have more shares than others, but that's about it.
3. Well, so far we have been testing with two physical servers in the lab, so I guess we have set it to either 1 or 2, but I am not sure .... perhaps we have already done tests with both settings (I am not the one physically carrying out all the tests).
All our vSwitches have more than one physical NIC going to multiple physical switches. Even that isn't going to eliminate a variety of HA issues.
I would just chime in here that folks using RDM LUNs should never use the "leave VM powered on" option. I am not saying that you are suggesting this configuration; I just want to mention that if a VM is left up on an isolated host and that VM is also started on a different HA host, the result can be corrupt LUNs.
That being said, if HA is in a broken state, it is possible that the host doesn't actually power off the VM even when instructed to do so by the isolation response options. I have seen this, and it leads to orphaned VMs and VMs that VC loses track of, so to speak.
I think we should all remember that this is basically HA version 1.0.
Having said that... we need to bring up the issues... but we should all have understood going in that this was not Solaris 9.
>We were looking into this for the same customer. We were thinking about creating 12+ node clusters but since this customer stretches the cluster across two buildings and since HA only allows a maximum of 4 host failures ..... we had to step back to 8 nodes per cluster.
I would be very careful implementing HA over a stretched cluster; you have to keep everything on both arrays in lockstep. We started to engineer a solution like this, but as soon as any identifying component of an array is altered, the LUN becomes unreadable on the systems connecting to the changed array, and you must turn on the LUN resignature option to even be able to read it again. This happened, for example, during an HP VCS code upgrade to VCS 4.x on one of two EVA 5000s: that upgrade changed the controller type from HSV 110 to HSV 111.
We moved to synchronous replication with fail-over scripts for the LUNs, but including a heterogeneous mix of resources inside a cluster also doesn't seem to be good for HA.
Well, we upgraded a small 4-server farm to VI3 and created a DRS/HA cluster. It was fine for a few days; then we came across some issues. It turns out HA does not use the DNS FQDN as advertised but relies on the short name, so we ended up having to populate the hosts file on each node with the short names of all the other cluster members as well, which stinks, but it cured our issue.
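For anyone hitting the same thing, here is a minimal sketch of the workaround: generating the `/etc/hosts` lines so every node can resolve every cluster member's short name. The host names and IPs below are made up for illustration; this helper is ours, not a VMware tool:

```python
# Hypothetical helper for the hosts-file workaround: build /etc/hosts
# lines mapping each cluster member's IP to its FQDN and short name.
# All names/addresses are illustrative.

def hosts_file_lines(members):
    """members: dict short-name -> (ip, fqdn); returns /etc/hosts lines."""
    return [f"{ip}\t{fqdn}\t{short}"
            for short, (ip, fqdn) in sorted(members.items())]

members = {
    "esx1": ("10.0.0.1", "esx1.example.com"),
    "esx2": ("10.0.0.2", "esx2.example.com"),
}
for line in hosts_file_lines(members):
    print(line)
```

Appending these lines to each node's `/etc/hosts` (keeping all nodes in sync) is what resolved the short-name lookup problem for us.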