VMware Cloud Community
SunilSaini
Contributor

Isolation response

Hi all,

Which of the three isolation responses should we use (Leave powered on, Power off, Shut down)? How do we decide which isolation response is best once host isolation has been detected?

Could anybody please clarify?

Thanks in advance

12 Replies
Troy_Clavell
Immortal

We tend to use "Leave powered on". This helps keep the guests online in the event of a false positive. If it's a true HA event and not just an isolation event, the guests would be powered down and restarted on other hosts in the cluster.

mcowger
Immortal

This book explains how to make that decision very well:

http://www.amazon.com/VMware-vSphere-technical-deepdive-ebook/dp/B004V49JGW

--Matt VCDX #52 blog.cowger.us
depping
Leadership

Our book does explain it in depth. Thanks, Matt! It is also covered in this blog article:

http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

Techstarts
Expert

I strongly suggest you read the book mentioned above, written by Duncan; it is the best one available.

Whichever response you choose, be it shut down or leave powered on, it has side effects that every decision maker must be aware of.

Thank you,

Preetam

With Great Regards,
SunilSaini
Contributor

Hi All

Is it true that an isolated host declares itself isolated? If so, what happens if the host suddenly goes down due to a power failure?

I got confused after reading the following lines.

The time elapsed before the host declares itself isolated varies depending on the role of the host (master or slave) at the time of the loss of heartbeats. If the host was a master, it will declare itself isolated within 5 seconds. If the host was a slave, it will declare itself isolated in 30 seconds.

Could you please clarify?

Techstarts
Expert

Isolated means "the host is powered on but is not on the network, so it cannot be reached by the master or any slave."

It confirms that it is isolated in two ways: first, by pinging its management gateway or any specifically configured IP address; second, by checking whether it can reach any datastore. If the host is powered off, there is no question of it receiving heartbeats at all.

Read the book for why a master needs 5 seconds and a slave needs 30 seconds.

With Great Regards,
SunilSaini
Contributor

Thanks a lot for sharing your thoughts.

My basic question is which host is responsible for declaring host isolation.

Is isolation declared by the host itself or by the master HA agent?

Let's take an example.

One host is disconnected from the network: it cannot ping anyone, and nobody can reach it. It cannot reach the datastore either. Despite all this, the host is still powered on. In this case the master declares that the host is totally isolated and the isolation response is triggered.

But what about the isolated host itself?

How will it be rebooted? Or will we have to reboot it manually?

Thanks to all

admin
Immortal

If a host is network isolated and also is not able to reach its storage (specifically its heartbeat datastores), then the master will report it as "dead", not "isolated" (even though it is not really dead, the master cannot tell this). A master can only distinguish an isolated host from a dead host if the isolated host is still heartbeating to the datastore. The isolated host itself is also able to tell if it is isolated (and it can do this even without access to storage but it is not able to communicate this to the master in that case). When vCenter reports that a host is isolated, that information is coming from the master, not the isolated host itself (since vCenter cannot talk to the isolated host in that case, of course).
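To make the master's reasoning above concrete, here is a minimal Python sketch of how a master might label a slave it can no longer hear from. It is a hypothetical model, not the FDM agent's actual code: the `classify_slave` function, the `MasterView` states and the three boolean inputs are assumptions made for illustration.

```python
from enum import Enum


class MasterView(Enum):
    LIVE = "live"
    ISOLATED = "isolated"
    DEAD = "dead"  # "dead" as far as the master can tell


def classify_slave(network_heartbeat: bool,
                   datastore_heartbeat: bool,
                   isolated_flag_on_datastore: bool) -> MasterView:
    """Hypothetical model of how a master labels one slave host.

    network_heartbeat: the master still hears the slave's agent over the
        management network.
    datastore_heartbeat: the slave is still updating its heartbeat datastore(s).
    isolated_flag_on_datastore: the slave has written "I am isolated" state
        to a file on its heartbeat datastore(s).
    """
    if network_heartbeat:
        return MasterView.LIVE
    if not datastore_heartbeat:
        # No channel left: an isolated host without storage access looks
        # exactly like a crashed or powered-off host, so it is reported "dead".
        return MasterView.DEAD
    if isolated_flag_on_datastore:
        # The datastore is the only channel the isolated host can still use.
        return MasterView.ISOLATED
    # Still heartbeating to storage but not reporting isolation: this case
    # is not discussed in this thread; the sketch simply treats it as alive.
    return MasterView.LIVE


# The scenario from the question: no network and no storage access.
print(classify_slave(False, False, False).value)  # dead
```

The point of the model is that the "isolated" label depends entirely on the datastore channel; remove it and the master has nothing to distinguish isolation from failure.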

Elisha

SunilSaini
Contributor

Thanks, Elisha.

My question is whether isolation is declared by the host itself or by the master.

You wrote: "The isolated host itself is also able to tell if it is isolated (and it can do this even without access to storage but it is not able to communicate this to the master in that case)."


Without access to storage, how is it possible for the host to tell that it is isolated? Please explain.

Thanks again.

admin
Immortal

Isolation in the context of HA means isolation from the management network; it has nothing to do with loss of access to storage. The FDM agent on a host declares itself isolated when all three of these conditions are met:

1) it cannot communicate with any other fdm agent in the cluster (over the management network)

2) it cannot ping (icmp) any other host in the cluster (also over the management network)

3) it cannot ping (icmp) the isolation address(es); by default this is the default gateway of the management network
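The three conditions above can be written as a single check. This is a minimal sketch assuming the probes are supplied by the caller: `is_isolated`, `can_talk_to_agent` and `can_ping` are hypothetical names, and no real FDM or ICMP API is being called. Note that storage access plays no part in the decision, which is why a host can know it is isolated even without its datastores.

```python
from typing import Callable, Iterable


def is_isolated(peer_agents: Iterable[str],
                peer_hosts: Iterable[str],
                isolation_addresses: Iterable[str],
                can_talk_to_agent: Callable[[str], bool],
                can_ping: Callable[[str], bool]) -> bool:
    """Hypothetical sketch: True only if all three conditions hold."""
    # 1) No other FDM agent in the cluster answers on the management network.
    if any(can_talk_to_agent(agent) for agent in peer_agents):
        return False
    # 2) No other host in the cluster answers an ICMP ping.
    if any(can_ping(host) for host in peer_hosts):
        return False
    # 3) None of the isolation addresses answers an ICMP ping
    #    (by default: the management network's default gateway).
    if any(can_ping(ip) for ip in isolation_addresses):
        return False
    # Storage is deliberately not consulted here.
    return True


peers = ["esx02", "esx03"]            # hypothetical host names
isolation_addresses = ["192.0.2.1"]   # hypothetical gateway address

# Nothing answers at all: the host considers itself isolated.
print(is_isolated(peers, peers, isolation_addresses,
                  lambda agent: False, lambda ip: False))              # True
# The gateway still answers: condition 3 fails, so not isolated.
print(is_isolated(peers, peers, isolation_addresses,
                  lambda agent: False, lambda ip: ip == "192.0.2.1"))  # False
```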

The way that an isolated host tells the master that it is isolated is by writing some state to a file on its heartbeat datastores. If it can't access its heartbeat datastores, the master will think it is dead.

Elisha

SunilSaini
Contributor

Thanks again, Elisha.

If a host is disconnected from all networks as well as from storage, what will happen? Will it be restarted automatically, or do we have to do it manually?

Thanks in advance

admin
Immortal

In that case, the master will think the host is dead and will immediately restart any VMs that were running on it. These VMs will keep running on the isolated host, but the VMFS lock on the VMDK will be lost because the host has lost access to storage. When the isolated host regains access to storage, it will try to reacquire the VMDK lock, which will fail (since the VM is running on another host, which now holds the locks). The HA agent will detect this situation and power off the original VM instance. So no manual action should be necessary.
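The lock conflict described above can be modelled with a toy lock table. This sketch assumes a simplified model in which a lock simply disappears once its owner loses storage access; real VMFS locking is more involved, and `Datastore`, `try_lock`, `VMInstance` and `on_storage_regained` are hypothetical names, not a VMware API.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Toy stand-in for VMFS: maps a VMDK path to the host holding its lock."""
    locks: dict[str, str] = field(default_factory=dict)

    def try_lock(self, vmdk: str, host: str) -> bool:
        owner = self.locks.get(vmdk)
        if owner is None or owner == host:
            self.locks[vmdk] = host
            return True
        return False  # another host owns the lock


@dataclass
class VMInstance:
    name: str
    vmdk: str
    host: str
    powered_on: bool = True


def on_storage_regained(vm: VMInstance, ds: Datastore) -> None:
    """Hypothetical model of the HA agent on the formerly isolated host."""
    if not ds.try_lock(vm.vmdk, vm.host):
        # The VM was restarted elsewhere and that copy owns the lock,
        # so this instance is the stale one and gets powered off.
        vm.powered_on = False


ds = Datastore()
vmdk = "[ds1] vm01/vm01.vmdk"                   # hypothetical path
original = VMInstance("vm01", vmdk, host="esx01")
ds.try_lock(vmdk, "esx01")                      # original instance holds the lock
del ds.locks[vmdk]                              # esx01 loses storage; lock lapses
restarted = VMInstance("vm01", vmdk, host="esx02")
ds.try_lock(vmdk, "esx02")                      # master restarts the VM on esx02
on_storage_regained(original, ds)               # esx01 regains storage
print(original.powered_on, restarted.powered_on)  # False True
```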

Elisha
