VMware Cloud Community
pdrace
Hot Shot

NIC configuration and HA isolation response setting

On the new hosts we have been ordering (HP 580 G7s and 380 G8s), the built-in 1 Gb NIC ports are all on a single quad-port card.

In the past I have always used the built-in ports for the management network. On older hosts (385 G7s and 380 G7s), they are split into two dual-port adapters at different ends of the motherboard, so I treated them as two discrete cards.

Having my management connections plugged into one card seems somewhat risky. With subsequent orders I am now adding an additional card so that I can team the connections between the built-in card and the add-in card.

Given the lack of truly redundant management connections on the hosts I already have, what would be the recommended isolation response setting? Leave Powered On?

That seems to be the recommendation I am seeing now that datastore heartbeating is being used. Our storage connections don't use the same NICs as the management network.


Accepted Solutions
depping
Leadership

I recommend "leave powered on" in your scenario. No need to incur any downtime when it is most likely that the virtual machines will still have access to disk and to the network. (see table)

9 Replies
VirtuallyMikeB

Good day!

Datastore heartbeating offers a truly resilient design. It reduces the stress on the host and the number of restart attempts by checking for an active datastore heartbeat before trying to restart a VM after a management network failure.
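A minimal Python sketch of that idea, under heavily simplified assumptions: the names `HostState`, `classify_host` and `should_restart_vms` are hypothetical and do not correspond to any VMware API, and real vSphere HA logic handles more cases. It shows how an active datastore heartbeat lets the HA master tell an isolated or partitioned host apart from a failed one, so it skips restart attempts for VMs that are presumably still running.

```python
from enum import Enum


class HostState(Enum):
    ALIVE = "alive"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Hypothetical, simplified model of how an HA master judges a host."""
    if network_heartbeat:
        # Heartbeats still arrive over the management network: nothing to do.
        return HostState.ALIVE
    if datastore_heartbeat:
        # Management network is silent, but the host still updates its heartbeat
        # datastore, so the host (and probably its VMs) is still running.
        return HostState.ISOLATED_OR_PARTITIONED
    # No sign of life on either channel: treat the host as failed.
    return HostState.FAILED


def should_restart_vms(state: HostState, powered_off_by_isolation_response: bool) -> bool:
    """Decide whether the master should try to restart the host's VMs elsewhere."""
    if state is HostState.FAILED:
        return True
    if state is HostState.ISOLATED_OR_PARTITIONED:
        # Only restart VMs that the isolated host has already powered off or shut
        # down; otherwise they are presumably still running on the isolated host.
        return powered_off_by_isolation_response
    return False
```

With "Leave powered on", an isolated host never powers its VMs off, so in this model the master makes no restart attempts for it.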

As you seem to allude to, many multi-port NICs are actually controlled by a single chip of silicon. Even though it may seem that you are protected by redundant ports, if that single chip fails, all of its ports fail. You're correct in pairing NIC ports across two different chips, onboard and PCIe.
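The pairing rule can be expressed as a simple check: a management team is only redundant at the hardware level if its uplinks span at least two physical controllers. This is a sketch assuming you already know which controller each port sits on; the function name and the controller labels in the inventory are hypothetical, and `vmnicN` is just the usual ESXi port naming.

```python
def has_controller_redundancy(uplinks: list[str], controller_of: dict[str, str]) -> bool:
    """Return True if the teamed uplinks sit on at least two physical NIC controllers."""
    controllers = {controller_of[nic] for nic in uplinks}
    return len(controllers) >= 2


# Hypothetical inventory: a built-in quad-port card on one chip plus an add-in card.
inventory = {
    "vmnic0": "onboard-quad-port",
    "vmnic1": "onboard-quad-port",
    "vmnic2": "onboard-quad-port",
    "vmnic3": "onboard-quad-port",
    "vmnic4": "pcie-add-in",
    "vmnic5": "pcie-add-in",
}

print(has_controller_redundancy(["vmnic0", "vmnic1"], inventory))  # False: same chip
print(has_controller_redundancy(["vmnic0", "vmnic4"], inventory))  # True: onboard + PCIe
```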

Now, with regard to the host isolation response, you have options depending on the customer's requirements, constraints, and expectations. The general answer here is, as always, "it depends." The recommendation has changed over the years across ESX versions, but today the recommended option with vSphere 5 is "Leave powered on." This is because it removes the risk of VM downtime caused by a false positive.

Imagine what would happen if only your management network went down and you had configured one of the other two options. Your storage network and virtual machine network are still up. Only your management network went down, yet all your VMs are now powered off or shut down, even though your users could still be accessing them. With the introduction of vSphere 5, the default and recommended configuration has changed to "Leave powered on" to mitigate exactly this situation.
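The scenario above can be modeled as a small truth table. This is a hypothetical sketch, not VMware code: `vms_available_after_isolation` only asks whether users can still reach the VMs on the isolated host, and it ignores the later HA restart on another host.

```python
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "Leave powered on"
    POWER_OFF = "Power off"
    SHUT_DOWN = "Shut down"


def vms_available_after_isolation(response: IsolationResponse,
                                  storage_up: bool, vm_network_up: bool) -> bool:
    """Hypothetical model: can users still reach the VMs on the isolated host?"""
    if response is not IsolationResponse.LEAVE_POWERED_ON:
        # The host powers off or shuts down its VMs; they stay down until HA
        # restarts them somewhere else.
        return False
    # VMs keep running and stay usable as long as they still have disk and network.
    return storage_up and vm_network_up


# Only the management network failed: storage and VM networks are still up.
for response in IsolationResponse:
    print(f"{response.value}: VMs available = "
          f"{vms_available_after_isolation(response, True, True)}")
```

For a management-only failure this prints `True` only for "Leave powered on", which is the false-positive downtime described above.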

Cheers,

Mike Brown

http://VirtuallyMikeBrown.com

https://twitter.com/#!/VirtuallyMikeB

http://LinkedIn.com/in/michaelbbrown

pdrace
Hot Shot

Yes, my concern is the single point of failure on the management NIC. I will be ordering some additional 1 Gb NICs to retrofit the 5 machines that don't have a separate card. The question that still remains for me is whether I should change the isolation response on those hosts until I am able to retrofit them. My fear is that the single card used for management fails. The only way to repair the network at that point is to replace the card. While the VMs will keep running, they will be unmanageable and I'll have no way to migrate them to a working host.

I guess the only option at that point would be to trigger an HA failover by powering down the host so that its VMs restart on the other hosts.

joshodgers
Enthusiast

Isolation response should be set to "Leave Powered On" in an environment where the management network is not highly redundant, such as your scenario, where a single NIC (dual or quad port) failure could isolate a host.

For this reason, I generally recommend using one onboard NIC and one physically separate NIC for ESXi management connections.

You could also consider disabling the default isolation address (the default gateway of your ESXi management VMkernel port) and setting alternate isolation addresses.
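A rough Python model of what that change does, assuming simplified vSphere 5 HA behaviour in which a host declares itself isolated only when it stops receiving heartbeats and none of its isolation addresses answer a ping. The option names `das.usedefaultisolationaddress` and `das.isolationaddressN` are vSphere HA advanced options; the IP addresses, the helper functions and the `reachable` set are hypothetical stand-ins for real pings.

```python
# Option names are vSphere HA advanced options; the IP addresses are hypothetical.
ha_advanced_options = {
    "das.usedefaultisolationaddress": "false",
    "das.isolationaddress0": "192.168.10.1",
    "das.isolationaddress1": "192.168.20.1",
}


def effective_isolation_addresses(options: dict[str, str], default_gateway: str) -> list[str]:
    """Addresses the host would ping before declaring itself isolated (simplified)."""
    addresses = [value for key, value in sorted(options.items())
                 if key.startswith("das.isolationaddress")]
    # Unless explicitly disabled, the management network's default gateway is used too.
    if options.get("das.usedefaultisolationaddress", "true").lower() != "false":
        addresses.insert(0, default_gateway)
    return addresses


def declares_itself_isolated(receives_heartbeats: bool, reachable: set[str],
                             addresses: list[str]) -> bool:
    """Isolated only if heartbeats stop AND none of the isolation addresses answer."""
    if receives_heartbeats:
        return False
    return not any(address in reachable for address in addresses)


addresses = effective_isolation_addresses(ha_advanced_options, default_gateway="10.0.0.1")
print(addresses)  # ['192.168.10.1', '192.168.20.1']
# The gateway is down, but one alternate isolation address still answers: not isolated.
print(declares_itself_isolated(False, {"192.168.20.1"}, addresses))  # False
```

Choosing alternate addresses that sit on independent paths (for example, the storage network) makes it less likely that a single failure causes a false isolation.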

I discuss some of these points on my blog, which covers environments where the management network is not highly available, along with tips for avoiding unnecessary isolation response events, such as changing isolation addresses and using datastore heartbeating.

http://joshodgers.com/2012/05/30/vmware-ha-and-ip-storage/

Josh Odgers | VCDX #90 | Blog: www.joshodgers.com | Twitter @josh_odgers
depping
Leadership

See my article around design decisions for the isolation response:

http://www.yellow-bricks.com/2012/05/31/which-isolation-response-should-i-use/

pdrace
Hot Shot

Josh Odgers wrote:

Isolation response should be set to "Leave Powered On" in an environment where management network is not highly redundant, such as your scenario where a single NIC (dual or quad port) failure could isolate a host.

So you recommend setting the response to "Power Off" if the management network is fully redundant?

joshodgers
Enthusiast

Not necessarily, but if the management network is fully redundant and you're not using IP storage, it is an option I would consider, depending on the environment.

The safer option is "Leave powered on", as it prevents a false positive from triggering the isolation response. However, "Shut down" and "Power off" are also important to consider, because they ensure that, in the event of a genuine isolation, the VMs can be recovered on a working host in a timely manner.

depping
Leadership

Parker Race wrote:

So you recommend setting the response to "Power Off" if the management network is fully redundant?

Depends on the rest of your environment:

1) are you using IP based storage?

2) are you using a converged network?

3) what are the chances the virtual machines are also isolated?

http://www.yellow-bricks.com/2012/05/31/which-isolation-response-should-i-use/
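Those three questions can be folded into a rough heuristic. This is a hypothetical sketch: only the "Leave powered on" branch is grounded in this thread (VMs that most likely keep disk and network access should not be powered off); the "Power off" and "Shut down" branches are assumptions for illustration, so check the linked article before relying on them.

```python
def suggest_isolation_response(ip_storage: bool, converged_network: bool,
                               vms_likely_isolated: bool) -> str:
    """Hypothetical heuristic mapping the three questions to an isolation response."""
    # Assumption: with IP storage on a converged network, losing the management
    # network probably means losing storage access as well.
    storage_likely_lost = ip_storage and converged_network
    if not storage_likely_lost and not vms_likely_isolated:
        # VMs most likely keep disk and network access: avoid needless downtime.
        return "Leave powered on"
    if storage_likely_lost:
        # Assumption: power off so HA can restart the VMs on hosts that still see storage.
        return "Power off"
    # Assumption: storage is fine but the VMs lost their network, so a clean
    # shutdown lets HA restart them on a host that is not isolated.
    return "Shut down"


# NFS storage, management on separate NICs, VM isolation unlikely.
print(suggest_isolation_response(ip_storage=True, converged_network=False,
                                 vms_likely_isolated=False))  # Leave powered on
```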

pdrace
Hot Shot

Duncan wrote:

1) are you using IP based storage?

2) are you using a converged network?

3) what are the chances the virtual machines are also isolated?

1. All our storage is IP based (NFS).

2. The management network connection uses a set of NICs that is separate from the storage and virtual machine connections.

3. Unlikely, unless there is a general network infrastructure outage. We had one this month; not a lot of fun.

Thanks everyone for your responses.

I wish there was a way to assign more points for helpful answers.
