VMware Cloud Community
Patrack80
Contributor

COS in different VLANs

Hello,

I'm looking for a way, if one exists, to connect the service console interface to two different VLANs.

The reason is simple: we experienced a short network interruption on our management VLAN. As a result, the hosts in our cluster detected isolation and powered off all the VMs.

I would like to know if there is a way to prevent that kind of issue.

Thanks for any info

bertdb
Virtuoso

Redundant networking for the COS is very important. Two simple choices:

1) Give the vSwitch holding the service console interface multiple physical uplinks (NICs), connected to different switches.

2) Create multiple service console interfaces. Give them IPs in separate ranges, and create them on vSwitches so that they can use different NICs connected to different switches. Don't forget to "reconfigure HA" on all hosts after doing this.

Patrack80
Contributor

1) Already done.

2) Assuming that I use VLANs to separate my networks, can I have one service console interface on VLAN1 with physical NIC1 and one service console interface on VLAN2 with physical NIC2? With two different IP ranges, of course.

Is that configuration allowed? Won't there be any conflict?

If the main VLAN1 goes down for one hour, can we be sure the service continues on VLAN2, without any impact on the VMs?

tabit
Contributor

I doubt that will really work the way you want, even if it's possible. To me, the HA cluster is very unpredictable during a network outage. There is always an "HA agent" error here and there. IMHO, it's not quite ready for serious enterprise usage.

The best solution so far is to change the "isolation response" of each VM in an HA cluster to "leave power on". Just search for "isolation response" and you will see some threads.

There is a lock-up risk: if an ESX host really does get isolated because of its own network problem, you cannot start the VM from another host, since the isolated host leaves the VM powered on and keeps it locked. You have to manually power down the VM on the isolated ESX host first, then start it from another host.

In my environment, I put the COS on a virtual switch with two NICs, so I think the chance of an ESX host getting isolated "for its own good reason" is very low and acceptable. However, if there's a major network outage, all ESX hosts will be isolated, and I would definitely want them to leave the VMs on.

I'd rather take the risk of occasionally locking up VMs on a single ESX server than take down all the VMs on all ESX servers in an HA cluster during a network outage.

Patrack80
Contributor

Tabit,

I agree with you on these settings.

I selected the "power off" option so as not to be constrained in case of BCP (having to force a power off on all impacted VMs before being able to present the replicated LUN to the other hosts), but now I will leave them powered on and do what is needed to have a script able to perform this "powering on" operation.

We are currently testing the Dunes VS-O product, and it can surely do what we need.

Thanks for your opinion.

bertdb
Virtuoso

Sure, how would that conflict?

If both VLANs are reachable over different vSwitches, it's simple: just one COS interface per vSwitch. If the VLANs are reachable over a single vSwitch, you define two COS interfaces, set the correct VLAN IDs, and specify different active/standby policies in the NIC teaming settings.
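To make the single-vSwitch case concrete, here is a minimal sketch in Python that only models the intended layout and checks it. It does not talk to ESX; the class, the port group names, the NIC names and the VLAN IDs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CosPortGroup:
    """Hypothetical description of one service console port group."""
    name: str
    vlan_id: int
    active_nic: str
    standby_nic: str


def uses_distinct_paths(a: CosPortGroup, b: CosPortGroup) -> bool:
    """True when the two COS interfaces sit on different VLANs
    and prefer different physical NICs as their active uplink."""
    return a.vlan_id != b.vlan_id and a.active_nic != b.active_nic


# Two COS interfaces on the same vSwitch, with mirrored active/standby order.
cos1 = CosPortGroup("Service Console", vlan_id=1,
                    active_nic="vmnic0", standby_nic="vmnic1")
cos2 = CosPortGroup("Service Console 2", vlan_id=2,
                    active_nic="vmnic1", standby_nic="vmnic0")

print(uses_distinct_paths(cos1, cos2))
```

With the mirrored order, each interface normally uses its own NIC and falls back to the other one if its uplink fails.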

I don't know what you mean by "the service continues". When you configure your hosts for HA through VC, each host receives the contact information of all the other hosts. Each host then starts heartbeating the other hosts over all available network paths between the service consoles. In this case: two paths.

If the first scenario was already implemented, how did your network fail for one hour? Did multiple network switches go down together? If that happens, scenario two will cause isolation detection as well!

If a host can't see any of the other service consoles AND it can't see its default gateway, the host decides that it is isolated. If all physical switches connected to vSwitches that hold service console ports are powered off, that will always happen, guaranteed.
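The rule in the previous paragraph can be written down as a small Python sketch. This only restates the logic as described in this post; it is not VMware's actual implementation, and the function and parameter names are invented for the example.

```python
from typing import Iterable


def is_isolated(peers_reachable: Iterable[bool], gateway_reachable: bool) -> bool:
    """Return True when a host would declare itself isolated.

    peers_reachable: one boolean per other host's service console,
        True if it answered over any available path.
    gateway_reachable: True if the COS default gateway answered.
    """
    # Isolated only if NO peer is reachable AND the gateway is unreachable.
    return not any(peers_reachable) and not gateway_reachable


# Router down, but the hosts still see each other: not isolated.
print(is_isolated([True, True], gateway_reachable=False))

# All switches carrying service console traffic are off: isolated.
print(is_isolated([False, False], gateway_reachable=False))
```

Under this rule, losing only the default gateway is not enough to trigger isolation as long as at least one other service console still answers.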

bertdb
Virtuoso

If a host can become isolated in situations where powering off the VMs is not the right conclusion, then "leave powered on" (aka "do nothing") is the only correct setting.

Preventing host isolation is better, but not always feasible.

tabit
Contributor

One more thing: if you VMotion a virtual machine or make changes to the HA cluster, make sure to recheck the VM's isolation response. I found that it gets changed back to the default "power off" after some changes in an HA cluster.
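A recheck like that could be sketched as follows in Python. This assumes you already have the current per-VM isolation response in a plain dictionary (how you collect it is not shown here); the function name, the VM names and the setting strings are hypothetical.

```python
from typing import Dict, List


def vms_to_fix(responses: Dict[str, str],
               wanted: str = "leave powered on") -> List[str]:
    """Return the sorted names of VMs whose isolation response
    differs from the wanted setting."""
    return sorted(vm for vm, resp in responses.items() if resp != wanted)


# Hypothetical snapshot taken after a change to the cluster.
current = {
    "vm-web01": "leave powered on",
    "vm-db01": "power off",          # silently reverted to the default
    "vm-app01": "leave powered on",
}

print(vms_to_fix(current))
```

Running such a check after every cluster change gives a short list of VMs to set back by hand.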

Patrack80
Contributor

Hello,

I'm still trying to understand exactly what happens.

How exactly does HA work when spanning tree is running on the network? Is the heartbeat between hosts sent via the VMkernel? It says that the check is made by "pinging" the default gateway of the COS, but if the router goes down, the hosts should still see each other, no?

Sorry, but I'm not quite sure I understand.

I'm looking for a solution to make sure this cannot happen again.
