VMware Cloud Community
Erik_Zandboer
Expert

host currently has no management network redundancy

Hello all,

I have seen this error appear after the upgrade to ESX 3.5 / VC 2.5 on HA-enabled systems. Somehow ESX thinks it is not redundant enough for HA, I guess. Does anyone know the exact criteria ESX uses to come up with this? My setup has one service console connected to a vSwitch, which has two physical uplinks. Obviously, ESX does not think this is redundant enough. Does anyone have more info on how to get rid of this message (e.g. how you should design the network setup in the eyes of ESX HA)?

Visit my blog at http://www.vmdamentals.com
54 Replies
TheRealJason
Enthusiast

I did have to end up rebooting the Host to get the message to actually go away after configuring the additional Service Console.

TheRealJason
Enthusiast

One benefit I suspect you will get out of having the Service Console on the VMotion network is in the scenario of network isolation. Since the hosts are all on the same VMotion network, I would assume that they can also do heartbeats on that Console if the primary fails.

Can anyone confirm?

Erik_Zandboer
Expert

Hi,

HA does do heartbeats between ESX nodes, so adding a service console to the VMotion network could help. On the other hand, HA also checks the gateway (by default; you can specify other IPs). Since VMotion is "very layer 2", often no default gateway is available, nor any other pingable device in that subnet (except the ESX nodes themselves). So yes, it could help if you have a gateway in your VMotion network. If you do not have a gateway, you could rely on having a team of physical NICs (which you should split over two switches to make it "really HA").

xlogicg
Contributor

Once you add the second NIC to the Service Console, right-click your ESX server and choose "Reconfigure for VMware HA". Assuming the failover NIC is correct, the warning/error message will go away. That was my experience.

dmorgan
Hot Shot

All I did was add a second unused NIC, configured as a stand-by, and re-configured for HA, and the error went away for me. Simple as that, no additional virtual switches, no second Service Console, just that. Pretty easy fix.

If you found this or any other post helpful please consider the use of the Helpfull/Correct buttons to award points
admin
Immortal

It checks whether the number of active networks is greater than 1, where an active network means the link is up. The redundancy can come from either NIC teaming or two service consoles. It checks the physical links that the relevant vSwitches map to.
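
To make the rule concrete, here is a minimal sketch of that check in Python. It is not VMware's actual code; the `VSwitch` class, its field names, and `has_management_redundancy` are all made up for illustration. It simply counts the distinct physical NICs with link up behind every vSwitch that carries a Service Console port, and treats more than one as redundant.

```python
from dataclasses import dataclass, field


@dataclass
class VSwitch:
    """Hypothetical model of a vSwitch; not a real VMware API object."""
    name: str
    has_service_console: bool
    # Maps physical NIC name -> True if its link is up.
    uplinks: dict = field(default_factory=dict)


def active_management_links(vswitches):
    """Return the set of link-up physical NICs behind Service Console vSwitches."""
    links = set()
    for vs in vswitches:
        if not vs.has_service_console:
            continue  # only vSwitches carrying a Service Console are relevant
        links.update(nic for nic, link_up in vs.uplinks.items() if link_up)
    return links


def has_management_redundancy(vswitches):
    """Assumed rule from the post: more than one active link means redundant."""
    return len(active_management_links(vswitches)) > 1


# One Service Console on a single uplink: the warning would be expected.
single = [VSwitch("vSwitch0", True, {"vmnic0": True})]

# NIC teaming: one Service Console, two link-up uplinks.
teamed = [VSwitch("vSwitch0", True, {"vmnic0": True, "vmnic3": True})]

# Two Service Consoles on separate vSwitches, one uplink each.
two_sc = [
    VSwitch("vSwitch0", True, {"vmnic0": True}),
    VSwitch("vSwitch1", True, {"vmnic1": True}),
]

print(has_management_redundancy(single))  # False
print(has_management_redundancy(teamed))  # True
print(has_management_redundancy(two_sc))  # True
```

Under this reading a standby NIC with link up counts as well, since only the link state is looked at, not the teaming role.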

Hope that helps!

-Sridhar

dmorgan
Hot Shot

OK great, that explains it a little better. In 3.0 it never gave this error, but once I upgraded to 3.5, there it was. Thanks for the explanation; it makes a lot more sense now why there are multiple solutions to the same problem. I guess so long as it has redundancy of some sort, it doesn't care what kind.

admin
Immortal

Yeah, a lot of people were running into problems of "isolation response" because they didn't have network redundancy for their service console, and the networking team decided to do switch maintenance, or they had some network hiccup. With redundant networks, if one becomes unavailable, the HA service can still continue to heartbeat and communicate using the other network, so your network does not become the "single point of failure" in your infrastructure. This check was added in 3.5 to alert the user to a possible lack of redundancy.

dmorgan
Hot Shot

This actually took a lot less time for me to fix than it took to figure out that a previous administrator had disabled ping requests to our default gateway. Since VMware pings this default gateway to determine the network status, this caused some problems. It took quite some time to figure this out, and a few seconds to resolve it.

admin
Immortal

Cool. Actually, since VC 2.0.2, and in VC 2.5, HA also checks whether the default gateway/das.isolationaddress is pingable, and shows a "config issue" for those that cannot be pinged.
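
As a rough illustration of that check (a sketch only, not HA's implementation): given the addresses HA would ping and some way of pinging them, report the ones that do not answer. `find_unpingable` and the `ping` callable are hypothetical names, the addresses are made up, and the printed wording is illustrative rather than the real VC message.

```python
def find_unpingable(addresses, ping):
    """Return the addresses for which ping(address) is falsy, in input order.

    `ping` is any callable taking an address string and returning True when
    the address answered. It is injected so the sketch needs no network.
    """
    return [addr for addr in addresses if not ping(addr)]


# Example: a gateway that drops ping requests (as in dmorgan's case) is flagged.
# Both addresses below are made-up examples.
answers = {"192.168.1.1": False, "10.10.10.1": True}
issues = find_unpingable(["192.168.1.1", "10.10.10.1"], lambda a: answers[a])
for addr in issues:
    print(f"config issue: isolation address {addr} is not pingable")
```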

DetXL
Contributor

I only added a second Service Console to an (existing) second virtual switch with one interface. Then I ran "Reconfigure for HA" on all HA servers, and this worked immediately.

DetXL

michelemase
Contributor

Hi, I have "solved" it by disabling and re-enabling HA. After enabling it again, the warning disappeared.

bye

dwchan
Enthusiast

If I decide to add a 2nd service console to the same vSwitch as my VMotion (where VMotion is on a different subnet/VLAN from my primary service console), which default gateway or IP range do I use? Do I use the same subnet mask/gateway as my VMotion network, given that my VMotion network does have a gateway? If so, it is not that big of a deal, since I am basically providing a mini network for the service consoles to check heartbeats with each other.

Also, if the 2nd service console is added, which IP do I put in DNS? Technically, there are now 2 IPs that go to the service console. In my DNS, does the hostname resolve to the 1st service console or the 2nd?

And lastly, I found another article from VMware talking about a value called "das.isolationaddress". What is this?

dwc

admin
Immortal

The 2nd service console is primarily there for the HA heartbeating, so it can be your private mini network. You can use the same subnet mask/gateway as your vmotion network.

As for DNS, you would put the first SC in the DNS. It would resolve to the first IP. This is the one you would use to add the host to VC.

If you have a 2nd SC, you'll notice that you cannot specify a different gateway for it. Instead, you can specify an address for it via the das.isolationaddress HA advanced option. An HA host that may be isolated will ping its isolation addresses (by default, the gateway address) to determine whether it is itself isolated from the network (as opposed to the other node having failed).
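
In other words, the decision looks roughly like the following Python sketch. It is only one reading of the behaviour described above, not HA's real logic; `classify_host_state`, its arguments, and the returned labels are hypothetical.

```python
def classify_host_state(receiving_heartbeats, isolation_addresses, ping):
    """Classify a host from its own point of view.

    receiving_heartbeats: True if heartbeats from other HA nodes still arrive.
    isolation_addresses:  addresses to ping (by default, the SC gateway).
    ping:                 callable(address) -> True if the address answered.
    """
    if receiving_heartbeats:
        return "ok"
    # No heartbeats: either the other nodes failed or this host is cut off.
    if any(ping(addr) for addr in isolation_addresses):
        # The network is reachable from here, so the peers are the problem.
        return "peers-unreachable"
    # Nothing answers: this host itself is isolated from the network.
    return "isolated"


# Heartbeats stopped, but the gateway still answers: not isolated.
print(classify_host_state(False, ["10.10.10.1"], lambda a: True))   # peers-unreachable
# Heartbeats stopped and nothing answers: isolated.
print(classify_host_state(False, ["10.10.10.1"], lambda a: False))  # isolated
```

Note that in this sketch a host with no usable isolation address at all can never prove the network is reachable, which is why a pingable address on the second network matters.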

-Sridhar

dwchan
Enthusiast

I think this is our best bet. I just have one last question. My current VMotion network is in its own VLAN/subnet (10.10.10.#/255.255.255.0). So technically, I can use an IP somewhere in that subnet range and the same gateway as VMotion (10.10.10.1) for my 2nd S.C. But can I instead make something up (like 10.20.20.###/255.255.255.0) and use the "das.isolationaddress" parameter to inject a 3rd gateway, 10.20.20.1, which technically doesn't exist? Since all my ESX hosts (fewer than 254 of them) would be using the fake subnet 10.20.20.### for their default gateway, would it work?

Regarding the das.isolationaddress parameter in a multiple S.C. configuration: do I need to add my 1st S.C. gateway IP as das.isolationaddress1 and then add my 2nd S.C. gateway IP as das.isolationaddress2? Or is it enough to just add my 2nd S.C. gateway IP as das.isolationaddress1?

dwc

strike
Contributor

To turn it off, you have to uncheck the HA feature of your cluster within the datacenter. You will see all the cluster nodes being reconfigured; after that, you can enable (check) the HA feature of your cluster again.

admin
Immortal

The isolation address is used to test if the host itself is isolated from the network (when both networks are down, and when it has stopped receiving heartbeats from other members). You will want to use a reliable address for this purpose, and one that is not too many hops away, and it should be pingable.

By default, HA uses the SC default gateway, in addition to any das.isolationaddress* you may specify. If you don't want it to use the SC default gateway, there is an advanced option for that (das.useDefaultIsolationAddress = false). In your case, since you're using SC1, it's sufficient to add the SC2 gateway address as das.isolationaddress1.
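
Put as a small Python sketch, the resulting list of addresses could be worked out as below. This is an illustration of the description above, not HA code: the function name and the plain-dict representation of the advanced options are assumptions, and 192.168.1.1 is a made-up SC1 gateway.

```python
import re


def effective_isolation_addresses(default_gateway, advanced_options):
    """Build the list of addresses HA would ping, per the description above.

    advanced_options is a plain dict of option name -> string value, e.g.
    {"das.isolationaddress1": "10.10.10.1"}.
    """
    use_default = advanced_options.get(
        "das.useDefaultIsolationAddress", "true").strip().lower() != "false"

    addresses = [default_gateway] if use_default and default_gateway else []

    # Collect das.isolationaddress, das.isolationaddress1, das.isolationaddress2, ...
    extra = []
    for name, value in advanced_options.items():
        match = re.fullmatch(r"das\.isolationaddress(\d*)", name)
        if match:
            extra.append((int(match.group(1) or 0), value))

    # Append in numeric order, skipping duplicates of addresses already listed.
    for _, value in sorted(extra):
        if value not in addresses:
            addresses.append(value)
    return addresses


# SC1 gateway kept by default, SC2 gateway added as das.isolationaddress1.
opts = {"das.isolationaddress1": "10.10.10.1"}
print(effective_isolation_addresses("192.168.1.1", opts))
# ['192.168.1.1', '10.10.10.1']

# Same, but with the default gateway switched off.
opts_no_default = {
    "das.isolationaddress1": "10.10.10.1",
    "das.useDefaultIsolationAddress": "false",
}
print(effective_isolation_addresses("192.168.1.1", opts_no_default))
# ['10.10.10.1']
```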

mbrown2010
Contributor

I fixed the "host currently has no management network redundancy" issue on my VMware ESX 3.5 hosts by adding another Service Console port on a different network card and reconfiguring VMware HA.

xooops
Contributor

Hi

At the moment we still have this issue, because we don't have any free pNICs or a free IP address for a second service console.

We have:

vSwitch0 connected to vmnic0
Ports: Service Console and VMkernel

vSwitch1 connected to vmnic1 and vmnic2
Ports for virtual machines

vSwitch2 connected to vmnic2
Ports for virtual machines

Does anybody have a solution for us?

Thanks, sven

gmjulian
Contributor

So from what you have posted, it looks like you only have 3 available NIC ports in the server?

Any chance you can add a 4th? If you do that, you can use VLAN tagging and set up something similar to the below:

vSwitch0 connected to vmnic0 and vmnic3
Ports: Service Console and VMkernel

vSwitch1 connected to vmnic1 and vmnic2
Use VLAN tagging (802.1Q) and set up all VLANs for the virtual machines
