VMware Cloud Community
zmclean
Enthusiast

HA errors after VC 2.5 Update 2 upgrade: Incompatible HA Networks

I'll post the screenshot below. I upgraded VC to 2.5 U2 and then upgraded the last server to ESX 3.5 U2, and ever since, HA has not been happy with the configuration. I have tried renaming, disabling, and re-enabling, but it does the same thing. It says "Incompatible HA Networks" and suggests trying the das.allowNetwork option in the advanced cluster settings. I haven't been able to find anything on that particular option.

Z-Bone

zmclean
Enthusiast

That's OK. Can you type out the error messages and XXX out the IP addresses or other sensitive material? Even a rough idea of the error message might give me something to think about.

Z-Bone

dominic7
Virtuoso

"HA Agent on <hostname> in cluster <clustername> in <datacenter> has an error Incompatible HA Networks: Host has network(s) that don't exist on cluster members: <ip address>: Cluster has network(s) missing on host: <ip address>: Consider using Advanced Cluster Settings das.allowNetwork to control network usage"
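The two halves of this message read like two set differences between the host's HA networks and the networks the other cluster members have in common. A minimal sketch, assuming each network is identified by a plain string such as a subnet and that "cluster networks" means the networks every other member shares; the function name and data layout are hypothetical, not VirtualCenter's actual implementation:

```python
def incompatible_networks(host_nets, other_members):
    """Compare one host's HA networks with the networks of the other members.

    host_nets: set of network identifiers (e.g. "10.0.1.0/24") on the host.
    other_members: list of such sets, one per other cluster member.
    Returns (extra_on_host, missing_on_host), mirroring the two halves of
    the "Incompatible HA Networks" message.
    """
    if not other_members:
        return set(), set()  # a lone host has nothing to disagree with
    # Assumption: the "cluster networks" are those every other member shares.
    cluster_nets = set.intersection(*(set(m) for m in other_members))
    extra_on_host = set(host_nets) - cluster_nets    # "don't exist on cluster members"
    missing_on_host = cluster_nets - set(host_nets)  # "missing on host"
    return extra_on_host, missing_on_host


if __name__ == "__main__":
    # Made-up example: this host's Service Console sits on a different subnet.
    extra, missing = incompatible_networks(
        {"10.0.2.0/24"},
        [{"10.0.1.0/24"}, {"10.0.1.0/24"}],
    )
    print("Host has network(s) that don't exist on cluster members:", sorted(extra))
    print("Cluster has network(s) missing on host:", sorted(missing))
```

With a single Service Console per host, any subnet mismatch shows up on both sides at once, which matches the shape of the error quoted above.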

zmclean
Enthusiast

> "HA Agent on <hostname> in cluster <clustername> in <datacenter> has an error Incompatible HA Networks: Host has network(s) that don't exist on cluster members: <ip address>: Cluster has network(s) missing on host: <ip address>: Consider using Advanced Cluster Settings das.allowNetwork to control network usage"

Can you verify that all of the allowed subnets can ping the IP address in question? Is that IP a known ESX host? Have you tried setting das.allowNetwork to the Service Console, etc.?

Z-Bone

dominic7
Virtuoso

The hosts were able to ping all of the other hosts and the default gateways, and the HA cluster was fully functional before the upgrade. Since then I have moved all of the service consoles into the same VLAN and everything is functioning well again, but I still have another site with at least 4 clusters set up in a similar (crappy) fashion, which I will have to fix before I can upgrade the VC at that site.

vpert
Enthusiast

One of our customers has the same issue and opened an SR with VMware; no solution so far.

We had "just" upgraded VirtualCenter to 2.5 U2, with the ESX hosts still running 3.5.0 U1, when this problem first came up.

We then created a new cluster and moved the faulted ESX server (ESX02) into it: HA got configured and everything was fine.

Then we also moved the second ESX server (ESX03), which had been working before, into this cluster: same error message as mentioned earlier in this thread.

When we check "Network" in VC, all networks on ESX03 are shown as "Connect" but with a red status. Yet the networks are fine and all running VMs can be reached.

Just to double-check, we repeated the steps: new cluster, add ESX03 first. This time ESX02 showed the same errors that ESX03 had shown before.

Then we removed the working host ESX03 from the cluster and ran "Reconfigure HA" on the faulted ESX02: everything was fine.

Finally, we upgraded both ESX servers to the current version, 3.5.0 (build 103908). No change to this issue at all.

We found that when an ESX host has this error, nothing changes in the HA agent or configuration on that host when we start "Reconfigure HA" or add the host to the HA-enabled cluster.

So we assume that this issue must have been caused by VirtualCenter 2.5 U2, not by the HA agent or the ESX server.

I'll keep you up to date on what happens with the VMware SR.

meistermn
Expert

vpert
Enthusiast

Thanks for the hint; we double-checked this already. All hostnames and DNS entries are in lower case, and we could find no problems with DNS name resolution.

Now the issue appears even in "untouched" clusters, meaning ESX clusters that had been running without this problem for a couple of days. It's getting strange...

marxk
Contributor

I too would like to see the information in the link. I have clusters of machines in different datacenters on campus, and the console connections are on separate VLANs. I would prefer not to re-architect the network connections to fix this, if possible. I am wondering whether we can configure HA communication to use one set of NICs and communication with vCenter to use a different one. My main problem is IP space.

admin
Immortal

Here's the scoop. HA heartbeats are configured to take place over matched networks. A network counts as a "match" when its IP subnet (network address and subnet mask) is the same as that of the other hosts in the cluster. If you are using VLANs with different subnets, there is currently no way for HA to know to match them up. This will hopefully be improved in the future so that routable networks can be tagged to assist in the match-up.
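A small illustration of the matching rule described above, using Python's standard ipaddress module: two interfaces match only when they sit on the same network with the same mask, and routability between subnets plays no part. The addresses are made up, and this is a sketch of the rule as stated in this post, not VMware's code:

```python
import ipaddress


def ha_network(ip, netmask):
    """Return the IP network (network address + mask) an interface sits on."""
    # strict=False accepts a host address and masks off the host bits.
    return ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)


def networks_match(a, b):
    """Two (ip, netmask) interfaces 'match' only on identical network and mask.

    Whether the subnets are routable to each other is deliberately ignored.
    """
    return ha_network(*a) == ha_network(*b)


if __name__ == "__main__":
    esx01 = ("10.0.1.21", "255.255.255.0")
    esx02 = ("10.0.1.22", "255.255.255.0")
    esx03 = ("10.0.2.23", "255.255.255.0")  # different VLAN, even if routed
    print(networks_match(esx01, esx02))  # True
    print(networks_match(esx01, esx03))  # False
```

Note that a mask mismatch also breaks the match: 10.0.1.21/255.255.0.0 and 10.0.1.22/255.255.255.0 land on different networks.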

If you don't have enough IP addresses for a single subnet and need VLANs, then you'll have to exclude that network from consideration by HA. Use das.allowNetworkX to specify only the non-VLAN management networks. If that doesn't allow for redundancy, HA will have that one network as a single point of failure and will be less reliable.

I wish I had a better option to offer....
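The das.allowNetworkX advice above can be pictured as a filter applied before network matching. A hedged sketch: it assumes, from my reading of VMware's HA documentation, that X is a single digit (0 through 9) and that each value names a port group such as "Service Console"; verify both against the documentation for your release. The function names are hypothetical:

```python
import re

# Assumption: X is a single digit (0-9) and each value names a port group,
# e.g. "Service Console"; check the HA documentation for your release.
ALLOW_KEY = re.compile(r"das\.allowNetwork([0-9])")
PREFIX = "das.allowNetwork"


def allowed_port_groups(advanced_options):
    """Collect the port group names listed in das.allowNetworkX options.

    Raises ValueError for a das.allowNetwork key whose suffix is not a single
    digit, such as "das.allowNetwork195".
    """
    allowed = set()
    for key, value in advanced_options.items():
        if not key.startswith(PREFIX):
            continue  # unrelated advanced option
        if ALLOW_KEY.fullmatch(key) is None:
            raise ValueError(f"unexpected option name: {key!r}")
        allowed.add(value.strip())
    return allowed


def heartbeat_port_groups(host_port_groups, advanced_options):
    """Restrict a host's candidate HA networks to the allowed port groups."""
    allowed = allowed_port_groups(advanced_options)
    if not allowed:
        return set(host_port_groups)  # nothing configured: consider them all
    return set(host_port_groups) & allowed


if __name__ == "__main__":
    opts = {"das.allowNetwork0": "Service Console"}
    print(heartbeat_port_groups({"Service Console", "Service Console 2"}, opts))
    try:
        allowed_port_groups({"das.allowNetwork195": "Service Console"})
    except ValueError as err:
        print(err)
```

The filter only narrows what HA considers; as noted above, it cannot make two different subnets match, so excluding the VLAN network leaves the remaining network as a single point of failure.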

admin
Immortal

Can you verify that your hostname was not inadvertently changed to upper case during the upgrade process?

KBrown01
Contributor

Besides the issue of not having enough IP space, I think the biggest problem with this change is that VMware has taken something that worked in VC 2.5 U1 and earlier, changed the requirements, gave no notice of the change, and provided no way to work around it. If I were still with my previous employer, I know they would have their legal people looking into this, since that was the way they operated. On top of all that, there is no way to test the upgrades to the rest of the products without upgrading to VC 2.5 U2 first (at least not through VC), and the rollback works for crap. I had to roll back to U1 because of a failure during the upgrade, and that hosed the DB even more; rolling back means restoring from backup tape. All that being said, I would have been at least a little more tolerant of the change if I had been made aware of it BEFORE performing the upgrade, in the release notes, but I am guessing that would have been too much to ask for...

marxk
Contributor

I thought maybe if I drew a picture of how I have it set up, you might be able to see a better way:

There are 3 hosts in the cluster I am working on at the moment, and each is configured nearly identically. I removed a redundant console on each host from either the Prod 1 or Prod 2 network:

vSwitch0 - 1 NIC (1000/full)
  Service Console - "Unix/Backup Network 255.255.255.0"
    Host 000 - 172.xx.AA.230
    Host 001 - 172.xx.BB.231
    Host 002 - 172.xx.BB.230

vSwitch1 - 3 NICs (1000/full)
  VM Port Group - "deployment/management network 255.255.255.0"
  VMotion 0 192.xx.xx.246

vSwitch2 - 2 NICs (1000/full)
  VM Port Group - "Production network 1 255.255.255.0"

vSwitch3 - 2 NICs (1000/full)
  VM Port Group - "Production Network 2 255.255.255.0"

vSwitch4 - no NICs
  VM Port Group - "Isolation"

vSwitch5 - no NICs
  VM Port Group - "Local Development"

Thanks

admin
Immortal

I appreciate your frustration. You are right, this change was not properly communicated in the release notes.

One correction, though....

>"VMWare has taken something that worked in VC2.5u1"

There never used to be a cluster configuration validity check, so improper configurations went unrecognized. While things may have appeared to be working properly, the cluster was vulnerable whenever there wasn't complete network compatibility. This is due to the way the cluster multi-paths its communication over the available networks. Certain host-restart scenarios could lead to split-brain conditions (where the cluster divides into two separate clusters). Also, heartbeats would never be received over non-routable networks, leading to degraded host-failure detection.
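To see why mismatched Service Console subnets raise the split-brain risk described above, a deliberately simplified model helps: if heartbeats only flow between hosts on the same subnet, each distinct subnet becomes its own heartbeat group. This is an illustrative sketch with made-up host names and addresses, not a model of the real HA agent's behaviour:

```python
import ipaddress
from collections import defaultdict


def heartbeat_groups(hosts):
    """Group hosts by the subnet of their single Service Console.

    hosts: {host_name: (ip, netmask)}.
    Simplifying assumption: heartbeats are only exchanged between hosts on
    the same subnet, so every distinct subnet forms its own group.
    """
    groups = defaultdict(set)
    for host, (ip, mask) in hosts.items():
        net = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
        groups[net].add(host)
    return dict(groups)


if __name__ == "__main__":
    cluster = {
        "esx01": ("10.0.1.21", "255.255.255.0"),
        "esx02": ("10.0.1.22", "255.255.255.0"),
        "esx03": ("10.0.2.23", "255.255.255.0"),
    }
    groups = heartbeat_groups(cluster)
    if len(groups) > 1:
        print("cluster could split into", len(groups), "heartbeat groups:")
    for net, members in groups.items():
        print(net, sorted(members))
```

More than one group means the hosts cannot all hear each other's heartbeats, which is the condition the new validity check refuses to accept.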

And I agree, the upgrade process is painful for large installations. That is something we are committed to improving.

marxk
Contributor

I have to agree. Due to issues when performing upgrades previously (several times we had to restore the DB from tape), we decided to build a fresh server and a fresh database and install vCenter 2.5 U2 cleanly. Now I have a mess, because there was no "Oh, by the way..." I am not a professional programmer, but even I could have figured out a way to run a query against the database to see whether the current configuration would fail because machines in the same cluster violate this new rule.
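The pre-upgrade check suggested above could look roughly like this. A sketch under stated assumptions: the input is a hypothetical export of each cluster's hosts and their Service Console (ip, netmask) pairs, not a query against the VirtualCenter database schema, and a cluster is flagged whenever its hosts do not all share exactly the same set of console subnets (a stricter rule than VC may actually apply):

```python
import ipaddress


def console_networks(consoles):
    """Set of subnets a host's Service Console interfaces sit on."""
    return frozenset(
        ipaddress.IPv4Network(f"{ip}/{mask}", strict=False) for ip, mask in consoles
    )


def audit(inventory):
    """Report clusters whose hosts do not all share the same console subnets.

    inventory: {cluster: {host: [(ip, netmask), ...]}}, a hypothetical export
    of the current configuration (not a VirtualCenter database query).
    """
    problems = {}
    for cluster, hosts in inventory.items():
        nets = {host: console_networks(c) for host, c in hosts.items()}
        if len(set(nets.values())) > 1:
            problems[cluster] = nets
    return problems


if __name__ == "__main__":
    inventory = {
        "ClusterA": {
            "esx01": [("10.0.1.21", "255.255.255.0")],
            "esx02": [("10.0.1.22", "255.255.255.0")],
        },
        "ClusterB": {
            "esx03": [("10.0.1.23", "255.255.255.0")],
            "esx04": [("10.0.2.24", "255.255.255.0")],
        },
    }
    for cluster, nets in audit(inventory).items():
        print(cluster, {h: sorted(map(str, n)) for h, n in nets.items()})
```

Running something like this across every site before upgrading would list the clusters that need their Service Console networks reworked (or excluded via das.allowNetworkX) first.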

marxk
Contributor

How do you set up 2 consoles on separate subnets? It only gives me the option to set the default console gateway, but I would like to put a console on another subnet.

boy
Contributor

A KB article has been released:

http://kb.vmware.com/kb/1006541

rsullivan
Contributor

OK, so how does one use this das.allowNetwork feature?

I keep getting "Object reference not set to an instance of an object." errors.

I have two VLANs I'm trying to work with. 3 of my hosts are on one VLAN, and the rest are on the other.

So what is the correct way to express them?

das.allowNetwork195 Service Console

das.allowNetwork228 Service Console

I tried the above and it didn't fix it. Also, if I just create das.allowNetwork1 and das.allowNetwork2, it seems to pre-populate the VLANs after I've reconfigured the cluster.

I just need an example to go off of because I'm not getting this to work.

Ida_Soco
Contributor

The issue can ONLY be solved by moving the Service Console of ALL nodes in a cluster to the same IP network. This is not acceptable, because it would mean redesigning the IP network in lots of VI3 environments.

We are still waiting for a workaround or solution from VMware.

admin
Immortal

> The issue can ONLY be solved by moving the Service Console of ALL nodes in a cluster to the same IP network.

To be clearer: if you have a single Service Console on each ESX host, then they all need to be on the same IP subnet (even if you know better and are using routable VLANs).

VMware will have to create a patch to "ignore the network compatibility check", which isn't currently in U2. Otherwise, for those who don't have the flexibility to modify their Service Console IP networks, there is no workaround.

vpert
Enthusiast

Absolutely agree.

Just had a call with VMware Support: Engineering will not change this "new feature", but Support will ask for an option that lets the customer decide whether or not to use this enhanced HA compatibility check.
