I'll post the screenshot below, but I upgraded the VC to 2.5 U2 and then upgraded the last server to 3.5 U2, and ever since it has not been happy with the HA configuration. I have tried renaming, disabling, and re-enabling, but it does the same thing. It says "Incompatible HA Networks" and suggests trying the das.allowNetwork option in the advanced cluster settings. I haven't been able to find anything on that particular option.
Z-Bone
That's OK. Can you type out the error messages, with the IP addresses and any sensitive material replaced by XXX? Even a rough idea of the error message might give me something to think about.
Z-Bone
"HA Agent on <hostname> in cluster <clustername> in <datacenter> has an error Incompatible HA Networks: Host has network(s) that don't exist on cluster members: <ip address>: Cluster has network(s) missing on host: <ip address>: Consider using Advanced Cluster Settings das.allowNetwork to control network usage"
Can you verify that all of the allowed subnets can ping the IP address in question? Is the IP in question a known ESX host? Have you tried das.allowNetwork with Service Console, etc.?
Z-Bone
The hosts were able to ping all of the other hosts and the default gateways. The HA cluster was fully functional before the upgrade. Since then I have moved all of the service consoles into the same VLAN and everything is functioning well again, but I still have another site with at least 4 clusters that are set up in a similar (crappy) fashion, which I will have to fix before I can upgrade the VC at that site.
One of our customers has the same issue and opened an SR with VMware. No solution so far.
We had "just" upgraded VirtualCenter to 2.5 U2 and still had the ESX hosts running 3.5.0 U1 when this problem first came up.
We then created a new cluster and moved the faulted ESX server (ESX02) into it: HA was configured and everything was fine.
Then we also moved the second ESX server (ESX03), which had been working before, into this cluster: same error message as mentioned earlier in this thread.
When we check "Network" in the VC, all networks on the ESX server (ESX03) are shown as "Connect" but with a red status. However, the networks are fine and all running VMs can be reached.
Just for fun we did the steps again: new cluster, add ESX03. Then ESX02 showed the same errors that ESX03 had before.
Then we removed the working host ESX03 from the cluster and started a "Reconfigure HA" on the faulted ESX02: everything was fine.
Finally we upgraded both ESX servers to the current version, 3.5.0, 103908. No change to this issue at all.
We found that when an ESX host has this error, starting a "Reconfigure HA" or adding the host to the HA-enabled cluster makes no changes to the HA agent or configuration on that host.
So we assume that this issue is caused by VirtualCenter 2.5 U2, not by the HA agent or the ESX server.
I'll keep you up to date on what happens with the VMware SR.
Thanks for the hint. We already double-checked this: all hostnames and DNS entries are in lower case. We could not find any problems with DNS name resolution.
Now the issue appears even in "untouched" clusters, meaning ESX clusters which had been running without this problem for a couple of days. It's getting strange...
I too would like to see the information in the link. I have clusters of machines that are in different datacenters on campus, and the console connections are on separate VLANs. I would prefer not to have to re-architect the network connections to fix this if possible. I am wondering if we can configure the HA communication to use one set of NICs and the communication with vCenter to use a different one. My main problem is IP space.
Here's the scoop. The HA heartbeats are configured to take place over matched networks. A network counts as a "match" when its IP subnet (IP address combined with subnet mask) is the same as on the other hosts in the cluster. If you are using VLANs, then there is currently no way for HA to know to match up the two. This will hopefully be improved in the future so that routable networks can be tagged to assist in the match-up.
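To make the matching rule concrete, here is a minimal Python sketch of that kind of subnet comparison. It only illustrates the idea described above and is not VMware's actual check; the function names and the 10.0.x.x addresses are made up for the example.

```python
from ipaddress import ip_interface


def networks_of(interfaces):
    """Return the set of IP networks for (address, netmask) pairs.

    Each pair stands for one management interface, for example a
    Service Console port. The names here are hypothetical.
    """
    return {ip_interface(f"{addr}/{mask}").network for addr, mask in interfaces}


def compare_host_to_cluster(host_ifaces, cluster_ifaces):
    """Compare a host's networks with those already seen in the cluster.

    Two interfaces "match" only if address and mask yield the same network.
    Routability between subnets is deliberately ignored, which mirrors the
    behavior described in the post above.
    """
    host_nets = networks_of(host_ifaces)
    cluster_nets = networks_of(cluster_ifaces)
    return {
        "only_on_host": sorted(host_nets - cluster_nets),
        "missing_on_host": sorted(cluster_nets - host_nets),
    }


if __name__ == "__main__":
    # Hypothetical addresses: two hosts on different /24 subnets.
    cluster = [("10.0.1.230", "255.255.255.0")]
    new_host = [("10.0.2.231", "255.255.255.0")]
    result = compare_host_to_cluster(new_host, cluster)
    print("Host has networks not on cluster members:", result["only_on_host"])
    print("Cluster has networks missing on host:", result["missing_on_host"])
```

With a single Service Console per host on different subnets, both lists come back non-empty, which corresponds to the two halves of the "Incompatible HA Networks" message quoted earlier in the thread.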
If you don't have enough IP addresses to have a single subnet and need VLANs, then you'll have to exclude that network from consideration by HA. So you should use das.allowNetworkX to specify only the non-VLAN management networks. If that doesn't allow for redundancy, then HA will have that one network as a single point of failure and will be less reliable.
I wish I had a better option to offer....
Can you verify that your hostname was not inadvertently upcased during the upgrade process?
Besides the issue of not having enough IP space, I think the biggest problem with this change is that VMware has taken something that worked in VC2.5u1 and previous, changed the requirements, made no notification of said change, and provided no way to work around it. If I were with my previous employer, I know they would have their legal people looking into this, since that was the way they operated. On top of all that, there is no way to test the upgrades to the rest of the products without upgrading to VC2.5u2 first (at least to be able to test through VC), and the rollback works for crap. I had to roll back to u1 because of a failure during the upgrade, and that hosed the DB even more. The rollback means you restore from backup tape. All that being said, I would have been at least a little more tolerant of the change if I had been made aware of it PRIOR to performing the upgrade, in the release notes, but I am guessing that would have been too much to ask for.
I thought maybe if I drew a picture of how I have it setup you might be able to see a better way:
There are 3 hosts in the cluster I am working on at the moment, and each is configured nearly identically. I removed a redundant console for each from either the Prod 1 or Prod 2 network:
vSwitch0- 1 nic (1000/full)
Service Console - "Unix/Backup Network 255.255.255.0"
Host 000- 172.xx.AA.230
Host 001- 172.xx.BB.231
Host 002- 172.xx.BB.230
vSwitch1- 3 nics (1000/full)
VM Port Group - "deployment/management network 255.255.255.0 "
VMotion 0 192.xx.xx.246
vSwitch2- 2 nics (1000/full)
VM Port group - "Production network 1 255.255.255.0"
vSwitch3- 2 nics (1000/full)
VM Port Group - "Production Network 2 255.255.255.0"
vSwitch4- NO nics
VM Port Group - "Isolation"
vSwitch5- NO nics
VM Port Group - "Local Development"
Thanks
I appreciate your frustration. You are right, this change was not properly communicated in the release notes.
One correction, though....
>"VMware has taken something that worked in VC2.5u1"
There never used to be a cluster configuration validity check, so improper configurations went unrecognized. While things may have appeared to be working properly, the cluster was vulnerable when there wasn't complete network compatibility. This is due to the way the cluster multi-paths its communication over the available networks. Certain host restart scenarios could lead to split-brain conditions (where the cluster divides into two separate clusters). Also, the heartbeats would never be received over non-routable networks, leading to degraded host failure detection.
And I agree, the upgrade process is painful for large installations. That is something we are committed to improving.
I have to agree. Due to issues when performing upgrades previously (several times we had to restore the DB from tape), we decided to build a fresh server and a fresh database and install vCenter 2.5u2 cleanly. Now I have a mess because there was no "Oh, by the way..." I am not a professional programmer, but even I could have figured out a way to run a query on the database to see whether the current configuration would fail because machines in the same cluster violated this new rule.
How do you configure for 2 consoles on separate subnets? It only gives me the option to set the default console gateway but I would like to put a console on another subnet.
A KB article has been released.
Ok, so how does one use this das.allowNetwork feature?
I keep getting "Object reference not set to an instance of an object." errors.
I have two VLANs I'm trying to work with. 3 of my hosts are on one VLAN, and the rest are on another.
So what is the correct way to express them?
das.allowNetwork195 Service Console
das.allowNetwork228 Service Console
I tried the above and it didn't fix it. Also, if I just create das.allowNetwork1 and das.allowNetwork2, it seems to pre-populate the VLANs after I've reconfigured the cluster.
I just need an example to go off of because I'm not getting this to work.
The issue can ONLY be solved by moving the Service Console of ALL nodes in a cluster to the same IP network. This is not acceptable because it would mean a redesign of the IP network in lots of VI3 environments.
We are still waiting for a "workaround" or solution from VMware.
> The issue can ONLY be solved by moving the Service Console of ALL nodes in a cluster to the same IP network.
To be clearer, if you have a single Service Console on each ESX host, then they all need to be on the same IP subnet (even if you know better and are using routable VLANs).
VMware will have to create a patch to "ignore the network compatibility check", which isn't currently in U2. Otherwise, for those who don't have the flexibility to modify their Service Console IP networks, there is no workaround.
Absolutely agree.
Just had a call with VMware Support. Engineering will not change this "new feature", but Support will ask for an option that lets the customer decide whether to use this enhanced HA compatibility check or not.