Hi all,
I have 3 hosts running ESXi 5.5. I manage these hosts from a vCenter server.
I have created a cluster out of these hosts and enabled HA on it. There are no error messages or warnings. The vSphere HA summary page of the cluster shows:
In addition to this, I have management network redundancy on each host via a VDSwitch. There is a portgroup called mgmt for the management network; it is connected to 3 NIC uplinks (Fibre A, Fibre B and Copper) for redundancy. The other portgroup I have is vMotion. Both the mgmt and vMotion portgroups have the correct VLANs:
Now I have found two problems:
1. When I disable the port connected to vmnic5Uplink on the Cisco switch, the host goes down, meaning there is no management network redundancy even though I am using all 3 available NICs for management traffic. Nowhere in vSphere is there any mention of a problem with management redundancy, or any warning about it. Disabling the port connected to any one of the 3 uplinks takes the host entirely down (Not responding).
2. When the host goes down, HA does not work. HA tries to migrate the VMs residing on the affected host, but the migration fails. Before the host reaches the "not responding" stage, there is this error:
About 10 seconds after this error message, it says:
I cannot figure out why this is happening. Does anyone have any ideas from looking at the above?
Thanks
I have a couple of questions:
1) What happens when you disable either one of the 2 fibre ports (A or B)?
2) Could you also share the host isolation response setting on the cluster?
1. What do you mean by "host goes down"? Is it not on the network?
2. Please share the NIC teaming policy on the switch.
3. Check the network port settings on the physical switch.
4. Are the VMs on shared storage?
5. What is the HA protection status of the VMs?
Hi!
To properly test HA, you need to power off a host with running VMs using iLO/iDRAC or by pressing the power button.
You can also generate a PSOD manually: vsish -e set /reliability/crashMe/Panic 1
Also, HA works properly only when your VMs reside on a shared datastore; VMs on local datastores cannot be restarted on other hosts.
When you take network ports down, you will most likely get a network isolation status (as you posted in the screenshot). There is a separate action for this; check the Host Isolation Response in the HA cluster settings.
When the management network is down but connectivity to shared storage is still active (for example with Fibre Channel storage), datastore heartbeating works, and your host can tell the master host which VMs are active and running.
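You can check these conditions from the ESXi shell. A quick sketch (vmk0 and the gateway address 10.0.0.1 are assumptions here; substitute your own management VMkernel port and default HA isolation address):

```shell
# List VMkernel interfaces and confirm which one carries management traffic
esxcli network ip interface list

# Ping the default gateway (the default HA isolation address) specifically
# through the management VMkernel port (vmk0 assumed here)
vmkping -I vmk0 10.0.0.1

# List mounted datastores; the shared volumes must be visible on every
# host in the cluster for datastore heartbeating to work
esxcli storage filesystem list
```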
Hi ShekharRana,
Thanks for your reply.
1. What do you mean by "host goes down"? Is it not on the network?
I can no longer ping the host IP, and the host shows a "not responding" status in vCenter.
2. Please share the NIC teaming policy on the switch.
3. Check the network port settings on the physical switch.
The physical switch settings are correct; I have confirmed this together with my network administrator.
4. Are the VMs on shared storage?
All 3 hosts are using 2 volumes from the SAN, mapped via iSCSI adapters. I'm not sure whether this counts as shared storage or not.
5. What is the HA protection status on the VMs?
Cluster HA settings
VM summary (same for all VMs in this cluster):
Hi Finikiez,
None of the VMs are on local datastores; they all reside on SAN volumes attached to the hosts via iSCSI adapters.
Can you take the copper link out of the picture and test HA?
With 2 fibre NICs you still have redundancy.
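One way to watch the failover from the ESXi shell while you disable the switch ports (a sketch; the uplink names come from this thread and may differ on your hosts):

```shell
# Show link state of all physical NICs; the disabled uplink should flip to "Down"
esxcli network nic list

# Follow the VMkernel log for teaming/failover events while you shut the ports
tail -f /var/log/vmkernel.log | grep -i -E 'vmnic|failover|uplink'
```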
Hi Hussainbte,
Thanks for your suggestion. I did exactly that yesterday, and redundancy works.
There is a very strange thing happening here.
Here is how my teaming and failover is set up for Mgmt portgroup in the DVSwitch:
The teaming configuration needs modification.
vmnic4 is the only active NIC, which means that as long as the upstream port for vmnic4 is up, the other 2 NICs will not be used for management traffic.
"when I shut ports from the switch going to both Fibre A and B, management network does not fail over to copper." I am not sure why you see this behaviour.
Having said that, I suggest you use the 2 fibre adapters for your management network.
The 2+1 configuration seems a little odd, if only because I have not seen such a setup before; I am also not sure it can be considered a design best practice.
I would rather suggest the config below:
Make both Fibre port A and Fibre port B active.
Team them up on the physical switch end and use "Route based on physical NIC load".
You will get both redundancy and load balancing.
What upstream switch do you have?
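To see how the uplinks are currently assigned, the distributed switch config can be dumped from the ESXi shell (a read-only check; the exact output format varies by ESXi build):

```shell
# Show the distributed switch configuration as seen by this host,
# including which vmnics are attached as uplinks
esxcli network vswitch dvs vmware list

# Link speed, duplex, and state for each physical NIC
esxcli network nic list
```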