Hello all,
I have a couple of ESXi 5.1 Update 1 hosts that are only connected to a pair of 10G switches with two fibre cables. All management, VM, iSCSI and vMotion traffic goes through those two 10G NICs. The hosts are using a distributed switch.
The dvPort Group used for the management traffic is set to use vmnic4 as the Active NIC and vmnic5 as Standby. Under Teaming and Failover, the Failback policy is set to "No".
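For reference, the failover settings on a dvPort group can be checked from PowerCLI with something like the following (the port group name is a placeholder for your management port group):

```powershell
# "dvPG-Management" is a placeholder name; adjust to your environment
Get-VDPortgroup -Name "dvPG-Management" | Get-VDUplinkTeamingPolicy
```

This shows the active/standby uplinks along with the Failback and Notify Switches settings for the port group.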
I wanted to test my redundancy so I performed the following actions:
It would seem I have redundancy, but only until the first failure of the Active NIC. Thereafter, despite the link being "repaired" by plugging the cable back in, traffic will not fail back to vmnic4. When Failback is set to "No", will it never fail back, or is there some time delay after which it will?
When I change the Failback to "Yes", the following happens:
Any ideas?
Thanks in advance.
Rosco
Hi Rosco,
How is your physical networking configured, and what load balancing policy are you using? Are there any hints in the vmkernel.log?
One thing that comes to mind is that PortFast might not be configured in this instance - could you confirm whether it has been enabled?
Cheers,
Jon
When Failback is set to "No", traffic will not fail back to vmnic4 when it returns to active duty after a failure unless and until vmnic5 itself fails. As Jon suggested, please check that PortFast is enabled on the physical switch ports servicing vmnic4 and vmnic5.
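On a Cisco switch, for example, enabling PortFast on the host-facing ports would look something like this (the interface is a placeholder; use `portfast trunk` since ESXi uplinks are usually trunk ports):

```
! Placeholder interface; substitute the ports facing vmnic4/vmnic5
interface TenGigabitEthernet1/0/1
 spanning-tree portfast trunk
```

Without PortFast, the port can sit in spanning-tree listening/learning for up to 30 seconds after link-up, which looks just like a failed failback.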
Please mark this answer as "correct" or "helpful" if you found it useful.
Alex Hunt | IT Operations Analyst | VCP-DCV
Website : https://alexhunt86.wordpress.com
Blog : https://communities.vmware.com/blogs/vgeeks/
Hi Jon / Alex,
Thanks for the ideas. I did some testing today with one of our network engineers, and it turned out to be a problem with the ARP cache timeout on the Huawei switches. After reducing it, traffic failed back fine after about 10 seconds, which I can live with. It seems slightly faster when connected to Cisco kit, but all in all I'm happy with the failover and failback times now.
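For anyone hitting the same thing: on Huawei VRP the ARP aging timer is set per interface along these lines (the interface and value below are placeholders; the default aging time is typically 20 minutes):

```
# Placeholder VLAN interface; shorten the ARP aging timer
interface Vlanif10
 arp expire-time 60
```

Check your own platform's documentation for the exact command and supported range before changing it.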
Cheers,
Rosco
Still seems a bit odd.
Failover should always work almost immediately (a dropped ping at most), since the host sends gratuitous ARP broadcast frames on the new link for all attached vNICs. Do you have the "Notify Switches" option enabled on the port group?
It works essentially the same way when you vMotion a VM from one host to another: the destination host sends gratuitous ARPs on behalf of the VM to update the physical switches' CAM/MAC tables, so if the switches are at fault you should see the same issue when migrating VMs between hosts.
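As an illustration of what that gratuitous ARP looks like on the wire, here is a minimal sketch in Python using only the standard library (the MAC and IP values are made-up placeholders, not anything from this environment):

```python
import struct

def gratuitous_arp_frame(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP request frame like the one a host
    broadcasts after a failover to refresh switch CAM/ARP tables.
    mac: 6-byte MAC address, ip: 4-byte IPv4 address."""
    broadcast = b"\xff" * 6
    # Ethernet header: dst (broadcast), src, EtherType 0x0806 (ARP)
    eth_hdr = broadcast + mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6, 4,    # hardware / protocol address lengths
        1,       # opcode: request
        mac, ip,        # sender MAC / sender IP
        broadcast, ip,  # target MAC (ignored) / target IP == sender IP
    )
    return eth_hdr + arp_payload

# Placeholder addresses for illustration only
frame = gratuitous_arp_frame(b"\x00\x50\x56\x00\x00\x01",
                             bytes([192, 168, 1, 10]))
print(len(frame))  # 14-byte Ethernet header + 28-byte ARP payload = 42
```

The key detail is that the sender IP equals the target IP, which is what makes the ARP "gratuitous": every switch on the broadcast domain learns the MAC's new port without being asked.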
Thanks for that, MKguy. Notify Switches is set to "Yes". I just tested a vMotion and you're right, it drops about 6 pings, so something's not right with the switches. I'll get our network engineer to have another look.