VMware Cloud Community
Schoko81
Contributor

vDS and physical network configuration

Hello,

We are migrating our vSphere network configuration from the vStandard Switch to the vDistributed Switch, and during our tests we are facing some configuration issues and at least some "best practice" questions.

At first, a short introduction of our test environment:

Location A and B, each: 2 ESXi 5.5 hosts, 4x 1 Gb NICs connected to 2 trunked switches.

Defined VLANs (trunks) for ManagementNetwork, vMotion, VM traffic

First of all we "just" migrated our VSS to VDS, including all port groups, without any obvious issues or errors. After enabling the health check, we get the following alarm a few times a day:

Teaming configuration in the vSphere Distributed Switch on host hostname does not match the physical switch configuration in ha-datacenter. Detail: No loadbalance_ip teaming policy matches


After some research we found the following VM KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=205766...

This led to some discussions with our network team, and they defined port channel groups for all NICs of each ESXi host. We changed the VDS settings and activated LACP (passive) for the uplinks, set all uplinks as Active (for all port groups), and changed the load balancing policy to "Route based on IP hash" as described in the KB article above and in further KB articles.


These changes resolved the vDS health check errors, and we did not get any VDS teaming/failover errors or warnings again!


So we continued our research and found some references and "best practice" tips that do not recommend our LACP configuration for vMotion, because the "Route based on IP hash" load balancing policy is not load aware. LBT and "Route based on originating virtual port" are recommended instead. But on the one hand, if we configure port channel groups we have to use the IP hash policy, and on the other hand, without the port channel groups and with "Route based on originating virtual port" load balancing we get the health check errors.
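For background on why IP hash is not "load aware": the policy picks an uplink purely from the packet's source and destination IPs, so a given traffic flow is pinned to one NIC regardless of how busy that NIC is. A minimal Python sketch of the idea (a simplified illustration, not ESXi's exact hash algorithm):

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Pick an uplink index the way "Route based on IP hash" does
    conceptually: combine source and destination IP and take the result
    modulo the number of active uplinks. Every packet of the same
    src/dst pair always lands on the same uplink, no matter how loaded
    that NIC currently is - which is why the policy is not load aware."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_uplinks

# The same IP pair always maps to the same uplink index:
print(ip_hash_uplink("10.0.0.1", "10.0.0.2", 4))  # → 3, every time
```

This also shows why the physical side must be a port channel when IP hash is used: the switch has to treat all member links as one logical link, or return traffic arrives on a NIC the host did not expect.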

Are there any ideas how we can realize our configuration without LACP (port channeling) and without health check errors? How should we configure the physical switch ports and/or the VDS?

Thank you in advance!!!

Manuel

6 Replies
vfk
Expert

The best way forward for you will be to configure each port facing the ESXi host as a portfast trunk, so that each port carries all the required VLANs. This should work fine with "Route based on originating virtual port", and you can also configure LBT. I suspect you are getting the error because you have a mismatch between the VLANs configured on your port groups and the VLANs visible on the trunk. As you have already discovered, LAGs create complications, and LBT is far easier to implement and support.
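For illustration, a portfast trunk facing an ESXi host on a Cisco IOS switch looks roughly like this (interface name and VLAN IDs are placeholders for your environment, not your actual config):

```
interface GigabitEthernet1/0/1
 description ESXi host uplink (standalone, no port channel)
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30   ! Mgmt, vMotion, VM traffic VLANs
 spanning-tree portfast trunk
```

Make sure the allowed VLAN list on every ESXi-facing port matches the VLAN IDs set on your port groups; the health check compares exactly those two views.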

--- If you found this or any other answer helpful, please consider the use of the Helpful or Correct buttons to award points. vfk Systems Manager / Technical Architect VCP5-DCV, VCAP5-DCA, vExpert, ITILv3, CCNA, MCP
chriswahl
Virtuoso

Unless your vMotion traffic lasts longer than 30 seconds, it won't really matter. It takes at least that long for Load Based Teaming (LBT) to even kick in. If it does, and you're finding issues with contention, then it is worth considering.
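For context, LBT only rebalances after sustained congestion: it evaluates uplink utilization over 30-second windows and moves ports when an uplink's mean utilization exceeds 75% (the documented defaults). A rough Python sketch of that trigger logic (the sampling details here are simplified assumptions, not ESXi's implementation):

```python
def lbt_should_rebalance(utilization_samples, threshold=0.75, window=6):
    """LBT-style trigger: rebalance only when the mean utilization of an
    uplink over the whole evaluation window exceeds the threshold.
    A short vMotion burst that ends before the window fills never
    triggers a move - which is the point made above."""
    if len(utilization_samples) < window:
        return False  # not enough history yet
    recent = utilization_samples[-window:]
    return sum(recent) / len(recent) > threshold

print(lbt_should_rebalance([0.9] * 6))                        # sustained load → True
print(lbt_should_rebalance([0.9, 0.9, 0.1, 0.1, 0.1, 0.1]))   # short burst → False
```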

I've written a few blog posts on the advantages and disadvantages of Link Aggregation Groups (LAGs); neither approach is a "best practice" in my mind per se - they are simply design alternatives. I tend to lean away from LAGs: they add a fair bit of complexity, eliminate a handful of features, and provide very corner-case benefits. Most network folks will automatically wire up a vSphere host LAG because there's a "switch" inside of it, but vSphere can't form a loop (it doesn't flood frames). There are some benefits for avoiding a peer link interface (square topology), but that's more of a bandwidth concern than any sort of latency headache.

Are there any ideas how we can realize our configuration without LACP (portchanneling) and health check errors? How should we configure the physical switch ports and/or VDS!?

You'll have to get rid of the port channel from the physical interfaces. The VDS can then use virtual port ID or physical NIC load as port group load balancing policies.

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators
Schoko81
Contributor

Ok, thank you for your input. I will talk to my network guys and we will try the following:

- dissolve the Portchannel group(s) for each host

- check the vlan trunks

- change the port settings to portfast trunk (on NX-OS: "spanning-tree port type edge", correct?)

on ESXi:

- change the VDS Settings (loadbalancing policy, active and standby adapters)

For testing purposes we defined all necessary VLANs on all physical ports (pNICs) and assigned each port group one of the VLAN IDs. So, from this side, there should be no problem. I will check on the ESXi side whether all VLANs are visible!
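Since you ask about NX-OS: the portfast trunk equivalent there is "spanning-tree port type edge trunk" (note the trailing "trunk" - plain "port type edge" is for access ports). A hypothetical NX-OS port configuration matching the steps above (interface name and VLAN IDs are placeholders):

```
interface Ethernet1/1
  description ESXi uplink, standalone (port channel removed)
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30
  spanning-tree port type edge trunk
```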

Thank you!

Update:

We made the changes described above on the first two ESXi hosts (and their switch ports) on a new VDS.

Over the next days we will monitor the vDS health check to see whether the teaming/failover warning still occurs. I checked the visible VLANs and they look OK from my point of view. We can see both VLANs on all NICs of both ESXi hosts.

[Attachment: visible_VLANs_Test.JPG]

Best Regards

Schoko81
Contributor

Hi,

I am back with good news (for me, at least): the portfast trunk settings on the switch ports solved our problem.

Now we are working on, discussing, and researching the "best" teaming and failover policies for the port groups. Perhaps you can give me some advice on, and corrections to, our baseline:

Location A and B, each: 2 ESXi 5.5 hosts, 4x 1GB NICs connected to 2 trunked switches.

Defined VLANs (trunks), 1x VDS, NIOC:


1x Management traffic (vmk):

Active/Standby: NIC 0 / NIC 1

LB: Route based on originating virtual port -> does "Route based on physical NIC load" only make sense in an active/active configuration?

Failback: NO


1-2x vMotion (vmk):

Active/Standby: NIC 2 / NIC 3

LB: Route based on originating virtual port

Failback: YES -> is "NO" only really necessary for management traffic, because of the possible host isolation response in an active/standby configuration?


x Virtual Machine:

Active: NIC 0, 1, 2, 3

LB: Route based on physical NIC load

Failback: YES


From my point of view, the "Virtual Machine" group settings should be OK, but I am not sure about the settings written in bold above.
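The active/standby behaviour listed above can be sketched as a simple explicit-failover-order selection (a simplification; the NIC names and link-state map are hypothetical):

```python
def pick_uplink(active, standby, link_up):
    """Explicit failover order: use the first active uplink whose link
    is up; fall back to standby uplinks only when all active ones are
    down. With Failback=NO, traffic would additionally stay on the
    standby NIC even after the active link recovers (not modelled here)."""
    for nic in active + standby:
        if link_up.get(nic, False):
            return nic
    return None  # all uplinks down

# Management vmkernel port: NIC 0 active, NIC 1 standby
print(pick_uplink(["vmnic0"], ["vmnic1"], {"vmnic0": False, "vmnic1": True}))  # → vmnic1
```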


Thanks in advance!

Manuel

Schoko81
Contributor

Hello,

I am sorry, I was a little overhasty and have to revoke my last entry saying that the problem is solved.

After two days without any teaming errors from the vDS health check, the errors/warnings return:

- Teaming configuration in the vSphere Distributed Switch does not match the physical switch configuration. Detail: No loadbalance_ip teaming policy matches

This issue occurred 5-6 times over the last day on two different hosts (4 hosts in the cluster are added to the vDS). The issue changes the status from GREEN to RED; about two minutes later the status changes back to GREEN. The network guys said that they "see" no errors on the connected ports.

Perhaps another useful piece of information: we are using Cisco Fabric Extenders (FEX) in our network.

I am a little bit lost at the moment! Any further ideas?

Thanks in advance again!

Manuel

KReichert
Contributor

I have had this same issue for a while. The configuration is 6-10 hosts connected to a vDS using LBT; the same config is used for 5 different vDSes, and they all see the health check error "No loadbalance_ip teaming policy matches". I opened a case with support and they told me that the Teaming and Failover health check option does not work with LBT, so the error is a false positive and I should disable that feature.

Here is the quote from support

"The alert message was coming because we had DVS health check enabled for teaming and failover. The health check for teaming and failover is designed for IP hash (accompanied with port channel or ether channel at the physical switch level), in which one physical NIC broadcasts network packets and expects to receive them on the other physical NICs. Since we did not have ether channel configured at the physical switch, this functionality would not work, as there will be dropped packets at the physical switch level. Hence the alarm was getting triggered. We suggest you disable the health check for teaming and failover; disabling this feature will not cause the alarm to trigger."
