rhaaaven
Contributor

Load balancing configuration mismatch issue

Hi All,

After reading some threads on Communities I've started to feel inspired to simply ask for advice when I'm not able to find a solution on my own :)

Apologies if I'm duplicating an issue; I didn't find a similar case on Communities.

I'm administering a vSphere environment consisting of two sites - three host clusters in each site.

A few weeks ago two hosts started to show the 'vSphere Distributed Switch teaming matched status' alarm.

Some VMs on one of the hosts experienced degraded communication: lost packets and lost pings.

After some digging and reviewing the relevant KB: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=205766...

I found that two of the physical switches those hosts are connected to are configured differently from what VMware best practices recommend.

I have IP hash load balancing set in my vSphere environment and EtherChannel is in use, but on these two switches I see a troubling config line:

- which should be "IPv4: Source XOR Destination IP address"

- but is "IPv4: Source XOR Destination IP address and .........."

The bolded entries seem to be the mismatch that causes the VMs' network traffic to be hashed differently on each side.

The problem is that I've had no issues at all since this environment was first built, and the third party that manages our network (switch configuration) assures me that no changes were made and that these mismatches have been present since the hosts were first connected.

I assume that the increased number of VMs, and therefore network traffic, caused this to surface, but is that true?

There is no mechanism on a Standard vSwitch to check for mismatches...

Or maybe I'm completely lost?

Looking forward to your advice and support.

Many thanks in advance !

Cheers !

Piotrek

2 Replies
grasshopper
Virtuoso

rhaaaven wrote:

After reading some threads on Communities I've started to feel inspired to simply ask for advice when I'm not able to find a solution on my own :)

Nice!  That's what it's all about.  Anyway, hopefully we can help.  This is a cool, but possibly tricky topic.  Personally, I only use the 3rd party VDS (1000v), so I don't have production experience with this new Health Check feature of the VMware VDS.  However, the fundamentals of reviewing port channel configs, etc. for VMHosts are the same.

The first step would be to generate a CDP report using PowerCLI.  This creates a .csv file which you can clean up in Excel then share with your Network folks.  This will ensure that the ports under review are clear and distinct.  Have them check the port channel configs for typos, etc.  Ensure that they send you the text output showing the configs.
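If you'd rather not do the clean-up by hand in Excel, here is a minimal Python sketch of the same idea: group the report rows by physical switch so each network admin sees only their own ports. The column names (`VMHost`, `Nic`, `Switch`, `PortId`) and the sample rows are hypothetical; adjust them to whatever your own CDP export actually contains.

```python
import csv
import io
from collections import defaultdict

def ports_by_switch(csv_text):
    """Group host NICs by the physical switch that CDP reported for them.

    Column names are assumptions, not the output of any specific script.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        switch = row["Switch"].strip()
        if not switch:
            # No CDP neighbour seen on this NIC; leave it out of the report.
            continue
        entry = (row["VMHost"].strip(), row["Nic"].strip(), row["PortId"].strip())
        grouped[switch].append(entry)
    # Drop duplicate rows and sort so the output is stable and easy to read.
    return {sw: sorted(set(entries)) for sw, entries in grouped.items()}

# Made-up sample data for illustration only.
sample = """VMHost,Nic,Switch,PortId
esx01,vmnic0,sw-a,Gi1/0/1
esx01,vmnic1,sw-b,Gi1/0/1
esx02,vmnic0,sw-a,Gi1/0/2
esx02,vmnic1,,
"""

for switch, entries in sorted(ports_by_switch(sample).items()):
    print(switch, entries)
```

A NIC that shows up with an empty switch name is worth a separate look, since it may mean CDP is disabled on that port or the link is down.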

The problem is that I've had no issues at all since this environment was first built, and the third party that manages our network (switch configuration) assures me that no changes were made and that these mismatches have been present since the hosts were first connected.

Is it possible that the health check feature of the VDS was only recently turned on?  It's not on by default AFAIK.  The issue may have been going on unobserved (speculation).  It's also possible that no changes were made to your VMHost's switch ports, but a change in switch to switch communication could have happened.

Anyway, you may consider obtaining the MAC addresses of the guest OSes that were affected during the loss of ping.  Have the network guys review the switches to find those MACs and see if they find anything interesting.

I assume that the increased number of VMs, and therefore network traffic, caused this to surface, but is that true?

That is absolutely possible and I have seen that many times.  If your "VM Network" consists of 4 physical NIC ports, you may not observe the issue until the 4th VM is powered on.  That's a simplified case, but when the port channel is misconfigured I typically see a percentage of VMs (e.g. 25%) fail to ping when placing them on a bad host.
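To make the "percentage of VMs" point concrete, here is a rough Python sketch of how an IP-hash policy spreads flows across uplinks. It is a simplified model, not VMware's or the switch vendor's actual implementation: it XORs the source and destination IPv4 addresses as 32-bit integers and takes the result modulo the number of uplinks. The addresses, the uplink count and the idea of a single "bad" uplink are all made up for illustration.

```python
import ipaddress

def ip_hash_uplink(src, dst, n_uplinks):
    """Pick an uplink index from a source/destination IPv4 pair.

    Simplified model (an assumption): XOR the two addresses as
    32-bit integers, then take the result modulo the uplink count.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks

def affected_fraction(vm_ips, peer_ip, n_uplinks, bad_uplink):
    """Return the share of VMs whose flow to peer_ip lands on bad_uplink."""
    hits = [ip for ip in vm_ips
            if ip_hash_uplink(ip, peer_ip, n_uplinks) == bad_uplink]
    return len(hits) / len(vm_ips), hits

# Eight hypothetical VMs pinging one hypothetical gateway over 4 uplinks.
vm_ips = [f"10.0.0.{i}" for i in range(10, 18)]
fraction, hits = affected_fraction(vm_ips, "10.0.0.1", n_uplinks=4, bad_uplink=0)
print(fraction, hits)
```

In this toy setup two of the eight VMs hash onto uplink 0, so a fault on that one link would hit a quarter of them while the rest keep pinging. The same reasoning suggests why a problem can stay hidden while only a few VMs (and therefore few distinct IP pairs) are in play.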

It will serve you well to document the vCenter and ESXi versions in use along with the firmware, driver and NIC models, number of NICs, etc.  If your network team decides to escalate to the switch vendor (or you engage VMware support) you will need to have this info handy.  We can help with any questions you have while gathering that info if desired.

TommyFreddy
Enthusiast
