VMware Cloud Community
Ytsejamer1
Enthusiast

vDS in Active/Active or Active/Passive ends up dropping connections to ESXi hosts

Hello,

Long story short - when I have dvuplinks in Active/Active or Active/Passive, I get connection drops from the hosts connected to the switch.  When only one dvuplink is used, all is right with the world.

A couple of months ago, I built out a new three-node cluster (ESXi 5.1 U1 b1312873).  I built a 5.1 vDS and configured it with the default load balancing (I also tried Route based on physical NIC load and explicit failover order).  For Network Failover Detection, I chose Beacon Probing at the direction of the network engineers.

Each host has two network ports per traffic type (six NIC ports in total):

Management/Server LAN (0 and 1)

NFS (2 and 3)

vMotion (4 and 5)

Each pair of host NICs is split across two separate 2K switches for physical switch fault tolerance.  NICs 0, 2, and 4 go to one 2K; NICs 1, 3, and 5 go to the other 2K.
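
As a rough sketch of the cabling described above, the following Python snippet checks that every traffic type's NIC pair lands on both 2Ks. The labels `2K-A` and `2K-B` and the function names are made up for illustration; the post only gives NIC numbers 0 through 5 and says that even-numbered NICs go to one 2K and odd-numbered NICs to the other.

```python
# Hypothetical model of the cabling described in the post.
# "2K-A" / "2K-B" are placeholder labels, not real device names.
UPLINKS = {
    "management": (0, 1),
    "nfs": (2, 3),
    "vmotion": (4, 5),
}


def fex_for(nic):
    """Even-numbered NICs go to one 2K, odd-numbered NICs to the other."""
    return "2K-A" if nic % 2 == 0 else "2K-B"


def pairs_missing_redundancy(uplinks):
    """Return the traffic types whose NICs all land on the same 2K."""
    return [
        name
        for name, nics in uplinks.items()
        if len({fex_for(nic) for nic in nics}) < 2
    ]


if __name__ == "__main__":
    # An empty list means every pair spans both 2Ks.
    print(pairs_missing_redundancy(UPLINKS))
```

With the layout from the post, every pair spans both 2Ks, so losing a single 2K should leave one uplink per traffic type.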

Our physical networking setup is two Cisco Nexus 5Ks with two 2K fabric extenders, one off of each 5K. In our current setup, the ESXi hosts are dual-homed to the Nexus 2Ks. When both dvUplinks are brought up in active/active or active/passive on the applicable vDS port group, we see periodic ping drops. To avoid the problem, we have limited each port group to a single dvUplink.

Here's the software version for the Cisco switches:

Software

  BIOS:      version 3.5.0

  loader:    version N/A

  kickstart: version 5.1(3)N1(1)

  system:    version 5.1(3)N1(1)

  power-seq: Module 1: version v1.0

             Module 3: version v2.0

  uC:        version v1.2.0.1

  SFP uC:    Module 1: v1.0.0.0

  BIOS compile time:       02/03/2011

  kickstart image file is: bootflash:///n5000-uk9-kickstart.5.1.3.N1.1.bin

  kickstart compile time:  12/6/2011 22:00:00 [12/07/2011 01:30:01]

  system image file is:    bootflash:///n5000-uk9.5.1.3.N1.1.bin

  system compile time:     12/6/2011 22:00:00 [12/07/2011 03:09:44]

If anyone has any thoughts, ideas, feelings....I'm open to suggestion.  Thanks so much in advance!

5 Replies
VirtuallyMikeB

Hello,

Can you please share the hardware version of your 5ks and 2ks?

Thanks,

Mike

http://VirtuallyMikeBrown.com

https://twitter.com/VirtuallyMikeB

http://LinkedIn.com/in/michaelbbrown

----------------------------------------- Please consider marking this answer "correct" or "helpful" if you found it useful (you'll get points too). Mike Brown VMware, Cisco Data Center, and NetApp dude Sr. Systems Engineer michael.b.brown3@gmail.com Twitter: @VirtuallyMikeB Blog: http://VirtuallyMikeBrown.com LinkedIn: http://LinkedIn.com/in/michaelbbrown
VirtuallyMikeB

So I should add a bit to this...

What does your network diagram look like? In particular, what connections are you running upstream from the Nexus switches to your core? I have a feeling it's not a mesh if you're using beacon probing.  What core switches are you running?  You might not need beacon probing; one rarely does with multi-chassis EtherChannel switches these days.

Are the Nexus switches 5020s? The NX-OS version you're running is very old.

Can you show the output of "show vpc" from the 5Ks?  Have you enabled LACP on your ESXi hosts? Please also show the switchport configs for your ESXi hosts.

Cheers,

Mike

Ytsejamer1
Enthusiast

Hi Mike,

Thanks for the response. Let me chat with the network engineers and see what more info I can get.

On a side note, I was able to avoid the problem by switching Network Failover Detection from Beacon Probing to Link Status Only.  I could have sworn I had done this many times a couple months ago, but perhaps not.  I'd say there's something still amiss, and would love to figure out more.
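
One possible explanation, offered as an assumption rather than anything confirmed in this thread: VMware's guidance is that beacon probing works best with three or more uplinks in a team. With only two, a failure on either side leaves neither uplink hearing the other's beacons, so the host cannot tell which one is bad. The Python sketch below is a simplified toy model of that ambiguity. The function names and the `dvUplink1`-style labels are hypothetical, and this is not how ESXi implements beaconing.

```python
def beacons_heard(healthy):
    """Map each uplink to the set of peers whose beacons it receives.

    Toy assumption: a beacon from `peer` reaches `nic` only if both
    uplinks have a working path.
    """
    return {
        nic: {
            peer
            for peer, peer_up in healthy.items()
            if peer != nic and peer_up and healthy[nic]
        }
        for nic in healthy
    }


def suspect_uplinks(healthy):
    """Uplinks that hear no beacons at all, i.e. candidates for 'failed'."""
    heard = beacons_heard(healthy)
    return sorted(nic for nic, peers in heard.items() if not peers)


if __name__ == "__main__":
    # Two uplinks, one broken path: both look equally suspicious.
    print(suspect_uplinks({"dvUplink1": False, "dvUplink2": True}))
    # Three uplinks, one broken path: only the broken one stands out.
    print(suspect_uplinks({"dvUplink1": False, "dvUplink2": True, "dvUplink3": True}))
```

In this model a two-uplink team cannot single out the failed side, while a three-uplink team can. That would be consistent with Link Status Only behaving better on two-NIC port groups, though it does not prove that this was the cause here.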

VirtuallyMikeB

Sorry, didn't mean to imply beacon probing was your problem.  I just added that as an aside.  If you have MLAG switches (7K, 6500, stacked 3750, etc.) north of the Nexus switches, you don't need beacon probing.

Ytsejamer1
Enthusiast

Yeah, we've got the hosts plugged into two 2Ks, which are extended off of the 5Ks, that then go into a couple 4506s.

I'm content to leave things as they are now that they're working as expected.  I was just trying to keep my vNetworking configurations consistent.

Thank you very much for taking the time to offer your thoughts and explanations.  Very much appreciated!
