Awurz
Contributor

standard vSwitch: Linux guest OS bonding via separate vSwitches

Hello,

I am experiencing an issue while using two standard vSwitches for guest OS bonding.

vSphere/ESX is 5.0

I am using an HP blade chassis with two IO modules, each with one uplink to an external switch.

ESX on a blade inside the chassis is configured with one vSwitch per uplink and multiple port-groups.

Both sides (uplink trunks A and B) are identical in terms of allowed VLANs ... and therefore the vSwitches are identical as well.

vSwitch1:

NA uplink:   vmnic1

Pgroup1:     untagged (native)

Pgroup2:     VLAN 1 (tagged)   

Pgroup3:     VLAN 2 (tagged)

vSwitch2:

NA uplink:   vmnic2

Pgroup1:     untagged (native)

Pgroup2:     VLAN 1 (tagged)   

Pgroup3:     VLAN 2 (tagged)

Each VM on the ESX host has one interface assigned per port-group on each vSwitch.

The Linux guest OS bonds the interfaces from the same Pgroup on vSwitch1 and vSwitch2 into one bond interface.

The issue is:

VMs can ping each other (and also the network gateway) inside their port-groups if all active slaves of the bond interfaces are on the same vSwitch (in that case the VMs are switched inside the vSwitch).

If one of the VMs has its active bond slave on the other vSwitch, it can still ping the external infrastructure network gateways, but no longer the other VMs.

Normally traffic between the same port-groups/VLANs should be switched via the external switch where both vSwitch uplinks are connected (I can also see that when the active slave of a bond changes, the uplink port assignment for the VM's bond MAC address changes in the MAC table of the external switch), but a VM cannot ping other VMs in the same port-group (VLAN) that reside on the other (mirrored) vSwitch.

I checked the external network, and everything seems fine. I also checked the ESX networking, which seems fine on both vSwitches. The really strange thing is that I can always reach the network gateway addresses (which reside on the external switches/routers), no matter whether the active slave of the bond is on vSwitch1 or vSwitch2, but not the other VMs when the active bond slaves of the guest OSes reside on different (but identical) vSwitches.

Has anyone come across this, or has any clue what could be causing it?!

18 Replies
rickardnobel
Champion

This bonding that you do inside the Linux guest, do you know what kind of method it uses to distribute the outgoing frames?

And also, what is the reason for this setup? If I understand it correctly, you have two vSwitches with identical portgroups, but only one uplink on each vSwitch, and each VM has two vNICs doing internal teaming/bonding?

Depending on what you actually try to do, there might be simpler ways to set up the vSphere networking.

My VMware blog: www.rickardnobel.se
MKguy
Virtuoso

Why are you using interface bonding from the guest side in the first place? The virtual NICs of your VMs will always, forever and ever see "link up" unless you manually edit the "connected" checkbox/option of that vNIC in the VM configuration. Even if your blade NIC, blade interconnect module, or physical switch dies, the VMs will never be able to notice this and will happily continue to send their traffic into that black hole.

Create one vSwitch with 2 uplinks and use easy, bread and butter ESXi teaming.

Anyways, you can combine all of this into one vSwitch too, and that should work. Configure two port groups with the same VLAN: port group 1 uses physical vmnic1 as active uplink and vmnic2 as unused, while port group 2 uses vmnic2 as active uplink and vmnic1 as unused. This yields the same effect as two separate vSwitches.
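For reference, a rough esxcli sketch of that layout (the vSwitch, port group, and VLAN names here are assumptions, not from this thread; as far as I know, an uplink listed as neither active nor standby for a port group is treated as unused):

```shell
# Sketch only -- names (vSwitch1, PG_VLAN1_A/B, VLAN 1) are assumptions.
# One vSwitch with both uplinks:
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic1
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic2

# Two port groups on the same VLAN, each pinned to a different uplink:
esxcli network vswitch standard portgroup add -v vSwitch1 -p PG_VLAN1_A
esxcli network vswitch standard portgroup set -p PG_VLAN1_A --vlan-id 1
esxcli network vswitch standard portgroup policy failover set -p PG_VLAN1_A -a vmnic1

esxcli network vswitch standard portgroup add -v vSwitch1 -p PG_VLAN1_B
esxcli network vswitch standard portgroup set -p PG_VLAN1_B --vlan-id 1
esxcli network vswitch standard portgroup policy failover set -p PG_VLAN1_B -a vmnic2
```

Each VM then gets one vNIC in PG_VLAN1_A and one in PG_VLAN1_B, just as with two separate vSwitches.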

Note that this would only work for active-passive failover bonding configurations and not with any kind of load balancing like EtherChannel, 802.3ad or LACP.

But like I said, your approach providing NIC teaming on the guest side is a strange one to begin with.

-- http://alpacapowered.wordpress.com
Awurz
Contributor

The bonding type is MII (link-based) active-backup (no load balancing), for redundancy.
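For context, a guest-side sketch of such a bond with iproute2 (the interface names and address are assumptions, not taken from this setup):

```shell
# Sketch only -- eth0/eth1 and 192.0.2.10 are assumptions.
modprobe bonding
ip link add bond0 type bond mode active-backup miimon 100
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up
ip addr add 192.0.2.10/24 dev bond0
# Shows the currently active slave and the bond MAC:
cat /proc/net/bonding/bond0
```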

I explicitly need the bonding in the guest OS because I am trying to simulate rack-mounted pizza-box servers with network redundancy.

I know that I can aggregate the ESX uplinks, but that is out of scope because my attention is on the guest OS (Linux) configuration, including network redundancy through active-standby bonding.

This pizza-box configuration is a live configuration, and I need to virtualize it 1-to-1 for testing reasons.

And just to mention: if I down the active guest bond interface, the former standby interface becomes active; the bond can then reach its network gateway, but no longer the other VMs on the ESX host.

So in case a physical uplink fails, all VMs will switch their guest OS bond slave at once, so the issue will not occur. But it is strange that when the active bond interfaces of the VMs are on different vSwitches, only communication to the network gateway works, not between the guests.
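The failover test described above, as a rough sketch (interface names and addresses are assumptions):

```shell
# Sketch only -- names and addresses are assumptions.
ip link set eth0 down                                   # down the active slave
grep 'Currently Active Slave' /proc/net/bonding/bond0   # should now show eth1
ping -c 3 192.0.2.1     # gateway: still reachable
ping -c 3 192.0.2.11    # VM whose active slave is on the other vSwitch: fails
```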

rickardnobel
Champion

You should be aware that the vNIC (virtual network card in guest) will never go offline, no matter what will happen to the physical interfaces of the ESXi host.

My VMware blog: www.rickardnobel.se
Awurz
Contributor

=> So in case a physical uplink fails, all VMs will switch their guest OS bond slave at once, so the issue will not occur. But it is strange that when the active bond interfaces of the VMs are on different vSwitches, only communication to the network gateway works, not between the guests.   [That is the case if I also enable ARP probing with the gateway as target in the active-standby bond; link-based monitoring alone would stay online all the time, for sure.]
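A sketch of what that ARP probing could look like (the gateway address is an assumption): with arp_interval/arp_ip_target the bond fails over when the target stops answering ARP probes, even though the vNIC link state itself never goes down.

```shell
# Sketch only -- the gateway address is an assumption.
ip link add bond0 type bond mode active-backup \
    arp_interval 1000 arp_ip_target 192.0.2.1
```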


rickardnobel
Champion

Awurz wrote:

=> So in case a physical uplink fails, all VMs will switch their guest OS bond slave at once, so the issue will not occur

If you do some internal guest probing of some external address it might work, but you should be aware that this is not the way it is typically done, or even meant by VMware to be used.

Awurz wrote:

but it is strange that when the active bond interfaces of the VMs are on different vSwitches, only communication to the network gateway works, not between the guests.

Was that between two VMs on the same ESXi host, but connected to two different vSwitches? Typically this should be no problem, but the traffic must always leave the ESXi host and go through physical switch before entering the other vSwitch.

If this does not work, there might be problems with the physical switch, it could be internal firewalls on the guests, OR it could be that the bonding setup does something with the guest MAC addresses which confuses either the vSwitches or the physical switches.

My VMware blog: www.rickardnobel.se
Awurz
Contributor

As I said, I am trying to simulate a rack-mounted server. I am not trying to set up a VM in any specific "best practice" way; I also utilize the vSphere methods for redundancy and aggregation on other ESX hosts/VMs, but here I need guest OS bonding configured this way to have it identical to the rack-mounted server, for testing purposes.

Was that between two VMs on the same ESXi host, but connected to two different vSwitches? Typically this should be no problem, but the traffic must always leave the ESXi host and go through physical switch before entering the other vSwitch.

Bingo. The traffic should be switched via the external infrastructure (unlike VMs with their active bond interfaces on the same vSwitch, which are switched internally). I can see the bond's MAC address switching uplinks in the external switch's MAC table, but for some reason VMs with active bond interfaces on different vSwitches can no longer ping each other, while their gateway IP works fine for all VMs in the same VLAN.

MKguy
Virtuoso

Use a configuration with a single vSwitch and 2 port groups with alternating active/unused uplink assignments as I described above.

-- http://alpacapowered.wordpress.com
Awurz
Contributor

2 port groups with alternating active/unused uplink assignments?! VLAN tagged in the trunk uplinks?!

That does not work out for me, and as I said, I explicitly want 2 sides (left uplink with left vSwitch and right uplink with right vSwitch).

I know that a single vSwitch will work (with all vSwitch combinations of failover/active-standby, load balancing, whatever).

MKguy
Virtuoso

That does not work out for me, and as I said, I explicitly want 2 sides (left uplink with left vSwitch and right uplink with right vSwitch)

But you get exactly the end result you desire with this configuration. The only difference is that you don't have two logical vSwitches, which neither the network nor the VM is aware of anyway, and you control it at the port group level instead of the vSwitch level. The VMs' vNICs will still be statically mapped to a specific physical uplink that way.

So what's wrong with that approach?

-- http://alpacapowered.wordpress.com
Awurz
Contributor

You mean assign the uplink to the port-group(s) and have identical port-groups (with different names, for sure)? So does it really work to have different port-groups with the same VLAN ID on the same vSwitch?

rickardnobel
Champion

Awurz wrote:

So does it really work to have on same vSwitch different port-groups with same VLAN Id?

Yes, you can have several portgroups on the same vSwitch with the same VLAN id, that is basic VMware networking.

My VMware blog: www.rickardnobel.se
MKguy
Virtuoso

Yes, we're actually using a similar config to that (except for explicitly changing uplinks and using multiple vNICs) for one network, because we have VMs in the same subnet that require different port-group-specific settings. VMs can still communicate with each other over the vSwitch without going through an uplink as long as the port groups have the same VLAN.

It looks like this in our case:

Switch Name   Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1      256        25          256               1500  vmnic1,vmnic4

  PortGroup Name  VLAN ID  Used Ports  Uplinks
  PG_10.1.1.0_1   22       4           vmnic1,vmnic4
  PG_10.1.1.0_2   22       12          vmnic1,vmnic4

[...]

I also took a bit of a closer look at the MII failover bonding you mentioned here:

https://www.kernel.org/doc/Documentation/networking/bonding.txt

active or 1 The "active" fail_over_mac policy indicates that the MAC address of the bond should always be the MAC address of the currently active slave. The MAC address of the slaves is not changed; instead, the MAC address of the bond changes during a failover.

The fail_over_mac option seems interesting; how is it configured in your case? Does your bonding assign a virtual MAC to both NICs, or does it use the physical MAC of the respective active slave? Check with tcpdump how the bonded system responds to ARPs and what source MAC it uses for sending frames.
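One way to check that, sketched (the interface name is an assumption):

```shell
# Print the link-level header (-e) of ARP frames on the bond, without
# name resolution (-n), to see which source MAC the guest actually uses:
tcpdump -e -n -i bond0 arp
```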

I know, for example, a firewall clustering solution that works (very well) with physical MACs only, using gratuitous ARPs to update neighbouring ARP caches with the new physical MAC of the failover NIC.

Can you also post the output of ifconfig and cat /proc/net/bonding/bond0?

-- http://alpacapowered.wordpress.com
Awurz
Contributor

OK, so I will try whether assigning uplinks to port-groups inside one vSwitch works.

But nevertheless, can you agree that using 2 vSwitches should also work? I just want to know if anything speaks against that.

Because I know I could handle the same scenario in different ways, but I came across this issue when using 2 vSwitches.

@MKguy: the mode is active-backup (the MAC address does not change; the bond always uses the MAC of the first slave). I also thought of that, because the MAC is provided via the vNIC on vSwitch1 but is then active on vSwitch2 when vNIC1 is downed (on the other hand, inbound and outbound traffic to routed networks and the network gateway works fine via this vNIC), so I don't think this SHOULD cause an issue (if it's not a BUG).

MKguy
Virtuoso

I also thought of that, because the MAC is provided via the vNIC on vSwitch1 but is then active on vSwitch2 when vNIC1 is downed (on the other hand, inbound and outbound traffic to routed networks and the network gateway works fine via this vNIC), so I don't think this SHOULD cause an issue (if it's not a BUG)

I think exactly that is the problem in your configuration. Take a look at this article and think about it:

http://blog.ioshints.info/2010/11/vmware-virtual-switch-no-need-for-stp.html

No MAC address learning

The hypervisor knows the MAC addresses of all virtual machines running in the ESX server; there’s no need to perform MAC address learning.

So as both NICs run with the same MAC, both vSwitches think the destination MAC of the frame is actually locally connected to themselves and do not attempt to forward the traffic over the physical network, which actually makes some kind of sense.

There is also this, but I'm not sure if it applies across multiple vSwitches as well:

Packet received through one of the uplinks and having a source MAC address belonging to one of the virtual machines is silently dropped.

During failover, can the other VM see any broadcasts originating from the source in your case?

-- http://alpacapowered.wordpress.com
Awurz
Contributor

I searched quite a lot, but I did not come across these pages.

OK, so that might be the case I already thought of: the hypervisor provides the vNIC MAC via vSwitch1, that NIC is downed in the guest OS, so the bond uses the vNIC from vSwitch2 with the MAC staying the same (provided by vSwitch1). But traffic to/from other networks and to the network gateway works, so it seems that only traffic within the vSwitch/port-group is dropped for destinations in the original port-group.

But then using one vSwitch with different port-groups might not work either, because there too the MAC is originally in one port-group and then transferred to the other port-group. <= I am wrong; it should still be switched locally and not be handled via the uplink, because a port-group is only a logical entity.

MKguy
Virtuoso

I'm not so sure about whether that would work or not. Just try it out.

And if it doesn't, see how you can make the bonding use different MACs with the fail_over_mac or some other option.
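If it comes to that, a sketch of the fail_over_mac=active variant from the kernel bonding documentation quoted earlier: the bond takes on the MAC of whichever slave is currently active, so each vNIC keeps the MAC its vSwitch knows it by.

```shell
# Sketch only: fail_over_mac=active changes the bond MAC on failover
# instead of reusing the first slave's MAC on the other vNIC.
ip link add bond0 type bond mode active-backup miimon 100 \
    fail_over_mac active
```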

-- http://alpacapowered.wordpress.com
Awurz
Contributor

I will try. The bonding is configured that way because the application servers (and also the simulated servers as VMs) are auto-installed, and this config is used on those rack-mounted servers.
