I am trying to create a network bridge between two vSwitches using Linux (CentOS 5.4, kernel 2.6.18-164.15.1, bridge-utils 1.1). It should be a pretty straightforward setup.
vSwitch0 has the physical NICs in the ESXi box connected and promiscuous mode allowed. vSwitch1 has no physical NICs and has promiscuous mode allowed as well. The ESXi host has two VMs: a Linux machine with two vNICs (one in each vSwitch) and a Windows machine with a single vNIC in vSwitch1.
I've created an Ethernet bridge on the Linux machine enslaving both eth0 and eth1, disabled SELinux, flushed iptables, and ensured both vNICs are up.
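For anyone reproducing this, the setup described above would look roughly like the following with bridge-utils. This is a sketch, not the exact commands that were run: the `run` helper and the `APPLY` variable are made up here so that the script only prints the commands unless told otherwise, and `setenforce 0` only puts SELinux into permissive mode until the next reboot.

```shell
#!/bin/sh
# Sketch of the bridge setup described above (needs root when applied).
# By default the commands are only printed; set APPLY=1 to execute them.
# "run" and "APPLY" are illustrative names, not part of bridge-utils.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

setup_bridge() {
    run brctl addbr br0             # create the bridge
    run brctl addif br0 eth0        # vNIC in vSwitch0
    run brctl addif br0 eth1        # vNIC in vSwitch1
    run ifconfig eth0 0.0.0.0 up    # bridge ports carry no IP of their own
    run ifconfig eth1 0.0.0.0 up
    run ifconfig br0 up
    run iptables -F                 # flush the filter table
    run setenforce 0                # SELinux permissive (assumption, see above)
}

setup_bridge
```

Running it as `APPLY=1 sh setup.sh` (file name is arbitrary) would apply the changes for real.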
With this setup, the Windows VM cannot get ARP replies back over the Linux bridge, so it never learns the MAC address of the physical network gateway (a Cisco switch in this case). I can see the ARP broadcast from the Windows VM go over the bridge and get answered by the switch, but the reply never makes it back over the Linux bridge: it is never sent out eth1 into vSwitch1. If I set the gateway MAC statically in the ARP table on the Windows machine, everything seems to work. So it is only layer 2 Ethernet broadcasts that do not seem to make it over the bridge, and in only one direction.
Cisco GW <-> Physical host NIC <-> vSwitch0 <-> Linux Machine vNIC eth0 <-> Linux bridge br0 <-> Linux machine vNIC eth1 <-> vSwitch1 <-> Windows vNIC
I know it sounds like a Linux issue, but this is a very basic bridge which works in a physical environment. (I set it up in the lab on physical hardware just to check that I was not forgetting something basic, and it works as expected.)
I can't be the first person to try this. Is host bridging of vSwitches not supported?
(The Linux machine will, if this works, end up being a transparent Snort sensor.)
I would definitely make sure both eth adapters are in promiscuous mode...
They are.
dmesg output:
device eth0 entered promiscuous mode
device eth1 entered promiscuous mode
br0: port 2(eth1) entering learning state
br0: port 1(eth0) entering learning state
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
VMware PVSCSI driver - version 0.0.0.6
VMware memory control driver initialized
vmmemctl: started kernel thread pid=2806
eth0: no IPv6 routers present
br0: no IPv6 routers present
eth1: no IPv6 routers present
br0: topology change detected, propagating
br0: port 2(eth1) entering forwarding state
br0: topology change detected, propagating
br0: port 1(eth0) entering forwarding state
br0: port 2(eth1) entering learning state
br0: topology change detected, propagating
br0: port 2(eth1) entering forwarding state
Remember that you ALSO have to enable promiscuous mode on the right port groups of your vSwitches.
Andre
The port groups inherit the settings from the switch, do they not?
I have not explicitly set the port groups, so they should have the same setting as the vSwitch....
I think I found out why it's not forwarding ARP back to the Windows client machine. Looking at the MAC table in the bridge, the MAC address of the Windows machine is showing up on the wrong bridge port. eth0 is port 1 and eth1 is port 2; 00:0c:29:b9:2f:e9 is the MAC of the Windows machine.
port no  mac addr           is local?  ageing timer
1        00:0c:29:a8:1b:ee  yes        0.00
2        00:0c:29:a8:1b:f8  yes        0.00
1        00:0c:29:b9:2f:e9  no         1.86
If I disconnect the eth0 vNIC from the Linux host, the MAC moves to the correct port, 2. I'm not sure VMware is completely to blame, but I can't reproduce this issue in a physical environment. I will try a different Linux distro with a newer kernel and see if that works. If not, is it worth taking up with VMware?
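A small filter makes it easier to keep an eye on which port a MAC is learned on. This is just a sketch: `port_of_mac` is a made-up helper that reads `brctl showmacs` output on stdin and prints the port of the matching non-local entry.

```shell
# Print the bridge port on which a given MAC is currently learned.
# Input: `brctl showmacs` output (columns: port, MAC, is-local, ageing timer).
# "port_of_mac" is an illustrative helper, not part of bridge-utils.
port_of_mac() {
    awk -v mac="$1" 'tolower($2) == tolower(mac) && $3 == "no" { print $1 }'
}

# Live use (as root): brctl showmacs br0 | port_of_mac 00:0c:29:b9:2f:e9
# Here the table quoted above is fed in directly:
port_of_mac 00:0c:29:b9:2f:e9 <<'EOF'
port no mac addr                is local?       ageing timer
  1     00:0c:29:a8:1b:ee       yes                0.00
  2     00:0c:29:a8:1b:f8       yes                0.00
  1     00:0c:29:b9:2f:e9       no                 1.86
EOF
```

For the table above this prints 1, i.e. the Windows MAC sits on the eth0 side when it should be on port 2.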
Still an issue with Ubuntu 9.10 (kernel 2.6.31-14) and bridge-utils 1.2.
I found a post where someone else is experiencing the same issue, without resolution:
http://archives.free.net.ph/message/20100108.174704.efbb18cc.ja.html
So it looks like an actual bug
After speaking with the linux-bridging mailing list, I understand where the issue lies.
The issue is in VMware's vSwitches: when a vSwitch has more than one pNIC in it, the second pNIC (even if it is standby in an active/passive failover) replicates the ARP requests back, causing the Linux bridge to incorrectly update its MAC table.
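A toy model of the learning step may make this easier to see. It is only a sketch: `learn` is a made-up helper, not part of bridge-utils, and the frame sequence simply assumes the reflection described above.

```shell
# Toy model of bridge MAC learning: each input line is "ingress_port source_mac".
# The bridge remembers the LAST port on which it saw each source MAC.
# "learn" is an illustrative helper, not a real bridge-utils command.
learn() {
    awk '{ table[$2] = $1 } END { for (m in table) print m, "-> port", table[m] }'
}

WIN=00:0c:29:b9:2f:e9
# 1. The ARP request from the Windows VM enters on port 2 (eth1, vSwitch1).
# 2. The bridge floods it out port 1 (eth0) into vSwitch0.
# 3. Assumed behaviour: vSwitch0 hands a copy of that broadcast back to eth0,
#    so the bridge now sees the Windows MAC as a source on port 1.
printf '%s\n' "2 $WIN" "1 $WIN" | learn
```

This prints `00:0c:29:b9:2f:e9 -> port 1`. With that entry in place, the gateway's ARP reply arrives on port 1 addressed to a MAC the bridge believes is also on port 1, so it is not forwarded to port 2.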
The recourse is one of a couple of things:
Remove the second pNIC from the vSwitch, which of course compromises redundancy.
Replace the built-in vSwitch with a Cisco Nexus 1000V (unconfirmed, but assumed to work).
Replace the Linux bridge with an ARP proxy / IP forwarding.
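For the third option, here is a rough sketch of the kernel settings involved. It assumes the Linux VM has its own IP address on the subnet and a host route toward the Windows VM via eth1, neither of which is shown, and `proxy_arp_sysctls` is a made-up helper. Note that this makes the box a router rather than a transparent layer 2 bridge.

```shell
# Emit sysctl.conf lines that enable IP forwarding and proxy ARP
# on the interfaces given as arguments.
# "proxy_arp_sysctls" is an illustrative helper name.
proxy_arp_sysctls() {
    echo "net.ipv4.ip_forward = 1"
    for ifc in "$@"; do
        echo "net.ipv4.conf.$ifc.proxy_arp = 1"
    done
}

proxy_arp_sysctls eth0 eth1
# To apply (as root), something like:
#   proxy_arp_sysctls eth0 eth1 >> /etc/sysctl.conf && sysctl -p
```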
If anyone else has suggestions, I'm all ears. My understanding is that VMware has stated no intention of changing this behavior (hearsay from the mailing list).
Hopefully this saves someone else a week of their professional career.
A workaround is to turn your Linux bridge into a "hub" by effectively disabling the learning process.
brctl setageing br0 0
where br0 is the bridge name.
With an ageing time of 0, learned entries expire immediately, so every frame that arrives at the bridge is flooded to all other ports.
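To check that the setting took effect (it is a runtime setting, so it has to be reapplied after a reboot), the ageing time can be read back from sysfs. This is a sketch that assumes the kernel exposes /sys/class/net/<bridge>/bridge/ageing_time; `is_hub` and the `SYSFS` override are made-up names.

```shell
# Succeeds if the named bridge has an ageing time of 0 ("hub" mode).
# SYSFS can be overridden only so the function can be tried without a bridge.
# "is_hub" and "SYSFS" are illustrative names.
is_hub() {
    f="${SYSFS:-/sys}/class/net/$1/bridge/ageing_time"
    [ -r "$f" ] && [ "$(cat "$f")" = "0" ]
}

if is_hub br0; then
    echo "br0: ageing time is 0, frames are flooded"
else
    echo "br0: still learning (or br0 does not exist)"
fi
```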
I hope it helps other people with the same problem.