
It's been a while since I last wrote a blog post, so I thought I should put up a quick one.

In the NSX deployments and designs I have worked on, two common questions I get when using the NSX-v teaming method Loadbalance-SRCID/MAC are:

 

1. How can I determine which physical interface VXLAN-encapsulated traffic will egress/ingress on an ESXi host?

2. What will be the SRC IP address and SRC MAC address of the VXLAN traffic?

Please note this does not include traffic traversing a DLR; I will cover that in a later post.

To determine this you can follow the process below:

 

If you run the command esxtop from an ESXi host's CLI and then type n (for network), you will get a list of VM DVPorts and VMkernel interfaces (VTEPs) and, more importantly, which DVUplink/vmnic on the host they are mapped to, as shown below. I'm going to use VM04 as the example.

 

In the esxtop output below, VM04 is using vmnic1 (DVUplink1) on my DVSwitch.

 

Output of esxtop for VMs:

 

  67108876      39870:VM04.eth0     vmnic1 DvsPortset-1         0.00    0.00       1.94    0.00   0.00   0.00

  67108877      49943:wah1.eth3     vmnic1 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108878      49943:wah1.eth2     vmnic2 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108879      49943:wah1.eth1     vmnic2 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108880      49943:wah1.eth0     vmnic1 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108881      50475:wah2.eth3     vmnic1 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108882      50475:wah2.eth2     vmnic2 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108883      50475:wah2.eth1     vmnic2 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108884      50475:wah2.eth0     vmnic1 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

  67108885      51009:wah3.eth0     vmnic1 DvsPortset-1          0.00    0.00       0.00    0.00   0.00   0.00

 

If you then run the esxcli command below to list which VTEP each DVPort is using, we can see that VM04 on switch port 67108876 is using the VTEP with an ID of 0.

 

/var/log # esxcli network vswitch dvs vmware vxlan network port list --vds-name=DSwitch-Cluster2 --vxlan-id=5000

Switch Port ID  VDS Port ID  VMKNIC ID

--------------  -----------  ---------

      67108870  vdrPort             1

      67108876  142                  0    <<< VM04

      67108877  141                  0

      67108878  143                  1

      67108879  144                  1

      67108880  145                  0

      67108883  139                  1

      67108884  140                  0

      67108885  138                  0

 

If we then use the esxcli command below to check which vmknic VTEP ID 0 maps to, this gives us the source IP address and MAC address that will be used when VM04's traffic is encapsulated in VXLAN on this host.

 

~ # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=DSwitch-Cluster2

Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP          Netmask        IP Acquire Timeout  Multicast Group Count  Segment ID

-----------  --------------  -----------  -----------  -------  ----------  -------------  ------------------  ---------------------  ----------

vmk1               67108871  130                    0        0  172.16.1.5  255.255.255.0                   0                      0  172.16.1.0

vmk2               67108872  137                    1        0  172.16.1.6  255.255.255.0                   0                      0  172.16.1.0

 

~ # esxcfg-vmknic -l <truncated output>

Interface  Port Group/DVPort   IP Family IP Address        Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type            

vmk1       130                 IPv4      172.16.1.5                   255.255.255.0   172.16.1.255    00:50:56:68:93:ed 1600    65535     true    STATIC           

 

We can then check in esxtop again which vmnic the VTEP vmk1 is mapped to and see if it matches what VM04 was using (vmnic1).

 

Output of esxtop for vmknics:

 


  67108871                 vmk1     vmnic1 DvsPortset-1          0.19    0.00       0.77    0.00   0.00   0.00

  67108872                 vmk2     vmnic2 DvsPortset-1          0.00    0.00       0.97    0.00   0.00   0.00

 

So from the above, VM04 is using VTEP ID 0, which is vmk1. Both vmk1 and VM04 are using vmnic1.

 

In conclusion, when VM04 sends traffic to be encapsulated in VXLAN:

 

1. The outer SRC IP address is 172.16.1.5 (vmk1)

2. The outer SRC MAC address is 00:50:56:68:93:ed (vmk1)

3. The egress physical NIC on the host is vmnic1 (DVUplink1)
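
For reference, the commands used above can be run back to back on a host to trace any VM. This is just a recap of the exact commands from this post, so substitute your own VDS name and VNI:

esxtop   (then press n for the network view)
esxcli network vswitch dvs vmware vxlan network port list --vds-name=DSwitch-Cluster2 --vxlan-id=5000
esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=DSwitch-Cluster2
esxcfg-vmknic -l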

 

Thanks for reading

Kev Barrass

This is part two of my blog post on my OpenFlow and Openvswitch lab. This will be my last blog post for some time, until I settle into my new role.

 

In my previous blog post http://communities.vmware.com/blogs/kevinbarrass/2013/03/13/mixed-hypervisor-with-openvswitch-and-openflow-network-virtualisation I showed how you can build a lab using several mixed hypervisors (KVM/XEN/XenServer), all using Openvswitch, and build a virtual network across all hosts using OpenFlow and GRE tunnels.

 

In this second part of the blog post I will show how I used Wireshark and the OpenFlow Protocol "OFP" dissector to decode the OFP packets and get an idea of what is happening, as well as viewing the flow tables on each Openvswitch "OVS".
You can find details of the OFP dissector on this website: http://www.noxrepo.org/2012/03/openflow-wireshark-dissector-on-windows/

To reduce the number of packets captured, I have reduced the number of hosts from 4 to 2, with a single GRE tunnel, as shown in the lab diagram below:

OVS blog 2 hosts.gif

We will need to know the OVS port name to port number mapping on each host to be able to interpret the OFP messages; you can get this by typing the command "sudo ovs-dpctl show br0" on each host. In my lab this gives the port name to port number mappings below.

 

Host1 OVS port name to port number mappings.

host1 show port name to port number.png

Host2 OVS port name to port number mappings.

host2 show port name to port number.png


As the above lab is running on VMware Workstation using VMnet8 (NAT), it is easy to use Wireshark on the same computer running Workstation to capture all traffic on that network as it is flooded over VMnet8.
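
If the VMnet8 capture is noisy, one simple way to isolate just the OFP control traffic is to filter on the OpenFlow control channel port; this lab uses the default TCP port 6633 (the same port used when connecting each OVS to the controller in part one), so a standard Wireshark capture filter or display filter such as the below does the job:

tcp port 6633        (capture filter)
tcp.port == 6633     (display filter)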

In this lab test I will send a ping from VM1 to VM2 and capture the OFP packets. I will then decode some of the OFP packets and try to explain what each packet is doing. There is some duplication of OFP packets, i.e. one for the ICMP echo-request then one for the ICMP echo-reply, so I will decode the first of each flow.
Please bear in mind I'm very new to OVS and OpenFlow, so I may well have made mistakes in my interpretation of how this lab and the OpenFlow protocol/OVS work. I would recommend building a similar lab, reading the OpenFlow switch specification and having a play.

So in this minimal lab I have started up the POX OpenFlow controller as before, this time with just two OVSs connected. I have a Windows VM on each host/OVS, one with the IP address 172.16.0.1 and the second with the IP address 172.16.0.2. I then run a ping with a single ICMP echo-request from 172.16.0.1 to 172.16.0.2. Below is the Wireshark capture of the OFP packets related to both the ARP request/reply and the ICMP echo-request/reply.

 

all OFP packets.png

 

From the above OFP packet capture:
A. OFP packets 197 and 224 in block A relate to VM01 on host1/OVS sending an ARP request for VM02's MAC address.
B. OFP packets 234 and 235 in block B are the OFP packets related to the ARP reply from VM02 on host2/OVS.
C. OFP packets 239 and 240 in block C relate to the ICMP echo-request from VM01 on host1/OVS to VM02.

 

ARP request OFP Packet IN Host1.png

Now that we have started the ping from VM01 to VM02, VM01 generates an ARP request for VM02's MAC address. This is received by host1's OVS, which has no matching flow for this packet, so it sends an OFP packet-in to the POX OpenFlow controller.

In the above decoded packet capture of the OFP packet-in for the ARP-request:
A: This OFP is of type "packet-in" with a version of 0x01.
B: The buffer ID of the buffered packet that caused this OFP packet.
C: The OVS port the packet was received on, in this case port 2.
D: The reason the OFP was generated; in this case there was no local flow entry matching this ARP request packet.
E: Frame data containing details of the received packet that can be used to construct a flow entry.
F: A summary of the OFP packet.

 

ARP request OFP Packet OUT Host1.png

The POX OpenFlow controller now receives the previous OFP packet and, using the forwarding.l2_learning module, makes a policy decision. In this case, as the ARP request is a broadcast, the controller instructs the OVS, using an OFP packet-out, to flood the packet out of all ports except those blocked by spanning-tree STP (not used in this lab) and the source OVS port.

In the above decoded packet capture of the OFP packet-out for the ARP-request:
A: This OFP is of type "packet-out" with a version of 0x01.
B: The buffer ID of the buffered packet on the OVS related to this packet-out.
C: The OVS port the original packet was received on, in this case port 2.
D: Action type, in this case to output to a switch port.
E: The action to take, in this case to flood the packet in buffer ID 288 out of all ports except the input port and ports disabled by STP.
F: A summary of the OFP packet.

At this point the ARP request from VM01 is flooded out of host1/OVS and received by host2/OVS. Host2/OVS then goes through the same process with this ARP request, but I will not decode those packets as we have already examined a similar OFP ARP-request exchange above.

 

ARP reply OFP Packet IN Host2.png

At this point VM02 has received the ARP request and will send an ARP-reply back directly to the MAC address of VM01. This is received by host2's OVS, which has no matching flow for this packet, so it sends an OFP packet-in to the POX OpenFlow controller.

In the above decoded packet capture of the OFP packet-in for the ARP-reply:
A: This OFP is of type "packet-in" with a version of 0x01.
B: The buffer ID of the buffered packet that caused this OFP packet.
C: The OVS port the packet was received on, in this case port 3.
D: The reason the OFP was generated; in this case there was no local flow entry matching this ARP reply packet.
E: Frame data containing details of the received packet that can be used to construct a flow entry.
F: A summary of the OFP packet.

ARP reply OFP Flow Mod host2.png

 

The POX OpenFlow controller receives the previous OFP packet from host2/OVS and, using the forwarding.l2_learning module, makes a policy decision. In this case, as the ARP reply is not a broadcast packet, instead of a packet-out to flood the packet the controller creates a specific flow entry and sends an OFP flow-mod to the OVS. The OVS will install this flow and then send the buffered packet out according to it.

In the above decoded packet capture of the OFP flow-mod for the ARP-reply:
A: This OFP is of type "flow-mod" with a version of 0x01.
B: The specific match details used to create the flow.
C: The idle timeout after which this flow is discarded when inactive, and the hard (max) timeout after which the flow is removed and the next packet of this flow is punted back to the POX OpenFlow controller, which can then decide whether or not to reinstall a matching flow on the OVS.
D: The buffer ID of the buffered packet that caused this OFP packet.
E: Action type, in this case to output to a switch port.
F: The action to take, in this case to send all packets matching this flow, including the buffered packet with ID 289, out of OVS port 1.
G: A summary of the OFP packet.

As this ARP-reply passes to host1/OVS, a similar flow will be installed by the POX OpenFlow controller onto host1/OVS, with the ARP-reply eventually reaching VM01.

 

ICMP echo req OFP packet in host1.png

 

VM01 will then send an ICMP echo-request. As before, this ICMP echo-request will reach host1/OVS and, as there is no matching flow on host1/OVS, the OVS will send an OFP packet-in to the POX OpenFlow controller.

In the above decoded packet capture of the OFP packet-in for the ICMP echo-request:
A: This OFP is of type "packet-in" with a version of 0x01.
B: The buffer ID of the buffered packet that caused this OFP packet.
C: The OVS port the packet was received on, in this case port 2.
D: The reason the OFP was generated; in this case there was no local flow entry matching this ICMP echo-request packet.
E: Frame data containing details of the received packet that can be used to construct a flow entry.
F: A summary of the OFP packet.

 

ICMP echo req OFP Flow Mod host1.png

 

The POX OpenFlow controller receives the previous OFP packet from host1/OVS and, using the forwarding.l2_learning module, makes a policy decision. The controller then creates a specific flow entry and sends an OFP flow-mod to the OVS. The OVS will install this flow and then send the buffered packet out according to it.

In the above decoded packet capture of the OFP flow-mod for the ICMP echo-request:
A: This OFP is of type "flow-mod" with a version of 0x01.
B: The specific match details used to create the flow.
C: The idle timeout after which this flow is discarded when inactive, and the hard (max) timeout after which the flow is removed and the next packet of this flow is punted back to the POX OpenFlow controller, which can then decide whether or not to reinstall a matching flow on the OVS.
D: The buffer ID of the buffered packet that caused the OFP packet-in.
E: Action type, in this case to output to a switch port.
F: The action to take, in this case to send all packets matching this flow, including the buffered packet with ID 290, out of OVS port 1.
G: A summary of the OFP packet.

The ICMP echo-request will then be tunnelled over to host2/OVS using GRE, and host2/OVS will go through the same process and have a similar flow installed by the POX OpenFlow controller. A similar process will then happen in reverse for the ICMP echo-reply.

All the flows being installed by POX here are reactive flows; i.e. POX did not determine the full network topology and install proactive flows in advance, it reacts to each packet-in and then reactively installs flows into the OVS bridge br0 that originated the OFP packet-in.

To view the flows installed by the OpenFlow controller into the OVS userspace, run the command "sudo ovs-ofctl dump-flows br0". To view the flows that have been installed into the OVS datapath for bridge br0 (as a result of traffic matching a userspace flow), run the command "sudo ovs-dpctl dump-flows br0", which will dump the installed flows as shown in the screenshot below:

 

ovs-dpctl dump flows.png

 

You can also run the command "sudo ovs-dpctl show br0 -s" to get port statistics such as received/transmitted packets, as shown in the screenshot below:

get port counters.png
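
For quick reference, the three commands used in this section to inspect flows and port counters are repeated below; br0 is the bridge name used on the KVM/XEN hosts in this lab (use xapi0 on the XenServer host):

sudo ovs-ofctl dump-flows br0
sudo ovs-dpctl dump-flows br0
sudo ovs-dpctl show br0 -s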


That is the end of my blog post on mixed hypervisors with Openvswitch and OpenFlow network virtualisation. I hope it was of some use, and as before I'm open to feedback and any corrections on anything I may have misinterpreted.

Thanks for reading.

Kind Regards
Kevin Barrass

These next two blogs are going to be a change from my usual vCNS-based blogs and will be my last for a while, as I’m taking on an exciting new career in network virtualisation, but I will hopefully post some more blogs in the future.

 

This blog is going to show the lab testing I’m doing on Openvswitch “OVS” and OpenFlow using the hypervisors KVM, XEN and Citrix XenServer. All labs will be running on VMware Workstation, using Ubuntu 12.04 Linux where I can.

Below is a diagram of the lab I have built. There are 4 hypervisors: 2 KVM, 1 XEN and 1 XenServer. Each hypervisor is running Openvswitch. The KVM and XEN hypervisors are running Libvirt for VM management and for attaching VMs to an Openvswitch bridge. I’m running a newer version of Libvirt (version 1.0.0) on Host1, as I was trying out the native Openvswitch/VLAN support in Libvirt, whereas the other hosts use Linux bridge compatibility with Libvirt version 0.9.8. The XenServer host is managed using Citrix XenCenter.

I then have the POX OpenFlow Controller with POXDesk installed on an Ubuntu 12.04 VM.

In this lab each host is connected to VMNet8 for simplicity. Each host could be in a different layer 3 network, as GRE can be used across layer 3 boundaries, but to keep this lab simple all hosts are on the same subnet.

Lab Diagram:

 

OVS blog lab.gif


What I wanted to achieve from this lab was to have different hypervisors using a common virtual switch, with an OpenFlow controller and a tunnelling method to virtualise the network. I could then use this lab to do some tests on OpenFlow outside of this blog.

During this blog I will take you through the steps I took to build the lab. These assume you have already installed the hosts, the OpenFlow controller OS, the hypervisors and VM management tools, as well as Openvswitch on each host. I would highly recommend Scott Lowe’s blog http://blog.scottlowe.org/2012/08/17/installing-kvm-and-open-vswitch-on-ubuntu/ for this, as it is out of the scope of this blog.

Steps:

1. Build the OpenFlow controller and start it with required modules

2. Add a Bridge on each OVS called br0

3. Configure the bridge br0 fail_mode to “Secure”

4. Connect each OVS “br0” to the Openflow controller

5. Add a GRE tunnel between each OVS “br0” as described in the lab diagram.

Once the above steps are done we can verify that each OVS is connected to the OpenFlow controller, then boot our VMs on each hypervisor and test connectivity using ping. This assumes you have used tools such as virt-install or Citrix XenCenter to create the VMs and attach them to each OVS bridge “br0”. In XenCenter you will need to add a “Single-Server Private Network” and then find the bridge name on the XenServer host’s OVS using the command “ovs-vsctl show”; in my case the bridge was called xapi0. This bridge will only be present when a VM is attached to it and powered on.

1. Build the OpenFlow controller and start it with required modules.

On the VM you have created to act as your OpenFlow controller, install git; this will be used to download the POX repository.

“sudo apt-get install git”

Now clone the POX repository into your home folder and check out the betta branch.

“git clone http://github.com/noxrepo/pox”

“cd pox; git checkout betta”

Now, to install the POXDesk GUI for viewing the topology and OpenFlow tables etc. using a web browser, run the below from within the pox folder.

“git clone http://github.com/MurphyMc/poxdesk”

The POX OpenFlow controller is now built; there is a README file in the pox folder that shows how to start the POX controller and how to load modules.

We now want to start the POX Openflow controller with required modules using the below command.

“./pox.py --verbose forwarding.l2_learning samples.pretty_log web messenger messenger.log_service messenger.ajax_transport openflow.of_service poxdesk openflow.topology openflow.discovery poxdesk.tinytopo”

 

The “forwarding.l2_learning” module will be used to make the OpenFlow controller act as a learning bridge/switch by installing reactive flows into each OVS bridge that sends an OpenFlow Protocol “OFP” packet up to the controller after receiving a packet that does not match any local flows. The other modules are used to discover the topology, make the logs look pretty and provide the POXDesk web user interface, etc.

Please note we are not using OpenFlow or spanning-tree to prevent forwarding loops, as this is not something I have yet covered in OVS; the GRE tunnels are instead set up in a way that intentionally avoids any forwarding loops.
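
If you only want the reactive layer 2 learning behaviour without the POXDesk GUI and topology modules, a minimal start line is just the forwarding module (a cut-down version of the full command above, started the same way from the pox folder):

“./pox.py --verbose forwarding.l2_learning”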

 

Pox Starting:

starting POX.png

2. Add a Bridge on each OVS called br0

Now that we have our POX OpenFlow controller running, we will add a bridge called br0 on each KVM host and the XEN host using the below command.

“sudo ovs-vsctl add-br br0”

 

3. Configure the bridge br0 fail_mode to “Secure”

So that we can prove the OVS bridge br0 is using the POX OpenFlow controller and not performing local learning, we will set each OVS bridge br0 to a fail mode of secure. This means the OVS will not fall back to local learning and will rely on the POX OpenFlow controller to install flows in br0. Use the below command to set the fail mode to secure on each OVS bridge.

“sudo ovs-vsctl set-fail-mode br0 secure” for KVM/XEN hosts

“sudo ovs-vsctl set-fail-mode xapi0 secure” for XenServer host
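
You can confirm the setting afterwards with the matching get command, which should return "secure" (get-fail-mode is a standard ovs-vsctl sub-command):

“sudo ovs-vsctl get-fail-mode br0” for KVM/XEN hosts

“sudo ovs-vsctl get-fail-mode xapi0” for XenServer host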

 

4. Connect each OVS “br0” to the Openflow controller

We now want to connect each of our newly created OVS bridges “br0” to our POX OpenFlow controller. We are using plain TCP rather than TLS in this lab.

“sudo ovs-vsctl set-controller br0 tcp:192.168.118.128:6633”

If we now run the command “sudo ovs-vsctl show” on each OVS, we will see a single bridge (“br0” on KVM/XEN or xapi0 on XenServer) with a controller connected to IP 192.168.118.129 on TCP port 6633 and a connection status of true. We can also see that the fail mode is “secure”.
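
The relevant part of the “ovs-vsctl show” output looks roughly like the below. This is an illustrative sketch based on the commands above rather than a capture from my lab, and the exact formatting varies with the OVS version, but the Controller target, is_connected flag and fail_mode are the fields to check:

    Bridge "br0"
        Controller "tcp:192.168.118.128:6633"
            is_connected: true
        fail_mode: secure
        Port "br0"
            Interface "br0"
                type: internal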

 

Image showing OVS connected to POX:

ovs-vsctl showing ovs connected to POX.png

 

Also, if we go back to the terminal running our POX OpenFlow controller, we can see each OVS bridge connecting to it.

 

Image of OVS connecting to POX:

OVS connecting to POX.png

 

As we have not connected any physical interfaces such as “eth0” to the OVS bridges br0, any VMs attached to a bridge br0 will only be able to communicate with VMs on the same OVS bridge/host. To enable layer 2 connectivity between each host’s OVS bridge br0 we could simply add the interface eth0 as an OVS port on each bridge br0, but that would not work if each host was in a different layer 2/3 domain. So we will configure GRE tunnels between the OVS bridges “br0”, making sure we do not create any forwarding loops, i.e. a simple daisy chain of bridges rather than a ring.

 

5. Add a GRE tunnel between each OVS “br0” as described in the lab diagram.

 

For host1:

“sudo ovs-vsctl add-port br0 gre0 -- set interface gre0 options:remote_ip=192.168.118.146”

 

For host2:

“sudo ovs-vsctl add-port br0 gre0 -- set interface gre0 options:remote_ip=192.168.118.145”

“sudo ovs-vsctl add-port br0 gre1 -- set interface gre1 options:remote_ip=192.168.118.130”

 

For host3:

“sudo ovs-vsctl add-port br0 gre1 -- set interface gre1 options:remote_ip=192.168.118.146”

“sudo ovs-vsctl add-port br0 gre2 -- set interface gre2 options:remote_ip=192.168.118.133”

 

For host4:

“sudo ovs-vsctl add-port xapi0 gre2 -- set interface gre2 options:remote_ip=192.168.118.130”

 

If we now issue the command “sudo ovs-vsctl show” on each host, we will see the GRE tunnel configured: a new port named “gre0” with an interface “gre0” attached, of type gre, with a remote endpoint IP address.
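
A couple of other standard ovs-vsctl commands are handy here for checking the tunnel without scrolling through the full “show” output; list-ports lists the ports on the bridge, and “list interface” dumps the interface record including the remote_ip option (use xapi0 instead of br0 on the XenServer host):

“sudo ovs-vsctl list-ports br0”

“sudo ovs-vsctl list interface gre0”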

 

Image showing OVS GRE tunnel:

ovs-vsctl showing tunnel.png

We now have 4 OVS bridges connected to each other using GRE tunnels, all of which are also connected to the same POX OpenFlow controller.

We can now start up a VM on each host and test connectivity between VMs over the virtual network we have just created. The result should be that all VMs have connectivity to each other as though they are all on a single virtual switch, as below.

 

Image showing single virtual switch:

OVS blog lab 02.gif

If you take a look at the terminal running your POX OpenFlow controller, you will see log events for flows being installed, as below, from the connectivity test we performed earlier.

 

Image showing Flows being Installed:

POX flows being install on terminal.png

We can now use the POXDesk module to view the topology. We will open up Firefox and load the URL

 

http://192.168.118.129:8000

 

At the bottom of the page that loads click on the link “/poxdesk”

When the POXDesk GUI loads you can click on the “POX” buttons in the bottom left (TopologyViewer, Tableviewer and L2LearningSwitch) to view a graphical topology and the flows on each OVS bridge, as shown below:

poxdesk.png

 

That is the end of this part of the blog. At this stage we have a working OpenFlow/Openvswitch lab running on different hypervisors, with connectivity between all VMs utilising GRE tunnels. In the second part of this blog I will show how you can use Wireshark and the OFP plug-in to decode the OFP packets and make sense of them, as well as use CLI commands to view the flow tables at various points in an OVS and look at interface counters, etc.

Thanks for reading and as always open to feedback.

 

Kind Regards

Kevin

Note: all comments and opinions expressed in this blog are that of myself and not my employer.

Hi

 

I was originally going to do a blog post on using vCNS Edge as a VXLAN gateway as a follow-on from my previous blog posts. I have put that on hold due to the release of vCloud Connector 2.0, which has a feature of great interest to me: the "Datacenter Extension".

 

Everything below is my own personal interpretation of vCloud Connector 2.0 from my own testing.

 

My lab is a hybrid cloud comprising a vSphere 5.1 datacenter as the source cloud and a public cloud based on vCloud Director 5.1.1 as the destination cloud. Both environments are using vCNS 5.1.2, as this is required for the Datacenter Extension feature, and I decided to use VXLANs in both clouds.

 

I won't go into details of how to install vCloud Connector 2.0, as there are plenty of blogs out there and, from what I can see, there isn't a great deal of difference in installation and initial configuration other than enabling the two clouds for stretched deployments.

 

In my vSphere 5.1 datacenter I have 2 VMs on a single VXLAN behind a vCNS Edge with a single external network. The VXLAN is in the subnet 10.1.0.0/24 with a default gateway of 10.1.0.1, which is assigned to the vCNS Edge's internal interface. I also have a vCloud Connector Server and Node deployed in this vSphere 5.1 datacenter.

In the vCloud 5.1.1 environment I have a single vCloud Connector Node and a single routed OrgVDC network, with an external IP sub-allocation on the vCNS Edge for this network. This OrgVDC network is a VXLAN.

 

Both the node in the vSphere datacenter and the node in the vCloud environment are registered to the vCloud Connector Server, and each node is registered to its respective vCenter or vCloud.

 

What I want to achieve is to copy the powered-off VM "UniVM01" from the vSphere 5.1 datacenter to a new vAPP in my org in the vCloud environment. As this is a stretched deployment, the network UniVM01 is attached to will be extended into the vCloud environment.

 

Below is the process I went through to copy the VM over and stretch/bridge the connected network.

 

On the vCenter server for the source cloud (the vSphere 5.1 datacenter), click on the vCloud Connector plug-in, select UniVM01 and, from the Actions button, select "Stretch Deploy".

 

select stretch deploy.png

At this point you are given the option to select the target cloud, in this case the vCloud lab environment. Give it a name, and select an existing catalog to use.

 

select target cloud.png

Click next; you are then prompted to select the target resources (Org VDC, OrgVDC network, and the external IP address that will be used to DNAT the SSL VPN traffic to the vCNS Edge that will be created to terminate the SSL VPN).

 

select target org network.png

Click next, then select any proxy settings if required; in this case none are needed.

 

select proxy.png

Click next again and select "power on deployed entity" to power on the copied VM UniVM01.

 

deplpyment options.png

Then click next again to see a summary of the deployment, and click finish for the deployment process to start.

 

ready to cpmplete.png

If successful, a new vAPP will be created in the vCloud Org as below; this will be a routed vAPP connected to the OrgVDC network using a vCNS Edge. The external IP address of this new vAPP Edge will be NAT'ed on the OrgVDC Edge to the external network. The internal interface of this new vCNS Edge will have the same IP address as the existing vCNS Edge in the vSphere 5.1 datacenter (10.1.0.1). From what I can see, this new vAPP vCNS Edge then runs an SSL VPN server in bridged mode with a local and remote network of 10.1.0.0/24.

 

vCloud Connector, it seems, then configures the existing vCNS Edge in the vSphere 5.1 datacenter to start an SSL VPN client in bridged mode and connect to the new vCNS Edge in vCloud.

 

## Screen shot of new vAPP created in vCloud ##

 

new vapp in vlcoud top level.png

At this point I ran a constant ping from UniVM02 (10.1.0.111), which is running in the vSphere 5.1 datacenter, to UniVM01 (10.1.0.222), running in vCloud. Both VMs are in the same L2/L3 domain but on different VXLANs in different datacenters. As expected, the pings succeeded.

North/south traffic, i.e. traffic from external networks to UniVM01, will go via the vCNS Edge on the vSphere 5.1 datacenter, which I tested successfully using a DNAT rule and firewall rule on that Edge.

 

ping between VMs.png

At this point I thought it would be interesting to have a poke around the config of both vCNS Edges using the vShield Manager and the Edge CLI; the results I found didn't fully add up, as explained below.

 

On the vShield Managers for both environments (vSphere 5.1 and vCloud) it shows that IPSec VPN is enabled, with an active tunnel with a local and remote network of 10.1.0.0/24 and a tunnel status of UP, which is strange as this is documented as an SSL VPN. Debugging traffic flows between both vCNS Edges also, as far as I can see, shows no IPSec traffic.

 

## vCloud vCNS Edge IPSec status ##

 

vcloud ipsec VPN via gui.png

## vSphere 5.1 Datacenter vCNS Edge IPSec status ##

 

source ipsec VPN via gui.png

The Edge CLI of both Edges also shows an IPSec VPN config, but no active VPN sessions.

 

Checking the status of the vCNS Edge in the vCloud environment using the vShield Manager GUI shows that SSL VPN is disabled, but with a single active SSL VPN session (acting as the server), which again doesn't add up.

 

vcloud SSL VPN via gui.png

The vCloud Edge CLI shows that the SSL VPN tunnel is up and passing traffic.

 

vcloud edge ssl vpn cli status.png

Edge CLI SSL vpn stats on vcloud edge.png

Neither vCNS Edge has any visible SSL VPN config that I can see from either the vShield Manager GUI or the CLI.

 

The vCNS Edge on the vSphere 5.1 datacenter doesn't show SSL VPN as enabled either, which is expected as it is acting as the client. This vCNS Edge has started an SSL VPN to the vCNS Edge in vCloud using the process "naclientd", which I assume is the NeoAccel SSL VPN client, but I could be wrong as this is just an assumption.

 

vcloud network connections.png

source network connections.png

Just to be sure the ping traffic was traversing the SSL VPN bridge, I ran a constant ping from UniVM01 in vCloud to UniVM02 in the vSphere 5.1 datacenter and then debugged the traffic flow on the vCloud vCNS Edge's internal and external interfaces. This shows that the ICMP traffic enters the internal interface and SSL VPN traffic exits the external interface.

 

## Internal Interface debug ##

 

ping from vcloud to vsphere - debug on vcloud edge inside interface.png

## External Interface Debug ##

 

ping from vcloud to vsphere - debug on vcloud edge.png

One interesting thing to note is that, even though the vCNS Edges in the vSphere 5.1 datacenter and the vCloud environment both have the same internal IP address of 10.1.0.1, the VMs attached to their respective Edges each have an ARP entry for 10.1.0.1 matching the MAC address of their local vCNS Edge.
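
A quick way to check this yourself is to look at the ARP/neighbour cache on each VM with standard OS commands (nothing specific to this lab) and compare the MAC listed for 10.1.0.1 against the internal interface MAC of each Edge:

arp -a            (Windows or Linux)
ip neigh show     (Linux)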

 

default gateway arp entries on VMs.png

 

Doing this blog post has raised more questions for me on how this layer 2/bridged SSL VPN feature works and how you can debug/fault-find any issues. But all in all this is a great feature and very easy to use.

I'm still doing some testing on vCloud Connector 2.0, so I hope to update this blog post as I learn more. I would appreciate any comments on any aspects I may have got wrong or misinterpreted.

 

Thank you for reading.

 

Regards

Kev Barrass

In this blog post I will describe how I have configured VXLANs over a multicast-enabled layer 3 network. I will show the router configs, the associated multicast routes created and the host VXLAN mappings.

 

This lab is a physical lab rather than a virtual one on VMware Workstation; I hope to cover how to do this in a virtual lab in a later blog post. The lab is based on two hosts, each with 2 NICs, and a small PC for vCenter, the vSphere Client and shared storage. The vShield Manager "VSM" is deployed onto one of the two ESXi hosts. The lab is based on vSphere 5.1 and vCNS 5.1.

The network is based on two Cisco routers and a switch for the vSphere PC, as shown in the diagram below:

 

vxlan over l3 lab.png

 

On the network I have deployed PIM sparse mode "PIM-SM" with R1 as the rendezvous point for both routers R1 and R2. I have used PIM-SM as opposed to sparse-dense mode, which seems to be the recommendation. My reason for this is that I have more experience of PIM-SM, so it seems a good starting point for me to learn VMware's implementation of VXLAN.

 

As in the previous blog, I will not be deploying a VXLAN gateway just yet and will be concentrating on just VXLAN itself. The VSM and VXLAN preparation is identical to my previous two-part blog post "Simple VXLAN lab on Workstation viewing traffic with Wireshark".

The only difference here is that each ESXi host's vmk1 interface is now in a different layer 2 and layer 3 segment.

Host ESXi1 VXLAN interface vmk1 is in subnet 192.168.150.0/24

Host ESXi2 VXLAN interface vmk1 is in subnet 192.168.136.0/24

Each physical router acts as a DHCP server for each ESXi host.

 

Preparation for VXLAN is now as per below:

 

prepared cluster.png

Below are the configs I used for routers R1 and R2: a simple network based on two routers in a single PIM-SM domain, both running OSPF in a single area 0.

 

Router "R1" config:

 

!
hostname R1
!
ip multicast-routing
!
interface Loopback0
description PIM RP
ip address 172.16.1.1 255.255.255.255
ip pim sparse-mode
!
interface FastEthernet0/0
description Link to Host esxi1
ip address 192.168.150.254 255.255.255.0
no ip proxy-arp
ip pim sparse-mode
duplex auto
speed auto
!
interface FastEthernet0/1
description Link to R2-Fa0-1
ip address 10.1.0.1 255.255.0.0
ip pim sparse-mode
duplex auto
speed auto
!
router ospf 1
log-adjacency-changes
passive-interface FastEthernet0/0
network 10.0.0.0 0.255.255.255 area 0
network 172.16.1.1 0.0.0.0 area 0
network 192.168.0.0 0.0.255.255 area 0
!
ip pim rp-address 172.16.1.1
!
ip access-list standard VXLAN-1-BOUNDARY
deny   224.1.1.50
permit 224.0.0.0 15.255.255.255
!

 

Router "R2" config:

 

!
hostname R2
!
ip multicast-routing
!
interface FastEthernet0/0
description Link to Host esxi2
ip address 192.168.136.254 255.255.255.0
no ip proxy-arp
ip pim sparse-mode
duplex auto
speed auto
!
interface FastEthernet0/1
description Link to R1-Fa0-1
ip address 10.1.0.2 255.255.0.0
ip pim sparse-mode
duplex auto
speed auto
!
interface FastEthernet1/0
ip address 10.3.0.1 255.255.0.0
ip pim sparse-mode
duplex auto
speed auto
!
router ospf 1
log-adjacency-changes
passive-interface FastEthernet0/0
network 10.0.0.0 0.255.255.255 area 0
network 172.16.1.1 0.0.0.0 area 0
network 192.168.0.0 0.0.255.255 area 0
!
ip pim rp-address 172.16.1.1
!
ip access-list standard VXLAN-1-BOUNDARY
deny   224.1.1.50
permit 224.0.0.0 15.255.255.255
!

 

As in the previous lab I have two OpenBSD VMs deployed, VM01 on host ESXi1 and VM02 on host ESXi2. I have then created a single VXLAN named vxlan-01 with a VNI of 5000 using the multicast group 224.1.1.50.

 

VMs VM01 and VM02 are in the subnet 172.16.0.0/24: VM01 with the IP 172.16.0.2 and VM02 with the IP 172.16.0.3.

 

With the VMs deployed and their vNICs members of vxlan-01, but still powered off, as expected there are no hosts joined to the multicast group 224.1.1.50 and there are no active multicast sources for the group.
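
The multicast state shown in the screenshots below can be checked on R1 and R2 with the standard IOS verification commands (generic IOS commands, nothing specific to this lab):

show ip mroute 224.1.1.50
show ip igmp groups
show ip pim rp mapping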

 

VMs Powered Off mroute.png

VMs Powered Off IGMP.png

Now we will power on both VMs, VM01 and VM02. As soon as the VMs are powered on, even before the guest OS has booted up, each host joins the multicast group 224.1.1.50 through IGMP version 2.

 

We now have multicast routes in place for the two hosts that have joined the multicast group 224.1.1.50.

 

VMs Powered On mroute ASM only.png

We can also see the IGMP membership report for group 224.1.1.50 on each router.

VMs Powered On IGMP.png

 

At this point no packets such as broadcast, unknown unicast or ARP have been sent from VMs VM01 and VM02, and therefore nothing has had to be encapsulated onto the multicast group 224.1.1.50 by either host ESXi1 or ESXi2, so no multicast sources for group 224.1.1.50 are registered with the rendezvous point.

 

Now a ping session is started from VM01 on host ESXi1 to VM02 on host ESXi2. At this point host ESXi1 encapsulates the ARP request packet into a multicast packet and transmits it on the group 224.1.1.50 with a VXLAN header for VNI 5000. The router R1 will register the source 192.168.150.128 for the group 224.1.1.50 with the rendezvous point. Host ESXi2 will receive the multicast packet for group 224.1.1.50 and VNI 5000, decapsulate it and send it on to the recipient VM, whilst adding the source VM MAC address, host and VXLAN mapping into its VXLAN mapping table. Router R2 will then send a source-specific join towards host ESXi1 for the group 224.1.1.50 (192.168.150.137, 224.1.1.50). Once router R2 is receiving duplicate packets, one from the shared tree and one from the now-formed shortest-path tree, it will switch over to the shortest-path tree.

 

We now have the below multicast routes in place showing host ESXi1 as a source for group 224.1.1.50.

 

VMs Powered On mroute SSM source.png

Only the original ARP request is passed over multicast; the rest of the ICMP session is passed as unicast between the hosts, encapsulated in VXLAN packets for VNI 5000.

 

In the previous blog post I put up on VXLAN, the host's VXLAN mapping table had an outer MAC that matched the recipient host's vmk1 MAC address.

In this case the outer MAC is now that of the first-hop router, i.e. R1 fa0/0 or R2 fa0/0, as shown below. I presume this MAC address is learned when a host receives a VXLAN frame from another host, as the source MAC address will be that of the egress router; this alleviates the need for proxy-ARP on the routers or a separate kernel default route for the VXLAN network.

 

ESXi VXLAN mapping L3.png

Anyway, this is a short blog post that hopefully describes the basics of how VXLAN can be used over a layer 3 multicast-enabled network.

In a future blog post I will look at the VXLAN gateway "vCNS Edge" and how it can be used to connect from one VXLAN to another, or from the "real world" into a VXLAN. I will also cover NAT and firewall services on the vCNS Edge and how you can use the Edge CLI to aid fault finding.

 

The above is based on my understanding of both PIM-SM and VXLAN, so it may well be wrong; then again it may hopefully be right.

 

Thanks for reading.

 

Kevin Barrass

My first lab blog shows how you can set up a simple virtual lab to create VXLANs on, then use Wireshark to view the VXLAN traffic and peek inside it by decoding the packets.

 

In my lab I have a single VM for vCenter 5.1, 2 ESXi 5.1 hosts and a shared storage virtual appliance.

The vCenter VM, each host and the storage appliance have a vNIC in the VMNet1 (host-only) Workstation network. Each host also has a vNIC in the VMNet4 (host-only) Workstation network, which is DHCP enabled, for VXLAN transport. My laptop has the IP address 192.168.142.1 and has the vSphere 5.1 client and Wireshark installed. I have created a single DC with one cluster that I have deployed the vShield Manager "VSM" into. I have also created two VMs for testing; I've used OpenBSD as they have a small footprint and can run various services for testing network traffic.

 

Please see figure 1.0 in part two of this blog post for how the lab was built on VMware Workstation.

 

Once the VSM had booted up and had its IP, subnet mask, default gateway and DNS configured from the VSM CLI, I registered it with my vCenter server.

 

register VSM.png

 

Once the VSM plugin has registered with the vCenter client, navigate to the DC and click the "Network Virtualization" tab. Then click on Preparation and edit the segment ID settings. Enter some test values, in this case a multicast group range of 224.1.1.50-224.1.1.60 and a segment ID pool of 5000-5100.

 

create segment ID.png

 

Once the segment ID is added, click on Connectivity > Edit and select the cluster that you want to deploy VXLAN over. This assumes you have already created a vNetwork Distributed Switch version 5.1 across all hosts in the cluster.

 

select cluster.png

 

At this point VXLAN will be deployed onto each host in the cluster, and a new VMkernel interface for VXLAN will be created on each host, in my case vmk1. The vmk1 interfaces will be assigned an IP address using DHCP from the VMware Workstation VMNet4 IP address pool.

 

At this point the Cluster is prepared for VXLAN and should have a status of Normal.

 

prepared.png

 

We now create the scope for the VXLANs, in this case a single cluster. Select add, give the scope a name and description, and select the cluster(s) for this scope.

 

add scope.png

 

Great, VXLAN is now deployed and ready for us to create our first VXLAN. Each VXLAN will be mapped to a multicast group; if we assigned just two multicast addresses to our range and created 3 VXLANs, we would have 2 virtual wires mapped to a single multicast group. This is not a problem as far as I know, but you would need to be mindful of multiple VXLAN broadcast domains being on the same multicast tree, which would make pruning each broadcast domain back harder due to the single multicast group serving multiple VXLANs, so it might need careful thought and planning.

 

So now to create our first VXLAN, simply click on Networks then the green cross to add. Give the VXLAN a name, description and scope, which in this case has to be scope-01, then click OK. The VXLAN is created along with an associated portgroup. The VXLAN is given a VNI "VXLAN Network Identifier" of 5000 and mapped to the multicast group 224.1.1.50.

 

create virtual wire.png

Now we will add each of our two OpenBSD VMs into the VXLAN.

 

assign nic to vxlan.png

 

Now run a continuous ping from VM01 "172.16.1.2" to VM02 "172.16.1.3". Then open the packet capture tool Wireshark and choose the interface for VMNet4, which is the virtual network on Workstation we are transporting the VXLAN traffic over.

 

Now start the packet capture whilst the ping session is still running from VM01 to VM02. It is important to make sure that VM01 and VM02 are NOT running on the same host, or you will be sat there like I was wondering why you don't see any traffic. The reason for this is we want to see the VXLAN traffic going between hosts over the VXLAN transport network.

When the packet capture starts you should see the VXLAN UDP traffic with a source IP of one ESXi host and a destination of the other ESXi host, with a destination UDP port of 8472. Looking at the packet capture we can only see that it is a VXLAN packet and that it has a payload. The important thing here is that this traffic is unicast; what we haven't yet captured is the traffic that transported the ARP request using multicast, which we will do later in this blog.
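
To narrow the capture down to just the VXLAN transport traffic, you can filter on the UDP port seen above (8472 in this vCNS release); a standard Wireshark capture filter or display filter such as the below does the job:

udp port 8472        (capture filter)
udp.port == 8472     (display filter)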

 

packet capture none decoded.png

What we can now do is use the VXLAN decoder in Wireshark to look at the VXLAN header and the contents of the packet. To do this, select a packet, right click, select "decode as" and choose a transport protocol of VXLAN.

 

packet capture set decode to vxlan.png

 

Now we should see the contents of the VXLAN payload, i.e. the original packet, which in this case is the ICMP echo-request and echo-reply. You can also view the VXLAN header showing the VNI of 5000, the VXLAN Network Identifier that was assigned to our VXLAN.

 

packet capture decoded.png

 

As we missed the original ARP request sent by VM01 for VM02's MAC address, we will run a continuous ping to an IP address not in use, 172.16.1.4. This will keep VM01 sending ARP requests out as broadcasts. VXLAN will then encapsulate each of these packets into a multicast packet onto the multicast group 224.1.1.50, and this will be sent to all hosts that have joined this group through IGMP (in our case we are not using PIM or IGMP snooping). As you can see in the non-decoded traffic flow below, the source remains the host's IP address but the destination is now the multicast group for this, and possibly other, VXLANs.
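
If you want to pick out just the multicast-encapsulated packets in the capture, you can also filter on the multicast group assigned to our VXLAN (224.1.1.50 in this lab), again with standard Wireshark filters:

host 224.1.1.50         (capture filter)
ip.dst == 224.1.1.50    (display filter)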

 

packet capture ARP none decoded.png

 

In part 2 of this blog post we will decode the VXLAN multicast traffic, and show what information you can view on the ESXi hosts via an SSH session.

 

Kevin Barrass

Now that we have captured an ARP request that has been encapsulated in multicast by VXLAN, we will use the same option in Wireshark to decode the VXLAN packet. This shows the same VXLAN header and the encapsulated ARP request. The host that received this encapsulated ARP request would add the MAC address of the requesting VM, in this case VM01, to its VXLAN mapping table.

 

packet capture ARP decoded.png

 

If we re-run the constant ping from VM01 to VM02 we can log into each ESXi host and view its VXLAN mapping of VM MAC addresses. In this case we enable SSH on each host, log into the host and run the command:

 

esxcli network vswitch dvs vmware vxlan network mapping list --vds-name=dvSwitch --vxlan-id=5000

 

Where dvSwitch is our vNetwork Distributed Switch name and 5000 is the VNI of our VXLAN. This will then show the mappings for our two VMs, VM01 and VM02. The inner MAC is the MAC address of the VM, and both the outer MAC and outer IP are those of the recipient ESXi host's vmk1 VMkernel interface. If we were doing VXLAN over a routed network, the outer IP would still be that of the recipient ESXi host but the outer MAC would be that of the next-hop PIM router. I will cover this in a later blog post on VXLAN over a pure layer 3 network using PIM sparse mode.

 

VXLAN mappings for VNI 5000

 

show vxlan mapping.png

 

As you can see, the destination for VM01 on host ESXi1 matches the MAC and IP address of that host's VMkernel interface vmk1.

vxlan vmk1 interface.png

Another thing you can do is view the VNI to multicast group mapping from the ESXi CLI using the command:

 

esxcli network vswitch dvs vmware vxlan network list --vds-name=dvSwitch --vxlan-id=5000

 

There are more combinations of the "esxcli network vswitch dvs vmware vxlan" command than I've played with or could cover in this blog post (see the note after the screenshot below).

show vxlans and mcast groups via ssh.png
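
One easy way to explore the rest of this namespace is simply to run it without a sub-command; esxcli then lists the available sub-commands and namespaces (standard esxcli behaviour, nothing VXLAN specific):

esxcli network vswitch dvs vmware vxlan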

That's the end of this blog post. As this is my first lab blog post I'd really appreciate any constructive feedback, as I hope to add more blog posts as I lab things up at home, depending on free time.

 

VXLAN Lab setup as per Figure 1.0 below:

Computer with 16GB RAM, quad-core i5 CPU and solid state disk.

VMware Workstation version 8

One VM for vCenter, one VM for shared storage and two ESXi VMs with virtualised VT-x enabled. vShield Manager as a VM on the virtual ESXi hosts.

 

Figure1.0

vxlan blog 1.gif

 

I hope to create a blog post for each of the below in the coming months:

 

          VXLAN over Layer 3 using PIM-SM

          vCNS Edge SSL VPN.

          vCNS Edge firewall and CLI/debugging

          and hopefully more.....

 

Thanks for reading.

Kevin Barrass


Welcome to my Blog

Posted by kjbarrass Nov 21, 2012

I've only just set up this blog, but I hope over the coming weeks and months to upload how to build VXLAN/vCNS and vCloud labs on VMware Workstation. I will also put up examples of how to do packet captures to get a better understanding of how VXLAN works.

 

Kev Barrass