VMware Networking Community
Bolw
Contributor

Help with multi-tenant configuration

I want to learn how network overlay works in an NSX environment, so I set up a lab with the following settings:

GW: 10.1.1.254, 10.1.2.254, 10.1.3.254

vCenter: 10.1.3.1

NSX: 10.1.3.2

NSX Controller: 10.1.2.100

(Cluster1)

  - ESX1: 10.1.1.1

    - Guest1: 192.168.1.1

  - ESX2: 10.1.1.2

    - Guest2: 192.168.1.2

(Cluster2)

  - ESX3: 10.1.2.1

    - Guest3: 192.168.1.3

Then I did the following:

1. Configured VXLAN on both clusters.

2. Created a transport zone (unicast) and added both clusters to it.

3. Created a logical switch in that transport zone and added all 3 guest machines to it.

4. Tried to ping between the guest machines, but none of the pings succeeded.

When Guest1 pings Guest2, I can capture the VXLAN packets on ESX1. The weird thing is that the destination IP address is not ESX2's. Instead, it is 0.0.0.1, an address that appears nowhere in my settings. I thought that if Guest1 pings Guest2, the destination IP of the VXLAN packet should be ESX2, and if Guest1 pings Guest3, it should be ESX3. So what's wrong in my lab setup? Thanks for your help.
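As background for the expectation above: in VXLAN as specified in RFC 7348, the guest's Ethernet frame is wrapped in an 8-byte VXLAN header and carried in an outer UDP/IP packet addressed to the remote VTEP. The following is a minimal Python sketch of that header layout only. The function names are illustrative and are not part of NSX or ESXi.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag in RFC 7348: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI (segment ID)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Byte 0: flags, bytes 1-3: reserved, bytes 4-6: VNI, byte 7: reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = build_vxlan_header(5000)  # 5000 is the segment ID used in this lab
print(header.hex())                # 0800000000138800
print(parse_vni(header))           # 5000
```

The outer destination IP is not part of this header. It is chosen by the sending host from whatever VTEP information it has learned, which is why an unexpected outer address points at the control plane rather than at the guests.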

24 Replies
kumarlakshman_k
Enthusiast

Hello,

From a VXLAN perspective, you need to check the following configuration:

1. Check that your segment IDs and multicast addresses are correct.

2. Check that the VTEPs you created can ping each other (a VTEP is a new vmkernel port created on each ESX host to encapsulate and decapsulate VXLAN packets).

3. Check that each VM's virtual NIC is assigned to the virtual network of the VXLAN you created.

Thanks & Regards,

Lakshman,VCP550

admin
Immortal

Hello,

What IP addresses and what VLAN ID did you give to your VTEPs when you configured VXLAN?

See the screenshot from a nested lab (VLAN ID = 0). The IP addresses you see under "VMKNic IP Addressing" are the ones I'm talking about. These are the VTEPs, and all VXLAN traffic from the hosts will originate from and terminate at these IPs.

Screen Shot 2014-07-21 at 10.21.32 AM.png

Bolw
Contributor

A screenshot of my configuration is attached. The IP addresses look OK to me, so it's weird to see packets with IP address 0.0.0.1.

vxlan.png

Bolw
Contributor

kumarlakshman_kumar wrote:

> Hello,

> From a VXLAN perspective, you need to check the following configuration:

> 1. Check that your segment IDs and multicast addresses are correct.

> 2. Check that the VTEPs you created can ping each other (a VTEP is a new vmkernel port created on each ESX host to encapsulate and decapsulate VXLAN packets).

> 3. Check that each VM's virtual NIC is assigned to the virtual network of the VXLAN you created.

> Thanks & Regards,

> Lakshman,VCP550

1. My segment ID pool is 5000-5200 and the multicast address range is 239.0.0.0-239.255.255.255. Since I use a unicast transport zone, I guess the multicast address setting is not used?

2. Yes, those addresses are reachable from all machines in my lab except the guest virtual machines.

3. I think this is configured automatically by vCenter (or NSX?). In my lab, the guests are assigned to a port group named vxw-dvs-15-virtualwire-2-sid-5000-LSW1 on the DS.

kumarlakshman_k
Enthusiast

Hello Bolw,

The config looks fine to me. I have not worked on NSX, but I have worked on VXLAN using vShield (Network Virtualization).

Can you run this command on the ESX host where the VM is running while you issue a ping?

'pktcap-uw --capture Dynamic -f UplinkDevTransmit --vxlan '5000/5001...'' |grep TSO ==> The --vxlan argument value should be the segment ID used for this group. As this is the first logical switch you have created, it should be 5000.

1. Ping between the VMs on ESX hosts in the same cluster.

2. Run this command on the ESX hosts in the same cluster: 'net-dvs -l|grep "vxlan\|port"' ==> Check that the property '...vxlan.id' matches on both ESX hosts.

Thanks & Regards,

Lakshman,VCP550

Bolw
Contributor

> Can you run this command on the ESX host where the VM is running while you issue a ping?

> 'pktcap-uw --capture Dynamic -f UplinkDevTransmit --vxlan '5000/5001...'' |grep TSO ==> The --vxlan argument value should be the segment ID used for this group. As this is the first logical switch you have created, it should be 5000.

> 1. Ping between the VMs on ESX hosts in the same cluster.

> 2. Run this command on the ESX hosts in the same cluster: 'net-dvs -l|grep "vxlan\|port"' ==> Check that the property '...vxlan.id' matches on both ESX hosts.

> Thanks & Regards,

> Lakshman,VCP550

the output of pktcap-uw is

07:04:29.131064[1] Captured at Dynamic point, TSO not enabled, Checksum not offloaded and not verified, Vxlan 5000 but not encapsulated, length 110.

and the output of net-dvs -l|grep "vxlan" is the same on both ESX servers:

                com.vmware.net.vxlan.udpport = 0x21.18

                com.vmware.net.vxlan.vmknic = 0x 1

                com.vmware.net.vxlan.cp = 0x 0. 0. 0. 1

                com.vmware.net.vxlan.id = 0x 0. 0.13.88

                com.vmware.net.vxlan.mcastip = 0x 0. 0. 0. 1

The IP 0.0.0.1 is what I saw in the VXLAN packet. Where are these fields mapped in the UI settings?
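The net-dvs values above are printed as dot-separated hexadecimal bytes. A small Python sketch for reading them follows, assuming each group is one byte and that multi-byte numbers are big-endian. That byte order is an assumption, although it is consistent with vxlan.id decoding to the segment ID 5000 used in this lab. The helper names are hypothetical.

```python
def decode_net_dvs_bytes(value: str) -> bytes:
    """Turn a net-dvs value such as '0x 0. 0.13.88' into raw bytes."""
    text = value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    # Each dot-separated group is one hex byte, possibly space-padded.
    return bytes(int(part.strip(), 16) for part in text.split("."))


def as_int(value: str) -> int:
    """Interpret the bytes as a big-endian unsigned integer (assumed order)."""
    return int.from_bytes(decode_net_dvs_bytes(value), "big")


def as_ipv4(value: str) -> str:
    """Render four bytes in dotted-decimal form."""
    return ".".join(str(b) for b in decode_net_dvs_bytes(value))


print(as_int("0x21.18"))         # 8472  (vxlan.udpport)
print(as_int("0x 0. 0.13.88"))   # 5000  (vxlan.id)
print(as_ipv4("0x 0. 0. 0. 1"))  # 0.0.0.1  (vxlan.cp and vxlan.mcastip)
```

Read this way, vxlan.id matches the segment ID pool start, and the 0.0.0.1 seen in the capture is the same value stored in the vxlan.mcastip property.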

kumarlakshman_k
Enthusiast

Hello,

The IP 0.0.0.1 that you are seeing is the multicast IP setting (com.vmware.net.vxlan.mcastip in your output).

You can't see all of these fields in the UI.

I don't have a VXLAN setup at the moment. I will try in 1 or 2 days and update you.

Meanwhile, you can check a few things:

1. vxlan.id is the same on both ESX machines.

2. Try debugging between VMs in the same cluster first.

3. The GOS is able to ping the VTEPs (vmkernel ports created for VXLAN) of both ESX hosts. Again, try this within the same cluster.

Thanks & Regards,

Lakshman,VCP550.

Bolw
Contributor

kumarlakshman_kumar wrote:

> Hello,

> The IP 0.0.0.1 that you are seeing is the multicast IP setting (com.vmware.net.vxlan.mcastip in your output).

> You can't see all of these fields in the UI.

> I don't have a VXLAN setup at the moment. I will try in 1 or 2 days and update you.

> Meanwhile, you can check a few things:

> 1. vxlan.id is the same on both ESX machines.

> 2. Try debugging between VMs in the same cluster first.

> 3. The GOS is able to ping the VTEPs (vmkernel ports created for VXLAN) of both ESX hosts. Again, try this within the same cluster.

> Thanks & Regards,

> Lakshman,VCP550.

Why is multicast used if I use a unicast transport zone and logical switch?

1. vxlan.id is the same on both ESX hosts.

2. Yes, that's my current scenario.

3. I don't understand this. The GOS and the VTEPs are in different subnets. Why should the GOS be able to ping the VTEPs?

Thanks.

admin
Immortal

I see that you have only one Controller, which is an unsupported (and potentially very unstable) configuration. Please confirm that this is the case.

To do some further troubleshooting, could you please do the following for me:

1) On your controller, please run the following commands:

show control-cluster logical-switches vni 5000

show control-cluster logical-switches connection-table 5000

show control-cluster logical-switches vtep-table 5000

show control-cluster logical-switches mac-table 5000

If you have only one Controller and the output of the above commands is empty, we have a problem.

2) On your hosts, please run the following:

esxcli network vswitch dvs vmware vxlan network list --vds-name DSwitch

This command's output will confirm whether the control plane connection to the Controller is up.

kumarlakshman_k
Enthusiast

Hello Bolw,

I don't know whether you have tried it, but there is http://labs.hol.vmware.com. It has a couple of NSX labs with a detailed explanation of each component, which should answer all your questions.

Thanks & Regards,

Lakshman,VCP550

Bolw
Contributor

DmitriK wrote:

> I see that you have only one Controller, which is an unsupported (and potentially very unstable) configuration. Please confirm that this is the case.

The NSX installation guide says "VMware recommends that you add 3 or 5 controllers for scale and redundancy." Could you explain why 1 controller is unsupported if I don't care about redundancy?

> 1) On your controller, please run the following commands:

> show control-cluster logical-switches vni 5000

> show control-cluster logical-switches connection-table 5000

> show control-cluster logical-switches vtep-table 5000

> show control-cluster logical-switches mac-table 5000

> If you have only one Controller and the output of the above commands is empty, we have a problem.

All output is empty except for the 1st command. The output of the 1st command is:

VNI   Controller  BUM-Replication  ARP-Proxy  Connections  VTEPs
5000  10.1.2.101  Enabled          Enabled    0            0

> 2) On your hosts, please run the following:

> esxcli network vswitch dvs vmware vxlan network list --vds-name DSwitch

> This command's output will confirm whether control plane connection to the Controller is up or not.

VXLAN ID  Multicast IP               Control Plane  Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
--------  -------------------------  -------------  ---------------------  ----------  ---------------  ---------------
    5000  N/A (headend replication)  Enabled ()     0.0.0.0 (down)                  1                0                0
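When checking several hosts, the Controller Connection column can be pulled out of this output with a short script. The sketch below is in Python and assumes the seven-column layout shown in the output above, with columns separated by two or more spaces. Other NSX versions may print a different layout, and the function name is hypothetical.

```python
import re


def parse_vxlan_network_row(row: str) -> dict:
    """Parse one data row of the 'vxlan network list' table shown above."""
    # Columns are separated by runs of two or more spaces; single spaces
    # occur inside values such as "N/A (headend replication)".
    fields = re.split(r"\s{2,}", row.strip())
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    vni, mcast, control_plane, connection, ports, macs, arps = fields
    match = re.fullmatch(r"(\S+)\s+\((\w+)\)", connection)
    if match is None:
        raise ValueError(f"unexpected Controller Connection value: {connection!r}")
    return {
        "vni": int(vni),
        "multicast_ip": mcast,
        "control_plane": control_plane,
        "controller_ip": match.group(1),
        "controller_state": match.group(2).lower(),
        "port_count": int(ports),
        "mac_entries": int(macs),
        "arp_entries": int(arps),
    }


# Data row copied from the output in this post (column padding shortened).
ROW = "       5000  N/A (headend replication)  Enabled ()       0.0.0.0 (down)       1       0       0"

info = parse_vxlan_network_row(ROW)
if info["controller_state"] != "up":
    print(f"VNI {info['vni']}: controller connection is "
          f"{info['controller_state']} ({info['controller_ip']})")
```

For the row above this reports that VNI 5000 has a controller connection in state "down" with address 0.0.0.0, meaning the host has no usable Controller address for this logical switch.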

Bolw
Contributor

kumarlakshman_kumar wrote:

> Hello Bolw,

> I don't know whether you have tried it, but there is http://labs.hol.vmware.com. It has a couple of NSX labs with a detailed explanation of each component, which should answer all your questions.

> Thanks & Regards,

> Lakshman,VCP550

I tried it before, but the version of NSX there seems to be old. The UI is completely different.

admin
Immortal

> Could you explain why 1 controller is unsupported if I don't care the redundancy?

Because if you do something (like create a Logical Switch with Unicast control plane) while no Controllers are available, you'll end up with a non-working Logical Switch.

Having more than one Controller should help avoid this, unless you get into a situation where all of them are offline or disconnected at the same time.

Back to your problem: your hosts aren't talking to your Controller, and don't seem to be aware of the Controller's IP address either.

This means that your host(s) potentially aren't talking to the NSX Manager. Let's check it. Run the following on your ESXi hosts:

esxcfg-advcfg -g /UserVars/RmqIpAddress <= this should come back with your NSX Manager's IP address. If it doesn't, please fix IP connectivity between your ESXi host and your NSX Manager, then reboot the host.

If IP address returned by the previous command is correct, and your host can reach NSX Manager, run this:

esxcli network ip connection list | grep 5671

This should display one or more established connections between your host and NSX Manager. If there are connections, and they are ESTABLISHED, check /var/log/vsfwd.log for error messages.

Let us know what you see, and we'll take it from there.

admin
Immortal

One more thing:

> The output of the 1st command is

> VNI   Controller  BUM-Replication  ARP-Proxy  Connections  VTEPs
> 5000  10.1.2.101  Enabled          Enabled    0            0

but from your first post:

> GW: 10.1.1.254, 10.1.2.254, 10.1.3.254

> vCenter: 10.1.3.1

> NSX: 10.1.3.2

> NSX Controller: 10.1.2.100

What's the story here? Did you delete the original controller, and then make a new one?

Bolw
Contributor

> Because if you do something (like create a Logical Switch with Unicast control plane) while no Controllers are available, you'll end up with a non-working Logical Switch.

> Having more than one Controller should help avoid this, unless you get into a situation where all of them are offline or disconnected at the same time.

Does this mean I have to set up another controller to continue troubleshooting? I hope not, because my disk may not be large enough.

> esxcfg-advcfg -g /UserVars/RmqIpAddress <= this should come back with your NSX Manager's IP address. If it doesn't, please fix IP connectivity between your ESXi host and your NSX Manager, then reboot the host.

> If IP address returned by the previous command is correct, and your host can reach NSX Manager, run this:

> esxcli network ip connection list | grep 5671

> This should display one or more established connections between your host and NSX Manager. If there are connections, and they are ESTABLISHED, check /var/log/vsfwd.log for error messages.

> Let us know what you see, and we'll take it from there.

The first command is OK: the NSX Manager's IP is correct and reachable.

For the 2nd command, all ESX hosts return something like this:

  tcp         0       0  10.1.1.1:35328                  10.1.3.2:5671    ESTABLISHED   1959585  newreno  vsfwd

  tcp         0       0  10.1.1.1:55695                  10.1.3.2:5671    ESTABLISHED   1959585  newreno  vsfwd
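To repeat this check across hosts, the grep output can be counted with a few lines of Python. The sketch below assumes the column order shown in the two lines above (protocol, queues, local address, foreign address, state, and so on). The function name and the default port argument are illustrative only.

```python
def established_to_manager(output: str, manager_ip: str, port: int = 5671) -> int:
    """Count ESTABLISHED connections whose foreign address is manager_ip:port."""
    target = f"{manager_ip}:{port}"
    count = 0
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) >= 6 and fields[4] == target and fields[5] == "ESTABLISHED":
            count += 1
    return count


# Lines copied from the ESX1 output in this post.
SAMPLE = """\
  tcp         0       0  10.1.1.1:35328                  10.1.3.2:5671    ESTABLISHED   1959585  newreno  vsfwd
  tcp         0       0  10.1.1.1:55695                  10.1.3.2:5671    ESTABLISHED   1959585  newreno  vsfwd
"""

print(established_to_manager(SAMPLE, "10.1.3.2"))  # 2
```

A count of zero would suggest that the host has no session to the NSX Manager on port 5671, which is not the case here.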

As for vsfwd.log, ESX1 does not have this file. ESX2 and ESX3 have it, and it is full of messages like these:

2014-07-24T06:41:27Z vsfwd: [DEBUG] (VSIPFW) No flow records for filter

2014-07-24T06:41:27Z vsfwd: [DEBUG] (VSIPFW) Ipfix not enabled, skip sending records

Bolw
Contributor

DmitriK wrote:

> One more thing:

> The output of the 1st command is

> VNI   Controller  BUM-Replication  ARP-Proxy  Connections  VTEPs
> 5000  10.1.2.101  Enabled          Enabled    0            0

> but from your first post:

> GW: 10.1.1.254, 10.1.2.254, 10.1.3.254

> vCenter: 10.1.3.1

> NSX: 10.1.3.2

> NSX Controller: 10.1.2.100

> What's the story here? Did you delete the original controller, and then make a new one?

Actually, I was not aware of this change. The controller and one of the ESX hosts use the same IP pool. Is it possible that the assigned IP changes depending on boot timing? If not, maybe it was my typo. Will this be a concern?

10.1.2.100 is now used by ESX3 as its VMKNic IP address.

admin
Immortal

> does this mean I have to set up another controller to proceed the troubleshooting?

No, you can continue with one.

> ESX1 does not have this file

Can you try to reboot all three hosts, and see if:

- all hosts have /var/log/vsfwd.log and it's updating, and

- "esxcli network vswitch dvs vmware vxlan network list --vds-name DSwitch" would show Controller Connection as Up?

If not, we'll troubleshoot further.

admin
Immortal

> will this be a concern?

Not much of a concern, as long as hosts expect to find the Controller on its correct IP address.

The IP address of all Controllers is sent from NSX Manager to the vsfwd process on each host, and is saved in the file /etc/vmware/netcpa/config-by-vsm.xml (you should be able to see it easily).

Then, on each host, a process called netcpa takes the IP(s) of the Controller(s) from that file and tries to connect to them for the control plane connection. This is what appears not to be happening for you. netcpa writes to the log file /var/log/netcpa.log, where you should be able to see the host attempting to connect to the Controllers, and errors if it can't.
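The chain described in this thread (NSX Manager, then vsfwd on the host, then the config-by-vsm.xml file, then netcpa connecting to the Controller) suggests an order in which to check things. The Python sketch below only summarizes the advice given in these posts. It is not an official procedure, the type and function names are hypothetical, and the sample state at the end is invented for illustration rather than taken from the lab.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostState:
    rmq_ip: str                     # value of /UserVars/RmqIpAddress
    manager_ip: str                 # expected NSX Manager address
    established_5671: int           # ESTABLISHED connections to the manager on 5671
    controller_ips: List[str]       # addresses found in config-by-vsm.xml
    controller_connection_up: bool  # from "vxlan network list"


def next_check(state: HostState) -> str:
    """Return the next thing to look at, following the order used in the thread."""
    if state.rmq_ip != state.manager_ip:
        return "fix IP connectivity to NSX Manager, then reboot the host"
    if state.established_5671 == 0:
        return "no connection to NSX Manager on 5671: check reachability"
    if not state.controller_ips:
        return "no Controller IPs received: check /var/log/vsfwd.log"
    if not state.controller_connection_up:
        return "Controller known but not connected: check /var/log/netcpa.log"
    return "control plane connection looks healthy"


# Hypothetical host: manager session is fine, Controller connection is down.
example = HostState(
    rmq_ip="10.1.3.2",
    manager_ip="10.1.3.2",
    established_5671=2,
    controller_ips=["10.1.2.101"],
    controller_connection_up=False,
)
print(next_check(example))
```

Each branch maps to one step already mentioned above, so the sketch is mainly useful as a checklist when several hosts need to be compared.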

Bolw
Contributor

> Can you try to reboot all three hosts, and see if:

> - all hosts have /var/log/vsfwd.log and it's updating, and

> - "esxcli network vswitch dvs vmware vxlan network list --vds-name DSwitch" would show Controller Connection as Up?

After rebooting all ESX hosts, everything works. All guest virtual machines can ping each other. How could this happen? Anyway, I really appreciate your great help. I learned a lot through this troubleshooting process.
