aaronwsmith
Enthusiast

VXLAN - Reduce VTEP Count?

Hi Everyone,

This is a lab setup running vSphere on 3x Dell R610 servers with 4 physical NICs each. The VDS connected to this cluster has 4 uplinks. I then installed NSX-v 6.2.2 and configured VXLAN. Since the VDS initially had 4 uplinks, configuring VXLAN on the cluster created 4 VTEPs (vmk ports) per host. This appears to be expected per the installation guide: "The number of VTEPs is not editable in the UI. The VTEP number is set to match the number of dvUplinks on the vSphere distributed switch being prepared." (http://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_install.pdf), PDF page 63.

Initially everything was good. However, I then started building out a nested ESXi lab with the intention of running NSX in that environment as well, and I quickly ran into the problems described in the following 2 blogs:

http://vlenzker.net/2016/04/nsx-and-nested-esxi-environments-caveats-layer-2-troubleshooting/

https://telecomoccasionally.wordpress.com/2016/03/10/from-the-dept-of-the-knowledge-arcane-nsx-v-wit...

So I pulled a vmnic out of the now VXLAN-enabled VDS, updated all the port groups and VXLAN virtual wires to exclude the last uplink, and then successfully changed the uplink count from 4 -> 3. I then manually deleted one vmk port (from the vxlan TCP/IP stack) on each ESXi host so everything was consistent.

Next, using the NSX API guide at https://pubs.vmware.com/NSX-6/topic/com.vmware.ICbase/PDF/nsx_604_api.pdf (the GUI doesn't seem to expose some of the VXLAN settings), I did the following:

1. API call to update the VXLAN configuration for the VDS to exclude uplink 4: https://<nsx-manager>/api/2.0/vdn/switches. Reference: example 6.1 in the API guide, or essentially the steps in the KB Changing the VXLAN teaming policy and MTU settings in VMware vCloud Networking and Security 5.5.x an...

Changed (removed the "Uplink 4" row):

    <uplinkPortName>Uplink 3</uplinkPortName>
    <uplinkPortName>Uplink 2</uplinkPortName>
    <uplinkPortName>Uplink 1</uplinkPortName>

2. API call to change the vmkernel port count from 4 -> 3: https://<nsx-manager>/api/2.0/nwfabric/configure. Reference: example 5.2 in the API guide.

Changed (4 --> 3):

    <vmknicCount>3</vmknicCount>
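Both changes boil down to editing one kind of element in an XML body before sending it back. As a minimal sketch (Python standard library only), assuming you already have the request body as a string and that it carries no XML namespace, the two edits could look like the following. The function names are hypothetical, and the sketch only rewrites XML; it does not talk to NSX Manager.

```python
import xml.etree.ElementTree as ET


def drop_uplink(vds_context_xml: str, uplink_name: str) -> str:
    """Step 1: remove every <uplinkPortName> whose text equals uplink_name.

    Assumption: no XML namespace. The parent element of <uplinkPortName>
    is not assumed; every element in the tree is checked.
    """
    root = ET.fromstring(vds_context_xml)
    # ElementTree has no parent pointers, so snapshot the tree first and
    # then inspect each element's direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "uplinkPortName" and (child.text or "").strip() == uplink_name:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def set_vmknic_count(fabric_xml: str, count: int) -> str:
    """Step 2: set every <vmknicCount> element to count."""
    root = ET.fromstring(fabric_xml)
    elements = list(root.iter("vmknicCount"))
    if not elements:
        raise ValueError("payload has no <vmknicCount> element")
    for element in elements:
        element.text = str(count)
    return ET.tostring(root, encoding="unicode")
```

For the changes described above, that would be drop_uplink(body, "Uplink 4") for step 1 and set_vmknic_count(body, 3) for step 2.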

Both API calls returned HTTP 200. For step 1, I can query the VXLAN VDS configuration and verify that the uplinks are 1 - 3. For step 2, however, I can't find a way to query NSX Manager to confirm the change took effect.
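The step 1 check can be scripted by comparing the <uplinkPortName> values in the query response with the expected set. A minimal sketch, assuming the response body is already available as a string, describes a single VDS, and uses no XML namespace (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET


def uplinks_match(vds_context_xml: str, expected: set) -> bool:
    """Return True if the <uplinkPortName> values in a VDS context body
    are exactly the expected names (order does not matter)."""
    root = ET.fromstring(vds_context_xml)
    # Collect every <uplinkPortName>, wherever it sits in the tree.
    names = {(e.text or "").strip() for e in root.iter("uplinkPortName")}
    return names == expected


# Hypothetical usage for the configuration described above:
# uplinks_match(body, {"Uplink 1", "Uplink 2", "Uplink 3"})
```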

In NSX via the Web Client, I still see VTEP = 4 under Installation -> Logical Network Preparation -> VXLAN Transport for the cluster in question. See the attached picture.

I have *NOT* yet made API calls to release the pool IPs held by the VTEPs I manually deleted.

I also tried restarting NSX Manager; no change in the GUI.

Basically, I want to make sure that if I rebuild or add a host to the cluster, NSX Manager will configure VXLAN on it correctly. This has also been a great learning experience with the NSX REST APIs 🙂

Thanks for any suggestions or help on correcting this! I imagine this could be a legitimate situation in production as well, if you ever needed to pull a vmnic / uplink out of an NSX-enabled VDS where unconfiguring VXLAN on the cluster isn't an option.

I'm also including additional pictures of the 2 API calls I mentioned.

aaronwsmith
Enthusiast

As an update, I made the following API call to NSX Manager to query the IP addresses allocated from the NSX IP pool where the VTEPs were originally created:

GET https://<nsx-manager>/api/2.0/services/ipam/pools/ipaddresspool-2/ipaddresses

In my case, ipaddresspool-2 is the ID of the NSX IP pool. I found the following 3 allocated IPs, corresponding to the vmk ports / VTEPs I manually deleted:

    <allocatedIpAddress>
        <id>23</id>
        <ipAddress>192.168.2.209</ipAddress>
        <gateway>192.168.2.1</gateway>
        <prefixLength>24</prefixLength>
        <dnsServer1>192.168.3.10</dnsServer1>
        <dnsServer2>192.168.3.11</dnsServer2>
        <dnsSuffix>helios.local</dnsSuffix>
        <subnetId>subnet-2</subnetId>
    </allocatedIpAddress>
    <allocatedIpAddress>
        <id>24</id>
        <ipAddress>192.168.2.210</ipAddress>
        <gateway>192.168.2.1</gateway>
        <prefixLength>24</prefixLength>
        <dnsServer1>192.168.3.10</dnsServer1>
        <dnsServer2>192.168.3.11</dnsServer2>
        <dnsSuffix>helios.local</dnsSuffix>
        <subnetId>subnet-2</subnetId>
    </allocatedIpAddress>
    <allocatedIpAddress>
        <id>25</id>
        <ipAddress>192.168.2.211</ipAddress>
        <gateway>192.168.2.1</gateway>
        <prefixLength>24</prefixLength>
        <dnsServer1>192.168.3.10</dnsServer1>
        <dnsServer2>192.168.3.11</dnsServer2>
        <dnsSuffix>helios.local</dnsSuffix>
        <subnetId>subnet-2</subnetId>
    </allocatedIpAddress>
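Picking the matching <id> values out of that response by hand gets tedious once more than a few VTEPs are involved. A minimal sketch, assuming the GET response is available as a string with no XML namespace; it looks at every <allocatedIpAddress> element without assuming the name of the root element, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def ids_for_ips(allocations_xml: str, stale_ips: set) -> dict:
    """Map each stale IP to the <id> of its <allocatedIpAddress> entry.

    IPs in stale_ips that are not allocated in the pool are simply absent
    from the result.
    """
    root = ET.fromstring(allocations_xml)
    found = {}
    for entry in root.iter("allocatedIpAddress"):
        ip = entry.findtext("ipAddress", default="").strip()
        if ip in stale_ips:
            found[ip] = entry.findtext("id", default="").strip()
    return found


# For the three entries shown above:
# ids_for_ips(body, {"192.168.2.209", "192.168.2.210", "192.168.2.211"})
# -> {"192.168.2.209": "23", "192.168.2.210": "24", "192.168.2.211": "25"}
```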

I then used the <id> values to try to delete / release those IPs:

DELETE https://<nsx-manager>/api/2.0/services/ipam/pools/ipaddresspool-2/ipaddresses/23

DELETE https://<nsx-manager>/api/2.0/services/ipam/pools/ipaddresspool-2/ipaddresses/24

DELETE https://<nsx-manager>/api/2.0/services/ipam/pools/ipaddresspool-2/ipaddresses/25
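The DELETE calls above share one URL pattern, so they can be prepared in a loop. A minimal sketch using urllib.request that builds, but does not send, the requests; HTTP Basic authentication and the function and parameter names are assumptions. The last path segment is whatever you pass in: the calls above used the <id> value, and it may be worth checking in the API guide whether that position expects the <id> or the allocated <ipAddress> instead.

```python
import base64
import urllib.request


def build_release_requests(manager, pool_id, keys, user, password):
    """Build (but do not send) one DELETE request per allocated address.

    URL pattern taken from the calls above:
    https://<nsx-manager>/api/2.0/services/ipam/pools/<pool-id>/ipaddresses/<key>
    Basic authentication is an assumption, not something stated in the post.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    prepared = []
    for key in keys:
        url = (f"https://{manager}/api/2.0/services/ipam/pools/"
               f"{pool_id}/ipaddresses/{key}")
        prepared.append(urllib.request.Request(
            url,
            method="DELETE",
            headers={"Authorization": f"Basic {token}"},
        ))
    return prepared
```

Each request could then be sent with urllib.request.urlopen(req); a lab NSX Manager with a self-signed certificate would additionally need an ssl.SSLContext that trusts it.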

In all three cases, HTTP/200 is returned, but the XML payload suggests nothing happened:

<?xml version="1.0" encoding="UTF-8"?>
<boolean>false</boolean>
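Because the call returns HTTP 200 even when nothing is released, a script has to look at the body rather than the status code. A minimal sketch that treats only <boolean>true</boolean> as a successful release; the function name is hypothetical, and that a successful release returns true is an assumption:

```python
import xml.etree.ElementTree as ET


def release_succeeded(body: str) -> bool:
    """Interpret the XML body of a DELETE .../ipaddresses/<key> call.

    Assumption: success is reported as <boolean>true</boolean>; the
    failure seen above was <boolean>false</boolean> despite HTTP 200.
    """
    # Encode first so the UTF-8 XML declaration is parsed from bytes.
    root = ET.fromstring(body.encode("utf-8"))
    return root.tag == "boolean" and (root.text or "").strip().lower() == "true"


response = '<?xml version="1.0" encoding="UTF-8"?>\n<boolean>false</boolean>'
print(release_succeeded(response))  # False
```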

And when I query the pool's allocated IPs again, those entries remain in the list, confirming that nothing happened. Perhaps NSX Manager is still aware of the vmk ports somewhere, and that's why it refuses to release the IPs from the pool? I haven't yet found an API call to query the VTEPs created by NSX Manager. If anyone knows about this piece, it might be the key to why the GUI still shows 4 VTEPs. Thanks!
