VMware Networking Community
Dryv
Enthusiast

VMSC: NSX/VXLAN/VTEP Configuration

Hi Community

I am trying to get my head around what is currently a paper-based exercise before I take this into my lab. My problem is that my brain isn't getting past what seem like some basic network principles... it's just not registering. So here's my scenario:

- I want to build a VMSC (vSphere Metro Storage Cluster) using NSX

I have:

- 4 Servers in Site A

- 4 Servers in Site B

- Storage that can be configured as per VMSC requirements across both sites, in a uniform deployment configuration

I can achieve the following:

- Deploy ESXi 6.5 across all 8 Servers

- Site A ESXi servers (Management) can go on VLAN 10 (10.1.10.x)

- Site A ESXi servers (vMotion) can go on VLAN 11 (10.1.11.x)

- Site B ESXi servers (Management) can go on VLAN 20 (10.2.10.x)

- Site B ESXi servers (vMotion) can go on VLAN 21 (10.2.11.x)

- All the above subnets can be reached from each other

- Configure the storage so it's stretched across both sites and presented to all ESXi servers as per VMSC requirements

- Deploy a single VCSA 6.5 and all the NSX components on Site A (Preferred Site)

- Add all 8 ESXi servers into the above VCSA and into the same HA Cluster providing the basis of the VMSC

Now, I just don't know where to go next... The whole purpose is to be able to build L2 stretched networks over L3 networks across sites for the VMs that will be hosted. This will give me mobility of VMs across both sites and also allow HA to kick in if a site fails. My issue is that I can't make sense of the VXLAN and VTEP configuration across the hosts in the two sites... I think I can get to the point of deploying the NSX components and preparing hosts for NSX, but the next steps aren't clear to me. Lots of questions are filling my head, like:

- What IP addresses will the VTEP VMkernel ports have across the hosts?

- Will they need to be part of the same VLANs/Subnets?

- Do I tag the VLAN ID down to these VTEP interfaces like I do on the Management/vMotion port groups?

- The same VLANs/subnets aren't available in both sites for the VTEP interfaces, so what are my options?

- I can be given site specific VLANs/IP Subnets, but is this enough from which the VXLAN subnets can be stretched?

- If I successfully manage to create the stretched VM networks, how will the physical world route over and reach the VMs that will be on the VXLANs?

and many, many more... Can someone help me understand, in a simple fashion, how this will hang together? NSX Link-O-Rama is great, but my head is just getting filled with more and more questions. The VMware HOLs are also brilliant, but they seem tailored to a single site.

Dryv

Accepted Solution
smitmartijn
VMware Employee

Hi Dryv,

You're on the right track and have your bases covered. The next step is to overlay NSX on top of this VMSC and let VXLAN handle the L2 stretching. So you've got the ESXi hosts prepared for NSX and squared away, good. Before I continue: you haven't mentioned NSX controllers, and these appliances are needed to make the next steps work. That's as simple as deploying 3 NSX controllers from the Installation section: Deploy NSX Controller Cluster

The next step is to define a VXLAN segment ID pool (Assign a Segment ID Pool and Multicast Address Range) and configure VXLAN on your cluster. At this point, your first questions come into play. The VTEPs that will be created need an IP address, which can come from two places: an NSX IP Pool or DHCP. Currently, you can only choose a single IP Pool for a single VMSC. This means that if you go for an IP Pool, the VTEP IP addresses of all ESXi hosts will be in the same subnet.
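For a concrete picture of what the segment ID pool feeds: each logical switch gets one segment ID, which becomes the 24-bit VXLAN Network Identifier (VNI) carried in the 8-byte VXLAN header described in RFC 7348. The sketch below packs and parses that header; the pool range 5000-5999 is a hypothetical example, not a value from this thread, and it illustrates the wire format only, not how NSX is configured.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)
MAX_VNI = (1 << 24) - 1      # the VNI is a 24-bit field

# Hypothetical segment ID pool, for illustration only.
SEGMENT_ID_POOL = range(5000, 6000)


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    word1 = VXLAN_FLAG_VNI_VALID << 24  # flags in the top byte, rest reserved
    word2 = vni << 8                    # VNI in the top 3 bytes, last byte reserved
    return struct.pack("!II", word1, word2)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word2 >> 8


vni = SEGMENT_ID_POOL[0]  # e.g. the first logical switch gets the first segment ID
header = pack_vxlan_header(vni)
print(header.hex(), unpack_vni(header))
```

Running it prints `0800000000138800 5000`. The physical network never interprets this header; it only sees UDP between VTEPs.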

Another way is to choose DHCP and let a DHCP server per location hand out different subnets. So the IP subnet for location A will be 10.1.12.0/24 and for location B it will be 10.2.12.0/24. There's a "trick" to this though, as you can only supply one VLAN ID when you prepare a cluster for VXLAN. If you go for different subnets (which I recommend, because you then don't have to stretch the VXLAN VLAN and have pure L3 between locations), you have the same VLAN ID in both locations, but with different IP subnets and DHCP servers. That needs to be clearly documented to prevent confusion.
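The per-site DHCP "trick" can be sketched as a small address plan. This is a hypothetical model, assuming VLAN ID 100 as the shared (not stretched) transport VLAN, the two site subnets above, four hosts per site as in the original question, and the first ten addresses of each scope kept for gateways. It only mimics what each site's DHCP server would hand out; it is not an NSX or DHCP API.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical: one VXLAN transport VLAN ID used in both sites (not stretched),
# with a different subnet and DHCP server per site.
TRANSPORT_VLAN_ID = 100
RESERVED_PER_SCOPE = 10  # assumption: .1-.10 kept for gateways/infrastructure


@dataclass(frozen=True)
class SiteScope:
    name: str
    subnet: ipaddress.IPv4Network


SITES = {
    "A": SiteScope("A", ipaddress.ip_network("10.1.12.0/24")),
    "B": SiteScope("B", ipaddress.ip_network("10.2.12.0/24")),
}

# Four ESXi hosts per site (host names are made up).
HOSTS = {f"esxi-{site.lower()}{i}": site for site in SITES for i in range(1, 5)}


def check_scopes(sites: dict) -> None:
    """Raise if any two site subnets overlap; they must be distinct routed networks."""
    scopes = list(sites.values())
    for i, first in enumerate(scopes):
        for second in scopes[i + 1:]:
            if first.subnet.overlaps(second.subnet):
                raise ValueError(f"sites {first.name} and {second.name} overlap")


def simulate_leases(hosts: dict, sites: dict) -> dict:
    """Mimic each site's DHCP server leasing the next free address in its own scope.

    Returns host -> (VLAN ID, VTEP address). The VLAN ID is identical everywhere;
    only the subnet differs by site.
    """
    check_scopes(sites)
    pools = {
        name: iter(list(scope.subnet.hosts())[RESERVED_PER_SCOPE:])
        for name, scope in sites.items()
    }
    return {
        host: (TRANSPORT_VLAN_ID, next(pools[site]))
        for host, site in sorted(hosts.items())
    }


for host, (vlan, vtep_ip) in simulate_leases(HOSTS, SITES).items():
    print(f"{host}: VLAN {vlan}, VTEP {vtep_ip}")
```

Every host ends up on VLAN 100, so the single cluster-wide VLAN ID is satisfied, while the two routed subnets keep the sites as separate L3 networks.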

When configuring VXLAN, you provide a few options: which VDS, how many VTEPs, the MTU and the VLAN ID (Configure VXLAN Transport Parameters).
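These transport parameters apply to the whole cluster at once. A hypothetical record of them, with basic sanity checks, might look like the sketch below. The class and field names are made up for illustration, treating VLAN ID 0 as "untagged" is an assumption, and the 1600 MTU floor is the figure used in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanTransportParams:
    """Hypothetical cluster-wide VXLAN settings: every host gets the same values."""
    vds: str
    vteps_per_host: int
    mtu: int
    vlan_id: int

    def validate(self) -> list:
        """Return a list of problems; an empty list means the settings look sane."""
        problems = []
        if self.vteps_per_host < 1:
            problems.append("at least one VTEP per host is required")
        if self.mtu < 1600:
            problems.append(f"MTU {self.mtu} is below the 1600 used for VXLAN")
        if not 0 <= self.vlan_id <= 4094:
            # Assumption: 0 means untagged; 1-4094 are usable 802.1Q IDs.
            problems.append(f"VLAN ID {self.vlan_id} is outside 0-4094")
        return problems


params = VxlanTransportParams(vds="dvs-stretched", vteps_per_host=1, mtu=1600, vlan_id=100)
print(params.validate())
```

There is deliberately no per-host or per-site field: one VLAN ID for the whole cluster is exactly the constraint discussed in this thread.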

- What IP addresses will the VTEP VMkernel ports have across the hosts? <- DHCP (different subnets) or IP Pool (same subnet)

- Will they need to be part of the same VLANs/Subnets? <- Not with the "trick" - yes with IP Pool

- Do I tag the VLAN ID down to these VTEP interfaces like I do on the Management/vMotion port groups? <- Yes, provide the VLAN ID when you configure VXLAN for the cluster

- The same VLANs/subnets aren't available in both sites for the VTEP interfaces, so what are my options? <- DHCP

- I can be given site specific VLANs/IP Subnets, but is this enough from which the VXLAN subnets can be stretched? <- All you need is reachability & MTU >=1600 between the ESXi hosts

- If I successfully manage to create the stretched VM networks, how will the physical world route over and reach the VMs that will be on the VXLANs?

As for the physical network: it will only see the ESXi VTEPs communicating with each other, so the VTEP IPs need to be reachable between both locations. If you use DHCP with different subnets, the VXLAN traffic will be routed through the physical network between locations. Inside NSX it'll look like you've stretched your L2 network; on the physical network it's just location-specific IP subnets talking to each other.
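The split between what NSX presents and what the physical network sees can be shown with a toy model: the underlay forwards only on the outer (VTEP) addresses, while the VM addresses ride inside as opaque payload. The site subnets match the DHCP example in this reply; the VM addresses and helper names are hypothetical.

```python
import ipaddress
from typing import NamedTuple

# Per-site VTEP subnets from the DHCP example above.
SITE_SUBNETS = {
    "A": ipaddress.ip_network("10.1.12.0/24"),
    "B": ipaddress.ip_network("10.2.12.0/24"),
}


class Frame(NamedTuple):
    inner_src: str  # VM addresses: one "stretched" L2 segment as NSX presents it
    inner_dst: str
    outer_src: str  # VTEP addresses: all the physical network ever looks at
    outer_dst: str


def site_of(vtep: str) -> str:
    """Map a VTEP address to the site whose subnet contains it."""
    addr = ipaddress.ip_address(vtep)
    for site, net in SITE_SUBNETS.items():
        if addr in net:
            return site
    raise LookupError(f"{vtep} is not in any known VTEP subnet")


def underlay_path(frame: Frame) -> str:
    """Decide the underlay path from the outer header only; inner fields are ignored."""
    src, dst = site_of(frame.outer_src), site_of(frame.outer_dst)
    if src == dst:
        return f"switched within site {src}"
    return f"routed from site {src} to site {dst}"


# Two VMs on the same logical switch, running on hosts in different sites.
frame = Frame("172.16.10.5", "172.16.10.6", "10.1.12.11", "10.2.12.11")
print(underlay_path(frame))
```

This prints `routed from site A to site B`: the VMs share a subnet, but the packet crossing the inter-site link is plain routed IP between 10.1.12.0/24 and 10.2.12.0/24.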

If you'd like a picture, you can find one in a blog I did a while back: https://www.vmguru.com/2016/08/please-stop-stretching-vlans-virtualize-your-network/

13 Replies
Sreec
VMware Employee

What IP addresses will the VTEP VMkernel ports have across the hosts?

Use a subnet that is different from the other traffic types in your setup. If you can have a DHCP scope for this network, that is highly recommended.

- Will they need to be part of the same VLANs/Subnets?

Not recommended

- Do I tag the VLAN ID down to these VTEP interfaces like I do on the Management/vMotion port groups?

Optional

- The same VLANs/subnets aren't available in both sites for the VTEP interfaces, so what are my options?

The recommendation is to use the same VLAN ID with a different subnet per site, because you cannot configure different VLAN IDs for different ESXi hosts in the same cluster. Try to use DHCP here; that way you don't need to stretch the VXLAN transport VLAN. In addition, ensure that you don't mix NIC teaming policies for VTEPs that are part of the same cluster.
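The rule that VLAN ID and VTEP teaming policy must not differ between hosts in the same cluster can be expressed as a simple pre-check. This is a hypothetical sketch: the host names, the policy labels and the idea of collecting per-host settings into a list are illustrative assumptions, not output from vCenter or NSX.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostVtepConfig:
    host: str
    site: str
    vlan_id: int
    teaming_policy: str  # illustrative labels such as "failover" or "lacp"


def cluster_problems(hosts: list) -> list:
    """Return human-readable problems; an empty list means the cluster is consistent."""
    problems = []
    vlans = {h.vlan_id for h in hosts}
    if len(vlans) > 1:
        problems.append(f"multiple VTEP VLAN IDs in one cluster: {sorted(vlans)}")
    policies = {h.teaming_policy for h in hosts}
    if len(policies) > 1:
        problems.append(f"mixed VTEP teaming policies: {sorted(policies)}")
    return problems


# Same VLAN ID and teaming policy everywhere; only the site (and subnet) differs.
cluster = [
    HostVtepConfig(f"esxi-{s.lower()}{i}", s, 100, "failover")
    for s in "AB" for i in range(1, 5)
]
print(cluster_problems(cluster))

# Site B given its own VLAN ID, which a single cluster cannot express.
mismatched = cluster[:4] + [replace(h, vlan_id=200) for h in cluster[4:]]
print(cluster_problems(mismatched))
```

The first check passes; the second reports `multiple VTEP VLAN IDs in one cluster: [100, 200]`, which is the situation fixed by renumbering one site.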

- I can be given site specific VLANs/IP Subnets, but is this enough from which the VXLAN subnets can be stretched?

This question is not clear; can you be a little more precise with the ask?

- If I successfully manage to create the stretched VM networks, how will the physical world route over and reach the VMs that will be on the VXLANs?

Either do NAT at the edge level, or establish IGP peering with the next-hop device and run an EGP from there.

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Dryv
Enthusiast

Hi Both,

Thanks for your responses. I think you have both exposed where my knowledge is lacking in this solution. It seems I am confused about VLAN IDs and subnets... As with subnets, I did not think it was possible to have the same VLAN ID in each site. I was in two minds about whether I would have to ask my network guys to present the same VLAN ID to my ESXi servers across both sites. In our network schema there is never the same VLAN ID across two sites, just as there is never the same subnet in each site. I was thinking that if I asked them this I would get 'the looks'!!!

Sreec, you picked up on this when you asked me 'This question is not clear, can you be a little more precise with the ask'. I noticed you had a very similar post titled "Active-Active with VMSC(vSphere metro storage cluster)". Did you manage to complete this task? Was it successful? Can you share anything non-proprietary or non-confidential?

Martijn, well... so... you are VMGURU.com!!! Well, I'm glad you've stepped in, because I blame you for this whole curiosity about stretching networks using NSX that's turned my life upside down :) :) It was coming across the very article of yours that you reference that triggered this whole saga in my life for the last 3 weeks. I have been glued to my laptop day and night trying to figure out in my head how to do this stretching of L2 networks over L3. I was happy in my world: I have a production VMSC running over Cisco OTV, all working fine, and my life was peaceful. Then I came across your article, and being a practical person myself, I now want to lab this and quench my thirst for this piece of knowledge! So it's only right that you now help me out of this HUGE hole I've fallen into!!! haha :) My family are not happy with you! :)

So, in response to your answers:

> The next step is to define a VXLAN segment ID pool (Assign a Segment ID Pool and Multicast Address Range) and configure VXLAN on your cluster. At this point, your first questions come into play. The VTEPs that will be created need an IP address. This IP address can come from 2 places; an NSX IP Pool or DHCP. Currently, you can only choose a single IP Pool for a single VMSC. This means if you go for an IP Pool, all IP addresses of all ESXi hosts will be in the same subnet.

So if I use an IP Pool, I will still need to stretch this transport subnet somehow before I can stretch my VM networks (using NSX/VXLAN)... so I may as well just stick to my OTV-based solution and stretch everything using that. What's the point in stretching my transport network using OTV or something else first, and then the remaining VM networks using NSX? It seems overcomplicated to me, and my network team already don't like me! :)

> Another way is to choose DHCP and let a DHCP server per location distribute different subnets. So the IP subnet for location A will be 10.1.12.0/24 and in location B it will be 10.2.12.0/24. There's a "trick" to this though, as you can only supply 1 VLAN ID when you prepare a cluster for VXLAN. If you go for different subnets (which I recommend, because you don't have to stretch the VXLAN VLAN in that case, having pure L3 between locations), you have the same VLAN ID on both locations, but with different IP subnets and DHCP servers. That needs to be clearly documented to prevent confusion.

This explanation absolutely rocks!! So basically, what you are saying is that I still need the same VLAN ID tagged down to my ESXi servers in both sites? Through DHCP, my ESXi servers in each site receive IP addresses in different subnets but are tagged on the same VLAN regardless of site? This way I would avoid the need to stretch the VXLAN transport VLAN using some other technology before I can start stretching my VM networks using NSX/VXLAN?

Pure L3 between sites is exactly what I want. I wouldn't want any stretching taking place unless it's being done by NSX/VXLAN. This way, I think all I need from my network team is simple L3 routing between sites, MTU >=1600 and this 'trick'? So, can I safely go and ask my network team, for example, to give me a VLAN ID 100 that exists in both Site A and Site B, without them shooting me? I could then tag this VLAN 100 on the VMkernel ports that the VTEPs will be configured on? And then I guess I could also ask the network team if they can spin up a DHCP service in Site A VLAN 100 to hand out addresses in 10.1.12.0/24 and a DHCP service in Site B VLAN 100 to hand out addresses in 10.2.12.0/24?

For now, I think your help in getting me this far will be enough for me to sleep in peace for a little while :) I really appreciate the detailed response in the last reply. Thank you.

I think I will ask the rest of the question, around how I stitch this all up between the virtual and physical worlds, separately after I get to grips with the fundamentals. Eventually I will need to stretch the current production VLANs/subnets (that are being stretched using OTV) using NSX/VXLAN... but the physical VLANs/subnets will need to remain in place, as physical and virtual workloads are still on them... and reachability to everything can't be affected! I think this bit is better left for another day when you have more time! :)

Thanks to you both so far,

Dryv

Sreec
VMware Employee

Hello Dryv,

Really loved this discussion :) Some excellent points. You are right, I had a question around A-A with VMSC with NSX. Even though I didn't get any response at that time, my team and I did a lot of planning, because it was a two-site Vblock with VPLEX, RP and OTV+OSPF+MPLS. There was no reference for a brownfield configuration of such a complex architecture (application-level clustering was also there), especially with so little production downtime available. First of all, I'm not a fan of a single-VC stretched vSphere cluster environment across sites. So we decided to stick with 2 VCs and break up the Vblock's internal architecture from a software perspective (PSC/VC/PowerPath/RP/DB etc.), with a lot of subnet changes, UCS SP and N2K/7K/9K configuration, and MTU updates. After a month of work, the output was something like this: compute was stretched via VXLAN, with L3 for all management between sites and a few custom TCP/IP stacks for vMotion traffic.

Considering your architecture, most of the points are covered in this discussion.

> Pure L3 between sites is exactly what I want. I wouldn't want any stretching taking place unless it's being done by NSX/VXLAN. This way I think all I need from my network team is simple L3 routing between sites and MTU >=1600 and this 'trick'? So, can I safely go and ask my network team for example to give me a VLAN ID 100 that exists in both Site A and Site B, without them shooting me? I could then tag this VLAN 100 to the VMkernel ports that the VTEPs will configure too? And then I guess I could also ask the network team if they can spin up a DHCP service in Site A VLAN 100 to dish out addresses in 10.1.12.0/24 and a DHCP service in Site B VLAN 100 to dish out addresses in 10.2.12.0/24?

Yes, this is 100% achievable: L3 with whatever routing protocol they think best; get it configured, advertise the new L3 subnets and update the MTU to 1600. One main point: wherever you have multiple VMkernel ports in different subnets, create them all from the Web Client with a custom TCP/IP stack and test before going live.

Cheers,
bayupw
Leadership

Hi Dryv,

There is a good VMworld session on NSX with vMSC here: NSX with vSphere Metro Cluster - sharing this so its easier to find for some people by Ray Budavari ...

It's not based on the latest NSX version but it should give you some idea on how it all hangs together

Don't forget to check the multi-site design guide here: NSX-V Multi-site Options and Cross-VC NSX Design Guide

Regarding your questions

- What IP addresses will the VTEP VMkernel ports have across the hosts?

As per previous answer, you can use DHCP or IP Pool. Regardless of what you choose, you can only input one VLAN ID for the cluster.

[Screenshot: VXLAN configuration dialog for the cluster, with a single VLAN ID field]

Assuming the vMSC/vSphere cluster is stretched across two sites:

When using an IP Pool, you will need to use the same network/subnet/VLAN ID for the stretched cluster, which is a single failure domain.

If you want to use different IP addresses for each site, you can change the IP manually after NSX has configured the VTEP IP, but this is not recommended: whenever you remove/add a host you will need to change it again, and the manually overridden IP is not in sync/consistent with the global configuration.

Normally, the recommended way is to use DHCP so that each site has its own DHCP server. The sites can use different networks/subnets but need to be on the same VLAN (see the screenshot above: you can only enter one common VLAN ID for a particular cluster, the stretched cluster in your case).

- Will they need to be part of the same VLANs/Subnets?

See the answer to question #1.

- Do I tag the VLAN ID down to these VTEP interfaces like I do on the Management/vMotion port groups?

It depends on your host/NIC setup. If the VTEPs share the same NICs, then yes; if the VTEPs are on dedicated NICs, you just need the VTEP VLAN ID, which you enter during the VXLAN networking configuration (previous screenshot).

- The same VLANs/subnets aren't available in both sites for the VTEP interfaces, so what are my options?

See the answer to question #1.

- I can be given site specific VLANs/IP Subnets, but is this enough from which the VXLAN subnets can be stretched?

Just like with vMotion VMkernel ports, the VTEP IP addresses need to be able to reach each other, with a minimum MTU of 1600 end to end. Check with your network team/service provider.
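The 1600 figure follows from the VXLAN encapsulation overhead. A worked calculation, assuming an IPv4 underlay and a standard 1500-byte VM MTU: the VM's whole Ethernet frame (including its own 14-byte header) is wrapped in a VXLAN header (8 bytes), UDP (8 bytes) and an outer IPv4 header (20 bytes), so the underlay must carry IP packets of at least 1550 bytes.

```python
# Header sizes in bytes (IPv4 underlay assumed; an IPv6 outer header would be 40).
INNER_ETHERNET = 14  # the VM's original Ethernet header travels inside the tunnel
INNER_DOT1Q = 4      # only if the inner frame still carries an 802.1Q tag
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20


def required_underlay_mtu(vm_mtu: int = 1500, inner_tagged: bool = False) -> int:
    """Smallest IP MTU the physical path must carry for a full-size VM packet."""
    inner_frame = vm_mtu + INNER_ETHERNET + (INNER_DOT1Q if inner_tagged else 0)
    return inner_frame + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def path_ok(underlay_mtu: int, vm_mtu: int = 1500, inner_tagged: bool = False) -> bool:
    """True if every hop's MTU is large enough that encapsulated frames need no fragmentation."""
    return underlay_mtu >= required_underlay_mtu(vm_mtu, inner_tagged)


print(required_underlay_mtu())                   # 1550
print(required_underlay_mtu(inner_tagged=True))  # 1554
print(path_ok(1600), path_ok(1500))              # True False
```

The 1600 recommendation leaves headroom above 1550, for example for an inner 802.1Q tag. A default 1500-byte path anywhere between the sites is enough to break full-size VM traffic, which is why the MTU has to hold end to end.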

- If I successfully manage to create the stretched VM networks, how will the physical world route over and reach the VMs that will be on the VXLANs?

See these sample architectures by Hany Michael for the detailed logical network/routing (excluding VTEPs): http://www.networkskyx.com/

SDDC Architectures: Workload Mobility & Recovery with NSX 6.2, vSphere 6.x & SRM 6.x | NetworkSkyX

VXLAN/VTEP traffic runs between ESXi hosts; as long as they can reach each other with a minimum MTU of 1600, you should be good.

The logical network/data plane is another story: it has its own IP addressing, which runs on top of VXLAN/VTEP.

Note that NSX should be able to optimise the egress routing (from NSX to the outside world) but the ingress routing will need to be handled by the physical network.

A network, for example 10.0.0.0/24, would only be advertised from one site, such as the primary site. During failover, the physical network would need to be updated so that the network is advertised from the backup/secondary site.

If the failover is site-level, you can use dynamic routing in NSX so that NSX advertises that the 10.0.0.0/24 network is now reachable via the backup site.

But if the failover is VM-level, for example only 10.0.0.11/32, then you would need to inject a /32 route. It depends on the failure scenario: when should a VM be restarted on the backup site?
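Why an injected /32 changes ingress can be seen from longest-prefix matching, which routers use to choose between overlapping routes: the most specific matching prefix wins. The routing table below is hypothetical, built from the 10.0.0.0/24 and 10.0.0.11/32 examples in this reply, with made-up next-hop labels.

```python
import ipaddress

# Hypothetical physical routing table: the /24 is advertised from the primary
# site, plus an injected /32 for one VM that failed over to the backup site.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "site-A"),
    (ipaddress.ip_network("10.0.0.11/32"), "site-B"),
]


def next_hop(dst: str, routes=ROUTES) -> str:
    """Longest-prefix match: among routes containing dst, the longest prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("10.0.0.11"))  # site-B: the injected /32 is more specific
print(next_hop("10.0.0.12"))  # site-A: everything else follows the /24
```

Without the /32 (`next_hop("10.0.0.11", ROUTES[:1])`) the answer is `site-A`: the failed-over VM is still reachable, just via the primary site and then across VXLAN.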

It will be more complicated once you introduce stateful services such as north-south firewall and load balancing services.

I would suggest defining a DR plan/strategy first: whether the solution should handle just site-level failure, network/subnet-level failure (which is easier), or VM-level failure (which is harder).

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw
Dryv
Enthusiast

Hi Bayu,

Thanks also for coming into this thread and providing valuable knowledge and insights. I am really concerned about one thing though:

> But if the failover is VM level for example only 10.0.0.11/32 then you would need to inject a /32 route - depends on what will be the failure scenario, when should VM be restarted on backup site.

Now, this is causing me some concern in theory. In the context of planning for NSX over a VMSC, if a VM-level failure happened and the VM HA'd over to the backup site, DRS should kick in and bring it back to the active (and preferred) site, so no routing changes should be needed. However, let's say that DRS didn't kick in, or someone deployed a VM and forgot to assign it a preferred site. In that case, if a VM-level failure happened and the VM HA'd over to the backup site and remained there, are you saying a /32 route would need to be injected before the VM is reachable from the outside world? Or are you saying a /32 route would need to be injected for optimal ingress routing to take place?

I would have thought ingress routing would continue as it is, through the active site via the physical network (as it's just a VM-level failure), and as soon as the packet reaches the NSX layer in the active site, it would be pushed over the VXLAN transport network (host to host) to the VM in the backup site... I agree, suboptimal, but from the point the VM starts up in the backup site after the failure, it would at least still be reachable?

As for egress routing, I guess I could have localised egress points with the latest NSX developments. But even without them, I would have thought the same suboptimal path would be immediately available in the case of a VM-level failure?

This level of discussion, about routing and connecting the virtual world to the physical world, is really fragile in my head right now, so please do correct me if my logic above is completely wrong. When things are more solid in my head I'm sure I will be raising lots of questions in the community to help me through! :)

bayupw
Leadership

Hi Dryv,

Yes, you are correct: there is no need for /32 routes if you do not need ingress routing optimisation and accept suboptimal routing.

This suboptimal routing works as long as the next hop of the DLR is still in the primary/active site.

If you enable local egress then there will be asymmetrical routing and stateful services would normally break.

So no local egress + no /32 routes (ingress routing optimisation) would work
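The reason local egress breaks stateful services can be illustrated with a toy stateful firewall that only accepts return traffic for sessions it saw leave. This is a simplified model, not how the NSX Edge or distributed firewall is implemented; the class and function names are hypothetical and the addresses are illustrative (198.51.100.7 is a documentation address).

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str


class StatefulEdge:
    """Toy stateful firewall: return traffic is allowed only for sessions it saw go out."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: set = set()

    def outbound(self, flow: Flow) -> None:
        self.sessions.add(flow)  # record the session as it leaves

    def inbound(self, flow: Flow) -> bool:
        # Accept only if this edge saw the reverse direction leave.
        return Flow(flow.dst, flow.src) in self.sessions


def reply_delivered(egress: StatefulEdge, ingress: StatefulEdge) -> bool:
    """A VM talks to an external client; does the reply get back in?"""
    request = Flow("10.0.0.11", "198.51.100.7")  # VM -> outside
    egress.outbound(request)
    reply = Flow("198.51.100.7", "10.0.0.11")    # outside -> VM
    return ingress.inbound(reply)


# No local egress: both directions pass the site A edge, so state matches.
edge_a = StatefulEdge("site-A")
print(reply_delivered(egress=edge_a, ingress=edge_a))  # True

# Local egress from site B while ingress still arrives via site A:
# the site A edge never saw the session leave, so the reply is dropped.
print(reply_delivered(egress=StatefulEdge("site-B"), ingress=StatefulEdge("site-A")))  # False
```

This is the asymmetry described above: keeping egress and ingress on the same site (no local egress, no /32 injection) keeps session state in one place.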

Dryv
Enthusiast

Phewwwwwwwwww... something in my head makes sense. It's fragile, and it required 24 hours of thought, but I think it's getting there! Thanks for the quick response, Bayu.

smitmartijn
VMware Employee

Hi,

Thanks for that reply, you've just made my day. 🙂

Just a quick note about a few things that were not addressed yet:

> ..so I may as well just stick to my OTV based solution, and stretch everything using that.. what's the point in me stretching my transport network using OTV or something else first and then the remainder VM networks using NSX?

The point is that you only have to stretch your transport network and there are no physical changes required whenever a new VM network is created. Without NSX, the OTV layer has to be modified every time that happens (so bye bye automation). With NSX, no changes need to happen and the "danger" of stretching VLANs is limited to the admins managing the transport network instead of the users that live/work on the VM networks. Still, pure L3 is better. 😉

> So, can I safely go and ask my network team for example to give me a VLAN ID 100 that exists in both Site A and Site B, without them shooting me?

Yes, but be sure to tell them you do not want it stretched. That's why I mentioned that it needs to be properly documented, so they also know why it's like this.

Dryv
Enthusiast

Haha... no worries... Thank you for your help Martijn

I'll be back soon with questions around integrating physical and virtual worlds when using NSX! Watch this space!

wreedMH
Hot Shot

Hello,

We are in the process of setting up a stretched vSAN cluster with NSX layered on top. We are at the point of configuring VXLAN at each site; however, we made the VXLAN VLAN IDs at each site different. Judging by this thread, I assume they must match on both sides because you can only enter one VXLAN VLAN ID in the dialog box? Any other options or ways around this?

smitmartijn
VMware Employee

Hi,

There is no way around that: inside the same cluster, the VLAN ID has to be the same. You don't have to stretch it (which is what this thread is about), but it has to be the same ID.

wreedMH
Hot Shot

Got it, we will renumber one side tomorrow.
