In my environment we are planning to use NSX with a leaf-spine architecture. (In our current scenario we have a traditional three-tier network design.)
When we move to NSX with leaf-spine, I would like to understand whether the leaf switches should be L2 or L3.
My understanding is that they should be L2, since NSX will have the Edge Gateways, which will form the L3 adjacency with the spine switches.
Please clarify.
Dear Rajeev,
"leaf switches should be L2"
From NSX's perspective it doesn't really matter, since NSX is largely agnostic to the physical underlay, but in a typical leaf/spine architecture the leaf would actually be the L3 boundary, and your ESGs would peer with an edge leaf. The idea with leaf/spine is that you run L3 all the way to the leaf so that connectivity across the spine can take advantage of L3 ECMP rather than relying on some form of L2 link aggregation.
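To make the ECMP point concrete, here is a minimal sketch of what the leaf-to-spine eBGP peering might look like on a Nexus-style leaf switch. All ASNs, addresses, and the two-spine topology are invented for illustration; consult the design guide and your switch documentation for a real configuration:

```
! Hypothetical NX-OS-style snippet for one leaf with two spine uplinks
! (ASNs and addresses are made up for this example)
router bgp 65101
  router-id 10.0.0.11
  address-family ipv4 unicast
    maximum-paths 2                    ! install ECMP routes across both spines
  neighbor 10.1.1.0 remote-as 65000    ! point-to-point uplink to spine-1
    address-family ipv4 unicast
  neighbor 10.1.2.0 remote-as 65000    ! point-to-point uplink to spine-2
    address-family ipv4 unicast
```

With L3 at the leaf, every leaf load-balances across all spine uplinks this way, with no spanning tree or L2 link aggregation between racks.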
Is there any document from VMware that states the recommended design for large-scale design and deployment?
Starting on page 101 of the VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0, both L2 and L3 access designs are covered.
Thanks
My understanding and thoughts are as below.
The leaf switches should be L2 switches.
The spine switches will be L3.
There will be a separate POD for the Management & Edge clusters.
The Edge clusters will be running the Edge gateways, which will run a routing protocol and establish routing with the spine L3 switches.
The leaf switches will provide only L2 functionality.
If the leaf switches are configured as L3, all of the leaf switches will be L3, and there will be multiple paths as the number of leaf switches grows.
It would be easier and simpler to have L3 at the spine rather than at the leaf.
The above is my understanding. Please let me know if you have any inputs or suggestions.
I got the reference diagram below from a VMware document on NSX design with Cisco Nexus switches.
1 - In the diagram there are separate PODs for Compute, Management & Edge.
2 - The spine switches are Nexus 9500s and the leaf switches are 93xx series.
3 - There is also another pair of switches inside each rack. What is the purpose of these switches? Why aren't the servers inside each POD connected directly to the leaf switches?
4 - Why should the leaf switches be L3? What is the advantage of using L3 leaf switches in this scenario?
5 - The Edge Gateways will be running L3 and can form an L3 adjacency with the spine switches, so in this case why do the leaf switches need to be L3?
The design guide that diagram comes from is actually specific to Cisco UCS configurations, so the separate switches you see in each rack represent the Cisco fabric interconnects.
As far as why you'd do L3 at the leaf instead of L2, the main benefits are scalability and the ability to use L3 ECMP for the connectivity from the leaf switches to the spines. That, plus the fact that NSX can stretch L2 connectivity in the overlay to wherever your workloads need it, leaves really no benefit to extending L2 beyond the ToR.
As far as your 5th point, you're correct that the NSX Edge can technically peer with either a leaf or a spine. As mentioned in my initial response, it doesn't really matter either way from NSX's perspective, so the placement of your L2/L3 boundary is just a matter of how you need to scale the physical underlay network.
Regarding the fabric interconnects, I am clear with your explanation. Thanks.
Regarding L3 at the leaf: in this case, will the NSX Edge gateways form a routing adjacency with each of the leaf switches?
If so, as the number of leaf switches increases, the routing adjacencies with the Edge gateways also grow.
But if the Edge Gateways peer with the spine L3 switches, the design would be simpler and there wouldn't be multiple adjacencies.
The ESGs only peer with the leaf switches that connect to the hosts they run on, so in the diagram you referenced, with two top-of-rack switches per rack, each ESG would have only 2 adjacencies. Routes for prefixes that live inside the NSX overlay get propagated from there to the rest of the fabric (and vice versa), so there's no need to peer with anything outside the edge cluster rack(s).
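As a hypothetical illustration of that peering model, each of the two ToRs in the edge rack would carry one BGP session toward the ESG uplink. A sketch of one edge leaf (all ASNs, addresses, and interface choices invented for this example, not taken from the design guide):

```
! edge-leaf-1, hypothetical NX-OS-style snippet (values are made up)
router bgp 65101
  neighbor 10.2.0.10 remote-as 65201    ! ESG uplink address in the edge rack
    address-family ipv4 unicast
!
! edge-leaf-2 would carry an equivalent session from its own interface,
! so the ESG ends up with exactly two adjacencies: one per ToR.
! Overlay prefixes learned from the ESG are then advertised up to the
! spines like any other leaf-originated route.
```

The point is that the adjacency count is fixed by the edge rack's ToR pair, not by the total number of leaf switches in the fabric.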