Even if you decide not to use SNAT, it still has to support services of type LoadBalancer and Ingress, and those require stateful services to exist. If they want to use NCP, the T0 has to be A/S. It's a design requirement that probably isn't changing.
Thanks! Yes, the customer uses NCP to integrate NSX-T with K8s, following the standard guide. But for LB and Ingress, NCP creates a T1 with an LB automatically, not on the T0. So they think it should not impact T0 Active/Active mode.
Sorry, but they think there isn't any stateful service on the T0 in this environment, just routing: no FW/NAT/LB/VPN...
Technically speaking, it seems logical that it should work. It would have to be tested whether NCP actually works or whether it has some check to verify that the T0 is A/S. Assuming NCP runs without problems, all your namespaces would have to be created with an annotation so they are no-NAT, so that NCP doesn't try to create an SNAT rule it will not be able to configure because the T0 is A/A. Even then, there are still the default namespaces created by the Kubernetes installation, like kube-system, which will probably not work because you can't annotate them and NCP will try to create an SNAT rule. In the end I doubt it will actually work, and if it does, you're creating unnecessary complexity and not following the documentation, which could affect supportability.
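As a rough illustration of the per-namespace annotation mentioned above, here is a minimal Python sketch that builds a Namespace manifest carrying a no-SNAT annotation. The annotation key `ncp/no_snat` is what I recall from the NCP documentation and is an assumption here, so verify it against your NCP version. The namespace name `team-a` and the helper `no_snat_namespace` are hypothetical.

```python
import json

# Assumption: NCP reads this annotation key on a Namespace to skip SNAT.
# Verify the exact key and value against the docs for your NCP version.
NO_SNAT_ANNOTATION = "ncp/no_snat"


def no_snat_namespace(name: str) -> dict:
    """Return a Kubernetes Namespace manifest annotated as no-SNAT."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Annotation values must be strings in Kubernetes.
            "annotations": {NO_SNAT_ANNOTATION: "True"},
        },
    }


if __name__ == "__main__":
    # "team-a" is a hypothetical namespace name.
    print(json.dumps(no_snat_namespace("team-a"), indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Note that this only covers namespaces you create yourself; it does nothing for the default namespaces such as kube-system.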
What I would do first is understand why they see a problem with A/S. Do they need more N/S bandwidth than a single edge can handle? What are their concerns? One alternative I see is to have two layers of T0: one A/A for N/S traffic going to the physical network, and another one A/S for Kubernetes.
Actually, in the NCP YAML file we changed the configuration to default to no-NAT. So there's no problem: NCP will create namespace networking as no-NAT by default.
The customer's major concerns are as follows.
- They can't trust software A/S convergence, because they have had software switchovers fail many times before. Putting all the risk on A/S mode for such a large production environment is something they simply can't accept.
- They also think the bandwidth is limited to 20 Gbps in total (two uplinks per edge), which doesn't scale well and may not be enough.
- They are also concerned about the convergence time for all applications during a switchover; they think the impact is too big.
I think their concerns are very reasonable!
Most of these seem like unreasonable concerns that boil down to, "well, we're scared!".
They can't trust software A/S convergence, because they have had software switchovers fail many times before. Putting all the risk on A/S mode for such a large production environment is something they simply can't accept.
Convergence works fine; I've tested it many times. Being scared that a feature won't work isn't a reason not to adopt it.
They also think the bandwidth is limited to 20 Gbps in total (two uplinks per edge), which doesn't scale well and may not be enough.
Well, that's not true. They can use bare metal edges and get even more throughput.
They are also concerned about the convergence time for all applications during a switchover; they think the impact is too big.
Again, has this been measured or tested? No? That's not a reason to not do something that is spelled out as a requirement. Also again, they could use bare metal edges and have convergence time down to under 1 second.
The fact is the requirement is to have your T0s in A/S mode. Would it technically work otherwise? Maybe. Who can say what other issues they'd run into let alone if they would receive support from VMware...
Yes, the customer has been scared by other software's A/S mode before. We have tried to convince them many times.
Actually, they use bare metal edges now. But the edges are a standard config with only two 10 Gbps NICs for uplinks. So just 20 Gbps for the T0.
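To make the arithmetic behind the 20 Gbps figure concrete, here is a back-of-the-envelope Python sketch. It only multiplies nominal link rates, assuming traffic is spread evenly over all uplinks of every active edge; it says nothing about measured throughput. The function name and the edge counts in the example are hypothetical.

```python
def north_south_gbps(active_edges: int,
                     uplinks_per_edge: int = 2,
                     uplink_gbps: float = 10.0) -> float:
    """Nominal N/S capacity: active edges x uplinks per edge x link rate."""
    if active_edges < 1 or uplinks_per_edge < 1 or uplink_gbps <= 0:
        raise ValueError("all inputs must be positive")
    return active_edges * uplinks_per_edge * uplink_gbps


if __name__ == "__main__":
    # A/S: only one edge forwards traffic at a time.
    print("A/S, 1 active edge:", north_south_gbps(1), "Gbps")
    # A/A: every edge forwards; 2 and 4 are hypothetical cluster sizes.
    for edges in (2, 4):
        print(f"A/A, {edges} active edges:", north_south_gbps(edges), "Gbps")
```

With two 10 Gbps uplinks per edge, A/S stays at a nominal 20 Gbps regardless of how many edges are in the cluster, while A/A scales with the number of active edges. Faster or additional NICs raise the per-edge figure in either mode.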
The short answer is: although it may work, it hasn't been validated by VMware, hence the documentation explicitly saying the T0 has to be A/S. If the customer chooses to go with A/A, they may run into support issues.
I understand the validation issue. But if it works and is better for the customer and VMware, why not validate and support it quickly? VMware is a software company and should be very agile. So does anyone know how to request a QE test?
But if it works and is better for the customer and VMware, why not validate and support it quickly?
Who says it works better for your customer, let alone VMware? They haven't even tried it! As you say, they don't want to use it because they're scared of things outside of NSX-T. Again, that's not a reason to not adopt a technology, especially when you've been told a certain setting is a requirement. You don't get to make up the requirements yourself.
So does anyone know how to request a QE test?
You're more than welcome to open an SR on behalf of your customer and ask, but they're not going to change the requirement for you. Bottom line: if your customer ever wants to receive support from VMware on NCP, they need to abide by the requirements, regardless of what their use case is or the source of their (unfounded) fears. Really nothing more to be said.
Calm down, man. This is the customer's requirement, not mine. And I think that if NSBU adds this use case, NSX will widen its market for K8s integration. Actually, I am requesting that NSBU help assess this use case. I opened this discussion here because I thought this was a Q&A BBS where I could communicate with NSBU R&D experts. Anyway, now I know it's just an open forum for NSX technology. Thanks for your attention!