VMware Networking Community
Rahul418282
Contributor

NSX environment performance issue

Hello Experts,

The customer is facing performance issues in their VMware SDDC environment and is blaming NSX for them. As far as I can see, the ESXi hosts are healthy, the rules have been pushed to the hypervisors, and there are no errors on the NSX dashboard, yet they are seeing latency when traffic passes through VXLAN and the DFW rules.

Can anyone advise on what additional parameters I should check? Any suggestions are welcome.

Thank you!!

Rahul Kumar

Sreec
VMware Employee

I'm sorry to say this :), but you need to do your homework to isolate this performance issue.

1. What latency are you seeing when users report the issue? How did you test the latency? Are there any strict latency requirements for those apps?

2. What is the design of this setup?

3. For what kinds of workloads do we have performance issues?

4. What type of traffic is showing performance issues?

5. Have we had these issues from the beginning?

6. Was there any recent change in the setup?

7. Do the issues occur in a specific time frame, or are they intermittent?

8. Do we have any performance monitoring tools/software in this setup?

Please watch the VMworld 2017 US session NET1343BU, "NSX Performance Deep Dive" (available on YouTube), and never ignore the vSphere design; it can be a potential culprit as well.

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 7x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Rahul418282
Contributor

Hello Sreec,

My answers to your questions are inline below:

1. What latency are you seeing when users report the issue? How did you test the latency? Are there any strict latency requirements for those apps? We test with httperf (rate = 10000), installed on the source VM in the LAN cluster.

2. What is the design of this setup? Three ESGs in ECMP mode connect down to one DLR. Separate ESGs in one-arm mode are used as load balancers for the backend servers.

There are only two clusters, both under the same datacenter at the vCenter level: one LAN cluster (VXLAN not configured) and one VXLAN cluster (VXLAN configured). The source VM is in the LAN cluster and the target VMs are in the VXLAN cluster. Micro-segmentation is in place, with DFW rules allowing the traffic, and the target VMs sit behind separate ESGs in one-arm mode.

3. For what kinds of workloads do we have performance issues? All applications hosted in the VXLAN cluster.

4. What type of traffic is showing performance issues? Mostly TCP traffic.

5. Have we had these issues from the beginning? No, not from the beginning. We upgraded NSX from 6.3.4 to 6.4.5 in Oct-Nov 2019, and after that the customer started reporting these issues on the platform. I can't find any related bug reported by VMware online.

6. Was there any recent change in the setup? No, except for the NSX upgrade in Oct-Nov 2019.

7. Do the issues occur in a specific time frame, or are they intermittent? They occur in every test the customer runs to validate the platform.

8. Do we have any performance monitoring tools/software in this setup? Apart from httperf, no other tool is being used to monitor the latency. Any advice?
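With httperf as the only measurement, it can help to time the TCP handshake on its own, which separates network-path latency from application response time. A minimal Python sketch, assuming the target VM or load balancer VIP accepts TCP connections on a known port; the function name `tcp_connect_latency_ms` is hypothetical and not part of any VMware tool. Pass an IP address rather than a hostname so DNS resolution is not included in the timings.

```python
import socket
import statistics
import time
from typing import Dict


def tcp_connect_latency_ms(host: str, port: int, samples: int = 20,
                           timeout: float = 2.0) -> Dict[str, float]:
    """Time repeated TCP connects to host:port and summarise them in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be >= 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Returns once the TCP three-way handshake completes (raises on failure or timeout)
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Nearest-rank 95th percentile: index ceil(0.95 * n) - 1, using integer math
    p95_index = (95 * len(timings) + 99) // 100 - 1
    return {
        "min": timings[0],
        "median": statistics.median(timings),
        "p95": timings[p95_index],
        "max": timings[-1],
    }
```

Running it from the source VM against a target in the VXLAN cluster, and again against a VM in the LAN cluster, gives a rough baseline for whether the extra delay appears at connection setup or later in the request.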

HassanAlKak88
Expert

Hello,

A quick hint: are they using the Applied To option in the NSX DFW, or have they kept the default?

Applied To defines the scope at which a rule is applicable; narrowing it decreases the number of rules applied per VM network adapter (vNIC).
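To make the effect of Applied To concrete, here is a minimal Python sketch, assuming a simplified model where each rule's scope is either the default "DFW" (every vNIC) or an explicit set of vNIC IDs. The function names and vNIC names are hypothetical, and the sketch counts each rule once per vNIC, ignoring any internal rule expansion NSX may perform. The default limit of 3500 is the per-vNIC figure quoted later in this thread.

```python
from typing import Dict, List, Set, Union

# Hypothetical model: a rule's Applied To is either the string "DFW"
# (default scope: programmed on every vNIC) or an explicit set of vNIC IDs.
AppliedTo = Union[str, Set[str]]


def rules_per_vnic(rules: List[AppliedTo], all_vnics: Set[str]) -> Dict[str, int]:
    """Return how many rules end up programmed on each vNIC."""
    counts = {vnic: 0 for vnic in all_vnics}
    for applied_to in rules:
        # "DFW" expands to every vNIC; an explicit scope hits only its members
        targets = all_vnics if applied_to == "DFW" else applied_to & all_vnics
        for vnic in targets:
            counts[vnic] += 1
    return counts


def vnics_over_limit(counts: Dict[str, int], limit: int = 3500) -> List[str]:
    """List vNICs whose rule count exceeds the given per-vNIC limit."""
    return sorted(vnic for vnic, n in counts.items() if n > limit)


vnics = {"web-01", "web-02", "app-01", "db-01"}

# Three rules left at the default scope: every vNIC carries all three.
default_scope = ["DFW", "DFW", "DFW"]
# The same three rules, each scoped to the VMs it actually protects.
scoped = [{"web-01", "web-02"}, {"app-01"}, {"db-01"}]

print(sorted(rules_per_vnic(default_scope, vnics).items()))
# -> [('app-01', 3), ('db-01', 3), ('web-01', 3), ('web-02', 3)]
print(sorted(rules_per_vnic(scoped, vnics).items()))
# -> [('app-01', 1), ('db-01', 1), ('web-01', 1), ('web-02', 1)]
```

With the default scope, every rule in the rule base lands on every vNIC, so the per-vNIC count grows with the total number of rules; with scoping, each vNIC carries only the rules that concern it.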

Check the following: https://www.esvr.cloud/2017/08/10/the-importance-of-nsx-distributed-firewall-applied-to/


If my reply was helpful, I kindly ask you to like it and mark it as a solution

Regards,
Hassan Alkak
Rahul418282
Contributor

Hello HassanAlKak88,

Yes, that is the problem: the customer created all the DFW rules with "Applied To" set to DFW, so every rule has been applied to every vNIC of the VMs hosted on the platform. Although there are only around 1500-1700 firewall rules, the count per vNIC has exceeded the supported number (3500 max, as per VMware); in my case it's over 5700. After we raised a case with VMware, their support team concluded that this is the root cause of the performance issues.

I don't have visibility into which rule is used for what. Has anyone faced this situation before, and what was done to rewrite the existing rules?
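Since all rules were created with the default Applied To = DFW, one possible starting point is to scope each rule to the objects it already references, so it is only programmed on the vNICs of those members. A minimal Python sketch of a triage pass, assuming the rules have been exported into a simple list; the `Rule` structure and its field names are hypothetical, not an NSX export format or API. Rules between specific objects get a proposed Applied To made of their sources and destinations (so enforcement still happens on both sides); any rule with "any" on either side is flagged for manual review, because where to enforce it (and how it interacts with the default rule) is a design decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Rule:
    # Hypothetical export format: object names (e.g. security groups) for
    # sources and destinations; "any" means the field is unrestricted.
    rule_id: int
    sources: Set[str]
    destinations: Set[str]
    applied_to: Set[str] = field(default_factory=lambda: {"DFW"})


def propose_applied_to(rule: Rule) -> Optional[Set[str]]:
    """Suggest a narrower Applied To, or None if the rule needs manual review."""
    if rule.applied_to != {"DFW"}:
        return rule.applied_to  # already scoped: keep it unchanged
    objects = rule.sources | rule.destinations
    if "any" in objects:
        # Where to enforce an "any" rule is a design decision, so don't guess
        return None
    # Scope the rule to both sides so it is enforced on source and destination vNICs
    return objects


def triage(rules: List[Rule]) -> Tuple[Dict[int, Set[str]], List[int]]:
    """Split rules into proposed scopes and IDs that need manual review."""
    proposals: Dict[int, Set[str]] = {}
    review: List[int] = []
    for rule in rules:
        proposal = propose_applied_to(rule)
        if proposal is None:
            review.append(rule.rule_id)
        else:
            proposals[rule.rule_id] = proposal
    return proposals, review


rules = [
    Rule(1, {"app-sg"}, {"db-sg"}),
    Rule(2, {"any"}, {"web-sg"}),
    Rule(3, {"any"}, {"any"}),
    Rule(4, {"app-sg"}, {"db-sg"}, applied_to={"db-sg"}),
]
proposals, review = triage(rules)
print(proposals)
print(review)
```

The output is only a proposal: each change should be validated against the application's actual traffic flows before the rules are rewritten, and objects such as IP sets that do not map to vNICs would need separate handling.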

HassanAlKak88
Expert

Hello Dear,

To handle this kind of problem, you have to make a global assessment of all your firewall rules and try the following:



Regards,
Hassan Alkak