VMware Networking Community
Goran2018
Contributor

ESG / NSX logical switch - intermittent error when pinging some port

Hi all,

We have implemented a new environment with vSphere and NSX.

NSX is used to provide load balancing for MS Exchange and Horizon View. We decided to go with the inline LB scenario.

We have deployed one Edge Services Gateway, which acts as router and load balancer for the NSX network segment (NSX logical switch) to which the MS Exchange and Horizon View servers are connected.

Installation and deployment went fine. Routing works, and we have network visibility between the NSX logical switch and the rest of the network.

We did not notice the problem until the colleagues who manage the Exchange installation pointed it out to us.

They pointed out that communication between Exchange (connected to the NSX logical switch behind the ESG) and the domain controller (connected to another VLAN in the datacenter) doesn't work as it should. Communication that uses port 3269 (global catalog) is not stable and breaks.

Servers connected to the NSX logical switch (and routed through the ESG) have network visibility to the domain controller: a regular ping succeeds 100% of the time, but when we ping a specific port on the domain controller (in this case 3269) we see some loss, which apparently is a problem for Exchange. The loss is sometimes smaller, sometimes bigger.
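To put a number on the "sometimes smaller, sometimes bigger" loss, the port ping can be repeated in a small script. The following is only a minimal sketch, assuming that a completed TCP handshake counts as a successful port ping; the function name `probe_tcp_port` and its parameters are made up for illustration and are not part of any NSX or vSphere tooling.

```python
import socket
import time


def probe_tcp_port(host, port, attempts=20, timeout=1.0, interval=0.0):
    """Try `attempts` TCP connections to host:port.

    Returns a tuple (failures, loss_percent).
    """
    failures = 0
    for _ in range(attempts):
        try:
            # A completed TCP handshake counts as a successful "port ping".
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            failures += 1
        # Optional pause between attempts (0 means back-to-back probes).
        time.sleep(interval)
    return failures, 100.0 * failures / attempts
```

For example, `probe_tcp_port("dc01.example.local", 3269, attempts=50, interval=0.5)` would probe the global catalog port 50 times; the host name here is a placeholder, not a name from this environment. Running the same call from a VM on the logical switch and from a VM on the VLAN port group should make the difference measurable.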

If we move the servers from the logical switch to a standard VLAN port group on the vDS, the problem does not occur.

The deployed ESG is Quad Large, so performance should not be a problem.

We are using the following versions:

NSX: 6.4.4.11197766

vCenter: 6.7.0 11727113

ESXi: ESXi6.7u1-10302608_20181010

If someone has any advice, I would appreciate the help.

Thank you and best regards

Goran

Ping port error.jpg

Sreec
VMware Employee

Servers connected to the NSX logical switch (and routed through the ESG) have network visibility to the domain controller: a regular ping succeeds 100% of the time, but when we ping a specific port on the domain controller (in this case 3269) we see some loss, which apparently is a problem for Exchange. The loss is sometimes smaller, sometimes bigger.

I assume the server you are referring to here is the same Exchange server that is behind the LB? Sorry, I'm a bit confused on that point. If that is the case, to keep the LB out of the equation, I would recommend taking one of the Exchange nodes out of the LB and testing connectivity via the Edge, from the transit VLAN to the DC VLAN, to confirm whether you see packet/ping drops.

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 7x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Goran2018
Contributor

Hi Sreec,

Thank you for the reply.

The servers I am referring to are the Exchange servers, but the same happens with other servers connected to the logical switch behind the ESG.

We have:

the ESG uplink connected to a standard VLAN, where some other production servers (and the DCs) are located, and

the ESG internal interface connected to the logical switch.

We migrated Exchange from the logical switch to the standard VLAN, and the problem is gone for Exchange. We now have a few servers behind the ESG that we use for testing.

In any case, the problem does not appear for the same servers when they are connected to the standard VLAN (where the ESG uplink is, too).

Best regards

Goran

Goran2018
Contributor

I have new information that I wanted to share.

If someone has an idea, please suggest it.

It looks like the communication problem is visible only between virtual servers connected to the NSX logical switch and virtual or physical servers located in the same blade chassis.

There are two domain controllers: one is a VM, the other is physical. Both are located in the same chassis.

Virtual servers connected to the NSX logical switch and routed through the ESG or DLR lose pings to the domain controllers located in the same blade chassis.

If you disconnect these virtual servers from the logical switch and connect them to a VLAN port group on the vDS, ping succeeds 100% of the time.

Crazy fact: when the virtual servers are connected to the virtual switch, ping to some distant domain controllers succeeds 100% of the time, at the same time as pings to the local domain controllers are failing.
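One way to back up the "at the same time" observation is to probe the local and distant domain controllers in the same loop, so that all of them are sampled over the same time window. This is only a rough sketch under the same assumption as a TCP port ping (a completed handshake means success); `tcp_ok`, `compare_targets` and the target labels are hypothetical names chosen for illustration.

```python
import socket
import time


def tcp_ok(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def compare_targets(targets, rounds=30, interval=0.0, timeout=1.0):
    """Probe every target once per round and return {label: loss_percent}.

    Interleaving the probes means all targets are sampled over the same
    time window, so a difference in loss is not just a difference in timing.
    """
    failures = {label: 0 for label in targets}
    for _ in range(rounds):
        for label, (host, port) in targets.items():
            if not tcp_ok(host, port, timeout):
                failures[label] += 1
        # Optional pause between rounds (0 means back-to-back rounds).
        time.sleep(interval)
    return {label: 100.0 * count / rounds for label, count in failures.items()}
```

A call such as `compare_targets({"local DC": ("dc-local.example.local", 3269), "distant DC": ("dc-remote.example.local", 3269)}, rounds=100, interval=0.5)` would return one loss percentage per label; both host names are placeholders. If only the local entries show loss while the distant ones stay at 0%, that would support the observation above.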

Could this be some kind of performance problem?

I would appreciate any advice, given how strange the problem is.

Best regards

Goran

Sreec
VMware Employee

Apologies for the late reply. Can you provide an update on the queries below?

1. When the DLR, ESG and AD virtual machine are placed on the same blade, do you see any drops? If so, please provide the traceroute output.

2. When the ESG, DLR and AD are scattered across multiple blades in the same chassis, you mentioned there are drops. Please provide the traceroute output in this case as well.

3. Are these UCS blades?

4. Are the ESGs in A-S mode with OSPF/BGP?

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 7x|Cisco Certified Specialist