nookzzz
Contributor

Guest VMs unpredictably lose connectivity when using an NSX-V edge gateway

Good day mate,

I'm currently having an issue with NSX-V 6.4.x.

We currently have two vCenters working in linked mode (vCenter-A and vCenter-B).

Cloud Director runs on top of these linked vCenters.

NSX-V version 6.4.11 is configured on both vCenters and in Cloud Director.

We've configured the basic components (NSX Manager and NSX Controller) and deployed NSX edge gateways for most of our customers.

 

The problem we're facing right now is that on vCenter-A, edge gateways randomly hang. The symptoms are as follows:

Let's say we have three VMs:

1. Edge gateway VM = 192.168.1.1
2. VM-A = 192.168.1.2
3. VM-B = 192.168.1.3

  1. VMs behind this edge gateway lose internet connectivity (their public IPs are not pingable from my laptop), and the VMs cannot ping the edge gateway VM.
  2. On the edge gateway VM, the ARP entries for the other VMs using this edge (VM-A and VM-B) are missing from the ARP output.
  3. When we log in to the edge gateway VM console, it can still reach the internet (tested with 8.8.8.8).
  4. From the edge gateway VM, we can't reach VM-A or VM-B (pings to 192.168.1.2 and 192.168.1.3 are unreachable).
  5. The VMs behind this edge gateway can't reach the edge gateway (pings to 192.168.1.1 are unreachable) and can't reach the internet.

Note that this only happens on vCenter-A; vCenter-B has no issues at all.
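The checks in the symptom list can be recorded as data so that each incident is compared against the same pattern. Below is a minimal Python sketch, assuming the ping results are collected by hand from the edge gateway console and from a guest VM; `PingResults` and `matches_reported_symptoms` are hypothetical names for illustration, not part of any NSX, vSphere, or Cloud Director tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PingResults:
    """Ping results recorded by hand during an incident (hypothetical structure)."""
    edge_to_internet: bool        # from the edge console, e.g. ping 8.8.8.8
    edge_to_vms: Dict[str, bool]  # VM IP -> reachable from the edge console
    vms_to_edge: Dict[str, bool]  # VM IP -> that VM could ping the edge (192.168.1.1)


def matches_reported_symptoms(r: PingResults) -> bool:
    """True when the edge still reaches the internet but the edge and
    every VM behind it cannot reach each other in either direction."""
    return (
        r.edge_to_internet
        and not any(r.edge_to_vms.values())
        and not any(r.vms_to_edge.values())
    )


if __name__ == "__main__":
    # Example values mirroring the incident described in the post.
    incident = PingResults(
        edge_to_internet=True,
        edge_to_vms={"192.168.1.2": False, "192.168.1.3": False},
        vms_to_edge={"192.168.1.2": False, "192.168.1.3": False},
    )
    print(matches_reported_symptoms(incident))  # True
```

The pattern it tests for is just a restatement of symptoms 3 to 5: the edge's path to the internet still works while the path between the edge and its VMs fails both ways.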

So far, we have upgraded NSX on vCenter-A from 6.4.11 to 6.4.14, but it didn't help: the issue persists after the upgrade.

 

We do have workarounds. When the issue happens, we get alerted that the public IP is unreachable, and then we apply one of the following:

  1. Redeploy the edge gateway from Cloud Director, which fixes the issue. This is not permanent: some edge gateways hit the issue again, while others haven't so far.
  2. Migrate the edge gateway VM to the same ESXi host as the VMs and create a rule that keeps them together (192.168.1.1-3 always stay on the same host). This is our permanent fix for now, but I know it's not a good idea.

 

We have about a hundred edge gateway VMs on vCenter-A, but the issue only affects one edge at a time (the others remain stable; each time it is a different, seemingly random edge gateway).

One more thing: the physical hosts and switches for vCenter-A and vCenter-B are in the same chassis and rack. Most of them are intermixed and use the same hardware and configuration, yet this never happens on vCenter-B.

 

vCenter version

7.0.3 (Build 20990077)

 

ESXi version

7.0.3 (20842708)
