VMware Networking Community
5mall5nail5
Enthusiast

SSH to VM behind ESG/DLR stops working (diagram included) -- "software caused connection abort"

Hi all -

I am playing with NSX in a lab. I have the following scenario set up:

NSX Diagram.jpg

I'd like to get OSPF working with pfSense, but for now I have static routes, which work fine. I have route redistribution set up for connected and static. When I run "show ip route" on the ESG I see:

kcloud1esg1-0> show ip route

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

Total number of routes: 5

S       0.0.0.0/0            [0/0]         via 192.168.50.1
C       10.250.250.0/24      [0/0]         via 10.250.250.1
O   E2  10.251.251.0/24      [110/0]       via 10.250.250.2
O   E2  10.252.252.0/24      [110/0]       via 10.250.250.2
C       192.168.50.0/24      [0/0]         via 192.168.50.254
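To make it easier to reason about which next hop the ESG picks for each end of the SSH session, here is a minimal Python sketch that parses the route lines above and does a longest-prefix-match lookup. The function names (parse_routes, lookup) are hypothetical helpers, not part of any NSX tooling, and the parser only assumes the "code, prefix, [AD/metric], via, next hop" layout shown in this output.

```python
import ipaddress

# Route lines copied from the "show ip route" output above.
ROUTE_OUTPUT = """\
S       0.0.0.0/0            [0/0]         via 192.168.50.1
C       10.250.250.0/24      [0/0]         via 10.250.250.1
O   E2  10.251.251.0/24      [110/0]       via 10.250.250.2
O   E2  10.252.252.0/24      [110/0]       via 10.250.250.2
C       192.168.50.0/24      [0/0]         via 192.168.50.254
"""


def parse_routes(text):
    """Return a list of (code, network, next_hop) tuples.

    Lines that do not end in 'via <next hop>' (legend, totals) are skipped.
    """
    routes = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[-2] != "via":
            continue
        code = " ".join(tokens[:-4])              # e.g. "S", "C", "O E2"
        network = ipaddress.ip_network(tokens[-4])
        routes.append((code, network, tokens[-1]))
    return routes


def lookup(routes, address):
    """Longest-prefix match: return the most specific route containing address."""
    addr = ipaddress.ip_address(address)
    matches = [route for route in routes if addr in route[1]]
    if not matches:
        return None
    return max(matches, key=lambda route: route[1].prefixlen)


if __name__ == "__main__":
    table = parse_routes(ROUTE_OUTPUT)
    # The VM, the desktop, and an arbitrary Internet address.
    for ip in ("10.251.251.200", "192.168.50.19", "8.8.8.8"):
        print(ip, "->", lookup(table, ip))
```

With this table, the VM at 10.251.251.200 resolves to the OSPF E2 route via 10.250.250.2, the desktop at 192.168.50.19 falls in the connected 192.168.50.0/24 network, and anything else follows the static default via 192.168.50.1.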

I have the firewall turned off and all of my VMs are in the "Exclusion List" within the NSX Manager (not entirely sure what this does yet but it seemed to be something I might want to use up front).

I can SSH to 10.251.251.200 from 192.168.50.19 and the session connects. From 10.251.251.200 I can ping Google and update the Linux VM with no problem. However, after a seemingly random interval of 15 to 45 seconds, the SSH session drops and I cannot figure out why! If I restart the PuTTY session it re-establishes just fine, but it drops again.

Any thoughts?

Edit:  I should mention I am running NSX 6.3.5, but this occurred on 6.3.3 as well.

Edit 2:  In an effort to not be defeated by this, I've performed a packet capture from the desktop I am SSH'ing from.  Got some yucky stuff just prior to the SSH drop:

wireshark.jpg

Thanks!

3 Replies
parmarr
VMware Employee

Please review the "Firewall Rules with a Custom Layer 3 Protocol" section of the NSX Administration Guide, which may assist in resolving this issue - https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/com.vmware.nsx.admin.doc/GUID-293B8FB3-8261-48...

Sincerely, Rahul Parmar VMware Support Moderator
Mparayil
Enthusiast

I suspect a TCP timeout mismatch between the server and the firewall. Can you check what TCP timeout is set on the Linux server, match it exactly to the value on the ESG, and see how it behaves?

You can refer to the KB article below for how to check and set the TCP timeout value.

vCNS/NSX Edge Firewall TCP Timeout Values (2101275)

VMware Knowledge Base

I would recommend modifying it on the server to match the ESG and seeing whether that helps. Give it a try!
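As a starting point for the comparison, here is a minimal Python sketch that reads the Linux server's TCP keepalive settings so they can be set against the Edge firewall timeout. The function names are hypothetical, and the firewall timeout is a value you supply yourself after reading it from the ESG as described in the KB article; it is not queried from NSX here. Note that these kernel settings only affect sockets that have SO_KEEPALIVE enabled.

```python
from pathlib import Path

# Standard Linux sysctl files for TCP keepalive behaviour.
KEEPALIVE_FILES = (
    "tcp_keepalive_time",    # idle seconds before the first keepalive probe
    "tcp_keepalive_intvl",   # seconds between unanswered probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive(proc_dir="/proc/sys/net/ipv4"):
    """Return the keepalive settings as a dict of integers."""
    base = Path(proc_dir)
    return {name: int((base / name).read_text().strip()) for name in KEEPALIVE_FILES}


def keepalive_beats_timeout(settings, firewall_idle_timeout_s):
    """True if the first keepalive probe is sent before the firewall
    would expire an idle session (timeout value is supplied by the caller)."""
    return settings["tcp_keepalive_time"] < firewall_idle_timeout_s


if __name__ == "__main__":
    current = read_keepalive()
    print(current)
    # Assumption: replace 3600 with the TCP timeout actually configured on the ESG.
    print("keepalive fires before firewall timeout:",
          keepalive_beats_timeout(current, 3600))
```

Run this on the Linux VM itself. If the keepalive time is longer than the firewall timeout, an idle session could be expired on the Edge before the server ever sends a probe.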

Regards

Manoj VP

bayupw
Leadership

Hi

Do you see this issue only after changing the routing from static to dynamic (OSPF), or did you also have it when you were on static routing?

If it's only with OSPF, check whether OSPF drops at the same time your SSH session drops.

You can try debugging OSPF packets from the Edge, for example.

I have seen dynamic routing issues with some firewalls, though I don't have much experience with pfSense.

On the vDS side, how many vmnics do you have, and which load balancing policy are you using?

Not sure if this is acceptable in your environment, but if you have multiple vmnics, you can try removing or disconnecting one of them from the vDS dvUplink to rule out a vDS load balancing issue.

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw