I have two Debian 7 VMs connected to the same virtual wire. When both VMs are running on the same ESXi host, ping, ARP and SSH all work. When each VM runs on a different ESXi host in the cluster, ping and ARP work fine but SSH fails. On the SSH server side I see the SYN arriving, but the server fails to reply with a SYN-ACK.
This is in a nested ESXi 5.5 environment.
vShield Manager version is 5.5.0.
I'm using VXLAN.
The VMs are Debian 7.
I'm guessing it's an MTU problem of some sort. The pings are 64 bytes or so, but SSH can send full-size 1500-byte packets, and the encapsulated packet is getting fragmented somewhere (which breaks VXLAN). Make sure everything along the path has an MTU of at least 1600, although I usually just use 9000. Also make sure you have promiscuous mode enabled on the top-level ESXi vSwitches.
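To make the arithmetic behind that MTU advice concrete, here is a rough sketch in shell. It assumes the standard header sizes for VXLAN over IPv4 with no VLAN tags and no IP options; the function name vxlan_required_mtu is made up for illustration. The 1600 figure above simply leaves headroom over the result.

```sh
#!/bin/sh
# Header sizes in bytes (assumed: IPv4 underlay, no VLAN tags, no IP options).
INNER_ETH=14   # inner Ethernet header carried inside the tunnel
VXLAN_HDR=8    # VXLAN header
UDP_HDR=8      # outer UDP header
OUTER_IP=20    # outer IPv4 header

# Smallest underlay IP MTU that carries a guest IP packet of size $1
# without the encapsulated packet needing fragmentation.
# (Hypothetical helper, not part of any VMware tooling.)
vxlan_required_mtu() {
    echo $(( $1 + INNER_ETH + VXLAN_HDR + UDP_HDR + OUTER_IP ))
}

vxlan_required_mtu 1500   # prints 1550
```

So a guest sending 1500-byte IP packets needs at least 1550 on every underlay hop, a little more if VLAN tags are involved.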
My VMs are set to MTU 1500. The underlying network is all set to 9000, with promiscuous mode, forged transmits and MAC address changes all enabled.
When both VMs are running on the same ESXi host it works fine. When each VM is running on a different ESXi host, SSH fails. Ping and ARP continue to work fine.
Today I changed the network adapters of the VMs to VMXNET3 (they were E1000), but no luck.
Right, so the ping working but real traffic not working says MTU to me. I can't think of any other fundamental difference between the two traffic types. One way to test this would be to increase the size of the ping packets and see if they break when they get bigger:
ping -s 1000 10.0.0.1
ping -s 1472 10.0.0.1
ping -s 1500 10.0.0.1
This might also tell you if your network has the Don't Fragment bit set somewhere.
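A slightly stricter variant of the same test sets the Don't Fragment bit on the pings, so an oversized packet fails outright instead of being quietly fragmented. This is only a sketch: it assumes the Linux iputils ping on the Debian guests (which supports -M do), reuses the 10.0.0.1 address from the example above, and the helper name icmp_payload_for_mtu is invented here. It prints the commands rather than running them.

```sh
#!/bin/sh
ICMP_HDR=8    # ICMP echo header, bytes
IPV4_HDR=20   # IPv4 header without options, bytes

# ICMP payload size (the value for ping -s) that yields an IP packet
# of exactly $1 bytes. Hypothetical helper for illustration.
icmp_payload_for_mtu() {
    echo $(( $1 - IPV4_HDR - ICMP_HDR ))
}

TARGET=10.0.0.1   # placeholder address taken from the example above

# Print one DF-bit ping per packet size to try; run them by hand.
for mtu in 1000 1400 1500; do
    size=$(icmp_payload_for_mtu "$mtu")
    echo "ping -M do -c 3 -s $size $TARGET"
done
```

With a guest MTU of 1500, sizes above 1472 should be rejected locally when DF is set, so the interesting case is whether the 1472-byte ping gets a reply across hosts.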
Just to clarify, you mean two Linux VMs nested under ESXi hosts that are themselves VMs, right? Are the ESXi host VMs on different physical hosts or the same one?
Look at the attached jpg. The way I'm picturing it is that SSH works in scenario 1, then fails in scenario 2. Do I have that right?
I wasn't able to spend time on it. I will follow your advice and try changing the packet size.
See also the attached jpeg of my nested setup.
I can ping with any size I want; it all works fine in both scenarios.
I have the exact same issue... bump!!!!
Ping, ARP etc. work fine across different hosts and different subnets. TCP/UDP breaks.
ESXi packet capture: traffic appears unidirectional, e.g. TCP SYNs get to the other end, but nothing comes back.
MTU again is fine: 1500-byte pings between the hosts are OK (1508 with overhead), and 1580 VTEP to VTEP works as per the suggested VXLAN troubleshooting.
All firewall functions are off.