VMware Cloud Community
bunny101
Contributor

VMware HA and max-segment-size at the router

Good morning, all!

Not sure if this is the correct place to post this question, but here goes....

We have two datacenters, each with a vSphere 4.1 cluster. The vCenter is at the main DC and HA is enabled on each cluster. The DCs are connected by an MPLS circuit.

We want to implement a VPN to fail over through our Internet connections in the event that the MPLS connection goes down. However, during testing the hosts in the remote DC show up as isolated. It appears that they can't be pinged, even though I can RDP to a Windows box during testing.

During failover the VPN connection is routed through a GRE tunnel between the two datacenter routers. On advice from Cisco we lowered the MTU and adjusted the maximum segment size: the MTU is set to 1400 and the MSS to 1360. This allows for the overhead of the IP headers, the GRE tunneling, and the VPN inside that.

Would this be why we get the host isolation response from the local vCenter server? If the ICMP packets are getting fragmented, would running another instance of vCenter in the second DC help?

We are working toward a migration to vSphere 5.5 soon; this is part of the design and implementation of that migration.

Thanks to all for looking!

Gregg

1 Reply
MKguy
Virtuoso

I recently worked on a similar case of a remote site that required MSS tuning at the gateway, though not for vSphere hosts.

Just to confirm, when you say they show up as "isolated" you actually mean "disconnected" in vCenter, right? HA isolation is a completely different beast and independent of vCenter reachability. Or do you have a single cluster stretched across both datacenters?

MTU is set to 1400 and the max segment is set to 1360.  This is to allow for the frame size overhead of the IP headers, the GRE tunneling, and the VPN inside that.

The IPv4 and TCP headers are already 20 bytes each, which is exactly the 40 bytes your 1400/1360 settings account for. If any more protocol overhead is added inside the tunnel, you would need to reduce the MSS further.
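As a sketch of the arithmetic (assuming a plain IPv4 header and a TCP header without options, 20 bytes each; the 12-byte figure below is just an example for TCP timestamps):

```python
# MSS that fits inside a given tunnel MTU, under the assumption of
# a 20-byte IPv4 header and a 20-byte option-less TCP header.
IP_HDR = 20
TCP_HDR = 20

def max_mss(tunnel_mtu, extra_overhead=0):
    """Largest TCP payload that fits in one tunnel-sized packet."""
    return tunnel_mtu - IP_HDR - TCP_HDR - extra_overhead

print(max_mss(1400))      # 1360 -- matches your current setting
print(max_mss(1400, 12))  # 1348 -- if e.g. TCP timestamps are in use
```

So 1360 is correct for the plain case, but leaves no headroom for anything extra.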

Do you see any ICMP "fragmentation needed" packets when you capture traffic on the ESXi hosts or vCenter?

Have you confirmed that the TCP MSS value in the TCP SYN/SYN-ACK packets is correctly adjusted from one end to the other, in both directions?

What if you reduce the MTU of the vmkernel management port to match your link MTU (or go below it)?

It appears that they can't be pinged,

Default ICMP ping payloads vary between 8 and 32 bytes, which results in IP packets well below 100 bytes including the IP and ICMP headers. If those don't go through, then it sounds like a different issue (routing, firewalls or whatever). Run some traceroutes between the vCenter and the ESXi hosts (in both directions) and see where you get stuck.
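Quick back-of-the-envelope on the on-the-wire sizes (standard IPv4 and ICMP echo header sizes; the payload figures are the common OS defaults):

```python
# On-the-wire size of a default ping: IPv4 header (20 bytes)
# + ICMP echo header (8 bytes) + payload.
IP_HDR = 20
ICMP_HDR = 8

for name, payload in (("windows", 32), ("linux", 56)):
    print(name, IP_HDR + ICMP_HDR + payload)
# windows 60, linux 84 -- nowhere near even a heavily reduced MTU
```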

-- http://alpacapowered.wordpress.com