VMware Networking Community
rajeevsrikant
Expert

NSX Edge Gateway - HA

I was reading about NSX Edge Gateway HA and found the link below.

NSX 6 Documentation Center

Local link IPs are assigned to HA virtual machines in the NSX Edge HA so that they can communicate with each other.

You can specify management IP addresses to override the local links.

I would like to know which IP range is assigned by default to the HA link-local addresses.

I have two NSX Edge Gateways in ECMP and want to change from ECMP to HA, so I would like to understand the link-local IPs.

14 Replies
rajeevsrikant
Expert

Following up on the question above: what exactly do the dead time and frequency mean in this output?

EDGE-1-0> show service highavailability

Highavailability Status: running
Highavailability Unit Name: edge-1-0
Highavailability Unit State: active
Highavailability Interface(s): vNic_1
Unit Poll Policy:
   Frequency: 3 seconds
   Deadtime: 15 seconds
   Stateful Sync-up Time: 10 seconds
Highavailability Healthcheck Status:
   Peer host [edge-1-1 ]: good
   This host [edge-1-0 ]: good
Highavailability Stateful Logical Status:
   File-Sync running
   Connection-Sync running
      xmit xerr rcv rerr
      21612 0 13920 0

smitmartijn
VMware Employee

Hi,

The default IP range that the NSX Edges use comes from RFC 3927, which means the addresses are in 169.254.0.0/16. This range should not be used in your production networks, so there is usually no danger of overlapping IPs.
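
As a quick check of the range mentioned above, Python's standard `ipaddress` module can confirm whether an address falls inside the RFC 3927 link-local block. This is only an illustrative sketch; the function name is hypothetical and the sample addresses are arbitrary examples.

```python
import ipaddress

# RFC 3927 IPv4 link-local block.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_ha_link_local(addr: str) -> bool:
    """Return True if addr lies inside the IPv4 link-local range."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


if __name__ == "__main__":
    for addr in ("169.254.1.1", "192.168.5.1"):
        print(addr, is_ha_link_local(addr))
```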

As for your other question: the frequency timer is the interval, in seconds, at which the Edge sends heartbeat messages. The dead timer is the time, in seconds, that a standby Edge waits without receiving any heartbeat messages before it takes over as the active Edge. At this time, the dead timer can be set to a minimum of 6 seconds.
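
The interaction between the two timers can be illustrated with a small model. This is only a sketch under simplifying assumptions (heartbeats sent at exact multiples of the frequency, no packet loss or delay), not how the Edge implements it. The function name is hypothetical, and the 3-second and 15-second values are the ones shown in the CLI output earlier in the thread.

```python
def takeover_time(frequency: float, dead_time: float, fail_at: float) -> float:
    """Time at which the standby would declare the active Edge dead.

    Assumes heartbeats are sent at t = 0, frequency, 2*frequency, ... and
    that the active Edge sends nothing after fail_at.
    """
    # Last heartbeat that was actually sent before (or at) the failure.
    last_heartbeat = (fail_at // frequency) * frequency
    # The standby waits dead_time after the last heartbeat it received.
    return last_heartbeat + dead_time


if __name__ == "__main__":
    # Frequency 3 s and dead time 15 s, as in the CLI output.
    t = takeover_time(frequency=3, dead_time=15, fail_at=10)
    print(f"active fails at t=10 s, standby takes over at t={t} s")
```

In this model the detection delay falls between `dead_time - frequency` and `dead_time`, depending on where in the heartbeat cycle the failure occurs.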

Hope that helps!

rajeevsrikant
Expert

Thanks.

I can find the option to change the dead interval, but is there any way to change the frequency interval?

Can the frequency interval be changed via the REST API or by some other means?

rajeevsrikant
Expert

Any inputs?

bayupw
Leadership

Just to add more information on the HA vNIC.

Based on the VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0, if you don't specify an interface for HA, the first internal interface created is selected.

pastedImage_0.png

An uplink interface is not supported for HA, so at least one internal interface is required.

Specifying an IP is not mandatory; the vNIC will automatically get a 169.254.1.x/30 IP address.
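
To see why a /30 is enough for an HA pair, the usable hosts in such a block can be listed with Python's `ipaddress` module. The exact block `169.254.1.0/30` is an assumption for illustration; the thread only states 169.254.1.x/30.

```python
import ipaddress

# Assumption: the automatically assigned block is 169.254.1.0/30.
ha_net = ipaddress.ip_network("169.254.1.0/30")

# A /30 leaves exactly two usable host addresses, one per HA peer.
usable = [str(host) for host in ha_net.hosts()]
print(usable)  # ['169.254.1.1', '169.254.1.2']
```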

pastedImage_1.png

As per the design guide, a VXLAN-backed port group is recommended, so you can have one logical switch per NSX Edge.

This way you don't need to manage the IP address of the HA vNIC; as long as you have one logical switch per NSX Edge, you will be safe and there will be no IP conflict, since the network is isolated just for the Edge HA.

Regarding the poll frequency, I can't find any documentation, but based on testing the frequency seems to be 1 second for every 4 seconds of dead time.

pastedImage_6.png

pastedImage_8.png

pastedImage_4.png

pastedImage_3.png

pastedImage_2.png

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw

rajeevsrikant
Expert

As per VMware's recommendation, the dead timer should be 9 seconds.

So if I set the dead timer to 9 seconds, what will the frequency be? Does the frequency change based on the dead timer interval?

From your output, my understanding is that for a 9-second dead time the frequency will be 2.25 seconds.

So the HA heartbeat is exchanged and checked every 2.25 seconds. If no heartbeat is received for 9 seconds, the peer is considered dead.

Let me know if my understanding is right.
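
The arithmetic behind the 2.25-second figure can be written out as follows. The 1:4 ratio is only the relationship observed in testing earlier in the thread, not a documented formula, and the function name is hypothetical.

```python
def estimated_frequency(dead_time: float, ratio: float = 4.0) -> float:
    """Estimate the heartbeat frequency from the dead time.

    The default ratio of 4 is an assumption taken from the testing
    reported in this thread; it is not documented behavior.
    """
    return dead_time / ratio


if __name__ == "__main__":
    for dead_time in (6, 9):
        print(f"dead time {dead_time} s -> ~{estimated_frequency(dead_time)} s frequency")
```

Note that the CLI output earlier in the thread shows a frequency of 3 seconds with a 15-second dead time, which does not match a strict 1:4 ratio, so the result is best treated as a rough estimate.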

bayupw
Leadership

Yes, 9 seconds is safe as per the design guide.

pastedImage_0.png

I haven't found any document that explains this, but at least that would be the frequency based on my testing.

pastedImage_1.png

rajeevsrikant
Expert

In HOL LAB 1703, I was enabling HA on the Edge Gateway.

For the HA configuration I used the vNIC "Transit_Network_01".

pastedImage_0.png

When HA is enabled, it uses IP address 169.254.1.1 on vNIC_1 (transit network).

The vNIC_1 transit network has IP address 192.168.5.1. Is it normal to use 169.254.1.1 on vNIC_1 even though it has IP address 192.168.5.1?

pastedImage_5.png

Also, the active and standby Edges will be on different hosts.

The heartbeat between the active and standby Edge Gateways will pass through the physical switch.

Is any configuration, such as a VLAN or other settings, required on the physical switch to allow the heartbeat through?

bayupw
Leadership

The transit network in the HOL is probably on a VXLAN logical switch, so there is no need to configure a VLAN specifically for the heartbeat; the only VLAN you need to tag is the VXLAN transport/VTEP VLAN.

If the HA vNIC is connected to a VLAN-backed dvPortGroup, then you need to tag the dvPortGroup VLAN just like any normal VLAN-backed dvPortGroup; this setting is not specific to Edge HA.

It will use 169.254.1.x/30 if you do not specify any IP address under Management IPs.

It is easier to have a dedicated logical switch per Edge HA pair.

If you share the same logical switch across multiple Edge HA pairs, the IP addresses will conflict or HA will not work properly.
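
The conflict described above can be sketched as a simple planning check. Everything here is hypothetical: the pair names, the logical switch names, and the assumption that every pair falls back to the same two default addresses.

```python
from collections import defaultdict

# Assumption: each HA pair without management IPs uses these two addresses.
DEFAULT_HA_IPS = ("169.254.1.1", "169.254.1.2")


def find_conflicts(pairs):
    """Return {(switch, ip): [pair names]} for IPs claimed by more than one pair.

    pairs maps an HA pair name to (logical switch name, tuple of HA IPs).
    """
    claimed = defaultdict(list)
    for name, (switch, ips) in pairs.items():
        for ip in ips:
            claimed[(switch, ip)].append(name)
    return {key: names for key, names in claimed.items() if len(names) > 1}


if __name__ == "__main__":
    plan = {
        "edge-1": ("ls-ha-shared", DEFAULT_HA_IPS),
        "edge-2": ("ls-ha-shared", DEFAULT_HA_IPS),
        "edge-3": ("ls-ha-edge-3", DEFAULT_HA_IPS),
    }
    for (switch, ip), names in find_conflicts(plan).items():
        print(f"{ip} on {switch} is claimed by {names}")
```

With a dedicated logical switch per pair, as for the hypothetical `edge-3`, the same default addresses never meet on one Layer 2 segment.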

Sharing the same logical switch with the transit network would work; as on a normal Layer 2 network/VLAN, you can have multiple networks/subnets in that VLAN.

But for production purposes, I would dedicate a logical switch to Edge HA.

rajeevsrikant
Expert

Thanks

I am planning to have a separate logical switch for HA, as you suggested.

rajeevsrikant
Expert

pastedImage_0.png

I had the above scenario.

The Edge Gateway was in active-standby mode. I kept the OSPF timers at 30 and 120 as recommended by VMware, and the HA dead timer at 9 seconds, also as recommended by VMware.

When I shut down the active Edge Gateway, north-south traffic was interrupted for nearly 150 seconds.

The switchover from the active to the standby Edge Gateway was fine, but there was a long interruption of 150 seconds with the 30/120 OSPF timers recommended by VMware.

Is this normal?

If it is normal, what is the reason for it?

I would appreciate anyone's input at the earliest.

rajeevsrikant
Expert

Any inputs?

rajeevsrikant
Expert

I have identified the exact reason for this.

OSPF adjacencies are deleted with MD5 authentication after NSX Edge HA failover (2147787) | VMware K...

It is related to the bug mentioned above.

Has anyone faced a similar issue with NSX versions above 6.2.4?

rajeevsrikant
Expert

I got confirmation from VMware that this issue exists in all NSX versions and that at present there is no fix for it.
