VMware Cloud Community
TIANZAN
Enthusiast

Heartbeat Wait Time

Hi,

We have 3 hosts in an HA cluster. Heartbeating is in place between them over the network as well as through HA datastore heartbeating using 2 datastores.
We have a physical network switch in the rack that needs to be turned off for a brief period for an adapter change. It will be off for about 5 minutes.

Is there a chance this would cause any issues with the VMs? I understand that while the switch is powered off, internet access will not be available and the services will be out of reach.
But will everything resume automatically once it's powered back on? Is there any chance that the hosts will trigger HA and try to reboot the VMs? How long will HA wait for a heartbeat?

1 Solution

Accepted Solutions
amohammadimir
Hot Shot

Disable Host Monitoring before starting the maintenance.

VMware Knowledge Base

Please remember to mark the replies as answers if they helped.

10 Replies
TIANZAN
Enthusiast

Thanks a lot, Amir. I will do that.

In addition to HA, is there anything else I need to be cautious of that could cause issues with the hosts or VMs during the power cycle of the switch?

Basically, the activity is simple. This particular switch doesn't have dual power. I just want to turn it off, connect a power cable from a dual power supply, and turn it back on.

This setup is in production. Is there anything else I need to do so that everything in production continues as it is once the switch is back up and running?

Regards

depping
Leadership

Disable vSphere HA as mentioned above. Also, I don't know what kind of storage you have, but if it is network attached and goes across the same switch, then 5 minutes of downtime would trigger an "All Paths Down" (APD) condition, which would impact all VMs. That is also something to take into consideration!

TIANZAN
Enthusiast

Thanks depping.

What we have are Fibre Channel and iSCSI connections directly from the datastores to the hosts. The switch is not involved in them.
However, there is a NAS mounted as a datastore that is connected via the switch. So will I have to unmount this datastore before I turn off the switch?

Also, I have never encountered an "All Paths Down" so far. Will it recover automatically once the switch is back up and running? Or will I have to go into the configuration of each host and remount the storage devices?
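
As a rough way to reason about which datastores would lose connectivity, the layout described above can be modelled in a few lines of Python. This is only an illustrative sketch: the datastore names and the `via_switch` flag are hypothetical placeholders, not values from the actual environment, and nothing here talks to vCenter or a host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    name: str          # hypothetical label, not a real datastore name
    transport: str     # e.g. "Fibre Channel", "iSCSI", "NAS"
    via_switch: bool   # True if the storage path crosses the switch being powered off


def exposed_to_switch_outage(datastores):
    """Return the names of datastores whose path crosses the switch."""
    return [d.name for d in datastores if d.via_switch]


# Layout as described in the post: FC and iSCSI are direct, the NAS goes via the switch.
inventory = [
    Datastore("fc-ds", "Fibre Channel", via_switch=False),
    Datastore("iscsi-ds", "iSCSI", via_switch=False),
    Datastore("nas-ds", "NAS", via_switch=True),
]

print(exposed_to_switch_outage(inventory))
```

With this layout only the NAS-backed datastore is listed, so it is the one to think about before the switch goes down.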

depping
Leadership

It depends on how you configured HA. If HA is configured to respond to APD or PDL, it will kill those VMs and restart them if possible. An APD is declared when there has been no storage connection for more than 140 seconds. At that point I/O retries will stop, which could cause your operating system to fail. If the OS doesn't fail, then the VM and host will recover when the storage returns.
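
The timing described above can be sketched as a small calculation. This is a minimal illustration that assumes only the 140-second figure quoted in this reply; the function name and the returned labels are hypothetical, and the code does not query or change anything on a host.

```python
APD_TIMEOUT_SECONDS = 140  # figure quoted in the reply above


def apd_phase(outage_seconds: float, apd_timeout: float = APD_TIMEOUT_SECONDS) -> str:
    """Classify a storage outage against the APD timeout.

    Returns "retrying" while the outage is within the timeout,
    and "apd-timeout" once it has lasted longer than the timeout.
    """
    if outage_seconds < 0:
        raise ValueError("outage_seconds must not be negative")
    if outage_seconds > apd_timeout:
        return "apd-timeout"
    return "retrying"


# Planned outage from the original question: about 5 minutes.
print(apd_phase(5 * 60))  # prints "apd-timeout"
# A short blip of one minute stays within the timeout.
print(apd_phase(60))      # prints "retrying"
```

Under this assumption, a 5-minute switch outage is longer than the timeout, so any datastore reached through the switch would pass the 140-second mark before the switch is back.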

TIANZAN
Enthusiast

OK. I am planning to increase the APD time limit to 20 minutes just to be sure, so it will keep retrying I/O until the switch is back up and running. This should be fine, right?

depping
Leadership

No, please don't do this, as this will result in the hypervisor failing. Please leave the APD setting as is. Simply set the vSphere HA APD response to Disabled!

TIANZAN
Enthusiast

OK, well noted.

I guess the instructions at this URL would be OK to follow for this?

//docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.avail.doc/GUID-DF3123C3-CE63-4431-A1FE-FD41F3395BCE.html

TIANZAN
Enthusiast

Thanks, guys. I was able to complete the activity successfully. I disabled the APD response from the Web Client, as the vSphere Client wasn't showing it. I also disabled Host Monitoring. The restart of the switch took around 3 minutes. Connectivity was down until then, but when the switch came back up, everything went back to normal. Nothing on the hosts or VMs was affected.

Thanks a lot for your great support!

depping
Leadership

Good to hear!
