VMware Cloud Community
LonBlast
Contributor

HA timing

Is there any way to reduce the time HA takes to fail over a single VM from one host to another, or to reduce the time it takes to power on the VM after it has been moved?

Current timings:

~20 seconds to detect host isolation

~1 minute to move the VM to the second host

~2 minutes to power on the VM

6 Replies
depping
Leadership

No, to be honest there is not much you can tweak. The timing sounds a bit off, though; check fdm.log to figure out what is happening. Is HA somehow waiting on certain resources? In my experience, with an isolation the VM is restarted within a minute or so, but that depends on the storage system used, how fast the VM is powered off, whether resources are available, and so on.
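A rough sketch of how one might narrow fdm.log down to the relevant entries, assuming the log has been copied off the host as a plain text file. The function name and the keywords ("isolat", "heartbeat", "election") are illustrative assumptions, not terms confirmed to appear in any particular fdm.log, so adjust them to what the log actually contains.

```python
from typing import Iterable, List


def filter_log_lines(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the lines that contain any of the keywords (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


# Example usage (the path and keywords are assumptions):
#
#     with open("fdm.log", encoding="utf-8", errors="replace") as handle:
#         for hit in filter_log_lines(handle, ["isolat", "heartbeat", "election"]):
#             print(hit)
```

Comparing the timestamps on the surviving lines should show where the time is going, for example whether HA is waiting on the datastore or on resources.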

depping
Leadership

I went back and looked at the timing. Assuming it is a secondary/slave host that is isolated:

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated
  • T60s – Slave “triggers” isolation response

This means that after 60 seconds the master can restart the VM. When it is a primary/master host that is isolated, the host is declared isolated about 30 seconds sooner. There is a 30-second wait baked in, defined through das.config.fdm.isolationPolicyDelaySec, but the minimum value is 30 seconds, so you cannot shorten it; you can only increase it!
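To make the arithmetic concrete, here is a small sketch of the slave timeline above. It only models the numbers listed in this post; the function name is made up, and treating a value below the minimum as if it were 30 is an assumption for illustration, not a statement of how HA handles such a setting.

```python
# Minimum value of das.config.fdm.isolationPolicyDelaySec, as stated in the post.
MIN_ISOLATION_POLICY_DELAY_SEC = 30

# Seconds after the isolation event at which an isolated slave declares
# itself isolated (T30s in the timeline above).
SLAVE_DECLARED_ISOLATED_AT_SEC = 30


def slave_isolation_response_at(policy_delay_sec: int = 30) -> int:
    """Seconds after isolation at which the isolation response is triggered.

    Assumption: a delay below the 30-second minimum behaves like 30.
    """
    effective_delay = max(policy_delay_sec, MIN_ISOLATION_POLICY_DELAY_SEC)
    return SLAVE_DECLARED_ISOLATED_AT_SEC + effective_delay


print(slave_isolation_response_at())    # default delay
print(slave_isolation_response_at(10))  # below the minimum, no faster
print(slave_isolation_response_at(45))  # a larger delay only adds time
```

With the default delay this gives the T60s figure from the list; raising the delay pushes the trigger later, and lowering it gains nothing.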

LonBlast
Contributor

I have checked fdm.log, and most of the time it is checking the heartbeat datastore. By the way, I do not have VMware Tools installed in any of the VMs.

depping
Leadership

Yeah, it won't even look at VMware Tools; it is indeed looking at the datastore. Not much you can tweak, unfortunately.

LonBlast
Contributor

Will using VMware Tools help with the timing issue?

depping
Leadership

No, it won't. HA does not look at VMware Tools when it comes to the isolation response. The isolation response is only about the host; it will simply kill and restart the VMs.

Not sure which isolation response you used, but "power off" is definitely faster than "shutdown".
