We've had transient datastore failures, and are trying to understand our mitigation options.
It's documented, and we've observed, that if an ESXi host loses access to a datastore, virtual machines will continue to run on that host for some amount of time before the virtual machine is told the disks are read-only. This time appears to be on the order of 5 to 15 minutes. If the datastore is restored within some grace period, the virtual machine resumes operating normally.
* What factors affect the grace time before the virtual machine is affected?
* Are there any vSphere configuration options which can affect the grace period for specific virtual machines?
This is a great question.
I suppose the grace time you are talking about is determined by when the last heartbeat to the datastore fails from all hosts.
In my experience, this grace time is no more than a few seconds (maybe 10 or 15 seconds).
I found something related: VMware Knowledge Base
Maybe someone else has something to add.
What ESXi version are you running? Most probably the host got isolated. See: vSphere HA heartbeat datastores, the isolation address and vSAN - Yellow Bricks