After upgrading our environment to ESXi 6, we've been experiencing a DAG failover at least once a week during DRS-initiated vMotion. However, if we perform the vMotion manually, there are no issues. The SameSubnetDelay has already been increased to 20 seconds from the default of 5 seconds.
Does anyone know why? What has changed in ESXi 6 that is causing this long timeout for DRS vMotion on an Exchange 2010 DAG?
HP ProLiant G8
HP 3PAR with SSD/SAS
For the time being, you should disable DRS on your DAG nodes to avoid the failures. It could very well be that an underlying performance issue is triggering the DRS migration and also causing your bandwidth problems. In other words, whatever stress causes DRS to engage is probably also causing the failure, which is why you don't see it during a manual vMotion.
Check the cluster sensitivity: http://www.vmware.com/files/pdf/using-vmware-HA-DRS-and-vmotion-with-exchange-2010-dags.pdf
The relevant section is 5.2, I believe. This seemed to work for us. At least, the databases don't flop anymore. Our Exchange guy said his indexes failed, though they came back about 30 seconds later.
See what you think.
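For anyone unsure what "sensitivity" works out to in seconds: as far as I understand it, the cluster tolerates roughly SameSubnetDelay (milliseconds between heartbeats) multiplied by SameSubnetThreshold (missed heartbeats allowed) before it declares a node down. Here is a rough Python sketch of that arithmetic. The default values (1000 ms, 5 missed heartbeats) and the relaxed values (2000 ms, 10) are my assumptions, picked so that they produce the 5 and 20 seconds mentioned in the original post, so check the actual values on your own cluster. The function names are made up for illustration.

```python
# Rough sketch: how long can a DAG node be unreachable before the
# cluster treats it as down?
# ASSUMPTION: tolerance = SameSubnetDelay (ms) * SameSubnetThreshold (count).
# The concrete values below are assumed, not read from a real cluster.

def heartbeat_tolerance_seconds(same_subnet_delay_ms, same_subnet_threshold):
    """Return the approximate heartbeat tolerance in seconds."""
    return same_subnet_delay_ms * same_subnet_threshold / 1000.0


def pause_triggers_failover(pause_seconds, same_subnet_delay_ms, same_subnet_threshold):
    """Return True if a network pause of pause_seconds exceeds the tolerance."""
    tolerance = heartbeat_tolerance_seconds(same_subnet_delay_ms, same_subnet_threshold)
    return pause_seconds > tolerance


# Assumed defaults: 1000 ms delay, 5 missed heartbeats.
default_tolerance = heartbeat_tolerance_seconds(1000, 5)

# Assumed relaxed settings: 2000 ms delay, 10 missed heartbeats.
relaxed_tolerance = heartbeat_tolerance_seconds(2000, 10)

print("default tolerance (s):", default_tolerance)
print("relaxed tolerance (s):", relaxed_tolerance)

# Hypothetical 8-second interruption during a vMotion under load.
print("8 s pause fails over with defaults:", pause_triggers_failover(8, 1000, 5))
print("8 s pause fails over when relaxed:", pause_triggers_failover(8, 2000, 10))
```

If the node still drops out with the relaxed settings, the interruption during the DRS-initiated vMotion is presumably lasting longer than that whole window, which points at load on the host or the vMotion network rather than at the cluster settings.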
Whoops, I didn't read your whole post. That is odd indeed if you already have the sensitivity relaxed.