Hi,
I'm running SRM 4.1.2-484598 with two EVA8400s and 4.1 hosts.
During a 'real' failover test a couple of weeks ago, the hosts running protected VMs became unresponsive during the failover of the first set of VMs. The remaining VMs on those hosts were unaffected from a network connectivity point of view. The hosts disconnected from vCenter, and I could not connect a VI client directly to them.
The remaining failover plans were unable to gracefully shut down the protected VMs, so when they booted in the recovery site they were obviously in a crashed state, which was not ideal for a planned failover.
Rebooting the protected hosts brought them back online. The actual failover (and failback) completed successfully for 150+ VMs, but it just wasn't very controlled!
Any ideas? It sounds like an APD (All Paths Down) problem to me. I'm wondering whether the LUNs were not correctly changed to 'Read Only', which would have allowed the protected hosts to still access the inventory.
Thanks
Anyone?
Hi,
I have exactly the same issue with EVA 8400s.
Testing failover to the recovery site causes all hosts on the protected site to go into an APD condition after a rescan of the datastores.
Restarting the management agents on the hosts rectifies it, as does restarting the host.
I currently have a case open with VMware support for this and will post the answer once a fix is known and tested.
Cheers
No fix for this problem.
VMware will not publicly announce this, but it is a known bug; the fix is to upgrade to v5.1.
I have only seen this with the 8400s and no other storage, so the issue may be specific to that array.
We are running this in production and have had a site failure with failover and failback. It works well, apart from the hosts going offline after failover (and failback).
The workaround is to restart the management agents via SSH or the DCUI; SSH works best. It's not a problem as long as you are prepared for it and incorporate it into your procedures.
Cheers
Thanks. We have upgraded to 5.1 and test failovers appear to be okay.