VMware Cloud Community
jamesrico
Contributor

Real failover problem with EVA8400

Hi,

I'm running SRM 4.1.2-484598 with two EVA8400s and 4.1 hosts.

During a 'real' failover test a couple of weeks ago, the hosts running protected VMs became unresponsive during the failover of the first set of VMs. From a network connectivity point of view, the remaining VMs on those hosts were unaffected. The hosts disconnected from vCenter and I could not connect a VI client directly to them.

The remaining failover plans were unable to gracefully shut down the protected VMs, so when those VMs booted at the recovery site they were obviously in a crashed state, which was not ideal for a planned failover.

A reboot of the protected hosts brought them back online. The actual failover (and failback) completed successfully for 150+ VMs, but it just wasn't very controlled!

Any ideas? It sounds like an APD (All Paths Down) problem to me. I'm wondering whether the LUNs were not correctly changed to 'Read Only' so that the protected hosts could still access inventory.

Thanks

jamesrico
Contributor

Anyone?

stu161
Contributor

Hi,

I have exactly the same issue with EVA8400s.

Testing failover to the recovery site causes all hosts on the protected site to go into an APD condition after the rescan of datastores.

Restarting the management agents on the hosts rectifies it, as does restarting the host.

I currently have a case open with VMware support for this. I will post the answer once a fix is known and tested.

Cheers

stu161
Contributor

There is no fix for this problem.

VMware will not publicly announce this, but it is a known bug. The fix is to upgrade to version 5.1.

I have only seen this with the 8400s and no other storage, so the issue may be particular to that array.

We are running this in production and have had a site failure followed by failover and failback. It works well, apart from the hosts going offline after failover (or failback).

The workaround is to restart the management agents via SSH or the DCUI; SSH works best. It is not a problem as long as you are prepared for it and incorporate it into your procedures.
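
For anyone wanting to script the SSH workaround as part of a failover procedure, here is a rough sketch rather than a tested tool. It assumes the hosts are ESXi with SSH (Tech Support Mode) enabled, where `/sbin/services.sh restart` is the usual command for restarting the management agents; on classic ESX 4.1 the equivalent would be `service mgmt-vmware restart`. The host names and the `build_restart_command` / `restart_agents` helpers are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: ESXi hosts with SSH enabled. On classic ESX 4.1 the
# equivalent command would be "service mgmt-vmware restart".
RESTART_CMD = "/sbin/services.sh restart"


def build_restart_command(host, user="root", timeout=10):
    """Return the ssh argument list that restarts the agents on one host."""
    return ["ssh", "-o", f"ConnectTimeout={timeout}", f"{user}@{host}", RESTART_CMD]


def restart_agents(hosts, dry_run=True):
    """Print (dry run) or execute the restart on each host; return the commands."""
    commands = [build_restart_command(h) for h in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False so one unreachable host does not stop the loop
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Hypothetical host names; replace with the protected-site hosts.
    restart_agents(["esx01.example.local", "esx02.example.local"])
```

Run it with `dry_run=True` first to check the host list, then switch to `dry_run=False` once the failover step that triggers the APD condition has completed.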

Cheers

jamesrico
Contributor

Thanks. We have upgraded to 5.1 and test failovers appear to be okay.
