VMware Cloud Community
MrPix
Contributor

SRM with EVA8x00, test recovery plan fails

Hi,

Got a bit of an issue and hoping I can get some clarification here. We have logged the problem with HP already and are awaiting results.

We have an Active/Active environment, and replication is set up and working OK. We have used the 'old manual' method of failover without SRM to test that the underlying infrastructure works, and this was a success: we were able to bring up Site A's VMs on Site B once the LUNs were failed over, rescanned and registered.

We have both BC and CA licences for both sites.

We can successfully create a snapshot on the EVA of the replicated/masked LUN.

We have installed SRM and the EVA VA (SRA) successfully on both sites, and have set up all the required elements.

We have created a recovery plan, but when we press 'Test' on the recovery plan, it fails after 5 minutes at 15% with a timeout.

The Error extracted from the logs is below:

2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 error SysCommand encountered a failure while executing command 'C:\Program Files\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin\perl.exe', Error: 'AsyncWaitForExit: timeout occured'

2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 warning Failed to create lun snapshots: Unexpected Vmacore::SystemException AsyncWaitForExit: timeout occured (86870884)

2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 verbose Deleting lun snapshots
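For anyone hunting for the same failure in their own SRM logs, here is a minimal Python sketch that picks out the timeout entries from lines in the format shown above. The field layout (timestamp, quoted component, a numeric field, level, message) is inferred only from these three lines, and the names `LOG_LINE`, `ident` and `find_timeouts` are made up for illustration.

```python
import re

# Layout inferred from the three log lines in this post (an assumption):
# timestamp, 'component', numeric field, level, free-text message.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"'(?P<component>[^']+)' (?P<ident>\d+) (?P<level>\w+) (?P<message>.*)$"
)


def find_timeouts(lines):
    """Return (timestamp, component, level) for error/warning lines mentioning a timeout."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Skip blank lines or lines in another format.
        is_problem = match.group("level") in ("error", "warning")
        if is_problem and "timeout" in match.group("message"):
            hits.append(
                (match.group("timestamp"), match.group("component"), match.group("level"))
            )
    return hits


# The three lines quoted in this post, verbatim.
sample = [
    r"2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 error SysCommand encountered a failure while executing command 'C:\Program Files\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin\perl.exe', Error: 'AsyncWaitForExit: timeout occured'",
    "2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 warning Failed to create lun snapshots: Unexpected Vmacore::SystemException AsyncWaitForExit: timeout occured (86870884)",
    "2009-01-30 14:56:33.521 'SecondarySanProvider' 5996 verbose Deleting lun snapshots",
]

hits = find_timeouts(sample)
for timestamp, component, level in hits:
    print(timestamp, component, level)
```

Comparing the timestamp of a hit with the time the test was started shows whether the failure lines up with the 5-minute mark.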

The snapshot of the masked LUN seems to be created as we can see it on the recovery site's Command View screen, although access to this application is hampered and really slow to refresh while running the test.

Can anyone shed some light on this? Any help would be appreciated.

Kindest regards,

MrPix

Message was edited by: MrPix

3 Replies
admin
Immortal

Hi MrPix,

You can try increasing the SRA command execution timeout. The default is 300 seconds (5 minutes), which doesn't seem to be enough in your environment. To change the timeout, edit the SanProvider/CommandTimeout setting in SRM's configuration file (vmware-dr.xml) and set it to a larger number of seconds.

Then restart the SRM service and re-run the test recovery.
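As a rough illustration of that edit, here is a minimal Python sketch that sets SanProvider/CommandTimeout in an XML document. Only the setting path and the file name come from this thread. The `Config` root element in the sample and the helper name `set_command_timeout` are assumptions for illustration, so check them against your own vmware-dr.xml and work on a backup copy.

```python
import xml.etree.ElementTree as ET


def set_command_timeout(xml_text: str, seconds: int) -> str:
    """Return xml_text with SanProvider/CommandTimeout set to the given seconds."""
    root = ET.fromstring(xml_text)

    # Locate <SanProvider> anywhere below the root; create it if missing.
    san_provider = root.find(".//SanProvider")
    if san_provider is None:
        san_provider = ET.SubElement(root, "SanProvider")

    # Locate <CommandTimeout> inside it; create it if missing.
    timeout = san_provider.find("CommandTimeout")
    if timeout is None:
        timeout = ET.SubElement(san_provider, "CommandTimeout")

    timeout.text = str(seconds)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed stand-in for vmware-dr.xml (root name assumed).
sample = (
    "<Config><SanProvider><CommandTimeout>300</CommandTimeout>"
    "</SanProvider></Config>"
)

updated = set_command_timeout(sample, 450)
print(updated)
```

Note that ElementTree discards XML comments when it parses, so for a real configuration file a careful hand edit of the one value may be the safer route.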

Hope this helps,

-Masha

MrPix
Contributor

Hi mariab,

Today my colleague reinstalled the EVA VAs on both sites. We have also tweaked performance at the ESX level by applying the latest QLogic patches and by stopping the VMFS2 drivers from loading at startup.

This combination has resulted in a great improvement and successful test failovers. In case we hit a busy time on the EVA, we have also followed your suggestion and increased the timeout to 450 seconds, as the original tests seemed to be right on the cusp of the 5-minute limit.

Thanks for your suggestion, I'm sure it will be useful for others too.

Kindest regards,

MrPix

maronep
Contributor

Heya,

Had a similar problem, but it occurred when configuring the array manager for the first time: the timeout for running the scripts was too short. Had to increase it to more than 15 minutes (900s) to work with Command View.

Thanks for the tip!

P
