Re: SRM 1.0.1 "Error: Failed to recover datastore"...

Rakot45 · ‎04-12-2009

Greetings to all.

Has anyone run into an SRM error with HP EVA storage?

If I start "test recovery plan", everything is done properly. When I start "run recovery plan", I see an error in step 5:

5. Recover Normal Priority Virtual Machines	Error: Failed to recover datastore:	00:00:00
5.1. Recover VM "2000"	Error: Failed to recover datastore:	00:00:00

SRM 1.0.1

ESX 3.5 U4

VC 2.5 U4

HP SRA 1.0.1

Site1 - EVA4000 (HSV200) firmware 6110

Site2 - EVA4100 (HSV200-B) firmware 6110

Management software - CV EVA 8.00.02

PS: On the same system, with NetApp storage connected, everything works normally.

Michelle_Laveri · ‎04-13-2009

This might have changed, but I believe that with the EVA you have to do two rescans before the snapshot and resignatures happen.

In the main XML or .ini file for SRM there is an option where you can tell SRM to do two rescans of the HBAs...

Might be in the release notes or PDFs somewhere...

This is EVA-specific, hence it works fine with NetApp but not with the EVA.

Regards

Mike Laverick

RTFM Education

http://www.rtfm-ed.co.uk

Author of the SRM Book: http://www.lulu.com/content/4343147

Regards
Michelle Laverick
@m_laverick
http://www.michellelaverick.com

depping · ‎04-14-2009

That's probably the problem: http://www.yellow-bricks.com/2009/02/19/srm-and-rescanning-your-storage-twice/

I've experienced this every single time I work with an EVA, that's why I wrote it down.

Duncan

VMware Communities User Moderator

-

Blogging: http://www.yellow-bricks.com

Twitter:

If you find this information useful, please award points for "correct" or "helpful".

Rakot45 · ‎04-14-2009

I had already set this option before running the recovery plan.

In the SRM log I found the line: dr.san.fault.RecoveredDatastoreNotFound

Later I will try swapping the QLogic HBA for an Emulex one.

depping · ‎04-14-2009

Huh? Don't think this has anything to do with the type of HBA you are using.

Did you already present the recovery site's LUNs to the recovery site hosts? This is a pre-req for a full failover!

Duncan

Rakot45 · ‎04-14-2009

Yes, I have presented the recovery LUN to the recovery ESX host.

Can I send my log file to you?

depping · ‎04-14-2009

Attach your logfiles to this topic, this way all the experts can chip in.

Duncan

Rakot45 · ‎04-15-2009

VMware experts, please check these log files.

Thanks

admin · ‎04-15-2009

Hi,

It looks like a configuration issue on the array side. The ESX hosts at the recovery site report LUNs on target 50:0A:09:82:86:27:B9:99, which is not reported by the SRA.

Added LUN '50:01:10:A0:00:18:3E:18;0;50:0A:09:82:86:27:B9:99' with keys 'host-321;vmhba1:4:0' and 'host-321;020000000060a980004334623668344d52415873424c554e202020'

Added LUN '50:01:10:A0:00:18:3E:18;1;50:0A:09:82:86:27:B9:99' with keys 'host-321;vmhba1:4:1' and 'host-321;020001000060a98000433462385a4a4a43726179644c554e202020'

Could you check your array configuration against the SRA installation guide?

-Masha

Rakot45 · ‎04-17-2009

What type of access to the remote site do I need to set: no access or read-only?

depping · ‎04-17-2009

Access to what?

Duncan

Rakot45 · ‎04-17-2009

Sorry.

I mean the access mode for the LUN on the second site, which is set when creating the replication in HP Command View.

jbloo2 · ‎04-20-2009

I think it is the dual rescan issue, and I agree with Duncan's assessment: set the hostRescanRepeatCnt value to 2. However, you need to restart the SRM server on the recovery side for this change to take effect (as with any change to vmware-dr.xml, since the file is only read when the SRM service starts). To restart the service, locate "VMware Site Recovery Manager" in the Windows services GUI, right-click it, and choose Restart.
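As a rough sketch of the edit described above, the following Python sets every hostRescanRepeatCnt element in an XML document to 2. Only the element name comes from this thread. The layout of the sample document (the Config and SanProvider wrappers) and the helper name set_rescan_count are assumptions for illustration, so compare against your actual vmware-dr.xml and keep a backup before changing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for vmware-dr.xml. Only the hostRescanRepeatCnt
# element name is taken from the thread; the wrapper elements are assumed.
SAMPLE = """<Config>
  <SanProvider>
    <hostRescanRepeatCnt>1</hostRescanRepeatCnt>
  </SanProvider>
</Config>"""


def set_rescan_count(xml_text: str, count: int = 2) -> str:
    """Return xml_text with every hostRescanRepeatCnt element set to count."""
    root = ET.fromstring(xml_text)
    matches = list(root.iter("hostRescanRepeatCnt"))
    if not matches:
        raise ValueError("hostRescanRepeatCnt element not found")
    for elem in matches:
        elem.text = str(count)
    return ET.tostring(root, encoding="unicode")


updated = set_rescan_count(SAMPLE)
print(updated)
```

Note that ElementTree drops XML comments when it re-serializes a document, so for the real file a careful manual edit may be the safer route. Either way, the new value is only picked up after the SRM service is restarted.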

The logs show the value of the rescan count at startup time, and both logs have the same message:

Setting number of repeated host rescans during recovery to 1
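One way to confirm that the change was picked up after a restart is to search the SRM log for this startup message and read the number at the end. The snippet below is a minimal sketch: the message text is copied from the line quoted above, while the function name rescan_counts and the in-memory sample log are hypothetical.

```python
import re

# Message format copied from the log line quoted in this thread.
RESCAN_RE = re.compile(
    r"Setting number of repeated host rescans during recovery to (\d+)"
)


def rescan_counts(log_lines):
    """Return the rescan count from every matching startup message, in order."""
    counts = []
    for line in log_lines:
        match = RESCAN_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Hypothetical sample; in practice, pass the lines of the SRM log file.
sample_log = [
    "Setting number of repeated host rescans during recovery to 1",
]
counts = rescan_counts(sample_log)
# The last entry corresponds to the most recent service start.
print(counts[-1] if counts else "message not found")
```

If the last value is still 1 after editing vmware-dr.xml, the service has most likely not been restarted since the change.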

The reason it is most likely the rescan issue is that the HP adapter claims it successfully failed over the target device and assigned it LUN number 1:

However, on rescan, the ESX host only sees LUN 0 (presumably there are two paths to it, which is why it is listed twice); it should also see a LUN 1, i.e. vmhba1:0:1

Added LUN '50:01:10:A0:00:18:3E:18;0;50:00:1F:E1:50:11:26:48' with keys 'host-321;vmhba1:0:0' and 'host-321;020c00000050001fe150112640485356323030'

Added LUN '50:01:10:A0:00:18:3E:18;0;50:00:1F:E1:50:0A:FF:48' with keys 'host-321;vmhba1:2:0' and 'host-321;020c00000050001fe1500aff40485356323030'

The lines:

Added LUN '50:01:10:A0:00:18:3E:18;0;50:0A:09:82:86:27:B9:99' with keys 'host-321;vmhba1:4:0' and 'host-321;020000000060a980004334623668344d52415873424c554e202020'

Added LUN '50:01:10:A0:00:18:3E:18;1;50:0A:09:82:86:27:B9:99' with keys 'host-321;vmhba1:4:1' and 'host-321;020001000060a98000433462385a4a4a43726179644c554e202020'

most likely are for LUNs on the NetApp array, which you indicated is also attached to the ESX host. The third value in the semicolon-separated string immediately after "Added LUN" is the WWPN of the FC port on the array presenting the LUN, and 50:0A:09 is NetApp. These are ignored as part of this failover because SRM is looking for EVA LUNs, i.e. any LUN on a WWPN returned by discoverArrays.
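To make this reading of the log easier to repeat, here is a small sketch that groups the "Added LUN" lines by the array port WWPN (the third field) and lists the LUN numbers seen behind each port. The line format and the NetApp prefix are taken from this thread; the function name luns_by_array_port is hypothetical, the first field is not interpreted, and the "with keys" part of each sample line is shortened.

```python
import re
from collections import defaultdict

# Format taken from the log lines quoted in this thread:
# Added LUN '<first field>;<LUN number>;<array port WWPN>' with keys ...
ADDED_LUN_RE = re.compile(r"Added LUN '([0-9A-Fa-f:]+);(\d+);([0-9A-Fa-f:]+)'")

# Only this prefix is identified in the thread; other ports stay unlabeled.
NETAPP_PREFIX = "50:0A:09"


def luns_by_array_port(log_lines):
    """Map each array port WWPN to the sorted LUN numbers seen behind it."""
    ports = defaultdict(set)
    for line in log_lines:
        match = ADDED_LUN_RE.search(line)
        if match:
            _first_field, lun, array_wwpn = match.groups()
            ports[array_wwpn.upper()].add(int(lun))
    return {wwpn: sorted(luns) for wwpn, luns in ports.items()}


# Sample lines from the thread, with the keys portion shortened.
sample = [
    "Added LUN '50:01:10:A0:00:18:3E:18;0;50:00:1F:E1:50:11:26:48' with keys",
    "Added LUN '50:01:10:A0:00:18:3E:18;0;50:00:1F:E1:50:0A:FF:48' with keys",
    "Added LUN '50:01:10:A0:00:18:3E:18;0;50:0A:09:82:86:27:B9:99' with keys",
    "Added LUN '50:01:10:A0:00:18:3E:18;1;50:0A:09:82:86:27:B9:99' with keys",
]

result = luns_by_array_port(sample)
for wwpn, luns in result.items():
    label = "NetApp" if wwpn.startswith(NETAPP_PREFIX) else "other"
    print(wwpn, label, luns)
```

On the sample lines, only the NetApp port shows a LUN 1; the two other ports each show LUN 0 only, which matches the observation that the failed-over LUN 1 never appeared on the EVA paths.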

Duncan, I suggest updating your very useful blog post to remind users to restart the SRM service after changing vmware-dr.xml, since many might not be aware of this requirement.
