VMware Cloud Community
ChallengeLogic7
Enthusiast

IBM V7000 & SRM 5.0 - Failed to create snapshots of replica devices

Hi All

I wonder if anyone can help.

We have 2 IBM Storwize V7000s - 1 at the Protected Site and 1 at the Recovery Site.

We have a test Volume that is Global Mirror 'replicated' to the DR side - showing 'Consistent Synchronized'.

Within SRM 5.0 - Array Managers I can see that the Volume is found and replicated as expected. Not sure if our replicated Volume needs to be in a Consistency Group, though?

array.JPG

When we run a Recovery Plan it fails at Step 4 and we get this error:

4. Create Writeable Storage Snapshot

Error - Failed to create snapshots of replica devices.

SRA command 'testFailoverStart' failed. sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.desc sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.fixHint
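
The `sraError.<GUID>.1.desc` and `.fixHint` strings in that message look like message keys that were never turned into readable text, so the actual reason is not shown here. Below is a minimal Python sketch for spotting such raw keys in an error string. The regular expression and the function name `find_unresolved_sra_keys` are assumptions based only on the pattern visible in the error above, not on any documented SRA format.

```python
import re

# Pattern inferred from the error text quoted in this post:
#   sraError.<GUID>.<number>.<desc|fixHint>
# This is an assumption, not a documented SRA format.
SRA_KEY = re.compile(
    r"sraError\.([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
    r"\.(\d+)\.(desc|fixHint)"
)


def find_unresolved_sra_keys(message):
    """Return (guid, index, kind) tuples for each raw key found in message."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in SRA_KEY.finditer(message)
    ]


message = (
    "SRA command 'testFailoverStart' failed. "
    "sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.desc "
    "sraError.38295F72-F7D0-4A0F-B8D4-FF9821AB2675.1.fixHint"
)

for guid, index, kind in find_unresolved_sra_keys(message):
    print(f"unresolved key: error {index} of {guid} ({kind})")
```

If keys like these show up, the SRM and SRA log files are probably the better place to look for the underlying storage error.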

We are using the IBM V7000 SRA version 2.x (downloaded from VMware). Our IBM V7000s are on firmware 6.2.0.5.

The SRA is configured with the tick box 'Pre-Configured Env.' ticked. All other settings greyed out (as shown):

sra.JPG

I have set the Advanced Settings option storage.commandTimeout to 1800 in 'Sites' as suggested by IBM.

Has anyone out there got their IBM V7000s working with SRM 5.0? If so, are we missing a piece of the jigsaw somewhere? Does the DR side (on the V7000) need a FlashCopy Mapping as well?

Thanks in advance.

31 Replies
vMario156
Expert

Hi,

If you run the recovery plan, does the failover process complete successfully?

In your case you are testing a recovery plan, otherwise you wouldn't have the step "4. Create Writeable Storage Snapshot".

I don't know your IBM system, but the snapshot function (in your case it seems to be FlashCopy) needs to be working mainly on your DR site, because the storage system at the DR site creates the snapshot, not the system at the protected site. Of course it should also work on your protected site for testing the failback later after you did a failover (or in a bi-directional setup), but as a first step only your DR site matters.

Regards,

Mario

Blog: http://vKnowledge.net
ChallengeLogic7
Enthusiast

Hi

Thanks for the input - yes, we are just doing a Test, not a Recovery. The test fails at Step 4. We do have a FlashCopy Mapping of the Volume that's been replicated at the DR side.

Is there anyone with specific IBM V7000 & SRM 5.0 experience out there? Thanks.

zzmax65
Enthusiast

Silly question: do you have enough free space for the FlashCopy? I know that the free space must be at least 20% of the volume you want to snapshot...

ChallengeLogic7
Enthusiast

Yep - enough free space. We are now in the process of testing a new SRA code release from IBM....

visak
Contributor

Hi,

Is your SRM working now with the V7000 without any issues? Which SRA and SRM versions are you running?

Thanks,

Visak

christas
Enthusiast

Hi - I know this thread is a little dated, but I am curious if you ever found a solution to your issues. We are having very similar issues, using exactly what you have: the IBM V7000 Storwize, VMware SRM version 5, and ESXi 5 Update 1.

I know we've had to adjust some of the storageProvider settings, under Advanced Settings for both sites (protected and recovery) but that hasn't solved all of our issues. We still have failed tests, with random and similar errors.

If you can, will you update this discussion as to what you did to resolve your issues? Also, all of our LUNs are in consistency groups, which are in various Protection Groups, depending on the test I am performing.

Thank you!

VCP5, VCAP5-DCA
vladpmf
Contributor

Hi, Having the same problem here:


Site Recovery Manager 5.02

IBM Storwize v7000 6.4.1.3
SRA IBMSVCSRA_v2.1.0.121224



"Error: Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group ..."

christas
Enthusiast

When we received the error that you have about failing to create snapshot of replica devices, we had to make sure that the flash copies at the DR site were mapped to the DR ESXi host cluster, but NOT the change volumes at the DR site. Once we did that the failure to create the snapshots stopped.

I forget if there was anything else we did, on the SAN.

In SRM, for both the protected and recovery sites, I went into the advanced settings for each site (right-click the site and select Advanced Settings), opened the storage provider section, and changed storageProvider.hostRescanRepeatCnt [number of repeated host rescans during test and recover] from 1 to 2. That still did not give me the result I needed, so I changed it from 2 to 3. It helps give the hosts time to rescan and mount any "new" snapshot flash copies that are slower to prepare.

Do NOT change storageProvider.resignatureFailureRetryCount [number of times to retry resignaturing a VMFS volume (after a failure)] to anything but 1. Doing so fails miserably: the test fails instantly, and it actually causes your flash copies from production to DR to stop!

That's just the update on what we are going through right now. I have an open PMR with IBM for the remaining issues with the SRA not working well with the SAN. I will keep you posted.

To note, we are at the following code levels as of yesterday:

vCenter 5.1

SRM 5.1

SRA v2.1.0.120916 (this I pulled down from VMware's approved SRAs under the My Downloads page)

VCP5, VCAP5-DCA
vladpmf
Contributor

Hello, I'm using the 6.4.1.3 microcode. Which microcode are you using?

Thanks in advance

tanwk
Enthusiast

Same error here. I have not changed anything in the advanced settings.

On IBM Storwize V7000, 6.4

SRM 5.1.1

vSphere 5.1

Same thing failing at the snapshot.

Both FlashCopies have been enabled and mapped. Any update on resolving the error would be good.

Current status: we are able to bypass the above error, but now we are facing "failed to recover datastore vmfs volume" issues. I have increased the HBA per-host scan time from 1800 to 3000. The repeat count remains at the default of 3.

Blog: http://plain-virt.blogspot.com
Twitter: @tanwk3
LinkedIn: http://sg.linkedin.com/in/weekiongtan
vladpmf
Contributor

Here goes my update on this.

I created a new volume (a small one, 3 GB, with a minimal Linux installed for testing) and a consistency group with change volumes, then downgraded the SRA from IBMSVCSRA_v2.1.0.121224 to SRA 2.1.0.121108. After that, I was able to test SRM successfully.

          Step                                          Status    Started                      Completed
          1. Synchronize Storage                        Skipped
          1.1. Protection Group teste                   Skipped
          2. Restore hosts from standby                 Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:41:28 (UTC 0)
          3. Suspend Non-critical VMs at Recovery Site  Inactive
          4. Create Writeable Storage Snapshot          Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:43:06 (UTC 0)
          4.1. Protection Group teste                   Success   2013-04-09 09:41:28 (UTC 0)  2013-04-09 09:43:06 (UTC 0)
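
To compare how long the snapshot step takes against the timeout values discussed in this thread, the two timestamps on a step line can be subtracted. This is a rough Python sketch; it assumes the first timestamp on a line is the start and the second is the completion, and the helper name `step_duration_seconds` is made up for illustration.

```python
import re
from datetime import datetime

# Matches timestamps such as "2013-04-09 09:41:28" in a step line.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def step_duration_seconds(line):
    """Return seconds between the first two timestamps on a line, or None."""
    stamps = TIMESTAMP.findall(line)
    if len(stamps) < 2:
        return None  # skipped or inactive steps carry no timestamps
    start, end = (datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps[:2])
    return int((end - start).total_seconds())


line = (
    "4. Create Writeable Storage Snapshot Success "
    "2013-04-09 09:41:28 (UTC 0) 2013-04-09 09:43:06 (UTC 0)"
)
print(step_duration_seconds(line))  # 98
```

For the small test volume, step 4 therefore took 98 seconds.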

But the issue still remains on larger volumes (e.g. 500 GB), where it still fails with change volumes:

     "Error - Failed to create snapshots of replica devices. SRA command 'testFailoverStart' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"

The strange thing is that it does create the snapshot and the consistency group fcmap.

When I try to run the cleanup under SRM, it fails with:

     "Error - Failed to delete snapshots of replica devices. SRA command 'testFailoverStop' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"

I have to force the cleanup, but then I get the following warning message:

     "Warning - Failed to delete snapshots of replica devices. SRA command 'testFailoverStop' failed. Invalid Array ID. Refer to IBM SAN Volume Controller troubleshooting"

vladpmf
Contributor

Hi,

Are you using a preconfigured env. or the non-preconfigured env.?

Thanks in advance.

TaupoJohn
Contributor

Same problems here:

IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)

vCenter 5.1

SRM 5.1.0-941848

I'm running a "test" of a simple recovery plan.

I'm getting the same exact error with IBMSVCSRA_v2.1.0.121224 and IBMSVCSRA_v2.1.0.121108.

"Error - failed to create snapshots of replica devices

blah blah

Volume cannot be created"

I am not set up for preconfigured, and I have set the "Test MDisk Group ID" on both sides. I have not set "Protect Source Vols" or "Protect Target Vols".

I have set the Advanced Settings option storage.commandTimeout to 1800 and storageProvider.hostRescanRepeatCnt to 3.

Has anyone actually got this working yet?

rgds, John B
vladpmf
Contributor

Hi, it seems to be an IBM SRA problem. I've opened a case with IBM and I'm still waiting for an answer.

TaupoJohn
Contributor

Can you let us know if you hear anything, Vlad?

Thanks.

rgds, John B
vladpmf
Contributor

Sure.

FrankZo
Contributor

Going to follow this thread. I'm very interested in the SRA for SVC / V7000. There are a lot of issues with this SRA which are not solved. They should make a better SRA soon!!

vfovermars
Contributor

Hi there,

I've done a lot of testing on all kinds of SRA levels, V7000 levels and SRM versions, with lots of calls to IBM and VMware, and I tested several SRAs from the development team and test lab in Mainz.

Here's my working setup.

First of all, use the NON-preconfigured environment. Don't use the preconfigured environment. Use the latest SRA version you can download at the VMware site. At the time of this reply, this is version .....24.zip

Leave all the checkboxes empty and don't fill in the Src. Mdisk Group ID and Target Mdisk Group ID.

Set the Test MDisk Group ID according to your V7000 setup at both sites. You might need to add the ID field in the V7000 GUI to be able to read the ID.

You can use SpaceEfficient Mode or not.

SRV-DC-01-A (680 474 121) - TeamViewer_2013-04-24_00-40-08.jpg

Within SRM adjust the advanced settings at both sites.

Set the repeat count setting to 2 and the timeout to 600.

SRV-DC-01-A (680 474 121) - TeamViewer_2013-04-24_00-53-47.jpg

Don't forget to change all the settings at both sites.
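
Since a mismatch between the two sites is easy to miss, here is a small Python sketch that compares hand-copied values from each site against the values suggested in this post. The setting names are taken from earlier posts in the thread and are assumed to be the ones meant by "repeat count" and "timeout"; the site data and the function name `find_mismatches` are hypothetical.

```python
# Values recommended in this post. The setting names come from earlier posts
# in the thread and are ASSUMED to be the ones meant here.
RECOMMENDED = {
    "storageProvider.hostRescanRepeatCnt": 2,
    "storage.commandTimeout": 600,
}


def find_mismatches(site_settings, recommended=RECOMMENDED):
    """Map each site name to {setting: (actual, wanted)} for differing values."""
    problems = {}
    for site, settings in site_settings.items():
        diff = {
            key: (settings.get(key), wanted)
            for key, wanted in recommended.items()
            if settings.get(key) != wanted
        }
        if diff:
            problems[site] = diff
    return problems


# Hypothetical values copied by hand from each site's Advanced Settings dialog.
sites = {
    "protected": {
        "storageProvider.hostRescanRepeatCnt": 2,
        "storage.commandTimeout": 600,
    },
    "recovery": {
        "storageProvider.hostRescanRepeatCnt": 1,
        "storage.commandTimeout": 600,
    },
}

print(find_mismatches(sites))
```

In this made-up example only the recovery site is reported, because its rescan repeat count was left at 1.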

Hope this helps.

This setting works for me.

TaupoJohn
Contributor

Thanks vfovermars,

That's pretty much what I'm using. I still don't see how to find the Test MDisk Group ID, but "0" seems to work for me (we only have one big pool on each SAN).

The problem now seems to be that the SAN is not creating a Flash Copy. It just sits at 0%, and the SRA waits and waits and waits.

I've started another thread - for up to date versions. You can see it here:

http://communities.vmware.com/message/2231160

rgds, John B