VMware Cloud Community
TaupoJohn
Contributor

IBM V7000 SRA with SRM 5.1

There is a conversation going on in another thread

(http://communities.vmware.com/message/2227065#2227065)

- but it is an old thread and started off with different versions of everything.

I'm going to start a new thread based on current versions.

I cannot get the SRA to work with SRM 5.1.

When I run a "test" failover for a simple recovery plan - I get this error:

"Error - Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group test1rep. SRA command 'testFailoverStart' failed for consistency group 'test1rep'. Volume cannot be created, please check test mdiskgroup cofiguration Refer to IBM SAN Volume Controller troubleshooting"

I'm running:

IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)

vCenter 5.1

SRM 5.1.0-941848 on Windows 2008 R2 64 bit (2vCPU 8GB RAM)

SRM advanced settings
I have set these Advanced Settings options:
    storage.Commandtimeout to 1800 (default 300)
    storageProvider.hostRescanRepeatCnt to 3 (default 1)
    storageProvider.hostRescanTimeoutSec to 1800 (default 300)

SRA v2.1.0.121224 (also tried 2.1.0.121108)
non-preconfigured environment

    Pre-Configured Env. not set
    Test MDisk Group ID set
    SpaceEfficient Mode = False

Is anyone having success with the IBM V7000 SRA (any version) with SRM 5.1?

Thanks

rgds, John B

37 Replies
TaupoJohn
Contributor

Maybe this isn't so bad after all. This run was quicker:

test failover.jpg

rgds, John B

TaupoJohn
Contributor

I have good news to report!

We expanded our test LUN to 1TB, and waited for it to sync to the other site.

I ran a test recovery plan (this time with 2 VMs) and it worked fine and finished in 15 minutes!

It seems like it doesn't really matter how big the LUN is, or how many VMs there are - it takes around the same time.

test failover 1TB lun.jpg

I will try a real failover next - and report back. All my versions are listed above - but if this is all successful, I will summarise with all the versions that I'm using.

rgds, John B

TaupoJohn
Contributor

I had everything working perfectly, and now we're trying to set up a recovery plan in the opposite direction.

(both sites will be protected and recovery sites for half the VMs)

I'm back to the same problems!

I think the "Test MDisk Group ID" of 0 is wrong on the other side. When I use this - it gives a "can't find the group ID" error. If I try anything else it gives the "can't create snapshot" error (the original error I had) - but when you look closely, the SAN isn't even trying to do anything.

So does anyone know how to find the "Test MDisk Group ID" number?

In the V7000 web manager - this is just the Pool name and it has a *name*, not a number.

How do I find the identifier (eg 0) that works on one site but not the other?
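
One way to look this up, assuming SSH access to the V7000/SVC command line: the `lsmdiskgrp` command lists each pool (MDisk group) with a numeric `id` next to its name. Each system numbers its own pools, so the ID for a given pool can differ between sites. The sketch below parses colon-delimited output (`lsmdiskgrp -delim :`) into a name-to-ID map. The sample output is abbreviated, and its pool names and sizes are made up for illustration.

```python
import csv
import io

# Illustrative, abbreviated output of `lsmdiskgrp -delim :`.
# Pool names and capacities are invented for this example.
SAMPLE_OUTPUT = """\
id:name:status:mdisk_count:vdisk_count:capacity:extent_size:free_capacity
0:Pool_SAS:online:4:12:10.0TB:256:3.5TB
1:Pool_NL:online:2:5:20.0TB:256:11.2TB
"""


def pool_ids(lsmdiskgrp_output: str) -> dict[str, int]:
    """Map each pool (MDisk group) name to its numeric ID."""
    reader = csv.DictReader(io.StringIO(lsmdiskgrp_output), delimiter=":")
    return {row["name"]: int(row["id"]) for row in reader}


ids = pool_ids(SAMPLE_OUTPUT)
print(ids["Pool_NL"])  # the number to enter as "Test MDisk Group ID" for this pool
```

Run the command on each site's system separately and use the ID reported by the system where the test snapshot volumes will be created.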

PS - we're still waiting for IBM to return our call, to start the process of logging this support incident. It's only been 3 working days 😉

rgds, John B

TaupoJohn
Contributor

My problem is caused by having some RDM LUNs sync'd besides the one used by my test VM. It works fine with just the one LUN being replicated.

I'll work on this...

EDIT - yes, I needed a VM in the protection group that had access to all of the LUNs in the consistency group. I was being careful, and tried it with a test VM, which did not work - but it works fine with the "real" VM.
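
The condition described in the EDIT can be written as a simple set check: for each VM in the protection group, which LUNs of the consistency group does it not have access to? This is only a sketch of that check. The LUN and VM names are hypothetical, and collecting the real lists from vCenter and the V7000 is left out.

```python
def missing_luns(group_luns: set[str], vm_luns: set[str]) -> set[str]:
    """Return the consistency-group LUNs that the VM has no access to."""
    return group_luns - vm_luns


# Hypothetical consistency group: one datastore LUN plus two RDM LUNs.
group = {"datastore_lun", "rdm_lun_1", "rdm_lun_2"}

# Hypothetical VMs and the LUNs each one can access.
vms = {
    "test-vm": {"datastore_lun"},
    "real-vm": {"datastore_lun", "rdm_lun_1", "rdm_lun_2"},
}

for name, luns in vms.items():
    gap = missing_luns(group, luns)
    if gap:
        print(f"{name} is missing {sorted(gap)}")
    else:
        print(f"{name} covers the whole consistency group")
```

In this example the test VM sees only one of the three replicated LUNs, which matches the failing case, while the "real" VM covers all of them.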

I think I can say that this is working now - for a test failover. Next we have to try a "real" datacenter migration.

rgds, John B

TaupoJohn
Contributor

I can now verify that this is working perfectly - and much faster than it was last week. An entire recovery (not test plan) of 3 VMs and 4 LUNs is only taking around 3 minutes, and then < 1 minute to "reprotect".

For the sake of having all the information in one place:

Here are the versions I'm using:

IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)

vCenter 5.1

SRM 5.1.0-941848 on Windows 2008 R2 64 bit (2vCPU 8GB RAM)

SRM advanced settings
I have set these Advanced Settings options:
    storage.Commandtimeout to 1800 (default 300)
    storageProvider.hostRescanRepeatCnt to 2 (default 1)
    storageProvider.hostRescanTimeoutSec to 600 (default 300)


SRA v2.1.0.121224
non-preconfigured environment

    Pre-Configured Env. not set
    Test MDisk Group ID set
    SpaceEfficient Mode = true
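
For comparison with the first post in this thread, the sketch below diffs the settings listed there against the ones listed here. The values are copied from the two posts; the dictionary layout is just for illustration. The thread does not establish which of these changes, if any, made the difference.

```python
# Settings as listed in the first post of the thread (test failover failing).
first_post = {
    "storage.Commandtimeout": 1800,
    "storageProvider.hostRescanRepeatCnt": 3,
    "storageProvider.hostRescanTimeoutSec": 1800,
    "SpaceEfficient Mode": False,
}

# Settings as listed in this post (recovery and reprotect working).
working = {
    "storage.Commandtimeout": 1800,
    "storageProvider.hostRescanRepeatCnt": 2,
    "storageProvider.hostRescanTimeoutSec": 600,
    "SpaceEfficient Mode": True,
}

# Keep only the settings whose value changed, as (before, after) pairs.
changed = {
    key: (first_post[key], working[key])
    for key in first_post
    if first_post[key] != working[key]
}

for key, (before, after) in changed.items():
    print(f"{key}: {before} -> {after}")
```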

sra config.jpg

settings.jpg

I hope this is of value to someone

rgds, John B

Mikky83
Contributor

Hi to all,

Great discussion so far. I have some questions for those who have already built an SRM environment with the V7000.

1. What is the best practice regarding datastores and VMs when using SRM: one VM per datastore, or multiple VMs (how many?) per datastore?

2. What is your experience with Global Mirror with change volumes and SRM?

3. What about versioning on v7000 and SRM? A choice of version for possible recovery.

BR Miki,

FrankZo
Contributor

Hi,

1. There is no best practice regarding datastores. I'm using 500 GB, 1 TB and even 2 TB datastores with SRM. On some datastores I have 15 virtual machines and on others just 1. When a VM is on multiple datastores, SRM will detect that and place them together. It's a good idea to put these vdisks in the same consistency group.

Also mind the remote vdisk ID. When there are duplicates, SRM will detect them and show them as VMware datastores in the device view of the SRA. Huge bug... but it doesn't seem to break the functionality of SRM.

2. On some disks we use change volumes. Works fine. It's even supported by IBM.

3. Please explain versioning (I'm not actually managing the SVC environment, I just know the bits needed for SRM to work :) )

Mikky83
Contributor

Hi FrankZo,

Thanks for your reply. Regarding your answers:

1. What was your reasoning for deciding how many VMs to put on a datastore?

Thanks for the point about the vdisk ID.

2. Which is the better (simpler) option for Global Mirror on the V7000 with SRM: standard GM or GM with change volumes? How do you define the RPO? How do you manage replication?

3. By versioning I mean keeping two or three different versions (snapshots) at the DR site. For example, if the primary site goes offline at 14:28 and the 14:15 replica at the DR site contains corrupted data, the customer wants another copy from 14:00. Is this possible in practice with the V7000/SVC?
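
To make the versioning question concrete, here is a minimal sketch of the selection logic only: given the times of the copies kept at the DR site and the time of the first copy known to be bad, pick the newest copy taken before it. Whether the V7000/SVC can keep such copies is exactly the open question; the copy times, the date, and the function name are assumptions for illustration.

```python
from datetime import datetime


def latest_copy_before(copies, cutoff):
    """Return the newest copy time strictly earlier than cutoff, or None."""
    older = [t for t in copies if t < cutoff]
    return max(older) if older else None


FMT = "%Y-%m-%d %H:%M"
DAY = "2013-01-01 "  # arbitrary date, only the times matter here

# Hypothetical copies held at the DR site.
copies = [datetime.strptime(DAY + t, FMT) for t in ("13:45", "14:00", "14:15")]

# The 14:15 replica is corrupted, so anything from 14:15 onward is excluded.
corrupted_at = datetime.strptime(DAY + "14:15", FMT)

chosen = latest_copy_before(copies, corrupted_at)
print(chosen.strftime("%H:%M"))  # 14:00
```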

And another question: why did you choose SRM with an SRA (storage replication) instead of native vSphere Replication?

BR Miki,

Mikky83
Contributor

Does anyone have something to share?

FrankZo
Contributor

Hey guys, IBM just released a whole new SRA, version 2.2.0, on the FTP site. It seems that we can now filter in the devices view, something I requested and have been waiting for a long time. I'm going to test this as soon as possible.

ftp://ftp.software.ibm.com/storage/ds_open_api/VMWARE/SVC_SRA/

Mcshammertime
Contributor

Hi all

Apologies for bringing up such an old post, but this is exactly the same issue I'm having, and I would really appreciate some quick advice as the IBM doco is a bit hard to decipher!

I've created a volume at site 1 (protected site) and a volume at site 2 (recovery site), and they are synced via remote copy.

First - does SRM care if they don't have the same name?

I've then created a flash copy of both volumes; they're synced and in a consistency group.

This is the bit I'm confused about - does the mirrored (remote copy) volume need to be mapped at the second (recovery) site, or does SRM do it? I tried mapping it, but VMware complains about the signature, and if I choose "keep signature" it won't add the volume.

Same question for the flash copy volumes: do they need to be mapped to either/both sites?

Thanks in advance, any assistance would be greatly appreciated.

TaupoJohn
Contributor

I can't remember now - but I don't think we had to map the volumes at the recovery site. The SRA should take care of that.

Also - we didn't use flash copies.

I don't know if the remote copy volumes need to have the same name or not - just try running some test recovery plans to see if things are working OK.

rgds, John B

Mcshammertime
Contributor

Hi Taupo

Thanks for the quick reply. Are you certain flashcopy mappings aren't required? I thought you said you did use them? Almost every VMware/IBM SRM document says it's a prerequisite - here are the steps they claim should be followed:

Procedure

  1. Create an equal number of FlashCopy (target) volumes as the Remote Copy target volumes on the recovery site SAN Volume Controller.

  2. Create a background copy and incremental FlashCopy mapping between Remote Copy target volumes and the previously created FlashCopy target volumes on the recovery site SAN Volume Controller.

  3. If the remote copies are in a consistency group, create a corresponding FlashCopy consistency group and configure the corresponding FlashCopy to the FlashCopy consistency group.

  4. Map the Remote Copy target and FlashCopy target volumes to the recovery site vSphere servers.

  5. Create an equal number of FlashCopy (target) volumes as the Remote Copy source volumes on the protected site SAN Volume Controller.

  6. Create a background copy and incremental FlashCopy mapping between Remote Copy source volumes and the previously created FlashCopy target volumes on the protected site SAN Volume Controller.

  7. If the remote copies are in a consistency group, create a corresponding FlashCopy consistency group and configure the corresponding FlashCopy to the FlashCopy consistency group.

  8. Map the Remote Copy source and FlashCopy target volumes to the protected site vSphere servers.

Note:

  a. FlashCopy configuration on the protected site is for the test recovery and recovery operation after the reprotect operation on the recovery site.

  b. FlashCopy (target) volumes are resynchronized during recovery
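
For the pre-configured case that the quoted procedure describes, steps 2 and 3 (and their protected-site twins, 6 and 7) could be scripted roughly as below. This is a sketch under assumptions, not IBM's method: it only builds SVC CLI command strings (`mkfcconsistgrp`, `mkfcmap`) for review, it assumes the FlashCopy target volumes already exist, and the `_fc` naming suffix, the volume names, and the copy rate of 50 are placeholders. Check every command against the CLI reference for your code level before running anything.

```python
def flashcopy_commands(remote_copy_volumes, fc_group=None, copyrate=50):
    """Build CLI command strings for incremental FlashCopy mappings.

    remote_copy_volumes: names of the Remote Copy volumes on this site.
    fc_group: optional FlashCopy consistency group to create and use.
    copyrate: nonzero so that a background copy runs (placeholder value).
    """
    commands = []
    if fc_group:
        commands.append(f"mkfcconsistgrp -name {fc_group}")
    for vol in remote_copy_volumes:
        # Hypothetical convention: the FlashCopy target is "<volume>_fc".
        cmd = (
            f"mkfcmap -source {vol} -target {vol}_fc "
            f"-copyrate {copyrate} -incremental"
        )
        if fc_group:
            cmd += f" -consistgrp {fc_group}"
        commands.append(cmd)
    return commands


# Hypothetical volume and group names.
cmds = flashcopy_commands(["vol1", "vol2"], fc_group="test1rep_fc")
for line in cmds:
    print(line)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere and leaves the actual change to someone who can verify it on the storage system.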

TaupoJohn
Contributor

I'm pretty sure the SRA created the flash copy when it needed to - for a test failover. One of the SRA settings tells it to do this (rather than having the flash copy already created)

rgds, John B

Mcshammertime
Contributor

Hmm. Interesting. My first failed tests were with no flash copies, just the remote copy in place.

SRM adds the placeholder to the recovery site and then bombs out with the snapshot error. Based on the instructions I posted, I figured I was required to put the flash copies in place for SRM.

An IBM tech I talked to suggested the same thing - could it be based on preconfigured vs nom preconfigured?

TaupoJohn
Contributor

>could it be based on preconfigured vs nom preconfigured?

Exactly! Try "non preconfigured" 😉

rgds, John B

Mcshammertime
Contributor

Haha, sorry for the poor spelling, it's late and I've spent too long staring at this screen!

I'll give "non" :) pre-configured a shot; sounds like it removes a lot of overhead.

cheers!

TaupoJohn
Contributor

I knew what you meant ;)

I always type like that :)

rgds, John B