There is a conversation going on in another thread
(http://communities.vmware.com/message/2227065#2227065)
- but it is an old thread and started off with different versions of everything.
I'm going to start a new thread based on current versions.
I cannot get the SRA to work with SRM 5.1.
When I run a "test" failover for a simple recovery plan, I get this error:
"Error - Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group test1rep. SRA command 'testFailoverStart' failed for consistency group 'test1rep'. Volume cannot be created, please check test mdiskgroup cofiguration Refer to IBM SAN Volume Controller troubleshooting"
I'm running:
IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)
vCenter 5.1
SRM 5.1.0-941848 on Windows 2008 R2 64 bit (2vCPU 8GB RAM)
SRM advanced settings
I have set the Advanced Settings option:
storage.Commandtimeout to 1800 (def 300)
storageProvider.hostRescanRepeatCnt to 3 (def 1)
storageProvider.hostRescanTimeoutSec to 1800 (def 300)
SRA v2.1.0.121224 (also tried 2.1.0.121108)
non-preconfigured environment
Pre-Configured Env. not set
Test MDisk Group ID set
SpaceEfficient Mode = False
Is anyone having success with the IBM V7000 SRA (any version) with SRM 5.1?
Thanks
Maybe this isn't so bad after all. This was quicker.
I have good news to report!
We expanded our test LUN to 1TB, and waited for it to sync to the other site.
I ran a test recovery plan (this time with 2 VMs) and it worked fine and finished in 15 minutes!
It seems like it doesn't really matter how big the LUN is, or how many VMs there are - it takes around the same time.
I will try a real failover next - and report back. All my versions are listed previously - but if this is all successful, I will summarise with all the versions that I'm using.
I had everything working perfectly, and now we're trying to set up a recovery plan in the opposite direction.
(both sites will be protected and recovery sites for half the VMs)
I'm back to the same problems!
I think the "Test MDisk Group ID" of 0 is wrong on the other side. When I use this - it gives a "can't find the group ID" error. If I try anything else it gives the "can't create snapshot" error (the original error I had) - but when you look closely, the SAN isn't even trying to do anything.
So does anyone know how to find the "Test MDisk Group ID" number?
In the V7000 web manager - this is just the Pool name and it has a *name*, not a number.
How do I find the identifier (eg 0) that works on one site but not the other?
PS - we're still waiting for IBM to return our call, to start the process of logging this support incident. It's only been 3 working days 😉
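One possible way to look up the numeric ID behind a pool name is the storage system's CLI rather than the web manager: on SVC/Storwize systems the `lsmdiskgrp` command lists each pool (MDisk group) with an `id` and a `name` column. The sketch below is only an illustration of reading that mapping out of delimited output (for example from `lsmdiskgrp -delim :`). The sample text and the pool names in it are made up, real output has more columns, and nothing here confirms which ID the SRA actually expects.

```python
def pool_ids(lsmdiskgrp_output, delim=":"):
    """Map pool (MDisk group) name -> numeric id from delimited CLI output.

    Assumes the first non-empty line is a header containing 'id' and 'name'.
    """
    lines = [ln for ln in lsmdiskgrp_output.strip().splitlines() if ln.strip()]
    header = lines[0].split(delim)
    id_col = header.index("id")
    name_col = header.index("name")
    result = {}
    for line in lines[1:]:
        fields = line.split(delim)
        result[fields[name_col]] = int(fields[id_col])
    return result


# Hypothetical sample, trimmed to three columns for illustration only.
sample = """id:name:status
0:Pool_SiteA:online
1:Pool_Test:online
"""

ids = pool_ids(sample)
print(ids["Pool_Test"])
```

Run against each site separately, this would show whether the pool you want for test snapshots really has the same ID on both systems.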
My problem is caused by having some RDM LUNs sync'd besides the one used by my test VM. It works fine with just the one LUN being replicated.
I'll work on this...
EDIT - yes, I needed a VM in the protection group that had access to all of the LUNs in the consistency group. I was being careful, and tried it with a test VM, which did not work - but it works fine with the "real" VM.
I think I can say that this is working now - for a test failover. Next we have to try a "real" datacenter migration.
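The condition described above (the protection group needs a VM that can see every LUN in the consistency group) can be sketched as a simple set check. This is only an illustration of the reasoning, not something SRM or the SRA exposes; the LUN names and the `missing_luns` helper are hypothetical.

```python
def missing_luns(vm_luns, consistency_group_luns):
    """Return the consistency-group LUNs that the VM has no access to."""
    return sorted(set(consistency_group_luns) - set(vm_luns))


# Hypothetical names: one datastore LUN plus two RDM LUNs in the same group.
group = ["datastore_lun", "rdm_lun_1", "rdm_lun_2"]
test_vm = ["datastore_lun"]                             # the cautious test VM
real_vm = ["datastore_lun", "rdm_lun_1", "rdm_lun_2"]   # the "real" VM

print(missing_luns(test_vm, group))  # LUNs the test VM cannot see
print(missing_luns(real_vm, group))  # empty list: all LUNs are covered
```

An empty result corresponds to the case that worked here; a non-empty one corresponds to the test VM that failed.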
I can now verify that this is working perfectly - and much faster than it was last week. An entire recovery (not test plan) of 3 VMs and 4 LUNs is only taking around 3 minutes, and then < 1 minute to "reprotect".
For the sake of having all the information in one place,
here are the versions I'm using:
IBM v7000 Version 6.4.1.2 (build 75.0.1211301000)
vCenter 5.1
SRM 5.1.0-941848 on Windows 2008 R2 64 bit (2vCPU 8GB RAM)
SRM advanced settings
I have set the Advanced Settings option:
storage.Commandtimeout to 1800 (def 300)
storageProvider.hostRescanRepeatCnt to 2 (def 1)
storageProvider.hostRescanTimeoutSec to 600 (def 300)
SRA v2.1.0.121224
non-preconfigured environment
Pre-Configured Env. not set
Test MDisk Group ID set
SpaceEfficient Mode = true
I hope this is of value to someone
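For readers comparing the two configuration lists in this thread, here is a small sketch that diffs the settings from the first post against the ones in the summary above. The values are copied from the posts; the dictionary layout and the `changed_settings` helper are just for illustration, and the diff does not show which change (if any) mattered, since the fixes reported above were about the LUNs and the VM in the protection group.

```python
# Settings as listed in the first post of the thread.
initial = {
    "storage.Commandtimeout": 1800,
    "storageProvider.hostRescanRepeatCnt": 3,
    "storageProvider.hostRescanTimeoutSec": 1800,
    "SpaceEfficient Mode": False,
}

# Settings as listed in the summary post.
working = {
    "storage.Commandtimeout": 1800,
    "storageProvider.hostRescanRepeatCnt": 2,
    "storageProvider.hostRescanTimeoutSec": 600,
    "SpaceEfficient Mode": True,
}


def changed_settings(before, after):
    """Return {name: (old, new)} for every setting whose value differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after.get(k)}


changes = changed_settings(initial, working)
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```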
Hi to all,
Great discussion so far. I have some questions for those who have already created an SRM environment with the v7000.
1. What is the best practice regarding datastores and VMs when using SRM: one VM per datastore, or multiple VMs (how many?) per datastore?
2. What is your experience with Global Mirror with change volumes and SRM?
3. What about versioning on v7000 and SRM? A choice of version for possible recovery.
BR Miki,
Hi,
1. There is no best practice regarding datastores. I'm using 500 GB, 1 TB and even 2 TB datastores with SRM. On some datastores I have 15 virtual machines and on others just 1. When a VM is on multiple datastores, SRM will detect that and place them together. It's a good idea to place these vdisks in the same consistency group.
Also mind the remote vdisk ID. When there are duplicates, SRM will detect them and show them as VMware datastores in the device view of the SRA. Huge bug... but it doesn't seem to break the functionality of SRM.
2. On some disks we have change volumes. Works fine. It's even supported by IBM.
3. Please explain versioning (I'm not actually managing the SVC environment, I just know the bits needed for SRM to work).
Hi FrankZo,
Thanks for your reply. Regarding answers.
1. What was your reasoning for deciding how many VMs to put on a datastore?
Thanks for the point about the vdisk ID.
2. What is the better (simpler) option for Global Mirror on the v7000 with SRM: standard GM or GM with change volumes? How do you define the RPO time? How do you manage replication?
3. Versioning means having two or three different versions (snapshots) on the DR site. For example, when the primary site goes offline at 14:28 and there is corrupted data on the DR site in the replica from 14:15, the customer wants to have another copy from 14:00. Is this possible in practice with the v7000/SVC?
And another question: why did you choose SRM with an SRA (storage replication) instead of native vSphere Replication?
BR Miki,
Anyone to share something?
Hey guys, IBM just released a whole new SRA on the FTP site: 2.2.0. It seems that we can now filter in the devices view, something I was waiting for (and requested) a long time ago. I'm going to test this as soon as possible.
ftp://ftp.software.ibm.com/storage/ds_open_api/VMWARE/SVC_SRA/
Hi all
Apologies for bringing up such an old post, but that is exactly the same issue I'm having and I would really appreciate some quick advice, as the IBM doco is a bit hard to decipher!
I've created a volume at site 1 (protected site) and a volume at site 2 (recovery site), and they are synch'd via remote copy.
first - does SRM care if they're not the same name?
I've then created a flash copy of both volumes; they're synch'd and in a consistency group.
This is the bit I'm confused about - does the mirrored (remote copy) volume need to be mapped at the second (recovery) site, or does SRM do it? I tried mapping it but vm complains about the signature, and if I choose keep signature it won't add the volume.
Same question for the flash copy volumes, do they need to be mapped to either/both sites?
thanks in advance, any assistance would be greatly appreciated
I can't remember now - but I don't think we had to map the volumes at the recovery site. The SRA should take care of that.
Also - we didn't use flash copies.
I don't know if the remote copy volumes need to have the same name or not - just try running some test recovery plans to see if things are working OK.
Hi taupo
Thanks for the quick reply. Are you certain flashcopy mappings aren't required? I thought you said you did? Almost every VMware/IBM SRM document says it's a prerequisite - here are the steps they claim should be followed:
Procedure
target volumes on the recovery site SAN Volume Controller.
Remote Copy target volumes and the previous created FlashCopy target
volumes on the recovery site SAN Volume Controller.
FlashCopy consistency group and configure the corresponding FlashCopy to
the FlashCopy consistency group.
vSphere servers.
source volumes on the protected site SAN Volume Controller.
Remote Copy source volumes and the previously created FlashCopy target
volumes on the protected site SAN Volume Controller.
FlashCopy consistency group and configure the corresponding FlashCopy to
the FlashCopy consistency group.
site vSphere servers.
Note:
recovery operation after the reprotect operation on the recovery site.
I'm pretty sure the SRA created the flash copy when it needed to - for a test failover. One of the SRA settings tells it to do this (rather than having the flash copy already created)
Hmm. Interesting. My first failed tests were with no flash copies, just the remote copy in place.
SRM adds the placeholder to the recovery site and then bombs out with the snapshot error. Based on the instructions I posted, I figured I was required to put the flash copies in place for SRM.
talking to an IBM tech he suggested the same thing - could it be based on preconfigured vs nom preconfigured?
>could it be based on preconfigured vs nom preconfigured?
Exactly! Try "non preconfigured" 😉
Haha, sorry for the poor spelling, it's late and I've spent too long staring at this screen!
I'll give "non" pre-configured a shot; sounds like it removes a lot of overhead.
cheers!
I knew what you meant
I always type like that