alex0
Enthusiast

SRM bunch of questions

I have a few questions about SRM.

Is SRM enabled on a VirtualCenter “cluster”, “resource pool” or just the individual VMs themselves?

Does SRM actually REQUIRE all protected VMs to be in a VC cluster? If so, must the VC cluster contain the ESX hosts at both the primary AND DR sites?

If a site DOES fail over, how does SRM guarantee there are enough resources for the protected VMs at the other site?

If SRM does require ESX hosts from both the primary and DR site to be in ONE VC "cluster", can you set it up in such a way that HA/DRS will only happen within a single data centre... and the DR ESX hosts will only be used in the event of site failure with SRM?

SRM relies on SAN mirroring … what is the maximum distance of a dark fiber between data centres before SAN mirroring is no longer possible?

Cheers

Alex

Texiwill
Leadership

Hello,

Moved to SRM forum.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

admin
Immortal

Hi Alex,

(1) In SRM 1.0, the basic unit of replication is the datastore. Recovered VMs can be placed on arbitrary hosts/clusters, as long as the hosts can access the replicated datastores.

(2) There is no requirement for protected VMs to reside on clusters at either end. SRM requires separate VC instances managing the protected and recovery sites, so in fact it is not possible to have both protected and recovery hosts in the same VC cluster.

(3) In order to guarantee sufficient resources at recovery time the best method is to use resource pools. SRM also allows you to suspend non-critical VMs during recovery.

(4) See above, SRM requires separate VC instances for the protected and recovery sites.

(5) There is no maximum distance for FC replication (assuming the presence of appropriate switches/repeaters), but beyond a certain distance (~100 KM) synchronous replication becomes impractical. We expect that in the common case array replication underlying protected LUNs will be asynchronous.
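To give a feel for why distance matters here, a rough back-of-the-envelope latency calculation (a sketch only; the fiber speed and array overhead figures are illustrative assumptions, not SRM or array limits):

# Why synchronous replication degrades with distance: every write must be
# acknowledged by the remote array before the application sees it complete.
# All numbers below are illustrative assumptions.

LIGHT_IN_FIBER_KM_PER_MS = 200.0  # ~2/3 the speed of light, i.e. ~5 microseconds per km

def sync_write_penalty_ms(distance_km, array_overhead_ms=0.5):
    """Extra latency per synchronous write: fiber round trip plus protocol/array overhead."""
    round_trip_ms = 2 * distance_km / LIGHT_IN_FIBER_KM_PER_MS
    return round_trip_ms + array_overhead_ms

for km in (10, 50, 100, 300):
    print(f"{km:>4} km -> ~{sync_write_penalty_ms(km):.2f} ms added to every write")

# At ~100 km the round trip alone adds about 1 ms to each write; well beyond
# that the penalty starts to dominate typical SAN write latencies, which is
# why long-distance replication is normally asynchronous.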

Hope this helps!

-Alvin

alex0
Enthusiast

Hi Alvin,

Thanks for your response.

Can you confirm SRM is bi-directional... i.e. EITHER site in a pair can fail and its VMs will be recovered at the opposite site? I.e. for customers who don't have a traditional "PROD / DR" setup, but instead have more of a "PROD & DR for Site B" / "PROD & DR for Site A" setup.

Hypothetical question: if SRM is NOT required, and you had two data centres within, say, 30 KM of each other, connected with a big dark fiber link, effectively creating one logical data centre across two geographically separate locations... then you could set up ONE logical cluster within VC that encompasses hosts at both sites. LUNs from SANs at Site A could be allocated to ESX hosts at Site B and vice versa. VMs could be vmotioned from hosts at Site A to Site B and vice versa, etc... i.e. the setup becomes one large logical data centre.

Question... is there any reason why you would want to do this? I see a risk: if the dark fiber link goes down, any VMs running at the "opposite" site (i.e. running at the site where the SAN holding that particular VM isn't) would go down. Furthermore, if the dark fiber went down and you had HA clusters across sites, then HA would no longer be guaranteed, because half of the hosts in the cluster would effectively no longer be available.

Also, the dark fibre capacity required JUST for mirroring (as is the case for SRM) might be, say, 0.5 Gbps. The capacity required to also vmotion/DRS across data centres, which includes sending I/O across the dark fiber for VMs running at the "opposite" site, might be, say, 2.5 Gbps. So there is a risk of the dark fiber going down, and the link requires a lot more capacity, so it will cost a lot more.
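To make the arithmetic behind those figures explicit (the component numbers below are purely illustrative assumptions, not measurements):

# Rough link sizing using the hypothetical figures above (all assumptions).
mirror_gbps = 0.5            # array replication traffic only (the SRM-style case)
opposite_site_io_gbps = 1.0  # assumed disk I/O crossing the link for VMs running
                             # at the site where their SAN is not
vmotion_gbps = 1.0           # rule-of-thumb headroom for a concurrent VMotion

stretched_total_gbps = mirror_gbps + opposite_site_io_gbps + vmotion_gbps
print(f"mirroring only    : {mirror_gbps:.1f} Gbps")
print(f"stretched cluster : {stretched_total_gbps:.1f} Gbps before growth/burst headroom")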

Can you see any positives for setting up an environment this way?

With an SRM-enabled setup, is there any way you can still have two VCs, one at each site, but with one VC acting as a 'standby' VC... so that day-to-day you can control both sites from one VC, instead of having to log into two separate VCs, one for each site?

If this is possible in an SRM-enabled setup, is it recommended by VMware for SRM setups? Is it best practice? Is it supported? If it is supported but not recommended, what are the risks?

I believe once SRM is configured, the DR process is activated at the click of a button in the event of a disaster. Once the button is clicked, the failover process is completely automated. Can you confirm this is correct?

I believe if you need to fail BACK to the failed site once it comes back up, this is a MANUAL process (although SRM can assist with part of the process). Are there plans in the next release of SRM to fully automate the restoration back to the failed site at the click of a button?

Regards

Alex

Michelle_Laveri
Virtuoso

See my responses inline...


I have a few questions about SRM.

Is SRM enabled on a VirtualCenter “cluster”, “resource pool” or just the individual VMs themselves?

SRM's scope is limited by the LUNs: if a LUN is replicated, then SRM can protect anything on that volume. It is not enabled on a particular VC object as such...

Does SRM actually REQUIRE all protected VMs to be in a VC cluster? If so, must the VC cluster contain the ESX hosts at both the primary AND DR sites?

No, you don't need DRS/HA to do SRM; you could have unclustered ESX hosts...

If a site DOES fail over, how does SRM guarantee there are enough resources for the protected VMs at the other site?

SRM is DR automation and is not really responsible for performance issues - that's a function of the quality of the ESX hosts at the recovery site. That said, you can exclude, suspend and power on VMs in a specific order, which allows you to make sure the important stuff gets the resources it needs. You will have to do capacity planning for the DR location just like you do for your Protected Site.
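As a rough sketch of that capacity-planning thinking (the VM names, priorities and memory figures are entirely hypothetical, and this is not how SRM works internally - SRM simply follows the priority order and suspend list you configure):

# Hypothetical capacity check for a recovery site: power on protected VMs in
# priority order until memory runs out, and flag the rest as candidates to
# leave suspended. Illustrates the planning logic only; not part of SRM.

recovery_site_memory_gb = 256            # assumed usable RAM at the DR site

protected_vms = [                        # (name, priority 1=highest, memory GB)
    ("sql01",  1, 64),
    ("exch01", 1, 48),
    ("web01",  2, 16),
    ("web02",  2, 16),
    ("file01", 3, 32),
    ("dev01",  4, 64),
    ("test01", 4, 64),
]

powered_on, left_suspended, used = [], [], 0
for name, prio, mem in sorted(protected_vms, key=lambda v: v[1]):
    if used + mem <= recovery_site_memory_gb:
        powered_on.append(name)
        used += mem
    else:
        left_suspended.append(name)

print("power on :", powered_on, f"({used} GB)")
print("suspend  :", left_suspended)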

If SRM does require ESX hosts from both the primary and DR site to be in ONE VC "cluster", can you set it up in such a way that HA/DRS will only happen within a single data centre... and the DR ESX hosts will only be used in the event of site failure with SRM?

DR is triggered in SRM manually, not automatically. HA will only kick in if you have an ESX host failure. In many ways they are separate technologies that have no direct relationship with each other.

BTW, remember SRM requires two SRM servers and two VCs - one at each site. There is no relationship between them, apart from the pairing process and inventory mappings. We do not have an uber-cluster that spans the two locations. Although this is possible with stretched VLANs and stretched clustering, it is more to do with your underlying storage/network layer - it has nothing directly to do with SRM.

SRM relies on SAN mirroring … what is the maximum distance of a dark fiber between data centres before SAN mirroring is no longer possible?

No clue. I understood the max distance (before repeaters) for dark fibre is 10 km (?!?!); in reality you are more likely to be constrained by the comms infrastructure in your environment. It's common practice to have two-hop replication: synchronous via dark fibre from the Primary to the first DR location, and then asynchronous (over network pipes, with latency) to get it an even further distance. Depending on your requirements and the restrictions imposed by your storage vendor, SRM will work with async and sync replication and doesn't care whether the pipe is WAN or dark fibre technology. What matters is your recovery point objective (RPO)... in other words, if you invoke DR, at what "state" do you want your data to be - exact, 1 hr out of sync, out of sync since last night....
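A quick worked example of what RPO means in practice for asynchronous replication (the interval, change rate and link speed are assumed figures, purely for illustration):

# Illustrative RPO arithmetic for asynchronous replication (numbers assumed).
# Worst case, the recovery copy is missing everything written since the start
# of the last replication cycle that completed before the disaster.

replication_interval_min = 60        # array ships a delta once an hour
changed_gb_per_cycle = 20            # assumed data changed per cycle
link_gbps = 0.5                      # assumed WAN/dark-fibre bandwidth

transfer_min = changed_gb_per_cycle * 8 / link_gbps / 60   # GB -> Gb -> seconds -> minutes
worst_case_rpo_min = replication_interval_min + transfer_min

print(f"delta transfer time : ~{transfer_min:.0f} minutes")
print(f"worst-case RPO      : ~{worst_case_rpo_min:.0f} minutes of lost writes")

# With synchronous replication the worst-case RPO is effectively zero, at the
# cost of the per-write latency penalty over distance.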


Regards

Mike Laverick

author of a soon to be released book on SRM!

Michelle_Laveri
Virtuoso

See my responses inline...

Hi Alvin,

Thanks for your response.

Can you confirm SRM is bi-directional... i.e. EITHER site in a pair can fail and its VMs will be recovered at the opposite site? I.e. for customers who don't have a traditional "PROD / DR" setup, but instead have more of a "PROD & DR for Site B" / "PROD & DR for Site A" setup.

SRM is bidirectional

Hypothetical question: if SRM is NOT required, and you had two data centres within, say, 30 KM of each other, connected with a big dark fiber link, effectively creating one logical data centre across two geographically separate locations... then you could set up ONE logical cluster within VC that encompasses hosts at both sites. LUNs from SANs at Site A could be allocated to ESX hosts at Site B and vice versa. VMs could be vmotioned from hosts at Site A to Site B and vice versa, etc... i.e. the setup becomes one large logical data centre.

Yes, but you need stretched VLANs so the VM does not leave the network it is on.

Question... is there any reason why you would want to do this? I see a risk: if the dark fiber link goes down, any VMs running at the "opposite" site (i.e. running at the site where the SAN holding that particular VM isn't) would go down. Furthermore, if the dark fiber went down and you had HA clusters across sites, then HA would no longer be guaranteed, because half of the hosts in the cluster would effectively no longer be available.

Also, the dark fibre capacity required JUST for mirroring (as is the case for SRM) might be, say, 0.5 Gbps. The capacity required to also vmotion/DRS across data centres, which includes sending I/O across the dark fiber for VMs running at the "opposite" site, might be, say, 2.5 Gbps. So there is a risk of the dark fiber going down, and the link requires a lot more capacity, so it will cost a lot more.

You need 1 Gb for VMotion on the network pipes. I have a customer who vmotions from London to Bournemouth - which is a big distance... Why do it? Management. Planned DR events.

But NOT for true disaster DR - you need functioning ESX hosts at both sites for this to work.

Can you see any positives for setting up an environment this way?

See above

With an SRM-enabled setup, is there any way you can still have two VCs, one at each site, but with one VC acting as a 'standby' VC... so that day-to-day you can control both sites from one VC, instead of having to log into two separate VCs, one for each site?

VMware have always supported a standby VC (since VC1 Update 4 you could even cluster them). This is a separate but related issue to SRM. SRM REQUIRES two VCs, one at each site...

If this is possible in an SRM-enabled setup, is it recommended by VMware for SRM setups? Is it best practice? Is it supported? If it is supported but not recommended, what are the risks?

No. Not supported. SRM requires two VCs, one at the primary and one at the DR location.

I believe once SRM is configured, the DR process is activated at the click of a button in the event of a disaster. Once the button is clicked, the failover process is completely automated. Can you confirm this is correct?

Yes, completely automated - BUT the reality is that you will still have INTERNAL processes that require a human operator to intercede. SRM contains a "message" feature to stall a recovery plan - to allow this manual action to be taken before resuming the recovery plan.

I believe if you need to fail BACK to the failed site once it comes back up, this is a MANUAL process (although SRM can assist with part of the process). Are there plans in the next release of SRM to fully automate the restoration back to the failed site at the click of a button?

Mmm, the failback process I think will become EASIER... but is unlikely to become a point-and-click event. It is just too scary and serious an undertaking to be done with a casual click of a button. In fact, some would say that failback is generally more risky/serious in the world of DR (with or without SRM) than failover.


alex0
Enthusiast

Mike,

I appreciate your responses.

The customer that vmotions from London to Bournemouth... they must therefore have one logical data centre configured in VC.

I assume their setup is NOT SRM-enabled.

If they DID want to move to an SRM-enabled setup, would they have to sacrifice the ability to vmotion between the two separate data centres?

Regards

Alex

Michelle_Laveri
Virtuoso

Mmm, good question :-)

Had to take a bit of paper out and scribble that down just to get it in my head....

1. SRM has NO idea physically where your ESX hosts are... but an ESX host cannot be managed by two VCs at the same time..... (currently)

I think they could have both. That is, one VC+SRM at the Protected Site, with ESX hosts in both the Protected and Recovery locations - off you go and do VMotion for planned DR...

There would have to be a separate VC with recovery ESX hosts that would be a mirror of the protected location. But the ESX hosts in the recovery location that you do VMotions to would NOT be allowed to be listed in the same Recovery VC/SRM environment, because of 1.

Technically, I think it's possible - but ugly/clunky. We have to remember that VMotion was never really intended to be a DR tool. I'm of the opinion you should never take a hammer to do the work of a spanner...

Regards

Mike

Regards
Michelle Laverick
@m_laverick
http://www.michellelaverick.com
Reply
0 Kudos
alex0
Enthusiast
Enthusiast

Hi Mike,

Thanks again for your comments.

In regards to the reasons FOR having one logical VirtualCenter cluster across two disparate geographic locations, you mentioned:

1. planned DR events

2. management

Can you expand on this a little, because I'm confused.

For planned DR events, I'm assuming this means you can vmotion stuff onto the other site during a planned DR event... however, in a real DR scenario this option would not be available to you. Furthermore, this assumes that the SANs at both sites which hold the VMs being vmotioned will remain accessible during the DR; again, in a real DR scenario, this would not be the case.

Using vmotion as a planned DR event tool makes absolutely no sense to me.

For management... yes, it is convenient to see your entire VMware environment from one viClient instance. However, you could simply have two viClients loaded, one for each site. It seems pretty extravagant to create one logical data centre simply to be able to use one viClient instead of two... or are there deeper reasons behind your "management" explanation which I'm not seeing?

Regards

Alex

Michelle_Laveri
Virtuoso

OK.

1. Is very easy to explain. Perhaps I should have spent more time being clear. By definition, VMotion will NOT help for unplanned DR events - the true disaster, a la Twin Towers, that no one can see coming. By definition, a planned DR event is one that we can see coming. Some would argue that using VMotion to move your virtual machines off-site is quicker and less intrusive to users. By planned DR event I mean something like major infrastructure maintenance which is going to inhibit operations - a power, water, heating, building or road maintenance outage - perhaps someone wants to demolish a building near your location and they don't want you around during that time. But agreed, VMotion is of NO USE in an unplanned, true-disaster scenario - I wasn't making an argument that it should be used there....

2. Management

Much harder to give a usage case. I know people who do it, and people are ALWAYS asking me on training courses whether it is possible. If I can make contact with the customer who does London-Bournemouth VMotions, I will try to ask them what management scenarios they use this functionality for. I think it may be an internal issue: say 99% of the IT team is in London and their master plan is to get to Bournemouth in the next 18 months, but VMs are being created in London and then VMotion'd to Bournemouth. In other words, some kind of gradual datacenter move...

Disclaimer: I should perhaps say that although I have set up VMotion many times, I've never personally set it up across sites - I just roughly know how it's done, and that it can be done. What I'd be interested in learning is this: say you use VMotion for a planned move to a different datacenter - how does that affect the storage? I imagine you would have to make sure the destination ESX hosts had r/w access to the volumes at the alternative location. And also, what about the data integrity issues if you have to power off the primary location and VMotion back a week later, when you have access to the primary location again? I imagine this would be very much like how SRM deals with manual failback...

If this is a topic that really interests you, we should split this thread and put the VMotion-across-sites discussion in the VirtualCenter forum - where you might get more realistic responses - but keep the DR/SRM stuff here. They are really separate topics. Remember, VMotion was never intended to be used as a DR tool.

Regards

Mike

alex0
Enthusiast

Hi Mike,

If you did use vmotion as a planned DR tool as you suggest, e.g. if they were going to POWER OFF SITE A, you would do this:

1. vmotion VMs to Site B

2. some of those VMs you just vmotioned would already be sitting on a SAN at Site B, so do nothing further with those

3. those VMs which you just vmotioned that are still sitting on a SAN at Site A, you would storage vmotion them to a Site B SAN

When you want to move back to Site A:

1. optionally storage vmotion the VMs off the Site B SAN back to the Site A SAN

2. vmotion VMs back to Site A

Remember, in this one-logical-cluster-across-two-geo-locations scenario, a SAN at Site A can present LUNs to servers at Site B, and vice versa, across the dark fibre. The SANs don't even know they are going across a dark fiber; for all they know, they are provisioning on local fabric.
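For what it's worth, here is a rough sketch of steps 1 and 3 using the modern pyVmomi Python bindings (which postdate this thread - at the time you would have scripted against the VI API or driven it from the VI Client). All VM, host and datastore names here are hypothetical:

# Hypothetical planned-DR move of one VM: VMotion it to a Site B host, then
# Storage VMotion its disks onto a Site B datastore. Names are made up.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vc-sitea.example.com", user="admin", pwd="***", sslContext=ctx)
content = si.RetrieveContent()

def find(vimtype, name):
    """Look up a managed object by name with a simple inventory walk."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find(vim.VirtualMachine, "web01")
site_b_host = find(vim.HostSystem, "esx-b-01.example.com")
site_b_ds = find(vim.Datastore, "siteB-lun01")

# Step 1: VMotion the running VM to a Site B host (its disks stay on the Site A
# SAN, which the Site B hosts can still see across the dark fibre).
WaitForTask(vm.MigrateVM_Task(pool=None, host=site_b_host,
                              priority=vim.VirtualMachine.MovePriority.defaultPriority))

# Step 3: Storage VMotion the disks onto a Site B datastore before Site A is powered off.
WaitForTask(vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=site_b_ds)))

Disconnect(si)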

Regards

Alex

Reply
0 Kudos
Michelle_Laveri
Virtuoso
Virtuoso

Thanks for this Alex...

I would really love to get the opportunity to configure something like this - but unfortunately, it's way beyond what I can do with my lab equipment!

Thanks for the clarification...

Regards

Mike

