8 Replies Latest reply on Jun 9, 2009 8:33 AM by bse1969

    Two site DR Scenario

    jcpoole Lurker


      My name is J.C. Poole from the City of Tuscaloosa. I wanted to ask you more experienced guys about a DR scenario I have been thinking about.



      My DR scenario:



      We have two sites (2 ESX hosts, 1 iSCSI SAN at each site) connected over city owned fiber. We will call these Building 1 and Building 2, or B1 and B2 for short.



      - The fiber between B1 & B2 has been damaged or cut. It is the only link between the two sites, so B2 has no connectivity on the LAN: B2 must go through the damaged fiber to reach the LAN or the ISP.



      - There are critical VMs on the hosts and SAN at B2 that need to be moved to the hosts and SAN at B1



      - B1 has full connectivity except to B2



      - Each site has a physical VC server running



      How would you move those critical VMs from B2 to B1?















        • 1. Re: Two site DR Scenario
          JFitchVMA Enthusiast

          Is this a hypothetical or real-life problem? Without any type of connectivity you will have trouble getting the VMs from one site to the other. Is any type of replication set up between the arrays?


          One more question: do you have a backup of the VMDKs?


          Assuming the answer to all of the above is no, you could copy the VMDKs to a laptop via sftp or scp, assuming it's not a 200GB VM or something.


          Let me know the answers to the above questions and I can give more options hopefully
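
          For a rough sense of whether the laptop copy is practical, here is a minimal sketch of the arithmetic. The 30 MB/s sustained throughput is an assumed figure for illustration only, not a measurement; real scp speed from a host may be quite different, so plug in a number from a test copy.

```python
def copy_time_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy size_gb gigabytes at throughput_mb_s megabytes per second."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_gb * 1024  # treat 1 GB as 1024 MB
    return size_mb / throughput_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical figures: the 200GB VM mentioned above at an assumed 30 MB/s.
    hours = copy_time_hours(200, 30)
    print(f"about {hours:.1f} hours per copy")
```

          Keep in mind the copy happens twice (host to laptop, then laptop to the new host), plus the drive between buildings.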

          • 2. Re: Two site DR Scenario
            jcpoole1 Lurker


            This is hypothetical. However, the fiber has been cut before. There actually is replication, but I'm not sure how often replication is taking place.  We would also have a backup from the night before.



            I think getting an up-to-the-minute copy over sftp or scp would be preferable if the VM was being written to frequently (e.g. SQL, Exchange). If the site was far away, I guess I would have to use the replica or backup.



            I can't remember whether the arrays (Dell EqualLogic PS5000XVs) replicate on a schedule or if they just take a full copy and send the changes after that.



            The best solution would be Fault Tolerance & redundant paths back between the buildings.



            Anyway, thanks for your insight.



            • 3. Re: Two site DR Scenario
              JFitchVMA Enthusiast


              You didn't ask what the best solution would be to have set up beforehand; I answered based on what was asked in the hypothetical question.



              In our environment we keep our VMs on DMX and 3PAR storage, and we use the arrays' replication technology to replicate the data to our cold DR site. If something happened to our primary site, we could have our VMs stood up quickly, since we replicate throughout the day. We also use that setup as a selling point for people who doubt VMs: instant DR is a big seller for someone who doesn't want to spend the money on a physical server DR setup (duplicate hardware). That is the absolute best scenario if the array is capable of doing it, especially when you have a critical VM.



              Once vSphere 4 comes out, that will change the answer to this question. It's still a great idea to replicate your data, but Fault Tolerance (FT) in vSphere 4 will give you another strong option. If you haven't looked into it yet, FT is going to allow you to set up a hot VM that is switched to when the original VM goes offline. Granted, the information is still scarce, but from what I've read and from talking to VMware about it, it'll be a very nice feature to have. It will require providing redundant hardware, of course.



              Not everyone will be an early adopter of 4.0, though. I have heard of cases where replication failed and there was a problem with the connection between sites (not in a disaster, but during a relocation), and the last resort was to go to the other site, pick up the inaccessible array, and bring it to the other location. That's an extreme last resort, but if your other site is going to be down for a week and the data on critical servers either isn't replicated between sites or is out of date, then you don't have much choice. My SCP idea is one I've used to move boxes between hosts in different VC instances, so it would work for DR as well. The biggest hurdle would be if the VM, or the VMs if you have several, are larger than the drive you can get hold of. I have a VM with a 2TB disk attached to it, so if I didn't have replication I'd be up a certain creek without a paddle trying to figure out how to get that box back.
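
              To make the drive-size hurdle concrete, here is a minimal sketch that walks a list of VMs in priority order and picks the ones that still fit on the portable drive. The VM names, sizes, and the 1000 GB drive are hypothetical; the point is only to see which boxes would have to come from the replica or backup instead.

```python
def select_vms_for_drive(vms, drive_capacity_gb):
    """Pick VMs in priority order while they still fit on the drive.

    vms: list of (name, size_gb) tuples, most critical first.
    Returns (selected, skipped) lists of VM names.
    """
    remaining = drive_capacity_gb
    selected, skipped = [], []
    for name, size_gb in vms:
        if size_gb <= remaining:
            selected.append(name)
            remaining -= size_gb
        else:
            skipped.append(name)  # too big for the space that is left
    return selected, skipped


if __name__ == "__main__":
    # Hypothetical names and sizes; 2048 GB stands in for a VM with a 2TB disk.
    vms = [("sql01", 300), ("exch01", 450), ("file01", 2048), ("web01", 40)]
    selected, skipped = select_vms_for_drive(vms, drive_capacity_gb=1000)
    print("copy by hand:", selected)
    print("needs replica or backup:", skipped)
```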



              If you have replication technologies, a good solution to look at is SRM. Hopefully at the next VMUG we can have Paul from VMware do a presentation on that.



              Good question!



              • 4. Re: Two site DR Scenario
                jcpoole1 Lurker


                I wasn't knocking any of your recommendations. I think they were spot on.



                I have seen a presentation that Paul Weiss did on SRM, and it was great. It would have been even better if he had done a live demonstration.



                • 5. Re: Two site DR Scenario
                  JFitchVMA Enthusiast

                  I didn't think you were

                  • 6. Re: Two site DR Scenario
                    bse1969 Novice

                    Sounds like the VMs are already there, since you are running replication. Check with Dell, but I believe replication is constantly sending changes. You should be able to bring the replicated LUNs online and then bring the VMs online. There is no need to move the VMs; replication is already doing that.

                    • 7. Re: Two site DR Scenario
                      JFitchVMA Enthusiast


                      That's true, but in one of his answers he said he wasn't sure how often replication was happening, so for the hypothetical situation posed, if you don't have an up-to-date replica you have to use other methods. If replication is happening at a good frequency, then all is good: you spin up your VMs from the replicated LUNs as you said, and at worst you lose a small amount of time/data depending on your replication frequency.



                      • 8. Re: Two site DR Scenario
                        bse1969 Novice

                        That is why I told him to check with Dell to see how often replication is happening. That will determine his best method of getting the VMs back up. Is he trying to come up with a backup for the replication? Maybe I misunderstood what he was looking for, but if replication is already running then, like you said, he is only looking at missing the time since the last replication. He has to answer the question: do I want to be missing a couple of minutes, or do I want to be down for the couple of hours it will take to copy the VMDKs to a laptop, get in the car, and then copy them to a new host? (I know I said a couple of hours, but I have no clue how big his VMs are.)


                        I think if he is truly replicating then it is constantly sending changes; otherwise, if it were scheduled, it would be snapshots of the LUNs. But again, I don't use Dell's EqualLogic and it has been a while since I have.



                        Talked to a friend of mine who is familiar with EqualLogic arrays, and he said the replication is scheduled. That being the case, my preference would be to schedule the replication and just be able to bring up the VMs at a point in time. The only question is how often to schedule it, and that is a business determination on how much you can afford to lose.
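
                        As a rough way to frame that business question, here is a minimal sketch under a simplified model: a replica is captured every `interval` and takes `transfer_time` to finish arriving at the other site. If the link dies just before a transfer completes, the newest usable replica is `interval + transfer_time` old. The intervals, the 5-minute transfer time, and the 30-minute tolerance are all hypothetical numbers, not EqualLogic settings.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Worst-case age of the newest complete replica when the link fails.

    Simplified model: replicas start every `interval` and each needs
    `transfer_time` (assumed shorter than the interval) to complete.
    """
    return interval + transfer_time


def meets_tolerance(interval: timedelta, transfer_time: timedelta,
                    max_loss: timedelta) -> bool:
    """True if the schedule keeps worst-case loss within what the business accepts."""
    return worst_case_data_loss(interval, transfer_time) <= max_loss


if __name__ == "__main__":
    transfer = timedelta(minutes=5)   # assumed time to ship one replica
    tolerance = timedelta(minutes=30) # assumed acceptable data loss
    for interval in (timedelta(minutes=15), timedelta(hours=1), timedelta(hours=4)):
        loss = worst_case_data_loss(interval, transfer)
        verdict = "ok" if meets_tolerance(interval, transfer, tolerance) else "too long"
        print(f"every {interval}: worst-case loss {loss} ({verdict})")
```

                        If no schedule the link can sustain meets the tolerance, that is the case where the manual copy, with its hours of downtime, starts to look worth it.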