5 Replies Latest reply on Nov 14, 2018 12:03 AM by Finikiez

    SRM testing with a full network fail over / disconnection

    Baoth Novice

      Hello

       

      I am working for a client that is performing network failover testing as part of an ongoing DR project. They currently use SRM to protect the primary site (well, a subset of VMs within the site), and have had no issues with an SRM test itself.

       

      However, the project has been told that if network connectivity were cut between the primary and DR sites, SRM could not be used to bring the VMs up at the recovery site. The stated reason is that once connectivity is restored, the SRM databases would have issues because both sites would believe they are the protected site. I presume this is because both databases would have changed independently during the outage.

       

      Is there any advice on how to approach a full, proper DR test in which the network is disconnected, SRM is used to bring up the protected VMs at the DR site, and everything then plays happily when connectivity between the sites is restored?

       

      Might be worth noting that we are using VMware vSphere 6.0.0, SRM is 6.1.2.1, and during the DR test the primary site will continue to operate with skeleton staff on site.

       

      Thanks

        • 1. Re: SRM testing with a full network fail over / disconnection
          Finikiez Master
          vExpert

          How are the two sites used? Are they both active, or are VMs running only on the protected site?

           

           

          If you break the network connectivity between the sites, you have only one option for failing over to the recovery site: run a disaster recovery in SRM.

          However, in this scenario you will end up with two sets of running VMs, one on the protected site and one on the recovery site.

           

          So I doubt that you want to do this.

           

          I would say SRM is not the tool to use when you deliberately break the network between sites.

          • 2. Re: SRM testing with a full network fail over / disconnection
            Baoth Novice

            Hi Finikiez

             

            Thanks for the reply.

             

            Yes, VMs are running only in the protected site.

             

            I think you are right.

             

            What is your opinion on performing a planned migration prior to breaking the network link? Once the network link is restored, would a planned migration back keep everything ticking over nicely and not break the SRM configuration?

             

            Thanks again.

            • 3. Re: SRM testing with a full network fail over / disconnection
              Finikiez Master
              vExpert

              What is your opinion on performing a planned migration prior to breaking the network link? Once the network link is restored, would a planned migration back keep everything ticking over nicely and not break the SRM configuration?

               

              In my opinion, a planned migration before breaking the link is the only thing you should do.

               

              Or just do nothing, because cutting the network between the sites will simply break storage replication (if you replicate storage over Ethernet, as with NetApp) and generate some alarms about remote site availability.

              • 4. Re: SRM testing with a full network fail over / disconnection
                ThompsG Master

                Hi there,

                 

                Sorry, a little late to the party, but running a failover while the network is offline between the two SRM servers will not break SRM.

                 

                With SRM 5.5 and earlier, you could select an option when doing a Planned Recovery to change it to a Forced Recovery. This was used in the event that something catastrophic had happened to your protected site and you needed to get things running on the recovery site. It would perform the failover but not run any of the operations at the protected site, i.e. powering off VMs, etc.

                 

                This has changed slightly with SRM 5.8.x and above, but it is still possible. Evidently the previous option shown in the GUI made it too easy for somebody to make a mistake, so a Forced Recovery now requires an advanced option to be set to put SRM in Disaster Recovery mode. Setting it means you know what you are doing and really want to proceed.

                 

                Anywho, to get back on topic: once you have run the Forced Recovery and have got your Protected Site back online, with VMs shut down and replication sorted, you simply run the Recovery Plan again without the Forced option to put SRM back into a normal failover state. As communication between both SRM servers is available at this point, SRM will check the Protected Site, realise the VMs are powered down, check the array replication state, work out that it is failed over, and then finish successfully. Well... that is the glossy brochure.
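
                The preconditions described above can be written down as a small checklist. This is only an illustrative sketch, not an SRM API: `SiteState` and `blockers_before_rerun` are hypothetical names, and the three flags are assumed to be filled in by hand by whoever is checking the sites.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteState:
    # Hypothetical flags an operator would set after checking each item manually.
    link_restored: bool              # both SRM servers can talk to each other again
    protected_vms_powered_off: bool  # VMs at the protected site are shut down
    replication_sorted: bool         # array replication has been dealt with


def blockers_before_rerun(state: SiteState) -> List[str]:
    """Return reasons not to re-run the recovery plan yet (empty list means go)."""
    blockers = []
    if not state.link_restored:
        blockers.append("SRM servers cannot reach each other")
    if not state.protected_vms_powered_off:
        blockers.append("VMs still running at the protected site")
    if not state.replication_sorted:
        blockers.append("array replication not yet sorted out")
    return blockers
```

                An empty list means the Recovery Plan can be re-run without the Forced option; anything else is a reason to stop and fix that item first.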

                 

                Read here for more information on this process: Running a Recovery with Forced Recovery

                 

                I would be at Code Brown if required to do this, but it can be done.

                 

                The best approach, with the least risk of data loss, is a planned failover with both sites healthy. With my current employer, we do this once a year: fail over to our Recovery Site, run there for a week, and then fail back.


                The scenario you are describing should ONLY be used in a real disaster, so the business would need to accept some data loss, but it can be done and SRM will not be affected.

                 

                NOTE: As Finikiez said, I would NOT be doing this in a DR test scenario unless the business wants to lose data. Deliberately creating a split-brain scenario for your arrays is potentially a career-limiting move.
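
                To summarise the advice in this thread as a rough rule of thumb, here is a minimal sketch. The function name `recommend` and its two inputs are made up for illustration; it does not call SRM and only encodes the opinions given above.

```python
def recommend(sites_connected: bool, real_disaster: bool) -> str:
    """Rule of thumb from this thread; not an SRM feature or API."""
    if sites_connected:
        # Both sites healthy and talking: least risk of data loss.
        return "planned migration"
    if real_disaster:
        # Protected site is gone or unreachable and the business accepts data loss.
        return "forced recovery"
    # Link cut on purpose for a test: failing over would create a split-brain.
    return "do not fail over"
```

                For the test in the original question, that means doing the planned migration before the link is cut, e.g. `recommend(True, False)`.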

                 

                Does this help?

                • 5. Re: SRM testing with a full network fail over / disconnection
                  Finikiez Master
                  vExpert

                  Good point, I completely forgot about this option.