VMware Cloud Community
notesguru99
Contributor

vCenter DR Situation!

Hi All,

We have just experienced a power failure on the street where our main business resides, so our VM hosts, along with the SANs and everything else, powered off before we had time to shut things down gracefully!

I have got it all going again, but it wasn't straightforward, and while writing my DR report I've come to question some of my earlier decisions and whether I should change anything.

My main issue was that the vCenter server was virtualised. It was a bit of a faff connecting to the correct host, firing up vCenter and then reconnecting to this server to manage the VI.  In hindsight, better documentation of our systems would have helped, but I am wondering if I could make life easier by somehow auto-starting the vCenter server once the hosts start?  That said, when I connected directly to the host I needed to rescan my iSCSI datastores (on which vCenter lives), so I'm not sure whether this would actually work.  Does anybody have any better ideas?  Should I run another vCenter remotely, for example?

My other problem came from questioning whether I could easily recover from a real disaster if my primary SAN failed.  Currently we have 2 SANs, and they are bonded and performing asynchronous copies from SAN 1 to SAN 2, so in the event of SAN 1 not starting I could flip to SAN 2, as it would be an exact copy of everything I have on SAN 1, including vCenter for starters.  My unknown here is that, of course, I have never tried this!!!  And I'm not convinced it would work either.  Presumably SAN 2 would have different iSCSI IQNs that none of my VM hosts would have ever seen before.  So how would they connect to SAN 2 when there is no vCenter running to initiate the rescan or initial connection? And then, after they somehow connect, would I be presented with new, but the same, datastores (if that makes sense)? And they would see all of the VMs on the datastores and I would be able to start them, but they would be copies, so in vCenter I would have duplicated VMs?

Sorry, this is as confusing to write as it must be to read!   I am thinking now that I should move vCenter to a physical box to protect me from the 'all eggs in one basket' scenario.   Has anybody else recovered from a similar SAN failure, and if so, what did you do?

Thanks in advance

Stuart

8 Replies
admin
Immortal

It all depends on how frequent the power failures are. 🙂 You can use vCenter Heartbeat as an option for vCenter redundancy. Regarding SAN 1 and SAN 2, you might need to use SRM for the failover, considering the two SANs are on different iSCSI networks. Alternatively, you can use VDP as a backup application to back up your VMs daily in case a disaster happens, but then you need to make sure you give VDP its own dedicated SAN. I wouldn't recommend running vCenter on a physical server, since it's a waste of money and infrastructure. Otherwise, what's the point of virtualization?

notesguru99
Contributor

Hi

Thanks for the quick response.

Power failures are infrequent, but when they do happen I want to be better prepared.  I looked at the Heartbeat software, thanks for that, but at $10,000 (list price) it seems overly expensive, so what I might do is periodically (weekly?) clone my vCenter server and send it to another site via USB drive.

We do have all of the VMs backed up too via BackupExec 2012, so if there was a disaster I could restore the critical servers (probably)! In fact I could have used this (appliance) to restore vCenter on SAN 2 and connect to that 😉 and then vMotion it back to SAN 1 once all is well...

I'm not sure about the vCenter server staying virtual, although if I can get offsite backups that would be the ideal.  Thanks for the ideas.

Regards

Stuart

depping
Leadership

notesguru99 wrote:

Presumably SAN 2 would have different iSCSI IQNs that none of my VM hosts would have ever seen before.  So how would they connect to SAN 2 when there is no vCenter running to initiate the rescan or initial connection? And then, after they somehow connect, would I be presented with new, but the same, datastores (if that makes sense)? And they would see all of the VMs on the datastores and I would be able to start them, but they would be copies, so in vCenter I would have duplicated VMs?

Hi,

When the replication is broken and the replicated storage is presented to the hosts, you will need to rescan the iSCSI adapter. You can do this from the command line, probably with something like:

esxcli storage core adapter rescan -A vmhba33

Now, you will probably want to add the array as a discovery target on the host before that:

esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a <ip-address of iSCSI array>

But you could add the iSCSI target long before ever running into this scenario.

Now, when your hosts see the new LUNs you will need to either "mount" these LUNs or "resignature" them. You use "mount" when the connection to the "old SAN" is lost for good, and you resignature when the old connection is still there or you expect it back.

When you mount the LUN you can just simply power on the VMs, heck if you are lucky vSphere HA will do it for you.

When you 'resignature' the LUN you will also need to re-register the VMs, as the "UUIDs" of the datastores they are on have all changed and "ESXi / vCenter" doesn't know where the VMs are.

So it's not a simple, straightforward 1-2-3 procedure. I would recommend testing this and documenting it!
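
For what it's worth, the rescan plus mount/resignature choice above could be wrapped up roughly like this for the ESXi shell. It is only a sketch under assumptions: vmhba33 is the software iSCSI adapter from the commands above, present_replica is a made-up helper name, and the volume label is whatever "esxcli storage vmfs snapshot list" reports for the replicated copy. Check the options against your own ESXi build before relying on it.

```shell
#!/bin/sh
# Sketch only, not a tested DR runbook.
# ADAPTER is an assumption: the software iSCSI adapter name on your host.
ADAPTER="${ADAPTER:-vmhba33}"

# present_replica MODE LABEL  (hypothetical helper)
#   MODE  = mount        keep the existing signature (old SAN gone for good)
#   MODE  = resignature  write a new signature (old SAN still there or expected back)
#   LABEL = volume label shown by: esxcli storage vmfs snapshot list
present_replica() {
    mode="$1"
    label="$2"
    case "$mode" in
        mount|resignature) ;;
        *) echo "usage: present_replica mount|resignature <volume-label>" >&2
           return 2 ;;
    esac
    if [ -z "$label" ]; then
        echo "present_replica: missing volume label" >&2
        return 2
    fi
    # Rescan first so the host notices the newly presented LUNs.
    esxcli storage core adapter rescan -A "$ADAPTER" || return 1
    # Then mount or resignature the VMFS copy, depending on MODE.
    esxcli storage vmfs snapshot "$mode" -l "$label"
}

# Example (commented out on purpose; Datastore1 is a placeholder label):
#   esxcli storage vmfs snapshot list
#   present_replica resignature Datastore1
```

After a resignature the datastore normally comes back under a new name with a snap- prefix, which is why the VMs then have to be re-registered from their new paths.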

notesguru99
Contributor

Thanks for the reply.

It sounds like the 'resignature' option is the one I would use, because I would expect to get SAN 1 back by hook or by crook. I just need to test this out somehow.

I am going to implement HA clustering soon, so that will at least power on my machines after a power cut and subsequent restart of the boxes.  One other question though: should I start my SAN or the hosts first? I've had all of my VMs showing an alert since our power cut 😕

By the way, I'm just reading your book, Clustering Deepdive v5.0, in prep for the exam.  Didn't realise 5.1 was out 😢

depping
Leadership

I would start the SAN first and wait until it has fully started up. Then start the hosts and make sure they see all storage devices. If not, rescan.
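
If you wanted to script the "check the storage, rescan if it's missing" part on a host, something along these lines might do. Treat it as a sketch: wait_for_datastore is a made-up helper name, the retry count and delay are arbitrary defaults, and the datastore name in the example is a placeholder. It only uses "esxcli storage filesystem list" and "esxcli storage core adapter rescan --all".

```shell
#!/bin/sh
# Sketch only: poll for a datastore after the SAN is back, rescanning between checks.

# wait_for_datastore LABEL [TRIES] [DELAY_SECONDS]  (hypothetical helper)
# Returns 0 once LABEL shows up in the filesystem list, 1 if it never does.
wait_for_datastore() {
    label="$1"
    tries="${2:-10}"
    delay="${3:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Is the datastore visible to this host yet?
        if esxcli storage filesystem list | grep -qF -- "$label"; then
            return 0
        fi
        # Not yet: rescan all storage adapters and wait before checking again.
        esxcli storage core adapter rescan --all
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (commented out on purpose; Datastore1 is a placeholder name):
#   wait_for_datastore Datastore1 10 30 || echo "Datastore1 still missing" >&2
```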

But you can test the "resignature thing", just create a new LUN. Add one test VM to it. Replicate the LUN. Unpresent the original LUN, and now present the "replicated LUN", resignature it and then register the VM and power it on. Then at least you have tested the procedure.
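
The last two steps of that test (register the VM, power it on) can also be done from the ESXi shell without vCenter. A rough sketch, assuming the usual vim-cmd tool on the host: register_and_start and the VIM_CMD variable are names made up here, and the .vmx path in the example is a placeholder. Be aware that powering on a VM from a copied datastore may stop and ask whether the VM was moved or copied, which this sketch does not handle.

```shell
#!/bin/sh
# Sketch only: register a VM from its .vmx file and power it on.
# VIM_CMD is an assumption so the tool can be swapped out; it defaults to vim-cmd.
VIM_CMD="${VIM_CMD:-vim-cmd}"

# register_and_start VMX_PATH  (hypothetical helper)
register_and_start() {
    vmx="$1"
    if [ ! -f "$vmx" ]; then
        echo "register_and_start: no such .vmx file: $vmx" >&2
        return 1
    fi
    # solo/registervm prints the new VM id on success.
    vmid="$("$VIM_CMD" solo/registervm "$vmx")" || return 1
    # Power on the freshly registered VM by id.
    "$VIM_CMD" vmsvc/power.on "$vmid"
}

# Example (commented out on purpose; the path is a placeholder):
#   register_and_start "/vmfs/volumes/snap-xxxxxxxx-TestLUN/testvm/testvm.vmx"
```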

notesguru99
Contributor

I am replicating the LUNs at the SAN level, not in VMware.  The replication task 'locks' SAN 2 so that it does not present the iSCSI connections to the mirrored volumes whilst it is in slave mode.  Upon failure of SAN 1, SAN 2 must be made primary, and then it will present all its iSCSI info to VMware. This is the bit I can't easily test, as it is an all-or-nothing failover scenario.

I have used all available space on SAN 1 so I can't create another LUN and then mirror it to SAN 2.  I think I might get in touch with the vendor and ask for their help.

Thanks again.

depping
Leadership

I would. It's definitely something you will want to test a couple of times!
