VMware Cloud Community
jb42
Contributor

Help with VDP DR test?

Hi all,

If you want to get right to the question, skip to paragraph 4.

1. I'm making good progress with VDP, backed by iSCSI volumes on ZFS (zvols). It's a nice solution: free backup software in VDP, with the excellent added benefit of deduplication, running on a free open-source operating system, OpenIndiana. The hardware was free too, since it's running on old pre-virtualization rigs. Plus, with ZFS I get snapshots, scrubbing, replication, iSCSI with multipathing, and all kinds of other goodness I haven't gotten into yet!

2. So I've got two OI/ZFS rigs. One presents two iSCSI volumes to ESXi 5.1 for VDP backup appliances. (Yes, my consistency checks sometimes stop running and CPU usage tends to creep up, eventually requiring a restart, so I check them daily. I had to work out diskshadow pre-freeze scripts to get application-consistent Windows Server 2008 R2 backups, and I'm still using a workaround for the Exchange server. Also, I've just passed the first 30 days, and I don't *think* the retention policy has gone back and deleted backups it now should have, but I'm not ready to say that yet.)

3. It was a bit bumpy and difficult getting here, but I've got reliable-enough daily backups on the VDP appliance and I've successfully restored a VM, so that's pretty good. On the OI box, I take a daily ZFS snapshot of each zvol. I'm not powering off the appliances prior to this ZFS snap, but as long as I can restore to the last validated consistency check I'll be happy. I've replicated (locally for now, via zfs send/receive) the seed file system and the incremental snapshots for the last week (~2TB!) over to my second OI box, rolled the resulting zvol back to the latest snapshot (although I think that may have been unnecessary), and then presented this zvol to ESXi 5.1 over iSCSI. It was necessary to resignature the LUN to get it to mount, but now it has, I can browse the datastore, and it looks like I'm good to go.
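The daily snap plus send/receive flow above can be sketched roughly like this. All the pool/dataset names (tank/vdp-lun, tank2/vdp-lun) are assumptions, not my actual layout, and the script only echoes the ZFS commands rather than executing them:

```shell
#!/bin/sh
# Sketch of the daily zvol snapshot + incremental replication described
# above. Dataset names are hypothetical; adjust to your own pools.
ZVOL="tank/vdp-lun"      # zvol backing the VDP appliance datastore
REPLICA="tank2/vdp-lun"  # replica dataset on the second OI box
TODAY="$(date +%Y%m%d)"
PREV="$(date -d yesterday +%Y%m%d 2>/dev/null || date -v-1d +%Y%m%d)"

run() { echo "+ $*"; }   # dry-run: swap the echo for "$@" to run for real

# Take today's snapshot of the zvol.
run zfs snapshot "${ZVOL}@${TODAY}"
# Incremental send from yesterday's snap; -F on receive rolls the replica
# back so the stream applies cleanly. Pipe through ssh for a remote box.
run "zfs send -i ${ZVOL}@${PREV} ${ZVOL}@${TODAY} | zfs receive -F ${REPLICA}"
```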

4. I've got my original VDP appliance LUN and its snap LUN mounted in vCenter. I want to test DR (without going so far as to set up a new vCenter server, if possible). Without really mucking up vCenter's VDP registrations, can I shut down the original appliance, add the snap LUN's .vmx to inventory, and power it up? It'll be named/configured/registered the same as the original. Will that break stuff? Next steps from there?
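For the host-side part of that test, a rough sequence on ESXi 5.1 might look like the following. The datastore label, the snap-xxxxxxxx prefix, and the .vmx path are all placeholders, and the commands are only echoed, not executed:

```shell
#!/bin/sh
# Sketch of resignaturing the replicated LUN copy and registering the
# appliance copy on an ESXi 5.1 host. Labels and paths are hypothetical.
run() { echo "+ $*"; }   # dry-run: swap the echo for "$@" to run for real

# List unresolved VMFS snapshot/copy volumes, then resignature the replica.
run esxcli storage vmfs snapshot list
run esxcli storage vmfs snapshot resignature -l "VDP-datastore"
# The resignatured volume mounts under a snap-xxxxxxxx-... label; with the
# original appliance powered off, register the copy's .vmx in inventory.
run vim-cmd solo/registervm "/vmfs/volumes/snap-xxxxxxxx-VDP-datastore/VDP/VDP.vmx"
```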

5. To get this to work offsite in another location, I think I'll first have to spin up a new vCenter server and register the appliance. Any ideas for fleshing out that process? I'd love to be able to handle disaster recovery on that end with the free hypervisor, but am I right that that can't happen if I've got VDP in the mix?

Thanks in advance,

jb

6 Replies
jb42
Contributor

Well, it looks like I'll need to shut down before replication, and I guess probably before storage snaps too. Core services came up as unrecoverable. In case it's related: the appliance copy booted wanting to use the eth1 network interface, but VDP seemed to reference /etc/sysconfig/network/ifcfg-eth0. I renamed this to ifcfg-eth1 and was able to access the web interface, but that's where I encountered the unrecoverable core services. Same thing here when the appliance wasn't shut down prior to replication: http://blogs.vmware.com/vsphere/2012/12/recover-replicated-vdp-appliance.html.

So I'm replicating again now with the appliance down and ignoring the network issue for the time being...

jb42
Contributor

Looks like that network interface switch may have been a key issue after all. It appears to be a common issue when cloning Linux boxes (the reference is CentOS, but it seemed to apply): http://www.cyberciti.biz/tips/vmware-linux-lost-eth0-after-cloning-image.html.

After deleting the referenced udev rules file and reversing my ifcfg-eth0 rename, the appliance booted with core services running. But management services wouldn't start or auto-restore. Trying a rollback now...
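That fix amounts to clearing udev's cached NIC mapping so the cloned appliance keeps eth0. A sketch, using the usual udev/sysconfig paths (verify them against your appliance; commands are only echoed):

```shell
#!/bin/sh
# Sketch of the cloned-VM eth0 fix referenced above. Paths are the
# standard udev/sysconfig locations and may differ on your appliance.
run() { echo "+ $*"; }   # dry-run: swap the echo for "$@" to run for real

# Delete the cached MAC-to-interface mapping so udev re-enumerates the
# cloned NIC as eth0 on the next boot.
run rm -f /etc/udev/rules.d/70-persistent-net.rules
# Keep the interface config under its original name (undo the earlier
# ifcfg-eth0 -> ifcfg-eth1 rename).
run mv /etc/sysconfig/network/ifcfg-eth1 /etc/sysconfig/network/ifcfg-eth0
run reboot
```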

jb42
Contributor

Alright, it worked! (Hopefully somebody else cares :-).) I was able to ZFS snap and replicate my VDP appliance, then remount and use the copy. It actually came back up and ran a scheduled backup job! A test restore job is running now.

Tomorrow I'll go through the process cleanly with my second appliance, as I think I've worked it out from this test. Subsequent tests will be to restore from incremental snaps/replicas, and then to actually run through the whole thing at the remote site, which will involve setting up a fresh vCenter.

jhunter1
VMware Employee

jb42, I care. :-)

Good stuff. Please keep us updated on your progress!

jb42
Contributor

Following up: initial VM restore was a success.

Today, I completed an incremental replication of the daily volume snaps taken since my initial volume seed, bringing my 'remote' copy up to date. I am restoring a VM from it now. Compared with my initial multi-day effort, today I completed replication -> restore in less than four hours. And the time from LUN mount to restore (which is where I spent two days before) was less than an hour, requiring only one reboot to edit network settings as previously noted. No checkpoint rollback was required! Boot times were fast, without the timeouts, which I understand were due to the network-config issues.
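That catch-up replication can send every intermediate daily snap in one stream with zfs send -I. A hypothetical sketch (dataset and snapshot names assumed; the command is only echoed):

```shell
#!/bin/sh
# Sketch of the catch-up replication described above: zfs send -I streams
# all intermediate snapshots between the seed and the latest snap at once.
# Dataset and snapshot names are hypothetical.
run() { echo "+ $*"; }   # dry-run: swap the echo for "$@" to run for real

run "zfs send -I tank/vdp-lun@seed tank/vdp-lun@latest | zfs receive -F tank2/vdp-lun"
```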

jb42
Contributor

One other note: the data snaps since 4/9 were running after the backup completed but before the checkpoint validation commenced. I did not shut down the appliance during those data snaps and didn't appear to encounter any difficulties as a result. I'm going to try moving the data snap to after the validation so I have fresher remote backups; as it stands, they're stale by one day. I did shut down during replication, but I don't think I should have to - I'm just replicating the snaps - so I'll have to test that more too.
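Moving the data snap to after validation could be as simple as shifting the cron entry on the OI box. A hypothetical crontab sketch (times and dataset name are assumptions, not my actual schedule):

```
# Hypothetical crontab on the OI box: take the zvol snap after the VDP
# checkpoint validation window instead of right after the backup job.
# e.g. backup ~20:00, validation done by ~02:00, so snap at 03:00:
0 3 * * * /usr/sbin/zfs snapshot tank/vdp-lun@`date +\%Y\%m\%d`
```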
