VMware Cloud Community
wseaton
Contributor

Proper methods for backing up large VMs

Forgive me if this question is a bit pedestrian. I'm a relative newcomer to data-center-scale virtualization, but I have to solve the problem.

I recently took over about 50 virtualized servers running on vCenter 4, and while everything is running smoothly, I've run into a snag regarding our larger VMs. I'm certainly not the only guy dealing with terabyte-sized VMs, so there have to be other options and strategies for dealing with this, and I'm looking for ideas.

Currently our back-up strategy consists of logical segregation between file level back-ups (incremental, etc.) and guest OS back-ups. File level consists of back-up agents running through the guest VMs, and it works fairly well. Guest OS back-ups consist of an automated snap-shot > back-up > delete snap-shot method, and it also works fairly well, with exceptions.
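
As a rough sketch of that snap-shot > back-up > delete cycle: the Python below is only an illustration, and every function name in it (`create_snapshot`, `copy_snapshot`, `delete_snapshot`) is a hypothetical stand-in passed in by the caller, not a real VMware or back-up vendor call. The point it shows is that the delete step should run even when the back-up step fails, so snapshots are not left behind.

```python
from contextlib import contextmanager


@contextmanager
def temporary_snapshot(create_snapshot, delete_snapshot, vm_name):
    """Create a snapshot, hand it to the caller, and always delete it afterwards."""
    snapshot_id = create_snapshot(vm_name)  # may raise if snapshot creation fails
    try:
        yield snapshot_id
    finally:
        # Runs whether or not the back-up step raised an error.
        delete_snapshot(vm_name, snapshot_id)


def backup_vm(vm_name, create_snapshot, copy_snapshot, delete_snapshot):
    """Run one snapshot > back-up > delete cycle; return True on success."""
    try:
        with temporary_snapshot(create_snapshot, delete_snapshot, vm_name) as snap:
            copy_snapshot(vm_name, snap)
        return True
    except RuntimeError:
        return False


# Toy stand-ins (hypothetical) that only record what was called.
log = []

def fake_create(vm):
    log.append("create")
    return "snap-1"

def fake_copy(vm, snap):
    log.append("copy")

def failing_copy(vm, snap):
    log.append("copy-failed")
    raise RuntimeError("backup target unreachable")

def fake_delete(vm, snap):
    log.append("delete")


ok = backup_vm("fileserver01", fake_create, fake_copy, fake_delete)
failed = backup_vm("fileserver01", fake_create, failing_copy, fake_delete)
print(ok, failed, log)
```

In the failing run the log still ends with a delete, which is the behavior you want from any automated job of this kind.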

The problem is VMs with a lot of attached storage (we are not using raw disks). While it's easy to clone / snap-shot a Windows 2008 server, it starts to get inefficient once you exceed 100 GB or so, and impossible when it gets much larger. Maybe this is something I'm missing with VMware, but as I understand it, a guest VM and all the attached storage disks from a SAN are treated as one logical entity by vCenter. So, if you have a base Windows file server (or Exchange server) with several terabytes of attached storage, you can only snap-shot / clone the entire guest *AND* attached data stores.

In the bare metal days I would use any number of tools to clone just the OS drive (or logical disk array) on a periodic basis and ignore the other logical or physical data volumes. If a round of service packs 'bricked' a server, who cares - just restore the logical disk or partition and off you go. Even so, I found I was among a minority of admins who did this, having grasped the concept that server OSes could be treated the same way as desktop OSes. So, I'm a bit perplexed as to the best way to handle this in our current environment without having to resort to expensive SAN-level 3rd party software.

11 Replies
Texiwill
Leadership

Hello and welcome to the Forums,

There are several tools out there to aid in backups. The key is to determine what you really want to back up. With virtualization, making a full and then incremental backups of a VM is quite easy. Many of the tools are designed to limit bandwidth by using changed block tracking (for incrementals, backing up only those things that changed), source-side de-duplication, active block tracking (looking at the filesystem to see which blocks should be zeroed), etc.
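
To make the incremental idea concrete, here is a minimal Python sketch of block-level change detection. It is a toy under stated assumptions: it finds changed blocks by hashing two copies of a disk image held in memory, whereas real backup tools typically get the list of changed blocks from the hypervisor instead of comparing data. The tiny block size and the function names are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the image."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Map block index -> new block contents for every block that differs."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    changed = {}
    for index, digest in enumerate(new_hashes):
        # A block is "changed" if it is new or its digest differs from before.
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changed[index] = new[start:start + block_size]
    return changed


last_full = b"AAAABBBBCCCCDDDD"   # image as of the last full backup
current = b"AAAAXXXXCCCCDDDD"     # image now: only the second block changed
incremental = changed_blocks(last_full, current)
print(incremental)  # {1: b'XXXX'}
```

Only one of the four blocks has to travel over the wire, which is where the bandwidth saving comes from.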

For someone just starting out, many of the tools work just fine: VMware VDR, Veeam Backup, PhD Virtual, and Quest vRanger. If you already have an investment in Symantec, their products also integrate into the virtual environment.

One other thing, always look for a tool that integrates into the virtual environment.

Best regards,

Edward L. Haletky

Communities Moderator, VMware vExpert,

Author: VMware vSphere and Virtual Infrastructure Security, VMware ESX and ESXi in the Enterprise 2nd Edition

Podcast: The Virtualization Security Podcast Resources: The Virtualization Bookshelf

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

wseaton
Contributor

Thank you for the response.

However, I've already looked at many of those, and maybe my eyes are missing the fine points, but I don't see them offering explicitly what I want. I am not looking to back up an entire VM. I am not looking to back up a guest OS with the assumption the guest OS is live and will be cooperating. I need a tool that's granular enough to restore a guest OS without worrying about attached storage or caring whether the guest OS is functional.

If I may vent for a second: the majority of data centers I've worked in (several dozen over the past decade) do not have explicit OS back-ups, and the REAL recovery plan is to pull out a Windows recovery CD or blindly rely on a cluster failover. This is the second data center I've been in, in less than six months, where the resident VMware engineers *do not* have functioning guest back-ups once the snap-shot process fails, which it does once the guest VM and attached storage grow large. So, they assume Windows guest OSes will never break, service pack downloads will never throw the guest OS into an infinite upgrade loop, registries will never get corrupted, etc. Vent = off 🙂

sketchy00
Hot Shot

Your response to Ed's reply leaves me even more confused than your original question. I've had to protect countless systems over the years, and from what I'm hearing you say, I think it might be best to take a step back and ask yourself what you are really trying to do. In short, the collective objective of all administrators is to protect and preserve the integrity of:

1.  The data  (structured, unstructured, etc.)

2.  The system that serves up the data.

I also think you might be making some unfair assumptions about snapshots. Most of the 3rd party apps mentioned, along with other SAN-based solutions like EqualLogic, leverage vCenter's API to make a hypervisor-consistent snap, but that snap is then independent of the VM itself, so you don't have any journaling going on, or old snapshots lying around, which is the reason for growth.

Remember, it's about "protection" not just backing up files on a guest OS.  Confusion might occur when you apply the older paradigm of protection of systems in the physical realm.  The desired result is the same, but the capabilities of the physical realm never allowed for such a thing.  My position is that file level backups of OS's never worked in the first place, which is why one ended up with bare metal disk imaging solutions.  This seemed like magic until virtualization came along.

I'm actually quite a stickler about #2. Anybody trying to rebuild a SharePoint Server, Exchange Server, CRM Server, etc. just from database backups knows what I'm talking about here. ...but if you can protect the systems that serve up the data in their totality (hypervisor-consistent mechanisms to protect the system), you are covered. My testing and real recovery scenarios have proven that to be the case.

wseaton
Contributor

>>I also think you might be making some unfair assumptions about snapshots.

How, may I ask, am I making an unfair assumption about snapshots when they aren't working now? Why should I engage yet another 3rd party product and spend more money when such a product is going to rely on the snapshot process, which I've made clear isn't working on the larger VMs?

From a bare metal perspective it was easy to perform in-state imaging and back-ups of the bare metal OS that could be restored in minutes. From what I'm hearing this is obsolete with VMware, but I've yet to get a concise answer as to what the replacement process is, other than references to buy other back-up products that in turn rely on the snap-shot process, which is broken on our larger VMs.

We spent $150,000 for our current back-up system, which works exceedingly well for file level restore via guest agents. It works very well for smaller VMs. It does not work with terabyte-sized VMs because the VMware snapshot process fails. Which means I need to resort to SAN-level products and justify another huge expense when a $300 copy of Paragon performed the task perfectly on bare metal.

See my frustration?

sergeadam
Enthusiast

I see your frustration, and while my environment is likely smaller, I too deal in multi TB data stores.

I'm in the throes of virtualizing my environment. In the physical world, I have to rely on old full backups because it takes too long to take a full backup of a 10TB volume.

First you have to set goals. What are you trying to achieve from a backup solution?

My marching orders are usually to be up and running as fast as possible while losing as little data as possible.

The traditional method still works and is still slow.

I've looked at virtualization-aware solutions and am leaning towards Veeam. As a test, a complete backup of a moderately sized VM was 4 times faster than with Backup Exec. And the block level incrementals are ridiculously fast. Combined with the ability to incorporate the incrementals into the full, dropping the need for regular fulls, that makes it a good, fast solution.
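
For anyone wondering what incorporating the incrementals into the full means in practice, here is a toy Python sketch. It is not how Veeam or any other product implements it; it just assumes, for illustration, that each incremental is a mapping of block index to new block contents and that the full backup is a flat byte string.

```python
def apply_incremental(full, changed, block_size):
    """Return a new full image with the changed blocks written over the old ones."""
    blocks = [full[i:i + block_size] for i in range(0, len(full), block_size)]
    for index in sorted(changed):
        # Grow the image with zeroed blocks if the disk was extended.
        while index >= len(blocks):
            blocks.append(b"\x00" * block_size)
        blocks[index] = changed[index]
    return b"".join(blocks)


def roll_forward(full, incrementals, block_size):
    """Apply incrementals oldest-first so later changes win."""
    for changed in incrementals:
        full = apply_incremental(full, changed, block_size)
    return full


sunday_full = b"AAAABBBBCCCCDDDD"
monday = {1: b"XXXX"}                  # block 1 changed on Monday
tuesday = {1: b"ZZZZ", 3: b"YYYY"}     # blocks 1 and 3 changed on Tuesday

merged = roll_forward(sunday_full, [monday, tuesday], block_size=4)
print(merged)  # b'AAAAZZZZCCCCYYYY'
```

The merged result is what a fresh full taken on Tuesday would contain, without having read the whole disk again.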

It meets the need of being able to recover at the VM or file level. Depending on the license level and your infrastructure, you may also be able to power up your VM from the backup file.

At some point, you may need to realize that your old backup solution may no longer be viable. You've been through a change in technology and the old tools may not work anymore.

wseaton
Contributor

Thanks for the insight Sergeadam. We're on the same wavelength.

>>First you have to set goals. What are you trying to achieve from a backup solution

Well first, given all the money we dumped into virtualization I'd like the same OS level redundancy and restore capabilities I had with bare iron twelve years ago. It's that simple. Our operating systems and applications conduct our business, not VM kernels. With bare iron in 1998 I'd use Ghost to back up critical servers to a spare local drive, and use file level back-ups to cover anything else. Even if my entire OS partition or physical drive took a bullet it was a simple task to restore the OS in its entire state in less than 20 minutes. Bang - done - server back up. Meanwhile most of my contemporaries would run around looking for their Windows restore disks and shut down production for a few days. After all...they had the MCSE 🙂

Fast forward a few years and I started using tools like Paragon, which could do in-state clones to a spare drive or network, often without dropping the server at all. If the system OS took a bullet for whatever reason I could restore it quickly and reliably. When you spend 500 hours getting a Citrix cluster to work flawlessly you don't use ArcServe to back up your OS. You learn to clone the OS and stop listening to the nitwits telling you servers are 'special' and can't be cloned.

Fast forward to now, and I'm running into data centers with data stores slung all over the place with no real schema in mind and pricey back-up mechanisms not working. If I Google problems with snap-shots I could spend a month reading them all, with many VM specialists telling me it's a clunky and primitive process and others telling me it's a panacea, so I'm starting to think there's a distortion field somewhere. My current back-up solution is five months old, was put in place by a VMware consulting company, and now I'm told I have to resort to pricey SAN-level back-up methods which eat up any savings I gained by consolidating servers in the first place? This does not compute. This is a school district where budgets are tight.

So, the question remains and has yet to be answered. Are vCenter and its stock tool set capable of cloning / backing up / mirroring (whatever you want to call it) a guest OS, and granular enough to not have to deal with attached datastores blowing up the snap-shot process?

sergeadam
Enthusiast

What is your current backup solution?

wseaton
Contributor

HP Data Protector and hardware.

Works great for file level back-ups. Works great for snap-shots and integrates nicely with VCenter. Turns into a paperweight when snap-shots don't work.

In retrospect we should have put the investment into a robust SAN-level back-up solution, but I wasn't here then.

sergeadam
Enthusiast

My first thought would be to get up HP's ass and get them to remedy the snapshot issue, or get them to admit their solution does not work in your environment and go back to the consultant for remedy.

wseaton
Contributor

Exactly. As a product it's not HP's fault because it's VMware's API that fails making a snapshot. From a file layer perspective it's a solid product.

The other part of your comment.....yep, pursuing it.

Texiwill
Leadership

Hello,

I have used many of these tools, and if there is a failure to create a VMware snapshot (not any other form of snapshot), it is usually because the tool failed to understand a response, a snapshot may already exist, there is not enough space within your environment to create the snapshot, or you are using a non-virtual Raw Device Mapping (RDM). Non-virtual RDMs require traditional backup/restore approaches.

If it were me, instead of looking at a new tool that may have the same issue, I would look at the reason for the snapshot creation failure. If you can find the root cause, you can solve the problem and it will stay solved going forward. If you are out of disk space on a given VMFS, you should probably specify another LUN to store the snapshot (this is also possible). If you are using a non-virtual RDM, convert it to virtual (which may require a reboot of the VM). If it is any other reason, then it will take some debugging. Some older tools may also have issues if other snapshots exist.
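
The checks above can be written down as a simple checklist. The Python below is only a sketch: the `VmFacts` record and its fields are hypothetical and would have to be filled in by hand or from whatever inventory you have; it does not call vCenter or any VMware API.

```python
from dataclasses import dataclass, field


@dataclass
class VmFacts:
    """Hand-collected facts about one VM (hypothetical structure)."""
    name: str
    existing_snapshots: int = 0
    datastore_free_gb: float = 0.0
    expected_snapshot_growth_gb: float = 0.0
    rdm_modes: list = field(default_factory=list)  # "virtual" or "non-virtual" per RDM


def likely_snapshot_blockers(vm):
    """Return the usual suspects for a failed snapshot, in the order to check them."""
    findings = []
    if vm.existing_snapshots > 0:
        findings.append("a snapshot already exists; some older tools choke on this")
    if vm.datastore_free_gb < vm.expected_snapshot_growth_gb:
        findings.append("not enough free space on the VMFS; store the snapshot on another LUN")
    if "non-virtual" in vm.rdm_modes:
        findings.append("non-virtual RDM attached; convert it to virtual or back it up traditionally")
    if not findings:
        findings.append("no usual cause found; needs further debugging")
    return findings


file_server = VmFacts(
    name="fileserver01",
    existing_snapshots=1,
    datastore_free_gb=20.0,
    expected_snapshot_growth_gb=150.0,
    rdm_modes=["virtual"],
)
for finding in likely_snapshot_blockers(file_server):
    print(f"{file_server.name}: {finding}")
```

Running each large VM through a list like this before shopping for a new product should at least tell you which of the causes you are actually hitting.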

However, if you would rather not do that work, there are several new tools on the market that perform replication without the need for snapshots. Those would be ZeRTO and the soon-to-be-released VMware Replication. Granted, they only work for running VMs, but that may be sufficient for your needs. These tools tie into the vSCSI layer and as such replicate data as it goes towards the backend array.

Best regards,

Edward L. Haletky
