I love the fact that VMware is providing VDR as part of the vSphere package. It's definitely a step in the right direction, although I'm still inclined to think this software hasn't been put through the wringer in terms of proper QA. I'm just putting out a feeler to see how many others have experienced some of the same issues I'm having.
To start, I'm backing up my VMs via a network share on a standalone Windows 2003 server that has a NAS attached to it.
Some of the issues I've noticed:
1) Backups take an inordinate amount of time. I can understand the first backup, but my VMs don't change very much from day to day. Most of the data being manipulated is located on RDMs, and these are backed up using Tivoli, not VDR (I use VDR solely for the OS partitions). Each partition is approximately 25GB and there are 15 VMs, yet my backup window (10pm - 6pm) isn't sufficient to complete the process.
2) Integrity checks for the backups are taking a crazy amount of time and will usually stop due to my window being closed (see point #1).
3) I'm getting inconsistent "failures" for certain VMs (the report will simply state that a VM failed to back up, not much else). It also varies per night and it's not always the same VMs (not exactly sure if this is related to #1, where the window is closing while VDR is executing).
4) I had the most difficult time setting up the remote share from the VDR appliance in vSphere. The username and password would never be accepted (even though the same share with the same user/pass worked fine from a Windows machine). I finally narrowed down the problem to the simple fact that the VDR appliance can't handle passwords that have special characters in them (this password had an "@" and a ","). The console would spit out a CIFS error -22 while attempting to mount the share. Changing the password to include only numbers and letters was sufficient to work around this issue.
5) Snapshots are not being created for no apparent reason, which fails the VDR process. I'm fully able to do a manual snapshot with or without the memory state, so I'm not sure why VDR can't do it. This issue is very intermittent. I had it often when I first set up VDR, but now it only happens every so often (without any type of consistency).
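As a rough sanity check on point 1, the sustained transfer rate needed to move a complete copy of every OS partition inside the window can be worked out directly. This is only a back-of-the-envelope sketch: the 15 VMs and ~25GB figures come from the post, while the 8-hour window used in the example is an assumption (adjust it to whatever the real window is), and the function name is made up for illustration.

```python
def required_throughput_mb_s(vm_count: int, gb_per_vm: float, window_hours: float) -> float:
    """Sustained MB/s needed to copy every VM in full within the window."""
    total_mb = vm_count * gb_per_vm * 1024   # GB -> MB (binary units)
    window_seconds = window_hours * 3600
    return total_mb / window_seconds


# 15 VMs at ~25 GB each (from the post); the 8-hour window is an assumption.
rate = required_throughput_mb_s(vm_count=15, gb_per_vm=25, window_hours=8)
print(f"{rate:.1f} MB/s")  # prints "13.3 MB/s"
```

If even full copies would need only a modest rate like this, incrementals overrunning the window suggests the time is going somewhere other than raw data movement, though this arithmetic alone can't say where.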
I think that's all I can think of for now.
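For point 4, a quick pre-check of the share password can save some trial and error. The sketch below simply flags anything that isn't an ASCII letter or digit. The only characters actually reported to break the mount are "@" and ","; treating every other special character as suspect is a cautious assumption, not documented VDR behaviour, and the function names are made up for illustration.

```python
def is_alphanumeric_only(password: str) -> bool:
    """True if the password is non-empty and holds only ASCII letters and digits."""
    return password.isascii() and password.isalnum()


def offending_characters(password: str) -> list:
    """Sorted list of distinct characters that are not ASCII letters or digits."""
    return sorted({ch for ch in password if not (ch.isascii() and ch.isalnum())})


# Hypothetical password containing the two characters mentioned in the post.
print(is_alphanumeric_only("p@ss,word"))   # False
print(offending_characters("p@ss,word"))   # [',', '@']
```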
I'm experiencing most of the problems mentioned above, and also problems with Backup Exec.
Backup Exec fails every time VDR is running, backups are too slow, my backup store gets corrupted regularly, and one VM can never be backed up. The logging is very poor, with little indication of what is going on.
Here's my take on the latest VDR.
I must say I've actually had excellent success with VDR since the 1.1 release. All snapshots get created and deleted with no problems, integrity checks and reclaim tasks are running fine as well, and we rarely restart it. We've been running it since it came out and it has worked perfectly apart from one issue, which VMware sorted out.
The problem was that one particular VM would always freeze at the same point when backing up. VMware suggested I delete the changes tracking file in the VM's folder. I moved it, and as soon as I ran the backup it was recreated, and it has worked fine since then.
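To see which files in a VM's folder might be the tracking file in question, a small listing helper is enough. This is a sketch under the assumption that the tracking files follow the `<name>-ctk.vmdk` naming pattern; it only lists candidates and deliberately does not move or delete anything, leaving that step to whatever procedure support advises. The function names and the sample file names are made up for illustration.

```python
import os


def ctk_candidates(filenames):
    """Return the names that look like change tracking files (assumed *-ctk.vmdk)."""
    return sorted(n for n in filenames if n.lower().endswith("-ctk.vmdk"))


def list_ctk_files(vm_folder):
    """List candidate tracking files in one VM folder on disk."""
    return ctk_candidates(os.listdir(vm_folder))


# Hypothetical folder contents for a VM called "web01".
print(ctk_candidates(["web01.vmx", "web01.vmdk", "web01-ctk.vmdk", "web01-flat.vmdk"]))
# ['web01-ctk.vmdk']
```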
But I must also say we are running separate tasks for each VM. Critical ones are run in off-peak time and non-critical VMs are run during the day to ease the load.
That seems to have helped A LOT. We have about 20 VMs being backed up every day without issue.
Oh and we are also using a windows share as a destination.
EDIT: should also put my two bits in and say we really need some form of notifications!
So my recommendations:
Separate tasks for each VM, staggered so that no more than 3 tasks run at the same time.
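One way to plan that staggering by hand is to split the VM list into waves of at most three and give each wave its own start slot. The sketch below only produces the grouping; it does not talk to VDR at all, and the VM names and the function name are made up for illustration.

```python
def plan_waves(vm_names, max_concurrent=3):
    """Split the VM list into consecutive groups of at most max_concurrent."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [
        vm_names[i:i + max_concurrent]
        for i in range(0, len(vm_names), max_concurrent)
    ]


# About 20 VMs, as in the post; the names are placeholders.
vms = [f"vm{i:02d}" for i in range(1, 21)]
for number, wave in enumerate(plan_waves(vms), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With 20 VMs this gives seven waves, the last holding two VMs. How far apart to schedule the waves depends on how long each backup really takes, which only the job history can tell you.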
VDR should do that for us, don't you think? It tries to run 6 +/- backups at a time; it's no wonder it is slow.
I run a script that snaps then clones the VM. It takes ~5 hours to clone 31 VMs.
The really interesting thing to note, is that the script is using 100% VM commands to perform this task!
As I and others have commented, VDR isn't "prime time ready". I am anxious for when they get the 1.x bugs worked out, but for now I'll continue to use what is consistently reliable for us.
Chief Information Officer
Lexington Memorial Hospital
Lexington, NC 27293
Has anyone opened a Support Request on vDR? If not, please do so, as it needs to be documented and forwarded to the product support team for review. Thanks.
VMware Communities User Moderator
I have, on Friday. Provided ~1gb worth of logs. I'm having snapshot issues, problems with incrementals vs. fulls, and speed issues.
1) If backup jobs don't run but VDR recognizes that they are out of compliance, one of the following could be the issue
a. The dedupe store is locked - this could be because there are damaged restore points. Running a manual integrity check should tell you whether this is the case (and if it fails, it will also report which restore points are damaged and need to be deleted).
b. VDR is running another operation - unlikely but worth checking. For example, if the integrity check or reclaim operation is not complete when the backup window opens, or if there are not enough resources to initiate a snapshot, then backups will not work. Look under Reports/Running tasks.
c. I would try to recreate the backup job (identical VMs, source and window) and see if it works more reliably.
2) If you have to do this, my theory is that the VDR appliance is crashing for some reason, which is why you cannot connect until you restart the appliance. The VDR logs should provide some more information about why this is happening.
3) If the destination disk is not mounted or a reclaim operation is running, that would explain why the restore points are not being displayed.
I would recommend that you open up an SR and have the VMware GSS folks dig into the problem with a bit more detail.
I've been having pretty much every issue mentioned in several of the VDR threads....
Sounds like 2.0 might be better, but I'm not sure when that will be out yet.
Anyway, kdc, you mention deleting the "Changes tracking file" in the VM folder and your problem of one particular VM backup hanging went away. I have the same issue with just one of my VMs right now. Would that be the 'vm_name'-ctk file??
Going to look into esXpress 3.6 possibly, or maybe Acronis. Anyone have an opinion on either of those solutions?
I'll try and reply to a few posts:
I have opened support issues with VMware regarding my VDR 1.10. I do follow best practices - 500GB VMDK on 15k drives/fiber attached SAN - but I've also used a CIFS share on Windows Server, NFS on SAN and NFS on a Quantum DX6520 and had similar results.
My snapshots fail occasionally (for whatever reason - I like the variety, so to speak) and a VM doesn't get backed up. Meh. Happens with tape, too, right? The real problem starts when VDR cannot delete snapshots (again - for whatever reason) but keeps on making them! I go on vacation and come back and have a VM running with a bunch of snapshots. Ugh. My rear end puckers as I run the fixdeletetable.py script and commit (let's all just refuse to use the word "Delete", okay?) 10 snapshots on a bunch of VMs (don't forget to make one first and make sure one or more of the VMs' VMDKs isn't still attached to VDR!).
So I can usually remedy each situation, thanks to the lessons I've learned from VMware tech support, but issues keep popping up. It's like "Whack-a-Mole". I know how to swing the hammer now, but I'd prefer not to do it so often.
As to Acronis, my reseller sold us Acronis Backup and Recovery 10 Advanced Server Virtual Edition licenses with my first round of vSphere licenses. The problem is that the quote was put together with ESX 3.5 in mind, but delivered with vSphere 4 (using ESXi 4.0 hosts). I couldn't get it to install and actually work, and since vConverter worked so well and VDR worked well enough, I didn't stay on top of it. I would prompt my vendor to get on it, and they tried to get it working in their lab and could not. They said they worked with Acronis and I was told (by the reseller) "Acronis says it won't work with this version - they know it and they're working on it". This is...September 2009, maybe.
More time goes by...
Prompted the reseller recently - heard back "Acronis says it should be working now - we'll get on it". The vendor then says it still isn't working. I post on forums trying to find anyone who has actually used this version of Acronis with vSphere/ESXi 4.0 and find no one. I also find no one who will say it definitely will or won't work. I do get negative responses about Acronis. I never receive anything in writing from Acronis taking a side, and the support I receive directly is generally worthless.
So currently, I have some fairly expensive Acronis licenses (with support, ironically) sitting around doing nothing, and I will most likely work with the reseller to recoup the costs in some fashion.
If VDR was just a tiny bit...okay, a good bit more reliable and had a couple more features (notification, notification, and notification), I'd be happy. Because when it works, it works well enough (as I restore a kluged VM quickly while typing this). I mean - tape (and I've been through almost every iteration) kind of blows, too. And I'm close to rectifying the tape thing. But I need VDR to step up a little more...
I've just deployed VDR 1.1 on an ESXi 4.0 U1 box, and it's causing me some grief, more than I'd expect for a "backup appliance", which Should Just Work on a vanilla setup. It's not our hardware (it's brand-new and ultra fast), I'm not using CIFS or NFS (I gave the VDR VM a 256GB VMFS disk), or anything obvious like that.
The first time I deployed it, I had to change the network settings (we renumbered an IP subnet). I could never connect to the VDR appliance ever again, even after reconfiguring its IP settings using the console, rebooting multiple times, etc... Some setting had 'tattooed', and couldn't be changed. I had to redeploy the OVF file!
I also noticed that the OVF file enforces "DHCP" as the network configuration, so there are issues deploying VDR to a network that doesn't have DHCP, which is true of most server networks.
After a lot of effort getting it going, it seemed to work for a day or two, but it's confusing the identity of virtual machines. The backup metadata of at least 3 different VMs is appearing under the tree node of an unrelated VM. If I try to restore one of the problem VMs, it tries to restore it using the name of a different machine. That's just insane!
The tree view used to select VMs is the "hosts and clusters" view, even though it should really be using the logical "VMs and Templates" view, which is where most people organise their VMs into folders.
If a slightly corrupt backup target is attached to a VDR appliance, it locks it up shortly after boot, forcing a power-off. This then often results in other backup targets becoming corrupt. I'm lucky this is a lab environment here, because I managed to destroy the data on several backup targets before I gave up and started with blank disks!
There doesn't appear to be any way to fix any of these issues. If something is corrupt or misconfigured, that's it, there's no workaround, documentation, cleanup, or any other fix available.
I still haven't figured out why it's mixing up VMs. Even redeploying it from scratch and starting over didn't fix that, so now I'm stumped.
In case anyone has the same issue, I've figured out the cause of VDR mixing up VMs in the backups.
It uses the "bios.uuid" field as the machine ID. If there are duplicate IDs, it becomes confused.
Someone here had built VMs by cut & pasting between VMX files, which was silly...
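For anyone wanting to check their own VMs for this, duplicates can be found by pulling the BIOS UUID line out of each VMX file and grouping by value. The sketch below works on VMX text already read into memory. It assumes the VMX key is spelled `uuid.bios` and that spacing and hyphens inside the value are not significant; both are assumptions to verify against your own files, and the file names and UUID values in the example are made up.

```python
import re
from collections import defaultdict

# Assumed VMX line format: uuid.bios = "56 4d ... - ..."
UUID_LINE = re.compile(r'^\s*uuid\.bios\s*=\s*"([^"]*)"', re.MULTILINE | re.IGNORECASE)


def bios_uuid(vmx_text):
    """Return the normalised BIOS UUID from VMX text, or None if absent."""
    match = UUID_LINE.search(vmx_text)
    if not match:
        return None
    # Drop spaces and hyphens so formatting differences don't hide duplicates.
    return re.sub(r"[\s-]", "", match.group(1)).lower()


def find_duplicate_uuids(vmx_by_name):
    """Map each UUID used by more than one VMX file to the sorted file names."""
    seen = defaultdict(list)
    for name, text in vmx_by_name.items():
        uuid = bios_uuid(text)
        if uuid:
            seen[uuid].append(name)
    return {u: sorted(names) for u, names in seen.items() if len(names) > 1}


# Hypothetical VMX contents; b.vmx was copy-pasted from a.vmx.
sample = {
    "a.vmx": 'displayName = "a"\nuuid.bios = "56 4d aa bb-cc dd"\n',
    "b.vmx": 'displayName = "b"\nuuid.bios = "56 4d aa bb-cc dd"\n',
    "c.vmx": 'displayName = "c"\nuuid.bios = "56 4d 00 11-22 33"\n',
}
print(find_duplicate_uuids(sample))  # {'564daabbccdd': ['a.vmx', 'b.vmx']}
```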
My advice - DO NOT LET ANYONE MODIFY THE VMX files, unless you KNOW what you are doing. That is taking a VERY UNNECESSARY and FOOLISH risk...guess you already figured that out, huh? Good find, though!
As for VDR...I have given up on it. Maybe the 2.0 release will actually fix the problems! The numerical sequence of the VDR release should have NOTHING to do with the quality of the VDR application. I'm hoping they will fix it with a maintenance patch...keep trying VMware - you'll eventually get it right.
Well, I decided to abandon VDR and I'm using the very nice ghettoVCBg2 perl script (http://communities.vmware.com/docs/DOC-9843) to back up my VMs. It isn't perfect, but it is reliable!
After using VDR 1.1 for about 2 months I found it very fast and useful, but we have been facing the following error for the last two days:
Trouble writing to destination volume, error -107 ( out of application memory)
Should I increase the RAM for VDR? Any idea what this is about?
Nothing has changed in the VMI!!
I suggest you wait for the next release of VDR. VDR v1.1 (and earlier versions) are well-known for being unstable in a variety of configurations; a quick read in the community forums supports my statement.
In my own personal experience, VDR v1.1 (and prior) is neither stable nor adequately reliable for production guests. There are other methods of backup protection, including licensed products and "free" script-based utilities that can help you protect your VM guests.
VDR has "great promise" for managing VM guest backups and for de-duplication. I know (from interactions with support and development) that VMware is working diligently on the next release and all of us are expecting it to be a stable release! VDR will be a great backup/recovery utility - after it has "baked-in" for a while. I don't recommend you use v1.1 or prior for protection of production systems!
I know this isn't what you want to hear - but this "is-the-way-it-is" for now!
Has anyone involved in this thread tried out VDR 1.2??
Results, opinions, rants, anyone??
I can readily recommend esXpress 3.6.x/4.x, having used it for almost 3 years. Support is very good.
Thank you, Tom
Results, opinions, rants, anyone??
With VDR 1.2, performance is improved, they fixed some nagging bugs, allowed the plug-in to switch appliances, and made a couple of additions.
It's definitely a work in progress, but it is getting better. It still crashes, and it could use a LOT more usable features (like email, external logging, the ability to set the number of jobs in progress, and the ability to stop all the jobs at once), but I will say, the previous version is a -10.
This version I give a 1. There is so much room for improvement.
I agree about esXpress (now PHD Virtual) and Vizioncore vRanger et al., but for a FREE add-on, VDR 1.2 is at LEAST tolerable.
I've been using it for 4-6 months, since it was in private beta. It has run very well. We have 40+ guests and we are backing up to iSCSI. It still isn't our primary backup - we are using scripts to snap and clone the guests (which works PERFECTLY - EVERY DAY). VDR v1.2 isn't exactly perfect, but it is DEFINITELY an improvement over prior releases with consistent reliability.
I agree with a prior message about the missing functionality...that is VERY true. I have heard from a reliable "VMware internal source" that v1.2 was focused on fixing the reliability issues and not on adding new functionality. They seem to have (finally) hit the target this time. I suggest that you carefully consider using VDR as your primary image/data backup solution. It seems to be a stable release.
The same can be said for my experience too. VDR 1.2 seems to be stable; I have been running it for 1.5 weeks. However, I think the retention policy is not being upheld. My policy is: Number of backups to retain = 7. Older backups to retain is: 0: Weekly, 0: Monthly, 0: Quarterly, and 0: Yearly. Thus, I assumed that 7 days after I created the policy, the reclaim process would start deleting the oldest restore point. However, I still have more than 10 restore points. Based on the above policy, I would expect to have only 7 restore points. Or maybe I don't understand the retention policy itself clearly.
Can someone explain?
Whether it's FREE is a matter of opinion too.
We had to purchase vRanger Pro for one of our clients, because they bought into the "VMware Sales Pitch". Bloody good job we did: the HP SAN failed, losing all 36 VMs, and vDR hadn't been working correctly for months. vRanger Pro was only installed 4 days before the HP SAN failure, and it saved the day!
We are currently testing 1.2, where are these features you mention:-
like email, external logging, and ability to set the number of jobs in progress - maybe we are being stupid, but it looks the same to us and we certainly cannot find these items.
So far, things are looking good, but so were 1.0 and 1.1 for two months, until the deduplication store got three months' worth of data in it, and then it would crash, not index, not catalog, and backups would run for two days and then stop.
Performance is far better with 1.2, and here's something to look forward to, and then the client will be totally happy we have a VMware on VMware solution!