VMware Cloud Community
kgouldsk
Contributor

All vRanger/ESXRanger users - problem awareness, opinion and experience requested

Vizioncore vRanger users,

I would like to take a poll to find out whether you are aware that your vRanger backups may be taking significantly longer than necessary, because the scheduler design does not maximize concurrency. This is a concern for:

- large environments where vRanger backup windows are long

- environments where there are problems with individual VM backups hanging, as it effectively halts any further backups

I've been promised this will be improved in the upcoming version 3.5. The current vRanger product uses rules to determine how many backups to start; then, as the individual backups complete, it waits until the final member of the batch "completes" before starting more. So large VM backups create long periods where only 1 backup is running when there should be 4-6 or more, and a hung job prevents any further backups from starting at all.
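
To illustrate the difference, here is a minimal Python sketch (not vRanger's actual code) comparing the batch behavior described above with a scheduler that refills a slot as soon as any backup finishes. The durations, the slot count, and the function names `batch_makespan` and `refill_makespan` are hypothetical, chosen only for illustration.

```python
import heapq


def batch_makespan(durations, slots):
    """Start `slots` jobs at a time and wait for the whole batch to finish
    before starting the next batch. Returns total elapsed time."""
    total = 0
    for i in range(0, len(durations), slots):
        # The batch lasts as long as its slowest member.
        total += max(durations[i:i + slots])
    return total


def refill_makespan(durations, slots):
    """Start the next job as soon as any slot becomes free.
    Returns total elapsed time."""
    free_at = [0] * slots  # min-heap of times at which each slot frees up
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    return max(free_at)


# Hypothetical backup durations in minutes (not taken from any real log).
durations = [90, 20, 20, 30, 30, 30, 25, 25, 25]
slots = 3

print("batch: ", batch_makespan(durations, slots))
print("refill:", refill_makespan(durations, slots))
```

With these made-up numbers the batch approach needs 145 minutes and the refill approach 105, and the gap widens whenever one large VM shares a batch with small ones.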

To some of us, this is clearly understood, and it's been a point of contention with me for a long time, despite being an enthusiastic customer of Vizioncore. I'm frustrated that Vizioncore hasn't prioritized improving this so that when things go wrong I still get the majority of my backups. As it stands, I have to aggressively manage my backup problems as an emergency rather than as an operational issue. The other effect on my environment is that, rather than hitting a reasonable backup window, the total completion time is significantly longer than it needs to be because I'm not maintaining the throughput I could.

I suspect many are unaware, as here is what the logs look like:

Backing up the VM: vhoappp05

Backing up the VM: vhowsv005

Backing up the VM: vhotstt01

Backing up the VM: ExchangeLab-vhoadcp02

Backing up the VM: vhoappp02

Backing up the VM: loki

No timestamps on start, no indication at all of completion, either of which would highlight the problem.

I'm interested in whether

- others have complained to Vizioncore about this

- you are even aware of it

One final thing - is anyone running Microsoft Operations Manager 2005 agents on your vRanger server? One thing I've noticed is that the most stable of my vRanger installs is not running MOM agents. If this is a commonality, I'd like to pursue it.

Thanks for any comments,

Kevin

23 Replies
RParker
Immortal

Well we run vRanger in a VM. The backups are faster than esXpress mostly, going across the same fabric, same target, and same VM's on the same ESX host.

It's not a big amount, but if esXpress takes 35 min to backup a 36G VM, then vRanger takes 25 min, in my tests.

So no I am not having a problem with vRanger being slow at all.

kgouldsk
Contributor

The individual performance is OK, but my total throughput is higher when multiple backups are running, particularly because at periods during a VM backup, traffic goes to 0, I suppose because there is buffering happening on the host. If we can keep as many running concurrently as possible, we will shorten the overall length. If you're talking about 10 VMs, it's not a big deal. If you're talking about several dozen, 15-45 minutes of lost opportunity makes a big difference.

It's perhaps useful to ask people to include how many VM's they're backing up on how many physical hosts.

RParker
Immortal

Hmm, that's fascinating. I will have to check that theory out.

kharbin
Commander

Not to change the subject, just to address a comment "but if esXpress takes 35 min to backup a 36G VM, then vRanger takes 25 min, in my tests"

RParker, I believe you are using our entry level esXpress LE product, which is throttled at 60GB/hour per backup thread, or about 1GB per minute. So 35 mins to back up 36GB is exactly where it should be. Our Professional and Enterprise products run unthrottled, with speeds as high as 565GB/hour, per ESX host.
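
As a quick check of that arithmetic, here is a small Python sketch. The function names are illustrative, a constant transfer rate is assumed, and the only inputs are figures quoted in this thread.

```python
def backup_minutes(size_gb, rate_gb_per_hour):
    """Minutes needed to move size_gb at a constant rate_gb_per_hour."""
    return size_gb * 60 / rate_gb_per_hour


def effective_rate_gb_per_hour(size_gb, minutes):
    """Average rate implied by moving size_gb in the given number of minutes."""
    return size_gb * 60 / minutes


# 36 GB at the 60 GB/hour throttle.
print(backup_minutes(36, 60))
# Rate implied by the 36 GB in 25 minutes figure quoted above.
print(effective_rate_gb_per_hour(36, 25))
```

This gives 36 minutes at the throttled rate, and the 25-minute result quoted above works out to roughly 86 GB/hour.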

Ken Harbin

www.esXpress.com

RParker
Immortal

Yes, that's true. However, I did try this test with the esXpress demo, which is unthrottled, and the speed was the same. Also, for our environment, speed wasn't the problem. I am happy with esXpress and don't have an issue with it, even if speed was the only difference.

I prefer esXpress, but I was trumped by the need for centralized backup. I opted for esXpress across our servers, but management wants this to be as simple and easy for them, so this isn't a slam on esXpress, it's more of an observation.

I was pointing out that his problem lies elsewhere, and vRanger is definitely not slower, but I just used esXpress as a reference point on our servers.

Rumple
Virtuoso

Just out of curiosity, how are you backing up? vRanger on a host using the SC network, or are you using VCB to back up?

esXpress in a well designed vmfs - vmfs backup just screams along as it pretty much goes as fast as I can write to disk (once you tweak the vmhelpers for performance)

vRanger and esXpress across the LAN to an ftp host I find pretty comparable in speeds and have not seen any issues with performance.

Scheduler design with Ranger is pretty simple...set how many hosts to run backups against and how many backups to run against the LUN's.

I haven't had any VM backups hang and knock out the entire backup schedule yet (except that one time when my ESX host went read-only on the console with an active backup running). That caused vRanger to think the backups were still running, but on the host side it was a mess (couldn't even console in anymore even though the VMs stayed running). Really made a mess for me.

kgouldsk
Contributor

Guys, please.....focus. I would request that no further discussion include esXpress. It not only detracts from my objective of collecting information and experiences, but it just devolves into a giant product debate which is not the purpose of this thread.

I have not implemented vcb, though will do so at some point. I do not have SAN storage everywhere, so vcb isn't an option in most of my locations, so my focus is on maximizing stability in the standalone/network backup scenario.

Thanks again for responses.

Kevin

Rumple
Virtuoso

On your ESX hosts, have you increased the default memory allocation or left it at 272MB?

How do you have the concurrent backup settings configured (default of something like 3 hosts/2 LUNs per host)? I forget exactly what the default is.

How many VMs exist on each host (roughly)? If the service console gets too busy, that can cause issues with performance across the board (as well as cause the backups to fail/stall). You could increase the service console CPU reservation as well.

Overall though, I am backing up 36 vm's right now across 3 hosts and it takes a while across the network, but it still just chugs along without any huge gaps in backup times. We've increased SC memory to 544 and we have a 1600MB swap file for it.

kgouldsk
Contributor

I've been through all the standard stuff with Vizioncore, and our hangs are happening with essentially the same frequency on our very busy hosts vs. our laid back sleepy hosts. Our console memory is bumped to 800, CPU reservation to 1500 MHz, all the firewall ports confirmed etc. We back up 3/host 2/LUN.

I'm not expecting much in the way of solving the problem. Vizioncore will get around to that, I'm sure, as they've identified some memory issues in the product that should be remedied in the next release. I just wanted input on whether anyone else has been complaining about the scheduler. I'm quite irritated because, given they've had these hang issues from time to time in various releases, letting other backups continue would have been an easy way to minimize the impact of a hang. Note that if all of your VMs are approximately the same size, you really wouldn't notice much time when you don't have all concurrent slots filled, as they'll finish at approximately the same time. If, however, you are backing up a 150 GB VM and it happens to get scheduled with 2 x 15 GB VMs, you'll have a very long period where it's only backing up one VM. How much this affects overall throughput will vary significantly with particular configurations; the most robust ones, capable of keeping all slots running hard, suffer the greatest impact to potential performance.
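
A rough worked version of that 150 GB plus 2 x 15 GB example in Python, assuming (hypothetically) that every backup in the batch runs at the same constant rate and that the batch fills 3 slots. `idle_fraction` is an illustrative name, not anything from vRanger.

```python
def idle_fraction(sizes_gb, rate_gb_per_min=1.0):
    """Fraction of available slot time left unused when no new backup may
    start until the largest member of the batch finishes.

    Assumes one slot per VM and the same constant rate for every backup
    (an illustrative simplification)."""
    durations = [size / rate_gb_per_min for size in sizes_gb]
    batch_time = max(durations)              # batch ends with the slowest job
    capacity = batch_time * len(durations)   # slot-minutes available
    busy = sum(durations)                    # slot-minutes actually used
    return (capacity - busy) / capacity


print(idle_fraction([150, 15, 15]))  # one large VM batched with two small ones
print(idle_fraction([50, 50, 50]))   # equal sizes: nothing is wasted
```

Under those assumptions, 60% of the slot time in the mixed batch goes unused, while a batch of equally sized VMs wastes none.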

Mr_Spain
Contributor

kgouldsk,

This reply is probably a little late, but:

Sounds like we are in the same boat; we too use vRanger and experience the same frustrations as you. Better scheduling is something we have also been asking for, as we have a variety of large and small VMs. We have also been suffering from random backup "hangs" where vRanger will stop processing once the backup has completed, preventing further scheduled backups from running. No amount of support calls, config changes or log checks has resulted in a fix.

We are currently using a pre release version 3.2.3 to see if this addressed the fault.

Also, we do use MOM 2005 agents on our VCB/vRanger hosts.

dadalowg
Contributor

Exactly the same issue here. vRanger 3.2.3.4 is running and backing up about 50 VMs across 10 hosts, and the backups will hang randomly. I am using VCB (LAN enabled on VCB failure) and sometimes (like today) the job will run in excess of 15 hours. Hope they fix this shortly.

I am experiencing the exact same issue with one vm hanging and causing the remaining backups to wait until the first batch is completed.

Kabir

Mr_Spain
Contributor

After many hours, weeks, and months of troubleshooting this with Vizioncore we ended up rebuilding our VCB/vRanger proxy server.

Built: (in this order)

  • Windows 2003 SP2,

  • Most recent critical patches

  • .Net2 Sp1, 3.0 and 3.5

  • vRanger 3.2.3.3

  • VCB plugin 2.0.2.2

  • VCB Framework 1.03 Update1

Since then, the backups run from this particular backup server have not failed. We have had a 100% success rate on a backup server that used to fail EVERY night. So I am not sure that our fault was actually related to VCB. Possibly a combination of .Net patch and SP2.....who knows....

An interesting point: when we tried to retrospectively update our other failing backup server (e.g. patch to SP2, add .Net etc.) without rebuilding the OS, this did not fix it.

Hope this helps.

Texiwill
Leadership

Hello,

Take a look at http://vmprofessional.com/index.php?content=esx3backups for a comparison between the backup tools for ESX.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

dadalowg
Contributor

Thanks Mr. Spain:

I am going to rebuild the Proxy server the way you have stated here and try it out. We used our general RIS build initially. I will update this topic with the results.

dada

dadalowg
Contributor

BOX built as per Mr. Spain's sequence. Used W2K3 R2 SP1 instead of W2K3 SP2.

- Backups run = 3 x (16 machines on daily backups) and 1 x (15 machines on weekly backups).

- Total VCB failures = 3

- Total Network backup hangs = 0 (up to now - fingers crossed).

Thanks

Dada

gdesmo
Enthusiast

I have been a Vizioncore/Ranger customer for three years now, using the product in non-VCB and VCB image modes. It has been painful. Every time they fix one bug it seems they introduce another. I have complained about their QA process, but it has not gotten any better. Maybe it is because VMware releases so many updates that they have to adjust.

I am hesitant to upgrade when they release a new version. At least I know what to expect with the version I am currently running, even though it is not working without error.

Cotay
Contributor

For LAN based backups, the newest build (3.2.3.4) has fixed nearly all of our problems and works beautifully. The original poster's issue of backups not starting until the batch was completed has been fixed for us. Our only outstanding issue is the differential engine breaking after a VMotion. This has been an issue with a DRS cluster; however, backing off the slider in the VC a couple of notches has helped considerably.

vRanger VCB backups are not working on VMs with the VSS agent installed. Supposedly this has worked its way back to development. No reply from Vizioncore.

As SAN vendors fine tune their VI3 agents and allow for SAN based Snapshots/Replicas that are application/file level consistent, companies like Vizioncore and ESXpress will really need to innovate and improve customer service to stay competitive.

stormlight
Enthusiast

What speeds are people getting with the VCB option on and off? With VCB on I can't get anything faster than 100 megabits per second (about 12 megabytes per second). This is on an iSCSI SAN.
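
For comparing that figure with the GB/hour numbers quoted earlier in the thread, here is a small Python conversion sketch. The function names are illustrative, and decimal units (1 GB = 1000 MB) are assumed.

```python
def mbit_to_mbyte_per_sec(mbit_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8


def mbyte_per_sec_to_gb_per_hour(mbyte_per_sec):
    """Convert megabytes per second to gigabytes per hour.
    Assumes decimal units: 1 GB = 1000 MB."""
    return mbyte_per_sec * 3600 / 1000


mb_per_sec = mbit_to_mbyte_per_sec(100)
print(mb_per_sec)                                # MB/s for 100 Mbit/s
print(mbyte_per_sec_to_gb_per_hour(mb_per_sec))  # same rate in GB/hour
```

So 100 megabits per second is 12.5 megabytes per second, or about 45 GB/hour under that unit assumption.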

dadalowg
Contributor

--UPDATE

100% VCB Backups in the last 2 days. Have not seen this in a while. No errors on all 15 VMs.

I must be dreaming.
