We have upgraded to vSphere (full) and we're seeing a drastic reduction in backup speeds. Where guest backups would previously report 50-70MB/s, we're now seeing them top out at 25MB/s, even for the follow-up incremental backups.
We chalked it up as a fluke on the first datacenter in our main facility, but we're now seeing the exact same performance after installing in our second datacenter.
We're using Veeam Backup 3.1, installed on a fresh Windows 2008 64-bit guest with 2 CPU and 2GB memory. It was previously installed in a Windows 2008 32-bit guest.
The physical server has eight 2.83GHz CPUs and 32GB of memory, and shows approximately 20% host utilization while backups are running.
The hosts are connected via 4Gb/s FC to an 8-drive EVA 4400 through a Brocade 200E switch. The backups are writing to a 16-drive RAID50 Samba share. The Samba server isn't experiencing high CPU loads either; we're seeing about 6% utilization there during the backups.
The vSphere Service Console is on vSwitch0. The vSwitch has a Service Console port and a VMKernel port, and is connected to 2 physical adapters, both Broadcom BCM5708 gigabit NICs. These are the same NICs that were used under ESX 3.5 and Veeam Backup 3.0.1 with great backup performance.
Have checked Ethernet NICs and switch ports, no errors. Have checked FC ports and switches, nothing odd there either.
The vSphere installations on the two hosts were clean installs, not upgrades. The vCenter install was also on a fresh Windows 2008 64-bit guest. It was previously on a Windows 2003 32-bit guest.
The guests were all updated with the new VMware Tools, and some of them were also upgraded to the newer VM hardware version. Old or new VM hardware doesn't seem to make a difference in the backup speed.
Testing the SAN and Samba server via IOmeter gets us the performance we'd expect, much higher than what we're seeing via Veeam backups.
We have tested backing up via a Samba share (\\server\share) and backing up "direct" to the Linux Samba server. Both perform equally badly.
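To rule the backup target in or out, a raw sequential write test against the share can be sketched like this. This is only a sketch: `TARGET` is a stand-in for wherever the Samba share is mounted (it defaults to /tmp here just so the snippet runs anywhere), and the 128MB size is arbitrary.

```shell
# Hypothetical raw-write throughput check against the backup target.
# TARGET is a stand-in; point it at the mounted Samba share (e.g. /mnt/backup).
TARGET=${TARGET:-/tmp}
START=$(date +%s)
# conv=fsync forces the data to disk so the timing isn't just cache speed
dd if=/dev/zero of="$TARGET/write_test.bin" bs=1M count=128 conv=fsync 2>/dev/null
END=$(date +%s)
ELAPSED=$(( END - START ))
if [ "$ELAPSED" -eq 0 ]; then ELAPSED=1; fi   # avoid divide-by-zero on fast storage
echo "128 MB written in ${ELAPSED}s, roughly $(( 128 / ELAPSED )) MB/s"
rm -f "$TARGET/write_test.bin"
```

If this number is far above what Veeam achieves to the same share, the target is probably not the bottleneck.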
Have checked the Veeam forums for possible problems, but came up empty. Have watched these forums for the past few days to see if others were seeing this, but didn't see anything there either.
Sort of at the end of my rope as to what is causing this drastic slowdown. A 330GB guest now takes about 4.5 hours to back up, and that's REALLY ridiculous... Sorry for the long post, but I wanted to provide the hardware and install steps along with the troubleshooting we've attempted so far.
Would love to hear any suggestions for how to narrow this down and resolve it!
I would just like to report that we have the same problem. 2TB of data takes almost 24 hours to complete.... 😕
But I would like to add that we are using VCB mode for our backups (in Veeam Backup 3.1) and are seeing the same awful speed. A command-line VCB backup on the same backup server runs at ca. 80MB/s (same as ESX 3.5). So I think this is a Veeam issue.
Thanks, we've reported there as well. We'll see what happens!
Putting my differences aside (I'm from Vizioncore) and wanting to see this fixed for all VMware users, here is what I think is going on: I think this is due to the read speeds from VMFS on ESX 4 vs. ESX 3.
Give this test a try: create a 10GB VM and run this command. How long does it take on ESX 3 vs. ESX 4? You have to remember that VCB over the network is using VMware APIs, which add more disk/read overhead. I think VMware has starved the COS reads again. I think the same thing happened from ESX 2.5.x to 3.0, and when 3.0.1 came out it was fixed. I'm not 100% on that, but I think that's what happened.
time cat JM_10GB_Test-flat.vmdk > /dev/null
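If converting the `time` output by hand is a pain, the same read test can be scripted to print MB/s directly. This is just a sketch: it creates a small local stand-in file so it runs anywhere, but on a host you would point `FILE` at the actual -flat.vmdk on the datastore instead.

```shell
# Sketch: time a sequential read and report MB/s.
# FILE is a locally created stand-in; substitute your -flat.vmdk path on the host.
FILE=/tmp/readtest.bin
dd if=/dev/zero of="$FILE" bs=1M count=64 2>/dev/null
START=$(date +%s)
dd if="$FILE" of=/dev/null bs=1M 2>/dev/null   # the sequential read being timed
END=$(date +%s)
SIZE_MB=$(( $(stat -c %s "$FILE") / 1048576 )) # file size in MB (GNU stat)
ELAPSED=$(( END - START ))
if [ "$ELAPSED" -eq 0 ]; then ELAPSED=1; fi    # small files read in under a second
echo "$SIZE_MB MB in ${ELAPSED}s, roughly $(( SIZE_MB / ELAPSED )) MB/s"
rm -f "$FILE"
```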
Results of the 10GB VMDK cat to /dev/null on ESX4 were:
time cat sjp-test-flat.vmdk > /dev/null
Which gives me roughly 54.1MB/sec, which is WAY (3x) faster than any of the other transfers we're seeing.
I don't have an ESX 3.5 host to test with anymore...
Now I'm REALLY stumped!
I am from Veeam, and I have been researching these two issues for the past couple of days. Steve, Lars and Trevor - thank you for your patience and for bearing with me.
We seem to have nailed down the second issue (VCB backup) reported in this topic by Lars (asp24); it looks to be related to the backup target's storage speed, controller, or cache settings. Backup speed is significantly faster with different backup storage.
However, I am still working on the service console agent backup performance issue with Steve and Trevor... I plan to get my hands on our main lab with some nice FC SAN equipment and do FastSCP file copy testing there in both service console agent and agentless modes. I will be comparing ESX 3.5 and ESX 4.0 hosts, both connected to the same SAN storage. In my own lab I have pretty weak ESX hosts with slow local storage, so I could not confirm any differences between ESX 3.5 and ESX 4.0 (about 25MB/s download speed on ISO files). I know that my fastest ESX 3.5 host, with RAID0 local storage on 2 fast modern hard drives, could do much better than this, but unfortunately that host died recently while setting another world record for FastSCP download speed.
I will update this thread with my findings.
Gostev - Just wanted to pass along my continued appreciation for all the help you've given us in tracking this down. Thanks!
Does that time cat command work on free ESXi? The process seems to hang for upwards of 10 minutes, and then I finally get impatient and just Ctrl-C it. The VM is on local storage.
I've never tried opening service console access on ESXi, but if the shell does not error out when you start the command, then presumably it works.
Also, note that it may take much longer than 10 minutes to execute if you are testing on a big VMDK.
In my case, time cat on 4GB test file takes 2 min 36 sec (on "fat" ESX4).
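For reference, the conversion is just file size divided by the wall-clock ("real") time. Using the figures above as a worked example:

```shell
# 4GB file read in 2 min 36 sec (the "real" time reported by `time`)
SIZE_MB=4096
ELAPSED=$(( 2 * 60 + 36 ))            # 156 seconds
echo "$(( SIZE_MB / ELAPSED )) MB/s"  # integer math: 26 MB/s
```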
Ok, I will give it another shot. What's the calculation to figure out MB/s?
Nevermind. I need to learn not to ask questions until I have tried it myself.
It ran at about 18MB/sec for an empty 10GB file. I realize this has nothing to do with Veeam or vSphere, but that's where I finally found the command for some sort of speed test.
I am seeing similar performance issues in our 4-host vSphere environment. I agree with the earlier post that the COS is being starved of resources, and this is reflected in our results after increasing the COS memory allocation. It's not just datastore transfers that suffer; we are seeing appalling transfer rates when tar'ing data from a datastore off to tape (circa 3-4MB per sec), even if you use passthrough to a VM. When we moved the tape backup to a standalone machine, we saw 35MB/s.
The best results we have seen so far are with ghettoVCB. We are seeing sustained rates of 70-80MB/s over an iSCSI connection between 2 arrays. I spoke to the Vizioncore guys last week after some pretty disappointing testing with DPP, and am assured that a more polished beta will be available this week, with a gold version available at the end of July.
I would like to see some feedback from VMware as to why COS performance is soooooo bad. I know they say backup from ESX hosts is not supported, but surely with so many customers clearly wanting this functionality, VMware needs to address it. There are a lot of products out there to cover day-to-day internal backups, but we still need a simple offsite backup solution (e.g. to tape!).
I am currently troubleshooting a VCB backup slowness issue. The backup server (VCB proxy) is connected via 4Gb FC. It is taking 4-5 hours to mount a 300GB VM. The VCB proxy is running Windows 2008 x64 and writing to a RAID 0 array. The server has a quad-core CPU and plenty of memory.
This is a new vSphere 4 implementation with an HP MSA2312fc 4Gb Fibre Channel SAN and 2 HP FC SAN switches; the VCB proxy server is connected to the SAN via dual 4Gb FC. The SAN is one large RAID 5 across 12 x 450GB 15K RPM drives.
I wouldn't rule out a problem with the VCB proxy itself. I am possibly leaning towards an issue with x64 Windows Server 2008 and the VCB proxy.
Any suggestions? I have a support case open with vmware but it is slow going. They are analyzing logs.
I hope this isn't the same issue as the one I recall when 3.5 came out, with poor transfer speeds to/from the ESX host via the SC network... I am about to implement vSphere 4 and P2V some 88 servers. I don't have the time or patience if the SC network isn't performing properly.
any news on this?
It's a little bit weird that only read performance seems to be affected.