rossanderson
Enthusiast

Windows 2008 Server to Server File Copy - SLOW

Hardware - Cisco UCS with Nexus Switching

Storage - Clustered NetApp (3220 with flashcache) with multipath iSCSI backend

VMware - ESX 5.1

NIC Driver - VMXNET3

OS - Windows 2008 R2 fully patched

I know this topic has been covered ad nauseam - I've been reading articles for days. I'm mainly wondering if I am missing something. When I do a file copy between two Windows Server 2008 R2 VMs (both in the same VMware cluster), the copy initially runs at around 150-300MB/s but eventually drops to around 30-35MB/s. My main question is why the initial speed is so high but then falls to a fraction of that rate.

So far, I have tried the following :

Disable SMB2

http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm (Smb2)

DisableTaskOffload (this was very helpful in our XenServer environment)
DisableTaskOffload on VM and Windows (Chimney)
Modify VMXNET3 ring sizes
Disable all Windows 2008 NIC performance features
-- Win2008 NIC Settings --
Netsh int tcp set global RSS=Disabled
Netsh int tcp set global chimney=Disabled
Netsh int tcp set global autotuninglevel=Disabled
Netsh int tcp set global congestionprovider=None
Netsh int tcp set global ecncapability=Disabled
Netsh int ip set global taskoffload=disabled
Netsh int tcp set global timestamps=Disabled

Am I missing something obvious here?  I know we are ultimately limited by the backend iSCSI performance, since everything I'm copying is read from and written to that storage, but I know 150-180MB/s is possible based on many different performance tests with IOMeter, LanSpeed, etc. Jumbo frames are not an option at this time either.

This seems to be a Windows buffering issue to me since initial copy rates are so much higher in the first 30-60s of the transfer, but I can't put my finger on the ultimate issue.

Thanks for any suggestions!

Here is the initial copy rate - the est time to finish is < 60s, but the rate eventually falls to ~30-40MB/s so the copy will take ~3-4min

[Attached screenshot: Copy-Initial-Rate.jpg]
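
One way to see exactly when the rate falls off is to record a timeline while copying and then look at throughput per time window. The following is a minimal Python sketch, not something used in this thread: the function names `copy_with_timeline` and `rate_per_window` are made up for illustration, and it copies files one at a time, so it will not reproduce Explorer's behavior exactly.

```python
import shutil
import time
from pathlib import Path


def copy_with_timeline(src: Path, dst: Path):
    """Copy the tree under src to dst one file at a time.

    Returns a list of (seconds_since_start, file_size_in_bytes),
    one entry per file, stamped when that file finished copying.
    """
    timeline = []
    start = time.perf_counter()
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
        timeline.append((time.perf_counter() - start, path.stat().st_size))
    return timeline


def rate_per_window(timeline, window_s=10.0):
    """Return the average MB/s for each consecutive window of window_s seconds.

    The last window is usually only partly filled, so its rate reads low.
    """
    if not timeline:
        return []
    buckets = [0] * (int(timeline[-1][0] // window_s) + 1)
    for finished_at, size in timeline:
        buckets[int(finished_at // window_s)] += size
    return [total / window_s / 1e6 for total in buckets]
```

If the windows with low MB/s also contain far more files than the fast ones, that points at per-file overhead rather than the network or storage path.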

9 Replies
jdptechnc
Expert

Are you doing any traffic shaping on either the vSwitch hosting the iSCSI vmk or the VM network?

What do the resources inside the guest OS look like?  How about esxtop on the hosts that are involved in the copy? Look at your disk adapter, disk device, and disk VM stats for signs of latency or anything else that jumps out at you.

Please consider marking as "helpful", if you find this post useful. Thanks!... IT Guy since 12/2000... Virtual since 10/2006... VCAP-DCA #2222
jdptechnc
Expert

If you think it is buffering in Windows causing this behavior, try xcopy /j, which will execute your copy unbuffered.

rossanderson
Enthusiast

No, it's a basic vSwitch with no features or traffic shaping. Very simple as far as vSphere is concerned. Very complex everywhere else :)

EDIT : xcopy /J took longer, so Windows buffering may not be a problem. 10GB copy took ~7min, about double the time it took to do a copy/paste from the other server (using SMB)

rossanderson
Enthusiast

A normal copy/paste takes ~3:30

Ethan44
Enthusiast

Hi

Welcome to the communities.

Please try the copy with TeraCopy.

http://www.filehippo.com/download_teracopy/

"a journey of a thousand miles starts  with a single step."
rossanderson
Enthusiast

Thanks for the suggestion - TeraCopy did not speed up the copy at all.

jdptechnc
Expert

What types/sizes of files are included in this copy?  Maybe toward the end it is hitting lots of little files and is slowing down because of the overhead?

rossanderson
Enthusiast

Good question - the copy is approx. 10GB total with about 10,500 files. The number of files could very well be the underlying issue, as a network test utility on this server can read/write a single 10GB file in about 75s.

So perhaps the copy is running as fast as possible with regards to the hardware and NIC config. A secondary question would then be, is this as fast as a Windows 2008 copy can be with this many files?  Is it simply the number of files that is causing the slowdown? Interesting (sort of)
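
The per-file cost can be estimated from numbers already in this thread. This is a rough Python sketch under two assumptions: that the 75s single-file result is a fair baseline for pure streaming time, and that every file adds the same fixed cost on top. Real overhead varies with file size, SMB round trips, and anything that inspects files on access, so treat the result as an order of magnitude only. The function name is hypothetical.

```python
def per_file_overhead_s(total_s: float, streaming_s: float, n_files: int) -> float:
    """Fixed cost per file under the model: total = streaming + n_files * overhead."""
    return (total_s - streaming_s) / n_files


n_files = 10_500            # files in the ~10GB copy
streaming_s = 75.0          # single 10GB file via the network test utility
observed_s = 3 * 60 + 30    # normal copy/paste of the same data set (~3:30)

overhead = per_file_overhead_s(observed_s, streaming_s, n_files)
print(f"~{overhead * 1000:.1f} ms of overhead per file")  # ~12.9 ms
```

Roughly 13 ms per file is small for any single file but adds up to more than two minutes across 10,500 of them, which is consistent with the gap between the two measurements.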

rossanderson
Enthusiast

I just copied 9 files totalling 2.64GB and it took ~19s, so that was an avg of almost 140MB/s .. much more in line with what I think I should always be seeing. So JDPTECHNC, you could be spot on with your assessment re: the number of files.

I always knew you'd get worse throughput with a higher number of files, but I didn't expect it to be such a performance hit. I wonder if there is any way to improve the copies now that the slowdown is ostensibly due to the number of files and not to the networking config.
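
The comparison above can be repeated in a controlled way by generating two data sets of the same total size, one with many small files and one with a few large ones, and timing each copy. This is a minimal Python sketch; the helper names are hypothetical, and `dst_root` would need to point at a share on the other server (for example a UNC path) to exercise the same SMB path as the copies in this thread.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path


def make_files(root: Path, count: int, size_bytes: int) -> None:
    """Create `count` files of `size_bytes` random bytes each under root."""
    root.mkdir(parents=True, exist_ok=True)
    payload = os.urandom(size_bytes)
    for i in range(count):
        (root / f"f{i:06d}.bin").write_bytes(payload)


def timed_copy(src: Path, dst: Path) -> float:
    """Copy the tree src to dst (dst must not exist yet); return seconds taken."""
    start = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - start


def compare(total_bytes: int, small_count: int, large_count: int,
            src_root: Path, dst_root: Path) -> dict:
    """Return {label: (file_count, seconds, MB_per_s)} for both data sets."""
    results = {}
    for label, count in (("many_small", small_count), ("few_large", large_count)):
        src = src_root / f"{label}_src"
        make_files(src, count, total_bytes // count)
        elapsed = timed_copy(src, dst_root / f"{label}_dst")
        results[label] = (count, elapsed, total_bytes / elapsed / 1e6)
    return results
```

Calling something like `compare(2 * 1024**3, 2000, 9, Path("D:/bench"), Path("//otherserver/share/bench"))` (both paths are placeholders) gives two rates that differ only in file count. Source-side caching can flatter the numbers because the files were just written, so a larger data set or a reboot between runs gives a fairer result.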

Thanks again for the assistance JDPTECHNC!

PARTIALLY SOLVED - copy was previously taking around 4min so on a hunch I turned off Symantec A/V and the time to copy went down to 1:50 - avg rate was approx. 91MB/s. Not as high as doing large files where I've seen 140MB/s, but a lot better than before. I assume we have realtime protection within SAV, causing every file to be examined. I'd still like to speed up copies with large numbers of files, but this at least helps.
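
For copies with large numbers of files, one common approach is to copy several files at once so that the fixed per-file cost of one file overlaps with the data transfer of another. Robocopy on Windows Server 2008 R2 offers this through its /MT option. The same idea as a minimal Python sketch follows; `parallel_copy` is a hypothetical helper, the worker count of 8 is an arbitrary starting point, and none of this was tested in the environment described here.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src: Path, dst: Path, workers: int = 8) -> int:
    """Copy every file under src to the same relative path under dst.

    Files are copied concurrently by `workers` threads. Empty directories
    are not recreated. Returns the number of files copied.
    """
    files = [p for p in src.rglob("*") if p.is_file()]

    # Create all target directories up front so worker threads only copy files.
    for directory in {dst / p.relative_to(src).parent for p in files}:
        directory.mkdir(parents=True, exist_ok=True)

    def copy_one(path: Path) -> None:
        shutil.copy2(path, dst / path.relative_to(src))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any exception from a worker is raised here.
        list(pool.map(copy_one, files))
    return len(files)
```

Whether this helps depends on where the per-file time goes. If a realtime scanner serializes file opens, or the storage is already saturated, extra threads may gain little, so it is worth measuring with a few different worker counts.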
