jpreou
Contributor

Slow network copy between P2V'd VMs on same host

We've recently installed an HP C3000 chassis with three BL460c blades, each running ESX 3.5 (Foundation, updated to just prior to 'Update 2') and backed by an HP MSA2012fc SAN. The blade chassis has Gb2E Ethernet switches which connect to a Cisco 3750 core switch.

What we have noticed is that CPU utilization on the File/Print server (which was P2V'd using VMware Converter) is pegged during network activity, and network file transfer performance is slower than expected. Looking at historical logs we can easily see the CPU peak from 8am to 6pm, when users access the server. It did not exhibit this behaviour when it was a physical server. We're seeing 90%+ CPU during business hours and virtually nothing outside of hours. Even after a full cold reset of the entire environment (while we re-racked the server room after the initial install) we still saw high CPU, though not quite at the 90% level (around 45%-60% now).

Performing some simple 'real world' diagnostics using file copy tests, we noted the following:

  1. Copying a 370MB file between freshly built VMs is just fine, regardless of which host they are on.

  2. Copying the same file between P2V'd VMs on separate hosts is also just fine.

  3. Copying the same file between P2V'd VMs on the same host (i.e. not even touching the external network) is slow and pegs the CPU.

  4. I rolled out two new servers from template (fresh builds) on each host and copied the same file between them, using an "Internal Only" network with no uplinks. All good.

By slow, I mean that normally the 370MB file takes around 10-15 seconds to copy between VMs, but in scenario 3 it takes around 3½ minutes. It only seems to be between P2V'd machines, and only when the machines are on the same ESX host. In each case the CPU pegs to 100% during the file copy. We have checked this on other ESX servers at other customers and don't see the same behaviour there.
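To put those timings in throughput terms, here is a small sketch that converts the figures quoted above into MB/s. The `throughput` helper is a name made up for this illustration; the only inputs are the 370MB file size and the copy times from this post.

```shell
#!/bin/sh
# Effective throughput in MB/s for a copy of <size_mb> that took <seconds>.
# "throughput" is a hypothetical helper name used only for this example.
throughput() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

throughput 370 15    # normal copy, slow end of the 10-15 second range: 24.7
throughput 370 210   # scenario 3, about 3.5 minutes: 1.8
```

So the slow case runs at under 2 MB/s, more than ten times slower than the normal case.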

When doing the P2V we went through all the 'normal' post-conversion cleanup tasks: took the HAL back to single processor (for 1 vCPU), removed hidden and non-present devices, uninstalled all software, utilities and drivers that were not required (like all the HP stuff), installed VMware Tools, etc. The network cards, switches and interfaces are all Gbit and set to auto-negotiate, and I have confirmed that everything is running at full speed and full duplex. Network cards in the VMs are using the Flex driver installed with Tools. In theory, though, network copies between VMs on the same host shouldn't even touch the physical network.

Anyone got any ideas for me before I call VMware?

4 Replies
mmperry70
Contributor

Same thing here: one freshly built VM (Win 2008 64-bit) and one P2V'd VM (Win 2003 32-bit), both on ESXi 3.5 U4.

Both have the latest VMware Tools.

I'm only getting around 2.8MB/sec; a 2GB file copy took over 10 minutes.

This is strange since I thought it'd be lightning fast???

Anyone ever get an answer from VMware???

Thanks.

InToBytes
Contributor

Hi,

I was wondering if you have got any further with the slow copy behaviour you mentioned.

We are having the same problems. We P2V'd a W2K3 (first release, with SP2) domain controller to a VM. It also holds the profiles for the Terminal cluster environment.

Copying many smaller files is extremely slow, and the users (170) on the Terminal server experience problems opening files. We also tried replacing the network switches and the virtual switches within the VM, but with no result.

Maybe someone has a solution. We will also be opening a ticket with support tomorrow.

Thanks in advance

jpreou
Contributor

I'm afraid we never did find a solution. It was far enough back that I can't remember what the outcome was, and since we aren't allowed more than 100MB of mailbox and I have already archived all of last year's mail, I can't easily look it up either! Obviously it doesn't continue to be a problem, otherwise the customer would still be screaming at us. Either that or they simply accepted it (which I doubt is the case!)

UPDATE: I have a sneaking suspicion (without going to check) that we modified the vmx config file for the file/print server to force the use of the Intel E1000 network card inside the VM instead of the default AMD PCNet (Accelerated or otherwise).
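For anyone wanting to try the same change, here is a rough sketch of what that edit could look like. It assumes the adapter in question is `ethernet0` and that the VM is powered off while the file is edited; `set_nic_type` is a helper name invented for this example, and the path in the usage line is a placeholder. This is not confirmed as the exact change made on the original server.

```shell
#!/bin/sh
# Set the virtual NIC type for ethernet0 in a .vmx file.
# Assumptions: the VM is powered off, and the adapter is ethernet0.
# "set_nic_type" is a hypothetical helper name for this illustration.
set_nic_type() {
    vmx="$1"
    dev="$2"
    # Keep a backup copy next to the original before touching it.
    cp "$vmx" "$vmx.bak" || return 1
    # Drop any existing virtualDev line for ethernet0, then append the new one.
    sed '/^ethernet0\.virtualDev/d' "$vmx.bak" > "$vmx"
    printf 'ethernet0.virtualDev = "%s"\n' "$dev" >> "$vmx"
}
```

Usage would be something like `set_nic_type /path/to/fileserver.vmx e1000`. The guest will see a new network card after the change, so its IP settings will probably need to be re-applied inside Windows.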

alpferd
Contributor

Yeeeeeeeeeeaah!!!! solved!

http://rhymingpanda.com/weblog/2007/03/13/20_22_12/index.html

Set the tcp segmentation offload to off:
$ sudo ethtool -k eth0
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
$ sudo ethtool -K eth0 tso off

I set this on all VMs and network speed is back!!!!
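If you have a lot of Linux guests to go through, the check and the change can be wrapped up as in the sketch below. The function names `tso_state` and `disable_tso` are made up for this example, and `eth0` is just the interface from the commands above. The parsing accepts both the older "tcp segmentation offload" label and the hyphenated form that newer ethtool versions print.

```shell
#!/bin/sh
# Print "on" or "off" for TSO, reading `ethtool -k <iface>` output on stdin.
# "tso_state" and "disable_tso" are hypothetical helper names.
tso_state() {
    awk -F': *' '/^[ \t]*tcp[- ]segmentation[- ]offload:/ {
        split($2, words, " ")   # drop any trailing marker such as "[fixed]"
        print words[1]
        exit
    }'
}

# Turn TSO off on an interface only if it is currently on (needs root).
disable_tso() {
    iface="$1"
    if [ "$(ethtool -k "$iface" | tso_state)" = "on" ]; then
        ethtool -K "$iface" tso off
    fi
}
```

Run `disable_tso eth0` as root. Settings made with `ethtool -K` do not survive a reboot, so the call has to go into whatever startup script your distribution uses for network settings.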
