Contributor

Load average spike when copying/moving large files

Hello,

I'm running the free version of vSphere ESXi on an HP ML350 G6. The host has 10GB of memory, a single E5620 processor, and directly attached SATA hard disk drives. I run 6 VMs on this server, all with very low resource usage.

My problem is that when I move or copy large files (2-4GB), the load average of the VM I'm working in spikes into double digits. The VM is running CentOS 5.5, and its swap file is on a different physical disk from the OS. One thing I've noticed is that the hypervisor can't seem to determine whether hardware acceleration is enabled for the storage controller (Smart Array P410i). It shows as "unknown" in the vSphere client. I'm pretty sure it's enabled.

Could someone please point me in the right direction on how to troubleshoot this I/O problem? If you need any further information, let me know.

Thanks,

Rod

12 Replies
Virtuoso

When you say load, what do you mean specifically? Are you referring to CPU utilization?

It is common for CPU usage to go up when copying large files, as that is a CPU-intensive process. If you've got VMware Tools installed in the guest and use a driver other than the E1000 NIC driver, you will get lower CPU usage during high NIC utilization.

Hope that helps.  If not, can you clarify what you mean by load?

Matt

http://www.thelowercasew.com

Matt | http://www.thelowercasew.com | @mattliebowitz
Immortal

> One thing I've noticed is that the hypervisor can't seem to determine whether or not hardware acceleration is enabled for the storage controller (Smart Array P410i). It shows as "unknown" in the vSphere client. I'm pretty sure it's enabled.

That setting applies only to VAAI, which requires a different license and a different kind of storage (only some enterprise storage arrays support it).

If your controller has a backup battery, set the cache to write-back mode and repeat the test.
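One way to make "repeat the test" comparable between runs is to time a synced write from inside the guest before and after changing the cache setting. The following is only a rough sketch, assuming Python 3 is available in the guest (CentOS 5.5 ships an older Python 2 by default); the function name `timed_write` and its parameters are made up for this example.

```python
import os
import time

def timed_write(path, total_mb=64, chunk_mb=1):
    """Write roughly total_mb of zeros to path, fsync, and return MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        # fsync pushes the data out of the guest's page cache, so the timing
        # includes the trip to the virtual disk and the controller behind it.
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (chunks * chunk_mb) / elapsed
```

Calling it with a size close to the problem files, for example `timed_write("/tmp/cache_test.bin", total_mb=2048)`, and comparing the result across cache settings gives a simple before/after number. Delete the test file afterwards.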

Andre

Andre | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
Contributor

Thanks, Matt. The weekend hit. Sorry for the delay in replying.

By load average, I mean the load average of a Linux machine where processes are waiting for attention from the CPU. It's the three numbers you see when running the "w" or "uptime" commands. I don't fully understand load average. I just know that when it spikes, especially into the double digits, things slow down considerably.
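For reference, on Linux the load average counts not only processes waiting for a CPU but also processes in uninterruptible sleep, which usually means blocked on disk I/O. A double-digit load during a copy can therefore point at slow storage rather than a busy CPU. A minimal sketch for reading the three numbers and comparing them to the CPU count, assuming Python 3 in the guest; the helper names are illustrative only:

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def load_per_cpu(load, cpus):
    """Normalize a load average by the number of CPUs the guest sees."""
    return load / max(cpus, 1)

def read_loadavg(path="/proc/loadavg"):
    """Read and parse the kernel's load average file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

For example, `load_per_cpu(read_loadavg()[0], os.cpu_count() or 1)` gives the 1-minute load per vCPU. A value well above 1 while CPU usage itself stays low suggests the processes are queued on I/O.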

I do have VMware tools installed in the guest. I just checked and I see that the e1000 driver is loaded on the guest.

Andre mentions that hardware acceleration isn't available in the version of vSphere I'm running (the free hypervisor). Do you think that might be the problem? We are looking at buying vSphere Essentials soon.

Thanks for your input,

Rod

Virtuoso

I'm not sure that the kind of hardware acceleration that Andre is talking about (VAAI) will help you even if you were to get the right storage array and license of vSphere.  VAAI helps a lot for operations like cloning VMs, creating new virtual hard disks, and a few other things.  But your issue occurs when copying large amounts of files within the guest OS, correct?  If so then VAAI will not make any difference.

For what it's worth, I think the vSphere Essentials bundle is an absolute steal for a licensed vSphere product. It's only $500 and includes licenses for up to three physical servers running ESX or ESXi, and it includes a copy of vCenter. The copy of vCenter alone is worth the price.

It's possible that the file copy is taking up so many CPU resources that other processes are being forced to wait.  How many vCPUs are assigned to the VM?  Maybe try assigning it another vCPU and see if that helps with the issue.  Switching to a different virtual NIC driver (like the VMXNET 3 driver) may help reduce CPU utilization during large copies as well, but probably not a huge amount.  I'd see how the VM performs with 2 vCPUs.

Matt

http://www.thelowercasew.com

Matt | http://www.thelowercasew.com | @mattliebowitz

(Accepted solution)

Contributor

Yes, the issues occur when I'm copying large amounts of data within the guest. I currently have 4 vCPUs assigned to this particular guest. The processor is an E5620. Would I be able to assign more than 4 vCPUs to the guest? I think we will be purchasing vSphere Essentials. Maybe that will help. It's definitely a good price for what you get. Thanks!

Immortal

More vCPUs are not likely to help. I would in fact reduce the vCPU count. You mention the P410 disk controller. Does it have a battery-backed or flash-backed cache? Without the BBWC module, write caching isn't enabled, which will have a huge impact on write performance.
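A quick way to check from inside the guest whether the copy is waiting on disk rather than on CPU is to look at the iowait share in /proc/stat. This is only a sketch under the assumption that Python 3 is available in the guest; the function names are invented for the example. It compares two samples of the aggregate `cpu` line taken a few seconds apart during a copy.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]

def iowait_percent(before, after):
    """Percentage of CPU time between two samples spent idle waiting on I/O."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Later fields (guest time) are already included in user/nice, so skip them.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

If the iowait percentage is high during the copy while user and system time stay modest, adding vCPUs will not help, and the storage path (including the controller's write cache) is the place to look.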

-- David -- VMware Communities Moderator
Contributor

I don't know if it has the BBWC module or not. How would I find out? Thank you.

Immortal

By default, I think the P410i comes without a RAM or battery module unless you specifically ordered it. I would just check with whoever you ordered the server from. If you have the HP version of ESXi installed, you may be able to tell from the hardware tab in the vSphere Client.

-- David -- VMware Communities Moderator
Virtuoso

If you've already got 4 vCPUs assigned to the VM then I agree that it doesn't need any more.  How many other VMs are on the host and how many vCPUs do they have assigned?

Matt

http://www.thelowercasew.com

Matt | http://www.thelowercasew.com | @mattliebowitz
Contributor

There are 5 other guests, each assigned only 1 vCPU.

I contacted HP today, and they say this particular server has 256MB of write cache, which indicates that a BBWC module is present. However, they weren't sure whether it is enabled by default or whether it has a battery attached. They asked me to run an Array Diagnostics Utility report and send it to them. I might visually inspect the machine to see whether the module and battery are there.

I have another ML350, a G5, which has a 128MB module with the battery. The G5 also runs the free version of vSphere ESXi, and I use it as a backup to the production server. It has a different host adapter than the G6: a p200i. On the G5, after verifying that the write cache module and battery backup were present, I tried enabling physical drive write cache and tested copying a large file. Performance seemed to be much better.

Virtuoso

I would strongly suggest reducing the vCPU allocation to a maximum of 2 vCPUs on one guest and 1 vCPU on each of the others (this is a quad-core box, correct?). Keep in mind that guests are simply processes from ESXi's perspective; if 4 vCPUs are assigned to a guest on a 4-core host, ESXi can't do anything itself concurrently, such as servicing hardware interrupts generated by the NIC.

But, as stated above, the issue sounds like a lack of write caching on the controller. It needs to be in write-back mode.
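The vCPU arithmetic behind the suggestion above can be sketched as follows. The guest counts come from this thread; the core count assumes the E5620's 4 physical cores with hyper-threading not counted, and the function name is made up for illustration.

```python
def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Total vCPUs allocated across all guests divided by physical cores."""
    return sum(vcpus_per_vm) / physical_cores

current = [4, 1, 1, 1, 1, 1]    # one 4-vCPU guest plus five 1-vCPU guests
proposed = [2, 1, 1, 1, 1, 1]   # the suggested layout
cores = 4                       # assumption: E5620 physical cores only

print(overcommit_ratio(current, cores))   # 2.25
print(overcommit_ratio(proposed, cores))  # 1.75
```

The ratio itself is modest either way; the bigger effect of the change is that the large guest no longer needs all four cores at once to be scheduled.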

Contributor

Thanks, J1mbo. I'll reduce the vCPUs to 2 when I'm able to restart that guest.

I did a physical inspection of the machine, taking the side panel off. I found that there is a 256MB module in the slot for the controller; however, there is no battery backup attached. I'll have to look at purchasing a battery pack for it.
