Poor local disk performance


I've been running ESXi 4.1 on two whitebox servers with VirtualCenter for a long time without problems... or so I thought. Performance has been good, but it's a home lab and seemed reasonable. Moving a VM from esxihost1 to esxihost2 through the vSphere client has never been fast, but I just assumed that was characteristic of my lab hardware. Today I needed to move a vmdk from hdd1 to hdd2 and it took forever. I ran 'mv' while logged into the ESXi host. I opened a second ssh session to check things out, but even that took forever: during the mv operation the second session was slow and intermittently hung. The login alone took at least a minute. Sometimes just pressing carriage return at the unix shell prompt took 1-2 seconds to respond (same for ls -l); other times it took 30 seconds or much more.

Copying a 42 GB file from local disk1 to local disk2 via 'cp' took 29 minutes. I shut down all VMs on this host and rebooted into maintenance mode. A cp of the same file to a different name on the SAME disk (the 'from' and 'to' were the same local disk) took 39 minutes! The esxtop stats below seem low:

adapter: 16.30 MBREAD/s, 20.27 MBWRTN/s

disk: 11.48 MBREAD/s, 23.35 MBWRTN/s
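For a raw-throughput sanity check without hdparm (which ESXi lacks), the busybox console's dd can give a rough sequential-read number. This is only a sketch: the device path is an example to substitute from your own /dev/disks/, and console I/O on ESXi is throttled, so treat the result as a floor rather than a benchmark.

```shell
# Rough sequential-read check from the ESXi console (busybox ash).
# DISK is an example -- substitute a real path from /dev/disks/ on your host.
DISK=/dev/disks/t10.ATA_____ST3250410AS_example
START=$(date +%s)
dd if="$DISK" of=/dev/null bs=1048576 count=1024 2>/dev/null   # read 1 GB
END=$(date +%s)
SECS=$(( END - START ))
[ "$SECS" -gt 0 ] && echo "approx $(( 1024 / SECS )) MB/s"
```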

Identical systems are ASUS P5QL-PRO mobo, Intel Core 2 Quad Q9400 CPU @ 2.66 GHz, Intel PRO/1000 GT Desktop Adapter, Promise SATA300 TX4 SATA controller. Two 7200 RPM hard drives connected to the TX4. VMware ESXi 4.1.0 build-320137, i.e. 4.1 Update 1. All of this hardware is listed at http://www.vm-help.com/esx40i/esx40_whitebox_HCL.php

Does anyone have ideas on how to diagnose and solve the problem? My guess is the Promise TX4 controllers aren't that great, although they are on the whitebox HCL. I'm not a unix guy. If I swap out the controller, I assume I'll need to reload ESXi? Is there a way to back up and restore the ESXi configuration? Will ESXi automatically detect the hardware swap and take care of any required drivers, assuming it recognizes the hardware?

Here's the system info. Thanks in advance!

Disk /dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZM: 250.0 GB, 250059350016 bytes
64 heads, 32 sectors/track, 238475 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

                                                                               Device Boot      Start         End      Blocks  Id System
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp1             5       900    917504    5  Extended
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp2           901      4995   4193280    6  FAT16
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp3          4996    238476 239083704   fb  VMFS
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp4   *         1         4      4080    4  FAT16 <32M
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp5             5       254    255984    6  FAT16
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp6           255       504    255984    6  FAT16
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp7           505       614    112624   fc  VMKcore
/dev/disks/t10.ATA_____ST3250410AS_________________________________________6RYL3HZMp8           615       900    292848    6  FAT16

Partition table entries are not in disk order

Disk /dev/disks/t10.ATA_____Hitachi_HDP725050GLA360_______________________GEA534RJ06SSXA: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                                                                               Device Boot      Start         End      Blocks  Id System
/dev/disks/t10.ATA_____Hitachi_HDP725050GLA360_______________________GEA534RJ06SSXAp1             1       523   4193280    6  FAT16
Partition 1 does not end on cylinder boundary
/dev/disks/t10.ATA_____Hitachi_HDP725050GLA360_______________________GEA534RJ06SSXAp2           523     60802 484193272+  fb  VMFS
/bin #

/bin # hwinfo -p
Seg:Bus:Sl.F Vend:Dvid Subv:Subd ISA/irq/Vec P M Module       Name   
                                Spawned bus                         
000:000:00.0 8086:2e20 1043:82d3               V             
000:000:01.0 8086:2e21 0000:0000 10/ 10/0x68 A V              PCIe RP[000:000:01.0]
000:000:26.0 8086:3a37 1043:82d4 10/ 10/0x68 A V usb-uhci    
000:000:26.1 8086:3a38 1043:82d4 15/ 15/0x70 B V usb-uhci    
000:000:26.2 8086:3a39 1043:82d4  5/  5/0x78 C V usb-uhci    
000:000:26.7 8086:3a3c 1043:82d4  5/  5/0x78 C V ehci-hcd    
000:000:28.0 8086:3a40 0000:0000 11/ 11/0x88 A V              PCIe RP[000:000:28.0]
000:000:28.4 8086:3a48 0000:0000 11/ 11/0x88 A V              PCIe RP[000:000:28.4]
000:000:29.0 8086:3a34 1043:82d4 14/ 14/0x98 A V usb-uhci    
000:000:29.1 8086:3a35 1043:82d4  3/  3/0xa0 B V usb-uhci    
000:000:29.2 8086:3a36 1043:82d4  5/  5/0x78 C V usb-uhci    
000:000:29.7 8086:3a3a 1043:82d4 14/ 14/0x98 A V ehci-hcd    
000:000:30.0 8086:244e 0000:0000               V             
000:000:31.0 8086:3a18 1043:82d4               V             
000:000:31.3 8086:3a30 1043:82d4  5/   /     C V             
000:001:00.0 10de:01d3 3842:c430 10/ 10/0x68 A V             
000:002:00.0 11ab:6101 1043:82e0 10/ 10/0x68 A V             
000:004:00.0 8086:107c 8086:1376 10/ 10/0x68 A V e1000        vmnic0
000:004:01.0 8086:107c 8086:1376 11/ 11/0x88 A V e1000        vmnic1
000:004:02.0 105a:3d17 105a:3d17  5/  5/0x78 A V sata_promise vmhba0
/bin #

Here's the esxtop output for disk adapter and disk device while I performed the cp test again.

3:34:53am up  3:20, 180 worlds; CPU load average: 0.01, 0.01, 0.00

vmhba0 -                       0     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
vmhba33 -                       1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
vmhba34 -                       0     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
vmhba35 -                       1  2653.31  1441.56  1210.56    11.48    23.35     0.37     0.33     0.70     0.00

3:35:17am up  3:21, 180 worlds; CPU load average: 0.02, 0.01, 0.00

t10.ATA_____Hit           -               1     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
t10.ATA_____ST3           -               1     -    1    1  100  2.00  4172.59  2086.29  2086.29    16.30    20.27     0.24     0.23     0.46

0 Kudos
9 Replies

If you've logged onto a console and moved using "mv", then you're dealing with the console's rate limiting. This generally isn't a supported way of moving data by the way.

I'm unsure if the supported "vifs.pl" script suffers this issue.

Do you experience these issues if you copy a file within a VM? That will help you determine whether the console's the limit here.

The "fast" way to move VMs is with a storage vmotion. I realise you probably don't have an appropriate license - but then, that's part of asking people to pay for such features. The other way to try it is with the VMware Standalone Converter - which should be significantly quicker than what you're doing.


I would start by switching to vmkfstools to copy the virtual disk. cp and mv aren't the ideal tools. I just did a quick test with cp and vmkfstools to copy a 40 GB virtual disk (about 15 GB of data in the virtual disk). With cp it would have run about 32 minutes. vmkfstools was done in 4. :-)
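The suggestion above can be sketched as follows; the datastore and VM paths are placeholders, not the poster's actual layout, so adjust them to match your host.

```shell
# Clone a virtual disk with vmkfstools instead of cp (ESXi console).
# Both paths are illustrative -- substitute your own datastore/VM names.
SRC=/vmfs/volumes/datastore1/myvm/myvm.vmdk
DST=/vmfs/volumes/datastore2/myvm/myvm.vmdk
time vmkfstools -i "$SRC" "$DST"
```

vmkfstools -i copies the descriptor and the -flat extent together and is VMFS-aware, which is why it runs so much faster than a byte-for-byte cp through the console.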


Thank you both for the helpful replies. The actual disk file size is 42 GB. At this point I'm just trying to determine whether disk performance is what it's supposed to be or way off. I would use hdparm, but it isn't available in ESXi. I'm not really sure how to go about determining what it should be, but the disk drives are rated at nearly 100 MB/s sustained and my numbers are about 25% of that figure.

I found some posts where people were having controller performance issues and the magic ticket was enabling write cache. There doesn't seem to be a way to access the bios on this card and I can't figure out how - if at all - this can be enabled in ESXi. There's a Promise Windows utility that I think can access it. Maybe it's currently disabled? I don't think I would be able to get at the card through a Windows VM. I wonder if I can install Windows on a flash drive that I can plug into the ESXi host and boot? Install the Promise utility and check the setting?

I also found where some improved performance by making sure the mobo BIOS SATA setting was AHCI not IDE. Mine was IDE, but the drives aren't plugged into the mobo. They're plugged into the Promise card that doesn't seem to have a BIOS I can get to through boot. In any event, I changed the mobo setting and esxtop now shows 6 additional adapter ports which would be correct. I wonder if I even need the Promise card now? What would happen if I moved the two HDDs to the mobo ports? Would the OS boot and work correctly without reconfiguration?
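One way to confirm what the AHCI change exposed is to list the adapters from the console; this is a sketch assuming ESXi 4.x, where esxcfg-scsidevs is available in the shell.

```shell
# List storage adapters and the driver claiming each one (ESXi console).
esxcfg-scsidevs -a
# Expect the sata_promise-driven vmhba0 plus the new ahci vmhbas from the
# on-board ports; esxcfg-scsidevs -c then maps devices to those adapters.
```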

Let me know if you have any other ideas.

Thanks again.

P.S. I'll check into using vmkfstools.

Message was edited by: Piggy


Are the two disks in an array or just individual disks.  If they're individual disks you should be able to move them, rescan and likely the datastore will come back up.  You don't want to use the format them again, but you may have to resignature the disks.
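For the resignature step mentioned above, ESXi 4.x exposes this on the console via esxcfg-volume; the datastore label below is an example, not one of the poster's actual volumes.

```shell
# After moving the disks and rescanning, list volumes ESXi sees as snapshots:
esxcfg-volume -l
# Mount one persistently under its existing signature (label is an example)...
esxcfg-volume -M datastore1
# ...or assign it a new signature instead:
# esxcfg-volume -r datastore1
```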

Battery backed write cache will make a difference.  If ESXi doesn't see battery then it won't use controller cache.


Individual disks. ESXi is installed on one of them.

How does ESXi know if there's a battery? I know there isn't but if I can enable cache (assuming it's disabled) will ESXi accept it? The hosts are on a big UPS so I'm not worried about a power failure.

What sort of performance gain do you think I'll see if I have a controller with a battery? Do you have any recommendations for something not too expensive for a home lab? I use the lab pretty heavily and definitely want to increase performance if possible.


How does ESXi know if there's a battery?

It does not, that's the point. ESX "talks" to the controller and waits for confirmation. The faster the controller acknowledges disk writes the faster the Hypervisor can send the next request.

On HP DL380 G5 servers, for example, I saw differences between 5-10 MB/s without and 80-100 MB/s with BBWC. However, this also depends on whether the access is sequential or random.



If these drives are running as individual drives (no RAID) then BBWC can still be needed as the controller or drive firmware may be disabling write caching.  For example WD's Caviar Green drives, via on-board SATA, seem to have their internal write cache enabled and hence perform OK, whilst REx drives seem to have it disabled in the same configuration and hence perform badly.

However when either is used on a Perc-6i, write-back cache configuration (which needs the battery) is needed to get decent performance.

As an aside, AHCI mode will enable SATA NCQ, if the controller supports it.

Hope that helps.


What if I replace the Promise controller with an Adaptec 2405? I confirmed it's supported. It has 128MB cache but no battery backup. I don't see the point of a battery because I'm screwed anyway if the UPS goes out. This is the least expensive supported controller I've found with a cache. New anyway... haven't checked eBay yet.

I've read posts where folks say ESXi will wait for writes to complete without BBWC but how does it know the difference between a caching controller with or without a battery? Isn't this all handled in the controller anyway?

Also just to be clear, this isn't part of a RAID configuration. Just two separate disk drives.

Thanks again!


Having re-read this, I'm not convinced there is a caching issue here, as 42 GB in 32 minutes is roughly 22 MB/s, which is way more than we'd expect if caching were disabled (more like 5 MB/s).
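The back-of-envelope figure can be checked with shell arithmetic, using the 42 GB size and 32-minute duration quoted earlier in the thread:

```shell
# 42 GB copied in 32 minutes, expressed as MB/s (integer arithmetic).
GB=42
MIN=32
MBS=$(( GB * 1024 / (MIN * 60) ))
echo "${MBS} MB/s"   # -> 22 MB/s
```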

I'd suggest installing IOMeter on a Windows guest on the system, assigning a 2 GB thick-provisioned disk to the VM from each physical drive, then running a 32 KB 100% sequential read workload against each, individually, with 16 outstanding I/Os. Then repeat for write. Run each test for at least a minute, and note it may take a while to fill the drives with test data first.

If there is a write caching issue, the read and write performance will be very different.
