VMware Cloud Community
dkleeman
Contributor

Disk write rate is double read rate

Hi.

I am running ESX 3.0.1 Starter Edition on a Dell 2950 server with direct-attached hardware RAID. Whenever I look at the disk stats during an internal disk-to-disk copy, I see that the write rate is double the read rate. Is this normal behaviour, and if so, why?

Please see this screenshot:

Thanks

Daniel Kleeman

Cambridge, UK

20 Replies

dkleeman
Contributor

Does no one else see this behaviour when doing a disk-to-disk copy?

BenConrad
Expert

Does 'esxtop' (type 'd' to get the disk stats) on the ESX server give you the same information?

dkleeman
Contributor

Yes, I see the same kind of stats using ESXTOP.

RParker
Immortal

Could be a display bug. I would upgrade to 3.0.2 and make sure you have ALL the patches installed. Then try the test again.

I don't know what RAID you are using, but it shouldn't matter; with ANY setup, writes shouldn't outpace reads, as writes take longer than reads. It must be a bug in the RAID reporting on ESX; maybe it's swapping the two figures.

ronzo
Hot Shot

Hi Daniel-

Besides looking at esxtop, you should try the 'phdcat' test. We include 'phdcat' with esXpress as a handy all-purpose test command.

(You can install the esXpress RPM, copy out the PHDCAT, then remove esXpress.)

Basically phdcat is cat with stats.

Go to a folder on the VMFS or local storage.

To test write speed:

phdcat /dev/zero > zero

The write test reads NULLs from memory and writes them to a file. This is a real representation of how fast you can write to the disk from within the console.

Read Test:

phdcat some-flat.vmdk > /dev/null

The read test will read from the disk and write to /dev/null, so you get a good idea of how fast you can read.

phdcat is just cat, so it will run until you hit ^C (Control C) or run out of space. Remember to delete the 'zero' file you created.
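If you don't have esXpress installed, plain 'dd' from the service console gives a similar picture (a rough sketch; the block size, count, and file names are arbitrary):

# Write test: stream zeros from memory to a file on the VMFS volume
time dd if=/dev/zero of=zero bs=1M count=2048

# Read test: stream an existing flat VMDK to /dev/null
time dd if=some-flat.vmdk of=/dev/null bs=1M

# MB transferred divided by elapsed time gives MB/s; delete the test file when done
rm zero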

On my Dell 1950, with local mirror:

phdcat /dev/zero >zero

4612 MB processed Avg(61.5 MB/s) Cur(87.2 MB/s)

phdcat Demos-flat.vmdk >/dev/null

4146 MB processed Avg(414.6 MB/s) Cur(329.1 MB/s)

On my iSCSI-attached storage on this Dell:

phdcat /dev/zero >zero

810 MB processed Avg(19.8 MB/s) Cur(17.0 MB/s)

phdcat zero >/dev/null

768 MB processed Avg(48.0 MB/s) Cur(68.2 MB/s)

thanks

ron

dkleeman
Contributor

Thanks to RParker and ronzo for their comments. The RAID is the built-in Dell PERC controller, which is supported by ESX. Yes, upgrading to the latest patch level must be a good thing (but not possible until some downtime can be scheduled).

The question relates not to disk read/write speeds but to the fact that, on a long-running copy, the write rate is consistently double the read rate. This would seem to be impossible/ridiculous. Are people seeing matched read and write rates when they do a disk-to-disk copy? This is what I would expect.

Daniel

christianZ
Champion

Maybe all this is normal and depends on the controller's cache algorithms; as I remember, I have seen this phenomenon too.

Stefan_Sarzio
Contributor

I am experiencing the same problem with a Samba server VM on an IBM server. Data is read by the service console at ~13 MB/s and written by the Samba server VM at ~27 MB/s. The VI Client as well as esxtop report the same values.

DannoXYZ
Contributor

I see this all the time and I think it's normal. RAID controllers give enhanced write performance thanks to their caching algorithms: they buffer the data stream and re-order the commands to put the data onto the disk in a continuous stream. This improves throughput immensely. The host and VMs can send write data as fast as the controller and its buffer can keep up.

Reads, on the other hand, are at the mercy of the physical layout on the platters. The data can only come off the surface so fast, and no amount of caching or intelligent algorithms can improve it. The order of the sector reads has to be sequential to match the file's structure: block 1, block 2, block 3, etc. But these blocks may be scattered all over the drive, adding latency as the head moves back and forth... sometimes a LOT of it.

Stefan_Sarzio
Contributor

I'm not seeing this only on small burst transfers, but on continuous transfers as well, which rules out buffering on the controller side as the reason.

DannoXYZ
Contributor

Command-queuing helps continuous writes too. Small burst transfers can typically be 20-50x faster than reads, while continuous ones are about 2x as fast. As a test, you can reconfigure the RAID card's buffer for write-through operation instead of write-back, or disable the buffer completely. In that case you'll see write speeds drop tremendously, yet read transfers will stay pretty much the same.
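As a rough sketch: if the PERC is LSI-based and the MegaCli utility happens to be installed (both assumptions; the controller BIOS exposes the same setting), toggling the cache policy would look something like this:

# Sketch only: switch all logical drives on all adapters to write-through (WT),
# re-run the copy test, then restore write-back (WB). Verify the exact syntax
# against your controller's documentation first.
MegaCli -LDSetProp WT -LAll -aALL
# ... repeat the disk-to-disk copy and compare read/write rates ...
MegaCli -LDSetProp WB -LAll -aALL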

rajkn
Enthusiast

I would look for:

Do the reads and writes use the same block size?

Are the reads and writes both random, or both sequential?

Do both use direct or cached I/O?

I have observed that writes with a block size that's a multiple of 4K give better throughput than ones that are not. IMO, there are so many variables out there that could make one do better than the other. If everything else is the same, then yes, I agree: reads should do better than writes.
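One quick way to check the block-size point from the service console (a sketch with plain 'dd'; sizes and file names are arbitrary, and both runs write roughly the same ~400 MB):

# Aligned: 4 KB blocks, a multiple of 4K
time dd if=/dev/zero of=test-4k bs=4096 count=100000

# Unaligned: 5000-byte blocks, not a multiple of 4K
time dd if=/dev/zero of=test-odd bs=5000 count=80000

# Compare the elapsed times, then clean up
rm test-4k test-odd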

~rajkn

Stefan_Sarzio
Contributor

Danno - if I constantly read at 10 MB/s and constantly write at 20 MB/s (with no other tasks/programs/VMs running)...then that has nothing at all to do with buffering, command-queuing or anything like that. Neither buffering nor command-queuing "invents" or doubles data.

DannoXYZ
Contributor

Yeah, it makes no sense that the exact same amount of data would show up with twice the write rate. That would make more sense if you were also writing to that disk from another source. What we humans may be thinking of is overall or average throughput: take the total size of the file and divide by the total time it takes to read or write it.

The display, however, may be working in much smaller time-slices. The actual disk performance may reflect what you're seeing, but the indicator may be misleading: it could be showing instantaneous transfer rates over short time-spans. So if the reads occur over 1 sec and the writes take only 0.5 sec followed by 0.5 sec of waiting, the indicator might show just the maximum rate during the 0.5 sec that it's actually writing, or use any arbitrarily shorter or longer time-span.

The other thing that's screwy is that the "average" disk usage is faster than any individual drive's read OR write speed. Where did it get the 14 MB/s peak disk-usage number when none of my disks got above 7 MB/s? I think what we're looking at is different time-frames for the indicators.

DannoXYZ
Contributor

OK, I figured it out by separating read and write operations. I took a virgin WinXP VM and copied a 300 MB ISO file from my local storage to my SAN storage, then deleted the file and copied it back. The SAN is an HP EVA5000 and my ESX host is hooked up to it with a 1 Gb NIC, so I can be fairly sure that the network traffic and the SAN's performance are identical in both directions. Here are the results:

READ performance: 115 seconds = 2.6 MB/s

WRITE performance: 42 seconds = 7.1 MB/s

By separating the reads from the writes, it's pretty simple to see that write speeds are about 2.5x faster than reads. In the OP's case, what's happening when copying a file locally is that the computer is going back and forth very quickly between read and write operations. It might read in 20 MB, then write it, then read another 20 MB, then write it, and so on: 1 sec reading, then 0.5 sec writing, then 1 sec reading, then 0.5 sec writing, or whatever data-chunk size or time-slice you prefer. It basically spends twice as much time reading as it does writing, and the performance graph shows the instantaneous read and write speeds at each moment, or the average over whatever time-slice the graph is plotted with.

As a test, I duplicated that file on the local storage disk. Overall, I got 161 seconds, almost exactly the sum of the individual read and write tests above. So during the copy, the computer writes about 2.5x as fast as it reads for the exact same amount of data, and correspondingly spends about 2.5x as much time reading that data as writing it.
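A quick sanity check of those figures (plain awk, using only the numbers from the tests above):

# 300 MB in 115 s reading and 42 s writing:
awk 'BEGIN { printf "read %.1f MB/s, write %.1f MB/s, ratio %.1f\n", 300/115, 300/42, 115/42 }'
# -> read 2.6 MB/s, write 7.1 MB/s, ratio 2.7
# Copy = read + write back to back: 115 + 42 = 157 s, close to the 161 s measured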

Schorschi
Expert

Is the data stream unique data? Are you really forcing the controller to read directly from disk? Is the write to disk using cache, so the read just looks slow in comparison? Pulling truly unique data on a read will always seem slow if the comparative write is cached. This is correct behavior, exactly what a write cache should do: control is released back to DMA and the mainboard processor is told that the write is done while the data is really still pending a write to disk.

Is the PERC card in the setup (BIOS) set for 50% write / 50% read caching? It feels like the PERC card is set to 100% write cache. It's been a while since I've been in the low-level setup for a PERC card, but I recall that on an HP Smart Array it was rather simple to change the caching bias between reads and writes.

urs_weber
Contributor

[Attachment: Capture.JPG]

Hello

I have often wondered about this strange behaviour too, and it's still there in version 4.1.

It has nothing to do with RAID, cache, or block size. ESX can't "see" the RAID or how much is really written to the physical disks; it just sees a LUN. The same goes for caching. If the VMFS block sizes differ between source and destination, that may affect performance and show different I/O counts, but it can't double the amount of data written to the destination disk (the area under the writes chart should equal that of the reads).

Maybe ESX is zeroing out the destination VMDK before writing to it, but this makes no sense to me.

I also checked the counters on the storage system, and the array likewise receives double the amount of data compared to the reads. So ESX really does write twice the amount to the destination disk; it's not a bug in the performance counters or the chart.
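One way to test the zeroing theory (a sketch; the paths are placeholders and it assumes an ESX 4.x vmkfstools): clone a disk to an eager-zeroed destination, which writes its zeros up front rather than on first write, and see whether the copy phase then shows matched read and write rates:

# Hypothetical test: with an eagerly zeroed destination, no zero-on-first-write
# should happen during the data copy itself (paths are placeholders)
vmkfstools -i /vmfs/volumes/source-ds/vm/vm.vmdk -d eagerzeroedthick /vmfs/volumes/dest-ds/vm/vm-copy.vmdk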

Is there anybody out there who can explain why this happens and how to change it? I'd like to avoid needless I/Os.

Regards

Urs

tbraechter
Contributor

I'm currently out of the office. I will be back on 26.04.11. In urgent cases, please contact Thomas Dierkes (Thomas.Dierkes@havilog.com) or Daniel Lübbe (Daniel.Luebbe@havilog.com).

joelasaro
Contributor

Wow, this is an old thread, and still no one has answered it. Wish I could, but all I can do is add my voice. Tonight I was doing Storage vMotions and saw the exact same thing: my write rate (KBps) is double the read rate (KBps). I don't see any way to explain that other than the ESXi 4.1 server (in my case) writing everything about twice. But why?
