VMware Cloud Community
jrmunday
Commander

RAMDISK Support

Hi All,

Does anyone know whether the use of RAM disks is supported on virtual machines?

For example:

Disk Cache, Hybrid Disk Cache and RamDisk

I have tested it, so I know that it works, but I am interested to hear whether anyone has experience of using this in a production environment (assuming it is supported).

In terms of results, the performance enhancements are very favorable, with one possible showstopper: CPU utilization is maxed at 100% even at low queue depths.

Any feedback appreciated - thanks!

Kind Regards,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
5 Replies
dariusd
VMware Employee

I don't see any reason why an in-guest RAMDISK would not be supported, although you'd have to be careful with sizing the disk and the VM such that guest swapping, VM ballooning or VM swapping issues didn't negate the performance gains.

Are you sure it's not just the workload that is consuming 100% CPU?  With zero-latency "disk" I/O, that storage cannot become the workload's bottleneck, and a good disk-based workload will then run at such speed that it ends up consuming 100% (or very nearly 100%) of CPU time.

Queue depth should be meaningless for a zero-latency storage device.

Cheers,

--

Darius

jrmunday
Commander

Hi Darius,

Thanks for the reply. I was planning on reserving all memory for the guest VM to address the concerns about ballooning, swapping, etc.
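For reference, scripting the reservation rather than clicking through the vSphere Client would look roughly like this pyVmomi sketch (the vCenter address, credentials and VM name below are placeholders, not our actual environment):

   import ssl
   from pyVim.connect import SmartConnect, Disconnect
   from pyVmomi import vim

   # Lab-only shortcut: skip certificate verification.
   ctx = ssl._create_unverified_context()
   si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                     pwd="********", sslContext=ctx)

   # Find the VM by name (the name here is a placeholder).
   content = si.RetrieveContent()
   view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
   vm = next(v for v in view.view if v.name == "RAMDISK-TEST")
   view.DestroyView()

   # Reserve all of the VM's configured memory so it can't be ballooned or swapped;
   # memoryReservationLockedToMax is the "Reserve all guest memory (All locked)" checkbox.
   spec = vim.vm.ConfigSpec()
   spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=vm.config.hardware.memoryMB)
   spec.memoryReservationLockedToMax = True
   vm.ReconfigVM_Task(spec)

   Disconnect(si)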

Another issue I noticed is that the RAMDISK software misaligns the partition; I haven't yet looked into whether you can manually specify a starting offset to align it correctly.
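For anyone curious, checking alignment is just a matter of whether the partition's starting offset is a multiple of the boundary you care about; a quick sketch (the offsets below are example values only, and on Windows the real offset can be read with "wmic partition get Name,StartingOffset"):

   # Quick alignment check; a 1 MiB boundary covers the common 4K/64K cases.
   def is_aligned(starting_offset_bytes, boundary=1024 * 1024):
       return starting_offset_bytes % boundary == 0

   print(is_aligned(32_256))      # 63 * 512 bytes -- the classic misaligned offset -> False
   print(is_aligned(1_048_576))   # 1 MiB -> True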

For interest, here are the results of my testing:

Mirrored SAN:

IMG_2384.PNG

RAMDISK:

IMG_2383.PNG

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
dariusd
VMware Employee

Cool graphs.  :smileycool:

They're pretty much what I'd expect, including the CPU utilization of the RAM disk benchmark.

If you look at the measurement of CPU work during a regular benchmark, it will look something like this pseudocode, illustrating a simple sequential read benchmark, queue depth 4:

   Benchmark starts.

   Benchmark tool issues a single asynchronous I/O read for block 1 into queue slot 1
   Benchmark tool issues a single asynchronous I/O read for block 2 into queue slot 2
   Benchmark tool issues a single asynchronous I/O read for block 3 into queue slot 3
   Benchmark tool issues a single asynchronous I/O read for block 4 into queue slot 4

   while (there are outstanding I/Os) {
      Wait for an I/O to complete.  The CPU will be idle until one of the I/Os is complete... This is the only reason that the benchmark's CPU utilization can be below 100%!
      Record the latency of the completed I/O.
      if (we need to read more data yet) {
         Benchmark tool issues a single asynchronous I/O read for the next block into whichever queue slot is now available
      }
   }

   Benchmark ends.
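If you want to play with that loop yourself, here is a rough, runnable Python sketch of it. The thread pool and per-call file opens are only stand-ins for real asynchronous I/O, and the file path and sizes are placeholders; a proper tool like Iometer drives the device far more directly:

   import time
   from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

   TEST_FILE = "testfile.bin"   # placeholder -- a pre-created file on the RAM disk or SAN volume
   BLOCK_SIZE = 64 * 1024       # 64 KiB reads, as an example
   QUEUE_DEPTH = 4
   TOTAL_BLOCKS = 10_000        # TEST_FILE must be at least TOTAL_BLOCKS * BLOCK_SIZE bytes

   def read_block(block_no):
       """Issue one read and return its latency in seconds."""
       with open(TEST_FILE, "rb", buffering=0) as f:   # a real benchmark would keep handles open
           f.seek(block_no * BLOCK_SIZE)
           start = time.perf_counter()
           f.read(BLOCK_SIZE)
           return time.perf_counter() - start

   latencies = []
   next_block = 0
   with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
       # Fill the queue: issue QUEUE_DEPTH reads up front.
       outstanding = set()
       while next_block < QUEUE_DEPTH:
           outstanding.add(pool.submit(read_block, next_block))
           next_block += 1
       # Main loop: wait for a completion, record its latency, then keep the queue full.
       while outstanding:
           done, outstanding = wait(outstanding, return_when=FIRST_COMPLETED)
           for fut in done:
               latencies.append(fut.result())
           while next_block < TOTAL_BLOCKS and len(outstanding) < QUEUE_DEPTH:
               outstanding.add(pool.submit(read_block, next_block))
               next_block += 1

   print(f"{len(latencies)} reads, average latency {sum(latencies) / len(latencies) * 1e6:.1f} us")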

Now, for a RAM disk benchmark, the latency (the "Wait for an I/O to complete" step above) is always zero, assuming no swapping at any level.  The I/Os issued can complete immediately: the only delay is the time taken to copy the block(s) from the RAM disk's memory range into the memory used by the benchmark tool, which is a synchronous "solid state" delay... It does not depend on any rotating media, nor on the request being transmitted across a link to a physical disk device and the block data being transmitted back, etc.

Because the latency is zero and the I/Os can complete immediately, there is absolutely no reason for the CPU to ever be idle... It will always either be working on behalf of the benchmark tool (recording the latency stats, preparing the next I/O request), or on behalf of the RAM disk (copying the data from its memory buffer into the buffers used for the I/O), or on the various bits of kernel and userspace glue that keep things working.  All of the other steps in the loop are this work, and it is all real CPU utilization.

With no reason for the CPU to ever be idle (i.e. with the wait step effectively removed), there is no reason for CPU utilization to ever drop below 100% while benchmarking.

This does not mean that a real-world workload on a RAM disk will always lead to 100% CPU utilization... Only workloads which would otherwise be constrained by I/O bandwidth will now instead become constrained by CPU speed, and will consume all available CPU power while the workload is actively "working" with no other resource constraints.

Now if you measure CPU utilization per I/O, that would tell a different story... the RAM disk's CPU utilization per I/O may well be higher than that of a good disk controller, since the disk controller will do DMA and avoid burdening the CPU with actually copying the data around, while the RAM disk would need the CPU to copy the data.
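As a rough way to compare the two runs on that basis, CPU time per I/O can be estimated from the average CPU utilization, the vCPU count and the measured IOPS; the figures below are made up purely for illustration:

   def cpu_us_per_io(avg_cpu_pct, vcpus, iops):
       """Estimated microseconds of CPU time consumed per I/O."""
       return (avg_cpu_pct / 100.0) * vcpus * 1_000_000 / iops

   print(cpu_us_per_io(avg_cpu_pct=100, vcpus=2, iops=120_000))  # RAM disk run (made-up figures) -> ~16.7 us per I/O
   print(cpu_us_per_io(avg_cpu_pct=20,  vcpus=2, iops=35_000))   # SAN run (made-up figures)      -> ~11.4 us per I/O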

TL;DR -- Don't be afraid of 100% CPU utilization for an artificial benchmark running against a RAM disk.  A benchmark designed to exercise an I/O bottleneck will become CPU-bound when the I/O has no latency.

Hope this helps!

--

Darius

jrmunday
Commander

Cool graphs.  :smileycool:

Thanks, it's just Iometer CSV results mashed into an Excel pivot table ... simples :smileyhappy:

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
admin
Immortal

Having worked 2.5 years in the ESXi and vCenter Server support department in GSS, I would not have denied support for such a solution, though I would simply have referred you to the vendor if you had any issues. Basically, we treated the VM as a black box: we make sure it can boot, and everything after that is mainly an issue for the OS/application vendor to resolve. There are some exceptions to that, such as performance being extremely poor in a proper comparison, but I hope that gives you some general guidance.
