VMware Cloud Community
mike_mcc14
Contributor

Where am I losing performance? Is it the Read Cache Reservation?

I'll provide a lot of pictures for this topic, but essentially we have one host that sees higher latency than the others, and I'm trying to figure out why. I think I've narrowed it down to one VM that is getting a lot of read cache misses.

pastedImage_0.png

In the above picture you can see that it isn't "bad" but is definitely worse than the other hosts. (This is the VSAN Client view).

pastedImage_1.png

This above view doesn't really show any issues at all. Now for the VM I "think" could be making the host look bad:

pastedImage_2.png

The above image is just one of the VMDKs on the VM; this particular one is only used for Microsoft VSS snapshots of some CIFS shares. Below is the image for the VMDK that holds the data stored in the CIFS share:

pastedImage_3.png

pastedImage_5.png

I've circled the RC Hit Rate for this vmdk, which seems quite poor compared to all of the other VMs in our environment.

The one thing that sets this VM apart from the others is that I have specified a "Read Cache Reservation" of 2% = 6.3GB (315GB VMDK). So here are my main questions:

  1. Does the "Read Cache Reservation" act as some kind of limit, where it will never go above that 6.3GB?
  2. Does the "Read Cache Reservation" have an impact on write performance/latency? For example, is the write cache for this VMDK limited to 30% of the value specified? (6.3GB x 0.3 = 1.89GB for write cache?)
  3. Am I approaching this the wrong way, and/or should I just be happy with what I'm getting?
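
For reference, the arithmetic behind questions 1 and 2 can be sketched as below. This is only a worked calculation: the 30% write share is the guess from question 2, not confirmed vSAN behavior, and the function names are made up for illustration.

```python
def read_cache_reservation_gb(vmdk_size_gb: float, reservation_pct: float) -> float:
    """Reserved read cache, expressed as a percentage of the VMDK size."""
    return vmdk_size_gb * reservation_pct / 100.0


def hypothetical_write_share_gb(reservation_gb: float, write_fraction: float = 0.3) -> float:
    """ASSUMPTION from question 2: write cache is capped at 30% of the reservation."""
    return reservation_gb * write_fraction


reservation = read_cache_reservation_gb(vmdk_size_gb=315, reservation_pct=2)
print(round(reservation, 2))                               # 6.3
print(round(hypothetical_write_share_gb(reservation), 2))  # 1.89
```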

My main reason for asking #2 is that I'm seeing pretty high write latencies for this VM. It has 100% space reserved, and since all writes are supposed to hit an SSD, it seems like write latency should be pretty low. Here is an image of the latencies:

pastedImage_8.png

It may be worth noting that this VMDK is used by a Windows Server 2012 VM running deduplication.

3 Replies
SomethingStrang
Contributor

A quick question: what SSDs are you using in each disk group?

CHogan
VMware Employee

Just some thoughts - they might help, they may not. I guess the bottom line is that you wish to reduce the latency for that VM, correct?

Do you know anything about the workload in that VM? Does it involve prolonged periods of sequential reads or writes?

Along with the lower RC hit rate, does this VM also show more evictions than other VMs?

The fact that this VM is having read cache misses implies that it has to go to the spinning/magnetic disk layer to retrieve the data block. This will of course increase latency.
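
To put rough numbers on that, here is a minimal sketch of average read latency as a weighted mix of cache hits and misses. The SSD and magnetic disk latencies are placeholder assumptions, not measurements from this environment, and the model ignores queuing and network effects.

```python
def expected_read_latency_ms(hit_rate: float, ssd_ms: float, hdd_ms: float) -> float:
    """Average read latency when hits are served from SSD and misses from magnetic disk."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * ssd_ms + (1.0 - hit_rate) * hdd_ms


# Placeholder latencies (assumptions): 0.2 ms for an SSD hit, 8 ms for a magnetic disk read.
for hit_rate in (0.95, 0.70, 0.40):
    latency = expected_read_latency_ms(hit_rate, ssd_ms=0.2, hdd_ms=8.0)
    print(f"RC hit rate {hit_rate:.0%}: ~{latency:.2f} ms average read latency")
```

Even a modest drop in hit rate moves the average a long way toward the magnetic disk figure, which is why the RC hit rate is worth comparing across VMs.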

If you are also seeing evictions on this VM more than others, it "could" mean that the write buffer is filling up and thus blocks need to be evicted to make room for new writes. This will also increase latency.

A couple of things:

- the read cache reservation could be relevant. Without a reservation, all VMs will share the cache. With a reservation, you are giving this VM a chunk of cache for its own use. Try removing that reservation and see if things improve...

- what is the magnetic disk configuration on the hosts? Are you using a stripe width, or is it a single spindle (no striping)? I'm wondering if a stripe width would improve the latency value in the case of read cache misses and cache evictions. Can you try that too?

HTH

Cormac

http://cormachogan.com
mike_mcc14
Contributor

Thanks for all the replies guys, here are the answers I have for the moment:

SomethingStrange wrote:

A quick question: what SSDs are you using in each disk group?

We have an Intel S3700 400GB SSD in each disk group.

Cormac:

  • I am looking to reduce latency, that is correct.
  • The workload has very random reads/writes, as it is a Windows file server that holds user profiles, redirected folders, some application data, and all of our network shares/folders.
    • With about 50 users accessing the VM for all of those purposes, it is probably being hit by reads/writes 100% of the time during business hours. On average it is probably only 50 cmds/s, but it can burst into the thousands when someone reads or saves large files.
  • Evictions - For the most part, evictions sit at zero for this VM. I did see two periods of evictions in the last hour, but they never went above "1" on the graph.
  • Read Cache Reservation - I'll go ahead and remove it to see if there are any differences.
  • Magnetic Disk configuration - Each host has 4 magnetic disks, 600GB 10K SAS. The VM in question is using a storage policy that applies two stripes and 100% space reservation.

Thanks again!
