Enthusiast

Read Cache Hit Rate below 90%

Hi All

I have 2 identical VxRail clusters, and on just one of them (the one with less workload) I observe a low Read Cache Hit Rate on many disk groups (one DG looks worse than the others).

CONFIGURATION:

=========================================================

4x VxRail nodes P570 Hybrid ver 4.5.400-14097254

External vCenter 6.5 build 14690228

vSAN config

Hybrid, 4 disk groups (each 1x 800GB cache + 5x 1.2TB magnetic disks)

=========================================================

ISSUE: LOW Read Cache Hit Rate

vSAN health is all green except vSAN Disk Balance (same as on the other cluster). This is an ongoing issue; VMware recommends waiting, as it should balance itself but needs time.

The other cluster has the same configuration and type of workload (even more VMs) and is not experiencing this issue.

How can I identify what is causing this problem? If it is purely related to the workload access pattern (lots of random reads), which I doubt since the other cluster is OK, how can I identify those VMs and what can I do to improve reads from cache (for example, add more stripes)? VMware hasn't given me any answers so far; they just keep recommending an upgrade, which is not an option at the moment, at least until we get a proper RCA.

Thanks

Accepted Solution

VMware Employee

Hello szafa

So, you use the words 'issue' and 'problem' multiple times here: do you actually have a perceivable performance issue in this cluster, or are you just getting tired of the vROps alarm?

I ask this very basic question because I have seen this vROps alert triggered for dozens of clusters where the VMs are working away happily and the cluster hardware is perfectly adequate for the workload. If the answer to the above is the former, then consider improving the Cache:Capacity ratio of the Disk-Groups; if it is just the latter, then disable this alarm.
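To put numbers on the Cache:Capacity ratio, here is a minimal Python sketch for one Disk-Group. It assumes decimal GB, that roughly 70% of a Hybrid Disk-Group's cache SSD serves as read-cache, and that the capacity disks in the question are 1.2TB each; the function name `disk_group_cache_ratio` is hypothetical and not part of any VMware tool.

```python
def disk_group_cache_ratio(cache_gb: float, capacity_disks: int,
                           capacity_disk_gb: float,
                           read_cache_fraction: float = 0.70) -> dict:
    """Raw capacity, read-cache size and cache:capacity ratios for one
    Hybrid Disk-Group (sketch; 0.70 is the approximate read-cache share)."""
    raw_capacity_gb = capacity_disks * capacity_disk_gb
    read_cache_gb = cache_gb * read_cache_fraction
    return {
        "raw_capacity_gb": raw_capacity_gb,
        "read_cache_gb": read_cache_gb,
        # Whole cache device relative to raw capacity-tier space
        "cache_to_capacity_pct": 100.0 * cache_gb / raw_capacity_gb,
        # Only the read-cache portion relative to raw capacity-tier space
        "read_cache_to_capacity_pct": 100.0 * read_cache_gb / raw_capacity_gb,
    }


# One Disk-Group: 1x 800GB cache SSD + 5x 1.2TB HDDs (1.2TB is an assumption)
dg = disk_group_cache_ratio(cache_gb=800, capacity_disks=5, capacity_disk_gb=1200)
for key, value in dg.items():
    print(f"{key}: {value:.1f}")
```

Under these assumptions each Disk-Group has about 560GB of read-cache in front of 6000GB of raw capacity (roughly 9.3%). Whether that is enough depends on the size of the actively read data, not on total capacity; raising the ratio means a larger cache SSD or fewer/smaller capacity disks per Disk-Group.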

To explain the basics of how a read-cache works in a vSAN Hybrid cluster, here is an analogy:

You have 2 bookshelves: one is relatively small but nice and close to where you read every evening, and it can hold 10 books; the other can hold 100 books but is on the other side of your house.

You have a good book (book1) that you read bits of every evening, so you place it on your small but close shelf. You then get another book (book2) that you find yourself referencing frequently, and you store it on the small but close shelf too.

Time goes by and you have now amassed 12 books on this topic, so you cannot fit them all on the small but close bookshelf. The original books (book1 and book2) are no longer the ones you read every night; you only reference them infrequently (or at least less frequently than the others), so you store them on your larger but farther-away bookshelf.

In case it was not abundantly obvious, the small bookshelf is the read-cache portion of the Cache-tier SSD of a Disk-Group, the large bookshelf is the cumulative storage of the Capacity-tier HDDs of the same Disk-Group and the 'books' are the data stored on this Disk-Group.

The read-cache of a Disk-Group (~70% of the SSD size) is only so large, and thus at any one time it can only contain a subset of all of the data in the Disk-Group. Sure, you *could* size the Disk-Groups so that the read-cache was the same size as the cumulative capacity of the Capacity-tier, but for most purposes this would be poor design and wasteful.

Thus, on a read IO, if the data is not found in the read-cache it has to be read from the slower HDDs; this is a read-cache 'miss'. Do not confuse or compare this with anything else, such as dropped IOs or packets: it's not that it didn't find the data or failed to do its job, it just had to travel to the farther-away bookshelf. It had to do this because at some point it INTENTIONALLY moved that data off its near bookshelf, as some other data on that Disk-Group was being read and/or accessed more often (these are called 'evictions').
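The bookshelf behaviour can be reproduced with a toy simulation. This minimal Python sketch uses a plain least-recently-used (LRU) eviction policy as a stand-in; that is an assumption for illustration, not a description of vSAN's actual read-cache algorithm, and `ToyReadCache` and `simulate` are hypothetical names. It shows that the hit rate follows from how the actively read data compares with the cache size, not from anything being broken.

```python
import random
from collections import OrderedDict


class ToyReadCache:
    """Toy read-cache holding `capacity` blocks with LRU eviction (assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def read(self, block: int) -> None:
        if block in self.blocks:
            self.hits += 1                  # found on the near bookshelf
            self.blocks.move_to_end(block)  # mark as most recently used
            return
        self.misses += 1                    # fetched from the far bookshelf (HDD)
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
            self.evictions += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0


def simulate(cache_blocks: int, working_set: int, reads: int,
             seed: int = 0) -> ToyReadCache:
    """Issue uniformly random reads over `working_set` distinct blocks."""
    rng = random.Random(seed)
    cache = ToyReadCache(cache_blocks)
    for _ in range(reads):
        cache.read(rng.randrange(working_set))
    return cache


small = simulate(cache_blocks=10, working_set=8, reads=10_000)    # fits in cache
large = simulate(cache_blocks=10, working_set=100, reads=10_000)  # 10x the cache
print(f"working set fits: {small.hit_rate:.1f}% hits, {small.evictions} evictions")
print(f"working set 10x:  {large.hit_rate:.1f}% hits, {large.evictions} evictions")
```

When the working set fits on the small shelf, only the first read of each block misses. With uniformly random reads over ten times the cache size, the hit rate settles at around 10% and evictions are continuous, yet every read still completes.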

More information on this topic can be found here:

VMware Knowledge Base

https://blogs.vmware.com/virtualblocks/2019/04/18/vsan-disk-groups/

Bob

Enthusiast

TheBobkin

Thanks for the answer and the great explanation of how read cache works :)

I will check with the users whether any VM is experiencing performance issues. I guess not, and I will ask them to disable the alert or lower its threshold, but I know they will ask why we have that vROps alert :)

Could you help me with how I can convince them that all is fine with that cluster (they don't have any baseline and have never run any test such as HCIBench)? In short, what other metrics, in conjunction with a low "Read Cache|Hit Rate (%)", would indicate a performance issue? The other problem they experience is a proactive rebalance that never ends, so they are afraid that vSAN is not working correctly, and I need good arguments to reassure them that all is good.

Thanks in advance

Szafa

VMware Employee

Happy to help. As a support engineer (and more so as a mentor to our new colleagues), I find that we can sometimes get lost in the technical stuff, so time spent on an adequate analogy is always time well spent.

While checking performance metrics from the VM/OS side is good practice, you should be checking the vSAN Client and Backend Performance statistics. These can be checked at the per-cluster and per-node level, and in 7.0 U1 also at the per-VM and per-vmdk level. The Front-end/VM/Client level basically shows the speeds and feeds as they reach the VM (in technical terms, the vscsi layer); these can be accessed at Cluster/Host > Monitor > vSAN > Performance.

In this case you should be validating that read latency and throughput are within reason for this type of cluster (while Hybrid clusters vary massively, 5-10ms is likely reasonable) and relatively consistent, e.g. no unexpected large spikes of 50-100ms. If this is a '9to5' cluster, then relatively large spikes during the morning and post-lunch boot/log-on periods are generally expected.
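As a rough way to apply those rules of thumb to numbers exported from the vSAN performance charts, a minimal Python sketch could summarise average and 95th-percentile read latency and list only the spikes that fall outside expected busy hours. How the data is exported, the `(hour, read_latency_ms)` sample format and the name `summarize_read_latency` are all assumptions; the 10ms and 50ms defaults are the figures quoted above, not official limits.

```python
import math
from statistics import mean


def summarize_read_latency(samples, typical_max_ms=10.0, spike_ms=50.0,
                           busy_hours=frozenset()):
    """Summarise (hour_of_day, read_latency_ms) samples (hypothetical format)."""
    samples = list(samples)
    latencies = sorted(lat for _, lat in samples)
    avg = mean(latencies)
    # Nearest-rank 95th percentile
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    # Spikes inside expected busy hours (e.g. morning log-on) are tolerated
    unexpected_spikes = [(hour, lat) for hour, lat in samples
                         if lat >= spike_ms and hour not in busy_hours]
    return {
        "avg_ms": avg,
        "p95_ms": p95,
        "avg_within_typical": avg <= typical_max_ms,
        "unexpected_spikes": unexpected_spikes,
    }


# Hypothetical samples: 4 per hour at 6ms, plus three spikes
samples = [(h, 6.0) for h in range(24) for _ in range(4)]
samples += [(8, 70.0), (13, 55.0), (2, 90.0)]
report = summarize_read_latency(samples, busy_hours={8, 9, 13})
print(report)
```

Here the 08:00 and 13:00 spikes are treated as expected log-on load and only the 90ms spike at 02:00 is flagged. A persistently high average or p95, rather than a low cache hit rate on its own, would be the signal worth investigating.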

Regarding proactive rebalance: this is by design a slow and minimal process. For example, if there is 30% variance (the default threshold) between the highest- and lowest-used disk, running it isn't going to try and bring all the disks to ~1% variance, because 1. this won't always be possible (you may just move where the imbalance is) and 2. unnecessarily moving data isn't always a good thing. If you run this and it gets the highest-lowest disparity *just* slightly below the health alert trigger (30% disparity), then it will likely trigger again within days/weeks, as whatever is on the higher-used disk is potentially growing faster than what is on the lowest-used one.

The fix for this is to update to 6.7 U3, where this is a toggle-switch option that deals with it in the background (and in a more intelligent manner) without administrators having to start it manually. The option for now (before upgrading, or if upgrading is not possible) would be to use proactive rebalance via RVC, where a lower variance threshold (e.g. 15-20%) can be applied.
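To illustrate why a rebalance that stops just under the threshold can re-trigger, here is a minimal Python sketch. It assumes the variance is simply the difference in used percentage between the fullest and emptiest disk (as described above) and that each disk's usage grows linearly; the function names, disk usages and growth figures are hypothetical.

```python
def disk_variance(used_pct):
    """Spread between the most- and least-used disk, in percentage points."""
    return max(used_pct) - min(used_pct)


def exceeds_threshold(used_pct, threshold_pct=30.0):
    return disk_variance(used_pct) > threshold_pct


def weeks_until_retrigger(used_pct, growth_pct_per_week, threshold_pct=30.0,
                          max_weeks=520):
    """First week in which the variance exceeds the threshold, assuming
    constant per-disk growth; None if it never does within max_weeks."""
    usage = list(used_pct)
    for week in range(max_weeks + 1):
        if exceeds_threshold(usage, threshold_pct):
            return week
        usage = [u + g for u, g in zip(usage, growth_pct_per_week)]
    return None


growth = [1.0, 0.2, 0.3, 0.1]  # the fullest disk also grows fastest

# Left just under the 30% trigger after a rebalance
after_rebalance = [70.0, 45.0, 50.0, 42.0]
print(disk_variance(after_rebalance))                  # 28.0
print(weeks_until_retrigger(after_rebalance, growth))  # 3

# Rebalanced further down (variance 18) with the same growth
print(weeks_until_retrigger([60.0, 45.0, 50.0, 42.0], growth))  # 14
```

Leaving more headroom below the 30% alert, as a lower RVC variance threshold aims to do, buys proportionally more time before the health check fires again at the same growth rates.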

Bob

Enthusiast

Thanks again TheBobkin
