VMware Cloud Community
nicholas1982
Hot Shot

How important are the stats object and its storage policy?

Hello All,

I was just looking through my environment and found that on one of my stretched clusters the stats object storage policy was set to "FTT=1 No Local", which by its name seemed wrong because I thought everything in our environment was FTT=2.

Please see the attached screenshot of the policy rules for "FTT=1 No Local". I'm wondering if someone here with good stretched vSAN cluster experience could explain what the expected behaviour would be if, for example, one or more hosts failed in the primary fault domain, the entire primary site failed, or there was network isolation. The stretched vSAN cluster consists of 12 hosts, 6 in each site.

Nicholas

Accepted Solutions
TheBobkin
Champion

Hello Nicholas,

With that Storage Policy (SP), there is essentially a single FTT=1 set of the data (using RAID5 as the Fault Tolerance Method) hosted only on the Secondary site (Affinity). Any host on the Primary site becoming inaccessible would therefore not impact such Objects, as their data is not stored there. Conversely, if the whole Secondary site became inaccessible, the Objects would be unavailable.
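To make the site-failure reasoning concrete, here is a minimal Python sketch. It is only a toy model, not anything vSAN exposes: the `Policy` class and both function names are made up for illustration, and it assumes the witness stays reachable and the surviving site has no host failures of its own.

```python
from dataclasses import dataclass

SITES = ("primary", "secondary")


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for the site-level part of a storage policy."""
    pftt: int            # 1 = data mirrored across both sites, 0 = data kept in one site
    affinity: str = ""   # site that holds the data when pftt == 0


def data_sites(policy):
    """Return the set of sites that hold a copy of the object's data."""
    if policy.pftt == 1:
        return set(SITES)
    return {policy.affinity}


def available_after_site_loss(policy, lost_site):
    """True if at least one site holding the data survives the loss of lost_site.

    Assumption: witness reachable, no additional failures in the surviving site.
    """
    return bool(data_sites(policy) - {lost_site})


# The "FTT=1 No Local" style policy discussed here: data only on Secondary.
no_local = Policy(pftt=0, affinity="secondary")
for site in SITES:
    print(site, "lost ->", available_after_site_loss(no_local, site))
```

Under these assumptions, losing Primary leaves the Object available and losing Secondary does not, which matches the behaviour described above.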

The 'stats db' Object is just used to store data collected by the Performance service. You can re-apply a different SP to it if you consider this crucial data; historical performance data would be lost if the Object became permanently unavailable.

This Object can also be analyzed and manipulated via RVC:

virten.net/2017/07/vsan-6-6-rvc-guide-part-5-performance-service/

Bob

3 Replies
nicholas1982
Hot Shot

Thanks Bob,

I think the intention was to have everything, including the stats DB, stored locally with FTT=2. I'm just wondering whether there was a reason for someone in the org to set it up this way. Could it be performance related, since the workloads are in the primary site? And just so I get my head around this: if it was set to Primary Level of Failures to Tolerate=1 and Secondary=1, there would be a full copy on each site, and the objects would still be available if the secondary site became unavailable?

Nicholas
TheBobkin
Champion

Hello Nicholas,

The 'stats db' Object is not write-heavy AFAIK, so I don't think writing to the Primary site as well would add much extra workload.

Yes, if you applied an SP with PFTT=1, SFTT=1, the data would be mirrored across both sites with FTT=1 at each site.
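As a rough way to compare what the different PFTT/SFTT combinations cost in raw capacity, here is a small Python sketch. The function names are made up for illustration, witness components are ignored, and the multipliers are the usual rules of thumb (RAID-1 keeps SFTT + 1 full copies, RAID-5 is 3 data + 1 parity, RAID-6 is 4 data + 2 parity), so treat the numbers as estimates only.

```python
from fractions import Fraction


def local_multiplier(method, sftt):
    """Raw-capacity multiplier inside a single site (rule-of-thumb values)."""
    if method == "RAID1":
        return Fraction(sftt + 1)   # SFTT + 1 full copies
    if method == "RAID5" and sftt == 1:
        return Fraction(4, 3)       # 3 data + 1 parity
    if method == "RAID6" and sftt == 2:
        return Fraction(3, 2)       # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {method} with SFTT={sftt}")


def raw_capacity_gb(size_gb, pftt, sftt, method="RAID1"):
    """Estimated raw capacity consumed by an object of size_gb.

    PFTT=1 stores the site-local layout in both sites; PFTT=0 in one site only.
    Witness components are ignored (assumption).
    """
    sites = 2 if pftt == 1 else 1
    return float(size_gb * sites * local_multiplier(method, sftt))


# A 100 GB object under a few policies mentioned in this thread.
print(raw_capacity_gb(100, pftt=1, sftt=1))                   # mirrored across sites, RAID-1 in each
print(raw_capacity_gb(100, pftt=0, sftt=1, method="RAID5"))   # single site, RAID-5
```

So a PFTT=1, SFTT=1 (RAID-1) policy consumes roughly four times the object size, versus roughly 1.33 times for a single-site RAID-5 layout.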

There is a very handy table for looking at all the potential options here and their layout (p18-19 & p33-35):

storagehub.vmware.com/export_to_pdf/vsan-stretched-cluster-2-node-guide

depping also summarizes these here:

yellow-bricks.com/2017/05/30/sizing-vsan-stretched-cluster/

Bob