Hello All,
Just looking through my environment and found that on one of my stretched clusters the stats object storage policy was set to "FTT=1 No Local", which by its name seemed wrong because I thought everything in our environment was FTT=2.
Please see the attached screenshot of the policy rules for "FTT=1 No Local". Just wondering if someone here with good stretched vSAN cluster experience could explain what the expected behaviour would be if, for example, there were host failures in the primary fault domain, an entire primary site failure, or network isolation. The stretched vSAN cluster consists of 12 hosts, 6 in each site.
Hello Nicholas,
With that Storage Policy (SP), there is essentially one FTT=1 set of the data (using RAID5 as the Fault Tolerance Method) hosted only on the Secondary site (Affinity). Thus any host on the Primary site becoming inaccessible would not impact such Objects, as their data is not stored there (though vice-versa, if the whole Secondary site became inaccessible the Objects would be unavailable).
The 'stats db' Object is just used to store data collected by the Performance service. You can re-apply a different SP to it if you consider it crucial data, since historical performance data would be lost if it became permanently unavailable.
This Object can also be analyzed and manipulated via RVC:
virten.net/2017/07/vsan-6-6-rvc-guide-part-5-performance-service/
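To make the site-level behaviour above concrete, here is a rough sketch in Python. It is only a simplified model, not how vSAN actually decides accessibility: it ignores host-level failures inside a site and component votes. The names `Policy` and `object_accessible` are made up for illustration. It assumes PFTT=0 with Affinity keeps all data on one site, and PFTT=1 keeps data on both sites plus the witness and survives the loss of any one of those three fault domains.

```python
from dataclasses import dataclass
from typing import Optional, Set

# The three fault domains of a stretched cluster (simplified model).
SITES = {"primary", "secondary", "witness"}


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified view of a stretched-cluster storage policy."""
    pftt: int                       # Primary Failures To Tolerate: 0 or 1
    affinity: Optional[str] = None  # data site holding the object when pftt == 0


def object_accessible(policy: Policy, failed: Set[str]) -> bool:
    """Return True if the object should stay accessible with these sites down.

    Site-level only: host failures within a site are not modelled.
    """
    unknown = set(failed) - SITES
    if unknown:
        raise ValueError(f"unknown fault domain(s): {sorted(unknown)}")

    if policy.pftt == 0:
        # All data lives on the affinity site, so only that site matters.
        if policy.affinity not in ("primary", "secondary"):
            raise ValueError("pftt=0 needs affinity 'primary' or 'secondary'")
        return policy.affinity not in failed

    if policy.pftt == 1:
        # Data on both sites plus witness: any single fault domain may fail.
        return len(failed) <= 1

    raise ValueError("pftt must be 0 or 1 in a stretched cluster")


if __name__ == "__main__":
    stats_db = Policy(pftt=0, affinity="secondary")
    print("Primary site down:  ", object_accessible(stats_db, {"primary"}))
    print("Secondary site down:", object_accessible(stats_db, {"secondary"}))
```

With the policy described here (data only on the Secondary site), losing the Primary site leaves the Object accessible, while losing the Secondary site does not.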
Bob
Thanks Bob,
I think the intention was to have everything, including the stats DB, stored locally with FTT=2. But I'm just wondering if there was a reason for someone in the org to set it up this way. Could it be performance related, since the workloads are in the primary site? And just so I get my head around this: if it was set to Primary Level of Failures to Tolerate=1 and Secondary=1, there would be a full copy on each site, and the objects would still be available if the secondary site were to become unavailable?
Hello Nicholas,
The 'Stats db' Object writes are not incredibly busy AFAIK so I don't think there would be a huge amount of extra workload by writing to Primary also.
Yes, if you applied an SP with PFTT=1, SFTT=1 it would be FTT=1 at each site.
There is a very handy table for looking at all the potential options here and their layout (p18-19 & p33-35):
storagehub.vmware.com/export_to_pdf/vsan-stretched-cluster-2-node-guide
depping also summarizes these here:
yellow-bricks.com/2017/05/30/sizing-vsan-stretched-cluster/
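As a rough illustration of what those tables boil down to capacity-wise, here is a small Python sketch. It is a back-of-the-envelope estimate, not an official sizing tool: the function name `raw_capacity_gb` is made up for this example, and it assumes PFTT=1 means one full set of data per site, RAID-1 with SFTT=n keeps n+1 local copies, RAID-5 (SFTT=1) costs about 1.33x and RAID-6 (SFTT=2) about 1.5x. Witness components and slack space are ignored.

```python
def raw_capacity_gb(vm_gb: float, pftt: int, sftt: int, method: str = "RAID-1") -> float:
    """Estimate raw vSAN capacity consumed by vm_gb of data (simplified)."""
    if pftt not in (0, 1):
        raise ValueError("pftt must be 0 or 1 in a stretched cluster")

    # PFTT=1 places a full set of the data in each of the two data sites.
    site_copies = 2 if pftt == 1 else 1

    # Local protection overhead inside each site.
    if sftt == 0:
        local = 1.0
    elif method == "RAID-1":
        local = float(sftt + 1)          # n failures tolerated -> n+1 mirrors
    elif method == "RAID-5/6":
        overhead = {1: 4 / 3, 2: 1.5}    # RAID-5 is 3+1, RAID-6 is 4+2
        if sftt not in overhead:
            raise ValueError("RAID-5/6 supports sftt of 1 or 2 only")
        local = overhead[sftt]
    else:
        raise ValueError("method must be 'RAID-1' or 'RAID-5/6'")

    return vm_gb * site_copies * local


if __name__ == "__main__":
    for label, args in [
        ("PFTT=1, SFTT=1, RAID-1", (100, 1, 1, "RAID-1")),
        ("PFTT=1, SFTT=1, RAID-5", (100, 1, 1, "RAID-5/6")),
        ("PFTT=0, SFTT=1, RAID-5", (100, 0, 1, "RAID-5/6")),
    ]:
        print(f"{label}: {raw_capacity_gb(*args):.0f} GB raw for 100 GB of data")
```

So moving an Object from a single-site RAID5 policy to PFTT=1, SFTT=1 roughly doubles (RAID5 at each site) or triples (RAID1 at each site) the raw space it uses, which is small in absolute terms for the stats db.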
Bob