VMware Cloud Community
andersghansen
Contributor

Regular cluster or stretched cluster

Scenario: two datacenters with 0.5 ms latency between them. We want to deploy both hybrid and all-flash vSAN clusters.

Would a regular cluster be supported in this setup?

What are the criteria for a supported setup?

A regular cluster gives more flexibility and does not have the same constraints as a stretched cluster (or the management overhead of a witness host in a third site).

Networking is stretched between the datacenters.

Accepted Solutions
depping
Leadership

If you are below 1 ms, then you are safe and can create a regular cluster. (I requested that this be documented a while ago; I see now that it hasn't happened.) Now the real question: can you create 3 fault domains in 3 physical locations? I see a lot of people asking the above question because they want to avoid the cost of the Enterprise license and don't want to deploy and manage a witness VM. That is of course possible, but it only makes sense when you have 3 fault domains. I wrote about this a while back: http://www.yellow-bricks.com/2017/03/29/vsan-needs-3-fault-domains/
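To make the "3 fault domains" point concrete: with mirroring (RAID-1), the usual rule of thumb is that tolerating FTT failures needs 2 x FTT + 1 fault domains, so FTT=1 needs three. The Python sketch below is illustrative only; the helper names are hypothetical and not part of any VMware API.

```python
# Minimal sketch (hypothetical helper names, not a VMware API): how many fault
# domains a RAID-1 (mirroring) vSAN policy needs, using the 2 * FTT + 1 rule.


def required_fault_domains(ftt: int) -> int:
    """Fault domains needed to tolerate `ftt` fault-domain failures with RAID-1.

    Rule of thumb: FTT + 1 data replicas plus FTT witness components,
    each kept in its own fault domain, gives 2 * FTT + 1.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def layout_is_valid(physical_sites: int, ftt: int = 1) -> bool:
    """True if one fault domain per physical site provides enough fault domains."""
    return physical_sites >= required_fault_domains(ftt)


for sites in (2, 3):
    print(f"{sites} sites, FTT=1 -> enough fault domains: {layout_is_valid(sites)}")
```

With two sites and one fault domain per site, FTT=1 falls one fault domain short, which is exactly the gap a stretched cluster fills with its witness.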

6 Replies
TheBobkin
Champion

Hello Anders,

This is a good question. On doing a bit of research (I am not aware of any concrete advised rules, as there are for stretched clusters), I cannot easily find anything definitive on what is *too much* network latency for a normal vSAN cluster. From what I have gathered so far, it does not appear to be the main concern, as on local sites the cache and disk overhead is typically more significant, though any additional network latency will obviously add to the latency of most operations.

Where this gets more interesting (and relevant to your question!) is what you require/want to achieve here:

If you stretch the clusters across the two DCs, they may be more efficient and perform better thanks to the read-locality algorithm that operates in this configuration (assuming FTT=1 for convenience), compared to a normal cluster that pays the additional 0.5 ms of network latency whenever it reads data from the other site and does not optimize for locality.

The locality algorithm is explained very well here by the great @CormacHogan:

http://cormachogan.com/2015/09/24/read-locality-in-vsan-stretched-cluster/
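A back-of-the-envelope way to see what read locality buys: if an FTT=1 object has one replica in each DC and reads are spread evenly across both replicas, roughly half the reads cross the inter-site link. The even split and "penalty equals RTT" are simplifying assumptions for illustration, not a description of vSAN's actual read-distribution logic.

```python
# Back-of-the-envelope model of the extra read latency an FTT=1 mirrored object
# sees when one replica sits in each datacenter.
# Assumptions (hypothetical, simplified): in a regular cluster reads are spread
# evenly across the two replicas; with stretched-cluster read locality every
# read is served from the local site; the inter-site penalty equals the RTT.


def added_read_latency_ms(intersite_rtt_ms: float, remote_read_fraction: float) -> float:
    """Average extra latency per read caused by crossing the inter-site link."""
    if not 0.0 <= remote_read_fraction <= 1.0:
        raise ValueError("remote_read_fraction must be between 0 and 1")
    return intersite_rtt_ms * remote_read_fraction


RTT_MS = 0.5  # figure quoted in this thread

regular = added_read_latency_ms(RTT_MS, remote_read_fraction=0.5)
stretched = added_read_latency_ms(RTT_MS, remote_read_fraction=0.0)
print(f"regular cluster:   +{regular:.2f} ms per read on average")
print(f"stretched cluster: +{stretched:.2f} ms per read on average")
```

Whether ~0.25 ms per read matters depends on how it compares with the cache and disk latency of the cluster, which is why testing both layouts is the only reliable answer.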

From what/where are you measuring the cross-DC latency as 0.5ms?

- From the switches or from hosts attached to each side? (Yes, I am aware the difference is likely to be negligible, but it all adds up.)

You stated that you intend to deploy numerous clusters (some all-flash, some hybrid). Is your rationale for considering stretched clusters to split each cluster into site-based Fault Domains, or is running some clusters local to one site and others local to the other site not an option for some other reason you haven't mentioned?

The only benchmark I could find for typical expected network-latency is an older document (if anyone has something more current or with more detail, please post it):

"The majority of customers with production Virtual SAN deployments (and for that matter any hyper-converged storage product) are using 10Gigabit Ethernet (10GbE). 10GbE networks have observed latencies in the range of 5 - 50 microseconds"

http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-virtual-san-da...

I may have a look tomorrow at internal resources regarding *advised* or *supported* maximum network latency for regular clusters, as this is indeed an interesting question, so thanks for raising it.

Bob

-o- If you found this comment useful please click the 'Helpful' button and/or select as 'Answer' if you consider it so, please ask follow-up questions if you have any -o-

GreatWhiteTec
VMware Employee

It all depends on what you are designing for. Although a stretched cluster gives you more availability between sites, it also has more requirements (bandwidth-wise). A stretched cluster allows you to lose one DC and keep running. This is attractive for some companies where maintenance windows are scarce and they may need to power off all servers in one DC.

You could have both hybrid (HY) and all-flash (AF) clusters managed by the same vCenter; however, you will need to either use separate VLANs for the vSAN traffic of each cluster or change the host addresses for one of the clusters.

As far as bandwidth for stretched clusters goes, here is a document to help you do some calculations: Storage and Availability Technical Documents
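The kind of calculation that sizing guidance walks through can be sketched as below. The formula B = Wb x md x mr, with a data multiplier of 1.4 and a resync multiplier of 1.25, is recalled from VMware's stretched-cluster bandwidth sizing guidance and should be checked against the current document; the 10,000 write IOPS at 4 KB workload is purely hypothetical, and the 4 x 10 Gbit link is the figure mentioned later in the thread.

```python
# Rough inter-site bandwidth estimate for a stretched cluster.
# Assumption: B = Wb * md * mr with md = 1.4 and mr = 1.25, recalled from
# VMware's stretched-cluster bandwidth sizing guidance; verify these constants
# against the current document before relying on them.

DATA_MULTIPLIER = 1.4     # md: overhead on top of raw write bandwidth (assumed)
RESYNC_MULTIPLIER = 1.25  # mr: headroom for resynchronization traffic (assumed)


def write_bandwidth_mbps(write_iops: float, io_size_kb: float) -> float:
    """Convert write IOPS at a given I/O size into Mbit/s (decimal units)."""
    return write_iops * io_size_kb * 8 / 1000


def intersite_bandwidth_mbps(write_iops: float, io_size_kb: float) -> float:
    """Estimated bandwidth required between the two data sites, in Mbit/s."""
    return write_bandwidth_mbps(write_iops, io_size_kb) * DATA_MULTIPLIER * RESYNC_MULTIPLIER


# Hypothetical workload: 10,000 write IOPS at 4 KB.
required = intersite_bandwidth_mbps(write_iops=10_000, io_size_kb=4)
link_capacity = 4 * 10_000  # 4 x 10 Gbit links, expressed in Mbit/s
print(f"required: {required:.0f} Mbit/s, link utilization: {required / link_capacity:.1%}")
```

Only writes are counted because, with read locality, reads are normally served from the local site; the witness link is sized separately.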

andersghansen

Thanks for your input.

0.5 ms is the response time from one host to another in the 2 datacenters.

I know about read locality in the stretched cluster. The question is whether a regular cluster works properly without read locality. Bandwidth between the datacenters is 4 x 10 Gbit.

We are a service provider and want to provide a two-datacenter solution, so that if one datacenter fails, the VMs will boot up in the other datacenter.

I just talked to one of my colleagues who implemented a big VSAN AF cluster at a customer with 4 datacenters (28 nodes). They also had 0.5ms between the locations, and that setup was validated by VMware.

TheBobkin

Hello Anders,

Yes, I looked into it and cannot find any 'rules' as such regarding maximum network latency for regular vSAN clusters (as opposed to stretched clusters, where an RTT of no more than 5 ms between data sites and 200 ms between data and Witness sites is required), so it should work.
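The latency figures mentioned in this thread can be turned into a quick check. This is a hypothetical helper, not a VMware tool: the 1 ms regular-cluster figure is depping's guidance from the accepted answer (not a documented limit), and the 5 ms / 200 ms stretched-cluster figures are as quoted above; VMware's current stretched-cluster guide may state a lower witness limit for larger clusters.

```python
# Hypothetical helper: which vSAN designs the latency figures quoted in this
# thread would allow. Thresholds come from the thread, not from a VMware API.
from typing import List, Optional

REGULAR_MAX_RTT_MS = 1.0              # depping: "below 1ms" is safe (guidance only)
STRETCHED_DATA_MAX_RTT_MS = 5.0       # between the two data sites
STRETCHED_WITNESS_MAX_RTT_MS = 200.0  # between the data sites and the witness site


def candidate_designs(intersite_rtt_ms: float, witness_rtt_ms: Optional[float] = None) -> List[str]:
    """Return the designs whose latency limits the measured RTTs satisfy."""
    designs = []
    if intersite_rtt_ms < REGULAR_MAX_RTT_MS:
        designs.append("regular")
    # A stretched cluster additionally needs a witness site within its limit.
    if (
        intersite_rtt_ms <= STRETCHED_DATA_MAX_RTT_MS
        and witness_rtt_ms is not None
        and witness_rtt_ms <= STRETCHED_WITNESS_MAX_RTT_MS
    ):
        designs.append("stretched")
    return designs


print(candidate_designs(0.5))        # two DCs, no witness site measured -> ['regular']
print(candidate_designs(0.5, 30.0))  # hypothetical 30 ms witness RTT -> ['regular', 'stretched']
```

Latency is only the entry criterion; the fault-domain count still decides whether a regular cluster actually survives the loss of a site.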

The point I was drawing towards with read locality was: will a regular cluster work as WELL as a stretched one might? Unfortunately, I think the only way to know is to test both.

"So if one datacenter fails, the VMs will boot up in the other datacenter."

This can be achieved with either a regular or a stretched implementation by configuring Fault Domains (FDs). In fact, if you implemented FDs, one on each site of a regular cluster (assuming FTT=1 objects), the component placement would be similar to how a stretched cluster places them, but it wouldn't use the read-locality algorithm.
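Whether a whole-site failure is survivable comes down to vSAN's voting rule: an object stays accessible only while more than 50% of its votes are reachable and at least one full replica survives. The sketch below models this with one vote per component (a simplification; real vSAN can assign more) and hypothetical site names, to show why depping stresses a third fault domain.

```python
# Simplified model of vSAN object availability for an FTT=1 mirrored object.
# Assumptions: one vote per component (real vSAN can assign more), and an
# object stays accessible only while more than 50% of its votes are reachable
# and at least one full data replica survives.
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Component:
    kind: str          # "replica" or "witness"
    fault_domain: str  # hypothetical fault-domain name, e.g. "DC1"
    votes: int = 1


def object_accessible(components: Iterable[Component], failed: Set[str]) -> bool:
    """True if the object keeps quorum and a replica after `failed` fault domains go down."""
    comps = list(components)
    total = sum(c.votes for c in comps)
    alive = [c for c in comps if c.fault_domain not in failed]
    alive_votes = sum(c.votes for c in alive)
    has_replica = any(c.kind == "replica" for c in alive)
    return 2 * alive_votes > total and has_replica


# Three fault domains: the witness component lives in its own (third) location.
three_fds = [Component("replica", "DC1"), Component("replica", "DC2"), Component("witness", "DC3")]
# Two fault domains: the witness would have to share a site with one replica
# (vSAN would not consider this compliant; modeled only to show the effect).
two_fds = [Component("replica", "DC1"), Component("replica", "DC2"), Component("witness", "DC1")]

print(object_accessible(three_fds, {"DC1"}))  # True: 2 of 3 votes and a replica remain
print(object_accessible(two_fds, {"DC1"}))    # False: only 1 of 3 votes remains
```

With only two sites, losing the site that holds the extra vote takes the object offline, which is the role a stretched cluster's witness host plays.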

If you are planning a regular cluster with FDs as above, then, other than needing a Witness Appliance on hand, you should have no problem setting up a regular-cluster implementation, testing it with HCIBench, vSAN Observer and the built-in Proactive tests, then switching it to stretched and testing again. I would advise testing with smaller initial clusters first (both AF and hybrid; I would imagine AF might show more difference between stretched and regular) and a larger cluster later if no clear difference is observed.

Bob

andersghansen

Great, that was what I was looking for. But a lot has happened on the stretched cluster topic since I posted, with the vSAN 6.6 release, even though I created the post only a few days ago.

PFTT and SFTT solve some of the issues we have with the stretched cluster configuration.
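For context, vSAN 6.6 split stretched-cluster protection into PFTT (primary FTT, across sites) and SFTT (secondary FTT, within each site). One practical consequence is raw-capacity overhead. The sketch below estimates it under stated assumptions: PFTT is 0 or 1, SFTT uses RAID-1, RAID-5 (3+1) or RAID-6 (4+2) inside each site, and witness components, metadata and slack space are ignored. The function is hypothetical, not a VMware API.

```python
# Rough raw-capacity multiplier for a vSAN 6.6 stretched-cluster policy.
# Assumptions: PFTT is 0 or 1 (one full copy per data site when 1); SFTT is
# implemented per site with RAID-1, RAID-5 (3+1) or RAID-6 (4+2); witness
# components, metadata and slack space are ignored.

LOCAL_OVERHEAD = {
    ("raid1", 0): 1.0,
    ("raid1", 1): 2.0,
    ("raid1", 2): 3.0,
    ("raid5", 1): 4 / 3,  # erasure coding requires all-flash
    ("raid6", 2): 1.5,    # erasure coding requires all-flash
}


def raw_capacity_multiplier(pftt: int, sftt: int, local_scheme: str = "raid1") -> float:
    """Raw capacity consumed per unit of VM data for a given PFTT/SFTT policy."""
    if pftt not in (0, 1):
        raise ValueError("PFTT in a two-site stretched cluster is 0 or 1")
    key = ("raid1", 0) if sftt == 0 else (local_scheme, sftt)
    if key not in LOCAL_OVERHEAD:
        raise ValueError(f"unsupported combination: {local_scheme} with SFTT={sftt}")
    site_copies = pftt + 1
    return site_copies * LOCAL_OVERHEAD[key]


for scheme, sftt in (("raid1", 1), ("raid5", 1)):
    mult = raw_capacity_multiplier(1, sftt, scheme)
    print(f"PFTT=1, SFTT={sftt} ({scheme}): {mult:.2f}x raw capacity")
```

The comparison suggests why the all-flash clusters may favor RAID-5 for SFTT, while the hybrid clusters are limited to RAID-1 within each site.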

Thank you for your replies.
