RolandoRodrigue
Contributor

vSAN Stretched Cluster: Latency & Inter-link

Hello, everyone.

I have a customer who is validating the requirements for implementing a Stretched Cluster solution with vSAN ReadyNodes across two sites. I have two questions I would like to clarify:

- Should the latency between the data sites, and between the witness site and the data sites, be simulated or measured between the routers of each datacenter? As I understand it, those routers carry the long-distance traffic between sites. I assumed the measurement had to be taken between vSAN nodes, or between the witness and the vSAN nodes, but since the cluster is not deployed yet I cannot measure that. I expect most of the latency to come from the path between the routers, right? So if the latency between routers is below the maximums (5ms between data sites and 200ms between the witness and the data sites), the latency between vSAN nodes should be practically the same.

- Is there any Stretched Cluster requirement for the type of redundancy on the inter-site links, both between the data sites and between the witness and the data sites? Is it mandatory to have two links? Active-active redundancy? Active-standby redundancy? Or can I have just one link and still be supported by VMware?

I appreciate your opinions and answers. Thanks in advance.

Best regards,

Rolando


Accepted Solutions
depping
Leadership

Hi,

From a support stance it is simple: latency is measured between the hosts, so the host-to-host latency is what needs to meet those requirements. Having said that, the majority of the latency is of course incurred on the ISL (inter-site link), so if you measure 5ms between routers then the latency could be 6ms from host to host, but there is no guarantee that it is 6ms; it could also be 7ms or more. All of that depends on the architecture of the environment.
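To illustrate why a router-to-router figure that sits right at the limit leaves no headroom, here is a minimal Python sketch. It assumes you already have round-trip samples in milliseconds (for example from a ping run); the function name `check_rtt`, the sample values, and the choice to compare the worst sample against the limit are illustrative assumptions, not a VMware tool or an official validation method. The limits are the ones quoted in this thread.

```python
from statistics import mean

# Limits as quoted in this thread (round-trip time, in milliseconds).
DATA_SITE_LIMIT_MS = 5.0
WITNESS_LIMIT_MS = 200.0


def check_rtt(samples_ms, limit_ms):
    """Summarise RTT samples and compare the worst sample with a limit.

    Comparing the worst sample (rather than the average) is an
    assumption made here to stay on the conservative side.
    """
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    worst = max(samples_ms)
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": worst,
        "within_limit": worst <= limit_ms,
    }


# Hypothetical samples: the routers look fine, but the extra hops
# between router and host push the host-to-host figure over the limit.
router_to_router = [4.6, 4.8, 5.0]
host_to_host = [5.7, 6.1, 6.4]

print("routers:", check_rtt(router_to_router, DATA_SITE_LIMIT_MS))
print("hosts:  ", check_rtt(host_to_host, DATA_SITE_LIMIT_MS))
```

The same function can be pointed at witness-to-data-site samples by passing `WITNESS_LIMIT_MS` instead. Only the host-to-host result counts for support, so the router figure is best treated as a lower bound.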

When it comes to links between locations, we don't have a requirement around the ISL. You can have 1 link, you can have more, we support both. What would we recommend? Resiliency!

3 Replies
sunvmware1
Enthusiast

Hi,

vSAN is highly customizable, and as a result, several user-adjustable variables can influence performance. Hardware-based decisions include the size and type of write buffer in the disk groups, the number of disk groups per host, the type of devices used at the capacity tier, and the capabilities of the network. vSAN settings that influence performance include the number of failures to tolerate, the data placement scheme used, and the cluster-wide data services enabled. A more complete description of the influencing factors can be found in the Discovery/Review - Environment section of the Troubleshooting vSAN Performance guide.

https://core.vmware.com/blog/performance-vsan-stretched-clusters

TheBobkin
VMware Employee

@RolandoRodrigue , please don't consider the 'supported maximum' as a target 😟 - most production Stretched Clusters that I work with on a daily basis have 1-2ms between sites.

Every ms of latency on the ISL is essentially an extra ms added to the storage latency: every IO to an Object that has data-components stored on both sites of the cluster needs to be committed to the data-components on both sites before it is acknowledged. For example, you can have the fastest All-NVMe cluster possible, with the storage committing IOs in under 0.2ms, but if you add 4-5ms of RTT network latency to this then it rather defeats the purpose.
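The arithmetic behind this can be sketched in a few lines of Python. This is a deliberately simplified model, not how vSAN computes or reports latency: it assumes one network round trip per mirrored write, the same device commit time on both sites, and no queuing or other overhead. The function name `mirrored_write_latency_ms` is hypothetical.

```python
def mirrored_write_latency_ms(local_commit_ms, isl_rtt_ms):
    """Rough acknowledgement latency for a write mirrored across sites.

    Simplified model (assumption): the write is acknowledged only after
    both sites commit it. The local copy takes local_commit_ms; the
    remote copy takes one ISL round trip plus the same commit time, so
    the remote path dominates.
    """
    local_path = local_commit_ms
    remote_path = isl_rtt_ms + local_commit_ms
    return max(local_path, remote_path)


# Figures taken from the post above: 0.2ms device commit time,
# compared across a few ISL round-trip times.
for rtt in (1.0, 2.0, 5.0):
    total = mirrored_write_latency_ms(0.2, rtt)
    print(f"ISL RTT {rtt:.1f}ms -> approx. write latency {total:.1f}ms")
```

Under these assumptions the ISL round trip is nearly the whole write latency once it exceeds the device commit time, which is why running close to the supported maximum wastes most of what fast storage offers.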
