VMware Cloud Community
Dhakshinamoorth
Contributor

About vSAN

Hi there,

I am designing a solution for a customer and am considering proposing VMware vSAN.

What is the minimum number of nodes we need to factor in to build a highly resilient vSAN cluster?

Do I need to factor in VMware ESXi licenses as well, or does the vSAN license include the ESXi license?

Please comment on this.

Dhakshinamoorthy Balasubramanian
TheBobkin
Champion

Hello Dhakshinamoorthy,

The words 'minimum' and 'highly resilient' are somewhat at odds with each other here; it is best to find a middle ground that suits the solution.

Here is a brief summary of typical configurations and their benefits, from smallest up:

- 2-Node Direct Connect/ROBO: 2 data nodes, either running local to each other or at different sites (ROBO), plus a Witness Appliance.

These allow standard RAID1 FTT=1 (Failures To Tolerate = 1) storage policies to be applied to the data objects, so that any one node can fail and the data should remain accessible.

These require additional or existing infrastructure to run the Witness Appliance, either as a VM or on a lower-spec server; the Witness Appliance comes with a free vSAN license.

- 3-Node cluster

Same as above, except with 3 equal-spec data nodes and no Witness Appliance (witness components are distributed across all nodes).

- 4-Node cluster

Same FTT=1 capability; however, this configuration allows all data to be rebuilt in the event of a permanent node or disk-group failure (provided there is space), something that 2- and 3-node clusters cannot do. It can also benefit uptime, as rolling upgrades can be performed by evacuating data off one node at a time so that VMs stay up. 2- and 3-node clusters can keep VMs running on a reduced number of data components with 'ensure-accessibility', but if there is a failure during this period, VMs may go down.

All-Flash is required for RAID5/RAID6:

With 4 or more nodes, RAID5 is also an option for achieving FTT=1. This gives decent savings in storage space used (RAID1 = 2x, RAID5 = 1.33x); however, RAID5 limits the ability to fully evacuate a node, because a RAID5 object has a minimum of 4 components (as opposed to 3 for RAID1), so a 4-node cluster has no spare node to move them to.

- 5-Node cluster

Can use FTT=2 policies in RAID1.

Can use RAID5 as the protection method and fully evacuate one node without running with reduced availability (provided there is space).

- 6-Node cluster

Can use RAID6 protection method for FTT=2 (1.5x storage utilisation).
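The node counts in the list above follow the usual component-placement rules: RAID1 needs 2 × FTT + 1 hosts, RAID5 (3 data + 1 parity) needs 4, and RAID6 (4 data + 2 parity) needs 6; one host more than that is what lets data be rebuilt or fully evacuated. The Python sketch below encodes those rules as a lookup. It is a simplification: `min_hosts` and `can_rebuild_after_host_loss` are hypothetical helpers (not a VMware API), and the capacity check is a single "used raw TB must fit on the remaining hosts" rule that ignores slack space, fault domains and disk-group granularity.

```python
"""Simplified vSAN host-count rules (hypothetical helpers, not a VMware API)."""


def min_hosts(raid: str, ftt: int) -> int:
    """Minimum hosts needed to place every component of one object."""
    if raid == "RAID1":
        # Simple case: FTT+1 full replicas plus FTT witness components,
        # each on a separate host.
        return 2 * ftt + 1
    if raid == "RAID5" and ftt == 1:
        return 4  # 3 data + 1 parity components
    if raid == "RAID6" and ftt == 2:
        return 6  # 4 data + 2 parity components
    raise ValueError(f"unsupported policy: {raid} FTT={ftt}")


def can_rebuild_after_host_loss(hosts: int, raid: str, ftt: int,
                                used_raw_tb: float, raw_tb_per_host: float) -> bool:
    """True if, after losing one host, the remaining hosts can still place all
    components and hold the data (ignores slack space and fault domains)."""
    remaining = hosts - 1
    enough_hosts = remaining >= min_hosts(raid, ftt)
    enough_space = used_raw_tb <= remaining * raw_tb_per_host
    return enough_hosts and enough_space


if __name__ == "__main__":
    for raid, ftt in [("RAID1", 1), ("RAID5", 1), ("RAID1", 2), ("RAID6", 2)]:
        need = min_hosts(raid, ftt)
        print(f"{raid} FTT={ftt}: min {need} hosts, {need + 1} to rebuild/evacuate")
```

For example, `can_rebuild_after_host_loss(4, "RAID1", 1, 20, 10)` is true, while the same call with 3 hosts is false, which matches the 3-node versus 4-node distinction described above.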

With regard to licensing:

Hosts require a vSAN license in addition to their ESXi licenses.

vSAN has licensing levels (editions) comparable to vSphere's, with varying feature sets; these are better explained here:

www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-vsan-66-licensing-gui...

www.vmware.com/files/pdf/products/vsan/vmware-vsan-62-licensing-guide.pdf

Bob

duffer1
Contributor

Bob is dead on with the descriptions. From my experience, I would recommend 4 nodes. This allows a node to be taken offline for maintenance while the cluster can still withstand another node unexpectedly going offline. Believe it or not, I heard of a situation where exactly that happened, and luckily they had 4 nodes.

I would also recommend that you go with the canned solutions. We have 35 nodes deployed for vSAN across 5 vSAN clusters. We went the build-your-own route with Cisco hardware, and over the years we have had issues keeping hardware, firmware and drivers on the vSAN HCL. Going with a canned solution will minimize HCL issues.

Remember, you don't have to fully populate each node or disk group with storage. Depending on the amount of storage currently needed and the anticipated need over the next several years, smaller physical servers may be an option to keep the cost down. The great thing about vSAN is that disks can easily be added to disk groups, and new disk groups and/or new nodes can be added to vSAN while the storage is in use. Hope this helps.
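The 4-node recommendation can be checked with simple arithmetic: if one host is in maintenance mode with its data fully evacuated, objects stay fully protected only if the remaining hosts can still hold every component, and only then is one further unexpected failure survivable. The Python sketch below models that under stated assumptions: `survives_maintenance_plus_failure` is a hypothetical helper, the policy is RAID1 (2 × FTT + 1 hosts needed), data is assumed to be fully re-protected before the second failure, and capacity is not checked.

```python
def survives_maintenance_plus_failure(cluster_hosts: int, ftt: int = 1) -> bool:
    """Hypothetical simplified model: one host is in maintenance mode with its
    data fully evacuated. Return True if the remaining hosts can still hold a
    fully protected RAID1 object, so one more unexpected host failure is
    tolerated."""
    min_hosts = 2 * ftt + 1  # RAID1: FTT+1 replicas + FTT witnesses
    return cluster_hosts - 1 >= min_hosts


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "hosts:", survives_maintenance_plus_failure(n))
```

With FTT=1 this returns False for 3 hosts and True for 4, which is the situation described above.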

NetxRunner
Enthusiast

A "highly resilient vSAN cluster" starts at 6 all-flash nodes, where FTT=2 can be achieved (with RAID6) along with efficient space utilization and good performance.
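The space-utilization point comes down to the overhead factors listed earlier in the thread (RAID1 = 2x for FTT=1, RAID5 = 1.33x, RAID6 = 1.5x), plus 3x for RAID1 at FTT=2 (three full copies). The Python sketch below converts raw capacity to usable capacity per policy. The 6 × 10 TB cluster is a made-up example, `usable_tb` is a hypothetical helper, and the figures ignore slack space, deduplication/compression and witness/metadata overhead, so treat them as upper bounds.

```python
from fractions import Fraction

# Fraction of raw capacity that holds primary data for each (policy, FTT).
USABLE_FRACTION = {
    ("RAID1", 1): Fraction(1, 2),  # 2 full copies -> 2x overhead
    ("RAID1", 2): Fraction(1, 3),  # 3 full copies -> 3x overhead
    ("RAID5", 1): Fraction(3, 4),  # 3 data + 1 parity -> 1.33x overhead
    ("RAID6", 2): Fraction(4, 6),  # 4 data + 2 parity -> 1.5x overhead
}


def usable_tb(raw_tb: float, raid: str, ftt: int) -> Fraction:
    """Hypothetical helper: upper bound on usable capacity for a policy."""
    return Fraction(raw_tb) * USABLE_FRACTION[(raid, ftt)]


if __name__ == "__main__":
    raw = 6 * 10  # hypothetical: 6 all-flash nodes x 10 TB raw each
    for raid, ftt in USABLE_FRACTION:
        usable = float(usable_tb(raw, raid, ftt))
        print(f"{raid} FTT={ftt}: {usable:.1f} TB usable of {raw} TB raw")
```

On these assumed numbers, FTT=2 via RAID6 leaves 40 TB usable versus 20 TB with RAID1 at FTT=2. Fractions are used so the ratios stay exact instead of accumulating float rounding.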

Also, I recommend that you consider alternative SDS options and add more solutions to your checklist. vSAN is very capable in large deployments; however, there are a number of solutions that can deliver FTT=2 or similar resiliency with fewer nodes.

Check out HPE StoreVirtual VSA or StarWind vSAN. These solutions can leverage hardware RAID controllers and feature synchronous replication, allowing you to build compact but redundant clusters. The resulting 2-node config would basically be a "RAID61", where you can lose an entire node along with a pair of drives on the other node and still be up and running. Worth considering IMO.
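The "RAID61" claim can be expressed as a simple survival rule: each node protects its drives with local RAID6 (any 2 drive failures tolerated) and the two nodes mirror each other synchronously, so data stays available as long as at least one node is up with no more than 2 failed drives. The Python sketch below is a generic model of that layout, not the actual behavior of HPE StoreVirtual VSA or StarWind vSAN; `Node`, `raid61_available`, `raid61_usable_tb` and the drive counts are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

RAID6_DRIVE_FAILURES_TOLERATED = 2


@dataclass
class Node:
    """Hypothetical node model: whether it is up and how many drives failed."""
    up: bool
    failed_drives: int = 0


def raid61_available(nodes: list[Node]) -> bool:
    """Data is readable if any node is up and its local RAID6 set is intact
    (at most 2 failed drives); the synchronous mirror covers the other node."""
    return any(n.up and n.failed_drives <= RAID6_DRIVE_FAILURES_TOLERATED
               for n in nodes)


def raid61_usable_tb(drives_per_node: int, drive_tb: float) -> float:
    """Usable capacity is one node's RAID6 set (2 drives' worth of parity);
    the second node holds a mirror copy, so it adds no usable space."""
    return (drives_per_node - RAID6_DRIVE_FAILURES_TOLERATED) * drive_tb


if __name__ == "__main__":
    # Node A lost entirely, node B has lost a pair of drives: still available.
    print(raid61_available([Node(up=False), Node(up=True, failed_drives=2)]))
    # A third drive failure on the surviving node breaks its RAID6 set.
    print(raid61_available([Node(up=False), Node(up=True, failed_drives=3)]))
    print(raid61_usable_tb(8, 2.0))  # hypothetical 8 x 2 TB drives per node
```

The trade-off this model makes visible is capacity: with the assumed 8 × 2 TB drives per node, 32 TB of raw capacity yields 12 TB usable in exchange for that failure tolerance.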
