VMware Cloud Community
mamatadesai
Contributor

vSAN datastore capacity

Hello,

vSAN n00b here (rather, using it again after many years).  I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node.  The vSAN datastore capacity is 838GB.  What can I change to increase the capacity of my vSAN datastore?  Right now, it looks like every node will have a copy of each object - is that expected?

Thanks!

9 Replies
mObaid
Contributor

You can have a maximum of 5 disk groups per host, and each disk group can contain a maximum of 7 capacity disks. Secondly, you can also review the current FTT policy and change it to a capacity-focused policy such as RAID-5 or RAID-6 erasure coding.

So you have plenty of options to increase the capacity. Each of these options has an impact on availability and performance, so they need to be weighed against the requirements of the applications running in your VMs.
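For orientation, you can list the devices each host currently contributes from the ESXi shell (a minimal sketch - the output shows every claimed disk and which tier it belongs to):

# esxcli vsan storage list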

depping
Leadership

Unfortunately you can't use RAID-5 when you have HDDs as capacity devices, as erasure coding requires an all-flash configuration; RAID-1 is your only option. So you either need to add disks or add disk groups to the hosts to increase the capacity. It is a strange configuration though, with very small capacity devices!
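If you do add drives later, a new capacity disk can be claimed into an existing disk group from the ESXi shell as well as from the UI (a sketch only - the placeholders below stand for your actual cache-tier device and the newly added capacity device):

# esxcli vsan storage add -s <cache_tier_naa> -d <new_capacity_disk_naa>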

mamatadesai
Contributor

Thanks depping.  With HDDs as capacity devices and 4 hosts in this vSAN cluster, the cluster has 4 disk groups (1 per host).  Is there any knob to tweak with "Fault Domains" or the configuration type to increase capacity?  Attached another screenshot showing the FDs.

TheBobkin
Champion

Hello mamatadesai

"I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node.  The vSAN datastore capacity is 838GB. "

If each node has a Disk-Group providing 838GB capacity (e.g. the total size of all Capacity-tier devices on each node), then the vsanDatastore should be ~3.35TB, NOT the size of a single node's storage as you indicated. If it is showing as only 838GB (a simple check with df -h via SSH should suffice), then you either have a partitioned cluster/isolated nodes, issues with the storage on 3 of the 4 nodes, some nodes in a Decom state, or a Disk-Group not configured on every node.
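As a quick reference, that df check can be run from any host over SSH (the grep filter is just to narrow the output down to the vSAN datastore):

# df -h | grep -i vsan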

To start, if you could share/PM the output of the following:

On any one node:

# esxcli vsan cluster get

# cmmds-tool find -t NODE_DECOM_STATE

From all nodes:

# vdq -Hi

Just an FYI - nodes are by default a 'Fault Domain' when it comes to component placement (e.g. it won't place both data-replicas of an FTT=1 Object on the same node) - explicitly configuring 4 Fault Domains as you have there is pointless as it offers no benefit.

Bob

mamatadesai
Contributor

TheBobkin
Thanks for your response.  Please see the screenshots posted above.

Yes, I was just playing around to see if having 1 FD vs 4 FDs changes capacity.  I was expecting an easy UI way to toggle RAID levels or the equivalent of FTT.  Having all 4 hosts in one FD vs having each host in its own FD makes no difference to capacity.  To be clear, I am not concerned about failures on this vSAN cluster, I just want to set up whatever gives the highest capacity I can get with the available hardware.

Also, here are the outputs of the commands on all 4 hosts.

.103:

[root@localhost:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2020-07-14T16:31:04Z

   Local Node UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 522a17e6-6d8b-a93e-989c-444ab595b593

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 1

   Sub-Cluster Member UUIDs: 5ef4938a-643d-b9d8-8f68-222d1450010c

   Sub-Cluster Member HostNames: localhost

   Sub-Cluster Membership UUID: 1b20fd5e-6a74-9904-2477-222d1450010c

   Unicast Mode Enabled: true

   Maintenance Mode State: OFF

   Config Generation: 569b40c7-4cd8-448a-86e2-d9c5355dbb0f 2 2020-07-01T23:45:34.902

[root@localhost:~] cmmds-tool find -t NODE_DECOM_STATE

owner=5ef4938a-643d-b9d8-8f68-222d1450010c(Health: Healthy) uuid=5ef4938a-643d-b9d8-8f68-222d1450010c type=NODE_DECOM_STATE rev=0 minHostVer=0  [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0 "" i0 i0 l0 l0)], errorStr=(null)

[root@localhost:~] vdq -Hi

Mappings:

   DiskMapping[0]:

           SSD:  naa.5001438031545913

            MD:  naa.5000c5009611ebaf

            MD:  naa.5000c5009612e4e3

            MD:  naa.5000c500961ac12f

.104:

[root@localhost:~] vdq -Hi

Mappings:

   DiskMapping[0]:

           SSD:  naa.5001438031545910

            MD:  naa.5000c500961ac97b

            MD:  naa.5000c500961b069f

            MD:  naa.5000c500961a9bb7

.98:

[root@localhost:~] vdq -Hi

Mappings:

   DiskMapping[0]:

           SSD:  naa.5001438031545911

            MD:  naa.5000c500960040ab

            MD:  naa.5000c50095fd3507

            MD:  naa.5000c50095fd12d3

.99:

[root@localhost:~] vdq -Hi

Mappings:

   DiskMapping[0]:

           SSD:  naa.5001438031545912

            MD:  naa.5000c500961b0a3b

            MD:  naa.5000c500961900f3

             MD:  naa.5000c50096190197

TheBobkin
Champion

"Sub-Cluster Member Count: 1"

That cluster is partitioned and/or that specific node is isolated.

This should also be visible via:

Cluster > Monitor > vSAN > Health > Network

The checks there will also show whether the nodes have a vSAN-enabled vmk configured and/or whether ping between the nodes is failing.

In short, you need to troubleshoot why the nodes are not communicating with one another.
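If it helps, a few quick things to check from the ESXi shell on each host (a minimal sketch - vmk1 and 192.168.1.104 below are placeholders for your actual vSAN vmkernel interface and a neighbouring host's vSAN IP):

# esxcli vsan network list   <- confirms which vmk is tagged for vSAN traffic

# esxcli network ip interface ipv4 get -i vmk1   <- shows the IP configured on that vmk

# vmkping -I vmk1 192.168.1.104   <- tests reachability over the vSAN network

# esxcli vsan cluster unicastagent list   <- each host should list all of its peers here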

Bob

TheBobkin
Champion

"To be clear, I am not concerned about failures on this vSAN cluster, just want to setup something with highest capacity I can get with the available hardware."

In this case you should be doing this via SPBM e.g. configuring and applying an FTT=0 Storage Policy to all VMs/Objects, with the clear understanding that a single disk failure will mean the data that was on it is gone.

About vSAN Policies
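Storage Policies are created and applied through vCenter (VM Storage Policies), but purely as a command-line illustration of the same rules, the host-level default vSAN policy can be viewed and set from the ESXi shell (a sketch only - this default applies to objects created without an SPBM policy, and the quoted string is the policy format esxcli expects):

# esxcli vsan policy getdefault

# esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i0) (\"stripeWidth\" i2))"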

As the Capacity-Tier disks are quite small, you might want to also lower the max component size and/or use a Storage Policy with Stripe-Width >1 so that you have smaller sub-components which will fit into smaller spaces better (and thus be able to utilise more of the available space):

VMware Knowledge Base
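The max component size referred to above is controlled by an advanced setting covered in that KB (a sketch - 180 is just an example value, and it needs to be set on every host in the cluster):

# esxcli system settings advanced list -o /VSAN/ClomMaxComponentSizeGB   <- view the current value (default 255)

# esxcli system settings advanced set -o /VSAN/ClomMaxComponentSizeGB -i 180   <- example value only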

First though, of course get the cluster clustered :smileygrin:

Bob

mamatadesai
Contributor

OMG :smileysilly: Something had changed in my network.  The Monitor > Health tab is very helpful, and it has been upgraded to Skyline Health now.  The size is now 3.27TB for my vSAN datastore, yay!

Thanks TheBobkin

TheBobkin
Champion

Happy to help mamatadesai - being able to use all the cluster's storage is a sure-fire way of increasing capacity :smileycool:

Yes, vSAN/Skyline Health is a great troubleshooting/monitoring resource - a large proportion of the checks done there were added over time as a result of folks like myself in GSS exclaiming that:

1. we shouldn't have to manually check these (e.g. try checking ping and latency both ways between 64 nodes via SSH!) and

2. customers should be able to check these things themselves in a few clicks.

A couple of things to further utilise this in future:

It runs automatically every hour and will trigger a cluster-level icon change (e.g. if the cluster is partitioned it will show a red alarm icon), so you can have awareness just from the vSphere inventory main screen.

It can provide historical data on what the state of a cluster/test was, via the vSAN health summary logs stored in /var/log/vmware/vsan-health .
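For example, the summary logs can be browsed with (a sketch - file names vary by build, and this directory lives on the system running the vSAN health service):

# ls -lh /var/log/vmware/vsan-health/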

Bob
