Hello,
vSAN n00b here (rather, using it again after many years). I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node. The vSAN datastore capacity is 838GB. What can I change to increase the capacity of my vSAN datastore? Right now, it looks like every node will have a copy of each object?
Thanks!
"Sub-Cluster Member Count: 1"
That cluster is partitioned and/or that specific node is isolated.
This should also be visible via:
Cluster > Monitor > vSAN > Health > Network
This should also validate whether the nodes have a vSAN-enabled vmk configured and/or whether ping between the nodes is failing.
In short, you need to troubleshoot why the nodes are not communicating with one another.
Bob
You can have a maximum of 5 disk groups per host, and each disk group can contain a maximum of 7 capacity disks. Secondly, you can also look at the current FTT policy and change it to a capacity-focused policy such as RAID-5 or RAID-6 erasure coding.
So you have plenty of options to increase the capacity. Each of these options has an impact on availability and performance, so it needs to be weighed against the requirements of the applications running in your VMs.
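As a rough illustration (figures assume the ~838GB per node x 4 nodes = ~3.35TB raw described above): RAID-1 with FTT=1 writes two full replicas, so ~1.67TB is usable; RAID-5 erasure coding (3 data + 1 parity) carries a 1.33x overhead, so ~2.5TB; RAID-6 (4 data + 2 parity) carries 1.5x, so ~2.23TB - though note RAID-6 requires a minimum of 6 hosts, so it is not an option on a 4-node cluster.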
Unfortunately you can't use RAID-5 when you have HDDs as capacity devices - erasure coding requires an all-flash configuration, so RAID-1 is your only option. That means you either need to add disks or add disk groups to the hosts to increase the capacity. It is a strange configuration though, very small capacity devices!
Hello mamatadesai
"I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node. The vSAN datastore capacity is 838GB. "
If each node has a disk group providing 838GB capacity (e.g. the total size of all capacity-tier devices on each node), then the vsanDatastore should be ~3.35TB, NOT the size of a single node's storage as you indicated. If it is showing as only 838GB (a simple check with df -h via SSH should suffice), then you either have a partitioned cluster/isolated nodes, issues with the storage on 3 of the 4 nodes, some nodes in a decommissioned state, or a disk group that is not configured on every node.
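For example, a quick SSH check (the figures below are illustrative, not from your cluster):

# df -h | grep -i vsan
vsan        3.3T   1.1T   2.2T  33% /vmfs/volumes/vsanDatastore

If the Size column reports ~838G rather than ~3.3T, only one node's storage is contributing to the datastore.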
To start, if you could share/PM the output of the following:
On any one node:
# esxcli vsan cluster get
# cmmds-tool find -t NODE_DECOM_STATE
From all nodes:
# vdq -Hi
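For reference, on a healthy 4-node cluster the first command should report something like (UUIDs here are placeholders):

Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: <node1-uuid>, <node2-uuid>, <node3-uuid>, <node4-uuid>

Anything less than a Member Count matching the node count means the cluster is partitioned.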
Just an FYI - each node is by default a 'Fault Domain' when it comes to component placement (e.g. vSAN won't place both data replicas of an FTT=1 Object on the same node) - explicitly configuring 4 Fault Domains as you have there is pointless as it offers no benefit.
Bob
TheBobkin
Thanks for your response. Please see the screenshots posted above.
Yes, I was just playing around to see if having 1 FD vs 4 FDs changes capacity. I was expecting an easy UI way to toggle RAID levels or the equivalent of FTT. Having all 4 hosts in one FD vs having each host in its own FD makes no difference to capacity. To be clear, I am not concerned about failures on this vSAN cluster; I just want to set up something with the highest capacity I can get from the available hardware.
Also, here are the outputs of the commands on all 4 hosts.
.103:
[root@localhost:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-07-14T16:31:04Z
Local Node UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 522a17e6-6d8b-a93e-989c-444ab595b593
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 5ef4938a-643d-b9d8-8f68-222d1450010c
Sub-Cluster Member HostNames: localhost
Sub-Cluster Membership UUID: 1b20fd5e-6a74-9904-2477-222d1450010c
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: 569b40c7-4cd8-448a-86e2-d9c5355dbb0f 2 2020-07-01T23:45:34.902
[root@localhost:~] cmmds-tool find -t NODE_DECOM_STATE
owner=5ef4938a-643d-b9d8-8f68-222d1450010c(Health: Healthy) uuid=5ef4938a-643d-b9d8-8f68-222d1450010c type=NODE_DECOM_STATE rev=0 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0 "" i0 i0 l0 l0)], errorStr=(null)
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545913
MD: naa.5000c5009611ebaf
MD: naa.5000c5009612e4e3
MD: naa.5000c500961ac12f
.104:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545910
MD: naa.5000c500961ac97b
MD: naa.5000c500961b069f
MD: naa.5000c500961a9bb7
.98:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545911
MD: naa.5000c500960040ab
MD: naa.5000c50095fd3507
MD: naa.5000c50095fd12d3
.99:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545912
MD: naa.5000c500961b0a3b
MD: naa.5000c500961900f3
MD: naa.5000c50096190197
"Sub-Cluster Member Count: 1"
That cluster is partitioned and/or that specific node is isolated.
This should also be visible via:
Cluster > Monitor > vSAN > Health > Network
This should also validate whether the nodes have a vSAN-enabled vmk configured and/or whether ping between the nodes is failing.
In short, you need to troubleshoot why the nodes are not communicating with one another.
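If you want to dig into it from SSH, the usual checks are (vmk1 below is an assumption - substitute whichever vmknic has vSAN traffic tagged on your hosts):

# esxcli vsan network list
# esxcli vsan cluster unicastagent list
# vmkping -I vmk1 <vSAN IP of another node>

Since your output shows 'Unicast Mode Enabled: true', an empty unicastagent list on the hosts is a common cause of this kind of partition.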
Bob
"To be clear, I am not concerned about failures on this vSAN cluster, just want to setup something with highest capacity I can get with the available hardware."
In this case you should be doing this via SPBM, e.g. configuring and applying an FTT=0 Storage Policy to all VMs/Objects, with the clear understanding that a single disk failure will mean the data that was on it is gone.
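For what it's worth, the host-level default policy can also be checked/set from the CLI - a sketch only, as applying an FTT=0 policy via SPBM in the vCenter UI is the cleaner route (this form only changes the default applied to newly created Objects on that host):

# esxcli vsan policy getdefault
# esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i0))"
# esxcli vsan policy setdefault -c vmnamespace -p "((\"hostFailuresToTolerate\" i0))"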
As the capacity-tier disks are quite small, you might want to also lower the max component size and/or use a Storage Policy with Stripe-Width >1 so that you have smaller sub-components which will fit into smaller free spaces better (and thus be able to utilise more of the available space).
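For instance, the max component size is governed by an advanced setting on each host (default 255GB; per VMware guidance don't go below 180GB, and set it identically on every host in the cluster):

# esxcli system settings advanced set -o /VSAN/ClomMaxComponentSizeGB -i 180

Stripe-Width is just another rule in the Storage Policy itself ('Number of disk stripes per object' in the UI).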
First though, of course, get the cluster clustered!
Bob
Happy to help mamatadesai - being able to use all of the cluster's storage is a sure-fire way of increasing capacity.
Yes, vSAN/Skyline Health is a great troubleshooting/monitoring resource - a large proportion of the checks done there were added over time as a result of folks like myself in GSS exclaiming that:
1. we shouldn't have to manually check these (e.g. try checking ping and latency both ways between 64 nodes via SSH!) and
2. customers should be able to check these things themselves in a few clicks.
A couple of things to further utilise this in future:
It runs automatically every hour and will trigger a cluster-level icon change (e.g. if the cluster is partitioned it will show a red alarm icon), so you can have awareness just from the main vSphere inventory screen.
It can provide historical logging data for awareness of what the state of a cluster/test was, via the vSAN health summary logs stored in /var/log/vmware/vsan-health.
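On reasonably recent builds, the same health summary can also be pulled per-host from the CLI, e.g.:

# esxcli vsan health cluster list

which prints each health check alongside its current status (green/yellow/red).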
Bob