Hello,
vSAN n00b here (rather, using it again after many years). I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node. The vSAN datastore capacity is 838GB. What can I change to increase the capacity of my vSAN datastore? Right now, it looks like every node will have a copy of each object?
Thanks!
"Sub-Cluster Member Count: 1"
That cluster is partitioned and/or that specific node is isolated.
This should also be visible via:
Cluster > Monitor > vSAN > Health > Network
This should also validate whether the nodes have a vSAN-enabled vmk configured and/or whether ping between the nodes is failing.
In short, you need to troubleshoot why the nodes are not communicating with one another.
Bob
You can have a maximum of 5 disk groups per host, and each disk group can contain a maximum of 7 capacity disks. Secondly, you can also look at the current FTT policy and change it to a capacity-focused policy such as RAID-5 or RAID-6 erasure coding.
So you have plenty of options to increase the capacity. Each of these options has an impact on availability and performance, so it needs to be weighed against the requirements of the applications running in your VMs.
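As a rough illustration (figures assume the ~838GB per node x 4 nodes = ~3.35TB raw described above): RAID-1 with FTT=1 writes two full replicas, so ~1.67TB is usable; RAID-5 erasure coding (3 data + 1 parity) carries a 1.33x overhead, so ~2.5TB; RAID-6 (4 data + 2 parity) carries 1.5x, so ~2.23TB - though note RAID-6 requires a minimum of 6 hosts, so it is not an option on a 4-node cluster.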
Unfortunately you can't use RAID-5 when you have HDDs as capacity devices - erasure coding requires an all-flash configuration, so RAID-1 is your only option. That means you either need to add disks or add disk groups to the hosts to increase the capacity. It is a strange configuration though, very small capacity devices!
Hello mamatadesai
"I have a 4-node cluster with 838GB capacity storage and 447GB flash storage from each node. The vSAN datastore capacity is 838GB. "
If each node has a disk group providing 838GB capacity (e.g. the total size of all capacity-tier devices on each node), then the vsanDatastore should be ~3.35TB, NOT the size of a single node's storage as you indicated. If it is showing as only 838GB (a simple check with df -h via SSH should suffice), then you either have a partitioned cluster/isolated nodes, issues with the storage on 3 of the 4 nodes, some nodes in a decommissioned state, or a disk group that is not configured on every node.
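For example, a quick SSH check (the figures below are illustrative, not from your cluster):

# df -h | grep -i vsan
vsan        3.3T   1.1T   2.2T  33% /vmfs/volumes/vsanDatastore

If the Size column reports ~838G rather than ~3.3T, only one node's storage is contributing to the datastore.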
To start, if you could share/PM the output of the following:
On any one node:
# esxcli vsan cluster get
# cmmds-tool find -t NODE_DECOM_STATE
From all nodes:
# vdq -Hi
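For reference, on a healthy 4-node cluster the first command should report something like (UUIDs here are placeholders):

Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: <node1-uuid>, <node2-uuid>, <node3-uuid>, <node4-uuid>

Anything less than a Member Count matching the node count means the cluster is partitioned.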
Just an FYI - each node is by default a 'Fault Domain' when it comes to component placement (e.g. vSAN won't place both data replicas of an FTT=1 Object on the same node) - explicitly configuring 4 Fault Domains as you have there is pointless as it offers no benefit.
Bob
TheBobkin
Thanks for your response. Please see the screenshots posted above.
Yes, I was just playing around to see if having 1 FD vs 4 FDs changes capacity. I was expecting an easy UI way to toggle RAID levels or the equivalent of FTT. Having all 4 hosts in one FD vs having each host in its own FD makes no difference to capacity. To be clear, I am not concerned about failures on this vSAN cluster; I just want to set up something with the highest capacity I can get from the available hardware.
Also, here are the outputs of the commands on all 4 hosts.
.103:
[root@localhost:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-07-14T16:31:04Z
Local Node UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5ef4938a-643d-b9d8-8f68-222d1450010c
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 522a17e6-6d8b-a93e-989c-444ab595b593
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 5ef4938a-643d-b9d8-8f68-222d1450010c
Sub-Cluster Member HostNames: localhost
Sub-Cluster Membership UUID: 1b20fd5e-6a74-9904-2477-222d1450010c
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: 569b40c7-4cd8-448a-86e2-d9c5355dbb0f 2 2020-07-01T23:45:34.902
[root@localhost:~] cmmds-tool find -t NODE_DECOM_STATE
owner=5ef4938a-643d-b9d8-8f68-222d1450010c(Health: Healthy) uuid=5ef4938a-643d-b9d8-8f68-222d1450010c type=NODE_DECOM_STATE rev=0 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0 "" i0 i0 l0 l0)], errorStr=(null)
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545913
MD: naa.5000c5009611ebaf
MD: naa.5000c5009612e4e3
MD: naa.5000c500961ac12f
.104:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545910
MD: naa.5000c500961ac97b
MD: naa.5000c500961b069f
MD: naa.5000c500961a9bb7
.98:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545911
MD: naa.5000c500960040ab
MD: naa.5000c50095fd3507
MD: naa.5000c50095fd12d3
.99:
[root@localhost:~] vdq -Hi
Mappings:
DiskMapping[0]:
SSD: naa.5001438031545912
MD: naa.5000c500961b0a3b
MD: naa.5000c500961900f3
MD: naa.5000c50096190197
"Sub-Cluster Member Count: 1"
That cluster is partitioned and/or that specific node is isolated.
This should also be visible via:
Cluster > Monitor > vSAN > Health > Network
This should also validate whether the nodes have a vSAN-enabled vmk configured and/or whether ping between the nodes is failing.
In short, you need to troubleshoot why the nodes are not communicating with one another.
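If you want to dig into it from SSH, the usual checks are (vmk1 below is an assumption - substitute whichever vmknic has vSAN traffic tagged on your hosts):

# esxcli vsan network list
# esxcli vsan cluster unicastagent list
# vmkping -I vmk1 <vSAN IP of another node>

Since your output shows 'Unicast Mode Enabled: true', an empty unicastagent list on the hosts is a common cause of this kind of partition.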
Bob
"To be clear, I am not concerned about failures on this vSAN cluster, just want to setup something with highest capacity I can get with the available hardware."
In this case you should be doing this via SPBM, e.g. configuring and applying an FTT=0 Storage Policy to all VMs/Objects, with the clear understanding that a single disk failure will mean the data that was on it is gone.
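For what it's worth, the host-level default policy can also be checked/set from the CLI - a sketch only, as applying an FTT=0 policy via SPBM in the vCenter UI is the cleaner route (this form only changes the default applied to newly created Objects on that host):

# esxcli vsan policy getdefault
# esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i0))"
# esxcli vsan policy setdefault -c vmnamespace -p "((\"hostFailuresToTolerate\" i0))"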
As the capacity-tier disks are quite small, you might want to also lower the max component size and/or use a Storage Policy with Stripe-Width >1 so that you have smaller sub-components which will fit into smaller free spaces better (and thus be able to utilise more of the available space).
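For instance, the max component size is governed by an advanced setting on each host (default 255GB; per VMware guidance don't go below 180GB, and set it identically on every host in the cluster):

# esxcli system settings advanced set -o /VSAN/ClomMaxComponentSizeGB -i 180

Stripe-Width is just another rule in the Storage Policy itself ('Number of disk stripes per object' in the UI).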
First though, of course, get the cluster clustered!
Bob
Happy to help mamatadesai - being able to use all of the cluster's storage is a sure-fire way of increasing capacity.
Yes, vSAN/Skyline Health is a great troubleshooting/monitoring resource - a large proportion of the checks done there were added over time as a result of folks like myself in GSS exclaiming that:
1. we shouldn't have to manually check these (e.g. try checking ping and latency both ways between 64 nodes via SSH!) and
2. customers should be able to check these things themselves in a few clicks.
A couple of things to further utilise this in future:
It runs automatically every hour and will trigger a cluster-level icon change (e.g. if the cluster is partitioned it will show a red alarm icon), so you can have awareness just from the main vSphere inventory screen.
It can provide historical logging data for awareness of what the state of a cluster/test was, via the vSAN health summary logs stored in /var/log/vmware/vsan-health.
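On reasonably recent builds, the same health summary can also be pulled per-host from the CLI, e.g.:

# esxcli vsan health cluster list

which prints each health check alongside its current status (green/yellow/red).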
Bob