VMware Cloud Community
Thouvou
Contributor

vSAN node restart for disk expansion

Hello all,

We have a 3-node vSAN 6.0 cluster with 2 disk groups on each node (1 SSD + 7 HDDs per group). We have purchased and added some extra disks to each node in order to create a new disk group and expand the overall vSAN capacity. The problem is that the new disks are not detected via an HBA rescan, so we assume a node restart is required (the H730P controller or its firmware probably does not allow disk detection on hot-add).
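
For reference, the rescan we tried is essentially the following (a rough pyVmomi sketch with a placeholder vCenter address and credentials; we normally just use the rescan action in the vSphere client):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details - replace with your own vCenter and credentials.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    storage = host.configManager.storageSystem
    storage.RescanAllHba()          # equivalent to "Rescan Storage Adapters" in the client
    storage.RefreshStorageSystem()  # refresh the host's storage information
    print(host.name, "rescanned")

Disconnect(si)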

Our concern is about the free vSAN space: of the total 45TB of capacity, only 12.5TB is free. Our default vSAN policy has FTT=1. Would there be data loss when we place a vSAN node in maintenance mode (Ensure accessibility option), given that the free space is less than the 30% of total capacity recommended in the vSAN best practices? Could we avoid data loss by setting FTT=0 on all VMs, in order to "erase" the replicas and consume less space, which would then raise the free percentage above 30%?
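
For reference, these are the rough numbers we are looking at (simple Python; it assumes all objects are RAID-1 with FTT=1, i.e. raw usage is double the VM data):

# Back-of-the-envelope space math, assuming everything is RAID-1 FTT=1.
total_tb = 45.0
free_tb = 12.5
used_tb = total_tb - free_tb                               # 32.5 TB raw used

print("free %:", round(free_tb / total_tb * 100, 1))       # ~27.8%, below the 30% guideline
print("30% threshold (TB):", round(total_tb * 0.30, 1))    # 13.5 TB

# Dropping to FTT=0 would remove the RAID-1 replicas, roughly halving raw usage:
used_ftt0_tb = used_tb / 2
print("estimated free at FTT=0 (TB):", total_tb - used_ftt0_tb)   # 28.75 TB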

Thank you very much for your time

3 Replies
TheBobkin
Champion

Hello Thouvou,

"Would there be data loss when we will place in maintenance mode a vsan node (Ensure accessibility option), due to the small free space which is less that 30% of the total, as stated in vSAN best practices?"

When you place a node in MM with the EA option, you are essentially pausing updates to the data components residing on that node's disks - only a single data replica is kept up to date until the node is taken out of MM and the data it missed while it was away is resynced. If a physical disk fails while you are running on a single up-to-date copy of the data, then yes, you will lose that data. That is why it is best practice to have current back-ups before doing this and, if that is not possible, to keep the data cold (if no components are being updated, they cannot go out of sync, so the data effectively stays protected at FTT=1). The latter would of course require a maintenance window with all VMs powered off, which is why most people go with the former option.
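
If it helps, entering MM with the EA option can also be scripted - roughly like this with pyVmomi (a sketch, not a tested script; it assumes "host" is a vim.HostSystem object you have already looked up, e.g. from a container view):

from pyVmomi import vim
from pyVim.task import WaitForTask

# Enter maintenance mode with the vSAN "Ensure accessibility" decommission mode.
spec = vim.host.MaintenanceSpec(
    vsanMode=vim.vsan.host.DecommissionMode(objectAction="ensureObjectAccessibility")
)
task = host.EnterMaintenanceMode_Task(timeout=0, evacuatePoweredOffVms=False,
                                      maintenanceSpec=spec)
WaitForTask(task)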

"Could we avoid data loss by setting FTT=0 on all VMs, in order to "erase" the replicas and consume less space which would then lead to increasing the free percentage (more than 30%)?"

No, that wouldn't help, as you would be in the same situation of having only one data replica, which could of course be affected by the same scenario described above. Since the data is currently stored as RAID-1 with FTT=1, there would be no available Fault Domain (a node, in this case) to rebuild the Absent components when you put a host in MM, so no resync will occur regardless of how long the node stays in MM. With one node not contributing storage, the capacity of the cluster will be 2/3 of its current size, with roughly 2/3 of the current used and free space, so the proportions stay about the same.

Bob

Thouvou
Contributor

Hello TheBobkin,

Thank you very much for the explanation. I am mainly worried about the 12.5TB of free space (<30%) and, more specifically, about what happens during MM when some of the host's data and metadata are evacuated to the remaining nodes: is there a chance that the free capacity won't be adequate and not all data will be evacuated? We are thinking of shutting down all the affected VMs prior to entering MM, as we have a long enough maintenance window.

Thank you again

TheBobkin
Champion

As I said above, the used and free space should stay approximately proportional, e.g. you will have ~8.2TB of free space after placing one node in MM with the EA option in a 3-node cluster. There should also be no resync or evacuation of data unless you have some FTT=0 data in the cluster: FTT=1 Objects require 3 usable Fault Domains for component placement, and only 2 are available with one node in MM or rebooting.
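
Rough math (simple Python, assuming components are spread evenly across the 3 nodes, so about a third of both capacity and used space goes away with the host):

# Estimate of cluster capacity/free space with one of three nodes in MM.
total_tb = 45.0
free_tb = 12.5

capacity_in_mm = total_tb * 2 / 3   # ~30 TB still contributing
free_in_mm = free_tb * 2 / 3        # ~8.3 TB free, in the same ballpark as the ~8.2 TB above
print(round(capacity_in_mm, 1), round(free_in_mm, 1))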

"We are thinking to shutdown all the affected VMs as we have enough time window, prior to entering MM."

Note that merely shutting down the VMs running on one node won't make the data on any node cold - you would need to shut down all VMs in the cluster for the duration of the maintenance window, which is why most just take current back-ups and then perform the maintenance with rolling MM with EA.
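
If you do go the cold-data route, a rough pyVmomi loop like the one below would shut the guests down cleanly (a sketch only; it assumes "si" is an existing connection as in the rescan snippet above, and it walks the whole inventory - narrow the container view to your vSAN cluster if this vCenter manages more than that):

from pyVmomi import vim

# Cleanly shut down every powered-on VM visible in the inventory (needs VMware Tools in the guest).
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
for vm in view.view:
    if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
        print("shutting down", vm.name)
        vm.ShutdownGuest()   # use PowerOffVM_Task() instead if Tools is not running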

Bob