Hi everyone,
How does vSAN split disks into components and (re)balance them across the cluster?
Here are some examples (VSAN.ClomMaxComponentSizeGB=255, FTT=0, no striping, to keep it simple):
1. New 50 GB disk: no component split.
2. New 400 GB disk: it is split into 2 components. Are they 2*200 GB, or 1*255 GB plus 1*145 GB?
3. Increase the 400 GB disk to 600 GB. I understand a new 200 GB concatenation component is created for writes, and the existing components are adjusted and rebalanced. But depending on the answer to question 2, it's hard for me to believe that VMware distributes the component sizes equally, because that would mean every disk increase leads to new component sizes, and every component would be recreated and rebalanced. Is vSAN really that inefficient? (The sketch below shows the two strategies I mean.)
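To make the question concrete, here is a toy sketch of the two splitting strategies I am asking about (purely illustrative, not vSAN's actual CLOM logic):

```python
# Two hypothetical ways to split a disk into components, given the
# VSAN.ClomMaxComponentSizeGB = 255 limit. Illustrative only.
import math

MAX_GB = 255

def split_equal(size_gb):
    """Minimum number of components, all (nearly) the same size."""
    n = math.ceil(size_gb / MAX_GB)
    base, rem = divmod(size_gb, n)
    return [base + (1 if i < rem else 0) for i in range(n)]

def split_max_out(size_gb):
    """Fill components to 255 GB and keep one remainder component."""
    full, rest = divmod(size_gb, MAX_GB)
    return [MAX_GB] * full + ([rest] if rest else [])

print(split_equal(400))    # [200, 200]
print(split_max_out(400))  # [255, 145]
```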
Thanks
vSAN will create a second object, and it will not rebalance based on how the disk is filled up, as that would indeed mean a lot of data movement. It will leave the environment as is. Rebalancing will only happen when the physical disks are not balanced (proactive rebalance; more details here: https://kb.vmware.com/s/article/2149809).
vSAN may, by the way, try to merge the concat over time, but that is simply to simplify the object layout. Not something I would worry about; I have never seen or heard of anyone being bothered by it or complaining about it.
Hi Duncan,
thanks for your reply.
Let's leave the rebalance aside for a second, because I am with you that this operation happens from time to time to even out cluster usage.
Is there any info available on how the components are split and which sizes they get?
I actually have a customer who experiences heavy data movement every time a VM disk is increased.
As I understood it, a new 700 GB disk is split evenly into 3*233 GB components. A 30 GB increase would then result in a 30 GB concat disk plus three new 243 GB components, to which the data would be moved, visible as temporary transient storage. That sounds fairly inefficient. A 2*255 GB plus 1*190 GB split would make more sense, because the two "maxed out" 255 GB components wouldn't have to be adjusted, so less movement and less transient storage would be created. (A rough comparison of the two strategies is sketched below.)
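Here is a back-of-the-envelope comparison of how many components each strategy would have to touch on a 700 GB -> 730 GB grow (a rough model, not an exact description of CLOM):

```python
# Rough model: how many components change size when a 700 GB disk
# grows to 730 GB, under the two splitting strategies discussed.
import math

MAX_GB = 255  # VSAN.ClomMaxComponentSizeGB

def split_equal(size_gb):
    n = math.ceil(size_gb / MAX_GB)
    base, rem = divmod(size_gb, n)
    return [base + (1 if i < rem else 0) for i in range(n)]

def split_max_out(size_gb):
    full, rest = divmod(size_gb, MAX_GB)
    return [MAX_GB] * full + ([rest] if rest else [])

for split in (split_equal, split_max_out):
    old, new = split(700), split(730)
    touched = sum(a != b for a, b in zip(old, new)) + abs(len(new) - len(old))
    print(f"{split.__name__}: {old} -> {new}, components touched: {touched}")
```

With equal splitting all three components change size; with the "max out" strategy only the remainder component does.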
Which version do they have? I agree, a full resync doesn't make sense at that point, and I thought we fixed this actually, but I may be mistaken.
We have ESXi 6.7 EP 23, build 19195723, which includes VMware_bootbank_vsan_6.7.0-3.163.19184739; vCenter is 6.7 U3q, build 19299595. The default storage policy is FTT=1, no striping, no space reservation. VSAN.ClomMaxComponentSizeGB is at its default of 255 GB, and the physical disks are way larger (4 TB).
We don't witness this behaviour on every VM, but particularly on large ones with big single disks that are already split into many components. It appears that an increase of such a disk leads to a redistribution of all its components and to twice the disk size in transient storage. That's the reason I wanted to understand the size of the splits: equally sized components would mean adjusting every component's size each time a disk larger than 255 GB is increased.
Thanks for taking the time.
@depping Based on an internal training for vSAN 7.0 U1, we do indeed re-create all components, split by the new total size, after a disk increase. I can share the link internally if you're curious.
@tim-may Might be a weird question, but did your customer recently open a vSAN-related SR for this topic?
Just reading our internal Confluence pages, and it seems we do indeed do a rebuild to reduce the number of components when disks grow beyond 2 TB in certain configurations. I am not sure anything can be done about it.
@pkvmw: There is an SR, 22329617005. The engineer made an example with a 700 GB disk, but if I understood Duncan correctly, this applies to disks larger than 2 TB. That's the reason I asked, because it seems very inefficient to me to recreate the components every time an increase occurs. Why not "max out" the components at 255 GB during creation, to avoid a whole redistribution every time, and keep one component of <255 GB to "fill up" on the next increase? Also, since everything is thin provisioned, an increase that adds virtually 0 GB of data still causes this redistribution.
@depping I am not sure how this reduces the number of components.
For example: I create a 2 TB disk. Since this can't be split into 255 GB parts without some GBs remaining, it is split into 8*250 GB components. If I now increase it by 8 GB, every component is recreated to end up at 8*251 GB, resulting in 2 TB of transient storage. So it seems to be independent of whether the number of components grows with the disk increase or not; it is never a consolidation operation. (See the arithmetic check below.)
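A quick sanity check of that arithmetic, assuming the equal-split model described above (illustrative only):

```python
# Equal-split model applied to the 2 TB (2000 GB) example: growing by
# 8 GB keeps the component count at 8 but changes every component size.
import math

MAX_GB = 255

def split_equal(size_gb):
    n = math.ceil(size_gb / MAX_GB)
    base, rem = divmod(size_gb, n)
    return [base + (1 if i < rem else 0) for i in range(n)]

before, after = split_equal(2000), split_equal(2008)
print(before)  # eight 250 GB components
print(after)   # eight 251 GB components
print("components rewritten:", sum(a != b for a, b in zip(before, after)))  # 8
```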
But if this is expected behaviour, then I thank you very much for replying; this was fun!
Maybe this can be considered as a future improvement to the product.
Yes @pkvmw, please do, it will be good to read/view!
Hold on, those are 7.0 U1 enhancements; with 6.7 U3 you shouldn't see them, I guess, unless they were backported to a particular patch. Hmmm.
I did some tests on my 6.7 environment.
I created a new 1000 GB disk on a VM. It was split into 4 components. I filled the disk with approx. 320 GB of data.
Then I increased the disk to 1020 GB (which would be 4*255 GB, i.e. the max component size). Nothing happened: the component count stayed at 4, no resync. I increased the disk again to 1021 GB, to exceed that and force the creation of a new component. Now a concatenation disk appears, and you can see the resync (Cluster > Monitor > vSAN > Resyncing objects) with twice the size of the 320 GB of used data. Also, under Cluster > Monitor > vSAN > Capacity, the transient storage is visible under System usage. After the sync has finished, the concat disk is removed and 5 components are visible. I repeated this a few times, also with disk sizes above 2000 GB, which showed the same behaviour.
Conclusion: disk size does not matter; a resync only happens when a new component has to be created, not on every increase by default, which is what I was worried about. But when a resync does happen, it affects all components and all the used data on the disk, even if the (thin) increase does not add any data. (A minimal model of this trigger is sketched below.)
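A minimal model of the trigger observed in these tests (an assumption drawn from the test results, not from vSAN documentation):

```python
# Model of the observed behaviour: a resync is only triggered when the
# grown size no longer fits into the existing number of components.
import math

MAX_GB = 255

def needs_resync(old_size_gb, new_size_gb):
    old_count = math.ceil(old_size_gb / MAX_GB)
    return new_size_gb > old_count * MAX_GB  # an extra component is forced

print(needs_resync(1000, 1020))  # False: 1020 GB still fits in 4 * 255 GB
print(needs_resync(1000, 1021))  # True: a fifth component is needed -> resync
```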
