VMware Cloud Community
aungkyawthuymb
Contributor

Migrate existing hybrid cluster to separate all-flash and hybrid clusters

My environment is a vSAN hybrid cluster with a total of 6 hosts. Now, we are planning to split it into two clusters with three hosts each.

One cluster will be all-flash and the other will be hybrid. Could you give any advice on this migration? How and where should we start, and what procedure should we follow?

4 Replies
TheBobkin
Champion

Hello aungkyawthuymb,

The first step would be to thoroughly look at the existing data stored on the cluster and reduce the used space:

examples: consolidate any snapshots, delete unused test VMs, apply thin-provisioned Storage Policy to anything that is thick-provisioned where not necessary, move lower-utilized or larger VMs to non-vSAN storage.

The next step would be to start decommissioning hosts in the original cluster with the Full Data Evacuation option to push the data onto just the nodes that will remain in this cluster, then move the empty decommissioned nodes into the new cluster. For this to be possible you would really want to be aiming for ~40% vsanDatastore space used on the original 6-node cluster, as this data will have to be temporarily stored on 3 nodes, so utilization will be ~80% once three nodes have been evacuated.
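As a quick sanity check, the space math above can be sketched like this (a minimal example; the helper function and figures are illustrative and assume roughly even per-node capacity):

```python
# Rough sanity check: projected vsanDatastore utilization once the data
# from the whole cluster has to fit on only the remaining nodes.
# Assumes roughly even per-node capacity; figures are made-up examples.

def projected_utilization(used_fraction: float, total_nodes: int, remaining_nodes: int) -> float:
    """Datastore utilization once only `remaining_nodes` hold all the data."""
    return used_fraction * total_nodes / remaining_nodes

# 40% used across 6 nodes -> ~80% used once squeezed onto 3 nodes
print(projected_utilization(0.40, 6, 3))  # ~0.8
# Anything much above 40% to start with leaves no headroom on 3 nodes:
print(projected_utilization(0.50, 6, 3))  # ~1.0 -> would not fit safely
```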

Another option, if you have available slots in the nodes and existing disk-groups (DG) (e.g. only 5 of 7 capacity-tier slots used in the DGs), would be to fully evacuate a node, then remove the capacity-tier disks from it and add them to the DGs on the other nodes (provided the controllers support hot-adding disks). Note that you should aim to keep the storage space provided by each node as even as possible - e.g. it wouldn't be a good idea to remove a DG from a decommissioned node and add the whole DG to one other node, as this would result in an imbalanced cluster and thus the added space may not be fully usable.
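To illustrate why spreading the evacuated disks evenly matters (the per-node capacities in TB are made-up examples and the helper is just for illustration):

```python
# Sketch: how even the capacity contributed by each node stays after
# redistributing disks from an evacuated node. Figures are hypothetical.

def imbalance(node_capacities_tb: list) -> float:
    """Spread between largest and smallest node as a fraction of the largest."""
    return (max(node_capacities_tb) - min(node_capacities_tb)) / max(node_capacities_tb)

# Moving a whole evacuated DG onto one node skews the cluster badly:
print(imbalance([14.0, 7.0, 7.0]))   # 0.5 -> imbalanced, added space not fully usable
# Spreading the same disks across the remaining nodes keeps it even:
print(imbalance([9.3, 9.3, 9.4]))    # ~0.01 -> fine
```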

Another, less advisable option would be to evacuate the data from a single node, move it to the new cluster, add the All-Flash DG(s), and then use Storage vMotion to move some VMs to this new cluster as FTT=0 Objects, continuing to move data until the other nodes can be fully evacuated, moved over and their All-Flash DG(s) added. However, this is far from ideal, as the data would not be adequately protected from failure during the process (and I am also unsure of the current ability to temporarily set up 1-node clusters in vSAN 6.6 - it would likely require CLI configuration).

Hope this helps.

Bob

aungkyawthuymb
Contributor

Hello Bob,

Thanks for your great post. Could I ask about the migration option using the available slots in the nodes?

We have only one capacity-tier slot available (6 of 7 capacity slots are used in each DG). Let's say it is as follows.

Hosts 1, 2 and 3 will be All-Flash and 4, 5 and 6 will be Hybrid. So, we will remove the capacity-tier disks one by one from hosts 1, 2 and 3 and then add those disks to the disk groups of the other nodes. After that, no slots will be available on hosts 4, 5 and 6. So, how should we handle the rest of the capacity disks, or would it be enough to start decommissioning hosts from the original cluster?

mprazeres183
Enthusiast

Hi there,

Some of this might already be answered, but let me give you a quick overview of how we dealt with the exact same situation.

We have a cluster with 12 nodes, all on a hybrid vSAN infrastructure; all the hosts are using HBA mode on the Smart Array adapters (no RAID).

We had a total of 190 vGuests already deployed, with a total storage availability of 120TB and only 30TB used, so the change in live mode was quick and easy.


So first you need to figure out, since you are breaking your current vSAN configuration (you need to start with at least 1 host), whether the other 5 can handle the space without passing 80% utilization; when it passes 80%, vSAN starts resyncing components, and you do not want that, as it may cause performance issues.

If you can't meet these conditions, then use the command below on all hosts to increase the threshold at which resyncing starts.


Connect to each host, then use this command to get the current threshold: esxcfg-advcfg -g /VSAN/ClomRebalanceThreshold - you should get 80. Increase the percentage to a level that will be above the utilization you will reach after you put the host into Maintenance Mode; you can usually just use 99, as you only need this setting during the change. To change it, you just write: esxcfg-advcfg -s 99 /VSAN/ClomRebalanceThreshold
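For reference, here are those commands as a hypothetical ESXi shell session (the value of 99 is the suggestion above; the revert step at the end is my addition, since this setting should only be raised for the duration of the change):

```shell
# Run in the ESXi shell on each host (hypothetical session).
esxcfg-advcfg -g /VSAN/ClomRebalanceThreshold    # query the current value (default 80)
esxcfg-advcfg -s 99 /VSAN/ClomRebalanceThreshold # raise it for the duration of the change
esxcfg-advcfg -g /VSAN/ClomRebalanceThreshold    # confirm the new value
# Once the migration is complete, set it back to the default:
esxcfg-advcfg -s 80 /VSAN/ClomRebalanceThreshold
```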

OK, so now you have calculated the amount of space needed, and you have also raised the threshold at which vSAN starts its resyncing jobs.

What you want to do next is select the host with the least data to migrate. To figure that out, right-click each host, select Maintenance Mode, and check the first option, "Evacuate all data to other hosts" - be sure to pick the host with the least TB/GB, as this can take a while. Once you have checked all 6 hosts and have your candidate, put it into Maintenance Mode with that first option selected. If you have DRS active, it will move all your vGuests to other hosts, then sit at around 70% for a long time until all the data is migrated; you can monitor this by selecting the cluster, going to Monitor and checking Resyncing Components. Once this is finished, your host will enter Maintenance Mode.


Now, create a new Cluster, activate vSAN on it.

On the host you put into Maintenance Mode, delete the disk group and remove all the SAS disks, keeping only the SSDs. (If you do not have spare disks for this and you need another host from the 6 to redistribute disks, then you have to do the same as I described on a second host - provided the cluster tolerates the threshold without causing you space issues. Don't forget to delete the disk groups if you are taking a second host.)

Important before you do so: be sure that the size of the cache tier is at least 10% of the size of the capacity tier. So if you have 8TB in total of SSD capacity, you need a 960GB cache disk to meet the conditions!

Now, on the host where you deleted the disk group, create a new group. You do that by first adding the disks to the host; be sure to start left or right (actually it doesn't matter - it just depends on what Smart Array you are using, and if you use a Smart Array where each slot is connected individually, then you don't have to care). I just found that it's quite easy if you follow a certain pattern.

On our end, I use HP G9 servers with a total of 24 slots.

So I created 3 disk groups with a total of 8 disks per group - 1 cache with 7 capacity - and I added the cache to the first slot, then the next 7 as capacity, and so on.
If you want to be sure which disks end up in each group, don't add all the disks at the same time:

Add 1 cache and 7 capacity disks for the first group and create the group, then add the next 1 cache and 7 capacity and create the second group, then the next 1 cache and 7 capacity and create the third group. By doing this, the groups are always kept together.

Once you have done that, add the host to the vCenter, in the newly created cluster. Then live-migrate: just use the same vMotion vmkernel between both clusters (yes, cross-cluster migration in vSAN is possible), select Migrate, change both compute resource and storage, select the new host and the new vSAN datastore, and move the VM to the new cluster.

You have to do that so that you are able to continue with the next host, and so on.
Your goal will be to have a working all-flash vSAN cluster with a lot of the vGuests on it, so that you can build the two clusters from scratch.

Now, don't forget that I mentioned you need 2 hosts so that you can repartition the disks. That means you now have 1 host that isn't in any cluster, as you used its SSD disks for the host that is now online in the new cluster. On this one you need at least 1 SSD for the cache tier and the rest of the slots for the capacity HDDs (for one group).

Once you have created the disk group, take the second host that you will need for the hybrid cluster, create a third cluster in your vCenter, and add this host to it.

You have to do this because you will not be able to do it with the other running hosts that you had in your initial cluster! This initial hybrid cluster will disappear once you have finished building up the hybrid and all-flash solution.

Now, this can be a time-consuming task - it took me about 60 hours on my infrastructure (because of the resyncing jobs). You will have to wait a long time for the hosts to enter Maintenance Mode, but if you follow these rules, you will be running the new vSAN infrastructure in about 4 days (depending on your data amount).

Have fun doing it, and let me know if it worked out.

Check my blog, and if my answer resolved the issue, please provide feedback. Marco Frias - VMware is my World www.vmtn.blog
TheBobkin
Champion

Hello,

mprazeres183, thanks for your input - it's always good to hear from others who have gone through similar processes, and their tips.

However, I do feel the need to clear up a few things just so that anyone reading this in future is better informed:

"as when it passes over 80% they start to resyncing component"

- This is actually not a normal 'resync' as such (e.g. rebuilding missing/degraded components) but a rebalance operation that is triggered when any capacity-tier drive (not host) reaches 80% used capacity which aims to smooth out the per-disk storage utilization.

- While I would advise *temporarily* increasing the rebalance threshold, this should be used sparingly, and I wouldn't jump straight to 99% as this may result in active thin-provisioned Objects being unable to write to their data components (95% is safer, but this also depends on capacity-tier drive size). Also, this parameter can be configured much more easily from the Web Client, under the Advanced Settings tab on each host.

"Be sure that the Amount of the Cache Tier is at least 10% of the Amount of the Capacity Tier"

- The guidelines for Hybrid clusters state that cache-tier SSDs should be a minimum 10% of CONSUMED capacity - not raw capacity:
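To make the difference concrete, a minimal sketch (the 8TB raw / 3TB consumed figures are made-up examples):

```python
# Hybrid cache sizing sketch: the guideline is ~10% of CONSUMED capacity
# (anticipated used space before FTT overhead), not of raw capacity.
# All figures below are hypothetical.

def hybrid_cache_needed_gb(consumed_capacity_gb: float) -> float:
    """Minimum recommended cache-tier size: 10% of consumed capacity."""
    return 0.10 * consumed_capacity_gb

raw_tb = 8.0        # raw capacity tier per host (not what the rule uses)
consumed_tb = 3.0   # what you actually anticipate consuming
print(hybrid_cache_needed_gb(consumed_tb * 1024))  # ~307 GB, not 800+ GB from raw
```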

storagehub.vmware.com/export_to_pdf/vmware-r-virtual-san-tm-design-and-sizing-guide

- All-Flash cache to capacity ratio guidelines are a different story:

https://blogs.vmware.com/virtualblocks/2017/01/18/designing-vsan-disk-groups-cache-ratio-revisited/

aungkyawthuymb

"Host 1,2,3 will be All-Flash and 4,5,6 will be Hybrid. So, we will remove capacity tier disks one by one from host 1,2,3 then add those disk on disk group of other nodes."

Are there other empty drive-bays on the physical servers, or do they only have 8 slots max? If there are free slots then you could make second disk-groups using the decommissioned disks - but note that this wouldn't really be of benefit until 2 full nodes were evacuated and their disk-groups moved (unless you have a lot of FTT=0 data, which only requires 1 Fault Domain for component placement).

"After that slots are not available on 4,5,6. So, how should we do other rest of capacity disks or Would it be enough to start decommissioning from original cluster."

What is the current storage utilization and capacity of the cluster? (#df -h)

As per my previous comment, I would strongly advise taking some time with any admins who know this environment to work out what doesn't need to be on there or is no longer in use, with the aim of freeing up as much space as possible. You could also identify VMs which are not critical and/or can be recreated very easily (e.g. from templates/golden images) - these could potentially (temporarily) have an FTT=0 Storage Policy applied, which would cut their space utilization in half (assuming they are currently FTT=1); that means less data to move around, alongside cutting back on space used.
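The FTT=0 saving is simply the mirror count (a small sketch; the VM sizes are made-up examples):

```python
# Sketch: raw vSAN capacity consumed by a mirrored (RAID-1) Object.
# FTT=1 keeps 2 full replicas; FTT=0 keeps 1, so raw usage is halved.
# VM size below is a made-up example.

def raid1_footprint_gb(vm_size_gb: float, ftt: int) -> float:
    """Raw capacity consumed by a RAID-1 Object with the given FTT."""
    return vm_size_gb * (ftt + 1)

print(raid1_footprint_gb(100, 1))  # 200 GB consumed with FTT=1
print(raid1_footprint_gb(100, 0))  # 100 GB consumed with FTT=0
```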

Bob
