VMware Cloud Community
billdossett
Hot Shot

Destroy vSAN cluster

Merry Christmas vSAN peeps!

I need to destroy my whole vSAN cluster, currently on 6.7. I did this once before a couple of years back and I remember having a hell of a time trying to get the disks back because I didn't do something first, I think... can't remember what. I've found plenty of articles on deleting the vSAN itself, but my vCenter appliance is on the vSAN; this is my lab: 4 nodes, each with 1 x cache SSD and 2 x capacity SSDs. I am going to reload the nodes with ESXi 7 and then use VCF to rebuild, but I seem to remember the disks being marked as vSAN last time and I couldn't reuse them - possibly I even had to boot a third-party utility and blank them or something... Can anyone tell me how to tear this thing down in the neatest and shortest time possible, please? Thank you!

Bill Dossett

Accepted Solutions
TheBobkin
Champion

@billdossett, A Merry Christmas wished to you and yours also!
If you are planning on properly blanking it (and are redeploying a new vCSA and want NO data left on it), then, provided the cluster is 'healthy', you should be able to just delete all Disk-Groups via:
Cluster > vSAN > Configure > Disk Management > Select Disk Group > Delete button > Option 'No Action'

(Note to anyone reading this in future - this DELETES ALL DATA PERMANENTLY SO ONLY DO THIS IF THIS IS WHAT YOU INTEND)
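For reference, the same Disk-Group removal can be done per host from the ESXi shell if the vSphere UI is not an option - a rough sketch only, assuming the cache-tier device name is taken from the list output (and 'noAction' again means the data is simply destroyed):
# esxcli vsan storage list
# esxcli vsan storage remove -s <cache-device-name> -m noAction
Removing the cache-tier device tears down that entire Disk-Group.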


The only time removing DGs in this manner *might* fail from the UI is if there is some other problem with the DG/disks (e.g. they are in a problem state and read-only from an ESXi perspective).
If you ever cannot claim disks for use by vSAN, it is because:
1. They still have (any) partitions on them.
2. They are, for whatever reason, not marked as 'local' (maybe they are not locally attached).
3. They are not marked as SSD when they are SSDs and being used as such (e.g. ESXi/firmware detection, IIRC).
(A quick CLI check for points 2. and 3. is sketched below.)
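One quick way to check points 2. and 3. from the ESXi shell (a sketch only; <naa.###> is a placeholder for your own device identifier) - the output of the following includes 'Is Local:' and 'Is SSD:' fields:
# esxcli storage core device list -d <naa.###>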

 

Point 1. above is the most likely cause in 90%+ of cases, and whether the device(s) have existing partitions can be validated in several ways (see also the partedUtil check sketched after this list):
- In the vSphere or Host UI: Host > Configure > Storage Devices > Select Device > Partition Details
- Via CLI on the ESXi host: # vdq -q - this shows each disk as 'Eligible' or 'Ineligible' for vSAN based on this factor.
- Via CLI on the ESXi host: # ls /dev/disks/ - disks with partitions show additional entries such as naa.xxxxxxxxxxxxxxxx:1 and naa.xxxxxxxxxxxxxxxx:2 (vSAN-formatted disks always have 2 partitions).
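For a read-only look at the partition table itself (same <naa.###> placeholder convention as in the commands below), partedUtil can also be used:
# partedUtil getptbl /vmfs/devices/disks/<naa.###>
A vSAN-claimed device typically reports a gpt label followed by two partition lines; a blank device reports only the label and geometry.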
If they still have partitions on them and the UI delete is not reporting a stateful reason (e.g. device is read-only), then running the following should either wipe them or surface the underlying error (such states are generally remediated by a host reboot):
dd if=/dev/zero of=/vmfs/devices/disks/<naa.###> bs=1M count=50 conv=notrunc
partedUtil mklabel /vmfs/devices/disks/<naa.###> msdos
AGAIN (sorry if this seems shouty but it is necessary 🤠 ), THE ABOVE ARE INTENTIONALLY DESTRUCTIVE COMMANDS. IF YOU STUMBLED ACROSS THIS POST AND YOU DON'T WANT TO DELETE YOUR DATA, DO NOT RUN THEM.
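As a sanity check afterwards, the same checks from above should come back clean:
# vdq -q
# ls /dev/disks/
The device should now report as eligible for use by vSAN, and its :1 and :2 partition entries should be gone.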


6 Replies

billdossett
Hot Shot

Seems to be working - slightly different path in my version of the UI: Hosts -> Cluster -> Configure tab, scroll down to vSAN and expand it, then Disk Management, and the disk groups are there. 2 out of 4 are gone. I wasn't sure about all of that as the vCSA is on the vSAN, but I guess the important bits are in memory, so it continues to work even though I've pulled the rug out from underneath it. Thanks so much for this - I'm on a tight schedule to redeploy with VCF, NSX-T and vRealize, as we're taking all of that on board in the form of a very large VxRail in 2021 and I want to be the SME on the project!

Bill Dossett
billdossett
Hot Shot

May have spoken a bit too soon - the 3rd host's disk group got to 100% but didn't say it finished, and the 4th disk group is stuck on the evacuation page.

So it would seem the vCSA doesn't appreciate you deleting the disk groups that it is located on?

Well, in any case, two hosts should be good, maybe three, and worst case I can just boot up a disk utility and blank the remaining disks with that.

Just wanted to add that in case anyone else wonders what might happen.

Bill Dossett
billdossett
Hot Shot

Still having issues with the last host... I will probably just take the disks out, put them in my cradle and do something to them tomorrow.

My disks list as:

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96F5D________

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96F5D________:1

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96F5D________:2

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96FD3________

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96FD3________:1

t10.ATA_____CT2000MX500SSD1_________________________2023E2A96FD3________:2

t10.ATA_____Samsung_SSD_850_EVO_250GB_______________S2R5NB0J349578A_____

t10.ATA_____Samsung_SSD_850_EVO_250GB_______________S2R5NB0J349578A_____:1

t10.ATA_____Samsung_SSD_850_EVO_250GB_______________S2R5NB0J349578A_____:2

Guessing that's the disk, and the :1 and :2 entries are its partitions.

Anything I do with dd says "can't open, function not implemented",

and partedUtil with the disk name says it can't stat it (I also tried addressing the partitions themselves, :1 and :2, with the same result),

but vdq says they are in use by vSAN, and after a reboot it's the same. It would be nice to know how to completely destroy a cluster like this, as I expect I am going to be building it quite a few times with VCF - it's my first go with VCF and I seriously doubt, nay, actually hope it doesn't work the first time, so I get some experience at building clusters with VCF! Thanks again.

Bill Dossett
TheBobkin
Champion

@billdossett so maybe I should have been a bit clearer, and apologies, as I take it for granted that one should not delete or impair the vCSA one is working from (yes, it is still always an advantage, if possible, to run this from elsewhere, or even to migrate the vCSA to a temporary VMFS carved from a single repurposed capacity-tier disk, etc.). The vCSA is not really 'running from memory' once you delete the data-components it is running from (how could it be, if it cannot write to disk?).

So yes, it would appear that you have 'cut the branch you are standing on'!


Nonetheless, dd or removing the partitions from the UI should still work - if you could specify what exactly you are trying for this, it might help. Note that you should not be specifying the partitions themselves but the devices, e.g.:
# dd if=/dev/zero of=/vmfs/devices/disks/t10.ATA_____CT2000MX500SSD1_________________________2023E2A96F5D________ bs=1M count=50 conv=notrunc
# partedUtil mklabel /vmfs/devices/disks/t10.ATA_____CT2000MX500SSD1_________________________2023E2A96F5D________ msdos
But that being said, it is maybe easier done from the host UI in the browser.
You can also likely do it with:
# esxcli vsan storage remove -u <DISKUUIDHERE> -m noAction
With the UUIDs retrievable via:
# esxcli vsan storage list
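If the goal is to clear every vSAN-claimed disk on a host in one go, a rough one-liner along these lines should also work from the ESXi shell - a sketch only: it assumes the 'VSAN UUID:' field name in the list output, is just as destructive as everything above, and may print harmless 'not found' errors for capacity disks whose Disk-Group was already torn down via its cache disk:
# for uuid in $(esxcli vsan storage list | awk '/VSAN UUID:/ {print $3}'); do esxcli vsan storage remove -u "$uuid" -m noAction; done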

billdossett
Hot Shot

Thanks - I thought I had said that my vCenter appliance was running on the cluster; that was why I came here for help, as I found several articles on Google saying to remove the disk groups and turn off HA. But then you said to do it and... no harm done, as I am redeploying the vCSA anyway.

I'm not exactly keen on running the appliance on the single datastore, but this is a lab, so I am limited in other storage - though I could probably add another local disk in at least one of my hosts as a safe haven to put things on if I need to, like this. The funny thing is that Dell/EMC set my production VxRail up like this: no extra datastores to put vCenter or anything else on, as the local disks are so small. I've never been completely comfortable with that, but three years later it's been OK.

So, back to the problem: I was unable to remove them in the UI by either clearing or editing and deleting - the error message was "Failed - Cannot change the host configuration" - but esxcli to the rescue! I was able to delete (or clear, or whatever that does) all three disks, and vdq -q now reports they are all eligible for vSAN use. So in my notes I am just going to put the esxcli commands to remove the disks on all the hosts, plus vdq to make sure that all is good before rebuilding. That seems the simplest and most reliable way to do it, at least until I get around to putting a local storage device in one of the hosts.

Thanks again for your help on Christmas Eve! Bonus points for that - now I can get up in the morning and build myself a Christmas present: a new vSAN cluster! lol. Christmas is pretty much a non-starter here this year; it's all about the food and having a nap afterwards :-). All the best! Bill

 

Bill Dossett