I'm trying to remove a failed SSD (to replace it short term - we're in a migration to AWS that will complete in ~6 weeks).
The web client won't let me remove the disk group - it shows an error as attached.
If I put the host in maintenance mode and try with the CLI, I get the error below:
esxcli vsan storage remove -u 52317d1d-b65c-d898-307f-d38adf7ec1a3
Unable to remove device: Failed to write partition
In the vSAN config the status of the group is showing as attached. The SSD failed (I/O errors) and the data was rebuilt; there is currently nothing using these disks.
How can I remove the disk to replace it with a new SSD?
Thanks
Alain
Hello Alain,
Likely the Cache-Tier SSD was dropped and came back, but in a read-only state. This can be confirmed from the vmkernel.log, potentially via messages relating to NMP failing to read the capacity of the device.
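As a rough sketch of that log check: the snippet below pulls the most recent vmkernel.log lines that mention the device. The device identifier is a hypothetical placeholder (the real name can be read from `esxcli vsan storage list`), the `device_log_lines` helper is made up for illustration, and the log path assumes the default ESXi location.

```shell
# Hypothetical device identifier; substitute the failed SSD's real name.
DEVICE="naa.xxxxxxxxxxxxxxxx"
# Assumed default ESXi log location; override by setting LOG.
LOG="${LOG:-/var/log/vmkernel.log}"

# Print the last 20 log lines that mention the given device (fixed-string match).
device_log_lines() {
    grep -F -- "$1" "$2" | tail -n 20
}

# Only search if the log file is readable on this machine.
if [ -r "$LOG" ]; then
    device_log_lines "$DEVICE" "$LOG"
fi
```

It only filters by device name; which messages actually show up depends on how the device failed.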
If this is the only Disk-Group on the host then I would suggest placing it in Maintenance Mode (Ensure Accessibility option), rebooting it and seeing if it can then manage the device normally to remove the partitions. If it cannot, then I would advise (with the host still in MM) removing and replacing the SSD from the server (or removing it via the out-of-band management console/BIOS if possible). After that, the leftover references to the Disk-Group (and thus the remaining partitions on the Capacity-Tier devices) should be removable via the Flash Client with 'No Action', or via the CLI with the same command you used before but with -m noAction.
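The CLI side of those steps could look something like the sketch below. It is a dry run by default: the hypothetical `run` helper only echoes each command unless `DRY_RUN=0` is set. The UUID is the one from the original post; the `--vsanmode` option on `esxcli system maintenanceMode set` is assumed to be available on the ESXi version in use, so check `--help` on the host first.

```shell
# Disk-group UUID taken from the command earlier in this thread.
UUID="52317d1d-b65c-d898-307f-d38adf7ec1a3"

# Echo each command by default; set DRY_RUN=0 on the host to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Enter Maintenance Mode with the Ensure Accessibility option.
run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility

# 2. After the reboot (or SSD swap), list what vSAN still claims on this host.
run esxcli vsan storage list

# 3. Remove the disk group without evacuating data.
run esxcli vsan storage remove -u "$UUID" -m noAction
```

The reboot itself is left out on purpose, since it is better done by hand once the host has finished entering MM.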
Bob
Thanks Bob. It's not the only disk group on the server, so I'm assuming a reboot is still OK with the Ensure Accessibility MM option? I'll try that first and see if I can remove the group after the reboot completes.
Hello Alain,
Ensure you have a good recent backup before entering MM with the EA option, as some portion of your data is essentially configured at FTT minus 1 until that host is back, out of MM, and the delta data has resynced.
Bob
Thanks Bob, this worked - shutting down the server and restarting allowed me to remove the disk group.