VMware Cloud Community
alainrussell
Enthusiast

Cannot remove failed SSD

I'm trying to remove a failed SSD so I can replace it as a short-term fix (we're in a migration to AWS that will complete in ~6 weeks).

The web client won't let me remove the group - it shows an error as attached.

If I put the host in maintenance mode and try with the CLI I get the error below

esxcli vsan storage remove -u 52317d1d-b65c-d898-307f-d38adf7ec1a3

Unable to remove device: Failed to write partition

In the vSAN config the status of the group is showing as attached. The SSD failed (IO errors) and the data was rebuilt, so there is currently nothing using these disks.

How can I remove the disk to replace it with a new SSD?

Thanks

Alain


Accepted Solutions
TheBobkin
Champion

Hello Alain,

Likely the Cache-Tier SSD was dropped and came back, but in a read-only state. This can be confirmed from vmkernel.log, which may show messages such as NMP failing to read the capacity of the device.
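
One way to look for such messages from the ESXi shell is to filter the log for the device identifier. This is a minimal sketch, assuming the usual log location /var/log/vmkernel.log. The function name and the device ID in the example are hypothetical placeholders, and the keyword list is only a guess at what the relevant messages might contain.

```shell
#!/bin/sh
# Print log lines that mention a given device and look like failures.
# Usage: find_device_errors LOGFILE DEVICE_ID
find_device_errors() {
    logfile="$1"
    device="$2"
    # Keep lines naming the device (fixed-string match), then keep only those
    # containing failure-like words. The keyword list is an assumption, not an
    # exhaustive set of NMP/vSAN messages.
    grep -F -- "$device" "$logfile" | grep -i -E 'fail|error|read-only|read only'
}

# Example (hypothetical device ID; substitute the SSD's real identifier):
# find_device_errors /var/log/vmkernel.log naa.xxxxxxxxxxxxxxxx
```

If this prints nothing, the device may be logged under a different identifier, so it can be worth searching for the identifier alone first.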

If this is the only Disk-Group on the host, I would suggest placing the host in Maintenance Mode (Ensure Accessibility option), rebooting it, and seeing whether it can then manage the device normally and remove the partitions. If it cannot, I would advise (with the host still in MM) physically removing and replacing the SSD in the server (or removing it via the out-of-band management console/BIOS if possible). After that, the leftover references to the Disk-Group (and thus the remaining partitions on the Capacity-Tier devices) should be removable via the Flash Client with 'No Action', or via the CLI with the same command you used before but with -m noAction added.
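
The CLI side of that sequence could look roughly like the sketch below. It assumes the esxcli options available on recent ESXi 6.x builds (check `esxcli system maintenanceMode set --help` and `esxcli vsan storage remove --help` on your version). The `run` and `remove_failed_group` helpers and the `DRY_RUN` switch are hypothetical names added for illustration. With `DRY_RUN=1` the commands are only printed, so the plan can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: prints the commands by default instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: remove_failed_group DISK_UUID
remove_failed_group() {
    uuid="$1"
    # Enter maintenance mode while keeping vSAN objects accessible.
    run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility || return 1
    # If the removal below still fails, reboot or physically replace the SSD
    # (host still in maintenance mode) and try again.
    # Remove the leftover disk-group reference without evacuating data.
    run esxcli vsan storage remove -u "$uuid" -m noAction || return 1
    # List what vSAN still claims on this host to confirm the result.
    run esxcli vsan storage list
}

# Example with the UUID from the original post:
# remove_failed_group 52317d1d-b65c-d898-307f-d38adf7ec1a3
```

Setting `DRY_RUN=0` runs the commands for real. The host stays in maintenance mode afterwards, so it still has to be taken out of MM once the new disk group is in place.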

Bob

4 Replies
alainrussell
Enthusiast

Thanks Bob. It's not the only disk group on the server, so I assume a reboot is still OK with the Ensure Accessibility MM option? I'll try that first and see if I can remove the group after the reboot completes.

TheBobkin
Champion

Hello Alain,

Ensure you have a good recent backup before entering MM with the EA option - some portion of your data is essentially configured as FTT minus 1 until that host is back, out of MM, and the delta data has resynced.

Bob

alainrussell
Enthusiast

Thanks Bob, this worked - shutting down the server and restarting allowed me to remove the disk group.
