4 Replies Latest reply on Jun 24, 2020 1:44 AM by alainrussell

    Cannot remove failed SSD

    alainrussell Enthusiast

      I'm trying to remove a failed SSD (to replace it short term - we're in a migration to AWS that will complete in ~6 weeks)

       

      The web client won't let me remove the group - it shows an error as attached.

       

      If I put the host in maintenance mode and try with the CLI, I get the error below:

      esxcli vsan storage remove -u 52317d1d-b65c-d898-307f-d38adf7ec1a3

      Unable to remove device: Failed to write partition

       

      In the vSAN config the disk group still shows as attached. The SSD failed (I/O errors) and the data was rebuilt, so nothing is currently using these disks.

       

      How can I remove the disk to replace it with a new SSD?

       

      Thanks

      Alain

        • 1. Re: Cannot remove failed SSD
          TheBobkin Virtuoso
          vExpertVMware Employees

          Hello Alain,

           

          Likely the Cache-Tier SSD was dropped and came back, but in a read-only state. This can be confirmed from vmkernel.log, for example by messages relating to NMP failing to read the capacity of the device.
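A minimal sketch of the log check described above, assuming the standard ESXi log location `/var/log/vmkernel.log`. The function name `device_log_lines` and the device identifier in the usage comment are hypothetical; the real identifier would come from the host itself (for example from `esxcli vsan storage list`).

```shell
# Print vmkernel.log lines that mention a given device.
# $1: device identifier (placeholder in the example below, not from the thread)
# $2: log file to search, defaulting to the standard ESXi location
device_log_lines() {
    # -F treats the identifier as a fixed string (it contains dots),
    # -i ignores case
    grep -F -i -- "$1" "${2:-/var/log/vmkernel.log}"
}

# Example (run on the host): show the most recent matches for the cache SSD
# device_log_lines "naa.EXAMPLE_DEVICE_ID" | tail -n 40
```

Which messages appear will depend on how the device failed, so this only narrows the log down to the lines worth reading.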

          If this is the only Disk-Group on the host, I would suggest placing the host in Maintenance Mode (Ensure Accessibility option) and rebooting it, then checking whether it can manage the device normally so that the partitions can be removed. If it cannot, I would advise (with the host still in MM) physically removing and replacing the SSD (or removing it via the out-of-band management console/BIOS if possible). After that, the leftover references to the Disk-Group (and thus the remaining partitions on the Capacity-Tier devices) should be removable via the Flash Client with 'No Action', or via the CLI with the same command you used before plus -m noAction.
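The CLI side of the steps above could look roughly like the sketch below. It is a dry run by default: the hypothetical `run` helper only prints each command unless `DRY_RUN=0` is set. The UUID is the one from the original post, and the `--vsanmode` value is assumed to be the CLI equivalent of the Ensure Accessibility option.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# UUID passed to `esxcli vsan storage remove -u` in the original post
DG_UUID="52317d1d-b65c-d898-307f-d38adf7ec1a3"

# 1. Enter Maintenance Mode with the Ensure Accessibility option
run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility

# 2. Reboot the host, or physically replace the failed SSD, then continue.

# 3. Check which devices vSAN still lists on this host
run esxcli vsan storage list

# 4. Remove the leftover disk-group reference without evacuating data
run esxcli vsan storage remove -u "$DG_UUID" -m noAction
```

Keeping the commands behind a dry-run wrapper makes it easy to review exactly what would be executed before touching a host that is already degraded.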

           

          Bob

          • 2. Re: Cannot remove failed SSD
            alainrussell Enthusiast

            Thanks Bob. It's not the only disk group on the server, so I'm assuming a reboot is still OK with the Ensure Accessibility MM option? I'll try that first and see if I can remove the group after the reboot completes.

            • 3. Re: Cannot remove failed SSD
              TheBobkin Virtuoso
              vExpertVMware Employees

              Hello Alain,

               

              Ensure you have a good recent backup before entering MM with the EA option: some portion of your data is effectively running at FTT minus 1 until that host is back up, out of MM, and the delta data has resynced.
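Once the host is back up, leaving MM and watching the resync could be sketched as below. This assumes a vSAN release that ships the `esxcli vsan debug` namespace (6.6 or later, as far as I know); on older builds the resync state would need to be checked from the client instead. The `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set.

```shell
# Sketch only: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take the host out of Maintenance Mode after the reboot
run esxcli system maintenanceMode set --enable false

# Check how much data is still waiting to resync
# (assumes the `vsan debug` namespace is available on this build)
run esxcli vsan debug resync summary get
```

Until the resync has finished, the affected objects are still at reduced redundancy, so it is worth holding off on further maintenance on other hosts.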

               

              Bob

              • 4. Re: Cannot remove failed SSD
                alainrussell Enthusiast

                Thanks Bob, this worked - shutting down the server and restarting allowed me to remove the disk group.