3 Replies Latest reply on Jan 19, 2020 7:04 AM by TheBobkin

    Unhealthy disk(s) are used by vSAN. But this disk and host are absent

    bastaramus Lurker

      I have a strange issue with my vSAN.

      There are 3 servers in the cluster. Two of them have disk format version 7 and the other one has version 5. When I tried to upgrade the disk format version, I got the following error:

      Unhealthy disk(s) 522e6640-06e2-6e3f-ad08-af84f26abf2e are used by vSAN.

       

      The most interesting part is that the owner of this UUID is not present in my vSAN cluster.


      I can see this object with the "esxcli vsan debug object list" command:

      UUID: 522e6640-06e2-6e3f-ad08-af84f26abf2e

         Name: naa.57c3548172844c9f

         Owner: mi204.local

         Version: 7

         Disk Group: 522e6640-06e2-6e3f-ad08-af84f26abf2e

         Disk Tier: Cache

         SSD: false

         In Cmmds: true

         In Vsi: false

         Model: N/A

         Encryption: false

         Deduplication: false

         Dedup Ratio: N/A

         Overall Health: (PDL)

         Metadata Health: N/A

         Operational Health: N/A

         Congestion Health:

               State: N/A

               Congestion Value: 0

               Congestion Area: N/A

               All Congestion Fields:

       

      Also, I found this object and its owner via the RVC command "vsan.cmmds_find -u 522e6640-06e2-6e3f-ad08-af84f26abf2e". The status is unhealthy:
      0a112-fbd8ceba-badb-48dc-a0af-f2fba9d8121c.png

       

      But the owner of this object is absent. I removed the server with this UUID 3 weeks ago, so I can't log in to it via SSH and remove the inaccessible object via objtool.

      The owner itself also shows an unhealthy status:

      0a112-732d4fc2-1dc0-44c3-b55e-52792bf928cc.png

       

      I tried to remove it from another server, but got a message that the object was not found:

      grabilla.Uh9740.png

       

      Do you have any idea how to remove the unhealthy, inaccessible object and/or host, and how to resolve this issue?

      Thank you very much!

       

      By the way, I have restarted vCenter Server many times, and the issue still exists.

        • 1. Re: Unhealthy disk(s) are used by vSAN. But this disk and host are absent
          TheBobkin Virtuoso
          vExpert, VMware Employees

          Hello bastaramus,

           

          These are CMMDS entries, not DOM-Objects, so it is completely expected behaviour that they cannot be removed using objtool.

          These can be removed using the cmmds-tool delete option, but I must emphasise that extreme caution must be applied when doing such things. For example, don't delete entries for any host, or any entry belonging to it, that has any possibility of coming back. Don't even consider using this to remove entries relating to DOM or LSOM-Objects either: that won't remove the data, only the CMMDS entries describing it.
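One way to apply that caution before touching anything is to confirm that the entry really exists only in the cluster directory. The sketch below parses a "Key: value" listing like the one posted in the original question and picks out entries that report "In Cmmds: true" together with "In Vsi: false". Treating that combination as a sign of a leftover entry is an assumption made here for illustration, not an official rule, and the function names are hypothetical.

```python
def parse_disk_listing(text):
    """Parse 'Key: value' lines into dicts; each 'UUID:' line starts a new entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "UUID":
            current = {}
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


def cmmds_only(entries):
    """Return entries listed in CMMDS but not seen in VSI (assumed leftover candidates)."""
    return [
        e for e in entries
        if e.get("In Cmmds", "").lower() == "true"
        and e.get("In Vsi", "").lower() == "false"
    ]


# Sample text copied from the listing in the original question (abridged).
SAMPLE = """\
UUID: 522e6640-06e2-6e3f-ad08-af84f26abf2e
   Name: naa.57c3548172844c9f
   Owner: mi204.local
   In Cmmds: true
   In Vsi: false
   Overall Health: (PDL)
"""

stale = cmmds_only(parse_disk_listing(SAMPLE))
for entry in stale:
    print(entry["UUID"], entry["Owner"], entry["Overall Health"])
```

A match is only a candidate for review. Whether the owning host can ever come back still has to be checked by hand, as described above.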

           

          Bob

          • 2. Re: Unhealthy disk(s) are used by vSAN. But this disk and host are absent
            bastaramus Lurker

            Thank you TheBobkin!

            I used cmmds-tool to remove this object (it was a disk group from an old, unavailable server) and the error is gone.

            Also, there are some CMMDS entries with unhealthy status, for example hosts that I removed previously. Those hosts will never come back, and they are not present in the cluster or in the vCenter infrastructure at all. TheBobkin, what do you think: should I remove them via cmmds-tool or leave them as they are?

            • 3. Re: Unhealthy disk(s) are used by vSAN. But this disk and host are absent
              TheBobkin Virtuoso
              vExpert, VMware Employees

              Happy to help.

              I must repeat, though, for anyone who happens across this post in future: please don't go messing with cmmds-tool unless you know what you are doing and the problem is clear. If in doubt, please just call us at GSS.

               

              Unhealthy references to old nodes can be fairly common as a result of nodes getting re-imaged following boot device failure etc. You can remove them if you like, but the only time I have seen these cause any type of issue is when they are related to Witnesses (which have extra entry types such as PREFERRED_FAULT_DOMAIN etc.). I have yet to encounter any negative impact from leftover normal NODE references.
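The distinction above can be sketched as a small triage step: group the unhealthy entries that belong to hosts known to be gone, then flag any owner that also has a Witness-related entry type. The record layout (owner, type, health), the health strings, and the host names below are hypothetical placeholders rather than real cmmds-tool output. Only the type names NODE and PREFERRED_FAULT_DOMAIN come from the reply.

```python
from collections import defaultdict

# Entry types treated as Witness-related. PREFERRED_FAULT_DOMAIN is named in the
# reply above; extend this set only with types you have confirmed yourself.
WITNESS_TYPES = {"PREFERRED_FAULT_DOMAIN"}


def triage(records, absent_owners):
    """Classify absent owners by whether their unhealthy entries look Witness-related."""
    types_by_owner = defaultdict(set)
    for rec in records:
        if rec["owner"] in absent_owners and rec["health"] != "Healthy":
            types_by_owner[rec["owner"]].add(rec["type"])
    return {
        owner: "review (witness-related)" if types & WITNESS_TYPES else "plain leftover"
        for owner, types in types_by_owner.items()
    }


# Hypothetical records; owners and health values are placeholders.
records = [
    {"owner": "old-node-1", "type": "NODE", "health": "Unhealthy"},
    {"owner": "old-witness", "type": "NODE", "health": "Unhealthy"},
    {"owner": "old-witness", "type": "PREFERRED_FAULT_DOMAIN", "health": "Unhealthy"},
    {"owner": "live-node", "type": "NODE", "health": "Healthy"},
]

result = triage(records, absent_owners={"old-node-1", "old-witness"})
for owner, verdict in sorted(result.items()):
    print(owner, "->", verdict)
```

The sketch only sorts entries into two piles for a human to look at. It deliberately does not delete anything.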

               

              Bob