7 Replies Latest reply on Jun 3, 2015 8:17 AM by jonretting

    3 Node vSAN. Failure of 2 Hosts

    DSS_Junior6621 Lurker

      Hi everyone, we had the fortunate situation where we lost 2 hosts of our 3-host vSAN cluster. HP has successfully repaired one of the 2 failed nodes, and I am wondering how I go about trying to "rebuild" or restart vSAN between the 2 working hosts. I couldn't find the answer on Google, so I wanted to ask here.

       

      I can console to the 2 hosts and they see the datastore, but the type is "unknown" and there are no files if I browse to it.

       

      Also, vCenter was on this datastore so I cannot access that either.. lucky me..

       

      Thanks for any help you can give.

       

      ESXi v 5.5

        • 1. Re: 3 Node vSAN. Failure of 2 Hosts
          vuzzini Hot Shot

          Hello DSS_Junior6621,

           

          You need a minimum of 3 nodes in a vSAN cluster. If one host fails in a 3-node cluster, there are not enough hosts left in the cluster to rebuild the failed/missing components.

           

          The only option left to you now is to fix the 3rd node so that the failed/missing components can be rebuilt. To help prevent this issue in the future, you may configure 2 disk groups on the same host.
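
          To illustrate the host-count reasoning above, here is a minimal Python sketch. It is a simplified model, not vSAN code: it assumes the default policy of one failure to tolerate (two mirrored replicas plus one witness, each on a different host), that each component counts as one vote, and that an object is accessible only while more than half of its components are reachable. The host names and function names are hypothetical.

```python
# Simplified model: a vSAN object with FTT=1 has two data replicas and one
# witness, each placed on a different host of a 3-node cluster.
# Assumption: each component carries one vote and the object is accessible
# only while more than 50% of its components are reachable.


def object_accessible(component_hosts, hosts_up):
    """Return True if a strict majority of components sit on reachable hosts."""
    reachable = sum(1 for host in component_hosts if host in hosts_up)
    return reachable * 2 > len(component_hosts)


def min_hosts(ftt):
    """Hosts needed to tolerate `ftt` host failures with mirroring: 2 * ftt + 1."""
    return 2 * ftt + 1


# Hypothetical host names; order is replica, replica, witness.
layout = ["esx1", "esx2", "esx3"]

print(object_accessible(layout, {"esx1", "esx2", "esx3"}))  # all hosts up
print(object_accessible(layout, {"esx1", "esx2"}))          # one host down
print(object_accessible(layout, {"esx1"}))                  # two hosts down
print(min_hosts(1))                                         # hosts needed for FTT=1
```

          Under these assumptions the object stays accessible with one host down (although there is no spare host to rebuild onto), and becomes inaccessible with two hosts down, which would match a datastore that is visible but shows no files.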

          • 2. Re: 3 Node vSAN. Failure of 2 Hosts
            jonretting Enthusiast

            How did you "lose the hosts"? What happened? Were your storage disks affected by this, and did they lose their data?

            • 3. Re: 3 Node vSAN. Failure of 2 Hosts
              zdickinson Expert

              Agreed with jonretting, the type of failure will dictate your recovery options.  We had something similar happen where we had 3 hosts, 2 disk groups per host, but only one controller per host.  We had something happen with two of the controllers.  vSAN looked just like yours.  It was there, but showed unknown and had no files.  Once we fixed the controller issue and got all three hosts back online, the datastore was available.  However, if the disks had failed and not the controllers, we would have been in trouble.

               

              My belief is that 4 nodes should be required, not just recommended, but that's a topic for another day.  Thank you, Zach.

              • 4. Re: 3 Node vSAN. Failure of 2 Hosts
                DSS_Junior6621 Lurker

                We got one host up yesterday. That was a failed temperature sensor, so I am assuming no data loss.

                 

                The tech is onsite this morning replacing parts and testing the 3rd server.

                 

                At this point it was not the storage or the controllers that were affected.

                 

                Once I get the third node back up I am hoping my data will be intact.

                 

                Will advise once I know more.

                 

                Thank you all.

                • 5. Re: 3 Node vSAN. Failure of 2 Hosts
                  DSS_Junior6621 Lurker

                  The third server is now back online, but the type is still unknown and there are no files if I browse (I do see total size and free space now).

                   

                  I am assuming at this point I need to give it time to rebuild?

                   

                  Are there any commands I can run against a host via ssh to see the progress of the rebuild?

                  • 6. Re: 3 Node vSAN. Failure of 2 Hosts
                    zdickinson Expert

                    Manage VSAN with RVC Part 2 – VSAN Cluster Administration | Virten.net

                     

                    vsan.resync_dashboard is an RVC command that will show you what's re-syncing.  There are other commands on that page that can tell you other information about the health of the cluster.  Thank you, Zach.

                    • 7. Re: 3 Node vSAN. Failure of 2 Hosts
                      jonretting Enthusiast

                      Do you care about the data? Have you worked with VSAN before? There are many things you should check before doing anything extreme. Check your cables, see if you're getting any errors in the logs, and verify your netstacks. Which hosts can see each other, and what makes them different? Unless the data on the disks is somehow gone, there is no reason why you couldn't re-install ESXi on each host, set up the network, move the hosts into a new cluster, and re-activate VSAN. The datastore should be there and be accessible. But knowing nothing about the failure you experienced, you are flying blind.