VMware Cloud Community
DSS_Junior6621
Contributor

3 Node vSAN. Failure of 2 Hosts

Hi everyone, we had the unfortunate situation where we lost 2 hosts of our 3-host vSAN cluster. HP has successfully repaired one of the 2 failed nodes, and I am wondering how I go about trying to "rebuild" or restart vSAN between the 2 working hosts. I couldn't find the answer on Google, so I wanted to ask here.

I can console to the 2 hosts and they see the datastore, but the type is "unknown" and there are no files if I browse to it.

Also, vCenter was on this datastore, so I cannot access that either.. lucky me..

Thanks for any help you can give.

ESXi v 5.5

7 Replies
vuzzini
Enthusiast

Hello DSS_Junior6621,

You need a minimum of 3 nodes in a vSAN cluster. If one host fails in a 3-node cluster, there are not enough hosts left in the cluster to rebuild the failed/missing components.

The option left to you now is to fix the third node so that the failed/missing components can be rebuilt. To help prevent this issue in the future, you may configure 2 disk groups on the same host.
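
To make the arithmetic behind the 3-node minimum concrete, here is a rough Python sketch. It is a simplified model, not how vSAN is implemented. It assumes the default policy of tolerating one host failure (FTT=1), that each object has 2 * FTT + 1 components (two replicas plus a witness) placed on different hosts, that every failed host held one of those components, and that an object is accessible only while more than half of its components are reachable. The function names are made up for illustration.

```python
def min_hosts(ftt: int) -> int:
    """Hosts needed to tolerate `ftt` host failures: 2 * ftt + 1."""
    return 2 * ftt + 1


def describe(total_hosts: int, failed_hosts: int, ftt: int = 1) -> dict:
    """Simplified view of one object's state after `failed_hosts` hosts go down.

    Assumption (worst case): every failed host held one of the object's
    2 * ftt + 1 components.
    """
    components = min_hosts(ftt)
    available = max(components - failed_hosts, 0)
    healthy_hosts = total_hosts - failed_hosts

    # Accessible only while a strict majority of components is reachable.
    accessible = available * 2 > components

    # A rebuild needs the object to be readable and enough healthy hosts
    # to place all 2 * ftt + 1 components on separate hosts again.
    can_rebuild = accessible and healthy_hosts >= min_hosts(ftt)

    return {"accessible": accessible, "can_rebuild": can_rebuild}


if __name__ == "__main__":
    for total, failed in [(3, 1), (3, 2), (4, 1)]:
        print(f"{total} hosts, {failed} failed:", describe(total, failed))
```

Under these assumptions, a 3-node cluster with one host down keeps its data accessible but has nowhere to rebuild, and with two hosts down the objects lose their majority and become inaccessible until a host returns. That matches a datastore that is visible but shows no files.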

Sandeep Vuzzini, Sr. DevOps Engineer
jonretting
Enthusiast

How did you "lose the hosts"? What happened? Were your storage disks affected by this? Did they lose their data?

zdickinson
Expert

Agreed with jonretting, the type of failure will dictate your recovery options. We had something similar happen where we had 3 hosts, 2 disk groups per host, but only one controller per host. We had something happen with two of the controllers. vSAN looked just like yours. It was there, but showed "unknown" and had no files. Once we fixed the controller issue and got all three hosts back online, the datastore was available. However, if the disks had failed and not the controllers, we would have been in trouble.

My belief is that 4 nodes should be required and not just recommended, but that's a topic for another day. Thank you, Zach.

DSS_Junior6621
Contributor

We got one host up yesterday. That was a failed temperature sensor, so I am assuming no data loss.

A tech is onsite this morning replacing parts and testing the third server.

At this point it was not the storage or the controllers that were affected.

Once I get the third node back up I am hoping my data will be intact.

Will advise once I know more.

Thank you all.

DSS_Junior6621
Contributor

The third server is now back online, but the type is still "unknown" and there are no files if I browse. (I do see total size and free space now.)

I am assuming at this point I need to give it time to rebuild?

Are there any commands I can run against a host via ssh to see the progress of the rebuild?

zdickinson
Expert

Manage VSAN with RVC Part 2 – VSAN Cluster Administration | Virten.net

vsan.resync_dashboard is an RVC command that will show you what's re-syncing. There are other commands on that page that can give you more information about the health of the cluster. Thank you, Zach.

jonretting
Enthusiast

Do you care about the data? Have you worked with VSAN before? There are many things you should check before doing anything extreme. Check your cables, see if you're getting any errors in the logs, and verify your netstacks. Which hosts can see each other, and what makes them different? Unless the data on the disks is somehow gone, there is no reason why you couldn't re-install ESXi on each host, set up the network, move the hosts into a new cluster, and re-activate VSAN. The datastore should be there and be accessible. But knowing nothing about the failure you experienced, you are flying blind.
