7 Replies Latest reply on Jun 29, 2015 3:22 AM by shahidpp

    Testing no of host failures to tolerate capability on vsan 6.0

    shahidpp Lurker

      I have a 4-node VSAN cluster where each host contributes one HDD (3.5 TB) and one SSD.

      The VSAN datastore is up and running.

      Say the hosts are A,B,C,D

       

      I have a VM using the default VSAN storage policy residing on the VSAN datastore, whose data objects are mirrored across two hosts.

       

      - The VM runs on host A

      - Components are mirrored across host A and host B

      - The witness is on host C

      - HA is disabled in the cluster.

       

      When I shut down host A, the VM shows as disconnected even though "number of host failures to tolerate" is set to 1.

      I was expecting that, since the data is mirrored on host B, that host would take ownership of the VM and the VM would stay connected.

       

      However, when HA is enabled, the VM is restarted on host B.

       

      So my question is: do we have to enable HA to handle a VSAN host-failure scenario?

       

      Please shed some light on why the data is mirrored.

       

      I am still a beginner with VSAN and would appreciate any help.

        • 1. Re: Testing no of host failures to tolerate capability on vsan 6.0
          jonretting Enthusiast

          Assuming you are using DRS and entering maintenance mode on the host you are taking down, the VM should be vMotioned to another host. If you are simulating a complete failure of that host, and the VM in question is using that host for compute, then that VM will be offline. It seems you are mixing up the compute node and the storage policy. In a single-host failure scenario, that VM's storage is still 100% available, but the VM would need to be restarted on another host by you, by scripts, or ideally by HA. In certain situations without HA you might need to remove the VM from inventory and re-register it (via the datastore browser) on a live compute node. Cheers
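          In case it helps, a minimal sketch of that manual re-registration, run from an SSH session on a surviving ESXi host (the datastore path and VM name here are hypothetical; browse the vSAN datastore for your real .vmx path):

```shell
# Register the orphaned VM on this (live) host.
# Note: /vmfs/volumes/vsanDatastore/myvm/myvm.vmx is a placeholder path.
# vim-cmd prints the new VM ID on success.
vim-cmd solo/registervm /vmfs/volumes/vsanDatastore/myvm/myvm.vmx

# Then power it on using the ID printed above, e.g.:
# vim-cmd vmsvc/power.on <vmid>
```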

          • 2. Re: Testing no of host failures to tolerate capability on vsan 6.0
            shahidpp Lurker

            Thanks Jonretting for clarifying.

             

            Yes, I was completely powering off the VM's compute host, causing the VM-offline situation.

             

            But the HA scenario above is common to any cluster with shared storage (not just VSAN), i.e. the VM will get restarted on an available host in the cluster.

            So what does this capability do additionally?

             

            What does the host failure mentioned in the storage policy mean? Is it a disk failure, or just a network partition?

            Will a manual shutdown come under this?

             

            Thanks in advance

            • 3. Re: Testing no of host failures to tolerate capability on vsan 6.0
              zdickinson Expert

              Failures To Tolerate (FTT) is how many hosts can fail while the data stays available.  The host can fail in any number of ways: a crash or purple screen of death, an SSD failure (assuming only one disk group in the host), or a network failure like you mentioned.  If you have three nodes and one fails, you will be without redundancy until the node is brought back online.  If you have four or more nodes, a rebuild will be started.  I believe there is a timeout before the rebuild starts, to account for maintenance windows and reboots.
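              The arithmetic behind FTT can be sketched as follows (a plain-sh illustration of the 2n+1 rule; the variable names are mine, not vSAN terms):

```shell
#!/bin/sh
# For FTT=n, vSAN keeps n+1 replicas of each object plus (at least)
# n witness components, so 2n+1 hosts are needed for availability,
# and one more host (2n+2) gives vSAN somewhere to rebuild to.
ftt=1
replicas=$((ftt + 1))
witnesses=$ftt
echo "hosts for availability: $((replicas + witnesses))"
echo "hosts to allow rebuild: $((replicas + witnesses + 1))"
```

With FTT=1 this gives 3 and 4, matching the advice above: three nodes give availability, a fourth lets the rebuild actually run.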

               

              HA will power a machine up on another host in the event of a failure, if that machine was running.  If the machine was powered off at the time of the failure it will show as disconnected until the host is back online.  I hope this helps.  Thank you, Zach.

              • 4. Re: Testing no of host failures to tolerate capability on vsan 6.0
                jonretting Enthusiast

                The default amount of time before a rebuild takes place is still 60 minutes. On my lab setup I would occasionally forget to bring a host out of maintenance, or leave it off too long while working on hardware.

                 

                The advanced setting to change is "VSAN.ClomRepairDelay" (value in minutes).

                 

                And to avoid restarting the host after the modification, you can manually restart the "clomd" daemon with:

                /etc/init.d/clomd restart
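                The same delay can also be inspected and changed with esxcli on each host (a sketch; the 120-minute value is only an example for a longer maintenance window):

```shell
# Show the current repair delay in minutes (default 60).
esxcli system settings advanced list -o /VSAN/ClomRepairDelay

# Raise it to 120 minutes, then restart clomd so it takes effect.
esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 120
```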

                 

                Cheers

                • 5. Re: Testing no of host failures to tolerate capability on vsan 6.0
                  npadmani Master
                  VMware Employee, vExpert

                  HA will power a machine up on another host in the event of a failure, if that machine was running.  If the machine was powered off at the time of the failure it will show as disconnected until the host is back online.

                  FYI, a little correction is needed to the statement above.

                  If the VM was powered off when a host in the HA cluster failed, then provided the powered-off VM resides on a shared datastore, it will still be re-registered by HA on one of the other healthy hosts in the cluster; it will simply remain powered off.

                  • 6. Re: Testing no of host failures to tolerate capability on vsan 6.0
                    shahidpp Lurker

                    Thanks a lot for the information