4 Replies Latest reply on Mar 21, 2007 9:13 PM by conradsia

    HA Bug?

    conradsia Hot Shot

      I have a problem with HA.


      I have a three-NIC team on both ESX servers. When I pull out one NIC and put it back in, there is no problem: HA doesn't fail over. If I pull two NICs out and put them back in one at a time, again no problem.


      When I pull two NICs out of the team and then reconnect them simultaneously, HA detects a failed host, shuts down the VMs, and restarts them on the second host.


      I have tried every combination to make HA fail over between the two servers, and this one does it every time. Has anyone else seen this?


      I have filed an SR, but so far I haven't gotten an answer. I'm still waiting to hear back from the VMware folks.



        • 1. Re: HA Bug?
          conradsia Hot Shot

          Setting "Rolling Failover" to Yes on the NIC team apparently stops the HA restart issue.

          • 2. Re: HA Bug?
            conradsia Hot Shot

            But you don't want to do that, so here is what the issue was.


            I am using Cisco switches. If the console port is on the NIC that fails, the groupID is changed to the port that is still up. When the NIC is plugged back in, VMware recognizes that its groupID is back up and switches the console back to that port. The problem is that the Cisco port doesn't come back up immediately; it takes 15 to 30 seconds to sync and initialize, so VMware then detects a host down, which caused the HA restart.


            That is also why everything stays up when you reconnect the NICs with rolling failover turned on. I still don't get why the NIC team wouldn't then just fail over to the working NIC, as it did when I unplugged the cable, instead of triggering an HA restart.
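
To make the timing above concrete, here is a minimal Python sketch. It is a toy model, not VMware code: the function names are made up for illustration, the 15-second HA failure-detection window is an assumption (check the value on your own cluster), and it ignores everything except how long the console is unreachable after the NIC is reconnected.

```python
# Toy model of the failback timing described in this thread; not VMware code.

ASSUMED_HA_DETECTION_S = 15.0  # assumption: HA failure-detection window, in seconds


def console_outage_s(port_init_delay_s: float, rolling_failover: bool) -> float:
    """Seconds the service console is unreachable after a NIC is reconnected.

    With rolling failover on, the console stays on the NIC that is already
    working, so the reconnected port's startup delay is never seen.
    Without it, the console moves back at once and waits for the switch port.
    """
    return 0.0 if rolling_failover else port_init_delay_s


def ha_restart_triggered(outage_s: float,
                         detection_s: float = ASSUMED_HA_DETECTION_S) -> bool:
    """True if the outage lasts long enough for HA to treat the host as failed."""
    return outage_s >= detection_s


if __name__ == "__main__":
    # (label, switch port startup delay in seconds, rolling failover enabled?)
    scenarios = [
        ("failback, slow port (30 s)", 30.0, False),
        ("rolling failover, slow port (30 s)", 30.0, True),
        ("failback, fast port (2 s)", 2.0, False),
    ]
    for name, delay, rolling in scenarios:
        outage = console_outage_s(delay, rolling)
        print(f"{name}: outage {outage:.0f} s, "
              f"HA restart: {ha_restart_triggered(outage)}")
```

Under these assumptions, only the first case (immediate failback onto a port that needs 30 seconds to start forwarding) is long enough to look like a dead host.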

            • 3. Re: HA Bug?
              Rumple Master

              Do you have PortFast turned on for your ports? That should allow the ports to come back up within 2 seconds.

              • 4. Re: HA Bug?
                conradsia Hot Shot

                It must not be. I'm going to turn it on and see what happens.


                I'm also still wondering why the server fails over when there is still an available port that is up. It seems to me that it should just fail back to the up port and not trigger an HA restart.