10 Replies Latest reply on Mar 11, 2008 8:32 PM by jasonboche

    Maintenance mode and DRS rules

    jasonboche Champion
    vExpert

      I ran into a puzzling scenario tonight.  I wrote it off to gremlins at first, but later the light bulb went on in my head.

       

      I have a cluster of 3 ESX hosts.  The cluster is DRS and HA enabled.  DRS has a few "keep VMs separate" rules.

       

      I put one of the hosts in maintenance mode.  In doing so, 6 VMs were going to be automatically VMotioned to the other two hosts.  Well, 5 of the VMs migrated off without issue.  The 6th VM just sat there.  No migration began for it.  The Maintenance mode command remained in the status "in progress".  Why was VC not migrating that last VM?  Was it waiting for the other 2 hosts to settle down to see which would be the best host candidate for the last VMotion that was waiting to take place?

       

      I waited about 8-10 minutes and nothing happened.  I lost my patience and manually VMotioned the last VM and it VMotioned without issue.  The enter maintenance mode task then finished successfully since it was waiting for that last VMotion.

       

      An hour later, I remembered why what happened, happened.  I had set up an anti-affinity (keep VMs separate) rule in DRS which that last VMotion would have violated.  That is why VC would not willingly VMotion it.

       

      Should VC have handled this any differently?  A few possible "alternative" outcomes:

       

      = VC acknowledges/assumes maintenance mode trumps any DRS rules.  All VMs are automatically VMotioned even if DRS rules are broken in the process.  A task/event will be written showing that a DRS rule was broken by VC per the administrator's request to put a host into maintenance mode.

       

      = VC realizes that DRS rules will need to be broken in order to complete the maintenance mode task and prompts the administrator with a "yes/no" type question on whether or not to proceed.  If yes, a task/event will be written showing that a DRS rule was broken by VC per the administrator's request to put a host into maintenance mode.  Thought:  HA has the foresight to splash bright yellow banners on the VC console when HA policy is in jeopardy, broken, or unlicensed; DRS seems to have the intelligence, but not the communication or notification piece.

       

      = Other possibilities?

       

      Either of the above seems better than VC and ESX sitting at a stalemate because of a DRS rule, with no explanation given, leaving the administrator to figure out an hour later that a DRS rule was in place.
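To make the alternatives above concrete, here is a minimal Python sketch of an evacuation planner that either reports why a VM is stuck or, if told to, breaks a rule and logs an event. This is only a toy model of the idea, not how VC or DRS actually works: the function names, the `override_rules` flag, the host names, and the event text are all hypothetical.

```python
def conflicting_rules(vm, host, placement, separate_pairs):
    """Return the keep-separate pairs that placing vm on host would violate."""
    broken = []
    for a, b in separate_pairs:
        other = b if vm == a else a if vm == b else None
        if other is not None and placement.get(other) == host:
            broken.append((a, b))
    return broken


def plan_evacuation(placement, hosts, draining_host, separate_pairs,
                    override_rules=False):
    """Plan moves off draining_host (hypothetical model, not a VMware API).

    Returns (moves, blocked, events):
      moves   - vm -> target host
      blocked - vm -> rules that prevented any move
      events  - human-readable messages explaining what happened
    """
    current = dict(placement)
    targets = [h for h in hosts if h != draining_host]
    moves, blocked, events = {}, {}, []
    for vm in sorted(v for v, h in placement.items() if h == draining_host):
        options = [(conflicting_rules(vm, h, current, separate_pairs), h)
                   for h in targets]
        clean = [h for broken, h in options if not broken]
        if clean:
            target = clean[0]
        elif override_rules and options:
            # Break as few rules as possible and say so in the event log.
            broken, target = min(options, key=lambda o: len(o[0]))
            events.append(f"{vm} -> {target}: broke keep-separate rule(s) "
                          f"{broken} for maintenance mode")
        else:
            blocked[vm] = [pair for broken, _ in options for pair in broken]
            events.append(f"{vm} not migrated: every target host violates "
                          f"a keep-separate rule")
            continue
        moves[vm] = target
        current[vm] = target
    return moves, blocked, events


hosts = ["esx1", "esx2", "esx3"]
placement = {"A": "esx1", "B": "esx2", "C": "esx3", "web1": "esx3"}
rules = [("A", "B"), ("B", "C"), ("C", "A")]

moves, blocked, events = plan_evacuation(placement, hosts, "esx3", rules)
moves2, blocked2, events2 = plan_evacuation(placement, hosts, "esx3", rules,
                                            override_rules=True)
```

The first call models what I saw, except that the stuck VM comes back with a reason attached. The second call models the "maintenance mode trumps DRS rules" outcome. The "yes/no" prompt would amount to running the first call, showing `blocked` to the administrator, and running the second call only on a "yes".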

        • 1. Re: Maintenance mode and DRS rules
          bister Expert

          I wonder what will happen when one of your hosts crashes: Will ESX bring all VMs up again? Or will it obey DRS rules!?

           

          Regards,

          Christian

          • 2. Re: Maintenance mode and DRS rules
            MR-T Champion

            I don't understand why this didn't work in your situation.

             

            You've got 3 hosts in the cluster, right?

             

            So you take 1 down and there are still 2 hosts left which can handle the affinity rule.

             

            It would be different if you only had 2 servers in the cluster.

            • 3. Re: Maintenance mode and DRS rules
              jasonboche Champion
              vExpert

              MR-T wrote:

              I don't understand why this didn't work in your situation.

              You've got 3 hosts in the cluster, right?

              So you take 1 down and there are still 2 hosts left which can handle the affinity rule.

              It would be different if you only had 2 servers in the cluster.

               

              The anti-affinity rule was set up such that:

              A can't be with B

              B can't be with C

              C can't be with A

               

              i.e. 3 VMs need to be on 3 separate hosts.  None can be together.

               

              When I took the 3rd host down, one of those rules had to be broken because a minimum of 3 hosts is required for the above to work.
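The arithmetic can be checked by brute force. The short Python sketch below tries every assignment of the 3 VMs to the available hosts and returns the first one that satisfies all of the keep-separate pairs, or None if no such assignment exists. The function name and host names are made up for illustration; this has nothing to do with how DRS evaluates rules internally.

```python
from itertools import product


def find_valid_placement(vms, hosts, separate_pairs):
    """Return a vm -> host dict satisfying every keep-separate pair, or None."""
    for assignment in product(hosts, repeat=len(vms)):
        placement = dict(zip(vms, assignment))
        if all(placement[a] != placement[b] for a, b in separate_pairs):
            return placement
    return None


vms = ["A", "B", "C"]
rules = [("A", "B"), ("B", "C"), ("C", "A")]

# With all 3 hosts available a valid placement exists.
three_hosts = find_valid_placement(vms, ["esx1", "esx2", "esx3"], rules)

# With one host in maintenance mode, no placement satisfies all three rules.
two_hosts = find_valid_placement(vms, ["esx1", "esx2"], rules)

print(three_hosts)
print(two_hosts)
```

With 3 hosts each VM lands on its own host; with 2 hosts the search comes back empty, which is the situation the maintenance mode task was stuck on.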

              • 4. Re: Maintenance mode and DRS rules
                bister Expert

                Hmmm, seems you should invest in a fourth node to keep your wish for separate VMs alive... you want 3 VMs on 3 hosts AND run them in a cluster. You are one cluster node short.

                 

                So IMO VI worked correctly.

                 

                Then my first question is (possibly) answered: when 1 of the 3 hosts in your scenario fails, that one VM will stay powered off.

                 

                Regards,

                Christian

                • 5. Re: Maintenance mode and DRS rules
                  jasonboche Champion
                  vExpert

                  This is a proof of concept lab for finding things just like this.  Also, this does not excuse the fact that there is no notification from VirtualCenter on what's going on with the delay during the pending maintenance mode request.  I know in production I'd need a 4th box, but what I'm pointing out here is the flow of information and how it could work better.

                  • 6. Re: Maintenance mode and DRS rules
                    bister Expert

                    Sorry for being impolite...

                     

                    I also think a simple message in the event log would be the minimum, just to know why things are not working. But I guess the DRS algorithm has no check for a maximum time or number of loops while doing DRS... Otherwise the process of going into maintenance mode would not be deadlocked...

                     

                    Respectfully,

                    Christian

                    • 7. Re: Maintenance mode and DRS rules
                      jasonboche Champion
                      vExpert

                      Hi, no need to apologize, you were not being impolite.  I think you were under the assumption this was production and I was pointing out that it was a test environment.  Good day.

                       

                      Jas

                      • 8. Re: Maintenance mode and DRS rules
                        jasonboche Champion
                        vExpert

                        Looks like there actually is a slight clue in the Events log for the ESX host alluding to the fact that maintenance mode can't VMotion the VM because doing so would violate an anti-affinity rule.  Nothing too revealing though.

                         

                        http://www.vmware.com/community/servlet/JiveServlet/download/287-71101-570624-559/untitled.jpg

                        • 9. Re: Maintenance mode and DRS rules
                          dpomeroy Virtuoso

                          Jason,

                          It would be great if this were a configurable option, such as: if a server is put into Maintenance Mode, either 1. override DRS rules until they can be met again, or 2. power off the VM. This could be a setting in the DRS settings.

                           

                          I also agree on the alert/logging, often there doesn't seem to be enough info as to why things are happening, or that they are happening at all.

                           

                          Hopefully, since this is the first version of DRS (and HA), we will see continued improvements as new versions of ESX come out.

                          • 10. Re: Maintenance mode and DRS rules
                            jasonboche Champion
                            vExpert

                            Still need some help here from VMware.  I'll try flagging the thread to see if that helps at all.

                             

                             

                             






                            Jason Boche

                            VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)