VMware vSphere

  • 1.  DRS vMotion of large VM during a Failure of a ESXi host

    Posted Oct 01, 2020 08:53 PM

    Does anyone know how this scenario would be handled?

    I have 3 ESXi hosts, each with 500 GB of RAM.

    Host A has 1 VM (VM_A) with 450 GB of RAM.

    Host B has various VMs using a total of 200 GB.

    Host C has various VMs using a total of 200 GB.

    If Host A fails, VM_A would need to fail over to either Host B or Host C, but neither has enough free memory (300 GB each).

    Would the vMotion:

    A. start VM_A on Host B and after a period of time migrate the existing VMs to Host C, since Host B is now overloaded?

    or

    B. migrate VMs away from Host B to Host C until there are enough resources for VM_A, then start VM_A on Host B?

    I know you can choose that VM_A does not power on until there are sufficient resources, but is vMotion smart enough to make room for it before it is powered on?

    I don't want to get into a situation where it starts on a host that is overloaded, even for a short while, or where it doesn't power on at all and requires manual intervention.
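
    For reference, the headroom arithmetic in this scenario can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it treats the 200 GB figures as memory that must stay resident, ignores per-VM overhead, and the names (`HOST_CAPACITY_GB`, `free_gb`, `fits`) are made up for illustration.

```python
HOST_CAPACITY_GB = 500  # each host in the post has 500 GB of RAM
VM_A_GB = 450           # the large VM that has to be restarted

# Memory already in use on the two surviving hosts, taken from the post.
used_gb = {"Host B": 200, "Host C": 200}


def free_gb(host):
    """Memory left on a host after its current VMs are accounted for."""
    return HOST_CAPACITY_GB - used_gb[host]


def fits(host, vm_gb):
    """True if the VM fits on this host without moving anything else."""
    return free_gb(host) >= vm_gb


for host in used_gb:
    print(host, "free:", free_gb(host), "GB, VM_A fits:", fits(host, VM_A_GB))

# Free memory across both survivors combined.
total_free_gb = sum(free_gb(host) for host in used_gb)
print("Total free across survivors:", total_free_gb, "GB")
```

    Neither survivor has 450 GB free on its own, yet together they have 600 GB free, which is why the question comes down to whether anything will shuffle VMs between B and C first.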



  • 2.  RE: DRS vMotion of large VM during a Failure of a ESXi host

    Broadcom Employee
    Posted Oct 01, 2020 09:09 PM

    Your scenario isn’t specific to vMotion at all; you’re actually asking about HA (failover) and DRS (compute resource management).

    vMotion is merely the live migration mechanism used by the dynamic balancing function of DRS.



  • 3.  RE: DRS vMotion of large VM during a Failure of a ESXi host

    Broadcom Employee
    Posted Oct 01, 2020 10:30 PM

    This may help: Using vSphere HA and DRS Together

    As the article mentions the priority is availability, so that’s HA.

    Unless your VMs are set with memory reservations, there will be “room” for your big VM to fail over; it will just contend with the other VMs on whichever host it fails over onto.

    It will then be down to DRS to balance the VMs across the 2 remaining hosts (using vMotion as necessary).
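
    To illustrate the difference reservations make, here is a minimal sketch of a host-level power-on check. It assumes the check only compares the VM's memory reservation against the host's unreserved memory and ignores per-VM memory overhead; `can_power_on` is a hypothetical helper, not a vSphere API.

```python
def can_power_on(capacity_gb, reserved_on_host_gb, vm_reservation_gb):
    """Simplified check: does the VM's reservation fit in unreserved memory?"""
    unreserved_gb = capacity_gb - reserved_on_host_gb
    return vm_reservation_gb <= unreserved_gb


# No reservations anywhere: nothing is reserved, so VM_A can start on Host B
# and simply competes for memory with the VMs already running there.
no_reservations = can_power_on(500, reserved_on_host_gb=0, vm_reservation_gb=0)

# Everything fully reserved (an assumption for illustration): Host B has
# 200 GB reserved, leaving 300 GB, which is less than VM_A's 450 GB.
full_reservations = can_power_on(500, reserved_on_host_gb=200, vm_reservation_gb=450)

print("Power-on without reservations:", no_reservations)
print("Power-on with full reservations:", full_reservations)
```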



  • 4.  RE: DRS vMotion of large VM during a Failure of a ESXi host

    Posted Oct 01, 2020 10:41 PM

    Assuming we use memory reservations, I'm wondering if we would run into a situation where it simply fails because 'not enough resources available' as opposed to moving things around to allow enough resources.

    I guess the real answer is "test it and see".



  • 5.  RE: DRS vMotion of large VM during a Failure of a ESXi host

    Broadcom Employee
    Posted Oct 02, 2020 05:34 AM

    HA would have prevented you from powering on all your VMs in the first place (even with all hosts available) if their failover could not be guaranteed.

    vSphere HA Admission Control
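
    As a rough illustration of what a percentage-based admission control check looks at, the sketch below compares the share of cluster memory left unreserved with a configured failover percentage. It is simplified: memory only, reservations only (no per-VM overhead), and it assumes every VM in this thread's example is fully reserved. The function names and the 33% and 50% settings are hypothetical.

```python
def current_failover_capacity_pct(host_capacities_gb, vm_reservations_gb):
    """Percentage of total cluster memory that is not reserved by VMs."""
    total_gb = sum(host_capacities_gb)
    reserved_gb = sum(vm_reservations_gb)
    return (total_gb - reserved_gb) / total_gb * 100


def admits(host_capacities_gb, vm_reservations_gb, configured_failover_pct):
    """True if the unreserved share still meets the configured percentage."""
    current = current_failover_capacity_pct(host_capacities_gb, vm_reservations_gb)
    return current >= configured_failover_pct


hosts_gb = [500, 500, 500]        # Hosts A, B and C
reservations_gb = [450, 200, 200]  # VM_A plus the VM groups on B and C (assumed fully reserved)

capacity_pct = current_failover_capacity_pct(hosts_gb, reservations_gb)
print(f"Unreserved share of cluster memory: {capacity_pct:.1f}%")
print("Admitted with 33% reserved for failover:", admits(hosts_gb, reservations_gb, 33))
print("Admitted with 50% reserved for failover:", admits(hosts_gb, reservations_gb, 50))
```

    Note that this is a cluster-wide aggregate, so on its own it says nothing about which single host VM_A would land on.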



  • 6.  RE: DRS vMotion of large VM during a Failure of a ESXi host
    Best Answer

    Posted Oct 02, 2020 06:55 AM

    Hi

    First of all, please read the resources recommended by Scott.

    Second, you need to decide whether you are thinking about a host evacuation (really a DRS event) or a host failure (an HA event).

    Third, there are lots of options that can affect the outcome.

    A few remarks:

    Consider the traditional scenario with no admission control and no reservations:

    If Host A fails and HA kicks in, the big VM will be powered on at the first available host (B or C), and then DRS will start to move small VMs to the other host.

    Now consider the non-traditional scenario with reservations:

    HA won't be able to power on the big VM at the first attempt, but it will notify DRS about that.

    DRS in turn will attempt to make enough room by performing migrations (for example, from B to C).

    HA checks regularly and makes several attempts to power on VMs from failed hosts.

    AFAIR, the last HA attempt is made around 30 minutes after the failure, in order to allow DPM to power on hosts from standby.

    Sooner or later your hosts will have enough room to fit this big VM and power it on.

    If you use reservations, consider enabling admission control based on resource percentage.

    It will prevent you from shooting yourself in the foot.
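
    The reservation scenario described above (HA fails to place the VM, DRS frees up space, HA retries) can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class and function names, the split of each host's 200 GB into four fully reserved 50 GB VMs, the `max_attempts` value, and the "move one VM per round" behaviour. Real HA and DRS placement logic is far more involved.

```python
from dataclasses import dataclass, field

CAPACITY_GB = 500  # per-host memory, from the original post


@dataclass
class Host:
    name: str
    vms: dict = field(default_factory=dict)  # VM name -> reserved GB

    def free_gb(self):
        return CAPACITY_GB - sum(self.vms.values())


def try_power_on(hosts, vm_name, reserved_gb):
    """One restart attempt: place the VM on the first host with enough room."""
    for host in hosts:
        if host.free_gb() >= reserved_gb:
            host.vms[vm_name] = reserved_gb
            return host.name
    return None


def make_room(hosts):
    """Toy stand-in for DRS: move one VM off the emptiest host to another host."""
    target = max(hosts, key=Host.free_gb)  # the host closest to fitting the big VM
    candidates = sorted(target.vms.items(), key=lambda kv: kv[1], reverse=True)
    for vm, size in candidates:
        for other in hosts:
            if other is not target and other.free_gb() >= size:
                other.vms[vm] = target.vms.pop(vm)
                return True
    return False  # nothing could be moved


def restart_with_retries(hosts, vm_name, reserved_gb, max_attempts=5):
    """Retry the power-on, asking the toy DRS to free space between attempts."""
    for attempt in range(1, max_attempts + 1):
        placed_on = try_power_on(hosts, vm_name, reserved_gb)
        if placed_on:
            return placed_on, attempt
        if not make_room(hosts):
            break
    return None, attempt


# Surviving hosts after Host A fails (VM sizes are assumed, see above).
hosts = [
    Host("B", {f"b{i}": 50 for i in range(1, 5)}),
    Host("C", {f"c{i}": 50 for i in range(1, 5)}),
]
placed_on, attempts = restart_with_retries(hosts, "VM_A", 450)
print(f"VM_A placed on host {placed_on} after {attempts} attempt(s)")
```

    With the assumed 50 GB VMs, this toy run needs three migrations from B to C and places VM_A on B at the fourth attempt.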



  • 7.  RE: DRS vMotion of large VM during a Failure of a ESXi host

    Posted Oct 02, 2020 03:56 PM

    OK thanks, this is what I was looking for!

    "Considering non traditional scenario with reservation

    HA won't be able to power on the big VM at the first attempt, but it will notify DRS about that.

    DRS in turn will attempt to make enough room by performing migrations (for example, from B to C).

    HA checks regularly and makes several attempts to power on VMs from failed hosts.

    AFAIR, the last HA attempt is made around 30 minutes after the failure, in order to allow DPM to power on hosts from standby.

    Sooner or later your hosts will have enough room to fit this big VM and power it on."