hendersp3
Enthusiast

ESXi host shutdown behavior when executed from vCenter

Can anyone explain how a host is supposed to behave when it has a number of running VMs on it and we issue a shutdown or reboot command from the vCenter UI? Sometimes it shuts down and the guest VMs boot up elsewhere; other times the host hangs during shutdown and can take 30-45 minutes to shut down.

The logs aren't very useful: logging basically stops and SSH disconnects, so my log tail closes as well.

I am just curious what the host may be doing, or trying to do, that causes this delay. Any links, KBs, or documentation would be helpful. I have searched but am not finding much, only a few things that discuss DCUI vs. vCenter restarts, which don't apply to my situation.

Thanks for any information or tips you can provide. 

5 Replies
daphnissov
Immortal

Are you asking about ESXi's behavior when it is not in maintenance mode and you perform either of these actions?

hendersp3
Enthusiast

Apologies,

I left out some details. Yes, I am asking about a reboot or shutdown from vCenter when the host is NOT in maintenance mode. We are testing an application's failover capabilities. I know MM is the better approach, but we are testing all possible failure scenarios, including a less experienced administrator rebooting a host that is not in MM.

Thank You

daphnissov
Immortal

In the past (so I'm assuming this is still the current behavior), a "restart" of an ESXi host via vCenter was basically an ACPI soft-off command and would not trigger an HA response. A "shutdown" would trigger an HA response, however. That said, neither of those is an appropriate action to perform if a host is not in maintenance mode, and this should be part of the training and education plan for administrators who have such privileges in vCenter. It's just a right vs. wrong way of doing things.

hendersp3
Enthusiast

Thank you for the soft-off information. I completely understand about using Maintenance Mode, but we are a software vendor and we design our applications to be resilient enough to survive a non-MM shutdown or reboot, so we test for that during new releases. That is how we discovered that it sometimes hangs and other times is immediate.

psibaja
VMware Employee

Hello hendersp3,

The behavior differs depending on the configuration of the cluster where the host is located, in particular whether HA and DRS are enabled or disabled. Also, any time you try to reboot or shut down an ESXi host that is not in Maintenance Mode, you will get this message:

[Screenshot: pastedImage_1.png]

vCenter is warning you beforehand about what could happen if you click "OK": the virtual machines will be stopped uncleanly, and there might be data loss on the VMDKs.

However, there are a few things that could assist us: HA and DRS.

When the ESXi host is not in Maintenance Mode and you execute a shutdown/reboot on it:

  • If the cluster has HA enabled with Host Monitoring enabled: after the host is shut down or rebooted and the HA heartbeats to this host are lost, HA will restart the VMs on another host after a few seconds. If Host Monitoring is disabled, the VMs that are powered on at the moment the ESXi host shuts down or reboots are hard powered off and will remain that way on this host until an administrator powers them on manually. DRS can help decide on which hosts the VMs are powered on, but HA is the one that takes the first action to restart the VMs on other hosts.
  • DRS itself will not trigger here (it will not move the VMs off the host beforehand), even in Fully Automated mode.
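
To make the cases above easier to compare, here is a minimal sketch in plain Python. It is only a lookup that restates this reply, not a VMware API: the function name and return strings are hypothetical, and treating "HA disabled" the same as "Host Monitoring disabled" is an assumption, since that case is not spelled out above.

```python
def vm_outcome_after_host_stop(ha_enabled: bool, host_monitoring: bool) -> str:
    """Expected state of powered-on VMs after a shutdown/reboot of a host
    that is NOT in Maintenance Mode, per the cases described in this reply."""
    if ha_enabled and host_monitoring:
        # HA notices the lost heartbeats and restarts the VMs elsewhere.
        return "restarted on another host by HA"
    # Assumption: a cluster without HA behaves like one with Host
    # Monitoring disabled, i.e. nothing restarts the VMs automatically.
    return "powered off until an administrator powers them on"


for ha, monitoring in [(True, True), (True, False), (False, False)]:
    print(f"HA={ha}, Host Monitoring={monitoring}: "
          f"{vm_outcome_after_host_stop(ha, monitoring)}")
```

Note that DRS does not appear as an input: in this scenario it never moves VMs off the host before it goes down.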

When the ESXi host is entering Maintenance Mode:

  • If the cluster has DRS enabled in Fully Automated mode: DRS will vMotion the powered-on VMs to another host automatically (you can choose to also migrate the powered-off VMs).
  • If the cluster has DRS enabled in Partially Automated mode: you have to go to the cluster's "Monitor" tab, then "vSphere DRS", and click "Apply Recommendations" so DRS can migrate the powered-on VMs to the recommended hosts; only then can the host enter Maintenance Mode. This has to be done manually, which is why Fully Automated DRS is better for cases like this.
  • If the cluster has DRS enabled in Manual mode: same as with Partially Automated.

You can then shut down or reboot the host safely once all the powered-on VMs are running on other hosts.
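
The Maintenance Mode path can be summarized the same way. Again, this is a sketch in plain Python that only restates the list above; `DrsLevel`, `evacuation_step`, and `safe_to_stop_host` are hypothetical names for illustration and do not correspond to any vSphere API.

```python
from enum import Enum


class DrsLevel(Enum):
    FULLY_AUTOMATED = "Fully Automated"
    PARTIALLY_AUTOMATED = "Partially Automated"
    MANUAL = "Manual"


def evacuation_step(level: DrsLevel) -> str:
    """Who moves the powered-on VMs when the host enters Maintenance Mode."""
    if level is DrsLevel.FULLY_AUTOMATED:
        return "DRS migrates the powered-on VMs automatically"
    # Partially Automated and Manual behave the same for this purpose.
    return "administrator applies the DRS recommendations manually"


def safe_to_stop_host(in_maintenance_mode: bool, powered_on_vms: int) -> bool:
    """A host is safe to shut down or reboot only once it is in
    Maintenance Mode and no powered-on VMs remain on it."""
    return in_maintenance_mode and powered_on_vms == 0


for level in DrsLevel:
    print(f"{level.value}: {evacuation_step(level)}")
print("Safe to stop:", safe_to_stop_host(in_maintenance_mode=True, powered_on_vms=0))
```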

Hope this information helps you out. Let me know if you have any more questions.

Regards.
