I ran into a puzzling scenario tonight. I chalked it up to gremlins, but later the light bulb went off in my head.
I have a cluster of 3 ESX hosts. The cluster is DRS and HA enabled. DRS has a few "keep VMs separate" rules.
I put one of the hosts in maintenance mode, which meant 6 VMs would be automatically VMotioned to the other two hosts. Five of the VMs migrated off without issue. The sixth just sat there: no migration began for it, and the Enter Maintenance Mode task stayed "in progress". Why wasn't VC migrating that last VM? Was it waiting for the other two hosts to settle down so it could pick the best target for the last pending VMotion?
I waited about 8-10 minutes and nothing happened. I lost my patience and manually VMotioned the last VM, which migrated without issue. The Enter Maintenance Mode task then finished successfully, since it had only been waiting on that last VMotion.
An hour later, I remembered why it happened. I had set up an anti-affinity (keep VMs separate) rule in DRS that the last VMotion would have violated. That is why VC would not migrate it on its own.
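To make the conflict concrete, here is a toy Python model of the placement check. It is my own sketch, not VMware's actual DRS logic, and it assumes a single three-VM "keep VMs separate" rule; the host and VM names are made up. With one of three hosts entering maintenance mode, only two targets remain, so the third member of the rule has nowhere to go without landing on a host that already runs another member. VMs not covered by any rule still have both hosts available.

```python
# Toy model with hypothetical host/VM names. This is NOT VMware's DRS
# algorithm, only the placement constraint a "keep VMs separate" rule imposes.

def compliant_targets(vm, placement, hosts, rules):
    """Return the hosts `vm` could move to without breaking an anti-affinity rule.

    placement: dict mapping VM name -> host it currently runs on
    hosts: candidate target hosts (i.e., not the one entering maintenance mode)
    rules: list of sets; VMs in the same set must run on different hosts
    """
    targets = []
    for host in hosts:
        ok = True
        for rule in rules:
            if vm not in rule:
                continue
            # Another member of the same rule already on this host = violation.
            if any(placement.get(other) == host for other in rule if other != vm):
                ok = False
                break
        if ok:
            targets.append(host)
    return targets


# Three hosts; esx3 is entering maintenance mode (hypothetical names).
placement = {"vm-a": "esx1", "vm-b": "esx2", "vm-c": "esx3", "vm-d": "esx3"}
rules = [{"vm-a", "vm-b", "vm-c"}]  # one "keep VMs separate" rule
remaining = ["esx1", "esx2"]

for vm in ("vm-c", "vm-d"):
    print(vm, compliant_targets(vm, placement, remaining, rules))
# vm-c []
# vm-d ['esx1', 'esx2']
```

An empty target list for vm-c is the stalemate described above: every remaining host would break the rule, so nothing gets moved automatically, while unruled VMs like vm-d evacuate normally.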
Should VC have handled this differently? A few possible alternative outcomes:
= VC assumes maintenance mode trumps any DRS rules. All VMs are automatically VMotioned even if DRS rules are broken in the process, and a task/event is written showing that VC broke a DRS rule to carry out the administrator's request to put a host into maintenance mode.
= VC realizes that DRS rules will need to be broken to complete the maintenance mode task and prompts the administrator with a yes/no question on whether to proceed. If yes, a task/event is written showing that VC broke a DRS rule to carry out the administrator's request to put a host into maintenance mode. Thought: HA has the foresight to splash bright yellow banners on the VC console when HA policy is in jeopardy, broken, or unlicensed; DRS seems to have the intelligence, but not the communication or notification piece.
= Other possibilities?
Either of the above seems better than VC and ESX sitting at a stalemate because of a DRS rule, with no apparent reason given, leaving the administrator to figure out an hour later that a DRS rule was in place.