With the advent of the new ESX patches, I placed one of the 4 ESX nodes in my DRS/HA cluster (set to apply recommendations with 3 or more stars) into maintenance mode, applied the patches, rebooted the host, and exited maintenance mode. After 10 minutes or so, DRS moved a couple of VMs to the recently rebooted node, so I know that DRS was alive and well. I then placed a different node in the cluster into maintenance mode and watched DRS send VMs only to the 2 nodes that had not been touched yet. With 100 VMs now effectively running on 2 hosts (the first patched node was still running only 2 VMs), those hosts' resource utilization skyrocketed.
Once all the VMs had been cleared off the host in maintenance mode, and the cluster had been running in this state for around 10 minutes, DRS began moving VMs over to the first server. My question is this - why didn't DRS send most or all of the VMs to the first server in the first place, since it was extremely underutilized at the time compared to the other 2 servers? Does DRS not "trust" a host that has recently been rebooted or taken out of maintenance mode until a certain amount of time has passed? Do I have to patch one server per day to let DRS get comfortable with it and then move VMs appropriately? This makes patching DRS/HA clusters a much longer proposition than I think it should be.
Since this is now the second time that I've experienced this, I wanted to get everyone's thoughts - am I the only one seeing this, why is it doing this, and how are other people with similar or larger clusters dealing with this?
Thanks!