VMware Cloud Community
scale21
Enthusiast

Update Manager and cluster remediation

I just installed Update Manager on our vCenter 4.1 server, which controls our 8-node ESX 4.1 cluster. I have a fair number of critical and security updates to apply, and I'm thinking of creating a baseline group to address both in a maintenance window.

If I apply this to our cluster, which is running HA and DRS, and schedule it to run during our window (late at night), how does vCenter handle the cluster? Does it do one host at a time?

I realize it puts the hosts in maintenance mode, vMotions all machines away, and brings them back into the cluster once complete, but how does it determine which host to do first, and so on?


Is this something I need to worry about, or will Update Manager take care of it? Meaning: is there any chance Update Manager tries to put them all into maintenance mode at the same time, confusing everything and causing the updates to fail, or possibly causing system downtime?

5 Replies
Troy_Clavell
Immortal

We remediate clusters all the time. The process is: scan for updates, then remediate. Remediating the cluster puts one host at a time into maintenance mode. Once all the guests are migrated off and the host is in maintenance mode, the patching begins, followed by a reboot. Once the host comes back and is available, it exits maintenance mode. Rinse and repeat until the cluster is complete.

As I've found, the order in which the remediation process picks hosts is somewhat random.
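To make the sequence concrete, here is a rough Python model of the loop described above. It is only a simulation, not Update Manager's API: `Host`, `remediate_cluster`, and the round-robin placement of guests are all made up for illustration, and it assumes a cluster of at least two hosts so the guests have somewhere to go.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a cluster host; not a vSphere object."""
    name: str
    vms: list = field(default_factory=list)
    in_maintenance: bool = False
    patched: bool = False


def remediate_cluster(hosts, seed=None):
    """Patch hosts strictly one at a time; return the order they were done in.

    Assumes at least two hosts, so evacuated guests have a destination.
    """
    rng = random.Random(seed)
    order = list(hosts)
    rng.shuffle(order)  # models the "somewhat random" host order
    done = []
    for host in order:
        others = [h for h in hosts if h is not host]
        # Enter maintenance mode: every guest has to leave the host first.
        # Round-robin placement is an assumption standing in for DRS.
        for i, vm in enumerate(host.vms):
            others[i % len(others)].vms.append(vm)
        host.vms = []
        host.in_maintenance = True
        # Patch and reboot, modelled here as a simple flag flip.
        host.patched = True
        # Exit maintenance mode; only now does the loop move to the next host.
        host.in_maintenance = False
        done.append(host.name)
    return done
```

In this model, at most one host is ever in maintenance mode, which is the point of the original question. Guests are not moved back to their original host afterwards; in a real cluster, rebalancing would be left to DRS.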

scale21
Enthusiast

Amazing stuff!

Let's say it finds a VM that it can't move, migrate, or get into the right state to remediate, and that host fails after x number of retries. Will it move on to the next host in the cluster if it can?

Will it put the failed one back into the mix, move VMs to it, and move on to the next host?

Troy_Clavell
Immortal

If a VM cannot be vMotioned, the "enter maintenance mode" task will remain at 2% and your remediation task will be stuck as well. So, knowing that, I usually like to peek in every now and again to see what's going on.

If the tasks get stuck for any reason, they remain stuck. The remediation process is done one host at a time, so until the remediation of a single host is done, the next one will not start.
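The blocking behaviour described above can be sketched as a serial queue. Again, this is only an illustration of what the thread describes, not anything Update Manager exposes: `run_queue` and `can_evacuate` are hypothetical names, and `can_evacuate` simply stands in for "every guest on this host can be vMotioned away".

```python
def run_queue(hosts, can_evacuate):
    """Return (finished, stuck, never_started) for a serial remediation queue.

    `can_evacuate` is a hypothetical callback: True if all guests on the
    host can be migrated off so it can enter maintenance mode.
    """
    finished = []
    for i, host in enumerate(hosts):
        if not can_evacuate(host):
            # The host never finishes entering maintenance mode,
            # so nothing queued behind it is ever started.
            return finished, host, hosts[i + 1:]
        finished.append(host)
    return finished, None, []


# Example: esx2 has a guest that cannot be migrated.
result = run_queue(["esx1", "esx2", "esx3", "esx4"], lambda h: h != "esx2")
print(result)
```

With the example input, only esx1 finishes, esx2 is the stuck host, and esx3 and esx4 are never touched, which is why it is worth checking in on a long remediation run.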

orczakm
Enthusiast

Hi!

Just a side note: I would never apply a lot of security/host patches to a complete cluster without having tested them on identical hardware first. It can always happen that something goes wrong after a host has been "successfully" updated, and in your case you would then have a whole cluster with errors instead of a single machine.

Just my 2 cents to think about when doing updates.

Matthias

The world is binary - there are only ones and zeros
scale21
Enthusiast

Thanks, all. I plan to do one host as a test, and maybe roll out the whole thing one host at a time to make sure things go OK.
