VMware Cloud Community
KevinLK
Contributor

Updating when there are dependencies between VMs

Hello,

I'm looking for a way to upgrade the hosts in a non-HA cluster where each host runs a single VM and the VMs depend on one another, so that only one VM (i.e. one host) can be down at a time. The VMs are part of an HA product that can't be failed over at the VMware level, because each VM holds persisted data that it needs to function properly (like a vSAN cluster). We need to update the hosts sequentially (again, like a vSAN cluster), and after a VM comes back up we need to wait some amount of time before the next host can be taken down. Is there an API (or a way to invoke a script that we supply) that would allow the hosts (or VUM) to query the VMs and see whether they're "ready" to be taken down?

Thanks,

Kevin

6 Replies
LucD
Leadership

Not 100% sure what exactly you want here.
Does the following high-level algorithm (somewhat) describe what you want?

  • Run through all ESXi nodes in the cluster
    • On an ESXi node, get the VM running there
    • Query the VM if it is "ready" (not sure what you mean here)
    • Stop the VM
    • Place the ESXi node in maintenance mode
    • Remediate the ESXi node
    • Wait till the ESXi node/VM is "ready"
    • Proceed with next ESXi node
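The loop above could be sketched roughly as follows. This is only a minimal sketch: `is_ready`, `stop_vm`, `remediate` and `start_vm` are hypothetical callables you would have to supply yourself (for example wrappers around your application's status check and your own remediation script), not Update Manager or PowerCLI APIs. Restarting the VM after remediation is an assumption that is not in the list.

```python
import time
from typing import Callable, Iterable, List


def wait_until(check: Callable[[], bool], timeout_s: float, poll_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or roughly timeout_s has been waited."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def rolling_update(hosts: Iterable[str],
                   is_ready: Callable[[str], bool],   # hypothetical: asks the app in the VM
                   stop_vm: Callable[[str], None],    # hypothetical: stops the VM on a host
                   remediate: Callable[[str], None],  # hypothetical: maintenance mode + patch
                   start_vm: Callable[[str], None],   # hypothetical: starts the VM again
                   timeout_s: float = 3600.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Update hosts strictly one at a time; return the hosts that were remediated."""
    done: List[str] = []
    for host in hosts:
        # Gate: do not touch this host until its VM reports "ready".
        if not wait_until(lambda: is_ready(host), timeout_s, sleep=sleep):
            raise TimeoutError(f"VM on {host} never became ready")
        stop_vm(host)
        remediate(host)
        start_vm(host)
        done.append(host)
    return done
```

Because the readiness check runs at the top of each iteration, the wait after a VM comes back up is covered by the gate for the next host.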


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

KevinLK
Contributor

Thanks for the reply. Yes, you have the basic process down. The application running inside the VM is part of an HA cluster (not the VMware definition of HA), so the ESXi host cannot be updated unless its VM is in a state that is "ready" to be brought down. "Ready" just means that the HA state of the application cluster can be maintained if that VM goes down.

The basic question is: can the application running in the VM report status back to VUM, so that ESXi updates are sequenced according to the "readiness" of the application running within the VMs (where each host has a single VM and each VM runs a single instance of the application)? As long as one or more instances of the application are "ready", VUM would pick one of those hosts to remediate. While that host is being remediated, the other instances will report that they are "not ready". Once remediation of the host completes, the other instances will report that they are "ready" again (not necessarily right away), and at that point VUM can pick another "ready" host to remediate.
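To make the intended sequencing concrete, here is a minimal sketch of the behaviour I'm after, written as if an external script did the picking instead of VUM. The `is_ready` and `remediate` callables, the polling interval and the `max_polls` limit are all placeholders I made up for illustration, not an existing VUM feature.

```python
import time
from typing import Callable, Iterable, List, Optional, Set


def pick_ready(pending: Set[str], is_ready: Callable[[str], bool]) -> Optional[str]:
    """Return one pending host whose application instance reports ready, if any."""
    for host in sorted(pending):
        if is_ready(host):
            return host
    return None


def remediate_when_ready(hosts: Iterable[str],
                         is_ready: Callable[[str], bool],   # hypothetical status query
                         remediate: Callable[[str], None],  # hypothetical, blocks until done
                         poll_s: float = 30.0,
                         max_polls: int = 1000,
                         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Remediate every host once, only ever picking a host that reports ready."""
    pending = set(hosts)
    order: List[str] = []
    polls = 0
    while pending:
        host = pick_ready(pending, is_ready)
        if host is None:
            # Nobody is ready (e.g. the cluster is still resyncing): wait and retry.
            polls += 1
            if polls > max_polls:
                raise TimeoutError(f"no ready host among {sorted(pending)}")
            sleep(poll_s)
            continue
        polls = 0
        remediate(host)
        pending.remove(host)
        order.append(host)
    return order
```

Since `remediate` is assumed to block until the host is back, only one host is ever down at a time, and the "not ready" period afterwards simply shows up as extra polling.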

I hope that makes more sense.

Thanks,

Kevin

LucD
Leadership

Ok, I think I see what you mean.

Afaik, there is no mechanism in place that would allow a VM or ESXi node to "talk" to the Update Manager, and say "take me".

You could get the applications in the VMs to arrange this:

  • check if the application is "ready"
  • announce this to the other applications/VMs
  • have a mechanism to select one of the "ready" VMs
  • the "ready" VM triggers the remediation of the ESXi node on which it is running
  • repeat till all nodes are remediated

But this logic, besides the remediation step, would all happen outside the Update Manager I'm afraid.
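As an illustration of the selection step, each VM could apply the same deterministic rule to the announcements it has received. This is a minimal sketch and assumes every VM sees an identical view of who is "ready" and who is already done; if that cannot be guaranteed you would need a real lock or consensus mechanism instead. The names `elect_next` and `should_trigger` are made up for this example.

```python
from typing import Dict, Optional, Set


def elect_next(ready: Dict[str, bool], done: Set[str]) -> Optional[str]:
    """Pick the lowest-named VM that announced "ready" and is not yet remediated."""
    candidates = [name for name, ok in ready.items() if ok and name not in done]
    return min(candidates) if candidates else None


def should_trigger(my_id: str, ready: Dict[str, bool], done: Set[str]) -> bool:
    """True if this VM is the one that should ask for its ESXi node to be remediated."""
    return elect_next(ready, done) == my_id


# Example: vm-a is not ready, so vm-b wins the selection.
announcements = {"vm-c": True, "vm-a": False, "vm-b": True}
print(elect_next(announcements, done=set()))
```

Because every VM evaluates the same rule on the same data, only one of them concludes that it is next, without a central coordinator.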



KevinLK
Contributor

Again, thanks for the reply. I couldn't find a mechanism to coordinate with the Update Manager, so your answer doesn't surprise me. Thinking about it more, I think that vSAN would have the same requirements, but I'm sure that they have some internal APIs that allow them to manage their own coordinated updates.

Can you explain what you mean by "the ready VM triggers the remediation of the ESXi node on which it is running"? It almost sounds like there may be a way to manage the host remediation from within the VM.

Thanks,

Kevin

LucD
Leadership

No, that's not what I meant :)
What I wanted to say is that you will have to write something that runs inside the VM and is able to detect whether the VM is "ready".
Then you would need a synchronisation mechanism between all the "ready" VMs, which decides which VM is next.

Then you would need another script, running somewhere else (because you are going to shut down the ESXi node where the "ready" VM is running), to stop the "ready" VM and start the remediation of that ESXi node.

Quite complex, and you would have to do that all in the scripts.
No cmdlets or API to assist afaik.


KevinLK
Contributor

OK, that makes more sense :) Thanks for all of the information!

Kevin
