VMware Cloud Community
powerbuck
Contributor

Zombie children in Multi machine container

We have a workflow that scales out an existing multi-machine build. It fails in the step that iterates over the children and checks their status; the workflow continues only if all the children are good.

The MM container has 4 VMs. It also appears to have another 7 children in an unknown state, which I think are left over from failed scale-out runs. Is there any way to find these zombies and kill them? Each failed run appears to kill the process and add another zombie.

VCO 5.5.5

VCAC 6.2.0

Our logs look like this:

[2016-09-07 13:55:29.758] [I] ParentMachineId = 58217ef4-e263-453e-a70a-9334e4a3feba

[2016-09-07 13:55:32.104] [I] 2bb365d8-aae5-4676-99cd-3ecd18aabb40 On

[2016-09-07 13:55:33.198] [I] 62e858fa-e4d7-48f4-80f1-8e59bc801feb On

[2016-09-07 13:55:34.332] [I] e44f3a6b-9102-47e6-a0b8-5cc0eae79656 MachineProvisioned

[2016-09-07 13:55:35.459] [I] 905d3318-5b6f-4592-9f60-9c94a9dc1c67 On

[2016-09-07 13:55:36.562] [I] 29f09391-15b9-4c06-ae43-d4dfcea14fae On

[2016-09-07 13:55:36.562] [I] Child 3506842b-78dd-4040-85fb-80dc2a7a4349 is in unknown state

[2016-09-07 13:55:36.562] [I] Child 632fd382-fdf6-4d41-9543-aae5885da9fb is in unknown state

[2016-09-07 13:55:36.563] [I] Child 6c2dcadc-4c4f-4cf2-ba7e-20d36263234b is in unknown state

[2016-09-07 13:55:36.563] [I] Child 304c1946-8dd9-47c4-a34b-61d0a840a5ba is in unknown state

[2016-09-07 13:55:36.563] [I] Child 184fd6c3-9f31-49cf-9ac2-c3b128591721 is in unknown state

[2016-09-07 13:55:36.563] [I] Child de4f3612-5b41-4ecf-92e6-82d30deab1fc is in unknown state

[2016-09-07 13:55:36.563] [I] Child e44f3a6b-9102-47e6-a0b8-5cc0eae79656 is in success state

[2016-09-07 13:55:36.563] [I] One or more machines failed provisioning for the multimachine container. (Dynamic Script Module name : multiMachine_WaitForState#100)
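
The check that fails can be sketched roughly as below. This is only an illustration, not the actual multiMachine_WaitForState script: the function name `findZombieChildren`, the plain `{ id, state }` records, and the list of states treated as healthy are all assumptions made for the example (the state names are taken from the log above, and `null` stands in for whatever an "unknown" child actually returns). It uses no vCAC plug-in calls, so it only shows the filtering logic.

```javascript
// Hypothetical sketch: children are plain objects, not real vCAC entities.
// Which states count as healthy is an assumption based on the log output.
var HEALTHY_STATES = ["On", "MachineProvisioned"];

// Returns the IDs of children whose state is not in HEALTHY_STATES.
function findZombieChildren(children) {
  var zombies = [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (HEALTHY_STATES.indexOf(child.state) === -1) {
      zombies.push(child.id);
    }
  }
  return zombies;
}

// Sample input: IDs copied from the log; the null state is a placeholder.
var children = [
  { id: "2bb365d8-aae5-4676-99cd-3ecd18aabb40", state: "On" },
  { id: "e44f3a6b-9102-47e6-a0b8-5cc0eae79656", state: "MachineProvisioned" },
  { id: "3506842b-78dd-4040-85fb-80dc2a7a4349", state: null }
];
var zombies = findZombieChildren(children);
```

Collecting the IDs this way, rather than failing on the first bad child, would at least give a list of the entries that need to be cleaned up.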

Any idea how we can remove these unknown children?

Thanks.

-Dave

1 Reply
GrantOrchardVMw
Commander

It's possible that these "zombies" have been placed into resources that aren't part of the Reservation. It's hard to tell with this little information.

Do Data Collections also return errors?

Is there a reason you aren't using the native reconfigure to handle scale in/out?

Grant http://grantorchard.com