VMware Cloud Community
stevenbright1
Enthusiast

Workflow Failing on the Waiting Timer

I have a workflow that creates a snapshot of a VM, notifies the end user by email that the snapshot was created, and then enters a waiting timer for two weeks before checking whether the snapshot still exists and notifying the user that they should remove it. It then loops each day, notifying the user that the snapshot should be removed, until the snapshot no longer exists.

Most of the time the workflow works as expected; however, one or two of the workflow runs will occasionally fail. The workflow execution schema shows that the workflow failed on the Waiting Timer (which I have verified has a valid date/time value as its input). When I view the Events for the workflow, there are three events: Workflow has started, Workflow is paused, and Workflow has failed. When you select the Workflow has failed event, the description is "Workflow was issued from a task, the workflow execution has not be resumed" (I assume "be"="been" in the message). Additionally, when the workflow fails, it never executes the exception workflow, which is configured to send me a notification email with the error/exception information.

Does anyone have any ideas what might be causing this failure? I have noticed the failure occurs most often after the Orchestrator service has been restarted.


Accepted Solutions
radostin
VMware Employee

Hi,

Actually, the behaviour you are observing is not random and is to some extent expected. Currently, workflows that are started by a task (i.e. scheduled) are not resumed on server restart; the task center is responsible for them. The reason is that such a workflow could otherwise be started twice at the same time: once by resuming it from the point where it was paused, and once by the task manager/center. So if the workflow was scheduled, it won't resume on server restart.
On the other hand, on server restart the task manager proceeds with the following logic:
1.1 If it is a recurrent task, the task manager computes the next execution time and schedules it for execution.
1.2 If the task is not recurrent, i.e. it is a one-time task, there are two options:
1.2.1 The execution time has not passed yet - the task is scheduled for execution.
1.2.2 The execution time has already passed - there are two further options:
    1.2.2.1 The task has been configured to "Start if scheduled in the past" and hasn't been completed yet - the task is scheduled for execution once.
    1.2.2.2 The task has been configured to "DON'T start if scheduled in the past" or has already been completed - the task completes with an error and the workflow is not executed.
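
To make the branches above easier to follow, here is a rough sketch of the same decision tree in Python. It is only a model of the logic as described in this post, not Orchestrator code: the Task class, its field names and the Action values are all made up for illustration, and treating an execution time exactly equal to "now" as already passed is my own assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the outcomes listed in the post.
    SCHEDULE_NEXT_RECURRENCE = "compute next execution time and schedule it"
    SCHEDULE_AT_ORIGINAL_TIME = "schedule for its (future) execution time"
    RUN_ONCE = "schedule for execution once"
    COMPLETE_WITH_ERROR = "complete with error, workflow not executed"


@dataclass
class Task:
    # Hypothetical model of a scheduled task, not an Orchestrator type.
    recurrent: bool
    execution_time: datetime
    start_if_in_past: bool  # models the "Start if scheduled in the past" option
    completed: bool


def on_server_restart(task: Task, now: datetime) -> Action:
    """Return what the task manager would do with a task after a restart."""
    if task.recurrent:                                  # 1.1
        return Action.SCHEDULE_NEXT_RECURRENCE
    if task.execution_time > now:                       # 1.2.1
        return Action.SCHEDULE_AT_ORIGINAL_TIME
    # Assumption: execution_time == now counts as "already passed".
    if task.start_if_in_past and not task.completed:    # 1.2.2.1
        return Action.RUN_ONCE
    return Action.COMPLETE_WITH_ERROR                   # 1.2.2.2
```

A one-time task that started two weeks ago and is sitting in a waiting timer falls into the last two branches, so the "Start if scheduled in the past" setting decides whether it is run again or completes with an error.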
   
I guess in your case the workflow is scheduled (started from a task), and that's why it fails to resume on server restart. If the workflow is not a recurrent one, as a workaround you can configure the scheduled task as "Start if scheduled in the past". You can do this from the client by navigating to Scheduler-><The scheduled task>->Edit (picture attached). On server restart, this will cause the workflow to be executed from the beginning if it had already been started but hadn't completed, i.e. it was paused in the middle.

Start_in_the_past.png
Regards,
Radostin

4 Replies
tschoergez
Leadership

Hi!

What's the setting for the workflow in case of a server failure? Restart?

Is it reproducible that all workflows in the paused state crash after a server restart?

As a workaround, you could change the logic of your workflow from using a timer event to a self-scheduling strategy (the workflow schedules itself, or schedules the reminder workflow accordingly).

See a nice example how to do that here: http://communities.vmware.com/thread/318791
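
For what it's worth, the self-scheduling idea can be sketched roughly like this in Python. This is not Orchestrator scripting: run_reminder_once and its parameters are hypothetical names, and the returned time would have to be passed to whatever scheduling call the platform provides. The 24-hour interval comes from the daily reminder described in the original post.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

REMINDER_INTERVAL = timedelta(hours=24)  # daily reminder, as in the original post


def run_reminder_once(
    now: datetime,
    snapshot_exists: Callable[[], bool],   # hypothetical check against the VM
    send_reminder: Callable[[], None],     # hypothetical email step
) -> Optional[datetime]:
    """Do one check, then return when the next run should be scheduled.

    Returns None when the snapshot is gone and no further run is needed.
    """
    if not snapshot_exists():
        return None
    send_reminder()
    return now + REMINDER_INTERVAL
```

The point of this design is that each run is short and never sits paused in a timer; the waiting happens in the scheduler between runs rather than inside a paused workflow.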

Cheers,

Joerg

stevenbright1
Enthusiast

The server restart behavior for the workflow is configured as “Resume workflow run”. Not all workflows fail after a reboot/restart of the service; it just appears that after a reboot/restart of the service, at least one of the workflows will usually fail. The workflows occasionally fail between restarts as well, but restarting the server seems to make it happen more often.

The reason it is currently built as a single workflow is that the email reminders depend on whether or not an item still exists. The main workflow kicks off a looped workflow that starts exactly two weeks after the snapshot was created. When the first two-week timer expires, the workflow loads all the existing snapshots for the VM and checks whether the snapshot that the workflow created still exists. If it exists, the workflow sends the email reminder and schedules a new timer for 24 hours later. If the snapshot no longer exists, the entire workflow ends. The sleep timers attached to the email generation workflow items are 60-second timers: if the workflow cannot contact the SMTP server, it waits 60 seconds and tries again instead of failing the workflow.
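
In case the description is hard to follow without the schema, here is a rough Python model of the loop. It is only a sketch of the logic, not the actual workflow: the function names are made up, the max_reminders and max_attempts caps are assumptions added so the example always terminates (the real workflow has no such limits), and ConnectionError stands in for "cannot contact the SMTP server".

```python
from datetime import datetime, timedelta
from typing import Callable, List

FIRST_REMINDER_DELAY = timedelta(weeks=2)   # first check, two weeks after creation
REPEAT_INTERVAL = timedelta(hours=24)       # then once a day
SMTP_RETRY_SECONDS = 60                     # wait before retrying a failed email


def reminder_times(
    created: datetime,
    exists_at: Callable[[datetime], bool],  # hypothetical "does the snapshot still exist?"
    max_reminders: int = 100,               # assumption: safety cap for the sketch
) -> List[datetime]:
    """Return the times at which a reminder email would be sent."""
    sent: List[datetime] = []
    check = created + FIRST_REMINDER_DELAY
    while len(sent) < max_reminders and exists_at(check):
        sent.append(check)
        check += REPEAT_INTERVAL
    return sent


def send_with_retry(
    send: Callable[[], None],
    sleep: Callable[[float], None],
    max_attempts: int = 5,                  # assumption: the real workflow has no cap
) -> int:
    """Try to send; on a connection failure wait 60 seconds and try again.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(SMTP_RETRY_SECONDS)
    raise ValueError("max_attempts must be at least 1")
```

In the real workflow each wait between checks is a waiting timer, which is where the failed runs are stopping.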

Attached is the schema of the workflow from one of the failures. There is a separate workflow that kicks off the creation of the snapshot and then passes the information to this workflow to generate the email reminders.

stevenbright1
Enthusiast

Thanks Radostin.

I believe that answers my question. Many of the tasks were started via the scheduler, and I had not correlated the failures with the workflows that were scheduled.

I'm thinking that in the future, if I know a workflow needs to be scheduled, I should build a waiting timer into the task for this functionality instead of relying on the scheduler?

Thanks again!
