Automatically Fail Workflows on Repeated Decision Task Failures #7294

@natemort

Is your feature request related to a problem? Please describe.
A variety of issues can result in decision tasks repeatedly failing, such as specifying the wrong Workflow type, non-determinism, or potentially invalid input. Retrying the decision task forever can be convenient as the workflows will automatically resume once the issue is fixed, but it doesn't effectively convey to the user that some action is required.

These retries additionally create unnecessary load on the server, and can potentially conflict with users' workloads as they compete for resources, scheduling, and rate limiting. There have been a number of attempts to address this problem, such as adding backoff or eventually abandoning dispatch of the task altogether. Both deliver a poor user experience.

If a workflow were failed after a certain number of attempts for a given decision task, that would provide a clear signal to the user that it will not complete without additional intervention. The user could then reset the workflow to resume execution once the problem has been addressed.

Proposed Solution
Similar to restrictions on workflow size, or concurrent activity execution, we should add configuration options to warn about and ultimately terminate workflows that exceed a certain number of attempts for a given decision task.
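
As a rough illustration of the proposal, the sketch below shows one way a warn threshold and a fail threshold could be evaluated against a decision task's attempt count. Every name in it (`DecisionAttemptLimits`, `WarnThreshold`, `FailThreshold`, `Evaluate`, and the action constants) is hypothetical and invented for this example; none of them are existing Cadence configuration keys or APIs, and where such a check would live in the server is left open.

```go
package main

import "fmt"

// DecisionAttemptAction is a hypothetical result type describing what the
// server would do for a given decision task attempt count.
type DecisionAttemptAction int

const (
	ActionNone         DecisionAttemptAction = iota // keep retrying silently
	ActionWarn                                      // keep retrying, but emit a warning
	ActionFailWorkflow                              // stop retrying and fail the workflow
)

// String returns a readable name for logs and metrics.
func (a DecisionAttemptAction) String() string {
	switch a {
	case ActionNone:
		return "none"
	case ActionWarn:
		return "warn"
	case ActionFailWorkflow:
		return "fail-workflow"
	default:
		return fmt.Sprintf("unknown(%d)", int(a))
	}
}

// DecisionAttemptLimits is a hypothetical configuration holder.
// A threshold of zero (or less) disables that check.
type DecisionAttemptLimits struct {
	WarnThreshold int64
	FailThreshold int64
}

// Evaluate maps the attempt count of a single decision task to an action.
// The fail threshold is checked first so that it wins when both are reached.
func (l DecisionAttemptLimits) Evaluate(attempt int64) DecisionAttemptAction {
	switch {
	case l.FailThreshold > 0 && attempt >= l.FailThreshold:
		return ActionFailWorkflow
	case l.WarnThreshold > 0 && attempt >= l.WarnThreshold:
		return ActionWarn
	default:
		return ActionNone
	}
}
```

Treating zero as "disabled" is an assumption made here so that the current retry-forever behavior could remain the default; the actual defaults and whether the limits are per-domain are open questions for the implementation.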

Ideally we would add some sort of search attribute or metadata to the affected workflows to make them easy to find, and provide a clear mechanism for users to reset them in bulk to that specific point in the workflow history.

We should document the existing behavior around retries, backoff, and abandoning tasks as well.

Additional context

One additional piece of nuance is that the first decision task for a Workflow has a TTL equivalent to the Workflow's overall TTL. This is an optimization to avoid redispatching it over and over. The solution described here will not fail workflows started with the wrong TaskList.

Labels: good first issue (Up for grabs as a first issue to contribute to the Cadence project)