| 1 | +# Resume stopped agents |
| 2 | + |
| 3 | +An ADK agent's execution can be interrupted by various factors including |
| 4 | +dropped network connections, power failure, or a required external system going |
| 5 | +offline. The Resume feature of ADK allows an agent workflow to pick up where it |
| 6 | +left off, avoiding the need to restart the entire workflow. In ADK Python 1.16 |
| 7 | +and higher, you can configure an ADK workflow to be resumable, so that it tracks |
| 8 | +the execution of the workflow and then allows you to resume it after an unexpected |
| 9 | +interruption. |
| 10 | + |
| 11 | +This guide explains how to configure your ADK agent workflow to be resumable. |
| 12 | +If you use Custom Agents, you can update them to be resumable. For more |
| 13 | +information, see |
| 14 | +[Add resume to custom Agents](#custom-agents). |
| 15 | + |
| 16 | +## Add resumable configuration |
| 17 | + |
| 18 | +Enable the Resume function for an agent workflow by applying a Resumability |
| 19 | +configuration to the App object of your ADK workflow, as shown in the following |
| 20 | +code example: |
| 21 | + |
| 22 | +```python |
| 23 | +app = App( |
| 24 | +    name='my_resumable_agent', |
| 25 | +    root_agent=root_agent, |
| 26 | +    # Set the resumability config to enable resumability. |
| 27 | +    resumability_config=ResumabilityConfig( |
| 28 | +        is_resumable=True, |
| 29 | +    ), |
| 30 | +) |
| 31 | +``` |
| 32 | + |
| 33 | +!!! warning "Caution: Long Running Functions, Confirmations, Authentication" |
| 34 | +    For agents that use |
| 35 | +    [Long Running Functions](/adk-docs/tools/function-tools/#long-run-tool), |
| 36 | +    [Confirmations](/adk-docs/tools/confirmation/), or |
| 37 | +    [Authentication](/adk-docs/tools/authentication/) |
| 38 | +    requiring user input, adding a resumability configuration changes how these features |
| 39 | +    operate. For more information, see the documentation for those features. |
| 40 | + |
| 41 | +!!! info "Note: Custom Agents" |
| 42 | +    Resume is not supported by default for Custom Agents. You must |
| 43 | +    update the agent code for a Custom Agent to support the Resume feature. For |
| 44 | +    information on modifying Custom Agents to support incremental resume |
| 45 | +    functionality, see |
| 46 | +    [Add resume to custom Agents](#custom-agents). |
| 47 | + |
| 48 | +## Resume a stopped workflow |
| 49 | + |
| 50 | +When an ADK workflow stops execution, you can resume it with a request that |
| 51 | +includes the Invocation ID for the workflow instance, which can be |
| 52 | +found in the |
| 53 | +[Event](/adk-docs/events/#understanding-and-using-events) |
| 54 | +history of the workflow. Make sure the ADK API server is running, restarting it |
| 55 | +if it was interrupted or powered off, and then send the resume request, as shown |
| 56 | +in the following API request example: |
| 57 | + |
| 58 | +```console |
| 59 | +# restart the API server if needed: |
| 60 | +adk api_server my_resumable_agent/ |
| 61 | + |
| 62 | +# resume the agent: |
| 63 | +curl -X POST http://localhost:8000/run_sse \ |
| 64 | +  -H "Content-Type: application/json" \ |
| 65 | +  -d '{ |
| 66 | +    "app_name": "my_resumable_agent", |
| 67 | +    "user_id": "u_123", |
| 68 | +    "session_id": "s_abc", |
| 69 | +    "invocation_id": "invocation-123" |
| 70 | +  }' |
| 71 | +``` |
| 72 | + |
| 73 | +You can also resume a workflow using the `run_async` method of the Runner object, as |
| 74 | +shown below: |
| 75 | + |
| 76 | +```python |
| 77 | +runner.run_async(user_id='u_123', session_id='s_abc', |
| 78 | +                 invocation_id='invocation-123') |
| 79 | + |
| 80 | +# When new_message is also passed and set to a function response, |
| 81 | +# the call resumes a long-running function. |
| 82 | +``` |
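The Invocation ID used in both calls comes from the session's Event history. The helper below is a minimal sketch of picking the most recent one. It assumes each event object exposes an `invocation_id` attribute; `latest_invocation_id` is a hypothetical helper name, not part of ADK, and the events are stand-ins built with `SimpleNamespace` so the snippet runs without ADK installed.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def latest_invocation_id(events: Iterable) -> Optional[str]:
    """Return the invocation_id of the last event that carries one."""
    latest = None
    for event in events:
        invocation_id = getattr(event, "invocation_id", None)
        if invocation_id:
            latest = invocation_id
    return latest


# Stand-in events (assumption): in a real app these would be the
# events recorded in the session for the stopped workflow.
events = [
    SimpleNamespace(invocation_id="invocation-122"),
    SimpleNamespace(invocation_id="invocation-123"),
    SimpleNamespace(invocation_id="invocation-123"),
]
print(latest_invocation_id(events))
```

The returned value is what you would pass as `invocation_id` in the resume request.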
| 83 | + |
| 84 | +!!! info "Note" |
| 85 | +    Resuming a workflow from the ADK Web user interface or using the ADK |
| 86 | +    command-line interface (CLI) tool is not currently supported. |
| 87 | + |
| 88 | +## How it works |
| 89 | + |
| 90 | +The Resume feature works by logging completed agent workflow tasks, |
| 91 | +including incremental steps, using |
| 92 | +[Events](/adk-docs/events/) and |
| 93 | +[Event Actions](/adk-docs/events/#detecting-actions-and-side-effects) |
| 94 | +to track completion of agent tasks within a resumable workflow. If a workflow is |
| 95 | +interrupted and then later restarted, the system resumes the workflow by restoring |
| 96 | +the completion state of each agent. If an agent did not complete, the workflow |
| 97 | +system reinstates any completed Events for that agent, and restarts the workflow |
| 98 | +from the partially completed state. For multi-agent workflows, the specific |
| 99 | +resume behavior varies based on the multi-agent classes in your workflow, as |
| 100 | +described below: |
| 101 | + |
| 102 | +- **Sequential Agent**: Reads the `current_sub_agent` from its saved state |
| 103 | +  to find the next sub-agent to run in the sequence. |
| 104 | +- **Loop Agent**: Uses the `current_sub_agent` and `times_looped` values to |
| 105 | +  continue the loop from the last completed iteration and sub-agent. |
| 106 | +- **Parallel Agent**: Determines which sub-agents have already completed |
| 107 | +  and only runs those that have not finished. |
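The sequential case can be illustrated with a toy model. This is a sketch of the idea only, not ADK's implementation: `SequentialState`, `run_sequential`, and the three sub-agent functions are hypothetical names, and plain functions stand in for sub-agents. The saved `current_sub_agent` value tells the second run where to pick up after a simulated outage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SequentialState:
    """Toy stand-in for a saved agent state (not an ADK class)."""
    current_sub_agent: Optional[str] = None


def run_sequential(
    sub_agents: Dict[str, Callable[[], str]],
    state: SequentialState,
    log: List[str],
) -> None:
    """Run sub-agents in order, starting at the saved sub-agent if any."""
    names = list(sub_agents)
    start = names.index(state.current_sub_agent) if state.current_sub_agent else 0
    for name in names[start:]:
        state.current_sub_agent = name  # checkpoint before running
        log.append(sub_agents[name]())
    state.current_sub_agent = None  # whole sequence finished


attempts = {"review": 0}


def draft() -> str:
    return "draft done"


def review() -> str:
    attempts["review"] += 1
    if attempts["review"] == 1:
        raise ConnectionError("simulated outage")
    return "review done"


def publish() -> str:
    return "publish done"


sub_agents = {"draft": draft, "review": review, "publish": publish}
state = SequentialState()
log: List[str] = []

try:
    run_sequential(sub_agents, state, log)  # stops inside "review"
except ConnectionError:
    pass

run_sequential(sub_agents, state, log)  # resumes at "review", skips "draft"
print(log)
```

Note that the interrupted sub-agent is run again on resume, while the sub-agent that finished before the interruption is not.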
| 108 | + |
| 109 | +Event logging includes the results of Tools that returned successfully. |
| 110 | +For example, if an agent successfully executed Function Tools A and B, and then failed |
| 111 | +during execution of Tool C, the system reinstates the results from |
| 112 | +Tools A and B, and resumes the workflow by re-running the Tool C request. |
| 113 | + |
| 114 | +!!! warning "Caution: Tool execution behavior" |
| 115 | +    When resuming a workflow with Tools, the Resume feature ensures |
| 116 | +    that the Tools in an agent are run ***at least once***, and may run more than |
| 117 | +    once when resuming a workflow. If your agent uses Tools where duplicate runs |
| 118 | +    would have a negative impact, such as purchases, you should modify the Tool to |
| 119 | +    check for and prevent duplicate runs. |
| 120 | + |
| 121 | +!!! note "Note: Workflow modification with Resume not supported" |
| 122 | +    Do not modify a stopped agent workflow before resuming it. |
| 123 | +    For example, adding or removing agents from a workflow that has stopped |
| 124 | +    and then resuming that workflow is not supported. |
| 125 | + |
| 126 | +## Add resume to custom Agents {#custom-agents} |
| 127 | + |
| 128 | +Custom agents have specific implementation requirements in order to support |
| 129 | +resumability. You must decide on and define workflow steps within your custom |
| 130 | +agent that each produce a result that can be preserved before handing off to the |
| 131 | +next step of processing. The following steps outline how to modify a Custom |
| 132 | +Agent to support workflow resume. |
| 133 | + |
| 134 | +- **Create CustomAgentState class**: Extend `BaseAgentState` to create |
| 135 | +  an object that preserves the state of your agent. |
| 136 | +    - **Optionally, create WorkflowStep class**: If your custom agent |
| 137 | +      has sequential steps, consider creating a `WorkflowStep` enumeration that |
| 138 | +      defines the discrete, savable steps of the agent. |
| 139 | +- **Add initial agent state**: Modify your agent's async run function to |
| 140 | +  set the initial state of your agent. |
| 141 | +- **Add agent state checkpoints**: Modify your agent's async run function |
| 142 | +  to generate and save the agent state for each completed step of the agent's |
| 143 | +  overall task. |
| 144 | +- **Add end of agent status to track agent state**: Modify your agent's |
| 145 | +  async run function to include an `end_of_agent=True` status upon successful |
| 146 | +  completion of the agent's full task. |
| 147 | + |
| 148 | +The following example shows the required code modifications to the example |
| 149 | +StoryFlowAgent class shown in the |
| 150 | +[Custom Agents](/adk-docs/agents/custom-agents/#full-code-example) |
| 151 | +guide: |
| 152 | + |
| 153 | +```python |
| 154 | +class WorkflowStep(int, Enum): |
| 155 | +    INITIAL_STORY_GENERATION = 1 |
| 156 | +    CRITIC_REVISER_LOOP = 2 |
| 157 | +    POST_PROCESSING = 3 |
| 158 | +    CONDITIONAL_REGENERATION = 4 |
| 159 | + |
| 160 | +# Extend BaseAgentState |
| 161 | +class StoryFlowAgentState(BaseAgentState): |
| 162 | +    # Step to run next when the agent starts or resumes. |
| 163 | +    step: WorkflowStep |
| 164 | + |
| 165 | + |
| 166 | +@override |
| 167 | +async def _run_async_impl( |
| 168 | +    self, ctx: InvocationContext |
| 169 | +) -> AsyncGenerator[Event, None]: |
| 170 | +    """ |
| 171 | +    Implements the custom orchestration logic for the story workflow. |
| 172 | +    Uses the instance attributes assigned by Pydantic (e.g., self.story_generator). |
| 173 | +    """ |
| 174 | +    agent_state = self._load_agent_state(ctx, StoryFlowAgentState) |
| 175 | + |
| 176 | +    if agent_state is None: |
| 177 | +        # Record the start of the agent |
| 178 | +        agent_state = StoryFlowAgentState(step=WorkflowStep.INITIAL_STORY_GENERATION) |
| 179 | +        yield self._create_agent_state_event(ctx, agent_state) |
| 180 | + |
| 181 | +    next_step = agent_state.step |
| 182 | +    logger.info(f"[{self.name}] Starting story generation workflow.") |
| 183 | + |
| 184 | +    # Step 1. Initial Story Generation |
| 185 | +    if next_step <= WorkflowStep.INITIAL_STORY_GENERATION: |
| 186 | +        logger.info(f"[{self.name}] Running StoryGenerator...") |
| 187 | +        async for event in self.story_generator.run_async(ctx): |
| 188 | +            yield event |
| 189 | + |
| 190 | +        # Check if story was generated before proceeding |
| 191 | +        if "current_story" not in ctx.session.state or not ctx.session.state[ |
| 192 | +            "current_story" |
| 193 | +        ]: |
| 194 | +            return  # Stop processing if initial story failed |
| 195 | + |
| 196 | +        agent_state = StoryFlowAgentState(step=WorkflowStep.CRITIC_REVISER_LOOP) |
| 197 | +        yield self._create_agent_state_event(ctx, agent_state) |
| 198 | + |
| 199 | +    # Step 2. Critic-Reviser Loop |
| 200 | +    if next_step <= WorkflowStep.CRITIC_REVISER_LOOP: |
| 201 | +        logger.info(f"[{self.name}] Running CriticReviserLoop...") |
| 202 | +        async for event in self.loop_agent.run_async(ctx): |
| 203 | +            logger.info( |
| 204 | +                f"[{self.name}] Event from CriticReviserLoop: " |
| 205 | +                f"{event.model_dump_json(indent=2, exclude_none=True)}" |
| 206 | +            ) |
| 207 | +            yield event |
| 208 | + |
| 209 | +        agent_state = StoryFlowAgentState(step=WorkflowStep.POST_PROCESSING) |
| 210 | +        yield self._create_agent_state_event(ctx, agent_state) |
| 211 | + |
| 212 | +    # Step 3. Sequential Post-Processing (Grammar and Tone Check) |
| 213 | +    if next_step <= WorkflowStep.POST_PROCESSING: |
| 214 | +        logger.info(f"[{self.name}] Running PostProcessing...") |
| 215 | +        async for event in self.sequential_agent.run_async(ctx): |
| 216 | +            logger.info( |
| 217 | +                f"[{self.name}] Event from PostProcessing: " |
| 218 | +                f"{event.model_dump_json(indent=2, exclude_none=True)}" |
| 219 | +            ) |
| 220 | +            yield event |
| 221 | + |
| 222 | +        agent_state = StoryFlowAgentState(step=WorkflowStep.CONDITIONAL_REGENERATION) |
| 223 | +        yield self._create_agent_state_event(ctx, agent_state) |
| 224 | + |
| 225 | +    # Step 4. Tone-Based Conditional Logic |
| 226 | +    if next_step <= WorkflowStep.CONDITIONAL_REGENERATION: |
| 227 | +        tone_check_result = ctx.session.state.get("tone_check_result") |
| 228 | +        if tone_check_result == "negative": |
| 229 | +            logger.info(f"[{self.name}] Tone is negative. Regenerating story...") |
| 230 | +            async for event in self.story_generator.run_async(ctx): |
| 231 | +                logger.info( |
| 232 | +                    f"[{self.name}] Event from StoryGenerator (Regen): " |
| 233 | +                    f"{event.model_dump_json(indent=2, exclude_none=True)}" |
| 234 | +                ) |
| 235 | +                yield event |
| 236 | +        else: |
| 237 | +            logger.info(f"[{self.name}] Tone is not negative. Keeping current story.") |
| 238 | + |
| 239 | +    logger.info(f"[{self.name}] Workflow finished.") |
| 240 | +    yield self._create_agent_state_event(ctx, end_of_agent=True) |
| 241 | +``` |