
Commit 6fbf5fc

tkattkat and pirate authored
Update close tool + add output to agent result (#1505)
# Why

Previously, agent executions could end without a structured final response - either by hitting `maxSteps` or the LLM breaking out of its loop without calling the `close` tool. This made it difficult to:

1. Reliably determine if a task was completed successfully
2. Extract structured data from the agent's execution

# What Changed

### Ensured Close Tool is Always Called

- Added `handleCloseToolCall` utility that forces a `close` tool call via a separate `generateText` call when the main agent loop ends without explicitly closing
- Integrated via new `ensureClosed` private method in `v3AgentHandler.ts`
- Works for both `execute()` and `stream()` modes
- Triggers when `maxSteps` is reached or the LLM stops (completes its task)

### Added Output Schema Support (Experimental)

- Users can now pass a Zod schema to `agent.execute({ output: z.object({...}) })` to return structured data at the end of execution
- The schema dynamically extends the close tool's input schema
- Extracted data is returned in `result.output`
- Added validation:
  - **CUA mode**: Throws `StagehandInvalidArgumentError` (not supported)
  - **Non-CUA without `experimental: true`**: Throws `ExperimentalNotConfiguredError`

### Example Usage

```typescript
const result = await agent.execute({
  instruction:
    "search for a shampoo on amazon and click into one of the results",
  maxSteps: 20,
  output: z.object({
    productName: z.string().describe("The name of the shampoo product"),
    price: z.string().describe("The price of the product"),
    rating: z.string().describe("The star rating of the product"),
  }),
});

console.log(result.output);
// { productName: "...", price: "$12.99", rating: "4.5 out of 5 stars" }
```

# Test Plan

- [x] Verify close tool is called when agent naturally completes (no change in behavior)
- [x] Verify close tool is forced when `maxSteps` is reached
- [x] Verify `output` schema extracts data correctly in `execute()` mode
- [x] Verify `output` schema extracts data correctly in `stream()` mode
- [x] Verify `output` throws `StagehandInvalidArgumentError` when used with CUA mode
- [x] Verify `output` throws `ExperimentalNotConfiguredError` when used without `experimental: true`

---

## Summary by cubic

Ensures every agent run ends with a structured final response and adds optional structured output via a Zod schema. Improves reliability by always setting completion status and final reasoning.

- **New Features**
  - Always triggers a "close" tool call at the end of a run (LLM stops or maxSteps), for both execute() and stream().
  - Optional output schema: pass output: z.object({...}) to return typed data in result.output.
  - Validation: output schema is not supported in CUA (throws StagehandInvalidArgumentError). In non-CUA, requires experimental: true (throws ExperimentalNotConfiguredError otherwise).
  - Removed "close" from the main tool list and system prompt; closing is handled automatically post-run.

<sup>Written for commit dfb703a. Summary will update on new commits.</sup>

---------

Co-authored-by: Nick Sweeting <github@sweeting.me>
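The "always closed" behavior described in the commit message can be pictured as a small wrapper around the end of a run. The following is only a sketch under stated assumptions: `ensureClosedSketch`, `LoopOutcome`, and the `forceClose` callback are hypothetical names used for illustration, and the real `ensureClosed` method in `v3AgentHandler.ts` is not shown on this page.

```typescript
// Hypothetical stand-ins for illustration; not Stagehand's actual internals.
interface CloseResultSketch {
  reasoning: string;
  taskComplete: boolean;
  output?: Record<string, unknown>;
}

interface LoopOutcome {
  // Set only if the main loop itself already produced a close result.
  closeResult?: CloseResultSketch;
  stepsTaken: number;
}

// Returns the loop's own close result when present; otherwise requests
// a forced close (in the commit, a separate generateText call).
async function ensureClosedSketch(
  outcome: LoopOutcome,
  forceClose: () => Promise<CloseResultSketch>,
): Promise<CloseResultSketch> {
  if (outcome.closeResult) {
    return outcome.closeResult;
  }
  return forceClose();
}

// Usage: a run that stopped at maxSteps without closing gets a forced result.
ensureClosedSketch({ stepsTaken: 20 }, async () => ({
  reasoning: "Stopped at maxSteps before finishing",
  taskComplete: false,
})).then((result) => console.log(result.reasoning, result.taskComplete));
```

Passing the forced-close step in as a callback keeps the sketch free of any model dependency; in the commit that role is played by `handleCloseToolCall`.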
1 parent 425138b commit 6fbf5fc

11 files changed

Lines changed: 273 additions & 52 deletions


.changeset/quick-games-try.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@browserbasehq/stagehand": patch
+---
+
+Add structured output to agent result + ensure close tool is always called

packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts

Lines changed: 0 additions & 4 deletions
@@ -67,7 +67,6 @@ function buildToolsSection(
     { name: "wait", description: "Wait for a specified time" },
     { name: "navback", description: "Navigate back in browser history" },
     { name: "scroll", description: "Scroll the page x pixels up or down" },
-    { name: "close", description: "Mark the task as complete or failed" },
   ];
 
   const domTools: ToolDefinition[] = [
@@ -92,7 +91,6 @@ function buildToolsSection(
     { name: "wait", description: "Wait for a specified time" },
     { name: "navback", description: "Navigate back in browser history" },
     { name: "scroll", description: "Scroll the page x pixels up or down" },
-    { name: "close", description: "Mark the task as complete or failed" },
   ];
 
   const baseTools = isHybridMode ? hybridTools : domTools;
@@ -224,8 +222,6 @@ export function buildAgentSystemPrompt(
   <item>Always start by understanding the current page state</item>
   <item>Use the screenshot tool to verify page state when needed</item>
   <item>Use appropriate tools for each action</item>
-  <item>When the task is complete, use the "close" tool with taskComplete: true</item>
-  <item>If the task cannot be completed, use "close" with taskComplete: false</item>
 </guidelines>
 ${pageUnderstandingProtocol}
 <navigation>

packages/core/lib/v3/agent/tools/close.ts

Lines changed: 0 additions & 16 deletions
This file was deleted.

packages/core/lib/v3/agent/tools/index.ts

Lines changed: 1 addition & 3 deletions
@@ -3,7 +3,6 @@ import { actTool } from "./act";
 import { screenshotTool } from "./screenshot";
 import { waitTool } from "./wait";
 import { navBackTool } from "./navback";
-import { closeTool } from "./close";
 import { ariaTreeTool } from "./ariaTree";
 import { fillFormTool } from "./fillform";
 import { scrollTool, scrollVisionTool } from "./scroll";
@@ -87,7 +86,7 @@ export function createAgentTools(v3: V3, options?: V3AgentToolOptions) {
     ariaTree: ariaTreeTool(v3),
     click: clickTool(v3, provider),
     clickAndHold: clickAndHoldTool(v3, provider),
-    close: closeTool(),
+    //close: closeTool(),
     dragAndDrop: dragAndDropTool(v3, provider),
     extract: extractTool(v3, executionModel, options?.logger),
     fillForm: fillFormTool(v3, executionModel),
@@ -121,7 +120,6 @@ export type AgentToolTypesMap = {
   ariaTree: ReturnType<typeof ariaTreeTool>;
   click: ReturnType<typeof clickTool>;
   clickAndHold: ReturnType<typeof clickAndHoldTool>;
-  close: ReturnType<typeof closeTool>;
   dragAndDrop: ReturnType<typeof dragAndDropTool>;
   extract: ReturnType<typeof extractTool>;
   fillForm: ReturnType<typeof fillFormTool>;
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+import { generateText, ModelMessage, LanguageModel, ToolSet } from "ai";
+import { z } from "zod";
+import { tool } from "ai";
+import { LogLine } from "../../types/public/logs";
+import { StagehandZodObject } from "../../zodCompat";
+interface CloseResult {
+  reasoning: string;
+  taskComplete: boolean;
+  messages: ModelMessage[];
+  output?: Record<string, unknown>;
+}
+
+const baseCloseSchema = z.object({
+  reasoning: z
+    .string()
+    .describe("Brief summary of what actions were taken and the outcome"),
+  taskComplete: z
+    .boolean()
+    .describe("true if the task was fully completed, false otherwise"),
+});
+
+/**
+ * Force a close tool call at the end of an agent run.
+ * This ensures we always get a structured final response,
+ * even if the main loop ended without calling close.
+ */
+export async function handleCloseToolCall(options: {
+  model: LanguageModel;
+  inputMessages: ModelMessage[];
+  instruction: string;
+  outputSchema?: StagehandZodObject;
+  logger: (message: LogLine) => void;
+}): Promise<CloseResult> {
+  const { model, inputMessages, instruction, outputSchema, logger } = options;
+
+  logger({
+    category: "agent",
+    message: "Agent calling tool: close",
+    level: 1,
+  });
+  // Merge base close schema with user-provided output schema if present
+  const closeToolSchema = outputSchema
+    ? baseCloseSchema.extend({
+        output: outputSchema.describe(
+          "The specific data the user requested from this task",
+        ),
+      })
+    : baseCloseSchema;
+
+  const outputInstructions = outputSchema
+    ? `\n\nThe user also requested the following information from this task. Provide it in the "output" field:\n${JSON.stringify(
+        Object.fromEntries(
+          Object.entries(outputSchema.shape).map(([key, value]) => [
+            key,
+            value.description || "no description",
+          ]),
+        ),
+        null,
+        2,
+      )}`
+    : "";
+
+  const systemPrompt = `You are a web automation assistant that was tasked with completing a task.
+
+The task was:
+"${instruction}"
+
+Review what was accomplished and provide your final assessment in whether the task was completed successfully. you have been provided with the history of the actions taken so far, use this to determine if the task was completed successfully.${outputInstructions}
+
+Call the "close" tool with:
+1. A brief summary of what was done
+2. Whether the task was completed successfully${outputSchema ? "\n3. The requested output data based on what you found" : ""}`;
+
+  const closeTool = tool({
+    description: outputSchema
+      ? "Complete the task with your assessment and the requested output data."
+      : "Complete the task with your final assessment.",
+    inputSchema: closeToolSchema,
+    execute: async (params) => {
+      return { success: true, ...params };
+    },
+  });
+
+  const userPrompt: ModelMessage = {
+    role: "user",
+    content: outputSchema
+      ? "Provide your final assessment and the requested output data."
+      : "Provide your final assessment.",
+  };
+
+  const result = await generateText({
+    model,
+    system: systemPrompt,
+    messages: [...inputMessages, userPrompt],
+    tools: { close: closeTool } as ToolSet,
+    toolChoice: { type: "tool", toolName: "close" },
+  });
+
+  const closeToolCall = result.toolCalls.find((tc) => tc.toolName === "close");
+  const outputMessages: ModelMessage[] = [
+    userPrompt,
+    ...(result.response?.messages || []),
+  ];
+
+  if (!closeToolCall) {
+    return {
+      reasoning: result.text || "Task execution completed",
+      taskComplete: false,
+      messages: outputMessages,
+    };
+  }
+
+  const input = closeToolCall.input as z.infer<typeof baseCloseSchema> & {
+    output?: Record<string, unknown>;
+  };
+  logger({
+    category: "agent",
+    message: `Task completed`,
+    level: 1,
+  });
+
+  return {
+    reasoning: input.reasoning,
+    taskComplete: input.taskComplete,
+    messages: outputMessages,
+    output: input.output,
+  };
+}
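The `outputInstructions` step in the file above turns each field of the user's schema into a `key: description` pair and embeds the result in the system prompt. A minimal sketch of that mapping follows, assuming a plain-object stand-in (`FieldShapeSketch`, a hypothetical type) in place of the Zod `shape` so that it runs without dependencies; the helper name `buildOutputInstructionsSketch` is likewise illustrative only.

```typescript
// Hypothetical stand-in for a Zod object's `shape`:
// each field may or may not carry a description.
type FieldShapeSketch = Record<string, { description?: string }>;

// Builds the extra prompt text listing each requested output field.
// Fields without a description fall back to "no description".
function buildOutputInstructionsSketch(shape?: FieldShapeSketch): string {
  if (!shape) {
    return "";
  }
  const described: Record<string, string> = {};
  for (const key of Object.keys(shape)) {
    described[key] = shape[key].description || "no description";
  }
  return (
    "\n\nThe user also requested the following information from this task. " +
    'Provide it in the "output" field:\n' +
    JSON.stringify(described, null, 2)
  );
}

// Usage: one described field, one without a description.
console.log(
  buildOutputInstructionsSketch({
    productName: { description: "The name of the shampoo product" },
    price: {},
  }),
);
```

Because only descriptions are forwarded into the prompt, fields declared without `.describe(...)` give the model nothing but the key name to go on, so describing every field is likely worthwhile.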

packages/core/lib/v3/agent/utils/validateExperimentalFeatures.ts

Lines changed: 8 additions & 2 deletions
@@ -21,10 +21,10 @@ export interface AgentValidationOptions {
  * Validates agent configuration and experimental feature usage.
  *
  * This utility consolidates all validation checks for both CUA and non-CUA agent paths:
- * - Invalid argument errors for CUA (streaming, abort signal, message continuation, excludeTools are not supported)
+ * - Invalid argument errors for CUA (streaming, abort signal, message continuation, excludeTools, output schema are not supported)
  * - Experimental feature checks for integrations and tools (both CUA and non-CUA)
  * - Experimental feature checks for hybrid mode (requires experimental: true)
- * - Experimental feature checks for non-CUA only (callbacks, signal, messages, streaming, excludeTools)
+ * - Experimental feature checks for non-CUA only (callbacks, signal, messages, streaming, excludeTools, output schema)
  *
  * Throws StagehandInvalidArgumentError for invalid/unsupported configurations.
  * Throws ExperimentalNotConfiguredError if experimental features are used without experimental mode.
@@ -56,6 +56,9 @@ export function validateExperimentalFeatures(
     ) {
       unsupportedFeatures.push("excludeTools");
     }
+    if (executeOptions?.output) {
+      unsupportedFeatures.push("output schema");
+    }
 
     if (unsupportedFeatures.length > 0) {
       throw new StagehandInvalidArgumentError(
@@ -97,6 +100,9 @@ export function validateExperimentalFeatures(
     if (executeOptions.excludeTools && executeOptions.excludeTools.length > 0) {
       features.push("excludeTools");
     }
+    if (executeOptions.output) {
+      features.push("output schema");
+    }
   }
 
   if (features.length > 0) {
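The two hunks above follow one pattern: collect the names of the offending features, then throw a single error that lists them. The sketch below illustrates that pattern for the `output` option only. The error classes, option shape, and messages are local, hypothetical stand-ins (the real `StagehandInvalidArgumentError` and `ExperimentalNotConfiguredError` are defined elsewhere in Stagehand and their messages are not shown in this diff).

```typescript
// Local stand-ins for the error classes named in the diff (assumption).
class InvalidArgumentSketchError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidArgumentSketchError";
  }
}

class ExperimentalNotConfiguredSketchError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExperimentalNotConfiguredSketchError";
  }
}

interface ValidationSketchOptions {
  isCua: boolean;
  experimental: boolean;
  executeOptions?: { output?: unknown; excludeTools?: string[] };
}

// Collects feature names first, then throws one error naming all of them.
function validateOutputSketch(opts: ValidationSketchOptions): void {
  const features: string[] = [];
  const eo = opts.executeOptions;
  if (eo?.excludeTools && eo.excludeTools.length > 0) {
    features.push("excludeTools");
  }
  if (eo?.output) {
    features.push("output schema");
  }
  if (features.length === 0) {
    return;
  }
  if (opts.isCua) {
    // CUA mode: the features are unsupported regardless of the experimental flag.
    throw new InvalidArgumentSketchError(
      "Not supported in CUA mode: " + features.join(", "),
    );
  }
  if (!opts.experimental) {
    // Non-CUA mode: the features require experimental: true.
    throw new ExperimentalNotConfiguredSketchError(features.join(", "));
  }
}

// Usage: non-CUA with experimental enabled passes without throwing.
validateOutputSketch({
  isCua: false,
  experimental: true,
  executeOptions: { output: {} },
});
```

Gathering every feature before throwing means a caller sees all unsupported options in one error rather than fixing them one at a time.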
