Closed
1 change: 1 addition & 0 deletions packages/builtin-tools/src/index.ts
@@ -31,6 +31,7 @@ export { TaskStopTool } from './tools/TaskStopTool/TaskStopTool.js'
export { TodoWriteTool } from './tools/TodoWriteTool/TodoWriteTool.js'
export { ToolSearchTool } from './tools/ToolSearchTool/ToolSearchTool.js'
export { TungstenTool } from './tools/TungstenTool/TungstenTool.js'
export { ExaSearchTool } from './tools/ExaSearchTool/ExaSearchTool.js'
export { WebFetchTool } from './tools/WebFetchTool/WebFetchTool.js'
export { WebSearchTool } from './tools/WebSearchTool/WebSearchTool.js'
export { TestingPermissionTool } from './tools/testing/TestingPermissionTool.js'
288 changes: 288 additions & 0 deletions packages/builtin-tools/src/tools/ExaSearchTool/ExaSearchTool.ts
@@ -0,0 +1,288 @@
import { z } from 'zod/v4'
import type { ValidationResult } from 'src/Tool.js'
import { buildTool, type ToolDef } from 'src/Tool.js'
import { lazySchema } from 'src/utils/lazySchema.js'
import { EXA_SEARCH_TOOL_NAME, getDescription } from './prompt.js'
import {
  getToolUseSummary,
  renderToolResultMessage,
  renderToolUseMessage,
} from './UI.js'

const inputSchema = lazySchema(() =>
  z.strictObject({
    query: z.string().min(2).describe('The search query to use'),
    numResults: z
      .number()
      .optional()
      .describe('Number of search results to return (default: 8)'),
    livecrawl: z
      .enum(['fallback', 'preferred'])
      .optional()
      .describe(
        "Live crawl mode - 'fallback': use live crawling as backup if cached content unavailable, 'preferred': prioritize live crawling (default: 'fallback')",
      ),
    type: z
      .enum(['auto', 'fast', 'deep'])
      .optional()
      .describe(
        "Search type - 'auto': balanced search (default), 'fast': quick results, 'deep': comprehensive search",
      ),
    contextMaxCharacters: z
      .number()
      .optional()
Comment on lines +16 to +33
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Constrain numeric input parameters before making external calls.

numResults and contextMaxCharacters currently accept any finite number, including zero, negative, and fractional values (z.number() in zod/v4 already rejects NaN and Infinity). Such values can reach the API as invalid requests and cause avoidable failures.

Suggested schema constraints
     numResults: z
       .number()
+      .int()
+      .min(1)
+      .max(20)
       .optional()
       .describe('Number of search results to return (default: 8)'),
...
     contextMaxCharacters: z
       .number()
+      .int()
+      .min(1)
+      .max(100_000)
       .optional()
📝 Committable suggestion

Suggested change
-      .number()
-      .optional()
-      .describe('Number of search results to return (default: 8)'),
-    livecrawl: z
-      .enum(['fallback', 'preferred'])
-      .optional()
-      .describe(
-        "Live crawl mode - 'fallback': use live crawling as backup if cached content unavailable, 'preferred': prioritize live crawling (default: 'fallback')",
-      ),
-    type: z
-      .enum(['auto', 'fast', 'deep'])
-      .optional()
-      .describe(
-        "Search type - 'auto': balanced search (default), 'fast': quick results, 'deep': comprehensive search",
-      ),
-    contextMaxCharacters: z
-      .number()
-      .optional()
+      .number()
+      .int()
+      .min(1)
+      .max(20)
+      .optional()
+      .describe('Number of search results to return (default: 8)'),
+    livecrawl: z
+      .enum(['fallback', 'preferred'])
+      .optional()
+      .describe(
+        "Live crawl mode - 'fallback': use live crawling as backup if cached content unavailable, 'preferred': prioritize live crawling (default: 'fallback')",
+      ),
+    type: z
+      .enum(['auto', 'fast', 'deep'])
+      .optional()
+      .describe(
+        "Search type - 'auto': balanced search (default), 'fast': quick results, 'deep': comprehensive search",
+      ),
+    contextMaxCharacters: z
+      .number()
+      .int()
+      .min(1)
+      .max(100_000)
+      .optional()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/builtin-tools/src/tools/ExaSearchTool/ExaSearchTool.ts` around lines
16 - 33, The numeric fields numResults and contextMaxCharacters in
ExaSearchTool's schema currently accept any number; restrict and validate them
before external calls by updating the Zod schemas for numResults and
contextMaxCharacters (in ExaSearchTool.ts) to enforce sensible bounds and types
(e.g., make numResults an integer with .int().min(1).max(50) and a default of 8,
and make contextMaxCharacters an integer with .int().min(100).max(20000) or
similar), and keep them optional only if appropriate; this ensures invalid
values (zero/negative/NaN/Infinity) are rejected or normalized before making
external API calls.
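
As a rough illustration of the "rejected or normalized" option mentioned above, here is a minimal dependency-free sketch of normalizing a numeric argument before the request is built. The helper name `normalizeCount` and the `NUM_RESULTS_BOUNDS` constant are hypothetical and not part of this PR; the bounds (1 to 20, default 8) are taken from the suggested schema constraints and may not match what the Exa endpoint actually accepts.

```typescript
// Hypothetical helper, not part of this PR: coerce an optional numeric
// argument into a bounded integer before it is sent to the search API.
function normalizeCount(
  value: number | undefined,
  opts: { min: number; max: number; fallback: number },
): number {
  // Missing, NaN, and infinite values all fall back to the default.
  if (value === undefined || !Number.isFinite(value)) return opts.fallback
  // Drop any fractional part, then clamp into [min, max].
  const truncated = Math.trunc(value)
  return Math.min(opts.max, Math.max(opts.min, truncated))
}

// Assumed bounds, mirroring the suggested schema constraints (1..20, default 8).
const NUM_RESULTS_BOUNDS = { min: 1, max: 20, fallback: 8 }

console.log(normalizeCount(undefined, NUM_RESULTS_BOUNDS)) // 8
console.log(normalizeCount(0, NUM_RESULTS_BOUNDS)) // 1
console.log(normalizeCount(-5, NUM_RESULTS_BOUNDS)) // 1
console.log(normalizeCount(7.9, NUM_RESULTS_BOUNDS)) // 7
console.log(normalizeCount(500, NUM_RESULTS_BOUNDS)) // 20
```

Clamping silently repairs bad input, whereas the schema constraints reject it with a validation error. Which behavior is preferable depends on whether the caller should be told that its argument was out of range.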

      .describe(
        'Maximum characters for context string optimized for LLMs (default: 10000)',
      ),
  }),
)

type InputSchema = ReturnType<typeof inputSchema>
type Input = z.infer<InputSchema>

const outputSchema = lazySchema(() =>
  z.object({
    query: z.string().describe('The search query that was executed'),
    results: z
      .array(
        z.object({
          title: z.string().describe('The title of the search result'),
          url: z.string().describe('The URL of the search result'),
        }),
      )
      .describe('Search results'),
    durationSeconds: z
      .number()
      .describe('Time taken to complete the search operation'),
  }),
)

type OutputSchema = ReturnType<typeof outputSchema>
type Output = z.infer<OutputSchema>

const API_CONFIG = {
  BASE_URL: 'https://mcp.exa.ai',
  ENDPOINTS: {
    SEARCH: '/mcp',
  },
  DEFAULT_NUM_RESULTS: 8,
  TIMEOUT_MS: 25000,
} as const

interface McpSearchRequest {
  jsonrpc: string
  id: number
  method: string
  params: {
    name: string
    arguments: {
      query: string
      numResults?: number
      livecrawl?: 'fallback' | 'preferred'
      type?: 'auto' | 'fast' | 'deep'
      contextMaxCharacters?: number
    }
  }
}

interface McpSearchResponse {
  jsonrpc: string
  result: {
    content: Array<{
      type: string
      text: string
    }>
  }
}

function parseExaResults(text: string): Array<{ title: string; url: string }> {
  const results: Array<{ title: string; url: string }> = []
  const linkRegex = /\[([^\]]+)\]\(([^)]+)\)/g
  let match

  while ((match = linkRegex.exec(text)) !== null && results.length < 20) {
    const title = match[1].trim()
    const url = match[2].trim()

    if (
      title &&
      url &&
      (url.startsWith('http://') || url.startsWith('https://'))
    ) {
      results.push({ title, url })
    }
  }

  if (results.length === 0) {
    const lines = text.split('\n').filter(l => l.trim())
    for (const line of lines) {
      const urlMatch = line.match(/https?:\/\/[^\s]+/)
      if (urlMatch) {
        const url = urlMatch[0]
        const title =
          line
            .replace(url, '')
            .replace(/^[-*•]\s*/, '')
            .trim() || url
        results.push({ title, url })
      }
    }
  }

  return results
}

export const ExaSearchTool = buildTool({
  name: EXA_SEARCH_TOOL_NAME,
  searchHint: 'search the web using Exa AI for current information',
  maxResultSizeChars: 100_000,
  shouldDefer: true,
  async description(input) {
    return `Exa web search for: ${input.query}`
  },
  userFacingName() {
    return 'Exa Search'
  },
  getToolUseSummary,
  getActivityDescription(input) {
    const summary = getToolUseSummary(input)
    return summary ? `Searching the web for "${summary}"` : 'Searching the web'
  },
  get inputSchema(): InputSchema {
    return inputSchema()
  },
  get outputSchema(): OutputSchema {
    return outputSchema()
  },
  isConcurrencySafe() {
    return true
  },
  isReadOnly() {
    return true
  },
  toAutoClassifierInput(input) {
    return input.query
  },
  isSearchOrReadCommand() {
    return { isSearch: true, isRead: false }
  },
  async validateInput(input): Promise<ValidationResult> {
    if (!input.query || input.query.length < 2) {
      return {
        result: false,
        message: 'Query must be at least 2 characters',
        errorCode: 1,
      }
    }
    return { result: true }
  },
  async prompt() {
    return getDescription()
  },
  renderToolUseMessage,
  renderToolResultMessage,
  extractSearchText({ query, results }) {
    if (!results) return ''
    return results.map(r => `${r.title} ${r.url}`).join('\n')
  },
  async call(input, { abortController }): Promise<{ data: Output }> {
    const startTime = performance.now()

    const searchRequest: McpSearchRequest = {
      jsonrpc: '2.0',
      id: 1,
      method: 'tools/call',
      params: {
        name: 'web_search_exa',
        arguments: {
          query: input.query,
          type: input.type || 'auto',
          numResults: input.numResults || API_CONFIG.DEFAULT_NUM_RESULTS,
          livecrawl: input.livecrawl || 'fallback',
          contextMaxCharacters: input.contextMaxCharacters,
        },
      },
    }

    const timeoutId = setTimeout(
      () => abortController.abort(),
      API_CONFIG.TIMEOUT_MS,
    )

    try {
      const headers: Record<string, string> = {
        accept: 'application/json, text/event-stream',
        'content-type': 'application/json',
      }

      const response = await fetch(
        `${API_CONFIG.BASE_URL}${API_CONFIG.ENDPOINTS.SEARCH}`,
        {
          method: 'POST',
          headers,
          body: JSON.stringify(searchRequest),
          signal: abortController.signal,
        },
      )

      if (!response.ok) {
        const errorText = await response.text()
        throw new Error(`Search error (${response.status}): ${errorText}`)
      }

      const responseText = await response.text()
      const lines = responseText.split('\n')

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data: McpSearchResponse = JSON.parse(line.substring(6))
          if (
            data.result &&
            data.result.content &&
            data.result.content.length > 0
          ) {
            const contentText = data.result.content[0].text
            const results = parseExaResults(contentText)

            return {
              data: {
                query: input.query,
                results,
                durationSeconds: (performance.now() - startTime) / 1000,
              },
            }
          }
        }
      }
Comment on lines +236 to +256
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make SSE parsing resilient and avoid returning on the first frame.

Current logic JSON.parses every data: line and returns immediately from the first content payload. SSE streams commonly include non-JSON frames (e.g. done markers) and multi-frame payloads, so this can fail or return partial results.

Suggested hardening
-      for (const line of lines) {
+      let aggregatedText = ''
+      for (const line of lines) {
         if (line.startsWith('data: ')) {
-          const data: McpSearchResponse = JSON.parse(line.substring(6))
+          const raw = line.substring(6).trim()
+          if (!raw || raw === '[DONE]') continue
+          let data: McpSearchResponse | null = null
+          try {
+            data = JSON.parse(raw) as McpSearchResponse
+          } catch {
+            continue
+          }
           if (
-            data.result &&
-            data.result.content &&
-            data.result.content.length > 0
+            data?.result?.content &&
+            data.result.content.length > 0 &&
+            data.result.content[0]?.text
           ) {
-            const contentText = data.result.content[0].text
-            const results = parseExaResults(contentText)
-
-            return {
-              data: {
-                query: input.query,
-                results,
-                durationSeconds: (performance.now() - startTime) / 1000,
-              },
-            }
+            aggregatedText += `${data.result.content[0].text}\n`
           }
         }
       }
+      const results = parseExaResults(aggregatedText)
+      return {
+        data: {
+          query: input.query,
+          results,
+          durationSeconds: (performance.now() - startTime) / 1000,
+        },
+      }
-
-      return {
-        data: {
-          query: input.query,
-          results: [],
-          durationSeconds: (performance.now() - startTime) / 1000,
-        },
-      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/builtin-tools/src/tools/ExaSearchTool/ExaSearchTool.ts` around lines
236 - 256, The SSE parsing loop in the for (const line of lines) block currently
JSON.parse's every 'data: ' line and returns on the first content frame, which
is brittle for non-JSON frames and multi-frame payloads; update the logic to:
for each line that startsWith('data: '), try/catch JSON.parse to skip non-JSON
frames, accumulate all valid data.result.content texts (e.g., push contentText
from each parsed McpSearchResponse into an array), only compute results via
parseExaResults after the stream completes or a terminal marker is seen, and
return a single aggregated response including query, merged results, and
durationSeconds computed from startTime; reference parseExaResults,
McpSearchResponse, input.query, and startTime to locate and modify the code.
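
The aggregation described above can be sketched as a standalone function, shown here only to illustrate the shape of the fix. The names `collectSseText` and `SseFrame` are hypothetical, the `[DONE]` marker is an assumption (the Exa endpoint may never send one), and unlike the suggested diff this sketch collects every content item in a frame rather than only `content[0]`.

```typescript
// Frames are untrusted input, so every field is treated as optional.
interface SseFrame {
  result?: { content?: Array<{ type?: string; text?: string }> }
}

// Hypothetical helper, not part of this PR: gather the text of all content
// items across every `data:` line, skipping frames that are not valid JSON.
function collectSseText(body: string): string {
  const parts: string[] = []
  for (const line of body.split('\n')) {
    if (!line.startsWith('data:')) continue
    // trim() also drops the optional space after the colon and a trailing '\r'.
    const raw = line.slice('data:'.length).trim()
    // '[DONE]' is an assumed terminal marker, not something Exa is known to send.
    if (!raw || raw === '[DONE]') continue
    let frame: SseFrame | null
    try {
      frame = JSON.parse(raw) as SseFrame | null
    } catch {
      continue // keep-alives and other non-JSON frames
    }
    const content = frame?.result?.content
    if (!Array.isArray(content)) continue
    for (const item of content) {
      if (typeof item?.text === 'string' && item.text) parts.push(item.text)
    }
  }
  return parts.join('\n')
}

const sample = [
  'event: message',
  'data: {"jsonrpc":"2.0","result":{"content":[{"type":"text","text":"[A](https://a.example)"}]}}',
  'data: not-json',
  'data: {"jsonrpc":"2.0","result":{"content":[{"type":"text","text":"[B](https://b.example)"}]}}',
  'data: [DONE]',
].join('\n')

console.log(collectSseText(sample))
// [A](https://a.example)
// [B](https://b.example)
```

The combined string would then be passed to parseExaResults once, after the loop, so a malformed or partial frame can no longer end the search early.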


      return {
        data: {
          query: input.query,
          results: [],
          durationSeconds: (performance.now() - startTime) / 1000,
        },
      }
    } finally {
      clearTimeout(timeoutId)
    }
  },
  mapToolResultToToolResultBlockParam(output, toolUseID) {
    let formattedOutput = `Exa web search results for: "${output.query}"\n\n`

    if (output.results.length > 0) {
      output.results.forEach(r => {
        formattedOutput += `- ${r.title}\n ${r.url}\n`
      })
    } else {
      formattedOutput += 'No results found.\n'
    }

    formattedOutput += `\nSearch completed in ${output.durationSeconds.toFixed(2)}s`

    return {
      tool_use_id: toolUseID,
      type: 'tool_result',
      content: formattedOutput.trim(),
    }
  },
} satisfies ToolDef<InputSchema, Output, unknown>)
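
To make the text produced by mapToolResultToToolResultBlockParam easier to picture, the following standalone sketch copies its formatting logic into a plain function. The names `SearchOutput` and `formatSearchOutput` and the sample data are illustrative only and do not appear in the PR.

```typescript
// Illustrative type, assumed to match the fields of the tool's Output.
interface SearchOutput {
  query: string
  results: Array<{ title: string; url: string }>
  durationSeconds: number
}

// Standalone copy of the formatting logic in the diff, for illustration only.
function formatSearchOutput(output: SearchOutput): string {
  let text = `Exa web search results for: "${output.query}"\n\n`
  if (output.results.length > 0) {
    for (const r of output.results) {
      text += `- ${r.title}\n ${r.url}\n`
    }
  } else {
    text += 'No results found.\n'
  }
  text += `\nSearch completed in ${output.durationSeconds.toFixed(2)}s`
  return text.trim()
}

console.log(
  formatSearchOutput({
    query: 'zod v4',
    results: [{ title: 'Zod', url: 'https://zod.dev' }],
    durationSeconds: 1.234,
  }),
)
// Exa web search results for: "zod v4"
//
// - Zod
//  https://zod.dev
//
// Search completed in 1.23s
```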