Feat: Implement Gemini Interaction API in adk-js #364
…olve CI E401 error
…DK 2.0 type inheritance checks in CI
| text?: string; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| functionCall?: any; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| functionResponse?: any; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| inlineData?: any; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| fileData?: any; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| thoughtSignature?: any; | ||
| thought?: boolean; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| codeExecutionResult?: any; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| executableCode?: any; |
please remove, use Part type from @google/genai
I've audited the SDK type definitions and confirmed that the native Part type from @google/genai already contains all these fields. I have removed the ExtendedPart interface entirely and refactored the utility functions to parse standard Part objects directly.
| interface ExtendedTool { | ||
| functionDeclarations?: Array<{ | ||
| name: string; | ||
| description?: string; | ||
| parameters?: { | ||
| properties?: Record<string, unknown>; | ||
| required?: string[]; | ||
| }; | ||
| parametersJsonSchema?: unknown; | ||
| }>; | ||
| googleSearch?: unknown; | ||
| codeExecution?: unknown; | ||
| urlContext?: unknown; | ||
| } | ||
|
|
||
| interface InteractionTextContent { | ||
| type: 'text'; | ||
| text: string; | ||
| } | ||
|
|
||
| interface InteractionFunctionCall { | ||
| type: 'function_call'; | ||
| id: string; | ||
| name: string; | ||
| arguments: Record<string, unknown>; | ||
| thought_signature?: string; | ||
| } | ||
|
|
||
| interface InteractionFunctionResult { | ||
| type: 'function_result'; | ||
| name: string; | ||
| call_id: string; | ||
| result: unknown; | ||
| } | ||
|
|
||
| interface InteractionMediaContent { | ||
| type: 'image' | 'audio' | 'video' | 'document'; | ||
| data?: string; | ||
| uri?: string; | ||
| mime_type: string; | ||
| } | ||
|
|
||
| interface InteractionThought { | ||
| type: 'thought'; | ||
| signature?: string; | ||
| } | ||
|
|
||
| interface InteractionCodeExecutionCall { | ||
| type: 'code_execution_call'; | ||
| id: string; | ||
| arguments: { | ||
| code: string; | ||
| language: string; | ||
| }; | ||
| } | ||
|
|
||
| interface InteractionCodeExecutionResult { | ||
| type: 'code_execution_result'; | ||
| call_id: string; | ||
| result: string; | ||
| is_error: boolean; | ||
| } | ||
|
|
||
| type InteractionContent = | ||
| | InteractionTextContent | ||
| | InteractionFunctionCall | ||
| | InteractionFunctionResult | ||
| | InteractionMediaContent | ||
| | InteractionThought | ||
| | InteractionCodeExecutionCall | ||
| | InteractionCodeExecutionResult; | ||
|
|
||
| interface InteractionTurn { | ||
| role: string; | ||
| content: InteractionContent[]; | ||
| } | ||
|
|
||
| interface InteractionTool { | ||
| type: 'function' | 'google_search' | 'code_execution' | 'url_context'; | ||
| name?: string; | ||
| description?: string; | ||
| parameters?: unknown; | ||
| } | ||
|
|
||
| interface InteractionResponse { | ||
| id: string; | ||
| status: 'completed' | 'requires_action' | 'failed' | string; | ||
| error?: { | ||
| code: string; | ||
| message: string; | ||
| }; | ||
| outputs?: Record<string, unknown>[]; | ||
| usage?: { | ||
| total_input_tokens?: number; | ||
| total_output_tokens?: number; | ||
| }; | ||
| } | ||
|
|
||
| interface InteractionSSEEvent { | ||
| event_type?: string; | ||
| eventType?: string; | ||
| delta?: { | ||
| type: string; | ||
| text?: string; | ||
| name?: string; | ||
| id?: string; | ||
| arguments?: Record<string, unknown>; | ||
| thought_signature?: string; | ||
| data?: string; | ||
| uri?: string; | ||
| mime_type: string; | ||
| }; | ||
| status?: string; | ||
| error?: { | ||
| code: string; | ||
| message: string; | ||
| }; | ||
| code?: string; | ||
| message?: string; | ||
| interaction_id?: string; | ||
| interactionId?: string; | ||
| interaction?: { | ||
| id: string; | ||
| }; | ||
| id?: string; | ||
| } | ||
|
|
||
| interface GoogleGenAIWithInteractions { | ||
| interactions: { | ||
| create(params: { | ||
| model?: string; | ||
| input: InteractionTurn[]; | ||
| stream: boolean; | ||
| systemInstruction?: string; | ||
| tools?: InteractionTool[]; | ||
| generationConfig?: Record<string, unknown>; | ||
| previousInteractionId?: string; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| }): Promise<any>; // We keep 'any' here as the SDK return type is complex (stream vs non-stream) | ||
| }; | ||
| } |
Can we get those types from @google/genai?
I have replaced all custom interaction types (InteractionContent, InteractionTurn, InteractionTool, InteractionResponse, InteractionSSEEvent, etc.) with native counterparts provided by @google/genai (like Interactions.Content, Interactions.Turn, Interactions.Tool, and Interactions.Interaction).
To handle runtime/SDK discrepancies type-safely without casting to any:
- Defined local ExtendedInteraction and ExtendedInteractionStatusUpdate interfaces extending the SDK types to cleanly declare the runtime error fields.
- Adjusted the stream event parser to fall back to delta.signature || delta.thought_signature and correctly map nested error objects for standard ErrorEvents.
- Removed the obsolete GoogleGenAIWithInteractions client wrapper since GoogleGenAI has a native interactions getter.
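A minimal sketch of the stream-parser fallback described in the reply above, assuming simplified stand-in types rather than the real @google/genai definitions. The names StreamDeltaLike, StreamEventLike, readThoughtSignature, and readStreamError are hypothetical and only illustrate the idea; the field names come from the interfaces shown in this thread.

```typescript
// Hypothetical stand-ins for the SDK stream types (not the real @google/genai types).
interface StreamDeltaLike {
  type: string;
  signature?: string;
  thought_signature?: string;
}

interface StreamEventLike {
  delta?: StreamDeltaLike;
  error?: {code: string; message: string};
  code?: string;
  message?: string;
}

// Prefer `signature`, falling back to `thought_signature` when it is absent or empty.
function readThoughtSignature(delta: StreamDeltaLike): string | undefined {
  return delta.signature || delta.thought_signature;
}

// Map either a nested `error` object or top-level `code`/`message` fields
// onto a single shape; returns undefined when the event carries no error.
function readStreamError(
  event: StreamEventLike,
): {code?: string; message?: string} | undefined {
  if (event.error) {
    return {code: event.error.code, message: event.error.message};
  }
  if (event.code !== undefined || event.message !== undefined) {
    return {code: event.code, message: event.message};
  }
  return undefined;
}
```

Whether the actual parser treats top-level code/message the same way is an assumption here; the sketch only shows that both shapes can be handled without an any cast.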
| if (mimeType.startsWith('image/')) { | ||
| return 'image'; | ||
| } else if (mimeType.startsWith('audio/')) { | ||
| return 'audio'; | ||
| } else if (mimeType.startsWith('video/')) { | ||
| return 'video'; | ||
| } else { | ||
| return 'document'; | ||
| } |
please use switch case instead
Refactored getInteractionMediaType to split mimeType by / and use a clean switch statement on the primary media type prefix (e.g., image, audio).
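For illustration, the refactor described above might look roughly like the following. This is a sketch, not the code from the PR: the InteractionMediaType alias is a hypothetical name, and the exact handling of malformed MIME types in the real function is not shown in this thread.

```typescript
// Hypothetical alias for the media kinds used in the diff above.
type InteractionMediaType = 'image' | 'audio' | 'video' | 'document';

function getInteractionMediaType(mimeType: string): InteractionMediaType {
  // 'image/png' -> 'image', 'application/pdf' -> 'application'
  const primaryType = mimeType.split('/')[0];
  switch (primaryType) {
    case 'image':
      return 'image';
    case 'audio':
      return 'audio';
    case 'video':
      return 'video';
    default:
      // Anything else (application/*, text/*, ...) is treated as a document.
      return 'document';
  }
}
```

One behavioral difference from the startsWith chain: a bare string such as image (no slash) maps to image in this version, whereas startsWith('image/') would have fallen through to document.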
fdad857 to
c4561d4
Compare
…s in llm_response.ts
| } | ||
|
|
||
| const model = agent.canonicalModel; | ||
| if (model instanceof Gemini && model.useInteractionsApi) { |
Please avoid using instanceof and use a util function like isGeminiModel instead. It looks like isGeminiModel does not exist yet, so we should create one. As an example, see https://github.com/google/adk-js/blob/main/core/src/models/base_llm.ts#L17-L31 from BaseModel.
The problem with instanceof is that when a user has multiple adk-js packages in their runtime, it will not be able to mix objects from one package with another.
Resolved. Defined a new isGemini type guard in core/src/models/google_llm.ts using a unique symbol ( Symbol.for('google.adk.geminiModel') ) to avoid package duplication issues. Refactored interactions_request_processor.ts to use isGemini(model) instead of model instanceof Gemini .
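A minimal sketch of why a Symbol.for-based brand survives package duplication while instanceof does not. The Gemini class below is a simplified stand-in, and GEMINI_SIGNATURE, OTHER_COPY_SIGNATURE, and GeminiFromOtherCopy are hypothetical names used to simulate a second copy of the package; only the symbol key and the isGemini name come from the reply above.

```typescript
// Symbol.for() reads from the global symbol registry, so every copy of the
// package that uses the same key gets the very same symbol at runtime.
const GEMINI_SIGNATURE = Symbol.for('google.adk.geminiModel');

// Simplified stand-in for the real Gemini class.
class Gemini {
  readonly [GEMINI_SIGNATURE] = true;
  constructor(readonly useInteractionsApi = false) {}
}

// Type guard that checks the brand instead of the prototype chain.
function isGemini(obj: unknown): obj is Gemini {
  return (
    typeof obj === 'object' &&
    obj !== null &&
    (obj as Record<symbol, unknown>)[GEMINI_SIGNATURE] === true
  );
}

// Simulate a second copy of the package: a distinct class, same registry key.
const OTHER_COPY_SIGNATURE = Symbol.for('google.adk.geminiModel');
class GeminiFromOtherCopy {
  readonly [OTHER_COPY_SIGNATURE] = true;
  constructor(readonly useInteractionsApi = false) {}
}

const foreignModel: unknown = new GeminiFromOtherCopy(true);
const passesInstanceof = foreignModel instanceof Gemini; // false: different class identity
const passesGuard = isGemini(foreignModel); // true: same registry symbol
```

With this shape, a check like isGemini(model) && model.useInteractionsApi keeps working even when the model object was constructed by a different installed copy of the library.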
|
|
||
| // --- Helper Interfaces for Strong Typing --- | ||
|
|
||
| interface ExtendedTool { |
Why not just use the default Tool interface from @google/genai?
Resolved. Removed the custom ExtendedTool interface and refactored convertToolsConfigToInteractionsFormat to use the native Tool interface from @google/genai directly.
| interface ExtendedTool { | ||
| functionDeclarations?: Array<{ | ||
| name: string; | ||
| description?: string; | ||
| parameters?: { | ||
| properties?: Record<string, unknown>; | ||
| required?: string[]; | ||
| }; | ||
| parametersJsonSchema?: unknown; | ||
| }>; | ||
| googleSearch?: unknown; | ||
| codeExecution?: unknown; | ||
| urlContext?: unknown; | ||
| } | ||
|
|
||
| export interface ExtendedInteraction extends Interactions.Interaction { | ||
| error?: { | ||
| code: string; | ||
| message: string; | ||
| }; | ||
| } | ||
|
|
||
| export interface ExtendedInteractionStatusUpdate extends Omit< | ||
| Interactions.InteractionStatusUpdate, | ||
| 'error' | ||
| > { | ||
| error?: { | ||
| code: string; | ||
| message: string; | ||
| }; | ||
| } | ||
|
|
||
| // Runtime event types can be more relaxed than compile-time | ||
| export interface ExtendedInteractionSSEEvent extends Omit< | ||
| Interactions.InteractionSSEEvent, | ||
| 'error' | 'interaction_id' | 'status' | 'event_type' | ||
| > { | ||
| event_type?: string; | ||
| eventType?: string; | ||
| delta?: { | ||
| type: string; | ||
| text?: string; | ||
| name?: string; | ||
| id?: string; | ||
| arguments?: Record<string, unknown>; | ||
| thought_signature?: string; | ||
| signature?: string; | ||
| data?: string; | ||
| uri?: string; | ||
| mime_type: string; | ||
| }; | ||
| status?: string; | ||
| error?: { | ||
| code: string; | ||
| message: string; | ||
| }; | ||
| code?: string; | ||
| message?: string; | ||
| interaction_id?: string; | ||
| interactionId?: string; | ||
| interaction?: { | ||
| id: string; | ||
| }; | ||
| id?: string; | ||
| } |
I'm pretty sure we do not need all of that. Please verify that it is unnecessary and that we can use the default types from @google/genai.
We double-checked the native types in @google/genai (version 2.9.0) and confirmed that:
Interactions.Interaction and Interactions.InteractionStatusUpdate still lack the error field, which is returned at runtime when an interaction fails.
Interactions.InteractionSSEEvent is a strict union type that makes it difficult to access common fields like interaction_id or interactionId (handling casing differences at runtime) without verbose type narrowing on each stream event. Both casing schemes are also needed here as both are used in the codebase. The Gen AI API uses snake_case while the SDK uses camelCase in some cases.
Therefore, we kept these clean local extended interfaces to maintain type safety without resorting to any casts, aligning with the project's coding guidelines.
Additionally, we defined ExtendedFunctionCallStep extending Interactions.FunctionCallStep to add the signature field, which is missing in the SDK type definition but used in the codebase to propagate thought signatures.
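To make the casing point concrete, here is a small sketch of reading the interaction ID from a relaxed event shape without narrowing a union on every event. SseEventLike and extractInteractionId are hypothetical names, the fields mirror the extended interface shown above, and the precedence order among the fields is an assumption rather than something stated in this thread.

```typescript
// Hypothetical relaxed event shape covering both casing schemes.
interface SseEventLike {
  interaction_id?: string; // snake_case, as used by the API
  interactionId?: string; // camelCase, as used by the SDK in some cases
  interaction?: {id: string};
  id?: string;
}

// Return the first ID that is present. The precedence order is an assumption.
function extractInteractionId(event: SseEventLike): string | undefined {
  return (
    event.interaction_id ??
    event.interactionId ??
    event.interaction?.id ??
    event.id
  );
}
```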
…edFunctionCallStep
| "lint:fix": "eslint --fix \"**/*.ts\"", | ||
| "format": "prettier \"**/*.ts\" --write", | ||
| "format:check": "prettier \"**/*.ts\" --check", | ||
| "format": "prettier \"core/**/*.ts\" --write && prettier \"dev/**/*.ts\" --write && prettier \"tests/**/*.ts\" vitest.config.ts --write", |
| const childProcess = spawn('npm', ['run', 'start'], { | ||
| cwd: projectPath, | ||
| shell: true, | ||
| stdio: ['pipe', 'pipe', 'inherit'], |
| expect(buildResult.stdout).toContain('\nBuild complete'); | ||
| } | ||
| }); | ||
| }, 120000); |
| describe('Agent with skills that generates JS script and runs it locally', () => { | ||
| beforeAll(async () => { | ||
| await execAsync('npm install', {cwd: PROJECT_PATH}); | ||
| }, TEST_EXECUTION_TIMEOUT); |
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
N/A
Closes: #issue_number
Related: #issue_number
2. Or, if no issue exists, describe the change:
This PR implements the next-generation stateful Gemini Interaction API integration in adk-js, mirroring the design and functionality already present in adk-python. This enables stateful, multi-turn conversations by tracking interaction history server-side using interactionId, reducing payload sizes across progressive turns.
If applicable, please follow the issue templates to provide as much detail as possible.
Problem:
The current adk-js core only supports stateless execution via the standard generateContent API, which requires sending the entire conversational history back and forth on every turn. This increases payload sizes, causes overhead, and prevents leveraging server-side interaction history tracking.
Solution:
- Added previousInteractionId?: string to LlmRequest in core/src/models/llm_request.ts.
- Added interactionId?: string to LlmResponse in core/src/models/llm_response.ts.
- Added InteractionsRequestProcessor under core/src/agents/processors/interactions_request_processor.ts. It automatically traverses the session events history in reverse to find the latest valid interactionId for the current branch and sub-agent name, injecting it as previousInteractionId into the outgoing request.
- Registered INTERACTIONS_REQUEST_PROCESSOR in LlmAgent request processors, immediately following the CONTENT_REQUEST_PROCESSOR.
- Added core/src/models/interactions_utils.ts containing:
  - getLatestUserContents to trim the outgoing conversation history, sending only the latest continuous user turn when previousInteractionId is present (with special handling to retain the preceding model turn's function call if the user turn contains a function response).
  - Converters to the @google/genai Interactions REST schemas (and vice-versa).
  - generateContentViaInteractions wrapping @google/genai interactions resource calls.
- Updated the Gemini class (core/src/models/google_llm.ts) to accept a useInteractionsApi?: boolean parameter, toggling the flow to delegate to generateContentViaInteractions when enabled.
Testing Plan
Please describe the tests that you ran to verify your changes. This is required for all PRs that are not small documentation or typo fixes.
Unit Tests:
We implemented extensive unit tests targeting the stateful request processor and the payload converters:
- core/test/agents/processors/interactions_request_processor_test.ts (6 tests)
- core/test/models/interactions_utils_test.ts (89 tests)
Summary of passed npm test results:
We achieved 100% Statement, Branch, Function, and Line coverage for both new source files in adk-js/core:
- core/src/agents/processors/interactions_request_processor.ts: 100% Coverage
- core/src/models/interactions_utils.ts: 100% Coverage
Manual End-to-End (E2E) Tests:
We created a verification script verify_interactions.ts in the root of the workspace. It tests a two-turn conversation:
- … interactionId).
- … previousInteractionId set (verifies history is trimmed and the model correctly recalls "blue" from the server-side state).
To execute manual verification:
Checklist
Additional context
TAG=agy
CONV=8a91ed6a-f4db-4160-83d9-68d5e80e066c