Feat: Implement Gemini Interaction API in adk-js#364

Open
AmaadMartin wants to merge 28 commits into
google:mainfrom
AmaadMartin:feat/interactions-api-2

@AmaadMartin
Collaborator

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

  1. Link to an existing issue (if applicable):
    N/A

Closes: #issue_number
Related: #issue_number
  2. Or, if no issue exists, describe the change:
This PR integrates the stateful Gemini Interactions API into adk-js, mirroring the design and functionality already present in adk-python. It enables stateful, multi-turn conversations by tracking interaction history server-side via interactionId, which reduces payload sizes on subsequent turns.

If applicable, please follow the issue templates to provide as much detail as possible.

Problem:
The current adk-js core only supports stateless execution via the standard generateContent API, which requires resending the entire conversation history on every turn. This increases payload sizes and overhead, and prevents the use of server-side interaction history tracking.

Solution:

  1. Interface Updates:
    • Added optional previousInteractionId?: string to LlmRequest in core/src/models/llm_request.ts.
    • Added optional interactionId?: string to LlmResponse in core/src/models/llm_response.ts.
  2. Stateful Request Processor:
    • Created InteractionsRequestProcessor under core/src/agents/processors/interactions_request_processor.ts. It automatically traverses the session events history in reverse to find the latest valid interactionId for the current branch and sub-agent name, injecting it as previousInteractionId into the outgoing request.
    • Registered INTERACTIONS_REQUEST_PROCESSOR in LlmAgent request processors, immediately following the CONTENT_REQUEST_PROCESSOR.
  3. Interaction Utility & Payload Transformation:
    • Created core/src/models/interactions_utils.ts containing:
      • getLatestUserContents to trim the outgoing conversation history, sending only the latest continuous user turn when previousInteractionId is present (with special handling to retain the preceding model turn's function call if the user turn contains a function response).
      • Request/response converters mapping ADK types (text, function calls, tool results, media data, code execution) to @google/genai Interactions REST schemas (and vice-versa).
      • Core streaming/non-streaming runner generateContentViaInteractions wrapping @google/genai interactions resource calls.
  4. Model Integration:
    • Updated Gemini class (core/src/models/google_llm.ts) to accept useInteractionsApi?: boolean parameter, toggling the flow to delegate to generateContentViaInteractions when enabled.
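
The reverse lookup described for InteractionsRequestProcessor (item 2 above) could be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the SessionEvent shape and the findPreviousInteractionId helper are hypothetical names, and matching the branch by strict equality is an assumption.

```typescript
// Hypothetical, simplified event shape; the real ADK event type has more fields.
interface SessionEvent {
  author: string; // agent name, or 'user'
  branch?: string;
  interactionId?: string;
}

// Walk the history from newest to oldest and return the first interactionId
// recorded by the given agent on the given branch, if there is one.
function findPreviousInteractionId(
  events: readonly SessionEvent[],
  agentName: string,
  branch?: string,
): string | undefined {
  for (const event of [...events].reverse()) {
    if (event.author !== agentName) continue;
    if (event.branch !== branch) continue; // assumption: exact branch match
    if (event.interactionId) return event.interactionId;
  }
  return undefined;
}

const history: SessionEvent[] = [
  {author: 'user'},
  {author: 'root_agent', interactionId: 'int-1'},
  {author: 'user'},
  {author: 'helper_agent', interactionId: 'int-2'},
  {author: 'user'},
];

// The most recent event is from another agent, so the lookup skips past it.
console.log(findPreviousInteractionId(history, 'root_agent')); // int-1
```

The returned value would then be set as previousInteractionId on the outgoing request; when nothing is found, the request carries the full history as before.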

Testing Plan
Please describe the tests that you ran to verify your changes. This is required for all PRs that are not small documentation or typo fixes.

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

We implemented extensive unit tests targeting the stateful request processor and the payload converters:

  • core/test/agents/processors/interactions_request_processor_test.ts (6 tests)
  • core/test/models/interactions_utils_test.ts (89 tests)

Summary of passed npm test results:

 RUN  v3.2.4 /usr/local/google/home/amaadmartin/Workspace/Agentspaces/feat-interactions-api-2/adk-js

 Test Files  2 passed (2)
      Tests  95 passed (95)
   Start at  13:17:31
    Duration  10.68s

We achieved 100% Statement, Branch, Function, and Line coverage for both new source files in adk-js/core:

  • core/src/agents/processors/interactions_request_processor.ts: 100% Coverage
  • core/src/models/interactions_utils.ts: 100% Coverage

Manual End-to-End (E2E) Tests:
We created a verification script verify_interactions.ts in the root of the workspace. It tests a two-turn conversation:

  1. Turn 1: "My favorite color is deep blue. Remember this." (verifies the model responds and returns a valid interactionId).
  2. Turn 2: "What is my favorite color?" with previousInteractionId set (verifies history is trimmed and the model correctly recalls "blue" from the server-side state).
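
The history trimming that Turn 2 exercises can be sketched as below. This is an assumption-laden illustration of the behavior described for getLatestUserContents, not the PR's implementation: SimpleContent, SimplePart, and trimToLatestUserTurn are hypothetical names, and the types are reduced to the fields the sketch needs.

```typescript
// Hypothetical, reduced versions of the content types.
interface SimplePart {
  text?: string;
  functionCall?: {name: string};
  functionResponse?: {name: string};
}

interface SimpleContent {
  role: 'user' | 'model';
  parts: SimplePart[];
}

// Keep only the trailing run of user turns. If that run carries a function
// response, also keep the model turn just before it when it holds the
// matching function call.
function trimToLatestUserTurn(
  contents: readonly SimpleContent[],
): SimpleContent[] {
  let start = contents.length;
  while (start > 0 && contents[start - 1]?.role === 'user') {
    start--;
  }
  const tail = contents.slice(start);
  const hasFunctionResponse = tail.some((content) =>
    content.parts.some((part) => part.functionResponse !== undefined),
  );
  const preceding = start > 0 ? contents[start - 1] : undefined;
  if (
    hasFunctionResponse &&
    preceding &&
    preceding.parts.some((part) => part.functionCall !== undefined)
  ) {
    return [preceding, ...tail];
  }
  return tail;
}

const conversation: SimpleContent[] = [
  {role: 'user', parts: [{text: 'My favorite color is deep blue. Remember this.'}]},
  {role: 'model', parts: [{text: 'Noted.'}]},
  {role: 'user', parts: [{text: 'What is my favorite color?'}]},
];

// Only the final user turn is sent; earlier turns live in server-side state.
console.log(trimToLatestUserTurn(conversation).length); // 1
```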

To execute manual verification:

GEMINI_API_KEY=your_live_api_key npx tsx verify_interactions.ts

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context
TAG=agy
CONV=8a91ed6a-f4db-4160-83d9-68d5e80e066c

Comment thread core/src/models/interactions_utils.ts Outdated
Comment on lines +25 to +40
  text?: string;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  functionCall?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  functionResponse?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  inlineData?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  fileData?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  thoughtSignature?: any;
  thought?: boolean;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  codeExecutionResult?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  executableCode?: any;

Collaborator

please remove, use Part type from @google/genai

Collaborator Author

I audited the SDK type definitions and confirmed that the Part type from @google/genai already contains all of these fields. I removed the ExtendedPart interface entirely and refactored the utility functions to parse standard Part objects directly.

Comment thread core/src/models/interactions_utils.ts Outdated
Comment on lines +43 to +183
interface ExtendedTool {
  functionDeclarations?: Array<{
    name: string;
    description?: string;
    parameters?: {
      properties?: Record<string, unknown>;
      required?: string[];
    };
    parametersJsonSchema?: unknown;
  }>;
  googleSearch?: unknown;
  codeExecution?: unknown;
  urlContext?: unknown;
}

interface InteractionTextContent {
  type: 'text';
  text: string;
}

interface InteractionFunctionCall {
  type: 'function_call';
  id: string;
  name: string;
  arguments: Record<string, unknown>;
  thought_signature?: string;
}

interface InteractionFunctionResult {
  type: 'function_result';
  name: string;
  call_id: string;
  result: unknown;
}

interface InteractionMediaContent {
  type: 'image' | 'audio' | 'video' | 'document';
  data?: string;
  uri?: string;
  mime_type: string;
}

interface InteractionThought {
  type: 'thought';
  signature?: string;
}

interface InteractionCodeExecutionCall {
  type: 'code_execution_call';
  id: string;
  arguments: {
    code: string;
    language: string;
  };
}

interface InteractionCodeExecutionResult {
  type: 'code_execution_result';
  call_id: string;
  result: string;
  is_error: boolean;
}

type InteractionContent =
  | InteractionTextContent
  | InteractionFunctionCall
  | InteractionFunctionResult
  | InteractionMediaContent
  | InteractionThought
  | InteractionCodeExecutionCall
  | InteractionCodeExecutionResult;

interface InteractionTurn {
  role: string;
  content: InteractionContent[];
}

interface InteractionTool {
  type: 'function' | 'google_search' | 'code_execution' | 'url_context';
  name?: string;
  description?: string;
  parameters?: unknown;
}

interface InteractionResponse {
  id: string;
  status: 'completed' | 'requires_action' | 'failed' | string;
  error?: {
    code: string;
    message: string;
  };
  outputs?: Record<string, unknown>[];
  usage?: {
    total_input_tokens?: number;
    total_output_tokens?: number;
  };
}

interface InteractionSSEEvent {
  event_type?: string;
  eventType?: string;
  delta?: {
    type: string;
    text?: string;
    name?: string;
    id?: string;
    arguments?: Record<string, unknown>;
    thought_signature?: string;
    data?: string;
    uri?: string;
    mime_type: string;
  };
  status?: string;
  error?: {
    code: string;
    message: string;
  };
  code?: string;
  message?: string;
  interaction_id?: string;
  interactionId?: string;
  interaction?: {
    id: string;
  };
  id?: string;
}

interface GoogleGenAIWithInteractions {
  interactions: {
    create(params: {
      model?: string;
      input: InteractionTurn[];
      stream: boolean;
      systemInstruction?: string;
      tools?: InteractionTool[];
      generationConfig?: Record<string, unknown>;
      previousInteractionId?: string;
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
    }): Promise<any>; // We keep 'any' here as the SDK return type is complex (stream vs non-stream)
  };
}

Collaborator

Can we get those types from @google/genai?

Collaborator Author

I have replaced all custom interaction types (InteractionContent, InteractionTurn, InteractionTool, InteractionResponse, InteractionSSEEvent, etc.) with native counterparts provided by @google/genai (like Interactions.Content, Interactions.Turn, Interactions.Tool, and Interactions.Interaction).

To handle runtime/SDK discrepancies in a type-safe way without casting to any:

  • Defined local ExtendedInteraction and ExtendedInteractionStatusUpdate interfaces extending the SDK types to cleanly declare the runtime error fields.
  • Adjusted the stream event parser to fall back to delta.signature || delta.thought_signature and correctly map nested error objects for standard ErrorEvents.
  • Removed the obsolete GoogleGenAIWithInteractions client wrapper since GoogleGenAI has a native interactions getter.
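
The two stream-parser fallbacks mentioned in the list above might look something like this. It is only a sketch under stated assumptions: LooseDelta, LooseErrorEvent, readThoughtSignature, and readError are hypothetical names, and the 'UNKNOWN' default code is invented for illustration.

```typescript
// Hypothetical loose shapes for what may arrive on the wire.
interface LooseDelta {
  type: string;
  signature?: string;
  thought_signature?: string;
}

interface LooseErrorEvent {
  error?: {code: string; message: string};
  code?: string;
  message?: string;
}

// Prefer `signature`, falling back to `thought_signature`.
function readThoughtSignature(delta: LooseDelta): string | undefined {
  return delta.signature || delta.thought_signature;
}

// Accept either a nested error object or flat code/message fields.
function readError(
  event: LooseErrorEvent,
): {code: string; message: string} | undefined {
  if (event.error) return event.error;
  if (event.code !== undefined || event.message !== undefined) {
    // Assumption: fall back to placeholder values when one field is missing.
    return {code: event.code ?? 'UNKNOWN', message: event.message ?? ''};
  }
  return undefined;
}

console.log(readThoughtSignature({type: 'thought', thought_signature: 'sig-a'})); // sig-a
console.log(readError({error: {code: '429', message: 'quota'}})?.code); // 429
```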

Comment thread core/src/models/interactions_utils.ts Outdated
Comment on lines +193 to +201
if (mimeType.startsWith('image/')) {
  return 'image';
} else if (mimeType.startsWith('audio/')) {
  return 'audio';
} else if (mimeType.startsWith('video/')) {
  return 'video';
} else {
  return 'document';
}

Collaborator

please use switch case instead

Collaborator Author

Refactored getInteractionMediaType to split mimeType by / and use a clean switch statement on the primary media type prefix (e.g., image, audio).
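
For readers following along, a switch-based version along the lines described might look like the sketch below. The PR's actual implementation may differ; the InteractionMediaType alias is a name introduced here for illustration.

```typescript
type InteractionMediaType = 'image' | 'audio' | 'video' | 'document';

// Map a MIME type such as "image/png" to a coarse media category by
// switching on the part before the slash.
function getInteractionMediaType(mimeType: string): InteractionMediaType {
  const [primaryType] = mimeType.split('/');
  switch (primaryType) {
    case 'image':
      return 'image';
    case 'audio':
      return 'audio';
    case 'video':
      return 'video';
    default:
      return 'document';
  }
}

console.log(getInteractionMediaType('image/png')); // image
console.log(getInteractionMediaType('application/pdf')); // document
```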

@AmaadMartin force-pushed the feat/interactions-api-2 branch from fdad857 to c4561d4 on June 1, 2026 19:50
}

const model = agent.canonicalModel;
if (model instanceof Gemini && model.useInteractionsApi) {

Collaborator

Please avoid using instanceof and use a utility function like isGeminiModel instead. It looks like isGeminiModel does not exist yet, so we should create one. As an example, see https://github.com/google/adk-js/blob/main/core/src/models/base_llm.ts#L17-L31 from BaseModel.

The problem with instanceof is that when a user has multiple copies of the adk-js package in their runtime, an object created by one copy fails the instanceof check against the class from another copy, so objects cannot be mixed between them.

Collaborator Author

Resolved. Defined a new isGemini type guard in core/src/models/google_llm.ts using a unique symbol (Symbol.for('google.adk.geminiModel')) to avoid package duplication issues. Refactored interactions_request_processor.ts to use isGemini(model) instead of model instanceof Gemini.
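
A symbol-branded type guard of the kind described here could be sketched as follows. This is a simplified stand-in, not the code in google_llm.ts: the Gemini class below is a stub, and GEMINI_MODEL_BRAND is a hypothetical constant name. Only the symbol key comes from the comment above.

```typescript
// Symbol.for() looks the key up in the global symbol registry, so every
// loaded copy of the package gets the very same symbol.
const GEMINI_MODEL_BRAND = Symbol.for('google.adk.geminiModel');

// Stub standing in for the real Gemini class.
class Gemini {
  readonly [GEMINI_MODEL_BRAND] = true;
  readonly useInteractionsApi: boolean;

  constructor(useInteractionsApi = false) {
    this.useInteractionsApi = useInteractionsApi;
  }
}

// Type guard that checks the brand instead of the prototype chain.
function isGemini(value: unknown): value is Gemini {
  return (
    typeof value === 'object' &&
    value !== null &&
    GEMINI_MODEL_BRAND in value &&
    (value as Record<symbol, unknown>)[GEMINI_MODEL_BRAND] === true
  );
}

// Simulates a model created by a second copy of the package: same brand,
// different class identity.
const foreignModel: object = {
  [Symbol.for('google.adk.geminiModel')]: true,
  useInteractionsApi: true,
};

console.log(foreignModel instanceof Gemini); // false
console.log(isGemini(foreignModel)); // true
```

The design trade-off is that the brand check trusts any object carrying the symbol, which is what allows instances from separate package copies to interoperate.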

Comment thread core/src/models/interactions_utils.ts Outdated

// --- Helper Interfaces for Strong Typing ---

interface ExtendedTool {

Collaborator

Why not just use the default Tool interface from @google/genai?

Collaborator Author

Resolved. Removed the custom ExtendedTool interface and refactored convertToolsConfigToInteractionsFormat to use the native Tool interface from @google/genai directly.

Comment thread core/src/models/interactions_utils.ts Outdated
Comment on lines +22 to +86
interface ExtendedTool {
  functionDeclarations?: Array<{
    name: string;
    description?: string;
    parameters?: {
      properties?: Record<string, unknown>;
      required?: string[];
    };
    parametersJsonSchema?: unknown;
  }>;
  googleSearch?: unknown;
  codeExecution?: unknown;
  urlContext?: unknown;
}

export interface ExtendedInteraction extends Interactions.Interaction {
  error?: {
    code: string;
    message: string;
  };
}

export interface ExtendedInteractionStatusUpdate extends Omit<
  Interactions.InteractionStatusUpdate,
  'error'
> {
  error?: {
    code: string;
    message: string;
  };
}

// Runtime event types can be more relaxed than compile-time
export interface ExtendedInteractionSSEEvent extends Omit<
  Interactions.InteractionSSEEvent,
  'error' | 'interaction_id' | 'status' | 'event_type'
> {
  event_type?: string;
  eventType?: string;
  delta?: {
    type: string;
    text?: string;
    name?: string;
    id?: string;
    arguments?: Record<string, unknown>;
    thought_signature?: string;
    signature?: string;
    data?: string;
    uri?: string;
    mime_type: string;
  };
  status?: string;
  error?: {
    code: string;
    message: string;
  };
  code?: string;
  message?: string;
  interaction_id?: string;
  interactionId?: string;
  interaction?: {
    id: string;
  };
  id?: string;
}

Collaborator

I'm pretty sure we do not need all of that. Please verify that it is unnecessary and that we can use the default types from @google/genai.

Collaborator Author

We double-checked the native types in @google/genai (version 2.9.0) and confirmed that:

Interactions.Interaction and Interactions.InteractionStatusUpdate still lack the error field, which is returned at runtime when an interaction fails.

Interactions.InteractionSSEEvent is a strict union type that makes it difficult to access common fields like interaction_id or interactionId (handling casing differences at runtime) without verbose type narrowing on each stream event. Both casing schemes are also needed here as both are used in the codebase. The Gen AI API uses snake_case while the SDK uses camelCase in some cases.

Therefore, we kept these clean local extended interfaces to maintain type safety without resorting to any casts, aligning with the project's coding guidelines.

Additionally, we defined ExtendedFunctionCallStep extending Interactions.FunctionCallStep to add the signature field, which is missing in the SDK type definition but used in the codebase to propagate thought signatures.
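
To illustrate why a relaxed event shape helps with the casing differences described above, here is a small sketch of reading the interaction ID from a stream event. LooseInteractionEvent and readInteractionId are hypothetical names, and the order of precedence among the fields is an assumption rather than something stated in the thread.

```typescript
// Hypothetical relaxed event shape carrying every place an ID might appear.
interface LooseInteractionEvent {
  interaction_id?: string; // snake_case, as used by the API on the wire
  interactionId?: string; // camelCase, as used by the SDK in some cases
  interaction?: {id: string};
  id?: string;
}

// Return the first ID found, checking the explicit fields before the
// generic `id` (assumed precedence).
function readInteractionId(event: LooseInteractionEvent): string | undefined {
  return (
    event.interaction_id ??
    event.interactionId ??
    event.interaction?.id ??
    event.id
  );
}

console.log(readInteractionId({interaction_id: 'int-1'})); // int-1
console.log(readInteractionId({interactionId: 'int-2'})); // int-2
console.log(readInteractionId({interaction: {id: 'int-3'}})); // int-3
```

With a strict union type, each of these reads would need its own narrowing step per event variant, which is the verbosity the reply refers to.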

Comment thread package.json Outdated
"lint:fix": "eslint --fix \"**/*.ts\"",
"format": "prettier \"**/*.ts\" --write",
"format:check": "prettier \"**/*.ts\" --check",
"format": "prettier \"core/**/*.ts\" --write && prettier \"dev/**/*.ts\" --write && prettier \"tests/**/*.ts\" vitest.config.ts --write",

Collaborator

revert please

Collaborator Author

Reverted

const childProcess = spawn('npm', ['run', 'start'], {
  cwd: projectPath,
  shell: true,
  stdio: ['pipe', 'pipe', 'inherit'],

Collaborator

why? please revert

Collaborator Author

Reverted

expect(buildResult.stdout).toContain('\nBuild complete');
}
});
}, 120000);

Collaborator

why? please revert

Collaborator Author

Reverted

describe('Agent with skills that generates JS script and runs it locally', () => {
  beforeAll(async () => {
    await execAsync('npm install', {cwd: PROJECT_PATH});
  }, TEST_EXECUTION_TIMEOUT);

Collaborator

why? please revert

Collaborator Author

Reverted
