Add AgentField realtime sessions for WebRTC voice ingress#654

Open
santoshkumarradha wants to merge 5 commits into main from codex/session-workflow-parent-context

@santoshkumarradha santoshkumarradha commented Jun 11, 2026

Member

Summary

This PR introduces the AgentField session DX for realtime voice and multimodal ingress through the control plane.

The product goal is that browsers, CLIs, and external clients do not connect directly to model providers. They create an AgentField session, and the control plane owns the provider boundary, realtime transport, tool calls, and workflow provenance.

What Changed

  • Adds Python @app.session(...) for declaring long-lived realtime/multimodal session handlers.
  • Adds Python RealtimeSession, SessionTurn, session.input(), session.say(...), and session.call(...) as the handler-facing DX.
  • Adds TypeScript agent.session(...) and Go RegisterSession(...) equivalents so session declarations register consistently across SDKs.
  • Adds explicit provider + transport validation in Python, TypeScript, Go, and the control plane.
  • Adds control-plane session routes:
    • POST /api/v1/sessions/:target/start
    • POST /api/v1/sessions/:session_id/realtime-offer
    • POST /api/v1/sessions/:session_id/tools/:tool
  • Adds OpenAI WebRTC SDP offer proxying through the control plane for provider=openai, transport=webrtc.
  • Adds af session CLI commands for start, SDP offer exchange, tool invocation, and workflow lookup.
  • Keeps session tool work on AgentField execute/async, forwarding X-Session-ID so resulting reasoner work belongs to the session and appears in the normal workflow DAG.
  • Fixes Python nested app.call parent headers by forwarding the current execution as X-Parent-Execution-ID.
  • Adds a TODO beside control-plane session relationship fields for future explicit session-to-session lifecycle edges.
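
The routes and headers listed above can be sketched as small helpers. This is a minimal illustration, not the control plane's implementation: the route templates and header names come from this PR description, while the function names are hypothetical and the idea that a single request may carry both headers is an assumption.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Route prefix as written in this PR description.
SESSIONS_PREFIX = "/api/v1/sessions"


def start_route(target: str) -> str:
    """Path for POST /api/v1/sessions/:target/start."""
    return f"{SESSIONS_PREFIX}/{quote(target, safe='')}/start"


def offer_route(session_id: str) -> str:
    """Path for POST /api/v1/sessions/:session_id/realtime-offer."""
    return f"{SESSIONS_PREFIX}/{quote(session_id, safe='')}/realtime-offer"


def tool_route(session_id: str, tool: str) -> str:
    """Path for POST /api/v1/sessions/:session_id/tools/:tool."""
    return (
        f"{SESSIONS_PREFIX}/{quote(session_id, safe='')}"
        f"/tools/{quote(tool, safe='')}"
    )


def execution_headers(
    session_id: str, parent_execution_id: Optional[str] = None
) -> Dict[str, str]:
    """Headers that tie downstream reasoner work to a session.

    X-Session-ID scopes the work to the session. X-Parent-Execution-ID
    is added only when there is a current execution to nest under
    (hypothetical helper; combining both headers is an assumption).
    """
    headers = {"X-Session-ID": session_id}
    if parent_execution_id:
        headers["X-Parent-Execution-ID"] = parent_execution_id
    return headers
```

For example, `tool_route("sess_123", "launch_support_workflow")` yields the path used by the session tool route, and `execution_headers("sess_123")` yields only the session header.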

DX Shape

@app.session(
    "voice",
    provider="openai",
    model="gpt-realtime-2",
    transport="webrtc",
    modalities=["audio", "text"],
    voice="marin",
    tools=["support.resolve_voice_turn"],
)
async def voice(session):
    turn = await session.input()
    result = await session.call("support.resolve_voice_turn", turn=turn)
    await session.say(result["spoken_response"])

Provider and transport are explicit controls:

  • provider selects who runs the realtime/audio model.
  • transport selects how the session moves audio/events through the control plane.
  • AgentField validates the pair, but does not infer a transport or silently switch providers.

Example validation behavior:

Unsupported session transport 'webrtc' for provider 'openrouter'. Supported transports: audio_turns. AgentField does not infer or switch providers; set provider and transport explicitly.
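
A minimal sketch of this validation rule, assuming a static support matrix. The two provider/transport pairs below are the only ones this PR description evidences; the real matrix may contain more entries, and the function name and the unknown-provider message are hypothetical.

```python
from typing import Dict, Tuple

# Assumed support matrix: only these pairs appear in the PR description.
SUPPORTED_TRANSPORTS: Dict[str, Tuple[str, ...]] = {
    "openai": ("webrtc",),
    "openrouter": ("audio_turns",),
}


def validate_session_transport(provider: str, transport: str) -> None:
    """Raise ValueError unless the provider/transport pair is supported.

    The pair is only checked. Nothing is inferred or substituted.
    """
    supported = SUPPORTED_TRANSPORTS.get(provider)
    if supported is None:
        # Hypothetical wording; the PR does not show this message.
        raise ValueError(f"Unsupported session provider '{provider}'.")
    if transport not in supported:
        raise ValueError(
            f"Unsupported session transport '{transport}' "
            f"for provider '{provider}'. "
            f"Supported transports: {', '.join(supported)}. "
            "AgentField does not infer or switch providers; "
            "set provider and transport explicitly."
        )
```

Raising instead of falling back keeps the failure visible at declaration time, which matches the stated rule that the pair is validated but never rewritten.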

CLI Shape

af session start voice-support-af.voice \
  --provider openai \
  --transport webrtc \
  --model gpt-realtime-2 \
  --voice marin

# Pipe-friendly SDP exchange through the control plane.
af session offer sess_123 \
  --provider openai \
  --transport webrtc \
  --sdp @offer.sdp > answer.sdp

# Session tools still route through AgentField execute/async.
af session tool sess_123 launch_support_workflow \
  --target voice-support-af.resolve_voice_turn \
  --in '{"utterance":"I need help with a delayed order"}'

af session workflows sess_123

af session offer accepts the offer as inline SDP, from a file via --sdp @path, or from stdin. The default output is raw SDP so browser automation and WebRTC tooling can pipe it directly; --output json wraps the answer as { "answer_sdp": "..." }.
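
The input and output handling described for af session offer could look roughly like the sketch below. It is written in Python for illustration only and is not the CLI's implementation; the function names and the "raw" label for the default output mode are hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import Optional, TextIO


def read_offer_sdp(sdp_arg: Optional[str], stdin: Optional[TextIO] = None) -> str:
    """Resolve the SDP offer from inline text, an @path, or stdin."""
    if sdp_arg is None:
        # No --sdp value: read the offer from stdin.
        return (stdin or sys.stdin).read()
    if sdp_arg.startswith("@"):
        # Read bytes and decode so CRLF line endings in the file survive.
        return Path(sdp_arg[1:]).read_bytes().decode("utf-8")
    # Anything else is treated as the inline SDP text itself.
    return sdp_arg


def format_answer(answer_sdp: str, output: str = "raw") -> str:
    """Return the answer as raw SDP, or JSON-wrapped for --output json."""
    if output == "json":
        return json.dumps({"answer_sdp": answer_sdp})
    return answer_sdp
```

Keeping raw SDP as the default means the answer can be redirected straight into a file, as in the `> answer.sdp` example above.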

Product Positioning

Sessions make AgentField the ingress layer for realtime AI applications.

A browser voice call is not a side channel around AgentField. It becomes an AgentField session backed by the control plane. Browser WebRTC setup, provider negotiation, and tool calls all go through AgentField, while every agent action still routes through the existing execution API.

That gives realtime voice apps the same properties as normal AgentField workflows:

  • provider boundary owned by the control plane
  • explicit provider and transport configuration
  • session-scoped execution context
  • reasoner-to-reasoner workflow DAGs
  • replay/provenance surfaces for agent work
  • consistent SDK shape across Python, TypeScript, and Go

Docs Follow-up

Website docs PR: Agent-Field/website2.0#28.

Merge this AgentField runtime/SDK/control-plane PR first. Merge the website docs PR after this lands so the public docs do not describe session APIs before they are available.

Validation

  • uv run pytest tests/test_agent_session.py tests/test_session_transport.py tests/test_execution_context_core.py -q: 13 passed.
  • npm test from sdk/typescript: 619 passed.
  • npm run lint: passed.
  • go test ./agent -run 'TestValidateSessionTransport|TestAgentRegisterSession|TestAgent' from sdk/go: passed.
  • go test ./pkg/types ./internal/handlers ./internal/cli from control-plane: passed.
  • Earlier live voice spike validated browser WebRTC ingress through an AgentField-style session route into OpenAI Realtime, with session tool calls launching a control-plane workflow and preserving a 9-node AgentField DAG.

Session Tool Semantics

session.call(...) and tools=[...] serve different paths:

  • session.call(...) is handler-controlled orchestration. The session function can call AgentField reasoners or skills directly without listing them in tools.
  • tools=[...] is the provider/client-visible allowlist for autonomous realtime tool calls during the live session. For example, a realtime audio model or browser-side tool bridge can choose to call orders.lookup_order; AgentField then routes that request through /api/v1/sessions/:session_id/tools/:tool into execute/async with the session ID attached.

So tools is not Python dependency injection and not required for normal handler code. It is the explicit exposure boundary for capabilities the live audio loop may invoke.
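
The distinction can be sketched as a single check on where a call originates. This is an illustrative model only: the function name and the origin labels are hypothetical, and how the control plane actually rejects a tool outside the allowlist is not shown in this PR.

```python
from typing import Iterable


def is_call_allowed(target: str, origin: str, tools: Iterable[str]) -> bool:
    """Return True if a call to `target` may proceed.

    origin="handler":  session.call(...) issued by the session function.
    origin="realtime": a tool call chosen by the provider model or a
                       browser-side tool bridge during the live session.
    """
    if origin == "handler":
        # Handler-controlled orchestration is not gated by tools=[...].
        return True
    if origin == "realtime":
        # Autonomous calls may only reach targets exposed via tools=[...].
        return target in set(tools)
    raise ValueError(f"unknown call origin: {origin!r}")
```

With `tools=["support.resolve_voice_turn"]`, a handler may still call `orders.lookup_order`, while the same target requested by the realtime model would be refused until it is added to the allowlist.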

@santoshkumarradha santoshkumarradha requested review from a team and AbirAbbas as code owners June 11, 2026 12:38

github-actions Bot commented Jun 11, 2026

Contributor

Performance

| SDK | Memory | Δ | Latency | Δ | Tests | Status |
|---|---|---|---|---|---|---|
| Python | 9.4 KB | +4% | 0.33 µs | -6% | | |
| Go | 218 B | -22% | 0.60 µs | -40% | | |
| TS | 357 B | +2% | 1.58 µs | -21% | | |

✓ No regressions detected

github-actions Bot commented Jun 11, 2026

Contributor

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 84%, aggregate ≥ 85%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 87.10% | 87.40% | ↓ -0.30 pp 🟡 |
| sdk-go | 91.80% | 92.00% | ↓ -0.20 pp 🟢 |
| sdk-python | 93.73% | 93.73% | ↑ +0.00 pp 🟢 |
| sdk-typescript | 90.41% | 90.42% | ↓ -0.01 pp 🟢 |
| web-ui | 84.82% | 84.79% | ↑ +0.03 pp 🟡 |
| aggregate | 85.66% | 85.75% | ↓ -0.09 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
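
The gate arithmetic can be checked by hand with a short sketch. The thresholds and coverage figures are copied from this comment; the function and class names are hypothetical, and the actual gate script is not shown in this PR, so treat this as an approximation of its rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Thresholds:
    min_surface: float = 84.0         # per-surface floor, percent
    min_aggregate: float = 85.0       # aggregate floor, percent
    max_surface_drop: float = 1.0     # allowed per-surface regression, pp
    max_aggregate_drop: float = 0.50  # allowed aggregate regression, pp


# (current, baseline) coverage percentages from the table above.
SURFACES: Dict[str, Tuple[float, float]] = {
    "control-plane": (87.10, 87.40),
    "sdk-go": (91.80, 92.00),
    "sdk-python": (93.73, 93.73),
    "sdk-typescript": (90.41, 90.42),
    "web-ui": (84.82, 84.79),
}
AGGREGATE: Tuple[float, float] = (85.66, 85.75)


def gate_failures(
    surfaces: Dict[str, Tuple[float, float]],
    aggregate: Tuple[float, float],
    t: Thresholds = Thresholds(),
) -> List[str]:
    """Return one message per violated rule; an empty list means pass."""
    failures: List[str] = []
    for name, (current, baseline) in surfaces.items():
        if current < t.min_surface:
            failures.append(f"{name}: {current:.2f}% is below the floor")
        if baseline - current > t.max_surface_drop:
            failures.append(f"{name}: dropped {baseline - current:.2f} pp")
    current, baseline = aggregate
    if current < t.min_aggregate:
        failures.append(f"aggregate: {current:.2f}% is below the floor")
    if baseline - current > t.max_aggregate_drop:
        failures.append(f"aggregate: dropped {baseline - current:.2f} pp")
    return failures
```

Calling `gate_failures(SURFACES, AGGREGATE)` returns an empty list for the figures above: the largest per-surface drop is 0.30 pp (control-plane) and the aggregate drop is 0.09 pp, both inside their limits.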

github-actions Bot commented Jun 11, 2026

Contributor

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 524 | 84.00% | |
| sdk-go | 100 | 84.00% | |
| sdk-python | 0 | ➖ | no changes |
| sdk-typescript | 32 | 87.00% | |
| web-ui | 0 | ➖ | no changes |

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

@santoshkumarradha santoshkumarradha force-pushed the codex/session-workflow-parent-context branch from 2d559dc to 3f5bc75 Compare June 11, 2026 12:48
@santoshkumarradha santoshkumarradha changed the title Fix Python workflow parent headers for nested session calls Validate explicit session provider transports Jun 11, 2026
@santoshkumarradha santoshkumarradha marked this pull request as draft June 11, 2026 13:06
@santoshkumarradha santoshkumarradha force-pushed the codex/session-workflow-parent-context branch 2 times, most recently from c2474e1 to d27241e Compare June 11, 2026 13:27
@santoshkumarradha santoshkumarradha changed the title Validate explicit session provider transports Add AgentField realtime sessions for WebRTC voice ingress Jun 11, 2026
@santoshkumarradha santoshkumarradha force-pushed the codex/session-workflow-parent-context branch from d27241e to df26f94 Compare June 11, 2026 13:35
@santoshkumarradha santoshkumarradha force-pushed the codex/session-workflow-parent-context branch from df26f94 to 6315fbb Compare June 11, 2026 13:37
@santoshkumarradha santoshkumarradha marked this pull request as ready for review June 11, 2026 13:59
@santoshkumarradha

Member Author

Merge order note: merge this AgentField runtime/SDK/control-plane PR first. The website docs PR Agent-Field/website2.0#28 documents the APIs introduced here and should merge only after this PR lands, so public docs do not get ahead of the shipped session surface.

santoshkumarradha commented Jun 11, 2026

Member Author

Clarification on session tools: session.call(...) is available to the session handler for normal handler-controlled orchestration and does not require the target to be listed in tools.

The tools=[...] kwarg is the provider/client-visible allowlist for autonomous realtime tool calls during the live session. Those calls enter through /api/v1/sessions/:session_id/tools/:tool and are routed into execute/async with X-Session-ID so the workflow DAG remains session-scoped.

@santoshkumarradha

Member Author

Merge order for realtime voice/session work:

1. Merge this PR first: base realtime session SDK/control-plane/WebRTC support.
2. Merge #655 second: session access tags, explicit session target/instance routes, CLI route updates, and UI surfaces.
3. Merge/update website docs PR Agent-Field/website2.0#28 after the runtime PRs land, so docs reflect the final route names and access-control behavior.
