Add AgentField realtime sessions for WebRTC voice ingress#654
santoshkumarradha wants to merge 5 commits into
Performance
✓ No regressions detected |
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor. |
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
✅ Patch gate passed
Every surface whose lines were touched by this PR has patch coverage at or above the threshold. |
Merge order note: merge this AgentField runtime/SDK/control-plane PR first. The website docs PR Agent-Field/website2.0#28 documents the APIs introduced here and should merge only after this PR lands, so public docs do not get ahead of the shipped session surface. |
|
Clarification on session The |
|
Merge order for realtime voice/session work:

1. Merge this PR first: base realtime session SDK/control-plane/WebRTC support.
2. Merge #655 second: session access tags, explicit session target/instance routes, CLI route updates, and UI surfaces.
3. Merge/update website docs PR Agent-Field/website2.0#28 after the runtime PRs land, so docs reflect the final route names and access-control behavior.
|
Summary
This PR introduces the AgentField session DX for realtime voice and multimodal ingress through the control plane.
The product goal is that browsers, CLIs, and external clients do not connect directly to model providers. They create an AgentField session, and the control plane owns the provider boundary, realtime transport, tool calls, and workflow provenance.
What Changed
- @app.session(...) for declaring long-lived realtime/multimodal session handlers.
- RealtimeSession, SessionTurn, session.input(), session.say(...), and session.call(...) as the handler-facing DX.
- agent.session(...) and Go RegisterSession(...) equivalents so session declarations register consistently across SDKs.
- POST /api/v1/sessions/:target/start
- POST /api/v1/sessions/:session_id/realtime-offer
- POST /api/v1/sessions/:session_id/tools/:tool
- provider=openai, transport=webrtc.
- af session CLI commands for start, SDP offer exchange, tool invocation, and workflow lookup.
- execute/async, forwarding X-Session-ID so resulting reasoner work belongs to the session and appears in the normal workflow DAG.
- app.call parent headers by forwarding the current execution as X-Parent-Execution-ID.
DX Shape
Provider and transport are explicit controls:
- provider selects who runs the realtime/audio model.
- transport selects how the session moves audio/events through the control plane.
Example validation behavior:
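A minimal sketch of what validating these two controls could look like, assuming the only supported pair is the one this PR describes (provider=openai, transport=webrtc). The function name, exception type, and error wording are hypothetical illustrations, not the SDK's actual API.

```python
# Hypothetical sketch: NOT the AgentField SDK's real validation code.
# The only supported pair is the one named in this PR description.
SUPPORTED_PAIRS = {("openai", "webrtc")}


class SessionConfigError(ValueError):
    """Raised when a provider/transport combination is not supported."""


def validate_session_transport(provider: str, transport: str) -> tuple:
    """Normalize the pair and reject combinations outside SUPPORTED_PAIRS."""
    key = (provider.strip().lower(), transport.strip().lower())
    if key not in SUPPORTED_PAIRS:
        supported = "; ".join(
            f"provider={p}, transport={t}" for p, t in sorted(SUPPORTED_PAIRS)
        )
        raise SessionConfigError(
            f"unsupported session config provider={provider!r}, "
            f"transport={transport!r} (supported: {supported})"
        )
    return key
```

Failing at declaration time, rather than when the first SDP offer arrives, would surface a misconfigured session before any client connects.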
CLI Shape
af session offer accepts inline SDP, --sdp @path, or stdin. The default output is raw SDP so browser automation and WebRTC tooling can pipe it directly; --output json wraps the answer as { "answer_sdp": "..." }.
Product Positioning
Sessions make AgentField the ingress layer for realtime AI applications.
A browser voice call is not a side channel around AgentField. It becomes an AgentField session backed by the control plane. Browser WebRTC setup, provider negotiation, and tool calls all go through AgentField, while every agent action still routes through the existing execution API.
That gives realtime voice apps the same properties as normal AgentField workflows:
Docs Follow-up
Website docs PR: Agent-Field/website2.0#28.
Merge this AgentField runtime/SDK/control-plane PR first. Merge the website docs PR after this lands so the public docs do not describe session APIs before they are available.
Validation
- uv run pytest tests/test_agent_session.py tests/test_session_transport.py tests/test_execution_context_core.py -q: 13 passed.
- npm test from sdk/typescript: 619 passed.
- npm run lint: passed.
- go test ./agent -run 'TestValidateSessionTransport|TestAgentRegisterSession|TestAgent' from sdk/go: passed.
- go test ./pkg/types ./internal/handlers ./internal/cli from control-plane: passed.
Session Tool Semantics
session.call(...) and tools=[...] serve different paths:
- session.call(...) is handler-controlled orchestration. The session function can call AgentField reasoners or skills directly without listing them in tools.
- tools=[...] is the provider/client-visible allowlist for autonomous realtime tool calls during the live session. For example, a realtime audio model or browser-side tool bridge can choose to call orders.lookup_order; AgentField then routes that request through /api/v1/sessions/:session_id/tools/:tool into execute/async with the session ID attached.
So tools is not Python dependency injection and not required for normal handler code. It is the explicit exposure boundary for capabilities the live audio loop may invoke.
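The allowlist boundary can be illustrated with a small model of the control-plane side. This is a sketch under stated assumptions, not the actual handler: route_session_tool_call and its return shape are hypothetical, and only the route pattern, the execute/async hop, and the X-Session-ID header come from the description above.

```python
# Hypothetical model of the tools=[...] exposure boundary.
# Only the route pattern, "execute/async", and X-Session-ID come from the PR text.
def route_session_tool_call(session_id: str, allowed_tools: set, tool: str) -> dict:
    """Describe how an autonomous tool call from the live session is forwarded."""
    if tool not in allowed_tools:
        # The provider or browser-side bridge may only invoke declared tools.
        raise PermissionError(
            f"tool {tool!r} is not in the session's tools allowlist"
        )
    return {
        # Ingress route hit by the provider/client-side tool bridge.
        "ingress": f"/api/v1/sessions/{session_id}/tools/{tool}",
        # The call is then handed to the existing execution API.
        "forward_to": "execute/async",
        # Attaching the session ID keeps the work in the session's workflow DAG.
        "headers": {"X-Session-ID": session_id},
        "target": tool,
    }


plan = route_session_tool_call(
    "sess_123", {"orders.lookup_order"}, "orders.lookup_order"
)
```

In this model, a call to a tool outside the allowlist fails before anything reaches execute/async, while handler code using session.call(...) never passes through this gate.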