Releases: pipecat-ai/pipecat
v1.0.0
Migration guide: https://docs.pipecat.ai/pipecat/migration/migration-1.0
Added
- Updated LemonSlice transport:
  - Added `on_avatar_connected` and `on_avatar_disconnected` events triggered when the avatar joins and leaves the room.
  - Added an `api_url` parameter to `LemonSliceNewSessionRequest` to allow overriding the LemonSlice API endpoint.
  - Added support for passing arbitrary named parameters to the LemonSlice API endpoint.
  (PR #3995)
- Added Inworld Realtime LLM service with WebSocket-based cascade STT/LLM/TTS, semantic VAD, function calling, and Router support. (PR #4140)
- ⚠️ Added WebSocket-based `OpenAIResponsesLLMService` as the new default for the OpenAI Responses API. It maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically uses `previous_response_id` to send only incremental context, falling back to full context on reconnection or cache miss. The previous HTTP-based implementation is now available as `OpenAIResponsesHttpLLMService`. (PR #4141)
- Added a `group_parallel_tools` parameter to `LLMService` (default `True`). When `True`, all function calls from the same LLM response batch share a group ID and the LLM is triggered exactly once after the last call completes. Set it to `False` to trigger inference independently for each function call result as it arrives. (PR #4217)
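The grouping semantics can be sketched in plain Python. This is a hypothetical stand-in, not pipecat's internals: all calls from one LLM response share a group ID, and follow-up inference is triggered once per batch rather than once per call.

```python
import asyncio
import uuid

# Hypothetical sketch (not pipecat's actual implementation): run a batch of
# parallel tool calls under one shared group ID and trigger follow-up
# inference exactly once, after the last call in the batch completes.
async def run_tool_batch(calls, trigger_inference):
    group_id = str(uuid.uuid4())  # shared by every call from this LLM response
    results = await asyncio.gather(*(call() for call in calls))
    # group_parallel_tools=True semantics: one trigger per batch,
    # not one trigger per completed call.
    await trigger_inference(group_id, results)
    return results
```

With `group_parallel_tools=False`, `trigger_inference` would instead run inside each call's completion path, once per result.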
- Added async function call support to `register_function()` and `register_direct_function()` via `cancel_on_interruption=False`. When set to `False`, the LLM continues the conversation immediately without waiting for the function result. The result is injected back into the context as a `developer` message once available, triggering a new LLM inference at that point. (PR #4217)
- Added an `enable_prompt_caching` setting to `AWSBedrockLLMService` for Bedrock ConverseStream prompt caching. (PR #4219)
- Added support for streaming intermediate results from async function calls. Call `result_callback` multiple times with `properties=FunctionCallResultProperties(is_final=False)` to push incremental updates, then call it once more (with `is_final=True`, the default) to deliver the final result. Only valid for functions registered with `cancel_on_interruption=False`. (PR #4230)
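The calling pattern looks roughly like this. The `FunctionCallResultProperties` class below is a minimal stand-in for pipecat's class of the same name (the real one may carry more fields), and `fetch_report` is an invented example tool:

```python
import asyncio
from dataclasses import dataclass

# Stand-in for pipecat's FunctionCallResultProperties, just to show the shape.
@dataclass
class FunctionCallResultProperties:
    is_final: bool = True

# An example async tool that streams progress updates, then a final result.
async def fetch_report(result_callback):
    for pct in (25, 50, 75):
        # Intermediate updates are marked is_final=False.
        await result_callback(
            {"progress": pct},
            properties=FunctionCallResultProperties(is_final=False),
        )
    # The last call uses the default (is_final=True) and delivers the result.
    await result_callback({"report": "done"})
```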
- Added `LLMMessagesTransformFrame` to facilitate programmatically editing context in a frame-based way. The previous approach required the caller to grab a reference to the context object, take a "snapshot" of its messages at that point in time, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. That approach could lead to problems: if a change to the context was already queued in the pipeline, the transformed messages would simply overwrite it without consideration. (PR #4231)
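The core idea is that a transform is a function applied to whatever the messages are when the frame is processed, so it composes with edits queued ahead of it instead of clobbering them. A minimal sketch of the concept (the function names here are illustrative, not pipecat's API):

```python
# Sketch: a transform operates on the live message list at processing time,
# while the snapshot-and-update approach overwrites with stale data.
def apply_transform(messages, transform):
    """Apply a caller-supplied transform to a copy of the current messages."""
    return transform(list(messages))

# Example transform: drop the oldest user turn (e.g. to bound context size).
def drop_oldest_user_turn(messages):
    for i, m in enumerate(messages):
        if m["role"] == "user":
            return messages[:i] + messages[i + 1:]
    return messages
```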
- The development runner now exports a module-level `app` FastAPI instance (`from pipecat.runner.run import app`) so you can register custom routes before calling `main()`. (PR #4234)
- `ToolsSchema` now accepts `custom_tools` for OpenAI LLM services (`OpenAILLMService`, `OpenAIResponsesLLMService`, `OpenAIResponsesHttpLLMService`, and `OpenAIRealtimeLLMService`), letting you pass provider-specific tools like `tool_search` alongside standard function tools. (PR #4248)
- Added enhancements to `NvidiaTTSService`:
  - Cross-sentence stitching: multiple sentences within an LLM turn are fed into a single `SynthesizeOnline` gRPC stream for seamless audio across sentence boundaries (requires Magpie TTS model v1.7.0+).
  - `custom_dictionary` and `encoding` parameters for IPA-based custom pronunciation and output audio encoding.
  - Metrics generation (`can_generate_metrics` returns true) and `stop_all_metrics()` when an audio context is interrupted.
  - gRPC error handling around synthesis config retrieval (`GetRivaSynthesisConfig`).
  (PR #4249)
- Added `MistralTTSService` for streaming text-to-speech using Mistral's Voxtral TTS API (`voxtral-mini-tts-2603`). Supports SSE-based audio streaming with automatic resampling from the API's native 24kHz to any requested sample rate. Requires the `mistral` optional extra (`pip install "pipecat-ai[mistral]"`). (PR #4251)
- Added a `truncate_large_values` parameter to `LLMContext.get_messages()`. When `True`, it returns compact deep copies of messages with binary data (base64 images, audio) replaced by short placeholders and long string values in LLM-specific messages recursively truncated. Useful for serialization, logging, and debugging tools. (PR #4272)
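The truncation idea can be illustrated with a small recursive helper. This is a sketch of the concept only, not pipecat's implementation, and the cutoff length is an assumption for the example:

```python
# Illustrative sketch: recursively walk a message structure, shortening long
# strings (e.g. base64 image payloads) so the result is safe to log.
MAX_LEN = 32  # assumed cutoff, for illustration only

def truncate_large_values(value, max_len=MAX_LEN):
    if isinstance(value, str) and len(value) > max_len:
        # Replace the tail with a short placeholder noting the original size.
        return value[:max_len] + f"...<{len(value)} chars>"
    if isinstance(value, dict):
        return {k: truncate_large_values(v, max_len) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_large_values(v, max_len) for v in value]
    return value
```

Note the helper builds new dicts and lists rather than mutating in place, mirroring the "compact deep copies" behavior described above.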
- `CartesiaSTTService` now supports runtime settings updates (e.g. changing `language` or `model` via `STTUpdateSettingsFrame`). The service automatically reconnects with the new parameters. Previously, settings updates were silently ignored. (PR #4282)
- Added `pcm_32000` and `pcm_48000` sample rate support to ElevenLabs TTS services. (PR #4293)
- Added an `enable_logging` parameter to `ElevenLabsHttpTTSService`. Set it to `False` to enable zero retention mode (enterprise only). (PR #4293)
Changed
- Updated `onnxruntime` from 1.23.2 to 1.24.3, adding support for Python 3.14. (PR #3984)
- `MCPClient` now requires `async with MCPClient(...) as mcp:` or explicit `start()`/`close()` calls to manage the connection lifecycle. (PR #4034)
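The lifecycle contract follows the standard async context manager pattern. A minimal sketch with a stand-in class (the real `MCPClient` takes server parameters; this toy version does not):

```python
import asyncio

# Stand-in illustrating the start()/close() lifecycle described above:
# the connection opens in start()/__aenter__ and closes in close()/__aexit__.
class FakeMCPClient:
    def __init__(self):
        self.connected = False

    async def start(self):
        self.connected = True  # a real client would open the MCP transport here

    async def close(self):
        self.connected = False

    async def __aenter__(self):
        await self.start()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()

async def main():
    # Preferred usage: the context manager closes the connection even on errors.
    async with FakeMCPClient() as mcp:
        assert mcp.connected
    assert not mcp.connected
    return "ok"
```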
- ⚠️ Updated the `langchain` extra to require langchain 1.x (from 0.3.x), langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from 0.3.x). If you pin these packages in your project, update your pins accordingly. (PR #4192)
- `WebsocketService` reconnection errors are now non-fatal. When a websocket service exhausts its reconnection attempts (either via exponential backoff or quick failure detection), it emits a non-fatal `ErrorFrame` instead of a fatal one. This allows application-level failover (e.g. `ServiceSwitcher`) to handle the failure instead of killing the entire pipeline. (PR #4201)
- Changed the `GrokLLMService` default model from `grok-3-beta` to `grok-3`, now that the model is generally available. (PR #4209)
- `GoogleImageGenService` now defaults to `imagen-4.0-generate-001` (previously `imagen-3.0-generate-002`). (PR #4213)
- ⚠️ `BaseOpenAILLMService.get_chat_completions()` now accepts an `LLMContext` instead of `OpenAILLMInvocationParams`. If you override this method, update your signature accordingly. (PR #4215)
- When multiple function calls are returned in a single LLM response, by default (when `group_parallel_tools=True`) the LLM is now triggered exactly once after the last call in the batch completes, rather than waiting for all function calls. (PR #4217)
- ⚠️ `LLMService.function_call_timeout_secs` now defaults to `None` instead of `10.0`. Deferred function calls will run indefinitely unless a timeout is explicitly set at the service level or per-call. If you relied on the previous 10-second default, pass `function_call_timeout_secs=10.0` explicitly. (PR #4224)
- Updated `NvidiaTTSService`:
  - Made `api_key` optional for local NIM deployments.
  - Voice, language, and quality can be updated without reconnecting the gRPC client; new values take effect on the next synthesis turn, not for the current turn's in-flight requests.
  - Replaced per-sentence synchronous `synthesize_online` calls with async queue-backed gRPC streaming.
  - Streaming now uses asyncio tasks with explicit gRPC cancellation on interruption and stale-response filtering when a stream is aborted or replaced.
  - Renamed Riva references to Nemotron Speech in docs and messages.
  - Disabled automatic TTS start frames at the service level (`push_start_frame=False`); `TTSStartedFrame` is now emitted when a stitched synthesis stream is started for a context.
  (PR #4249)
Removed
- ⚠️ Removed `OpenPipeLLMService` and the `openpipe` extra. OpenPipe was acquired by CoreWeave and the package is no longer maintained. If you were using OpenPipe as an LLM provider, switch to the underlying provider directly (e.g. `openai`). The OpenPipe interface can still be used with `OpenAILLMService` by specifying a `base_url`. (PR #4191)
- ⚠️ Removed `NoisereduceFilter`. Use system-level noise reduction or a service-based alternative instead. (PR #4204)
- ⚠️ Removed the deprecated `vad_enabled` and `vad_audio_passthrough` transport params. (PR #4204)
- ⚠️ Removed the deprecated `camera_in_enabled`, `camera_in_is_live`, `camera_in_width`, `camera_in_height`, `came...
v0.0.108
Added
- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`, `sarvam-105b`, and `sarvam-105b-32k`. (PR #3978)
- Added an `on_turn_context_created(context_id)` hook to `TTSService`. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created. (PR #4013)
- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API. (PR #4031)
- Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use `system_instruction` to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion). (PR #4089)
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings. (PR #4092)
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs from the recommended default (0.2s) or when `stop_secs` >= STT p99 latency, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the stt-benchmark with their VAD settings. (PR #4115)
- Added a `domain` parameter to `AssemblyAISTTSettings` for specialized recognition modes such as Medical Mode (`domain="medical-v1"`). (PR #4117)
- Added `NovitaLLMService` for using Novita AI's LLM models via their OpenAI-compatible API. (PR #4119)
- Added a `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer resources are properly released when no longer needed. Custom `VADAnalyzer` subclasses can override `cleanup()` to free any held resources. (PR #4120)
- Added an `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both Pipecat and AssemblyAI turn detection modes. (PR #4128)
- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux speech-to-text on AWS SageMaker endpoints. Use with `ExternalUserTurnStrategies` to take advantage of Flux's turn detection. (PR #4143)
- Added a `Mem0MemoryService.get_memories()` convenience method for retrieving all stored memories outside the pipeline (e.g. to build a personalized greeting at connection time). This avoids the need to manually handle client type branching, filter construction, and async wrapping. (PR #4156)
Changed
- Added a context prewarming path for `InworldTTSService` to improve first audio latency. (PR #4013)
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp VIVA SDK (requires `krisp_audio`). (PR #4022)
- Modified `InworldTTSService` to close the context at end of turn instead of relying on idle timeout. (PR #4028)
- Added Gemini 3 support to the Gemini Live service. (PR #4078)
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has changed from `2.0` to `3.0` seconds. (PR #4084)
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction. (PR #4089)
- Fixed `InworldTTSService` to fall back to full text when TTS timestamps are not received. (PR #4113)
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer `system_instruction` from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set. (PR #4130)
- Bumped the `nvidia-riva-client` minimum version to `>=2.25.1`. (PR #4136)
- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`). (PR #4136)
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly. (PR #4137)
- `GrokLLMService` and `GrokRealtimeLLMService` now live in the `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three use the same xAI API. Update imports from `pipecat.services.grok.*` to `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import GrokLLMService`). (PR #4142)
- ⚠️ Bumped the `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the `mem0` extra will need to update their mem0ai package. (PR #4156)
Deprecated
- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and `pipecat.services.grok.realtime.events` are deprecated. The old import paths still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`, `pipecat.services.xai.realtime.llm`, and `pipecat.services.xai.realtime.events` instead. (PR #4142)
Removed
- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that called `await self.add_word_timestamps([("Reset", 0)])` or `await self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`, replace them with `await self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage the word-timestamp reset automatically. (PR #4145)
- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text audio models. Use another STT provider instead. (PR #4154)
Fixed
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring `settings.system_instruction`. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored. (PR #4089)
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to the "user" role instead of being extracted, matching the existing Anthropic adapter behavior. (PR #4089)
- Fixed the Gemini Live pipeline hanging indefinitely when an `EndFrame` was deferred while waiting for the bot to finish responding and `turn_complete` never arrived. As a possible root-cause fix, `turn_complete` messages are now handled even if they lack `usage_metadata`. As a fallback, the deferred `EndFrame` now has a 30-second safety timeout. (PR #4125)
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit. (PR #4126)
- Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`, causing the `LLMAssistantAggregator` to finalize the context before the final sentence arrived. (PR #4127)
- Fixed audio crackling and popping in recordings when both user and bot are speaking. `AudioBufferProcessor` no longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output. (PR #4135)
- Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale words or backward PTS values into later turns. (PR #4145)
...
v0.0.107
Added
- Added a `frame_order` parameter to `SyncParallelPipeline`. Set `frame_order=FrameOrder.PIPELINE` to push synchronized output frames in pipeline definition order (all frames from the first pipeline, then the second, etc.) instead of the default arrival order. (PR #4029)
- Added a `sync_with_audio` field to `OutputImageRawFrame`. When set to `True`, the output transport queues image frames with audio so they are displayed only after all preceding audio has been sent, enabling synchronized audio/image playback. (PR #4029)
- Added `OpenAIResponsesLLMService`, a new LLM service that uses the OpenAI Responses API. Supports streaming text, function calling, usage metrics, and out-of-band inference. Works with the universal `LLMContext` and `LLMContextAggregatorPair`. See `examples/foundational/07-interruptible-openai-responses.py` and `14-function-calling-openai-responses.py`. (PR #4074)
- Added an `audio_out_auto_silence` parameter to `TransportParams` (defaults to `True`). When set to `False`, the transport waits for audio data instead of inserting silence when the output queue is empty, which is useful for scenarios that require uninterrupted audio playback without artificial gaps. (PR #4104)
Changed
- Renamed tracing span attributes to align with OpenTelemetry GenAI semantic conventions: `gen_ai.system` to `gen_ai.provider.name`, `system` to `gen_ai.system_instructions`, `gen_ai.usage.cache_read_input_tokens` to `gen_ai.usage.cache_read.input_tokens`, and `gen_ai.usage.cache_creation_input_tokens` to `gen_ai.usage.cache_creation.input_tokens`. (PR #3449)
- `DeepgramSageMakerTTSService` now correctly routes audio through the base `TTSService` audio context queue. Audio frames are delivered via `append_to_audio_context()` instead of being pushed directly, enabling proper ordering, interruption handling, and start/stop frame lifecycle management. Interruptions now trigger a `Clear` message to Deepgram (flushing its text buffer) at the right time via `on_audio_context_interrupted`. (PR #4083)
- `GradiumTTSService` now sends a per-context `setup` message with `client_req_id` before the first text message for each TTS context, following Gradium's multiplexing protocol. Previously, a single setup message was sent at connection time without a `client_req_id`, which prevented Gradium from associating requests with their sessions when using `close_ws_on_eos=False`. (PR #4091)
Fixed
- Fixed stale `system_instruction` in LLM tracing spans by reading from `_settings.system_instruction` instead of the removed `_system_instruction` attribute. (PR #3449)
- Fixed `SyncParallelPipeline` breaking the Whisker debugger. (PR #4029)
- Fixed a `SyncParallelPipeline` race condition where concurrent `SystemFrame` processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks. `SystemFrame`s now take a fast path that passes them through without draining queued output. (PR #4029)
- Fixed TTS frame ordering so that non-system frames always arrive in correct order relative to the `TTSStartedFrame`/`TTSAudioRawFrame`/`TTSStoppedFrame` sequence. Previously these frames could race ahead of or behind audio context frames, producing out-of-order output downstream. (PR #4075)
- Fixed `SarvamTTSService` so that audio and error frames route through `append_to_audio_context()` instead of `push_frame()`, ensuring correct behavior with audio contexts and interruptions. (PR #4082)
- Fixed audio frame ordering and interruption handling in the Fish Audio, LMNT, Neuphonic, and Rime NonJson TTS services. These services were bypassing the base `TTSService` audio context serialization queue by pushing audio frames directly, which could cause out-of-order frames and broken interruptions during speech. (PR #4090)
- Fixed the Genesys AudioHook serializer to always include the `parameters` field in protocol messages. The AudioHook protocol requires every message to carry a `parameters` object (even if empty), but `_create_message` omitted it when no parameters were provided. This caused clients that validate message structure (including the Genesys reference implementation) to reject `pong` and parameter-less `closed` responses, breaking server sequence tracking and preventing `outputVariables` from reaching the Architect flow. (PR #4093)
v0.0.106
Added
- Added an optional `service` field to `ServiceUpdateSettingsFrame` (and its subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`) to target a specific service instance. When `service` is set, only the matching service applies the settings; others forward the frame unchanged. This enables updating a single service when multiple services of the same type exist in the pipeline. (PR #4004)
- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily runner. These convenience parameters let callers specify a SIP provider name and geographic region directly without manually constructing `DailyRoomProperties` and `DailyRoomSipParams`. (PR #4005)
- Added a `PerplexityLLMAdapter` that automatically transforms conversation messages to satisfy Perplexity's stricter API constraints (strict role alternation, no non-initial system messages, last message must be user/tool). Previously, certain conversation histories could cause Perplexity API errors that didn't occur with OpenAI (`PerplexityLLMService` subclasses `OpenAILLMService`, since Perplexity uses an OpenAI-compatible API). (PR #4009)
- Added DTMF input event support to the Daily transport. Incoming DTMF tones are now received via Daily's `on_dtmf_event` callback and pushed into the pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from phone callers. (PR #4047)
- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on wake phrases, with support for `single_activation` mode. Deprecates `WakeCheckFilter`. (PR #4064)
- Added `default_user_turn_start_strategies()` and `default_user_turn_stop_strategies()` helper functions for composing custom strategy lists. (PR #4064)
Changed
- Changed tool result JSON serialization to use `ensure_ascii=False`, preserving UTF-8 characters instead of escaping them. This reduces context size and token usage for non-English languages. (PR #3457)
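The size difference is easy to see with the standard library: with the default `ensure_ascii=True`, every non-ASCII character becomes a six-character `\uXXXX` escape in the serialized payload.

```python
import json

# With the default (ensure_ascii=True), non-ASCII characters are escaped as
# \uXXXX sequences, inflating the payload that goes back into the LLM context.
escaped = json.dumps({"text": "こんにちは"})
preserved = json.dumps({"text": "こんにちは"}, ensure_ascii=False)

# escaped   -> '{"text": "\u3053\u3093\u306b\u3061\u306f"}'
# preserved -> '{"text": "こんにちは"}'
assert len(preserved) < len(escaped)
```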
- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of `OpenAIRealtimeSTTSettings`, making it runtime-updatable via `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is deprecated as of 0.0.106. (PR #3991)
- Updated the `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable release). (PR #3997)
- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`, aligning it with the HeyGen and Tavus video services. It supports `SimliVideoService.Settings(...)` for configuration and uses `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage (`api_key`, `face_id`, etc.) remains unchanged. (PR #4001)
- Updated `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`. (PR #4023)
- Nova Sonic assistant text transcripts are now delivered in real time using speculative text events instead of delayed final text events. Previously, assistant text only arrived after all audio had finished playing, causing laggy transcripts in client UIs. Speculative text arrives before each audio chunk, providing text synchronized with what the bot is saying. This also simplifies the internal text handling by removing the interruption re-push hack and the assistant text buffer. (PR #4042)
- Updated the `daily-python` dependency to 0.25.0. (PR #4047)
- Added an `enable_dialout` parameter to `configure()` in `pipecat.runner.daily` to support dial-out rooms. Also narrowed misleading `Optional` type hints and deduplicated token expiry calculation. (PR #4048)
- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to short-circuit evaluation of subsequent strategies by returning `STOP`. (PR #4064)
- `GradiumSTTService` now takes both an `encoding` and a `sample_rate` constructor argument, which are assembled in the class to form the `input_format`. PCM accepts `8000`, `16000`, and `24000` Hz sample rates. (PR #4066)
- Improved `GradiumSTTService` transcription accuracy by reworking how text fragments are accumulated and finalized. Previously, trailing words could be dropped when the server's `flushed` response arrived before all text tokens were delivered. The service now uses a short aggregation delay after flush to capture trailing tokens, producing complete utterances. (PR #4066)
Deprecated
- `SimliVideoService.InputParams` is deprecated. Use the direct constructor parameters `max_session_length`, `max_idle_time`, and `enable_logging` instead. (PR #4001)
- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use `LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now emit a `DeprecationWarning`. (PR #4012)
- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`. (PR #4064)
Fixed
- Fixed an issue where the default model for `OpenAILLMService` and `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now restored to `gpt-4.1`. (PR #4000)
- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut down before in-flight frames (e.g. LLM function call responses) finished processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline as `ControlFrame`s, ensuring all pending work is flushed before shutdown begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate (`SystemFrame`). (PR #4006)
- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle synchronization. Buffered frames are now flushed in the correct order relative to synchronization frames (`StartFrame` goes first, `EndFrame`/`CancelFrame` go after), and frames added to the buffer during flush are also drained. (PR #4007)
- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The stop sequence now waits for all queued audio contexts to finish processing before canceling the stop frame task. (PR #4007)
- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to service-specific codes when passed via `settings=Service.Settings(language=Language.ES)` at init time. This caused API errors (e.g. 400 from Rime) because the raw enum was sent instead of the expected language code (e.g. `"spa"`). Runtime updates via `UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the base `TTSService` and `STTService` classes so all services handle this consistently. (PR #4024)
- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://` or `http://`. Previously these were silently overwritten with `wss://`/`https://`, breaking air-gapped or private deployments that don't use TLS. All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or a bare hostname) are now respected. (PR #4026)
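The scheme-preservation rule described above can be sketched with the standard library. This is illustrative only (not pipecat's code, and `resolve_ws_url` is an invented name): explicit non-TLS schemes are honored, and only bare hostnames get the secure default.

```python
from urllib.parse import urlparse

# Sketch: keep an explicitly chosen scheme (even ws:// or http://), and only
# default to TLS when the caller passed a bare hostname.
def resolve_ws_url(base_url, default_scheme="wss"):
    parsed = urlparse(base_url)
    if parsed.scheme in ("ws", "wss", "http", "https"):
        return base_url  # explicit choice: respect it, even without TLS
    # Bare hostname: fall back to the secure scheme.
    return f"{default_scheme}://{base_url}"
```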
- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not accepting or forwarding the `timeout_secs` parameter. (PR #4037)
- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions. Previously, an empty transcription could trigger an interruption of the assistant's response even though the user hadn't actually spoken. (PR #4042)
- Fixed a `SonioxSTTService` and `OpenAIRealtimeSTTService` crash when language parameters contain plain strings instead of `Language` enum values. (PR #4046)
- Fixed premature user turn stops caused by late transcriptions arriving between turns. A stale transcript from the previous turn could persist into the next turn and trigger a stop before the current turn's real transcript arrived. Stop strategies are now reset at both turn start and turn stop to prevent state from leaking across turn boundaries. (PR #4057)
- Fixed raw language strings like `"de-DE"` silently failing when passed to TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go through the same `Language` enum resolution as enum values, so regional codes like `"de-DE"` are properly converted to service-expected formats like `"de"`. Unrecognized strings log a warning instead of failing silently. (PR #4058)
- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`, `redact`, `replace`) being stringified instead of passed as lists to the SDK, which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the WebSocket query params. (PR #4063)
...
v0.0.105
Added
- Added concurrent audio context support: `CartesiaTTSService` can now synthesize the next sentence while the previous one is still playing, by setting `pause_frame_processing=False` and routing each sentence through its own audio context queue. (PR #3804)
- Added custom video track support to the Daily transport. Use `video_out_destinations` in `DailyParams` to publish multiple video tracks simultaneously, mirroring the existing `audio_out_destinations` feature. (PR #3831)
- Added `ServiceSwitcherStrategyFailover`, which automatically switches to the next service when the active service reports a non-fatal error. Recovery policies can be implemented via the `on_service_switched` event handler. (PR #3861)
- Added an optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default. (PR #3915)
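The override semantics can be sketched with `asyncio.wait_for`. This is stand-in code, not pipecat's implementation: a per-call `timeout_secs`, when given, wins over the service-wide default, and `None` means no timeout.

```python
import asyncio

# Sketch: per-tool timeout overrides the global default; None disables it.
async def call_with_timeout(fn, *, timeout_secs=None, default_timeout_secs=10.0):
    effective = timeout_secs if timeout_secs is not None else default_timeout_secs
    if effective is None:
        return await fn()  # no timeout at all
    return await asyncio.wait_for(fn(), timeout=effective)
```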
- Added a `cloud-audio-only` recording option to the Daily transport's `enable_recording` property. (PR #3916)
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.

  ```python
  llm = OpenAILLMService(
      api_key=os.getenv("OPENAI_API_KEY"),
      system_instruction="You are a helpful assistant.",
  )

  context = LLMContext()

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      context.add_message({"role": "user", "content": "Please introduce yourself."})
      await task.queue_frames([LLMRunFrame()])
  ```

  (PR #3918)
- Added a `vad_threshold` parameter to `AssemblyAIConnectionParams` for configuring voice activity detection sensitivity in U3 Pro. Aligning this with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone" where AssemblyAI transcribes speech that VAD hasn't detected yet. (PR #3927)
- Added a `push_empty_transcripts` parameter to `BaseWhisperSTTService` and `OpenAISTTService` to allow empty transcripts to be pushed downstream as `TranscriptionFrame` instead of being discarded (the default behavior). This is intended for situations where VAD fires even though the user did not speak. In these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking instead of waiting longer for a transcription. (PR #3930)
- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`, `AWSBedrockLLMService`) now log a warning when both `system_instruction` and a system message in the context are set. The constructor's `system_instruction` takes precedence. (PR #3932)
- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for the AWS Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and Soniox STT services. Previously, changing settings at runtime only stored the new values without reconnecting. (PR #3946)
- Exposed an `on_summary_applied` event on `LLMAssistantAggregator`, allowing users to listen for context summarization events without accessing private members. (PR #3947)
- Deepgram Flux STT settings (`keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via `STTUpdateSettingsFrame` without triggering a reconnect. The new values are sent to Deepgram as a Configure WebSocket message on the existing connection. (PR #3953)
- Added a `system_instruction` parameter to `run_inference` across all LLM services, allowing callers to override the system prompt for one-shot inference calls. Used by `_generate_summary` to pass the summarization prompt cleanly. (PR #3968)
Changed
- Audio context management (previously in `AudioContextTTSService`) is now built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`, `asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from `WebsocketTTSService` directly. The word-timestamp baseline is set automatically on the first audio chunk of each context instead of requiring each provider to call `start_word_timestamps()` in its receive loop. (PR #3804)
- The Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of `VirtualCameraDevice` for the default camera output, mirroring how audio already works with `CustomAudioSource`/`CustomAudioTrack`. (PR #3831)
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`. (PR #3848)
- The `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for subclasses to implement error-based switching. `ServiceSwitcher` defaults to `ServiceSwitcherStrategyManual`, and `strategy_type` is now optional. (PR #3861)
- Support for Voice Focus 2.0 models:
  - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
  - Cleaned up unused `ParameterFixedError` exception handling in `AICFilter` parameter setup.
  (PR #3889)
-
max_context_tokensandmax_unsummarized_messagesinLLMAutoContextSummarizationConfig(and deprecatedLLMContextSummarizationConfig) can now be set toNoneindependently to disable that summarization threshold. At least one must remain set.
(PR #3914) -
⚠️ Removedformatted_finalsandword_finalization_max_wait_timefromAssemblyAIConnectionParamsas these were v2 API parameters not supported in v3. Clarified thatformat_turnsonly applies to Universal-Streaming models; U3 Pro has automatic formatting built-in.
(PR #3927) -
Changed
DeepgramTTSServiceto send a Clear message on interruption instead of disconnecting and reconnecting the WebSocket, allowing the connection to persist throughout the session.
(PR #3958) -
Re-added
enhancement_levelsupport toAICFilterwith runtimeFilterEnableFramecontrol, applyingProcessorParameter.BypassandProcessorParameter.EnhancementLeveltogether.
(PR #3961) -
Updated
daily-pythondependency from~=0.23.0to~=0.24.0.
(PR #3970) -
Updated
FishAudioTTSServicedefault model froms1tos2-pro, matching Fish Audio's latest recommended model for improved quality and speed.
(PR #3973) -
AzureSTTServiceregionparameter is now optional whenprivate_endpointis provided. AValueErroris raised if neither is given, and a warning is logged if both are provided (private_endpointtakes priority).
(PR #3974)
Deprecated
- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`. Subclass `WebsocketTTSService` directly instead; audio context management is now part of the base `TTSService`.
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, and `InterruptibleWordTTSService`. Word timestamp logic is now always active in `TTSService` and no longer needs to be opted into via a subclass.
  (PR #3804)
- Deprecated `pipecat.services.google.llm_vertex`, `pipecat.services.google.llm_openai`, and `pipecat.services.google.gemini_live.llm_vertex` modules. Use `pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`, and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import paths still work but will emit a `DeprecationWarning`.
  (PR #3980)
Removed
- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`. Word timestamp logic is now always active. Remove this argument from any custom subclass `super().__init__()` calls.
  (PR #3804)
Fixed
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout.
  (PR #3848)
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter` caused by holding a `memoryview` on the mutable audio buffer across async yield points.
  (PR #3889)
- Fixed TTS context not being appended to the assistant message history when using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
  (PR [#3936](https://githu...
v0.0.104
Added
- Added `TextAggregationMetricsData` metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline.
  (PR #3696)
- Added support for using strongly-typed objects instead of dicts for updating service settings at runtime. Instead of, say:

  ```python
  await task.queue_frame(STTUpdateSettingsFrame(settings={"language": Language.ES}))
  ```

  you'd do:

  ```python
  await task.queue_frame(STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES)))
  ```

  Each service now vends strongly-typed classes like `DeepgramSTTSettings` representing the service's runtime-updatable settings.
  (PR #3714)
- Added support for specifying private endpoints for Azure Speech-to-Text, enabling use in private networks behind firewalls.
  (PR #3764)
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time LemonSlice Avatars to any Daily room.
  (PR #3791)
- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in the Ultravox service to control the initial output medium (text or voice) at call creation time.
  (PR #3806)
- Added `TurnMetricsData` as a generic metrics class for turn detection, with e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData` with `e2e_processing_time_ms` tracking the interval from VAD speech-to-silence transition to turn completion.
  (PR #3809)
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()` callbacks to `AudioContextTTSService`. Subclasses can override these to perform provider-specific cleanup instead of overriding `_handle_interruption()`.
  (PR #3814)
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability, providing message counts before and after context summarization.
  (PR #3855)
- Added `summary_message_template` to `LLMContextSummarizationConfig` for customizing how summaries are formatted when injected into context (e.g., wrapping in XML tags).
  (PR #3855)
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default 120s) to prevent hung LLM calls from permanently blocking future summarizations.
  (PR #3855)
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing summarization to a dedicated LLM service (e.g., a cheaper/faster model) instead of the pipeline's primary model.
  (PR #3855)
- Added AssemblyAI u3-rt-pro model support with built-in turn detection mode.
  (PR #3856)
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization from anywhere in the pipeline (e.g., a function call tool). Accepts an optional `config: LLMContextSummaryConfig` to override summary generation settings per request.
  (PR #3863)
- Added `LLMContextSummaryConfig` (summary generation params: `target_context_tokens`, `min_messages_after_summary`, `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested `summary_config`). These replace the monolithic `LLMContextSummarizationConfig`.
  (PR #3863)
- Added support for the `speed_alpha` parameter to the `arcana` model in `RimeTTSService`.
  (PR #3873)
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen, Tavus) when a client connects. Enables observers to track transport readiness timing.
  (PR #3881)
- Added `StartupTimingObserver` for measuring how long each processor's `start()` method takes during pipeline startup. Also measures transport readiness (the time from `StartFrame` to first client connection) via the `on_transport_timing_report` event.
  (PR #3881)
- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report` event to `StartupTimingObserver` with bot and client connection timing.
  (PR #3881)
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the end of the pipeline.
  (PR #3883)
- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing per-service TTFB, text aggregation, user turn duration, and function call latency metrics for each user-to-bot response cycle.
  (PR #3885)
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver` measuring the time from client connection to first bot speech. An `on_latency_breakdown` is also emitted for this first speech event.
  (PR #3885)
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an `InterruptionFrame` both upstream and downstream directly from the calling processor, avoiding the round trip through the pipeline task that `push_interruption_task_frame_and_wait()` required.
  (PR #3896)
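The `broadcast_interruption()` behavior described above is, in essence, "push one frame in each direction from the calling processor". A stdlib sketch of that shape, with deques standing in for the links to neighboring processors (the classes here are simplified stand-ins, not pipecat's `FrameProcessor`):

```python
# Illustrative sketch: a processor that emits an interruption marker both
# upstream and downstream directly, instead of routing a request through
# the pipeline task first.
from collections import deque
from dataclasses import dataclass


@dataclass
class InterruptionFrame:
    """Stand-in for pipecat's InterruptionFrame."""


class Processor:
    def __init__(self):
        # Stand-ins for the links to the previous/next processors.
        self.pushed_upstream: deque = deque()
        self.pushed_downstream: deque = deque()

    def broadcast_interruption(self) -> None:
        # One frame per direction, pushed immediately from this processor.
        self.pushed_upstream.append(InterruptionFrame())
        self.pushed_downstream.append(InterruptionFrame())
```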
Changed
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode.
  (PR #3696)
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md.
  (PR #3714)
- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead.
  (PR #3786)
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time instead of `UserStoppedSpeakingFrame` timing.
  (PR #3806)
- Aligned `UltravoxRealtimeLLMService` frame handling with the OpenAI/Gemini realtime services: added `InterruptionFrame` handling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities.
  (PR #3806)
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
  (PR #3807)
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to the `KRISP_VIVA_API_KEY` environment variable.
  (PR #3809)
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability.
  (PR #3811)
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use `ServiceSettingsUpdateFrame` directly; you use one of its subclasses: `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`.
  (PR #3819)
- Updated context summarization to use the `user` role instead of `assistant` for summary messages.
  (PR #3855)
- Renamed `AssemblyAISTTService` parameter `min_end_of_turn_silence_when_confident` to `min_turn_silence` (the old name is still supported with a deprecation warning).
  (PR #3856)
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields: `enable_context_summarization` → `enable_auto_context_summarization` and `context_summarization_config` → `auto_context_summarization_config` (now accepts `LLMAutoContextSummarizationConfig`). The old names still work with a `DeprecationWarning` for one release cycle.
  (PR #3863)
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to `True` when using `CommitStrategy.MANUAL`.
  (PR #3865)
- Updated numba version pin from `==` to `>=0.61.2`.
  (PR #3868)
- Updated tracing code to use the `ServiceSettings` dataclass API (`given_fields()`, attribute access) instead of dict-style access (`.items()`, `in`, subscript).
  (PR [...
v0.0.103
Added
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This allows timestamp info to trail audio chunk arrival, resulting in much better first-audio-chunk latency.
  (PR #3625)
- Added model-specific `InputParams` to `RimeTTSService`: arcana params (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param changes now trigger WebSocket reconnection.
  (PR #3642)
- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing transport subclasses to handle custom frame types that flow through the audio queue.
  (PR #3719)
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily transport. These frames queue SIP transfer and SIP REFER operations with audio, so the operation executes only after the bot finishes its current utterance.
  (PR #3719)
- Added keepalive support to `SarvamSTTService` to prevent idle connection timeouts (e.g., when used behind a `ServiceSwitcher`).
  (PR #3730)
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection at runtime by updating the timeout dynamically.
  (PR #3748)
- Added `broadcast_sibling_id` field to the base `Frame` class. This field is automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to the ID of the paired frame pushed in the opposite direction, allowing receivers to identify broadcast pairs.
  (PR #3774)
- Added `ignored_sources` parameter to `RTVIObserverParams` and `add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to suppress RTVI messages from specific pipeline processors (e.g., a silent evaluation LLM).
  (PR #3779)
- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling, and per-turn TTFB metrics.
  (PR #3785)
Changed
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from mistv2-specific values to `None`; only explicitly-set params are sent as query params.
  (PR #3642)
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager` in `aic_filter.py`.
  - Multiple filters using the same model path or `(model_id, model_download_dir)` share one loaded model, with reference counting and concurrent load deduplication.
  - Model file I/O runs off the event loop so the filter does not block.
  (PR #3684)
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for better traceability.
  (PR #3706)
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now queued with audio like other transport frames.
  (PR #3719)
- Bumped the Pillow dependency upper bound from `<12` to `<13` to allow Pillow 12.x.
  (PR #3728)
- Moved the STT keepalive mechanism from `WebsocketSTTService` to the `STTService` base class, allowing any STT service (not just websocket-based ones) to use idle-connection keepalive via the `keepalive_timeout` and `keepalive_interval` parameters.
  (PR #3730)
- Improved audio context management in `AudioContextTTSService` by moving context ID tracking to the base class and adding a `reuse_context_id_within_turn` parameter to control concurrent TTS request handling.
  - Added helper methods: `has_active_audio_context()`, `get_active_audio_context_id()`, `remove_active_audio_context()`, `reset_active_audio_context()`
  - Simplified the Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS implementations by removing duplicate context management code
  (PR #3732)
- `UserIdleController` is now always created with a default timeout of 0 (disabled). The `user_idle_timeout` parameter changed from `Optional[float] = None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and `UserIdleController`.
  (PR #3748)
- Changed the version specifier from `>=0.2.8` to `~=0.2.8` for the `speechmatics-voice` package to ensure compatibility with future patch versions.
  (PR #3761)
- Updated `InworldTTSService` and `InworldHttpTTSService` to use the `ASYNC` timestamp transport strategy by default.
  (PR #3765)
- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`, `stop_ttfb_metrics()`, `start_processing_metrics()`, and `stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`, allowing custom timestamps for metrics measurement. `STTService` now uses these instead of custom TTFB tracking.
  (PR #3776)
- Updated the default Anthropic model from `claude-sonnet-4-5-20250929` to `claude-sonnet-4-6`.
  (PR #3792)
Deprecated
- Deprecated unused `Traceable`, `@traceable`, `@traced`, and `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module will be removed in a future release.
  (PR #3733)
Fixed
- Fixed a race condition where `RTVIObserver` could send messages before `DailyTransport` join completed. Outbound messages are now queued and delivered after the transport is ready.
  (PR #3615)
- Fixed async generator cleanup in OpenAI LLM streaming to prevent `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
  (PR #3698)
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all sample rates, including 8kHz audio.
  (PR #3713)
- Fixed a race condition in `RTVIObserver` where bot output messages could be sent before the bot-started-speaking event.
  (PR #3718)
- Fixed Grok Realtime `session.updated` event parsing failure caused by the API returning prefixed voice names (e.g., `"human_Ara"` instead of `"Ara"`).
  (PR #3720)
- Fixed a context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`, `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and `PlayHTTTSService`. Services now properly reuse the same context ID across multiple `run_tts()` invocations within a single LLM turn, preventing context tracking issues and incorrect lifecycle signaling.
  (PR #3729)
- Fixed a word timestamp interleaving issue in `ElevenLabsTTSService` when processing multiple sentences within a single LLM turn.
  (PR #3729)
- Fixed tracing service decorators executing the wrapped function twice when the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
  (PR #3735)
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame` reaches downstream processors.
  (PR #3737)
- Fixed `UserIdleController` false idle triggers caused by gaps between user and bot activity frames. The idle timer now starts only after `BotStoppedSpeakingFrame` and is suppressed during active user turns and function calls.
  (PR #3744)
- Fixed incorrect `sample_rate` assignment in `TavusInputTransport._on_participant_audio_data` (was using `audio.audio_frames` instead of `audio.sample_rate`).
  (PR #3768)
- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all upstream frames were filtered out to avoid duplicate messages from broadcasted frames. Now only upstream copies of broadcasted frames are skipped.
  (PR #3774)
- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that could cause shared state across instances.
  (PR #3782)
- Fixed `DeepgramSageMakerSTTService` to properly track the finalize lifecycle using `request_finalize()`/`confirm_finalize()` and to use `is_final` (instead of `is_final and speech_final`) for final transcription detection, matching `DeepgramSTTService` behavior.
  (PR #3784)
- Fixed a race condition in `AudioContextTTSService` where the audio context could time out between consecutive TTS requests within the same turn, causing audio to be discarded.
  (PR #3787)
- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the `InterruptionFrame` does not reach the pipeline sink within the timeout. Added a `timeout` keyword argument to customize the wait duration.
  (PR [#3789](https://github.com...
v0.0.102
Added
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.
  (PR #3134)
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.
  (PR #3355)
- Added `append_to_context` parameter to `TTSSpeakFrame` for conditional LLM context addition.
  - Allows fine-grained control over whether text should be added to the conversation context
  - Defaults to `True` to maintain backward compatibility
  (PR #3584)
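The `append_to_context` flag above amounts to a per-frame opt-out of history recording. A stdlib sketch of the behavior, with a simplified frame and aggregator standing in for pipecat's `TTSSpeakFrame` and context aggregators:

```python
# Illustrative sketch: spoken text is only recorded in the conversation
# history when the frame's append_to_context flag is True (the default,
# preserving the old behavior).
from dataclasses import dataclass


@dataclass
class TTSSpeakFrame:
    text: str
    append_to_context: bool = True  # default maintains backward compatibility


class ContextAggregator:
    def __init__(self):
        self.messages: list[dict] = []

    def handle(self, frame: TTSSpeakFrame) -> None:
        if frame.append_to_context:
            self.messages.append({"role": "assistant", "content": frame.text})
```

A filler phrase like "One moment..." can then be spoken without polluting the LLM context by passing `append_to_context=False`.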
- Added a TTS context tracking system with a `context_id` field to trace audio generation through the pipeline.
  - `TTSAudioRawFrame`, `TTSStartedFrame`, and `TTSStoppedFrame` now include `context_id`
  - `AggregatedTextFrame` and `TTSTextFrame` now include `context_id`
  - Enables tracking which TTS request generated specific audio chunks
  (PR #3584)
- Added support for Inworld TTS WebSocket Auto Mode for improved latency.
  (PR #3593)
- Added new frames for context summarization: `LLMContextSummaryRequestFrame` and `LLMContextSummaryResultFrame`.
  (PR #3621)
- Added a context summarization feature to automatically compress conversation history when conversation length limits (by token or message count) are reached, enabling efficient long-running conversations.
  - Configure via `enable_context_summarization=True` in `LLMAssistantAggregatorParams`
  - Customize behavior with `LLMContextSummarizationConfig` (max tokens, thresholds, etc.)
  - Automatically preserves incomplete function call sequences during summarization
  - See the new examples: `examples/foundational/54-context-summarization-openai.py` and `examples/foundational/54a-context-summarization-google.py`
  (PR #3621)
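The auto-summarization trigger described above (and refined in later releases with `max_context_tokens`/`max_unsummarized_messages`, either of which may be disabled) reduces to a simple threshold check. A hedged sketch of the trigger logic only; the config class and function are illustrative stand-ins for pipecat's summarization internals:

```python
# Illustrative trigger: summarize when either the token count or the
# unsummarized-message count crosses its threshold. Setting a threshold
# to None disables it (at least one should remain set).
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarizationThresholds:
    max_context_tokens: Optional[int] = 8000
    max_unsummarized_messages: Optional[int] = 50


def should_summarize(token_count: int, message_count: int,
                     cfg: SummarizationThresholds) -> bool:
    if cfg.max_context_tokens is not None and token_count >= cfg.max_context_tokens:
        return True
    if (cfg.max_unsummarized_messages is not None
            and message_count >= cfg.max_unsummarized_messages):
        return True
    return False
```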
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`).
  (PR #3630)
- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher` to ensure STT services correctly emit `STTMetadataFrame` when switching between services. Only the active service's metadata is propagated downstream, switching services triggers the newly active service to re-emit its metadata, and proper frame ordering is maintained at startup.
  (PR #3637)
- Added `STTMetadataFrame` to broadcast STT service latency information at pipeline start.
  - STT services broadcast P99 time-to-final-segment (`ttfs_p99_latency`) to downstream processors
  - Turn stop strategies automatically configure their STT timeout from this metadata
  - Developers can override `ttfs_p99_latency` via a constructor argument for custom deployments
  - Added measured P99 values for STT providers
  - See stt-benchmark to measure latency for your configuration
  (PR #3637)
- Added support for an `is_sandbox` parameter in `LiveAvatarNewSessionRequest` to enable sandbox mode for HeyGen LiveAvatar sessions.
  (PR #3653)
- Added support for a `video_settings` parameter in `LiveAvatarNewSessionRequest` to configure video encoding (H264/VP8) and quality levels.
  (PR #3653)
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.
  (PR #3656)
- Added `bulbul:v3-beta` TTS model support for Sarvam AI with temperature control and 25 new speaker voices.
  (PR #3671)
- Added `saaras:v3` STT model support for Sarvam AI with a new `mode` parameter (transcribe, translate, verbatim, translit, codemix) and prompt support.
  (PR #3671)
- Added new OpenAI TTS voice options `marin` and `cedar`.
  (PR #3682)
- Added `UserMuteStartedFrame` and `UserMuteStoppedFrame` system frames, and corresponding `user-mute-started`/`user-mute-stopped` RTVI messages, so clients can observe when mute strategies activate or deactivate.
  (PR #3687)
Changed
- Updated all 30+ TTS service implementations to support context tracking with `context_id`.
  - Services now generate and propagate context IDs through TTS frames
  - Enables end-to-end tracing of TTS requests through the pipeline
  (PR #3584)
- ⚠️ `TTSService.run_tts()` now requires a `context_id` parameter for context tracking. Custom TTS service implementations must update their `run_tts()` signature.
  - Before: `async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:`
  - After: `async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:`
  (PR #3584)
- Simplified context aggregators to use the `frame.append_to_context` flag instead of tracking internal state.
  - Cleaner logic in `LLMResponseAggregator` and `LLMResponseUniversalAggregator`
  - More consistent behavior across aggregator implementations
  (PR #3584)
- Updated timestamps to be cumulative within an agent turn, using the flushCompleted message as an indication of when timestamps from the server are reset to 0.
  (PR #3593)
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine.
  (PR #3612)
- Improved user turn stop timing in `TranscriptionUserTurnStopStrategy` and `TurnAnalyzerUserTurnStopStrategy`.
  - The timeout now starts on `VADUserStoppedSpeakingFrame` for tighter, more predictable timing
  - Added support for finalized transcripts (`TranscriptionFrame.finalized=True`) to trigger earlier
  - Added a fallback timeout for edge cases where transcripts arrive without VAD events
  - Removed `InterimTranscriptionFrame` handling (it no longer affects timing)
  (PR #3637)
- Improved the accuracy of `UserBotLatencyObserver` and `UserBotLatencyLogObserver` by measuring from the time when the user actually starts speaking.
  (PR #3637)
- ⚠️ Renamed the `timeout` parameter to `user_speech_timeout` in `TranscriptionUserTurnStopStrategy`.
  (PR #3637)
- Updated `VADUserStartedSpeakingFrame` to include `start_secs` and `timestamp`, and `VADUserStoppedSpeakingFrame` to include `stop_secs` and `timestamp`, removing the need to separately handle `SpeechControlParamsFrame` for VADParams values.
  (PR #3637)
- ⚠️ Renamed `TranscriptionUserTurnStopStrategy` to `SpeechTimeoutUserTurnStopStrategy`. The old name is deprecated and will be removed in a future release.
  (PR #3637)
- `AssemblyAISTTService` now automatically configures optimal settings for manual turn detection when `vad_force_turn_endpoint=True`. This sets `end_of_turn_confidence_threshold=1.0` and `max_turn_silence=2000` by default, which disables model-based turn detection and reduces latency by relying on external VAD for turn endpoints. Warnings are logged if conflicting settings are detected.
  (PR #3644)
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.
  (PR #3652)
- Changed the default session mode from "CUSTOM" to "LITE" in the HeyGen LiveAvatar integration, with VP8 as the default video encoding.
  (PR #3653)
- ⚠️ The default `VADParams` `stop_secs` value is changing from `0.8` seconds to `0.2` seconds. This change both simplifies the developer experience and improves the performance of STT services. With a shorter `stop_secs` value, STT services using a local VAD can finalize sooner, resulting in faster transcription.
  - `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for additional user speech using `user_speech_timeout` (default: 0.6 sec).
  - `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts the user wait time based on the audio input.
  (PR #3659)
- Moved the interruption wait event from per-processor instance state to the `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_...
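The completion handshake described in that last entry can be sketched with an `asyncio.Event` carried on the frame: whoever consumes the frame (normally the pipeline sink) calls `complete()`, and the pusher awaits it with a timeout. This is a simplified stand-in for pipecat's frame and processor types, not the actual implementation:

```python
# Illustrative sketch: the frame owns its completion event, so any
# processor that swallows it early can still unblock the pusher by
# calling complete().
import asyncio


class InterruptionFrame:
    def __init__(self):
        self._done = asyncio.Event()

    def complete(self) -> None:
        self._done.set()

    async def wait(self, timeout: float) -> bool:
        """Return True if complete() was called within `timeout` seconds."""
        try:
            await asyncio.wait_for(self._done.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```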
v0.0.101
Added
- Additions for `AICFilter` and `AICVADAnalyzer`:
  - Added model downloading support to `AICFilter` with `model_id` and `model_download_dir` parameters.
  - Added a `model_path` parameter to `AICFilter` for loading local `.aicmodel` files.
  - Added unit tests for `AICFilter` and `AICVADAnalyzer`.
  (PR #3408)
- Added handling for the `server_content.interrupted` signal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline, e.g., local VAD + context aggregators. When there is already turn tracking in the pipeline, the additional interruption does no harm.
  (PR #3429)
- Added a new `GenesysFrameSerializer` for the Genesys AudioHook WebSocket protocol, enabling bidirectional audio streaming between Pipecat pipelines and Genesys Cloud contact centers.
  (PR #3500)
- Added `reached_upstream_types` and `reached_downstream_types` read-only properties to `PipelineTask` for inspecting current frame filters.
  (PR #3510)
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types.
  (PR #3510)
- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and filter incomplete user turns. When enabled via `filter_incomplete_user_turns` in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete long). Incomplete turns are suppressed, and configurable timeouts automatically re-prompt the user.
  (PR #3518)
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a frame instance by extracting its fields and creating new instances for each direction.
  (PR #3519)
- `PipelineTask` now automatically adds `RTVIProcessor` and registers `RTVIObserver` when `enable_rtvi=True` (the default), simplifying pipeline setup.
  (PR #3519)
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers.
  (PR #3519)
- Added `video_out_codec` parameter to `TransportParams` allowing configuration of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video output in `DailyTransport`.
  (PR #3520)
- Added `location` parameter to the Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.
  (PR #3523)
- Added a new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes Smart Turn input data to be saved to disk.
  (PR #3525)
- Added `result_callback` parameter to `UserImageRequestFrame` to support deferred function call results.
  (PR #3571)
- Added `function_call_timeout_secs` parameter to `LLMService` to configure the timeout for deferred function calls (defaults to 10.0 seconds).
  (PR #3571)
- Added `vad_analyzer` parameter to `LLMUserAggregatorParams`. VAD analysis is now handled inside the `LLMUserAggregator` rather than in the transport, keeping voice activity detection closer to where it is consumed. The `vad_analyzer` on `BaseInputTransport` is now deprecated.

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          vad_analyzer=SileroVADAnalyzer(),
      ),
  )
  ```

  (PR #3583)
- Added `VADProcessor` for detecting speech in audio streams within a pipeline. Pushes `VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`, and `UserSpeakingFrame` downstream based on VAD state changes.
  (PR #3583)
- Added `VADController` for managing voice activity detection state and emitting speech events independently of transport or pipeline processors.
  (PR #3583)
- Added a local `PiperTTSService` for offline text-to-speech using Piper voice models. The existing HTTP-based service has been renamed to `PiperHttpTTSService`.
  (PR #3585)
- `main()` in `pipecat.runner.run` now accepts an optional `argparse.ArgumentParser`, allowing bots to define custom CLI arguments accessible via `runner_args.cli_args`.
  (PR #3590)
- Added `KokoroTTSService` for local text-to-speech synthesis using the Kokoro-82M model.
  (PR #3595)
Changed
- Updated `AICFilter` and `AICVADAnalyzer` to use `aic-sdk ~= 2.0.1`.
  (PR #3408)
- Improved the STT TTFB (Time To First Byte) measurement, reporting the delay between when the user stops speaking and when the final transcription is received. Note: unlike traditional TTFB, which measures from a discrete request, STT services receive continuous audio input, so we measure from speech end to final transcript, which captures the latency that matters for voice AI applications. In support of this change, added a `finalized` field to `TranscriptionFrame` to indicate when a transcript is the final result for an utterance.
  (PR #3495)
- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to `None` (omitted from connection parameters), improving latency by ~300ms compared to the previous defaults.
  (PR #3495)
- Changed frame filter storage from tuples to sets in `PipelineTask`.
  (PR #3510)
- Changed the default Inworld TTS model from `inworld-tts-1` to `inworld-tts-1.5-max`.
  (PR #3531)
- `FrameSerializer` now subclasses from `BaseObject` to enable event support.
  (PR #3560)
- Added support for TTFS in `SpeechmaticsSTTService` and set the default mode to `EXTERNAL` to support Pipecat-controlled VAD.
  - Changed dependency to `speechmatics-voice[smart]>=0.2.8`
  (PR #3562)
- ⚠️ Changed function call handling to use timeout-based completion instead of immediate callback execution.
  - Function calls that defer their results (e.g., `UserImageRequestFrame`) now use a timeout mechanism
  - The `result_callback` is invoked automatically when the deferred operation completes or after the timeout
  - This change affects examples using `UserImageRequestFrame`: the `result_callback` should now be passed to the frame instead of being called immediately
  (PR #3571)
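The timeout-based completion above can be sketched with `asyncio.wait_for`: the result callback fires when the deferred operation resolves, or with a fallback value once the timeout (cf. `function_call_timeout_secs`, default 10.0s) elapses. An illustrative stand-in for the mechanism, not pipecat's implementation; the error payload shape is invented for the example:

```python
# Illustrative deferred-call runner: await the operation up to the
# timeout, then invoke the result callback exactly once either way.
import asyncio


async def run_deferred_call(operation, result_callback, timeout_secs: float = 10.0):
    try:
        result = await asyncio.wait_for(operation(), timeout=timeout_secs)
    except asyncio.TimeoutError:
        result = {"error": "function call timed out"}  # hypothetical payload
    await result_callback(result)
```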
- The Pipecat runner now uses `DAILY_ROOM_URL` instead of `DAILY_SAMPLE_ROOM_URL`.
  (PR #3582)
- Updates to `GradiumSTTService`:
  - Now flushes pending transcriptions when VAD detects the user stopped speaking, improving response latency.
  - `GradiumSTTService` now supports `InputParams` for configuring `language` and `delay_in_frames` settings.
  (PR #3587)
Deprecated
- ⚠️ Deprecated the `vad_analyzer` parameter on `BaseInputTransport`. Pass `vad_analyzer` to `LLMUserAggregatorParams` instead, or use `VADProcessor` in the pipeline.
  (PR #3583)
Removed
- Removed deprecated `AICFilter` parameters: `enhancement_level`, `voice_gain`, `noise_gate_enable`.
  (PR #3408)
Fixed
- Fixed an issue where, if you were using `OpenRouterLLMService` with a Gemini model, it wouldn't handle multiple `"system"` messages as expected (and as we do in `GoogleLLMService`), which is to convert subsequent ones into `"user"` messages. Instead, the latest `"system"` message would overwrite the previous ones.
  (PR #3406)
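The conversion behavior described above (keep the first `"system"` message, demote later ones to `"user"`) can be sketched as a standalone helper. This is an illustration of the rule, not pipecat's actual code:

```python
def normalize_system_messages(messages):
    """Keep the first "system" message as-is; convert any subsequent
    "system" messages to "user" messages (illustrative sketch only)."""
    seen_system = False
    normalized = []
    for msg in messages:
        if msg.get("role") == "system":
            if seen_system:
                # Later system messages become user messages instead of
                # overwriting the original system instruction.
                msg = {**msg, "role": "user"}
            seen_system = True
        normalized.append(msg)
    return normalized
```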
- Transports now properly broadcast `InputTransportMessageFrame` frames both upstream and downstream instead of only pushing downstream.
  (PR #3519)
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances.
  (PR #3519)
- Fixed OpenAI LLM services to emit `ErrorFrame` on completion timeout, enabling proper error handling and `LLMSwitcher` failover.
  (PR #3529)
- Fixed a logging issue where non-ASCII characters (e.g., Japanese, Chinese, etc.) were being unnecessarily escaped to Unicode sequences when a function call occurred.
  (PR #3536)
- Fixed how audio tracks are synchronized inside the `AudioBufferProcessor` to fix timing issues where silence and audio were misaligned between user and bot buffers.
  (PR #3541)
- Fixed a race condition in `OpenAIRealtimeBetaLLMService` that could cause an error when truncating the conversation.

...
v0.0.100
Added
- Added Hathora service to support Hathora-hosted TTS and STT models (non-streaming only).
  (PR #3169)
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models (`mars-flash`, `mars-pro`, `mars-instruct`) for high-quality text-to-speech synthesis.
  (PR #3349)
- Added the `additional_headers` param to `WebsocketClientParams`, allowing `WebsocketClientTransport` to send custom headers on connect, for cases such as authentication.
  (PR #3461)
- Added `UserIdleController` for detecting user idle state, integrated into `LLMUserAggregator` and `UserTurnProcessor` via the optional `user_idle_timeout` parameter. Emits an `on_user_turn_idle` event for application-level handling. Deprecated `UserIdleProcessor` in favor of the new compositional approach.
  (PR #3482)
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to `LLMUserAggregator` for tracking user mute state changes.
  (PR #3490)
Changed
- Enhanced interruption handling in `AsyncAITTSService` by supporting multi-context WebSocket sessions for more robust context management.
  (PR #3287)
- Throttled `UserSpeakingFrame` to broadcast at most every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech.
  (PR #3483)
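The 200ms throttle described above boils down to a standard rate-limiting pattern: emit only when enough time has passed since the last emission. A generic sketch (not pipecat's implementation):

```python
import time

class FrameThrottle:
    """Allow an emission at most once per interval (illustrative sketch)."""

    def __init__(self, interval_s=0.2):
        self._interval = interval_s
        self._last_emit = float("-inf")

    def should_emit(self, now=None):
        # Returns True and records the emission time if the interval has
        # elapsed since the last emission; otherwise drops the event.
        now = time.monotonic() if now is None else now
        if now - self._last_emit >= self._interval:
            self._last_emit = now
            return True
        return False
```

A processor would check `should_emit()` on every audio chunk and push a `UserSpeakingFrame` only when it returns `True`.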
Deprecated
- For consistency with other package names, deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of `pipecat.turns.user_mute`.
  (PR #3479)
Fixed
- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
  (PR #3287)
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
  - `AWSNovaSonicLLMService`
  - `GeminiLiveLLMService`
  - `OpenAIRealtimeLLMService`
  - `GrokRealtimeLLMService`

  The issue was that these services weren't pushing `LLMTextFrame`s. Now they do.
  (PR #3446)
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is talking when using `ExternalUserTurnStrategies`.
  (PR #3454)
- Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior.
  (PR #3455)
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them.
  (PR #3462)
- Fixed an issue where Grok Realtime would error out when running with the SmallWebRTC transport.
  (PR #3480)
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was causing an error. See https://docs.mem0.ai/platform/features/async-mode-default-change.
  (PR #3484)
- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously error out. Now it successfully reconnects and "rehydrates" from the context object.
  (PR #3486)
- Fixed `AzureTTSService` transcript formatting issues:
  - Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
  - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters
  (PR #3489)
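Both formatting fixes above amount to a token-joining rule: no space before punctuation, and no space between adjacent CJK characters. A self-contained sketch of that rule (not `AzureTTSService`'s actual code):

```python
import unicodedata

def smart_join(tokens):
    """Join transcript tokens, omitting the space before punctuation and
    between CJK characters (illustrative sketch only)."""

    def is_cjk(ch):
        # Unicode character names for Chinese/Japanese/Korean characters
        # start with these prefixes.
        name = unicodedata.name(ch, "")
        return name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL"))

    out = ""
    for token in tokens:
        if not out or not token:
            out += token
            continue
        no_space = token[0] in "!?,.;:)]}" or (is_cjk(out[-1]) and is_cjk(token[0]))
        out += token if no_space else " " + token
    return out
```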
- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in some cases.
  (PR #3494)
- Fixed a memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
  (PR #3499)
- Fixed an issue in `AIService` where unhandled exceptions in `start()`, `stop()`, or `cancel()` implementations would prevent `process_frame()` from continuing, and therefore `StartFrame`, `EndFrame`, or `CancelFrame` from being pushed downstream, causing the pipeline to not start or stop properly.
  (PR #3503)
- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from the constructor to `start()` for better error handling.
  (PR #3504)
- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
  (PR #3509)
- Optimized `NVIDIASTTService` by removing an unnecessary queue and task.
  (PR #3509)
- Fixed a `CambTTSService` issue where the client was being initialized in the constructor, which prevented proper pipeline error handling.
  (PR #3511)