Changelog
All notable changes to Pipecat will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.2.1] - 2026-05-15
Changed
- Changed the default WebSocket endpoints for GradiumSTTService and GradiumTTSService to the region-neutral wss://api.gradium.ai/api/speech/asr and wss://api.gradium.ai/api/speech/tts. Gradium now automatically routes traffic to the nearest endpoint. Override the url to pin to a specific region. (PR #4500)
Fixed
- Fixed bot hangs when filter_incomplete_user_turns was enabled and the LLM responded by calling a tool. The user turn never finalized, so the assistant aggregator gated the tool-result context push and the LLM continuation never ran. Tool calls now finalize the turn the moment they start, before the function dispatches. (PR #4501)
[1.2.0] - 2026-05-14
Added
- Added a session_id field to RunnerArguments so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a sessionId to the caller (Daily /start, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC /api/offer endpoint also accepts an optional session_id query parameter so the /sessions/{session_id}/... proxy can thread it through. (PR #4385)
- Added a max_buffer_delay_ms constructor argument to CartesiaTTSService for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on text_aggregation_mode: 0 in SENTENCE mode (custom buffering, which avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in TOKEN mode (Cartesia's managed buffering applies). Pass an explicit value (0 to 5000 ms) to override. (PR #4390)
- Added a mip_opt_out constructor argument to DeepgramTTSService and DeepgramHttpTTSService so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to None, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling. (PR #4400)
- Added an opt-in add_tool_change_messages flag to the LLM aggregators (set via LLMContextAggregatorPair(..., add_tool_change_messages=True)) that appends a developer-role message to the context whenever LLMSetToolsFrame changes the set of advertised standard tools. Helps the LLM stay coherent across mid-conversation tool changes, mitigating several flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, and hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable. (PR #4404)
- Added deferred(strategy) and DeferredUserTurnStopStrategy in pipecat.turns.user_stop. Wraps a stop strategy so it fires only the inference-triggered event and suppresses on_user_turn_stopped, leaving finalization to another strategy in the chain such as LLMTurnCompletionUserTurnStopStrategy. (PR #4405)
- Added ExternalUserTurnCompletionStopStrategy in pipecat.turns.user_stop, a generic stop strategy that finalizes the user turn whenever a UserTurnInferenceCompletedFrame arrives, regardless of which component produced it. LLMTurnCompletionUserTurnStopStrategy now extends this base; future producers (Flux, custom end-of-turn classifiers, etc.) can use the base directly or subclass it to add producer-specific setup. (PR #4405)
- Added on_user_turn_inference_triggered, a new event on the user turn controller, processor, aggregator and stop strategies that fires when a strategy has enough signal to start LLM inference. By default it fires together with on_user_turn_stopped; a gating strategy can fire only the inference-triggered event and defer finalization to a peer. (PR #4405)
- Added FilterIncompleteUserTurnStrategies in pipecat.turns.user_turn_strategies, a UserTurnStrategies specialization that wraps the detector chain with deferred(...) and appends LLMTurnCompletionUserTurnStopStrategy as the finalizer. Common case: user_turn_strategies=FilterIncompleteUserTurnStrategies(). Pass config=UserTurnCompletionConfig(...) to customize timeouts and prompts. (PR #4405)
- Added LLMTurnCompletionUserTurnStopStrategy in pipecat.turns.user_stop. When installed, the strategy gates on_user_turn_stopped on a UserTurnInferenceCompletedFrame (a new fieldless system frame emitted by any component that can judge turn completeness, e.g. the UserTurnCompletionLLMServiceMixin on ✓). A finalization_timeout provides a safety net if no completion frame ever arrives. (PR #4405)
- Added first-class RTVI support for the UI Agent Protocol:
  - Adds ui-event, ui-snapshot, and ui-cancel-task client-to-server messages, plus ui-command and ui-task server-to-client messages, with paired *Data/*Message pydantic models.
  - Adds built-in command payload models for Toast, Navigate, ScrollTo, Highlight, Focus, Click, SetInputValue, and SelectText; matching default handlers live in @pipecat-ai/client-react.
  - Adds RTVIProcessor.on_ui_message for inbound ui-event, ui-snapshot, and ui-cancel-task messages.
  - Adds five UI pipeline frames, mirroring the client-message frame-and-event pattern: downstream code pushes RTVIUICommandFrame/RTVIUITaskFrame for the observer to wrap into outbound UICommandMessage/UITaskMessage envelopes, while the processor pushes inbound RTVIUIEventFrame, RTVIUISnapshotFrame, and RTVIUICancelTaskFrame alongside on_ui_message.
  - Bumps the RTVI PROTOCOL_VERSION from 1.2.0 to 1.3.0. (PR #4407)
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor now resolve credentials via the standard boto3 provider chain (EC2 instance profiles, EKS pod roles / IRSA, ECS task roles, SSO, ~/.aws/credentials) when explicit credentials and AWS_* environment variables are absent. Services running with IAM roles no longer need to export static credentials. (PR #4416)
- Added keyterms support to ElevenLabs STT services so Scribe V2 callers can bias transcription for both file-based and realtime transcription. (PR #4426)
- Added a watchdog_min_timeout parameter to DeepgramFluxSTT and DeepgramFluxSageMakerSTT (default 0.5 seconds) to control the minimum silence duration before the watchdog sends a silence packet to prevent dangling turns. The actual threshold is max(chunk_duration * 2, watchdog_min_timeout), so it also adapts automatically to the audio chunk size in use. (PR #4430)
- Added cancel_on_interruption=False support for GeminiLiveLLMService on models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini 2.x); the conversation now continues while the tool runs. On models that don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time warning explaining the limitation. (Note: an intermittent 1008 error can occasionally fire on Gemini 2.5 during long-running tool calls; we auto-reconnect.) (PR #4448)
- Added NvidiaSageMakerWebsocketSTTService for streaming speech recognition using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint. Produces InterimTranscriptionFrame and TranscriptionFrame frames, is VAD-aware, and automatically reconnects on error. (PR #4464)
- Added NVIDIA Magpie TTS services via AWS SageMaker: NvidiaSageMakerHTTPTTSService (single HTTP invocation, streams raw PCM back) and NvidiaSageMakerWebsocketTTSService (persistent HTTP/2 bidi-stream with full interruption support via InterruptibleTTSService). (PR #4464)
- Added support for reasoning configuration on OpenAIRealtimeLLMService, for use with reasoning-capable Realtime models such as gpt-realtime-2. (PR #4470)
- Inworld TTS updates:
  - Added a delivery_mode setting (STABLE/BALANCED/CREATIVE) to InworldTTSService and InworldHttpTTSService, enabling the stability-vs-creativity tradeoff in inworld-tts-2.
  - Added language support to InworldTTSService and InworldHttpTTSService. The language setting is now forwarded to the API, and a new language_to_inworld_language() helper normalizes Pipecat Language enums to Inworld's BCP-47 locale tags. (PR #4473)
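The default selection described above for max_buffer_delay_ms can be sketched in a few lines. This is only an illustration of the documented rule, not Pipecat's implementation: the TextAggregationMode enum and the resolve_max_buffer_delay_ms helper are hypothetical stand-ins, and raising ValueError for out-of-range values is an assumption.

```python
from enum import Enum
from typing import Optional


class TextAggregationMode(Enum):
    """Hypothetical stand-in for Pipecat's text aggregation modes."""

    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_max_buffer_delay_ms(
    mode: TextAggregationMode, explicit: Optional[int] = None
) -> Optional[int]:
    """Return the value to send to Cartesia, or None to leave the field unset."""
    if explicit is not None:
        # Assumption: values outside the documented 0 to 5000 ms range are rejected.
        if not 0 <= explicit <= 5000:
            raise ValueError("max_buffer_delay_ms must be between 0 and 5000")
        return explicit
    if mode is TextAggregationMode.SENTENCE:
        # Sentences are already aggregated client-side, so disable server buffering.
        return 0
    # TOKEN mode: leave the field unset so Cartesia's managed buffering applies.
    return None
```

With no explicit value, SENTENCE mode resolves to 0 and TOKEN mode resolves to None (field omitted); an explicit value always wins.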
Changed
- Updated the default SonioxTTSService model from tts-rt-v1-preview to the generally available tts-rt-v1. (PR #4386)
- Default cartesia_version for CartesiaTTSService bumped from 2025-04-16 to 2026-03-01, matching CartesiaHttpTTSService and unlocking the use_normalized_timestamps and max_buffer_delay_ms fields. (PR #4390)
- ⚠️ CartesiaTTSService now sends use_normalized_timestamps: true instead of the deprecated use_original_timestamps field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for sonic-3 users, who were previously receiving timestamps tied to the input transcript. (PR #4390)
- Broadened tool_resources to app_resources for easy access not just in tool handlers but in other places like custom FrameProcessors. Three changes: a rename (tool_resources → app_resources), a new app_resources property on PipelineTask, and a new pipeline_task property on FrameProcessor. Tool handlers now read params.app_resources; custom processors read self.pipeline_task.app_resources. The previous tool_resources aliases (on PipelineTask, FunctionCallParams, and FrameProcessorSetup) keep working but are deprecated as of 1.2.0 and emit DeprecationWarnings. (PR #4395)
- Lowered the per-message log in SmallWebRTCInputTransport._handle_app_message from debug to trace. App messages can be high-frequency and were noisy at debug level; set the loguru level to TRACE to see them again. (PR #4397)
- Changed the default model for GrokRealtimeLLMService to grok-voice-think-fast-1.0, xAI's recommended Voice Agent model. The previous default of grok-voice-fast-1.0 has been deprecated by xAI and is being removed. (PR #4401)
- Changed the default Inworld TTS model from inworld-tts-1.5-max to inworld-tts-2 (Realtime TTS-2) across InworldHttpTTSService, InworldTTSService, and the InworldRealtimeLLMService cascade. Existing users can pin the prior model explicitly via the model/tts_model argument; both inworld-tts-1.5-max and inworld-tts-1.5-mini remain valid model IDs. (PR #4422)
- Changed the default model for GrokLLMService from grok-3 to grok-4.20-non-reasoning. xAI is retiring grok-3 on May 15, 2026. (PR #4429)
- DeepgramFluxSTT watchdog silence threshold is now dynamic: max(chunk_duration * 2, watchdog_min_timeout) instead of a fixed 500 ms. This prevents false silence injections when large audio chunks are sent at lower frequency. (PR #4430)
- ElevenLabsTTSService now sends close_context to the server as soon as the turn is complete (on on_turn_context_completed) rather than waiting until all audio has finished playing back. The isFinal message from ElevenLabs is now used to signal TTSStoppedFrame and clean up the audio context, improving turn transition timing. (PR #4433)
- Updated InworldHttpTTSService and InworldTTSService to use PCM audio encoding by default, which returns audio bytes without headers. (PR #4446)
- Moved create_task, cancel_task, the task_manager property, and setup(task_manager) up from FrameProcessor to BaseObject. Custom BaseObject subclasses (turn strategies, controllers, etc.) now inherit these methods directly instead of reimplementing the task manager wiring. Owners propagate the task manager to their child BaseObjects via await child.setup(task_manager). (PR #4449)
- Changed the default OpenAI Realtime input audio transcription model from gpt-4o-transcribe to gpt-realtime-whisper for both OpenAIRealtimeSTTService and OpenAIRealtimeLLMService. The new model does not accept the prompt parameter; if a prompt is supplied alongside gpt-realtime-whisper, it is dropped automatically and a warning is logged. To keep using prompt hints, explicitly pin model="gpt-4o-transcribe" (or "gpt-4o-mini-transcribe"). (PR #4450)
- Updated the default model for CartesiaTTSService and CartesiaHttpTTSService from sonic-3 to sonic-3.5. (PR #4462)
- Changed the default model for OpenAIRealtimeLLMService from gpt-realtime-1.5 to gpt-realtime-2. (PR #4472)
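To make the dynamic DeepgramFluxSTT watchdog threshold above concrete, here is a minimal sketch of the stated formula. The function name is hypothetical and both values are assumed to be in seconds; it only evaluates the expression quoted in the entry.

```python
def watchdog_silence_threshold(
    chunk_duration: float, watchdog_min_timeout: float = 0.5
) -> float:
    """Seconds of silence tolerated before the watchdog sends a silence packet."""
    # Two chunk durations, but never less than the configured minimum.
    return max(chunk_duration * 2, watchdog_min_timeout)


# Small 20 ms chunks: the 0.5 s minimum dominates.
small_chunks = watchdog_silence_threshold(0.02)
# Large 400 ms chunks: the threshold grows to two chunk durations.
large_chunks = watchdog_silence_threshold(0.4)
```

With the former fixed 500 ms threshold, 400 ms chunks would leave only 100 ms of slack between packets, which is the false-injection case the entry describes.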
Deprecated
- Deprecated LLMUserAggregatorParams.filter_incomplete_user_turns. Use user_turn_strategies=FilterIncompleteUserTurnStrategies() (or add LLMTurnCompletionUserTurnStopStrategy to a custom user_turn_strategies.stop) instead. Setting the legacy flag still works for one release: the aggregator emits a DeprecationWarning and rewires the strategies as if you had passed FilterIncompleteUserTurnStrategies directly. (PR #4405)
- Deprecated ResampyResampler in favor of SOXRAudioResampler (or the create_file_resampler()/create_stream_resampler() factories). Instantiating ResampyResampler now emits a DeprecationWarning. The class will be removed in Pipecat 2.0 along with the default resampy and numba dependencies. (PR #4428)
Fixed
- Fixed CartesiaTTSService surfacing flush_done messages from Cartesia as ErrorFrames. The latest API emits a flush_done per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own context_id. (PR #4390)
- Fixed Cartesia tag helpers (SPELL, EMOTION_TAG, PAUSE_TAG, VOLUME_TAG, SPEED_TAG) raising TypeError when called on an instance (e.g. tts.SPELL("hi")). They're now @staticmethod and callable from both the class and an instance. (PR #4390)
- Fixed CartesiaHttpTTSService pushing two ErrorFrames on a non-200 response: one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly. (PR #4390)
- Fixed an issue where LocalSmartTurnAnalyzerV3 was imported unconditionally for user turn stop strategies. It is now only imported when default_user_turn_stop_strategies() is called. This improves startup time and removes the transformers "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used. (PR #4393)
- Fixed GrokRealtimeLLMService ignoring the configured model. The model was stored in Settings but never sent to xAI, so every session silently fell back to xAI's server-side default. The model is now passed via the ?model= query parameter on the WebSocket URL as xAI's Voice Agent API requires. (PR #4401)
- Fixed on_user_turn_stopped firing prematurely when filter_incomplete_user_turns was enabled. The event now fires only after the LLM confirms the user turn is complete (✓); previously the smart-turn detector's tentative stop was bubbling up before the LLM had a chance to veto it, causing observers, transcript appenders and UI indicators to receive an early, and sometimes duplicated, signal. (PR #4405)
- Fixed TTSSpeakFrame(append_to_context=True) greetings sometimes splitting across two assistant messages in the LLM context and not surfacing in on_assistant_turn_stopped. The LLMAssistantPushAggregationFrame emitted at the end of a TTS context now carries a PTS just past the last word so it can't overtake clock-queued TTSTextFrames in the transport's output, and LLMAssistantAggregator now triggers on_assistant_turn_started/on_assistant_turn_stopped when it receives the frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting transcripts). (PR #4414)
- Fixed ElevenLabsTTSService and ElevenLabsHttpTTSService producing merged words (e.g. bookLook) when using Flash models. Flash often splits sentences mid-stream into alignment chunks that begin with a real inter-word space, but the previous fix unconditionally stripped that space from every chunk. Leading spaces are now stripped only on the first alignment chunk of an utterance, so subsequent chunks correctly flush partial words across boundaries. (PR #4415)
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor erroring out when only one of AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY was set in the environment. The half-populated kwargs are no longer forwarded to aioboto3; partial env-var configurations now fall through to the boto3 credential chain like fully-unset configurations do. (PR #4416)
- Fixed ElevenLabsTTSService and ElevenLabsHttpTTSService writing romanized/normalized text to the LLM context. With non-Latin input (e.g., Chinese), the assistant transcript was getting populated with pinyin (Ni Hao ! instead of 你好!), which then degraded subsequent LLM turns. The services now consume alignment by default and only switch to normalizedAlignment/normalized_alignment when pronunciation_dictionary_locators is configured (where alignment has overlapping restarts that produce duplicated/garbled words, per #4316). Both fields are read with preferred-with-fallback semantics since each is nullable per the API schema. (PR #4424)
- Fixed a deadlock in TTSService that could permanently stall pipeline processing when all three conditions occurred together: pause_frame_processing=True, an interruption arrived before any TTS audio was played, and an UninterruptibleFrame (e.g. TTSUpdateSettingsFrame, FunctionCallResultFrame) was in the processing queue at that moment. The process task would block on __process_event.wait() indefinitely because BotStoppedSpeakingFrame never arrives (no audio was played) and the interruption handler did not resume processing. Affects services using pause_frame_processing=True such as ElevenLabs, Rime, AsyncAI, Gradium, and ResembleAI. (PR #4431)
- Fixed interruptions being delayed when a slow non-uninterruptible frame was processing and an uninterruptible frame was waiting in the queue. The bot would stall until the slow frame finished instead of cancelling it immediately on interruption. (PR #4434)
- Fixed TTSService dropping uninterruptible frames (e.g. FunctionCallResultFrame) from its internal serialization queue when an interruption occurs. Previously, the queue was recreated on every interruption, silently discarding any queued frames. The queue is now reset instead of recreated, preserving uninterruptible frames so they are always delivered downstream. (PR #4435)
- Fixed a race condition in the Daily transport that caused AttributeError: 'NoneType' object has no attribute 'send_app_message' when tearing down a pipeline. Both DailyInputTransport and DailyOutputTransport share the same DailyTransportClient and both call cleanup(), which was releasing the underlying CallClient on the first call, leaving the second caller with a None client. (PR #4440)
- Restored cancel_on_interruption=False support for AWSNovaSonicLLMService and OpenAIRealtimeLLMService. These services previously honored the flag by simply not cancelling in-flight function calls on interruption; the introduction of the new async-tool mechanism (which threads started/intermediate/final messages through the LLM context) broke that path because the realtime services didn't know how to interpret those messages. Note that new-style streamed intermediate results (FunctionCallResultProperties(is_final=False)) are not supported on these realtime services. Similar fixes for other impacted realtime services are forthcoming. (PR #4441)
- Fixed two misspelled Gemini TTS voice names in GeminiTTSService.AVAILABLE_VOICES. (PR #4443)
- Extended the cancel_on_interruption=False regression fix to GrokRealtimeLLMService, AzureRealtimeLLMService, and UltravoxRealtimeLLMService. Grok and Azure use the same approach as in #4441 (each service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel; Azure inherits transitively from OpenAIRealtimeLLMService). Ultravox needed a different approach because its API freezes the conversation between client_tool_invocation and the matching client_tool_result: for async-registered functions it now ships a placeholder client_tool_result immediately when the function is invoked (to unfreeze the conversation), then injects the real result as user-side text once the tool finishes. Streamed intermediate results (FunctionCallResultProperties(is_final=False)) are still not supported on any of these realtime services. GeminiLiveLLMService and InworldRealtimeLLMService are excluded for now: Gemini Live's async-tool path needs deeper investigation, and Inworld tool calling needs to be sorted out first. (PR #4447)
- Fixed OpenAIRealtimeLLMService handling of multi-output-item responses (observed with gpt-realtime-2). A single response can now contain more than one audio item, and the first item's audio.done may arrive after the second item's deltas have started. Deltas still arrive strictly in playback order, so we continue to forward them as received (matching OpenAI's reference implementation). The fix removes spurious warnings, ensures truncation always targets the latest audio item, and emits a single bracketing TTSStartedFrame/TTSStoppedFrame pair per assistant turn (the Stopped is now pushed on response.done). (PR #4465)
- Fixed missing output attribute on LLM OpenTelemetry spans when the LLM call is interrupted mid-stream. (PR #4467)
- Fixed incorrect metrics.ttfb on STT OpenTelemetry spans, and parented them to the current turn span. (PR #4467)
- Fixed incorrect metrics.ttfb on TTS OpenTelemetry spans for streaming services. (PR #4467)
- Extended the cancel_on_interruption=False regression fix to InworldRealtimeLLMService. Uses the same approach as in #4441 (the service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel). Note: as of this writing, Inworld Realtime doesn't appear to handle the resulting delayed tool result reliably; the routing is best-effort and the service surfaces a one-time warning when async-tool messages are seen. Streamed intermediate results (FunctionCallResultProperties(is_final=False)) are still not supported on this realtime service. (Inworld was excluded from #4447 pending resolution of an unrelated tool-calling issue, which turned out to be an account-level matter.) (PR #4474)
- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules, preserving word boundaries and per-word timestamp alignment during downstream aggregation. (PR #4475)
- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve provider text spacing, avoiding artificial spaces when timestamp groups are reassembled downstream. (PR #4475)
- Fixed SonioxSTTService final transcription frames missing detected language metadata when Soniox returns token-level language annotations. (PR #4482)
- Fixed Soniox final transcription language detection to use the most common recognized token language, avoiding mislabeling an utterance when the last token is tagged with a different language. (PR #4495)
- Fixed dropped audio in streaming TTS services whose wire protocol doesn't echo context_id back on incoming audio (Sarvam, Smallest, Soniox, Inworld, and others). Previously, audio that arrived between contexts or at the very start of a turn was tagged with context_id=None and silently dropped with an "unable to append audio to context: no context ID provided" debug log. TTSService.get_active_audio_context_id() now falls back to the synthesis-side _turn_context_id when the playback cursor isn't set yet. (PR #4497)
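The Soniox language-detection fix above picks the most common token language instead of the last token's. A minimal sketch of that majority-vote idea follows; the function name and the plain list-of-strings input are assumptions for illustration, not Pipecat's actual data model.

```python
from collections import Counter
from typing import Optional, Sequence


def most_common_language(token_languages: Sequence[Optional[str]]) -> Optional[str]:
    """Pick the language that tags the most tokens in an utterance.

    Tokens without a language annotation (None or empty) are ignored.
    """
    counts = Counter(lang for lang in token_languages if lang)
    if not counts:
        return None
    # most_common(1) returns a one-element list of (language, count) pairs.
    return counts.most_common(1)[0][0]


# The last token is tagged "es", but the utterance is mostly English.
detected = most_common_language(["en", "en", "en", "es"])
```

Taking only the last token here would label the whole utterance as Spanish; the vote labels it English. On ties, Counter returns the language that was seen first.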
Security
- Fixed a path traversal issue in the development runner's /files/{filename:path} download endpoint. Previously, when the runner was started with --folder, a request like /files/..%2F..%2Fetc%2Fpasswd could escape the configured folder because %2F-encoded separators bypassed Starlette's path normalisation. The endpoint now resolves the joined path and rejects any filename that escapes the allowed base with a 403, and also returns 404 (instead of an implicit null 200) when --folder is unset. (PR #4417)
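The resolve-then-check pattern described in this fix can be sketched with the standard library alone. This is an illustrative reconstruction under stated assumptions, not the runner's code: resolve_download is a hypothetical helper, it returns a status code rather than an HTTP response, and it assumes the filename has already been percent-decoded by the web framework. It requires Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_download(base: Optional[str], filename: str) -> Tuple[int, Optional[Path]]:
    """Return (status, path) for a download request against a base folder."""
    if base is None:
        # No folder configured: nothing can be served.
        return 404, None
    base_path = Path(base).resolve()
    # Resolve after joining so ".." segments and symlinks are collapsed first.
    candidate = (base_path / filename).resolve()
    if not candidate.is_relative_to(base_path):
        # The resolved path escaped the allowed base folder.
        return 403, None
    if not candidate.is_file():
        return 404, None
    return 200, candidate
```

Checking the resolved path, rather than scanning the raw string for "..", is what makes the test robust to encoded separators and absolute filenames.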
[1.1.0] - 2026-04-27
Added
- Added MistralSTTService for real-time speech-to-text using Mistral's Voxtral Realtime API (voxtral-mini-transcribe-realtime-2602). Supports streaming transcription with interim results, automatic language detection, and VAD-driven utterance lifecycle. (PR #4253)
- Added buttons field to OutputDTMFFrame and OutputDTMFUrgentFrame for sending multi-key DTMF sequences as a list[KeypadEntry]. Use OutputDTMFFrame.from_string("123#") (or the equivalent on OutputDTMFUrgentFrame) to build one from a dial string, and to_string() to convert back. (PR #4313)
- Added DailyTransport.send_dtmf() to expose the Daily call client's DTMF sending capability, enabling applications to send tones during a call (e.g. IVR navigation). (PR #4313)
- Added DailyOutputDTMFFrame and DailyOutputDTMFUrgentFrame frames. In addition to the inherited buttons, they accept session_id, digit_duration_ms and method, which are forwarded to Daily's send_dtmf as sessionId, digitDurationMs and method. (PR #4313)
- Added incremental pyright type checking. A pyrightconfig.json at the repo root uses typeCheckingMode: "basic" with an explicit include list of modules that pass cleanly (clocks, metrics, transcriptions, frames, observers, extensions, turns, pipeline, runner). Remaining modules will be added in subsequent PRs. CI enforces the checked set via uv run pyright in the format workflow. (PR #4324)
- Added multilingual support to DeepgramFluxSTTService via a new language_hints: list[Language] setting. Works with Deepgram's new flux-general-multi model to bias transcription across English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. Omit the hints to use auto-detection, or pass a subset to bias toward expected languages. Hints can be updated mid-stream via STTUpdateSettingsFrame (sent as a Deepgram Configure control message, no reconnect) to support detect-then-lock flows. (PR #4326)
- Added fine-grained server-side VAD tuning options to SarvamSTTService.Settings for the saaras:v3 model, including speech thresholds, frame-count controls, pre-speech padding, interruption sensitivity, and initial-frame skipping. (PR #4334)
- Added XAISTTService for real-time speech-to-text using xAI's voice STT WebSocket API (wss://api.x.ai/v1/stt). Streams raw audio (PCM, µ-law, or A-law) and emits interim and final transcription frames driven by the server's is_final/speech_final flags. Settings expose interim_results, endpointing, language, multichannel, channels, and diarize. Requires the xai optional extra (pip install "pipecat-ai[xai]"). (PR #4340)
- Added XAITTSService for streaming text-to-speech using xAI's WebSocket TTS endpoint (wss://api.x.ai/v1/tts). Streams text.delta chunks up and base64 audio.delta chunks down on the same connection so audio begins flowing before the full utterance finishes synthesizing; complements the batch-HTTP XAIHttpTTSService. Defaults to raw PCM output so TTSAudioRawFrame needs no decoding. The xai optional extra now pulls in pipecat-ai[websockets-base]. (PR #4341)
- Added SonioxTTSService, a real-time WebSocket TTS service that streams text in and audio out over a persistent connection. Install with pip install "pipecat-ai[soniox]". (PR #4360)
- Added support for Daily's built-in screenVideo destination in DailyTransport. When "screenVideo" is included in the video_out_destinations transport parameter, a dedicated screen video track is created at join time and frames with transport_destination="screenVideo" are routed to it.

      params = DailyParams(
          video_out_enabled=True,
          video_out_is_live=True,
          video_out_width=1280,
          video_out_height=720,
          video_out_destinations=["screenVideo"]
      )
      ...
      frame = OutputImageRawFrame(...)
      frame.transport_destination = "screenVideo"

  (PR #4370)
- Added camera_out_send_settings to DailyParams. This dict is passed verbatim to the Daily client's camera publishing settings, allowing applications to fully control encoding, codec, bitrate, and framerate.

      params = DailyParams(
          camera_out_send_settings={
              "maxQuality": "high",
              "encodings": {
                  "high": {"maxBitrate": 2_000_000, "maxFramerate": 30}
              },
          },
      )

  (PR #4370)
- Added tool_resources to PipelineTask and FunctionCallParams. Pass an application-defined object (DB handles, clients, state, etc.) to PipelineTask(..., tool_resources=...) and access it from any tool handler via params.tool_resources. Passed by reference; the caller retains their handle and can read mutations after the task finishes. Resolves #4256. (PR #4371)
Changed
- Updated NVIDIA STT services to align with Nemotron Speech defaults and configuration:
  - api_key is now optional for local deployments, additional recognition settings are available (including alternatives, word offsets, and diarization), and streaming/segmented docs now reflect Nemotron Speech APIs.
  - NVIDIA streaming STT now sets TranscriptionFrame.finalized=True when the provider marks a result as final, and preserves language on both TranscriptionFrame and InterimTranscriptionFrame. (PR #4269)
- Updated NvidiaLLMService to emit model reasoning as LLMThought*Frames (from both reasoning_content and <think>...</think> output), avoid mixing reasoning text into normal assistant content, and allow keyless local NIM endpoints while warning when the cloud endpoint is used without an API key. (PR #4270)
- STT services now reconnect safely when settings change: reconnection is deferred until the current user turn ends (i.e., until UserStoppedSpeakingFrame is received) rather than interrupting an active speech session. Audio frames received while the reconnect is in progress are buffered and replayed once the new connection is ready. CartesiaSTTService and DeepgramSTTService both use this new behavior. (PR #4311)
- Reduced debug log noise for LLM services. The system instruction is now logged once when composed (e.g. when turn completion is enabled) instead of on every LLM call. Per-call logs now show only the conversation messages, consistent across Google, Anthropic, AWS, and OpenAI services. (PR #4314)
- LiveKitRunnerArguments.token is now a required str (previously str | None with a default of None). LiveKit requires a token to join a room, so the type now reflects reality. This only affects custom runners that construct LiveKitRunnerArguments directly; code consuming the argument from the standard runner is unaffected. (PR #4324)
- TranscriptionFrame.language and InterimTranscriptionFrame.language emitted by DeepgramFluxSTTService now reflect the language Deepgram detected for each turn (read from the languages field on Flux's TurnInfo event). On flux-general-multi this gives per-turn accuracy for downstream consumers (e.g. TTS voice selection). flux-general-en continues to emit Language.EN. (PR #4326)
- Added includes_inter_frame_spaces parameter to TTSService.add_word_timestamps and _add_word_timestamps (default None). When True, downstream consumers will not inject additional spaces between tokens; None leaves each frame's own default unchanged. InworldTTSService now passes includes_inter_frame_spaces=True when reporting word timestamps, since Inworld tokens already include inter-word spacing. (PR #4330)
-
SarvamSTTService now uses saaras:v3 as its default model instead of saarika:v2.5. Applications that relied on the previous default should set settings=SarvamSTTService.Settings(model="saarika:v2.5") explicitly. (PR #4334)
- SpeechTimeoutUserTurnStopStrategy now waits only user_speech_timeout when a transcript arrives without a VAD stop event, rather than max(ttfs_p99_latency, user_speech_timeout). If you had ttfs_p99_latency > user_speech_timeout, turn detection in that path is slightly faster than before. (PR #4337)
- If you use an STT service that emits finalized transcripts (Speechmatics, Soniox, Deepgram Flux, AssemblyAI) with SpeechTimeoutUserTurnStopStrategy, user turns now end as soon as user_speech_timeout elapses after VAD stop. Previously the strategy also waited for the STT P99 latency (ttfs_p99_latency) even when the transcript was already marked final. user_speech_timeout is still honored as a floor: STT finalization never shortens it. (PR #4337)
- ⚠️ PlivoFrameSerializer and TelnyxFrameSerializer now raise ValueError at construction when auto_hang_up=True (the default) but required credentials are missing, matching TwilioFrameSerializer. Previously they constructed successfully and the hangup failed silently at call-end, leaving phantom billable sessions on the provider. If you relied on the old silent behavior, pass auto_hang_up=False explicitly or provide the credentials. The specific fields checked are call_id/auth_id/auth_token for Plivo and call_control_id/api_key for Telnyx. (PR #4349)
- ToolsSchema(standard_tools=...) now accepts any Sequence[FunctionSchema | DirectFunction] rather than requiring an exact list of the union. Callers can pass a narrower list[FunctionSchema] (or any other Sequence) without the type checker complaining about list invariance. (PR #4352)
- Updated aic-sdk dependency to ~=2.2.0. The AIC_LICENSE_KEY environment variable replaces the previous AICOUSTICS_LICENSE_KEY. (PR #4362)
- Loosened the protobuf dependency to >=5.29.6,<7, so projects pinned to protobuf 5.x can install pipecat-ai again. The previous >=6.31.1,<7 pin (introduced in 1.0.8 alongside the nvidia-riva-client 2.25.1 upgrade) silently blocked any environment whose dependency graph already constrained protobuf to the 5.x line. The bundled frames_pb2.py is now compiled with protoc 5.x so it imports cleanly on both 5.x and 6.x runtimes.

  Installing the nvidia extra still pulls protobuf 6.x: nvidia-riva-client 2.25.1 ships gencode that requires a 6.x runtime, so pipecat-ai[nvidia] now declares protobuf>=6.31.1,<7 explicitly to cover an upstream packaging gap (https://github.com/nvidia-riva/python-clients/issues/172). (PR #4372)
- Daily rooms created by the development runner (pipecat.runner.run) now expire after 4 hours with eject_at_room_exp=True, mirroring Pipecat Cloud's max session limit. Previously, runner-created rooms inherited a 2-hour expiration on the default code paths and had no expiration at all when callers posted partial dailyRoomProperties (e.g. {"start_video_off": true}) to /start, causing rooms to accumulate indefinitely. Explicit exp and eject_at_room_exp values in dailyRoomProperties are still respected. (PR #4374)
- Updated daily-python dependency to ~=0.28.0. (PR #4379)
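As a rough illustration of the SpeechTimeoutUserTurnStopStrategy timing change in PR #4337, the sketch below compares the old and new wait for the path where a transcript arrives without a VAD stop event. The helper names and the latency values are hypothetical and chosen only for the arithmetic; this is not Pipecat's implementation.

```python
def old_wait_without_vad_stop(ttfs_p99_latency: float, user_speech_timeout: float) -> float:
    """Previous behavior: wait for the larger of the two values (seconds)."""
    return max(ttfs_p99_latency, user_speech_timeout)


def new_wait_without_vad_stop(user_speech_timeout: float) -> float:
    """Current behavior: wait only for user_speech_timeout (seconds)."""
    return user_speech_timeout


# Illustrative values only (assumptions, not Pipecat defaults).
ttfs_p99_latency = 1.0
user_speech_timeout = 0.6

old_wait = old_wait_without_vad_stop(ttfs_p99_latency, user_speech_timeout)
new_wait = new_wait_without_vad_stop(user_speech_timeout)
saved = old_wait - new_wait  # time saved in this path when p99 > timeout

print(f"old={old_wait:.1f}s new={new_wait:.1f}s saved={saved:.1f}s")
```

When ttfs_p99_latency is less than or equal to user_speech_timeout, both helpers return the same value, which matches the entry's note that only the ttfs_p99_latency > user_speech_timeout case gets faster.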
Deprecated
- Deprecated TransportParams.video_out_bitrate for the Daily transport. Use DailyParams.camera_out_send_settings instead to configure camera publishing encodings (bitrate, framerate, codec, etc.). (PR #4370)
Fixed
-
Fixed missing tool handlers so unregistered tool calls fail with a normal final tool result instead of leaving tool-call state hanging. (PR #4301)
-
Fixed pipecat-ai[tavus] not installing the required daily-python dependency. Installing the tavus extra now correctly pulls in pipecat-ai[daily]. (PR #4304)
- Fixed audio loss and potential errors when STT settings were updated mid-speech. Previously, CartesiaSTTService and DeepgramSTTService would immediately disconnect and reconnect when settings changed, dropping any in-flight audio. Reconnection is now deferred until the user stops speaking, and audio arriving during the reconnect window is buffered and replayed. (PR #4311)
- Fixed SmallestTTSService WebSocket endpoint URL to match Smallest AI v4.0.0 API (wss://waves-api.smallest.ai → wss://api.smallest.ai) and restored keepalive using a silent space message instead of the unsupported flush command. (PR #4320)
- Fixed whitespace handling in TTS token streaming mode. Inter-token whitespace (e.g., spaces between words) is now preserved for correct prosody, while leading whitespace before the first non-whitespace token is still stripped to avoid issues with TTS models that are sensitive to leading spaces. (PR #4323)
-
Fixed SentryMetrics silently dropping MetricsFrames from stop_ttfb_metrics and stop_processing_metrics. SentryMetrics called the base FrameProcessorMetrics implementation but discarded its return value, so FrameProcessor never pushed the MetricsFrame downstream. This prevented observers (e.g. UserBotLatencyObserver, MetricsLogObserver) from seeing TTFB and processing metrics for any service using metrics=SentryMetrics(). The metrics were still calculated and Sentry transactions still completed; only the downstream frame push was affected. (PR #4325)
- Fixed ElevenLabsTTSService and ElevenLabsHttpTTSService emitting word timestamps and TTSTextFrame content that matched the input text instead of the spoken audio when a pronunciation dictionary (pronunciation_dictionary_locators) or text normalization rewrote the input. Both services now consume ElevenLabs' normalized alignment, so downstream consumers (captions, transcripts, context aggregation) reflect what the listener actually hears. (PR #4344)
- Fixed a crash in DeepgramSTTService when an STTUpdateSettingsFrame arrived before the WebSocket handshake completed (for example, when pushing an update upstream on StartFrame). The settings-triggered reconnect cancelled the in-flight connection task before its keepalive task was created, causing an UnboundLocalError: cannot access local variable 'keepalive_task' in the handler's finally block. (PR #4347)
- Fixed direct-function registration crashing for functions without a docstring. DirectFunctionWrapper passed inspect.getdoc()'s result to docstring_parser.parse(), which raises when the docstring is None. Functions now register cleanly whether or not they have a docstring; an empty docstring produces empty description and parameter metadata as expected. (PR #4352)
- Fixed AssemblyAISTTService, CartesiaSTTService, GradiumSTTService, and SonioxSTTService crashing the pipeline on transient WebSocket send failures. Each run_stt sent audio directly without catching errors, so a single network hiccup mid-stream raised an uncaught exception through process_frame. The guards now log a warning and let the connection-state check on the next call handle recovery, matching the pattern used by Deepgram, xAI, Azure, and other push-based STTs. (PR #4352)
- Fixed Gemini Live losing conversation history in the (rare) case of a WebSocket reconnect before any session resumption handle is received. When the session reconnects (e.g. on system instruction change), conversation history is now re-seeded into the new session before it is marked ready for input. (PR #4355)
-
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU (IPv6, Tailscale overlays, many consumer VPNs). aiortc's default SCTP chunk size of 1200 bytes produces ~1305-byte UDP datagrams after headers, which the kernel rejects with EMSGSIZE; aiortc has no path-MTU discovery so it retransmits forever at the same oversized size. The chunk size is now clamped to 1100 bytes (~1205-byte datagrams, ~75 bytes of slack). Override with PIPECAT_SCTP_MAX_CHUNK_SIZE if your path MTU requires a different value. (PR #4358)
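The arithmetic behind the SCTP chunk-size fix can be checked with a short sketch. The 105-byte overhead is simply the difference implied by the figures in the entry (1305 - 1200), and resolve_chunk_size is a hypothetical helper; how Pipecat actually reads PIPECAT_SCTP_MAX_CHUNK_SIZE is an assumption here.

```python
import os
from typing import Mapping

MTU = 1280               # path MTU from the entry (IPv6, Tailscale, many VPNs)
OLD_CHUNK_SIZE = 1200    # aiortc default SCTP chunk size, per the entry
NEW_CHUNK_SIZE = 1100    # clamped value, per the entry
HEADER_OVERHEAD = 105    # implied by the entry: ~1305-byte datagram - 1200-byte chunk


def datagram_size(chunk_size: int) -> int:
    """Approximate UDP datagram size for a given SCTP chunk size."""
    return chunk_size + HEADER_OVERHEAD


def fits(chunk_size: int, mtu: int = MTU) -> bool:
    """True if the resulting datagram fits within the path MTU."""
    return datagram_size(chunk_size) <= mtu


def resolve_chunk_size(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: use the env override if set, else the clamped default."""
    raw = env.get("PIPECAT_SCTP_MAX_CHUNK_SIZE")
    return int(raw) if raw else NEW_CHUNK_SIZE


slack = MTU - datagram_size(NEW_CHUNK_SIZE)
print(fits(OLD_CHUNK_SIZE), fits(NEW_CHUNK_SIZE), slack)
```

Under these figures the old chunk size overshoots a 1280-byte MTU by 25 bytes, while the clamped size leaves 75 bytes of headroom. A path with a smaller MTU would need a correspondingly smaller override.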
[1.0.0] - 2026-04-14
Migration guide: https://docs.pipecat.ai/pipecat/migration/migration-1.0
Added
-
Updated LemonSlice transport:
  - Added on_avatar_connected and on_avatar_disconnected events triggered when the avatar joins and leaves the room.
  - Added api_url parameter to LemonSliceNewSessionRequest to allow overriding the LemonSlice API endpoint.
  - Added support for passing arbitrary named parameters to the LemonSlice API endpoint.

  (PR #3995)
-
Added Inworld Realtime LLM service with WebSocket-based cascade STT/LLM/TTS, semantic VAD, function calling, and Router support. (PR #4140)
-
⚠️ Added WebSocket-based OpenAIResponsesLLMService as the new default for the OpenAI Responses API. It maintains a persistent connection to wss://api.openai.com/v1/responses and automatically uses previous_response_id to send only incremental context, falling back to full context on reconnection or cache miss. The previous HTTP-based implementation is now available as OpenAIResponsesHttpLLMService. (PR #4141)
- Added group_parallel_tools parameter to LLMService (default True). When True, all function calls from the same LLM response batch share a group ID and the LLM is triggered exactly once after the last call completes. Set to False to trigger inference independently for each function call result as it arrives. (PR #4217)
- Added async function call support to register_function() and register_direct_function() via cancel_on_interruption=False. When set to False, the LLM continues the conversation immediately without waiting for the function result. The result is injected back into the context as a developer message once available, triggering a new LLM inference at that point. (PR #4217)
- Added enable_prompt_caching setting to AWSBedrockLLMService for Bedrock ConverseStream prompt caching. (PR #4219)
- Added support for streaming intermediate results from async function calls. Call result_callback multiple times with properties=FunctionCallResultProperties(is_final=False) to push incremental updates, then call it once more (with is_final=True, the default) to deliver the final result. Only valid for functions registered with cancel_on_interruption=False. (PR #4230)
- Added LLMMessagesTransformFrame to facilitate programmatically editing context in a frame-based way.

  The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages at that point in time, transform the messages, and then push an LLMMessagesUpdateFrame with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration. (PR #4231)
- The development runner now exports a module-level app FastAPI instance (from pipecat.runner.run import app) so you can register custom routes before calling main(). (PR #4234)
- ToolsSchema now accepts custom_tools for OpenAI LLM services (OpenAILLMService, OpenAIResponsesLLMService, OpenAIResponsesHttpLLMService, and OpenAIRealtimeLLMService), letting you pass provider-specific tools like tool_search alongside standard function tools. (PR #4248)
- Added enhancements to NvidiaTTSService:
  - Cross-sentence stitching: multiple sentences within an LLM turn are fed into a single SynthesizeOnline gRPC stream for seamless audio across sentence boundaries (requires Magpie TTS model v1.7.0+).
  - custom_dictionary and encoding parameters for IPA-based custom pronunciation and output audio encoding.
  - Metrics generation (can_generate_metrics returns true) and stop_all_metrics() when an audio context is interrupted.
  - gRPC error handling around synthesis config retrieval (GetRivaSynthesisConfig).

  (PR #4249)
-
Added MistralTTSService for streaming text-to-speech using Mistral's Voxtral TTS API (voxtral-mini-tts-2603). Supports SSE-based audio streaming with automatic resampling from the API's native 24kHz to any requested sample rate. Requires the mistral optional extra (pip install pipecat-ai[mistral]). (PR #4251)
- Added truncate_large_values parameter to LLMContext.get_messages(). When True, returns compact deep copies of messages with binary data (base64 images, audio) replaced by short placeholders and long string values in LLM-specific messages recursively truncated. Useful for serialization, logging, and debugging tools. (PR #4272)
- CartesiaSTTService now supports runtime settings updates (e.g. changing language or model via STTUpdateSettingsFrame). The service automatically reconnects with the new parameters. Previously, settings updates were silently ignored. (PR #4282)
- Added pcm_32000 and pcm_48000 sample rate support to ElevenLabs TTS services. (PR #4293)
- Added enable_logging parameter to ElevenLabsHttpTTSService. Set to False to enable zero retention mode (enterprise only). (PR #4293)
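To make the group_parallel_tools behavior described in this section (PR #4217) more concrete, here is a minimal self-contained sketch using plain asyncio. It does not use Pipecat's API: run_batch, trigger_llm, and the two fake tool functions are hypothetical stand-ins. It only models "trigger once after the last call in the batch completes" versus "trigger per result as it arrives".

```python
import asyncio


async def run_batch(calls, trigger_llm, group_parallel_tools=True):
    """Run tool calls concurrently and decide when to trigger the (fake) LLM."""
    results = []

    async def run_one(call):
        result = await call()
        results.append(result)
        if not group_parallel_tools:
            # Independent mode: trigger inference as each result arrives.
            await trigger_llm([result])

    await asyncio.gather(*(run_one(call) for call in calls))
    if group_parallel_tools:
        # Grouped mode: trigger exactly once, after the last call completes.
        await trigger_llm(list(results))
    return results


async def demo(group_parallel_tools):
    triggers = []

    async def trigger_llm(batch):
        triggers.append(batch)

    async def get_weather():
        await asyncio.sleep(0.01)
        return "sunny"

    async def get_time():
        await asyncio.sleep(0.02)
        return "12:00"

    await run_batch([get_weather, get_time], trigger_llm, group_parallel_tools)
    return triggers


grouped = asyncio.run(demo(True))
independent = asyncio.run(demo(False))
print(len(grouped), len(independent))
```

In the grouped run the fake LLM is triggered once with both results; in the independent run it is triggered once per result.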
Changed
-
Updated onnxruntime from 1.23.2 to 1.24.3, adding support for Python 3.14. (PR #3984)
- MCPClient now requires async with MCPClient(...) as mcp: or explicit start()/close() calls to manage the connection lifecycle. (PR #4034)
-
⚠️ Updated langchain extra to require langchain 1.x (from 0.3.x), langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from 0.3.x). If you pin these packages in your project, update your pins accordingly. (PR #4192)
- WebsocketService reconnection errors are now non-fatal. When a websocket service exhausts its reconnection attempts (either via exponential backoff or quick failure detection), it emits a non-fatal ErrorFrame instead of a fatal one. This allows application-level failover (e.g. ServiceSwitcher) to handle the failure instead of killing the entire pipeline. (PR #4201)
- Changed GrokLLMService default model from grok-3-beta to grok-3, now that the model is generally available. (PR #4209)
- GoogleImageGenService now defaults to imagen-4.0-generate-001 (previously imagen-3.0-generate-002). (PR #4213)
- ⚠️ BaseOpenAILLMService.get_chat_completions() now accepts an LLMContext instead of OpenAILLMInvocationParams. If you override this method, update your signature accordingly. (PR #4215)
- When multiple function calls are returned in a single LLM response, by default (when group_parallel_tools=True) the LLM is now triggered exactly once after the last call in the batch completes, rather than waiting for all function calls. (PR #4217)
- ⚠️ LLMService.function_call_timeout_secs now defaults to None instead of 10.0. Deferred function calls will run indefinitely unless a timeout is explicitly set at the service level or per-call. If you relied on the previous 10-second default, pass function_call_timeout_secs=10.0 explicitly. (PR #4224)
- Updated NvidiaTTSService:
  - Made api_key optional for local NIM deployments.
  - Voice, language, and quality can be updated without reconnecting the gRPC client; new values take effect on the next synthesis turn, not for the current turn's in-flight requests.
  - Replaced per-sentence synchronous synthesize_online calls with async queue-backed gRPC streaming.
  - Streaming now uses asyncio tasks with explicit gRPC cancellation on interruption and stale-response filtering when a stream is aborted or replaced.
  - Renamed Riva references to Nemotron Speech in docs and messages.
  - Disabled automatic TTS start frames at the service level (push_start_frame=False) and emit TTSStartedFrame when a stitched synthesis stream is started for a context.

  (PR #4249)
Removed
-
⚠️ Removed OpenPipeLLMService and the openpipe extra. OpenPipe was acquired by CoreWeave and the package is no longer maintained. If you were using openpipe as an LLM provider, switch to the underlying provider directly (e.g. openai). The OpenPipe interface can still be used with OpenAILLMService by specifying a base_url. (PR #4191)
- ⚠️ Removed NoisereduceFilter. Use system-level noise reduction or a service-based alternative instead. (PR #4204)
- ⚠️ Removed deprecated vad_enabled and vad_audio_passthrough transport params. (PR #4204)
- ⚠️ Removed deprecated camera_in_enabled, camera_in_is_live, camera_in_width, camera_in_height, camera_out_enabled, camera_out_is_live, camera_out_width, camera_out_height, and camera_out_color transport params. Use the video_in_* and video_out_* equivalents instead. (PR #4204)
- ⚠️ Removed FrameProcessor.wait_for_task(). Use create_task() and manage tasks with the built-in TaskManager instead. (PR #4204)
- ⚠️ Removed deprecated transport frames: TransportMessageFrame, TransportMessageUrgentFrame, InputTransportMessageUrgentFrame, DailyTransportMessageFrame, and DailyTransportMessageUrgentFrame. Use OutputTransportMessageFrame, OutputTransportMessageUrgentFrame, InputTransportMessageFrame, DailyOutputTransportMessageFrame, and DailyOutputTransportMessageUrgentFrame instead. (PR #4204)
- ⚠️ Removed create_default_resampler() from pipecat.audio.utils. (PR #4204)
- ⚠️ Removed DailyRunner.configure_with_args(). Use PipelineRunner with RunnerArguments instead. (PR #4204)
- ⚠️ Removed deprecated on_pipeline_ended, on_pipeline_cancelled, and on_pipeline_stopped events from PipelineTask. Use on_pipeline_finished instead. (PR #4204)
- ⚠️ Removed single-argument function call support from LLMService. Functions must use named parameters instead of a single arguments parameter. (PR #4204)
- ⚠️ Removed FalSmartTurnAnalyzer and LocalSmartTurnAnalyzer. (PR #4204)
-
⚠️ Removed RTVIObserver.errors_enabled parameter. (PR #4204)
- ⚠️ Removed deprecated RTVI models, frames, and processor methods including RTVIConfig, RTVIServiceConfig, RTVIServiceOptionConfig, various RTVI*Data models, RTVIActionFrame, and RTVIProcessor.handle_function_call/handle_function_call_start. Use the updated RTVI processor API instead. (PR #4204)
- ⚠️ Removed deprecated KeypadEntryFrame alias. (PR #4204)
- ⚠️ Removed deprecated interruption frames: StartInterruptionFrame and BotInterruptionFrame. Use InterruptionFrame and InterruptionTaskFrame instead. (PR #4204)
- ⚠️ Removed LLMService.request_image_frame(). Push a UserImageRequestFrame instead. (PR #4204)
- ⚠️ Removed TTSService.say(). Push a TTSSpeakFrame into the pipeline instead. (PR #4204)
- ⚠️ Removed KrispFilter. The krisp extra has been removed from pyproject.toml. (PR #4204)
- ⚠️ Removed AudioBufferProcessor.user_continuous_stream parameter. Use user_audio_passthrough instead. (PR #4204)
- ⚠️ Removed LLMService.start_callback parameter. Register an on_llm_response_start event handler instead. (PR #4204)
- ⚠️ Removed deprecated observers field from PipelineParams. Pass observers directly to PipelineTask constructor instead. (PR #4204)
-
⚠️ Removed deprecated pipecat.services.openai_realtime package. Use pipecat.services.openai.realtime instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.google.llm_vertex module. Use pipecat.services.google.vertex.llm instead. (PR #4208)
- ⚠️ Removed deprecated GoogleLLMOpenAIBetaService from pipecat.services.google.openai. Use GoogleLLMService from pipecat.services.google.llm instead. (PR #4208)
- ⚠️ Removed deprecated OpenAIRealtimeBetaLLMService and AzureRealtimeBetaLLMService. Use OpenAIRealtimeLLMService and AzureRealtimeLLMService from pipecat.services.openai.realtime and pipecat.services.azure.realtime instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.ai_services module. Import from pipecat.services.ai_service, pipecat.services.llm_service, pipecat.services.stt_service, pipecat.services.tts_service, etc. instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.gemini_multimodal_live package. Use pipecat.services.google.gemini_live instead. Note that class names no longer include "Multimodal" (e.g. GeminiMultimodalLiveLLMService → GeminiLiveLLMService). (PR #4208)
- ⚠️ Removed deprecated pipecat.services.google.gemini_live.llm_vertex module. Use pipecat.services.google.gemini_live.vertex.llm instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.nim package. Use pipecat.services.nvidia.llm instead (NimLLMService → NvidiaLLMService). (PR #4208)
- ⚠️ Removed deprecated pipecat.services.deepgram.stt_sagemaker and pipecat.services.deepgram.tts_sagemaker modules. Use pipecat.services.deepgram.sagemaker.stt and pipecat.services.deepgram.sagemaker.tts instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.aws_nova_sonic package. Use pipecat.services.aws.nova_sonic instead. (PR #4208)
- ⚠️ Removed deprecated pipecat.services.riva package. Use pipecat.services.nvidia.stt and pipecat.services.nvidia.tts instead (RivaSTTService → NvidiaSTTService, RivaTTSService → NvidiaTTSService). (PR #4208)
- ⚠️ Removed deprecated compatibility modules: pipecat.services.openai_realtime_beta (use pipecat.services.openai.realtime), pipecat.services.openai_realtime.context, pipecat.services.openai_realtime.frames, pipecat.services.openai.realtime.context, pipecat.services.openai.realtime.frames, pipecat.services.gemini_multimodal_live (use pipecat.services.google.gemini_live), pipecat.services.aws_nova_sonic.context (use pipecat.services.aws.nova_sonic), pipecat.services.google.openai and pipecat.services.google.llm_openai (use pipecat.services.google.llm). (PR #4215)
-
⚠️ Removed VisionImageFrameAggregator (from pipecat.processors.aggregators.vision_image_frame). Vision/image handling is now built into LLMContext (from pipecat.processors.aggregators.llm_context). See the 12* examples for the recommended replacement pattern. (PR #4215)
- ⚠️ Removed OpenAILLMContext, OpenAILLMContextFrame, and OpenAILLMContext.from_messages(). Use LLMContext (from pipecat.processors.aggregators.llm_context) and LLMContextFrame (from pipecat.frames.frames) instead. All services now exclusively use the universal LLMContext.

  From the developer's point of view, migrating will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  from pipecat.processors.aggregators.llm_context import LLMContext
  from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (PR #4215)
-
⚠️ Removed deprecated frame types LLMMessagesFrame and OpenAILLMContextAssistantTimestampFrame from pipecat.frames.frames. Instead of LLMMessagesFrame, use LLMContextFrame with the new messages, or LLMMessagesUpdateFrame with run_llm=True. (PR #4215)
- ⚠️ Removed GatedOpenAILLMContextAggregator (from pipecat.processors.aggregators.gated_open_ai_llm_context). Use GatedLLMContextAggregator (from pipecat.processors.aggregators.gated_llm_context) instead. (PR #4215)
- ⚠️ Removed deprecated service-specific context and aggregator machinery, which was superseded by the universal LLMContext system.

  Service-specific classes removed: AnthropicLLMContext, AnthropicContextAggregatorPair, AWSBedrockLLMContext, AWSBedrockContextAggregatorPair, OpenAIContextAggregatorPair, and their user/assistant aggregators. Also removed create_context_aggregator() from LLMService, OpenAILLMService, AnthropicLLMService, and AWSBedrockLLMService.

  Base aggregator classes removed (from pipecat.processors.aggregators.llm_response): BaseLLMResponseAggregator, LLMContextResponseAggregator, LLMUserContextAggregator, LLMAssistantContextAggregator, LLMUserResponseAggregator, LLMAssistantResponseAggregator.

  From the developer's point of view, migrating will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  from pipecat.processors.aggregators.llm_context import LLMContext
  from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair

  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (PR #4215)
-
⚠️ Removed deprecated service parameters and shims that have been replaced by the settings=Service.Settings(...) pattern or direct __init__ parameters:
  - PollyTTSService alias (use AWSTTSService)
  - TTSService: text_aggregator, text_filter init params
  - AWSNovaSonicLLMService: send_transcription_frames init param
  - DeepgramSTTService: url init param (use base_url)
  - FishAudioTTSService: model init param (use reference_id or settings)
  - GladiaSTTService: language and confidence from GladiaInputParams, InputParams class alias
  - GeminiTTSService: api_key init param
  - GeminiLiveLLMService: base_url init param (use http_options)
  - GoogleVertexLLMService: InputParams class with location/project_id fields (use direct init params); project_id is now required, location defaults to "us-east4"
  - MiniMaxHttpTTSService: english_normalization from InputParams (use text_normalization)
  - SimliVideoService: simli_config init param (use api_key/face_id), use_turn_server init param; api_key and face_id are now required
  - AnthropicLLMService: enable_prompt_caching_beta from InputParams (use enable_prompt_caching)

  (PR #4220)
-
⚠️ Removed deprecated pipecat.transports.services and pipecat.transports.network module aliases. Update imports to use pipecat.transports.daily.transport, pipecat.transports.livekit.transport, pipecat.transports.websocket.*, pipecat.transports.webrtc.*, and pipecat.transports.daily.utils respectively. (PR #4225)
- ⚠️ Removed deprecated pipecat.sync package. Use pipecat.utils.sync instead. (PR #4225)
- ⚠️ Removed deprecated TranscriptionMessage, ThoughtTranscriptionMessage, and TranscriptionUpdateFrame from pipecat.frames.frames. (PR #4228)
- ⚠️ Removed deprecated allow_interruptions parameter from PipelineParams, StartFrame, and FrameProcessor. Interruptions are now always allowed by default. Use LLMUserAggregator's user_turn_strategies/user_mute_strategies parameters to control interruption behavior. (PR #4228)
- ⚠️ Removed deprecated STTMuteFilter, STTMuteConfig, and STTMuteStrategy from pipecat.processors.filters.stt_mute_filter. Use pipecat.turns.user_mute strategies with LLMUserAggregator's user_mute_strategies parameter instead. (PR #4228)
- ⚠️ Removed deprecated pipecat.processors.transcript_processor module (TranscriptProcessor, TranscriptProcessorConfig). Use pipeline observers instead. (PR #4228)
- ⚠️ Removed deprecated EmulateUserStartedSpeakingFrame and EmulateUserStoppedSpeakingFrame frames, and the emulated field from UserStartedSpeakingFrame/UserStoppedSpeakingFrame. (PR #4228)
- ⚠️ Removed deprecated interruption_strategies parameter from PipelineParams, StartFrame, and FrameProcessor. Use LLMUserAggregator's user_turn_strategies parameter instead. (PR #4228)
- ⚠️ Removed deprecated pipecat.audio.interruptions module (BaseInterruptionStrategy, MinWordsInterruptionStrategy). Use pipecat.turns.user_start.MinWordsUserTurnStartStrategy with LLMUserAggregator's user_turn_strategies parameter instead. (PR #4228)
- ⚠️ Removed deprecated pipecat.utils.tracing.class_decorators module. Use pipecat.utils.tracing.service_decorators instead. (PR #4228)
- ⚠️ Removed deprecated add_pattern_pair method from PatternPairAggregator. Use add_pattern instead. (PR #4228)
- ⚠️ Removed deprecated UserResponseAggregator class from pipecat.processors.aggregators.user_response. Use LLMUserAggregator instead. (PR #4228)
-
⚠️ Removed ExternalUserTurnStrategies and the automatic fallback to it in LLMUserAggregator when a SpeechControlParamsFrame was received from the transport. (PR #4229)
- ⚠️ Removed vad_analyzer and turn_analyzer parameters from TransportParams and all transport input classes, along with all deprecated VAD/turn analysis logic in BaseInputTransport. VAD and turn detection are now handled entirely by LLMUserAggregator. (PR #4229)
- ⚠️ Removed deprecated TranscriptionUserTurnStopStrategy alias (deprecated in 0.0.102). Use SpeechTimeoutUserTurnStopStrategy instead. (PR #4232)
- ⚠️ Removed deprecated vad_events setting and should_interrupt parameter from DeepgramSTTService (deprecated in 0.0.99). Use Silero VAD for voice activity detection instead. (PR #4232)
- ⚠️ Removed deprecated send_transcription_frames parameter from OpenAIRealtimeLLMService (deprecated in 0.0.92). Transcription frames are always sent. (PR #4232)
- ⚠️ Removed deprecated UserIdleProcessor (deprecated in 0.0.100). Use LLMUserAggregator with the user_idle_timeout parameter instead. (PR #4232)
- ⚠️ Removed deprecated UserBotLatencyLogObserver (deprecated in 0.0.102). Use UserBotLatencyObserver with its on_latency_measured event handler instead. (PR #4232)
- ⚠️ Removed the riva install extra. Use nvidia instead (pip install "pipecat-ai[nvidia]"). (PR #4235)
- Removed the empty remote-smart-turn install extra (was already a no-op). (PR #4235)
- ⚠️ Removed DeprecatedModuleProxy and all service __init__.py re-export shims. Flat imports like from pipecat.services.openai import OpenAILLMService no longer work. Use the full submodule path instead: from pipecat.services.openai.llm import OpenAILLMService. This is already the established pattern across all examples and internal code. (PR #4239)
- ⚠️ Removed deprecated PIPECAT_OBSERVER_FILES environment variable support. Use PIPECAT_SETUP_FILES instead. (PR #4267)
Fixed
-
Fixed IdleFrameProcessor where asyncio.Event was unconditionally cleared in a finally block instead of only on the success path. (PR #3796)
- Fixed MCPClient opening a new connection for every tool call instead of reusing the session. (PR #4034)
-
GoogleLLMService now applies a low-latency thinking default (thinking_level="minimal") for Gemini 3+ Flash models. (PR #4067)
- Fixed WebsocketService entering an infinite reconnection loop when a server accepts the WebSocket handshake but immediately closes the connection (e.g. invalid API key, close code 1008). The service now detects connections that fail repeatedly within seconds of being established and stops retrying after 3 consecutive quick failures. (PR #4201)
- Fixed InworldHttpTTSService streaming responses crashing with UnicodeDecodeError when multi-byte UTF-8 characters were split across chunk boundaries. This caused TTS audio to cut off mid-sentence intermittently. (PR #4202)
- Fixed a crash (JSONDecodeError) when a user interruption occurred while the LLM was streaming function call arguments. Previously, the incomplete JSON arguments were passed directly to json.loads(), causing an unhandled exception. Affected services: OpenAI, Google (OpenAI-compatible), and SambaNova. (PR #4203)
- Fixed BaseOutputTransport discarding pending UninterruptibleFrame items (e.g. function-call context updates) when an interruption arrived. The audio task is now kept alive and only interruptible frames are drained when uninterruptible frames are present in the queue. (PR #4217)
- Fixed spurious LLM inference being triggered when a function call result arrived while the user was actively speaking. The context frame is now suppressed until the user stops speaking. (PR #4217)
-
Fixed CartesiaTTSService failing with "Context has closed" errors when switching voice, model, or language via TTSUpdateSettingsFrame. The service now automatically flushes the current audio context and opens a fresh one when these settings change. (PR #4220)
- Fixed duplicate LLM replies that could occur when multiple async function call results arrived while an LLM request was already queued. (PR #4230)
-
Fixed undefined _warn_deprecated_param calls in OpenAIRealtimeLLMService and GrokRealtimeLLMService for the deprecated session_properties init parameter. (PR #4232)
- Fixed Gemini Live bot hanging after a session resumption reconnect. Audio, video, and text input were silently dropped after reconnecting because the internal _ready_for_realtime_input flag was not being reset. (PR #4242)
- Fixed VADController getting stuck in the SPEAKING state when audio frames stop arriving mid-speech (e.g. user mutes mic). A new audio_idle_timeout parameter (default 1s, set to 0 to disable) forces a transition back to QUIET and emits on_speech_stopped when no audio is received while speaking. (PR #4244)
- Fixed PipelineRunner._gc_collect() blocking the event loop by running gc.collect() synchronously. It is now offloaded via asyncio.to_thread to avoid stalling concurrent pipeline tasks. (PR #4255)
- Fixed ElevenLabsTTSService incorrectly enabling auto_mode when using TextAggregationMode.TOKEN. Auto mode disables server-side buffering and is designed for complete sentences, so enabling it with token streaming degraded speech quality. The default is now derived automatically from the aggregation strategy: auto_mode=True for SENTENCE, auto_mode=False for TOKEN. Callers can still override by passing auto_mode explicitly. (PR #4265)
- Fixed ValueError: write to closed file during pipeline shutdown when observers were active. Observer proxy tasks are now cancelled before observer resources are cleaned up. (PR #4267)
- Fixed delayed turn completion when STT transcripts arrive after the p99 timeout. Previously, a late transcript (beyond the p99 window) would fall through to the 5-second user_turn_stop_timeout fallback. Now the turn stop triggers immediately when the late transcript arrives. (PR #4283)
- Fixed ElevenLabsTTSService ignoring enable_logging=False and enable_ssml_parsing=False. The truthy check treated False the same as None (both skipped), and Python's str(False) produced "False" instead of the lowercase "false" expected by the API. (PR #4293)
- Fixed on_assistant_turn_stopped not resetting internal state when the LLM returned no text tokens. Added interrupted field to AssistantTurnStoppedMessage to indicate whether the assistant turn was interrupted. (PR #4294)
- Fixed LLMContextSummarizer failing with "No messages to summarize" when using system_instruction instead of a system-role message at the start of the context. The summarizer previously scanned the entire context for the first system message, which could match a mid-conversation injection (e.g. idle notifications) instead of the initial prompt, causing the summarization range to be empty. (PR #4295)
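The ElevenLabsTTSService fix for enable_logging=False (PR #4293) in this section comes down to two plain Python details: a truthy check cannot tell False from None, and str(False) yields "False" rather than "false". The sketch below reproduces both with hypothetical helper names; it illustrates the pitfall and is not the service's actual code.

```python
def build_query_params_buggy(**options):
    """Illustrates the bug: a truthy check drops False along with None."""
    params = {}
    for name, value in options.items():
        if value:  # False and None are both skipped here
            params[name] = str(value)  # str(True) would also give "True"
    return params


def build_query_params_fixed(**options):
    """Skip only None, and serialize booleans as lowercase strings."""
    params = {}
    for name, value in options.items():
        if value is None:
            continue
        if isinstance(value, bool):
            params[name] = str(value).lower()  # False -> "false"
        else:
            params[name] = str(value)
    return params


buggy = build_query_params_buggy(enable_logging=False, enable_ssml_parsing=None)
fixed = build_query_params_fixed(enable_logging=False, enable_ssml_parsing=None)
print(buggy, fixed)
```

With the fixed helper, an explicit False survives as "false" while an unset (None) option is still omitted.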
[0.0.108] - 2026-03-27
Added
-
Added
SarvamLLMServicewith support forsarvam-30b,sarvam-30b-16k,sarvam-105bandsarvam-105b-32k. (PR #3978) -
Added
on_turn_context_created(context_id)hook toTTSService. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created. (PR #4013) -
Added
XAIHttpTTSService for text-to-speech using xAI's HTTP TTS API. (PR #4031) -
Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use
system_instruction to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion). (PR #4089) -
Added
SmallestTTSService, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings. (PR #4092) -
Added warnings in turn stop strategies when
VADParams.stop_secs differs from the recommended default (0.2s) or when stop_secs >= STT p99 latency, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the stt-benchmark with their VAD settings. (PR #4115) -
Added
domain parameter to AssemblyAISTTSettings for specialized recognition modes such as Medical Mode (domain="medical-v1"). (PR #4117) -
Added
NovitaLLMService for using Novita AI's LLM models via their OpenAI-compatible API. (PR #4119) -
Added
cleanup() method to VADAnalyzer and VADController so VAD analyzer resources are properly released when no longer needed. Custom VADAnalyzer subclasses can override cleanup() to free any held resources. (PR #4120) -
Added
on_end_of_turn event handler to AssemblyAISTTService. This fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race with TranscriptionFrame. Works in both Pipecat and AssemblyAI turn detection modes. (PR #4128) -
Added
DeepgramFluxSageMakerSTTService for running Deepgram Flux speech-to-text on AWS SageMaker endpoints. Use with ExternalUserTurnStrategies to take advantage of Flux's turn detection. (PR #4143) -
Added
Mem0MemoryService.get_memories() convenience method for retrieving all stored memories outside the pipeline (e.g. to build a personalized greeting at connection time). This avoids the need to manually handle client type branching, filter construction, and async wrapping. (PR #4156)
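To illustrate the VAD warning from PR #4115: if the time left to wait for STT is what remains of the p99 latency after the VAD stop delay has elapsed, then a stop_secs at or above the p99 leaves nothing to wait for. The formula and the function name below are assumptions made for illustration; the entry above only states that the timeout collapses to 0s.

```python
def stt_wait_timeout(stt_p99_latency: float, vad_stop_secs: float) -> float:
    """Remaining time to wait for a transcript after VAD reports silence.

    Assumed model: the VAD stop delay already consumes part of the p99 budget.
    """
    return max(0.0, stt_p99_latency - vad_stop_secs)


# With the recommended 0.2s stop delay, most of the budget remains.
print(stt_wait_timeout(stt_p99_latency=1.0, vad_stop_secs=0.2))
# With stop_secs >= p99, the wait collapses to zero.
print(stt_wait_timeout(stt_p99_latency=0.5, vad_stop_secs=0.8))
```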
Changed
-
Added context prewarming path for
InworldTTSService to improve first audio latency. (PR #4013) -
Added
KrispVivaVadAnalyzer for Voice Activity Detection using the Krisp VIVA SDK (requires krisp_audio). (PR #4022) -
Modified
InworldTTSService to close context at end of turn instead of relying on idle timeout. (PR #4028) -
Added Gemini 3 support to the Gemini Live service. (PR #4078)
-
TTSService: the default stop_frame_timeout_s (idle time before an automatic TTSStoppedFrame is pushed when push_stop_frames=True) has changed from 2.0 to 3.0 seconds. (PR #4084) -
⚠️
GeminiLLMAdapter now only treats messages[0] as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction. (PR #4089) -
Fixed
InworldTTSService to fall back to full text when TTS timestamps are not received. (PR #4113) -
⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer
system_instruction from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set. (PR #4130) -
Bumped
nvidia-riva-client minimum version to >=2.25.1. (PR #4136) -
Upgraded
protobuf from 5.x to 6.x (>=6.31.1,<7). (PR #4136) -
Unrecognized language strings (e.g. Deepgram's
"multi") no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly. (PR #4137) -
GrokLLMService and GrokRealtimeLLMService now live in the pipecat.services.xai module alongside XAIHttpTTSService, since all three use the same xAI API. Update imports from pipecat.services.grok.* to pipecat.services.xai.* (e.g. from pipecat.services.xai.llm import GrokLLMService). (PR #4142) -
⚠️ Bumped
mem0ai dependency from ~=0.1.94 to >=1.0.8,<2. Users of the mem0 extra will need to update their mem0ai package. (PR #4156)
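The GeminiLLMAdapter change in PR #4089 can be pictured as follows: only a "system" message at index 0 is extracted as the system instruction, and any later "system" message becomes a "user" message. This is a minimal sketch over plain dicts; split_system_instruction is a hypothetical name and not the adapter's real method.

```python
def split_system_instruction(messages):
    """Return (system_instruction, remaining_messages) for a message list."""
    system_instruction = None
    remaining = []
    for index, message in enumerate(messages):
        if message["role"] == "system":
            if index == 0:
                # Only the very first message counts as the initial system prompt.
                system_instruction = message["content"]
                continue
            # A later "system" message is kept in place, converted to "user".
            message = {**message, "role": "user"}
        remaining.append(message)
    return system_instruction, remaining


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "system", "content": "The user has been idle."},
]
print(split_system_instruction(history))
```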
Deprecated
pipecat.services.grok.llm, pipecat.services.grok.realtime.llm, and pipecat.services.grok.realtime.events are deprecated. The old import paths still work but emit a DeprecationWarning; use pipecat.services.xai.llm, pipecat.services.xai.realtime.llm, and pipecat.services.xai.realtime.events instead. (PR #4142)
Removed
-
⚠️
TTSService.add_word_timestamps() no longer supports the "Reset" and "TTSStoppedFrame" sentinel strings. If you have a custom TTS service that called await self.add_word_timestamps([("Reset", 0)]) or await self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id), replace them with await self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id)) and let _handle_audio_context manage the word-timestamp reset automatically. (PR #4145) -
Removed
SambaNovaSTTService. SambaNova no longer offers speech-to-text audio models. Use another STT provider instead. (PR #4154)
Fixed
-
Fixed Gemini Live (
GoogleGeminiLiveLLMService) not honoring settings.system_instruction. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored. (PR #4089) -
Fixed
AWSBedrockLLMAdapter sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior. (PR #4089) -
Fixed Gemini Live pipeline hanging indefinitely when an
EndFrame was deferred while waiting for the bot to finish responding and turn_complete never arrived. As a possible root-cause fix, turn_complete messages are now handled even if they lack usage_metadata. As a fallback, the deferred EndFrame now has a 30-second safety timeout. (PR #4125) -
Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit. (PR #4126)
-
Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The
LLMFullResponseEndFrame was racing ahead of the last TTSTextFrame, causing the LLMAssistantAggregator to finalize the context before the final sentence arrived. (PR #4127) -
Fixed audio crackling and popping in recordings when both user and bot are speaking.
AudioBufferProcessor no longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output. (PR #4135) -
Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale words or backward PTS values into later turns. (PR #4145)
-
Fixed a race condition in
InterruptibleTTSService where, if run_tts had been invoked but BotStartedSpeakingFrame had not yet been received, a user interruption could allow stale audio to leak through. (PR #4145) -
Fixed Gemini Live local VAD mode (
GeminiVADParams(disabled=True) with external VAD) not working. The bot now correctly detects user speech and signals turn boundaries to the Gemini API. (PR #4146) -
Fixed Gemini Live message handling to process all
server_content fields independently. Gemini 3.x can bundle multiple fields (e.g. model_turn and output_transcription) on the same message, but the previous elif chain only processed the first match, silently dropping the rest. (PR #4147) -
Fixed
ServiceSwitcher with ServiceSwitcherStrategyFailover incorrectly triggering failover when ErrorFrames from other pipeline stages (e.g. TTS) propagated upstream through the switcher. Previously, any non-fatal error passing through would be misattributed to the active service and trigger an unwanted service switch. Now only errors originating from the switcher's own managed services trigger failover. (PR #4149) -
Fixed
LiveKitOutputTransport not clearing the rtc.AudioSource internal buffer on interruption, causing the bot to continue speaking for several seconds after being interrupted. (PR #4151) -
Fixed a crash in OpenAI LLM processing when the provider returns
chunk.choices[0].delta.audio = None, which caused 'NoneType' object has no attribute 'get' errors during audio transcript handling. (PR #4152) -
Fixed error floods in
DeepgramSTTService when the WebSocket connection drops. With Deepgram SDK 6.x, send_media() raises exceptions on a dead connection instead of silently failing, causing every queued audio frame to log an error. Now send_media() failures are caught gracefully: a single warning is logged and audio frames are skipped until the existing reconnection logic restores the connection. (PR #4153) -
Mem0MemoryService no longer blocks the event loop during memory storage and retrieval. All Mem0 API calls now run in a background thread, and message storage is fire-and-forget so it doesn't delay downstream processing. (PR #4156) -
Fixed
Mem0MemoryService failing to store messages when the context contained system or developer role messages. The Mem0 API only accepts user and assistant roles, so other roles are now filtered out before storing. (PR #4156) -
Added missing
on_dtmf_event callback to LemonSliceTransportClient.setup() DailyCallbacks construction, fixing a ValidationError at pipeline setup time. (PR #4161) -
Fixed an issue in
InworldTTSService where, in cases of fast interruption, we would continue receiving audio from the previous context. (PR #4167) -
Fixed a word timestamp interleaving issue in
InworldTTSService when processing multiple sentences. (PR #4167) -
Fixed duplicate
TTSStoppedFrame being pushed in TTS services using push_stop_frames=True. When the stop-frame timeout fired, a second TTSStoppedFrame could be pushed after the normal one at context completion. (PR #4172) -
⚠️ Fixed
DeepgramSTTService compatibility with deepgram-sdk 6.1.0. The SDK now requires explicit message objects for send_keep_alive(), send_close_stream(), and send_finalize(). The minimum deepgram-sdk version is now 6.1.0. (PR #4174) -
Fixed RTVI events not being delivered to clients when using WebSocket transports.
ProtobufFrameSerializer now sets ignore_rtvi_messages=False by default. (PR #4176) -
Fixed a timing issue where turn detection timer tasks (idle controller, speech timeout, turn analyzer, and turn completion) could miss their first tick because the newly created asyncio task was not yet scheduled when the caller continued. (PR #4183)
-
Fixed
FastAPIWebsocketTransport intermittently hanging on shutdown when the remote side (e.g. Twilio) disconnects while audio is being sent. A race condition between the send and receive paths could cause the on_client_disconnected callback to be skipped, leaving the pipeline waiting for a disconnect signal that never came. (PR #4186)
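The Gemini Live message-handling fix (PR #4147) is an instance of a general pattern: when one message can carry several fields, an elif chain stops at the first match, while independent if statements handle every field present. The sketch below uses plain dicts and hypothetical handler functions; only the field names model_turn, output_transcription, and turn_complete come from the entries above.

```python
def handle_with_elif(server_content: dict) -> list:
    """Old shape: only the first matching field is processed."""
    handled = []
    if server_content.get("model_turn") is not None:
        handled.append("model_turn")
    elif server_content.get("output_transcription") is not None:
        handled.append("output_transcription")
    elif server_content.get("turn_complete"):
        handled.append("turn_complete")
    return handled


def handle_independently(server_content: dict) -> list:
    """New shape: every field present on the message is processed."""
    handled = []
    if server_content.get("model_turn") is not None:
        handled.append("model_turn")
    if server_content.get("output_transcription") is not None:
        handled.append("output_transcription")
    if server_content.get("turn_complete"):
        handled.append("turn_complete")
    return handled


bundled = {"model_turn": {"parts": []}, "output_transcription": {"text": "hello"}}
print(handle_with_elif(bundled))
print(handle_independently(bundled))
```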
Performance
RimeTTSService now handles Rime's done WebSocket message to complete audio contexts immediately, eliminating the 3-second idle timeout that previously added latency at the end of each utterance. (PR #4172)
[0.0.107] - 2026-03-23
Added
-
Added
frame_order parameter to SyncParallelPipeline. Set frame_order=FrameOrder.PIPELINE to push synchronized output frames in pipeline definition order (all frames from the first pipeline, then the second, etc.) instead of the default arrival order. (PR #4029) -
Added
sync_with_audio field to OutputImageRawFrame. When set to True, the output transport queues image frames with audio so they are displayed only after all preceding audio has been sent, enabling synchronized audio/image playback. (PR #4029) -
Added
OpenAIResponsesLLMService, a new LLM service that uses the OpenAI Responses API. Supports streaming text, function calling, usage metrics, and out-of-band inference. Works with the universal LLMContext and LLMContextAggregatorPair. See examples/foundational/07-interruptible-openai-responses.py and 14-function-calling-openai-responses.py. (PR #4074) -
Added
audio_out_auto_silence parameter to TransportParams (defaults to True). When set to False, the transport waits for audio data instead of inserting silence when the output queue is empty, which is useful for scenarios that require uninterrupted audio playback without artificial gaps. (PR #4104)
Changed
-
Renamed tracing span attributes to align with OpenTelemetry GenAI semantic conventions:
gen_ai.system to gen_ai.provider.name, system to gen_ai.system_instructions, gen_ai.usage.cache_read_input_tokens to gen_ai.usage.cache_read.input_tokens, and gen_ai.usage.cache_creation_input_tokens to gen_ai.usage.cache_creation.input_tokens. (PR #3449) -
DeepgramSageMakerTTSService now correctly routes audio through the base TTSService audio context queue. Audio frames are delivered via append_to_audio_context() instead of being pushed directly, enabling proper ordering, interruption handling, and start/stop frame lifecycle management. Interruptions now trigger a Clear message to Deepgram (flushing its text buffer) at the right time via on_audio_context_interrupted. (PR #4083) -
GradiumTTSService now sends a per-context setup message with client_req_id before the first text message for each TTS context, following Gradium's multiplexing protocol. Previously, a single setup message was sent at connection time without a client_req_id, which prevented Gradium from associating requests with their sessions when using close_ws_on_eos=False. (PR #4091)
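For dashboards or queries that still reference the old tracing attribute names from PR #3449, the renames can be captured in a small lookup table. The mapping below is taken from the entry above; the migrate_attributes helper is a hypothetical convenience, not part of Pipecat.

```python
RENAMED_SPAN_ATTRIBUTES = {
    "gen_ai.system": "gen_ai.provider.name",
    "system": "gen_ai.system_instructions",
    "gen_ai.usage.cache_read_input_tokens": "gen_ai.usage.cache_read.input_tokens",
    "gen_ai.usage.cache_creation_input_tokens": "gen_ai.usage.cache_creation.input_tokens",
}


def migrate_attributes(attributes: dict) -> dict:
    """Rewrite old attribute keys to their new names, leaving others untouched."""
    return {RENAMED_SPAN_ATTRIBUTES.get(key, key): value for key, value in attributes.items()}


old_span = {"gen_ai.system": "openai", "gen_ai.usage.cache_read_input_tokens": 128, "gen_ai.request.model": "example"}
print(migrate_attributes(old_span))
```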
Fixed
-
Fixed stale
system_instruction in LLM tracing spans by reading from _settings.system_instruction instead of the removed _system_instruction attribute. (PR #3449) -
Fixed
SyncParallelPipeline breaking the Whisker debugger. (PR #4029) -
Fixed
SyncParallelPipeline race condition where concurrent SystemFrame processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks. SystemFrames now take a fast path that passes them through without draining queued output. (PR #4029) -
Fixed TTS frame ordering so that non-system frames always arrive in correct order relative to the
TTSStartedFrame/TTSAudioRawFrame/TTSStoppedFrame sequence. Previously these frames could race ahead of or behind audio context frames, producing out-of-order output downstream. (PR #4075) -
Fixed
SarvamTTSService audio and error frames to route through append_to_audio_context() instead of push_frame(), ensuring correct behavior with audio contexts and interruptions. (PR #4082) -
Fixed audio frame ordering and interruption handling in Fish Audio, LMNT, Neuphonic, and Rime NonJson TTS services. These services were bypassing the base
TTSService audio context serialization queue by pushing audio frames directly, which could cause out-of-order frames and broken interruptions during speech. (PR #4090) -
Fixed Genesys AudioHook serializer to always include the
parameters field in protocol messages. The AudioHook protocol requires every message to carry a parameters object (even if empty), but _create_message omitted it when no parameters were provided. This caused clients that validate message structure (including the Genesys reference implementation) to reject pong and parameter-less closed responses, breaking server sequence tracking and preventing outputVariables from reaching the Architect flow. (PR #4093)
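The Genesys AudioHook fix (PR #4093) amounts to always emitting a parameters object, even an empty one. The sketch below shows that shape with a hypothetical create_message function; the envelope keys other than parameters (type, seq) are illustrative stand-ins rather than a complete rendering of the protocol.

```python
import json


def create_message(msg_type, parameters=None, **envelope):
    """Build a protocol message that always carries a `parameters` object."""
    message = {"type": msg_type, **envelope}
    # Default to an empty object instead of omitting the key entirely.
    message["parameters"] = parameters if parameters is not None else {}
    return json.dumps(message)


print(create_message("pong", seq=3))
print(create_message("closed", parameters={"outputVariables": {"result": "done"}}, seq=4))
```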
[0.0.106] - 2026-03-18
Added
-
Added optional
service field to ServiceUpdateSettingsFrame (and its subclasses LLMUpdateSettingsFrame, TTSUpdateSettingsFrame, STTUpdateSettingsFrame) to target a specific service instance. When service is set, only the matching service applies the settings; others forward the frame unchanged. This enables updating a single service when multiple services of the same type exist in the pipeline. (PR #4004) -
Added
sip_provider and room_geo parameters to configure() in the Daily runner. These convenience parameters let callers specify a SIP provider name and geographic region directly without manually constructing DailyRoomProperties and DailyRoomSipParams. (PR #4005) -
Added
PerplexityLLMAdapter that automatically transforms conversation messages to satisfy Perplexity's stricter API constraints (strict role alternation, no non-initial system messages, last message must be user/tool). Previously, certain conversation histories could cause Perplexity API errors that didn't occur with OpenAI (PerplexityLLMService subclasses OpenAILLMService since Perplexity uses an OpenAI-compatible API). (PR #4009) -
Added DTMF input event support to the Daily transport. Incoming DTMF tones are now received via Daily's
on_dtmf_event callback and pushed into the pipeline as InputDTMFFrame, enabling bots to react to keypad presses from phone callers. (PR #4047) -
Added
WakePhraseUserTurnStartStrategy for triggering user turns based on wake phrases, with support for single_activation mode. Deprecates WakeCheckFilter. (PR #4064) -
Added
default_user_turn_start_strategies() and default_user_turn_stop_strategies() helper functions for composing custom strategy lists. (PR #4064)
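The targeted-settings behavior from PR #4004 can be sketched with stand-in classes: a frame carrying an optional service reference is applied only by the matching instance, and every other instance passes it along untouched. UpdateSettings and SketchService below are hypothetical stand-ins, not Pipecat classes, and the assumption that an untargeted frame applies to every service is made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UpdateSettings:
    """Stand-in for a settings-update frame with an optional target."""
    settings: dict
    service: Optional[Any] = None


class SketchService:
    def __init__(self, name: str):
        self.name = name
        self.settings: dict = {}

    def handle(self, frame: UpdateSettings) -> UpdateSettings:
        # Apply when untargeted (assumed) or when this instance is the target.
        if frame.service is None or frame.service is self:
            self.settings.update(frame.settings)
        # The frame itself is forwarded unchanged either way.
        return frame


primary, backup = SketchService("primary"), SketchService("backup")
frame = UpdateSettings(settings={"voice": "example-voice"}, service=backup)
for svc in (primary, backup):
    frame = svc.handle(frame)
print(primary.settings, backup.settings)
```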
Changed
-
Changed tool result JSON serialization to use
ensure_ascii=False, preserving UTF-8 characters instead of escaping them. This reduces context size and token usage for non-English languages. (PR #3457) -
OpenAIRealtimeSTTService's noise_reduction parameter is now part of OpenAIRealtimeSTTSettings, making it runtime-updatable via STTUpdateSettingsFrame. The direct noise_reduction init argument is deprecated as of 0.0.106. (PR #3991) -
Updated
sarvamai dependency from 0.1.26a2 (alpha) to 0.1.26 (stable release). (PR #3997) -
SimliVideoService now extends AIService instead of FrameProcessor, aligning it with the HeyGen and Tavus video services. It supports SimliVideoService.Settings(...) for configuration and uses start()/stop()/cancel() lifecycle methods. Existing constructor usage (api_key, face_id, etc.) remains unchanged. (PR #4001) -
Updated
pipecat-ai-small-webrtc-prebuilt to 2.4.0. (PR #4023) -
Nova Sonic assistant text transcripts are now delivered in real-time using speculative text events instead of delayed final text events. Previously, assistant text only arrived after all audio had finished playing, causing laggy transcripts in client UIs. Speculative text arrives before each audio chunk, providing text synchronized with what the bot is saying. This also simplifies the internal text handling by removing the interruption re-push hack and assistant text buffer. (PR #4042)
-
Updated
daily-python dependency to 0.25.0. (PR #4047) -
Added
enable_dialout parameter to configure() in pipecat.runner.daily to support dial-out rooms. Also narrowed misleading Optional type hints and deduplicated token expiry calculation. (PR #4048) -
Extended
ProcessFrameResult to stop strategies, allowing a stop strategy to short-circuit evaluation of subsequent strategies by returning STOP. (PR #4064) -
GradiumSTTService now takes both an encoding and a sample_rate constructor argument, which are assembled in the class to form the input_format. PCM accepts 8000, 16000, and 24000 Hz sample rates. (PR #4066) -
Improved
GradiumSTTService transcription accuracy by reworking how text fragments are accumulated and finalized. Previously, trailing words could be dropped when the server's flushed response arrived before all text tokens were delivered. The service now uses a short aggregation delay after flush to capture trailing tokens, producing complete utterances. (PR #4066)
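The ensure_ascii=False change for tool results (PR #3457) is easy to see with the standard json module: by default non-ASCII characters are written as \uXXXX escapes, which lengthens the serialized string. The sample payload below is made up for illustration.

```python
import json

tool_result = {"city": "München", "greeting": "こんにちは"}

escaped = json.dumps(tool_result)  # default: non-ASCII becomes \uXXXX escapes
readable = json.dumps(tool_result, ensure_ascii=False)  # UTF-8 characters preserved

print(escaped)
print(readable)
print(len(escaped), len(readable))
```

Both strings decode to the same object; only the serialized size differs.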
Deprecated
-
SimliVideoService.InputParams is deprecated. Use the direct constructor parameters max_session_length, max_idle_time, and enable_logging instead. (PR #4001) -
Deprecated
LocalSmartTurnAnalyzerV2 and LocalCoreMLSmartTurnAnalyzer. Use LocalSmartTurnAnalyzerV3 instead. Instantiating these analyzers will now emit a DeprecationWarning. (PR #4012) -
Deprecated
WakeCheckFilter in favor of WakePhraseUserTurnStartStrategy. (PR #4064)
Fixed
-
Fixed an issue where the default model for
OpenAILLMService and AzureLLMService was mistakenly reverted to gpt-4o. The defaults are now restored to gpt-4.1. (PR #4000) -
Fixed a race condition where
EndTaskFrame could cause the pipeline to shut down before in-flight frames (e.g. LLM function call responses) finished processing. EndTaskFrame and StopTaskFrame now flow through the pipeline as ControlFrames, ensuring all pending work is flushed before shutdown begins. CancelTaskFrame and InterruptionTaskFrame remain immediate (SystemFrame). (PR #4006) -
Fixed
ParallelPipeline dropping or misordering frames during lifecycle synchronization. Buffered frames are now flushed in the correct order relative to synchronization frames (StartFrame goes first, EndFrame/CancelFrame go after), and frames added to the buffer during flush are also drained. (PR #4007) -
Fixed
TTSService potentially canceling in-flight audio during shutdown. The stop sequence now waits for all queued audio contexts to finish processing before canceling the stop frame task. (PR #4007) -
Fixed
Language enum values (e.g. Language.ES) not being converted to service-specific codes when passed via settings=Service.Settings(language=Language.ES) at init time. This caused API errors (e.g. 400 from Rime) because the raw enum was sent instead of the expected language code (e.g. "spa"). Runtime updates via UpdateSettingsFrame were unaffected. The fix centralizes conversion in the base TTSService and STTService classes so all services handle this consistently. (PR #4024) -
Fixed
DeepgramSTTService ignoring the base_url scheme when using ws:// or http://. Previously these were silently overwritten with wss:// or https://, breaking air-gapped or private deployments that don't use TLS. All scheme choices (wss://, https://, ws://, http://, or bare hostname) are now respected. (PR #4026) -
Fixed
LLMSwitcher.register_function() and register_direct_function() not accepting or forwarding the timeout_secs parameter. (PR #4037) -
Fixed empty user transcriptions in Nova Sonic causing spurious interruptions. Previously, an empty transcription could trigger an interruption of the assistant's response even though the user hadn't actually spoken. (PR #4042)
-
Fixed
SonioxSTTService and OpenAIRealtimeSTTService crash when language parameters contain plain strings instead of Language enum values. (PR #4046) -
Fixed premature user turn stops caused by late transcriptions arriving between turns. A stale transcript from the previous turn could persist into the next turn and trigger a stop before the current turn's real transcript arrived. Stop strategies are now reset at both turn start and turn stop to prevent state from leaking across turn boundaries. (PR #4057)
-
Fixed raw language strings like
"de-DE" silently failing when passed to TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go through the same Language enum resolution as enum values, so regional codes like "de-DE" are properly converted to service-expected formats like "de". Unrecognized strings log a warning instead of failing silently. (PR #4058) -
Fixed Deepgram STT list-type settings (
keyterm, keywords, search, redact, replace) being stringified instead of passed as lists to the SDK, which caused them to be sent as literal strings (e.g. "['pipecat']") in the WebSocket query params. (PR #4063) -
Fixed
MinWordsUserTurnStartStrategy including text below the word threshold in the output by resetting aggregation when the minimum word count is not met. (PR #4064) -
Fixed audio overlap and potential dropped TTS content when multiple assistant turns occur in quick succession.
TTSService now flushes remaining text before pausing frame processing on LLMFullResponseEndFrame/EndFrame, instead of pausing first. (PR #4071)
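The Deepgram list-settings bug (PR #4063) is the usual result of calling str() on a list before it reaches a query string. The sketch below reproduces the effect with urllib.parse only; how the Deepgram SDK itself serializes lists is not shown here, and the repeated-key encoding is an assumption used to contrast the two outcomes.

```python
from urllib.parse import urlencode

settings = {"keyterm": ["pipecat", "daily"]}

# Buggy shape: the list is stringified, so its Python repr ends up on the wire.
stringified = urlencode({key: str(value) for key, value in settings.items()})

# Intended shape: each list element becomes its own query parameter.
as_lists = urlencode(settings, doseq=True)

print(stringified)
print(as_lists)
```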
Security
- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the
livekit extra to address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted unknown crit header extensions. (PR #4035)
[0.0.105] - 2026-03-10
Added
-
Added concurrent audio context support:
CartesiaTTSService can now synthesize the next sentence while the previous one is still playing, by setting pause_frame_processing=False and routing each sentence through its own audio context queue. (PR #3804) -
Added custom video track support to Daily transport. Use
video_out_destinations in DailyParams to publish multiple video tracks simultaneously, mirroring the existing audio_out_destinations feature. (PR #3831) -
Added
ServiceSwitcherStrategyFailover that automatically switches to the next service when the active service reports a non-fatal error. Recovery policies can be implemented via the on_service_switched event handler. (PR #3861) -
Added optional
timeout_secs parameter to register_function() and register_direct_function() for per-tool function call timeout control, overriding the global function_call_timeout_secs default. (PR #3915) -
Added
cloud-audio-only recording option to Daily transport's enable_recording property. (PR #3916) -
Wired up
system_instruction in BaseOpenAILLMService, AnthropicLLMService, and AWSBedrockLLMService so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single LLMContext across multiple LLM services, where each service provides its own system instruction independently.

```python
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    system_instruction="You are a helpful assistant.",
)

context = LLMContext()

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    context.add_message({"role": "user", "content": "Please introduce yourself."})
    await task.queue_frames([LLMRunFrame()])
```

(PR #3918)
-
Added
vad_threshold parameter to AssemblyAIConnectionParams for configuring voice activity detection sensitivity in U3 Pro. Aligning this with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone" where AssemblyAI transcribes speech that VAD hasn't detected yet. (PR #3927) -
Added
push_empty_transcripts parameter to BaseWhisperSTTService and OpenAISTTService to allow empty transcripts to be pushed downstream as TranscriptionFrame instead of discarding them (the default behavior). This is intended for situations where VAD fires even though the user did not speak. In these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking, instead of waiting longer for a transcription. (PR #3930) -
LLM services (
BaseOpenAILLMService, AnthropicLLMService, AWSBedrockLLMService) now log a warning when both system_instruction and a system message in the context are set. The constructor's system_instruction takes precedence. (PR #3932) -
Runtime settings updates (via
STTUpdateSettingsFrame) now work for AWS Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and Soniox STT services. Previously, changing settings at runtime only stored the new values without reconnecting. (PR #3946) -
Exposed
on_summary_applied event on LLMAssistantAggregator, allowing users to listen for context summarization events without accessing private members. (PR #3947) -
Deepgram Flux STT settings (
keyterm, eot_threshold, eager_eot_threshold, eot_timeout_ms) can now be updated mid-stream via STTUpdateSettingsFrame without triggering a reconnect. The new values are sent to Deepgram as a Configure WebSocket message on the existing connection. (PR #3953) -
Added
system_instruction parameter to run_inference across all LLM services, allowing callers to override the system prompt for one-shot inference calls. Used by _generate_summary to pass the summarization prompt cleanly. (PR #3968)
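The per-tool timeout_secs override from PR #3915 follows a common pattern: use the per-call value when given, otherwise fall back to the global default. The sketch below shows that pattern with asyncio.wait_for; call_with_timeout and the constant are hypothetical stand-ins and do not reflect how Pipecat dispatches function calls internally.

```python
import asyncio

FUNCTION_CALL_TIMEOUT_SECS = 10.0  # stand-in for the global default


async def call_with_timeout(handler, timeout_secs=None):
    """Run a tool handler, preferring the per-tool timeout over the global one."""
    timeout = timeout_secs if timeout_secs is not None else FUNCTION_CALL_TIMEOUT_SECS
    try:
        return await asyncio.wait_for(handler(), timeout=timeout)
    except asyncio.TimeoutError:
        # A timed-out tool yields no result in this sketch.
        return None


async def slow_tool():
    await asyncio.sleep(0.2)
    return "late"


async def fast_tool():
    return "ok"


print(asyncio.run(call_with_timeout(fast_tool)))
print(asyncio.run(call_with_timeout(slow_tool, timeout_secs=0.01)))
```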
Changed
-
Audio context management (previously in
AudioContextTTSService) is now built into TTSService. All WebSocket providers (cartesia, elevenlabs, asyncai, inworld, rime, gradium, resembleai) now inherit from WebsocketTTSService directly. Word-timestamp baseline is set automatically on the first audio chunk of each context instead of requiring each provider to call start_word_timestamps() in their receive loop. (PR #3804) -
Daily transport now uses
CustomVideoSource/CustomVideoTrack instead of VirtualCameraDevice for the default camera output, mirroring how audio already works with CustomAudioSource/CustomAudioTrack. (PR #3831) -
⚠️ Updated
DeepgramSTTService to use deepgram-sdk v6. The LiveOptions class was removed from the SDK and is now provided by pipecat directly; import it from pipecat.services.deepgram.stt instead of deepgram. (PR #3848) -
ServiceSwitcherStrategy base class now provides a handle_error() hook for subclasses to implement error-based switching. ServiceSwitcher defaults to ServiceSwitcherStrategyManual and strategy_type is now optional. (PR #3861) -
Support for Voice Focus 2.0 models.
  - Updated aic-sdk to ~=2.1.0 to support Voice Focus 2.0 models.
  - Cleaned unused ParameterFixedError exception handling in AICFilter parameter setup. (PR #3889)
-
max_context_tokens and max_unsummarized_messages in LLMAutoContextSummarizationConfig (and deprecated LLMContextSummarizationConfig) can now be set to None independently to disable that summarization threshold. At least one must remain set. (PR #3914) -
⚠️ Removed
formatted_finals and word_finalization_max_wait_time from AssemblyAIConnectionParams as these were v2 API parameters not supported in v3. Clarified that format_turns only applies to Universal-Streaming models; U3 Pro has automatic formatting built-in. (PR #3927) -
Changed
DeepgramTTSService to send a Clear message on interruption instead of disconnecting and reconnecting the WebSocket, allowing the connection to persist throughout the session. (PR #3958) -
Re-added
enhancement_level support to AICFilter with runtime FilterEnableFrame control, applying ProcessorParameter.Bypass and ProcessorParameter.EnhancementLevel together. (PR #3961) -
Updated
daily-python dependency from ~=0.23.0 to ~=0.24.0. (PR #3970) -
Updated
FishAudioTTSService default model from s1 to s2-pro, matching Fish Audio's latest recommended model for improved quality and speed. (PR #3973) -
AzureSTTService region parameter is now optional when private_endpoint is provided. A ValueError is raised if neither is given, and a warning is logged if both are provided (private_endpoint takes priority). (PR #3974)
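The AzureSTTService rules from PR #3974 (one of region or private_endpoint required, private_endpoint wins when both are given) can be sketched as a small validation helper. resolve_speech_config_kwargs is a hypothetical function, and the returned keys are illustrative; no Azure SDK call is made here.

```python
import logging

logger = logging.getLogger("azure_stt_sketch")


def resolve_speech_config_kwargs(region=None, private_endpoint=None):
    """Pick exactly one of region/endpoint to pass on to the speech config."""
    if not region and not private_endpoint:
        raise ValueError("Either region or private_endpoint must be provided")
    if region and private_endpoint:
        logger.warning("Both region and private_endpoint provided; using private_endpoint")
    if private_endpoint:
        # Passed alone: region and endpoint are never sent together.
        return {"endpoint": private_endpoint}
    return {"region": region}


print(resolve_speech_config_kwargs(region="eastus"))
print(resolve_speech_config_kwargs(region="eastus", private_endpoint="wss://example.invalid/stt"))
```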
Deprecated
-
Deprecated AudioContextTTSService and AudioContextWordTTSService. Subclass WebsocketTTSService directly instead; audio context management is now part of the base TTSService.
  - Deprecated WordTTSService, WebsocketWordTTSService, and InterruptibleWordTTSService. Word timestamp logic is now always active in TTSService and no longer needs to be opted into via a subclass. (PR #3804)
-
Deprecated
pipecat.services.google.llm_vertex, pipecat.services.google.llm_openai, and pipecat.services.google.gemini_live.llm_vertex modules. Use pipecat.services.google.vertex.llm, pipecat.services.google.openai.llm, and pipecat.services.google.gemini_live.vertex.llm instead. The old import paths still work but will emit a DeprecationWarning. (PR #3980)
Removed
- ⚠️ Removed
supports_word_timestamps parameter from TTSService.__init__(). Word timestamp logic is now always active. Remove this argument from any custom subclass super().__init__() calls. (PR #3804)
Fixed
-
Fixed
DeepgramSTTService keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit KeepAlive messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout. (PR #3848) -
Fixed
"BufferError: Existing exports of data: object cannot be re-sized" in AICFilter caused by holding a memoryview on the mutable audio buffer across async yield points. (PR #3889) -
Fixed TTS context not being appended to the assistant message history when using
TTSSpeakFrame with append_to_context=True with some TTS providers. (PR #3936) -
Fixed context summarization leaving orphaned tool responses in the kept context when tool calls were moved to the summarized portion. (PR #3937)
-
Fixed turn completion state not resetting at end of LLM responses.
LLMFullResponseEndFrame is pushed (not received) by the LLM service, so the mixin now handles it in push_frame instead of process_frame. (PR #3956) -
Fixed turn completion instructions being injected as a context system message instead of using
system_instruction. This caused warning spam whensystem_instructionwas also set and didn't persist across full context updates. (PR #3957) -
Fixed
TTSServiceaudio context queue getting blocked whenappend_to_audio_context()was called with aNonecontext ID, which prevented subsequent audio from being delivered. (PR #3958) -
Fixed
on_call_state_updatedevent handler in LiveKit transport receiving incorrect number of arguments due to redundantselfpassed to_call_event_handler. (PR #3959) -
Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services treating
conversation_already_has_active_responseas a fatal error. These services now log it as a non-fatal debug event when a response is already in progress. (PR #3960) -
Fixed
SmallWebRTCConnectionsilently discarding messages sent before the data channel is open by queuing them and flushing once the channel is ready. A bounded queue (MAX_MESSAGE_QUEUE_SIZE = 50) prevents unbounded memory growth, and a 10-second timeout after connection clears the queue and falls back to discard mode if the data channel never opens. (PR #3962) -
Fixed
AzureSTTServicefailing to initialize whenprivate_endpointis provided. The Azure Speech SDK'sSpeechConfigdoes not accept bothregionandendpointsimultaneously, so they are now passed conditionally. (PR #3967) -
Fixed
GoogleLLMServiceignoring thesystem_instructionset via constructor orGoogleLLMSettingswhen a system message was also present in the context. The settings value now correctly takes priority, and a warning is logged when both are set. (PR #3976)
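The BufferError named in the AICFilter fix above is standard Python behavior: a bytearray cannot be resized while a memoryview still exports it. The sketch below reproduces the error and shows one way around it (copy the bytes out and release the view before the buffer changes). The function names are illustrative and are not taken from AICFilter.

```python
def resize_while_exported() -> str:
    """Try to grow a bytearray while a memoryview still points at it."""
    buf = bytearray(8)
    view = memoryview(buf)
    try:
        buf.extend(b"\x01\x02")  # resize attempt while the view is alive
    except BufferError as exc:
        return str(exc)
    finally:
        view.release()
    return ""


def resize_after_copy() -> int:
    """Copy what is needed, release the view, then resize safely."""
    buf = bytearray(8)
    with memoryview(buf) as view:
        chunk = bytes(view[:4])  # independent copy of the first 4 bytes
    # The view is released here, so the buffer may change size again.
    buf.extend(chunk)
    return len(buf)
```

In async code the same idea applies: avoid keeping the view alive across an `await`, because other tasks may append to the buffer in the meantime.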
Other
-
Updated foundational examples to use
system_instructionon LLM services instead of adding system messages toLLMContext. (PR #3918) -
Updated AssemblyAI turn detection example to use
keyterms_promptlist format instead ofpromptstring for improved clarity. (PR #3929) -
Updated foundational examples and eval scripts to use
"user"role instead of"system"when adding messages toLLMContext, since system prompts should be set viasystem_instructionon the LLM service. (PR #3931)
[0.0.104] - 2026-03-02
Added
-
Added
TextAggregationMetricsDatametric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline. (PR #3696) -
Added support for using strongly-typed objects instead of dicts for updating service settings at runtime.
Instead of, say:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.ES})
)
```

you'd do:

```python
await task.queue_frame(
    STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
)
```

Each service now vends strongly-typed classes like DeepgramSTTSettings representing the service's runtime-updatable settings. (PR #3714)
-
Added support for specifying private endpoints for Azure Speech-to-Text, enabling use in private networks behind firewalls. (PR #3764)
-
Added
LemonSliceTransportandLemonSliceApito support adding real-time LemonSlice Avatars to any Daily room. (PR #3791) -
Added
output_mediumparameter toAgentInputParamsandOneShotInputParamsin Ultravox service to control initial output medium (text or voice) at call creation time. (PR #3806) -
Added
TurnMetricsDataas a generic metrics class for turn detection, with e2e processing time measurement.KrispVivaTurnnow emitsTurnMetricsDatawithe2e_processing_time_mstracking the interval from VAD speech-to-silence transition to turn completion. (PR #3809) -
Added
on_audio_context_interrupted()andon_audio_context_completed()callbacks toAudioContextTTSService. Subclasses can override these to perform provider-specific cleanup instead of overriding_handle_interruption(). (PR #3814) -
Added
on_summary_appliedevent toLLMContextSummarizerfor observability, providing message counts before and after context summarization. (PR #3855) -
Added
summary_message_templatetoLLMContextSummarizationConfigfor customizing how summaries are formatted when injected into context (e.g., wrapping in XML tags). (PR #3855) -
Added
summarization_timeouttoLLMContextSummarizationConfig(default 120s) to prevent hung LLM calls from permanently blocking future summarizations. (PR #3855) -
Added optional
llmfield toLLMContextSummarizationConfigfor routing summarization to a dedicated LLM service (e.g., a cheaper/faster model) instead of the pipeline's primary model. (PR #3855) -
Added AssemblyAI u3-rt-pro model support with built-in turn detection mode. (PR #3856)
-
Added
LLMSummarizeContextFrameto trigger on-demand context summarization from anywhere in the pipeline (e.g. a function call tool). Accepts an optionalconfig: LLMContextSummaryConfigto override summary generation settings per request. (PR #3863) -
Added
LLMContextSummaryConfig(summary generation params:target_context_tokens,min_messages_after_summary,summarization_prompt) andLLMAutoContextSummarizationConfig(auto-trigger thresholds:max_context_tokens,max_unsummarized_messages, plus a nestedsummary_config). These replace the monolithicLLMContextSummarizationConfig. (PR #3863) -
Added support for the
speed_alphaparameter to thearcanamodel inRimeTTSService. (PR #3873) -
Added
ClientConnectedFrame, a newSystemFramepushed by all transports (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen, Tavus) when a client connects. Enables observers to track transport readiness timing. (PR #3881) -
Added
StartupTimingObserverfor measuring how long each processor'sstart()method takes during pipeline startup. Also measures transport readiness — the time fromStartFrameto first client connection — via theon_transport_timing_reportevent. (PR #3881) -
Added
BotConnectedFramefor SFU transports andon_transport_timing_reportevent toStartupTimingObserverwith bot and client connection timing. (PR #3881) -
Added optional
directionparameter toPipelineTask.queue_frame()andPipelineTask.queue_frames(), allowing frames to be pushed upstream from the end of the pipeline. (PR #3883) -
Added
on_latency_breakdownevent toUserBotLatencyObserverproviding per-service TTFB, text aggregation, user turn duration, and function call latency metrics for each user-to-bot response cycle. (PR #3885) -
Added
on_first_bot_speech_latencyevent toUserBotLatencyObservermeasuring the time from client connection to first bot speech. Anon_latency_breakdownis also emitted for this first speech event. (PR #3885) -
Added
broadcast_interruption()toFrameProcessor. This method pushes anInterruptionFrameboth upstream and downstream directly from the calling processor, avoiding the round-trip through the pipeline task thatpush_interruption_task_frame_and_wait()required. (PR #3896)
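The split between LLMContextSummaryConfig and LLMAutoContextSummarizationConfig described above can be pictured with plain dataclasses. This is a sketch with stand-in classes, not the real Pipecat definitions: the field names come from the entry, while every default value and the `should_summarize` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SummaryConfigSketch:
    """Stand-in for the summary generation params (defaults are placeholders)."""
    target_context_tokens: int = 6000
    min_messages_after_summary: int = 4
    summarization_prompt: Optional[str] = None


@dataclass
class AutoSummarizationConfigSketch:
    """Stand-in for the auto-trigger thresholds plus a nested summary config."""
    max_context_tokens: int = 8000
    max_unsummarized_messages: int = 20
    summary_config: SummaryConfigSketch = field(default_factory=SummaryConfigSketch)


def should_summarize(
    cfg: AutoSummarizationConfigSketch, context_tokens: int, unsummarized: int
) -> bool:
    # Assumed rule: either threshold being reached triggers a summary.
    return (
        context_tokens >= cfg.max_context_tokens
        or unsummarized >= cfg.max_unsummarized_messages
    )
```

Keeping the generation params in their own object is what lets a single request override them without touching the trigger thresholds.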
Changed
-
Added
text_aggregation_modeparameter toTTSServiceand all TTS subclasses with a newTextAggregationModeenum (SENTENCE,TOKEN). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode. (PR #3696) -
⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (
TTSSettings,STTSettings,LLMSettings, and service-specific subclasses) instead of plain dicts. Each service's_settingsnow holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md. (PR #3714) -
Word timestamp support has been moved from
WordTTSServiceintoTTSServicevia a newsupports_word_timestampsparameter. Services that previously extendedWordTTSService,AudioContextWordTTSService, orWebsocketWordTTSServicenow passsupports_word_timestamps=Trueto their parent__init__instead. (PR #3786) -
Improved Ultravox TTFB measurement accuracy by using VAD speech end time instead of
UserStoppedSpeakingFrametiming. (PR #3806) -
Aligned
UltravoxRealtimeLLMServiceframe handling with OpenAI/Gemini realtime services: addedInterruptionFramehandling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities. (PR #3806) -
Updated
OpenAIRealtimeLLMServicedefault model togpt-realtime-1.5. (PR #3807) -
Added
api_keyparameter toKrispVivaSDKManager,KrispVivaTurn, andKrispVivaFilterfor Krisp SDK v1.6.1+ licensing. Falls back toKRISP_VIVA_API_KEYenvironment variable. (PR #3809) -
Bumped
nltkminimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability. (PR #3811) -
ServiceSettingsUpdateFrames are now UninterruptibleFrames. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use ServiceSettingsUpdateFrame directly; you use one of its subclasses:
  - LLMUpdateSettingsFrame
  - TTSUpdateSettingsFrame
  - STTUpdateSettingsFrame

  (PR #3819)
-
Updated context summarization to use
userrole instead ofassistantfor summary messages. (PR #3855) -
Renamed the AssemblyAISTTService parameter min_end_of_turn_silence_when_confident to min_turn_silence (the old name is still supported with a deprecation warning). (PR #3856)
-
⚠️ Renamed
LLMAssistantAggregatorParamsfields:enable_context_summarization→enable_auto_context_summarizationandcontext_summarization_config→auto_context_summarization_config(now acceptsLLMAutoContextSummarizationConfig). The old names still work with aDeprecationWarningfor one release cycle. (PR #3863) -
ElevenLabsRealtimeSTTServicenow setsTranscriptionFrame.finalizedtoTruewhen usingCommitStrategy.MANUAL. (PR #3865) -
Updated numba version pin from == to >=0.61.2 (PR #3868)
-
Updated tracing code to use
ServiceSettingsdataclass API (given_fields(), attribute access) instead of dict-style access (.items(),in, subscript). (PR #3879) -
⚠️ Removed
eventfield andcomplete()method fromInterruptionFrame. Removedeventfield fromInterruptionTaskFrame. These are no longer needed sincebroadcast_interruption()does not require a round-trip completion signal. (PR #3896) -
Moved
pipecat.services.deepgram.stt_sagemakerandpipecat.services.deepgram.tts_sagemakertopipecat.services.deepgram.sagemaker.sttandpipecat.services.deepgram.sagemaker.tts. The old import paths still work but emit aDeprecationWarning. (PR #3902)
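Several entries above keep an old name working while emitting a DeprecationWarning. One common way to do that for a renamed keyword argument is sketched below. `AggregatorParamsSketch` is a hypothetical stand-in, not the real LLMAssistantAggregatorParams; only the two field names are taken from the entry.

```python
import warnings
from typing import Optional


class AggregatorParamsSketch:
    """Stand-in showing a renamed keyword that still accepts the old name."""

    def __init__(
        self,
        enable_auto_context_summarization: bool = False,
        enable_context_summarization: Optional[bool] = None,  # deprecated alias
    ):
        if enable_context_summarization is not None:
            warnings.warn(
                "enable_context_summarization is deprecated; "
                "use enable_auto_context_summarization instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # The old name feeds the new field so existing callers keep working.
            enable_auto_context_summarization = enable_context_summarization
        self.enable_auto_context_summarization = enable_auto_context_summarization
```

Passing `stacklevel=2` makes the warning point at the caller's line rather than at the constructor.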
Deprecated
-
⚠️ Deprecated
aggregate_sentencesparameter onTTSServiceand all TTS subclasses. Usetext_aggregation_mode=TextAggregationMode.SENTENCEortext_aggregation_mode=TextAggregationMode.TOKENinstead. (PR #3696) -
Deprecated set_model(), set_voice(), and set_language() on AI services in favor of runtime updates via TTSUpdateSettingsFrame, STTUpdateSettingsFrame, and LLMUpdateSettingsFrame.

⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas previously only set_language() caused the service to actually react to the update (e.g. by reconnecting to a remote service so it can pick up the change), now all these methods do. This change was made as part of a refactor making them all work the same way under the hood. (PR #3714)
-
Dict-based *UpdateSettingsFrame(settings={...}) is deprecated in favor of passing typed settings delta objects with *UpdateSettingsFrame(delta={...}). (PR #3714)
-
Deprecated WordTTSService, WebsocketWordTTSService, AudioContextWordTTSService, and InterruptibleWordTTSService. Use their non-word counterparts with supports_word_timestamps=True instead:
  - WordTTSService → TTSService(supports_word_timestamps=True)
  - WebsocketWordTTSService → WebsocketTTSService(supports_word_timestamps=True)
  - AudioContextWordTTSService → AudioContextTTSService(supports_word_timestamps=True)
  - InterruptibleWordTTSService → InterruptibleTTSService(supports_word_timestamps=True)

  (PR #3786)
-
Deprecated
SmartTurnMetricsDatain favor ofTurnMetricsData.BaseSmartTurnnow emitsTurnMetricsDatadirectly. (PR #3809) -
Deprecated
LLMContextSummarizationConfig. UseLLMAutoContextSummarizationConfigwith a nestedLLMContextSummaryConfiginstead. The old class emits aDeprecationWarning. (PR #3863) -
Deprecated
push_interruption_task_frame_and_wait()inFrameProcessor. Usebroadcast_interruption()instead. The old method now delegates tobroadcast_interruption()and logs a deprecation warning. (PR #3896)
Removed
-
Removed the local-smart-turn-v3 optional extra from pyproject.toml. The transformers and onnxruntime packages are now always installed as core dependencies since they are required by the default turn stop strategy, TurnAnalyzerUserTurnStopStrategy, which uses LocalSmartTurnAnalyzerV3. (PR #3803)
-
⚠️ Removed
PlayHTTTSServiceandPlayHTHttpTTSService. PlayHT has been shut down and is no longer available. (PR #3838)
Fixed
-
Added
LLMSpecificMessagehandling inLLMContextSummarizationUtilto skip provider-specific messages during context summarization. (PR #3794) -
Treated
response_cancel_not_activeas a non-fatal error in realtime services (OpenAIRealtimeLLMService,GrokRealtimeLLMService,OpenAIRealtimeBetaLLMService) to prevent WebSocket disconnection when cancelling an inactive response. (PR #3795) -
Fixed Poetry compatibility by inlining
local-smart-turn-v3dependencies (transformers,onnxruntime) into core dependencies instead of using a self-referential extra. (PR #3803) -
Fixed
SentryMetricsmethod signatures to match updatedFrameProcessorMetricsbase class, resolvingTypeErrorwhen usingstart_time/end_timekeyword arguments. (PR #3808) -
Fixed STT TTFB metrics not being reported for
SonioxSTTServiceandAWSTranscribeSTTServicedue to missingcan_generate_metrics()override. (PR #3813) -
Fixed an issue where
AudioContextTTSService-based providers (AsyncAI, ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio contexts after normal speech completion, only on interruption. (PR #3814) -
Fixed STT TTFB metrics measuring timeout expiry time instead of actual transcript arrival time. (PR #3822)
-
Fixed
InterimTranscriptionFrameandTranslationFramebeing unintentionally pushed downstream inLLMUserAggregator. They are now consumed likeTranscriptionFrame. (PR #3825) -
Fixed misleading "Empty audio frame received for STT service" warnings when using audio filters (e.g.
RNNoiseFilter,KrispVivaFilter,AICFilter) that buffer audio internally. (PR #3828) -
Fixed issues with RimeNonJsonTTSService where trailing punctuation was sometimes vocalized. (PR #3837)
-
Fixed
TTSSpeakFramenot committing spoken text to the conversation context when used outside of an LLM response (e.g., bot greetings or injected speech). (PR #3845) -
Removed verbose per-chunk audio logging from
GenesysAudioHookSerializerthat flooded production logs. (PR #3850) -
Added a beta feature warning when using custom prompts with AssemblyAI. (PR #3856)
-
Fixed
LocalSmartTurnAnalyzerV3producing incorrect end-of-turn predictions at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic resampling to 16kHz before Whisper feature extraction. (PR #3857) -
Fixed
PipelineTaskdouble-insertingRTVIProcessorinto the frame chain when the user provides both anRTVIProcessorin the pipeline and a customRTVIObserversubclass in observers. (PR #3867) -
Fixed turn completion instructions being lost when
LLMMessagesUpdateFramereplaces the LLM context. Whenfilter_incomplete_user_turnsis enabled, the turn completion system message is now re-injected after context replacement. (PR #3888) -
Fixed Azure TTS and STT services silently swallowing cancellation errors (invalid API key, network failures, rate limiting) instead of propagating them as
ErrorFrames to the pipeline. (PR #3893)
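The LocalSmartTurnAnalyzerV3 fix above resamples audio to 16kHz before feature extraction. As a rough picture of what such a step does, here is a naive linear-interpolation resampler in pure Python. It is an illustrative stand-in only: the resampler Pipecat actually uses is not described in these notes, and `resample_linear` is a hypothetical name.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive resampling by linear interpolation between neighboring samples."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, 8kHz telephony audio passed through `resample_linear(samples, 8000, 16000)` comes out with twice as many samples, which is the rate the entry says the model expects. Production code would normally use a proper filter-based resampler instead.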
Performance
- Switched
GradiumTTSServicefromInterruptibleWordTTSServicetoAudioContextWordTTSService, eliminating websocket disconnect/reconnect on every interruption by usingclient_req_id-based multiplexing. (PR #3759)
Other
- Standardized Sarvam STT/TTS User-Agent header handling to consistently send Pipecat SDK identity in websocket requests. (PR #3886)
[0.0.103] - 2026-02-20
Added
-
Added "timestampTransportStrategy": "ASYNC" to InworldAITTSService. This allows timestamp info to trail the arrival of audio chunks, resulting in much better first-audio-chunk latency. (PR #3625)
-
Added model-specific
InputParamstoRimeTTSService: arcana params (repetition_penalty,temperature,top_p) and mistv2 params (no_text_normalization,save_oovs,segment). Model, voice, and param changes now trigger WebSocket reconnection. (PR #3642) -
Added
write_transport_frame()hook toBaseOutputTransportallowing transport subclasses to handle custom frame types that flow through the audio queue. (PR #3719) -
Added
DailySIPTransferFrameandDailySIPReferFrameto the Daily transport. These frames queue SIP transfer and SIP REFER operations with audio, so the operation executes only after the bot finishes its current utterance. (PR #3719) -
Added keepalive support to
SarvamSTTServiceto prevent idle connection timeouts (e.g. when used behind aServiceSwitcher). (PR #3730) -
Added
UserIdleTimeoutUpdateFrameto enable or disable user idle detection at runtime by updating the timeout dynamically. (PR #3748) -
Added
broadcast_sibling_idfield to the baseFrameclass. This field is automatically set bybroadcast_frame()andbroadcast_frame_instance()to the ID of the paired frame pushed in the opposite direction, allowing receivers to identify broadcast pairs. (PR #3774) -
Added
ignored_sourcesparameter toRTVIObserverParamsandadd_ignored_source()/remove_ignored_source()methods toRTVIObserverto suppress RTVI messages from specific pipeline processors (e.g. a silent evaluation LLM). (PR #3779) -
Added
DeepgramSageMakerTTSServicefor running Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling, and per-turn TTFB metrics. (PR #3785)
Changed
-
⚠️
RimeTTSServicenow defaults tomodel="arcana"and thewss://users-ws.rime.ai/ws3endpoint.InputParamsdefaults changed from mistv2-specific values toNone— only explicitly-set params are sent as query params. (PR #3642) -
AICFilter now shares read-only AIC models via a singleton AICModelManager in aic_filter.py.
  - Multiple filters using the same model path or (model_id, model_download_dir) share one loaded model, with reference counting and concurrent load deduplication.
  - Model file I/O runs off the event loop so the filter does not block. (PR #3684)
-
Added
X-User-AgentandX-Request-Idheaders toInworldTTSServicefor better traceability. (PR #3706) -
DailyUpdateRemoteParticipantsFrameis no longer deprecated and is now queued with audio like other transport frames. (PR #3719) -
Bumped Pillow dependency upper bound from
<12to<13to allow Pillow 12.x. (PR #3728) -
Moved STT keepalive mechanism from
WebsocketSTTServiceto theSTTServicebase class, allowing any STT service (not just websocket-based ones) to use idle-connection keepalive via thekeepalive_timeoutandkeepalive_intervalparameters. (PR #3730) -
Improved audio context management in AudioContextTTSService by moving context ID tracking to the base class and adding reuse_context_id_within_turn parameter to control concurrent TTS request handling.
  - Added helper methods: has_active_audio_context(), get_active_audio_context_id(), remove_active_audio_context(), reset_active_audio_context()
  - Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS implementations by removing duplicate context management code (PR #3732)
-
UserIdleControlleris now always created with a default timeout of 0 (disabled). Theuser_idle_timeoutparameter changed fromOptional[float] = Nonetofloat = 0inUserTurnProcessor,LLMUserAggregatorParams, andUserIdleController. (PR #3748) -
Changed the version specifier from >=0.2.8 to ~=0.2.8 for the speechmatics-voice package to ensure compatibility with future patch versions. (PR #3761)
-
Updated
InworldTTSServiceandInworldHttpTTSServiceto useASYNCtimestamp transport strategy by default (PR #3765) -
Added
start_timeandend_timeparameters tostart_ttfb_metrics(),stop_ttfb_metrics(),start_processing_metrics(), andstop_processing_metrics()inFrameProcessorandFrameProcessorMetrics, allowing custom timestamps for metrics measurement.STTServicenow uses these instead of custom TTFB tracking. (PR #3776) -
Updated default Anthropic model from
claude-sonnet-4-5-20250929toclaude-sonnet-4-6. (PR #3792)
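The AICFilter entry above mentions reference counting, concurrent load deduplication, and loading off the event loop. A minimal sketch of that pattern follows. `SharedModelManager`, `fake_load`, and `demo` are hypothetical names and the real AICModelManager may work differently; the point is only that callers asking for the same key share one in-flight load.

```python
import asyncio
from typing import Any, Callable, Dict, List


class SharedModelManager:
    """Stand-in: one load per key, shared by every caller, with refcounts."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader                      # blocking load function
        self._tasks: Dict[str, asyncio.Task] = {}  # key -> in-flight or finished load
        self._refs: Dict[str, int] = {}            # key -> number of users

    async def acquire(self, key: str) -> Any:
        self._refs[key] = self._refs.get(key, 0) + 1
        task = self._tasks.get(key)
        if task is None:
            # The first caller starts the load in a worker thread; later
            # callers await the same task instead of loading again.
            task = asyncio.create_task(asyncio.to_thread(self._loader, key))
            self._tasks[key] = task
        return await task

    def release(self, key: str) -> None:
        self._refs[key] -= 1
        if self._refs[key] == 0:
            # Last user gone: drop the cached model.
            del self._refs[key]
            del self._tasks[key]


load_calls: List[str] = []


def fake_load(path: str) -> dict:
    load_calls.append(path)  # record how many real loads happen
    return {"path": path}


async def demo():
    manager = SharedModelManager(fake_load)
    first, second = await asyncio.gather(
        manager.acquire("model.bin"), manager.acquire("model.bin")
    )
    return manager, first, second
```

This sketch does not handle a failed load (the failed task would stay cached), which a real implementation would need to address.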
Deprecated
- Deprecated unused
Traceable,@traceable,@traced, andAttachmentStrategyinpipecat.utils.tracing.class_decorators. This module will be removed in a future release. (PR #3733)
Fixed
-
Fixed a race condition where RTVIObserver could send messages before DailyTransport join completed. Outbound messages are now queued and delivered after the transport is ready. (PR #3615)
-
Fixed async generator cleanup in OpenAI LLM streaming to prevent
AttributeErrorwith uvloop on Python 3.12+ (MagicStack/uvloop#699). (PR #3698) -
Fixed
SmallWebRTCTransportinput audio resampling to properly handle all sample rates, including 8kHz audio. (PR #3713) -
Fixed a race condition in
RTVIObserverwhere bot output messages could be sent before the bot-started-speaking event. (PR #3718) -
Fixed Grok Realtime
session.updatedevent parsing failure caused by the API returning prefixed voice names (e.g."human_Ara"instead of"Ara"). (PR #3720) -
Fixed context ID reuse issue in
ElevenLabsTTSService,InworldTTSService,RimeTTSService,CartesiaTTSService,AsyncAITTSService, andPlayHTTTSService. Services now properly reuse the same context ID across multiplerun_tts()invocations within a single LLM turn, preventing context tracking issues and incorrect lifecycle signaling. (PR #3729) -
Fixed word timestamp interleaving issue in
ElevenLabsTTSServicewhen processing multiple sentences within a single LLM turn. (PR #3729) -
Fixed tracing service decorators executing the wrapped function twice when the function itself raised an exception (e.g., LLM rate limit, TTS timeout). (PR #3735)
-
Fixed
LLMUserAggregatorbroadcasting mute events beforeStartFramereaches downstream processors. (PR #3737) -
Fixed
UserIdleControllerfalse idle triggers caused by gaps between user and bot activity frames. The idle timer now starts only afterBotStoppedSpeakingFrameand is suppressed during active user turns and function calls. (PR #3744) -
Fixed incorrect
sample_rateassignment inTavusInputTransport._on_participant_audio_data(was usingaudio.audio_framesinstead ofaudio.sample_rate). (PR #3768) -
Fixed
RTVIObservernot processing upstream-only frames. Previously, all upstream frames were filtered out to avoid duplicate messages from broadcasted frames. Now only upstream copies of broadcasted frames are skipped. (PR #3774) -
Fixed mutable default arguments in
LLMContextAggregatorPair.__init__()that could cause shared state across instances. (PR #3782) -
Fixed
DeepgramSageMakerSTTServiceto properly track finalize lifecycle usingrequest_finalize()/confirm_finalize()and useis_final(instead ofis_final and speech_final) for final transcription detection, matchingDeepgramSTTServicebehavior. (PR #3784) -
Fixed a race condition in
AudioContextTTSServicewhere the audio context could time out between consecutive TTS requests within the same turn, causing audio to be discarded. (PR #3787) -
Fixed
push_interruption_task_frame_and_wait()hanging indefinitely when theInterruptionFramedoes not reach the pipeline sink within the timeout. Added atimeoutkeyword argument to customize the wait duration. (PR #3789)
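The mutable default argument fix for LLMContextAggregatorPair.__init__() listed above addresses a general Python pitfall: a default such as `{}` is created once and then shared by every call. The classes below are illustrative stand-ins, not the Pipecat code.

```python
from typing import Optional


class PairBuggy:
    def __init__(self, params: dict = {}):  # one dict shared by all instances
        self.params = params


class PairFixed:
    def __init__(self, params: Optional[dict] = None):
        # A fresh dict is created per instance when none is passed in.
        self.params = {} if params is None else params


a, b = PairBuggy(), PairBuggy()
a.params["muted"] = True   # also visible through b.params

c, d = PairFixed(), PairFixed()
c.params["muted"] = True   # d.params stays empty
```

Using `None` as the sentinel and building the object inside `__init__` is the usual remedy.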
[0.0.102] - 2026-02-10
Added
-
Added
ResembleAITTSServicefor text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback. (PR #3134) -
Added
UserBotLatencyObserverfor tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded asturn.user_bot_latency_secondsattributes on OpenTelemetry turn spans. (PR #3355) -
Added append_to_context parameter to TTSSpeakFrame for conditional LLM context addition.
  - Allows fine-grained control over whether text should be added to conversation context
  - Defaults to True to maintain backward compatibility (PR #3584)
-
Added TTS context tracking system with context_id field to trace audio generation through the pipeline.
  - TTSAudioRawFrame, TTSStartedFrame, TTSStoppedFrame now include context_id
  - AggregatedTextFrame and TTSTextFrame now include context_id
  - Enables tracking which TTS request generated specific audio chunks (PR #3584)
-
Added support for Inworld TTS Websocket Auto Mode for improved latency (PR #3593)
-
Added new frames for context summarization:
LLMContextSummaryRequestFrameandLLMContextSummaryResultFrame. (PR #3621) -
Added context summarization feature to automatically compress conversation history when conversation length limits (by token or message count) are reached, enabling efficient long-running conversations.
  - Configure via enable_context_summarization=True in LLMAssistantAggregatorParams
  - Customize behavior with LLMContextSummarizationConfig (max tokens, thresholds, etc.)
  - Automatically preserves incomplete function call sequences during summarization
  - See new examples: examples/foundational/54-context-summarization-openai.py and examples/foundational/54a-context-summarization-google.py (PR #3621)
-
Added RTVI function call lifecycle events (
llm-function-call-started,llm-function-call-in-progress,llm-function-call-stopped) with configurable security levels viaRTVIObserverParams.function_call_report_level. Supports per-function control over what information is exposed (DISABLED,NONE,NAME, orFULL). (PR #3630) -
Added
RequestMetadataFrameand metadata handling forServiceSwitcherto ensure STT services correctly emitSTTMetadataFramewhen switching between services. Only the active service's metadata is propagated downstream, switching services triggers the newly active service to re-emit its metadata, and proper frame ordering is maintained at startup. (PR #3637) -
Added STTMetadataFrame to broadcast STT service latency information at pipeline start.
  - STT services broadcast P99 time-to-final-segment (ttfs_p99_latency) to downstream processors
  - Turn stop strategies automatically configure their STT timeout from this metadata
  - Developers can override ttfs_p99_latency via constructor argument for custom deployments
  - Added measured P99 values for STT providers.
  - See stt-benchmark to measure latency for your configuration (PR #3637)
-
Added support for
is_sandboxparameter inLiveAvatarNewSessionRequestto enable sandbox mode for HeyGen LiveAvatar sessions. (PR #3653) -
Added support for
video_settingsparameter inLiveAvatarNewSessionRequestto configure video encoding (H264/VP8) and quality levels. (PR #3653) -
Added
OpenAIRealtimeSTTServicefor real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection. (PR #3656) -
Added
bulbul:v3-betaTTS model support for Sarvam AI with temperature control and 25 new speaker voices. (PR #3671) -
Added
saaras:v3STT model support for Sarvam AI with newmodeparameter (transcribe, translate, verbatim, translit, codemix) and prompt support. (PR #3671) -
Added new OpenAI TTS voice options
marinandcedar. (PR #3682) -
Added
UserMuteStartedFrameandUserMuteStoppedFramesystem frames, and correspondinguser-mute-started/user-mute-stoppedRTVI messages, so clients can observe when mute strategies activate or deactivate. (PR #3687)
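The STTMetadataFrame entry above refers to a P99 time-to-final-segment value (ttfs_p99_latency). For readers measuring their own deployment, a P99 can be derived from collected samples with the standard library, as sketched below. How the published values or the stt-benchmark tool compute theirs is not stated in these notes, so treat `p99` and the sample data as assumptions.

```python
import statistics
from typing import Sequence


def p99(samples_ms: Sequence[float]) -> float:
    """99th percentile of latency samples (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    return statistics.quantiles(samples_ms, n=100)[98]


# Hypothetical measurements: 1 ms, 2 ms, ..., 1000 ms.
latencies_ms = [float(ms) for ms in range(1, 1001)]
ttfs_p99_ms = p99(latencies_ms)
```

A value obtained this way is the kind of number that could be supplied through the constructor override the entry describes.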
Changed
-
Updated all 30+ TTS service implementations to support context tracking with context_id.
  - Services now generate and propagate context IDs through TTS frames
  - Enables end-to-end tracing of TTS requests through the pipeline (PR #3584)
-
⚠️ TTSService.run_tts() now requires a context_id parameter for context tracking.
  - Custom TTS service implementations must update their run_tts() signature
  - Before: async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
  - After: async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]: (PR #3584)
-
Simplified context aggregators to use frame.append_to_context flag instead of tracking internal state.
  - Cleaner logic in LLMResponseAggregator and LLMResponseUniversalAggregator
  - More consistent behavior across aggregator implementations (PR #3584)
-
Updated timestamps to be cumulative within an agent turn, using flushCompleted message as an indication of when timestamps from the server are reset to 0 (PR #3593)
-
Changed
KokoroTTSServiceto usekokoro-onnxinstead ofkokoroas the underlying TTS engine. (PR #3612) -
Improved user turn stop timing in TranscriptionUserTurnStopStrategy and TurnAnalyzerUserTurnStopStrategy.
  - Timeout now starts on VADUserStoppedSpeakingFrame for tighter, more predictable timing
  - Added support for finalized transcripts (TranscriptionFrame.finalized=True) to trigger earlier
  - Added fallback timeout for edge cases where transcripts arrive without VAD events
  - Removed InterimTranscriptionFrame handling (no longer affects timing) (PR #3637)
-
Improved the accuracy of the
UserBotLatencyObserverandUserBotLatencyLogObserverby measuring from the time when the user actually starts speaking. (PR #3637) -
⚠️ Renamed
timeoutparameter touser_speech_timeoutinTranscriptionUserTurnStopStrategy. (PR #3637) -
Updated the
VADUserStartedSpeakingFrameto includestart_secsandtimestampandVADUserStoppedSpeakingFrameto includestop_secsandtimestamp, removing the need to separately handle theSpeechControlParamsFramefor VADParams values. (PR #3637) -
⚠️ Renamed
TranscriptionUserTurnStopStrategytoSpeechTimeoutUserTurnStopStrategy. The old name is deprecated and will be removed in a future release. (PR #3637) -
AssemblyAISTTServicenow automatically configures optimal settings for manual turn detection whenvad_force_turn_endpoint=True. This setsend_of_turn_confidence_threshold=1.0andmax_turn_silence=2000by default, which disables model-based turn detection and reduces latency by relying on external VAD for turn endpoints. Warnings are logged if conflicting settings are detected. (PR #3644) -
Upgraded the
pipecat-ai-small-webrtc-prebuiltpackage to v2.1.0. (PR #3652) -
Changed default session mode from "CUSTOM" to "LITE" in HeyGen LiveAvatar integration, with VP8 as the default video encoding. (PR #3653)
-
⚠️ The VADParams stop_secs default is changing from 0.8 seconds to 0.2 seconds. This change both simplifies the developer experience and improves the performance of STT services. With a shorter stop_secs value, STT services using a local VAD can finalize sooner, resulting in faster transcription.
  - SpeechTimeoutUserTurnStopStrategy: control how long to wait for additional user speech using user_speech_timeout (default: 0.6 sec).
  - TurnAnalyzerUserTurnStopStrategy: the turn analyzer automatically adjusts the user wait time based on the audio input. (PR #3659)
-
Moved interruption wait event from per-processor instance state to
InterruptionFrameitself. AddedInterruptionFrame.complete()to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume anInterruptionFramebefore it reaches the pipeline sink must callframe.complete()to avoid stallingpush_interruption_task_frame_and_wait(). A warning is logged if completion does not happen within 2 seconds. (PR #3660) -
Updated the default model to scribe_v2 for ElevenLabsSTTService. (PR #3664)
-
Changed the
DeepgramSTTServicedefault setting forsmart_formattoFalse, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well. (PR #3666) -
Changed
FunctionCallCancelFrameto broadcast in both directions for consistency with other function call frames. (PR #3672) -
Changed default user turn stop strategy from
TranscriptionUserTurnStopStrategytoTurnAnalyzerUserTurnStopStrategywithLocalSmartTurnAnalyzerV3. (PR #3689) -
Renamed
RequestMetadataFrametoServiceSwitcherRequestMetadataFrameand added aservicefield to target a specific service. The frame is now pushed downstream by services after handling instead of being silently consumed. (PR #3692) -
Updated SonioxSTTService to set vad_force_turn_endpoint to True. This setting disables the turn detection logic available natively in Soniox. Instead, Soniox relies on a local VAD to finalize the transcript. This configuration meaningfully reduces the time to final segment for Soniox. With this setting enabled, Soniox outputs a transcript in ~250ms (median). Pipecat enables smart-turn detection by default using LocalSmartTurnAnalyzerV3. To use the native turn detection logic in Soniox, set vad_force_turn_endpoint to False. (PR #3697)
-
Update
SonioxSTTServicedefault model tostt-rt-v4. (PR #3697) -
Updated the default model to
async_flash_v1.0and base URL tohttps://api.async.comforAsyncAITTSService. (PR #3701)
Deprecated
-
Deprecated
UserBotLatencyLogObserver. UseUserBotLatencyObserverdirectly with itson_latency_measuredevent handler instead. (PR #3355) -
Deprecated
RTVILLMFunctionCallMessage,RTVILLMFunctionCallMessageData, andRTVIProcessor.handle_function_call(). Use the newllm-function-call-in-progressevent sent automatically byRTVIObserverinstead. (PR #3630)
Removed
- ⚠️ Removed
timeoutparameter fromTurnAnalyzerUserTurnStopStrategy. The timeout is now managed internally based on STT latency. (PR #3637)
Fixed
-
Fixed pipeline freeze when
InterruptionFramediscardsEndFrameorStopFrameby making terminal frames uninterruptible. (PR #3542) -
Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets. (PR #3589)
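The general pattern behind this kind of fix is to close the stream in a `finally` block, so that cancellation and exceptions release the connection as well as normal completion. The following is a minimal, library-independent sketch: `FakeStream` and `consume` are hypothetical stand-ins, not Pipecat or OpenAI classes.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a streaming response that holds a socket."""

    def __init__(self):
        self.closed = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0.01)  # pretend to wait for the next chunk
        return "chunk"

    async def close(self):
        self.closed = True


async def consume(stream):
    try:
        async for _ in stream:
            pass
    finally:
        # Runs on normal exit, on exceptions, and on task cancellation.
        await stream.close()


async def demo():
    stream = FakeStream()
    task = asyncio.create_task(consume(stream))
    await asyncio.sleep(0.03)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return stream.closed
```

Here `asyncio.run(demo())` cancels the consumer mid-stream and reports whether the stream was still closed.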
-
Fixed
PipelineTaskadding duplicateRTVIProcessorandRTVIObserverwhen they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations. (PR #3610) -
Fixed function call timeout task not being cancelled when the handler completes without calling
result_callbackor is cancelled externally, which causedRuntimeWarning: coroutine was never awaited. (PR #3616) -
Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK languages, causing text to accumulate until flush instead of being split at sentence boundaries. Added fallback detection for unambiguous non-Latin sentence-ending punctuation (e.g.,
。,?,!). (PR #3617) -
Fixed
PipelineTaskto also callset_bot_ready()when an externalRTVIProcessoris provided. (PR #3623) -
Fixed
VADControllernot broadcastingSpeechControlParamsFrameon startup, which prevented STT services from receiving VAD params needed for TTFB measurement. (PR #3628) -
Fixed
StopAsyncIterationexceptions inparse_telephony_websocket()when WebSocket connections close before sending expected messages. (PR #3629) -
Fixed WebSocket transport error when broadcasting
InputTransportMessageFrameby correctly instantiating the frame with its message parameter. (PR #3635) -
Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing. (PR #3649)
-
Fixed
SambaNovaLLMServiceandGoogleLLMOpenAIBetaServicestreams not being closed on cancellation/exception, which could leak sockets. (PR #3663) -
Fixed an issue in
InworldTTSServicewhere punctuation was pronounced. Now, theInworldTTSServiceensures proper spacing between sentences, resolving pronunciation issues. (PR #3667) -
Fixed
ParallelPipelineallowing frames pushed by internal processors to escape during lifecycle frame (StartFrame/EndFrame/CancelFrame) synchronization. These frames are now buffered and flushed after all branches complete. (PR #3668) -
Fixed issues in Sarvam STT and TTS services: missing event handler registration for VAD signals,
Optional[bool]type annotations, WebSocket state cleanup on API errors, and TTS disconnect/reconnection state management. (PR #3671) -
Fixed
RTVIObserversending duplicate client messages for frames that are broadcast in both directions (e.g.UserStartedSpeakingFrame,FunctionCallResultFrame). (PR #3672) -
Fixed WebSocket STT services (ElevenLabs, Cartesia, Gladia, Soniox) disconnecting due to idle timeout when no audio is being sent (e.g. when inactive behind a
ServiceSwitcher).WebsocketSTTServicenow provides opt-in silence-based keepalive viakeepalive_timeoutandkeepalive_intervalparameters. (PR #3675)
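As a rough illustration of silence-based keepalive (not the WebsocketSTTService implementation): once no audio has been sent for keepalive_timeout seconds, a short chunk of silent audio can be sent periodically to keep the connection open. The helper names and the 16-bit PCM format below are assumptions made for the sketch.

```python
def silence_chunk(sample_rate: int, duration_ms: int, channels: int = 1) -> bytes:
    """Return 16-bit PCM silence of the given duration (assumed audio format)."""
    num_samples = sample_rate * duration_ms // 1000
    return b"\x00\x00" * num_samples * channels


def should_send_keepalive(now: float, last_audio_time: float, keepalive_timeout: float) -> bool:
    """True once no audio has been sent for at least keepalive_timeout seconds."""
    return now - last_audio_time >= keepalive_timeout
```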
[0.0.101] - 2026-01-30
Added
-
Additions for
AICFilterandAICVADAnalyzer:- Added model downloading support to
AICFilterwithmodel_idandmodel_download_dirparameters. - Added
model_pathparameter toAICFilterfor loading local.aicmodelfiles. - Added unit tests for
AICFilterandAICVADAnalyzer. (PR #3408)
-
Added handling for
server_content.interruptedsignal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline, e.g. local VAD + context aggregators. When there is already turn tracking in the pipeline, the additional interruption does no harm. (PR #3429) -
Added new
GenesysFrameSerializerfor the Genesys AudioHook WebSocket protocol, enabling bidirectional audio streaming between Pipecat pipelines and Genesys Cloud contact center. (PR #3500) -
Added
reached_upstream_typesandreached_downstream_typesread-only properties toPipelineTaskfor inspecting current frame filters. (PR #3510) -
Added
add_reached_upstream_filter()andadd_reached_downstream_filter()methods toPipelineTaskfor appending frame types. (PR #3510) -
Added
UserTurnCompletionLLMServiceMixinfor LLM services to detect and filter incomplete user turns. When enabled viafilter_incomplete_user_turnsinLLMUserAggregatorParams, the LLM outputs a turn completion marker at the start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete long). Incomplete turns are suppressed, and configurable timeouts automatically re-prompt the user. (PR #3518) -
Added
FrameProcessor.broadcast_frame_instance(frame)method to broadcast a frame instance by extracting its fields and creating new instances for each direction. (PR #3519) -
PipelineTasknow automatically addsRTVIProcessorand registersRTVIObserverwhenenable_rtvi=True(default), simplifying pipeline setup. (PR #3519) -
Added
RTVIProcessor.create_rtvi_observer()factory method for creating RTVI observers. (PR #3519) -
Added
video_out_codecparameter toTransportParamsallowing configuration of the preferred video codec (e.g.,"VP8","H264","H265") for video output inDailyTransport. (PR #3520) -
Added
locationparameter to Google TTS services (GoogleHttpTTSService,GoogleTTSService,GeminiTTSService) for regional endpoint support. (PR #3523) -
Added new
PIPECAT_SMART_TURN_LOG_DATAenvironment variable, which causes Smart Turn input data to be saved to disk (PR #3525) -
Added
result_callbackparameter toUserImageRequestFrameto support deferred function call results. (PR #3571) -
Added
function_call_timeout_secsparameter toLLMServiceto configure timeout for deferred function calls (defaults to 10.0 seconds). (PR #3571) -
Added
vad_analyzer parameter to LLMUserAggregatorParams. VAD analysis is now handled inside the LLMUserAggregator rather than in the transport, keeping voice activity detection closer to where it is consumed. The vad_analyzer on BaseInputTransport is now deprecated.

    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

(PR #3583)
-
Added
VADProcessorfor detecting speech in audio streams within a pipeline. PushesVADUserStartedSpeakingFrame,VADUserStoppedSpeakingFrame, andUserSpeakingFramedownstream based on VAD state changes. (PR #3583) -
Added
VADControllerfor managing voice activity detection state and emitting speech events independently of transport or pipeline processors. (PR #3583) -
Added local
PiperTTSServicefor offline text-to-speech using Piper voice models. The existing HTTP-based service has been renamed toPiperHttpTTSService. (PR #3585) -
main()inpipecat.runner.runnow accepts an optionalargparse.ArgumentParser, allowing bots to define custom CLI arguments accessible viarunner_args.cli_args. (PR #3590) -
Added
KokoroTTSServicefor local text-to-speech synthesis using the Kokoro-82M model. (PR #3595)
Changed
-
Updated
AICFilterandAICVADAnalyzerto use aic-sdk ~= 2.0.1. (PR #3408) -
Improved the STT TTFB (Time To First Byte) measurement, reporting the delay between when the user stops speaking and when the final transcription is received. Note: Unlike traditional TTFB which measures from a discrete request, STT services receive continuous audio input—so we measure from speech end to final transcript, which captures the latency that matters for voice AI applications. In support of this change, added
finalizedfield toTranscriptionFrameto indicate when a transcript is the final result for an utterance. (PR #3495) -
SarvamSTTServicenow defaultsvad_signalsandhigh_vad_sensitivitytoNone(omitted from connection parameters), improving latency by ~300ms compared to the previous defaults. (PR #3495) -
Changed frame filter storage from tuples to sets in
PipelineTask. (PR #3510) -
Changed default Inworld TTS model from
inworld-tts-1toinworld-tts-1.5-max. (PR #3531) -
FrameSerializernow subclasses fromBaseObjectto enable event support. (PR #3560) -
Added support for TTFS in
SpeechmaticsSTTServiceand set the default mode toEXTERNALto support Pipecat-controlled VAD.- Changed dependency to
speechmatics-voice[smart]>=0.2.8(PR #3562)
-
⚠️ Changed function call handling to use timeout-based completion instead of immediate callback execution.
- Function calls that defer their results (e.g.,
UserImageRequestFrame) now use a timeout mechanism - The
result_callbackis invoked automatically when the deferred operation completes or after timeout - This change affects examples using
UserImageRequestFrame- theresult_callbackshould now be passed to the frame instead of being called immediately (PR #3571)
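The timeout-based completion described above can be pictured with plain asyncio. This is a rough sketch, not Pipecat's implementation: `wait_for_deferred_result` is a hypothetical helper, and passing `None` to the callback on timeout is an assumption made for illustration.

```python
import asyncio


async def wait_for_deferred_result(deferred, result_callback, timeout_secs: float = 10.0):
    """Invoke result_callback when the deferred result arrives or the timeout expires."""
    try:
        result = await asyncio.wait_for(deferred, timeout=timeout_secs)
    except asyncio.TimeoutError:
        result = None  # assumption: report "no result" when the timeout is hit
    await result_callback(result)


async def demo():
    received = []

    async def result_callback(result):
        received.append(result)

    loop = asyncio.get_running_loop()

    # Case 1: the deferred operation completes before the timeout.
    completes = loop.create_future()
    loop.call_later(0.01, completes.set_result, "image")
    await wait_for_deferred_result(completes, result_callback, timeout_secs=1.0)

    # Case 2: the deferred operation never completes.
    never_completes = loop.create_future()
    await wait_for_deferred_result(never_completes, result_callback, timeout_secs=0.01)

    return received
```

Either way the callback runs exactly once per call, which is the property that keeps a function call from staying pending indefinitely.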
-
Pipecat runner now uses
DAILY_ROOM_URLinstead ofDAILY_SAMPLE_ROOM_URL. (PR #3582) -
Updates to
GradiumSTTService:
- Now flushes pending transcriptions when VAD detects the user stopped speaking, improving response latency.
- GradiumSTTService now supports InputParams for configuring language and delay_in_frames settings. (PR #3587)
Deprecated
- ⚠️ Deprecated
vad_analyzerparameter onBaseInputTransport. Passvad_analyzertoLLMUserAggregatorParamsinstead or useVADProcessorin the pipeline. (PR #3583)
Removed
- Removed deprecated
AICFilterparameters:enhancement_level,voice_gain,noise_gate_enable. (PR #3408)
Fixed
-
Fixed an issue where if you were using
OpenRouterLLMServicewith a Gemini model, it wouldn't handle multiple"system"messages as expected (and as we do inGoogleLLMService), which is to convert subsequent ones into"user"messages. Instead, the latest"system"message would overwrite the previous ones. (PR #3406) -
Transports now properly broadcast
InputTransportMessageFrameframes both upstream and downstream instead of only pushing downstream. (PR #3519) -
Fixed
FrameProcessor.broadcast_frame()to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances. (PR #3519) -
Fixed OpenAI LLM services to emit
ErrorFrameon completion timeout, enabling proper error handling and LLMSwitcher failover. (PR #3529) -
Fixed a logging issue where non-ASCII characters (e.g., Japanese or Chinese) were being unnecessarily escaped to Unicode sequences when a function call occurred. (PR #3536)
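One common source of this kind of escaping in Python is `json.dumps`, which defaults to `ensure_ascii=True`. Whether that is what this fix changed is an assumption; the sketch below only shows the general behavior.

```python
import json

arguments = {"city": "東京"}

# Default: non-ASCII characters are written as \uXXXX escape sequences.
escaped = json.dumps(arguments)

# With ensure_ascii=False the characters are kept as-is, which reads better in logs.
readable = json.dumps(arguments, ensure_ascii=False)
```

Both strings decode to the same data; only the log readability differs.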
-
Fixed how audio tracks are synchronized inside the
AudioBufferProcessorto fix timing issues where silence and audio were misaligned between user and bot buffers. (PR #3541) -
Fixed race condition in
OpenAIRealtimeBetaLLMServicethat could cause an error when truncating the conversation. (PR #3567) -
Fixed an infinite loop in
WebsocketServicethat blocked the event loop when a remote server closed the connection gracefully. (PR #3574) -
Fixed
LLMUserAggregatorandLLMAssistantAggregatornot emitting pending transcripts viaon_user_turn_stoppedandon_assistant_turn_stoppedevents when the conversation ends (EndFrame) or is cancelled (CancelFrame). (PR #3575) -
Added missing
LiveKitRunnerArgumentsandLiveKitTransportsupport in runner utilities to enable LiveKit transport configuration. (PR #3580) -
Fixed race condition in
OpenAIRealtimeLLMServicethat could cause an error when truncating the conversation. (PR #3581) -
Fixed
PiperHttpTTSService (old PiperTTSService) to resample audio output based on the model's sample rate parsed from the WAV header. (PR #3585) -
Fixed
UserTurnControllerto reset user turn timeout when interim transcriptions are received. (PR #3594) -
Fixed an issue in the
IVRNavigatorwhere theTextFrames pushed had incorrect spacing. Now, the internalIVRProcessorpushesAggregatedTextFrames when in conversation mode. This allows for controlling spacing of the outputted, aggregated text. (PR #3604) -
Fixed
GeminiLiveLLMServicetranscription timeout handler not being scheduled by yielding to the event loop after task creation. (PR #3605)
[0.0.100] - 2026-01-20
Added
-
Added Hathora service to support Hathora-hosted TTS and STT models (only non-streaming) (PR #3169)
-
Added
CambTTSService, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis. (PR #3349) -
Added the
additional_headersparam toWebsocketClientParams, allowingWebsocketClientTransportto send custom headers on connect, for cases such as authentication. (PR #3461) -
Added
UserIdleControllerfor detecting user idle state, integrated intoLLMUserAggregatorandUserTurnProcessorvia optionaluser_idle_timeoutparameter. Emitson_user_turn_idleevent for application-level handling. DeprecatedUserIdleProcessorin favor of the new compositional approach. (PR #3482) -
Added
on_user_mute_startedandon_user_mute_stoppedevent handlers toLLMUserAggregatorfor tracking user mute state changes. (PR #3490)
Changed
-
Enhanced interruption handling in
AsyncAITTSServiceby supporting multi-context WebSocket sessions for more robust context management. (PR #3287) -
Throttle
UserSpeakingFrameto broadcast at most every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech. (PR #3483)
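Throttling of this kind can be sketched as a small time gate. `Throttle` below is a hypothetical helper, not a Pipecat class; the injectable clock is only there to make the behavior easy to check.

```python
import time


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last = None

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False
```

With `Throttle(0.2)`, calling `ready()` on every audio chunk returns True at most once per 200ms, so only those chunks would trigger a broadcast.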
Deprecated
- Deprecated pipecat.turns.mute (introduced in Pipecat 0.0.99) in favor of pipecat.turns.user_mute, for consistency with other package names. (PR #3479)
Fixed
-
Corrected TTFB metric calculation in
AsyncAIHttpTTSService. (PR #3287) -
Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
- AWSNovaSonicLLMService
- GeminiLiveLLMService
- OpenAIRealtimeLLMService
- GrokRealtimeLLMService
The issue was that these services weren't pushing
LLMTextFrames. Now they do. (PR #3446) -
Fixed an issue where
on_user_turn_stop_timeoutcould fire while a user is talking when usingExternalUserTurnStrategies. (PR #3454) -
Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior. (PR #3455)
-
Fixed
MinWordsUserTurnStartStrategyto not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them. (PR #3462) -
Fixed an issue where Grok Realtime would error out when running with SmallWebRTC transport. (PR #3480)
-
Fixed a
Mem0MemoryServiceissue where passingasync_mode: truewas causing an error. See https://docs.mem0.ai/platform/features/async-mode-default-change. (PR #3484) -
Fixed
AWSNovaSonicLLMService.reset_conversation(), which would previously error out. Now it successfully reconnects and "rehydrates" from the context object. (PR #3486) -
Fixed
AzureTTSServicetranscript formatting issues:- Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters (PR #3489)
-
Fixed an issue where
UninterruptibleFrameframes would not be preserved in some cases. (PR #3494) -
Fixed memory leak in
LiveKitTransportwhenvideo_in_enabledisFalse. (PR #3499) -
Fixed an issue in
AIService where unhandled exceptions in start(), stop(), or cancel() implementations would prevent process_frame() from continuing, and therefore prevent StartFrame, EndFrame, or CancelFrame from being pushed downstream, causing the pipeline to not start or stop properly. (PR #3503) -
Moved
NVIDIATTSServiceandNVIDIASTTServiceclient initialization from constructor tostart()for better error handling. (PR #3504) -
Optimized
NVIDIATTSServiceto process incoming audio frames immediately. (PR #3509) -
Optimized
NVIDIASTTServiceby removing unnecessary queue and task. (PR #3509) -
Fixed a
CambTTSService issue where the client was initialized in the constructor, which prevented proper Pipeline error handling. (PR #3511)
[0.0.99] - 2026-01-13
Added
-
Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
A list of strategies can be specified for both strategies; strategies are evaluated in order until one evaluates to true.
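The "evaluated in order until one evaluates to true" rule can be sketched with plain callables. This is an illustration of the evaluation order only; `first_match` and the dictionary-shaped event are hypothetical and do not reflect the actual strategy interfaces.

```python
from typing import Callable, Iterable

Strategy = Callable[[dict], bool]


def first_match(strategies: Iterable[Strategy], event: dict) -> bool:
    """Evaluate strategies in order and stop at the first one that returns True."""
    for strategy in strategies:
        if strategy(event):
            return True
    return False
```

Because evaluation stops at the first match, the order of the list determines which strategy gets to decide.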
Available user turn start strategies:
- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy
Available user turn stop strategies:
- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy
The default strategies are:
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]
Turn strategies are configured when setting up
LLMContextAggregatorPair. For example:

    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[
                    TurnAnalyzerUserTurnStopStrategy(
                        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                    )
                ],
            )
        ),
    )

In order to use the user turn strategies you must update to the new universal
LLMContextandLLMContextAggregatorPair. (PR #3045) -
Added
RNNoiseFilterfor real-time noise suppression using RNNoise neural network via pyrnnoise library. (PR #3205) -
Added
GrokRealtimeLLMServicefor xAI's Grok Voice Agent API with real-time voice conversations:- Support for real-time audio streaming with WebSocket connection
- Built-in server-side VAD (Voice Activity Detection)
- Multiple voice options: Ara, Rex, Sal, Eve, Leo
- Built-in tools support: web_search, x_search, file_search
- Custom function calling with standard Pipecat tools schema
- Configurable audio formats (PCM at 8kHz-48kHz) (PR #3267)
-
Added an approximation of TTFB for Ultravox. (PR #3268)
-
Added a new
AudioContextTTSServiceto the TTS service base classes. TheAudioContextWordTTSServicenow inherits fromAudioContextTTSServiceandWebsocketWordTTSService. (PR #3289) -
LLMUserAggregator now exposes the following events:
- on_user_turn_started: triggered when a user turn starts
- on_user_turn_stopped: triggered when a user turn ends
- on_user_turn_stop_timeout: triggered when a user turn does not stop and times out (PR #3291)
-
Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.
In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.
A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.
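Unlike turn strategies, mute strategies are all evaluated for every frame, and the frame is muted if any of them says so. A minimal sketch of that rule, using a hypothetical `is_muted` helper and plain callables rather than the real strategy classes:

```python
from typing import Callable, Iterable

MuteStrategy = Callable[[str], bool]


def is_muted(strategies: Iterable[MuteStrategy], frame: str) -> bool:
    """Run every strategy (no short-circuit) and mute if any of them says so."""
    # Building the list first forces every strategy to run, so each one
    # can update its internal state even when an earlier one already muted.
    results = [strategy(frame) for strategy in strategies]
    return any(results)
```

Writing `any(strategy(frame) for strategy in strategies)` instead would stop at the first True and skip the remaining strategies, which is the behavior this rule avoids.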
Available user mute strategies:
- FirstSpeechUserMuteStrategy
- MuteUntilFirstBotCompleteUserMuteStrategy
- AlwaysUserMuteStrategy
- FunctionCallUserMuteStrategy
User mute strategies replace the legacy
STTMuteFilter and provide a more flexible and composable approach to muting user input.
User mute strategies are configured when setting up the
LLMContextAggregatorPair. For example:

    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_mute_strategies=[
                FirstSpeechUserMuteStrategy(),
            ]
        ),
    )

In order to use user mute strategies you should update to the new universal
LLMContextandLLMContextAggregatorPair. (PR #3292) -
Added
use_sslparameter toNvidiaSTTService,NvidiaSegmentedSTTServiceandNvidiaTTSService. (PR #3300) -
Added
enable_interruptionsconstructor argument to all user turn strategies. This tells theLLMUserAggregatorto push or not push anInterruptionFrame. (PR #3316) -
Added
split_sentencesparameter toSpeechmaticsSTTServiceto control sentence splitting behavior for finals on sentence boundaries. (PR #3328) -
Added word-level timestamp support to
AzureTTSServicefor accurate text-to-audio synchronization. (PR #3334) -
Added
pronunciation_dict_idparameter toCartesiaTTSService.InputParamsandCartesiaHttpTTSService.InputParamsto support Cartesia's pronunciation dictionary feature for custom pronunciations. (PR #3346) -
Added support for using the HeyGen LiveAvatar API with the
HeyGenTransport(see https://www.liveavatar.com/). (PR #3357) -
Added image support to
OpenAIRealtimeLLMServiceviaInputImageRawFrame:- New
start_video_pausedparameter to control initial video input state - New
video_frame_detail parameter to set image processing quality ("auto", "low", or "high"). This corresponds to OpenAI Realtime's image_detail parameter.
- set_video_input_paused() method to pause/resume video input at runtime
- set_video_frame_detail() method to adjust video frame quality dynamically
- Automatic rate limiting (1 frame per second) to prevent API overload (PR #3360)
-
Added
UserTurnProcessor, a frame processor built onUserTurnControllerthat pushesUserStartedSpeakingFrameandUserStoppedSpeakingFrameframes and interruptions based on the controller's user turn strategies. (PR #3372) -
Added
UserTurnControllerto manage user turns. It emitson_user_turn_started,on_user_turn_stopped, andon_user_turn_stop_timeoutevents, and can be integrated into processors to detect and handle user turns.LLMUserAggregatorandUserTurnProcessorare implemented using this controller. (PR #3372) -
Added
should_interruptproperty toDeepgramFluxSTTService,DeepgramSTTService, andSpeechmaticsSTTServiceto configure whether the bot should be interrupted when the external service detects user speech. (PR #3374) -
LLMAssistantAggregator now exposes the following events:
- on_assistant_turn_started: triggered when the assistant turn starts
- on_assistant_turn_stopped: triggered when the assistant turn ends
- on_assistant_thought: triggered when there's an assistant thought available (PR #3385)
-
Added
KrispVivaTurnanalyzer for end of turn detection using the Krisp VIVA SDK (requireskrisp_audio). (PR #3391) -
Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the
PIPECAT_SETUP_FILES environment variable. This variable should contain a colon-separated list of Python files (e.g. export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."). Each file must define a function with the following signature:

    async def setup_pipeline_task(task: PipelineTask):
        ...

(PR #3397)
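Reading a colon-separated variable like this one takes a single split. The helper below is a hypothetical sketch of that step; how Pipecat itself parses the variable is not stated here.

```python
import os


def setup_files_from_env(value=None):
    """Split a colon-separated PIPECAT_SETUP_FILES value into file paths."""
    if value is None:
        value = os.environ.get("PIPECAT_SETUP_FILES", "")
    # Drop empty entries so an unset variable or a trailing colon yields no paths.
    return [path for path in value.split(":") if path]
```

For example, `setup_files_from_env("setup1.py:setup2.py")` yields the two file names in order.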
-
Added a keepalive task for
InworldTTSServiceto keep the service connected in the event of no generations for longer periods of time. (PR #3403) -
Added
enable_vadtoParamsfor use in theGladiaSTTService. When enabled,GladiaSTTServiceacts as the turn controller, emittingUserStartedSpeakingFrame,UserStoppedSpeakingFrame, and optionallyInterruptionFrame. (PR #3404) -
Added
should_interruptproperty toGladiaSTTServiceto configure whether the bot should be interrupted when the external service detects user speech. (PR #3404) -
Added
VonageFrameSerializerfor the Vonage Video API Audio Connector WebSocket protocol. (PR #3410) -
Added
append_trailing_spaceparameter toTTSServiceto automatically append a trailing space to text before sending to TTS, helping prevent some services from vocalizing trailing punctuation. (PR #3424)
Changed
-
Updated
ElevenLabsRealtimeSTTService to accept the include_language_detection parameter to detect language.

    stt = ElevenLabsRealtimeSTTService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        include_language_detection=True
    )

(PR #3216)
-
Updated
SpeechmaticsSTTService to use the new Python Voice SDK, which adds improved VAD and Smart Turn capabilities and dramatically improves latency without any impact on accuracy. Use the turn_detection_mode parameter to control the endpointing of speech, with TurnDetectionMode.EXTERNAL (default), TurnDetectionMode.ADAPTIVE, or TurnDetectionMode.SMART_TURN.

    stt = SpeechmaticsSTTService(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        params=SpeechmaticsSTTService.InputParams(
            language=Language.EN,
            turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
            speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        ),
    )

(PR #3225)
-
daily-pythonupdated to 0.23.0. (PR #3257) -
TranscriptionFrameandInterimTranscriptionFrameproduced byDailyTransportnow include the transport source (i.e., the originating audio track). (PR #3257) -
Updates to Inworld TTS services:
- Improved
InworldTTSService's websocket implementation to flush and close context more reliably, improving handling of long inputs. - Improved docstrings for
InworldTTSServiceandInworldHttpTTSService. (PR #3288)
-
Improved the error handling and reconnection logic for
WebsocketServerby distinguishing between errors when disconnecting and websocket communication errors. (PR #3392) -
Updated
DeepgramSTTServiceto push user started/stopped speaking and interruption frames whenvad_enabledis set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames. (PR #3314) -
Added encoding validation to
DeepgramTTSServiceto prevent unsupported encodings from reaching the API. The service now raisesValueErrorat initialization with a clear error message. (PR #3329) -
Updated
read_audio_frame&read_video_framemethods inSmallWebRTCClientto check if the track is enabled before logging a warning. (PR #3336) -
Updated
CartesiaTTSServiceto support settinglanguage=None, resulting in Cartesia auto-detecting the language of the conversation. (PR #3366) -
The bundled Smart Turn weights are now updated to v3.2, which has better handling of short utterances, and is more robust against background noise. (PR #3367)
-
Updated
SpeechmaticsSTTServicedependency tospeechmatics-voice[smart]>=0.2.6(PR #3371) -
Smart Turn now takes into account
vad_start_secondswhen buffering audio, meaning that the start of the turn audio is not cut off. This improves accuracy for short utterances. -
The default value of
pre_speech_msis now set to 500ms for Smart Turn. (PR #3377) -
Improved Krisp SDK management to allow
KrispVivaTurnandKrispVivaFilterto share a single SDK instance within the same process. (PR #3391) -
Updated default model for
GroqTTSServicetocanopylabs/orpheus-v1-englishand voice ID toautumn. (PR #3399) -
Enhanced
FastAPIWebsocketTransportwith optional protocol-level audio packetization via thefixed_audio_packet_sizeparameter to support media endpoints requiring strict framing and real-time pacing. (PR #3410) -
DeepgramTTSServiceandRimeTTSServicenow setappend_trailing_spacetoTrueto prevent punctuation (e.g., “dot”) from being pronounced. (PR #3424) -
Updated
GeminiLiveLLMServiceto pushLLMThoughtStartFrame,LLMThoughtTextFrame, andLLMThoughtEndFramewhen the model returns thought content. (PR #3431)
Deprecated
-
pipecat.audio.interruptions.MinWordsInterruptionStrategyis deprecated. Usepipecat.turns.user_start.MinWordsUserTurnStartStrategywithLLMUserAggregator's newuser_turn_strategiesparameter instead. (PR #3045) -
FrameProcessor.interruption_strategiesis deprecated, useLLMUserAggregator's newuser_turn_strategiesparameter instead. (PR #3045) -
The
LLMUserAggregatorParamsandLLMAssistantAggregatorParamsclasses inpipecat.processors.aggregators.llm_responseare now deprecated. Use the new universalLLMContextandLLMContextAggregatorPairinstead. (PR #3045) -
Deprecated the
emulatedfield in theUserStartedSpeakingFrameandUserStoppedSpeakingFrameframes. (PR #3045) -
EmulateUserStartedSpeakingFrameandEmulateUserStoppedSpeakingFrameframes are deprecated. (PR #3045) -
⚠️
TransportParams.turn_analyzeris deprecated and might result in unexpected behavior, useLLMUserAggregator's newuser_turn_strategiesparameter instead. (PR #3045) -
For
SpeechmaticsSTTService, theend_of_utterance_modeparameter is deprecated. Use the newturn_detection_modeparameter instead, withTurnDetectionMode.EXTERNAL,TurnDetectionMode.ADAPTIVE, orTurnDetectionMode.SMART_TURN. Theenable_vadparameter is also deprecated and is inferred from theturn_detection_mode. (PR #3225) -
OpenAILLMContext and its associated things (context aggregators, etc.) are now deprecated in favor of the universal LLMContext and its associated things.
From the developer's point of view, switching to using LLMContext machinery will usually be a matter of going from this:

    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)

To this:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

(PR #3263)
-
STTMuteFilteris deprecated and will be removed in a future version. UseLLMUserAggregator's newuser_mute_strategiesinstead. (PR #3292) -
FrameProcessor.interruptions_allowedis now deprecated, useLLMUserAggregator's new parameteruser_mute_strategiesinstead. (PR #3297) -
PipelineParams.allow_interruptions is now deprecated, use LLMUserAggregator's new parameter user_turn_strategies instead. For example, to disable interruptions but still get user turns you can do:

    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
            ),
        ),
    )

(PR #3297)
-
TranscriptProcessorand related data classes and frames (TranscriptionMessage,ThoughtTranscriptionMessage,TranscriptionUpdateFrame) are deprecated. UseLLMUserAggregator's andLLMAssistantAggregator's new events (on_user_turn_stoppedandon_assistant_turn_stopped) instead. (PR #3385) -
Deprecated support for the
vad_eventsLiveOptionsinDeepgramSTTService. Instead, use a local Silero VAD for VAD events. Additionally, deprecatedshould_interruptwhich will be removed along withvad_eventssupport in a future release. (PR #3386) -
Loading external observers from files is deprecated, use the new pipeline task setup files and
PIPECAT_SETUP_FILESenvironment variable instead. (PR #3397)
Fixed
-
Improved error handling in
ElevenLabsRealtimeSTTService(PR #3233) -
Fixed an issue in
ElevenLabsRealtimeSTTServicecausing an infinite loop that blocks the process if the websocket disconnects due to an error (PR #3233) -
Fixed a bug in
STTMuteFilterwhere the user was not always muted during function calls, especially when there were multiple simultaneous calls. (PR #3292) -
Fixed a
RNNoiseFilterissue that would cause a "[Errno 12] Cannot allocate memory" error when processing silence audio frames. (PR #3322) -
Updated
SpeechmaticsSTTServicefor version0.0.99+:- Fixed
SpeechmaticsSTTServiceto listen forVADUserStoppedSpeakingFramein order to finalize transcription. - Default to
TurnDetectionMode.FIXEDfor Pipecat-controlled end of turn detection. - Only emit VAD + interruption frames if VAD is enabled within the plugin
(modes other than
TurnDetectionMode.FIXEDorTurnDetectionMode.EXTERNAL). (PR #3328)
-
Fixed an issue with function calling where a handler failing to invoke its result callback could leave the context stuck in IN_PROGRESS, causing LLM inference for subsequent function call results to block while waiting on the unresolved call. (PR #3343)
-
Fixed an issue with DeepgramTTSService where the model would output "Dot" instead of a period in some circumstances. (PR #3345)
-
Fixed an issue in
traced_sttwheremodel_namein OpenTelemetry appears asunknown. (PR #3351) -
Fixed an issue in GeminiLiveLLMService where TranscriptionFrames were occasionally not pushed. (PR #3356)
-
Fixed potential memory leaks and initialization issues in
KrispVivaFilterby improving SDK lifecycle management. (PR #3391) -
Fixed timing issue in
BaseOutputTransportwhere the bot speaking flag was set after awaiting, allowing the event loop to re-enter the method before the guard was set. (PR #3400) -
Fixed parallel function calling when using Gemini thinking. (PR #3420)
-
Fixed an issue in
traced_llmwheremodel_namein OpenTelemetry appears asunknown. (PR #3422) -
Fixed an issue in
traced_tts,traced_gemini_live, andtraced_openai_realtimewheremodel_namein OpenTelemetry appears asunknown. (PR #3428) -
Fixed
request_image_frame(for backwards compatibility) and restored function-call–related fields inUserImageRequestFrameandUserImageRawFrame, preventing a case where adding a non-LLM message to the context could trigger duplicate LLM inferences (on image arrival and on function-call result), potentially causing an infinite inference loop. (PR #3430) -
Fixed
LLMContext.create_audio_message()by correcting an internal helper that was incorrectly declared async while being run inasyncio.to_thread(). (PR #3435)
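
The last fix above turns on a general asyncio rule: a function passed to `asyncio.to_thread()` must be a plain synchronous function. The sketch below is not Pipecat's code; `encode_audio` and `encode_audio_wrongly_async` are hypothetical helpers used only to show what goes wrong when the helper is declared `async`.

```python
import asyncio
import base64
import inspect


def encode_audio(data: bytes) -> str:
    """Synchronous helper (hypothetical): safe to hand to asyncio.to_thread()."""
    return base64.b64encode(data).decode("ascii")


async def encode_audio_wrongly_async(data: bytes) -> str:
    """Same work, but mistakenly declared async (hypothetical)."""
    return base64.b64encode(data).decode("ascii")


async def main() -> tuple:
    # The sync helper runs in a worker thread and returns the encoded string.
    good = await asyncio.to_thread(encode_audio, b"\x00\x01")

    # Calling an async function in the worker thread only builds a coroutine
    # object; its body never runs, so no encoding happens.
    bad = await asyncio.to_thread(encode_audio_wrongly_async, b"\x00\x01")
    is_coroutine = inspect.iscoroutine(bad)
    bad.close()  # suppress the "coroutine was never awaited" warning
    return good, is_coroutine


result = asyncio.run(main())
print(result)
```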
Other
-
Added
52-live-transcription.pyfoundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech. (PR #3316) -
Added a new foundational example
53-concurrent-llm-evaluation.pythat shows how to useUserTurnProcessor. (PR #3372) -
Added a new foundational example
28-user-assistant-turns.pythat shows how to use the newLLMUserAggregatorandLLMAssistantAggregatorevents to gather a conversation transcript. (PR #3385)
[0.0.98] - 2025-12-17
Added
-
Added
RimeNonJsonTTSServicewhich supports non-JSON streaming mode. This new class supports websocket streaming for the Arcana model. (PR #3085) -
Added additional functionality related to "thinking", for Google and Anthropic LLMs.
- New typed parameters for Google and Anthropic LLMs that control the
models' thinking behavior (like how much thinking to do, and whether to
output thoughts or thought summaries):
AnthropicLLMService.ThinkingConfigGoogleLLMService.ThinkingConfig
- New frames for representing thoughts output by LLMs:
LLMThoughtStartFrameLLMThoughtTextFrameLLMThoughtEndFrame
- A generic mechanism for recording LLM thoughts to context, used
specifically to support Anthropic, whose thought signatures are expected
to appear alongside the text of the thoughts within assistant context
messages. See:
LLMThoughtEndFrame.signatureLLMAssistantAggregatorhandling of the above fieldAnthropicLLMAdapterhandling of"thought"context messages
- Google-specific logic for inserting thought signatures into the context,
to help maintain thinking continuity in a chain of LLM calls. See:
GoogleLLMServicesendingLLMMessagesAppendFrames to add LLM-specific"thought_signature"messages to contextGeminiLLMAdapterhandling of"thought_signature"messages
- An expansion of
TranscriptProcessorto process LLM thoughts in addition to user and assistant utterances. See:TranscriptProcessor(process_thoughts=True)(defaults toFalse)ThoughtTranscriptionMessage, which is now also emitted with the"on_transcript_update"event (PR #3175)
-
Data and control frames can now be marked as non-interruptible by using the
UninterruptibleFramemixin. Frames marked asUninterruptibleFramewill not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions. (PR #3189) -
Added
on_conversation_detected event to VoicemailDetector. (PR #3207) -
Added
x-goog-api-clientheader with Pipecat's version to all Google services' requests. (PR #3208) -
Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/). (PR #3210)
-
Added to
AWSNovaSonicLLMServicefunctionality related to the new (and now default) Nova 2 Sonic model ("amazon.nova-2-sonic-v1:0"):- Added the
endpointing_sensitivityparameter to control how quickly the model decides the user has stopped speaking. - Made the assistant-response-trigger hack a no-op. It's only needed for the older Nova Sonic model. (PR #3212)
-
Ultravox Realtime is now a supported speech-to-speech service.
- Added
UltravoxRealtimeLLMServicefor the integration. - Added
49-ultravox-realtime.pyexample (with tool calling). (PR #3227)
-
Added Daily PSTN dial-in support to the development runner with
--dialinflag. This includes:/daily-dialin-webhookendpoint that handles incoming Daily PSTN webhooks- Automatic Daily room creation with SIP configuration
DialinSettingsandDailyDialinRequesttypes inpipecat.runner.typesfor type-safe dial-in data- The runner now mimics Pipecat Cloud's dial-in webhook handling for local development (PR #3235)
-
Added Gladia session ID to logs for
GladiaSTTService. (PR #3236) -
Added
InworldHttpTTSServicewhich uses Inworld's HTTP based TTS service in either streaming or non-streaming mode. Note: This class was previously namedInworldTTSService. (PR #3239) -
Added
language_hints_strict parameter to SonioxSTTService to strictly enforce language hints. This ensures that transcription occurs in the specified language. (PR #3245) -
Added Pipecat library version info to the
aboutfield in thebot-readyRTVI message. (PR #3248) -
Added
VisionFullResponseStartFrame, VisionFullResponseEndFrame and VisionTextFrame. These are used by vision services, similar to LLM services. (PR #3252)
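
The `UninterruptibleFrame` entry above says that queued frames carrying the mixin are retained across interruptions. A minimal sketch of that idea follows. The classes are stand-ins defined here for illustration, not Pipecat's real frame types, and `drop_on_interruption` is a hypothetical helper rather than a framework function.

```python
from dataclasses import dataclass


class UninterruptibleFrame:
    """Stand-in marker mixin; the real one lives in Pipecat."""


@dataclass
class ToyTextFrame:
    text: str


@dataclass
class ToyResultFrame(UninterruptibleFrame):
    result: str


def drop_on_interruption(queue: list) -> list:
    """Keep only frames marked uninterruptible; everything else is discarded."""
    return [frame for frame in queue if isinstance(frame, UninterruptibleFrame)]


queue = [ToyTextFrame("hello"), ToyResultFrame("42"), ToyTextFrame("world")]
kept = drop_on_interruption(queue)
print(kept)
```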
Changed
-
FunctionCallInProgressFrameandFunctionCallResultFramehave changed from system frames to a control frame and a data frame, respectively, and are now both marked asUninterruptibleFrame. (PR #3189) -
UserBotLatencyLogObservernow usesVADUserStartedSpeakingFrameandVADUserStoppedSpeakingFrameto determine latency from user stopped speaking to bot started speaking. (PR #3206) -
Updated
HeyGenVideoServiceandHeyGenTransportto support both HeyGen APIs (Interactive Avatar and Live Avatar). Using them is as simple as specifying theservice_typewhen creating theHeyGenVideoServiceand theHeyGenTransport:heyGen = HeyGenVideoService( api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"), service_type=ServiceType.LIVE_AVATAR, session=session, )(PR #3210)
-
Made
"amazon.nova-2-sonic-v1:0"the new default model forAWSNovaSonicLLMService. (PR #3212) -
Updated the
run_inferencemethods in the LLM service classes (AnthropicLLMService,AWSBedrockLLMService,GoogleLLMService, andOpenAILLMServiceand its base classes) to use the provided LLM configuration parameters. (PR #3214) -
Updated default models for:
GeminiLiveLLMServicetogemini-2.5-flash-native-audio-preview-12-2025.GeminiLiveVertexLLMServicetogemini-live-2.5-flash-native-audio. (PR #3228)
-
Changed the
reasonfield inEndFrame,CancelFrame,EndTaskFrame, andCancelTaskFramefromstrtoAnyto indicate that it can hold values other than strings. (PR #3231) -
Updated websocket STT services to use the
WebsocketSTTServicebase class. This base class manages the websocket connection and handles reconnects. Updated services:AssemblyAISTTServiceAWSTranscribeSTTServiceGladiaSTTServiceSonioxSTTService(PR #3236)
-
Changed Inworld's TTS service implementations:
- Previously, the HTTP implementation was named
InworldTTSService. That has been moved toInworldHttpTTSService. This service now supports word-timestamp alignment data in both streaming and non-streaming modes. - Updated the
InworldTTSServiceclass to use Inworld's Websocket API. This class now has support for word-timestamp alignment data and tracks contexts for each user turn. (PR #3239)
-
⚠️ Breaking change:
WordTTSService.start_word_timestamps()andWordTTSService.reset_word_timestamps()are now async. (PR #3240) -
Updated the current RTVI version to 1.1.0 to reflect recent additions and deprecations.
- New RTVI Messages:
send-textandbot-output - Deprecated Messages:
append-to-contextandbot-transcription(PR #3248)
-
MoondreamServicenow pushesVisionFullResponseStartFrame,VisionFullResponseEndFrameandVisionTextFrame. (PR #3252)
Deprecated
FalSmartTurnAnalyzerandLocalSmartTurnAnalyzerare deprecated and will be removed in a future version. UseLocalSmartTurnAnalyzerV3instead. (PR #3219)
Removed
- Removed the deprecated VLLM-based open source Ultravox STT service. (PR #3227)
Fixed
-
Fixed a bug in
AWSNovaSonicLLMServicewhere we would mishandle cancelled tool calls in the context, resulting in errors. (PR #3212) -
Better support conversation history with Gemini 2.5 Flash Image (model "gemini-2.5-flash-image"). Prior to this fix, the model had no memory of previous images it had generated, so it wouldn't be able to iterate on them. (PR #3224)
-
Support conversations with Gemini 3 Pro Image (model "gemini-3-pro-image-preview"). Prior to this fix, after the model generated an image the conversation would not be able to progress. (PR #3224)
-
Fixed an issue where
ElevenLabsHttpTTSServicewas not updating voice settings when receiving aTTSUpdateSettingsFrame. (PR #3226) -
Fixed the return type for
SmallWebRTCRequestHandler.handle_web_request()function. (PR #3230) -
Fixed a bug in LLM context audio content handling. (PR #3234)
-
In
GladiaSTTService, reset the_bytes_sentcounter on connecting the websocket. This avoids unnecessary audio buffer trimming. (PR #3236) -
Fixed a TTS service word-timestamp issue that could cause generated
TTSTextFrameinstances to have an incorrect pts (pts = -1). (PR #3240) -
Fixed an issue in
SimpleTextAggregator where spaces were not being stripped before returning the aggregation. This resulted in an extra space for TTS services that don't support word-timestamp alignment data. (PR #3247)
[0.0.97] - 2025-12-05
Added
-
Added new Gradium services,
GradiumSTTServiceandGradiumTTSService, for speech-to-text and text-to-speech functionality using Gradium's API. -
Additions for
AsyncAITTSServiceandAsyncAIHttpTTSService:- Added new
languages:pt,nl,ar,ru,ro,ja,he,hy,tr,hi,zh. - Updated the default model to
asyncflow_multilingual_v1.0for improved accuracy and broader language coverage.
-
Added optional tool and tool output filters for MCP services.
Changed
-
Updated Deepgram logging to include Deepgram request IDs for improved debugging.
-
Text Aggregation Improvements:
- Breaking Change:
BaseTextAggregator.aggregate()now returnsAsyncIterator[Aggregation]instead ofOptional[Aggregation]. This enables the aggregator to return multiple results based on the provided text. - Refactored text aggregators to use inheritance:
SkipTagsAggregatorandPatternPairAggregatornow inherit fromSimpleTextAggregator, reusing the base class's sentence detection logic.
-
Improved interruption handling to prevent bots from repeating themselves. Responses from LLM services that return multiple sentences at once (e.g.,
GoogleLLMService) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, preventing the bot from repeating content after being interrupted during long responses. -
Updated
AICFilterto use Quail STT as the default model (AICModelType.QUAIL_STT). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters. -
If an unexpected exception is caught, or if
FrameProcessor.push_error() is called with an exception, the file name and line number where the exception occurred are now logged. -
Updated Smart Turn model weights to v3.1.
-
Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.
-
Updated
CartesiaSTTServiceto return the full transcriptionresultin theTranscriptionFrameandInterimTranscriptionFrame. This provides access to word timestamp data. -
HumeTTSServicechanges:- Added tracking headers (
X-Hume-Client-NameandX-Hume-Client-Version) to all requests made byHumeTTSServiceto the Hume API for better usage tracking and analytics. - Added
stop()andcancel()cleanup methods toHumeTTSServiceto properly close the HTTP client and prevent resource leaks.
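
The breaking change to `BaseTextAggregator.aggregate()` above means callers iterate over results instead of checking a single optional value. The sketch below shows the consuming pattern with a toy aggregator. `ToySentenceAggregator` and its naive ". " splitting are assumptions made for illustration, and the `Aggregation` dataclass here is a local stand-in, not the library class.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Aggregation:
    """Local stand-in for the aggregation result: text plus a type label."""
    text: str
    type: str


class ToySentenceAggregator:
    """Hypothetical aggregator that may yield several results per call."""

    def __init__(self) -> None:
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        self._buffer += text
        while True:
            idx = self._buffer.find(". ")  # naive boundary, for illustration only
            if idx == -1:
                break
            sentence = self._buffer[: idx + 1].strip()
            self._buffer = self._buffer[idx + 2:]
            if sentence:
                yield Aggregation(text=sentence, type="sentence")


async def collect(chunk: str) -> list:
    aggregator = ToySentenceAggregator()
    # One call can now produce zero, one, or many aggregations.
    return [agg.text async for agg in aggregator.aggregate(chunk)]


sentences = asyncio.run(collect("Hello there. How are you. I am"))
print(sentences)
```

The trailing "I am" stays buffered until more text arrives, which is why it does not appear in the output.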
Deprecated
-
NVIDIA Services name changes (all functionality is unchanged):
NimLLMServiceis now deprecated, useNvidiaLLMServiceinstead.RivaSTTServiceis now deprecated, useNvidiaSTTServiceinstead.RivaTTSServiceis now deprecated, useNvidiaTTSServiceinstead.- Use
uv pip install pipecat-ai[nvidia]instead ofuv pip install pipecat-ai[riva]
-
The
noise_gate_enableparameter inAICFilteris deprecated and no longer has any effect. Noise gating is now handled automatically by the AIC VAD system. UseAICFilter.create_vad_analyzer()for VAD functionality instead. -
Package
pipecat.syncis deprecated, usepipecat.utils.syncinstead.
Fixed
-
Fixed bug in
PatternPairAggregatorwhere pattern handlers could be called multiple times forKEEPorAGGREGATEpatterns. -
Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
-
Fixed an issue in
AWSTranscribeSTTServicewhere theregionarg was always set tous-east-1when providing an AWS_REGION env var. -
Fixed an issue in
SarvamTTSService where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the LLMFullResponseEndFrame or EndFrame. -
Fixed an issue in
DeepgramTTSService where a TTSStoppedFrame was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call. -
Fixed an issue where
LLMTextFrame.skip_ttswas being overwritten by LLM services. -
Fixed an issue that caused
WebsocketServiceinstances to attempt reconnection during shutdown. -
Fixed an issue in
ElevenLabsTTSServicewhere character usage metrics were only reported on the first TTS generation per turn.
[0.0.96] - 2025-11-26 🦃 "Happy Thanksgiving!" 🦃
Added
-
Added
AWSBedrockAgentCoreProcessorto support invoking an AgentCore-hosted agent in a Pipecat pipeline. -
Enhanced error handling across the framework:
-
Added
on_errorcallback toFrameProcessorfor centralized error handling. -
Renamed
push_error(error: ErrorFrame)topush_error_frame(error: ErrorFrame)for clarity. -
Added new
push_errormethod for simplified error reporting:async def push_error(error_msg: str, exception: Optional[Exception] = None, fatal: bool = False) -
Standardized error logging by replacing
logger.exceptioncalls withlogger.errorthroughout the codebase.
-
-
Added
cache_read_input_tokens, cache_creation_input_tokens and reasoning_tokens to OTel spans for LLM calls. -
Added
LiveKitRESTHelperutility class for managing LiveKit rooms via REST API. -
Added
DeepgramSageMakerSTTServicewhich connects to a SageMaker hosted Deepgram STT model. Added07c-interruptible-deepgram-sagemaker.pyfoundational example. -
Added
SageMakerBidiClientto connect to SageMaker hosted BiDi compatible services. -
Added support for
include_timestampsandenable_logginginElevenLabsRealtimeSTTService. Wheninclude_timestampsis enabled, timestamp data is included in theTranscriptionFrame'sresultparameter. -
Added optional speaking rate control to
InworldTTSService. -
Introduced a new
AggregatedTextFrame type to support passing text along with an aggregated_by field to describe the type of text included. TTSTextFrames now inherit from AggregatedTextFrame. With this inheritance, an observer can watch for AggregatedTextFrames to accumulate the perceived output and determine whether or not the text was spoken based on whether that frame is also a TTSTextFrame. With this frame, the LLM token stream can be transformed into custom composable chunks, allowing for aggregation outside the TTS service. This makes it possible to listen for or handle those aggregations and sets the stage for doing things like composing a best effort of the perceived LLM output in a more digestible form, whether or not it is processed by a TTS and even if no TTS exists.
-
Introduced
LLMTextProcessor: A new processor meant to allow customization of how LLMTextFrames should be aggregated and considered. Its purpose is to turn LLMTextFrames into AggregatedTextFrames. By default, a TTSService will still aggregate LLMTextFrames by sentence for the service to consume. However, if you wish to override how the LLM text is aggregated, you should no longer override the TTS's internal text_aggregator, but instead insert this processor between your LLM and TTS in the pipeline. -
New
bot-outputRTVI message to represent what the bot actually "says".-
The
RTVIObserver now emits bot-output messages based on the new AggregatedTextFrames (bot-tts-text and bot-llm-text are still supported and generated, but bot-transcription is now deprecated in favor of this new, more thorough message). -
The new
RTVIBotOutputMessageincludes the fields:-
spoken: A boolean indicating whether the text was spoken by TTS -
aggregated_by: A string representing how the text was aggregated ("sentence", "word", "my custom aggregation")
-
-
Introduced new fields to
RTVIObserverto support the newbot-outputmessaging:-
bot_output_enabled: Defaults to True. Set to false to disable bot-output messages. -
skip_aggregator_types: Defaults toNone. Set to a list of strings that match aggregation types that should not be included in bot-output messages. (Ex.credit_card)
-
-
Introduced new methods,
add_text_transformer()andremove_text_transformer(), toRTVIObserverto support providing (and subsequently removing) callbacks for various types of aggregations (or all aggregations with*) that can modify the text before being sent as abot-outputortts-textmessage. (Think obscuring the credit card or inserting extra detail the client might want that the context doesn't need.)
-
-
In
MiniMaxHttpTTSService:-
Added support for speech-2.6-hd and speech-2.6-turbo models
-
Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, and Tamil
-
Added new emotions: calm and fluent
-
-
Added
enable_loggingtoSimliVideoServiceinput parameters. It's disabled by default.
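
The `add_text_transformer()` / `remove_text_transformer()` entry above describes callbacks registered per aggregation type, with `*` matching all types. A rough sketch of that dispatch follows. `ToyTransformerRegistry`, its `apply()` method, and the synchronous `(text, aggregation_type)` callback signature are assumptions for illustration; check the `RTVIObserver` API for the real signatures.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Assumed callback shape: (text, aggregation_type) -> transformed text.
Transformer = Callable[[str, str], str]


class ToyTransformerRegistry:
    """Hypothetical registry keyed by aggregation type, with '*' as wildcard."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Transformer]] = defaultdict(list)

    def add_text_transformer(self, fn: Transformer, aggregation_type: str = "*") -> None:
        self._by_type[aggregation_type].append(fn)

    def remove_text_transformer(self, fn: Transformer, aggregation_type: str = "*") -> None:
        self._by_type[aggregation_type].remove(fn)

    def apply(self, text: str, aggregation_type: str) -> str:
        # Wildcard transformers run first, then type-specific ones.
        for fn in self._by_type["*"] + self._by_type[aggregation_type]:
            text = fn(text, aggregation_type)
        return text


def mask_card(text: str, aggregation_type: str) -> str:
    """Obscure all but the last four characters."""
    return "****" + text[-4:]


registry = ToyTransformerRegistry()
registry.add_text_transformer(mask_card, "credit_card")

masked = registry.apply("4111111111111111", "credit_card")
untouched = registry.apply("hello", "sentence")

registry.remove_text_transformer(mask_card, "credit_card")
after_removal = registry.apply("4111111111111111", "credit_card")
print(masked, untouched, after_removal)
```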
Changed
-
Updated
FishAudioTTSServicedefault model tos1. -
Updated
DeepgramTTSServiceto use Deepgram's TTS websocket API. ⚠️ This is a potential breaking change, which only affects you if you're self-hostingDeepgramTTSService. The new service uses Websockets and improves TTFB latency. -
Updated
daily-pythonto 0.22.0. -
BaseTextAggregatorchanges:Modified the BaseTextAggregator type so that when text gets aggregated, metadata can be associated with it. Currently, that just means a
type, so that the aggregation can be classified or described. Changes made to support this:-
⚠️ IMPORTANT: Aggregators are now expected to strip leading/trailing white space characters before returning their aggregation from
aggregation()or.text. This way all aggregators have a consistent contract allowing downstream use to know how to stitch aggregations back together. -
Introduced a new
Aggregationdataclass to represent both the aggregatedtextand a string identifying thetypeof aggregation (ex. "sentence", "word", "my custom aggregation") -
⚠️ Breaking change:
BaseTextAggregator.textnow returns anAggregation(instead ofstr).Before:
aggregated_text = myAggregator.textNow:
aggregated_text = myAggregator.text.text -
⚠️ Breaking change:
BaseTextAggregator.aggregate()now returnsOptional[Aggregation](instead ofOptional[str]).Before:
aggregation = myAggregator.aggregate(text) print(f"successfully aggregated text: {aggregation}")Now:
aggregation = myAggregator.aggregate(text) if aggregation: print(f"successfully aggregated text: {aggregation.text}") -
SimpleTextAggregator,SkipTagsAggregator,PatternPairAggregatorupdated to produce/consumeAggregationobjects. -
All uses of the above Aggregators have been updated accordingly.
-
-
Augmented the
PatternPairAggregator so that matched patterns can be treated as their own aggregation, taking advantage of the new Aggregation type. To that end:-
Introduced a new, preferred version of
add_patternto support a new option for treating a match as a separate aggregation returned fromaggregate(). This replaces the now deprecatedadd_pattern_pairmethod and you provide aMatchActionin lieu of theremove_matchfield.-
MatchActionenum:REMOVE,KEEP,AGGREGATE, allowing customization for how a match should be handled.-
REMOVE: The text along with its delimiters will be removed from the streaming text. Sentence aggregation will continue on as if this text did not exist. -
KEEP: The delimiters will be removed, but the content between them will be kept. Sentence aggregation will continue on with the internal text included. -
AGGREGATE: The delimiters will be removed and the content between will be treated as a separate aggregation. Any text before the start of the pattern will be returned early, whether or not a complete sentence was found. Then the pattern will be returned. Then the aggregation will continue on sentence matching after the closing delimiter is found. The content between the delimiters is not aggregated by sentence. It is aggregated as one single block of text.
-
-
PatternMatchnow extendsAggregationand provides richer info to handlers.
-
-
⚠️ Breaking change: The
PatternMatch type returned to handlers registered via on_pattern_match has been updated to subclass from the new Aggregation type, which means that content has been replaced with text and pattern_id has been replaced with type: async def on_match_tag(match: PatternMatch): pattern = match.type # instead of match.pattern_id text = match.text # instead of match.content
-
-
TextFramenow includes the fieldappend_to_contextto support setting whether or not the encompassing text should be added to the LLM context (by the LLM assistant aggregator). It defaults toTrue. -
TTSServicebase class updates:-
TTSServices now accept a newskip_aggregator_typesto avoid speaking certain aggregation types (now determined/returned by the aggregator) -
Introduced the ability to do a just-in-time transform of text before it gets sent to the TTS service via callbacks you can set up via a new init field,
text_transformsor a new methodadd_text_transformer(). This makes it possible to do things like introduce TTS-specific tags for spelling or emotion or change the pronunciation of something on the fly.remove_text_transformerhas also been added to support removing a registered transform callback. -
TTS services push
AggregatedTextFramein addition toTTSTextFrames when either an aggregation occurs that should not be spoken or when the TTS service supports word-by-word timestamping. In the latter case, theTTSServicepreliminarily generates anAggregatedTextFrame, aggregated by sentence to generate the full sentence content as early as possible.
-
-
Updated
CartesiaTTSService:-
Modified use of custom default text_aggregator to avoid deprecation warnings and push users towards use of transformers or the
LLMTextProcessor -
Added convenience methods for taking advantage of Cartesia's SSML tags: spell, emotion, pauses, volume, and speed.
-
-
Updated
RimeTTSService:-
Modified use of custom default text_aggregator to avoid deprecation warnings and push users towards use of transformers or the
LLMTextProcessor -
Added convenience methods for taking advantage of Rime's customization options: spell, pauses, pronunciations, and inline speed control.
-
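
The three `MatchAction` behaviours described for `PatternPairAggregator` above can be compared side by side on a complete string. This is only a sketch: the real aggregator works on streaming text and sentence boundaries, whereas the hypothetical `apply_action` helper below processes one finished string with a regular expression, and the `MatchAction` enum is redefined locally.

```python
import re
from enum import Enum
from typing import List, Tuple


class MatchAction(Enum):
    """Local stand-in mirroring the three actions named in the changelog."""
    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


def apply_action(
    text: str, start: str, end: str, action: MatchAction, pattern_type: str = "card"
) -> List[Tuple[str, str]]:
    """Return (aggregation type, text) pieces for the first delimited match."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    match = pattern.search(text)
    if match is None:
        return [("sentence", text.strip())]

    before, inner, after = text[: match.start()], match.group(1), text[match.end():]
    if action is MatchAction.REMOVE:
        # Delimiters and content vanish; the rest is treated as ordinary text.
        return [("sentence", " ".join((before + after).split()))]
    if action is MatchAction.KEEP:
        # Only the delimiters vanish; the content stays inline.
        return [("sentence", " ".join((before + inner + after).split()))]
    # AGGREGATE: text before the match is returned early, then the match itself.
    pieces = [("sentence", before.strip()), (pattern_type, inner.strip()), ("sentence", after.strip())]
    return [(kind, piece) for kind, piece in pieces if piece]


TEXT = "Your card is <card>4111 1111</card> on file."
removed = apply_action(TEXT, "<card>", "</card>", MatchAction.REMOVE)
kept = apply_action(TEXT, "<card>", "</card>", MatchAction.KEEP)
aggregated = apply_action(TEXT, "<card>", "</card>", MatchAction.AGGREGATE)
print(removed, kept, aggregated, sep="\n")
```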
Deprecated
-
The TTS constructor field,
text_aggregatoris deprecated in favor of the newLLMTextProcessor. TTSServices still have an internal aggregator for support of default behavior, but if you want to override the aggregation behavior, you should use the new processor. -
The RTVI
bot-transcriptionevent is deprecated in favor of the newbot-outputmessage which is the canonical representation of bot output (spoken or not). The code still emits a transcription message for backwards compatibility while transition occurs. -
Deprecated
add_pattern_pairin thePatternPairAggregatorwhich takes apattern_idandremove_matchfield in favor of the newadd_patternmethod which takes atypeand anaction -
english_normalization input parameter for MiniMaxHttpTTSService is deprecated, use text_normalization instead.
Fixed
-
Fixed an issue in
AWSBedrockLLMServicewhere theaws_regionarg was always set tous-east-1when providing an AWS_REGION env var. -
Fixed an issue with
DeepgramFluxSTTServicewhere it sometimes failed to reconnect. -
Fixed an issue in
ElevenLabsRealtimeSTTServicewhere dynamic language updates were not working. -
Fixed an issue in
ElevenLabsRealtimeSTTServicewhere setting the sample rate would result in transcripts failing. -
Fixed
InworldTTSServiceaudio config payload to use camelCase keys expected by the Inworld API.
[0.0.95] - 2025-11-18
Added
-
Added ai-coustics integrated VAD (
AICVADAnalyzer) withAICFilterfactory and example wiring; leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity. -
Added a watchdog to
DeepgramFluxSTTServiceto prevent dangling tasks in case the user was speaking and we stop receiving audio. -
Introduced a minimum confidence parameter in
DeepgramFluxSTTServiceto avoid generating transcriptions below a defined threshold. -
Added
ElevenLabsRealtimeSTTServicewhich implements the Realtime STT service from ElevenLabs. -
Added word-level timestamps support to Hume TTS service
Changed
-
⚠️ Breaking change:
LLMContext.create_image_message(),LLMContext.create_audio_message(),LLMContext.add_image_frame_message()andLLMContext.add_audio_frames_message()are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images. -
ConsumerProcessornow queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed. -
BaseTextFilter only requires subclasses to implement the filter() method. -
Extracted the logic for retrying connections and created a new
send_with_retrymethod insideWebSocketService. -
Refactored
DeepgramFluxSTTServiceto automatically reconnect if sending a message fails. -
Updated all STT and TTS services to use consistent error handling pattern with
push_error()method for better pipeline error event integration. -
Added support for
maybe_capture_participant_camera()andmaybe_capture_participant_screen()forSmallWebRTCTransportin the runner utils. -
Added Hindi support for Rime TTS services.
-
Updated
GeminiTTSServiceto use Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. Now usescredentials/credentials_pathfor authentication. Theapi_keyparameter is deprecated. Also, added support forpromptparameter for style instructions and expressive markup tags. Significantly improved latency with streaming synthesis. -
Updated language mappings for the Google and Gemini TTS services to match official documentation.
Deprecated
- The
api_keyparameter inGeminiTTSServiceis deprecated. Usecredentialsorcredentials_pathinstead for Google Cloud authentication.
Fixed
-
Fixed a
SimliVideoServiceconnection issue. -
Fixed an issue in the
Runnerwhere, when usingSmallWebRTCTransport, therequest_datawas not being passed to theSmallWebRTCRunnerArgumentsbody. -
Fixed a subtle issue where assistant context messages ended up with double spaces between words or sentences.
-
Fixed an issue where
NeuphonicTTSServicewasn't pushingTTSTextFrames, meaning assistant messages weren't being written to context. -
Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal
LLMContext. -
Fixed an issue where
DeepgramFluxSTTServicefailed to connect if passing akeytermortagcontaining a space. -
Prevented
HeyGenVideoServicefrom automatically disconnecting after 5 minutes.
[0.0.94] - 2025-11-10
Changed
- Added support for retrying
SpeechmaticsTTSServicewhen it returns a 503 error. Default values inInputParams.
Deprecated
- The
KrispFilteris deprecated and will be removed in a future version. Use theKrispVivaFilterinstead.
Removed
LivekitFrameSerializerhas been removed. UseLiveKitTransportinstead.
Fixed
- Fixed a bug related to
LLMAssistantAggregatorwhere spaces were sometimes missing from assistant messages in context.
[0.0.93] - 2025-11-07
Added
-
Added support for Sarvam Speech-to-Text service (
SarvamSTTService) with streaming WebSocket support forsaarika(STT) andsaaras(STT-translate) models. -
Added support for passing in a
ToolsSchema in lieu of a list of provider-specific dicts when initializing OpenAIRealtimeLLMService or when updating it using LLMUpdateSettingsFrame. -
Added
TransportParams.audio_out_silence_secs, which specifies how many seconds of silence to output when anEndFramereaches the output transport. This can help ensure that all audio data is fully delivered to clients. -
Added new
FrameProcessor.broadcast_frame()method. This will push two instances of a given frame class, one upstream and the other downstream.await self.broadcast_frame(UserSpeakingFrame) -
Added
MetricsLogObserverfor logging performance metrics fromMetricsFrameinstances. Supports filtering viainclude_metricsparameter to control which metrics types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics). -
Added
pronunciation_dictionary_locatorstoElevenLabsTTSServiceandElevenLabsHttpTTSService. -
Added support for loading external observers. You can now register custom pipeline observers by setting the
PIPECAT_OBSERVER_FILESenvironment variable. This variable should contain a colon-separated list of Python files (e.g.export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."). Each file must define a function with the following signature:async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]: ... -
Added support for new sonic-3 languages in
CartesiaTTSServiceandCartesiaHttpTTSService. -
EndFrameandEndTaskFramehave an optionalreasonfield to indicate why the pipeline is being ended. -
CancelFrameandCancelTaskFramehave an optionalreasonfield to indicate why the pipeline is being canceled. This can be also specified when you cancel a task withPipelineTask.cancel(reason="cancellation reason"). -
Added
include_prob_metricsparameter to Whisper STT services to enable access to probability metrics from transcription results. -
Added utility functions
extract_whisper_probability(),extract_openai_gpt4o_probability(), andextract_deepgram_probability()to extract probability metrics fromTranscriptionFrameobjects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively. -
Added
LLMSwitcher.register_direct_function(). It works much likeLLMSwitcher.register_function()in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (aFunctionSchema-less function). -
Added
MCPClient.get_tools_schema()andMCPClient.register_tools_schema()as a two-step alternative toMCPClient.register_tools(), to allow users to pass MCP tools to, say,GeminiLiveLLMService(as well as other speech-to-speech services) in the constructor. -
Added support for passing in an
LLMSwitcher to MCPClient.register_tools() (as well as the new MCPClient.register_tools_schema()). -
Added
cpu_countparameter toLocalSmartTurnAnalyzerV3. This is set to1by default for more predictable performance on low-CPU systems.
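
The external-observer entry above specifies a colon-separated `PIPECAT_OBSERVER_FILES` value and a `create_observers(task)` coroutine in each file. The sketch below shows one way such files could be split and loaded with the standard library. It is not the loader Pipecat uses: `parse_observer_files` and `load_factory` are hypothetical names, and a string stands in for the `PipelineTask` argument.

```python
import asyncio
import importlib.util
import os
import tempfile


def parse_observer_files(value: str) -> list:
    """Split a colon-separated list of file paths, ignoring empty entries."""
    return [part.strip() for part in value.split(":") if part.strip()]


def load_factory(path: str):
    """Import a Python file by path and return its create_observers coroutine."""
    spec = importlib.util.spec_from_file_location("observer_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.create_observers


files = parse_observer_files("observer1.py:observer2.py:")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "observer1.py")
    with open(path, "w") as handle:
        # A stand-in observer file; a real one would return BaseObserver objects.
        handle.write(
            "async def create_observers(task):\n"
            "    return ['observer-for-' + task]\n"
        )
    factory = load_factory(path)
    observers = asyncio.run(factory("demo-task"))

print(files, observers)
```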
Changed
-
Updated
simli-aito 0.1.25. -
STTMuteFilterno longer sendsSTTMuteFrameto the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: The STTMuteFilter should be placed after the STT service itself. -
Improved
GoogleSTTServiceerror handling to properly catch gRPCAbortedexceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs. -
Bumped the
fastapi dependency's upper bound to <0.122.0. -
Updated the default model for
GoogleVertexLLMServicetogemini-2.5-flash. -
Updated the
GoogleVertexLLMServiceto use theGoogleLLMServiceas a base class instead of theOpenAILLMService. -
Updated STT and TTS services to pass through unverified language codes with a warning instead of returning None. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.
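A minimal sketch of the language pass-through behavior described above, using only the standard library. The `VERIFIED_LANGUAGES` table and the `resolve_language()` helper are hypothetical stand-ins, not Pipecat APIs. The only point illustrated is that an unverified code is returned unchanged with a warning instead of being mapped to `None`.

```python
import logging

logger = logging.getLogger("language_passthrough_sketch")

# Hypothetical table of verified language codes for some service.
VERIFIED_LANGUAGES = {"en": "en-US", "es": "es-ES", "fr": "fr-FR"}


def resolve_language(code: str) -> str:
    """Return the service-specific code, passing unverified codes through."""
    if code in VERIFIED_LANGUAGES:
        return VERIFIED_LANGUAGES[code]
    # An older lookup of this kind would return None here; warn and pass through instead.
    logger.warning("Language %r is not verified; passing it through as-is.", code)
    return code
```

With this shape, a newly supported language keeps working as long as the remote service accepts the raw code, and the warning still points developers at the verified list.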
Removed
- Removed
needs_mcp_alternate_schema()fromLLMService. The mechanism that relied on it went away.
Fixed
-
Restored backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.
-
Fixed
DeepgramSTTService._disconnect()to properly awaitis_connected()method call, which is an async coroutine in the Deepgram SDK. -
Fixed an issue where the
SmallWebRTCRequestdataclass in runner would scrub arbitrary request data from client due to camelCase typing. This fixes data passthrough for JS clients whereAPIRequestis used. -
Fixed a bug in
GeminiLiveLLMServicewhere in some circumstances it wouldn't respond after a tool call. -
Fixed
GeminiLiveLLMServicesession resumption after a connection timeout. -
GeminiLiveLLMServicenow properly supports context-provided system instruction and tools. -
Fixed
GoogleLLMServicetoken counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.
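One way to picture the token-counting fix above, as a sketch only. It assumes that each usage report in a stream is cumulative for the response so far, which is an assumption of this example and not something the changelog states. The `Usage` record and `final_usage()` helper are hypothetical, not Pipecat or Google types.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    """Hypothetical usage record; not a Pipecat or Google type."""

    prompt_tokens: int
    completion_tokens: int


def final_usage(chunk_usages: Iterable[Optional[Usage]]) -> Usage:
    """Keep the most recent usage report rather than summing every chunk.

    Assumes each report is cumulative for the whole response so far.
    """
    latest = Usage(prompt_tokens=0, completion_tokens=0)
    for usage in chunk_usages:
        if usage is not None:
            latest = usage
    return latest


# Three reports arrive across four chunks; summing them would over-count.
reports = [Usage(10, 2), None, Usage(10, 7), Usage(10, 12)]
total = final_usage(reports)
```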
[0.0.92] - 2025-10-31 🎃 "The Haunted Edition" 👻
Added
-
Added a new
DeepgramHttpTTSService, which delivers a meaningful reduction in latency when compared to theDeepgramTTSService. -
Added support for
speaking_rateinput parameter inGoogleHttpTTSService. -
Added
enable_speaker_diarizationandenable_language_identificationtoSonioxSTTService. -
Added
SpeechmaticsTTSService, which uses Speechmatics' TTS API. Updated examples 07a* to use the new TTS service. -
Added support for including images or audio to LLM context messages using
LLMContext.create_image_message()orLLMContext.create_image_url_message()(not all LLMs support URLs) andLLMContext.create_audio_message(). For example, when creatingLLMMessagesAppendFrame:message = LLMContext.create_image_message(image=..., size= ...) await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True)) -
New event handlers for the
DeepgramFluxSTTService:on_start_of_turn,on_turn_resumed,on_end_of_turn,on_eager_end_of_turn,on_update. -
Added
generation_configparameter support toCartesiaTTSServiceandCartesiaHttpTTSServicefor Cartesia Sonic-3 models. Includes a newGenerationConfigclass withvolume(0.5-2.0),speed(0.6-1.5), andemotion(60+ options) parameters for fine-grained speech generation control. -
Expanded support for universal
LLMContexttoOpenAIRealtimeLLMService. As a reminder, the context-setup pattern when usingLLMContextis:context = LLMContext(messages, tools) context_aggregator = LLMContextAggregatorPair(context)(Note that even though
OpenAIRealtimeLLMServicenow supports the universalLLMContext, it is not meant to be swapped out for another LLM service at runtime withLLMSwitcher.)Note:
TranscriptionFrames andInterimTranscriptionFrames now go upstream fromOpenAIRealtimeLLMService, so if you're usingTranscriptProcessor, say, you'll want to adjust accordingly:pipeline = Pipeline( [ transport.input(), context_aggregator.user(), # BEFORE llm, transcript.user(), # AFTER transcript.user(), llm, transport.output(), transcript.assistant(), context_aggregator.assistant(), ] )Also worth noting: whether or not you use the new context-setup pattern with
OpenAIRealtimeLLMService, some types have changed under the hood:## BEFORE: # Context aggregator type context_aggregator: OpenAIContextAggregatorPair # Context frame type frame: OpenAILLMContextFrame # Context type context: OpenAIRealtimeLLMContext # or context: OpenAILLMContext ## AFTER: # Context aggregator type context_aggregator: LLMContextAggregatorPair # Context frame type frame: LLMContextFrame # Context type context: LLMContextAlso note that
RealtimeMessagesUpdateFrameandRealtimeFunctionCallResultFramehave been deprecated, since they're no longer used byOpenAIRealtimeLLMService. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen forLLMContextFrames for context updates.Finally,
LLMTextFrames are no longer pushed fromOpenAIRealtimeLLMServicewhen it's configured withoutput_modalities=['audio']. If you need to process its output, listen forTTSTextFrames instead. -
Expanded support for universal
LLMContexttoGeminiLiveLLMService. As a reminder, the context-setup pattern when usingLLMContextis:context = LLMContext(messages, tools) context_aggregator = LLMContextAggregatorPair(context)(Note that even though
GeminiLiveLLMServicenow supports the universalLLMContext, it is not meant to be swapped out for another LLM service at runtime withLLMSwitcher.)Worth noting: whether or not you use the new context-setup pattern with
GeminiLiveLLMService, some types have changed under the hood:## BEFORE: # Context aggregator type context_aggregator: GeminiLiveContextAggregatorPair # Context frame type frame: OpenAILLMContextFrame # Context type context: GeminiLiveLLMContext # or context: OpenAILLMContext ## AFTER: # Context aggregator type context_aggregator: LLMContextAggregatorPair # Context frame type frame: LLMContextFrame # Context type context: LLMContextAlso note that
LLMTextFrames are no longer pushed fromGeminiLiveLLMServicewhen it's configured withmodalities=GeminiModalities.AUDIO. If you need to process its output, listen forTTSTextFrames instead.
Changed
-
The development runner's
/startendpoint now supports passingdailyRoomPropertiesanddailyMeetingTokenPropertiesin the request body whencreateDailyRoomis true. Properties are validated against theDailyRoomPropertiesandDailyMeetingTokenPropertiestypes respectively and passed to Daily's room and token creation APIs. -
UserImageRawFrame has new fields append_to_context and text. The append_to_context field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The text field, if set, might also guide the LLM or the vision service on how to analyze the image. -
UserImageRequestFrame has new fields append_to_context and text. Both fields will be used to set the same fields on the captured UserImageRawFrame. -
UserImageRequestFrame no longer requires a function call name and ID. -
Updated
MoondreamServiceto processUserImageRawFrame. -
VisionServiceexpectsUserImageRawFramein order to analyze images. -
DailyTransport triggers the on_error event if transcription can't be started or stopped. -
DailyTransport updates:
  - start_dialout() now returns two values: session_id and error.
  - start_recording() now returns two values: stream_id and error.
-
Updated
daily-pythonto 0.21.0. -
SimliVideoServicenow acceptsapi_keyandface_idparameters directly, with optionalparamsformax_session_lengthandmax_idle_timeconfiguration, aligning with other Pipecat service patterns. -
Updated the default model to
sonic-3forCartesiaTTSServiceandCartesiaHttpTTSService. -
FunctionFilternow has afilter_system_framesarg, which controls whether or not SystemFrames are filtered. -
Upgraded
aws_sdk_bedrock_runtimeto v0.1.1 to resolve potential CPU issues when runningAWSNovaSonicLLMService.
Deprecated
-
The
expect_stripped_wordsparameter ofLLMAssistantAggregatorParamsis ignored when used with the newerLLMAssistantAggregator, which now handles word spacing automatically. -
LLMService.request_image_frame()is deprecated, push aUserImageRequestFrameinstead. -
UserResponseAggregatoris deprecated and will be removed in a future version. -
The
send_transcription_framesargument toOpenAIRealtimeLLMServiceis deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See "Added" section for details. -
Types in
pipecat.services.openai.realtime.contextandpipecat.services.openai.realtime.framesare deprecated, as they're no longer used byOpenAIRealtimeLLMService. See "Added" section for details. -
SimliVideoServicesimli_configparameter is deprecated. Useapi_keyandface_idparameters instead.
Removed
-
Removed
enable_non_final_tokensandmax_non_final_tokens_duration_msfromSonioxSTTService. -
Removed the
aiohttp_sessionarg fromSarvamTTSServiceas it's no longer used.
Fixed
-
Fixed a
PipelineTaskissue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point when frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects frames are being generated, it will prevent the pipeline from being considered idle. -
Fixed an issue in
HumeTTSServicethat was only using Octave 2, which does not support thedescriptionfield. Now, if a description is provided, it switches to Octave 1. -
Fixed an issue where
DailyTransport would time out prematurely on join and on leave. -
Fixed an issue in the runner where starting a DailyTransport room via
/startdidn't support using theDAILY_SAMPLE_ROOM_URLenv var. -
Fixed an issue in
ServiceSwitcher where using it with STTServices would result in all of them producing TranscriptionFrames.
Other
-
Updated all vision 12-series foundational examples to load images from a file.
-
Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.
[0.0.91] - 2025-10-21
Added
-
It is now possible to start a bot from the
/start endpoint when using the runner's Daily transport. This follows the Pipecat Cloud format with createDailyRoom and body fields in the POST request body. -
Added an ellipsis character (
…) to the end of sentence detection in the string utils. -
Expanded support for universal
LLMContexttoAWSNovaSonicLLMService. As a reminder, the context-setup pattern when usingLLMContextis:context = LLMContext(messages, tools) context_aggregator = LLMContextAggregatorPair(context)(Note that even though
AWSNovaSonicLLMServicenow supports the universalLLMContext, it is not meant to be swapped out for another LLM service at runtime withLLMSwitcher.)Worth noting: whether or not you use the new context-setup pattern with
AWSNovaSonicLLMService, some types have changed under the hood:## BEFORE: # Context aggregator type context_aggregator: AWSNovaSonicContextAggregatorPair # Context frame type frame: OpenAILLMContextFrame # Context type context: AWSNovaSonicLLMContext # or context: OpenAILLMContext ## AFTER: # Context aggregator type context_aggregator: LLMContextAggregatorPair # Context frame type frame: LLMContextFrame # Context type context: LLMContext -
Added support for
bulbul:v3model inSarvamTTSServiceandSarvamHttpTTSService. -
Added
keyterms_promptparameter toAssemblyAIConnectionParams. -
Added
speech_modelparameter toAssemblyAIConnectionParamsto access the multilingual model. -
Added support for trickle ICE to the
SmallWebRTCTransport. -
Added support for updating
OpenAITTSServicesettings (instructionsandspeed) at runtime viaTTSUpdateSettingsFrame. -
Added
--whatsappflag to runner to better surface WhatsApp transport logs. -
Added
on_connectedandon_disconnectedevents to TTS and STT websocket-based services. -
Added an
aggregate_sentencesarg inElevenLabsHttpTTSService, where the default value is True. -
Added a
room_propertiesarg to the Daily runner'sconfigure()method, allowingDailyRoomPropertiesto be provided. -
The runner
--folderargument now supports downloading files from subdirectories.
Changed
-
RunnerArgumentsnow include thebodyfield, so there's no need to add it to subclasses. Also, allRunnerArgumentsfields are now keyword-only. -
CartesiaSTTServicenow inherits fromWebsocketSTTService. -
Package upgrades:
  - daily-python upgraded to 0.20.0.
  - openai upgraded to support up to 2.x.x.
  - openpipe upgraded to support up to 5.x.x.
-
SpeechmaticsSTTServiceupdated dependencies forspeechmatics-rt>=0.5.0.
Deprecated
-
The
send_transcription_framesargument toAWSNovaSonicLLMServiceis deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See "Added" section for details. -
Types in
pipecat.services.aws.nova_sonic.contextare deprecated, as they're no longer used byAWSNovaSonicLLMService. See "Added" section for details.
Fixed
-
Fixed an issue where the
RTVIProcessorwas sending duplicateUserStartedSpeakingFrameandUserStoppedSpeakingFramemessages. -
Fixed an issue in
AWSBedrockLLMServicewhere bothtemperatureandtop_pwere always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, andInputParamsdefaults have been changed toNoneto rely on AWS Bedrock's built-in model defaults. -
Fixed an issue in
RivaSegmentedSTTServicewhere a runtime error occurred due to a mismatch in the_handle_transcriptionmethod's signature. -
Fixed multiple pipeline task cancellation issues.
asyncio.CancelledError is now handled properly in PipelineTask, making it possible to cleanly cancel an asyncio task that is executing a PipelineRunner. Also, PipelineTask.cancel() no longer blocks waiting for the CancelFrame to reach the end of the pipeline (going back to the behavior in < 0.0.83). -
Fixed an issue in
ElevenLabsTTSServiceandElevenLabsHttpTTSServicewhere the Flash models would split words, resulting in a space being inserted between words. -
Fixed an issue where audio filters'
stop()would not be called when usingCancelFrame. -
Fixed an issue in
ElevenLabsHttpTTSService, whereapply_text_normalizationwas incorrectly set as a query parameter. It's now being added as a request parameter. -
Fixed an issue where
RimeHttpTTSService and PiperTTSService could generate audio frames that were not 16-bit aligned, potentially leading to internal errors or static audio. -
Fixed an issue in
SpeechmaticsSTTServicewhereAdditionalVocabEntryitems needed to havesounds_likefor the session to start.
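The AWSBedrockLLMService fix above boils down to leaving out any inference parameter that was not explicitly set. A minimal sketch of that idea follows. The `build_inference_config()` helper is hypothetical and not part of Pipecat, and the camelCase keys are used for illustration only.

```python
from typing import Any, Dict, Optional


def build_inference_config(
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
) -> Dict[str, Any]:
    """Include only parameters the caller explicitly set (hypothetical helper)."""
    candidates = {
        "maxTokens": max_tokens,
        "temperature": temperature,
        "topP": top_p,
    }
    # Compare against None so that legitimate zero values (e.g. temperature=0.0) survive.
    return {key: value for key, value in candidates.items() if value is not None}
```

Because unset values default to `None` and are dropped, a model that rejects `temperature` and `top_p` together only ever sees the one the caller chose.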
Other
-
Added foundational example
47-sentry-metrics.py, demonstrating how to use theSentryMetricsprocessor. -
Added foundational example
14x-function-calling-openpipe.py.
[0.0.90] - 2025-10-10
Added
-
Added audio filter
KrispVivaFilterusing the Krisp VIVA SDK. -
Added
--folderargument to the runner, allowing files saved in that folder to be downloaded fromhttp://HOST:PORT/file/FILE. -
Added
GeminiLiveVertexLLMService, for accessing Gemini Live via Google Vertex AI. -
Added some new configuration options to
GeminiLiveLLMService:
  - thinking
  - enable_affective_dialog
  - proactivity
Note that these new configuration options require using a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two require specifying
http_options=HttpOptions(api_version="v1alpha"). -
Added
on_pipeline_errorevent toPipelineTask. This event will get fired when anErrorFrameis pushed (useFrameProcessor.push_error()).@task.event_handler("on_pipeline_error") async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame): ... -
Added a
service_tierInputParamto theBaseOpenAILLMService. This parameter can influence the latency of the response. For example"priority"will result in faster completions, but in exchange for a higher price.
Changed
- Updated
GeminiLiveLLMServiceto use thegoogle-genailibrary rather than use WebSockets directly.
Deprecated
-
LivekitFrameSerializeris now deprecated. UseLiveKitTransportinstead. -
pipecat.service.openai_realtimeis now deprecated, usepipecat.services.openai.realtimeinstead orpipecat.services.azure.realtimefor Azure Realtime. -
pipecat.service.aws_nova_sonicis now deprecated, usepipecat.services.aws.nova_sonicinstead. -
GeminiMultimodalLiveLLMServiceis now deprecated, useGeminiLiveLLMService.
Fixed
-
Fixed a
GoogleVertexLLMServiceissue that would generate an error if no token information was returned. -
GeminiLiveLLMServicewill now end gracefully (i.e. after the bot has finished) upon receiving anEndFrame. -
GeminiLiveLLMServicewill try to seamlessly reconnect when it loses its connection.
[0.0.89] - 2025-10-07
Fixed
- Reverted a change introduced in 0.0.88 that was causing pipelines to be frozen
when using interruption strategies and processors that block interruption
frames (e.g.
STTMuteFilter).
[0.0.88] - 2025-10-07
Added
-
Added support for Nano Banana models to
GoogleLLMService. For example, you can now use thegemini-2.5-flash-imagemodel to generate images. -
Added
HumeTTSServicefor text-to-speech synthesis using Hume AI's expressive voice models. Provides high-quality, emotionally expressive speech synthesis with support for various voice models. Includes example inexamples/foundational/07ad-interruptible-hume.py. Use with:uv pip install pipecat-ai[hume].
Changed
- Updated default
GoogleLLMServicemodel togemini-2.5-flash.
Deprecated
- PlayHT is shutting down their API on December 31st, 2025. As a result,
PlayHTTTSServiceandPlayHTHttpTTSServiceare deprecated and will be removed in a future version.
Fixed
-
Fixed an issue with
AWSNovaSonicLLMServicewhere the client wouldn't connect due to a breaking change in the AWS dependency chain. -
PermissionErroris now caught if NLTK'spunkt_tabcan't be downloaded. -
Fixed an issue that would cause wrong user/assistant context ordering when using interruption strategies.
-
Fixed RTVI incoming message handling, broken in 0.0.87.
[0.0.87] - 2025-10-02
Added
-
Added
WebsocketSTTServicebase class for websocket-based STT services. Combines STT functionality with websocket connectivity, providing automatic error handling and reconnection capabilities with exponential backoff. -
Added
DeepgramFluxSTTServicefor real-time speech recognition using Deepgram's Flux WebSocket API. Flux understands conversational flow and automatically handles turn-taking. -
Added RTVI messages for user/bot audio levels and system logs.
-
Added cached tokens from OpenAI-based LLM services to
MetricsFrame.
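The WebsocketSTTService entry above mentions reconnection with exponential backoff. The following is a generic sketch of that technique using only asyncio; it is not Pipecat's implementation. The helper names, the default delays, and the choice of `ConnectionError` as the retryable exception are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delays that double on each attempt, capped at `cap` seconds."""
    return [min(cap, base * (2**i)) for i in range(attempts)]


async def connect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 8.0,
) -> int:
    """Call `connect` until it succeeds; return the number of attempts used."""
    delays = backoff_delays(attempts, base, cap)
    for attempt, delay in enumerate(delays, start=1):
        try:
            await connect()
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay)
    raise RuntimeError("attempts must be at least 1")
```

Capping the delay keeps a long outage from pushing the next retry arbitrarily far into the future.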
Changed
- Updated the default model for
AnthropicLLMServicetoclaude-sonnet-4-5-20250929.
Deprecated
-
DailyTransportMessageFrameandDailyTransportMessageUrgentFrameare deprecated, useDailyOutputTransportMessageFrameandDailyOutputTransportMessageUrgentFramerespectively instead. -
LiveKitTransportMessageFrameandLiveKitTransportMessageUrgentFrameare deprecated, useLiveKitOutputTransportMessageFrameandLiveKitOutputTransportMessageUrgentFramerespectively instead. -
TransportMessageFrameandTransportMessageUrgentFrameare deprecated, useOutputTransportMessageFrameandOutputTransportMessageUrgentFramerespectively instead. -
InputTransportMessageUrgentFrameis deprecated, useInputTransportMessageFrameinstead. -
DailyUpdateRemoteParticipantsFrameis deprecated and will be removed in a future version. Instead, create your own custom frame and handle it in the@transport.output().event_handler("on_after_push_frame")event handler or a custom processor.
Fixed
-
Fixed an issue in
AWSBedrockLLMServicewhere timeout exceptions weren't being detected. -
Fixed a
PipelineTask issue that could prevent the application from exiting if task.cancel() was called when the task was already finished. -
Fixed an issue where local SmartTurn was not being run in a separate thread.
[0.0.86] - 2025-09-24
Added
-
Added
HeyGenTransport. This is an integration for HeyGen Interactive Avatar, a video service that handles audio streaming and requests HeyGen to generate avatar video responses (see https://www.heygen.com/). When used, the Pipecat bot joins the same virtual room as the HeyGen Avatar and the user. -
Added support to
TwilioFrameSerializerforregionandedgesettings. -
Added support for using universal
LLMContext with:
  - LLMLogObserver
  - GatedLLMContextAggregator (formerly GatedOpenAILLMContextAggregator)
  - LangchainProcessor
  - Mem0MemoryService
-
Added
StrandsAgentProcessorwhich allows you to use the Strands Agents framework to build your voice agents. See https://strandsagents.com -
Added
ElevenLabsSTTServicefor speech-to-text transcription. -
Added a peer connection monitor to the
SmallWebRTCConnectionthat automatically disconnects if the connection fails to establish within the timeout (1 minute by default). -
Added memory cleanup improvements to reduce memory peaks.
-
Added
on_before_process_frame, on_after_process_frame, on_before_push_frame and on_after_push_frame. These are synchronous events that get called before and after a frame is processed or pushed. Note that these events are synchronous, so they should ideally perform lightweight tasks in order to not block the pipeline. See examples/foundational/45-before-and-after-events.py. -
Added
on_before_leavesynchronous event toDailyTransport. -
Added
on_before_disconnectsynchronous event toLiveKitTransport. -
It is now possible to register synchronous event handlers. By default, all event handlers are executed in a separate task. However, in some cases we want to guarantee order of execution, for example, executing something before disconnecting a transport.
self._register_event_handler("on_event_name", sync=True) -
Added support for global location in
GoogleVertexLLMService. The service now supports both regional locations (e.g., "us-east4") and the "global" location for Vertex AI endpoints. When using "global" location, the service will useaiplatform.googleapis.comas the API host instead of the regional format. -
Added
on_pipeline_finishedevent toPipelineTask. This event will get fired when the pipeline is done running. This can be the result of aStopFrame,CancelFrameorEndFrame.@task.event_handler("on_pipeline_finished") async def on_pipeline_finished(task: PipelineTask, frame: Frame): ... -
Added support for new RTVI
send-textevent, along with the ability to toggle the audio response off (skip tts) while handling the new context.
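To illustrate the difference between the default and synchronous event handlers introduced above, here is a small self-contained sketch. The `EventEmitter` class is hypothetical and only mimics the described behavior: default handlers run in their own task, while handlers for an event registered with `sync=True` finish before the emitter continues. Pipecat's internals may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Handler = Callable[[str], Awaitable[None]]


class EventEmitter:
    """Hypothetical emitter; Pipecat's internals may differ."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self._sync: Dict[str, bool] = {}
        self._tasks: List[asyncio.Task] = []

    def register_event(self, name: str, sync: bool = False) -> None:
        self._handlers[name] = []
        self._sync[name] = sync

    def add_handler(self, name: str, handler: Handler) -> None:
        self._handlers[name].append(handler)

    async def emit(self, name: str, payload: str) -> None:
        for handler in self._handlers[name]:
            if self._sync[name]:
                # Synchronous event: finish the handler before continuing.
                await handler(payload)
            else:
                # Default: run the handler in its own task.
                self._tasks.append(asyncio.create_task(handler(payload)))

    async def drain(self) -> None:
        """Wait for any handlers that were started as separate tasks."""
        await asyncio.gather(*self._tasks)
        self._tasks.clear()


async def demo() -> List[str]:
    log: List[str] = []
    emitter = EventEmitter()
    emitter.register_event("on_before_disconnect", sync=True)

    async def flush(payload: str) -> None:
        await asyncio.sleep(0)
        log.append(f"handler:{payload}")

    emitter.add_handler("on_before_disconnect", flush)
    await emitter.emit("on_before_disconnect", "bye")
    log.append("disconnected")
    await emitter.drain()
    return log
```

With `sync=True` the handler's log entry is guaranteed to precede `"disconnected"`, which is the ordering guarantee the entry above describes for work that must happen before a transport disconnects.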
Changed
-
Updated
aiortcto 1.13.0. -
Updated
sentryto 2.38.0. -
BaseOutputTransportmethodswrite_audio_frameandwrite_video_framenow return a boolean to indicate if the transport implementation was able to write the given frame or not. -
Updated Silero VAD model to v6.
-
Updated
livekitto 1.0.13. -
torchandtorchaudioare no longer required for running Smart Turn locally. This avoids gigabytes of dependencies being installed. -
Updated
websocketsdependency to support version 15.0. Removed deprecated usage ofConnectionClosed.codeandConnectionClosed.reasonattributes inAWSTranscribeSTTServicefor compatibility. -
Refactored
pyproject.tomlto reduce websockets dependency repetition using self-referencing extras. All websockets-dependent services now reference a sharedwebsockets-baseextra.
Deprecated
-
GladiaSTTService'sconfidencearg is deprecated.confidenceis no longer needed to determine which transcription or translation frames to emit. -
PipelineTaskeventson_pipeline_stopped,on_pipeline_endedandon_pipeline_cancelledare now deprecated. Useon_pipeline_finishedinstead. -
Support for the RTVI append-to-context event is deprecated in favor of the new send-text event, making way for future events like send-image.
Fixed
-
Fixed an issue where the pipeline could freeze if a task cancellation never completed because a third-party library swallowed asyncio.CancelledError. We now apply a timeout to task cancellations to prevent these freezes. If the timeout is reached, the system logs warnings and leaves dangling tasks behind, which can help diagnose where cancellation is being blocked.
-
Fixed an
AudioBufferProcessor issue that was causing user audio to be missing in stereo recordings, resulting in bot and user overlaps. -
Fixed a
BaseOutputTransportissue that could produce large savedAudioBufferProcessorfiles when using an audio mixer. -
Fixed a
PipelineRunnerissue on Windows where setting up SIGINT and SIGTERM was raising an exception. -
Fixed an issue where multiple handlers for an event would not run in parallel.
-
Fixed
DailyTransport.sip_call_transfer()to automatically use the session ID from theon_dialin_connectedevent, when not explicitly provided. Now supports cold transfers (from incoming dial-in calls) by automatically tracking session IDs from connection events. -
Fixed a memory leak in
SmallWebRTCTransport. Inaiortc, when you receive aMediaStreamTrack(audio or video), frames are produced asynchronously. If the code never consumes these frames, they are queued in memory, causing a memory leak. -
Fixed an issue in
AsyncAITTSService, whereTTSTextFrameswere not being pushed. -
Fixed an issue that would cause
push_interruption_task_frame_and_wait()to not wait if a previous interruption had already happened. -
Fixed a couple of bugs in
ServiceSwitcher:
  - Using multiple ServiceSwitchers in a pipeline would result in an error.
  - ServiceSwitcherFrames (such as ManuallySwitchServiceFrames) were having an effect too early, essentially "jumping the queue" in terms of pipeline frame ordering.
-
Fixed a self-cancellation deadlock in
UserIdleProcessorwhen returningFalsefrom an idle callback. The task now terminates naturally instead of attempting to cancel itself. -
Fixed an issue in
AudioBufferProcessorwhere a recording is not created when a bot speaks and user input is blocked. -
Fixed a
FastAPIWebsocketTransportandSmallWebRTCTransportissue whereon_client_disconnectedwould be triggered when the bot ends the conversation. That is,on_client_disconnectedshould only be triggered when the remote client actually disconnects. -
Fixed an issue in
HeyGenVideoServicewhere theBotStartedSpeakingFramewas blocked from moving through the Pipeline.
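The first fix in this section applies a timeout to task cancellation so that code which swallows asyncio.CancelledError cannot freeze the pipeline. A generic asyncio sketch of that pattern follows. The function names and timeout values are illustrative assumptions, not Pipecat's actual implementation.

```python
import asyncio
import logging

logger = logging.getLogger("cancel_timeout_sketch")


async def cancel_with_timeout(task: asyncio.Task, timeout: float = 1.0) -> bool:
    """Cancel `task`; return True if it finished within `timeout` seconds."""
    task.cancel()
    # asyncio.wait does not raise on timeout and does not cancel the task again.
    _done, pending = await asyncio.wait({task}, timeout=timeout)
    if pending:
        logger.warning("Task %s ignored cancellation; leaving it dangling.", task.get_name())
        return False
    return True


async def stubborn(swallow: int = 1) -> None:
    """Simulates third-party code that swallows the first `swallow` cancellations."""
    remaining = swallow
    while True:
        try:
            await asyncio.sleep(3600)
        except asyncio.CancelledError:
            if remaining == 0:
                raise
            remaining -= 1  # swallowed: the task keeps running


async def well_behaved() -> None:
    await asyncio.sleep(3600)


async def demo() -> tuple:
    good = asyncio.create_task(well_behaved())
    bad = asyncio.create_task(stubborn())
    await asyncio.sleep(0)  # let both tasks start
    return (
        await cancel_with_timeout(good, timeout=0.05),
        await cancel_with_timeout(bad, timeout=0.05),
    )
```

The caller gets control back after the timeout either way, and the warning names the task that ignored cancellation, which helps locate where it is being blocked.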
[0.0.85] - 2025-09-12
Added
-
AzureSTTServicenow pushes interim transcriptions. -
Added
voice_cloning_keytoGoogleTTSServiceto support custom cloned voices. -
Added
speaking_ratetoGoogleTTSService.InputParamsto control the speaking rate. -
Added a
speedarg toOpenAITTSServiceto control the speed of the voice response. -
Added
FrameProcessor.push_interruption_task_frame_and_wait(). Use this method to programmatically interrupt the bot from any part of the pipeline. This guarantees that all the processors in the pipeline are interrupted in order (from upstream to downstream). Internally, this works by first pushing an InterruptionTaskFrame upstream until it reaches the pipeline task. The pipeline task then generates an InterruptionFrame, which flows downstream through all processors. Once the InterruptionFrame has reached the processor waiting for the interruption, the function returns and execution continues after the call. Think of it as sending an upstream request for interruption and waiting until the acknowledgment flows back downstream. -
Added new base
TaskFrame(which is a system frame). This is the base class for all task frames (EndTaskFrame,CancelTaskFrame, etc.) that are meant to be pushed upstream to reach the pipeline task. -
Expanded support for universal
LLMContextto the AWS Bedrock LLM service. Using the universalLLMContextand associatedLLMContextAggregatorPairis a pre-requisite for usingLLMSwitcherto switch between LLMs at runtime. -
Added new fields to the development runner's
parse_telephony_websocket method in support of providing dynamic data to a bot.
  - Twilio: Added a new body parameter, which parses the websocket message for customParameters. Provide data via the Parameter nouns in your TwiML to use this feature.
  - Telnyx & Exotel: Both providers make the to and from phone numbers available in the websocket messages. You can now access these numbers as call_data["to"] and call_data["from"].

  Note: Each telephony provider offers different features. Refer to the corresponding example in pipecat-examples to see how to pass custom data to your bot.
-
Added
bodyto theWebsocketRunnerArgumentsas an optional parameter. Custombodyinformation can be passed from the server into the bot file via thebot()method using this new parameter. -
Added video streaming support to
LiveKitTransport. -
Added
OpenAIRealtimeLLMServiceandAzureRealtimeLLMServicewhich provide access to OpenAI Realtime.
Changed
pipeline.tests.utils.run_test()now allows passingPipelineParamsinstead of individual parameters.
Removed
- Remove
VisionImageRawFramein favor of context frames (LLMContextFrameorOpenAILLMContextFrame).
Deprecated
-
BotInterruptionFrameis now deprecated, useInterruptionTaskFrameinstead. -
StartInterruptionFrame is now deprecated, use InterruptionFrame instead. -
Deprecate
VisionImageFrameAggregatorbecauseVisionImageRawFramehas been removed. See the12*examples for the new recommended replacement pattern. -
NoisereduceFilteris now deprecated and will be removed in a future version. Use other audio filters likeKrispFilterorAICFilter. -
Deprecated
OpenAIRealtimeBetaLLMServiceandAzureRealtimeBetaLLMService. UseOpenAIRealtimeLLMServiceandAzureRealtimeLLMService, respectively. Each service will be removed in an upcoming version, 1.0.0.
Fixed
-
Fixed a
BaseOutputTransportissue that caused incorrect detection of when the bot stopped talking while using an audio mixer. -
Fixed a
LiveKitTransportissue where RTVI messages were not properly encoded. -
Added additional fixups to Mistral context messages to ensure they meet Mistral-specific requirements, avoiding Mistral "invalid request" errors.
-
Fixed
DailyTransporttranscription handling to gracefully handle missingrawResponsefield in transcription messages, preventing KeyError crashes.
[0.0.84] - 2025-09-05
Added
-
Add the ability to send DTMF to
LiveKitTransport. -
Expanded support for universal
LLMContextto the Anthropic LLM service. Using the universalLLMContextand associatedLLMContextAggregatorPairis a pre-requisite for usingLLMSwitcherto switch between LLMs at runtime.
Changed
-
Updated
daily-pythonto 0.19.9. -
Restored
DailyTransport's native DTMF support using Daily'ssend_dtmf()method instead of generated audio tones.
Fixed
-
Fixed an AWSBedrockLLMService crash caused by an extra await. -
Fixed an OpenAIImageGenService issue where it was not creating URLImageRawFrame correctly.
[0.0.83] - 2025-09-03
Added
-
Added multilingual support for AsyncAI in
AsyncAITTSService and AsyncAIHttpTTSService.
  - New languages: es, fr, de, it.
-
Added new frames
InputTransportMessageUrgentFrameandDailyInputTransportMessageUrgentFramefor transport messages received from external sources. -
Added
UserSpeakingFrame. This will be sent upstream and downstream while VAD detects the user is speaking. -
Expanded support for universal
LLMContext to more LLM services. Using the universal LLMContext and associated LLMContextAggregatorPair is a pre-requisite for using LLMSwitcher to switch between LLMs at runtime. Here are the newly-supported services:
- Azure
- Cerebras
- Deepseek
- Fireworks AI
- Google Vertex AI
- Grok
- Groq
- Mistral
- NVIDIA NIM
- Ollama
- OpenPipe
- OpenRouter
- Perplexity
- Qwen
- SambaNova
- Together.ai
-
Added support for WhatsApp User-initiated Calls.
-
Added new audio filter
AICFilter, speech enhancement for improving VAD/STT performance, no ONNX dependency. See https://ai-coustics.com/sdk/ -
Added a timeout around cancel input tasks to prevent indefinite hangs when cancellation is swallowed by third-party code.
-
Added
pipecat.extensions.ivrfor automated IVR system navigation with configurable goals and conversation handling. Supports DTMF input, verbal responses, and intelligent menu traversal.Basic usage:
from pipecat.extensions.ivr.ivr_navigator import IVRNavigator # Create IVR navigator with your goal ivr_navigator = IVRNavigator( llm=llm_service, ivr_prompt="Navigate to billing department to dispute a charge" ) # Handle different outcomes @ivr_navigator.event_handler("on_conversation_detected") async def on_conversation(processor, conversation_history): # Switch to normal conversation mode pass @ivr_navigator.event_handler("on_ivr_status_changed") async def on_ivr_status(processor, status): if status == IVRStatus.COMPLETED: # End pipeline, transfer call, or start bot conversation elif status == IVRStatus.STUCK: # Handle navigation failure -
BaseOutputTransport now implements write_dtmf() by loading DTMF audio and sending it through the transport. This makes sending DTMF generic across all output transports. -
Added new config parameters to
GladiaSTTService.
- PreProcessingConfig > audio_enhancer to enhance audio quality.
- CustomVocabularyItem > pronunciations and language to specify special pronunciations and the language in which they are pronounced.
Changed
-
UserStartedSpeakingFrame and UserStoppedSpeakingFrame are also pushed upstream. -
ParallelPipeline now waits for CancelFrame to finish in all branches before pushing it downstream. -
Added
sip_codecs to the DailyRoomSipParams. -
Updated the
configure() function in pipecat.runner.daily to include new args to create SIP-enabled rooms. Additionally, added new args to control the room and token expiration durations. -
pipecat.frames.frames.KeypadEntry is deprecated and has been moved to pipecat.audio.dtmf.types.KeypadEntry. -
Updated
RimeTTSService's flush_audio message to conform with Rime's official API. -
Updated the default model for
CerebrasLLMService to GPT-OSS-120B.
Removed
-
Removed
StopInterruptionFrame. This legacy frame was barely used and carried no useful meaning of its own. It was only pushed after UserStoppedSpeakingFrame, so developers can use UserStoppedSpeakingFrame instead. -
DailyTransport.write_dtmf() has been removed in favor of the generic BaseOutputTransport.write_dtmf(). -
Removed deprecated
DailyTransport.send_dtmf().
Deprecated
-
Transports have been reorganized.

    pipecat.transports.network.small_webrtc -> pipecat.transports.smallwebrtc.transport
    pipecat.transports.network.webrtc_connection -> pipecat.transports.smallwebrtc.connection
    pipecat.transports.network.websocket_client -> pipecat.transports.websocket.client
    pipecat.transports.network.websocket_server -> pipecat.transports.websocket.server
    pipecat.transports.network.fastapi_websocket -> pipecat.transports.websocket.fastapi
    pipecat.transports.services.daily -> pipecat.transports.daily.transport
    pipecat.transports.services.helpers.daily_rest -> pipecat.transports.daily.utils
    pipecat.transports.services.livekit -> pipecat.transports.livekit.transport
    pipecat.transports.services.tavus -> pipecat.transports.tavus.transport

-
pipecat.frames.frames.KeypadEntry is deprecated; use pipecat.audio.dtmf.types.KeypadEntry instead.
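When migrating many imports, the transport module moves listed above can be captured in a lookup table. The following is a minimal sketch and not part of Pipecat: the paths are copied from the entry above, while `TRANSPORT_MODULE_RENAMES`, `new_module_path()`, and `rewrite_import_line()` are hypothetical names introduced for illustration.

```python
# Old -> new module paths, copied from the deprecation entry above.
# The dict and helper names are illustrative; they are not part of Pipecat.
TRANSPORT_MODULE_RENAMES = {
    "pipecat.transports.network.small_webrtc": "pipecat.transports.smallwebrtc.transport",
    "pipecat.transports.network.webrtc_connection": "pipecat.transports.smallwebrtc.connection",
    "pipecat.transports.network.websocket_client": "pipecat.transports.websocket.client",
    "pipecat.transports.network.websocket_server": "pipecat.transports.websocket.server",
    "pipecat.transports.network.fastapi_websocket": "pipecat.transports.websocket.fastapi",
    "pipecat.transports.services.daily": "pipecat.transports.daily.transport",
    "pipecat.transports.services.helpers.daily_rest": "pipecat.transports.daily.utils",
    "pipecat.transports.services.livekit": "pipecat.transports.livekit.transport",
    "pipecat.transports.services.tavus": "pipecat.transports.tavus.transport",
}


def new_module_path(module: str) -> str:
    """Return the relocated module path, or the input unchanged if it did not move."""
    return TRANSPORT_MODULE_RENAMES.get(module, module)


def rewrite_import_line(line: str) -> str:
    """Rewrite a top-level `from <module> import ...` line whose module moved.

    Whitespace is normalized to single spaces; other lines are returned as-is.
    """
    parts = line.split()
    if len(parts) >= 4 and parts[0] == "from" and parts[2] == "import":
        parts[1] = new_module_path(parts[1])
        return " ".join(parts)
    return line
```

A real migration would more likely use a syntax-aware tool; the sketch only handles simple single-line `from ... import ...` statements.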
Fixed
-
Fixed an issue where messages received from the transport were always being resent.
-
Fixed
SmallWebRTCTransport to not use mid to decide if the transceiver should be sendrecv or not. -
Fixed an issue where Deepgram swallowed
asyncio.CancelledError during disconnect, preventing tasks from being cancelled. -
Fixed an issue where
PipelineTask was not cleaning up the observers.
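The Deepgram fix above concerns a general asyncio pitfall: if code catches `asyncio.CancelledError` and does not re-raise it, the task is never marked as cancelled. The sketch below illustrates that behavior with the standard library only. It is not the Deepgram or Pipecat code, and all function names are hypothetical.

```python
import asyncio


async def swallowing_disconnect() -> str:
    """Anti-pattern: the cancellation request is caught and discarded."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        pass  # swallowed: the coroutine carries on as if nothing happened
    return "finished normally"


async def cooperative_disconnect() -> str:
    """Preferred pattern: clean up if needed, then re-raise."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Cleanup would go here.
        raise
    return "finished normally"


async def cancel_and_report(coro_fn) -> str:
    """Start a task, cancel it, and report whether the cancellation took effect."""
    task = asyncio.create_task(coro_fn())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()
    try:
        result = await task
        return f"not cancelled ({result})"
    except asyncio.CancelledError:
        return "cancelled"


swallowed = asyncio.run(cancel_and_report(swallowing_disconnect))
cooperative = asyncio.run(cancel_and_report(cooperative_disconnect))
```

In the first case the task completes with a normal result even though `cancel()` was called, which is why code waiting for the cancellation to finish can hang.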
Performance
- Reduced latency and improved memory performance in
Mem0MemoryService.
[0.0.82] - 2025-08-28
Added
-
Added a new LLMRunFrame to trigger an LLM response:

    await task.queue_frames([LLMRunFrame()])

This replaces OpenAILLMContextFrame, which you’d previously typically use like this:

    await task.queue_frames([context_aggregator.user().get_context_frame()])

Use this way of kicking off your conversation when you’ve already initialized your context and are simply instructing the bot when to go:

    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)

    # ...

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        await task.queue_frames([LLMRunFrame()])

Note that if you want to add new messages when kicking off the conversation, you could use LLMMessagesAppendFrame with run_llm=True instead:

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])

In the rare case you don’t have a context aggregator in your pipeline, you may continue using a context frame.
-
Added support for switching between audio+text and text-only modes within the same pipeline. This is done by pushing
LLMConfigureOutputFrame(skip_tts=True) to enter text-only mode, and disabling it to return to audio+text. The LLM will still generate tokens and add them to the context, but they will not be sent to TTS. -
Added
skip_tts field to TextFrame. This lets a text frame bypass TTS while still being included in the LLM context. Useful for cases like structured text that isn’t meant to be spoken but should still contribute to context. -
Added a
cancel_timeout_secs argument to PipelineTask which defines how long the pipeline has to complete cancellation. When PipelineTask.cancel() is called, a CancelFrame is pushed through the pipeline and must reach the end. If it does not reach the end within the specified time, a warning is shown and the wait is aborted. -
Added a new "universal" (LLM-agnostic)
LLMContext and accompanying LLMContextAggregatorPair, which will eventually replace OpenAILLMContext (and the other under-the-hood contexts) and the other context aggregators. The new universal LLMContext machinery allows a single context to be shared between different LLMs, enabling runtime LLM switching and scenarios like failover.

From the developer's point of view, switching to using the new universal context machinery will usually be a matter of going from this:

    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)

To this:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

To start, the universal LLMContext is supported with the following LLM services:
- OpenAILLMService
- GoogleLLMService
-
Added a new
LLMSwitcher class to enable runtime LLM switching, built atop a new generic ServiceSwitcher.

Switchers take a switching strategy. The first available strategy is ServiceSwitcherStrategyManual.

To switch LLMs at runtime, the LLMs must be sharing one instance of the new universal LLMContext (see above bullet).

    # Instantiate your LLM services
    llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
    llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

    # Instantiate a switcher
    # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
    llm_switcher = LLMSwitcher(
        llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
    )

    # Create your pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm_switcher,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

    # ...

    # Whenever is appropriate, switch LLMs!
    await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])

-
Added an
LLMService.run_inference() method to LLM services to enable direct, out-of-band (i.e. out-of-pipeline) inference.
Changed
-
Updated
daily-python to 0.19.8. -
PipelineTask now waits for StartFrame to reach the end of the pipeline before pushing any other frames. -
Updated
CartesiaTTSService and CartesiaHttpTTSService to align with Cartesia's changes for the speed parameter. It now takes only an enum of slow, normal, or fast. -
Added support to
AWSBedrockLLMService for setting authentication credentials through environment variables. -
Updated
SarvamTTSService to use WebSocket streaming for real-time audio generation with multiple Indian languages, with HTTP support still available via SarvamHttpTTSService.
Fixed
-
Fixed an RTVI issue that was causing frames to be pushed before pipeline was properly initialized.
-
Fixed some
get_messages_for_logging() implementations that were returning a JSON string instead of a list. -
Fixed a
DailyTransport issue that prevented DTMF tones from being sent. -
Fixed a missing import in
SentryMetrics. -
Fixed
AWSPollyTTSService to support the AWS credential provider chain (IAM roles, IRSA, instance profiles) instead of requiring explicit environment variables. -
Fixed a
CartesiaTTSService issue that was causing the application to hang after Cartesia's 5-minute timeout elapsed. -
Fixed an issue preventing
SpeechmaticsSTTService from transcribing audio.
[0.0.81] - 2025-08-25
Added
-
Added
pipecat.extensions.voicemail, a module for detecting voicemail vs. live conversation, primarily intended for use in outbound calling scenarios. The voicemail module is optimized for text LLMs only. -
Added new frames to the
idle_timeout_frames arg: TranscriptionFrame, InterimTranscriptionFrame, UserStartedSpeakingFrame, and UserStoppedSpeakingFrame. These additions serve as indicators of user activity in the pipeline idle detection logic. -
Allow passing custom pipeline sink and source processors to a
Pipeline. Pipeline source and sink processors are used to know and control what's coming in and out of a Pipeline processor. -
Added
FrameProcessor.pause_processing_system_frames() and FrameProcessor.resume_processing_system_frames(). These allow pausing and resuming the processing of system frames. -
Added new
on_process_frame() observer method which makes it possible to know when a frame is being processed. -
Added new
FrameProcessor.entry_processor() method. This allows you to access the first non-compound processor in a pipeline. -
Added
FrameProcessor properties processors, next and previous. -
ElevenLabsTTSService now supports additional runtime changes to the model, language, and voice_settings parameters. -
Added
apply_text_normalization support to ElevenLabsTTSService and ElevenLabsHttpTTSService. -
Added
MistralLLMService, using Mistral's chat completion API. -
Added the ability to retry executing a chat completion after a timeout period for
OpenAILLMService and its subclasses, AnthropicLLMService, and AWSBedrockLLMService. The LLM services accept new args: retry_timeout_secs and retry_on_timeout. This feature is disabled by default.
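The retry-on-timeout behavior described in the last entry above can be pictured with plain asyncio. This is a hedged sketch, not the services' actual implementation: `call_with_timeout_retry()` and `make_call` are hypothetical names, and retrying exactly once with no timeout on the second attempt is an assumption. Only the argument names `retry_timeout_secs` and `retry_on_timeout` come from the changelog.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_timeout_retry(
    make_call: Callable[[], Awaitable[T]],
    retry_timeout_secs: float,
    retry_on_timeout: bool = False,
) -> T:
    """Run make_call(); if enabled, retry once when the first attempt times out."""
    if not retry_on_timeout:
        # Feature disabled (the default): a plain call with no extra timeout.
        return await make_call()
    try:
        return await asyncio.wait_for(make_call(), timeout=retry_timeout_secs)
    except asyncio.TimeoutError:
        # Assumption: a single retry, without a timeout on the second attempt.
        return await make_call()


async def _demo() -> str:
    attempts = 0

    async def flaky_completion() -> str:
        # Hypothetical stand-in for a chat completion request:
        # the first attempt stalls, later attempts answer immediately.
        nonlocal attempts
        attempts += 1
        if attempts == 1:
            await asyncio.sleep(0.5)
        return f"attempt {attempts}"

    return await call_with_timeout_retry(
        flaky_completion, retry_timeout_secs=0.05, retry_on_timeout=True
    )


demo_result = asyncio.run(_demo())
```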
Changed
- Updated
daily-python to 0.19.7.
Deprecated
FrameProcessor.wait_for_task() is deprecated. Use await task or await asyncio.wait_for(task, timeout) instead.
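Both suggested replacements are standard asyncio calls. The sketch below shows them side by side; `work()` is a hypothetical coroutine used only for illustration. Note that `asyncio.wait_for()` cancels the awaited task when the timeout expires.

```python
import asyncio


async def work(delay: float) -> str:
    # Hypothetical stand-in for whatever the task does.
    await asyncio.sleep(delay)
    return "done"


async def main():
    # Replacement 1: await the task directly when no timeout is needed.
    quick = asyncio.create_task(work(0.01))
    quick_result = await quick

    # Replacement 2: bound the wait. On timeout, wait_for() cancels the task.
    slow = asyncio.create_task(work(60))
    try:
        await asyncio.wait_for(slow, timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True

    return quick_result, timed_out, slow.cancelled()


result = asyncio.run(main())
```

If the task must keep running after the timeout, wrapping it in `asyncio.shield()` before passing it to `wait_for()` is the usual approach.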
Removed
-
Watchdog timers have been removed. They were introduced in 0.0.72 to help diagnose pipeline freezes. Unfortunately, they proved ineffective since they required developers to use Pipecat-specific queues, iterators, and events to correctly reset the timer, which limited their usefulness and added friction.
-
Removed unused
FrameProcessor.set_parent() and FrameProcessor.get_parent().
Fixed
-
Fixed an issue that would cause
PipelineRunner and PipelineTask to not handle external asyncio task cancellation properly. -
Added
SpeechmaticsSTTService exception handling on connection and sending. -
Replaced
asyncio.wait_for() with wait_for2.wait_for() for Python < 3.12 because of issues with task cancellation (i.e. cancellation is never propagated). See https://bugs.python.org/issue42130 -
Fixed an
AudioBufferProcessor issue that would cause audio overlap when setting a max buffer size. -
Fixed an issue where
AsyncAITTSService had very high response latency by adding force=true when sending the flush command.
Performance
-
Improved
PipelineTask performance by using direct mode processors and by removing unnecessary tasks. -
Improved
ParallelPipeline performance by using direct mode, by not creating a task for each frame and every sub-pipeline, and by removing other unnecessary tasks. -
Improved Pipeline performance by using direct mode.
Other
-
Added
14w-function-calling-mistal.py using MistralLLMService. -
Added
13j-azure-transcription.py using AzureSTTService.
[0.0.80] - 2025-08-13
Added
-
Added
GeminiTTSService which uses Google Gemini to generate TTS output. The Gemini model can be prompted to insert styled speech to control the TTS output. -
Added Exotel support to Pipecat's development runner. You can now connect using the runner with
uv run bot.py -t exotel and an ngrok connection to HTTP port 7860. -
Added
enable_direct_mode argument to FrameProcessor. The direct mode is for processors which require very little I/O or compute resources, that is, processors that can perform their task almost immediately. These types of processors don't need any of the internal tasks and queues usually created by frame processors, which means overall application performance might be slightly increased. Use with care. -
Added TTFB metrics for
HeyGenVideoService and TavusVideoService. -
Added
endpoint_id parameter to AzureSTTService. (Custom EndpointId)
Changed
-
WatchdogPriorityQueue now requires inserted items to always be tuples, and the tuple size must be specified via the tuple_size constructor argument when creating the queue. -
Updated Moondream to revision
2025-01-09. -
Updated
PlayHTHttpTTSService to no longer use the pyht client to remove compatibility issues with other packages. Now you can use the PlayHT HTTP service with other services, like GoogleLLMService. -
Updated
pyproject.toml to once again pin numba to ==0.61.2 in order to resolve package versioning issues. -
Updated the
STTMuteFilter to include VADUserStartedSpeakingFrame and VADUserStoppedSpeakingFrame in the list of frames to filter when the filtering is on.
Performance
-
Improved the latency of the
HeyGenVideoService. -
Improved the performance of some frame processors by using the new frame processor direct mode. In direct mode a frame processor will process frames right away, avoiding the need for internal queues and tasks. This is useful for some simple processors. For example, in processors that wrap other processors (e.g.
Pipeline, ParallelPipeline), we add one processor before and one after the wrapped processors (internally, you will see them as sources and sinks). These sources and sinks don't do any special processing and they basically forward frames. So, for these simple processors we now enable the new direct mode which avoids creating any internal tasks (and queues) and therefore improves performance.
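The difference between a queue-backed processor and direct mode can be illustrated with a toy model. This is a conceptual sketch only: `QueuedForwarder` and `DirectForwarder` are hypothetical classes, not Pipecat's FrameProcessor, and they only model the idea that a simple forwarding step does not need its own queue and background task.

```python
import asyncio
from typing import Awaitable, Callable, List

Downstream = Callable[[str], Awaitable[None]]


class QueuedForwarder:
    """Forwards frames through an internal queue drained by a background task."""

    def __init__(self, downstream: Downstream):
        # Must be constructed while an event loop is running (create_task).
        self._downstream = downstream
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._drain())

    async def process_frame(self, frame: str) -> None:
        await self._queue.put(frame)

    async def _drain(self) -> None:
        while True:
            frame = await self._queue.get()
            if frame is None:  # sentinel: stop draining
                return
            await self._downstream(frame)

    async def stop(self) -> None:
        await self._queue.put(None)
        await self._task


class DirectForwarder:
    """Forwards frames immediately: no queue and no extra task."""

    def __init__(self, downstream: Downstream):
        self._downstream = downstream

    async def process_frame(self, frame: str) -> None:
        await self._downstream(frame)

    async def stop(self) -> None:
        pass


async def run_demo(forwarder_cls) -> List[str]:
    received: List[str] = []

    async def sink(frame: str) -> None:
        received.append(frame)

    forwarder = forwarder_cls(sink)
    for frame in ("start", "audio", "end"):
        await forwarder.process_frame(frame)
    await forwarder.stop()
    return received


queued_frames = asyncio.run(run_demo(QueuedForwarder))
direct_frames = asyncio.run(run_demo(DirectForwarder))
```

Both variants deliver the same frames in the same order; the direct variant simply avoids the queue hop and the extra task per processor.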
Fixed
-
Fixed an issue with the
BaseWhisperSTTService where the language was specified as an enum and not a string. -
Fixed an issue where
SmallWebRTCTransport ended before TTS finished. -
Fixed an issue in
OpenAIRealtimeBetaLLMService where specifying text in modalities didn't result in text being output from the model. -
Added SSML reserved character escaping to
AzureBaseTTSService to properly handle special characters in text sent to Azure TTS. This fixes an issue where characters like &, <, >, ", and ' in LLM-generated text would cause TTS failures. -
Fixed a
WatchdogPriorityQueue issue that could cause an exception when comparing watchdog cancel sentinel items with other items in the queue. -
Fixed an issue that would cause system frames to not be processed with higher priority than other frames. This could cause slower interruption times.
-
Fixed an issue where retrying a websocket connection error would result in an error.
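The SSML escaping fix listed above deals with the five XML reserved characters. A minimal sketch of that kind of escaping, using only the standard library, might look like the following. `escape_ssml_text()` is a hypothetical helper for illustration and is not the code used in AzureBaseTTSService.

```python
from xml.sax.saxutils import escape


def escape_ssml_text(text: str) -> str:
    """Escape the XML reserved characters so text can be embedded in SSML.

    escape() handles &, < and > itself; the quote characters are passed
    as extra entities.
    """
    return escape(text, {'"': "&quot;", "'": "&apos;"})


escaped = escape_ssml_text("Tom & \"Jerry\" <3 it's")
```

The escaped text can then be placed inside an SSML element without breaking the surrounding markup.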
Other
-
Added foundational example
19b-openai-realtime-beta-text.py, showing how to use OpenAIRealtimeBetaLLMService to output text to a TTS service. -
Added vision support to release evals so we can run the foundational examples 12 series.
-
Added foundational example
15a-switch-languages.py to release evals. It is able to detect if we switched the language properly. -
Updated foundational examples to show how to enclose complex logic (e.g.
ParallelPipeline) into a single processor so the main pipeline becomes simpler. -
Added
07n-interruptible-gemini.py, demonstrating how to use GeminiTTSService.
[0.0.79] - 2025-08-07
Changed
- Changed
pipecat-ai's openai dependency to >=1.74.0,<=1.99.1 due to a breaking change in openai 1.99.2 (commit)
Deprecated
-
TTSService.say() is deprecated; push a TTSSpeakFrame instead. Calling functions directly is a discouraged pattern in Pipecat because, for example, it might cause issues with frame ordering. -
LLMMessagesFrame is deprecated in favor of either:
- LLMMessagesUpdateFrame with run_llm=True
- OpenAILLMContextFrame with desired messages in a new context
-
LLMUserResponseAggregator and LLMAssistantResponseAggregator are deprecated, as they depended on the now-deprecated LLMMessagesFrame. Use LLMUserContextAggregator and LLMAssistantContextAggregator (or LLM-specific subclasses thereof) instead.
[0.0.78] - 2025-08-07
Added
-
Added
SonioxSTTService using Soniox's STT websocket API. -
Added
enable_emulated_vad_interruptions to LLMUserAggregatorParams. When user speech is emulated (e.g. when a transcription is received but VAD doesn't detect speech), this parameter controls whether the emulated speech can interrupt the bot. Default is False (emulated speech is ignored while the bot is speaking). -
Added new
handle_sigint and handle_sigterm to RunnerArguments. This allows applications to know what settings they should use for the environment they are running on. Also, added pipeline_idle_timeout_secs to be able to control the PipelineTask idle timeout. -
Added
processor field to ErrorFrame to indicate the FrameProcessor that generated the error. -
Added new language support for
AWSTranscribeSTTService. All languages supporting streaming data input are now supported: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html -
Added support for Simli Trinity Avatars. A new
is_trinity_avatar parameter has been introduced to specify whether the provided faceId corresponds to a Trinity avatar, which is required for optimal Trinity avatar performance. -
The development runner now handles custom
body data for DailyTransport. The body data is passed to the Pipecat client. You can POST to the /start endpoint with a request body of:

    {
      "createDailyRoom": true,
      "dailyRoomProperties": { "start_video_off": true },
      "body": { "custom_data": "value" }
    }

The body information is parsed and used in the application. The dailyRoomProperties are currently not handled. -
Added detailed latency logging to
UserBotLatencyLogObserver, capturing average response time between user stop and bot start, as well as minimum and maximum response latency. -
Added Chinese, Japanese, Korean word timestamp support to
CartesiaTTSService. -
Added
region parameter to GladiaSTTService. Accepted values: eu-west (default), us-west.
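The latency logging entry above reports average, minimum, and maximum response time between the user stopping and the bot starting. The bookkeeping behind such a summary can be sketched as follows. `LatencyTracker` and its method names are hypothetical; this is not the UserBotLatencyLogObserver implementation.

```python
from statistics import mean
from typing import Dict, List, Optional


class LatencyTracker:
    """Collects user-stop to bot-start latencies and summarizes them."""

    def __init__(self) -> None:
        self._user_stopped_at: Optional[float] = None
        self._latencies: List[float] = []

    def on_user_stopped_speaking(self, timestamp: float) -> None:
        self._user_stopped_at = timestamp

    def on_bot_started_speaking(self, timestamp: float) -> None:
        if self._user_stopped_at is None:
            return  # bot spoke without a preceding user turn; nothing to measure
        self._latencies.append(timestamp - self._user_stopped_at)
        self._user_stopped_at = None

    def summary(self) -> Dict[str, float]:
        if not self._latencies:
            return {}
        return {
            "avg": mean(self._latencies),
            "min": min(self._latencies),
            "max": max(self._latencies),
        }


tracker = LatencyTracker()
tracker.on_user_stopped_speaking(1.0)
tracker.on_bot_started_speaking(1.5)    # 0.5 s latency
tracker.on_user_stopped_speaking(10.0)
tracker.on_bot_started_speaking(11.5)   # 1.5 s latency
latency_summary = tracker.summary()
```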
Changed
-
System frames are now queued. Before, system frames could be generated from any task and would not guarantee any order which was causing undesired behavior. Also, it was possible to get into some rare recursion issues because of the way system frames were executed (they were executed in-place, meaning calling
push_frame() would finish after the system frame traversed all the pipeline). This makes system frames more deterministic. -
Changed the default model for both
ElevenLabsTTSService and ElevenLabsHttpTTSService to eleven_turbo_v2_5. The rationale for this change is that the Turbo v2.5 model exhibits the most stable voice quality along with very low TTFB; latencies are on par with the Flash v2.5 model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with correct spacing. -
The development runner's
/connect and /start endpoints now both return dailyRoom and dailyToken in place of the previous room_url and token. -
Updated the
pipecat.runner.daily utility to take only the DAILY_API_URL and DAILY_SAMPLE_ROOM_URL environment variables instead of parsing the -u and -k arguments, respectively. -
Updated
daily-python to 0.19.6. -
Changed
TavusVideoService to send audio or video frames only after the transport is ready, preventing warning messages at startup. -
The development runner now strips any provided protocol (e.g. https://) from the proxy address and issues a warning. It also strips trailing
/.
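The proxy address normalization described in the last entry above (strip any protocol with a warning, then strip trailing slashes) can be sketched in a few lines. `normalize_proxy_address()` is a hypothetical helper and the hostname is made up; the runner's actual code may differ.

```python
import logging

logger = logging.getLogger("proxy_sketch")


def normalize_proxy_address(proxy: str) -> str:
    """Return the bare host part of a proxy address.

    A leading protocol such as "https://" is removed with a warning,
    and any trailing "/" characters are stripped.
    """
    scheme, separator, rest = proxy.partition("://")
    if separator:
        logger.warning("Proxy address should not include a protocol; removing %r", scheme + separator)
        proxy = rest
    return proxy.rstrip("/")


normalized = normalize_proxy_address("https://example.ngrok.app/")
```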
Deprecated
-
In
pipecat.runner.daily, the configure_with_args() function is deprecated. Use the configure() function instead. -
The development runner's
/connect endpoint is deprecated and will be removed in a future version. Use the /start endpoint in its place. In the meantime, both endpoints work and deliver equivalent functionality.
Fixed
-
Fixed a
DailyTransport issue that would result in an unhandled concurrent.futures.CancelledError when a future is cancelled. -
Fixed a
RivaSTTService issue that would result in an unhandled concurrent.futures.CancelledError when a future is cancelled while reading audio chunks from the incoming audio stream. -
Fixed an issue in the
BaseOutputTransport, mainly reproducible with FastAPIWebsocketOutputTransport when the audio mixer was enabled, where the loop could consume 100% CPU by continuously returning without delay, preventing other asyncio tasks (such as cancellation or shutdown signals) from being processed. -
Fixed an issue where
BotStartedSpeakingFrame and BotStoppedSpeakingFrame were not emitted when using TavusVideoService or HeyGenVideoService. -
Fixed an issue in
LiveKitTransport where empty AudioRawFrames were pushed down the pipeline. This resulted in warnings by the STT processor. -
Fixed
PiperTTSService to send text as a JSON object in the request body, resolving compatibility with Piper's HTTP API. -
Fixed an issue with the
TavusVideoService where an error was thrown due to missing transcription callbacks. -
Fixed an issue in
SpeechmaticsSTTService where the user_id was set to None when diarization is not enabled.
Performance
- Fixed an issue in
TaskObserver (a proxy to all observers) that was degrading global performance.
Other
- Added
07aa-interruptible-soniox.py, 07ab-interruptible-inworld-http.py, 07ac-interruptible-asyncai.py and 07ac-interruptible-asyncai-http.py release evals.
[0.0.77] - 2025-07-31
Added
-
Added
InputTextRawFrame frame type to handle user text input with Gemini Multimodal Live. -
Added
HeyGenVideoService. This is an integration for HeyGen Interactive Avatar. A video service that handles audio streaming and requests HeyGen to generate avatar video responses. (see https://www.heygen.com/) -
Added the ability to switch voices to
RimeTTSService. -
Added unified development runner for building voice AI bots across multiple transports
- pipecat.runner.run – FastAPI-based development server with automatic bot discovery
- pipecat.runner.types – Runner session argument types (DailyRunnerArguments, SmallWebRTCRunnerArguments, WebSocketRunnerArguments)
- pipecat.runner.utils.create_transport() – Factory function for creating transports from session arguments
- pipecat.runner.daily and pipecat.runner.livekit – Configuration utilities for Daily and LiveKit setups
- Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
- Automatic telephony provider detection and serializer configuration
- ESP32 WebRTC compatibility with SDP munging
- Environment detection (ENV=local) for conditional features
-
Added Async.ai TTS integration (https://async.ai/)
- AsyncAITTSService – WebSocket-based streaming TTS with interruption support
- AsyncAIHttpTTSService – HTTP-based streaming TTS service
- Example scripts:
  - examples/foundational/07ac-interruptible-asyncai.py (WebSocket demo)
  - examples/foundational/07ac-interruptible-asyncai-http.py (HTTP demo)
-
Added
transcription_bucket params support to the DailyRESTHelper. -
Added a new TTS service,
InworldTTSService. This service provides low-latency, high-quality speech generation using Inworld's streaming API. -
Added a new field
handle_sigterm to PipelineRunner. It defaults to False. This field handles SIGTERM signals. The handle_sigint field still defaults to True, but now it handles only SIGINT signals. -
Added foundational example
14u-function-calling-ollama.py for Ollama function calling. -
Added
LocalSmartTurnAnalyzerV2, which supports local on-device inference with the new smart-turn-v2 turn detection model. -
Added
set_log_level to DailyTransport, allowing setting the logging level for Daily's internal logging system. -
Added
on_transcription_stopped and on_transcription_error to Daily callbacks.
Changed
-
Changed the default
url for NeuphonicTTSService to wss://api.neuphonic.com as it provides better global performance. You can set the URL to other URLs, such as the previous default: wss://eu-west-1.api.neuphonic.com. -
Updated
daily-python to 0.19.5. -
STTMuteFilter now pushes the STTMuteFrame upstream and downstream, to allow for more flexible STTMuteFilter placement. -
Play delayed messages from
ElevenLabsTTSService if they still belong to the current context. -
Dependency compatibility improvements: Relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:
- aiohttp, Markdown, nltk, numpy, Pillow, pydantic, openai, numba: Now support up to the next major version (e.g. numpy>=1.26.4,<3)
- pyht: Relaxed to >=0.1.6 to resolve grpcio conflicts with nvidia-riva-client
- fastapi: Updated to support versions >=0.115.6,<0.117.0
- torch/torchaudio: Changed from exact pinning (==2.5.0) to compatible range (~=2.5.0)
- aws_sdk_bedrock_runtime: Added Python 3.12+ constraint via environment marker
- numba: Reduced minimum version to 0.60.0 for better compatibility
-
Changed
NeuphonicHttpTTSService to use a POST-based request instead of the pyneuphonic package. This removes a package requirement, allowing Neuphonic to work with more services. -
Updated
ElevenLabsTTSService to handle the case where allow_interruptions=False. Now, when interruptions are disabled, the same context ID will be used throughout the conversation. -
Updated the
deepgram optional dependency to 4.7.0, which downgrades the tasks cancelled error to a debug log. This keeps the log from appearing in Pipecat logs upon leaving. -
Upgraded the
websockets implementation to the new asyncio implementation. Along with this change, we're updating support for versions >=13.1.0 and <15.0.0. All services have been updated to use the asyncio implementation. -
Updated
MiniMaxHttpTTSService with a base_url arg where you can specify the Global endpoint (default) or Mainland China. -
Replaced regex-based sentence detection in
match_endofsentence with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection. -
Changed the
livekit optional dependency for tenacity to tenacity>=8.2.3,<10.0.0 in order to support the google-genai package. -
For
LmntTTSService, changed the default model to blizzard, LMNT's recommended model. -
Updated
SpeechmaticsSTTService:
- Added support for additional diarization options.
- Added foundational example 07a-interruptible-speechmatics-vad.py, which uses VAD detection provided by SpeechmaticsSTTService.
Fixed
-
Fixed a
LLMUserResponseAggregator issue where interruptions were not being handled properly. -
Fixed
PiperTTSService to work with newer Piper GPL. -
Fixed a race condition in
FastAPIWebsocketClient that occurred when attempting to send a message while the client was disconnecting. -
Fixed an issue in
GoogleLLMService where interruptions did not work when an interruption strategy was used. -
Fixed an issue in the
TranscriptProcessor where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces). -
Fixed an issue in
AudioBufferProcessor when using SmallWebRTCTransport where, if the microphone was muted, track timing was not respected. -
Fixed an error that occurs when pushing an
LLMMessagesFrame. Only some LLM services, like Grok, are impacted by this issue. The fix is to remove the optional name property that was being added to the message. -
Fixed an issue in
AudioBufferProcessor that caused garbled audio when enable_turn_audio was enabled and audio resampling was required. -
Fixed a dependency issue for uv users where an
llvmlite version required Python 3.9. -
Fixed an issue in
MiniMaxHttpTTSService where the pitch param was the incorrect type. -
Fixed an issue with OpenTelemetry tracing where the
enable_tracing flag did not disable the internal tracing decorator functions. -
Fixed an issue in
OLLamaLLMService where kwargs were not passed correctly to the parent class. -
Fixed an issue in
ElevenLabsTTSService where word boundaries in the word/timestamp pairs were calculated incorrectly. -
Fixed an issue where, in some edge cases, the
EmulateUserStartedSpeakingFrame could be created even if we didn't have a transcription. -
Fixed an issue in
GoogleLLMContext where it would inject the system_message as a "user" message into cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly. -
Fixed an issue in
LiveKitTransport where the on_audio_track_subscribed was never emitted.
Other
-
Added new quickstart demos:
- examples/quickstart: voice AI bot quickstart
- examples/client-server-web: client/server starter example
- examples/phone-bot-twilio: twilio starter example
-
Removed most of the examples from the pipecat repo. Examples can now be found in: https://github.com/pipecat-ai/pipecat-examples.
[0.0.76] - 2025-07-11
Added
- Added
SpeechControlParamsFrame, a new SystemFrame that notifies downstream processors of the VAD and Turn analyzer params. This frame is pushed by the BaseInputTransport at Start and any time a VADParamsUpdateFrame is received.
Changed
- Two package dependencies have been updated:
- numpy now supports 1.26.0 and newer
- transformers now supports 4.48.0 and newer
Fixed
-
Fixed an issue with RTVI's handling of
append-to-context. -
Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
-
Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
-
Fixed an issue in ParallelPipeline that caused errors when attempting to drain the queues.
-
Fixed an issue with emulated VAD timeout inconsistency in
LLMUserContextAggregator. Previously, emulated VAD scenarios (where transcription is received without VAD detection) used a hardcoded aggregation_timeout (default 0.5s) instead of matching the VAD's stop_secs parameter (default 0.8s). This created different user experiences between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts automatically synchronize with the VAD's stop_secs parameter. -
Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started speaking early, while the bot was still working through
trigger_assistant_response().
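The VAD analyzer fix listed above (process the whole buffer rather than only the first chunk) comes down to a simple chunking loop. The following is an illustrative sketch under stated assumptions: `split_complete_chunks()` is a hypothetical helper, and keeping the incomplete remainder for the next iteration is an assumption rather than a description of the analyzer's code.

```python
from typing import List, Tuple


def split_complete_chunks(buffer: bytes, chunk_bytes: int) -> Tuple[List[bytes], bytes]:
    """Split a buffer into every complete chunk plus the leftover bytes.

    Analyzing all returned chunks, instead of only the first one, is the
    behavior the fix describes. The remainder is assumed to be kept and
    prepended to the next incoming audio.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    complete = len(buffer) // chunk_bytes
    chunks = [buffer[i * chunk_bytes:(i + 1) * chunk_bytes] for i in range(complete)]
    remainder = buffer[complete * chunk_bytes:]
    return chunks, remainder


chunks, remainder = split_complete_chunks(b"abcdefgh", 3)
```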
[0.0.75] - 2025-07-08 [YANKED]
This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting ParallelPipelines functionality.
Please upgrade to version 0.0.76 or later.
Added
-
Added an
aggregate_sentences arg in CartesiaTTSService, ElevenLabsTTSService, NeuphonicTTSService and RimeTTSService, where the default value is True. When aggregate_sentences is True, the TTSService aggregates the LLM streamed tokens into sentences by default. Note: setting the value to False requires a custom processor before the TTSService to aggregate LLM tokens. -
Added
kwargs to the OLLamaLLMService to allow for configuration args to be passed to Ollama. -
Added call hang-up error handling in
TwilioFrameSerializer, which handles the case where the user has hung up before the TwilioFrameSerializer hangs up the call.
Changed
-
Updated
RTVIObserver and RTVIProcessor to match the new RTVI 1.0.0 protocol. This includes:
- Deprecating support for all messages related to service configuration and actions.
- Adding support for obtaining and logging data about the client, including its RTVI version and optionally included system information (OS/browser/etc.)
- Adding support for handling the new client-message RTVI message through either an on_client_message event handler or listening for a new RTVIClientMessageFrame
- Adding support for responding to a client-message with a server-response via either a direct call on the RTVIProcessor or via pushing a new RTVIServerResponseFrame
- Adding built-in support for handling the new append-to-context RTVI message which allows a client to add to the user or assistant llm context. No extra code is required for supporting this behavior.
- Updating all JavaScript and React client RTVI examples to use versions 1.0.0 of the clients.

Get started migrating to RTVI protocol 1.0.0 by following the migration guide: https://docs.pipecat.ai/client/migration-guide
-
Refactored
AWSBedrockLLMService and AWSPollyTTSService to work asynchronously using aioboto3 instead of the boto3 library. -
The
UserIdleProcessor now handles the scenario where function calls take longer than the idle timeout duration. This allows you to use the UserIdleProcessor in conjunction with function calls that take a while to return a result.
Fixed
-
Updated the NeuphonicTTSService to work with the updated websocket API. -
Fixed an issue with RivaSTTService where the watchdog feature was causing an error on initialization.
Performance
- Remove unnecessary push task in each FrameProcessor.
[0.0.74] - 2025-07-03 [YANKED]
This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting ParallelPipelines functionality.
Please upgrade to version 0.0.76 or later.
Added
-
Added a new STT service, SpeechmaticsSTTService. This service provides real-time speech-to-text transcription using the Speechmatics API. It supports partial and final transcriptions, multiple languages, various audio formats, and speaker diarization. -
Added normalize and model_id to FishAudioTTSService. -
Added http_options argument to GoogleLLMService. -
Added run_llm field to LLMMessagesAppendFrame and LLMMessagesUpdateFrame frames. If true, a context frame will be pushed triggering the LLM to respond. -
Added a new SOXRStreamAudioResampler for processing audio in chunks or streams. If you write your own processor and need to use an audio resampler, use the new create_stream_resampler(). -
Added new DailyParams.audio_in_user_tracks to allow receiving one track per user (default) or a single track from the room (all participants mixed). -
Added support for providing "direct" functions, which don't need an accompanying FunctionSchema or function definition dict. Instead, metadata (i.e. name, description, properties, and required) is automatically extracted from a combination of the function signature and docstring.

  Usage:

  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """
      Do something interesting.

      Args:
        foo (int): The foo to do something interesting with.
        bar (string): The bar to do something interesting with.
      """

      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
-
user_id is now populated in the TranscriptionFrame and InterimTranscriptionFrame when using a transport that provides a user_id, like DailyTransport or LiveKitTransport. -
Added watchdog_coroutine(), a watchdog helper for coroutines. If you have a coroutine that waits a long time for a result, wrap it with watchdog_coroutine() so the watchdog timers are reset regularly. -
Added session_token parameter to AWSNovaSonicLLMService. -
Added Gemini Multimodal Live File API for uploading, fetching, listing, and deleting files. See 26f-gemini-live-files-api.py for example usage.
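The "direct" functions entry above says that metadata is derived from the function signature and docstring. The sketch below illustrates that idea with the standard library only. It is not Pipecat's implementation: extract_metadata and the type-name mapping are hypothetical, and docstring parsing is reduced to taking the first paragraph as the description.

```python
import inspect

# Assumed mapping from Python annotations to JSON-schema-style type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def extract_metadata(func):
    """Hypothetical helper: derive tool metadata from a function signature."""
    signature = inspect.signature(func)
    # The first parameter (`params`) carries call context, so it is skipped.
    parameters = list(signature.parameters.values())[1:]

    properties = {}
    required = []
    for parameter in parameters:
        properties[parameter.name] = {
            "type": _TYPE_NAMES.get(parameter.annotation, "string")
        }
        # Parameters without a default value must always be supplied.
        if parameter.default is inspect.Parameter.empty:
            required.append(parameter.name)

    # Use the first docstring paragraph as the description.
    docstring = inspect.getdoc(func) or ""
    description = docstring.split("\n\n")[0].strip()
    return {
        "name": func.__name__,
        "description": description,
        "properties": properties,
        "required": required,
    }


async def do_something(params, foo: int, bar: str = ""):
    """Do something interesting.

    Args:
        foo (int): The foo to do something interesting with.
        bar (string): The bar to do something interesting with.
    """


metadata = extract_metadata(do_something)
```

Here only foo ends up in the required list, because bar has a default value.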
Changed
-
Updated all the services to use the new SOXRStreamAudioResampler, ensuring smooth transitions and eliminating clicks. -
Upgraded daily-python to 0.19.4. -
Updated google optional dependency to use google-genai version 1.24.0.
Fixed
-
Fixed an issue where audio would get stuck in the queue when an interrupt occurs during Azure TTS synthesis.
-
Fixed a race condition that occurs in Python 3.10+ where the task could miss the CancelledError and continue running indefinitely, freezing the pipeline. -
Fixed an AWSNovaSonicLLMService issue introduced in 0.0.72.
Deprecated
- In FishAudioTTSService, deprecated model and replaced it with reference_id. This change better aligns with Fish Audio's variable naming and reduces confusion about what functionality the variable controls.
[0.0.73] - 2025-06-26
Fixed
- Fixed an issue introduced in 0.0.72 that would cause ElevenLabsTTSService, GladiaSTTService, NeuphonicTTSService and OpenAIRealtimeBetaLLMService to throw an error.
[0.0.72] - 2025-06-26
Added
-
Added logging and improved error handling to help diagnose and prevent potential Pipeline freezes.
-
Added WatchdogQueue, WatchdogPriorityQueue, WatchdogEvent and WatchdogAsyncIterator. These helper utilities reset watchdog timers appropriately before they expire. When watchdog timers are disabled, the utilities behave as their standard counterparts without side effects. -
Introduced task watchdog timers. Watchdog timers detect when a Pipecat task is taking longer than expected (by default 5 seconds). They are disabled by default and can be enabled globally by passing the enable_watchdog_timers argument to the PipelineTask constructor. The default timeout can be changed with the watchdog_timeout argument, and enable_watchdog_logging logs how long it takes to reset the watchdog timers. All these settings can also be controlled per frame processor or even per task: set enable_watchdog_timers, enable_watchdog_logging and watchdog_timeout through the constructor arguments of any frame processor, or when you create a task with FrameProcessor.create_task(). Note that watchdog timers only work with Pipecat tasks and will not work if you use asyncio.create_task() or similar. -
Added lexicon_names parameter to AWSPollyTTSService.InputParams. -
Added reconnection logic and audio buffer management to GladiaSTTService. -
The TurnTrackingObserver now ends a turn upon observing an EndFrame or CancelFrame. -
Added Polish support to AWSTranscribeSTTService. -
Added new frames FrameProcessorPauseFrame and FrameProcessorResumeFrame, which allow pausing and resuming frame processing for a given frame processor. These are control frames, so they are ordered. Pausing a frame processor keeps old frames in the internal queues until it is resumed. Frames pushed while a frame processor is paused are added to the queues. When frame processing resumes, all queued frames are processed in order. Also added FrameProcessorPauseUrgentFrame and FrameProcessorResumeUrgentFrame, which are system frames and therefore have high priority. -
Added a property called has_function_calls_in_progress in LLMAssistantContextAggregator that exposes whether a function call is in progress. -
Added SambaNovaLLMService, which provides LLM API integration with an OpenAI-compatible interface. -
Added SambaNovaSTTService, which provides speech-to-text functionality using SambaNova's (Whisper) API. -
Added foundational examples for function calling and transcription: 14s-function-calling-sambanova.py, 13g-sambanova-transcription.py.
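The pause and resume behavior described for FrameProcessorPauseFrame and FrameProcessorResumeFrame can be pictured with a small queueing sketch. PausableProcessor below is a hypothetical stand-in written for illustration, not a Pipecat class; it only shows that frames pushed while paused are held and then processed in order on resume.

```python
from collections import deque


class PausableProcessor:
    """Hypothetical stand-in for a frame processor; not Pipecat's class."""

    def __init__(self):
        self.paused = False
        self.pending = deque()  # frames held while paused
        self.processed = []  # frames handled so far, in order

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        # Drain everything that was queued while paused, oldest first.
        while self.pending:
            self.processed.append(self.pending.popleft())

    def push(self, frame):
        if self.paused:
            self.pending.append(frame)
        else:
            self.processed.append(frame)


processor = PausableProcessor()
processor.push("frame-1")
processor.pause()
processor.push("frame-2")
processor.push("frame-3")
processed_while_paused = list(processor.processed)
processor.resume()
processed_after_resume = list(processor.processed)
```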
Changed
-
HeartbeatFrames are now control frames. This makes it easier to detect pipeline freezes. Previously, heartbeat frames were system frames, which meant they were not queued with other frames, making it difficult to detect pipeline stalls. -
Updated OpenAIRealtimeBetaLLMService to accept language in the InputAudioTranscription class for all models. -
Updated the default model for OpenAIRealtimeBetaLLMService to gpt-4o-realtime-preview-2025-06-03. -
The PipelineParams arg allow_interruptions now defaults to True. -
TavusTransport and TavusVideoService now send audio to Tavus using WebRTC audio tracks instead of app-messages over WebSocket. This should improve the overall audio quality. -
Upgraded daily-python to 0.19.3.
Fixed
-
Fixed an issue that would cause heartbeat frames to be sent before processors were started.
-
Fixed an event loop blocking issue when using SentryMetrics. -
Fixed an issue in FastAPIWebsocketClient to ensure proper disconnection when the websocket is already closed. -
Fixed an issue where the UserStoppedSpeakingFrame was not received if the transport was not receiving new audio frames. -
Fixed an edge case where if the user interrupted the bot but no new aggregation was received, the bot would not resume speaking.
-
Fixed an issue with TelnyxFrameSerializer where it would throw an exception when the user hung up the call. -
Fixed an issue with ElevenLabsTTSService where the context was not being closed. -
Fixed function calling in AWSNovaSonicLLMService. -
Fixed an issue that would cause multiple PipelineTask.on_idle_timeout events to be triggered repeatedly. -
Fixed an issue that was causing user and bot speech to not be synchronized during recordings.
-
Fixed an issue where voice settings weren't applied to ElevenLabsTTSService.
-
Fixed an issue with GroqTTSService where it was not properly parsing the WAV file header. -
Fixed an issue with GoogleSTTService where it was constantly reconnecting before starting to receive audio from the user. -
Fixed an issue where GoogleLLMService's TTFB value was incorrect.
Deprecated
- AudioBufferProcessor parameter user_continuos_stream is deprecated.
Other
- Rename 14e-function-calling-gemini.py to 14e-function-calling-google.py.
[0.0.71] - 2025-06-10
Added
- Added a parameter called additional_span_attributes to PipelineTask that lets you add any additional attributes you'd like to the conversation span.
Fixed
- Fixed an issue with CartesiaSTTService initialization.
[0.0.70] - 2025-06-10
Added
-
Added ExotelFrameSerializer to handle telephony calls via Exotel. -
Added the option informal to TranslationConfig on Gladia config, which forces informal language forms when available. -
Added CartesiaSTTService, a websocket-based implementation to transcribe audio. Added a foundational example in 13f-cartesia-transcription.py. -
Added a websocket example, showing how to use the new Pipecat client WebsocketTransport to connect with Pipecat FastAPIWebsocketTransport or WebsocketServerTransport. -
Added language support to RimeHttpTTSService. Extended languages to include German and French for both RimeTTSService and RimeHttpTTSService.
Changed
-
Upgraded daily-python to 0.19.2. -
Made PipelineTask.add_observer() synchronous. This allows callers to call it before doing the work of running the PipelineTask (i.e. without invoking PipelineTask.set_event_loop() first). -
Pipecat 0.0.69 forced the uvloop event loop on Linux and macOS. Unfortunately, this causes issues on some systems, so uvloop is no longer enabled by default. If you want to use uvloop, set the asyncio event loop policy before starting your agent with:
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
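As a slightly fuller sketch of the opt-in above, the snippet below enables uvloop only when the package is importable and otherwise keeps the default asyncio loop. The helper name enable_uvloop_if_available and the main() coroutine are hypothetical stand-ins for your own startup code; only the set_event_loop_policy call comes from the entry above.

```python
import asyncio

try:
    import uvloop  # optional dependency; may not be installed
except ImportError:
    uvloop = None


def enable_uvloop_if_available() -> bool:
    """Hypothetical helper: opt in to uvloop and report whether it was enabled."""
    if uvloop is None:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


async def main() -> str:
    # Stand-in for starting an agent; reports which loop implementation runs it.
    return type(asyncio.get_running_loop()).__module__


using_uvloop = enable_uvloop_if_available()
loop_module = asyncio.run(main())
```

The policy has to be set before the event loop is created, which is why the helper runs before asyncio.run().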
Fixed
-
Fixed an issue with various TTS services that would cause audio glitches at the start of every bot turn.
-
Fixed an ElevenLabsTTSService issue where a context warning was printed when pushing a TTSSpeakFrame. -
Fixed an AssemblyAISTTService issue that could cause unexpected behavior when yielding empty Frame()s. -
Fixed an issue where OutputAudioRawFrame.transport_destination was being reset to None instead of retaining its intended value before sending the audio frame to write_audio_frame. -
Fixed a typo in Livekit transport that prevented initialization.
[0.0.69] - 2025-06-02 "AI Engineer World's Fair release" ✨
Added
-
Added a new frame FunctionCallsStartedFrame. This frame is pushed both upstream and downstream from the LLM service to indicate that one or more function calls are going to be executed. -
Added LLM services on_function_calls_started event. This event will be triggered when the LLM service receives function calls from the model and is going to start executing them. -
Function calls can now be executed sequentially (in the order received in the completion) by passing run_in_parallel=False when creating your LLM service. By default, if the LLM completion returns 2 or more function calls they run concurrently. In both cases, concurrently and sequentially, a new LLM completion will run when the last function call finishes. -
Added OpenTelemetry tracing for GeminiMultimodalLiveLLMService and OpenAIRealtimeBetaLLMService. -
Added initial support for interruption strategies, which determine if the user should interrupt the bot while the bot is speaking. Interruption strategies can be based on factors such as audio volume or the number of words spoken by the user. These can be specified via the new interruption_strategies field in PipelineParams. A new MinWordsInterruptionStrategy strategy has been introduced which triggers an interruption if the user has spoken a minimum number of words. If no interruption strategies are specified, the normal interruption behavior applies. If multiple strategies are provided, the first one that evaluates to true will trigger the interruption. -
BaseInputTransport now handles StopFrame. When a StopFrame is received the transport will pause sending frames downstream until a new StartFrame is received. This allows the transport to be reused (keeping the same connection) in a different pipeline. -
Updated AssemblyAI STT service to support their latest streaming speech-to-text model with improved transcription latency and endpointing.
-
You can now access STT service results through the new TranscriptionFrame.result and InterimTranscriptionFrame.result field. This is useful in case you use some specific settings for the STT and you want to access the STT results. -
The examples runner is now public from the pipecat.examples package. This allows everyone to build their own examples and run them easily. -
It is now possible to push OutputDTMFFrame or OutputDTMFUrgentFrame with DailyTransport. This will be sent properly if a Daily dial-out connection has been established. -
Added OutputDTMFUrgentFrame to send a DTMF keypress quickly. The previous OutputDTMFFrame queues the keypress with the rest of data frames. -
Added DTMFAggregator, which aggregates keypad presses into TranscriptionFrames. Aggregation occurs after a timeout, termination key press, or user interruption. You can specify the prefix of the TranscriptionFrame. -
Added new functions DailyTransport.start_transcription() and DailyTransport.stop_transcription() to be able to start and stop Daily transcription dynamically (maybe with different settings).
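The interruption strategy rules described above (no strategies means normal behavior; with several, the first one that evaluates to true triggers the interruption) can be sketched in plain Python. Everything below is hypothetical and written for illustration: MinWords, ContainsKeyword and should_interrupt are stand-ins, not Pipecat classes, and treating "normal behavior" as "always interrupt" is an assumption.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class InterruptionStrategy(Protocol):
    def should_interrupt(self, user_text: str) -> bool: ...


@dataclass
class MinWords:
    """Hypothetical stand-in for a minimum-word-count strategy."""

    min_words: int

    def should_interrupt(self, user_text: str) -> bool:
        return len(user_text.split()) >= self.min_words


@dataclass
class ContainsKeyword:
    """Hypothetical second strategy, added only for this illustration."""

    keyword: str

    def should_interrupt(self, user_text: str) -> bool:
        return self.keyword.lower() in user_text.lower()


def should_interrupt(strategies: Sequence[InterruptionStrategy], user_text: str) -> bool:
    # Assumption: with no strategies configured, any user speech interrupts.
    if not strategies:
        return True
    # any() stops at the first strategy that evaluates to true.
    return any(strategy.should_interrupt(user_text) for strategy in strategies)


short_utterance = should_interrupt([MinWords(3)], "wait")
long_utterance = should_interrupt([MinWords(3)], "wait a second please")
keyword_match = should_interrupt([MinWords(3), ContainsKeyword("stop")], "stop")
```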
Changed
-
Reverted the default model for GeminiMultimodalLiveLLMService back to models/gemini-2.0-flash-live-001. gemini-2.5-flash-preview-native-audio-dialog has inconsistent performance. You can opt in to using this model by setting the model arg. -
Function calls are now cancelled by default if there's an interruption. To disable this behavior you can set cancel_on_interruption=False when registering the function call. Since function calls are executed as tasks you can tell if a function call has been cancelled by catching the asyncio.CancelledError exception (and don't forget to raise it again!). -
Updated OpenTelemetry tracing attribute metrics.ttfb_ms to metrics.ttfb. The attribute reports TTFB in seconds.
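The advice above about catching asyncio.CancelledError and raising it again can be shown with plain asyncio. The handler below is a stand-in for a long-running function call, not Pipecat code; the interruption is simulated with task.cancel().

```python
import asyncio

events = []


async def slow_function_call():
    """Stand-in for a long-running function call handler."""
    try:
        await asyncio.sleep(10)
        events.append("finished")
    except asyncio.CancelledError:
        # Release resources here, then re-raise so the task is marked cancelled.
        events.append("cleaned up")
        raise


async def main() -> bool:
    task = asyncio.create_task(slow_function_call())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()  # simulates an interruption
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
```

If the handler swallowed the exception instead of re-raising it, the task would finish normally and task.cancelled() would report False.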
Deprecated
- DailyTransport.send_dtmf() is deprecated; push an OutputDTMFFrame or an OutputDTMFUrgentFrame instead.
Fixed
-
Fixed an issue with ElevenLabsTTSService where long responses would continue generating output even after an interruption. -
Fixed an issue with the OpenAILLMContext where non-Roman characters were being incorrectly encoded as Unicode escape sequences. This was a logging issue and did not impact the actual conversation. -
In AWSBedrockLLMService, worked around a possible bug in AWS Bedrock where a toolConfig is required if there has been previous tool use in the messages array. The workaround uses a no_op factory function call to satisfy the requirement. -
Fixed WebsocketClientTransport to use FrameProcessorSetup.task_manager instead of StartFrame.task_manager.
Performance
- Use uvloop as the new event loop on Linux and macOS systems.
[0.0.68] - 2025-05-28
Added
-
Added GoogleHttpTTSService which uses Google's HTTP TTS API. -
Added TavusTransport, a new transport implementation compatible with any Pipecat pipeline. When using the TavusTransport the Pipecat bot will connect in the same room as the Tavus Avatar and the user. -
Added PlivoFrameSerializer to support Plivo calls. A full running example has also been added to examples/plivo-chatbot. -
Added UserBotLatencyLogObserver. This is an observer that logs the latency between when the user stops speaking and when the bot starts speaking. This gives you an initial idea of how quickly the AI services respond. -
Added SarvamTTSService, which implements Sarvam AI's TTS API: https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert. -
Added PipelineTask.add_observer() and PipelineTask.remove_observer() to allow managing observers at runtime. This is useful for cases where the task is passed around to other code components that might want to observe the pipeline dynamically. -
Added user_id field to TranscriptionMessage. This allows identifying the user in a multi-user scenario. Note that this requires that TranscriptionFrame has the user_id properly set. -
Added new PipelineTask event handlers on_pipeline_started, on_pipeline_stopped, on_pipeline_ended and on_pipeline_cancelled, which correspond to the StartFrame, StopFrame, EndFrame and CancelFrame respectively. -
Added additional languages to LmntTTSService. Languages include: hi, id, it, ja, nl, pl, ru, sv, th, tr, uk, vi. -
Added a model parameter to the LmntTTSService constructor, allowing switching between LMNT models. -
Added MiniMaxHttpTTSService, which implements MiniMax's T2A API for TTS. Learn more: https://www.minimax.io/platform_overview -
A new function FrameProcessor.setup() has been added to allow setting up frame processors before receiving a StartFrame. This is what's happening internally: FrameProcessor.setup() is called, StartFrame is pushed from the beginning of the pipeline, your regular pipeline operations, EndFrame or CancelFrame are pushed from the beginning of the pipeline and finally FrameProcessor.cleanup() is called. -
Added support for OpenTelemetry tracing in Pipecat. This initial implementation includes:
  - A setup_tracing method where you can specify your OpenTelemetry exporter
  - Service decorators for STT (@traced_stt), LLM (@traced_llm), and TTS (@traced_tts) which trace the execution and collect properties and metrics (TTFB, token usage, character counts, etc.)
  - Class decorators that provide execution tracking; these are generic and can be used for service tracking as needed
  - Spans that help track traces on a per-conversation and per-turn basis:

    conversation-uuid
    ├── turn-1
    │   ├── stt_deepgramsttservice
    │   ├── llm_openaillmservice
    │   └── tts_cartesiattsservice
    ...
    └── turn-n
        └── ...

By default, Pipecat has implemented service decorators to trace execution of STT, LLM, and TTS services. You can enable tracing by setting enable_tracing to True in the PipelineTask.
-
Added TurnTrackingObserver, which tracks the start and end of a user/bot turn pair and emits events on_turn_started and on_turn_stopped corresponding to the start and end of a turn, respectively. -
Allow passing observers to run_test() while running unit tests.
Changed
-
Upgraded daily-python to 0.19.1. -
⚠️ Updated SmallWebRTCTransport to align with how other transports handle on_client_disconnected. Now, when the connection is closed and no reconnection is attempted, on_client_disconnected is called instead of on_client_close. The on_client_close callback is no longer used; use on_client_disconnected instead. -
Check if PipelineTask has already been cancelled. -
Don't raise an exception if event handler is not registered.
-
Upgraded deepgram-sdk to 4.1.0. -
Updated GoogleTTSService to use Google's streaming TTS API. The default voice was also updated to en-US-Chirp3-HD-Charon. -
⚠️ Refactored the TavusVideoService, so it acts like a proxy, sending audio to Tavus and receiving both audio and video. This makes TavusVideoService usable with any Pipecat pipeline and with any transport. This is a breaking change; check examples/foundational/21a-tavus-layer-small-webrtc.py to see how to use it. -
DailyTransport now uses custom microphone audio tracks instead of virtual microphones. Now, multiple Daily transports can be used in the same process. -
DailyTransport now captures audio from individual participants instead of the whole room. This allows identifying audio frames per participant. -
Updated the default model for AnthropicLLMService to claude-sonnet-4-20250514. -
Updated the default model for GeminiMultimodalLiveLLMService to models/gemini-2.5-flash-preview-native-audio-dialog. -
BaseTextFilter methods filter(), update_settings(), handle_interruption() and reset_interruption() are now async. -
BaseTextAggregator methods aggregate(), handle_interruption() and reset() are now async. -
The API version for CartesiaTTSService and CartesiaHttpTTSService has been updated. Also, the cartesia dependency has been updated to 2.x. -
CartesiaTTSService and CartesiaHttpTTSService now support Cartesia's new speed parameter which accepts values of slow, normal, and fast. -
GeminiMultimodalLiveLLMService now uses the user transcription and usage metrics provided by Gemini Live. -
GoogleLLMService has been updated to use google-genai instead of the deprecated google-generativeai.
Deprecated
- In CartesiaTTSService and CartesiaHttpTTSService, emotion has been deprecated by Cartesia. Pipecat is following suit and deprecating emotion as well.
Removed
-
Since GeminiMultimodalLiveLLMService now transcribes its own audio, the transcribe_user_audio arg has been removed. Audio is now transcribed automatically. -
Removed the SileroVAD frame processor; use SileroVADAnalyzer instead. Also removed the 07a-interruptible-vad.py example.
Fixed
-
Fixed a DailyTransport issue that prevented capturing video frames if framerate was greater than zero. -
Fixed a DeepgramSTTService connection issue when the user provided their own LiveOptions. -
Fixed a DailyTransport issue that would cause images needing resize to block the event loop. -
Fixed an issue with ElevenLabsTTSService where changing the model or voice while the service is running wasn't working. -
Fixed an issue that would cause multiple instances of the same class to behave incorrectly if any of the given constructor arguments defaulted to a mutable value (e.g. lists, dictionaries, objects).
-
Fixed an issue with CartesiaTTSService where TTSTextFrame messages weren't being emitted when the model was set to sonic. This resulted in the assistant context not being updated with assistant messages.
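The mutable-default fix listed above refers to a general Python pitfall, which the following sketch reproduces with two hypothetical classes. It is not Pipecat's code or its actual fix; it only shows why a default list is shared across instances and one common way to avoid that.

```python
class BrokenService:
    # The default list is created once, when the function is defined,
    # so every instance that relies on it shares the same object.
    def __init__(self, voices=[]):
        self.voices = voices


class FixedService:
    def __init__(self, voices=None):
        # A fresh list is created per instance when no value is passed.
        self.voices = list(voices) if voices is not None else []


first, second = BrokenService(), BrokenService()
first.voices.append("en")
shared = second.voices  # the second instance sees the first one's change

third, fourth = FixedService(), FixedService()
third.voices.append("en")
isolated = fourth.voices  # unaffected by the other instance
```

For dataclasses, field(default_factory=list) achieves the same per-instance default.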
Performance
-
DailyTransport: process audio, video and events in separate tasks. -
Don't create event handler tasks if no user event handlers have been registered.
Other
-
It is now possible to run all (or most) foundational examples with multiple transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try everything locally. You can also run them with Daily or even with a Twilio phone number.
-
Added foundational examples 07y-interruptible-minimax.py and 07z-interruptible-sarvam.py to show how to use the MiniMaxHttpTTSService and SarvamTTSService, respectively. -
Added an open-telemetry-tracing example, showing how to set up tracing. The example also includes Jaeger as an open source OpenTelemetry client to review traces from the example runs. -
Added foundational example 29-turn-tracking-observer.py to show how to use the TurnTrackingObserver.
[0.0.67] - 2025-05-07
Added
-
Added DebugLogObserver for detailed frame logging with configurable filtering by frame type and endpoint. This observer automatically extracts and formats all frame data fields for debug logging. -
UserImageRequestFrame.video_source field has been added to request an image from the desired video source. -
Added support for the AWS Nova Sonic speech-to-speech model with the new AWSNovaSonicLLMService. See https://docs.aws.amazon.com/nova/latest/userguide/speech.html. Note that it requires Python >= 3.12 and pip install pipecat-ai[aws-nova-sonic]. -
Added new AWS services AWSBedrockLLMService and AWSTranscribeSTTService. -
Added on_active_speaker_changed event handler to the DailyTransport class. -
Added enable_ssml_parsing and enable_logging to InputParams in ElevenLabsTTSService. -
Added support to RimeHttpTTSService for the arcana model.
Changed
-
Updated ElevenLabsTTSService to use the beta websocket API (multi-stream-input). This new API supports context_ids and cancelling those contexts, which greatly improves interruption handling. -
Observers on_push_frame() now take a single argument FramePushed instead of multiple arguments. -
Updated the default voice for DeepgramTTSService to aura-2-helena-en.
Deprecated
-
PollyTTSService is now deprecated, use AWSPollyTTSService instead. -
Observer on_push_frame(src, dst, frame, direction, timestamp) is now deprecated, use on_push_frame(data: FramePushed) instead.
Fixed
-
Fixed a DailyTransport issue that was causing problems when multiple audio or video sources were being captured. -
Fixed an UltravoxSTTService issue that would cause the service to generate all tokens as one word. -
Fixed a PipelineTask issue that would cause tasks to not be cancelled if the task was cancelled from outside of Pipecat. -
Fixed a TaskManager issue that was causing dangling tasks to be reported. -
Fixed an issue that could cause data to be sent to the transports when they were still not ready.
-
Remove custom audio tracks from DailyTransport before leaving.
Removed
- Removed CanonicalMetricsService as it's no longer maintained.
[0.0.66] - 2025-05-02
Added
-
Added two new input parameters to RimeTTSService: pause_between_brackets and phonemize_between_brackets. -
Added support for cross-platform local smart turn detection. You can use LocalSmartTurnAnalyzer for on-device inference using Torch. -
BaseOutputTransport now allows multiple destinations if the transport implementation supports it (e.g. Daily's custom tracks). With multiple destinations it is possible to send different audio or video tracks with a single transport simultaneously. To do that, you need to set the new Frame.transport_destination field with your desired transport destination (e.g. custom track name), tell the transport you want a new destination with TransportParams.audio_out_destinations or TransportParams.video_out_destinations and the transport should take care of the rest. -
Similar to the new Frame.transport_destination, there's a new Frame.transport_source field which is set by the BaseInputTransport if the incoming data comes from a non-default source (e.g. custom tracks). -
TTSService has a new transport_destination constructor parameter. This parameter will be used to update the Frame.transport_destination field for each generated TTSAudioRawFrame. This allows sending multiple bots' audio to multiple destinations in the same pipeline. -
Added DailyTransportParams.camera_out_enabled and DailyTransportParams.microphone_out_enabled which allow you to enable/disable the main output camera or microphone tracks. This is useful if you only want to use custom tracks and not send the main tracks. Note that you still need audio_out_enabled=True or video_out_enabled. -
Added DailyTransport.capture_participant_audio() which allows you to capture an audio source (e.g. "microphone", "screenAudio" or a custom track name) from a remote participant. -
Added DailyTransport.update_publishing() which allows you to update the call video and audio publishing settings (e.g. audio and video quality). -
Added RTVIObserverParams which allows you to configure what RTVI messages are sent to the clients. -
Added a context_window_compression InputParam to GeminiMultimodalLiveLLMService which allows you to enable a sliding context window for the session as well as set the token limit of the sliding window. -
Updated SmallWebRTCConnection to support ice_servers with credentials. -
Added VADUserStartedSpeakingFrame and VADUserStoppedSpeakingFrame, indicating when the VAD detected the user to start and stop speaking. These events are helpful when using smart turn detection, as the user's stop time can differ from when their turn ends (signified by UserStoppedSpeakingFrame). -
Added TranslationFrame, a new frame type that contains a translated transcription. -
Added TransportParams.audio_in_passthrough. If set (the default), incoming audio will be pushed downstream. -
Added MCPClient, a way to connect to MCP servers and use the MCP servers' tools. -
Added Mem0 OSS support. In addition to Mem0 cloud, the OSS version is now also available.
Changed
TransportParams.audio_mixer now supports a string and also a dictionary to provide a mixer per destination. For example:
audio_out_mixer={
"track-1": SoundfileMixer(...),
"track-2": SoundfileMixer(...),
"track-N": SoundfileMixer(...),
},
-
The STTMuteFilter now mutes InterimTranscriptionFrame and TranscriptionFrame which allows the STTMuteFilter to be used in conjunction with transports that generate transcripts, e.g. DailyTransport. -
Function calls now receive a single parameter FunctionCallParams instead of (function_name, tool_call_id, args, llm, context, result_callback) which is now deprecated. -
Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s (LLMUserAggregatorParams.aggregation_timeout). Sometimes, the STT services might give us more than one transcription, and the extra ones could arrive after the user stopped speaking. We still want to include these additional transcriptions with the first one because they are part of the user turn. The timeout is how long the aggregator waits for them. -
Short utterances not detected by VAD while the bot is speaking are now ignored. This reduces the amount of bot interruptions significantly providing a more natural conversation experience.
-
Updated GladiaSTTService to output a TranslationFrame when specifying a translation and translation_config. -
STT services now pass through audio frames by default. This allows you to add audio recording without worrying about what's wrong in your pipeline when it doesn't work the first time.
-
Input transports now always push audio downstream unless disabled with TransportParams.audio_in_passthrough. After many Pipecat releases, we realized this is the common use case. There are use cases where the input transport already provides STT and you also don't want recordings, in which case there's no need to push audio to the rest of the pipeline, but this is not a very common case. -
Added RivaSegmentedSTTService, which allows Riva offline/batch models, such as "canary-1b-asr", to be used in Pipecat.
Deprecated
-
Function calls with parameters (function_name, tool_call_id, args, llm, context, result_callback) are deprecated, use a single FunctionCallParams parameter instead. -
TransportParams.camera_* parameters are now deprecated, use TransportParams.video_* instead. -
TransportParams.vad_enabled parameter is now deprecated, use TransportParams.audio_in_enabled and TransportParams.vad_analyzer instead. -
TransportParams.vad_audio_passthrough parameter is now deprecated, use TransportParams.audio_in_passthrough instead. -
ParakeetSTTService is now deprecated, use RivaSTTService instead, which uses the model "parakeet-ctc-1.1b-asr" by default. -
FastPitchTTSService is now deprecated, use RivaTTSService instead, which uses the model "magpie-tts-multilingual" by default.
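The function call signature deprecation above amounts to moving six positional parameters into one object. The sketch below shows the before and after shapes using a stand-in dataclass. Its field names mirror the tuple in the entry above and are an assumption; the real FunctionCallParams attribute names may differ, so check the class definition before relying on them.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class FunctionCallParams:
    """Stand-in with assumed field names; not Pipecat's class."""

    function_name: str
    tool_call_id: str
    args: dict
    llm: Any
    context: Any
    result_callback: Callable[[Any], Awaitable[None]]


# Deprecated style: six positional parameters.
async def get_weather_old(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"city": args["city"], "conditions": "sunny"})


# New style: a single params object.
async def get_weather(params: FunctionCallParams):
    await params.result_callback({"city": params.args["city"], "conditions": "sunny"})


async def main() -> list:
    collected = []

    async def collect(result):
        collected.append(result)

    params = FunctionCallParams("get_weather", "call-1", {"city": "Paris"}, None, None, collect)
    await get_weather_old(
        params.function_name,
        params.tool_call_id,
        params.args,
        params.llm,
        params.context,
        params.result_callback,
    )
    await get_weather(params)
    return collected


results = asyncio.run(main())
```

Both handlers produce the same result; only the way they receive their inputs changes.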
Fixed
-
Fixed an issue with SimliVideoService where the bot was continuously outputting audio, which prevented the BotStoppedSpeakingFrame from being emitted. -
Fixed an issue where OpenAIRealtimeBetaLLMService would add two assistant messages to the context. -
Fixed an issue with GeminiMultimodalLiveLLMService where the context contained tokens instead of words. -
Fixed an issue with HTTP Smart Turn handling, where the service returns a 500 error. Previously, this would cause an unhandled exception. Now, a 500 error is treated as an incomplete response.
-
Fixed a TTS services issue that could cause assistant output not to be aggregated to the context when also using TTSSpeakFrames. -
Fixed an issue where the SmartTurnMetricsData was reporting 0ms for inference and processing time when using the FalSmartTurnAnalyzer.
Other
-
Added examples/daily-custom-tracks to show how to send and receive Daily custom tracks. -
Added examples/daily-multi-translation to showcase how to send multiple simultaneous translations with the same transport. -
Added 04 foundational examples for client/server transports. Also, renamed 29-livekit-audio-chat.py to 04b-transports-livekit.py. -
Added foundational example 13c-gladia-translation.py showing how to use TranscriptionFrame and TranslationFrame.
[0.0.65] - 2025-04-23 "Sant Jordi's release" 🌹📕
https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia
Added
-
Added automatic hangup logic to the Telnyx serializer. This feature hangs up the Telnyx call when an EndFrame or CancelFrame is received. It is enabled by default and is configurable via the auto_hang_up InputParam. -
Added a keepalive task to GladiaSTTService to prevent the websocket from disconnecting after 30 seconds of no audio input.
Changed
-
The InputParams for ElevenLabsTTSService and ElevenLabsHttpTTSService no longer require that stability and similarity_boost be set. You can individually set each param. -
In TwilioFrameSerializer, call_sid is Optional so as to avoid a breaking change. call_sid is required to automatically hang up.
Fixed
- Fixed an issue where TwilioFrameSerializer would send two hang up commands: one for the EndFrame and one for the CancelFrame.
[0.0.64] - 2025-04-22
Added
-
Added automatic hangup logic to the Twilio serializer. This feature hangs up the Twilio call when an
EndFrameorCancelFrameis received. It is enabled by default and is configurable via theauto_hang_upInputParam. -
Added
SmartTurnMetricsData, which contains end-of-turn prediction metrics, to theMetricsFrame. UsingMetricsFrame, you can now retrieve prediction confidence scores and processing time metrics from the smart turn analyzers. -
Added support for Application Default Credentials in Google services,
GoogleSTTService,GoogleTTSService, andGoogleVertexLLMService. -
Added support for Smart Turn Detection via the
turn_analyzertransport parameter. You can now choose betweenHttpSmartTurnAnalyzer()orFalSmartTurnAnalyzer()for remote inference orLocalCoreMLSmartTurnAnalyzer()for on-device inference using Core ML. -
DeepgramTTSServiceacceptsbase_urlargument again, allowing you to connect to an on-prem service. -
Added
LLMUserAggregatorParamsandLLMAssistantAggregatorParamswhich allow you to control aggregator settings. You can now pass these arguments when creating aggregator pairs withcreate_context_aggregator(). -
Added
previous_textcontext support to ElevenLabsHttpTTSService, improving speech consistency across sentences within an LLM response. -
Added word/timestamp pairs to
ElevenLabsHttpTTSService. -
It is now possible to disable
SoundfileMixerwhen created. You can then useMixerEnableFrameto dynamically enable it when necessary. -
Added
on_client_connectedandon_client_disconnectedevent handlers to theDailyTransportclass. These handlers map to the same underlying Daily events ason_participant_joinedandon_participant_left, respectively. This makes it easier to write a single bot pipeline that can also use other transports likeSmallWebRTCTransportandFastAPIWebsocketTransport.
Changed
-
GrokLLMServicenow usesgrok-3-betaas its default model. -
Daily's REST helpers now include an
eject_at_token_expparam, which ejects the user when their token expires. This new parameter defaults to False. Also, the default value forenable_prejoin_uichanged to False andeject_at_room_expchanged to False. -
OpenAILLMServiceandOpenPipeLLMServicenow usegpt-4.1as their default model. -
SoundfileMixerconstructor arguments need to be keywords.
Deprecated
DeepgramSTTServiceparameterurlis now deprecated, usebase_urlinstead.
Removed
- Parameters
user_kwargsandassistant_kwargswhen creating a context aggregator pair usingcreate_context_aggregator()have been removed. Useuser_paramsandassistant_paramsinstead.
Fixed
-
Fixed an issue that would cause TTS websocket-based services to not cleanup resources properly when disconnecting.
-
Fixed a
TavusVideoServiceissue that was causing audio choppiness. -
Fixed an issue in
SmallWebRTCTransportwhere an error was thrown if the client did not create a video transceiver. -
Fixed an issue where LLM input parameters were not being applied correctly in GoogleVertexLLMService, causing unexpected behavior during inference.
Other
- Updated the
twilio-chatbotexample to use the auto-hangup feature.
[0.0.63] - 2025-04-11
Added
-
Added media resolution control to
GeminiMultimodalLiveLLMServicewithGeminiMediaResolutionenum, allowing configuration of token usage for image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing with 256 tokens). -
Added Gemini's Voice Activity Detection (VAD) configuration to GeminiMultimodalLiveLLMService with GeminiVADParams, allowing fine control over speech detection sensitivity and timing, including:
- Start sensitivity (how quickly speech is detected)
- End sensitivity (how quickly turns end after pauses)
- Prefix padding (milliseconds of audio to keep before speech is detected)
- Silence duration (milliseconds of silence required to end a turn)
-
Added comprehensive language support to
GeminiMultimodalLiveLLMService, supporting over 30 languages via thelanguageparameter, with proper mapping between Pipecat'sLanguageenum and Gemini's language codes. -
Added support in
SmallWebRTCTransportto detect when remote tracks are muted. -
Added support for image capture from a video stream to the
SmallWebRTCTransport. -
Added a new iOS client option to the
SmallWebRTCTransportvideo-transform example. -
Added new processors
ProducerProcessorandConsumerProcessor. The producer processor processes frames from the pipeline and decides whether the consumers should consume it or not. If so, the same frame that is received by the producer is sent to the consumer. There can be multiple consumers per producer. These processors can be useful to push frames from one part of a pipeline to a different one (e.g. when usingParallelPipeline). -
Improvements for the SmallWebRTCTransport:
- Wait until the pipeline is ready before triggering the connected event.
- Queue messages if the data channel is not ready.
- Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
- Avoid initial video delays.
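The ProducerProcessor and ConsumerProcessor relationship described in this release can be pictured with plain asyncio queues. The sketch below is not Pipecat's implementation: the `Producer` class, its `add_consumer()` method, and the assumption that the frame also continues along the producer's own path are all hypothetical and only illustrate the "one producer, many consumers, same frame object" idea.

```python
import asyncio
from typing import Any, Callable, List


class Producer:
    """Hypothetical stand-in for a producer: forwards selected frames to every consumer."""

    def __init__(self, should_forward: Callable[[Any], bool]):
        self._should_forward = should_forward
        self._queues: List[asyncio.Queue] = []

    def add_consumer(self) -> asyncio.Queue:
        # Each consumer gets its own queue, so consumers never steal frames from each other.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def process(self, frame: Any) -> Any:
        if self._should_forward(frame):
            for queue in self._queues:
                # The very same frame object is handed to every consumer.
                await queue.put(frame)
        # Assumption: the frame also keeps flowing through the producer's own pipeline.
        return frame


async def demo() -> list:
    producer = Producer(lambda frame: isinstance(frame, str))
    first, second = producer.add_consumer(), producer.add_consumer()
    for frame in ["hello", 42, "world"]:
        await producer.process(frame)
    # Drain both consumer queues and return what each one received.
    return [[q.get_nowait() for _ in range(q.qsize())] for q in (first, second)]
```

Running `asyncio.run(demo())` shows both consumers receiving only the string frames, which mirrors how a producer in one branch of a ParallelPipeline could feed consumers in other branches.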
Changed
-
In
GeminiMultimodalLiveLLMService, removed thetranscribe_model_audioparameter in favor of Gemini Live's native output transcription support. Now text transcriptions are produced directly by the model. No configuration is required. -
Updated
GeminiMultimodalLiveLLMService’s defaultmodeltomodels/gemini-2.0-flash-live-001andbase_urlto thev1betawebsocket URL.
Fixed
-
Updated daily-python to 0.17.0 to fix an issue that was preventing it from running on older platforms. -
Fixed an issue where
CartesiaTTSService's spell feature would result in the spelled word in the context appearing as "F,O,O,B,A,R" instead of "FOOBAR". -
Fixed an issue in the Azure TTS services where the language was being set incorrectly.
-
Fixed
SmallWebRTCTransportto support dynamic values forTransportParams.audio_out_10ms_chunks. Previously, it only worked with 20ms chunks. -
Fixed an issue with
GeminiMultimodalLiveLLMServicewhere the assistant context messages had no space between words. -
Fixed an issue where
LLMAssistantContextAggregatorwould prevent aBotStoppedSpeakingFramefrom moving through the pipeline.
[0.0.62] - 2025-04-01 "An April Fools' release"
Added
-
Added
TransportParams.audio_out_10ms_chunksparameter to allow controlling the amount of audio being sent by the output transport. It defaults to 4, so 40ms audio chunks are sent. -
Added
QwenLLMServicefor Qwen integration with an OpenAI-compatible interface. Added foundational example14q-function-calling-qwen.py. -
Added
Mem0MemoryService. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/. -
Added WhisperSTTServiceMLX for Whisper transcription on Apple Silicon. See example in examples/foundational/13e-whisper-mlx.py. Latency of completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms. -
Added SmallWebRTCTransport, a new P2P WebRTC transport.
- Created two examples in p2p-webrtc:
  - video-transform: Demonstrates sending and receiving audio/video with SmallWebRTCTransport using TypeScript. Includes video frame processing with OpenCV.
  - voice-agent: A minimal example of creating a voice agent with SmallWebRTCTransport.
-
GladiaSTTService now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options. -
Added support to
ProtobufFrameSerializerto send the messages fromTransportMessageFrameandTransportMessageUrgentFrame. -
Added support for a new TTS service,
PiperTTSService. (see https://github.com/rhasspy/piper/) -
It is now possible to tell whether
UserStartedSpeakingFrameorUserStoppedSpeakingFramehave been generated because of emulation frames.
Changed
-
FunctionCallResultFrames are now system frames. This prevents function call results from being discarded during interruptions. -
Pipecat services have been reorganized into packages. Each package can have one or more of the following modules (in the future new module names might be needed) depending on the services implemented:
- image: for image generation services
- llm: for LLM services
- memory: for memory services
- stt: for Speech-To-Text services
- tts: for Text-To-Speech services
- video: for video generation services
- vision: for video recognition services
-
Base classes for AI services have been reorganized into modules. They can now be found in
pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]. -
GladiaSTTServicenow uses thesolaria-1model by default. Other params use Gladia's default values. Added support for more language codes.
Deprecated
-
All Pipecat services imports have been deprecated and a warning will be shown when using the old import. The new import should be
pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]. For example,from pipecat.services.openai.llm import OpenAILLMService. -
Import for AI services base classes from
pipecat.services.ai_servicesis now deprecated, use one ofpipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]. -
Deprecated the
languageparameter inGladiaSTTService.InputParamsin favor oflanguage_config, which better aligns with Gladia's API. -
Deprecated using
GladiaSTTService.InputParamsdirectly. Use the newGladiaInputParamsclass instead.
Fixed
-
Fixed a FastAPIWebsocketTransport and WebsocketClientTransport issue that would cause the transport to be closed prematurely, preventing the internally queued audio from being sent. The same issue could also cause an infinite loop while using an output mixer and when sending an EndFrame, preventing the bot from finishing. -
Fixed an issue that could cause the TranscriptionUpdateFrame pushed because of an interruption to be discarded. -
Fixed an issue that would cause
SegmentedSTTServicebased services (e.g.OpenAISTTService) to try to transcribe non-spoken audio, causing invalid transcriptions. -
Fixed an issue where
GoogleTTSServicewas emitting twoTTSStoppedFrames.
Performance
-
Output transports now send 40ms audio chunks instead of 20ms. This should improve performance.
-
BotSpeakingFrames are now sent every 200ms. If the output transport audio chunks are longer than 200ms, one is sent with every audio chunk instead.
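To make the chunk sizes above concrete, here is a small arithmetic sketch relating the number of 10ms slices per chunk to chunk duration and size in bytes. It assumes 16-bit PCM audio (2 bytes per sample), which the changelog does not state; the helper names are hypothetical and are not part of Pipecat.

```python
def chunk_duration_ms(ten_ms_chunks: int) -> int:
    """Duration of one output chunk built from `ten_ms_chunks` slices of 10 ms each."""
    return ten_ms_chunks * 10


def audio_chunk_bytes(
    sample_rate: int,
    num_channels: int = 1,
    ten_ms_chunks: int = 4,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
) -> int:
    """Size in bytes of one output chunk."""
    samples_per_10ms = sample_rate // 100  # 10 ms is 1/100 of a second
    return samples_per_10ms * ten_ms_chunks * num_channels * bytes_per_sample


if __name__ == "__main__":
    # Default of 4 slices at 16 kHz mono: 40 ms of audio.
    print(chunk_duration_ms(4), audio_chunk_bytes(16000))
    # The previous 20 ms behavior corresponds to 2 slices.
    print(chunk_duration_ms(2), audio_chunk_bytes(16000, ten_ms_chunks=2))
```

Under these assumptions, doubling the chunk from 20ms to 40ms halves the number of writes to the transport for the same amount of audio.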
Other
-
Added foundational example
37-mem0.pydemonstrating how to use theMem0MemoryService. -
Added foundational example
13e-whisper-mlx.pydemonstrating how to use theWhisperSTTServiceMLX.
[0.0.61] - 2025-03-26
Added
-
Added a new frame,
LLMSetToolChoiceFrame, which provides a mechanism for modifying thetool_choicein the context. -
Added
GroqTTSServicewhich provides text-to-speech functionality using Groq's API. -
Added support in
DailyTransportfor updating remote participants'canReceivepermission via theupdate_remote_participants()method, by bumping the daily-python dependency to >= 0.16.0. -
ElevenLabs TTS services now support a sample rate of 8000.
-
Added support for
instructionsinOpenAITTSService. -
Added support for
base_urlinOpenAIImageGenServiceandOpenAITTSService.
Fixed
-
Fixed an issue in
RTVIObserverthat prevented handling of Google LLM context messages. The observer now processes both OpenAI-style and Google-style contexts. -
Fixed an issue in Daily involving switching virtual devices, by bumping the daily-python dependency to >= 0.16.1.
-
Fixed a GoogleAssistantContextAggregator issue where function call placeholders were not being updated when the function call result was not a string. -
Fixed an issue that would cause
LLMAssistantContextAggregatorto block processing more frames while processing a function call result. -
Fixed an issue where the
RTVIObserverwould report two bot started and stopped speaking events for each bot turn. -
Fixed an issue in
UltravoxSTTServicethat caused improper audio processing and incorrect LLM frame output.
Other
- Added
examples/foundational/07x-interruptible-local.pyto show how a local transport can be used.
[0.0.60] - 2025-03-20
Added
- Added
default_headersparameter toBaseOpenAILLMServiceconstructor.
Changed
-
Rolled back to deepgram-sdk 3.8.0 since 3.10.1 was causing connection issues. -
Changed the default
InputAudioTranscriptionmodel togpt-4o-transcribeforOpenAIRealtimeBetaLLMService.
Other
- Update the
19-openai-realtime-beta.pyand19a-azure-realtime-beta.pyexamples to use the FunctionSchema format.
[0.0.59] - 2025-03-20
Added
-
When registering a function call it is now possible to indicate if you want the function call to be cancelled if there's a user interruption via
cancel_on_interruption(defaults to False). This is now possible because function calls are executed concurrently. -
Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the PipelineTask will be automatically cancelled. It is possible to override this behavior by passing cancel_on_idle_timeout=False. It is also possible to change the default timeout with idle_timeout_secs or the frames that prevent the pipeline from being idle with idle_timeout_frames. Finally, an on_idle_timeout event handler will be triggered if the idle timeout is reached (whether the pipeline task is cancelled or not). -
Added
FalSTTService, which provides STT for Fal's Wizper API. -
Added a reconnect_on_error parameter to websocket-based TTS services as well as an on_connection_error event handler. The reconnect_on_error parameter indicates whether the TTS service should reconnect on error. The on_connection_error handler will always get called if there's any error, no matter the value of reconnect_on_error. This allows, for example, falling back to a different TTS provider if something goes wrong with the current one. -
Added new
SkipTagsAggregatorthat extendsBaseTextAggregatorto aggregate text and skips end of sentence matching if aggregated text is between start/end tags. -
Added new
PatternPairAggregatorthat extendsBaseTextAggregatorto identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content like XML-style tags that may span across multiple text chunks or sentence boundaries. -
Added new
BaseTextAggregator. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow for the text to be manipulated while it's being aggregated. A text aggregator can be passed viatext_aggregatorto the TTS service. -
Added new
sample_rateconstructor parameter toTavusVideoServiceto allow changing the output sample rate. -
Added new
NeuphonicTTSService. (see https://neuphonic.com) -
Added new
UltravoxSTTService. (see https://github.com/fixie-ai/ultravox) -
Added
on_frame_reached_upstreamandon_frame_reached_downstreamevent handlers toPipelineTask. Those events will be called when a frame reaches the beginning or end of the pipeline respectively. Note that by default, the event handlers will not be called unless a filter is set withPipelineTask.set_reached_upstream_filter()orPipelineTask.set_reached_downstream_filter(). -
Added support for Chirp voices in
GoogleTTSService. -
Added a
flush_audio()method toFishTTSServiceandLmntTTSService. -
Added a
set_languageconvenience method forGoogleSTTService, allowing you to set a single language. This is in addition to theset_languagesmethod which allows you to set a list of languages. -
Added
on_user_turn_audio_dataandon_bot_turn_audio_datatoAudioBufferProcessor. This gives the ability to grab the audio of only that turn for both the user and the bot. -
Added new base class BaseObject which is now the base class of FrameProcessor, PipelineRunner, PipelineTask and BaseTransport. The new BaseObject adds support for event handlers. -
Added support for a unified format for specifying function calling across all LLM services.
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
-
Added
speech_thresholdparameter toGladiaSTTService. -
Allow passing user (
user_kwargs) and assistant (assistant_kwargs) context aggregator parameters when usingcreate_context_aggregator(). The values are passed as a mapping that will then be converted to arguments. -
Added
speedas anInputParamfor bothElevenLabsTTSServiceandElevenLabsHttpTTSService. -
Added new
LLMFullResponseAggregatorto aggregate full LLM completions. At every completion theon_completionevent handler is triggered. -
Added a new frame,
RTVIServerMessageFrame, and RTVI messageRTVIServerMessagewhich provides a generic mechanism for sending custom messages from server to client. TheRTVIServerMessageFrameis processed by theRTVIObserverand will be delivered to the client'sonServerMessagecallback orServerMessageevent. -
Added
GoogleLLMOpenAIBetaServicefor Google LLM integration with an OpenAI-compatible interface. Added foundational example14o-function-calling-gemini-openai-format.py. -
Added AzureRealtimeBetaLLMService to support Azure's OpenAI Realtime API. Added foundational example 19a-azure-realtime-beta.py. -
Introduced
GoogleVertexLLMService, a new class for integrating with Vertex AI Gemini models. Added foundational example14p-function-calling-gemini-vertex-ai.py. -
Added support in OpenAIRealtimeBetaLLMService for a slate of new features:
- The 'gpt-4o-transcribe' input audio transcription model, along with new language and prompt options specific to that model.
- The input_audio_noise_reduction session property.
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        )
        # ...
    )
- The 'semantic_vad' turn_detection session property value, a more sophisticated model for detecting when the user has stopped speaking.
- on_conversation_item_created and on_conversation_item_updated events to OpenAIRealtimeBetaLLMService.
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        # ...

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        # ...
- The retrieve_conversation_item(item_id) method for introspecting a conversation item on the server.
    item = await llm.retrieve_conversation_item(item_id)
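The idle-pipeline detection added in this release boils down to a timer that is reset whenever activity is seen. The following is a minimal asyncio sketch of that idea, not the PipelineTask implementation: `IdleWatchdog`, `notify_activity()` and `wait_until_idle()` are hypothetical names, and the short timeouts are chosen only so the demo finishes quickly.

```python
import asyncio


class IdleWatchdog:
    """Hypothetical helper: reports idleness once no activity is seen for `idle_timeout_secs`."""

    def __init__(self, idle_timeout_secs: float):
        self._timeout = idle_timeout_secs
        self._activity = asyncio.Event()

    def notify_activity(self) -> None:
        # Call this whenever a frame that counts as activity is observed.
        self._activity.set()

    async def wait_until_idle(self) -> None:
        while True:
            self._activity.clear()
            try:
                # Any activity within the timeout restarts the wait.
                await asyncio.wait_for(self._activity.wait(), timeout=self._timeout)
            except asyncio.TimeoutError:
                return  # nothing happened for a full timeout window


async def demo() -> float:
    watchdog = IdleWatchdog(idle_timeout_secs=0.05)
    loop = asyncio.get_running_loop()
    started = loop.time()

    async def simulate_activity() -> None:
        for _ in range(3):
            await asyncio.sleep(0.02)
            watchdog.notify_activity()

    activity_task = asyncio.create_task(simulate_activity())
    await watchdog.wait_until_idle()
    await activity_task
    return loop.time() - started  # elapsed seconds before idleness was declared
```

In a real application the point where `wait_until_idle()` returns is where an idle handler would run and, depending on configuration, where the task would be cancelled.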
Changed
-
Updated
OpenAISTTServiceto usegpt-4o-transcribeas the default transcription model. -
Updated
OpenAITTSServiceto usegpt-4o-mini-ttsas the default TTS model. -
Function calls are now executed in tasks. This means that the pipeline will not be blocked while the function call is being executed.
-
⚠️
PipelineTaskwill now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior, seePipelineTaskdocumentation for more details. -
All event handlers are now executed in separate tasks in order to prevent blocking the pipeline. Previously, an event handler that took some time to execute would block the pipeline until it completed.
-
Updated
TranscriptProcessorto support text output fromOpenAIRealtimeBetaLLMService. -
OpenAIRealtimeBetaLLMServiceandGeminiMultimodalLiveLLMServicenow push aTTSTextFrame. -
Updated the default model for CartesiaTTSService and CartesiaHttpTTSService to sonic-2.
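Running function calls in tasks is what makes per-function cancellation on interruption possible. The sketch below illustrates the mechanism with plain asyncio; `FunctionCallRunner`, `start()` and `interrupt()` are hypothetical names rather than Pipecat APIs, and only the `cancel_on_interruption` flag name comes from this changelog.

```python
import asyncio
from typing import Awaitable, Dict, List


class FunctionCallRunner:
    """Hypothetical runner: each function call is a task, so the caller is never blocked."""

    def __init__(self) -> None:
        self._tasks: Dict[asyncio.Task, bool] = {}  # task -> cancel_on_interruption

    def start(self, coro: Awaitable, *, cancel_on_interruption: bool = False) -> asyncio.Task:
        task = asyncio.ensure_future(coro)
        self._tasks[task] = cancel_on_interruption
        # Forget the task once it finishes or is cancelled.
        task.add_done_callback(lambda t: self._tasks.pop(t, None))
        return task

    def interrupt(self) -> None:
        # Only calls registered with cancel_on_interruption=True are cancelled.
        for task, cancel in list(self._tasks.items()):
            if cancel:
                task.cancel()


async def slow_lookup(name: str, results: List[str]) -> None:
    await asyncio.sleep(0.05)  # stands in for a slow tool or network call
    results.append(name)


async def demo():
    results: List[str] = []
    runner = FunctionCallRunner()
    keep = runner.start(slow_lookup("keep", results))
    drop = runner.start(slow_lookup("drop", results), cancel_on_interruption=True)
    await asyncio.sleep(0)  # let both calls start running
    runner.interrupt()      # simulated user interruption
    await asyncio.gather(keep, drop, return_exceptions=True)
    return results, drop.cancelled()
```

Here the call started with the default flag still completes after the interruption, while the one that opted in is cancelled.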
Deprecated
-
Passing a start_callback to LLMService.register_function() is now deprecated, simply move the code from the start callback to the function call. -
TTSService parameter text_filter is now deprecated, use text_filters instead which is now a list. This allows passing multiple filters that will be executed in order.
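As a rough illustration of what "multiple filters executed in order" means, the sketch below chains plain callables. Pipecat's actual text filters are objects with their own interface, so the `apply_text_filters` helper and both example filters here are hypothetical.

```python
from typing import Callable, Sequence

TextFilter = Callable[[str], str]


def apply_text_filters(text: str, filters: Sequence[TextFilter]) -> str:
    """Run each filter on the output of the previous one, in list order."""
    for text_filter in filters:
        text = text_filter(text)
    return text


def strip_markdown_emphasis(text: str) -> str:
    # Remove characters a TTS engine might otherwise read aloud.
    return text.replace("*", "").replace("_", "")


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


if __name__ == "__main__":
    print(apply_text_filters("**Hello**   world", [strip_markdown_emphasis, collapse_whitespace]))
```

Order matters: a filter that removes markup can leave extra whitespace behind, so a whitespace filter is best placed after it.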
Removed
-
Removed deprecated
audio.resample_audio(), usecreate_default_resampler()instead. -
Removed deprecated
stt_serviceparameter fromSTTMuteFilter. -
Removed deprecated RTVI processors, use an
RTVIObserverinstead. -
Removed deprecated
AWSTTSService, usePollyTTSServiceinstead. -
Removed deprecated field
tierfromDailyTranscriptionSettings, usemodelinstead. -
Removed deprecated
pipecat.vadpackage, usepipecat.audio.vadinstead.
Fixed
-
Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
-
Fixed an assistant aggregator issue that was causing assistant text to not be added to the context during function calls. This could lead to duplications.
-
Fixed a
SegmentedSTTServiceissue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service we rely on VAD events which use both VAD and volume. -
Fixed a
GeminiMultimodalLiveLLMServiceissue that was causing messages to be duplicated in the context when pushingLLMMessagesAppendFrameframes. -
Fixed an issue with SegmentedSTTService based services (e.g. GroqSTTService) that was not allowing audio to pass through downstream. -
Fixed a CartesiaTTSService and RimeTTSService issue that would consider text between spelling out tags to be an end of sentence. -
Fixed a match_endofsentence issue that would result in floating point numbers being considered an end of sentence. -
Fixed a match_endofsentence issue that would result in emails being considered an end of sentence. -
Fixed an issue where the RTVI message disconnect-bot was pushing an EndFrame, resulting in the pipeline not shutting down. It now pushes an EndTaskFrame upstream to shut down the pipeline. -
Fixed an issue with the
GoogleSTTServicewhere stream timeouts during periods of inactivity were causing connection failures. The service now properly detects timeout errors and handles reconnection gracefully, ensuring continuous operation even after periods of silence or when using anSTTMuteFilter. -
Fixed an issue in
RimeTTSServicewhere the last line of text sent didn't result in an audio output being generated. -
Fixed OpenAIRealtimeBetaLLMService by adding proper handling for:
- The conversation.item.input_audio_transcription.delta server message, which was added server-side at some point and not handled client-side.
- Errors reported by the response.done server message.
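The two match_endofsentence fixes in this list concern periods that are not sentence boundaries. The sketch below shows one way such cases can be skipped; it is not Pipecat's match_endofsentence. The `first_sentence_end` function and its regular expressions are simplified assumptions made for illustration.

```python
import re

# Simplified patterns (assumptions): spans matched here may contain periods
# that must not be treated as the end of a sentence.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DECIMAL_RE = re.compile(r"\d+\.\d+")
SENTENCE_MARK_RE = re.compile(r"[.?!]")


def first_sentence_end(text: str) -> int:
    """Return the index just past the first sentence-ending mark, or 0 if there is none."""
    protected = set()
    for pattern in (EMAIL_RE, DECIMAL_RE):
        for match in pattern.finditer(text):
            protected.update(range(match.start(), match.end()))
    for mark in SENTENCE_MARK_RE.finditer(text):
        if mark.start() not in protected:
            return mark.end()
    return 0


if __name__ == "__main__":
    for sample in ("Pi is 3.14 exactly. Next", "Mail joe@example.com now! Ok"):
        print(repr(sample[: first_sentence_end(sample)]))
```

One limitation of this sketch: with streamed text, a chunk that ends in "3." cannot yet be recognized as a decimal, so a real aggregator needs to look ahead or wait for more text.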
Other
-
Add foundational example
07w-interruptible-fal.py, showingFalSTTService. -
Added a new Ultravox example
examples/foundational/07u-interruptible-ultravox.py. -
Added new Neuphonic examples
examples/foundational/07v-interruptible-neuphonic.pyandexamples/foundational/07v-interruptible-neuphonic-http.py. -
Added a new example examples/foundational/36-user-email-gathering.py to show how to gather user emails. The example uses Cartesia's <spell></spell> tags and Rime's spell() function to spell out the emails for confirmation. -
Update the
34-audio-recording.pyexample to include an STT processor. -
Added foundational example
35-voice-switching.pyshowing how to use the newPatternPairAggregator. This example shows how to encode information for the LLM to instruct TTS voice changes, but this can be used to encode any information into the LLM response, which you want to parse and use in other parts of your application. -
Added a Pipecat Cloud deployment example to the
examplesdirectory. -
Removed foundational examples 28b and 28c as the TranscriptProcessor no longer has an LLM dependency. Renamed foundational example 28a to 28-transcript-processor.py.
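The pattern-pair idea behind the voice switching example is that tagged content may arrive split across several streamed chunks. The sketch below buffers chunks until a full start/end pair is present; `TagPairBuffer` and its `feed()` method are hypothetical and far simpler than PatternPairAggregator itself.

```python
from typing import List


class TagPairBuffer:
    """Hypothetical buffer that extracts text found between a start and an end pattern."""

    def __init__(self, start: str, end: str):
        self._start = start
        self._end = end
        self._buffer = ""

    @property
    def text(self) -> str:
        """Buffered text with every completed pair removed."""
        return self._buffer

    def feed(self, chunk: str) -> List[str]:
        """Add a streamed chunk and return the contents of each pair completed by it."""
        self._buffer += chunk
        completed = []
        while True:
            begin = self._buffer.find(self._start)
            if begin == -1:
                break
            finish = self._buffer.find(self._end, begin + len(self._start))
            if finish == -1:
                break  # the closing pattern has not arrived yet
            completed.append(self._buffer[begin + len(self._start):finish])
            # Drop the whole tagged span so it is not spoken or matched again.
            self._buffer = self._buffer[:begin] + self._buffer[finish + len(self._end):]
        return completed


if __name__ == "__main__":
    buffer = TagPairBuffer("<voice>", "</voice>")
    for piece in ("Hello <voi", "ce>narrator</vo", "ice> there."):
        print(buffer.feed(piece))
```

The extracted content (a voice name in this demo) can then drive application logic while the remaining text continues on to the TTS service.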
[0.0.58] - 2025-02-26
Added
-
Added track-specific audio event
on_track_audio_datatoAudioBufferProcessorfor accessing separate input and output audio tracks. -
Pipecat version will now be logged on every application startup. This will help us identify what version we are running in case of any issues.
-
Added a new
StopFramewhich can be used to stop a pipeline task while keeping the frame processors running. The frame processors could then be used in a different pipeline. The difference between aStopFrameand aStopTaskFrameis that, as withEndFrameandEndTaskFrame, theStopFrameis pushed from the task and theStopTaskFrameis pushed upstream inside the pipeline by any processor. -
Added a new
PipelineTaskparameterobserversthat replaces the previousPipelineParams.observers. -
Added a new
PipelineTaskparametercheck_dangling_tasksto enable or disable checking for frame processors' dangling tasks when the Pipeline finishes running. -
Added new on_completion_timeout event for LLM services (all OpenAI-based services, Anthropic and Google). Note that this event will only get triggered if LLM timeouts are set up and if the timeout was reached. It can be useful to retrigger another completion and see if the timeout was just a blip. -
Added new log observers
LLMLogObserverandTranscriptionLogObserverthat can be useful for debugging your pipelines. -
Added
room_urlproperty toDailyTransport. -
Added
addonsargument toDeepgramSTTService. -
Added
exponential_backoff_time()toutils.networkmodule.
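For readers unfamiliar with exponential backoff, the sketch below shows the usual calculation: the wait doubles with each attempt and is clamped to a range. It is an illustration only. `backoff_delay` and its parameters are hypothetical and should not be read as the signature of exponential_backoff_time().

```python
def backoff_delay(
    attempt: int,
    min_wait: float = 1.0,
    max_wait: float = 30.0,
    multiplier: float = 1.0,
) -> float:
    """Seconds to wait before retry number `attempt` (1-based), doubling each time."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = multiplier * (2 ** (attempt - 1))
    # Clamp so early retries are not too eager and late retries do not wait forever.
    return max(min_wait, min(max_wait, delay))


if __name__ == "__main__":
    print([backoff_delay(n) for n in range(1, 8)])
```

A reconnect loop would typically sleep for this delay between attempts and reset the attempt counter after a successful connection.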
Changed
-
⚠️
PipelineTasknow requires keyword arguments (except for the first one for the pipeline). -
Updated
PlayHTHttpTTSServiceto take avoice_engineandprotocolinput in the constructor. The previous method of providing avoice_engineinput that contains the engine and protocol is deprecated by PlayHT. -
The base
TTSServiceclass now strips leading newlines before sending text to the TTS provider. This change is to solve issues where some TTS providers, like Azure, would not output text due to newlines. -
GrokLLMService now uses grok-2 as the default model. -
AnthropicLLMServicenow usesclaude-3-7-sonnet-20250219as the default model. -
RimeHttpTTSServiceneeds anaiohttp.ClientSessionto be passed to the constructor as all the other HTTP-based services. -
RimeHttpTTSServicedoesn't use a default voice anymore. -
DeepgramSTTService now uses the new nova-3 model by default. If you want to use the previous model you can pass LiveOptions(model="nova-2-general"). (see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)
stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
Deprecated
PipelineParams.observers is now deprecated, use the new PipelineTask parameter observers.
Removed
- Removed TransportParams.audio_out_is_live since it was not being used at all.
Fixed
-
Fixed an issue that would cause undesired interruptions via
EmulateUserStartedSpeakingFrame. -
Fixed a GoogleLLMService issue that was causing an exception when sending inline audio in some cases. -
Fixed an AudioContextWordTTSService issue that would cause an EndFrame to disconnect from the TTS service before audio from all the contexts was received. This affected services like Cartesia and Rime. -
Fixed an issue that prevented passing an OpenAILLMContext to create GoogleLLMService's context aggregators. -
Fixed an ElevenLabsTTSService, FishAudioTTSService, LMNTTTSService and PlayHTTTSService issue that was resulting in audio requested before an interruption being played after an interruption. -
Fixed
match_endofsentencesupport for ellipses. -
Fixed an issue where
EndTaskFramewas not triggeringon_client_disconnectedor closing the WebSocket in FastAPI. -
Fixed an issue in DeepgramSTTService where the sample_rate passed to the LiveOptions was not being used, causing the service to use the default sample rate of the pipeline. -
Fixed a context aggregator issue that would not append the LLM text response to the context if a function call happened in the same LLM turn.
-
Fixed an issue that was causing HTTP TTS services to push
TTSStoppedFramemore than once. -
Fixed a
FishAudioTTSServiceissue whereTTSStoppedFramewas not being pushed. -
Fixed an issue where start_callback was not invoked for some LLM services. -
Fixed an issue that would cause
DeepgramSTTServiceto stop working after an error occurred (e.g. sudden network loss). If the network recovered we would not reconnect. -
Fixed an STTMuteFilter issue where user audio frames were not muted, causing transcriptions to be generated by the STT service.
Other
-
Added Gemini support to
examples/phone-chatbot. -
Added foundational example
34-audio-recording.pyshowing how to use the AudioBufferProcessor callbacks to save merged and track recordings.
[0.0.57] - 2025-02-14
Added
-
Added new AudioContextWordTTSService. This is a TTS base class for TTS services that handle multiple separate audio requests. -
Added new frames EmulateUserStartedSpeakingFrame and EmulateUserStoppedSpeakingFrame which can be used to emulate VAD behavior when VAD is not present or not being triggered. -
Added a new
audio_in_stream_on_startfield toTransportParams. -
Added a new method start_audio_in_streaming in the BaseInputTransport.
- This method should be used to start receiving the input audio in case the field audio_in_stream_on_start is set to false.
-
Added support for the
RTVIProcessorto handle buffered audio inbase64format, converting it into InputAudioRawFrame for transport. -
Added support for the
RTVIProcessorto triggerstart_audio_in_streamingonly after theclient-readymessage. -
Added new
MUTE_UNTIL_FIRST_BOT_COMPLETEstrategy toSTTMuteStrategy. This strategy starts muted and remains muted until the first bot speech completes, ensuring the bot's first response cannot be interrupted. This complements the existingFIRST_SPEECHstrategy which only mutes during the first detected bot speech. -
Added support for Google Cloud Speech-to-Text V2 through
GoogleSTTService. -
Added
RimeTTSService, a newWordTTSService. Updated the foundational example07q-interruptible-rime.pyto useRimeTTSService. -
Added support for Groq's Whisper API through the new
GroqSTTServiceand OpenAI's Whisper API through the newOpenAISTTService. Introduced a new base classBaseWhisperSTTServiceto handle common Whisper API functionality. -
Added
PerplexityLLMServicefor Perplexity NIM API integration, with an OpenAI-compatible interface. Also, added foundational example14n-function-calling-perplexity.py. -
Added
DailyTransport.update_remote_participants(). This allows you to update remote participant's settings, like their permissions or which of their devices are enabled. Requires that the local participant have participant admin permission.
Changed
-
We don't consider a colon : an end of sentence any more. -
Updated
DailyTransportto respect theaudio_in_stream_on_startfield, ensuring it only starts receiving the audio input if it is enabled. -
Updated
FastAPIWebsocketOutputTransportto sendTransportMessageFrameandTransportMessageUrgentFrameto the serializer. -
Updated
WebsocketServerOutputTransportto sendTransportMessageFrameandTransportMessageUrgentFrameto the serializer. -
Enhanced
STTMuteConfigto validate strategy combinations, preventingMUTE_UNTIL_FIRST_BOT_COMPLETEandFIRST_SPEECHfrom being used together as they handle first bot speech differently. -
Updated foundational example
07n-interruptible-google.pyto use all Google services. -
RimeHttpTTSServicenow uses themistv2model by default. -
Improved error handling in
AzureTTSServiceto properly detect and log synthesis cancellation errors. -
Enhanced
WhisperSTTServicewith full language support and improved model documentation. -
Updated foundational example
14f-function-calling-groq.py to use GroqSTTService for transcription. -
Updated
GroqLLMServiceto usellama-3.3-70b-versatileas the default model. -
RTVIObserverdoesn't handleLLMSearchResponseFrameframes anymore. For now, to handle those frames you need to create aGoogleRTVIObserverinstead.
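The STTMuteConfig validation mentioned in this list can be pictured with a small stand-alone model. This is only a sketch: `MuteStrategy` and `MuteConfig` are hypothetical stand-ins written for illustration, not Pipecat's actual classes, and the error message is invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class MuteStrategy(Enum):
    """Hypothetical stand-in for the strategy names used in this changelog."""

    FIRST_SPEECH = "first_speech"
    MUTE_UNTIL_FIRST_BOT_COMPLETE = "mute_until_first_bot_complete"
    FUNCTION_CALL = "function_call"
    ALWAYS = "always"


@dataclass
class MuteConfig:
    """Illustrative config object; not Pipecat's STTMuteConfig."""

    strategies: Set[MuteStrategy]

    def __post_init__(self) -> None:
        # Both strategies handle the first bot speech, but differently,
        # so combining them is rejected at construction time.
        conflicting = {
            MuteStrategy.FIRST_SPEECH,
            MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE,
        }
        if conflicting <= self.strategies:
            raise ValueError(
                "FIRST_SPEECH and MUTE_UNTIL_FIRST_BOT_COMPLETE cannot be combined"
            )
```

Rejecting the combination when the config is built surfaces the mistake at startup instead of as ambiguous muting behavior during a call.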
Deprecated
-
STTMuteFilterconstructor'sstt_serviceparameter is now deprecated and will be removed in a future version. The filter now manages mute state internally instead of querying the STT service. -
RTVI.observer()is now deprecated, instantiate anRTVIObserverdirectly instead. -
All RTVI frame processors (e.g.
RTVISpeakingProcessor,RTVIBotLLMProcessor) are now deprecated, instantiate anRTVIObserverinstead.
Fixed
-
Fixed a
FalImageGenService issue that was causing the event loop to be blocked while loading the downloaded image. -
Fixed a
CartesiaTTSService issue that would cause audio overlapping in some cases. -
Fixed a websocket-based service issue (e.g.
CartesiaTTSService) that was preventing a reconnection after the server disconnected cleanly, which was causing an infinite loop instead. -
Fixed a
BaseOutputTransport issue that was causing upstream frames to not be pushed upstream. -
Fixed multiple issues where user transcriptions were not being handled properly. It was possible for short utterances to not trigger VAD, which would cause user transcriptions to be ignored. It was also possible for one or more transcriptions to be generated after VAD, in which case they would also be ignored.
-
Fixed an issue that was causing
BotStoppedSpeakingFrame to be generated too late. This could then cause STTMuteFilter to be unblocked later than desired. -
Fixed an issue that was causing
AudioBufferProcessorto not record synchronized audio. -
Fixed an
RTVIissue that was causingbot-tts-textmessages to be sent before being processed by the output transport. -
Fixed an issue (#1192) in ElevenLabs where we were trying to reconnect/disconnect the websocket connection even when the connection was already closed.
-
Fixed an issue where
has_regular_messages condition was always true in GoogleLLMContext due to Part having function_call & function_response with None values.
Other
-
Added new
instant-voiceexample. This example showcases how to enable instant voice communication as soon as a user connects. -
Added new
local-input-select-stt example. This example allows you to play with local audio inputs by selecting them through a nice text interface.
[0.0.56] - 2025-02-06
Changed
-
Use
gemini-2.0-flash-001 as the default model for GoogleLLMService. -
Improved foundational examples 22b, 22c, and 22d to support function calling. With these base examples,
FunctionCallInProgressFrameandFunctionCallResultFramewill no longer be blocked by the gates.
Fixed
-
Fixed
TkLocalTransport and LocalAudioTransport issues that were causing errors on cleanup. -
Fixed an issue that was causing
tests.utilsimport to fail because of logging setup. -
Fixed a
SentryMetrics issue that was preventing any metrics from being sent to Sentry and was also preventing metrics frames from being pushed to the pipeline. -
Fixed an issue in
BaseOutputTransportwhere incoming audio would not be resampled to the desired output sample rate. -
Fixed an issue with the
TwilioFrameSerializerandTelnyxFrameSerializerwheretwilio_sample_rateandtelnyx_sample_ratewere incorrectly initialized toaudio_in_sample_rate. Those values currently default to 8000 and should be set manually from the serializer constructor if a different value is needed.
Other
- Added a new
sentry-metricsexample.
[0.0.55] - 2025-02-05
Added
-
Added a new
start_metadatafield toPipelineParams. The provided metadata will be set to the initialStartFramebeing pushed from thePipelineTask. -
Added new fields to
PipelineParamsto control audio input and output sample rates for the whole pipeline. This allows controlling sample rates from a single place instead of having to specify sample rates in each service. Setting a sample rate to a service is still possible and will override the value fromPipelineParams. -
Introduce audio resamplers (
BaseAudioResampler). This is just a base class to implement audio resamplers. Currently, two implementations are provided: SOXRAudioResampler and ResampyResampler. A new create_default_resampler() has been added (replacing the now deprecated resample_audio()). -
It is now possible to specify the asyncio event loop that a
PipelineTaskand all the processors should run on by passing it as a new argument to thePipelineRunner. This could allow running pipelines in multiple threads each one with its own event loop. -
Added a new
utils.TaskManager. Instead of a global task manager we now have a task manager per PipelineTask. In the previous version the task manager was global, so running multiple simultaneous PipelineTasks could result in spurious dangling task warnings. In order for all the processors to know about the task manager, we pass it through the StartFrame. This means that processors should create tasks when they receive a StartFrame but not before (because they don't have a task manager yet). -
Added
TelnyxFrameSerializerto support Telnyx calls. A full running example has also been added toexamples/telnyx-chatbot. -
Allow pushing silence audio frames before
TTSStoppedFrame. This might be useful for testing purposes, for example, passing bot audio to an STT service which usually needs additional audio data to detect the utterance stopped. -
TwilioSerializernow supports transport message frames. With this we can create Twilio emulators. -
Added a new transport:
WebsocketClientTransport. -
Added a
metadatafield toFramewhich makes it possible to pass custom data to all frames. -
Added
test/utils.pyinside of pipecat package.
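The sample rate precedence described in this list (a value set on a service overrides the pipeline-wide value) can be sketched as a simple fallback. The `PipelineRates` class, its field names, and the 16000 input default are assumptions made for this illustration, not Pipecat's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRates:
    """Hypothetical pipeline-wide defaults; field names and values are assumptions."""

    audio_in_sample_rate: int = 16000
    audio_out_sample_rate: int = 24000


def effective_out_rate(service_rate: Optional[int], pipeline: PipelineRates) -> int:
    """Return the service's own rate if set, else the pipeline-wide output rate."""
    return service_rate if service_rate is not None else pipeline.audio_out_sample_rate
```

With this rule, a telephony service can stay pinned to 8000 while every other service follows the single pipeline-wide setting.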
Changed
-
GatedOpenAILLMContextAggregator now requires keyword arguments. Also, a new start_open argument has been added to set the initial state of the gate. -
Added
organizationandprojectlevel authentication toOpenAILLMService. -
Improved the language checking logic in
ElevenLabsTTSServiceandElevenLabsHttpTTSServiceto properly handle language codes based on model compatibility, with appropriate warnings when language codes cannot be applied. -
Updated
GoogleLLMContextto support pushingLLMMessagesUpdateFrames that contain a combination of function calls, function call responses, system messages, or just messages. -
InputDTMFFrameis now based onDTMFFrame. There's also a newOutputDTMFFrameframe.
Deprecated
resample_audio()is now deprecated, usecreate_default_resampler()instead.
Removed
AudioBufferProcessor.reset_audio_buffers()has been removed, useAudioBufferProcessor.start_recording()andAudioBufferProcessor.stop_recording()instead.
Fixed
-
Fixed an
AudioBufferProcessor issue that would cause crackling in some recordings. -
Fixed an issue in
AudioBufferProcessorwhere user callback would not be called on task cancellation. -
Fixed an issue in
AudioBufferProcessorthat would cause wrong silence padding in some cases. -
Fixed an issue where
ElevenLabsTTSServicemessages would return a 1009 websocket error by increasing the max message size limit to 16MB. -
Fixed a
DailyTransportissue that would cause events to be triggered before join finished. -
Fixed a
PipelineTask issue that was preventing processors from being cleaned up after cancelling the task. -
Fixed an issue where queuing a
CancelFrameto a pipeline task would not cause the task to finish. However, usingPipelineTask.cancel()is still the recommended way to cancel a task.
Other
-
Improved Unit Test
run_test()to usePipelineTaskandPipelineRunner. There's now also some control aroundStartFrameandEndFrame. TheEndTaskFramehas been removed since it doesn't seem necessary with this new approach. -
Updated
twilio-chatbotwith a few new features: use 8000 sample rate and avoid resampling, a new client useful for stress testing and testing locally without the need to make phone calls. Also, added audio recording on both the client and the server to make sure the audio sounds good. -
Updated examples to use
task.cancel()to immediately exit the example when a participant leaves or disconnects, instead of pushing anEndFrame. Pushing anEndFramecauses the bot to run through everything that is internally queued (which could take some seconds). Note that usingtask.cancel()might not always be the best option and pushing anEndFramecould still be desirable to make sure all the pipeline is flushed.
[0.0.54] - 2025-01-27
Added
-
In order to create tasks in Pipecat frame processors it is now recommended to use
FrameProcessor.create_task() (which uses the new utils.asyncio.create_task()). It takes care of uncaught exceptions, task cancellation handling and task management. To cancel or wait for a task there are FrameProcessor.cancel_task() and FrameProcessor.wait_for_task(). All Pipecat processors have been updated accordingly. Also, when a pipeline runner finishes, a warning about dangling tasks might appear, which indicates that some of the created tasks were never cancelled or awaited (using these new functions). -
It is now possible to specify the period of the
PipelineTaskheartbeat frames withheartbeats_period_secs. -
Added
DailyMeetingTokenPropertiesandDailyMeetingTokenParamsPydantic models for meeting token creation inget_tokenmethod ofDailyRESTHelper. -
Added
enable_recordingandgeoparameters toDailyRoomProperties. -
Added
RecordingsBucketConfigtoDailyRoomPropertiesto upload recordings to a custom AWS bucket.
Changed
-
Enhanced
UserIdleProcessorwith retry functionality and control over idle monitoring via new callback signature(processor, retry_count) -> bool. Updated the17-detect-user-idle.pyto show how to use theretry_count. -
Add defensive error handling for
OpenAIRealtimeBetaLLMService's audio truncation. Audio truncation errors during interruptions now log a warning and allow the session to continue instead of throwing an exception. -
Modified
TranscriptProcessorto use TTS text frames for more accurate assistant transcripts. Assistant messages are now aggregated based on bot speaking boundaries rather than LLM context, providing better handling of interruptions and partial utterances. -
Updated foundational examples
28a-transcription-processor-openai.py,28b-transcript-processor-anthropic.py, and28c-transcription-processor-gemini.pyto use the updatedTranscriptProcessor.
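A callback matching the new UserIdleProcessor signature `(processor, retry_count) -> bool` might look like the sketch below. It assumes the callback is a coroutine and that returning True keeps idle monitoring active while False stops it; `MAX_RETRIES` and the handler name are hypothetical.

```python
import asyncio

MAX_RETRIES = 3  # hypothetical limit chosen for this sketch


async def handle_user_idle(processor, retry_count: int) -> bool:
    """Decide whether idle monitoring should continue.

    Assumption: True means "keep monitoring", False means "stop".
    """
    if retry_count < MAX_RETRIES:
        # A real bot might queue a reminder prompt through `processor` here.
        return True
    # Too many unanswered reminders: stop monitoring (and perhaps end the call).
    return False
```

The foundational example 17-detect-user-idle.py mentioned above is the reference for actual usage.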
Fixed
-
Fixed a
GeminiMultimodalLiveLLMService issue that was preventing the user from pushing initial LLM assistant messages (using LLMMessagesAppendFrame). -
Added missing
FrameProcessor.cleanup()calls toPipeline,ParallelPipelineandUserIdleProcessor. -
Fixed a type error when using
voice_settingsinElevenLabsHttpTTSService. -
Fixed an issue where
OpenAIRealtimeBetaLLMServicefunction calling resulted in an error. -
Fixed an issue in
AudioBufferProcessorwhere the last audio buffer was not being processed, in cases where the_user_audio_bufferwas smaller than the buffer size.
Performance
- Replaced audio resampling library
resampy with soxr. Resampling a 2:21s audio file from 24 kHz to 16 kHz took 1.41s with resampy and 0.031s with soxr, with similar audio quality.
Other
- Added initial unit test infrastructure.
[0.0.53] - 2025-01-18
Added
-
Added
ElevenLabsHttpTTSService, which uses ElevenLabs' HTTP API instead of the websocket one. -
Introduced pipeline frame observers. Observers can view all the frames that go through the pipeline without the need to inject processors in the pipeline. This can be useful, for example, to implement frame loggers or debuggers among other things. The example
examples/foundational/30-observer.pyshows how to add an observer to a pipeline for debugging. -
Introduced heartbeat frames. The pipeline task can now push periodic heartbeats down the pipeline when
enable_heartbeats=True. Heartbeats are system frames that are supposed to make it all the way to the end of the pipeline. When a heartbeat frame is received the traversing time (i.e. the time it took to go through the whole pipeline) will be displayed (with TRACE logging) otherwise a warning will be shown. The exampleexamples/foundational/31-heartbeats.pyshows how to enable heartbeats and forces warnings to be displayed. -
Added
LLMTextFrameandTTSTextFramewhich should be pushed by LLM and TTS services respectively instead ofTextFrames. -
Added
OpenRouterfor OpenRouter integration with an OpenAI-compatible interface. Added foundational example14m-function-calling-openrouter.py. -
Added a new
WebsocketServicebased class for TTS services, containing base functions and retry logic. -
Added
DeepSeekLLMServicefor DeepSeek integration with an OpenAI-compatible interface. Added foundational example14l-function-calling-deepseek.py. -
Added
FunctionCallResultProperties dataclass to provide a structured way to control function call behavior, including:
- run_llm: Controls whether to trigger LLM completion
- on_context_updated: Optional callback triggered after context update
-
Added a new foundational example
07e-interruptible-playht-http.pyfor easy testing ofPlayHTHttpTTSService. -
Added support for Google TTS Journey voices in
GoogleTTSService. -
Added
29-livekit-audio-chat.py as a new foundational example for LiveKitTransportLayer. -
Added
enable_prejoin_ui,max_participantsandstart_video_offparams toDailyRoomProperties. -
Added
session_timeouttoFastAPIWebsocketTransportandWebsocketServerTransportfor configuring session timeouts (in seconds). Triggerson_session_timeoutfor custom timeout handling. See examples/websocket-server/bot.py. -
Added the new modalities option and helper function to set Gemini output modalities.
-
Added
examples/foundational/26d-gemini-live-text.py, which uses Gemini in TEXT modality and another TTS provider for speech output.
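To illustrate the two FunctionCallResultProperties fields listed above (run_llm and on_context_updated), here is a stand-alone model of how such properties could steer what happens after a tool result is stored. `ResultProperties` and `apply_result` are hypothetical names, and treating an unset run_llm as "run a completion" is an assumption of this sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ResultProperties:
    """Stand-in mirroring the two fields named in the changelog entry."""

    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_result(context: list, result: str, props: ResultProperties) -> bool:
    """Store the tool result, fire the callback, and report whether to run the LLM."""
    context.append({"role": "tool", "content": result})
    if props.on_context_updated is not None:
        # The callback runs only after the context has been updated.
        await props.on_context_updated()
    # Assumption for this sketch: an unset run_llm means "run a completion".
    return True if props.run_llm is None else props.run_llm
```

Passing `run_llm=False` would let a function store its result silently, with the callback signalling when the context is ready for a later completion.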
Changed
-
Modified
UserIdleProcessorto start monitoring only after first conversation activity (UserStartedSpeakingFrameorBotStartedSpeakingFrame) instead of immediately. -
Modified
OpenAIAssistantContextAggregatorto support controlled completions and to emit context update callbacks viaFunctionCallResultProperties. -
Added
aws_session_tokento thePollyTTSService. -
Changed the default model for
PlayHTHttpTTSServicetoPlay3.0-mini-http. -
api_key,aws_access_key_idandregionare no longer required parameters for the PollyTTSService (AWSTTSService) -
Added
session_timeoutexample inexamples/websocket-server/bot.pyto handle session timeout event. -
Changed
InputParamsinsrc/pipecat/services/gemini_multimodal_live/gemini.pyto support different modalities. -
Changed
DeepgramSTTService to send a finalize event whenever VAD detects UserStoppedSpeakingFrame. This speeds up transcriptions and clears the Deepgram audio buffer.
Fixed
-
Fixed an issue where
DeepgramSTTServicewas not generating metrics using pipeline's VAD. -
Fixed
UserIdleProcessornot properly propagatingEndFrames through the pipeline. -
Fixed an issue where websocket based TTS services could incorrectly terminate their connection due to a retry counter not resetting.
-
Fixed a
PipelineTaskissue that would cause a dangling task after stopping the pipeline with anEndFrame. -
Fixed an import issue for
PlayHTHttpTTSService. -
Fixed an issue where languages couldn't be used with the
PlayHTHttpTTSService. -
Fixed an issue where
OpenAIRealtimeBetaLLMServiceaudio chunks were hitting an error when truncating audio content. -
Fixed an issue where setting the voice and model for
RimeHttpTTSServicewasn't working. -
Fixed an issue where
IdleFrameProcessorandUserIdleProcessorwere getting initialized before the start of the pipeline.
[0.0.52] - 2024-12-24
Added
-
Constructor arguments for GoogleLLMService to directly set tools and tool_config.
-
Smart turn detection example (
22d-natural-conversation-gemini-audio.py) that leverages Gemini 2.0 capabilities. (see https://x.com/kwindla/status/1870974144831275410) -
Added
DailyTransport.send_dtmf()to send dial-out DTMF tones. -
Added
DailyTransport.sip_call_transfer()to forward SIP and PSTN calls to another address or number. For example, transfer a SIP call to a different SIP address or transfer a PSTN phone number to a different PSTN phone number. -
Added
DailyTransport.sip_refer()to transfer incoming SIP/PSTN calls from outside Daily to another SIP/PSTN address. -
Added an
auto_modeinput parameter toElevenLabsTTSService.auto_modeis set toTrueby default. Enabling this setting disables the chunk schedule and all buffers, which reduces latency. -
Added
KoalaFilter, which implements on-device noise reduction using Koala Noise Suppression. (see https://picovoice.ai/platform/koala/) -
Added
CerebrasLLMServicefor Cerebras integration with an OpenAI-compatible interface. Added foundational example14k-function-calling-cerebras.py. -
Pipecat now supports Python 3.13. We had a dependency on the
audioop package, which was deprecated and has now been removed in Python 3.13. We are now using audioop-lts (https://github.com/AbstractUmbra/audioop) to provide the same functionality. -
Added timestamped conversation transcript support:
- New TranscriptProcessor factory provides access to user and assistant transcript processors.
- UserTranscriptProcessor processes user speech with timestamps from transcription.
- AssistantTranscriptProcessor processes assistant responses with LLM context timestamps.
- Messages emitted with ISO 8601 timestamps indicating when they were spoken.
- Supports all LLM formats (OpenAI, Anthropic, Google) via standard message format.
- New examples: 28a-transcription-processor-openai.py, 28b-transcription-processor-anthropic.py, and 28c-transcription-processor-gemini.py.
-
Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino, Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).
Changed
-
PlayHTTTSServiceuses the new v4 websocket API, which also fixes an issue where text inputted to the TTS didn't return audio. -
The default model for
ElevenLabsTTSServiceis noweleven_flash_v2_5. -
OpenAIRealtimeBetaLLMServicenow takes amodelparameter in the constructor. -
Updated the default model for the
OpenAIRealtimeBetaLLMService. -
Room expiration (
exp) in DailyRoomProperties is now optional (None) by default instead of automatically setting a 5-minute expiration time. You must explicitly set an expiration time if desired.
Deprecated
AWSTTSServiceis now deprecated, usePollyTTSServiceinstead.
Fixed
-
Fixed token counting in
GoogleLLMService. Tokens were summed incorrectly (double-counted in many cases). -
Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.
-
Fixed an issue that would cause
ParallelPipelineto handleEndFrameincorrectly causing the main pipeline to not terminate or terminate too early. -
Fixed an audio stuttering issue in
FastPitchTTSService. -
Fixed a
BaseOutputTransport issue that was causing non-audio frames to be processed before the previous audio frames were played. This will allow, for example, sending a frame A after a TTSSpeakFrame and the frame A will only be pushed downstream after the audio generated from TTSSpeakFrame has been spoken. -
Fixed a
DeepgramSTTService issue that was causing language to be passed as an object instead of a string, causing the connection to fail.
[0.0.51] - 2024-12-16
Fixed
- Fixed an issue in websocket-based TTS services that was causing infinite reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).
[0.0.50] - 2024-12-11
Added
-
Added
GeminiMultimodalLiveLLMService. This is an integration for Google's Gemini Multimodal Live API, supporting:
- Real-time audio and video input processing
- Streaming text responses with TTS
- Audio transcription for both user and bot speech
- Function calling
- System instructions and context management
- Dynamic parameter updates (temperature, top_p, etc.)
-
Added
AudioTranscriberutility class for handling audio transcription with Gemini models. -
Added new context classes for Gemini:
- GeminiMultimodalLiveContext
- GeminiMultimodalLiveUserContextAggregator
- GeminiMultimodalLiveAssistantContextAggregator
- GeminiMultimodalLiveContextAggregatorPair
-
Added new foundational examples for
GeminiMultimodalLiveLLMService:
- 26-gemini-multimodal-live.py
- 26a-gemini-live-transcription.py
- 26b-gemini-live-video.py
- 26c-gemini-live-video.py
-
Added
SimliVideoService. This is an integration for Simli AI avatars. (see https://www.simli.com) -
Added NVIDIA Riva's
FastPitchTTSServiceandParakeetSTTService. (see https://www.nvidia.com/en-us/ai-data-science/products/riva/) -
Added
IdentityFilter. This is the simplest frame filter that lets through all incoming frames. -
New
STTMuteStrategycalledFUNCTION_CALLwhich mutes the STT service during LLM function calls. -
DeepgramSTTServicenow exposes two event handlerson_speech_startedandon_utterance_endthat could be used to implement interruptions. See new exampleexamples/foundational/07c-interruptible-deepgram-vad.py. -
Added
GroqLLMService,GrokLLMService, andNimLLMServicefor Groq, Grok, and NVIDIA NIM API integration, with an OpenAI-compatible interface. -
New examples demonstrating function calling with Groq, Grok, Azure OpenAI, Fireworks, and NVIDIA NIM:
14f-function-calling-groq.py,14g-function-calling-grok.py,14h-function-calling-azure.py,14i-function-calling-fireworks.py, and14j-function-calling-nvidia.py. -
In order to obtain the audio stored by the
AudioBufferProcessoryou can now also register anon_audio_dataevent handler. Theon_audio_datahandler will be called every timebuffer_size(a new constructor argument) is reached. Ifbuffer_sizeis 0 (default) you need to manually get the audio as before usingAudioBufferProcessor.merge_audio_buffers().
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(processor, audio, sample_rate, num_channels):
    await save_audio(audio, sample_rate, num_channels)
- Added a new RTVI message called
disconnect-bot, which when handled pushes anEndFrameto trigger the pipeline to stop.
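The buffer_size behavior described for the on_audio_data handler in this list can be modeled with a small threshold buffer. This is a sketch under assumptions: `ThresholdAudioBuffer` is a hypothetical class, not AudioBufferProcessor, and delivering the whole accumulated buffer once the threshold is crossed is a choice made for the illustration.

```python
import asyncio
from typing import Awaitable, Callable

AudioHandler = Callable[[bytes, int, int], Awaitable[None]]


class ThresholdAudioBuffer:
    """Illustrative buffer; not Pipecat's AudioBufferProcessor."""

    def __init__(self, handler: AudioHandler, sample_rate: int = 24000,
                 num_channels: int = 1, buffer_size: int = 0) -> None:
        self._handler = handler
        self._sample_rate = sample_rate
        self._num_channels = num_channels
        self._buffer_size = buffer_size
        self._buffer = bytearray()

    async def add_audio(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)
        # buffer_size == 0 disables automatic delivery; audio must be fetched manually.
        if self._buffer_size > 0 and len(self._buffer) >= self._buffer_size:
            await self.flush()

    async def flush(self) -> None:
        """Hand any accumulated audio to the handler and reset the buffer."""
        if self._buffer:
            audio = bytes(self._buffer)
            self._buffer.clear()
            await self._handler(audio, self._sample_rate, self._num_channels)
```

A handler with the same three data arguments as the one shown above (audio, sample_rate, num_channels) can be passed in directly.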
Changed
-
STTMuteFilternow supports multiple simultaneous muting strategies. -
XTTSServicelanguage now defaults toLanguage.EN. -
SoundfileMixer doesn't resample input files anymore to avoid startup delays. The sample rate of the provided sound files now needs to match the sample rate of the output transport. -
Input frames (audio, image and transport messages) are now system frames. This means they are processed immediately by all processors instead of being queued internally.
-
Expanded the transcriptions.language module to support a superset of languages.
-
Updated STT and TTS services with language options that match the supported languages for each service.
-
Updated the
AzureLLMServiceto use theOpenAILLMService. Updated theapi_versionto2024-09-01-preview. -
Updated the
FireworksLLMServiceto use theOpenAILLMService. Updated the default model toaccounts/fireworks/models/firefunction-v2. -
Updated the
simple-chatbotexample to include a Javascript and React client example, using RTVI JS and React.
Removed
- Removed
AppFrame. This was used as a special user custom frame, but there's actually no use case for that.
Fixed
-
Fixed a
ParallelPipelineissue that would cause system frames to be queued. -
Fixed
FastAPIWebsocketTransportso it can work with binary data (e.g. using the protobuf serializer). -
Fixed an issue in
CartesiaTTSServicethat could cause previous audio to be received after an interruption. -
Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket reconnection. Before, if an error occurred no reconnection was happening.
-
Fixed a
BaseOutputTransportissue that was causing audio to be discarded after anEndFramewas received. -
Fixed an issue in
WebsocketServerTransportandFastAPIWebsocketTransportthat would cause a busy loop when using audio mixer. -
Fixed a
DailyTransportandLiveKitTransportissue where connections were being closed in the input transport prematurely. This was causing frames queued inside the pipeline being discarded. -
Fixed an issue in
DailyTransportthat would cause some internal callbacks to not be executed. -
Fixed an issue where other frames were being processed while a
CancelFramewas being pushed down the pipeline. -
AudioBufferProcessornow handles interruptions properly. -
Fixed a
WebsocketServerTransportissue that would prevent interruptions withTwilioSerializerfrom working. -
DailyTransport.capture_participant_videonow allows capturing user's screen share by simply passingvideo_source="screenVideo". -
Fixed Google Gemini message handling to properly convert appended messages to Gemini's required format.
-
Fixed an issue with
FireworksLLMServicewhere chat completions were failing by removing thestream_optionsfrom the chat completion options.
[0.0.49] - 2024-11-17
Added
-
Added RTVI
on_bot_startedevent which is useful in a single turn interaction. -
Added
DailyTransporteventsdialin-connected,dialin-stopped,dialin-erroranddialin-warning. Needs daily-python >= 0.13.0. -
Added
RimeHttpTTSServiceand the07q-interruptible-rime.pyfoundational example. -
Added
STTMuteFilter, a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during bot speech. The processor supports multiple strategies:FIRST_SPEECH(mute only during bot's first speech),ALWAYS(mute during all bot speech), orCUSTOM(using provided callback). -
Added
STTMuteFrame, a control frame that enables/disables speech transcription in STT services.
[0.0.48] - 2024-11-10 "Antonio release"
Added
-
There's now an input queue in each frame processor. When you call
FrameProcessor.push_frame()this will internally callFrameProcessor.queue_frame()on the next processor (upstream or downstream) and the frame will be internally queued (except system frames). Then, the queued frames will get processed. With this input queue it is also possible for FrameProcessors to block processing more frames by callingFrameProcessor.pause_processing_frames(). The way to resume processing frames is by callingFrameProcessor.resume_processing_frames(). -
Added audio filter
NoisereduceFilter. -
Introduce input transport audio filters (
BaseAudioFilter). Audio filters can be used to remove background noises before audio is sent to VAD. -
Introduce output transport audio mixers (
BaseAudioMixer). Output transport audio mixers can be used, for example, to add background sounds or any other audio mixing functionality before the output audio is actually written to the transport. -
Added
GatedOpenAILLMContextAggregator. This aggregator keeps the last received OpenAI LLM context frame and it doesn't let it through until the notifier is notified. -
Added
WakeNotifierFilter. This processor expects a list of frame types and will execute a given callback predicate when a frame of any of those types is being processed. If the callback returns true the notifier will be notified. -
Added
NullFilter. A null filter doesn't push any frames upstream or downstream. This is usually used to disable one of the pipelines inParallelPipeline. -
Added
EventNotifier. This can be used as a very simple synchronization feature between processors. -
Added
TavusVideoService. This is an integration for Tavus digital twins. (see https://www.tavus.io/) -
Added
DailyTransport.update_subscriptions(). This allows you to have fine grained control of what media subscriptions you want for each participant in a room. -
Added audio filter
KrispFilter.
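The interplay between a notifier and a gated context aggregator, as described in this list, can be sketched with plain asyncio. `SimpleNotifier` and `ContextGate` are hypothetical stand-ins for illustration only; they are not the EventNotifier or GatedOpenAILLMContextAggregator implementations.

```python
import asyncio
from typing import Any, List, Optional


class SimpleNotifier:
    """Minimal notifier built on asyncio.Event (a stand-in, not Pipecat's class)."""

    def __init__(self) -> None:
        self._event = asyncio.Event()

    async def notify(self) -> None:
        self._event.set()

    async def wait(self) -> None:
        await self._event.wait()
        self._event.clear()  # re-arm for the next notification


class ContextGate:
    """Keeps only the most recent context and releases it once notified."""

    def __init__(self, notifier: SimpleNotifier) -> None:
        self._notifier = notifier
        self._last: Optional[Any] = None
        self.released: List[Any] = []

    def hold(self, context: Any) -> None:
        self._last = context  # newer contexts replace older ones

    async def run_once(self) -> None:
        await self._notifier.wait()
        if self._last is not None:
            self.released.append(self._last)
            self._last = None
```

In this model, a filter that decides the user has finished speaking would call `notify()`, and only the latest held context passes through the gate.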
Changed
-
The following
DailyTransportfunctions are nowasyncwhich means they need to be awaited:start_dialout,stop_dialout,start_recording,stop_recording,capture_participant_transcriptionandcapture_participant_video. -
Changed default output sample rate to 24000. This changes all TTS services to output at 24000 and also changes the default output transport sample rate. This improves audio quality at the cost of some extra bandwidth.
-
AzureTTSServicenow uses Azure websockets instead of HTTP requests. -
The previous
AzureTTSServiceHTTP implementation is nowAzureHttpTTSService.
Fixed
-
Websocket transports (FastAPI and Websocket) now synchronize with time before sending data. This allows for interruptions to just work out of the box.
-
Improved bot speaking detection for all TTS services by using actual bot audio.
-
Fixed an issue that was generating constant bot started/stopped speaking frames for HTTP TTS services.
-
Fixed an issue that was causing stuttering with AWS TTS service.
-
Fixed an issue with PlayHTTTSService, where the TTFB metrics were reporting very small time values.
-
Fixed an issue where AzureTTSService wasn't initializing the specified language.
Other
-
Add
23-bot-background-sound.pyfoundational example. -
Added a new foundational example
22-natural-conversation.py. This example shows how to achieve a more natural conversation by detecting when the user ends a statement.
[0.0.47] - 2024-10-22
Added
-
Added
AssemblyAISTTServiceand corresponding foundational examples07o-interruptible-assemblyai.pyand13d-assemblyai-transcription.py. -
Added a foundational example for Gladia transcription:
13c-gladia-transcription.py
Changed
-
Updated
GladiaSTTServiceto use the V2 API. -
Changed
DailyTransporttranscription model tonova-2-general.
Fixed
-
Fixed an issue that would cause an import error when importing
SileroVADAnalyzerfrom the old packagepipecat.vad.silero. -
Fixed
enable_usage_metricsto control LLM/TTS usage metrics separately fromenable_metrics.
[0.0.46] - 2024-10-19
Added
-
Added
audio_passthroughparameter toSTTService. If enabled it allows audio frames to be pushed downstream in case other processors need them. -
Added input parameter options for
PlayHTTTSServiceandPlayHTHttpTTSService.
Changed
-
Changed
DeepgramSTTServicemodel tonova-2-general. -
Moved
SileroVADaudio processor toprocessors.audio.vad. -
Module
utils.audiois nowaudio.utils. A newresample_audiofunction has been added. -
PlayHTTTSServicenow uses PlayHT websockets instead of HTTP requests. -
The previous
PlayHTTTSServiceHTTP implementation is nowPlayHTHttpTTSService. -
PlayHTTTSServiceandPlayHTHttpTTSServicenow use avoice_engineofPlayHT3.0-mini, which allows for multi-lingual support. -
Renamed
OpenAILLMServiceRealtimeBetatoOpenAIRealtimeBetaLLMServiceto match other services.
Deprecated
-
LLMUserResponseAggregatorandLLMAssistantResponseAggregatorare mostly deprecated, useOpenAILLMContextinstead. -
The
vad package is now deprecated and audio.vad should be used instead. The vad package will get removed in a future release.
Fixed
-
Fixed an issue that would cause an error if no VAD analyzer was passed to
LiveKitTransportparams. -
Fixed
SileroVADprocessor to support interruptions properly.
Other
- Added
examples/foundational/07-interruptible-vad.py. This is the same as07-interruptible.pybut using theSileroVADprocessor instead of passing theVADAnalyzerin the transport.
[0.0.45] - 2024-10-16
Changed
- Metrics messages have moved out from the transport's base output into RTVI.
[0.0.44] - 2024-10-15
Added
-
Added support for OpenAI Realtime API with the new
OpenAILLMServiceRealtimeBetaprocessor. (see https://platform.openai.com/docs/guides/realtime/overview) -
Added
RTVIBotTranscriptionProcessorwhich will send the RTVIbot-transcriptionprotocol message. These are TTS text aggregated (into sentences) messages. -
Added new input params to the
MarkdownTextFilterutility. You can setfilter_codeto filter code from text andfilter_tablesto filter tables from text. -
Added
CanonicalMetricsService. This processor uses the newAudioBufferProcessorto capture conversation audio and later send it to Canonical AI. (see https://canonical.chat/) -
Added
AudioBufferProcessor. This processor can be used to buffer mixed user and bot audio. This can later be saved into an audio file or processed by some audio analyzer. -
Added
on_first_participant_joinedevent toLiveKitTransport.
Changed
-
LLM text responses are now logged properly as unicode characters.
-
UserStartedSpeakingFrame, UserStoppedSpeakingFrame, BotStartedSpeakingFrame, BotStoppedSpeakingFrame, BotSpeakingFrame and UserImageRequestFrame are now based on SystemFrame.
Fixed
-
Merge
RTVIBotLLMProcessor/RTVIBotLLMTextProcessorandRTVIBotTTSProcessor/RTVIBotTTSTextProcessorto avoid out of order issues. -
Fixed an issue in RTVI protocol that could cause a
bot-llm-stoppedorbot-tts-stoppedmessage to be sent before abot-llm-textorbot-tts-textmessage. -
Fixed
DeepgramSTTServiceconstructor settings not being merged with default ones. -
Fixed an issue in Daily transport that would cause tasks to be hanging if urgent transport messages were being sent from a transport event handler.
-
Fixed an issue in
BaseOutputTransport that would cause EndFrame to be pushed downstream too early and call FrameProcessor.cleanup() before letting the transport stop properly.
[0.0.43] - 2024-10-10
Added
-
Added a new util called
MarkdownTextFilterwhich is a subclass of a new base class calledBaseTextFilter. This is a configurable utility which is intended to filter text received by TTS services. -
Added new
RTVIUserLLMTextProcessor. This processor will send an RTVI user-llm-text message with the user content that was sent to the LLM.
Changed
-
TransportMessageFrame doesn't have an urgent field anymore. Instead, there's now a TransportMessageUrgentFrame, which is a SystemFrame and therefore skips all internal queuing. -
TTS services now convert input languages to match each service's language format.
Fixed
- Fixed an issue where changing a language with the Deepgram STT service wouldn't apply the change. This was fixed by disconnecting and reconnecting when the language changes.
[0.0.42] - 2024-10-02
Added
-
SentryMetricshas been added to report frame processor metrics to Sentry. This is now possible becauseFrameProcessorMetricscan now be passed toFrameProcessor. -
Added Google TTS service and corresponding foundational example
07n-interruptible-google.py -
Added AWS Polly TTS support and
07m-interruptible-aws.pyas an example. -
Added InputParams to Azure TTS service.
-
Added
LivekitTransport(audio-only for now). -
RTVI 0.2.0 is now supported.
-
All
FrameProcessorscan now register event handlers.
tts = SomeTTSService(...)
@tts.event_handler("on_connected")
async def on_connected(processor):
    ...
-
Added
AsyncGeneratorProcessor. This processor can be used together with aFrameSerializeras an async generator. It provides agenerator()function that returns anAsyncGeneratorand that yields serialized frames. -
Added
EndTaskFrameandCancelTaskFrame. These are new frames that are meant to be pushed upstream to tell the pipeline task to stop nicely or immediately respectively. -
Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services along with corresponding setter functions.
-
Added
sample_rateas a constructor parameter for TTS services. -
Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames.
To have a deterministic behavior the frames traveling through the pipeline should always be ordered, except system frames which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task.
In this version all the frame processors have their own task to push frames. That is, when
push_frame()is called the given frame will be put into an internal queue (with the exception of system frames) and a frame processor task will push it out. -
Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional
pts field (presentation timestamp). There's currently just one clock implementation, SystemClock, and the pts field is currently only used for TextFrames (audio and image frames will be next). -
A clock can now be specified to
PipelineTask(defaults toSystemClock). This clock will be passed to each frame processor via theStartFrame. -
Added
CartesiaHttpTTSService. -
DailyTransportnow supports setting the audio bitrate to improve audio quality through theDailyParams.audio_out_bitrateparameter. The new default is 96kbps. -
DailyTransportnow uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed. -
Interruptions support has been added to
TwilioFrameSerializerwhen usingFastAPIWebsocketTransport. -
Added new
LmntTTSServicetext-to-speech service. (see https://www.lmnt.com/) -
Added
TTSModelUpdateFrame,TTSLanguageUpdateFrame,STTModelUpdateFrame, andSTTLanguageUpdateFrameframes to allow you to switch models, language and voices in TTS and STT services. -
Added new
transcriptions.Languageenum.
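The per-processor push task described above can be pictured with a small standard-library sketch. This is not Pipecat's implementation: the `Frame`, `SystemFrame` and `QueuedProcessor` classes below are hypothetical stand-ins that only show the idea of queuing regular frames while letting system frames bypass the queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    name: str


@dataclass
class SystemFrame(Frame):
    """Out-of-band frame (hypothetical stand-in, not the real class)."""


class QueuedProcessor:
    """Pushes regular frames from a single task; system frames skip the queue."""

    def __init__(self) -> None:
        self.output = []  # stands in for "the next processor"
        self._queue = asyncio.Queue()
        self._task = None

    async def start(self) -> None:
        self._task = asyncio.create_task(self._push_loop())

    async def push_frame(self, frame: Frame) -> None:
        if isinstance(frame, SystemFrame):
            # System frames are delivered immediately, out of band.
            self.output.append(frame)
        else:
            # Everything else is ordered through the internal queue.
            await self._queue.put(frame)

    async def _push_loop(self) -> None:
        while True:
            frame = await self._queue.get()
            self.output.append(frame)
            self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()  # let queued frames drain first
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo() -> list:
    processor = QueuedProcessor()
    await processor.start()
    await processor.push_frame(Frame("a"))
    await processor.push_frame(Frame("b"))
    await processor.push_frame(SystemFrame("interrupt"))
    await processor.stop()
    return [frame.name for frame in processor.output]
```

Running `asyncio.run(demo())` shows the queued frames keeping their relative order while the system frame is not held behind them.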
Changed
-
Context frames are now pushed downstream from assistant context aggregators.
-
Removed Silero VAD torch dependency.
-
Updated individual update settings frame classes into a single
ServiceUpdateSettingsFrameclass. -
We now distinguish between input and output audio and image frames. We introduce
InputAudioRawFrame,OutputAudioRawFrame,InputImageRawFrameandOutputImageRawFrame(and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline and they are allowed to be sent by the output transport. -
ParallelTaskhas been renamed toSyncParallelPipeline. ASyncParallelPipelineis a frame processor that contains a list of different pipelines to be executed concurrently. The difference between aSyncParallelPipelineand aParallelPipelineis that, given an input frame, theSyncParallelPipelinewill wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response). -
StartFrame is a system frame again to make sure it's processed immediately by all processors. EndFrame stays a control frame since it needs to be ordered, allowing the frames in the pipeline to be processed. -
Updated
MoondreamServicerevision to2024-08-26. -
CartesiaTTSServiceandElevenLabsTTSServicenow add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamp but they should be played at roughly the same time. -
DailyTransport.on_joinedevent now returns the full session data instead of just the participant. -
CartesiaTTSServiceis now a subclass ofTTSService. -
DeepgramSTTServiceis now a subclass ofSTTService. -
WhisperSTTService is now a subclass of SegmentedSTTService. A SegmentedSTTService is an STTService where the provided audio is given in a big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.
Fixed
-
Fixed OpenAI multiple function calls.
-
Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
-
Fixed a
BaseOutputTransport issue that would stop audio and video rendering tasks (after receiving an EndFrame) before the internal queue was emptied, causing the pipeline to finish prematurely. -
StartFrameshould be the first frame every processor receives to avoid situations where things are not initialized (because initialization happens onStartFrame) and other frames come in resulting in undesired behavior.
Performance
obj_id() and obj_count() now use itertools.count, avoiding the need for threading.Lock.
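As a rough illustration of the `itertools.count` approach (a sketch only; `next_obj_id` and `_ID_COUNTER` are hypothetical names, not Pipecat's actual code): in CPython, advancing an `itertools.count` is a single C-level call made under the GIL, which is the property commonly relied on to skip an explicit lock.

```python
import itertools

# Hypothetical module-level counter; the name is illustrative only.
_ID_COUNTER = itertools.count()


def next_obj_id() -> int:
    """Return the next unique id without taking an explicit lock."""
    return next(_ID_COUNTER)
```

Each call returns a value one greater than the previous call, so ids stay unique for the lifetime of the process.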
Other
- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).
[0.0.41] - 2024-08-22
Added
- Added
LivekitFrameSerializeraudio frame serializer.
Fixed
-
Fix
FastAPIWebsocketOutputTransportvariable name clash with subclass. -
Fix an
AnthropicLLMServiceissue with empty arguments in function calling.
Other
- Fixed
studypalexample errors.
[0.0.40] - 2024-08-20
Added
-
VAD parameters can now be dynamically updated using the
VADParamsUpdateFrame. -
ErrorFrame now has a fatal field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A new FatalErrorFrame that sets this flag to true has been added. -
AnthropicLLMServicenow supports function calling and initial support for prompt caching. (see https://www.anthropic.com/news/prompt-caching) -
ElevenLabsTTSServicecan now specify ElevenLabs input parameters such asoutput_format. -
TwilioFrameSerializercan now specify Twilio's and Pipecat's desired sample rates to use. -
Added new
on_participant_updatedevent toDailyTransport. -
Added
DailyRESTHelper.delete_room_by_name()andDailyRESTHelper.delete_room_by_url(). -
Added LLM and TTS usage metrics. Those are enabled when
PipelineParams.enable_usage_metricsis True. -
AudioRawFrames are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline. -
Added new
GStreamerPipelineSource. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP stream or anything supported by GStreamer). -
Added
TransportParams.audio_out_is_live. This flag is False by default and it is useful to indicate we should not synchronize audio with sporadic images. -
Added new
BotStartedSpeakingFrameandBotStoppedSpeakingFramecontrol frames. These frames are pushed upstream and they should wrapBotSpeakingFrame. -
Transports now allow you to register event handlers without decorators.
Changed
-
Support RTVI message protocol 0.1. This includes new messages, support for messages responses, support for actions, configuration, webhooks and a bunch of new cool stuff. (see https://docs.rtvi.ai/)
-
SileroVADdependency is now imported via pip'ssilero-vadpackage. -
ElevenLabsTTSServicenow useseleven_turbo_v2_5model by default. -
BotSpeakingFrameis now a control frame. -
StartFrameis now a control frame similar toEndFrame. -
DeepgramTTSServicenow is more customizable. You can adjust the encoding and sample rate.
Fixed
-
TTSStartFrameandTTSStopFrameare now sent when TTS really starts and stops. This allows for knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia). -
Fixed
AzureSTTServicetranscription frame timestamps. -
Fixed an issue with
DailyRESTHelper.create_room()expirations which would cause this function to stop working after the initial expiration elapsed. -
Improved
EndFrameandCancelFramehandling.EndFrameshould end things gracefully while aCancelFrameshould cancel all running tasks as soon as possible. -
Fixed an issue in
AIServicethat would cause a yieldedNonevalue to be processed. -
RTVI's
bot-readymessage is now sent when the RTVI pipeline is ready and a first participant joins. -
Fixed a
BaseInputTransportissue that was causing incoming system frames to be queued instead of being pushed immediately. -
Fixed a
BaseInputTransport issue that was causing incoming start/stop interruption frames to not cancel tasks and not be processed properly.
Other
-
Added
studypal example (from the Cartesia folks!). -
Most examples now use Cartesia.
-
Added examples
foundational/19a-tools-anthropic.py,foundational/19b-tools-video-anthropic.pyandfoundational/19a-tools-togetherai.py. -
Added examples
foundational/18-gstreamer-filesrc.pyandfoundational/18a-gstreamer-videotestsrc.pythat show how to useGStreamerPipelineSource -
Remove
requestslibrary usage. -
Cleanup examples and use
DailyRESTHelper.
[0.0.39] - 2024-07-23
Fixed
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
[0.0.38] - 2024-07-23
Added
-
Added
force_reload,skip_validationandtrust_repotoSileroVADandSileroVADAnalyzer. This allows caching and various GitHub repo validations. -
Added
send_initial_empty_metrics flag to PipelineParams to request initial empty metrics (zero values). True by default.
Fixed
-
Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.
-
STT services should be using ISO 8601 time format for transcription frames.
-
Fixed an issue that would cause Daily transport to show a stop transcription error when actually none occurred.
[0.0.37] - 2024-07-22
Added
-
Added
RTVIProcessorwhich implements the RTVI-AI standard. See https://github.com/rtvi-ai -
Added
BotInterruptionFramewhich allows interrupting the bot while talking. -
Added
LLMMessagesAppendFramewhich allows appending messages to the current LLM context. -
Added
LLMMessagesUpdateFramewhich allows changing the LLM context for the one provided in this new frame. -
Added
LLMModelUpdateFramewhich allows updating the LLM model. -
Added
TTSSpeakFrame which causes the bot to say some text. This text will not be part of the LLM context. -
Added
TTSVoiceUpdateFramewhich allows updating the TTS voice.
Removed
- We remove the
LLMResponseStartFrameandLLMResponseEndFrameframes. These were added in the past to properly handle interruptions for theLLMAssistantContextAggregator. But theLLMContextAggregatoris now based onLLMResponseAggregatorwhich handles interruptions properly by just processing theStartInterruptionFrame, so there's no need for these extra frames any more.
Fixed
-
Fixed an issue with
StatelessTextTransformerwhere it was pushing a string instead of aTextFrame. -
TTSServiceend of sentence detection has been improved. It now works with acronyms, numbers, hours and others. -
Fixed an issue in
TTSServicethat would not properly flush the current aggregated sentence if anLLMFullResponseEndFramewas found.
Performance
CartesiaTTSServicenow uses websockets which improves speed. It also leverages the new Cartesia contexts which maintains generated audio prosody when multiple inputs are sent, therefore improving audio quality a lot.
[0.0.36] - 2024-07-02
Added
-
Added
GladiaSTTService. See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition -
Added
XTTSService. This is a local Text-To-Speech service. See https://github.com/coqui-ai/TTS -
Added
UserIdleProcessor. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout a provided callback is called. -
Added
IdleFrameProcessor. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout a provided callback is called. -
Added new frame
BotSpeakingFrame. This frame will be continuously pushed upstream while the bot is talking. -
It is now possible to specify a Silero VAD version when using
SileroVADAnalyzerorSileroVAD. -
Added
AsyncFrameProcessor and AsyncAIService. Some services like DeepgramSTTService need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's what AsyncFrameProcessor is for. It creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready, that transcription will also be pushed from this internal task. -
The
MetricsFrame now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate this kind of metric.
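The idle-timeout behavior described for UserIdleProcessor and IdleFrameProcessor can be sketched with plain asyncio. This is a minimal sketch under stated assumptions, not the actual processors: `IdleWatcher`, `notify_activity` and `run` are hypothetical names used only for illustration.

```python
import asyncio


class IdleWatcher:
    """Calls `callback` whenever no activity is seen within `timeout` seconds."""

    def __init__(self, callback, timeout: float) -> None:
        self._callback = callback
        self._timeout = timeout
        self._activity = asyncio.Event()

    def notify_activity(self) -> None:
        # Would be called whenever a frame (or user speech) is observed.
        self._activity.set()

    async def run(self) -> None:
        while True:
            try:
                await asyncio.wait_for(self._activity.wait(), timeout=self._timeout)
                self._activity.clear()  # activity seen, restart the timer
            except asyncio.TimeoutError:
                await self._callback()  # nothing happened in time


async def demo() -> int:
    calls = []

    async def on_idle() -> None:
        calls.append("idle")

    watcher = IdleWatcher(on_idle, timeout=0.05)
    task = asyncio.create_task(watcher.run())
    await asyncio.sleep(0.3)  # no activity at all during this window
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(calls)
```

With no activity reported, the callback fires at least once during the demo window; calling `notify_activity()` before the timeout elapses would restart the wait instead.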
Changed
-
WhisperSTTServicemodel can now also be a string. -
Added missing * keyword separators in services.
Fixed
-
WebsocketServerTransport doesn't try to send frames anymore if the serializer returns None. -
Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
-
Fixed an issue in
FastAPIWebsocketTransportwhere it would still try to send data to the websocket after being closed.
Other
-
Added Fly.io deployment example in
examples/deployment/flyio-example. -
Added new
17-detect-user-idle.pyexample that shows how to use the newUserIdleProcessor.
[0.0.35] - 2024-06-28
Changed
-
FastAPIWebsocketParamsnow require a serializer. -
TwilioFrameSerializernow requires astreamSid.
Fixed
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for 8000 sample rate.
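The two frame counts above correspond to the same window duration at both sample rates. A small sketch of that arithmetic follows; the table and function names are hypothetical and only restate the numbers given in the entry.

```python
# Frame counts taken from the changelog entry above; names are illustrative.
SILERO_WINDOW_FRAMES = {16000: 512, 8000: 256}


def vad_window_frames(sample_rate: int) -> int:
    """Return the number of frames expected per VAD window."""
    try:
        return SILERO_WINDOW_FRAMES[sample_rate]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {sample_rate}") from None


def vad_window_ms(sample_rate: int) -> float:
    """Duration of one VAD window in milliseconds."""
    return 1000 * vad_window_frames(sample_rate) / sample_rate
```

Both supported rates work out to a 32 ms window (512 / 16000 s and 256 / 8000 s).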
[0.0.34] - 2024-06-25
Fixed
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause interruptions to ignore transcriptions.
-
Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.
[0.0.33] - 2024-06-25
Changed
- Upgraded to Cartesia's new Python library 1.0.0.
CartesiaTTSServicenow expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audiosample_rateandencodinginstead of the previousoutput_format.
Fixed
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLM sentences.
-
Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
-
Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
[0.0.32] - 2024-06-22
Added
-
Allow specifying a
DeepgramSTTServiceurl which allows using on-prem Deepgram. -
Added new
FastAPIWebsocketTransport. This is a new websocket transport that can be integrated with FastAPI websockets. -
Added new
TwilioFrameSerializer. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio. -
Added Daily transport event:
on_dialout_answered. See https://reference-python.daily.co/api_reference.html#daily.EventHandler -
Added new
AzureSTTService. This allows you to use Azure Speech-To-Text.
Performance
- Convert
BaseInputTransport and BaseOutputTransport to fully use asyncio and remove the use of threads.
Other
-
Added
twilio-chatbot. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot. -
Updated
07f-interruptible-azure.pyto useAzureLLMService,AzureSTTServiceandAzureTTSService.
[0.0.31] - 2024-06-13
Performance
- Break long audio frames into 20ms chunks instead of 10ms.
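To make the chunk size concrete, here is a minimal sketch of splitting raw audio into 20ms pieces. It assumes 16-bit PCM, which the entry does not state, and `chunk_audio` is a hypothetical helper rather than the transport's actual code.

```python
from typing import List


def chunk_audio(
    audio: bytes, sample_rate: int, num_channels: int = 1, chunk_ms: int = 20
) -> List[bytes]:
    """Split raw PCM audio into chunks of `chunk_ms` milliseconds."""
    bytes_per_sample = 2  # assumption: 16-bit PCM
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * num_channels * bytes_per_sample
    return [audio[i : i + chunk_size] for i in range(0, len(audio), chunk_size)]
```

Under these assumptions, one second of 16 kHz mono audio (32000 bytes) yields 50 chunks of 640 bytes each.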
[0.0.30] - 2024-06-13
Added
-
Added
report_only_initial_ttfbtoPipelineParams. This will make it so only the initial TTFB metrics after the user stops talking are reported. -
Added
OpenPipeLLMService. This service will let you run OpenAI through OpenPipe's SDK. -
Allow specifying frame processors' name through a new
nameconstructor argument. -
Added
DeepgramSTTService. This service has an ongoing websocket connection. To handle this, it subclassesAIServiceinstead ofSTTService. The output of this service will be pushed from the same task, except system frames likeStartFrame,CancelFrameorStartInterruptionFrame.
Changed
-
FrameSerializer.deserialize() can now return None in case it is not possible to deserialize the given data. -
daily_rest.DailyRoomPropertiesnow allows extra unknown parameters.
Fixed
-
Fixed an issue where
DailyRoomProperties.expalways had the same old timestamp unless set by the user. -
Fixed a couple of issues with
WebsocketServerTransport. It needed to usepush_audio_frame()and also VAD was not working properly. -
Fixed an issue that would cause LLM aggregator to fail with small
VADParams.stop_secsvalues. -
Fixed an issue where
BaseOutputTransportwould send longer audio frames preventing interruptions.
Other
-
Added new
07h-interruptible-openpipe.pyexample. This example shows how to use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe. -
Added new
dialin-chatbot example. This example shows how to call the bot using a phone number.
[0.0.29] - 2024-06-07
Added
-
Added a new
FunctionFilter. This filter will let you filter frames based on a given function, except system messages which should never be filtered. -
Added
FrameProcessor.can_generate_metrics()method to indicate if a processor can generate metrics. In the future this might get an extra argument to ask for a specific type of metric. -
Added
BasePipeline. All pipeline classes should be based on this class. All subclasses should implement aprocessors_with_metrics()method that returns a list of allFrameProcessors in the pipeline that can generate metrics. -
Added
enable_metricstoPipelineParams. -
Added
MetricsFrame. The MetricsFrame will report different metrics in the system. Right now, it can report TTFB (Time To First Byte) values for different services, that is, the time between the arrival of a Frame at the processor/service and the moment the first DataFrame is pushed downstream. If metrics are enabled, an initial MetricsFrame with all the services in the pipeline will be sent. -
Added TTFB metrics and debug logging for TTS services.
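The TTFB definition above (time from a frame arriving at a service until the first data frame is pushed downstream) can be sketched as a tiny timer. This is an illustrative sketch only; `TTFBTimer` and its method names are hypothetical and do not reflect how Pipecat records metrics internally.

```python
import time
from typing import Optional


class TTFBTimer:
    """Measures the time from frame arrival to the first pushed data frame."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.value: Optional[float] = None  # seconds, once measured

    def frame_arrived(self) -> None:
        # A new request starts: reset and remember the arrival time.
        self._start = time.monotonic()
        self.value = None

    def data_pushed(self) -> None:
        # Only the first pushed data frame counts towards TTFB.
        if self._start is not None and self.value is None:
            self.value = time.monotonic() - self._start
```

Later calls to `data_pushed()` for the same request leave the recorded value unchanged, so only the first byte is measured.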
Changed
- Moved
ParallelTasktopipecat.pipeline.parallel_task.
Fixed
- Fixed PlayHT TTS service to work properly async.
[0.0.28] - 2024-06-05
Fixed
- Fixed an issue with
SileroVADAnalyzerthat would cause memory to keep growing indefinitely.
[0.0.27] - 2024-06-05
Added
- Added
DailyTransport.participants()andDailyTransport.participant_counts().
[0.0.26] - 2024-06-05
Added
-
Added
OpenAITTSService. -
Allow passing
output_formatandmodel_idtoCartesiaTTSServiceto change audio sample format and the model to use. -
Added
DailyRESTHelperwhich helps you create Daily rooms and tokens in an easy way. -
PipelineTask now has a has_finished() method to indicate if the task has completed. If a task has never been run, has_finished() will return False. -
PipelineRunnernow supports SIGTERM. If received, the runner will be cancelled.
Fixed
-
Fixed an issue where
BaseInputTransport and BaseOutputTransport were stopping push tasks before pushing EndFrame frames, which could cause the bots to get stuck. -
Fixed an error closing local audio transports.
-
Fixed an issue with Deepgram TTS that was introduced in the previous release.
-
Fixed
AnthropicLLMService interruptions. If an interruption occurred, a user message could be appended after the previous user message. Anthropic does not allow that because it requires alternating user and assistant messages.
Performance
-
The
BaseInputTransportdoes not pull audio frames from sub-classes any more. Instead, sub-classes now push audio frames into a queue in the base class. Also,DailyInputTransportnow pushes audio frames every 20ms instead of 10ms. -
Remove redundant camera input thread from
DailyInputTransport. This should improve performance a little bit when processing participant videos. -
Load Cartesia voice on startup.
[0.0.25] - 2024-05-31
Added
-
Added WebsocketServerTransport. This will create a websocket server and will read messages coming from a client. The messages are serialized/deserialized with protobufs. See
examples/websocket-serverfor a detailed example. -
Added function calling (LLMService.register_function()). This will allow the LLM to call functions you have registered when needed. For example, if you register a function to get the weather in Los Angeles and ask the LLM about the weather in Los Angeles, the LLM will call your function. See https://platform.openai.com/docs/guides/function-calling
-
Added new
LangchainProcessor. -
Added Cartesia TTS support (https://cartesia.ai/)
Fixed
-
Fixed SileroVAD frame processor.
-
Fixed an issue where
camera_out_enabled would cause high CPU usage if no image was provided.
Performance
- Removed unnecessary audio input tasks.
[0.0.24] - 2024-05-29
Added
-
Exposed
on_dialin_readyfor Daily transport SIP endpoint handling. This notifies when the Daily room SIP endpoints are ready. This allows integrating with third-party services like Twilio. -
Exposed Daily transport
on_app_messageevent. -
Added Daily transport
on_call_state_updatedevent. -
Added Daily transport
start_recording(), stop_recording() and stop_dialout().
Changed
-
Added
PipelineParams. This replaces the allow_interruptions argument in PipelineTask and will allow new parameters to be added in the future. -
Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
-
GoogleLLMService
api_keyargument is now mandatory.
Fixed
-
Daily transport
dialin-ready doesn't block anymore and it now handles timeouts. -
Fixed AzureLLMService.
[0.0.23] - 2024-05-23
Fixed
- Fixed an issue handling Daily transport
dialin-readyevent.
[0.0.22] - 2024-05-23
Added
-
Added Daily transport
start_dialout()to be able to make phone or SIP calls. See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout -
Added Daily transport support for dial-in use cases.
-
Added Daily transport events:
on_dialout_connected,on_dialout_stopped,on_dialout_errorandon_dialout_warning. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
[0.0.21] - 2024-05-22
Added
-
Added vision support to Anthropic service.
-
Added
WakeCheckFilterwhich allows you to pass information downstream only if you say a certain phrase/word.
Changed
-
FrameSerializer.serialize()andFrameSerializer.deserialize()are nowasync. -
Filterhas been renamed toFrameFilterand it's now underprocessors/filters.
Fixed
-
Fixed Anthropic service to use new frame types.
-
Fixed an issue in
LLMUserResponseAggregatorandUserResponseAggregatorthat would cause frames after a brief pause to not be pushed to the LLM. -
Clear the audio output buffer if we are interrupted.
-
Re-add exponential smoothing after volume calculation. This makes sure the volume value being used doesn't fluctuate so much.
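For readers unfamiliar with the technique, exponential smoothing blends each new volume reading with the previous smoothed value so that sudden spikes are dampened. The sketch below shows the generic formula only; the function names and the smoothing factor are illustrative assumptions, not the values Pipecat uses.

```python
from typing import Iterable, List


def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
    """Blend a new reading into the previous smoothed value (0 < factor <= 1)."""
    return prev_value + factor * (value - prev_value)


def smooth(values: Iterable[float], factor: float) -> List[float]:
    """Apply exponential smoothing over a sequence, seeded with the first value."""
    smoothed: List[float] = []
    for value in values:
        prev = smoothed[-1] if smoothed else value
        smoothed.append(exp_smoothing(value, prev, factor))
    return smoothed
```

A smaller factor reacts more slowly to changes, which keeps the volume used for speech detection from fluctuating as much.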
[0.0.20] - 2024-05-22
Added
- To improve interruptions we now compute a loudness level using pyloudnorm. Audio coming from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm applied to the signal; however, we don't do that on our local PyAudio signals. This means that incoming audio from PyAudio is currently kind of broken. We will fix it in future releases.
Fixed
-
Fixed an issue where
StartInterruptionFrame would cause LLMUserResponseAggregator to push the accumulated text, causing the LLM to respond in the wrong task. The StartInterruptionFrame should not trigger any new LLM response because that would be spoken in a different task. -
Fixed an issue where tasks and threads could be paused because the executor didn't have more tasks available. This was causing issues when cancelling and recreating tasks during interruptions.
[0.0.19] - 2024-05-20
Changed
LLMUserResponseAggregatorandLLMAssistantResponseAggregatorinternal messages are now exposed through themessagesproperty.
Fixed
- Fixed an issue where
LLMAssistantResponseAggregatorwas not accumulating the full response but short sentences instead. If there's an interruption we only accumulate what the bot has spoken until now in a long response as well.
[0.0.18] - 2024-05-20
Fixed
- Fixed an issue in
DailyOutputTransport where transport messages were not being sent.
[0.0.17] - 2024-05-19
Added
-
Added
google.generativeaimodel support, including vision. This newgoogleservice defaults to usinggemini-1.5-flash-latest. Example inexamples/foundational/12a-describe-video-gemini-flash.py. -
Added vision support to
openaiservice. Example inexamples/foundational/12a-describe-video-gemini-flash.py. -
Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
-
Added
VADParamsso you can control voice confidence level and others. -
VADAnalyzernow uses an exponential smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.
Fixed
-
Fixed an issue where TTSService was not pushing TextFrames downstream.
-
Fixed issues with Ctrl-C program termination.
-
Fixed an issue that was causing
StopTaskFrameto actually not exit thePipelineTask.
[0.0.16] - 2024-05-16
Fixed
-
DailyTransport: don't publish camera and audio tracks if not enabled. -
Fixed an issue in
BaseInputTransport that was causing frames pushed downstream to not be pushed in the right order.
[0.0.15] - 2024-05-15
Fixed
- Quick hot fix for receiving
DailyTransportMessage.
[0.0.14] - 2024-05-15
Added
-
Added
DailyTransporteventon_participant_left. -
Added support for receiving
DailyTransportMessage.
Fixed
-
Images are now resized to the size of the output camera. The size mismatch was causing images to not be displayed. -
-
Fixed an issue in
DailyTransportthat would not allow the input processor to shutdown if no participant ever joined the room. -
Fixed base transports start and stop. In some situations processors would halt or not shut down properly.
[0.0.13] - 2024-05-14
Changed
-
MoondreamServiceargumentmodel_idis nowmodel. -
VADAnalyzerarguments have been renamed for more clarity.
Fixed
-
Fixed an issue with
DailyInputTransportandDailyOutputTransportthat could cause some threads to not start properly. -
Fixed
STTService. Add max_silence_secs and max_buffer_secs to better handle what's being passed to the STT service. Also add exponential smoothing to the RMS. -
Fixed
WhisperSTTService. Addno_speech_probto avoid garbage output text.
[0.0.12] - 2024-05-14
Added
- Added
DailyTranscriptionSettings to be able to specify transcription settings (e.g. language) more easily.
Other
-
Updated
simple-chatbotwith Spanish. -
Add missing dependencies in some of the examples.
[0.0.11] - 2024-05-13
Added
- Allow stopping pipeline tasks with new
StopTaskFrame.
Changed
- TTS, STT and image generation service now use
AsyncGenerator.
Fixed
DailyTransport: allow registering for participant transcriptions even if input transport is not initialized yet.
Other
- Updated
storytelling-chatbot.
[0.0.10] - 2024-05-13
Added
-
Added Intel GPU support to
MoondreamService. -
Added support for sending transport messages (e.g. to communicate with an app at the other end of the transport).
-
Added
FrameProcessor.push_error()to easily send anErrorFrameupstream.
Fixed
- Fixed Azure services (TTS and image generation).
Other
- Updated
simple-chatbot,moondream-chatbotandtranslation-chatbotexamples.
[0.0.9] - 2024-05-12
Changed
Many things have changed in this version. Many of the main ideas such as frames, processors, services and transports are still there but some things have changed a bit.
-
Frames describe the basic units for processing. For example, text, image or audio frames. Or control frames to indicate a user has started or stopped speaking. -
FrameProcessors process frames (e.g. they convert aTextFrameto anImageRawFrame) and push new frames downstream or upstream to their linked peers. -
FrameProcessors can be linked together. The easiest way is to use the Pipeline, which is a container for processors. Linking processors allows frames to travel upstream or downstream easily. -
Transports are a way to send or receive frames. There can be local transports (e.g. local audio or native apps), network transports (e.g. websocket) or service transports (e.g. https://daily.co). -
Pipelines are just a processor container for other processors. -
A
PipelineTask knows how to run a pipeline. -
A
PipelineRunnercan run one or more tasks and it is also used, for example, to capture Ctrl-C from the user.
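The relationship between frames, processors and pipelines described above can be sketched in a few lines. This is a deliberately simplified, standard-library model and not Pipecat's API: the `Processor`, `Uppercase` and `Sink` classes and the `link_all` helper are hypothetical, and plain strings stand in for frames.

```python
import asyncio
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


class Processor:
    """Minimal linked processor: receives a frame and pushes it to a peer."""

    def __init__(self) -> None:
        self._next = None
        self._prev = None

    def link(self, other: "Processor") -> None:
        self._next = other
        other._prev = self

    async def process_frame(self, frame, direction: Direction) -> None:
        await self.push_frame(frame, direction)  # default: pass through

    async def push_frame(self, frame, direction: Direction) -> None:
        peer = self._next if direction is Direction.DOWNSTREAM else self._prev
        if peer is not None:
            await peer.process_frame(frame, direction)


class Uppercase(Processor):
    async def process_frame(self, frame, direction: Direction) -> None:
        if isinstance(frame, str):  # plain strings stand in for text frames
            frame = frame.upper()
        await self.push_frame(frame, direction)


class Sink(Processor):
    def __init__(self) -> None:
        super().__init__()
        self.received = []

    async def process_frame(self, frame, direction: Direction) -> None:
        self.received.append(frame)


def link_all(processors) -> None:
    """What a pipeline container does conceptually: link neighbors in order."""
    for current, following in zip(processors, processors[1:]):
        current.link(following)


async def demo() -> list:
    upper, sink = Uppercase(), Sink()
    link_all([upper, sink])
    await upper.process_frame("hello", Direction.DOWNSTREAM)
    return sink.received
```

In this model the pipeline is nothing more than the act of linking processors in order, which matches the "container for processors" description in the list above.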
[0.0.8] - 2024-04-11
Added
-
Added
FireworksLLMService. -
Added
InterimTranscriptionFrameand enable interim results inDailyTransporttranscriptions.
Changed
FalImageGenServicenow uses newfal_clientpackage.
Fixed
-
FalImageGenService: useasyncio.to_threadto not block main loop when generating images. -
Allow
TranscriptionFrameafter an end frame (transcriptions can be delayed and received afterUserStoppedSpeakingFrame).
[0.0.7] - 2024-04-10
Added
- Add
use_cpuargument toMoondreamService.
[0.0.6] - 2024-04-10
Added
-
Added
FalImageGenService.InputParams. -
Added
URLImageFrameandUserImageFrame. -
Added
UserImageRequestFrameand allow requesting an image from a participant. -
Added base
VisionServiceandMoondreamService
Changed
-
Don't pass
image_sizetoImageGenService, images should have their own size. -
ImageFramenow receives a tuple(width,height)to specify the size. -
on_first_other_participant_joinednow gets a participant argument.
Fixed
- Check if camera, speaker and microphone are enabled before writing to them.
Performance
DailyTransport only subscribes to the desired participant video track.
[0.0.5] - 2024-04-06
Changed
-
Use
camera_bitrateandcamera_framerate. -
Increase
camera_framerateto 30 by default.
Fixed
- Fixed
LocalTransport.read_audio_frames.
[0.0.4] - 2024-04-04
Added
- Added project optional dependencies
[silero,openai,...].
Changed
-
Moved transports to their own directory.
-
Use
OPENAI_API_KEYinstead ofOPENAI_CHATGPT_API_KEY.
Fixed
- Don't write to microphone/speaker if not enabled.
Other
-
Added live translation example.
-
Fix foundational examples.
[0.0.3] - 2024-03-13
Other
- Added
storybotandchatbotexamples.
[0.0.2] - 2024-03-12
Initial public release.