# Changelog

All notable changes to **Pipecat** will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- Added `LiveKitRESTHelper` utility class for managing LiveKit rooms via REST API.
- Added `DeepgramSageMakerSTTService`, which connects to a SageMaker-hosted Deepgram STT model. Added `07c-interruptible-deepgram-sagemaker.py` foundational example.
- Added `SageMakerBidiClient` to connect to SageMaker-hosted, BiDi-compatible services.
- Added support for `include_timestamps` and `enable_logging` in `ElevenLabsRealtimeSTTService`. When `include_timestamps` is enabled, timestamp data is included in the `TranscriptionFrame`'s `result` parameter.
- Added optional speaking rate control to `InworldTTSService`.
- Introduced a new `AggregatedTextFrame` type to support passing text along with an `aggregated_by` field that describes the type of text included. `TTSTextFrame`s now inherit from `AggregatedTextFrame`. With this inheritance, an observer can watch for `AggregatedTextFrame`s to accumulate the perceived output, and determine whether the text was spoken based on whether that frame is also a `TTSTextFrame`.

  With this frame, the LLM token stream can be transformed into custom composable chunks, allowing for aggregation outside the TTS service. This makes it possible to listen for or handle those aggregations, and sets the stage for composing a best-effort version of the perceived LLM output in a more digestible form, whether or not it is processed by a TTS service (or whether a TTS service exists at all).
- Introduced `LLMTextProcessor`, a new processor that allows customizing how `LLMTextFrame`s are aggregated. Its purpose is to turn `LLMTextFrame`s into `AggregatedTextFrame`s. By default, a `TTSService` still aggregates `LLMTextFrame`s by sentence for the service to consume.
  However, if you wish to override how the LLM text is aggregated, you should no longer override the TTS's internal `text_aggregator`; instead, insert this processor between your LLM and TTS in the pipeline.
- New `bot-output` RTVI message to represent what the bot actually "says".
  - The `RTVIObserver` now emits `bot-output` messages based on the new `AggregatedTextFrame`s (`bot-tts-text` and `bot-llm-text` are still supported and generated, but `bot-transcription` is now deprecated in favor of this new, more thorough, message).
  - The new `RTVIBotOutputMessage` includes the fields:
    - `spoken`: A boolean indicating whether the text was spoken by TTS.
    - `aggregated_by`: A string representing how the text was aggregated ("sentence", "word", "my custom aggregation").
  - Introduced new fields to `RTVIObserver` to support the new `bot-output` messaging:
    - `bot_output_enabled`: Defaults to `True`. Set to `False` to disable `bot-output` messages.
    - `skip_aggregator_types`: Defaults to `None`. Set to a list of strings matching aggregation types that should not be included in `bot-output` messages (e.g. `credit_card`).
  - Introduced new methods, `add_text_transformer()` and `remove_text_transformer()`, to `RTVIObserver` to support providing (and subsequently removing) callbacks for specific aggregation types (or all aggregations, with `*`) that can modify the text before it is sent as a `bot-output` or `tts-text` message. (Think obscuring a credit card number, or inserting extra detail the client might want that the context doesn't need.)
- In `MiniMaxHttpTTSService`:
  - Added support for the `speech-2.6-hd` and `speech-2.6-turbo` models.
  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, and Tamil.
  - Added new emotions: `calm` and `fluent`.

### Changed

- Updated `daily-python` to 0.22.0.
- `BaseTextAggregator` changes: Modified the `BaseTextAggregator` type so that metadata can be associated with aggregated text. Currently, that just means a `type`, so that the aggregation can be classified or described. Changes made to support this:
  - ⚠️ IMPORTANT: Aggregators are now expected to strip leading/trailing whitespace characters before returning their aggregation from `aggregate()` or `.text`. This gives all aggregators a consistent contract, so downstream consumers know how to stitch aggregations back together.
  - Introduced a new `Aggregation` dataclass to represent both the aggregated `text` and a string identifying the `type` of aggregation (e.g. "sentence", "word", "my custom aggregation").
  - ⚠️ Breaking change: `BaseTextAggregator.text` now returns an `Aggregation` (instead of `str`).

    Before:

    ```python
    aggregated_text = myAggregator.text
    ```

    Now:

    ```python
    aggregated_text = myAggregator.text.text
    ```

  - ⚠️ Breaking change: `BaseTextAggregator.aggregate()` now returns `Optional[Aggregation]` (instead of `Optional[str]`).

    Before:

    ```python
    aggregation = myAggregator.aggregate(text)
    print(f"successfully aggregated text: {aggregation}")
    ```

    Now:

    ```python
    aggregation = myAggregator.aggregate(text)
    if aggregation:
        print(f"successfully aggregated text: {aggregation.text}")
    ```

  - `SimpleTextAggregator`, `SkipTagsAggregator`, and `PatternPairAggregator` were updated to produce/consume `Aggregation` objects.
  - All uses of the above aggregators have been updated accordingly.
- Augmented the `PatternPairAggregator` so that matched patterns can be treated as their own aggregation, taking advantage of the new `Aggregation` type. To that end:
  - Introduced a new, preferred version of `add_pattern` that supports treating a match as a separate aggregation returned from `aggregate()`. This replaces the now-deprecated `add_pattern_pair` method; you provide a `MatchAction` in lieu of the `remove_match` field.
    - `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization of how a match should be handled.
      - `REMOVE`: The text, along with its delimiters, will be removed from the streaming text. Sentence aggregation will continue as if this text did not exist.
      - `KEEP`: The delimiters will be removed, but the content between them will be kept. Sentence aggregation will continue with the internal text included.
      - `AGGREGATE`: The delimiters will be removed and the content between them will be treated as a separate aggregation. Any text before the start of the pattern will be returned early, whether or not a complete sentence was found. Then the pattern will be returned. Then aggregation will resume sentence matching after the closing delimiter is found. The content between the delimiters is not aggregated by sentence; it is aggregated as one single block of text.
  - `PatternMatch` now extends `Aggregation` and provides richer info to handlers.
    - ⚠️ Breaking change: The `PatternMatch` type returned to handlers registered via `on_pattern_match` has been updated to subclass the new `Aggregation` type, which means that `content` has been replaced with `text` and `pattern_id` has been replaced with `type`:

      ```python
      async def on_match_tag(match: PatternMatch):
          pattern = match.type  # instead of match.pattern_id
          text = match.text  # instead of match.content
      ```

- `TextFrame` now includes the field `append_to_context` to support setting whether or not the encompassing text should be added to the LLM context (by the LLM assistant aggregator). It defaults to `True`.
- `TTSService` base class updates:
  - `TTSService`s now accept a new `skip_aggregator_types` argument to avoid speaking certain aggregation types (now determined/returned by the aggregator).
  - Introduced the ability to do a just-in-time transform of text before it gets sent to the TTS service, via callbacks you can set up with a new init field, `text_transforms`, or a new method, `add_text_transformer()`.
    This makes it possible to do things like introduce TTS-specific tags for spelling or emotion, or change the pronunciation of something on the fly. `remove_text_transformer()` has also been added to support removing a registered transform callback.
  - TTS services push `AggregatedTextFrame`s in addition to `TTSTextFrame`s either when an aggregation occurs that should not be spoken or when the TTS service supports word-by-word timestamping. In the latter case, the `TTSService` preliminarily generates an `AggregatedTextFrame`, aggregated by sentence, so that the full sentence content is available as early as possible.
- Updated `CartesiaTTSService`:
  - Modified use of the custom default `text_aggregator` to avoid deprecation warnings and push users towards transformers or the `LLMTextProcessor`.
  - Added convenience methods for taking advantage of Cartesia's SSML tags: spell, emotion, pauses, volume, and speed.
- Updated `RimeTTSService`:
  - Modified use of the custom default `text_aggregator` to avoid deprecation warnings and push users towards transformers or the `LLMTextProcessor`.
  - Added convenience methods for taking advantage of Rime's customization options: spell, pauses, pronunciations, and inline speed control.

### Deprecated

- The TTS constructor field `text_aggregator` is deprecated in favor of the new `LLMTextProcessor`. TTS services still have an internal aggregator to support the default behavior, but if you want to override the aggregation behavior, you should use the new processor.
- The RTVI `bot-transcription` event is deprecated in favor of the new `bot-output` message, which is the canonical representation of bot output (spoken or not). The code still emits a transcription message for backwards compatibility while the transition occurs.
- Deprecated `add_pattern_pair` in the `PatternPairAggregator`, which takes `pattern_id` and `remove_match` fields, in favor of the new `add_pattern` method, which takes a `type` and an `action`.
- The `english_normalization` input parameter for `MiniMaxHttpTTSService` is deprecated; use `text_normalization` instead.

### Fixed

- Fixed an issue in `ElevenLabsRealtimeSTTService` where dynamic language updates were not working.
- Fixed an issue in `ElevenLabsRealtimeSTTService` where setting the sample rate would result in transcripts failing.
- Fixed the `InworldTTSService` audio config payload to use the camelCase keys expected by the Inworld API.

## [0.0.95] - 2025-11-18

### Added

- Added ai-coustics integrated VAD (`AICVADAnalyzer`) with an `AICFilter` factory and example wiring; it leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity.
- Added a watchdog to `DeepgramFluxSTTService` to prevent dangling tasks in case the user was speaking and we stop receiving audio.
- Introduced a minimum confidence parameter in `DeepgramFluxSTTService` to avoid generating transcriptions below a defined threshold.
- Added `ElevenLabsRealtimeSTTService`, which implements the Realtime STT service from ElevenLabs.
- Added word-level timestamps support to the Hume TTS service.

### Changed

- ⚠️ Breaking change: `LLMContext.create_image_message()`, `LLMContext.create_audio_message()`, `LLMContext.add_image_frame_message()` and `LLMContext.add_audio_frames_message()` are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images.
- `ConsumerProcessor` now queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed.
- `BaseTextFilter` now only requires subclasses to implement the `filter()` method.
- Extracted the logic for retrying connections and created a new `send_with_retry` method inside `WebSocketService`.
- Refactored `DeepgramFluxSTTService` to automatically reconnect if sending a message fails.
- Updated all STT and TTS services to use a consistent error handling pattern with the `push_error()` method for better pipeline error event integration.
- Added support for `maybe_capture_participant_camera()` and `maybe_capture_participant_screen()` for `SmallWebRTCTransport` in the runner utils.
- Added Hindi support for Rime TTS services.
- Updated `GeminiTTSService` to use the Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. It now uses `credentials` / `credentials_path` for authentication; the `api_key` parameter is deprecated. Also added support for a `prompt` parameter for style instructions and expressive markup tags. Streaming synthesis significantly improves latency.
- Updated language mappings for the Google and Gemini TTS services to match official documentation.

### Deprecated

- The `api_key` parameter in `GeminiTTSService` is deprecated. Use `credentials` or `credentials_path` instead for Google Cloud authentication.

### Fixed

- Fixed a `SimliVideoService` connection issue.
- Fixed an issue in the `Runner` where, when using `SmallWebRTCTransport`, the `request_data` was not being passed to the `SmallWebRTCRunnerArguments` body.
- Fixed a subtle issue where assistant context messages ended up with double spaces between words or sentences.
- Fixed an issue where `NeuphonicTTSService` wasn't pushing `TTSTextFrame`s, meaning assistant messages weren't being written to context.
- Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal `LLMContext`.
- Fixed an issue where `DeepgramFluxSTTService` failed to connect if passed a `keyterm` or `tag` containing a space.
- Prevented `HeyGenVideoService` from automatically disconnecting after 5 minutes.
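
The `send_with_retry` and `DeepgramFluxSTTService` reconnect entries above describe a send-then-reconnect pattern. A minimal sketch of that idea follows, assuming exponential backoff between attempts; `FlakySocket`, `ConnectionLost`, and this `send_with_retry` signature are hypothetical stand-ins, not Pipecat's actual `WebSocketService` API.

```python
import asyncio


class ConnectionLost(Exception):
    """Hypothetical error a toy socket raises when a send fails."""


class FlakySocket:
    """Hypothetical websocket stand-in: the first `failures` sends raise ConnectionLost."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.sent: list[str] = []
        self.reconnects = 0

    async def send(self, message: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionLost("socket closed")
        self.sent.append(message)

    async def reconnect(self) -> None:
        self.reconnects += 1


async def send_with_retry(
    ws: FlakySocket, message: str, retries: int = 3, base_delay: float = 0.01
) -> bool:
    """Send `message`; on failure, back off exponentially, reconnect, and retry.

    Returns True once a send succeeds, or False if the send still fails
    after `retries` reconnect attempts.
    """
    for attempt in range(retries + 1):
        try:
            await ws.send(message)
            return True
        except ConnectionLost:
            if attempt == retries:
                return False
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... before reconnecting.
            await asyncio.sleep(base_delay * 2**attempt)
            await ws.reconnect()
    return False


ws = FlakySocket(failures=2)
delivered = asyncio.run(send_with_retry(ws, "keepalive"))
```

Returning a boolean rather than raising leaves the caller to decide whether a final failure should surface as a pipeline error (for example, via `push_error()`).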
## [0.0.94] - 2025-11-10

### Changed

- Added support for retrying `SpeechmaticsTTSService` when it returns a 503 error. Retry defaults are set in `InputParams`.

### Deprecated

- The `KrispFilter` is deprecated and will be removed in a future version. Use the `KrispVivaFilter` instead.

### Removed

- `LivekitFrameSerializer` has been removed. Use `LiveKitTransport` instead.

### Fixed

- Fixed a bug related to `LLMAssistantAggregator` where spaces were sometimes missing from assistant messages in context.

## [0.0.93] - 2025-11-07

### Added

- Added support for the Sarvam Speech-to-Text service (`SarvamSTTService`) with streaming WebSocket support for the `saarika` (STT) and `saaras` (STT-translate) models.
- Added support for passing in a `ToolsSchema` in lieu of a list of provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when updating it using `LLMUpdateSettingsFrame`.
- Added `TransportParams.audio_out_silence_secs`, which specifies how many seconds of silence to output when an `EndFrame` reaches the output transport. This can help ensure that all audio data is fully delivered to clients.
- Added a new `FrameProcessor.broadcast_frame()` method. This will push two instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame` instances. Supports filtering via the `include_metrics` parameter to control which metrics types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics).
- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added support for loading external observers. You can now register custom pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`).
  Each file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
      ...
  ```

- Added support for new sonic-3 languages in `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why the pipeline is being ended.
- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.
- Added the `include_prob_metrics` parameter to Whisper STT services to enable access to probability metrics from transcription results.
- Added utility functions `extract_whisper_probability()`, `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to extract probability metrics from `TranscriptionFrame` objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively.
- Added `LLMSwitcher.register_direct_function()`. It works much like `LLMSwitcher.register_function()` in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a `FunctionSchema`-less function).
- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()` as a two-step alternative to `MCPClient.register_tools()`, to allow users to pass MCP tools to, say, `GeminiLiveLLMService` (as well as other speech-to-speech services) in the constructor.
- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()` (as well as the new `MCPClient.register_tools_schema()`).
- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`.
  This is set to `1` by default for more predictable performance on low-CPU systems.

### Changed

- Updated `simli-ai` to 0.1.25.
- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: The `STTMuteFilter` should be placed _after_ the STT service itself.
- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted` exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs.
- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.
- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.
- Updated `GoogleVertexLLMService` to use `GoogleLLMService` as a base class instead of `OpenAILLMService`.
- Updated STT and TTS services to pass through unverified language codes with a warning instead of returning `None`. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.

### Removed

- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that relied on it has been removed.

### Fixed

- Restored backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.
- Fixed `DeepgramSTTService._disconnect()` to properly await the `is_connected()` method call, which is an async coroutine in the Deepgram SDK.
- Fixed an issue where the `SmallWebRTCRequest` dataclass in the runner would scrub arbitrary request data from the client due to camelCase typing. This fixes data passthrough for JS clients where `APIRequest` is used.
- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't respond after a tool call.
- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.
- `GeminiLiveLLMService` now properly supports context-provided system instruction and tools.
- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.

## [0.0.92] - 2025-10-31 🎃 "The Haunted Edition" 👻

### Added

- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction in latency when compared to the `DeepgramTTSService`.
- Added support for the `speaking_rate` input parameter in `GoogleHttpTTSService`.
- Added `enable_speaker_diarization` and `enable_language_identification` to `SonioxSTTService`.
- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated examples 07a\* to use the new TTS service.
- Added support for including images or audio in LLM context messages using `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()` (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For example, when creating `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```

- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`, `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.
- Added `generation_config` parameter support to `CartesiaTTSService` and `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5), and `emotion` (60+ options) parameters for fine-grained speech generation control.
- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`, say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
      [
          transport.input(),
          context_aggregator.user(),
          # BEFORE:
          # llm,
          # transcript.user(),
          # AFTER:
          transcript.user(),
          llm,
          transport.output(),
          transcript.assistant(),
          context_aggregator.assistant(),
      ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: OpenAIRealtimeLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and `RealtimeFunctionCallResultFrame` have been deprecated, since they're no longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService` when it's configured with `output_modalities=['audio']`. If you need to process its output, listen for `TTSTextFrame`s instead.
- Expanded support for universal `LLMContext` to `GeminiLiveLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: GeminiLiveLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService` when it's configured with `modalities=GeminiModalities.AUDIO`. If you need to process its output, listen for `TTSTextFrame`s instead.

### Changed

- The development runner's `/start` endpoint now supports passing `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body when `createDailyRoom` is true. Properties are validated against the `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively and passed to Daily's room and token creation APIs.
- `UserImageRawFrame` has new fields, `append_to_context` and `text`. The `append_to_context` field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The `text` field, if set, might also guide the LLM or the vision service on how to analyze the image.
- `UserImageRequestFrame` has new fields, `append_to_context` and `text`. Both fields will be used to set the same fields on the captured `UserImageRawFrame`.
- `UserImageRequestFrame` no longer requires a function call name and ID.
- Updated `MoondreamService` to process `UserImageRawFrame`.
- `VisionService` expects `UserImageRawFrame` in order to analyze images.
- `DailyTransport` triggers an `on_error` event if transcription can't be started or stopped.
- `DailyTransport` updates: `start_dialout()` now returns two values, `session_id` and `error`. `start_recording()` now returns two values, `stream_id` and `error`.
- Updated `daily-python` to 0.21.0.
- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly, with optional `params` for `max_session_length` and `max_idle_time` configuration, aligning with other Pipecat service patterns.
- Updated the default model to `sonic-3` for `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether or not `SystemFrame`s are filtered.
- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues when running `AWSNovaSonicLLMService`.

### Deprecated

- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is ignored when used with the newer `LLMAssistantAggregator`, which now handles word spacing automatically.
- `LLMService.request_image_frame()` is deprecated; push a `UserImageRequestFrame` instead.
- `UserResponseAggregator` is deprecated and will be removed in a future version.
- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.openai.realtime.context` and `pipecat.services.openai.realtime.frames` are deprecated, as they're no longer used by `OpenAIRealtimeLLMService`. See the "Added" section for details.
- The `SimliVideoService` `simli_config` parameter is deprecated. Use the `api_key` and `face_id` parameters instead.
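
The 0.0.92 `OpenAIRealtimeLLMService` notes move `transcript.user()` in front of `llm` because transcription frames now travel upstream, and a frame pushed upstream only reaches processors placed earlier in the pipeline list. The toy model below illustrates that ordering rule under simplifying assumptions; `ToyProcessor`, `push()`, `build()`, and `Direction` are hypothetical and ignore how real Pipecat processors forward or consume frames.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = "downstream"  # toward the output end of the pipeline
    UPSTREAM = "upstream"  # toward the input end of the pipeline


@dataclass
class ToyProcessor:
    """Hypothetical stand-in for a pipeline processor; it only records frames."""

    name: str
    seen: list[str] = field(default_factory=list)


def push(pipeline: list[ToyProcessor], source: str, frame: str, direction: Direction) -> None:
    """Deliver `frame` from the processor named `source` to every processor in `direction`.

    Simplification: real processors forward frames one hop at a time and may
    consume them; here every processor on that side simply records the frame.
    """
    index = [p.name for p in pipeline].index(source)
    if direction is Direction.DOWNSTREAM:
        targets = pipeline[index + 1 :]
    else:
        targets = list(reversed(pipeline[:index]))
    for processor in targets:
        processor.seen.append(frame)


def build(order: list[str]) -> list[ToyProcessor]:
    return [ToyProcessor(name) for name in order]


# Old order: the transcript processor sits after (downstream of) the LLM.
before = build(["input", "user_aggregator", "llm", "transcript_user", "output"])
# New order: the transcript processor sits before (upstream of) the LLM.
after = build(["input", "user_aggregator", "transcript_user", "llm", "output"])

for pipeline in (before, after):
    push(pipeline, "llm", "TranscriptionFrame", Direction.UPSTREAM)
```

With the old order, `transcript_user` never sees the upstream `TranscriptionFrame`; with the new order it does, just like the user context aggregator.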
### Removed

- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from `SonioxSTTService`.
- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer used.

### Fixed

- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point where frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects that frames are being generated, it prevents the pipeline from being considered idle.
- Fixed an issue in `HumeTTSService` where it was only using Octave 2, which does not support the `description` field. Now, if a description is provided, it switches to Octave 1.
- Fixed an issue where `DailyTransport` would time out prematurely on join and on leave.
- Fixed an issue in the runner where starting a `DailyTransport` room via `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.
- Fixed an issue in `ServiceSwitcher` where, with `STTService`s, all STT services (not just the active one) would produce `TranscriptionFrame`s.

### Other

- Updated all vision 12-series foundational examples to load images from a file.
- Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.

## [0.0.91] - 2025-10-21

### Added

- It is now possible to start a bot from the `/start` endpoint when using the runner with Daily's transport. This follows the Pipecat Cloud format with `createDailyRoom` and `body` fields in the POST request body.
- Added an ellipsis character (`…`) to end-of-sentence detection in the string utils.
- Expanded support for universal `LLMContext` to `AWSNovaSonicLLMService`.
  As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: AWSNovaSonicLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

- Added support for the `bulbul:v3` model in `SarvamTTSService` and `SarvamHttpTTSService`.
- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.
- Added `speech_model` parameter to `AssemblyAIConnectionParams` to access the multilingual model.
- Added support for trickle ICE to the `SmallWebRTCTransport`.
- Added support for updating `OpenAITTSService` settings (`instructions` and `speed`) at runtime via `TTSUpdateSettingsFrame`.
- Added `--whatsapp` flag to the runner to better surface WhatsApp transport logs.
- Added `on_connected` and `on_disconnected` events to TTS and STT websocket-based services.
- Added an `aggregate_sentences` arg in `ElevenLabsHttpTTSService`, where the default value is `True`.
- Added a `room_properties` arg to the Daily runner's `configure()` method, allowing `DailyRoomProperties` to be provided.
- The runner `--folder` argument now supports downloading files from subdirectories.

### Changed

- `RunnerArguments` now include the `body` field, so there's no need to add it to subclasses. Also, all `RunnerArguments` fields are now keyword-only.
- `CartesiaSTTService` now inherits from `WebsocketSTTService`.
- Package upgrades:
  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.
- `SpeechmaticsSTTService` updated dependencies for `speechmatics-rt>=0.5.0`.

### Deprecated

- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.aws.nova_sonic.context` are deprecated, as they're no longer used by `AWSNovaSonicLLMService`. See the "Added" section for details.

### Fixed

- Fixed an issue where the `RTVIProcessor` was sending duplicate `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.
- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p` were always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, and `InputParams` defaults have been changed to `None` to rely on AWS Bedrock's built-in model defaults.
- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due to a mismatch in the `_handle_transcription` method's signature.
- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is now handled properly in `PipelineTask`, making it possible to cleanly cancel an asyncio task that is executing a `PipelineRunner`. Also, `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to reach the end of the pipeline (going back to the behavior in < 0.0.83).
- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where the Flash models would split words, resulting in a space being inserted between words.
- Fixed an issue where audio filters' `stop()` would not be called when using `CancelFrame`.
- Fixed an issue in `ElevenLabsHttpTTSService` where `apply_text_normalization` was incorrectly set as a query parameter. It's now added as a request parameter.
- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate audio frames that were not correctly 16-bit aligned, potentially leading to internal errors or static audio.
- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items needed to have `sounds_like` for the session to start.

### Other

- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the `SentryMetrics` processor.
- Added foundational example `14x-function-calling-openpipe.py`.

## [0.0.90] - 2025-10-10

### Added

- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.
- Added a `--folder` argument to the runner, allowing files saved in that folder to be downloaded from `http://HOST:PORT/file/FILE`.
- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google Vertex AI.
- Added some new configuration options to `GeminiLiveLLMService`:
  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two also require specifying `http_options=HttpOptions(api_version="v1alpha")`.
- Added an `on_pipeline_error` event to `PipelineTask`. This event is fired when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
      ...
  ```

- Added a `service_tier` `InputParam` to `BaseOpenAILLMService`. This parameter can influence response latency. For example, `"priority"` results in faster completions in exchange for a higher price.

### Changed

- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than using WebSockets directly.

### Deprecated

- `LivekitFrameSerializer` is now deprecated. Use `LiveKitTransport` instead.
- `pipecat.services.openai_realtime` is now deprecated. Use `pipecat.services.openai.realtime` instead, or `pipecat.services.azure.realtime` for Azure Realtime.
- `pipecat.services.aws_nova_sonic` is now deprecated. Use `pipecat.services.aws.nova_sonic` instead.
- `GeminiMultimodalLiveLLMService` is now deprecated. Use `GeminiLiveLLMService` instead.

### Fixed

- Fixed a `GoogleVertexLLMService` issue that would generate an error if no token information was returned.
- `GeminiLiveLLMService` now ends gracefully (i.e. after the bot has finished) upon receiving an `EndFrame`.
- `GeminiLiveLLMService` now tries to seamlessly reconnect when it loses its connection.

## [0.0.89] - 2025-10-07

### Fixed

- Reverted a change introduced in 0.0.88 that was causing pipelines to freeze when using interruption strategies and processors that block interruption frames (e.g. `STTMuteFilter`).

## [0.0.88] - 2025-10-07

### Added

- Added support for Nano Banana models to `GoogleLLMService`. For example, you can now use the `gemini-2.5-flash-image` model to generate images.
- Added `HumeTTSService` for text-to-speech synthesis using Hume AI's expressive voice models. It provides high-quality, emotionally expressive speech synthesis with support for various voice models. An example is included in `examples/foundational/07ad-interruptible-hume.py`. Install with: `uv pip install pipecat-ai[hume]`.

### Changed

- Updated the default `GoogleLLMService` model to `gemini-2.5-flash`.

### Deprecated

- PlayHT is shutting down their API on December 31st, 2025. As a result, `PlayHTTTSService` and `PlayHTHttpTTSService` are deprecated and will be removed in a future version.

### Fixed

- Fixed an issue with `AWSNovaSonicLLMService` where the client wouldn't connect due to a breaking change in the AWS dependency chain.
- `PermissionError` is now caught if NLTK's `punkt_tab` can't be downloaded.
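
  The following minimal sketch shows the catch-and-continue pattern. The downloader is passed in as a callable so the example runs without NLTK (in practice it could be something like `nltk.download`). `try_download` and `read_only_download` are hypothetical helpers:

  ```python
  import logging
  from typing import Callable

  logger = logging.getLogger(__name__)


  def try_download(download: Callable[[str], object], resource: str) -> bool:
      """Attempt a resource download, treating a permission failure as non-fatal."""
      try:
          download(resource)
      except PermissionError as exc:
          # e.g. a read-only data directory: log and continue instead of crashing
          logger.warning("Could not download %s: %s", resource, exc)
          return False
      return True


  def read_only_download(resource: str) -> None:
      # Hypothetical downloader that cannot write to its target directory.
      raise PermissionError(f"cannot write {resource}")


  ok = try_download(read_only_download, "punkt_tab")
  ```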
- Fixed an issue that would cause wrong user/assistant context ordering when using interruption strategies.
- Fixed RTVI incoming message handling, broken in 0.0.87.

## [0.0.87] - 2025-10-02

### Added

- Added a `WebsocketSTTService` base class for websocket-based STT services. It combines STT functionality with websocket connectivity, providing automatic error handling and reconnection with exponential backoff.
- Added `DeepgramFluxSTTService` for real-time speech recognition using Deepgram's Flux WebSocket API. Flux understands conversational flow and automatically handles turn-taking.
- Added RTVI messages for user/bot audio levels and system logs.
- OpenAI-based LLM services now include cached tokens in `MetricsFrame`.

### Changed

- Updated the default model for `AnthropicLLMService` to `claude-sonnet-4-5-20250929`.

### Deprecated

- `DailyTransportMessageFrame` and `DailyTransportMessageUrgentFrame` are deprecated. Use `DailyOutputTransportMessageFrame` and `DailyOutputTransportMessageUrgentFrame`, respectively, instead.
- `LiveKitTransportMessageFrame` and `LiveKitTransportMessageUrgentFrame` are deprecated. Use `LiveKitOutputTransportMessageFrame` and `LiveKitOutputTransportMessageUrgentFrame`, respectively, instead.
- `TransportMessageFrame` and `TransportMessageUrgentFrame` are deprecated. Use `OutputTransportMessageFrame` and `OutputTransportMessageUrgentFrame`, respectively, instead.
- `InputTransportMessageUrgentFrame` is deprecated. Use `InputTransportMessageFrame` instead.
- `DailyUpdateRemoteParticipantsFrame` is deprecated and will be removed in a future version. Instead, create your own custom frame and handle it in the `@transport.output().event_handler("on_after_push_frame")` event handler or in a custom processor.

### Fixed

- Fixed an issue in `AWSBedrockLLMService` where timeout exceptions weren't being detected.
- Fixed a `PipelineTask` issue that could prevent the application from exiting if `task.cancel()` was called when the task was already finished.
- Fixed an issue where local SmartTurn was not being run in a separate thread.

## [0.0.86] - 2025-09-24

### Added

- Added `HeyGenTransport`, an integration for HeyGen Interactive Avatar (see https://www.heygen.com/). HeyGen Interactive Avatar is a video service that handles audio streaming and requests HeyGen to generate avatar video responses. When this transport is used, the Pipecat bot joins the same virtual room as the HeyGen Avatar and the user.
- Added support to `TwilioFrameSerializer` for `region` and `edge` settings.
- Added support for using the universal `LLMContext` with:
  - `LLMLogObserver`
  - `GatedLLMContextAggregator` (formerly `GatedOpenAILLMContextAggregator`)
  - `LangchainProcessor`
  - `Mem0MemoryService`
- Added `StrandsAgentProcessor`, which lets you use the Strands Agents framework to build your voice agents. See https://strandsagents.com
- Added `ElevenLabsSTTService` for speech-to-text transcription.
- Added a peer connection monitor to `SmallWebRTCConnection` that automatically disconnects if the connection fails to establish within the timeout (1 minute by default).
- Added memory cleanup improvements to reduce memory peaks.
- Added the `on_before_process_frame`, `on_after_process_frame`, `on_before_push_frame` and `on_after_push_frame` events. These synchronous events are called before and after a frame is processed or pushed. Because they are synchronous, they should perform only lightweight tasks to avoid blocking the pipeline. See `examples/foundational/45-before-and-after-events.py`.
- Added an `on_before_leave` synchronous event to `DailyTransport`.
- Added an `on_before_disconnect` synchronous event to `LiveKitTransport`.
- It is now possible to register synchronous event handlers. By default, all event handlers are executed in a separate task. However, in some cases we want to guarantee order of execution, for example, executing something before disconnecting a transport.

  ```python
  self._register_event_handler("on_event_name", sync=True)
  ```

- Added support for the global location in `GoogleVertexLLMService`. The service now supports both regional locations (e.g., "us-east4") and the "global" location for Vertex AI endpoints. When using the "global" location, the service uses `aiplatform.googleapis.com` as the API host instead of the regional format.
- Added an `on_pipeline_finished` event to `PipelineTask`. This event is fired when the pipeline is done running, which can be the result of a `StopFrame`, `CancelFrame` or `EndFrame`.

  ```python
  @task.event_handler("on_pipeline_finished")
  async def on_pipeline_finished(task: PipelineTask, frame: Frame):
      ...
  ```

- Added support for the new RTVI `send-text` event, along with the ability to toggle the audio response off (skip TTS) while handling the new context.

### Changed

- Updated `aiortc` to 1.13.0.
- Updated `sentry` to 2.38.0.
- The `BaseOutputTransport` methods `write_audio_frame` and `write_video_frame` now return a boolean indicating whether the transport implementation was able to write the given frame.
- Updated the Silero VAD model to v6.
- Updated `livekit` to 1.0.13.
- `torch` and `torchaudio` are no longer required for running Smart Turn locally. This avoids installing gigabytes of dependencies.
- Updated the `websockets` dependency to support version 15.0. Removed deprecated usage of the `ConnectionClosed.code` and `ConnectionClosed.reason` attributes in `AWSTranscribeSTTService` for compatibility.
- Refactored `pyproject.toml` to reduce websockets dependency repetition using self-referencing extras. All websockets-dependent services now reference a shared `websockets-base` extra.

### Deprecated

- `GladiaSTTService`'s `confidence` arg is deprecated. `confidence` is no longer needed to determine which transcription or translation frames to emit.
- The `PipelineTask` events `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled` are now deprecated.
  Use `on_pipeline_finished` instead.
- Support for the RTVI `append-to-context` event is deprecated in favor of the new `send-text` event, making way for future events like `send-image`.

### Fixed

- Fixed an issue where the pipeline could freeze if a task cancellation never completed because a third-party library swallowed `asyncio.CancelledError`. We now apply a timeout to task cancellations to prevent these freezes. If the timeout is reached, the system logs warnings and leaves dangling tasks behind, which can help diagnose where cancellation is being blocked.
- Fixed an `AudioBufferProcessor` issue that was causing user audio to be missing in stereo recordings, which caused bot and user audio to overlap.
- Fixed a `BaseOutputTransport` issue that could produce large saved `AudioBufferProcessor` files when using an audio mixer.
- Fixed a `PipelineRunner` issue on Windows where setting up SIGINT and SIGTERM handling raised an exception.
- Fixed an issue where multiple handlers for an event would not run in parallel.
- Fixed `DailyTransport.sip_call_transfer()` to automatically use the session ID from the `on_dialin_connected` event when one is not explicitly provided. It now supports cold transfers (from incoming dial-in calls) by automatically tracking session IDs from connection events.
- Fixed a memory leak in `SmallWebRTCTransport`. In `aiortc`, when you receive a `MediaStreamTrack` (audio or video), frames are produced asynchronously. If the code never consumes these frames, they are queued in memory, causing a memory leak.
- Fixed an issue in `AsyncAITTSService` where `TTSTextFrames` were not being pushed.
- Fixed an issue that would cause `push_interruption_task_frame_and_wait()` to not wait if a previous interruption had already happened.
- Fixed a couple of bugs in `ServiceSwitcher`:
  - Using multiple `ServiceSwitcher`s in a pipeline would result in an error.
  - `ServiceSwitcherFrame`s (such as `ManuallySwitchServiceFrame`s) were taking effect too early, essentially "jumping the queue" in terms of pipeline frame ordering.
- Fixed a self-cancellation deadlock in `UserIdleProcessor` when returning `False` from an idle callback. The task now terminates naturally instead of attempting to cancel itself.
- Fixed an issue in `AudioBufferProcessor` where a recording was not created when the bot spoke while user input was blocked.
- Fixed a `FastAPIWebsocketTransport` and `SmallWebRTCTransport` issue where `on_client_disconnected` would be triggered when the bot ended the conversation. `on_client_disconnected` is now only triggered when the remote client actually disconnects.
- Fixed an issue in `HeyGenVideoService` where the `BotStartedSpeakingFrame` was blocked from moving through the pipeline.

## [0.0.85] - 2025-09-12

### Added

- `AzureSTTService` now pushes interim transcriptions.
- Added `voice_cloning_key` to `GoogleTTSService` to support custom cloned voices.
- Added `speaking_rate` to `GoogleTTSService.InputParams` to control the speaking rate.
- Added a `speed` arg to `OpenAITTSService` to control the speed of the voice response.
- Added `FrameProcessor.push_interruption_task_frame_and_wait()`. Use this method to programmatically interrupt the bot from any part of the pipeline. This guarantees that all the processors in the pipeline are interrupted in order (from upstream to downstream). Internally, this works by first pushing an `InterruptionTaskFrame` upstream until it reaches the pipeline task. The pipeline task then generates an `InterruptionFrame`, which flows downstream through all processors. Once the `InterruptionFrame` has reached the processor waiting for the interruption, the function returns and execution continues after the call. Think of it as sending an upstream request for interruption and waiting until the acknowledgment flows back downstream.
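
  The upstream-request/downstream-acknowledgment flow described above can be modeled with an `asyncio.Event`. This toy sketch makes simplifying assumptions: processors are plain strings, and the "task" visits them synchronously. `InterruptionSketch` is hypothetical and does not reflect how `FrameProcessor` is implemented:

  ```python
  import asyncio
  from typing import Dict, List


  class InterruptionSketch:
      """Toy model of an upstream request acknowledged by a downstream frame."""

      def __init__(self, processors: List[str]) -> None:
          self.processors = processors  # ordered upstream -> downstream
          self.interrupted: List[str] = []  # order in which processors were interrupted
          self._waiters: Dict[str, asyncio.Event] = {}

      async def request_interruption_and_wait(self, requester: str) -> None:
          # 1. Register a waiter and send the request "upstream" to the task.
          ack = asyncio.Event()
          self._waiters[requester] = ack
          asyncio.get_running_loop().call_soon(self._task_generates_interruption)
          # 3. Block until the interruption has flowed back down to the requester.
          await ack.wait()

      def _task_generates_interruption(self) -> None:
          # 2. The task emits an interruption that visits processors in order.
          for name in self.processors:
              self.interrupted.append(name)
              waiter = self._waiters.pop(name, None)
              if waiter is not None:
                  waiter.set()


  async def main() -> List[str]:
      pipeline = InterruptionSketch(["stt", "llm", "tts", "output"])
      await pipeline.request_interruption_and_wait("tts")
      return pipeline.interrupted


  order = asyncio.run(main())
  ```

  By the time the call returns, every processor upstream of the requester (`"stt"` and `"llm"` here) has already been interrupted.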
- Added a new base `TaskFrame` (which is a system frame). This is the base class for all task frames (`EndTaskFrame`, `CancelTaskFrame`, etc.) that are meant to be pushed upstream to reach the pipeline task.
- Expanded support for the universal `LLMContext` to the AWS Bedrock LLM service. Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.
- Added new fields to the development runner's `parse_telephony_websocket` method to support providing dynamic data to a bot.
  - Twilio: Added a new `body` parameter, which parses the websocket message for `customParameters`. Provide data via the `Parameter` nouns in your TwiML to use this feature.
  - Telnyx & Exotel: Both providers make the `to` and `from` phone numbers available in the websocket messages. You can now access these numbers as `call_data["to"]` and `call_data["from"]`.

  Note: Each telephony provider offers different features. Refer to the corresponding example in `pipecat-examples` to see how to pass custom data to your bot.
- Added `body` to `WebsocketRunnerArguments` as an optional parameter. Custom `body` information can be passed from the server into the bot file via the `bot()` method using this new parameter.
- Added video streaming support to `LiveKitTransport`.
- Added `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, which provide access to OpenAI Realtime.

### Changed

- `pipecat.tests.utils.run_test()` now allows passing `PipelineParams` instead of individual parameters.

### Removed

- Removed `VisionImageRawFrame` in favor of context frames (`LLMContextFrame` or `OpenAILLMContextFrame`).

### Deprecated

- `BotInterruptionFrame` is now deprecated. Use `InterruptionTaskFrame` instead.
- `StartInterruptionFrame` is now deprecated. Use `InterruptionFrame` instead.
- Deprecated `VisionImageFrameAggregator` because `VisionImageRawFrame` has been removed.
  See the `12*` examples for the new recommended replacement pattern.
- `NoisereduceFilter` is now deprecated and will be removed in a future version. Use other audio filters like `KrispFilter` or `AICFilter`.
- Deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, respectively. Both deprecated services will be removed in an upcoming version, 1.0.0.

### Fixed

- Fixed a `BaseOutputTransport` issue that caused incorrect detection of when the bot stopped talking while using an audio mixer.
- Fixed a `LiveKitTransport` issue where RTVI messages were not properly encoded.
- Added additional fixups to Mistral context messages to ensure they meet Mistral-specific requirements, avoiding Mistral "invalid request" errors.
- Fixed `DailyTransport` transcription handling to gracefully handle a missing `rawResponse` field in transcription messages, preventing `KeyError` crashes.

## [0.0.84] - 2025-09-05

### Added

- Added the ability to send DTMF via `LiveKitTransport`.
- Expanded support for the universal `LLMContext` to the Anthropic LLM service. Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.

### Changed

- Updated `daily-python` to 0.19.9.
- Restored `DailyTransport`'s native DTMF support using Daily's `send_dtmf()` method instead of generated audio tones.

### Fixed

- Fixed an `AWSBedrockLLMService` crash caused by an extra `await`.
- Fixed an `OpenAIImageGenService` issue where it was not creating `URLImageRawFrame` correctly.

## [0.0.83] - 2025-09-03

### Added

- Added multilingual support for AsyncAI in `AsyncAITTSService` and `AsyncAIHttpTTSService`.
  - New `languages`: `es`, `fr`, `de`, `it`.
- Added new frames `InputTransportMessageUrgentFrame` and `DailyInputTransportMessageUrgentFrame` for transport messages received from external sources.
- Added `UserSpeakingFrame`.
  It is sent upstream and downstream while VAD detects that the user is speaking.
- Expanded support for the universal `LLMContext` to more LLM services. Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime. Here are the newly-supported services:
  - Azure
  - Cerebras
  - Deepseek
  - Fireworks AI
  - Google Vertex AI
  - Grok
  - Groq
  - Mistral
  - NVIDIA NIM
  - Ollama
  - OpenPipe
  - OpenRouter
  - Perplexity
  - Qwen
  - SambaNova
  - Together.ai
- Added support for WhatsApp User-initiated Calls.
- Added a new audio filter, `AICFilter`, which provides speech enhancement for improving VAD/STT performance without an ONNX dependency. See https://ai-coustics.com/sdk/
- Added a timeout around cancelling input tasks to prevent indefinite hangs when cancellation is swallowed by third-party code.
- Added `pipecat.extensions.ivr` for automated IVR system navigation with configurable goals and conversation handling. It supports DTMF input, verbal responses, and intelligent menu traversal.

  Basic usage:

  ```python
  from pipecat.extensions.ivr.ivr_navigator import IVRNavigator, IVRStatus

  # Create IVR navigator with your goal
  ivr_navigator = IVRNavigator(
      llm=llm_service,
      ivr_prompt="Navigate to billing department to dispute a charge",
  )

  # Handle different outcomes
  @ivr_navigator.event_handler("on_conversation_detected")
  async def on_conversation(processor, conversation_history):
      # Switch to normal conversation mode
      pass

  @ivr_navigator.event_handler("on_ivr_status_changed")
  async def on_ivr_status(processor, status):
      if status == IVRStatus.COMPLETED:
          # End pipeline, transfer call, or start bot conversation
          pass
      elif status == IVRStatus.STUCK:
          # Handle navigation failure
          pass
  ```

- `BaseOutputTransport` now implements `write_dtmf()` by loading DTMF audio and sending it through the transport. This makes sending DTMF generic across all output transports.
- Added new config parameters to `GladiaSTTService`:
  - `PreProcessingConfig` > `audio_enhancer` to enhance audio quality.
  - `CustomVocabularyItem` > `pronunciations` and `language` to specify special pronunciations and the language in which they are pronounced.

### Changed

- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` are now also pushed upstream.
- `ParallelPipeline` now waits for `CancelFrame` to finish in all branches before pushing it downstream.
- Added `sip_codecs` to `DailyRoomSipParams`.
- Updated the `configure()` function in `pipecat.runner.daily` to include new args for creating SIP-enabled rooms. Also added new args to control the room and token expiration durations.
- `pipecat.frames.frames.KeypadEntry` is deprecated and has been moved to `pipecat.audio.dtmf.types.KeypadEntry`.
- Updated `RimeTTSService`'s `flush_audio` message to conform with Rime's official API.
- Updated the default model for `CerebrasLLMService` to GPT-OSS-120B.

### Removed

- Removed `StopInterruptionFrame`. This legacy frame was barely used and didn't carry any useful meaning. It was only pushed after `UserStoppedSpeakingFrame`, so developers can just use `UserStoppedSpeakingFrame`.
- `DailyTransport.write_dtmf()` has been removed in favor of the generic `BaseOutputTransport.write_dtmf()`.
- Removed the deprecated `DailyTransport.send_dtmf()`.

### Deprecated

- Transports have been reorganized:

  ```
  pipecat.transports.network.small_webrtc -> pipecat.transports.smallwebrtc.transport
  pipecat.transports.network.webrtc_connection -> pipecat.transports.smallwebrtc.connection
  pipecat.transports.network.websocket_client -> pipecat.transports.websocket.client
  pipecat.transports.network.websocket_server -> pipecat.transports.websocket.server
  pipecat.transports.network.fastapi_websocket -> pipecat.transports.websocket.fastapi
  pipecat.transports.services.daily -> pipecat.transports.daily.transport
  pipecat.transports.services.helpers.daily_rest -> pipecat.transports.daily.utils
  pipecat.transports.services.livekit -> pipecat.transports.livekit.transport
  pipecat.transports.services.tavus -> pipecat.transports.tavus.transport
  ```

- `pipecat.frames.frames.KeypadEntry` is deprecated. Use `pipecat.audio.dtmf.types.KeypadEntry` instead.

### Fixed

- Fixed an issue where messages received from the transport were always being resent.
- Fixed `SmallWebRTCTransport` to not use `mid` to decide whether the transceiver should be `sendrecv`.
- Fixed an issue where Deepgram swallowed `asyncio.CancelledError` during disconnect, preventing tasks from being cancelled.
- Fixed an issue where `PipelineTask` was not cleaning up the observers.

### Performance

- Reduced latency and improved memory performance in `Mem0MemoryService`.

## [0.0.82] - 2025-08-28

### Added

- Added a new `LLMRunFrame` to trigger an LLM response:

  ```python
  await task.queue_frames([LLMRunFrame()])
  ```

  This replaces `OpenAILLMContextFrame`, which you would previously have used like this:

  ```python
  await task.queue_frames([context_aggregator.user().get_context_frame()])
  ```

  Use this way of kicking off your conversation when you’ve already initialized your context and are simply instructing the bot when to go:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)

  # ...

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMRunFrame()])
  ```

  Note that if you want to add new messages when kicking off the conversation, you could use `LLMMessagesAppendFrame` with `run_llm=True` instead:

  ```python
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])
  ```

  In the rare case you don’t have a context aggregator in your pipeline, you may continue using a context frame.
- Added support for switching from audio+text mode to text-only mode within the same pipeline. Push `LLMConfigureOutputFrame(skip_tts=True)` to enter text-only mode, and disable it to return to audio+text. The LLM still generates tokens and adds them to the context, but they are not sent to TTS.
- Added a `skip_tts` field to `TextFrame`. This lets a text frame bypass TTS while still being included in the LLM context. It is useful for cases like structured text that isn’t meant to be spoken but should still contribute to context.
- Added a `cancel_timeout_secs` argument to `PipelineTask`, which defines how long the pipeline has to complete cancellation. When `PipelineTask.cancel()` is called, a `CancelFrame` is pushed through the pipeline and must reach the end. If it does not reach the end within the specified time, a warning is shown and the wait is aborted.
- Added a new "universal" (LLM-agnostic) `LLMContext` and an accompanying `LLMContextAggregatorPair`. These will eventually replace `OpenAILLMContext` (and the other under-the-hood contexts) and the other context aggregators. The new universal `LLMContext` machinery allows a single context to be shared between different LLMs, enabling runtime LLM switching and scenarios like failover.
  From the developer's point of view, switching to the new universal context machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  To start, the universal `LLMContext` is supported with the following LLM services:

  - `OpenAILLMService`
  - `GoogleLLMService`
- Added a new `LLMSwitcher` class to enable runtime LLM switching, built atop a new generic `ServiceSwitcher`. Switchers take a switching strategy. The first available strategy is `ServiceSwitcherStrategyManual`. To switch LLMs at runtime, the LLMs must share one instance of the new universal `LLMContext` (see the bullet above).

  ```python
  # Instantiate your LLM services
  llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
  llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

  # Instantiate a switcher
  # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
  llm_switcher = LLMSwitcher(
      llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
  )

  # Create your pipeline
  pipeline = Pipeline(
      [
          transport.input(),
          stt,
          context_aggregator.user(),
          llm_switcher,
          tts,
          transport.output(),
          context_aggregator.assistant(),
      ]
  )
  task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

  # ...

  # Whenever appropriate, switch LLMs!
  await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
  ```

- Added an `LLMService.run_inference()` method to LLM services to enable direct, out-of-band (i.e. out-of-pipeline) inference.

### Changed

- Updated `daily-python` to 0.19.8.
- `PipelineTask` now waits for `StartFrame` to reach the end of the pipeline before pushing any other frames.
- Updated `CartesiaTTSService` and `CartesiaHttpTTSService` to align with Cartesia's changes to the `speed` parameter.
  It now takes only an enum of `slow`, `normal`, or `fast`.
- Added support to `AWSBedrockLLMService` for setting authentication credentials through environment variables.
- Updated `SarvamTTSService` to use WebSocket streaming for real-time audio generation in multiple Indian languages, with HTTP support still available via `SarvamHttpTTSService`.

### Fixed

- Fixed an RTVI issue that was causing frames to be pushed before the pipeline was properly initialized.
- Fixed some `get_messages_for_logging()` implementations that were returning a JSON string instead of a list.
- Fixed a `DailyTransport` issue that prevented DTMF tones from being sent.
- Fixed a missing import in `SentryMetrics`.
- Fixed `AWSPollyTTSService` to support the AWS credential provider chain (IAM roles, IRSA, instance profiles) instead of requiring explicit environment variables.
- Fixed a `CartesiaTTSService` issue that caused the application to hang after Cartesia's 5-minute timeout was reached.
- Fixed an issue preventing `SpeechmaticsSTTService` from transcribing audio.

## [0.0.81] - 2025-08-25

### Added

- Added `pipecat.extensions.voicemail`, a module for detecting voicemail vs. live conversation, primarily intended for use in outbound calling scenarios. The voicemail module is optimized for text LLMs only.
- Added new frames to the `idle_timeout_frames` arg: `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, and `UserStoppedSpeakingFrame`. These additions serve as indicators of user activity in the pipeline idle detection logic.
- Custom pipeline sink and source processors can now be passed to a `Pipeline`. Pipeline source and sink processors are used to know and control what comes into and goes out of a `Pipeline` processor.
- Added `FrameProcessor.pause_processing_system_frames()` and `FrameProcessor.resume_processing_system_frames()`. These allow pausing and resuming the processing of system frames.
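
  Pausing and resuming processing is commonly modeled with an `asyncio.Event` that the processing loop waits on. The following minimal sketch illustrates that idea. `PausableConsumerSketch` and its methods are hypothetical and are not the `FrameProcessor` implementation:

  ```python
  import asyncio
  from typing import List, Tuple


  class PausableConsumerSketch:
      """Hypothetical consumer whose frame processing can be paused and resumed."""

      def __init__(self) -> None:
          self.queue: asyncio.Queue = asyncio.Queue()
          self.processed: List[str] = []
          self._resumed = asyncio.Event()
          self._resumed.set()  # start in the resumed state

      def pause(self) -> None:
          self._resumed.clear()

      def resume(self) -> None:
          self._resumed.set()

      async def run(self) -> None:
          while True:
              frame = await self.queue.get()
              await self._resumed.wait()  # parks here while paused
              self.processed.append(frame)
              self.queue.task_done()


  async def main() -> Tuple[List[str], List[str]]:
      consumer = PausableConsumerSketch()
      worker = asyncio.create_task(consumer.run())
      consumer.pause()
      consumer.queue.put_nowait("frame-1")
      await asyncio.sleep(0.01)  # give the worker a chance to run
      while_paused = list(consumer.processed)
      consumer.resume()
      await consumer.queue.join()  # wait until the queued frame is processed
      worker.cancel()
      try:
          await worker
      except asyncio.CancelledError:
          pass
      return while_paused, consumer.processed


  while_paused, after_resume = asyncio.run(main())
  ```

  Waiting on the event after dequeuing means a paused consumer holds at most one frame in hand while the rest stay queued in order.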
- Added a new `on_process_frame()` observer method, which makes it possible to know when a frame is being processed.
- Added a new `FrameProcessor.entry_processor()` method. This gives you access to the first non-compound processor in a pipeline.
- Added the `FrameProcessor` properties `processors`, `next` and `previous`.
- `ElevenLabsTTSService` now supports additional runtime changes to the `model`, `language`, and `voice_settings` parameters.
- Added `apply_text_normalization` support to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added `MistralLLMService`, using Mistral's chat completion API.
- Added the ability to retry a chat completion after a timeout period for `OpenAILLMService` and its subclasses, `AnthropicLLMService`, and `AWSBedrockLLMService`. The LLM services accept two new args: `retry_timeout_secs` and `retry_on_timeout`. This feature is disabled by default.

### Changed

- Updated `daily-python` to 0.19.7.

### Deprecated

- `FrameProcessor.wait_for_task()` is deprecated. Use `await task` or `await asyncio.wait_for(task, timeout)` instead.

### Removed

- Watchdog timers have been removed. They were introduced in 0.0.72 to help diagnose pipeline freezes. Unfortunately, they proved ineffective: they required developers to use Pipecat-specific queues, iterators, and events to correctly reset the timer, which limited their usefulness and added friction.
- Removed the unused `FrameProcessor.set_parent()` and `FrameProcessor.get_parent()`.

### Fixed

- Fixed an issue that would cause `PipelineRunner` and `PipelineTask` to not handle external asyncio task cancellation properly.
- Added `SpeechmaticsSTTService` exception handling on connection and sending.
- Replaced `asyncio.wait_for()` with `wait_for2.wait_for()` for Python < 3.12 because of issues with task cancellation (i.e. cancellation is never propagated).
  See https://bugs.python.org/issue42130
- Fixed an `AudioBufferProcessor` issue that would cause audio overlap when setting a max buffer size.
- Fixed an issue where `AsyncAITTSService` had very high response latency, by adding `force=true` when sending the flush command.

### Performance

- Improved `PipelineTask` performance by using direct mode processors and by removing unnecessary tasks.
- Improved `ParallelPipeline` performance by using direct mode, by not creating a task for each frame and every sub-pipeline, and by removing other unnecessary tasks.
- Improved `Pipeline` performance by using direct mode.

### Other

- Added `14w-function-calling-mistal.py` using `MistralLLMService`.
- Added `13j-azure-transcription.py` using `AzureSTTService`.

## [0.0.80] - 2025-08-13

### Added

- Added `GeminiTTSService`, which uses Google Gemini to generate TTS output. The Gemini model can be prompted to insert styled speech to control the TTS output.
- Added Exotel support to Pipecat's development runner. You can now connect using the runner with `uv run bot.py -t exotel` and an ngrok connection to HTTP port 7860.
- Added an `enable_direct_mode` argument to `FrameProcessor`. Direct mode is for processors that require very few I/O or compute resources, that is, processors that can perform their task almost immediately. These types of processors don't need any of the internal tasks and queues usually created by frame processors, so overall application performance might increase slightly. Use with care.
- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.
- Added an `endpoint_id` parameter to `AzureSTTService`.
([Custom EndpointId](https://docs.azure.cn/en-us/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-python#use-a-custom-endpoint)) ### Changed - `WatchdogPriorityQueue` now requires the items to be inserted to always be tuples and the size of the tuple needs to be specified in the constructor when creating the queue with the `tuple_size` argument. - Updated Moondream to revision `2025-01-09`. - Updated `PlayHTHttpTTSService` to no longer use the `pyht` client to remove compatibility issues with other packages. Now you can use the PlayHT HTTP service with other services, like GoogleLLMService. - Updated `pyproject.toml` to once again pin `numba` to `>=0.61.2` in order to resolve package versioning issues. - Updated the `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` in the list of frames to filter when the filtering is on. ### Performance - Improving the latency of the `HeyGenVideoService`. - Improved some frame processors performance by using the new frame processor direct mode. In direct mode a frame processor will process frames right away avoiding the need for internal queues and tasks. This is useful for some simple processors. For example, in processors that wrap other processors (e.g. `Pipeline`, `ParallelPipeline`), we add one processor before and one after the wrapped processors (internally, you will see them as sources and sinks). These sources and sinks don't do any special processing and they basically forward frames. So, for these simple processors we now enable the new direct mode which avoids creating any internal tasks (and queues) and therefore improves performance. ### Fixed - Fixed an issue with the `BaseWhisperSTTService` where the language was specified as an enum and not a string. - Fixed an issue where `SmallWebRTCTransport` ended before TTS finished. 
- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying `text` in `modalities` didn't result in text being output from the model.
- Added SSML reserved character escaping to `AzureBaseTTSService` to properly handle special characters in text sent to Azure TTS. This fixes an issue where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would cause TTS failures.
- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when comparing watchdog cancel sentinel items with other items in the queue.
- Fixed an issue that would cause system frames to not be processed with higher priority than other frames. This could cause slower interruption times.
- Fixed an issue where retrying a websocket connection error would result in an error.

### Other

- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to use `OpenAIRealtimeBetaLLMService` to output text to a TTS service.
- Added vision support to release evals so we can run the 12-series foundational examples.
- Added foundational example `15a-switch-languages.py` to release evals. The eval detects whether the language was switched properly.
- Updated foundational examples to show how to enclose complex logic (e.g. `ParallelPipeline`) into a single processor so the main pipeline becomes simpler.
- Added `07n-interruptible-gemini.py`, demonstrating how to use `GeminiTTSService`.

## [0.0.79] - 2025-08-07

### Changed

- Changed `pipecat-ai`'s `openai` dependency to `>=1.74.0,<=1.99.1` due to a breaking change in `openai` 1.99.2 ([commit](https://github.com/openai/openai-python/commit/657f551dbe583ffb259d987dafae12c6211fba06)).

### Deprecated

- `TTSService.say()` is deprecated; push a `TTSSpeakFrame` instead. Calling functions directly is a discouraged pattern in Pipecat because, for example, it might cause issues with frame ordering.
- `LLMMessagesFrame` is deprecated, in favor of either:
  - `LLMMessagesUpdateFrame` with `run_llm=True`
  - `OpenAILLMContextFrame` with desired messages in a new context
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are deprecated, as they depended on the now-deprecated `LLMMessagesFrame`. Use `LLMUserContextAggregator` and `LLMAssistantContextAggregator` (or LLM-specific subclasses thereof) instead.

## [0.0.78] - 2025-08-07

### Added

- Added `SonioxSTTService` using Soniox's STT websocket API.
- Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`. When user speech is emulated (e.g. when a transcription is received but VAD doesn't detect speech), this parameter controls whether the emulated speech can interrupt the bot. Default is `False` (emulated speech is ignored while the bot is speaking).
- Added new `handle_sigint` and `handle_sigterm` fields to `RunnerArguments`. This allows applications to know what settings they should use for the environment they are running in. Also added `pipeline_idle_timeout_secs` to control the `PipelineTask` idle timeout.
- Added a `processor` field to `ErrorFrame` to indicate the `FrameProcessor` that generated the error.
- Added new language support for `AWSTranscribeSTTService`. All languages supporting streaming data input are now supported: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
- Added support for Simli Trinity Avatars. A new `is_trinity_avatar` parameter has been introduced to specify whether the provided `faceId` corresponds to a Trinity avatar, which is required for optimal Trinity avatar performance.
- The development runner now handles custom `body` data for `DailyTransport`. The `body` data is passed to the Pipecat client. You can POST to the `/start` endpoint with a request body of:

  ```json
  {
    "createDailyRoom": true,
    "dailyRoomProperties": { "start_video_off": true },
    "body": { "custom_data": "value" }
  }
  ```

  The `body` information is parsed and used in the application. The `dailyRoomProperties` are currently not handled.
- Added detailed latency logging to `UserBotLatencyLogObserver`, capturing average response time between user stop and bot start, as well as minimum and maximum response latency.
- Added Chinese, Japanese, and Korean word timestamp support to `CartesiaTTSService`.
- Added `region` parameter to `GladiaSTTService`. Accepted values: `eu-west` (default), `us-west`.

### Changed

- System frames are now queued. Before, system frames could be generated from any task and would not guarantee any order, which was causing undesired behavior. Also, it was possible to get into some rare recursion issues because of the way system frames were executed (they were executed in-place, meaning calling `push_frame()` would finish after the system frame traversed all the pipeline). This makes system frames more deterministic.
- Changed the default model for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` to `eleven_turbo_v2_5`. The rationale for this change is that the Turbo v2.5 model exhibits the most stable voice quality along with very low TTFB; latencies are on par with the Flash v2.5 model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with correct spacing.
- The development runner's `/connect` and `/start` endpoints now both return `dailyRoom` and `dailyToken` in place of the previous `room_url` and `token`.
- Updated the `pipecat.runner.daily` utility to only take `DAILY_API_URL` and `DAILY_SAMPLE_ROOM_URL` environment variables instead of argparsing `-u` and `-k`, respectively.
- Updated `daily-python` to 0.19.6.
- Changed `TavusVideoService` to send audio or video frames only after the transport is ready, preventing warning messages at startup.
- The development runner now strips any provided protocol (e.g. `https://`) from the proxy address and issues a warning. It also strips trailing `/`.

### Deprecated

- In `pipecat.runner.daily`, the `configure_with_args()` function is deprecated. Use the `configure()` function instead.
- The development runner's `/connect` endpoint is deprecated and will be removed in a future version. Use the `/start` endpoint in its place. In the meantime, both endpoints work and deliver equivalent functionality.

### Fixed

- Fixed a `DailyTransport` issue that would result in an unhandled `concurrent.futures.CancelledError` when a future is cancelled.
- Fixed a `RivaSTTService` issue that would result in an unhandled `concurrent.futures.CancelledError` when a future is cancelled while reading audio chunks from the incoming audio stream.
- Fixed an issue in the `BaseOutputTransport`, mainly reproducible with `FastAPIWebsocketOutputTransport` when the audio mixer was enabled, where the loop could consume 100% CPU by continuously returning without delay, preventing other asyncio tasks (such as cancellation or shutdown signals) from being processed.
- Fixed an issue where `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` were not emitted when using `TavusVideoService` or `HeyGenVideoService`.
- Fixed an issue in `LiveKitTransport` where empty `AudioRawFrame`s were pushed down the pipeline. This resulted in warnings from the STT processor.
- Fixed `PiperTTSService` to send text as a JSON object in the request body, resolving compatibility with Piper's HTTP API.
- Fixed an issue with the `TavusVideoService` where an error was thrown due to missing transcription callbacks.
- Fixed an issue in `SpeechmaticsSTTService` where the `user_id` was set to `None` when diarization is not enabled.

### Performance

- Fixed an issue in `TaskObserver` (a proxy to all observers) that was degrading global performance.
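
The 0.0.78 development-runner change that strips the protocol and trailing `/` from the proxy address can be illustrated with a short standalone sketch. It is not the runner's actual code: the `normalize_proxy` name and the scheme regex are assumptions, and only the documented behavior (strip a `scheme://` prefix with a warning, then strip trailing `/`) is modeled.

```python
import re
import warnings

# Matches a leading URL scheme such as "https://" or "wss://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def normalize_proxy(proxy: str) -> str:
    """Hypothetical helper: return `proxy` without a scheme or trailing '/'."""
    if _SCHEME_RE.match(proxy):
        # Mirror the documented runner behavior: strip the protocol and warn.
        warnings.warn(f"Stripping protocol from proxy address: {proxy!r}")
        proxy = _SCHEME_RE.sub("", proxy, count=1)
    # Drop any trailing slashes left after the host.
    return proxy.rstrip("/")


print(normalize_proxy("https://my-bot.ngrok.io/"))
```

Stripping the scheme first and the slashes second means inputs like `https://my-bot.ngrok.io/` and `my-bot.ngrok.io` end up as the same bare host.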

### Other

- Added `07aa-interruptible-soniox.py`, `07ab-interruptible-inworld-http.py`, `07ac-interruptible-asyncai.py` and `07ac-interruptible-asyncai-http.py` release evals.

## [0.0.77] - 2025-07-31

### Added

- Added `InputTextRawFrame` frame type to handle user text input with Gemini Multimodal Live.
- Added `HeyGenVideoService`. This is an integration for HeyGen Interactive Avatar: a video service that handles audio streaming and requests HeyGen to generate avatar video responses (see https://www.heygen.com/).
- Added the ability to switch voices to `RimeTTSService`.
- Added a unified development runner for building voice AI bots across multiple transports:
  - `pipecat.runner.run` – FastAPI-based development server with automatic bot discovery
  - `pipecat.runner.types` – Runner session argument types (`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`, `WebSocketRunnerArguments`)
  - `pipecat.runner.utils.create_transport()` – Factory function for creating transports from session arguments
  - `pipecat.runner.daily` and `pipecat.runner.livekit` – Configuration utilities for Daily and LiveKit setups
  - Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
  - Automatic telephony provider detection and serializer configuration
  - ESP32 WebRTC compatibility with SDP munging
  - Environment detection (`ENV=local`) for conditional features
- Added Async.ai TTS integration (https://async.ai/):
  - `AsyncAITTSService` – WebSocket-based streaming TTS with interruption support
  - `AsyncAIHttpTTSService` – HTTP-based streaming TTS service
  - Example scripts:
    - `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo)
    - `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)
- Added `transcription_bucket` params support to the `DailyRESTHelper`.
- Added a new TTS service, `InworldTTSService`. This service provides low-latency, high-quality speech generation using Inworld's streaming API.
- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to `False`. This field handles SIGTERM signals. The `handle_sigint` field still defaults to `True`, but now it handles only SIGINT signals.
- Added foundational example `14u-function-calling-ollama.py` for Ollama function calling.
- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference with the new `smart-turn-v2` turn detection model.
- Added `set_log_level` to `DailyTransport`, allowing you to set the logging level for Daily's internal logging system.
- Added `on_transcription_stopped` and `on_transcription_error` to Daily callbacks.

### Changed

- Changed the default `url` for `NeuphonicTTSService` to `wss://api.neuphonic.com` as it provides better global performance. You can set the URL to other URLs, such as the previous default: `wss://eu-west-1.api.neuphonic.com`.
- Updated `daily-python` to 0.19.5.
- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to allow for more flexible `STTMuteFilter` placement.
- `ElevenLabsTTSService` now plays delayed messages if they still belong to the current context.
- Dependency compatibility improvements: relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:
  - `aiohttp`, `Markdown`, `nltk`, `numpy`, `Pillow`, `pydantic`, `openai`, `numba`: Now support up to the next major version (e.g. `numpy>=1.26.4,<3`)
  - `pyht`: Relaxed to `>=0.1.6` to resolve `grpcio` conflicts with `nvidia-riva-client`
  - `fastapi`: Updated to support versions `>=0.115.6,<0.117.0`
  - `torch`/`torchaudio`: Changed from exact pinning (`==2.5.0`) to compatible range (`~=2.5.0`)
  - `aws_sdk_bedrock_runtime`: Added Python 3.12+ constraint via environment marker
  - `numba`: Reduced minimum version to `0.60.0` for better compatibility
- Changed `NeuphonicHttpTTSService` to use a POST-based request instead of the `pyneuphonic` package. This removes a package requirement, allowing Neuphonic to work with more services.
- Updated `ElevenLabsTTSService` to handle the case where `allow_interruptions=False`. Now, when interruptions are disabled, the same context ID will be used throughout the conversation.
- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the `tasks cancelled error` to a debug log. This stops the log from appearing in Pipecat logs upon leaving.
- Upgraded the `websockets` implementation to the new asyncio implementation. Along with this change, we now support versions >=13.1.0 and <15.0.0. All services have been updated to use the asyncio implementation.
- Updated `MiniMaxHttpTTSService` with a `base_url` arg where you can specify the Global endpoint (default) or Mainland China.
- Replaced regex-based sentence detection in `match_endofsentence` with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection.
- Changed the `livekit` optional dependency for `tenacity` to `tenacity>=8.2.3,<10.0.0` in order to support the `google-genai` package.
- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's recommended model.
- Updated `SpeechmaticsSTTService`:
  - Added support for additional diarization options.
  - Added foundational example `07a-interruptible-speechmatics-vad.py`, which uses VAD detection provided by `SpeechmaticsSTTService`.

### Fixed

- Fixed an `LLMUserResponseAggregator` issue where interruptions were not being handled properly.
- Fixed `PiperTTSService` to work with newer Piper GPL.
- Fixed a race condition in `FastAPIWebsocketClient` that occurred when attempting to send a message while the client was disconnecting.
- Fixed an issue in `GoogleLLMService` where interruptions did not work when an interruption strategy was used.
- Fixed an issue in the `TranscriptProcessor` where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces).
- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport` where, if the microphone was muted, track timing was not respected.
- Fixed an error that occurs when pushing an `LLMMessagesFrame`. Only some LLM services, like Grok, are impacted by this issue. The fix is to remove the optional `name` property that was being added to the message.
- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when `enable_turn_audio` was enabled and audio resampling was required.
- Fixed a dependency issue for uv users where an `llvmlite` version required Python 3.9.
- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param was the incorrect type.
- Fixed an issue with OpenTelemetry tracing where the `enable_tracing` flag did not disable the internal tracing decorator functions.
- Fixed an issue in `OLLamaLLMService` where kwargs were not passed correctly to the parent class.
- Fixed an issue in `ElevenLabsTTSService` where the word/timestamp pairs were calculating word boundaries incorrectly.
- Fixed an issue where, in some edge cases, the `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a transcription.
- Fixed an issue in `GoogleLLMContext` where it would inject the `system_message` as a "user" message in cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly.
- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` event was never emitted.

### Other

- Added new quickstart demos:
  - `examples/quickstart`: voice AI bot quickstart
  - `examples/client-server-web`: client/server starter example
  - `examples/phone-bot-twilio`: Twilio starter example
- Removed most of the examples from the pipecat repo. Examples can now be found in: https://github.com/pipecat-ai/pipecat-examples.
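
The relaxed dependency constraints listed under 0.0.77's Changed section use PEP 440 version specifiers. As a rough illustration of what `~=2.5.0` and `>=1.26.4,<3` accept, here is a simplified sketch. It is not how pip resolves versions: it handles only plain numeric versions (no pre-releases or local tags) and does not normalize trailing zeros, and the helper names are hypothetical.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a purely numeric version such as "2.5.1" into (2, 5, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(version: str, base: str) -> bool:
    """Simplified check for the PEP 440 clause "~=base" (numeric versions only).

    "~=2.5.0" behaves like ">=2.5.0, ==2.5.*": the version must be at least
    `base` and must share `base` minus its last component as a prefix.
    """
    v, b = parse(version), parse(base)
    prefix = b[:-1]
    return v >= b and v[: len(prefix)] == prefix


def in_range(version: str, lower: str, upper: str) -> bool:
    """Check ">=lower,<upper" for numeric versions only."""
    return parse(lower) <= parse(version) < parse(upper)


# torch/torchaudio moved from "==2.5.0" to "~=2.5.0": patch releases are allowed.
print(compatible_release("2.5.1", "2.5.0"))
print(compatible_release("2.6.0", "2.5.0"))
# numpy now allows ">=1.26.4,<3".
print(in_range("2.3.1", "1.26.4", "3"))
```

The practical difference is that an exact pin such as `==2.5.0` rejects `2.5.1`, while `~=2.5.0` accepts any `2.5.x` at or above `2.5.0` and still excludes `2.6`.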

## [0.0.76] - 2025-07-11

### Added

- Added `SpeechControlParamsFrame`, a new `SystemFrame` that notifies downstream processors of the VAD and Turn analyzer params. This frame is pushed by the `BaseInputTransport` at start and any time a `VADParamsUpdateFrame` is received.

### Changed

- Two package dependencies have been updated:
  - `numpy` now supports 1.26.0 and newer
  - `transformers` now supports 4.48.0 and newer

### Fixed

- Fixed an issue with RTVI's handling of `append-to-context`.
- Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
- Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
- Fixed an issue in `ParallelPipeline` that caused errors when attempting to drain the queues.
- Fixed an issue with emulated VAD timeout inconsistency in `LLMUserContextAggregator`. Previously, emulated VAD scenarios (where transcription is received without VAD detection) used a hardcoded `aggregation_timeout` (default 0.5s) instead of matching the VAD's `stop_secs` parameter (default 0.8s). This created different user experiences between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts automatically synchronize with the VAD's `stop_secs` parameter.
- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started speaking early, while the bot was still working through `trigger_assistant_response()`.

## [0.0.75] - 2025-07-08 [YANKED]

**This release has been yanked due to resampling issues affecting audio output quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added an `aggregate_sentences` arg in `CartesiaTTSService`, `ElevenLabsTTSService`, `NeuphonicTTSService` and `RimeTTSService`, where the default value is `True`. When `aggregate_sentences` is `True`, the `TTSService` aggregates the LLM streamed tokens into sentences by default. Note: setting the value to `False` requires a custom processor before the `TTSService` to aggregate LLM tokens.
- Added `kwargs` to the `OLLamaLLMService` to allow for configuration args to be passed to Ollama.
- Added call hang-up error handling in `TwilioFrameSerializer`, which handles the case where the user has hung up before the `TwilioFrameSerializer` hangs up the call.

### Changed

- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0 protocol. This includes:
  - Deprecating support for all messages related to service configuration and actions.
  - Adding support for obtaining and logging data about the client, including its RTVI version and optionally included system information (OS/browser/etc.).
  - Adding support for handling the new `client-message` RTVI message through either an `on_client_message` event handler or listening for a new `RTVIClientMessageFrame`.
  - Adding support for responding to a `client-message` with a `server-response` via either a direct call on the `RTVIProcessor` or via pushing a new `RTVIServerResponseFrame`.
  - Adding built-in support for handling the new `append-to-context` RTVI message, which allows a client to add to the user or assistant LLM context. No extra code is required for supporting this behavior.
  - Updating all JavaScript and React client RTVI examples to use version 1.0.0 of the clients.

  Get started migrating to RTVI protocol 1.0.0 by following the migration guide: https://docs.pipecat.ai/client/migration-guide
- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work asynchronously using `aioboto3` instead of the `boto3` library.
- The `UserIdleProcessor` now handles the scenario where function calls take longer than the idle timeout duration. This allows you to use the `UserIdleProcessor` in conjunction with function calls that take a while to return a result.
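
When `aggregate_sentences` is `False`, the 0.0.75 notes say a custom processor placed before the `TTSService` must aggregate LLM tokens. The core of such a processor is a buffer that accumulates streamed text and releases complete sentences. The sketch below is plain Python with no Pipecat imports: `SentenceBuffer` is a hypothetical name, wiring it into an actual frame processor is left out, and its punctuation-based boundary rule is a deliberate simplification (0.0.77 switched Pipecat's own `match_endofsentence` to NLTK's punkt_tab tokenizer, which handles cases like abbreviations better).

```python
import re
from typing import List, Optional

# Simplified boundary rule: ".", "!" or "?", optionally followed by closing
# quotes or brackets, and then whitespace. Abbreviations such as "Dr." will
# be split incorrectly; this is only an illustration.
_BOUNDARY_RE = re.compile(r"[.!?][\"')\]]*\s")


class SentenceBuffer:
    """Hypothetical helper that groups streamed LLM tokens into sentences."""

    def __init__(self) -> None:
        self._text = ""

    def push(self, token: str) -> List[str]:
        """Append a token and return any sentences that are now complete."""
        self._text += token
        sentences: List[str] = []
        while (match := _BOUNDARY_RE.search(self._text)) is not None:
            sentences.append(self._text[: match.end()].strip())
            self._text = self._text[match.end():]
        return sentences

    def flush(self) -> Optional[str]:
        """Return any remaining partial sentence, e.g. at the end of a reply."""
        rest, self._text = self._text.strip(), ""
        return rest or None


buf = SentenceBuffer()
for token in ["Hello", " there", ".", " It costs 3.5", " dollars!", " Bye"]:
    for sentence in buf.push(token):
        print(sentence)
print(buf.flush())
```

A sentence is released only once the whitespace after its final punctuation arrives, so a decimal such as `3.5` is not treated as a boundary; whatever remains in the buffer should be flushed when the LLM response ends.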

### Fixed

- Updated the `NeuphonicTTSService` to work with the updated websocket API.
- Fixed an issue with `RivaSTTService` where the watchdog feature was causing an error on initialization.

### Performance

- Removed an unnecessary push task in each `FrameProcessor`.

## [0.0.74] - 2025-07-03 [YANKED]

**This release has been yanked due to resampling issues affecting audio output quality and critical bugs impacting `ParallelPipelines` functionality.**

**Please upgrade to version 0.0.76 or later.**

### Added

- Added a new STT service, `SpeechmaticsSTTService`. This service provides real-time speech-to-text transcription using the Speechmatics API. It supports partial and final transcriptions, multiple languages, various audio formats, and speaker diarization.
- Added `normalize` and `model_id` to `FishAudioTTSService`.
- Added `http_options` argument to `GoogleLLMService`.
- Added `run_llm` field to `LLMMessagesAppendFrame` and `LLMMessagesUpdateFrame` frames. If true, a context frame will be pushed, triggering the LLM to respond.
- Added a new `SOXRStreamAudioResampler` for processing audio in chunks or streams. If you write your own processor and need to use an audio resampler, use the new `create_stream_resampler()`.
- Added new `DailyParams.audio_in_user_tracks` to allow receiving one track per user (default) or a single track from the room (all participants mixed).
- Added support for providing "direct" functions, which don't need an accompanying `FunctionSchema` or function definition dict. Instead, metadata (i.e. `name`, `description`, `properties`, and `required`) are automatically extracted from a combination of the function signature and docstring.

  Usage:

  ```python
  from pipecat.adapters.schemas.tools_schema import ToolsSchema
  from pipecat.services.llm_service import FunctionCallParams

  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """
      Do something interesting.

      Args:
          foo (int): The foo to do something interesting with.
          bar (str): The bar to do something interesting with.
      """
      # `process()` is a placeholder for your own logic.
      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
  ```
- `user_id` is now populated in the `TranscriptionFrame` and `InterimTranscriptionFrame` when using a transport that provides a `user_id`, like `DailyTransport` or `LiveKitTransport`.
- Added `watchdog_coroutine()`. This is a watchdog helper for coroutines. If you have a coroutine that is waiting for a result and that takes a long time, you will need to wrap it with `watchdog_coroutine()` so the watchdog timers are reset regularly.
- Added `session_token` parameter to `AWSNovaSonicLLMService`.
- Added Gemini Multimodal Live File API for uploading, fetching, listing, and deleting files. See `26f-gemini-live-files-api.py` for example usage.

### Changed

- Updated all the services to use the new `SOXRStreamAudioResampler`, ensuring smooth transitions and eliminating clicks.
- Upgraded `daily-python` to 0.19.4.
- Updated `google` optional dependency to use `google-genai` version `1.24.0`.

### Fixed

- Fixed an issue where audio would get stuck in the queue when an interrupt occurs during Azure TTS synthesis.
- Fixed a race condition that occurs in Python 3.10+ where the task could miss the `CancelledError` and continue running indefinitely, freezing the pipeline.
- Fixed an `AWSNovaSonicLLMService` issue introduced in 0.0.72.

### Deprecated

- In `FishAudioTTSService`, deprecated `model` and replaced it with `reference_id`. This change is to better align with Fish Audio's variable naming and to reduce confusion about what functionality the variable controls.

## [0.0.73] - 2025-06-26

### Fixed

- Fixed an issue introduced in 0.0.72 that would cause `ElevenLabsTTSService`, `GladiaSTTService`, `NeuphonicTTSService` and `OpenAIRealtimeBetaLLMService` to throw an error.

## [0.0.72] - 2025-06-26

### Added

- Added logging and improved error handling to help diagnose and prevent potential pipeline freezes.
- Added `WatchdogQueue`, `WatchdogPriorityQueue`, `WatchdogEvent` and `WatchdogAsyncIterator`. These helper utilities reset watchdog timers appropriately before they expire. When watchdog timers are disabled, the utilities behave as their standard counterparts without side effects.
- Introduced task watchdog timers. Watchdog timers are used to detect if a Pipecat task is taking longer than expected (by default 5 seconds). Watchdog timers are disabled by default and can be enabled globally by passing the `enable_watchdog_timers` argument to the `PipelineTask` constructor. It is possible to change the default watchdog timer timeout by using the `watchdog_timeout` argument. You can also log how long it takes to reset the watchdog timers, which is done with `enable_watchdog_logging`. You can control all these settings per frame processor or even per task. That is, you can set `enable_watchdog_timers`, `enable_watchdog_logging` and `watchdog_timeout` when creating any frame processor through their constructor arguments or when you create a task with `FrameProcessor.create_task()`. Note that watchdog timers only work with Pipecat tasks and will not work if you use `asyncio.create_task()` or similar.
- Added `lexicon_names` parameter to `AWSPollyTTSService.InputParams`.
- Added reconnection logic and audio buffer management to `GladiaSTTService`.
- The `TurnTrackingObserver` now ends a turn upon observing an `EndFrame` or `CancelFrame`.
- Added Polish support to `AWSTranscribeSTTService`.
- Added new frames `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`, which allow pausing and resuming frame processing for a given frame processor. These are control frames, so they are ordered. Pausing a frame processor keeps old frames in the internal queues until resume takes place. Frames pushed while a frame processor is paused are added to the queues. When frame processing is resumed, all queued frames are processed in order. Also added `FrameProcessorPauseUrgentFrame` and `FrameProcessorResumeUrgentFrame`, which are system frames and therefore have high priority.
- Added a property called `has_function_calls_in_progress` in `LLMAssistantContextAggregator` that exposes whether a function call is in progress.
- Added `SambaNovaLLMService`, which provides LLM API integration with an OpenAI-compatible interface.
- Added `SambaNovaSTTService`, which provides speech-to-text functionality using SambaNova's (Whisper) API.
- Added foundational examples for function calling and transcription: `14s-function-calling-sambanova.py`, `13g-sambanova-transcription.py`.

### Changed

- `HeartbeatFrame`s are now control frames. This will make it easier to detect pipeline freezes. Previously, heartbeat frames were system frames, which meant they did not get queued with other frames, making it difficult to detect pipeline stalls.
- Updated `OpenAIRealtimeBetaLLMService` to accept `language` in the `InputAudioTranscription` class for all models.
- Updated the default model for `OpenAIRealtimeBetaLLMService` to `gpt-4o-realtime-preview-2025-06-03`.
- The `PipelineParams` arg `allow_interruptions` now defaults to `True`.
- `TavusTransport` and `TavusVideoService` now send audio to Tavus using WebRTC audio tracks instead of `app-messages` over WebSocket. This should improve the overall audio quality.
- Upgraded `daily-python` to 0.19.3.

### Fixed

- Fixed an issue that would cause heartbeat frames to be sent before processors were started.
- Fixed an event loop blocking issue when using `SentryMetrics`.
- Fixed an issue in `FastAPIWebsocketClient` to ensure proper disconnection when the websocket is already closed.
- Fixed an issue where the `UserStoppedSpeakingFrame` was not received if the transport was not receiving new audio frames.
- Fixed an edge case where, if the user interrupted the bot but no new aggregation was received, the bot would not resume speaking.
- Fixed an issue with `TelnyxFrameSerializer` where it would throw an exception when the user hung up the call.
- Fixed an issue with `ElevenLabsTTSService` where the context was not being closed.
- Fixed function calling in `AWSNovaSonicLLMService`.
- Fixed an issue that would cause multiple `PipelineTask.on_idle_timeout` events to be triggered repeatedly.
- Fixed an issue that was causing user and bot speech to not be synchronized during recordings.
- Fixed an issue where voice settings weren't applied to `ElevenLabsTTSService`.
- Fixed an issue with `GroqTTSService` where it was not properly parsing the WAV file header.
- Fixed an issue with `GoogleSTTService` where it was constantly reconnecting before starting to receive audio from the user.
- Fixed an issue where `GoogleLLMService`'s TTFB value was incorrect.

### Deprecated

- `AudioBufferProcessor` parameter `user_continuos_stream` is deprecated.

### Other

- Renamed `14e-function-calling-gemini.py` to `14e-function-calling-google.py`.

## [0.0.71] - 2025-06-10

### Added

- Added a parameter called `additional_span_attributes` to `PipelineTask` that lets you add any additional attributes you'd like to the conversation span.

### Fixed

- Fixed an issue with `CartesiaSTTService` initialization.

## [0.0.70] - 2025-06-10

### Added

- Added `ExotelFrameSerializer` to handle telephony calls via Exotel.
- Added the `informal` option to `TranslationConfig` in the Gladia config, which forces informal language forms when available.
- Added `CartesiaSTTService`, a websocket-based implementation to transcribe audio. Added a foundational example in `13f-cartesia-transcription.py`.
- Added a `websocket` example, showing how to use the new Pipecat client `WebsocketTransport` to connect with Pipecat `FastAPIWebsocketTransport` or `WebsocketServerTransport`.
- Added language support to `RimeHttpTTSService`. Extended languages to include German and French for both `RimeTTSService` and `RimeHttpTTSService`. ### Changed - Upgraded `daily-python` to 0.19.2. - Make `PipelineTask.add_observer()` synchronous. This allows callers to call it before doing the work of running the `PipelineTask` (i.e. without invoking `PipelineTask.set_event_loop()` first). - Pipecat 0.0.69 forced `uvloop` event loop on Linux on macOS. Unfortunately, this is causing issue in some systems. So, `uvloop` is not enabled by default anymore. If you want to use `uvloop` you can just set the `asyncio` event policy before starting your agent with: ```python asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) ``` ### Fixed - Fixed an issue with various TTS services that would cause audio glitches at the start of every bot turn. - Fixed an `ElevenLabsTTSService` issue where a context warning was printed when pushing a `TTSSpeakFrame`. - Fixed an `AssemblyAISTTService` issue that could cause unexpected behavior when yielding empty `Frame()`s. - Fixed an issue where `OutputAudioRawFrame.transport_destination` was being reset to `None` instead of retaining its intended value before sending the audio frame to `write_audio_frame`. - Fixed a typo in Livekit transport that prevented initialization. ## [0.0.69] - 2025-06-02 "AI Engineer World's Fair release" ✨ ### Added - Added a new frame `FunctionCallsStartedFrame`. This frame is pushed both upstream and downstream from the LLM service to indicate that one or more function calls are going to be executed. - Added LLM services `on_function_calls_started` event. This event will be triggered when the LLM service receives function calls from the model and is going to start executing them. - Function calls can now be executed sequentially (in the order received in the completion) by passing `run_in_parallel=False` when creating your LLM service. 
  By default, if the LLM completion returns 2 or more function calls they run concurrently. In both cases, concurrently and sequentially, a new LLM completion will run when the last function call finishes.
- Added OpenTelemetry tracing for `GeminiMultimodalLiveLLMService` and `OpenAIRealtimeBetaLLMService`.
- Added initial support for interruption strategies, which determine whether the user should be allowed to interrupt the bot while it is speaking. Interruption strategies can be based on factors such as audio volume or the number of words spoken by the user. These can be specified via the new `interruption_strategies` field in `PipelineParams`. A new `MinWordsInterruptionStrategy` strategy has been introduced which triggers an interruption if the user has spoken a minimum number of words. If no interruption strategies are specified, the normal interruption behavior applies. If multiple strategies are provided, the first one that evaluates to true will trigger the interruption.
- `BaseInputTransport` now handles `StopFrame`. When a `StopFrame` is received the transport will pause sending frames downstream until a new `StartFrame` is received. This allows the transport to be reused (keeping the same connection) in a different pipeline.
- Updated AssemblyAI STT service to support their latest streaming speech-to-text model with improved transcription latency and endpointing.
- You can now access STT service results through the new `TranscriptionFrame.result` and `InterimTranscriptionFrame.result` fields. This is useful if you use specific STT settings and want to access the STT results.
- The examples runner is now public from the `pipecat.examples` package. This allows everyone to build their own examples and run them easily.
- It is now possible to push `OutputDTMFFrame` or `OutputDTMFUrgentFrame` with `DailyTransport`. The keypresses will be sent properly if a Daily dial-out connection has been established.
- Added `OutputDTMFUrgentFrame` to send a DTMF keypress quickly. The previous `OutputDTMFFrame` queues the keypress with the rest of the data frames.
- Added `DTMFAggregator`, which aggregates keypad presses into `TranscriptionFrame`s. Aggregation occurs after a timeout, termination key press, or user interruption. You can specify the prefix of the `TranscriptionFrame`.
- Added new functions `DailyTransport.start_transcription()` and `DailyTransport.stop_transcription()` to start and stop Daily transcription dynamically (optionally with different settings).

### Changed

- Reverted the default model for `GeminiMultimodalLiveLLMService` back to `models/gemini-2.0-flash-live-001`. `gemini-2.5-flash-preview-native-audio-dialog` has inconsistent performance. You can opt in to using this model by setting the `model` arg.
- Function calls are now cancelled by default if there's an interruption. To disable this behavior you can set `cancel_on_interruption=False` when registering the function call. Since function calls are executed as tasks you can tell if a function call has been cancelled by catching the `asyncio.CancelledError` exception (and don't forget to raise it again!).
- Updated OpenTelemetry tracing attribute `metrics.ttfb_ms` to `metrics.ttfb`. The attribute reports TTFB in seconds.

### Deprecated

- `DailyTransport.send_dtmf()` is deprecated, push an `OutputDTMFFrame` or an `OutputDTMFUrgentFrame` instead.

### Fixed

- Fixed an issue with `ElevenLabsTTSService` where long responses would continue generating output even after an interruption.
- Fixed an issue with the `OpenAILLMContext` where non-Roman characters were being incorrectly encoded as Unicode escape sequences. This was a logging issue and did not impact the actual conversation.
- In `AWSBedrockLLMService`, worked around a possible bug in AWS Bedrock where a `toolConfig` is required if there has been previous tool use in the messages array.
  This workaround adds a no_op factory function call to satisfy the requirement.
- Fixed `WebsocketClientTransport` to use `FrameProcessorSetup.task_manager` instead of `StartFrame.task_manager`.

### Performance

- Use `uvloop` as the new event loop on Linux and macOS systems.

## [0.0.68] - 2025-05-28

### Added

- Added `GoogleHttpTTSService` which uses Google's HTTP TTS API.
- Added `TavusTransport`, a new transport implementation compatible with any Pipecat pipeline. When using the `TavusTransport` the Pipecat bot will connect in the same room as the Tavus Avatar and the user.
- Added `PlivoFrameSerializer` to support Plivo calls. A full running example has also been added to `examples/plivo-chatbot`.
- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency between when the user stops speaking and when the bot starts speaking. This gives you an initial idea of how quickly the AI services respond.
- Added `SarvamTTSService`, which implements Sarvam AI's TTS API: https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.
- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to allow managing observers at runtime. This is useful for cases where the task is passed around to other code components that might want to observe the pipeline dynamically.
- Added `user_id` field to `TranscriptionMessage`. This allows identifying the user in a multi-user scenario. Note that this requires that `TranscriptionFrame` has the `user_id` properly set.
- Added new `PipelineTask` event handlers `on_pipeline_started`, `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame` respectively.
- Added additional languages to `LmntTTSService`. Languages include: `hi`, `id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.
- Added a `model` parameter to the `LmntTTSService` constructor, allowing switching between LMNT models.
- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS. Learn more: https://www.minimax.io/platform_overview
- A new function `FrameProcessor.setup()` has been added to allow setting up frame processors before receiving a `StartFrame`. This is what happens internally: `FrameProcessor.setup()` is called, `StartFrame` is pushed from the beginning of the pipeline, your regular pipeline operations run, `EndFrame` or `CancelFrame` is pushed from the beginning of the pipeline, and finally `FrameProcessor.cleanup()` is called.
- Added support for OpenTelemetry tracing in Pipecat. This initial implementation includes:
  - A `setup_tracing` method where you can specify your OpenTelemetry exporter
  - Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS (`@traced_tts`) which trace the execution and collect properties and metrics (TTFB, token usage, character counts, etc.)
  - Class decorators that provide execution tracking; these are generic and can be used for service tracking as needed
  - Spans that help track traces on a per-conversation and per-turn basis:

    ```
    conversation-uuid
    ├── turn-1
    │   ├── stt_deepgramsttservice
    │   ├── llm_openaillmservice
    │   └── tts_cartesiattsservice
    ...
    └── turn-n
        └── ...
    ```

  By default, Pipecat has implemented service decorators to trace execution of STT, LLM, and TTS services. You can enable tracing by setting `enable_tracing` to `True` in the `PipelineTask`.
- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot turn pair and emits events `on_turn_started` and `on_turn_stopped` corresponding to the start and end of a turn, respectively.
- Allowed passing observers to `run_test()` while running unit tests.

### Changed

- Upgraded `daily-python` to 0.19.1.
- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle `on_client_disconnected`. Now, when the connection is closed and no reconnection is attempted, `on_client_disconnected` is called instead of `on_client_close`.
  The `on_client_close` callback is no longer used; use `on_client_disconnected` instead.
- Added a check for whether `PipelineTask` has already been cancelled.
- An exception is no longer raised if an event handler is not registered.
- Upgraded `deepgram-sdk` to 4.1.0.
- Updated `GoogleTTSService` to use Google's streaming TTS API. The default voice has also been updated to `en-US-Chirp3-HD-Charon`.
- ⚠️ Refactored the `TavusVideoService` so it acts like a proxy, sending audio to Tavus and receiving both audio and video. This makes `TavusVideoService` usable with any Pipecat pipeline and with any transport. This is a **breaking change**; check `examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.
- `DailyTransport` now uses custom microphone audio tracks instead of virtual microphones. Now, multiple Daily transports can be used in the same process.
- `DailyTransport` now captures audio from individual participants instead of the whole room. This allows identifying audio frames per participant.
- Updated the default model for `AnthropicLLMService` to `claude-sonnet-4-20250514`.
- Updated the default model for `GeminiMultimodalLiveLLMService` to `models/gemini-2.5-flash-preview-native-audio-dialog`.
- `BaseTextFilter` methods `filter()`, `update_settings()`, `handle_interruption()` and `reset_interruption()` are now async.
- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and `reset()` are now async.
- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has been updated. Also, the `cartesia` dependency has been updated to 2.x.
- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new `speed` parameter, which accepts values of `slow`, `normal`, and `fast`.
- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage metrics provided by Gemini Live.
- `GoogleLLMService` has been updated to use `google-genai` instead of the deprecated `google-generativeai`.
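
A toy aggregator helps illustrate the async `BaseTextAggregator` change. This is a minimal sketch under stated assumptions and **not** Pipecat's implementation. `SentenceAggregatorSketch`, `demo()` and `demo_interruption()` are hypothetical names. The only things taken from the entry are the async method names `aggregate()`, `handle_interruption()` and `reset()`. The sketch buffers streamed LLM text and returns a sentence once sentence-ending punctuation is followed by whitespace.

```python
import asyncio
import re
from typing import List, Optional


class SentenceAggregatorSketch:
    """Hypothetical stand-in for a text aggregator with async methods.

    Not Pipecat's BaseTextAggregator; it only mirrors the method names
    aggregate(), handle_interruption() and reset().
    """

    # Sentence boundary: ".", "!" or "?" followed by whitespace.
    _SENTENCE_END = re.compile(r"[.!?]\s")

    def __init__(self) -> None:
        self._buffer = ""

    async def aggregate(self, text: str) -> Optional[str]:
        """Append streamed text; return the first complete sentence, if any."""
        self._buffer += text
        match = self._SENTENCE_END.search(self._buffer)
        if match is None:
            return None
        sentence = self._buffer[: match.end()].strip()
        self._buffer = self._buffer[match.end():]
        return sentence

    async def handle_interruption(self) -> None:
        """Discard partially aggregated text when the user interrupts."""
        self._buffer = ""

    async def reset(self) -> None:
        self._buffer = ""


async def demo() -> List[str]:
    aggregator = SentenceAggregatorSketch()
    sentences = []
    for token in ["Hello", " world.", " How", " are", " you?", " Bye"]:
        sentence = await aggregator.aggregate(token)
        if sentence is not None:
            sentences.append(sentence)
    return sentences  # "Bye" stays buffered: no boundary has been seen yet


async def demo_interruption() -> Optional[str]:
    aggregator = SentenceAggregatorSketch()
    await aggregator.aggregate("Half a sent")
    await aggregator.handle_interruption()  # drops "Half a sent"
    return await aggregator.aggregate("New one. ")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['Hello world.', 'How are you?']
```

To keep the sketch short, it returns at most one sentence per call. A complete aggregator would also need to flush whatever remains in the buffer when the LLM response ends.
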
### Deprecated

- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been deprecated by Cartesia. Pipecat is following suit and deprecating `emotion` as well.

### Removed

- Since `GeminiMultimodalLiveLLMService` now transcribes its own audio, the `transcribe_user_audio` arg has been removed. Audio is now transcribed automatically.
- Removed the `SileroVAD` frame processor; use `SileroVADAnalyzer` instead. Also removed the `07a-interruptible-vad.py` example.

### Fixed

- Fixed a `DailyTransport` issue that was preventing video frames from being captured if the framerate was greater than zero.
- Fixed a `DeepgramSTTService` connection issue when the user provided their own `LiveOptions`.
- Fixed a `DailyTransport` issue that would cause images needing resize to block the event loop.
- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice while the service is running wasn't working.
- Fixed an issue that would cause multiple instances of the same class to behave incorrectly if any of the given constructor arguments defaulted to a mutable value (e.g. lists, dictionaries, objects).
- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't being emitted when the model was set to `sonic`. This resulted in the assistant context not being updated with assistant messages.

### Performance

- `DailyTransport` now processes audio, video and events in separate tasks.
- Event handler tasks are no longer created if no user event handlers have been registered.

### Other

- It is now possible to run all (or most) foundational examples with multiple transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try everything locally. You can also run them with Daily or even with a Twilio phone number.
- Added foundational examples `07y-interruptible-minimax.py` and `07z-interruptible-sarvam.py` to show how to use the `MiniMaxHttpTTSService` and `SarvamTTSService`, respectively.
- Added an `open-telemetry-tracing` example, showing how to set up tracing. The example also includes Jaeger as an open source OpenTelemetry client to review traces from the example runs.
- Added foundational example `29-turn-tracking-observer.py` to show how to use the `TurnTrackingObserver`.

## [0.0.67] - 2025-05-07

### Added

- Added `DebugLogObserver` for detailed frame logging with configurable filtering by frame type and endpoint. This observer automatically extracts and formats all frame data fields for debug logging.
- `UserImageRequestFrame.video_source` field has been added to request an image from the desired video source.
- Added support for the AWS Nova Sonic speech-to-speech model with the new `AWSNovaSonicLLMService`. See https://docs.aws.amazon.com/nova/latest/userguide/speech.html. Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.
- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.
- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.
- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in `ElevenLabsTTSService`.
- Added support to `RimeHttpTTSService` for the `arcana` model.

### Changed

- Updated `ElevenLabsTTSService` to use the beta websocket API (multi-stream-input). This new API supports context_ids and cancelling those contexts, which greatly improves interruption handling.
- Observers `on_push_frame()` now take a single argument `FramePushed` instead of multiple arguments.
- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.

### Deprecated

- `PollyTTSService` is now deprecated, use `AWSPollyTTSService` instead.
- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now deprecated, use `on_push_frame(data: FramePushed)` instead.

### Fixed

- Fixed a `DailyTransport` issue that was causing problems when multiple audio or video sources were being captured.
- Fixed an `UltravoxSTTService` issue that would cause the service to generate all tokens as one word.
- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if the task was cancelled from outside of Pipecat.
- Fixed a `TaskManager` issue that was causing dangling tasks to be reported.
- Fixed an issue that could cause data to be sent to the transports when they were not ready yet.
- `DailyTransport` now removes custom audio tracks before leaving.

### Removed

- Removed `CanonicalMetricsService` as it's no longer maintained.

## [0.0.66] - 2025-05-02

### Added

- Added two new input parameters to `RimeTTSService`: `pause_between_brackets` and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use `LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport implementation supports it (e.g. Daily's custom tracks). With multiple destinations it is possible to send different audio or video tracks with a single transport simultaneously. To do that, set the new `Frame.transport_destination` field to your desired transport destination (e.g. a custom track name), tell the transport you want a new destination with `TransportParams.audio_out_destinations` or `TransportParams.video_out_destinations`, and the transport should take care of the rest.
- Similar to the new `Frame.transport_destination`, there's a new `Frame.transport_source` field which is set by the `BaseInputTransport` if the incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This parameter will be used to update the `Frame.transport_destination` field for each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and `DailyTransportParams.microphone_out_enabled`, which allow you to enable/disable the main output camera or microphone tracks. This is useful if you only want to use custom tracks and not send the main tracks. Note that you still need `audio_out_enabled=True` or `video_out_enabled=True`.
- Added `DailyTransport.capture_participant_audio()` which allows you to capture an audio source (e.g. "microphone", "screenAudio" or a custom track name) from a remote participant.
- Added `DailyTransport.update_publishing()` which allows you to update the call video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams` which allows you to configure what RTVI messages are sent to the clients.
- Added a `context_window_compression` InputParam to `GeminiMultimodalLiveLLMService` which allows you to enable a sliding context window for the session as well as set the token limit of the sliding window.
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`, indicating when the VAD detected that the user started and stopped speaking. These events are helpful when using smart turn detection, as the user's stop time can differ from when their turn ends (signified by `UserStoppedSpeakingFrame`).
- Added `TranslationFrame`, a new frame type that contains a translated transcription.
- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming audio will be pushed downstream.
- Added `MCPClient`, a way to connect to MCP servers and use the MCP servers' tools.
- Added `Mem0 OSS` support: along with Mem0 cloud, the OSS version is now also available.

### Changed

- `TransportParams.audio_mixer` now supports a string and also a dictionary to provide a mixer per destination.
  For example:

  ```python
  audio_out_mixer={
      "track-1": SoundfileMixer(...),
      "track-2": SoundfileMixer(...),
      "track-N": SoundfileMixer(...),
  },
  ```

- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and `TranscriptionFrame`, which allows the `STTMuteFilter` to be used in conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single parameter `FunctionCallParams` instead of `(function_name, tool_call_id, args, llm, context, result_callback)`, which is now deprecated.
- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s (`LLMUserAggregatorParams.aggregation_timeout`). STT services sometimes produce more than one transcription, and some may arrive after the user has stopped speaking. These additional transcriptions should still be grouped with the first one because they are part of the same user turn, and this timeout controls how long to wait for them.
- Short utterances not detected by VAD while the bot is speaking are now ignored. This significantly reduces the number of bot interruptions, providing a more natural conversation experience.
- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a `translation` and `translation_config`.
- STT services now pass through audio frames by default. This allows you to add audio recording without worrying about what's wrong in your pipeline when it doesn't work the first time.
- Input transports now always push audio downstream unless disabled with `TransportParams.audio_in_passthrough`. After many Pipecat releases, we realized this is the common use case. There are use cases where the input transport already provides STT and you also don't want recordings, in which case there's no need to push audio to the rest of the pipeline, but this is not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such as "canary-1b-asr", to be used in Pipecat.
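
The following is a minimal asyncio sketch of the idea behind the `aggregation_timeout` change. After the user stops speaking, it keeps accepting transcriptions until none arrives within the timeout window. It is an illustration only. `collect_user_turn()` and `demo()` are hypothetical names and do not reflect how `LLMUserAggregatorParams` is implemented inside Pipecat.

```python
import asyncio
from typing import List

AGGREGATION_TIMEOUT = 0.5  # seconds; the new default described in the entry above


async def collect_user_turn(queue: asyncio.Queue, timeout: float = AGGREGATION_TIMEOUT) -> str:
    """Hypothetical helper: join transcriptions until `timeout` passes with none arriving."""
    parts: List[str] = []
    while True:
        try:
            parts.append(await asyncio.wait_for(queue.get(), timeout))
        except asyncio.TimeoutError:
            # No further (late) transcription arrived in time: treat the turn as complete.
            break
    return " ".join(parts)


async def demo(timeout: float) -> str:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put("I'd like to book")  # transcription available when the user stops

    async def late_transcription() -> None:
        await asyncio.sleep(0.2)  # simulated STT result arriving 200 ms later
        await queue.put("a table for two")

    producer = asyncio.create_task(late_transcription())
    turn = await collect_user_turn(queue, timeout)
    await producer
    return turn


if __name__ == "__main__":
    print(asyncio.run(demo(0.5)))  # I'd like to book a table for two
    print(asyncio.run(demo(0.1)))  # I'd like to book
```

A shorter window closes the turn sooner, which reduces latency. However, if the STT service is slow, a late fragment can end up in a separate turn. This setting controls that trade-off.
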
### Deprecated

- Function calls with parameters `(function_name, tool_call_id, args, llm, context, result_callback)` are deprecated, use a single `FunctionCallParams` parameter instead.
- `TransportParams.camera_*` parameters are now deprecated, use `TransportParams.video_*` instead.
- `TransportParams.vad_enabled` parameter is now deprecated, use `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.
- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use `TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses the model "magpie-tts-multilingual" by default.

### Fixed

- Fixed an issue with `SimliVideoService` where the bot was continuously outputting audio, which prevented the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling where the service returns a 500 error. Previously, this would cause an unhandled exception. Now, a 500 error is treated as an incomplete response.
- Fixed a TTS services issue that could cause assistant output not to be aggregated to the context when also using `TTSSpeakFrame`s.
- Fixed an issue where the `SmartTurnMetricsData` was reporting 0ms for inference and processing time when using the `FalSmartTurnAnalyzer`.

### Other

- Added `examples/daily-custom-tracks` to show how to send and receive Daily custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple simultaneous translations with the same transport.
- Added 04 foundational examples for client/server transports.
  Also, renamed `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
- Added foundational example `13c-gladia-translation.py` showing how to use `TranscriptionFrame` and `TranslationFrame`.

## [0.0.65] - 2025-04-23 "Sant Jordi's release" 🌹📕

https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia

### Added

- Added automatic hangup logic to the Telnyx serializer. This feature hangs up the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added a keepalive task to `GladiaSTTService` to prevent the websocket from disconnecting after 30 seconds of no audio input.

### Changed

- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` no longer require that `stability` and `similarity_boost` be set. You can individually set each param.
- In `TwilioFrameSerializer`, `call_sid` is optional so as to avoid a breaking change. `call_sid` is required to automatically hang up.

### Fixed

- Fixed an issue where `TwilioFrameSerializer` would send two hangup commands: one for the `EndFrame` and one for the `CancelFrame`.

## [0.0.64] - 2025-04-22

### Added

- Added automatic hangup logic to the Twilio serializer. This feature hangs up the Twilio call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics, to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction confidence scores and processing time metrics from the smart turn analyzers.
- Added support for Application Default Credentials in Google services: `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.
- Added support for Smart Turn Detection via the `turn_analyzer` transport parameter.
  You can now choose between `HttpSmartTurnAnalyzer()` and `FalSmartTurnAnalyzer()` for remote inference, or `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.
- `DeepgramTTSService` accepts the `base_url` argument again, allowing you to connect to an on-prem service.
- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams`, which allow you to control aggregator settings. You can now pass these arguments when creating aggregator pairs with `create_context_aggregator()`.
- Added `previous_text` context support to `ElevenLabsHttpTTSService`, improving speech consistency across sentences within an LLM response.
- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.
- It is now possible to disable `SoundfileMixer` when created. You can then use `MixerEnableFrame` to dynamically enable it when necessary.
- Added `on_client_connected` and `on_client_disconnected` event handlers to the `DailyTransport` class. These handlers map to the same underlying Daily events as `on_participant_joined` and `on_participant_left`, respectively. This makes it easier to write a single bot pipeline that can also use other transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.

### Changed

- `GrokLLMService` now uses `grok-3-beta` as its default model.
- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects the user when their token expires. This new parameter defaults to False. Also, the default value for `enable_prejoin_ui` changed to False and `eject_at_room_exp` changed to False.
- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their default model.
- `SoundfileMixer` constructor arguments need to be keywords.

### Deprecated

- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url` instead.

### Removed

- Parameters `user_kwargs` and `assistant_kwargs` when creating a context aggregator pair using `create_context_aggregator()` have been removed.
  Use `user_params` and `assistant_params` instead.

### Fixed

- Fixed an issue that would cause TTS websocket-based services to not clean up resources properly when disconnecting.
- Fixed a `TavusVideoService` issue that was causing audio choppiness.
- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the client did not create a video transceiver.
- Fixed an issue where LLM input parameters were not being applied correctly in `GoogleVertexLLMService`, causing unexpected behavior during inference.

### Other

- Updated the `twilio-chatbot` example to use the auto-hangup feature.

## [0.0.63] - 2025-04-11

### Added

- Added media resolution control to `GeminiMultimodalLiveLLMService` with the `GeminiMediaResolution` enum, allowing configuration of token usage for image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing with 256 tokens).
- Added Gemini's Voice Activity Detection (VAD) configuration to `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine control over speech detection sensitivity and timing, including:
  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`, supporting over 30 languages via the `language` parameter, with proper mapping between Pipecat's `Language` enum and Gemini's language codes.
- Added support in `SmallWebRTCTransport` to detect when remote tracks are muted.
- Added support for image capture from a video stream to the `SmallWebRTCTransport`.
- Added a new iOS client option to the `SmallWebRTCTransport` **video-transform** example.
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The producer processor processes frames from the pipeline and decides, for each frame, whether the consumers should consume it or not.
  If so, the same frame that is received by the producer is sent to the consumer. There can be multiple consumers per producer. These processors can be useful to push frames from one part of a pipeline to a different one (e.g. when using `ParallelPipeline`).
- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.

### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio` parameter in favor of Gemini Live's native output transcription support. Now text transcriptions are produced directly by the model. No configuration is required.
- Updated `GeminiMultimodalLiveLLMService`'s default `model` to `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket URL.

### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that prevented it from running on older platforms.
- Fixed an issue where `CartesiaTTSService`'s spell feature would result in the spelled word in the context appearing as "F,O,O,B,A,R" instead of "FOOBAR".
- Fixed an issue in the Azure TTS services where the language was being set incorrectly.
- Fixed `SmallWebRTCTransport` to support dynamic values for `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms chunks.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant context messages had no space between words.
- Fixed an issue where `LLMAssistantContextAggregator` would prevent a `BotStoppedSpeakingFrame` from moving through the pipeline.

## [0.0.62] - 2025-04-01 "An April Fools' release"

### Added

- Added `TransportParams.audio_out_10ms_chunks` parameter to allow controlling the amount of audio being sent by the output transport. It defaults to 4, so 40ms audio chunks are sent.
- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible interface. Added foundational example `14q-function-calling-qwen.py`.
- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/.
- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon. See example in `examples/foundational/13e-whisper-mlx.py`. Latency of completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms.
- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.
  - Created two examples in `p2p-webrtc`:
    - **video-transform**: Demonstrates sending and receiving audio/video with `SmallWebRTCTransport` using `TypeScript`. Includes video frame processing with OpenCV.
    - **voice-agent**: A minimal example of creating a voice agent with `SmallWebRTCTransport`.
- `GladiaSTTService` now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options.
- Added support to `ProtobufFrameSerializer` to send the messages from `TransportMessageFrame` and `TransportMessageUrgentFrame`.
- Added support for a new TTS service, `PiperTTSService` (see https://github.com/rhasspy/piper/).
- It is now possible to tell whether `UserStartedSpeakingFrame` or `UserStoppedSpeakingFrame` have been generated because of emulation frames.

### Changed

- `FunctionCallResultFrame`s are now system frames. This prevents function call results from being discarded during interruptions.
- Pipecat services have been reorganized into packages.
  Each package can have one or more of the following modules (in the future new module names might be needed) depending on the services implemented:
  - `image`: for image generation services
  - `llm`: for LLM services
  - `memory`: for memory services
  - `stt`: for Speech-To-Text services
  - `tts`: for Text-To-Speech services
  - `video`: for video generation services
  - `vision`: for video recognition services
- Base classes for AI services have been reorganized into modules. They can now be found in `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- `GladiaSTTService` now uses the `solaria-1` model by default. Other params use Gladia's default values. Added support for more language codes.

### Deprecated

- All old Pipecat service import paths have been deprecated, and a warning will be shown when using an old import. The new import should be `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For example, `from pipecat.services.openai.llm import OpenAILLMService`.
- Importing AI service base classes from `pipecat.services.ai_services` is now deprecated, use one of `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in favor of `language_config`, which better aligns with Gladia's API.
- Deprecated using `GladiaSTTService.InputParams` directly. Use the new `GladiaInputParams` class instead.

### Fixed

- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that would cause the transport to be closed prematurely, preventing the internally queued audio from being sent. The same issue could also cause an infinite loop while using an output mixer and when sending an `EndFrame`, preventing the bot from finishing.
- Fixed an issue that could cause the `TranscriptionUpdateFrame` being pushed because of an interruption to be discarded.
- Fixed an issue that would cause `SegmentedSTTService` based services (e.g.
`OpenAISTTService`) to try to transcribe non-spoken audio, causing invalid transcriptions. - Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrames`. ### Performance - Output transports now send 40ms audio chunks instead of 20ms. This should improve performance. - `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio chunks are higher than 200ms then they will be sent at every audio chunk. ### Other - Added foundational example `37-mem0.py` demonstrating how to use the `Mem0MemoryService`. - Added foundational example `13e-whisper-mlx.py` demonstrating how to use the `WhisperSTTServiceMLX`. ## [0.0.61] - 2025-03-26 ### Added - Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism for modifying the `tool_choice` in the context. - Added `GroqTTSService` which provides text-to-speech functionality using Groq's API. - Added support in `DailyTransport` for updating remote participants' `canReceive` permission via the `update_remote_participants()` method, by bumping the daily-python dependency to >= 0.16.0. - ElevenLabs TTS services now support a sample rate of 8000. - Added support for `instructions` in `OpenAITTSService`. - Added support for `base_url` in `OpenAIImageGenService` and `OpenAITTSService`. ### Fixed - Fixed an issue in `RTVIObserver` that prevented handling of Google LLM context messages. The observer now processes both OpenAI-style and Google-style contexts. - Fixed an issue in Daily involving switching virtual devices, by bumping the daily-python dependency to >= 0.16.1. - Fixed a `GoogleAssistantContextAggregator` issue where function calls placeholders where not being updated when then function call result was different from a string. - Fixed an issue that would cause `LLMAssistantContextAggregator` to block processing more frames while processing a function call result. - Fixed an issue where the `RTVIObserver` would report two bot started and stopped speaking events for each bot turn. 
- Fixed an issue in `UltravoxSTTService` that caused improper audio processing and incorrect LLM frame output.

### Other

- Added `examples/foundational/07x-interruptible-local.py` to show how a local transport can be used.

## [0.0.60] - 2025-03-20

### Added

- Added `default_headers` parameter to `BaseOpenAILLMService` constructor.

### Changed

- Rolled back to `deepgram-sdk` 3.8.0 since 3.10.1 was causing connection issues.
- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe` for `OpenAIRealtimeBetaLLMService`.

### Other

- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py` examples to use the FunctionSchema format.

## [0.0.59] - 2025-03-20

### Added

- When registering a function call it is now possible to indicate whether the function call should be cancelled on user interruption via `cancel_on_interruption` (defaults to False). This is now possible because function calls are executed concurrently.
- Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the `PipelineTask` will be automatically cancelled. It is possible to override this behavior by passing `cancel_on_idle_timeout=False`. It is also possible to change the default timeout with `idle_timeout_secs` or the frames that prevent the pipeline from being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event handler will be triggered if the idle timeout is reached (whether the pipeline task is cancelled or not).
- Added `FalSTTService`, which provides STT for Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services as well as an `on_connection_error` event handler. The `reconnect_on_error` parameter indicates whether the TTS service should reconnect on error. The `on_connection_error` handler is always called on error, regardless of the value of `reconnect_on_error`. This allows, for example, falling back to a different TTS provider if something goes wrong with the current one.
- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate text while skipping end-of-sentence matching when the aggregated text is between start/end tags.
- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content like XML-style tags that may span across multiple text chunks or sentence boundaries.
- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow the text to be manipulated while it's being aggregated. A text aggregator can be passed via `text_aggregator` to the TTS service.
- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow changing the output sample rate.
- Added new `NeuphonicTTSService`. (see https://neuphonic.com)
- Added new `UltravoxSTTService`. (see https://github.com/fixie-ai/ultravox)
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event handlers to `PipelineTask`. Those events will be called when a frame reaches the beginning or end of the pipeline respectively. Note that by default, the event handlers will not be called unless a filter is set with `PipelineTask.set_reached_upstream_filter()` or `PipelineTask.set_reached_downstream_filter()`.
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method for `GoogleSTTService`, allowing you to set a single language. This is in addition to the `set_languages` method which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to `AudioBufferProcessor`. This gives the ability to grab the audio of only that turn for both the user and the bot.
- Added new base class `BaseObject` which is now the base class of `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The new `BaseObject` adds support for event handlers.
- Added support for a unified format for specifying function calling across all LLM services.

  ```python
  from pipecat.adapters.schemas.function_schema import FunctionSchema
  from pipecat.adapters.schemas.tools_schema import ToolsSchema

  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather_function])
  ```

- Added `speech_threshold` parameter to `GladiaSTTService`.
- Allowed passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context aggregator parameters when using `create_context_aggregator()`. The values are passed as a mapping that will then be converted to arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At every completion the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message `RTVIServerMessage`, which provide a generic mechanism for sending custom messages from server to client. The `RTVIServerMessageFrame` is processed by the `RTVIObserver` and will be delivered to the client's `onServerMessage` callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an OpenAI-compatible interface. Added foundational example `14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI Gemini models. Added foundational example `14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:
  - The `'gpt-4o-transcribe'` input audio transcription model, along with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        )
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated` events to `OpenAIRealtimeBetaLLMService`.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```

### Changed

- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means that the pipeline will not be blocked while the function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior, see `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks in order to prevent blocking the pipeline. Previously, an event handler that took some time to execute would block the pipeline until it completed.
- Updated `TranscriptProcessor` to support text output from `OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push a `TTSTextFrame`.
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` to `sonic-2`.

### Deprecated

- Passing a `start_callback` to `LLMService.register_function()` is now deprecated; simply move the code from the start callback to the function call.
- `TTSService` parameter `text_filter` is now deprecated; use `text_filters` instead, which is now a list. This allows passing multiple filters that will be executed in order.

### Removed

- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()` instead.
- Removed deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors, use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model` instead.
- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.

### Fixed

- Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that was causing assistant text to not be added to the context during function calls. This could lead to duplications.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service, we now rely on VAD events, which use both VAD and volume.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService` based services (e.g. `GroqSTTService`) that was not allowing audio to pass through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider text between spelling-out tags as an end of sentence.
- Fixed a `match_endofsentence` issue that would cause floating point numbers to be considered an end of sentence.
- Fixed a `match_endofsentence` issue that would cause emails to be considered an end of sentence.
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an `EndFrame`, resulting in the pipeline not shutting down. It now pushes an `EndTaskFrame` upstream to shut down the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during periods of inactivity were causing connection failures. The service now properly detects timeout errors and handles reconnection gracefully, ensuring continuous operation even after periods of silence or when using an `STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't result in an audio output being generated.
- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message, which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.

### Other

- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.
- Added a new Ultravox example `examples/foundational/07u-interruptible-ultravox.py`.
- Added new Neuphonic examples `examples/foundational/07v-interruptible-neuphonic.py` and `examples/foundational/07v-interruptible-neuphonic-http.py`.
- Added a new example `examples/foundational/36-user-email-gathering.py` to show how to gather user emails. The example uses Cartesia's `` tags and Rime's `spell()` function to spell out the emails for confirmation.
- Updated the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new `PatternPairAggregator`. This example shows how to encode information for the LLM to instruct TTS voice changes, but the same approach can be used to encode any information into the LLM response that you want to parse and use in other parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c as the `TranscriptProcessor` no longer has an LLM dependency. Renamed foundational example 28a to `28-transcript-processor.py`.

## [0.0.58] - 2025-02-26

### Added

- Added track-specific audio event `on_track_audio_data` to `AudioBufferProcessor` for accessing separate input and output audio tracks.
- The Pipecat version will now be logged on every application startup. This will help us identify which version is running in case of any issues.
- Added a new `StopFrame` which can be used to stop a pipeline task while keeping the frame processors running. The frame processors could then be used in a different pipeline. The difference between a `StopFrame` and a `StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the `StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream inside the pipeline by any processor.
- Added a new `PipelineTask` parameter `observers` that replaces the previous `PipelineParams.observers`.
- Added a new `PipelineTask` parameter `check_dangling_tasks` to enable or disable checking for frame processors' dangling tasks when the Pipeline finishes running.
- Added new `on_completion_timeout` event for LLM services (all OpenAI-based services, Anthropic and Google). Note that this event will only get triggered if LLM timeouts are set up and the timeout was reached. It can be useful to retrigger another completion and see if the timeout was just a blip.
- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that can be useful for debugging your pipelines.
- Added `room_url` property to `DailyTransport`.
- Added `addons` argument to `DeepgramSTTService`.
- Added `exponential_backoff_time()` to the `utils.network` module.

### Changed

- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one, the pipeline).
- Updated `PlayHTHttpTTSService` to take a `voice_engine` and `protocol` input in the constructor. The previous method of providing a `voice_engine` input that contains both the engine and protocol is deprecated by PlayHT.
- The base `TTSService` class now strips leading newlines before sending text to the TTS provider. This change solves issues where some TTS providers, like Azure, would not output text due to newlines.
- `GrokLLMService` now uses `grok-2` as the default model.
- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default model.
- `RimeHttpTTSService` needs an `aiohttp.ClientSession` to be passed to the constructor, like all the other HTTP-based services.
- `RimeHttpTTSService` doesn't use a default voice anymore.
- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want to use the previous model you can pass `LiveOptions(model="nova-2-general")`. (see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)

  ```python
  stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
  ```

### Deprecated

- `PipelineParams.observers` is now deprecated; use the new `PipelineTask` parameter `observers`.

### Removed

- Removed `TransportParams.audio_out_is_live` since it was not being used at all.

### Fixed

- Fixed an issue that would cause undesired interruptions via `EmulateUserStartedSpeakingFrame`.
- Fixed a `GoogleLLMService` issue that was causing an exception when sending inline audio in some cases.
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to disconnect from the TTS service before audio from all the contexts was received. This affected services like Cartesia and Rime.
- Fixed an issue that was preventing an `OpenAILLMContext` from being passed to create `GoogleLLMService`'s context aggregators.
- Fixed an `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService` and `PlayHTTTSService` issue that was causing audio requested before an interruption to be played after the interruption.
- Fixed `match_endofsentence` support for ellipses.
- Fixed an issue where `EndTaskFrame` was not triggering `on_client_disconnected` or closing the WebSocket in FastAPI.
- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the `LiveOptions` was not being used, causing the service to use the default sample rate of the pipeline.
- Fixed a context aggregator issue that would not append the LLM text response to the context if a function call happened in the same LLM turn.
- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame` more than once.
- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being pushed.
- Fixed an issue where `start_callback` was not invoked for some LLM services.
- Fixed an issue that would cause `DeepgramSTTService` to stop working after an error occurred (e.g. sudden network loss). If the network recovered, we would not reconnect.
- Fixed an `STTMuteFilter` issue that would not mute user audio frames, causing transcriptions to be generated by the STT service.

### Other

- Added Gemini support to `examples/phone-chatbot`.
- Added foundational example `34-audio-recording.py` showing how to use the `AudioBufferProcessor` callbacks to save merged and track recordings.

## [0.0.57] - 2025-02-14

### Added

- Added new `AudioContextWordTTSService`. This is a TTS base class for TTS services that handle multiple separate audio requests.
- Added new frames `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` which can be used to emulate VAD behavior when VAD is not present or is not being triggered.
- Added a new `audio_in_stream_on_start` field to `TransportParams`.
- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.
  - This method should be used to start receiving the input audio in case the field `audio_in_stream_on_start` is set to `false`.
- Added support for the `RTVIProcessor` to handle buffered audio in `base64` format, converting it into `InputAudioRawFrame` for transport.
- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming` only after the `client-ready` message.
- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This strategy starts muted and remains muted until the first bot speech completes, ensuring the bot's first response cannot be interrupted. This complements the existing `FIRST_SPEECH` strategy which only mutes during the first detected bot speech.
- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.
- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational example `07q-interruptible-rime.py` to use `RimeTTSService`.
- Added support for Groq's Whisper API through the new `GroqSTTService` and OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new base class `BaseWhisperSTTService` to handle common Whisper API functionality.
- Added `PerplexityLLMService` for Perplexity NIM API integration, with an OpenAI-compatible interface. Also, added foundational example `14n-function-calling-perplexity.py`.
- Added `DailyTransport.update_remote_participants()`. This allows you to update remote participants' settings, like their permissions or which of their devices are enabled. Requires that the local participant have participant admin permission.

### Changed

- A colon `:` is no longer considered an end of sentence.
- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field, ensuring it only starts receiving the audio input if it is enabled.
- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and `TransportMessageUrgentFrame` to the serializer.
- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and `TransportMessageUrgentFrame` to the serializer.
- Enhanced `STTMuteConfig` to validate strategy combinations, preventing `MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together as they handle first bot speech differently.
- Updated foundational example `07n-interruptible-google.py` to use all Google services.
- `RimeHttpTTSService` now uses the `mistv2` model by default.
- Improved error handling in `AzureTTSService` to properly detect and log synthesis cancellation errors.
- Enhanced `WhisperSTTService` with full language support and improved model documentation.
- Updated foundational example `14f-function-calling-groq.py` to use `GroqSTTService` for transcription.
- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default model.
- `RTVIObserver` doesn't handle `LLMSearchResponseFrame` frames anymore. For now, to handle those frames you need to create a `GoogleRTVIObserver` instead.

### Deprecated

- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and will be removed in a future version. The filter now manages mute state internally instead of querying the STT service.
- `RTVI.observer()` is now deprecated, instantiate an `RTVIObserver` directly instead.
- All RTVI frame processors (e.g. `RTVISpeakingProcessor`, `RTVIBotLLMProcessor`) are now deprecated, instantiate an `RTVIObserver` instead.

### Fixed

- Fixed a `FalImageGenService` issue that was causing the event loop to be blocked while loading the downloaded image.
- Fixed a `CartesiaTTSService` issue that would cause audio overlapping in some cases.
- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was preventing a reconnection after the server disconnected cleanly, causing an infinite loop instead.
- Fixed a `BaseOutputTransport` issue that was causing upstream frames to not be pushed upstream.
- Fixed multiple issues where user transcriptions were not being handled properly. It was possible for short utterances to not trigger VAD, which would cause user transcriptions to be ignored. It was also possible for one or more transcriptions to be generated after VAD, in which case they would also be ignored.
- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too late. This could then cause `STTMuteFilter` to be unblocked later than desired.
- Fixed an issue that was causing `AudioBufferProcessor` to not record synchronized audio.
- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent before being processed by the output transport.
- Fixed an issue (#1192) in ElevenLabs where we were trying to reconnect/disconnect the websocket connection even when the connection was already closed.
- Fixed an issue where the `has_regular_messages` condition was always true in `GoogleLLMContext` due to `Part` having `function_call` and `function_response` with `None` values.

### Other

- Added new `instant-voice` example. This example showcases how to enable instant voice communication as soon as a user connects.
- Added new `local-input-select-stt` example. This example allows you to play with local audio inputs by selecting them through a nice text interface.

## [0.0.56] - 2025-02-06

### Changed

- `GoogleLLMService` now uses `gemini-2.0-flash-001` as the default model.
- Improved foundational examples 22b, 22c, and 22d to support function calling. With these base examples, `FunctionCallInProgressFrame` and `FunctionCallResultFrame` will no longer be blocked by the gates.

### Fixed

- Fixed a `TkLocalTransport` and `LocalAudioTransport` issue that was causing errors on cleanup.
- Fixed an issue that was causing the `tests.utils` import to fail because of logging setup.
- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent to Sentry and was also preventing metrics frames from being pushed to the pipeline.
- Fixed an issue in `BaseOutputTransport` where incoming audio would not be resampled to the desired output sample rate.
- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer` where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly initialized to `audio_in_sample_rate`. Those values currently default to 8000 and should be set manually from the serializer constructor if a different value is needed.

### Other

- Added a new `sentry-metrics` example.

## [0.0.55] - 2025-02-05

### Added

- Added a new `start_metadata` field to `PipelineParams`. The provided metadata will be set on the initial `StartFrame` pushed from the `PipelineTask`.
- Added new fields to `PipelineParams` to control audio input and output sample rates for the whole pipeline. This allows controlling sample rates from a single place instead of having to specify sample rates in each service. Setting a sample rate on a service is still possible and will override the value from `PipelineParams`.
- Introduced audio resamplers (`BaseAudioResampler`). This is just a base class to implement audio resamplers. Currently, two implementations are provided: `SOXRAudioResampler` and `ResampyResampler`. A new `create_default_resampler()` has been added (replacing the now deprecated `resample_audio()`).
- It is now possible to specify the asyncio event loop that a `PipelineTask` and all the processors should run on by passing it as a new argument to the `PipelineRunner`. This could allow running pipelines in multiple threads, each with its own event loop.
- Added a new `utils.TaskManager`. Instead of a global task manager, we now have a task manager per `PipelineTask`. In the previous version the task manager was global, so running multiple simultaneous `PipelineTask`s could result in spurious dangling task warnings. In order for all the processors to know about the task manager, we pass it through the `StartFrame`. This means that processors should create tasks when they receive a `StartFrame` but not before (because they don't have a task manager yet).
- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example has also been added to `examples/telnyx-chatbot`.
- Allowed pushing silent audio frames before `TTSStoppedFrame`. This might be useful for testing purposes, for example, passing bot audio to an STT service, which usually needs additional audio data to detect that the utterance stopped.
- `TwilioSerializer` now supports transport message frames. With this we can create Twilio emulators.
- Added a new transport: `WebsocketClientTransport`.
- Added a `metadata` field to `Frame` which makes it possible to pass custom data to all frames.
- Added `test/utils.py` inside the pipecat package.

### Changed

- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new `start_open` argument has been added to set the initial state of the gate.
- Added `organization` and `project` level authentication to `OpenAILLMService`.
- Improved the language checking logic in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` to properly handle language codes based on model compatibility, with appropriate warnings when language codes cannot be applied.
- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that contain a combination of function calls, function call responses, system messages, or just messages.
- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new `OutputDTMFFrame` frame.

### Deprecated

- `resample_audio()` is now deprecated, use `create_default_resampler()` instead.

### Removed

- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use `AudioBufferProcessor.start_recording()` and `AudioBufferProcessor.stop_recording()` instead.

### Fixed

- Fixed an `AudioBufferProcessor` issue that would cause crackling in some recordings.
- Fixed an issue in `AudioBufferProcessor` where the user callback would not be called on task cancellation.
- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence padding in some cases.
- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009 websocket error by increasing the max message size limit to 16MB.
- Fixed a `DailyTransport` issue that would cause events to be triggered before join finished.
- Fixed a `PipelineTask` issue that was preventing processors from being cleaned up after cancelling the task.
- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not cause the task to finish. However, using `PipelineTask.cancel()` is still the recommended way to cancel a task.

### Other

- Improved the unit test helper `run_test()` to use `PipelineTask` and `PipelineRunner`. There's now also some control around `StartFrame` and `EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem necessary with this new approach.
- Updated `twilio-chatbot` with a few new features: it uses an 8000 Hz sample rate to avoid resampling, and it includes a new client useful for stress testing and for testing locally without the need to make phone calls. Also, added audio recording on both the client and the server to make sure the audio sounds good.
- Updated examples to use `task.cancel()` to immediately exit the example when a participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing an `EndFrame` causes the bot to run through everything that is internally queued (which could take some seconds). Note that using `task.cancel()` might not always be the best option, and pushing an `EndFrame` could still be desirable to make sure the whole pipeline is flushed.

## [0.0.54] - 2025-01-27

### Added

- In order to create tasks in Pipecat frame processors it is now recommended to use `FrameProcessor.create_task()` (which uses the new `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task cancellation handling and task management. To cancel or wait for a task there is `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All Pipecat processors have been updated accordingly. Also, when a pipeline runner finishes, a warning about dangling tasks might appear, which indicates that one of the created tasks was never cancelled or awaited (using these new functions).
- It is now possible to specify the period of the `PipelineTask` heartbeat frames with `heartbeats_period_secs`.
- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models for meeting token creation in the `get_token` method of `DailyRESTHelper`.
- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.
- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings to a custom AWS bucket.

### Changed

- Enhanced `UserIdleProcessor` with retry functionality and control over idle monitoring via the new callback signature `(processor, retry_count) -> bool`. Updated `17-detect-user-idle.py` to show how to use the `retry_count`.
- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio truncation. Audio truncation errors during interruptions now log a warning and allow the session to continue instead of throwing an exception.
- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant transcripts. Assistant messages are now aggregated based on bot speaking boundaries rather than LLM context, providing better handling of interruptions and partial utterances.
- Updated foundational examples `28a-transcription-processor-openai.py`, `28b-transcript-processor-anthropic.py`, and `28c-transcription-processor-gemini.py` to use the updated `TranscriptProcessor`.

### Fixed

- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).
- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`, `ParallelPipeline` and `UserIdleProcessor`.
- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted in an error.
- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not being processed in cases where `_user_audio_buffer` was smaller than the buffer size.

### Performance

- Replaced the audio resampling library `resampy` with `soxr`. Resampling a 2:21 audio file from 24 kHz to 16 kHz took 1.41 s with `resampy` and 0.031 s with `soxr`, with similar audio quality.

### Other

- Added initial unit test infrastructure.

## [0.0.53] - 2025-01-18

### Added

- Added `ElevenLabsHttpTTSService`, which uses ElevenLabs' HTTP API instead of the websocket one.
- Introduced pipeline frame observers. Observers can view all the frames that go through the pipeline without the need to inject processors in the pipeline. This can be useful, for example, to implement frame loggers or debuggers. The example `examples/foundational/30-observer.py` shows how to add an observer to a pipeline for debugging.
- Introduced heartbeat frames. The pipeline task can now push periodic heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are system frames that are supposed to make it all the way to the end of the pipeline. When a heartbeat frame is received, the traversal time (i.e. the time it took to go through the whole pipeline) is displayed (with TRACE logging); otherwise, a warning is shown. The example `examples/foundational/31-heartbeats.py` shows how to enable heartbeats and forces warnings to be displayed.
- Added `LLMTextFrame` and `TTSTextFrame`, which should be pushed by LLM and TTS services respectively instead of `TextFrame`s.
- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible interface. Added foundational example `14m-function-calling-openrouter.py`.
- Added a new `WebsocketService` base class for TTS services, containing base functions and retry logic.
- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible interface. Added foundational example `14l-function-calling-deepseek.py`.
- Added `FunctionCallResultProperties` dataclass to provide a structured way to control function call behavior, including:
  - `run_llm`: Controls whether to trigger LLM completion
  - `on_context_updated`: Optional callback triggered after context update
- Added a new foundational example `07e-interruptible-playht-http.py` for easy testing of `PlayHTHttpTTSService`.
- Added support for Google TTS Journey voices in `GoogleTTSService`.
- Added `29-livekit-audio-chat.py` as a new foundational example for `LiveKitTransportLayer`.
- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params to `DailyRoomProperties`.
- Added `session_timeout` to `FastAPIWebsocketTransport` and `WebsocketServerTransport` for configuring session timeouts (in seconds). When the timeout is reached, `on_session_timeout` is triggered for custom timeout handling. See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).
- Added the new modalities option and a helper function to set Gemini output modalities.
- Added `examples/foundational/26d-gemini-live-text.py`, which uses Gemini with the TEXT modality and another provider for TTS.

### Changed

- Modified `UserIdleProcessor` to start monitoring only after the first conversation activity (`UserStartedSpeakingFrame` or `BotStartedSpeakingFrame`) instead of immediately.
- Modified `OpenAIAssistantContextAggregator` to support controlled completions and to emit context update callbacks via `FunctionCallResultProperties`.
- Added `aws_session_token` to `PollyTTSService`.
- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.
- `api_key`, `aws_access_key_id` and `region` are no longer required parameters for `PollyTTSService` (`AWSTTSService`).
- Added a `session_timeout` example in `examples/websocket-server/bot.py` to handle the session timeout event.
- Changed `InputParams` in `src/pipecat/services/gemini_multimodal_live/gemini.py` to support different modalities.
- Changed `DeepgramSTTService` to send a `finalize` event whenever VAD detects `UserStoppedSpeakingFrame`. This results in faster transcriptions and clears the Deepgram audio buffer.

### Fixed

- Fixed an issue where `DeepgramSTTService` was not generating metrics using the pipeline's VAD.
- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the pipeline.
- Fixed an issue where websocket-based TTS services could incorrectly terminate their connection due to a retry counter not resetting.
- Fixed a `PipelineTask` issue that would cause a dangling task after stopping the pipeline with an `EndFrame`.
- Fixed an import issue for `PlayHTHttpTTSService`.
- Fixed an issue where languages couldn't be used with `PlayHTHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting an error when truncating audio content.
- Fixed an issue where setting the voice and model for `RimeHttpTTSService` wasn't working.
- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting initialized before the start of the pipeline.

## [0.0.52] - 2024-12-24

### Added

- Added constructor arguments to `GoogleLLMService` to directly set `tools` and `tool_config`.
- Added a smart turn detection example (`22d-natural-conversation-gemini-audio.py`) that leverages Gemini 2.0 capabilities. (see https://x.com/kwindla/status/1870974144831275410)
- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.
- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to another address or number.
  For example, transfer a SIP call to a different SIP address or transfer a PSTN phone number to a different PSTN phone number.
- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from outside Daily to another SIP/PSTN address.
- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode` is set to `True` by default. Enabling this setting disables the chunk schedule and all buffers, which reduces latency.
- Added `KoalaFilter`, which implements on-device noise reduction using Koala Noise Suppression. (see https://picovoice.ai/platform/koala/)
- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible interface. Added foundational example `14k-function-calling-cerebras.py`.
- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package, which was deprecated and has been removed in Python 3.13. We now use `audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same functionality.
- Added timestamped conversation transcript support:
  - New `TranscriptProcessor` factory provides access to user and assistant transcript processors.
  - `UserTranscriptProcessor` processes user speech with timestamps from transcription.
  - `AssistantTranscriptProcessor` processes assistant responses with LLM context timestamps.
  - Messages are emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via a standard message format.
  - New examples: `28a-transcription-processor-openai.py`, `28b-transcription-processor-anthropic.py`, and `28c-transcription-processor-gemini.py`.
- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino, Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).

### Changed

- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue where text sent to the TTS service didn't return audio.
- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.
- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the constructor.
- Updated the default model for `OpenAIRealtimeBetaLLMService`.
- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by default instead of automatically being set to 5 minutes. You must set an expiration time explicitly if you want one.

### Deprecated

- `AWSTTSService` is now deprecated; use `PollyTTSService` instead.

### Fixed

- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly (double-counted in many cases).
- Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.
- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame` incorrectly, causing the main pipeline to not terminate or to terminate too early.
- Fixed an audio stuttering issue in `FastPitchTTSService`.
- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be processed before the previous audio frames were played. For example, if a frame `A` is sent after a `TTSSpeakFrame`, `A` will only be pushed downstream after the audio generated from the `TTSSpeakFrame` has been spoken.
- Fixed a `DeepgramSTTService` issue that was causing the language to be passed as an object instead of a string, which made the connection fail.

## [0.0.51] - 2024-12-16

### Fixed

- Fixed an issue in websocket-based TTS services that was causing infinite reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).

## [0.0.50] - 2024-12-11

### Added

- Added `GeminiMultimodalLiveLLMService`.
  This is an integration for Google's Gemini Multimodal Live API, supporting:
  - Real-time audio and video input processing
  - Streaming text responses with TTS
  - Audio transcription for both user and bot speech
  - Function calling
  - System instructions and context management
  - Dynamic parameter updates (temperature, top_p, etc.)
- Added `AudioTranscriber` utility class for handling audio transcription with Gemini models.
- Added new context classes for Gemini:
  - `GeminiMultimodalLiveContext`
  - `GeminiMultimodalLiveUserContextAggregator`
  - `GeminiMultimodalLiveAssistantContextAggregator`
  - `GeminiMultimodalLiveContextAggregatorPair`
- Added new foundational examples for `GeminiMultimodalLiveLLMService`:
  - `26-gemini-multimodal-live.py`
  - `26a-gemini-live-transcription.py`
  - `26b-gemini-live-video.py`
  - `26c-gemini-live-video.py`
- Added `SimliVideoService`. This is an integration for Simli AI avatars. (see https://www.simli.com)
- Added NVIDIA Riva's `FastPitchTTSService` and `ParakeetSTTService`. (see https://www.nvidia.com/en-us/ai-data-science/products/riva/)
- Added `IdentityFilter`. This is the simplest frame filter: it lets through all incoming frames.
- Added a new `STTMuteStrategy` called `FUNCTION_CALL`, which mutes the STT service during LLM function calls.
- `DeepgramSTTService` now exposes two event handlers, `on_speech_started` and `on_utterance_end`, that can be used to implement interruptions. See the new example `examples/foundational/07c-interruptible-deepgram-vad.py`.
- Added `GroqLLMService`, `GrokLLMService`, and `NimLLMService` for Groq, Grok, and NVIDIA NIM API integration, with an OpenAI-compatible interface.
- New examples demonstrating function calling with Groq, Grok, Azure OpenAI, Fireworks, and NVIDIA NIM: `14f-function-calling-groq.py`, `14g-function-calling-grok.py`, `14h-function-calling-azure.py`, `14i-function-calling-fireworks.py`, and `14j-function-calling-nvidia.py`.
- In order to obtain the audio stored by the `AudioBufferProcessor`, you can now also register an `on_audio_data` event handler. The `on_audio_data` handler will be called every time `buffer_size` (a new constructor argument) is reached. If `buffer_size` is 0 (the default), you need to get the audio manually, as before, using `AudioBufferProcessor.merge_audio_buffers()`.

  ```python
  @audiobuffer.event_handler("on_audio_data")
  async def on_audio_data(processor, audio, sample_rate, num_channels):
      # save_audio() stands for your own coroutine that stores the audio.
      await save_audio(audio, sample_rate, num_channels)
  ```

- Added a new RTVI message called `disconnect-bot`, which, when handled, pushes an `EndFrame` to trigger the pipeline to stop.

### Changed

- `STTMuteFilter` now supports multiple simultaneous muting strategies.
- `XTTSService` language now defaults to `Language.EN`.
- `SoundfileMixer` doesn't resample input files anymore, to avoid startup delays. The sample rate of the provided sound files now needs to match the sample rate of the output transport.
- Input frames (audio, image and transport messages) are now system frames. This means they are processed immediately by all processors instead of being queued internally.
- Expanded the `transcriptions.language` module to support a superset of languages.
- Updated STT and TTS services with language options that match the supported languages for each service.
- Updated `AzureLLMService` to use `OpenAILLMService`. Updated the `api_version` to `2024-09-01-preview`.
- Updated `FireworksLLMService` to use `OpenAILLMService`. Updated the default model to `accounts/fireworks/models/firefunction-v2`.
- Updated the `simple-chatbot` example to include JavaScript and React client examples, using RTVI JS and React.

### Removed

- Removed `AppFrame`. This was used as a special user custom frame, but there was actually no use case for it.

### Fixed

- Fixed a `ParallelPipeline` issue that would cause system frames to be queued.
- Fixed `FastAPIWebsocketTransport` so it can work with binary data (e.g. using the protobuf serializer).
- Fixed an issue in `CartesiaTTSService` that could cause previous audio to be received after an interruption.
- Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket reconnection. Previously, no reconnection happened if an error occurred.
- Fixed a `BaseOutputTransport` issue that was causing audio to be discarded after an `EndFrame` was received.
- Fixed an issue in `WebsocketServerTransport` and `FastAPIWebsocketTransport` that would cause a busy loop when using an audio mixer.
- Fixed a `DailyTransport` and `LiveKitTransport` issue where connections were being closed in the input transport prematurely. This was causing frames queued inside the pipeline to be discarded.
- Fixed an issue in `DailyTransport` that would cause some internal callbacks to not be executed.
- Fixed an issue where other frames were being processed while a `CancelFrame` was being pushed down the pipeline.
- `AudioBufferProcessor` now handles interruptions properly.
- Fixed a `WebsocketServerTransport` issue that would prevent interruptions with `TwilioSerializer` from working.
- `DailyTransport.capture_participant_video` now allows capturing the user's screen share by simply passing `video_source="screenVideo"`.
- Fixed Google Gemini message handling to properly convert appended messages to Gemini's required format.
- Fixed an issue with `FireworksLLMService` where chat completions were failing, by removing `stream_options` from the chat completion options.

## [0.0.49] - 2024-11-17

### Added

- Added RTVI `on_bot_started` event, which is useful in a single-turn interaction.
- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`, `dialin-error` and `dialin-warning`. Requires daily-python >= 0.13.0.
- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational example.
- Added `STTMuteFilter`, a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during bot speech. The processor supports multiple strategies: `FIRST_SPEECH` (mute only during the bot's first speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using a provided callback).
- Added `STTMuteFrame`, a control frame that enables/disables speech transcription in STT services.

## [0.0.48] - 2024-11-10 "Antonio release"

### Added

- Each frame processor now has an input queue. When you call `FrameProcessor.push_frame()`, it internally calls `FrameProcessor.queue_frame()` on the next processor (upstream or downstream) and the frame is queued internally (except system frames). The queued frames are then processed. With this input queue, a `FrameProcessor` can also block processing of further frames by calling `FrameProcessor.pause_processing_frames()`. To resume processing frames, call `FrameProcessor.resume_processing_frames()`.
- Added audio filter `NoisereduceFilter`.
- Introduced input transport audio filters (`BaseAudioFilter`). Audio filters can be used to remove background noise before audio is sent to VAD.
- Introduced output transport audio mixers (`BaseAudioMixer`). Output transport audio mixers can be used, for example, to add background sounds or any other audio mixing functionality before the output audio is actually written to the transport.
- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last received OpenAI LLM context frame and doesn't let it through until the notifier is notified.
- Added `WakeNotifierFilter`. This processor expects a list of frame types and will execute a given callback predicate when a frame of any of those types is being processed. If the callback returns true, the notifier will be notified.
- Added `NullFilter`. A null filter doesn't push any frames upstream or downstream.
  This is usually used to disable one of the pipelines in `ParallelPipeline`.
- Added `EventNotifier`. This can be used as a very simple synchronization feature between processors.
- Added `TavusVideoService`. This is an integration for Tavus digital twins. (see https://www.tavus.io/)
- Added `DailyTransport.update_subscriptions()`. This gives you fine-grained control over the media subscriptions you want for each participant in a room.
- Added audio filter `KrispFilter`.

### Changed

- The following `DailyTransport` functions are now `async`, which means they need to be awaited: `start_dialout`, `stop_dialout`, `start_recording`, `stop_recording`, `capture_participant_transcription` and `capture_participant_video`.
- Changed the default output sample rate to 24000 Hz. All TTS services now output at 24000 Hz, and this is also the default output transport sample rate. This improves audio quality at the cost of some extra bandwidth.
- `AzureTTSService` now uses Azure websockets instead of HTTP requests.
- The previous `AzureTTSService` HTTP implementation is now `AzureHttpTTSService`.

### Fixed

- Websocket transports (FastAPI and Websocket) now synchronize with real time before sending data. This allows interruptions to work out of the box.
- Improved bot speaking detection for all TTS services by using actual bot audio.
- Fixed an issue that was generating constant bot started/stopped speaking frames for HTTP TTS services.
- Fixed an issue that was causing stuttering with the AWS TTS service.
- Fixed an issue with `PlayHTTTSService` where the TTFB metrics were reporting very small time values.
- Fixed an issue where `AzureTTSService` wasn't initializing the specified language.

### Other

- Added `23-bot-background-sound.py` foundational example.
- Added a new foundational example `22-natural-conversation.py`. This example shows how to achieve a more natural conversation by detecting when the user ends a statement.
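
The input-queue behavior described in the first Added entry of this release can be modeled in a few lines of `asyncio`. The sketch below is a hypothetical toy, not Pipecat's implementation: `ToyProcessor`, `Frame` and `SystemFrame` are stand-ins, the `None` stop sentinel is an assumption, and only the method names `queue_frame()`, `pause_processing_frames()` and `resume_processing_frames()` mirror the entry. Regular frames wait in the queue and are held while processing is paused, whereas system frames bypass the queue entirely.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str


@dataclass
class SystemFrame(Frame):
    """Out-of-band frame: handled immediately, never queued."""


class ToyProcessor:
    """Hypothetical stand-in for a frame processor with an input queue."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[Optional[Frame]]" = asyncio.Queue()
        self._running = asyncio.Event()
        self._running.set()  # start in the "processing" state
        self.processed: list[str] = []

    async def queue_frame(self, frame: Optional[Frame]) -> None:
        if isinstance(frame, SystemFrame):
            self.processed.append(frame.name)  # system frames skip the queue
        else:
            await self._queue.put(frame)

    def pause_processing_frames(self) -> None:
        self._running.clear()

    def resume_processing_frames(self) -> None:
        self._running.set()

    async def run(self) -> None:
        while True:
            frame = await self._queue.get()
            await self._running.wait()  # hold queued frames while paused
            if frame is None:  # assumed sentinel that stops the loop
                return
            self.processed.append(frame.name)


async def main() -> tuple[list[str], list[str]]:
    proc = ToyProcessor()
    task = asyncio.create_task(proc.run())
    proc.pause_processing_frames()
    await proc.queue_frame(Frame("text"))
    await proc.queue_frame(SystemFrame("interruption"))
    await asyncio.sleep(0)  # let run() start; it blocks because we are paused
    while_paused = list(proc.processed)
    proc.resume_processing_frames()
    await proc.queue_frame(None)
    await task
    return while_paused, proc.processed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running it prints `(['interruption'], ['interruption', 'text'])`: the system frame is handled while the processor is paused, and the regular frame only goes through after `resume_processing_frames()`.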

## [0.0.47] - 2024-10-22

### Added

- Added `AssemblyAISTTService` and corresponding foundational examples `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.
- Added a foundational example for Gladia transcription: `13c-gladia-transcription.py`.

### Changed

- Updated `GladiaSTTService` to use the V2 API.
- Changed `DailyTransport` transcription model to `nova-2-general`.

### Fixed

- Fixed an issue that would cause an import error when importing `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.
- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately from `enable_metrics`.

## [0.0.46] - 2024-10-19

### Added

- Added `audio_passthrough` parameter to `STTService`. If enabled, it allows audio frames to be pushed downstream in case other processors need them.
- Added input parameter options for `PlayHTTTSService` and `PlayHTHttpTTSService`.

### Changed

- Changed `DeepgramSTTService` model to `nova-2-general`.
- Moved `SileroVAD` audio processor to `processors.audio.vad`.
- Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has been added.
- `PlayHTTTSService` now uses PlayHT websockets instead of HTTP requests.
- The previous `PlayHTTTSService` HTTP implementation is now `PlayHTHttpTTSService`.
- `PlayHTTTSService` and `PlayHTHttpTTSService` now use a `voice_engine` of `PlayHT3.0-mini`, which allows for multilingual support.
- Renamed `OpenAILLMServiceRealtimeBeta` to `OpenAIRealtimeBetaLLMService` to match other services.

### Deprecated

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are mostly deprecated; use `OpenAILLMContext` instead.
- The `vad` package is now deprecated and `audio.vad` should be used instead. The `vad` package will be removed in a future release.

### Fixed

- Fixed an issue that would cause an error if no VAD analyzer was passed to `LiveKitTransport` params.
- Fixed `SileroVAD` processor to support interruptions properly.
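
The `vad` to `audio.vad` move above stays backwards compatible only if the old import path keeps resolving. A minimal sketch of one common way to do that, assuming a PEP 562 module-level `__getattr__`: the old module forwards attribute lookups to the new one and emits a `DeprecationWarning`. The module names `demo_vad` and `demo_audio_vad` and the helper `make_deprecated_alias()` are hypothetical; this is not necessarily how Pipecat implements its compatibility shim.

```python
import sys
import types
import warnings


def make_deprecated_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register `old_name` as a module that forwards lookups to `new_module`."""
    alias = types.ModuleType(old_name)

    def __getattr__(name: str):
        # PEP 562: only called when normal attribute lookup on `alias` fails.
        if name.startswith("__") and name.endswith("__"):
            # Let the import system probe dunders (e.g. __path__) silently.
            raise AttributeError(name)
        warnings.warn(
            f"{old_name} is deprecated, use {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, name)

    alias.__getattr__ = __getattr__
    sys.modules[old_name] = alias  # makes `import old_name` resolve to the alias
    return alias


# Hypothetical "new" module holding the real implementation.
new_vad = types.ModuleType("demo_audio_vad")
new_vad.SileroVADAnalyzer = type("SileroVADAnalyzer", (), {})
make_deprecated_alias("demo_vad", new_vad)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_vad import SileroVADAnalyzer  # old import path still works

print(SileroVADAnalyzer is new_vad.SileroVADAnalyzer)  # True
print(caught[0].category.__name__)  # DeprecationWarning
```

The dunder guard matters: during `from ... import`, the import machinery probes attributes such as `__path__`, and forwarding those lookups would emit spurious warnings.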

### Other

- Added `examples/foundational/07-interruptible-vad.py`. This is the same as `07-interruptible.py` but uses the `SileroVAD` processor instead of passing the `VADAnalyzer` in the transport.

## [0.0.45] - 2024-10-16

### Changed

- Metrics messages have moved out from the transport's base output into RTVI.

## [0.0.44] - 2024-10-15

### Added

- Added support for OpenAI Realtime API with the new `OpenAILLMServiceRealtimeBeta` processor. (see https://platform.openai.com/docs/guides/realtime/overview)
- Added `RTVIBotTranscriptionProcessor`, which will send the RTVI `bot-transcription` protocol message. These messages contain TTS text aggregated into sentences.
- Added new input params to the `MarkdownTextFilter` utility. You can set `filter_code` to filter code from text and `filter_tables` to filter tables from text.
- Added `CanonicalMetricsService`. This processor uses the new `AudioBufferProcessor` to capture conversation audio and later send it to Canonical AI. (see https://canonical.chat/)
- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user and bot audio. This can later be saved into an audio file or processed by some audio analyzer.
- Added `on_first_participant_joined` event to `LiveKitTransport`.

### Changed

- LLM text responses are now logged properly as unicode characters.
- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, `BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and `UserImageRequestFrame` are now based on `SystemFrame`.

### Fixed

- Merged `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and `RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out-of-order issues.
- Fixed an issue in the RTVI protocol that could cause a `bot-llm-stopped` or `bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text` message.
- Fixed `DeepgramSTTService` constructor settings not being merged with default ones.
- Fixed an issue in the Daily transport that would cause tasks to hang if urgent transport messages were being sent from a transport event handler.
- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be pushed down too early and call `FrameProcessor.cleanup()` before letting the transport stop properly.

## [0.0.43] - 2024-10-10

### Added

- Added a new util called `MarkdownTextFilter`, which is a subclass of a new base class called `BaseTextFilter`. This is a configurable utility intended to filter text received by TTS services.
- Added new `RTVIUserLLMTextProcessor`. This processor will send an RTVI `user-llm-text` message with the user content that was sent to the LLM.

### Changed

- `TransportMessageFrame` doesn't have an `urgent` field anymore; instead, there's now a `TransportMessageUrgentFrame`, which is a `SystemFrame` and therefore skips all internal queuing.
- TTS services now convert input languages to match each service's language format.

### Fixed

- Fixed an issue where changing a language with the Deepgram STT service wouldn't apply the change. This was fixed by disconnecting and reconnecting when the language changes.

## [0.0.42] - 2024-10-02

### Added

- `SentryMetrics` has been added to report frame processor metrics to Sentry. This is now possible because `FrameProcessorMetrics` can now be passed to `FrameProcessor`.
- Added Google TTS service and corresponding foundational example `07n-interruptible-google.py`.
- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.
- Added `InputParams` to Azure TTS service.
- Added `LivekitTransport` (audio-only for now).
- RTVI 0.2.0 is now supported.
- All `FrameProcessors` can now register event handlers.

  ```python
  tts = SomeTTSService(...)

  @tts.event_handler("on_connected")
  async def on_connected(processor):
      ...
  ```

- Added `AsyncGeneratorProcessor`. This processor can be used together with a `FrameSerializer` as an async generator.
  It provides a `generator()` function that returns an `AsyncGenerator` that yields serialized frames.
- Added `EndTaskFrame` and `CancelTaskFrame`. These new frames are meant to be pushed upstream to tell the pipeline task to stop nicely or immediately, respectively.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services, along with corresponding setter functions.
- Added `sample_rate` as a constructor parameter for TTS services.
- Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames. To have deterministic behavior, the frames traveling through the pipeline should always be ordered, except system frames, which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task. In this version, every frame processor has its own task to push frames. That is, when `push_frame()` is called, the given frame will be put into an internal queue (with the exception of system frames) and a frame processor task will push it out.
- Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional `pts` field (presentation timestamp). There's currently just one clock implementation, `SystemClock`, and the `pts` field is currently only used for `TextFrame`s (audio and image frames will be next).
- A clock can now be specified to `PipelineTask` (defaults to `SystemClock`). This clock will be passed to each frame processor via the `StartFrame`.
- Added `CartesiaHttpTTSService`.
- `DailyTransport` now supports setting the audio bitrate to improve audio quality through the `DailyParams.audio_out_bitrate` parameter. The new default is 96 kbps.
- `DailyTransport` now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.
- Interruptions support has been added to `TwilioFrameSerializer` when using `FastAPIWebsocketTransport`.
- Added new `LmntTTSService` text-to-speech service. (see https://www.lmnt.com/)
- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`, and `STTLanguageUpdateFrame` frames to allow you to switch models, language and voices in TTS and STT services.
- Added a new `transcriptions.Language` enum.

### Changed

- Context frames are now pushed downstream from assistant context aggregators.
- Removed Silero VAD torch dependency.
- Merged the individual update settings frame classes into a single `ServiceUpdateSettingsFrame` class.
- We now distinguish between input and output audio and image frames. We introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and `OutputImageRawFrame` (and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline, and they are allowed to be sent by the output transport.
- `ParallelTask` has been renamed to `SyncParallelPipeline`. A `SyncParallelPipeline` is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame, the `SyncParallelPipeline` will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response).
- `StartFrame` is back to being a system frame to make sure it's processed immediately by all processors. `EndFrame` stays a control frame since it needs to be ordered, allowing the frames already in the pipeline to be processed.
- Updated `MoondreamService` revision to `2024-08-26`.
- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamps, but they should be played at roughly the same time.
- `DailyTransport.on_joined` event now returns the full session data instead of just the participant.
- `CartesiaTTSService` is now a subclass of `TTSService`.
- `DeepgramSTTService` is now a subclass of `STTService`.
- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A `SegmentedSTTService` is an `STTService` where the provided audio is given in a big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.

### Fixed

- Fixed OpenAI multiple function calls.
- Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering tasks (after receiving an `EndFrame`) before the internal queue was emptied, causing the pipeline to finish prematurely.
- `StartFrame` should be the first frame every processor receives, to avoid situations where things are not initialized (because initialization happens on `StartFrame`) and other frames come in, resulting in undesired behavior.

### Performance

- `obj_id()` and `obj_count()` now use `itertools.count`, avoiding the need for a `threading.Lock`.

### Other

- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).

## [0.0.41] - 2024-08-22

### Added

- Added `LivekitFrameSerializer` audio frame serializer.

### Fixed

- Fixed a `FastAPIWebsocketOutputTransport` variable name clash with a subclass.
- Fixed an `AnthropicLLMService` issue with empty arguments in function calling.

### Other

- Fixed `studypal` example errors.

## [0.0.40] - 2024-08-20

### Added

- VAD parameters can now be dynamically updated using the `VADParamsUpdateFrame`.
- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A new `FatalErrorFrame` that sets this flag to true has been added.
- `AnthropicLLMService` now supports function calling and has initial support for prompt caching. (see https://www.anthropic.com/news/prompt-caching)
- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as `output_format`.
- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample rates to use.
- Added new `on_participant_updated` event to `DailyTransport`.
- Added `DailyRESTHelper.delete_room_by_name()` and `DailyRESTHelper.delete_room_by_url()`.
- Added LLM and TTS usage metrics. These are enabled when `PipelineParams.enable_usage_metrics` is True.
- `AudioRawFrame`s are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline.
- Added new `GStreamerPipelineSource`. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP stream or anything supported by GStreamer).
- Added `TransportParams.audio_out_is_live`. This flag is False by default and it is useful to indicate we should not synchronize audio with sporadic images.
- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control frames. These frames are pushed upstream and they should wrap `BotSpeakingFrame`.
- Transports now allow you to register event handlers without decorators.

### Changed

- Support RTVI message protocol 0.1. This includes new messages, support for message responses, support for actions, configuration, webhooks and a bunch of new cool stuff. (see https://docs.rtvi.ai/)
- `SileroVAD` dependency is now imported via pip's `silero-vad` package.
- `ElevenLabsTTSService` now uses `eleven_turbo_v2_5` model by default.
- `BotSpeakingFrame` is now a control frame.
- `StartFrame` is now a control frame similar to `EndFrame`.
- `DeepgramTTSService` is now more customizable. You can adjust the encoding and sample rate.

### Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and stops. This allows for knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia).
- Fixed `AzureSTTService` transcription frame timestamps.
- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would cause this function to stop working after the initial expiration elapsed.
- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things gracefully while a `CancelFrame` should cancel all running tasks as soon as possible.
- Fixed an issue in `AIService` that would cause a yielded `None` value to be processed.
- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and a first participant joins.
- Fixed a `BaseInputTransport` issue that was causing incoming system frames to be queued instead of being pushed immediately.
- Fixed a `BaseInputTransport` issue that was causing incoming start/stop interruption frames to not cancel tasks and not be processed properly.

### Other

- Added `studypal` example (from the Cartesia folks!).
- Most examples now use Cartesia.
- Added examples `foundational/19a-tools-anthropic.py`, `foundational/19b-tools-video-anthropic.py` and `foundational/19a-tools-togetherai.py`.
- Added examples `foundational/18-gstreamer-filesrc.py` and `foundational/18a-gstreamer-videotestsrc.py` that show how to use `GStreamerPipelineSource`.
- Removed `requests` library usage.
- Cleaned up examples and made them use `DailyRESTHelper`.

## [0.0.39] - 2024-07-23

### Fixed

- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
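The 0.0.40 entry above distinguishes ending gracefully (`EndFrame`) from cancelling as soon as possible (`CancelFrame`). A generic asyncio sketch of that difference, not Pipecat's implementation: a sentinel object stands in for an `EndFrame` and lets the worker drain its queue, while `Task.cancel()` stands in for a `CancelFrame`. The names `END`, `worker` and `run` are hypothetical.

```python
import asyncio

END = object()  # hypothetical sentinel standing in for an EndFrame


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Consume items until the END sentinel arrives."""
    while True:
        item = await queue.get()
        if item is END:
            return  # graceful exit: everything queued before END was handled
        await asyncio.sleep(0.01)  # simulate per-item work
        processed.append(item)


async def run(graceful: bool) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    task = asyncio.create_task(worker(queue, processed))
    for i in range(5):
        queue.put_nowait(i)
    if graceful:
        queue.put_nowait(END)  # "EndFrame": let the worker drain the queue
        await task
    else:
        await asyncio.sleep(0.025)
        task.cancel()  # "CancelFrame": stop as soon as possible
        try:
            await task
        except asyncio.CancelledError:
            pass  # expected: pending items are abandoned
    return processed


print(asyncio.run(run(graceful=True)))  # [0, 1, 2, 3, 4]
print(asyncio.run(run(graceful=False)))  # only a prefix of the items (timing-dependent)
```

Queuing the end marker behind pending work is what makes the first path graceful; cancellation skips the queue entirely.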
## [0.0.38] - 2024-07-23

### Added

- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and `SileroVADAnalyzer`. This allows caching and various GitHub repo validations.
- Added `send_initial_empty_metrics` flag to `PipelineParams` to request initial empty metrics (zero values). True by default.

### Fixed

- Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.
- STT services should be using ISO 8601 time format for transcription frames.
- Fixed an issue that would cause Daily transport to show a stop transcription error when actually none occurred.

## [0.0.37] - 2024-07-22

### Added

- Added `RTVIProcessor` which implements the RTVI-AI standard. See https://github.com/rtvi-ai
- Added `BotInterruptionFrame` which allows interrupting the bot while talking.
- Added `LLMMessagesAppendFrame` which allows appending messages to the current LLM context.
- Added `LLMMessagesUpdateFrame` which allows replacing the LLM context with the one provided in this new frame.
- Added `LLMModelUpdateFrame` which allows updating the LLM model.
- Added `TTSSpeakFrame` which causes the bot to say some text. This text will not be part of the LLM context.
- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.

### Removed

- We removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These were added in the past to properly handle interruptions for the `LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based on `LLMResponseAggregator` which handles interruptions properly by just processing the `StartInterruptionFrame`, so there's no need for these extra frames any more.

### Fixed

- Fixed an issue with `StatelessTextTransformer` where it was pushing a string instead of a `TextFrame`.
- `TTSService` end of sentence detection has been improved. It now works with acronyms, numbers, hours and others.
- Fixed an issue in `TTSService` that would not properly flush the current aggregated sentence if an `LLMFullResponseEndFrame` was found.

### Performance

- `CartesiaTTSService` now uses websockets, which improves speed. It also leverages the new Cartesia contexts, which maintain generated audio prosody when multiple inputs are sent, significantly improving audio quality.

## [0.0.36] - 2024-07-02

### Added

- Added `GladiaSTTService`. See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
- Added `XTTSService`. This is a local Text-To-Speech service. See https://github.com/coqui-ai/TTS
- Added `UserIdleProcessor`. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout, a provided callback is called.
- Added `IdleFrameProcessor`. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout, a provided callback is called.
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed upstream while the bot is talking.
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer` or `SileroVAD`.
- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services like `DeepgramSTTService` need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's what `AsyncFrameProcessor` is for. It creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready, that transcription will also be pushed from this internal task.
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate this kind of metrics.
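`UserIdleProcessor` and `IdleFrameProcessor` above both call a callback when nothing arrives within a timeout. A minimal asyncio sketch of that timeout pattern, assuming a hypothetical `IdleWatcher` class that fires once and then stops; it is not Pipecat's implementation and its names and behaviour after firing are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


class IdleWatcher:
    """Hypothetical sketch: await `on_idle()` once if no activity arrives within `timeout` seconds."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_idle = on_idle
        self._activity = asyncio.Event()

    def notify_activity(self) -> None:
        # Call this whenever a frame (or user speech) is observed.
        self._activity.set()

    async def run(self) -> None:
        while True:
            try:
                await asyncio.wait_for(self._activity.wait(), self._timeout)
                self._activity.clear()  # activity seen: restart the timeout window
            except asyncio.TimeoutError:
                await self._on_idle()  # nothing seen within the timeout
                return


async def demo() -> None:
    async def on_idle() -> None:
        print("user is idle")

    # Create the watcher inside the running loop so its Event binds correctly.
    watcher = IdleWatcher(timeout=0.1, on_idle=on_idle)
    task = asyncio.create_task(watcher.run())
    for _ in range(3):  # activity every 20 ms keeps the watcher quiet
        await asyncio.sleep(0.02)
        watcher.notify_activity()
    await task  # "user is idle" is printed about 100 ms after the last activity


asyncio.run(demo())
```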
### Changed

- `WhisperSTTService` model can now also be a string.
- Added missing \* keyword separators in services.

### Fixed

- `WebsocketServerTransport` doesn't try to send frames anymore if the serializer returns `None`.
- Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send data to the websocket after being closed.

### Other

- Added Fly.io deployment example in `examples/deployment/flyio-example`.
- Added new `17-detect-user-idle.py` example that shows how to use the new `UserIdleProcessor`.

## [0.0.35] - 2024-06-28

### Changed

- `FastAPIWebsocketParams` now requires a serializer.
- `TwilioFrameSerializer` now requires a `streamSid`.

### Fixed

- The Silero VAD number of frames needs to be 512 for a 16000 Hz sample rate or 256 for an 8000 Hz sample rate.

## [0.0.34] - 2024-06-25

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause interruptions to ignore transcriptions.
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.

## [0.0.33] - 2024-06-25

### Changed

- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audio `sample_rate` and `encoding` instead of the previous `output_format`.

### Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLM sentences.
- Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
- Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
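The Silero VAD entry in 0.0.35 fixes the window at 512 frames for 16000 Hz and 256 frames for 8000 Hz; both correspond to the same 32 ms of audio. A small sketch that checks this arithmetic and splits PCM bytes into windows of that size. The 16-bit mono format and the helper names are assumptions for illustration, not Pipecat's code.

```python
# Window sizes (in audio frames, i.e. samples per channel) from the 0.0.35 entry.
SILERO_WINDOW_FRAMES = {16000: 512, 8000: 256}
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM, mono


def window_duration_ms(sample_rate: int) -> float:
    """Duration of one VAD window in milliseconds."""
    return 1000 * SILERO_WINDOW_FRAMES[sample_rate] / sample_rate


def split_into_windows(pcm: bytes, sample_rate: int) -> list:
    """Split PCM bytes into full VAD windows; a trailing partial window is dropped."""
    size = SILERO_WINDOW_FRAMES[sample_rate] * BYTES_PER_SAMPLE
    return [pcm[i : i + size] for i in range(0, len(pcm) - size + 1, size)]


print(window_duration_ms(16000), window_duration_ms(8000))  # 32.0 32.0
one_second = bytes(16000 * BYTES_PER_SAMPLE)
print(len(split_into_windows(one_second, 16000)))  # 31 (16000 // 512 full windows)
```

Dropping the leftover bytes keeps the sketch short; a streaming analyzer would instead keep them and prepend them to the next audio chunk.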
## [0.0.32] - 2024-06-22

### Added

- Allow specifying a `DeepgramSTTService` URL, which allows using on-prem Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that can be integrated with FastAPI websockets.
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.

### Performance

- Converted `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio and removed the use of threads.

### Other

- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`, `AzureSTTService` and `AzureTTSService`.

## [0.0.31] - 2024-06-13

### Performance

- Long audio frames are now broken into 20ms chunks instead of 10ms.

## [0.0.30] - 2024-06-13

### Added

- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through OpenPipe's SDK.
- Allow specifying frame processors' name through a new `name` constructor argument.
- Added `DeepgramSTTService`. This service has an ongoing websocket connection. To handle this, it subclasses `AIService` instead of `STTService`. The output of this service will be pushed from the same task, except system frames like `StartFrame`, `CancelFrame` or `StartInterruptionFrame`.

### Changed

- `FrameSerializer.deserialize()` can now return `None` in case it is not possible to deserialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
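For illustration, a sketch of the kind of JSON envelope a Twilio serializer handles, which also shows why `FrameSerializer.deserialize()` may return `None`: not every websocket message carries audio. The message shape (`event`, `streamSid`, `media.payload` holding base64 audio) follows Twilio's Media Streams documentation as we understand it; the function names are hypothetical, and this is not Pipecat's `TwilioFrameSerializer`, which also converts between Twilio's 8 kHz μ-law audio and PCM (omitted here).

```python
import base64
import json
from typing import Optional


def serialize_audio(stream_sid: str, audio: bytes) -> str:
    """Wrap (already μ-law encoded) audio bytes in a Twilio-style "media" message."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        }
    )


def deserialize_audio(message: str) -> Optional[bytes]:
    """Return the audio bytes of a "media" message, or None for anything else."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # e.g. "connected", "start" or "stop" events carry no audio
    return base64.b64decode(data["media"]["payload"])


msg = serialize_audio("MZ123", b"\xff\x7f")  # "MZ123" is a made-up stream SID
print(deserialize_audio(msg))  # b'\xff\x7f'
print(deserialize_audio(json.dumps({"event": "stop"})))  # None
```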
### Fixed

- Fixed an issue where `DailyRoomProperties.exp` always had the same old timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use `push_audio_frame()` and also VAD was not working properly.
- Fixed an issue that would cause the LLM aggregator to fail with small `VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames, preventing interruptions.

### Other

- Added new `07h-interruptible-openpipe.py` example. This example shows how to use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added new `dialin-chatbot` example. This example shows how to call the bot using a phone number.

## [0.0.29] - 2024-06-07

### Added

- Added a new `FunctionFilter`. This filter will let you filter frames based on a given function, except system messages which should never be filtered.
- Added `FrameProcessor.can_generate_metrics()` method to indicate if a processor can generate metrics. In the future this might get an extra argument to ask for a specific type of metric.
- Added `BasePipeline`. All pipeline classes should be based on this class. All subclasses should implement a `processors_with_metrics()` method that returns a list of all `FrameProcessor`s in the pipeline that can generate metrics.
- Added `enable_metrics` to `PipelineParams`.
- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the system. Right now, it can report TTFB (Time To First Byte) values for different services, that is, the time between the arrival of a `Frame` at the processor/service and the first `DataFrame` being pushed downstream. If metrics are enabled, an initial `MetricsFrame` with all the services in the pipeline will be sent.
- Added TTFB metrics and debug logging for TTS services.

### Changed

- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.

### Fixed

- Fixed PlayHT TTS service to work properly asynchronously.
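A sketch of the `FunctionFilter` behaviour described in 0.0.29: frames are kept when a user-supplied async predicate returns true, while system frames always pass through. The frame classes and `FunctionFilterSketch` below are simplified, hypothetical stand-ins rather than Pipecat's classes, and filtering a list instead of a live pipeline is a simplification.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class Frame:
    pass


@dataclass
class SystemFrame(Frame):
    pass


@dataclass
class TextFrame(Frame):
    text: str


class FunctionFilterSketch:
    """Keep frames for which `predicate` returns True; never drop system frames."""

    def __init__(self, predicate: Callable[[Frame], Awaitable[bool]]) -> None:
        self._predicate = predicate

    async def process(self, frames: List[Frame]) -> List[Frame]:
        kept = []
        for frame in frames:
            # System frames bypass the predicate entirely.
            if isinstance(frame, SystemFrame) or await self._predicate(frame):
                kept.append(frame)
        return kept


async def only_greetings(frame: Frame) -> bool:
    return isinstance(frame, TextFrame) and frame.text.startswith("hello")


frames = [TextFrame("hello there"), SystemFrame(), TextFrame("goodbye")]
print(asyncio.run(FunctionFilterSketch(only_greetings).process(frames)))
# [TextFrame(text='hello there'), SystemFrame()]
```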
## [0.0.28] - 2024-06-05

### Fixed

- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep growing indefinitely.

## [0.0.27] - 2024-06-05

### Added

- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.

## [0.0.26] - 2024-06-05

### Added

- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change audio sample format and the model to use.
- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an easy way.
- `PipelineTask` now has a `has_finished()` method to indicate if the task has completed. If a task is never run, `has_finished()` will return False.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be cancelled.

### Fixed

- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were stopping push tasks before pushing `EndFrame` frames, which could cause the bots to get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a `user` message could be appended after the previous `user` message. Anthropic does not allow that because it requires alternating `user` and `assistant` messages.

### Performance

- The `BaseInputTransport` does not pull audio frames from sub-classes any more. Instead, sub-classes now push audio frames into a queue in the base class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead of 10ms.
- Removed redundant camera input thread from `DailyInputTransport`. This should improve performance a little bit when processing participant videos.
- Cartesia voice is now loaded on startup.

## [0.0.25] - 2024-05-31

### Added

- Added `WebsocketServerTransport`. This will create a websocket server and will read messages coming from a client. The messages are serialized/deserialized with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (`LLMService.register_function()`). This will allow the LLM to call functions you have registered when needed. For example, if you register a function to get the weather in Los Angeles and ask the LLM about the weather in Los Angeles, the LLM will call your function. See https://platform.openai.com/docs/guides/function-calling
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/).

### Fixed

- Fixed SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause high CPU usage if no image was provided.

### Performance

- Removed unnecessary audio input tasks.

## [0.0.24] - 2024-05-29

### Added

- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This notifies when the Daily room SIP endpoints are ready. This allows integrating with third-party services like Twilio.
- Exposed Daily transport `on_app_message` event.
- Added Daily transport `on_call_state_updated` event.
- Added Daily transport `start_recording()`, `stop_recording()` and `stop_dialout()`.

### Changed

- Added `PipelineParams`. This replaces the `allow_interruptions` argument in `PipelineTask` and will allow adding more parameters in the future.
- Fixed Deepgram Aura TTS `base_url` and added `ErrorFrame` reporting.
- `GoogleLLMService` `api_key` argument is now mandatory.

### Fixed

- Daily transport `dialin-ready` no longer blocks and it now handles timeouts.
- Fixed `AzureLLMService`.

## [0.0.23] - 2024-05-23

### Fixed

- Fixed an issue handling Daily transport `dialin-ready` event.

## [0.0.22] - 2024-05-23

### Added

- Added Daily transport `start_dialout()` to be able to make phone or SIP calls. See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`, `on_dialout_error` and `on_dialout_warning`.
  See https://reference-python.daily.co/api_reference.html#daily.EventHandler

## [0.0.21] - 2024-05-22

### Added

- Added vision support to Anthropic service.
- Added `WakeCheckFilter` which allows you to pass information downstream only if you say a certain phrase/word.

### Changed

- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now `async`.
- `Filter` has been renamed to `FrameFilter` and it's now under `processors/filters`.

### Fixed

- Fixed Anthropic service to use new frame types.
- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator` that would cause frames after a brief pause to not be pushed to the LLM.
- The audio output buffer is now cleared if we are interrupted.
- Re-added exponential smoothing after volume calculation. This makes sure the volume value being used doesn't fluctuate so much.

## [0.0.20] - 2024-05-22

### Added

- In order to improve interruptions we now compute a loudness level using [pyloudnorm](https://github.com/csteinmetz1/pyloudnorm). The audio coming from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm applied to the signal; however, we don't do that on our local PyAudio signals. This means that currently incoming audio from PyAudio is kind of broken. We will fix it in future releases.

### Fixed

- Fixed an issue where `StartInterruptionFrame` would cause `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to respond in the wrong task. The `StartInterruptionFrame` should not trigger any new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor didn't have more tasks available. This was causing issues when cancelling and recreating tasks during interruptions.

## [0.0.19] - 2024-05-20

### Changed

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal messages are now exposed through the `messages` property.
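The 0.0.21 entry above re-adds exponential smoothing after the volume calculation so the value doesn't fluctuate so much. A minimal sketch of the usual formulation, `smoothed = alpha * current + (1 - alpha) * previous`; the class name and the `alpha` value are assumptions, not Pipecat's parameters.

```python
class ExponentialSmoother:
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self._alpha = alpha
        self._value = None  # no history until the first sample arrives

    def update(self, sample: float) -> float:
        if self._value is None:
            self._value = sample  # seed with the first measurement
        else:
            self._value = self._alpha * sample + (1 - self._alpha) * self._value
        return self._value


smoother = ExponentialSmoother(alpha=0.2)
# A sudden jump from silence to full volume is absorbed gradually.
print([round(smoother.update(v), 3) for v in [0.0, 1.0, 1.0, 1.0]])  # [0.0, 0.2, 0.36, 0.488]
```

A smaller `alpha` gives a steadier value but reacts more slowly when the user actually starts speaking.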
### Fixed

- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the full response but short sentences instead. If there's an interruption, we only accumulate what the bot has spoken until now in a long response as well.

## [0.0.18] - 2024-05-20

### Fixed

- Fixed an issue in `DailyOutputTransport` where transport messages were not being sent.

## [0.0.17] - 2024-05-19

### Added

- Added `google.generativeai` model support, including vision. This new `google` service defaults to using `gemini-1.5-flash-latest`. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to `openai` service. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
- Added `VADParams` so you can control voice confidence level and others.
- `VADAnalyzer` now uses an exponentially smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.

### Fixed

- Fixed an issue where `TTSService` was not pushing `TextFrame`s downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to not actually exit the `PipelineTask`.

## [0.0.16] - 2024-05-16

### Fixed

- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed downstream to not be pushed in the right order.

## [0.0.15] - 2024-05-15

### Fixed

- Quick hot fix for receiving `DailyTransportMessage`.

## [0.0.14] - 2024-05-15

### Added

- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.

### Fixed

- Images are now resized to the size of the output camera. Not doing so was causing images to not be displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to shut down if no participant ever joined the room.
- Fixed base transports start and stop. In some situations processors would halt or not shut down properly.

## [0.0.13] - 2024-05-14

### Changed

- `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.

### Fixed

- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that could cause some threads to not start properly.
- Fixed `STTService`. Added `max_silence_secs` and `max_buffer_secs` to better handle what's being passed to the STT service. Also added exponential smoothing to the RMS.
- Fixed `WhisperSTTService`. Added `no_speech_prob` to avoid garbage output text.

## [0.0.12] - 2024-05-14

### Added

- Added `DailyTranscriptionSettings` to be able to specify transcription settings much more easily (e.g. language).

### Other

- Updated `simple-chatbot` with Spanish.
- Added missing dependencies in some of the examples.

## [0.0.11] - 2024-05-13

### Added

- Allow stopping pipeline tasks with new `StopTaskFrame`.

### Changed

- TTS, STT and image generation services now use `AsyncGenerator`.

### Fixed

- `DailyTransport`: allow registering for participant transcriptions even if input transport is not initialized yet.

### Other

- Updated `storytelling-chatbot`.

## [0.0.10] - 2024-05-13

### Added

- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.

### Fixed

- Fixed Azure services (TTS and image generation).

### Other

- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot` examples.

## [0.0.9] - 2024-05-12

### Changed

Many things have changed in this version.
Many of the main ideas such as frames, processors, services and transports are still there, but some things have changed a bit.

- `Frame`s describe the basic units for processing: for example, text, image or audio frames, or control frames to indicate a user has started or stopped speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an `ImageRawFrame`) and push new frames downstream or upstream to their linked peers.
- `FrameProcessor`s can be linked together. The easiest way is to use the `Pipeline`, which is a container for processors. Linking processors allows frames to travel upstream or downstream easily.
- `Transport`s are a way to send or receive frames. There can be local transports (e.g. local audio or native apps), network transports (e.g. websocket) or service transports (e.g. https://daily.co).
- `Pipeline`s are just a processor container for other processors.
- A `PipelineTask` knows how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example, to capture Ctrl-C from the user.

## [0.0.8] - 2024-04-11

### Added

- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enabled interim results in `DailyTransport` transcriptions.

### Changed

- `FalImageGenService` now uses new `fal_client` package.

### Fixed

- `FalImageGenService`: use `asyncio.to_thread` to not block main loop when generating images.
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed and received after `UserStoppedSpeakingFrame`).

## [0.0.7] - 2024-04-10

### Added

- Added `use_cpu` argument to `MoondreamService`.

## [0.0.6] - 2024-04-10

### Added

- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allowed requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`.

### Changed

- Don't pass `image_size` to `ImageGenService`; images should have their own size.
- `ImageFrame` now receives a tuple `(width,height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.

### Fixed

- Check if camera, speaker and microphone are enabled before writing to them.

### Performance

- `DailyTransport` now only subscribes to the desired participant video track.

## [0.0.5] - 2024-04-06

### Changed

- Use `camera_bitrate` and `camera_framerate`.
- Increased `camera_framerate` to 30 by default.

### Fixed

- Fixed `LocalTransport.read_audio_frames`.

## [0.0.4] - 2024-04-04

### Added

- Added project optional dependencies `[silero,openai,...]`.

### Changed

- Moved transports to their own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.

### Fixed

- Don't write to microphone/speaker if not enabled.

### Other

- Added live translation example.
- Fixed foundational examples.

## [0.0.3] - 2024-03-13

### Other

- Added `storybot` and `chatbot` examples.

## [0.0.2] - 2024-03-12

Initial public release.