Merge pull request #4423 from joycech333/feat/inception-llm-service

feat: add Inception LLM service with Mercury 2 support
Code review fixes
2026-05-21 12:02:27 -04:00 · 2026-05-21 11:45:17 -04:00 · 2026-05-21 11:23:23 -04:00 · 2026-05-21 08:35:46 -04:00 · 2026-05-21 08:35:31 -04:00 · 2026-05-21 08:35:15 -04:00
132 changed files with 5046 additions and 258 deletions
--- a/.claude/skills/squash-commits/SKILL.md
+++ b/.claude/skills/squash-commits/SKILL.md
@@ -0,0 +1,91 @@
+---
+name: squash-commits
+description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
+---
+
+Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
+
+## Instructions
+
+### 1. Safety check
+
+```bash
+git status --short
+```
+
+If there are uncommitted changes, stop and tell the user to commit or stash them first.
+
+### 2. Inspect the branch
+
+```bash
+git log main..HEAD --oneline
+git diff main..HEAD --name-only
+```
+
+List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
+
+### 3. Create a backup branch
+
+```bash
+git branch backup/<current-branch-name>
+```
+
+Tell the user the backup exists so they can recover if needed.
+
+### 4. Soft-reset to main and unstage everything
+
+```bash
+git reset --soft main
+git restore --staged .
+```
+
+All branch changes are now in the working tree, unstaged. No content has changed.
+
+### 5. Plan the logical groups
+
+Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
+
+- Core feature or fix (new source files + modified core files)
+- Secondary features or fixes (each as its own commit if distinct)
+- Refactoring or renames
+- Tests
+- Changelogs / docs
+
+Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
+
+Present the proposed grouping to the user and ask for confirmation before committing.
+
+### 6. Commit in logical groups
+
+For each group, stage only the relevant files and commit with a clear message following the project's conventions:
+
+```bash
+git add <file1> <file2> ...
+git commit -m "..."
+```
+
+Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
+
+### 7. Verify
+
+```bash
+git log main..HEAD --oneline
+git diff main..HEAD --name-only
+git status --short
+```
+
+Confirm:
+- Commit count is small and each message is meaningful
+- The set of changed files vs `main` is identical to before
+- Working tree is clean
+
+### 8. Remind about force-push
+
+The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
+
+## Rules
+
+- Never change file contents. If you find yourself editing a file, stop.
+- Never skip the backup branch step.
+- Never force-push without explicit user instruction.
+- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
 | Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
 | Text-to-Speech      | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
 | Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
 | Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
--- a/changelog/4306.fixed.md
+++ b/changelog/4306.fixed.md
@@ -0,0 +1 @@
+- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
--- a/changelog/4380.fixed.2.md
+++ b/changelog/4380.fixed.2.md
@@ -0,0 +1 @@
+- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
--- a/changelog/4380.fixed.3.md
+++ b/changelog/4380.fixed.3.md
@@ -0,0 +1 @@
+- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
--- a/changelog/4380.fixed.4.md
+++ b/changelog/4380.fixed.4.md
@@ -0,0 +1 @@
+- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
--- a/changelog/4380.fixed.md
+++ b/changelog/4380.fixed.md
@@ -0,0 +1 @@
+- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
--- a/changelog/4423.added.md
+++ b/changelog/4423.added.md
@@ -0,0 +1 @@
+- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
--- a/changelog/4514.fixed.md
+++ b/changelog/4514.fixed.md
@@ -0,0 +1 @@
+- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
--- a/changelog/4521.added.md
+++ b/changelog/4521.added.md
@@ -0,0 +1 @@
+- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
--- a/changelog/4521.changed.md
+++ b/changelog/4521.changed.md
@@ -0,0 +1 @@
+- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
--- a/changelog/4521.removed.md
+++ b/changelog/4521.removed.md
@@ -0,0 +1 @@
+- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
--- a/changelog/4522.changed.md
+++ b/changelog/4522.changed.md
@@ -0,0 +1 @@
+- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.
--- a/changelog/4524.changed.md
+++ b/changelog/4524.changed.md
@@ -0,0 +1 @@
+- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
--- a/changelog/4524.fixed.md
+++ b/changelog/4524.fixed.md
@@ -0,0 +1 @@
+- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
--- a/changelog/4525.changed.md
+++ b/changelog/4525.changed.md
@@ -1 +0,0 @@
- Services and transports with missing optional dependencies now raise `ImportError` instead of a bare `Exception` when their module is imported without the required extra installed. The original `ModuleNotFoundError` is preserved as `__cause__`, so code that wraps these imports can now use `except ImportError:` cleanly instead of `except Exception:`.
--- a/changelog/4527.fixed.md
+++ b/changelog/4527.fixed.md
@@ -0,0 +1 @@
+- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
--- a/changelog/4531.changed.md
+++ b/changelog/4531.changed.md
@@ -0,0 +1 @@
+- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.
--- a/env.example
+++ b/env.example
@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
 HUME_API_KEY=...
 HUME_VOICE_ID=...

+# Inception
+INCEPTION_API_KEY=...
+
 # Inworld
 INWORLD_API_KEY=...

--- a/examples/function-calling/function-calling-inception.py
+++ b/examples/function-calling/function-calling-inception.py
@@ -0,0 +1,177 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.inception.llm import InceptionLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    await params.result_callback({"conditions": "nice", "temperature": "75"})
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
+
+    tts = CartesiaTTSService(
+        api_key=os.environ["CARTESIA_API_KEY"],
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = InceptionLLMService(
+        api_key=os.environ["INCEPTION_API_KEY"],
+        settings=InceptionLLMService.Settings(
+            reasoning_effort="instant",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+    # You can also register a function_name of None to get all functions
+    # sent to the same callback with an additional function_name parameter.
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    @llm.event_handler("on_function_calls_started")
+    async def on_function_calls_started(service, function_calls):
+        await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
+
+    weather_function = FunctionSchema(
+        name="get_current_weather",
+        description="Get the current weather",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+            "format": {
+                "type": "string",
+                "enum": ["celsius", "fahrenheit"],
+                "description": "The temperature unit to use. Infer this from the user's location.",
+            },
+        },
+        required=["location", "format"],
+    )
+
+    restaurant_function = FunctionSchema(
+        name="get_restaurant_recommendation",
+        description="Get a restaurant recommendation",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+        },
+        required=["location"],
+    )
+    tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/update-settings/stt/stt-soniox.py
+++ b/examples/update-settings/stt/stt-soniox.py
@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.soniox.stt import SonioxSTTService
+from pipecat.services.soniox.tts import SonioxTTSService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])

-    tts = CartesiaTTSService(
-        api_key=os.environ["CARTESIA_API_KEY"],
-        settings=CartesiaTTSService.Settings(
-            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-        ),
-    )
+    tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        await task.queue_frames([LLMRunFrame()])

        await asyncio.sleep(10)
-        logger.info("Updating Soniox STT settings: language=es")
+        logger.info("Updating Soniox STT settings: language_hints=[es]")
        await task.queue_frame(
-            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
+            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
        )

    @transport.event_handler("on_client_disconnected")
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
 gstreamer = [ "pygobject~=3.50.0" ]
 heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
 hume = [ "hume>=0.11.2,<1" ]
+inception = []
 inworld = [ "pipecat-ai[websockets-base]" ]
 koala = [ "pvkoala~=2.0.3" ]
 kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
 qwen = []
 resembleai = [ "pipecat-ai[websockets-base]" ]
 rime = [ "pipecat-ai[websockets-base]" ]
-runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.0"]
+runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
 sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
 sambanova = []
 sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
--- a/scripts/evals/run-release-evals.py
+++ b/scripts/evals/run-release-evals.py
@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
    ("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
    ("function-calling/function-calling-novita.py", EVAL_WEATHER),
    ("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
+    ("function-calling/function-calling-inception.py", EVAL_WEATHER),
    # Video
    ("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
    ("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
--- a/src/pipecat/adapters/services/gemini_adapter.py
+++ b/src/pipecat/adapters/services/gemini_adapter.py
@@ -28,7 +28,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class GeminiLLMInvocationParams(TypedDict):
--- a/src/pipecat/audio/filters/koala_filter.py
+++ b/src/pipecat/audio/filters/koala_filter.py
@@ -23,7 +23,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use the Koala filter, you need to `pip install pipecat-ai[koala]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class KoalaFilter(BaseAudioFilter):
--- a/src/pipecat/audio/filters/krisp_viva_filter.py
+++ b/src/pipecat/audio/filters/krisp_viva_filter.py
@@ -27,7 +27,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use KrispVivaFilter, you need to install krisp_audio.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class KrispVivaFilter(BaseAudioFilter):
--- a/src/pipecat/audio/mixers/soundfile_mixer.py
+++ b/src/pipecat/audio/mixers/soundfile_mixer.py
@@ -28,7 +28,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use the soundfile mixer, you need to `pip install pipecat-ai[soundfile]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class SoundfileMixer(BaseAudioMixer):
--- a/src/pipecat/audio/turn/smart_turn/local_coreml_smart_turn.py
+++ b/src/pipecat/audio/turn/smart_turn/local_coreml_smart_turn.py
@@ -27,7 +27,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use the LocalSmartTurnAnalyzer, you need to `pip install pipecat-ai[local-smart-turn]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class LocalCoreMLSmartTurnAnalyzer(BaseSmartTurn):
--- a/src/pipecat/audio/turn/smart_turn/local_smart_turn_v2.py
+++ b/src/pipecat/audio/turn/smart_turn/local_smart_turn_v2.py
@@ -33,7 +33,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use LocalSmartTurnAnalyzerV2, you need to `pip install pipecat-ai[local-smart-turn]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class LocalSmartTurnAnalyzerV2(BaseSmartTurn):
--- a/src/pipecat/audio/vad/krisp_viva_vad.py
+++ b/src/pipecat/audio/vad/krisp_viva_vad.py
@@ -28,7 +28,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use KrispVivaVADAnalyzer, you need to install krisp_audio.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class KrispVivaVadAnalyzer(VADAnalyzer):
--- a/src/pipecat/audio/vad/silero.py
+++ b/src/pipecat/audio/vad/silero.py
@@ -27,7 +27,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai`.")
-    raise ImportError(f"Missing module(s): {e}") from e
+    raise Exception(f"Missing module(s): {e}")


 class SileroOnnxModel:
--- a/src/pipecat/frames/frames.py
+++ b/src/pipecat/frames/frames.py
@@ -383,10 +383,14 @@ class AggregatedTextFrame(TextFrame):
    Parameters:
        aggregated_by: Method used to aggregate the text frames.
        context_id: Unique identifier for the TTS context that generated this text.
+        raw_text: The full matched text including start/end pattern delimiters, set when
+            this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
+            None for ordinary sentence aggregations.
    """

    aggregated_by: AggregationType | str
    context_id: str | None = None
+    raw_text: str | None = None


@dataclass
--- a/src/pipecat/processors/aggregators/llm_response_universal.py
+++ b/src/pipecat/processors/aggregators/llm_response_universal.py
@@ -25,6 +25,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.vad_analyzer import VADAnalyzer
 from pipecat.audio.vad.vad_controller import VADController
 from pipecat.frames.frames import (
+    AggregatedTextFrame,
    AssistantImageRawFrame,
    BotStartedSpeakingFrame,
    BotStoppedSpeakingFrame,
@@ -1496,9 +1497,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
        if len(frame.text) == 0:
            return

+        text = (
+            frame.raw_text
+            if isinstance(frame, AggregatedTextFrame) and frame.raw_text
+            else frame.text
+        )
        self._aggregation.append(
            TextPartForConcatenation(
-                frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
+                text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
            )
        )

--- a/src/pipecat/processors/aggregators/llm_text_processor.py
+++ b/src/pipecat/processors/aggregators/llm_text_processor.py
@@ -23,6 +23,7 @@ from pipecat.frames.frames import (
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
+from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
 from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator


@@ -85,7 +86,11 @@ class LLMTextProcessor(FrameProcessor):
            out_frame = AggregatedTextFrame(
                text=aggregation.text,
                aggregated_by=aggregation.type,
+                raw_text=aggregation.full_match
+                if isinstance(aggregation, PatternMatch)
+                else aggregation.text,
            )
+            out_frame.append_to_context = True
            out_frame.skip_tts = in_frame.skip_tts
            await self.push_frame(out_frame)

@@ -96,6 +101,9 @@ class LLMTextProcessor(FrameProcessor):
            out_frame = AggregatedTextFrame(
                text=remaining.text,
                aggregated_by=remaining.type,
+                raw_text=remaining.full_match
+                if isinstance(remaining, PatternMatch)
+                else remaining.text,
            )
            out_frame.skip_tts = skip_tts
            await self.push_frame(out_frame)
--- a/src/pipecat/processors/frameworks/langchain.py
+++ b/src/pipecat/processors/frameworks/langchain.py
@@ -22,7 +22,7 @@ try:
    from langchain_core.runnables import Runnable
 except ModuleNotFoundError as e:
    logger.error("In order to use Langchain, you need to `pip install pipecat-ai[langchain]`. ")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class LangchainProcessor(FrameProcessor):
--- a/src/pipecat/processors/frameworks/rtvi/observer.py
+++ b/src/pipecat/processors/frameworks/rtvi/observer.py
@@ -528,6 +528,9 @@ class RTVIObserver(BaseObserver):
                text = await transform(text, agg_type)

        isTTS = isinstance(frame, TTSTextFrame)
+        if agg_type is not AggregationType.WORD:
+            logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
+
        if self._params.bot_output_enabled:
            message = RTVI.BotOutputMessage(
                data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)
--- a/src/pipecat/processors/frameworks/strands_agents.py
+++ b/src/pipecat/processors/frameworks/strands_agents.py
@@ -21,7 +21,7 @@ try:
    from strands.multiagent.graph import Graph
 except ModuleNotFoundError as e:
    logger.error("In order to use Strands Agents, you need to `pip install strands-agents`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class StrandsAgentsProcessor(FrameProcessor):
--- a/src/pipecat/processors/gstreamer/pipeline_source.py
+++ b/src/pipecat/processors/gstreamer/pipeline_source.py
@@ -33,7 +33,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use GStreamer, you need to `pip install pipecat-ai[gstreamer]`. Also, you need to install GStreamer in your system."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class GStreamerPipelineSource(FrameProcessor):
--- a/src/pipecat/processors/metrics/sentry.py
+++ b/src/pipecat/processors/metrics/sentry.py
@@ -17,7 +17,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Sentry, you need to `pip install pipecat-ai[sentry]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 from pipecat.processors.metrics.frame_processor_metrics import FrameProcessorMetrics

--- a/src/pipecat/runner/run.py
+++ b/src/pipecat/runner/run.py
@@ -90,6 +90,7 @@ To run locally:

 import argparse
 import asyncio
+import importlib.util
 import mimetypes
 import os
 import sys
@@ -131,6 +132,18 @@ load_dotenv(override=True)
 os.environ["ENV"] = "local"

 TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
+TRANSPORT_ROUTE_DEPENDENCIES = {
+    "daily": ("daily",),
+    "webrtc": ("aiortc",),
+    "telephony": ("fastapi", "websockets"),
+    "websocket": ("fastapi", "websockets"),
+}
+TRANSPORT_INSTALL_HINTS = {
+    "daily": "install pipecat-ai[daily]",
+    "webrtc": "install pipecat-ai[webrtc]",
+    "telephony": "install pipecat-ai[websocket]",
+    "websocket": "install pipecat-ai[websocket]",
+}

 # Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
 PIPECAT_ROOM_EXP_HOURS = 4.0
@@ -156,6 +169,120 @@ Import this to add custom routes from other packages before calling
 """


+def _is_module_available(module: str) -> bool:
+    """Check whether a module can be imported without importing it.
+
+    Args:
+        module: Fully-qualified module name to check.
+
+    Returns:
+        ``True`` if Python can resolve the module, ``False`` otherwise.
+    """
+    try:
+        return importlib.util.find_spec(module) is not None
+    except (ImportError, ModuleNotFoundError, ValueError):
+        return False
+
+
+def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
+    """Return module dependencies required for a transport route.
+
+    Args:
+        transport: Transport name from the runner request or CLI.
+
+    Returns:
+        Module names required to enable the transport route.
+    """
+    if transport in TELEPHONY_TRANSPORTS:
+        return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
+    return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())
+
+
+def _transport_routes_enabled(transport: str) -> bool:
+    """Return whether a transport route can run in this environment.
+
+    Args:
+        transport: Transport name from the runner request or CLI.
+
+    Returns:
+        ``True`` if the requested transport is enabled.
+    """
+    return all(_is_module_available(module) for module in _transport_route_dependencies(transport))
+
+
+def _runner_url(args: argparse.Namespace) -> str:
+    """Return the browser URL for the runner prebuilt client."""
+    return f"http://{args.host}:{args.port}"
+
+
+def _transport_status_lists() -> tuple[list[str], list[str]]:
+    """Return enabled and disabled transport labels for the startup banner."""
+    transports = ["daily", "webrtc", "telephony", "websocket"]
+    enabled = []
+    disabled = []
+
+    for label in transports:
+        if _transport_routes_enabled(label):
+            enabled.append(label)
+        else:
+            disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
+
+    return enabled, disabled
+
+
+def _format_transport_status(labels: list[str]) -> str:
+    """Format a startup banner transport status list."""
+    return ", ".join(labels) if labels else "none"
+
+
+def _print_startup_message(args: argparse.Namespace):
+    """Print connection information for the development runner."""
+    print()
+    if args.transport is None:
+        enabled, disabled = _transport_status_lists()
+        print("🚀 Bot ready!")
+        print(f"   → Open: {_runner_url(args)}")
+        print(f"   → Enabled transports: {_format_transport_status(enabled)}")
+        if disabled:
+            print(f"   → Disabled transports: {_format_transport_status(disabled)}")
+    elif args.transport == "webrtc":
+        if args.esp32:
+            print("🚀 Bot ready! (ESP32 mode)")
+        elif args.whatsapp:
+            print("🚀 Bot ready! (WhatsApp)")
+        else:
+            print("🚀 Bot ready! (WebRTC)")
+        if _transport_routes_enabled("webrtc"):
+            print(f"   → Open: {_runner_url(args)}")
+        else:
+            print(f"   → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
+    elif args.transport == "daily":
+        print("🚀 Bot ready! (Daily)")
+        if not _transport_routes_enabled("daily"):
+            print(f"   → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
+        else:
+            print(f"   → Open: {_runner_url(args)}")
+            if args.dialin:
+                print(
+                    f"   → Daily dial-in webhook: "
+                    f"http://{args.host}:{args.port}/daily-dialin-webhook"
+                )
+                print("   → Configure this URL in your Daily phone number settings")
+    elif args.transport in TELEPHONY_TRANSPORTS:
+        print(f"🚀 Bot ready! ({args.transport.capitalize()})")
+        if not _transport_routes_enabled(args.transport):
+            print(f"   → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
+        else:
+            print(f"   → Open: {_runner_url(args)}")
+            if args.proxy:
+                print(f"   → XML webhook: http://{args.host}:{args.port}/")
+            print(f"   → WebSocket:   ws://{args.host}:{args.port}/ws")
+    elif args.transport == "vonage":
+        print()
+        print("🚀 Bot ready!")
+    print()
+
+
 def _get_bot_module():
    """Get the bot module from the calling script."""
    import importlib.util
@@ -227,6 +354,8 @@ async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):

 def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
    """Set up the plain WebSocket route at ``/ws-client``."""
+    if not _transport_routes_enabled("websocket"):
+        return

    @app.websocket("/ws-client")
    async def websocket_client_endpoint(websocket: WebSocket):
@@ -338,6 +467,15 @@ def _setup_unified_start_route(
                ),
            )

+        if not _transport_routes_enabled(transport):
+            raise HTTPException(
+                status_code=400,
+                detail=(
+                    f"Transport '{transport}' is disabled in this runner environment. "
+                    "Check the startup banner for enabled transports."
+                ),
+            )
+
        if transport == "webrtc":
            # WebRTC: register the session; the bot starts when the WebRTC offer arrives.
            session_id = str(uuid.uuid4())
@@ -471,6 +609,9 @@ def _setup_webrtc_routes(
    app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
 ):
    """Set up WebRTC-specific routes."""
+    if not _transport_routes_enabled("webrtc"):
+        return
+
    try:
        from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
        from pipecat.transports.smallwebrtc.request_handler import (
@@ -480,7 +621,7 @@ def _setup_webrtc_routes(
            SmallWebRTCRequestHandler,
        )
    except ImportError as e:
-        logger.error(f"WebRTC transport dependencies not installed: {e}")
+        logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
        return

    @app.get("/files/{filename:path}")
@@ -765,6 +906,8 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):

 def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
    """Set up Daily-specific routes."""
+    if not _transport_routes_enabled("daily"):
+        return

    @app.get("/daily")
    async def create_room_and_start_agent():
@@ -908,6 +1051,9 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
    specific telephony transport is chosen via ``-t`` because the XML template
    is provider-specific and requires a proxy hostname (``--proxy``).
    """
+    if not _transport_routes_enabled("telephony"):
+        return
+
    if args.transport in TELEPHONY_TRANSPORTS:
        # XML response templates (Exotel doesn't use XML webhooks)
        XML_TEMPLATES = {
@@ -1160,43 +1306,11 @@ def main(parser: argparse.ArgumentParser | None = None):
        return

    # Print startup message
-    print()
-    if args.transport is None:
-        print("🚀 Bot ready!")
-        print(f"   → WebRTC:    http://{args.host}:{args.port}/client")
-        print(f"   → Daily:     http://{args.host}:{args.port}/daily")
-        print(f"   → Telephony: ws://{args.host}:{args.port}/ws")
-    elif args.transport == "webrtc":
-        if args.esp32:
-            print("🚀 Bot ready! (ESP32 mode)")
-        elif args.whatsapp:
-            print("🚀 Bot ready! (WhatsApp)")
-        else:
-            print("🚀 Bot ready! (WebRTC)")
-        print(f"   → Open http://{args.host}:{args.port}/client in your browser")
-    elif args.transport == "daily":
-        print("🚀 Bot ready! (Daily)")
-        if args.dialin:
-            print(
-                f"   → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
-            )
-            print(f"   → Configure this URL in your Daily phone number settings")
-        else:
-            print(
-                f"   → Open http://{args.host}:{args.port}/daily in your browser to start a session"
-            )
-    elif args.transport in TELEPHONY_TRANSPORTS:
-        print(f"🚀 Bot ready! ({args.transport.capitalize()})")
-        if args.proxy:
-            print(f"   → XML webhook: http://{args.host}:{args.port}/")
-        print(f"   → WebSocket:   ws://{args.host}:{args.port}/ws")
-    elif args.transport == "vonage":
-        print()
-        print(f"🚀 Bot ready!")
+    _print_startup_message(args)
+    if args.transport == "vonage":
        asyncio.run(_run_vonage())
        print()
        return
-    print()

    RUNNER_DOWNLOADS_FOLDER = args.folder
    RUNNER_HOST = args.host
--- a/src/pipecat/services/anthropic/llm.py
+++ b/src/pipecat/services/anthropic/llm.py
@@ -48,7 +48,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class AnthropicThinkingConfig(BaseModel):
--- a/src/pipecat/services/assemblyai/stt.py
+++ b/src/pipecat/services/assemblyai/stt.py
@@ -55,7 +55,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error('In order to use AssemblyAI, you need to `pip install "pipecat-ai[assemblyai]"`.')
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def map_language_from_assemblyai(language_code: str) -> Language:
@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug(f"{self} Connected to AssemblyAI WebSocket")
        except Exception as e:
+            self._websocket = None
            self._connected = False
            await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to AssemblyAI."""
--- a/src/pipecat/services/asyncai/tts.py
+++ b/src/pipecat/services/asyncai/tts.py
@@ -39,7 +39,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Async, you need to `pip install pipecat-ai[asyncai]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_async_language(language: Language) -> str:
--- a/src/pipecat/services/aws/llm.py
+++ b/src/pipecat/services/aws/llm.py
@@ -49,7 +49,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use AWS services, you need to `pip install pipecat-ai[aws]`. Also, remember to set `AWS_SECRET_ACCESS_KEY`, `AWS_ACCESS_KEY_ID`, and `AWS_REGION` environment variable."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/aws/nova_sonic/llm.py
+++ b/src/pipecat/services/aws/nova_sonic/llm.py
@@ -81,7 +81,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use AWS services, you need to `pip install pipecat-ai[aws-nova-sonic]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class AWSNovaSonicUnhandledFunctionException(Exception):
--- a/src/pipecat/services/aws/sagemaker/bidi_client.py
+++ b/src/pipecat/services/aws/sagemaker/bidi_client.py
@@ -32,7 +32,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use SageMaker BiDi client, you need to `pip install pipecat-ai[sagemaker]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class SageMakerBidiClient:
--- a/src/pipecat/services/aws/stt.py
+++ b/src/pipecat/services/aws/stt.py
@@ -48,7 +48,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use AWS services, you need to `pip install pipecat-ai[aws]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
@@ -339,10 +339,10 @@ class AWSTranscribeSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.info(f"{self} Successfully connected to AWS Transcribe")
        except Exception as e:
+            self._websocket = None
            await self.push_error(
                error_msg=f"Unable to connect to AWS Transcribe: {e}", exception=e
            )
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to AWS Transcribe."""
--- a/src/pipecat/services/aws/tts.py
+++ b/src/pipecat/services/aws/tts.py
@@ -34,7 +34,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use AWS services, you need to `pip install pipecat-ai[aws]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_aws_language(language: Language) -> str:
--- a/src/pipecat/services/azure/realtime/llm.py
+++ b/src/pipecat/services/azure/realtime/llm.py
@@ -17,7 +17,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Azure Realtime, you need to `pip install pipecat-ai[openai]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/azure/stt.py
+++ b/src/pipecat/services/azure/stt.py
@@ -49,7 +49,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Azure, you need to `pip install pipecat-ai[azure]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/azure/tts.py
+++ b/src/pipecat/services/azure/tts.py
@@ -42,7 +42,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Azure, you need to `pip install pipecat-ai[azure]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def sample_rate_to_output_format(sample_rate: int) -> SpeechSynthesisOutputFormat:
@@ -540,14 +540,25 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
        self._last_timestamp = timestamp

    async def _word_processor_task_handler(self):
-        """Process word timestamps from the queue and call add_word_timestamps."""
+        """Process word timestamps from the queue and call add_word_timestamps.
+
+        Also handles a None sentinel from _handle_completed: once all pending
+        words have been drained, it signals audio stream completion via
+        _audio_queue so that run_tts exits only after the last word has been
+        processed.
+        """
        while True:
            try:
-                word, timestamp_seconds = await self._word_boundary_queue.get()
-                if self._current_context_id:
-                    await self.add_word_timestamps(
-                        [(word, timestamp_seconds)], self._current_context_id
-                    )
+                item = await self._word_boundary_queue.get()
+                if item is None:
+                    # All words drained — now signal audio completion.
+                    self._audio_queue.put_nowait(None)
+                else:
+                    word, timestamp_seconds = item
+                    if self._current_context_id:
+                        await self.add_word_timestamps(
+                            [(word, timestamp_seconds)], self._current_context_id
+                        )
                self._word_boundary_queue.task_done()
            except asyncio.CancelledError:
                break
@@ -569,17 +580,21 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
        Args:
            evt: Completion event from Azure Speech SDK.
        """
+        # Store duration for cumulative offset calculation
+        if evt.result and evt.result.audio_duration:
+            self._current_sentence_duration = evt.result.audio_duration.total_seconds()
+
        # Flush any pending word before completing
        if self._last_word is not None:
            self._word_boundary_queue.put_nowait((self._last_word, self._last_timestamp))
            self._last_word = None
            self._last_timestamp = None

-        # Store duration for cumulative offset calculation
-        if evt.result and evt.result.audio_duration:
-            self._current_sentence_duration = evt.result.audio_duration.total_seconds()
-
-        self._audio_queue.put_nowait(None)  # Signal completion
+        # Route completion through the word boundary queue so the word processor
+        # task drains all pending words before signaling audio stream completion.
+        # Without this, the last word's TTSTextFrame may arrive after
+        # TTSStoppedFrame, causing it to be missed by observers and the UI.
+        self._word_boundary_queue.put_nowait(None)

    def _handle_canceled(self, evt):
        """Handle synthesis cancellation.
--- a/src/pipecat/services/cartesia/stt.py
+++ b/src/pipecat/services/cartesia/stt.py
@@ -42,7 +42,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Cartesia, you need to `pip install pipecat-ai[cartesia]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
@@ -354,7 +354,8 @@ class CartesiaSTTService(WebsocketSTTService):
            self._websocket = await websocket_connect(ws_url, additional_headers=headers)
            await self._call_event_handler("on_connected")
        except Exception as e:
-            await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
+            self._websocket = None
+            await self.push_error(error_msg=f"Unable to connect to Cartesia: {e}", exception=e)

    async def _disconnect_websocket(self):
        ws = self._websocket
--- a/src/pipecat/services/cartesia/tts.py
+++ b/src/pipecat/services/cartesia/tts.py
@@ -8,6 +8,7 @@

 import base64
 import json
+import re
 from collections.abc import AsyncGenerator
 from dataclasses import dataclass, field
 from enum import StrEnum
@@ -39,7 +40,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Cartesia, you need to `pip install pipecat-ai[cartesia]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class GenerationConfig(BaseModel):
@@ -431,10 +432,20 @@ class CartesiaTTSService(WebsocketTTSService):
        base_lang = language.split("-")[0].lower()
        return base_lang in {"zh", "ja"}

-    def _process_word_timestamps_for_language(
+    _CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)
+
+    def _strip_cartesia_tags(self, text: str) -> str:
+        text = self._CARTESIA_TAG_RE.sub(" ", text)
+        text = re.sub(r"\s+", " ", text)
+        return text.strip()
+
+    def _normalize_word_timestamps(
        self, words: list[str], starts: list[float]
    ) -> list[tuple[str, float]]:
-        """Process word timestamps based on the current language.
+        """Normalize raw word timestamps from Cartesia before further processing.
+
+        Strips Cartesia SSML tags (spell, emotion, break, volume, speed) from each word
+        and drops entries that become empty after stripping.

        For Chinese and Japanese, Cartesia groups related characters in the same timestamp
        message.
@@ -458,14 +469,18 @@ class CartesiaTTSService(WebsocketTTSService):
            # For Chinese/Japanese, combine all characters in this message into one word
            # using the first character's start time.
            if words and starts:
-                combined_word = "".join(words)
+                combined_word = "".join(self._strip_cartesia_tags(w) for w in words)
                first_start = starts[0]
-                return [(combined_word, first_start)]
+                return [(combined_word, first_start)] if combined_word else []
            else:
                return []
        else:
-            # For non-CJK languages, use as-is
-            return list(zip(words, starts))
+            result = []
+            for word, start in zip(words, starts):
+                cleaned = self._strip_cartesia_tags(word)
+                if cleaned:
+                    result.append((cleaned, start))
+            return result

    def _word_timestamps_include_inter_frame_spaces(self) -> bool:
        """Whether timestamp text should be treated as carrying its own spacing."""
@@ -662,7 +677,7 @@ class CartesiaTTSService(WebsocketTTSService):
                await self.remove_audio_context(ctx_id)
            elif msg["type"] == "timestamps":
                # Process the timestamps based on language before adding them
-                processed_timestamps = self._process_word_timestamps_for_language(
+                processed_timestamps = self._normalize_word_timestamps(
                    msg["word_timestamps"]["words"], msg["word_timestamps"]["start"]
                )
                await self.add_word_timestamps(
--- a/src/pipecat/services/deepgram/flux/stt.py
+++ b/src/pipecat/services/deepgram/flux/stt.py
@@ -31,7 +31,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Deepgram Flux, you need to `pip install pipecat-ai[deepgram]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 # Re-export for backward compatibility
 __all__ = [
--- a/src/pipecat/services/deepgram/stt.py
+++ b/src/pipecat/services/deepgram/stt.py
@@ -48,7 +48,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Deepgram, you need to `pip install pipecat-ai[deepgram]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class LiveOptions:
--- a/src/pipecat/services/deepgram/tts.py
+++ b/src/pipecat/services/deepgram/tts.py
@@ -39,7 +39,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use DeepgramWebsocketTTSService, you need to `pip install pipecat-ai[deepgram]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/elevenlabs/stt.py
+++ b/src/pipecat/services/elevenlabs/stt.py
@@ -52,7 +52,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use ElevenLabs Realtime STT, you need to `pip install pipecat-ai[elevenlabs]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_elevenlabs_language(language: Language) -> str:
@@ -823,6 +823,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug("Connected to ElevenLabs Realtime STT")
        except Exception as e:
+            self._websocket = None
            await self.push_error(
                error_msg=f"Unable to connect to ElevenLabs Realtime STT: {e}", exception=e
            )
--- a/src/pipecat/services/elevenlabs/tts.py
+++ b/src/pipecat/services/elevenlabs/tts.py
@@ -56,7 +56,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use ElevenLabs, you need to `pip install pipecat-ai[elevenlabs]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 # Models that support language codes
 # The following models are excluded as they don't support language codes:
@@ -594,6 +594,10 @@ class ElevenLabsTTSService(WebsocketTTSService):
        self._partial_word_start_time = 0.0
        self._alignment_started_context_ids: set[str | None] = set()

+        # Context IDs whose context-init has been sent, so the keepalive knows
+        # which contexts are safe to target.
+        self._context_init_sent: set[str] = set()
+
        # Context management for v1 multi API
        self._receive_task = None
        self._keepalive_task = None
@@ -792,6 +796,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
        finally:
            await self.remove_active_audio_context()
            self._websocket = None
+            self._context_init_sent.clear()
            await self._call_event_handler("on_disconnected")

    def _get_websocket(self):
@@ -822,6 +827,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
        self._partial_word = ""
        self._partial_word_start_time = 0.0
        self._alignment_started_context_ids.discard(context_id)
+        self._context_init_sent.discard(context_id)

    async def on_audio_context_interrupted(self, context_id: str):
        """Close the ElevenLabs context when the bot is interrupted."""
@@ -914,26 +920,35 @@ class ElevenLabsTTSService(WebsocketTTSService):
        while True:
            await asyncio.sleep(KEEPALIVE_SLEEP)
            try:
-                if self._websocket and self._websocket.state is State.OPEN:
-                    context_id = self.get_active_audio_context_id()
-                    if context_id:
-                        # Send keepalive with context ID to keep the connection alive
-                        keepalive_message = {
-                            "text": "",
-                            "context_id": context_id,
-                        }
-                        logger.trace(f"Sending keepalive for context {context_id}")
-                    else:
-                        # It's possible to have a user interruption which clears the context
-                        # without generating a new TTS response. In this case, we'll just send
-                        # an empty message to keep the connection alive.
-                        keepalive_message = {"text": ""}
-                        logger.trace("Sending keepalive without context")
-                    await self._websocket.send(json.dumps(keepalive_message))
+                await self._send_keepalive()
            except websockets.ConnectionClosed as e:
                logger.warning(f"{self} keepalive error: {e}")
                break

+    async def _send_keepalive(self):
+        """Send a single keepalive message to keep the WebSocket connection alive.
+
+        Only stamps a ``context_id`` once its context-init (carrying
+        ``voice_settings``) has been sent. Otherwise the keepalive would be the
+        context's first message, with no ``voice_settings``, and ElevenLabs would
+        reject the later context-init with a 1008 policy violation. A context-less
+        keepalive is sufficient until the context-init is sent.
+        """
+        if not self._websocket or self._websocket.state is not State.OPEN:
+            return
+
+        context_id = self.get_active_audio_context_id()
+        if context_id and context_id in self._context_init_sent:
+            # The context's voice_settings context-init has been sent, so it's
+            # safe to keep that context alive.
+            keepalive_message = {"text": "", "context_id": context_id}
+        else:
+            # No active context, or the active context's context-init hasn't been
+            # sent yet. A context-less keepalive keeps the connection alive without
+            # opening the context prematurely.
+            keepalive_message = {"text": ""}
+        await self._websocket.send(json.dumps(keepalive_message))
+
    async def _send_text(self, text: str, context_id: str):
        """Send text to the WebSocket for synthesis."""
        if self._websocket and context_id:
@@ -980,6 +995,9 @@ class ElevenLabsTTSService(WebsocketTTSService):
                            locator.model_dump()
                            for locator in self._pronunciation_dictionary_locators
                        ]
+                    # Mark the context-init as sent so the keepalive may now
+                    # target this context_id.
+                    self._context_init_sent.add(context_id)
                    await self._websocket.send(json.dumps(msg))
                    logger.trace(f"Created new context {context_id}")

--- a/src/pipecat/services/fish/tts.py
+++ b/src/pipecat/services/fish/tts.py
@@ -38,7 +38,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Fish Audio, you need to `pip install pipecat-ai[fish]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 # FishAudio supports various output formats
 FishAudioOutputFormat = Literal["opus", "mp3", "pcm", "wav"]
--- a/src/pipecat/services/gladia/stt.py
+++ b/src/pipecat/services/gladia/stt.py
@@ -53,7 +53,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Gladia, you need to `pip install pipecat-ai[gladia]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_gladia_language(language: Language) -> str:
@@ -558,8 +558,9 @@ class GladiaSTTService(WebsocketSTTService):

            logger.debug(f"{self} Connected to Gladia WebSocket")
        except Exception as e:
+            self._websocket = None
+            self._connection_active = False
            await self.push_error(error_msg=f"Unable to connect to Gladia: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to Gladia."""
--- a/src/pipecat/services/google/gemini_live/llm.py
+++ b/src/pipecat/services/google/gemini_live/llm.py
@@ -105,7 +105,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 # Connection management constants
--- a/src/pipecat/services/google/gemini_live/vertex/llm.py
+++ b/src/pipecat/services/google/gemini_live/vertex/llm.py
@@ -36,7 +36,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google Vertex AI, you need to `pip install pipecat-ai[google]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/google/image.py
+++ b/src/pipecat/services/google/image.py
@@ -35,7 +35,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/google/llm.py
+++ b/src/pipecat/services/google/llm.py
@@ -65,7 +65,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class GoogleThinkingConfig(BaseModel):
--- a/src/pipecat/services/google/stt.py
+++ b/src/pipecat/services/google/stt.py
@@ -57,7 +57,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_google_stt_language(language: Language) -> str:
--- a/src/pipecat/services/google/tts.py
+++ b/src/pipecat/services/google/tts.py
@@ -57,7 +57,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_google_tts_language(language: Language) -> str:
--- a/src/pipecat/services/google/vertex/llm.py
+++ b/src/pipecat/services/google/vertex/llm.py
@@ -35,7 +35,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/gradium/stt.py
+++ b/src/pipecat/services/gradium/stt.py
@@ -44,7 +44,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error('In order to use Gradium, you need to `pip install "pipecat-ai[gradium]"`.')
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 # Seconds to wait after a "flushed" message for trailing text tokens to arrive
 # before finalizing the transcription.
@@ -423,8 +423,8 @@ class GradiumSTTService(WebsocketSTTService):
            logger.debug("Connected to Gradium STT")

        except Exception as e:
-            await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
-            raise
+            self._websocket = None
+            await self.push_error(error_msg=f"Unable to connect to Gradium: {e}", exception=e)

    async def _disconnect(self):
        await super()._disconnect()
--- a/src/pipecat/services/gradium/tts.py
+++ b/src/pipecat/services/gradium/tts.py
@@ -33,7 +33,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Gradium, you need to `pip install pipecat-ai[gradium]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 SAMPLE_RATE = 48000

--- a/src/pipecat/services/groq/tts.py
+++ b/src/pipecat/services/groq/tts.py
@@ -30,7 +30,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Groq, you need to `pip install pipecat-ai[groq]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 # Hint set for `output_format`. The values mirror the Literal that
 # `groq.resources.audio.speech.AsyncSpeech.create` accepts on its
--- a/src/pipecat/services/heygen/client.py
+++ b/src/pipecat/services/heygen/client.py
@@ -46,7 +46,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use HeyGen, you need to `pip install pipecat-ai[heygen]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 HEY_GEN_SAMPLE_RATE = 24000

--- a/src/pipecat/services/hume/tts.py
+++ b/src/pipecat/services/hume/tts.py
@@ -38,7 +38,7 @@ try:
 except ModuleNotFoundError as e:  # pragma: no cover - import-time guidance
    logger.error(f"Exception: {e}")
    logger.error("In order to use Hume, you need to `pip install pipecat-ai[hume]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 HUME_SAMPLE_RATE = 48_000  # Hume TTS streams at 48 kHz
--- a/src/pipecat/services/inception/init.py
+++ b/src/pipecat/services/inception/init.py
--- a/src/pipecat/services/inception/llm.py
+++ b/src/pipecat/services/inception/llm.py
@@ -0,0 +1,124 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Inception LLM service implementation using OpenAI-compatible interface."""
+
+from dataclasses import dataclass, field
+from typing import Literal
+
+from loguru import logger
+
+from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams
+from pipecat.services.openai.base_llm import BaseOpenAILLMService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.settings import NOT_GIVEN as _NOT_GIVEN
+from pipecat.services.settings import _NotGiven, is_given
+
+
+@dataclass
+class InceptionLLMSettings(BaseOpenAILLMService.Settings):
+    """Settings for InceptionLLMService.
+
+    Parameters:
+        reasoning_effort: Controls how much reasoning the model applies.
+            One of "instant", "low", "medium", or "high". When unset, the
+            parameter is omitted and Inception's server-side default applies.
+        realtime: When True, reduces time to first diffusion block (TTFT).
+    """
+
+    reasoning_effort: Literal["instant", "low", "medium", "high"] | None | _NotGiven = field(
+        default_factory=lambda: _NOT_GIVEN
+    )
+    realtime: bool | None | _NotGiven = field(default_factory=lambda: _NOT_GIVEN)
+
+
+class InceptionLLMService(OpenAILLMService):
+    """A service for interacting with Inception's API using the OpenAI-compatible interface.
+
+    This service extends OpenAILLMService to connect to Inception's API endpoint while
+    maintaining full compatibility with OpenAI's interface and functionality.
+    Supports Mercury-2, Inception's diffusion-based reasoning model.
+    """
+
+    # Inception doesn't support the "developer" message role.
+    supports_developer_role = False
+
+    Settings = InceptionLLMSettings
+    _settings: Settings
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        base_url: str = "https://api.inceptionlabs.ai/v1",
+        settings: Settings | None = None,
+        **kwargs,
+    ):
+        """Initialize the Inception LLM service.
+
+        Args:
+            api_key: The API key for accessing Inception's API.
+            base_url: The base URL for Inception API. Defaults to "https://api.inceptionlabs.ai/v1".
+            settings: Runtime-updatable settings.
+            **kwargs: Additional keyword arguments passed to OpenAILLMService.
+        """
+        default_settings = self.Settings(
+            model="mercury-2",
+            reasoning_effort=None,
+            realtime=None,
+        )
+
+        if settings is not None:
+            default_settings.apply_update(settings)
+
+        super().__init__(api_key=api_key, base_url=base_url, settings=default_settings, **kwargs)
+
+    def create_client(self, api_key=None, base_url=None, **kwargs):
+        """Create OpenAI-compatible client for Inception API endpoint.
+
+        Args:
+            api_key: The API key for authentication. If None, uses instance default.
+            base_url: The base URL for the API. If None, uses instance default.
+            **kwargs: Additional keyword arguments for client configuration.
+
+        Returns:
+            An OpenAI-compatible client configured for Inception's API.
+        """
+        logger.debug(f"Creating Inception client with api {base_url}")
+        return super().create_client(api_key, base_url, **kwargs)
+
+    def build_chat_completion_params(self, params_from_context: OpenAILLMInvocationParams) -> dict:
+        """Build parameters for Inception chat completion request.
+
+        Extends the base OpenAI parameters with Inception-specific options
+        such as reasoning_effort and realtime.
+
+        Args:
+            params_from_context: Parameters, derived from the LLM context, to
+                use for the chat completion. Contains messages, tools, and tool
+                choice.
+
+        Returns:
+            Dictionary of parameters for the chat completion request.
+        """
+        params = super().build_chat_completion_params(params_from_context)
+
+        if (
+            is_given(self._settings.reasoning_effort)
+            and self._settings.reasoning_effort is not None
+        ):
+            params["reasoning_effort"] = self._settings.reasoning_effort
+
+        # realtime is Inception-specific and unknown to the OpenAI SDK,
+        # so it must be passed via extra_body to avoid validation errors.
+        extra_body = {}
+        if is_given(self._settings.realtime) and self._settings.realtime is not None:
+            extra_body["realtime"] = self._settings.realtime
+
+        if extra_body:
+            params["extra_body"] = extra_body
+
+        return params
--- a/src/pipecat/services/inworld/realtime/llm.py
+++ b/src/pipecat/services/inworld/realtime/llm.py
@@ -68,7 +68,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Inworld Realtime, you need to `pip install pipecat-ai[inworld]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/inworld/tts.py
+++ b/src/pipecat/services/inworld/tts.py
@@ -43,7 +43,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Inworld WebSocket TTS, you need to `pip install websockets`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 from pipecat.frames.frames import (
    AggregationType,
--- a/src/pipecat/services/kokoro/tts.py
+++ b/src/pipecat/services/kokoro/tts.py
@@ -32,7 +32,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Kokoro, you need to `pip install pipecat-ai[kokoro]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 KOKORO_CACHE_DIR = Path(os.path.expanduser("~/.cache/kokoro-onnx"))
 KOKORO_MODEL_URL = "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx"
--- a/src/pipecat/services/lmnt/tts.py
+++ b/src/pipecat/services/lmnt/tts.py
@@ -34,7 +34,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use LMNT, you need to `pip install pipecat-ai[lmnt]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_lmnt_language(language: Language) -> str:
--- a/src/pipecat/services/mcp_service.py
+++ b/src/pipecat/services/mcp_service.py
@@ -29,7 +29,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use an MCP client, you need to `pip install pipecat-ai[mcp]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")

 ServerParameters: TypeAlias = StdioServerParameters | SseServerParameters | StreamableHttpParameters

--- a/src/pipecat/services/mem0/memory.py
+++ b/src/pipecat/services/mem0/memory.py
@@ -28,7 +28,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Mem0, you need to `pip install mem0ai`. Also, set the environment variable MEM0_API_KEY."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class Mem0MemoryService(FrameProcessor):
--- a/src/pipecat/services/mistral/stt.py
+++ b/src/pipecat/services/mistral/stt.py
@@ -48,7 +48,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Mistral STT, you need to `pip install pipecat-ai[mistral]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/mistral/tts.py
+++ b/src/pipecat/services/mistral/tts.py
@@ -31,7 +31,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Mistral TTS, you need to `pip install pipecat-ai[mistral]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/moondream/vision.py
+++ b/src/pipecat/services/moondream/vision.py
@@ -34,7 +34,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
-    raise ImportError(f"Missing module(s): {e}") from e
+    raise Exception(f"Missing module(s): {e}")


 def detect_device():
--- a/src/pipecat/services/neuphonic/tts.py
+++ b/src/pipecat/services/neuphonic/tts.py
@@ -41,7 +41,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Neuphonic, you need to `pip install pipecat-ai[neuphonic]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_neuphonic_lang_code(language: Language) -> str:
--- a/src/pipecat/services/nvidia/stt.py
+++ b/src/pipecat/services/nvidia/stt.py
@@ -46,7 +46,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use NVIDIA Nemotron Speech STT, you need to `pip install pipecat-ai[nvidia]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_nvidia_nemotron_speech_language(language: Language) -> str:
--- a/src/pipecat/services/nvidia/tts.py
+++ b/src/pipecat/services/nvidia/tts.py
@@ -55,7 +55,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use NVIDIA Nemotron Speech TTS, you need to `pip install pipecat-ai[nvidia]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/openai/realtime/llm.py
+++ b/src/pipecat/services/openai/realtime/llm.py
@@ -71,7 +71,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use OpenAI, you need to `pip install pipecat-ai[openai]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/openai/responses/llm.py
+++ b/src/pipecat/services/openai/responses/llm.py
@@ -59,7 +59,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use OpenAI, you need to `pip install pipecat-ai[openai]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 # ---------------------------------------------------------------------------
--- a/src/pipecat/services/piper/tts.py
+++ b/src/pipecat/services/piper/tts.py
@@ -30,7 +30,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Piper, you need to `pip install pipecat-ai[piper]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/resembleai/tts.py
+++ b/src/pipecat/services/resembleai/tts.py
@@ -33,7 +33,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Resemble AI, you need to `pip install pipecat-ai[resembleai]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/rime/tts.py
+++ b/src/pipecat/services/rime/tts.py
@@ -47,7 +47,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Rime, you need to `pip install pipecat-ai[rime]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_rime_language(language: Language) -> str:
--- a/src/pipecat/services/sarvam/stt.py
+++ b/src/pipecat/services/sarvam/stt.py
@@ -53,7 +53,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Sarvam, you need to `pip install pipecat-ai[sarvam]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_sarvam_language(language: Language) -> str:
--- a/src/pipecat/services/sarvam/tts.py
+++ b/src/pipecat/services/sarvam/tts.py
@@ -70,7 +70,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Sarvam, you need to `pip install pipecat-ai[sarvam]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class SarvamTTSModel(StrEnum):
--- a/src/pipecat/services/simli/video.py
+++ b/src/pipecat/services/simli/video.py
@@ -35,7 +35,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Simli, you need to `pip install pipecat-ai[simli]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/src/pipecat/services/smallest/stt.py
+++ b/src/pipecat/services/smallest/stt.py
@@ -48,7 +48,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Smallest, you need to `pip install pipecat-ai[smallest]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 def language_to_smallest_stt_language(language: Language) -> str:
--- a/src/pipecat/services/smallest/tts.py
+++ b/src/pipecat/services/smallest/tts.py
@@ -41,7 +41,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Smallest, you need to `pip install pipecat-ai[smallest]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 class SmallestTTSModel(StrEnum):
--- a/src/pipecat/services/soniox/stt.py
+++ b/src/pipecat/services/soniox/stt.py
@@ -39,7 +39,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Soniox, you need to `pip install pipecat-ai[soniox]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 KEEPALIVE_MESSAGE = '{"type": "keepalive"}'
@@ -155,7 +155,6 @@ def language_to_soniox_language(language: Language) -> str:
        Language.ID: "id",
        Language.IT: "it",
        Language.JA: "ja",
-        Language.KA: "ka",
        Language.KK: "kk",
        Language.KN: "kn",
        Language.KO: "ko",
@@ -232,6 +231,7 @@ class SonioxSTTSettings(STTSettings):
            context_version 2.
        enable_speaker_diarization: Whether to enable speaker diarization.
        enable_language_identification: Whether to enable language identification.
+        max_endpoint_delay_ms: Max ms before endpoint detection finalizes the turn (500-3000).
        client_reference_id: Client reference ID to use for transcription.
    """

@@ -242,6 +242,7 @@ class SonioxSTTSettings(STTSettings):
    enable_language_identification: bool | None | _NotGiven = field(
        default_factory=lambda: NOT_GIVEN
    )
+    max_endpoint_delay_ms: int | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
    client_reference_id: str | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)


@@ -309,6 +310,7 @@ class SonioxSTTService(WebsocketSTTService):
            context=None,
            enable_speaker_diarization=False,
            enable_language_identification=False,
+            max_endpoint_delay_ms=None,
            client_reference_id=None,
        )

@@ -390,8 +392,7 @@ class SonioxSTTService(WebsocketSTTService):
        changed = await super()._update_settings(delta)

        if changed:
-            await self._disconnect()
-            await self._connect()
+            await self._request_reconnect()

        return changed

@@ -522,6 +523,7 @@ class SonioxSTTService(WebsocketSTTService):
                "audio_format": self._audio_format,
                "num_channels": self._num_channels,
                "enable_endpoint_detection": enable_endpoint_detection,
+                "max_endpoint_delay_ms": s.max_endpoint_delay_ms,
                "sample_rate": self.sample_rate,
                "language_hints": _prepare_language_hints(assert_given(s.language_hints)),
                "language_hints_strict": s.language_hints_strict,
@@ -537,8 +539,8 @@ class SonioxSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug("Connected to Soniox STT")
        except Exception as e:
+            self._websocket = None
            await self.push_error(error_msg=f"Unable to connect to Soniox: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to Soniox."""
--- a/src/pipecat/services/soniox/tts.py
+++ b/src/pipecat/services/soniox/tts.py
@@ -44,7 +44,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error("In order to use Soniox, you need to `pip install pipecat-ai[soniox]`.")
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 # Soniox idle timeout is 20-30s; keepalive cadence must stay well inside it.
--- a/src/pipecat/services/speechmatics/stt.py
+++ b/src/pipecat/services/speechmatics/stt.py
@@ -61,7 +61,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Speechmatics, you need to `pip install pipecat-ai[speechmatics]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


 load_dotenv()
--- a/src/pipecat/services/speechmatics/tts.py
+++ b/src/pipecat/services/speechmatics/tts.py
@@ -32,7 +32,7 @@ except ModuleNotFoundError as e:
    logger.error(
        "In order to use Speechmatics, you need to `pip install pipecat-ai[speechmatics]`."
    )
-    raise ImportError(f"Missing module: {e}") from e
+    raise Exception(f"Missing module: {e}")


@dataclass
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Mark Backman	780c004168	Merge pull request #4423 from joycech333/feat/inception-llm-service feat: add Inception LLM service with Mercury 2 support	2026-05-21 12:02:27 -04:00
Mark Backman	28f9203401	Code review fixes	2026-05-21 11:45:17 -04:00
joycech333	77cc314a08	feat: add Inception LLM service with Mercury-2 support Adds InceptionLLMService, an OpenAI-compatible service for Inception's Mercury-2 diffusion-based reasoning model. Supports reasoning_effort (instant/low/medium/high) and realtime mode for reduced TTFT.	2026-05-21 11:23:23 -04:00
Mark Backman	4a8d1d0b5e	Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling Clean up smart text logging	2026-05-21 08:35:46 -04:00
Mark Backman	87f5d60693	Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1 chore: bump pipecat-ai-prebuilt to 1.0.1	2026-05-21 08:35:31 -04:00
Mark Backman	c699b31daa	Merge pull request #4534 from pipecat-ai/mb/changelog-4521 Add changelog for #4521	2026-05-21 08:35:15 -04:00
Mark Backman	ee674ffb01	Add changelog for #4521	2026-05-20 17:57:43 -04:00
mihafabcic-soniox	86a5710801	Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521 )	2026-05-20 17:54:48 -04:00
Mark Backman	4a96b2a9e6	Clean up smart text logging	2026-05-20 15:38:59 -04:00
Mark Backman	105d6f27da	Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling Align websocket STT connection failures	2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter	e0e3cd336a	Merge pull request #4529 from pipecat-ai/filipi/squash_skill New skill to squash commits.	2026-05-20 16:06:23 -03:00
Mark Backman	9586db5b50	Preserve websocket reconnect failure retries	2026-05-20 14:45:29 -04:00
Mark Backman	a890ab7b21	Add changelog for PR #4531	2026-05-20 12:18:03 -04:00
Mark Backman	c1bf7dbb4a	chore: bump pipecat-ai-prebuilt to 1.0.1	2026-05-20 12:15:09 -04:00
Mark Backman	709a0ce839	Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008 Fix ElevenLabs keepalive racing context-init (1008 disconnects)	2026-05-20 11:21:17 -04:00
Mark Backman	be93350eae	Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest Add P99 latency for Smallest AI, Mistral, XAI STT	2026-05-20 11:21:00 -04:00
Mark Backman	4a96ab7073	Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports Improve runner optional transport handling	2026-05-20 11:16:16 -04:00
filipi87	c321f50e76	New skill to squash commits.	2026-05-20 10:29:03 -03:00
Filipi da Silva Fuchter	bca337f97e	Merge pull request #4380 from pipecat-ai/filipi/smart_text Smart Text Handling	2026-05-20 10:18:30 -03:00
filipi87	5d9e8c5ac5	Removing debug log.	2026-05-20 10:13:46 -03:00
Mark Backman	70773bce0a	Add changelog for PR #4527	2026-05-20 09:08:47 -04:00
filipi87	8bdb49bd1a	chore: add changelogs for word-timestamp and frame-ordering fixes	2026-05-20 10:03:30 -03:00
filipi87	81bb81c1d0	test: add automated tests for word tracking, frame sequencing, and Cartesia TTS Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and word_timestamp_utils (including CJK language scenarios). Updates existing Cartesia TTS and TTS frame ordering tests to cover the new behaviours.	2026-05-20 10:03:26 -03:00
filipi87	e1bdee598c	fix: preserve raw_text through TTS pipeline for correct LLM context attribution TTSTextFrame entries were losing their original text structure when word timestamps were enabled. AggregatedTextFrame now carries a raw_text field with the original LLM-produced text (including pattern delimiters such as <card>...</card>). The assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting and attributing each part to its correct source frame.	2026-05-20 10:03:21 -03:00
filipi87	185a89bb3b	fix: strip Cartesia SSML tags from word timestamp entries SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries returned by the Cartesia word-timestamps API. Tags are now stripped before processing so word-to-text attribution remains accurate when SSML is present in the TTS input.	2026-05-20 10:03:15 -03:00
filipi87	6b9deefbe3	fix: preserve frame insertion order in BaseOutputTransport for equal PTS values Frames sharing the same presentation timestamp were being reordered by the priority queue. Adds a monotonic counter as a tiebreaker so frames with equal PTS are always emitted in insertion order, preventing subtle audio/text sequencing bugs.	2026-05-20 10:03:08 -03:00
filipi87	deefc32faf	fix: hold skipped TTS frames in position until preceding spoken frames complete Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. Introduces AggregatedFrameSequencer to hold each frame's slot and flush only after all earlier spoken sentences are complete, keeping context ordering correct.	2026-05-20 10:03:03 -03:00
Mark Backman	a5e6886b80	Fix ElevenLabs keepalive racing context-init (1008 disconnects) The keepalive could fire for a new turn's context before that context's voice_settings context-init was sent, making the keepalive the context's first message (no voice_settings) and causing ElevenLabs to reject the later init with a 1008 policy violation. The keepalive now only targets a context once its context-init has been sent (tracked in _context_init_sent).	2026-05-20 08:59:01 -04:00
Mark Backman	d11a4ba0cd	Use shared telephony route availability checks	2026-05-20 08:57:48 -04:00
Mark Backman	38407e091d	Add p99 values for Mistral and XAI	2026-05-19 22:51:33 -04:00
Mark Backman	82cd931efa	Merge pull request #4306 from YFortin/fix/azure-tts-last-word-race fix(azure-tts): Route completion through word boundary queue to prevent last word from being missed	2026-05-19 22:27:50 -04:00
Mark Backman	33e5d1f89b	Add changelog for PR #4522	2026-05-19 18:33:58 -04:00
Mark Backman	861dd23873	Add changelog for runner updates	2026-05-19 17:31:07 -04:00
Mark Backman	b825dd779e	Clarify runner startup banner	2026-05-19 17:31:07 -04:00
Mark Backman	1487da53a9	Improve runner optional transport handling	2026-05-19 17:03:16 -04:00
Mark Backman	aff84a5d9e	Add P99 latency for Smallest AI STT	2026-05-19 11:05:15 -04:00
Mark Backman	e298491068	Add changelog for websocket STT failure handling	2026-05-18 12:41:56 -04:00
Mark Backman	97b00042df	Align websocket STT connection failures	2026-05-18 12:35:01 -04:00
Yan Fortin	6feeee515f	chore: rename changelog fragment to match PR #4306	2026-04-14 18:49:35 -04:00
Yan Fortin	55fb4b0845	fix(azure-tts): route completion through word boundary queue to prevent last word from being missed The Azure TTS _handle_completed callback was putting the audio stream completion signal (None) directly into _audio_queue while the last word was still pending in _word_boundary_queue. This caused a race condition where run_tts could exit and TTSStoppedFrame could be emitted before the word processor task had a chance to process and emit the final word's TTSTextFrame. The fix routes the completion signal through _word_boundary_queue as a None sentinel. The word processor task now recognizes this sentinel and only signals _audio_queue after all pending words have been drained. This guarantees the last word's TTSTextFrame is always emitted before TTSStoppedFrame. The cancellation/interruption path (_handle_canceled) is unchanged and still signals _audio_queue directly, which is correct since word ordering does not matter when speech is interrupted.	2026-04-14 18:48:40 -04:00
				`@@ -0,0 +1 @@`
				- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
				`@@ -0,0 +1 @@`
				- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
				`@@ -0,0 +1 @@`
				- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
				`@@ -0,0 +1 @@`
				- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
				`@@ -0,0 +1 @@`
				- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
				`@@ -0,0 +1 @@`
				- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
				`@@ -0,0 +1 @@`
				- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
				`@@ -0,0 +1 @@`
				- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
				`@@ -0,0 +1 @@`
				- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.