Merge pull request #4423 from joycech333/feat/inception-llm-service

feat: add Inception LLM service with Mercury 2 support
Code review fixes
2026-05-21 12:02:27 -04:00 · 2026-05-21 11:45:17 -04:00 · 2026-05-21 11:23:23 -04:00 · 2026-05-21 08:35:46 -04:00 · 2026-05-21 08:35:31 -04:00 · 2026-05-21 08:35:15 -04:00
39 changed files with 1116 additions and 87 deletions
--- a/.claude/skills/squash-commits/SKILL.md
+++ b/.claude/skills/squash-commits/SKILL.md
@@ -0,0 +1,91 @@
+---
+name: squash-commits
+description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
+---
+
+Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
+
+## Instructions
+
+### 1. Safety check
+
+```bash
+git status --short
+```
+
+If there are uncommitted changes, stop and tell the user to commit or stash them first.
+
+### 2. Inspect the branch
+
+```bash
+git log main..HEAD --oneline
+git diff main..HEAD --name-only
+```
+
+List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
+
+### 3. Create a backup branch
+
+```bash
+git branch backup/<current-branch-name>
+```
+
+Tell the user the backup exists so they can recover if needed.
+
+### 4. Soft-reset to main and unstage everything
+
+```bash
+git reset --soft main
+git restore --staged .
+```
+
+All branch changes are now in the working tree, unstaged. No content has changed.
+
+### 5. Plan the logical groups
+
+Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
+
+- Core feature or fix (new source files + modified core files)
+- Secondary features or fixes (each as its own commit if distinct)
+- Refactoring or renames
+- Tests
+- Changelogs / docs
+
+Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
+
+Present the proposed grouping to the user and ask for confirmation before committing.
+
+### 6. Commit in logical groups
+
+For each group, stage only the relevant files and commit with a clear message following the project's conventions:
+
+```bash
+git add <file1> <file2> ...
+git commit -m "..."
+```
+
+Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
+
+### 7. Verify
+
+```bash
+git log main..HEAD --oneline
+git diff main..HEAD --name-only
+git status --short
+```
+
+Confirm:
+- Commit count is small and each message is meaningful
+- The set of changed files vs `main` is identical to before
+- Working tree is clean
+
+### 8. Remind about force-push
+
+The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
+
+## Rules
+
+- Never change file contents. If you find yourself editing a file, stop.
+- Never skip the backup branch step.
+- Never force-push without explicit user instruction.
+- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
 | Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
 | Text-to-Speech      | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
 | Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
 | Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
--- a/changelog/4423.added.md
+++ b/changelog/4423.added.md
@@ -0,0 +1 @@
+- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
--- a/changelog/4514.fixed.md
+++ b/changelog/4514.fixed.md
@@ -0,0 +1 @@
+- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
--- a/changelog/4521.added.md
+++ b/changelog/4521.added.md
@@ -0,0 +1 @@
+- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
--- a/changelog/4521.changed.md
+++ b/changelog/4521.changed.md
@@ -0,0 +1 @@
+- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
--- a/changelog/4521.removed.md
+++ b/changelog/4521.removed.md
@@ -0,0 +1 @@
+- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
--- a/changelog/4522.changed.md
+++ b/changelog/4522.changed.md
@@ -0,0 +1 @@
+- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.
--- a/changelog/4524.changed.md
+++ b/changelog/4524.changed.md
@@ -0,0 +1 @@
+- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
--- a/changelog/4524.fixed.md
+++ b/changelog/4524.fixed.md
@@ -0,0 +1 @@
+- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
--- a/changelog/4527.fixed.md
+++ b/changelog/4527.fixed.md
@@ -0,0 +1 @@
+- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
--- a/changelog/4531.changed.md
+++ b/changelog/4531.changed.md
@@ -0,0 +1 @@
+- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.
--- a/env.example
+++ b/env.example
@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
 HUME_API_KEY=...
 HUME_VOICE_ID=...

+# Inception
+INCEPTION_API_KEY=...
+
 # Inworld
 INWORLD_API_KEY=...

--- a/examples/function-calling/function-calling-inception.py
+++ b/examples/function-calling/function-calling-inception.py
@@ -0,0 +1,177 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.inception.llm import InceptionLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    await params.result_callback({"conditions": "nice", "temperature": "75"})
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
+
+    tts = CartesiaTTSService(
+        api_key=os.environ["CARTESIA_API_KEY"],
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = InceptionLLMService(
+        api_key=os.environ["INCEPTION_API_KEY"],
+        settings=InceptionLLMService.Settings(
+            reasoning_effort="instant",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+    # You can also register a function_name of None to get all functions
+    # sent to the same callback with an additional function_name parameter.
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+
+    @llm.event_handler("on_function_calls_started")
+    async def on_function_calls_started(service, function_calls):
+        await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
+
+    weather_function = FunctionSchema(
+        name="get_current_weather",
+        description="Get the current weather",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+            "format": {
+                "type": "string",
+                "enum": ["celsius", "fahrenheit"],
+                "description": "The temperature unit to use. Infer this from the user's location.",
+            },
+        },
+        required=["location", "format"],
+    )
+
+    restaurant_function = FunctionSchema(
+        name="get_restaurant_recommendation",
+        description="Get a restaurant recommendation",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+        },
+        required=["location"],
+    )
+    tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
+
+    context = LLMContext(tools=tools)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/update-settings/stt/stt-soniox.py
+++ b/examples/update-settings/stt/stt-soniox.py
@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.soniox.stt import SonioxSTTService
+from pipecat.services.soniox.tts import SonioxTTSService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])

-    tts = CartesiaTTSService(
-        api_key=os.environ["CARTESIA_API_KEY"],
-        settings=CartesiaTTSService.Settings(
-            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-        ),
-    )
+    tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        await task.queue_frames([LLMRunFrame()])

        await asyncio.sleep(10)
-        logger.info("Updating Soniox STT settings: language=es")
+        logger.info("Updating Soniox STT settings: language_hints=[es]")
        await task.queue_frame(
-            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
+            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
        )

    @transport.event_handler("on_client_disconnected")
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
 gstreamer = [ "pygobject~=3.50.0" ]
 heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
 hume = [ "hume>=0.11.2,<1" ]
+inception = []
 inworld = [ "pipecat-ai[websockets-base]" ]
 koala = [ "pvkoala~=2.0.3" ]
 kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
 qwen = []
 resembleai = [ "pipecat-ai[websockets-base]" ]
 rime = [ "pipecat-ai[websockets-base]" ]
-runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.0"]
+runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
 sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
 sambanova = []
 sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
--- a/scripts/evals/run-release-evals.py
+++ b/scripts/evals/run-release-evals.py
@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
    ("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
    ("function-calling/function-calling-novita.py", EVAL_WEATHER),
    ("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
+    ("function-calling/function-calling-inception.py", EVAL_WEATHER),
    # Video
    ("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
    ("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
--- a/src/pipecat/processors/frameworks/rtvi/observer.py
+++ b/src/pipecat/processors/frameworks/rtvi/observer.py
@@ -529,7 +529,7 @@ class RTVIObserver(BaseObserver):

        isTTS = isinstance(frame, TTSTextFrame)
        if agg_type is not AggregationType.WORD:
-            logger.debug(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
+            logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")

        if self._params.bot_output_enabled:
            message = RTVI.BotOutputMessage(
--- a/src/pipecat/runner/run.py
+++ b/src/pipecat/runner/run.py
@@ -90,6 +90,7 @@ To run locally:

 import argparse
 import asyncio
+import importlib.util
 import mimetypes
 import os
 import sys
@@ -131,6 +132,18 @@ load_dotenv(override=True)
 os.environ["ENV"] = "local"

 TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
+TRANSPORT_ROUTE_DEPENDENCIES = {
+    "daily": ("daily",),
+    "webrtc": ("aiortc",),
+    "telephony": ("fastapi", "websockets"),
+    "websocket": ("fastapi", "websockets"),
+}
+TRANSPORT_INSTALL_HINTS = {
+    "daily": "install pipecat-ai[daily]",
+    "webrtc": "install pipecat-ai[webrtc]",
+    "telephony": "install pipecat-ai[websocket]",
+    "websocket": "install pipecat-ai[websocket]",
+}

 # Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
 PIPECAT_ROOM_EXP_HOURS = 4.0
@@ -156,6 +169,120 @@ Import this to add custom routes from other packages before calling
 """


+def _is_module_available(module: str) -> bool:
+    """Check whether a module can be imported without importing it.
+
+    Args:
+        module: Fully-qualified module name to check.
+
+    Returns:
+        ``True`` if Python can resolve the module, ``False`` otherwise.
+    """
+    try:
+        return importlib.util.find_spec(module) is not None
+    except (ImportError, ModuleNotFoundError, ValueError):
+        return False
+
+
+def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
+    """Return module dependencies required for a transport route.
+
+    Args:
+        transport: Transport name from the runner request or CLI.
+
+    Returns:
+        Module names required to enable the transport route.
+    """
+    if transport in TELEPHONY_TRANSPORTS:
+        return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
+    return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())
+
+
+def _transport_routes_enabled(transport: str) -> bool:
+    """Return whether a transport route can run in this environment.
+
+    Args:
+        transport: Transport name from the runner request or CLI.
+
+    Returns:
+        ``True`` if the requested transport is enabled.
+    """
+    return all(_is_module_available(module) for module in _transport_route_dependencies(transport))
+
+
+def _runner_url(args: argparse.Namespace) -> str:
+    """Return the browser URL for the runner prebuilt client."""
+    return f"http://{args.host}:{args.port}"
+
+
+def _transport_status_lists() -> tuple[list[str], list[str]]:
+    """Return enabled and disabled transport labels for the startup banner."""
+    transports = ["daily", "webrtc", "telephony", "websocket"]
+    enabled = []
+    disabled = []
+
+    for label in transports:
+        if _transport_routes_enabled(label):
+            enabled.append(label)
+        else:
+            disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
+
+    return enabled, disabled
+
+
+def _format_transport_status(labels: list[str]) -> str:
+    """Format a startup banner transport status list."""
+    return ", ".join(labels) if labels else "none"
+
+
+def _print_startup_message(args: argparse.Namespace):
+    """Print connection information for the development runner."""
+    print()
+    if args.transport is None:
+        enabled, disabled = _transport_status_lists()
+        print("🚀 Bot ready!")
+        print(f"   → Open: {_runner_url(args)}")
+        print(f"   → Enabled transports: {_format_transport_status(enabled)}")
+        if disabled:
+            print(f"   → Disabled transports: {_format_transport_status(disabled)}")
+    elif args.transport == "webrtc":
+        if args.esp32:
+            print("🚀 Bot ready! (ESP32 mode)")
+        elif args.whatsapp:
+            print("🚀 Bot ready! (WhatsApp)")
+        else:
+            print("🚀 Bot ready! (WebRTC)")
+        if _transport_routes_enabled("webrtc"):
+            print(f"   → Open: {_runner_url(args)}")
+        else:
+            print(f"   → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
+    elif args.transport == "daily":
+        print("🚀 Bot ready! (Daily)")
+        if not _transport_routes_enabled("daily"):
+            print(f"   → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
+        else:
+            print(f"   → Open: {_runner_url(args)}")
+            if args.dialin:
+                print(
+                    f"   → Daily dial-in webhook: "
+                    f"http://{args.host}:{args.port}/daily-dialin-webhook"
+                )
+                print("   → Configure this URL in your Daily phone number settings")
+    elif args.transport in TELEPHONY_TRANSPORTS:
+        print(f"🚀 Bot ready! ({args.transport.capitalize()})")
+        if not _transport_routes_enabled(args.transport):
+            print(f"   → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
+        else:
+            print(f"   → Open: {_runner_url(args)}")
+            if args.proxy:
+                print(f"   → XML webhook: http://{args.host}:{args.port}/")
+            print(f"   → WebSocket:   ws://{args.host}:{args.port}/ws")
+    elif args.transport == "vonage":
+        print()
+        print("🚀 Bot ready!")
+    print()
+
+
 def _get_bot_module():
    """Get the bot module from the calling script."""
    import importlib.util
@@ -227,6 +354,8 @@ async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):

 def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
    """Set up the plain WebSocket route at ``/ws-client``."""
+    if not _transport_routes_enabled("websocket"):
+        return

    @app.websocket("/ws-client")
    async def websocket_client_endpoint(websocket: WebSocket):
@@ -338,6 +467,15 @@ def _setup_unified_start_route(
                ),
            )

+        if not _transport_routes_enabled(transport):
+            raise HTTPException(
+                status_code=400,
+                detail=(
+                    f"Transport '{transport}' is disabled in this runner environment. "
+                    "Check the startup banner for enabled transports."
+                ),
+            )
+
        if transport == "webrtc":
            # WebRTC: register the session; the bot starts when the WebRTC offer arrives.
            session_id = str(uuid.uuid4())
@@ -471,6 +609,9 @@ def _setup_webrtc_routes(
    app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
 ):
    """Set up WebRTC-specific routes."""
+    if not _transport_routes_enabled("webrtc"):
+        return
+
    try:
        from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
        from pipecat.transports.smallwebrtc.request_handler import (
@@ -480,7 +621,7 @@ def _setup_webrtc_routes(
            SmallWebRTCRequestHandler,
        )
    except ImportError as e:
-        logger.error(f"WebRTC transport dependencies not installed: {e}")
+        logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
        return

    @app.get("/files/{filename:path}")
@@ -765,6 +906,8 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):

 def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
    """Set up Daily-specific routes."""
+    if not _transport_routes_enabled("daily"):
+        return

    @app.get("/daily")
    async def create_room_and_start_agent():
@@ -908,6 +1051,9 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
    specific telephony transport is chosen via ``-t`` because the XML template
    is provider-specific and requires a proxy hostname (``--proxy``).
    """
+    if not _transport_routes_enabled("telephony"):
+        return
+
    if args.transport in TELEPHONY_TRANSPORTS:
        # XML response templates (Exotel doesn't use XML webhooks)
        XML_TEMPLATES = {
@@ -1160,43 +1306,11 @@ def main(parser: argparse.ArgumentParser | None = None):
        return

    # Print startup message
-    print()
-    if args.transport is None:
-        print("🚀 Bot ready!")
-        print(f"   → WebRTC:    http://{args.host}:{args.port}/client")
-        print(f"   → Daily:     http://{args.host}:{args.port}/daily")
-        print(f"   → Telephony: ws://{args.host}:{args.port}/ws")
-    elif args.transport == "webrtc":
-        if args.esp32:
-            print("🚀 Bot ready! (ESP32 mode)")
-        elif args.whatsapp:
-            print("🚀 Bot ready! (WhatsApp)")
-        else:
-            print("🚀 Bot ready! (WebRTC)")
-        print(f"   → Open http://{args.host}:{args.port}/client in your browser")
-    elif args.transport == "daily":
-        print("🚀 Bot ready! (Daily)")
-        if args.dialin:
-            print(
-                f"   → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
-            )
-            print(f"   → Configure this URL in your Daily phone number settings")
-        else:
-            print(
-                f"   → Open http://{args.host}:{args.port}/daily in your browser to start a session"
-            )
-    elif args.transport in TELEPHONY_TRANSPORTS:
-        print(f"🚀 Bot ready! ({args.transport.capitalize()})")
-        if args.proxy:
-            print(f"   → XML webhook: http://{args.host}:{args.port}/")
-        print(f"   → WebSocket:   ws://{args.host}:{args.port}/ws")
-    elif args.transport == "vonage":
-        print()
-        print(f"🚀 Bot ready!")
+    _print_startup_message(args)
+    if args.transport == "vonage":
        asyncio.run(_run_vonage())
        print()
        return
-    print()

    RUNNER_DOWNLOADS_FOLDER = args.folder
    RUNNER_HOST = args.host
--- a/src/pipecat/services/assemblyai/stt.py
+++ b/src/pipecat/services/assemblyai/stt.py
@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug(f"{self} Connected to AssemblyAI WebSocket")
        except Exception as e:
+            self._websocket = None
            self._connected = False
            await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to AssemblyAI."""
--- a/src/pipecat/services/aws/stt.py
+++ b/src/pipecat/services/aws/stt.py
@@ -339,10 +339,10 @@ class AWSTranscribeSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.info(f"{self} Successfully connected to AWS Transcribe")
        except Exception as e:
+            self._websocket = None
            await self.push_error(
                error_msg=f"Unable to connect to AWS Transcribe: {e}", exception=e
            )
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to AWS Transcribe."""
--- a/src/pipecat/services/cartesia/stt.py
+++ b/src/pipecat/services/cartesia/stt.py
@@ -354,7 +354,8 @@ class CartesiaSTTService(WebsocketSTTService):
            self._websocket = await websocket_connect(ws_url, additional_headers=headers)
            await self._call_event_handler("on_connected")
        except Exception as e:
-            await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
+            self._websocket = None
+            await self.push_error(error_msg=f"Unable to connect to Cartesia: {e}", exception=e)

    async def _disconnect_websocket(self):
        ws = self._websocket
--- a/src/pipecat/services/elevenlabs/stt.py
+++ b/src/pipecat/services/elevenlabs/stt.py
@@ -823,6 +823,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug("Connected to ElevenLabs Realtime STT")
        except Exception as e:
+            self._websocket = None
            await self.push_error(
                error_msg=f"Unable to connect to ElevenLabs Realtime STT: {e}", exception=e
            )
--- a/src/pipecat/services/elevenlabs/tts.py
+++ b/src/pipecat/services/elevenlabs/tts.py
@@ -594,6 +594,10 @@ class ElevenLabsTTSService(WebsocketTTSService):
        self._partial_word_start_time = 0.0
        self._alignment_started_context_ids: set[str | None] = set()

+        # Context IDs whose context-init has been sent, so the keepalive knows
+        # which contexts are safe to target.
+        self._context_init_sent: set[str] = set()
+
        # Context management for v1 multi API
        self._receive_task = None
        self._keepalive_task = None
@@ -792,6 +796,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
        finally:
            await self.remove_active_audio_context()
            self._websocket = None
+            self._context_init_sent.clear()
            await self._call_event_handler("on_disconnected")

    def _get_websocket(self):
@@ -822,6 +827,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
        self._partial_word = ""
        self._partial_word_start_time = 0.0
        self._alignment_started_context_ids.discard(context_id)
+        self._context_init_sent.discard(context_id)

    async def on_audio_context_interrupted(self, context_id: str):
        """Close the ElevenLabs context when the bot is interrupted."""
@@ -914,26 +920,35 @@ class ElevenLabsTTSService(WebsocketTTSService):
        while True:
            await asyncio.sleep(KEEPALIVE_SLEEP)
            try:
-                if self._websocket and self._websocket.state is State.OPEN:
-                    context_id = self.get_active_audio_context_id()
-                    if context_id:
-                        # Send keepalive with context ID to keep the connection alive
-                        keepalive_message = {
-                            "text": "",
-                            "context_id": context_id,
-                        }
-                        logger.trace(f"Sending keepalive for context {context_id}")
-                    else:
-                        # It's possible to have a user interruption which clears the context
-                        # without generating a new TTS response. In this case, we'll just send
-                        # an empty message to keep the connection alive.
-                        keepalive_message = {"text": ""}
-                        logger.trace("Sending keepalive without context")
-                    await self._websocket.send(json.dumps(keepalive_message))
+                await self._send_keepalive()
            except websockets.ConnectionClosed as e:
                logger.warning(f"{self} keepalive error: {e}")
                break

+    async def _send_keepalive(self):
+        """Send a single keepalive message to keep the WebSocket connection alive.
+
+        Only stamps a ``context_id`` once its context-init (carrying
+        ``voice_settings``) has been sent. Otherwise the keepalive would be the
+        context's first message, with no ``voice_settings``, and ElevenLabs would
+        reject the later context-init with a 1008 policy violation. A context-less
+        keepalive is sufficient until the context-init is sent.
+        """
+        if not self._websocket or self._websocket.state is not State.OPEN:
+            return
+
+        context_id = self.get_active_audio_context_id()
+        if context_id and context_id in self._context_init_sent:
+            # The context's voice_settings context-init has been sent, so it's
+            # safe to keep that context alive.
+            keepalive_message = {"text": "", "context_id": context_id}
+        else:
+            # No active context, or the active context's context-init hasn't been
+            # sent yet. A context-less keepalive keeps the connection alive without
+            # opening the context prematurely.
+            keepalive_message = {"text": ""}
+        await self._websocket.send(json.dumps(keepalive_message))
+
    async def _send_text(self, text: str, context_id: str):
        """Send text to the WebSocket for synthesis."""
        if self._websocket and context_id:
@@ -980,6 +995,9 @@ class ElevenLabsTTSService(WebsocketTTSService):
                            locator.model_dump()
                            for locator in self._pronunciation_dictionary_locators
                        ]
+                    # Mark the context-init as sent so the keepalive may now
+                    # target this context_id.
+                    self._context_init_sent.add(context_id)
                    await self._websocket.send(json.dumps(msg))
                    logger.trace(f"Created new context {context_id}")

--- a/src/pipecat/services/gladia/stt.py
+++ b/src/pipecat/services/gladia/stt.py
@@ -558,8 +558,9 @@ class GladiaSTTService(WebsocketSTTService):

            logger.debug(f"{self} Connected to Gladia WebSocket")
        except Exception as e:
+            self._websocket = None
+            self._connection_active = False
            await self.push_error(error_msg=f"Unable to connect to Gladia: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to Gladia."""
--- a/src/pipecat/services/gradium/stt.py
+++ b/src/pipecat/services/gradium/stt.py
@@ -423,8 +423,8 @@ class GradiumSTTService(WebsocketSTTService):
            logger.debug("Connected to Gradium STT")

        except Exception as e:
-            await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
-            raise
+            self._websocket = None
+            await self.push_error(error_msg=f"Unable to connect to Gradium: {e}", exception=e)

    async def _disconnect(self):
        await super()._disconnect()
--- a/src/pipecat/services/inception/init.py
+++ b/src/pipecat/services/inception/init.py
--- a/src/pipecat/services/inception/llm.py
+++ b/src/pipecat/services/inception/llm.py
@@ -0,0 +1,124 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+"""Inception LLM service implementation using OpenAI-compatible interface."""
+
+from dataclasses import dataclass, field
+from typing import Literal
+
+from loguru import logger
+
+from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams
+from pipecat.services.openai.base_llm import BaseOpenAILLMService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.settings import NOT_GIVEN as _NOT_GIVEN
+from pipecat.services.settings import _NotGiven, is_given
+
+
+@dataclass
+class InceptionLLMSettings(BaseOpenAILLMService.Settings):
+    """Settings for InceptionLLMService.
+
+    Parameters:
+        reasoning_effort: Controls how much reasoning the model applies.
+            One of "instant", "low", "medium", or "high". When unset, the
+            parameter is omitted and Inception's server-side default applies.
+        realtime: When True, reduces time to first diffusion block (TTFT).
+    """
+
+    reasoning_effort: Literal["instant", "low", "medium", "high"] | None | _NotGiven = field(
+        default_factory=lambda: _NOT_GIVEN
+    )
+    realtime: bool | None | _NotGiven = field(default_factory=lambda: _NOT_GIVEN)
+
+
+class InceptionLLMService(OpenAILLMService):
+    """A service for interacting with Inception's API using the OpenAI-compatible interface.
+
+    This service extends OpenAILLMService to connect to Inception's API endpoint while
+    maintaining full compatibility with OpenAI's interface and functionality.
+    Supports Mercury-2, Inception's diffusion-based reasoning model.
+    """
+
+    # Inception doesn't support the "developer" message role.
+    supports_developer_role = False
+
+    Settings = InceptionLLMSettings
+    _settings: Settings
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        base_url: str = "https://api.inceptionlabs.ai/v1",
+        settings: Settings | None = None,
+        **kwargs,
+    ):
+        """Initialize the Inception LLM service.
+
+        Args:
+            api_key: The API key for accessing Inception's API.
+            base_url: The base URL for Inception API. Defaults to "https://api.inceptionlabs.ai/v1".
+            settings: Runtime-updatable settings.
+            **kwargs: Additional keyword arguments passed to OpenAILLMService.
+        """
+        default_settings = self.Settings(
+            model="mercury-2",
+            reasoning_effort=None,
+            realtime=None,
+        )
+
+        if settings is not None:
+            default_settings.apply_update(settings)
+
+        super().__init__(api_key=api_key, base_url=base_url, settings=default_settings, **kwargs)
+
+    def create_client(self, api_key=None, base_url=None, **kwargs):
+        """Create OpenAI-compatible client for Inception API endpoint.
+
+        Args:
+            api_key: The API key for authentication. If None, uses instance default.
+            base_url: The base URL for the API. If None, uses instance default.
+            **kwargs: Additional keyword arguments for client configuration.
+
+        Returns:
+            An OpenAI-compatible client configured for Inception's API.
+        """
+        logger.debug(f"Creating Inception client with api {base_url}")
+        return super().create_client(api_key, base_url, **kwargs)
+
+    def build_chat_completion_params(self, params_from_context: OpenAILLMInvocationParams) -> dict:
+        """Build parameters for Inception chat completion request.
+
+        Extends the base OpenAI parameters with Inception-specific options
+        such as reasoning_effort and realtime.
+
+        Args:
+            params_from_context: Parameters, derived from the LLM context, to
+                use for the chat completion. Contains messages, tools, and tool
+                choice.
+
+        Returns:
+            Dictionary of parameters for the chat completion request.
+        """
+        params = super().build_chat_completion_params(params_from_context)
+
+        if (
+            is_given(self._settings.reasoning_effort)
+            and self._settings.reasoning_effort is not None
+        ):
+            params["reasoning_effort"] = self._settings.reasoning_effort
+
+        # realtime is Inception-specific and unknown to the OpenAI SDK,
+        # so it must be passed via extra_body to avoid validation errors.
+        extra_body = {}
+        if is_given(self._settings.realtime) and self._settings.realtime is not None:
+            extra_body["realtime"] = self._settings.realtime
+
+        if extra_body:
+            params["extra_body"] = extra_body
+
+        return params
--- a/src/pipecat/services/soniox/stt.py
+++ b/src/pipecat/services/soniox/stt.py
@@ -155,7 +155,6 @@ def language_to_soniox_language(language: Language) -> str:
        Language.ID: "id",
        Language.IT: "it",
        Language.JA: "ja",
-        Language.KA: "ka",
        Language.KK: "kk",
        Language.KN: "kn",
        Language.KO: "ko",
@@ -232,6 +231,7 @@ class SonioxSTTSettings(STTSettings):
            context_version 2.
        enable_speaker_diarization: Whether to enable speaker diarization.
        enable_language_identification: Whether to enable language identification.
+        max_endpoint_delay_ms: Max ms before endpoint detection finalizes the turn (500-3000).
        client_reference_id: Client reference ID to use for transcription.
    """

@@ -242,6 +242,7 @@ class SonioxSTTSettings(STTSettings):
    enable_language_identification: bool | None | _NotGiven = field(
        default_factory=lambda: NOT_GIVEN
    )
+    max_endpoint_delay_ms: int | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
    client_reference_id: str | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)


@@ -309,6 +310,7 @@ class SonioxSTTService(WebsocketSTTService):
            context=None,
            enable_speaker_diarization=False,
            enable_language_identification=False,
+            max_endpoint_delay_ms=None,
            client_reference_id=None,
        )

@@ -390,8 +392,7 @@ class SonioxSTTService(WebsocketSTTService):
        changed = await super()._update_settings(delta)

        if changed:
-            await self._disconnect()
-            await self._connect()
+            await self._request_reconnect()

        return changed

@@ -522,6 +523,7 @@ class SonioxSTTService(WebsocketSTTService):
                "audio_format": self._audio_format,
                "num_channels": self._num_channels,
                "enable_endpoint_detection": enable_endpoint_detection,
+                "max_endpoint_delay_ms": s.max_endpoint_delay_ms,
                "sample_rate": self.sample_rate,
                "language_hints": _prepare_language_hints(assert_given(s.language_hints)),
                "language_hints_strict": s.language_hints_strict,
@@ -537,8 +539,8 @@ class SonioxSTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug("Connected to Soniox STT")
        except Exception as e:
+            self._websocket = None
            await self.push_error(error_msg=f"Unable to connect to Soniox: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the websocket connection to Soniox."""
--- a/src/pipecat/services/stt_latency.py
+++ b/src/pipecat/services/stt_latency.py
@@ -44,17 +44,15 @@ GLADIA_TTFS_P99: float = 1.49
 GOOGLE_TTFS_P99: float = 1.57
 GRADIUM_TTFS_P99: float = 1.61
 GROQ_TTFS_P99: float = 1.54
+MISTRAL_TTFS_P99: float = 1.89
 OPENAI_TTFS_P99: float = 2.01
 OPENAI_REALTIME_TTFS_P99: float = 1.66
 SARVAM_TTFS_P99: float = 1.17
+SMALLEST_TTFS_P99: float = 1.59
 SONIOX_TTFS_P99: float = 0.35
 SPEECHMATICS_TTFS_P99: float = 0.74
+XAI_TTFS_P99: float = 2.14

 # These services run locally and should be replaced with measured values
 NVIDIA_TTFS_P99: float = DEFAULT_TTFS_P99
 WHISPER_TTFS_P99: float = DEFAULT_TTFS_P99
-
-# No benchmark available yet; using conservative default
-MISTRAL_TTFS_P99: float = DEFAULT_TTFS_P99
-SMALLEST_TTFS_P99: float = DEFAULT_TTFS_P99
-XAI_TTFS_P99: float = DEFAULT_TTFS_P99
--- a/src/pipecat/services/websocket_service.py
+++ b/src/pipecat/services/websocket_service.py
@@ -76,7 +76,9 @@ class WebsocketService(ABC):
        logger.warning(f"{self} reconnecting (attempt: {attempt_number})")
        await self._disconnect_websocket()
        await self._connect_websocket()
-        return await self._verify_connection()
+        if not await self._verify_connection():
+            raise ConnectionError(f"{self} websocket reconnection failed verification")
+        return True

    async def _try_reconnect(
        self,
--- a/src/pipecat/services/xai/stt.py
+++ b/src/pipecat/services/xai/stt.py
@@ -293,8 +293,9 @@ class XAISTTService(WebsocketSTTService):
            await self._call_event_handler("on_connected")
            logger.debug(f"{self} connected to xAI STT WebSocket")
        except Exception as e:
+            self._websocket = None
+            self._session_ready.clear()
            await self.push_error(error_msg=f"Unable to connect to xAI STT: {e}", exception=e)
-            raise

    async def _disconnect_websocket(self):
        """Close the WebSocket connection."""
--- a/src/pipecat/utils/context/word_completion_tracker.py
+++ b/src/pipecat/utils/context/word_completion_tracker.py
@@ -86,7 +86,6 @@ class WordCompletionTracker:
        self._overflow_word: str | None = None
        self._llm_consumed: str | None = None
        self._frame_word: str | None = None
-        logger.debug(f"WordCompletionTracker: {self._tts_normalized}")

    @staticmethod
    def _normalize(text: str) -> str:
--- a/tests/test_cartesia_stt.py
+++ b/tests/test_cartesia_stt.py
@@ -0,0 +1,45 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+from unittest.mock import AsyncMock
+
+import pytest
+from websockets.protocol import State
+
+from pipecat.services.cartesia.stt import CartesiaSTTService
+
+
+class _FakeWebsocket:
+    def __init__(self, *, state=State.OPEN, send_side_effect=None):
+        self.state = state
+        self.send = AsyncMock(side_effect=send_side_effect)
+
+
+@pytest.mark.asyncio
+async def test_cartesia_connect_failure_clears_stale_websocket(monkeypatch):
+    async def fake_websocket_connect(*args, **kwargs):
+        raise RuntimeError("connection failed")
+
+    monkeypatch.setattr("pipecat.services.cartesia.stt.websocket_connect", fake_websocket_connect)
+
+    service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
+    service._websocket = _FakeWebsocket(state=State.CLOSED)
+
+    await service._connect_websocket()
+
+    assert service._websocket is None
+
+
+@pytest.mark.asyncio
+async def test_cartesia_run_stt_logs_send_failure_without_clearing_websocket():
+    service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
+    websocket = _FakeWebsocket(send_side_effect=RuntimeError("websocket closed"))
+    service._websocket = websocket
+
+    async for _ in service.run_stt(b"\x00" * 160):
+        pass
+
+    assert service._websocket is websocket
--- a/tests/test_elevenlabs_tts.py
+++ b/tests/test_elevenlabs_tts.py
@@ -6,9 +6,14 @@

 """Tests for ElevenLabs TTS alignment handling."""

+import json
 from typing import Any

+import pytest
+from websockets.protocol import State
+
 from pipecat.services.elevenlabs.tts import (
+    ElevenLabsTTSService,
    _select_alignment,
    _strip_utterance_leading_spaces,
    calculate_word_times,
@@ -200,3 +205,87 @@ def test_select_alignment_works_with_http_field_names():
    )
    assert selected is not None
    assert selected["characters"] == list(" Hi")
+
+
+# ---------------------------------------------------------------------------
+# Keepalive vs context-init race
+#
+# The keepalive must only stamp a context_id once its context-init (carrying
+# voice_settings) has been sent. Stamping it earlier makes the keepalive the
+# context's first message, with no voice_settings, and ElevenLabs rejects the
+# later context-init with a 1008 policy violation.
+# ---------------------------------------------------------------------------
+
+
+class _FakeWebSocket:
+    """Minimal stand-in for the ElevenLabs websocket that records sends."""
+
+    def __init__(self):
+        self.state = State.OPEN
+        self.sent: list[dict] = []
+
+    async def send(self, data: str):
+        self.sent.append(json.loads(data))
+
+
+def _make_service() -> ElevenLabsTTSService:
+    return ElevenLabsTTSService(
+        api_key="test-key",
+        settings=ElevenLabsTTSService.Settings(
+            voice="test-voice",
+            stability=0.55,
+            similarity_boost=0.85,
+            use_speaker_boost=True,
+            speed=0.81,
+        ),
+    )
+
+
+@pytest.mark.asyncio
+async def test_keepalive_does_not_stamp_context_before_init():
+    """During the pre-init window the keepalive must not stamp the new context_id."""
+    service = _make_service()
+    ws = _FakeWebSocket()
+    service._websocket = ws
+
+    # Simulate the start of an LLM turn: TTSService sets the turn context id on
+    # LLMFullResponseStartFrame, before run_tts sends the voice_settings init.
+    service._turn_context_id = "ctx-1"
+    service._playing_context_id = None
+    assert "ctx-1" not in service._context_init_sent
+
+    await service._send_keepalive()
+
+    # Context-less keepalive: the real context-init stays the context's first
+    # message, so ElevenLabs won't reject it with 1008.
+    assert ws.sent == [{"text": ""}]
+
+
+@pytest.mark.asyncio
+async def test_keepalive_stamps_context_after_init():
+    """Once the context-init has been sent, the keepalive targets that context."""
+    service = _make_service()
+    ws = _FakeWebSocket()
+    service._websocket = ws
+    service._turn_context_id = "ctx-1"
+    service._playing_context_id = None
+    # run_tts records the context once its voice_settings init has gone out.
+    service._context_init_sent.add("ctx-1")
+
+    await service._send_keepalive()
+
+    assert ws.sent == [{"text": "", "context_id": "ctx-1"}]
+
+
+@pytest.mark.asyncio
+async def test_keepalive_without_active_context_sends_empty():
+    """With no active context, the keepalive sends a plain empty message."""
+    service = _make_service()
+    ws = _FakeWebSocket()
+    service._websocket = ws
+    service._turn_context_id = None
+    service._playing_context_id = None
+
+    await service._send_keepalive()
+
+    assert ws.sent == [{"text": ""}]
--- a/tests/test_runner_run.py
+++ b/tests/test_runner_run.py
@@ -0,0 +1,324 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import argparse
+import io
+import sys
+import types
+import unittest
+from contextlib import redirect_stdout
+from unittest.mock import MagicMock, patch
+
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+from pydantic import BaseModel
+
+from pipecat.runner.run import (
+    _print_startup_message,
+    _setup_daily_routes,
+    _setup_telephony_routes,
+    _setup_unified_start_route,
+    _setup_webrtc_routes,
+    _setup_websocket_routes,
+    _transport_route_dependencies,
+    _transport_routes_enabled,
+)
+
+
+class TestRunnerRun(unittest.TestCase):
+    def _capture_startup_message(self, args: argparse.Namespace) -> str:
+        buffer = io.StringIO()
+        with redirect_stdout(buffer):
+            _print_startup_message(args)
+        return buffer.getvalue()
+
+    def test_transport_route_dependencies_maps_transports_to_modules(self):
+        self.assertEqual(_transport_route_dependencies("daily"), ("daily",))
+        self.assertEqual(_transport_route_dependencies("webrtc"), ("aiortc",))
+        self.assertEqual(_transport_route_dependencies("websocket"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("telephony"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("twilio"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("telnyx"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("plivo"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("exotel"), ("fastapi", "websockets"))
+        self.assertEqual(_transport_route_dependencies("vonage"), ())
+
+    def test_transport_routes_enabled_maps_transports_to_dependency_checks(self):
+        def module_available(module: str) -> bool:
+            return module in {"fastapi", "websockets"}
+
+        with patch("pipecat.runner.run._is_module_available", side_effect=module_available):
+            self.assertFalse(_transport_routes_enabled("daily"))
+            self.assertFalse(_transport_routes_enabled("webrtc"))
+            self.assertTrue(_transport_routes_enabled("websocket"))
+            self.assertTrue(_transport_routes_enabled("telephony"))
+            self.assertTrue(_transport_routes_enabled("twilio"))
+            self.assertTrue(_transport_routes_enabled("vonage"))
+
+    def test_setup_webrtc_routes_skips_when_aiortc_is_missing(self):
+        """WebRTC routes should be optional when the webrtc extra is not installed."""
+        app = FastAPI()
+        args = argparse.Namespace(folder=None, esp32=False, host="localhost")
+
+        with (
+            patch("pipecat.runner.run._transport_routes_enabled", return_value=False),
+            patch("pipecat.runner.run.logger") as logger,
+        ):
+            _setup_webrtc_routes(app, args, {})
+
+        paths = {route.path for route in app.routes}
+        self.assertNotIn("/api/offer", paths)
+        logger.info.assert_not_called()
+
+    def test_setup_webrtc_routes_registers_routes_when_webrtc_is_available(self):
+        """WebRTC routes should be registered when dependencies are available."""
+        app = FastAPI()
+        args = argparse.Namespace(folder=None, esp32=False, host="localhost")
+
+        connection_module = types.ModuleType("pipecat.transports.smallwebrtc.connection")
+        connection_module.SmallWebRTCConnection = MagicMock()
+
+        request_handler_module = types.ModuleType("pipecat.transports.smallwebrtc.request_handler")
+
+        class IceCandidate(BaseModel):
+            candidate: str
+            sdp_mid: str
+            sdp_mline_index: int
+
+        class SmallWebRTCPatchRequest(BaseModel):
+            pc_id: str
+            candidates: list[IceCandidate] = []
+
+        class SmallWebRTCRequest(BaseModel):
+            sdp: str
+            type: str
+            pc_id: str | None = None
+            restart_pc: bool | None = None
+            request_data: dict | None = None
+
+        request_handler_module.IceCandidate = IceCandidate
+        request_handler_module.SmallWebRTCPatchRequest = SmallWebRTCPatchRequest
+        request_handler_module.SmallWebRTCRequest = SmallWebRTCRequest
+
+        class MockSmallWebRTCRequestHandler:
+            def __init__(self, *args, **kwargs):
+                pass
+
+            async def close(self):
+                pass
+
+        request_handler_module.SmallWebRTCRequestHandler = MockSmallWebRTCRequestHandler
+
+        with (
+            patch("pipecat.runner.run._transport_routes_enabled", return_value=True),
+            patch.dict(
+                sys.modules,
+                {
+                    "pipecat.transports.smallwebrtc.connection": connection_module,
+                    "pipecat.transports.smallwebrtc.request_handler": request_handler_module,
+                },
+            ),
+        ):
+            _setup_webrtc_routes(app, args, {})
+
+        paths = {route.path for route in app.routes}
+        self.assertIn("/api/offer", paths)
+        self.assertIn("/files/{filename:path}", paths)
+
+    def test_setup_websocket_routes_skips_when_websocket_is_missing(self):
+        """Plain WebSocket routes should be optional."""
+        app = FastAPI()
+        args = argparse.Namespace()
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
+            _setup_websocket_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertNotIn("/ws-client", paths)
+
+    def test_setup_websocket_routes_registers_when_websocket_is_available(self):
+        """Plain WebSocket route should be registered when dependencies are available."""
+        app = FastAPI()
+        args = argparse.Namespace()
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            _setup_websocket_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertIn("/ws-client", paths)
+
+    def test_setup_telephony_routes_skips_when_websocket_is_missing(self):
+        """Telephony WebSocket routes should be optional."""
+        app = FastAPI()
+        args = argparse.Namespace(transport=None)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
+            _setup_telephony_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertNotIn("/ws", paths)
+
+    def test_setup_telephony_routes_registers_when_websocket_is_available(self):
+        """Telephony WebSocket route should be registered when dependencies are available."""
+        app = FastAPI()
+        args = argparse.Namespace(transport=None)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            _setup_telephony_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertIn("/ws", paths)
+
+    def test_setup_telephony_routes_registers_provider_webhook_for_selected_transport(self):
+        """Provider webhook route should be registered for selected telephony transports."""
+        app = FastAPI()
+        args = argparse.Namespace(transport="twilio", proxy="example.ngrok.io")
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            _setup_telephony_routes(app, args)
+
+        post_root_routes = [
+            route for route in app.routes if route.path == "/" and "POST" in route.methods
+        ]
+        self.assertEqual(len(post_root_routes), 1)
+
+    def test_setup_daily_routes_skips_when_daily_is_missing(self):
+        """Daily routes should be optional."""
+        app = FastAPI()
+        args = argparse.Namespace(dialin=False)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
+            _setup_daily_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertNotIn("/daily", paths)
+
+    def test_setup_daily_routes_registers_when_daily_is_available(self):
+        """Daily route should be registered when dependencies are available."""
+        app = FastAPI()
+        args = argparse.Namespace(dialin=False)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            _setup_daily_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertIn("/daily", paths)
+
+    def test_setup_daily_routes_registers_dialin_route_when_enabled(self):
+        """Daily dial-in route should be registered when requested and available."""
+        app = FastAPI()
+        args = argparse.Namespace(dialin=True)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            _setup_daily_routes(app, args)
+
+        paths = {route.path for route in app.routes}
+        self.assertIn("/daily", paths)
+        self.assertIn("/daily-dialin-webhook", paths)
+
+    def test_websocket_routes_require_fastapi_and_websockets(self):
+        with patch(
+            "pipecat.runner.run._is_module_available",
+            side_effect=lambda module: module == "fastapi",
+        ) as is_module_available:
+            self.assertFalse(_transport_routes_enabled("websocket"))
+
+        self.assertEqual(
+            [call.args[0] for call in is_module_available.call_args_list],
+            ["fastapi", "websockets"],
+        )
+
+    def test_start_rejects_disabled_transport_before_running_bot(self):
+        app = FastAPI()
+        args = argparse.Namespace(transport=None)
+        _setup_unified_start_route(app, args, {})
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
+            response = TestClient(app).post("/start", json={"transport": "daily"})
+
+        self.assertEqual(response.status_code, 400)
+        self.assertEqual(
+            response.json()["detail"],
+            (
+                "Transport 'daily' is disabled in this runner environment. "
+                "Check the startup banner for enabled transports."
+            ),
+        )
+
+    def test_startup_message_all_transports_shows_open_url_and_transport_status(self):
+        args = argparse.Namespace(transport=None, host="localhost", port=7860)
+
+        def routes_enabled(transport: str) -> bool:
+            return transport in {"telephony", "websocket"}
+
+        with patch("pipecat.runner.run._transport_routes_enabled", side_effect=routes_enabled):
+            output = self._capture_startup_message(args)
+
+        self.assertEqual(
+            output,
+            (
+                "\n"
+                "🚀 Bot ready!\n"
+                "   → Open: http://localhost:7860\n"
+                "   → Enabled transports: telephony, websocket\n"
+                "   → Disabled transports: daily (install pipecat-ai[daily]), "
+                "webrtc (install pipecat-ai[webrtc])\n"
+                "\n"
+            ),
+        )
+
+    def test_startup_message_all_transports_omits_disabled_status_when_all_enabled(self):
+        args = argparse.Namespace(transport=None, host="localhost", port=7860)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            output = self._capture_startup_message(args)
+
+        self.assertEqual(
+            output,
+            (
+                "\n"
+                "🚀 Bot ready!\n"
+                "   → Open: http://localhost:7860\n"
+                "   → Enabled transports: daily, webrtc, telephony, websocket\n"
+                "\n"
+            ),
+        )
+
+    def test_startup_message_webrtc_uses_root_open_url(self):
+        args = argparse.Namespace(
+            transport="webrtc", host="localhost", port=7860, esp32=False, whatsapp=False
+        )
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            output = self._capture_startup_message(args)
+
+        self.assertIn("   → Open: http://localhost:7860\n", output)
+        self.assertNotIn("/client", output)
+
+    def test_startup_message_daily_uses_root_open_url(self):
+        args = argparse.Namespace(transport="daily", host="localhost", port=7860, dialin=False)
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            output = self._capture_startup_message(args)
+
+        self.assertIn("   → Open: http://localhost:7860\n", output)
+        self.assertNotIn("/daily in your browser", output)
+
+    def test_startup_message_telephony_keeps_provider_endpoint_details(self):
+        args = argparse.Namespace(
+            transport="twilio", host="localhost", port=7860, proxy="example.ngrok.io"
+        )
+
+        with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
+            output = self._capture_startup_message(args)
+
+        self.assertIn("   → Open: http://localhost:7860\n", output)
+        self.assertIn("   → XML webhook: http://localhost:7860/\n", output)
+        self.assertIn("   → WebSocket:   ws://localhost:7860/ws\n", output)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_soniox_stt.py
+++ b/tests/test_soniox_stt.py
@@ -5,8 +5,10 @@
 #

 import json
+from unittest.mock import AsyncMock

 import pytest
+from websockets.protocol import State

 from pipecat.frames.frames import TranscriptionFrame
 from pipecat.services.soniox.stt import END_TOKEN, SonioxSTTService, _language_from_tokens
@@ -14,8 +16,10 @@ from pipecat.transcriptions.language import Language


 class _FakeWebsocket:
-    def __init__(self, messages):
+    def __init__(self, messages, *, state=State.OPEN, send_side_effect=None):
        self._messages = messages
+        self.state = state
+        self.send = AsyncMock(side_effect=send_side_effect)

    def __aiter__(self):
        return self._iter_messages()
@@ -25,6 +29,21 @@ class _FakeWebsocket:
            yield message


+@pytest.mark.asyncio
+async def test_connect_failure_clears_stale_websocket_without_raising(monkeypatch):
+    async def fake_websocket_connect(*args, **kwargs):
+        raise RuntimeError("connection failed")
+
+    monkeypatch.setattr("pipecat.services.soniox.stt.websocket_connect", fake_websocket_connect)
+
+    service = SonioxSTTService(api_key="test-key")
+    service._websocket = _FakeWebsocket([], state=State.CLOSED)
+
+    await service._connect_websocket()
+
+    assert service._websocket is None
+
+
 def test_language_from_tokens_uses_single_recognized_language():
    tokens = [
        {"text": "Hello", "language": "en"},
--- a/tests/test_websocket_service.py
+++ b/tests/test_websocket_service.py
@@ -165,6 +165,19 @@ async def test_reconnect_exhausted_emits_non_fatal_error(service, report_error):
    assert "Connection refused" in final_error.error


+@pytest.mark.asyncio
+async def test_reconnect_exhausted_when_connect_does_not_raise(service, report_error):
+    """A non-raising failed connect is treated as a failed reconnect attempt."""
+    result = await service._try_reconnect(report_error=report_error)
+
+    assert result is False
+    assert report_error.call_count == 4
+    final_error = report_error.call_args_list[-1][0][0]
+    assert isinstance(final_error, ErrorFrame)
+    assert final_error.fatal is False
+    assert "websocket reconnection failed verification" in final_error.error
+
+
 # ---------------------------------------------------------------------------
 # Quick failure detection — accept then immediately close
 # ---------------------------------------------------------------------------
--- a/uv.lock
+++ b/uv.lock
@@ -4539,7 +4539,7 @@ requires-dist = [
    { name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'ultravox'" },
    { name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'websocket'" },
    { name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'xai'" },
-    { name = "pipecat-ai-prebuilt", marker = "extra == 'runner'", specifier = ">=1.0.0" },
+    { name = "pipecat-ai-prebuilt", marker = "extra == 'runner'", specifier = ">=1.0.1" },
    { name = "piper-tts", marker = "extra == 'piper'", specifier = ">=1.3.0,<2" },
    { name = "protobuf", specifier = ">=5.29.6,<7" },
    { name = "protobuf", marker = "extra == 'nvidia'", specifier = ">=6.31.1,<7" },
@@ -4574,7 +4574,7 @@ requires-dist = [
    { name = "wait-for2", marker = "python_full_version < '3.12'", specifier = ">=0.4.1,<1" },
    { name = "websockets", marker = "extra == 'websockets-base'", specifier = ">=13.1,<16.0" },
 ]
-provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "vonage-video-connector", "webrtc", "websocket", "websockets-base", "whisper", "xai"]
+provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inception", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "vonage-video-connector", "webrtc", "websocket", "websockets-base", "whisper", "xai"]

 [package.metadata.requires-dev]
 dev = [
@@ -4603,14 +4603,14 @@ docs = [

 [[package]]
 name = "pipecat-ai-prebuilt"
-version = "1.0.0"
+version = "1.0.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "fastapi", extra = ["all"] },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/d0/86/7527474a324e3da787468133a1dba877e06576edc502e1bc7dd84ba7c9f7/pipecat_ai_prebuilt-1.0.0.tar.gz", hash = "sha256:dc66df541f17620eef5dedb2fd44737eb97232899779afb66dcca5aaa9317512", size = 601709, upload-time = "2026-05-14T21:15:26.575Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/fa/27/91857cd93661922687e51f4141583dbeb71f9a6c8d0d6379bae1aa467522/pipecat_ai_prebuilt-1.0.1.tar.gz", hash = "sha256:9453136fcb994802f9b650b5175f3ce1d0476849a9e609fefe52ecc1c3299680", size = 601771, upload-time = "2026-05-20T16:08:14.485Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/89/b1/648122d5e418d3e0c8f797028bc53a22229ffc07a2406712b13b76735f38/pipecat_ai_prebuilt-1.0.0-py3-none-any.whl", hash = "sha256:6b7057920d3d00e5687adb26e032634ba1f6d924eb9079b1804d031620a1e854", size = 601949, upload-time = "2026-05-14T21:15:24.666Z" },
+    { url = "https://files.pythonhosted.org/packages/ec/4f/a636e47967c3aa885ae912502d73a46d1e824a67992e405ea1e94b78bd94/pipecat_ai_prebuilt-1.0.1-py3-none-any.whl", hash = "sha256:45d78d3fd2ac8193626a5dabb5f45d0ff2d35bfc92098b4bcea308ae612196aa", size = 601994, upload-time = "2026-05-20T16:08:12.4Z" },
 ]

 [[package]]
Author	SHA1	Message	Date
Mark Backman	780c004168	Merge pull request #4423 from joycech333/feat/inception-llm-service feat: add Inception LLM service with Mercury 2 support	2026-05-21 12:02:27 -04:00
Mark Backman	28f9203401	Code review fixes	2026-05-21 11:45:17 -04:00
joycech333	77cc314a08	feat: add Inception LLM service with Mercury-2 support Adds InceptionLLMService, an OpenAI-compatible service for Inception's Mercury-2 diffusion-based reasoning model. Supports reasoning_effort (instant/low/medium/high) and realtime mode for reduced TTFT.	2026-05-21 11:23:23 -04:00
Mark Backman	4a8d1d0b5e	Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling Clean up smart text logging	2026-05-21 08:35:46 -04:00
Mark Backman	87f5d60693	Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1 chore: bump pipecat-ai-prebuilt to 1.0.1	2026-05-21 08:35:31 -04:00
Mark Backman	c699b31daa	Merge pull request #4534 from pipecat-ai/mb/changelog-4521 Add changelog for #4521	2026-05-21 08:35:15 -04:00
Mark Backman	ee674ffb01	Add changelog for #4521	2026-05-20 17:57:43 -04:00
mihafabcic-soniox	86a5710801	Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521 )	2026-05-20 17:54:48 -04:00
Mark Backman	4a96b2a9e6	Clean up smart text logging	2026-05-20 15:38:59 -04:00
Mark Backman	105d6f27da	Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling Align websocket STT connection failures	2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter	e0e3cd336a	Merge pull request #4529 from pipecat-ai/filipi/squash_skill New skill to squash commits.	2026-05-20 16:06:23 -03:00
Mark Backman	9586db5b50	Preserve websocket reconnect failure retries	2026-05-20 14:45:29 -04:00
Mark Backman	a890ab7b21	Add changelog for PR #4531	2026-05-20 12:18:03 -04:00
Mark Backman	c1bf7dbb4a	chore: bump pipecat-ai-prebuilt to 1.0.1	2026-05-20 12:15:09 -04:00
Mark Backman	709a0ce839	Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008 Fix ElevenLabs keepalive racing context-init (1008 disconnects)	2026-05-20 11:21:17 -04:00
Mark Backman	be93350eae	Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest Add P99 latency for Smallest AI, Mistral, XAI STT	2026-05-20 11:21:00 -04:00
Mark Backman	4a96ab7073	Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports Improve runner optional transport handling	2026-05-20 11:16:16 -04:00
filipi87	c321f50e76	New skill to squash commits.	2026-05-20 10:29:03 -03:00
Mark Backman	70773bce0a	Add changelog for PR #4527	2026-05-20 09:08:47 -04:00
Mark Backman	a5e6886b80	Fix ElevenLabs keepalive racing context-init (1008 disconnects) The keepalive could fire for a new turn's context before that context's voice_settings context-init was sent, making the keepalive the context's first message (no voice_settings) and causing ElevenLabs to reject the later init with a 1008 policy violation. The keepalive now only targets a context once its context-init has been sent (tracked in _context_init_sent).	2026-05-20 08:59:01 -04:00
Mark Backman	d11a4ba0cd	Use shared telephony route availability checks	2026-05-20 08:57:48 -04:00
Mark Backman	38407e091d	Add p99 values for Mistral and XAI	2026-05-19 22:51:33 -04:00
Mark Backman	33e5d1f89b	Add changelog for PR #4522	2026-05-19 18:33:58 -04:00
Mark Backman	861dd23873	Add changelog for runner updates	2026-05-19 17:31:07 -04:00
Mark Backman	b825dd779e	Clarify runner startup banner	2026-05-19 17:31:07 -04:00
Mark Backman	1487da53a9	Improve runner optional transport handling	2026-05-19 17:03:16 -04:00
Mark Backman	aff84a5d9e	Add P99 latency for Smallest AI STT	2026-05-19 11:05:15 -04:00
Mark Backman	e298491068	Add changelog for websocket STT failure handling	2026-05-18 12:41:56 -04:00
Mark Backman	97b00042df	Align websocket STT connection failures	2026-05-18 12:35:01 -04:00
				`@@ -0,0 +1 @@`
				- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
				`@@ -0,0 +1 @@`
				- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
				`@@ -0,0 +1 @@`
				- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
				`@@ -0,0 +1 @@`
				- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
				`@@ -0,0 +1 @@`
				- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
				`@@ -0,0 +1 @@`
				`- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.`
				`@@ -0,0 +1 @@`
				`- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.`
				`@@ -0,0 +1 @@`
				`- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.`
				`@@ -0,0 +1 @@`
				- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.