Compare commits

..

40 Commits

Author SHA1 Message Date
Mark Backman
780c004168 Merge pull request #4423 from joycech333/feat/inception-llm-service
feat: add Inception LLM service with Mercury 2 support
2026-05-21 12:02:27 -04:00
Mark Backman
28f9203401 Code review fixes 2026-05-21 11:45:17 -04:00
joycech333
77cc314a08 feat: add Inception LLM service with Mercury-2 support
Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
2026-05-21 11:23:23 -04:00
Mark Backman
4a8d1d0b5e Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling
Clean up smart text logging
2026-05-21 08:35:46 -04:00
Mark Backman
87f5d60693 Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1
chore: bump pipecat-ai-prebuilt to 1.0.1
2026-05-21 08:35:31 -04:00
Mark Backman
c699b31daa Merge pull request #4534 from pipecat-ai/mb/changelog-4521
Add changelog for #4521
2026-05-21 08:35:15 -04:00
Mark Backman
ee674ffb01 Add changelog for #4521 2026-05-20 17:57:43 -04:00
mihafabcic-soniox
86a5710801 Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521) 2026-05-20 17:54:48 -04:00
Mark Backman
4a96b2a9e6 Clean up smart text logging 2026-05-20 15:38:59 -04:00
Mark Backman
105d6f27da Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling
Align websocket STT connection failures
2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter
e0e3cd336a Merge pull request #4529 from pipecat-ai/filipi/squash_skill
New skill to squash commits.
2026-05-20 16:06:23 -03:00
Mark Backman
9586db5b50 Preserve websocket reconnect failure retries 2026-05-20 14:45:29 -04:00
Mark Backman
a890ab7b21 Add changelog for PR #4531 2026-05-20 12:18:03 -04:00
Mark Backman
c1bf7dbb4a chore: bump pipecat-ai-prebuilt to 1.0.1 2026-05-20 12:15:09 -04:00
Mark Backman
709a0ce839 Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008
Fix ElevenLabs keepalive racing context-init (1008 disconnects)
2026-05-20 11:21:17 -04:00
Mark Backman
be93350eae Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest
Add P99 latency for Smallest AI, Mistral, XAI STT
2026-05-20 11:21:00 -04:00
Mark Backman
4a96ab7073 Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports
Improve runner optional transport handling
2026-05-20 11:16:16 -04:00
filipi87
c321f50e76 New skill to squash commits. 2026-05-20 10:29:03 -03:00
Filipi da Silva Fuchter
bca337f97e Merge pull request #4380 from pipecat-ai/filipi/smart_text
Smart Text Handling
2026-05-20 10:18:30 -03:00
filipi87
5d9e8c5ac5 Removing debug log. 2026-05-20 10:13:46 -03:00
Mark Backman
70773bce0a Add changelog for PR #4527 2026-05-20 09:08:47 -04:00
filipi87
8bdb49bd1a chore: add changelogs for word-timestamp and frame-ordering fixes 2026-05-20 10:03:30 -03:00
filipi87
81bb81c1d0 test: add automated tests for word tracking, frame sequencing, and Cartesia TTS
Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and
word_timestamp_utils (including CJK language scenarios). Updates existing
Cartesia TTS and TTS frame ordering tests to cover the new behaviours.
2026-05-20 10:03:26 -03:00
filipi87
e1bdee598c fix: preserve raw_text through TTS pipeline for correct LLM context attribution
TTSTextFrame entries were losing their original text structure when word
timestamps were enabled. AggregatedTextFrame now carries a raw_text field with
the original LLM-produced text (including pattern delimiters such as
<card>...</card>). The assistant context receives properly-tagged content
rather than the cleaned words returned by the TTS provider. Also handles words
that straddle two sentence boundaries by splitting and attributing each part
to its correct source frame.
2026-05-20 10:03:21 -03:00
filipi87
185a89bb3b fix: strip Cartesia SSML tags from word timestamp entries
SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries
returned by the Cartesia word-timestamps API. Tags are now stripped before
processing so word-to-text attribution remains accurate when SSML is present
in the TTS input.
2026-05-20 10:03:15 -03:00
filipi87
6b9deefbe3 fix: preserve frame insertion order in BaseOutputTransport for equal PTS values
Frames sharing the same presentation timestamp were being reordered by the
priority queue. Adds a monotonic counter as a tiebreaker so frames with equal
PTS are always emitted in insertion order, preventing subtle audio/text
sequencing bugs.
2026-05-20 10:03:08 -03:00
filipi87
deefc32faf fix: hold skipped TTS frames in position until preceding spoken frames complete
Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were
emitted to the assistant context immediately instead of waiting for preceding
spoken frames to finish. Introduces AggregatedFrameSequencer to hold each
frame's slot and flush only after all earlier spoken sentences are complete,
keeping context ordering correct.
2026-05-20 10:03:03 -03:00
Mark Backman
a5e6886b80 Fix ElevenLabs keepalive racing context-init (1008 disconnects)
The keepalive could fire for a new turn's context before that context's
voice_settings context-init was sent, making the keepalive the context's
first message (no voice_settings) and causing ElevenLabs to reject the
later init with a 1008 policy violation. The keepalive now only targets a
context once its context-init has been sent (tracked in _context_init_sent).
2026-05-20 08:59:01 -04:00
Mark Backman
d11a4ba0cd Use shared telephony route availability checks 2026-05-20 08:57:48 -04:00
Mark Backman
38407e091d Add p99 values for Mistral and XAI 2026-05-19 22:51:33 -04:00
Mark Backman
82cd931efa Merge pull request #4306 from YFortin/fix/azure-tts-last-word-race
fix(azure-tts): Route completion through word boundary queue to prevent last word from being missed
2026-05-19 22:27:50 -04:00
Mark Backman
33e5d1f89b Add changelog for PR #4522 2026-05-19 18:33:58 -04:00
Mark Backman
861dd23873 Add changelog for runner updates 2026-05-19 17:31:07 -04:00
Mark Backman
b825dd779e Clarify runner startup banner 2026-05-19 17:31:07 -04:00
Mark Backman
1487da53a9 Improve runner optional transport handling 2026-05-19 17:03:16 -04:00
Mark Backman
aff84a5d9e Add P99 latency for Smallest AI STT 2026-05-19 11:05:15 -04:00
Mark Backman
e298491068 Add changelog for websocket STT failure handling 2026-05-18 12:41:56 -04:00
Mark Backman
97b00042df Align websocket STT connection failures 2026-05-18 12:35:01 -04:00
Yan Fortin
6feeee515f chore: rename changelog fragment to match PR #4306 2026-04-14 18:49:35 -04:00
Yan Fortin
55fb4b0845 fix(azure-tts): route completion through word boundary queue to prevent last word from being missed
The Azure TTS _handle_completed callback was putting the audio stream
completion signal (None) directly into _audio_queue while the last word
was still pending in _word_boundary_queue. This caused a race condition
where run_tts could exit and TTSStoppedFrame could be emitted before the
word processor task had a chance to process and emit the final word's
TTSTextFrame.

The fix routes the completion signal through _word_boundary_queue as a
None sentinel. The word processor task now recognizes this sentinel and
only signals _audio_queue after all pending words have been drained.
This guarantees the last word's TTSTextFrame is always emitted before
TTSStoppedFrame.

The cancellation/interruption path (_handle_canceled) is unchanged and
still signals _audio_queue directly, which is correct since word ordering
does not matter when speech is interrupted.
2026-04-14 18:48:40 -04:00
132 changed files with 5046 additions and 258 deletions

View File

@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.

View File

@@ -92,7 +92,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |

1
changelog/4306.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.

View File

@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.

View File

@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.

View File

@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.

1
changelog/4380.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.

1
changelog/4423.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.

1
changelog/4514.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.

1
changelog/4521.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.

View File

@@ -0,0 +1 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.

View File

@@ -0,0 +1 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.

View File

@@ -0,0 +1 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.

View File

@@ -0,0 +1 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.

1
changelog/4524.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.

View File

@@ -1 +0,0 @@
- Services and transports with missing optional dependencies now raise `ImportError` instead of a bare `Exception` when their module is imported without the required extra installed. The original `ModuleNotFoundError` is preserved as `__cause__`, so code that wraps these imports can now use `except ImportError:` cleanly instead of `except Exception:`.

1
changelog/4527.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.

View File

@@ -0,0 +1 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.

View File

@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inception
INCEPTION_API_KEY=...
# Inworld
INWORLD_API_KEY=...

View File

@@ -0,0 +1,177 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inception.llm import InceptionLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
await params.result_callback({"conditions": "nice", "temperature": "75"})
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = InceptionLLMService(
api_key=os.environ["INCEPTION_API_KEY"],
settings=InceptionLLMService.Settings(
reasoning_effort="instant",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await task.queue_frames([LLMRunFrame()])
await asyncio.sleep(10)
logger.info("Updating Soniox STT settings: language=es")
logger.info("Updating Soniox STT settings: language_hints=[es]")
await task.queue_frame(
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
)
@transport.event_handler("on_client_disconnected")

View File

@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
gstreamer = [ "pygobject~=3.50.0" ]
heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
hume = [ "hume>=0.11.2,<1" ]
inception = []
inworld = [ "pipecat-ai[websockets-base]" ]
koala = [ "pvkoala~=2.0.3" ]
kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
qwen = []
resembleai = [ "pipecat-ai[websockets-base]" ]
rime = [ "pipecat-ai[websockets-base]" ]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.0"]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
sambanova = []
sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]

View File

@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
("function-calling/function-calling-novita.py", EVAL_WEATHER),
("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
("function-calling/function-calling-inception.py", EVAL_WEATHER),
# Video
("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),

View File

@@ -28,7 +28,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class GeminiLLMInvocationParams(TypedDict):

View File

@@ -23,7 +23,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use the Koala filter, you need to `pip install pipecat-ai[koala]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class KoalaFilter(BaseAudioFilter):

View File

@@ -27,7 +27,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use KrispVivaFilter, you need to install krisp_audio.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class KrispVivaFilter(BaseAudioFilter):

View File

@@ -28,7 +28,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use the soundfile mixer, you need to `pip install pipecat-ai[soundfile]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class SoundfileMixer(BaseAudioMixer):

View File

@@ -27,7 +27,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use the LocalSmartTurnAnalyzer, you need to `pip install pipecat-ai[local-smart-turn]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class LocalCoreMLSmartTurnAnalyzer(BaseSmartTurn):

View File

@@ -33,7 +33,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use LocalSmartTurnAnalyzerV2, you need to `pip install pipecat-ai[local-smart-turn]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class LocalSmartTurnAnalyzerV2(BaseSmartTurn):

View File

@@ -28,7 +28,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use KrispVivaVADAnalyzer, you need to install krisp_audio.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class KrispVivaVadAnalyzer(VADAnalyzer):

View File

@@ -27,7 +27,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai`.")
raise ImportError(f"Missing module(s): {e}") from e
raise Exception(f"Missing module(s): {e}")
class SileroOnnxModel:

View File

@@ -383,10 +383,14 @@ class AggregatedTextFrame(TextFrame):
Parameters:
aggregated_by: Method used to aggregate the text frames.
context_id: Unique identifier for the TTS context that generated this text.
raw_text: The full matched text including start/end pattern delimiters, set when
this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
None for ordinary sentence aggregations.
"""
aggregated_by: AggregationType | str
context_id: str | None = None
raw_text: str | None = None
@dataclass

View File

@@ -25,6 +25,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.vad_analyzer import VADAnalyzer
from pipecat.audio.vad.vad_controller import VADController
from pipecat.frames.frames import (
AggregatedTextFrame,
AssistantImageRawFrame,
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -1496,9 +1497,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
if len(frame.text) == 0:
return
text = (
frame.raw_text
if isinstance(frame, AggregatedTextFrame) and frame.raw_text
else frame.text
)
self._aggregation.append(
TextPartForConcatenation(
frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
)
)

View File

@@ -23,6 +23,7 @@ from pipecat.frames.frames import (
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
@@ -85,7 +86,11 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=aggregation.text,
aggregated_by=aggregation.type,
raw_text=aggregation.full_match
if isinstance(aggregation, PatternMatch)
else aggregation.text,
)
out_frame.append_to_context = True
out_frame.skip_tts = in_frame.skip_tts
await self.push_frame(out_frame)
@@ -96,6 +101,9 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=remaining.text,
aggregated_by=remaining.type,
raw_text=remaining.full_match
if isinstance(remaining, PatternMatch)
else remaining.text,
)
out_frame.skip_tts = skip_tts
await self.push_frame(out_frame)

View File

@@ -22,7 +22,7 @@ try:
from langchain_core.runnables import Runnable
except ModuleNotFoundError as e:
logger.error("In order to use Langchain, you need to `pip install pipecat-ai[langchain]`. ")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class LangchainProcessor(FrameProcessor):

View File

@@ -528,6 +528,9 @@ class RTVIObserver(BaseObserver):
text = await transform(text, agg_type)
isTTS = isinstance(frame, TTSTextFrame)
if agg_type is not AggregationType.WORD:
logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
if self._params.bot_output_enabled:
message = RTVI.BotOutputMessage(
data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)

View File

@@ -21,7 +21,7 @@ try:
from strands.multiagent.graph import Graph
except ModuleNotFoundError as e:
logger.error("In order to use Strands Agents, you need to `pip install strands-agents`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class StrandsAgentsProcessor(FrameProcessor):

View File

@@ -33,7 +33,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use GStreamer, you need to `pip install pipecat-ai[gstreamer]`. Also, you need to install GStreamer in your system."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class GStreamerPipelineSource(FrameProcessor):

View File

@@ -17,7 +17,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Sentry, you need to `pip install pipecat-ai[sentry]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
from pipecat.processors.metrics.frame_processor_metrics import FrameProcessorMetrics

View File

@@ -90,6 +90,7 @@ To run locally:
import argparse
import asyncio
import importlib.util
import mimetypes
import os
import sys
@@ -131,6 +132,18 @@ load_dotenv(override=True)
os.environ["ENV"] = "local"
TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
TRANSPORT_ROUTE_DEPENDENCIES = {
"daily": ("daily",),
"webrtc": ("aiortc",),
"telephony": ("fastapi", "websockets"),
"websocket": ("fastapi", "websockets"),
}
TRANSPORT_INSTALL_HINTS = {
"daily": "install pipecat-ai[daily]",
"webrtc": "install pipecat-ai[webrtc]",
"telephony": "install pipecat-ai[websocket]",
"websocket": "install pipecat-ai[websocket]",
}
# Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
PIPECAT_ROOM_EXP_HOURS = 4.0
@@ -156,6 +169,120 @@ Import this to add custom routes from other packages before calling
"""
def _is_module_available(module: str) -> bool:
"""Check whether a module can be imported without importing it.
Args:
module: Fully-qualified module name to check.
Returns:
``True`` if Python can resolve the module, ``False`` otherwise.
"""
try:
return importlib.util.find_spec(module) is not None
except (ImportError, ModuleNotFoundError, ValueError):
return False
def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
"""Return module dependencies required for a transport route.
Args:
transport: Transport name from the runner request or CLI.
Returns:
Module names required to enable the transport route.
"""
if transport in TELEPHONY_TRANSPORTS:
return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())
def _transport_routes_enabled(transport: str) -> bool:
"""Return whether a transport route can run in this environment.
Args:
transport: Transport name from the runner request or CLI.
Returns:
``True`` if the requested transport is enabled.
"""
return all(_is_module_available(module) for module in _transport_route_dependencies(transport))
def _runner_url(args: argparse.Namespace) -> str:
"""Return the browser URL for the runner prebuilt client."""
return f"http://{args.host}:{args.port}"
def _transport_status_lists() -> tuple[list[str], list[str]]:
"""Return enabled and disabled transport labels for the startup banner."""
transports = ["daily", "webrtc", "telephony", "websocket"]
enabled = []
disabled = []
for label in transports:
if _transport_routes_enabled(label):
enabled.append(label)
else:
disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
return enabled, disabled
def _format_transport_status(labels: list[str]) -> str:
"""Format a startup banner transport status list."""
return ", ".join(labels) if labels else "none"
def _print_startup_message(args: argparse.Namespace):
"""Print connection information for the development runner."""
print()
if args.transport is None:
enabled, disabled = _transport_status_lists()
print("🚀 Bot ready!")
print(f" → Open: {_runner_url(args)}")
print(f" → Enabled transports: {_format_transport_status(enabled)}")
if disabled:
print(f" → Disabled transports: {_format_transport_status(disabled)}")
elif args.transport == "webrtc":
if args.esp32:
print("🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print("🚀 Bot ready! (WhatsApp)")
else:
print("🚀 Bot ready! (WebRTC)")
if _transport_routes_enabled("webrtc"):
print(f" → Open: {_runner_url(args)}")
else:
print(f" → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
elif args.transport == "daily":
print("🚀 Bot ready! (Daily)")
if not _transport_routes_enabled("daily"):
print(f" → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.dialin:
print(
f" → Daily dial-in webhook: "
f"http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(" → Configure this URL in your Daily phone number settings")
elif args.transport in TELEPHONY_TRANSPORTS:
print(f"🚀 Bot ready! ({args.transport.capitalize()})")
if not _transport_routes_enabled(args.transport):
print(f" → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.proxy:
print(f" → XML webhook: http://{args.host}:{args.port}/")
print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
elif args.transport == "vonage":
print()
print("🚀 Bot ready!")
print()
def _get_bot_module():
"""Get the bot module from the calling script."""
import importlib.util
@@ -227,6 +354,8 @@ async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):
def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
"""Set up the plain WebSocket route at ``/ws-client``."""
if not _transport_routes_enabled("websocket"):
return
@app.websocket("/ws-client")
async def websocket_client_endpoint(websocket: WebSocket):
@@ -338,6 +467,15 @@ def _setup_unified_start_route(
),
)
if not _transport_routes_enabled(transport):
raise HTTPException(
status_code=400,
detail=(
f"Transport '{transport}' is disabled in this runner environment. "
"Check the startup banner for enabled transports."
),
)
if transport == "webrtc":
# WebRTC: register the session; the bot starts when the WebRTC offer arrives.
session_id = str(uuid.uuid4())
@@ -471,6 +609,9 @@ def _setup_webrtc_routes(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Set up WebRTC-specific routes."""
if not _transport_routes_enabled("webrtc"):
return
try:
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
@@ -480,7 +621,7 @@ def _setup_webrtc_routes(
SmallWebRTCRequestHandler,
)
except ImportError as e:
logger.error(f"WebRTC transport dependencies not installed: {e}")
logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
return
@app.get("/files/{filename:path}")
@@ -765,6 +906,8 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):
def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
"""Set up Daily-specific routes."""
if not _transport_routes_enabled("daily"):
return
@app.get("/daily")
async def create_room_and_start_agent():
@@ -908,6 +1051,9 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
specific telephony transport is chosen via ``-t`` because the XML template
is provider-specific and requires a proxy hostname (``--proxy``).
"""
if not _transport_routes_enabled("telephony"):
return
if args.transport in TELEPHONY_TRANSPORTS:
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
@@ -1160,43 +1306,11 @@ def main(parser: argparse.ArgumentParser | None = None):
return
# Print startup message
print()
if args.transport is None:
print("🚀 Bot ready!")
print(f" → WebRTC: http://{args.host}:{args.port}/client")
print(f" → Daily: http://{args.host}:{args.port}/daily")
print(f" → Telephony: ws://{args.host}:{args.port}/ws")
elif args.transport == "webrtc":
if args.esp32:
print("🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print("🚀 Bot ready! (WhatsApp)")
else:
print("🚀 Bot ready! (WebRTC)")
print(f" → Open http://{args.host}:{args.port}/client in your browser")
elif args.transport == "daily":
print("🚀 Bot ready! (Daily)")
if args.dialin:
print(
f" → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(f" → Configure this URL in your Daily phone number settings")
else:
print(
f" → Open http://{args.host}:{args.port}/daily in your browser to start a session"
)
elif args.transport in TELEPHONY_TRANSPORTS:
print(f"🚀 Bot ready! ({args.transport.capitalize()})")
if args.proxy:
print(f" → XML webhook: http://{args.host}:{args.port}/")
print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
elif args.transport == "vonage":
print()
print(f"🚀 Bot ready!")
_print_startup_message(args)
if args.transport == "vonage":
asyncio.run(_run_vonage())
print()
return
print()
RUNNER_DOWNLOADS_FOLDER = args.folder
RUNNER_HOST = args.host

View File

@@ -48,7 +48,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class AnthropicThinkingConfig(BaseModel):

View File

@@ -55,7 +55,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error('In order to use AssemblyAI, you need to `pip install "pipecat-ai[assemblyai]"`.')
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def map_language_from_assemblyai(language_code: str) -> Language:
@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug(f"{self} Connected to AssemblyAI WebSocket")
except Exception as e:
self._websocket = None
self._connected = False
await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to AssemblyAI."""

View File

@@ -39,7 +39,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Async, you need to `pip install pipecat-ai[asyncai]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_async_language(language: Language) -> str:

View File

@@ -49,7 +49,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use AWS services, you need to `pip install pipecat-ai[aws]`. Also, remember to set `AWS_SECRET_ACCESS_KEY`, `AWS_ACCESS_KEY_ID`, and `AWS_REGION` environment variable."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -81,7 +81,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use AWS services, you need to `pip install pipecat-ai[aws-nova-sonic]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class AWSNovaSonicUnhandledFunctionException(Exception):

View File

@@ -32,7 +32,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use SageMaker BiDi client, you need to `pip install pipecat-ai[sagemaker]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class SageMakerBidiClient:

View File

@@ -48,7 +48,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use AWS services, you need to `pip install pipecat-ai[aws]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass
@@ -339,10 +339,10 @@ class AWSTranscribeSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.info(f"{self} Successfully connected to AWS Transcribe")
except Exception as e:
self._websocket = None
await self.push_error(
error_msg=f"Unable to connect to AWS Transcribe: {e}", exception=e
)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to AWS Transcribe."""

View File

@@ -34,7 +34,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use AWS services, you need to `pip install pipecat-ai[aws]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_aws_language(language: Language) -> str:

View File

@@ -17,7 +17,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Azure Realtime, you need to `pip install pipecat-ai[openai]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -49,7 +49,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Azure, you need to `pip install pipecat-ai[azure]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -42,7 +42,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Azure, you need to `pip install pipecat-ai[azure]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def sample_rate_to_output_format(sample_rate: int) -> SpeechSynthesisOutputFormat:
@@ -540,14 +540,25 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
self._last_timestamp = timestamp
async def _word_processor_task_handler(self):
"""Process word timestamps from the queue and call add_word_timestamps."""
"""Process word timestamps from the queue and call add_word_timestamps.
Also handles a None sentinel from _handle_completed: once all pending
words have been drained, it signals audio stream completion via
_audio_queue so that run_tts exits only after the last word has been
processed.
"""
while True:
try:
word, timestamp_seconds = await self._word_boundary_queue.get()
if self._current_context_id:
await self.add_word_timestamps(
[(word, timestamp_seconds)], self._current_context_id
)
item = await self._word_boundary_queue.get()
if item is None:
# All words drained — now signal audio completion.
self._audio_queue.put_nowait(None)
else:
word, timestamp_seconds = item
if self._current_context_id:
await self.add_word_timestamps(
[(word, timestamp_seconds)], self._current_context_id
)
self._word_boundary_queue.task_done()
except asyncio.CancelledError:
break
@@ -569,17 +580,21 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
Args:
evt: Completion event from Azure Speech SDK.
"""
# Store duration for cumulative offset calculation
if evt.result and evt.result.audio_duration:
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
# Flush any pending word before completing
if self._last_word is not None:
self._word_boundary_queue.put_nowait((self._last_word, self._last_timestamp))
self._last_word = None
self._last_timestamp = None
# Store duration for cumulative offset calculation
if evt.result and evt.result.audio_duration:
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
self._audio_queue.put_nowait(None) # Signal completion
# Route completion through the word boundary queue so the word processor
# task drains all pending words before signaling audio stream completion.
# Without this, the last word's TTSTextFrame may arrive after
# TTSStoppedFrame, causing it to be missed by observers and the UI.
self._word_boundary_queue.put_nowait(None)
def _handle_canceled(self, evt):
"""Handle synthesis cancellation.

View File

@@ -42,7 +42,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Cartesia, you need to `pip install pipecat-ai[cartesia]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass
@@ -354,7 +354,8 @@ class CartesiaSTTService(WebsocketSTTService):
self._websocket = await websocket_connect(ws_url, additional_headers=headers)
await self._call_event_handler("on_connected")
except Exception as e:
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Cartesia: {e}", exception=e)
async def _disconnect_websocket(self):
ws = self._websocket

View File

@@ -8,6 +8,7 @@
import base64
import json
import re
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from enum import StrEnum
@@ -39,7 +40,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Cartesia, you need to `pip install pipecat-ai[cartesia]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class GenerationConfig(BaseModel):
@@ -431,10 +432,20 @@ class CartesiaTTSService(WebsocketTTSService):
base_lang = language.split("-")[0].lower()
return base_lang in {"zh", "ja"}
def _process_word_timestamps_for_language(
_CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)
def _strip_cartesia_tags(self, text: str) -> str:
text = self._CARTESIA_TAG_RE.sub(" ", text)
text = re.sub(r"\s+", " ", text)
return text.strip()
def _normalize_word_timestamps(
self, words: list[str], starts: list[float]
) -> list[tuple[str, float]]:
"""Process word timestamps based on the current language.
"""Normalize raw word timestamps from Cartesia before further processing.
Strips Cartesia SSML tags (spell, emotion, break, volume, speed) from each word
and drops entries that become empty after stripping.
For Chinese and Japanese, Cartesia groups related characters in the same timestamp
message.
@@ -458,14 +469,18 @@ class CartesiaTTSService(WebsocketTTSService):
# For Chinese/Japanese, combine all characters in this message into one word
# using the first character's start time.
if words and starts:
combined_word = "".join(words)
combined_word = "".join(self._strip_cartesia_tags(w) for w in words)
first_start = starts[0]
return [(combined_word, first_start)]
return [(combined_word, first_start)] if combined_word else []
else:
return []
else:
# For non-CJK languages, use as-is
return list(zip(words, starts))
result = []
for word, start in zip(words, starts):
cleaned = self._strip_cartesia_tags(word)
if cleaned:
result.append((cleaned, start))
return result
def _word_timestamps_include_inter_frame_spaces(self) -> bool:
"""Whether timestamp text should be treated as carrying its own spacing."""
@@ -662,7 +677,7 @@ class CartesiaTTSService(WebsocketTTSService):
await self.remove_audio_context(ctx_id)
elif msg["type"] == "timestamps":
# Process the timestamps based on language before adding them
processed_timestamps = self._process_word_timestamps_for_language(
processed_timestamps = self._normalize_word_timestamps(
msg["word_timestamps"]["words"], msg["word_timestamps"]["start"]
)
await self.add_word_timestamps(

View File

@@ -31,7 +31,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Deepgram Flux, you need to `pip install pipecat-ai[deepgram]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Re-export for backward compatibility
__all__ = [

View File

@@ -48,7 +48,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Deepgram, you need to `pip install pipecat-ai[deepgram]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class LiveOptions:

View File

@@ -39,7 +39,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use DeepgramWebsocketTTSService, you need to `pip install pipecat-ai[deepgram]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -52,7 +52,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use ElevenLabs Realtime STT, you need to `pip install pipecat-ai[elevenlabs]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_elevenlabs_language(language: Language) -> str:
@@ -823,6 +823,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug("Connected to ElevenLabs Realtime STT")
except Exception as e:
self._websocket = None
await self.push_error(
error_msg=f"Unable to connect to ElevenLabs Realtime STT: {e}", exception=e
)

View File

@@ -56,7 +56,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use ElevenLabs, you need to `pip install pipecat-ai[elevenlabs]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Models that support language codes
# The following models are excluded as they don't support language codes:
@@ -594,6 +594,10 @@ class ElevenLabsTTSService(WebsocketTTSService):
self._partial_word_start_time = 0.0
self._alignment_started_context_ids: set[str | None] = set()
# Context IDs whose context-init has been sent, so the keepalive knows
# which contexts are safe to target.
self._context_init_sent: set[str] = set()
# Context management for v1 multi API
self._receive_task = None
self._keepalive_task = None
@@ -792,6 +796,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
finally:
await self.remove_active_audio_context()
self._websocket = None
self._context_init_sent.clear()
await self._call_event_handler("on_disconnected")
def _get_websocket(self):
@@ -822,6 +827,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
self._partial_word = ""
self._partial_word_start_time = 0.0
self._alignment_started_context_ids.discard(context_id)
self._context_init_sent.discard(context_id)
async def on_audio_context_interrupted(self, context_id: str):
"""Close the ElevenLabs context when the bot is interrupted."""
@@ -914,26 +920,35 @@ class ElevenLabsTTSService(WebsocketTTSService):
while True:
await asyncio.sleep(KEEPALIVE_SLEEP)
try:
if self._websocket and self._websocket.state is State.OPEN:
context_id = self.get_active_audio_context_id()
if context_id:
# Send keepalive with context ID to keep the connection alive
keepalive_message = {
"text": "",
"context_id": context_id,
}
logger.trace(f"Sending keepalive for context {context_id}")
else:
# It's possible to have a user interruption which clears the context
# without generating a new TTS response. In this case, we'll just send
# an empty message to keep the connection alive.
keepalive_message = {"text": ""}
logger.trace("Sending keepalive without context")
await self._websocket.send(json.dumps(keepalive_message))
await self._send_keepalive()
except websockets.ConnectionClosed as e:
logger.warning(f"{self} keepalive error: {e}")
break
async def _send_keepalive(self):
"""Send a single keepalive message to keep the WebSocket connection alive.
Only stamps a ``context_id`` once its context-init (carrying
``voice_settings``) has been sent. Otherwise the keepalive would be the
context's first message, with no ``voice_settings``, and ElevenLabs would
reject the later context-init with a 1008 policy violation. A context-less
keepalive is sufficient until the context-init is sent.
"""
if not self._websocket or self._websocket.state is not State.OPEN:
return
context_id = self.get_active_audio_context_id()
if context_id and context_id in self._context_init_sent:
# The context's voice_settings context-init has been sent, so it's
# safe to keep that context alive.
keepalive_message = {"text": "", "context_id": context_id}
else:
# No active context, or the active context's context-init hasn't been
# sent yet. A context-less keepalive keeps the connection alive without
# opening the context prematurely.
keepalive_message = {"text": ""}
await self._websocket.send(json.dumps(keepalive_message))
async def _send_text(self, text: str, context_id: str):
"""Send text to the WebSocket for synthesis."""
if self._websocket and context_id:
@@ -980,6 +995,9 @@ class ElevenLabsTTSService(WebsocketTTSService):
locator.model_dump()
for locator in self._pronunciation_dictionary_locators
]
# Mark the context-init as sent so the keepalive may now
# target this context_id.
self._context_init_sent.add(context_id)
await self._websocket.send(json.dumps(msg))
logger.trace(f"Created new context {context_id}")

View File

@@ -38,7 +38,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Fish Audio, you need to `pip install pipecat-ai[fish]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# FishAudio supports various output formats
FishAudioOutputFormat = Literal["opus", "mp3", "pcm", "wav"]

View File

@@ -53,7 +53,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Gladia, you need to `pip install pipecat-ai[gladia]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_gladia_language(language: Language) -> str:
@@ -558,8 +558,9 @@ class GladiaSTTService(WebsocketSTTService):
logger.debug(f"{self} Connected to Gladia WebSocket")
except Exception as e:
self._websocket = None
self._connection_active = False
await self.push_error(error_msg=f"Unable to connect to Gladia: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to Gladia."""

View File

@@ -105,7 +105,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Connection management constants

View File

@@ -36,7 +36,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Google Vertex AI, you need to `pip install pipecat-ai[google]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -35,7 +35,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -65,7 +65,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class GoogleThinkingConfig(BaseModel):

View File

@@ -57,7 +57,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_google_stt_language(language: Language) -> str:

View File

@@ -57,7 +57,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_google_tts_language(language: Language) -> str:

View File

@@ -35,7 +35,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_APPLICATION_CREDENTIALS` environment variable."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -44,7 +44,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error('In order to use Gradium, you need to `pip install "pipecat-ai[gradium]"`.')
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Seconds to wait after a "flushed" message for trailing text tokens to arrive
# before finalizing the transcription.
@@ -423,8 +423,8 @@ class GradiumSTTService(WebsocketSTTService):
logger.debug("Connected to Gradium STT")
except Exception as e:
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
raise
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Gradium: {e}", exception=e)
async def _disconnect(self):
await super()._disconnect()

View File

@@ -33,7 +33,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Gradium, you need to `pip install pipecat-ai[gradium]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
SAMPLE_RATE = 48000

View File

@@ -30,7 +30,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Groq, you need to `pip install pipecat-ai[groq]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Hint set for `output_format`. The values mirror the Literal that
# `groq.resources.audio.speech.AsyncSpeech.create` accepts on its

View File

@@ -46,7 +46,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use HeyGen, you need to `pip install pipecat-ai[heygen]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
HEY_GEN_SAMPLE_RATE = 24000

View File

@@ -38,7 +38,7 @@ try:
except ModuleNotFoundError as e: # pragma: no cover - import-time guidance
logger.error(f"Exception: {e}")
logger.error("In order to use Hume, you need to `pip install pipecat-ai[hume]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
HUME_SAMPLE_RATE = 48_000 # Hume TTS streams at 48 kHz

View File

@@ -0,0 +1,124 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Inception LLM service implementation using OpenAI-compatible interface."""
from dataclasses import dataclass, field
from typing import Literal
from loguru import logger
from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.settings import NOT_GIVEN as _NOT_GIVEN
from pipecat.services.settings import _NotGiven, is_given
@dataclass
class InceptionLLMSettings(BaseOpenAILLMService.Settings):
"""Settings for InceptionLLMService.
Parameters:
reasoning_effort: Controls how much reasoning the model applies.
One of "instant", "low", "medium", or "high". When unset, the
parameter is omitted and Inception's server-side default applies.
realtime: When True, reduces time to first diffusion block (TTFT).
"""
reasoning_effort: Literal["instant", "low", "medium", "high"] | None | _NotGiven = field(
default_factory=lambda: _NOT_GIVEN
)
realtime: bool | None | _NotGiven = field(default_factory=lambda: _NOT_GIVEN)
class InceptionLLMService(OpenAILLMService):
"""A service for interacting with Inception's API using the OpenAI-compatible interface.
This service extends OpenAILLMService to connect to Inception's API endpoint while
maintaining full compatibility with OpenAI's interface and functionality.
Supports Mercury-2, Inception's diffusion-based reasoning model.
"""
# Inception doesn't support the "developer" message role.
supports_developer_role = False
Settings = InceptionLLMSettings
_settings: Settings
def __init__(
self,
*,
api_key: str,
base_url: str = "https://api.inceptionlabs.ai/v1",
settings: Settings | None = None,
**kwargs,
):
"""Initialize the Inception LLM service.
Args:
api_key: The API key for accessing Inception's API.
base_url: The base URL for Inception API. Defaults to "https://api.inceptionlabs.ai/v1".
settings: Runtime-updatable settings.
**kwargs: Additional keyword arguments passed to OpenAILLMService.
"""
default_settings = self.Settings(
model="mercury-2",
reasoning_effort=None,
realtime=None,
)
if settings is not None:
default_settings.apply_update(settings)
super().__init__(api_key=api_key, base_url=base_url, settings=default_settings, **kwargs)
def create_client(self, api_key=None, base_url=None, **kwargs):
"""Create OpenAI-compatible client for Inception API endpoint.
Args:
api_key: The API key for authentication. If None, uses instance default.
base_url: The base URL for the API. If None, uses instance default.
**kwargs: Additional keyword arguments for client configuration.
Returns:
An OpenAI-compatible client configured for Inception's API.
"""
logger.debug(f"Creating Inception client with api {base_url}")
return super().create_client(api_key, base_url, **kwargs)
def build_chat_completion_params(self, params_from_context: OpenAILLMInvocationParams) -> dict:
"""Build parameters for Inception chat completion request.
Extends the base OpenAI parameters with Inception-specific options
such as reasoning_effort and realtime.
Args:
params_from_context: Parameters, derived from the LLM context, to
use for the chat completion. Contains messages, tools, and tool
choice.
Returns:
Dictionary of parameters for the chat completion request.
"""
params = super().build_chat_completion_params(params_from_context)
if (
is_given(self._settings.reasoning_effort)
and self._settings.reasoning_effort is not None
):
params["reasoning_effort"] = self._settings.reasoning_effort
# realtime is Inception-specific and unknown to the OpenAI SDK,
# so it must be passed via extra_body to avoid validation errors.
extra_body = {}
if is_given(self._settings.realtime) and self._settings.realtime is not None:
extra_body["realtime"] = self._settings.realtime
if extra_body:
params["extra_body"] = extra_body
return params

View File

@@ -68,7 +68,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Inworld Realtime, you need to `pip install pipecat-ai[inworld]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -43,7 +43,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Inworld WebSocket TTS, you need to `pip install websockets`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
from pipecat.frames.frames import (
AggregationType,

View File

@@ -32,7 +32,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Kokoro, you need to `pip install pipecat-ai[kokoro]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
KOKORO_CACHE_DIR = Path(os.path.expanduser("~/.cache/kokoro-onnx"))
KOKORO_MODEL_URL = "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx"

View File

@@ -34,7 +34,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use LMNT, you need to `pip install pipecat-ai[lmnt]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_lmnt_language(language: Language) -> str:

View File

@@ -29,7 +29,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use an MCP client, you need to `pip install pipecat-ai[mcp]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
ServerParameters: TypeAlias = StdioServerParameters | SseServerParameters | StreamableHttpParameters

View File

@@ -28,7 +28,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Mem0, you need to `pip install mem0ai`. Also, set the environment variable MEM0_API_KEY."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class Mem0MemoryService(FrameProcessor):

View File

@@ -48,7 +48,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Mistral STT, you need to `pip install pipecat-ai[mistral]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -31,7 +31,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Mistral TTS, you need to `pip install pipecat-ai[mistral]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -34,7 +34,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
raise ImportError(f"Missing module(s): {e}") from e
raise Exception(f"Missing module(s): {e}")
def detect_device():

View File

@@ -41,7 +41,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Neuphonic, you need to `pip install pipecat-ai[neuphonic]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_neuphonic_lang_code(language: Language) -> str:

View File

@@ -46,7 +46,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use NVIDIA Nemotron Speech STT, you need to `pip install pipecat-ai[nvidia]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_nvidia_nemotron_speech_language(language: Language) -> str:

View File

@@ -55,7 +55,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use NVIDIA Nemotron Speech TTS, you need to `pip install pipecat-ai[nvidia]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -71,7 +71,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use OpenAI, you need to `pip install pipecat-ai[openai]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -59,7 +59,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use OpenAI, you need to `pip install pipecat-ai[openai]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# ---------------------------------------------------------------------------

View File

@@ -30,7 +30,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Piper, you need to `pip install pipecat-ai[piper]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -33,7 +33,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Resemble AI, you need to `pip install pipecat-ai[resembleai]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -47,7 +47,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Rime, you need to `pip install pipecat-ai[rime]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_rime_language(language: Language) -> str:

View File

@@ -53,7 +53,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Sarvam, you need to `pip install pipecat-ai[sarvam]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_sarvam_language(language: Language) -> str:

View File

@@ -70,7 +70,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Sarvam, you need to `pip install pipecat-ai[sarvam]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class SarvamTTSModel(StrEnum):

View File

@@ -35,7 +35,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Simli, you need to `pip install pipecat-ai[simli]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

View File

@@ -48,7 +48,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Smallest, you need to `pip install pipecat-ai[smallest]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
def language_to_smallest_stt_language(language: Language) -> str:

View File

@@ -41,7 +41,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Smallest, you need to `pip install pipecat-ai[smallest]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
class SmallestTTSModel(StrEnum):

View File

@@ -39,7 +39,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Soniox, you need to `pip install pipecat-ai[soniox]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
KEEPALIVE_MESSAGE = '{"type": "keepalive"}'
@@ -155,7 +155,6 @@ def language_to_soniox_language(language: Language) -> str:
Language.ID: "id",
Language.IT: "it",
Language.JA: "ja",
Language.KA: "ka",
Language.KK: "kk",
Language.KN: "kn",
Language.KO: "ko",
@@ -232,6 +231,7 @@ class SonioxSTTSettings(STTSettings):
context_version 2.
enable_speaker_diarization: Whether to enable speaker diarization.
enable_language_identification: Whether to enable language identification.
max_endpoint_delay_ms: Max ms before endpoint detection finalizes the turn (500-3000).
client_reference_id: Client reference ID to use for transcription.
"""
@@ -242,6 +242,7 @@ class SonioxSTTSettings(STTSettings):
enable_language_identification: bool | None | _NotGiven = field(
default_factory=lambda: NOT_GIVEN
)
max_endpoint_delay_ms: int | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
client_reference_id: str | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
@@ -309,6 +310,7 @@ class SonioxSTTService(WebsocketSTTService):
context=None,
enable_speaker_diarization=False,
enable_language_identification=False,
max_endpoint_delay_ms=None,
client_reference_id=None,
)
@@ -390,8 +392,7 @@ class SonioxSTTService(WebsocketSTTService):
changed = await super()._update_settings(delta)
if changed:
await self._disconnect()
await self._connect()
await self._request_reconnect()
return changed
@@ -522,6 +523,7 @@ class SonioxSTTService(WebsocketSTTService):
"audio_format": self._audio_format,
"num_channels": self._num_channels,
"enable_endpoint_detection": enable_endpoint_detection,
"max_endpoint_delay_ms": s.max_endpoint_delay_ms,
"sample_rate": self.sample_rate,
"language_hints": _prepare_language_hints(assert_given(s.language_hints)),
"language_hints_strict": s.language_hints_strict,
@@ -537,8 +539,8 @@ class SonioxSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug("Connected to Soniox STT")
except Exception as e:
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Soniox: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to Soniox."""

View File

@@ -44,7 +44,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Soniox, you need to `pip install pipecat-ai[soniox]`.")
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
# Soniox idle timeout is 20-30s; keepalive cadence must stay well inside it.

View File

@@ -61,7 +61,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Speechmatics, you need to `pip install pipecat-ai[speechmatics]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
load_dotenv()

View File

@@ -32,7 +32,7 @@ except ModuleNotFoundError as e:
logger.error(
"In order to use Speechmatics, you need to `pip install pipecat-ai[speechmatics]`."
)
raise ImportError(f"Missing module: {e}") from e
raise Exception(f"Missing module: {e}")
@dataclass

Some files were not shown because too many files have changed in this diff Show More