Compare commits
73 Commits
vp-moq-vib
...
main
| Author | SHA1 | Date |
|---|---|---|
| | 780c004168 | |
| | 28f9203401 | |
| | 77cc314a08 | |
| | 4a8d1d0b5e | |
| | 87f5d60693 | |
| | c699b31daa | |
| | ee674ffb01 | |
| | 86a5710801 | |
| | 4a96b2a9e6 | |
| | 105d6f27da | |
| | e0e3cd336a | |
| | 9586db5b50 | |
| | a890ab7b21 | |
| | c1bf7dbb4a | |
| | 709a0ce839 | |
| | be93350eae | |
| | 4a96ab7073 | |
| | c321f50e76 | |
| | bca337f97e | |
| | 5d9e8c5ac5 | |
| | 70773bce0a | |
| | 8bdb49bd1a | |
| | 81bb81c1d0 | |
| | e1bdee598c | |
| | 185a89bb3b | |
| | 6b9deefbe3 | |
| | deefc32faf | |
| | a5e6886b80 | |
| | d11a4ba0cd | |
| | 38407e091d | |
| | 82cd931efa | |
| | 33e5d1f89b | |
| | 861dd23873 | |
| | b825dd779e | |
| | 1487da53a9 | |
| | aff84a5d9e | |
| | c09f6d5adb | |
| | e2d249e5d9 | |
| | 956b39b0dc | |
| | e298491068 | |
| | 97b00042df | |
| | bc769eaa82 | |
| | ee5aa4dc71 | |
| | dd38fbc735 | |
| | a1c40df471 | |
| | c4ff9300c9 | |
| | cab4585cbb | |
| | 18368d047e | |
| | e3abb4b6d7 | |
| | 0fd971d59d | |
| | c61672194d | |
| | c51a817efa | |
| | d85eda6da8 | |
| | 71feb42711 | |
| | 6b93ca0cb6 | |
| | b6ecce754b | |
| | d39e6bf921 | |
| | 63064860ef | |
| | f5158d51e7 | |
| | 94dbd2fa68 | |
| | b493ed8d3a | |
| | c3338667b1 | |
| | c8efe319b3 | |
| | d6655e7a5e | |
| | 33b73df6ec | |
| | c9f0172e9f | |
| | 2638885c62 | |
| | cb426cbb14 | |
| | d39beff817 | |
| | 1eade184f1 | |
| | 3fa193b983 | |
| | 6feeee515f | |
| | 55fb4b0845 | |
91
.claude/skills/squash-commits/SKILL.md
Normal file
@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---

Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.

## Instructions

### 1. Safety check

```bash
git status --short
```

If there are uncommitted changes, stop and tell the user to commit or stash them first.

### 2. Inspect the branch

```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```

List every file changed vs `main` and every commit on the branch (excluding merge commits from main).

### 3. Create a backup branch

```bash
git branch backup/<current-branch-name>
```

Tell the user the backup exists so they can recover if needed.

### 4. Soft-reset to main and unstage everything

```bash
git reset --soft main
git restore --staged .
```

All branch changes are now in the working tree, unstaged. No content has changed.

### 5. Plan the logical groups

Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:

- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs

Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.

Present the proposed grouping to the user and ask for confirmation before committing.

### 6. Commit in logical groups

For each group, stage only the relevant files and commit with a clear message following the project's conventions:

```bash
git add <file1> <file2> ...
git commit -m "..."
```

Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).

### 7. Verify

```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```

Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean

### 8. Remind about force-push

The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.

## Rules

- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.
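
The step 7 check that the set of changed files is the same before and after the rewrite could be sketched in Python as follows. This is a minimal illustration and not part of the skill: `changed_files` and `same_file_set` are hypothetical helpers, and the two sample strings stand in for the output of `git diff main..HEAD --name-only` captured before and after the soft reset.

```python
def changed_files(name_only_output: str) -> set[str]:
    """Parse `git diff --name-only` output into a set of paths."""
    return {line.strip() for line in name_only_output.splitlines() if line.strip()}


def same_file_set(before: str, after: str) -> bool:
    """Return True when both diffs touch exactly the same paths, in any order."""
    return changed_files(before) == changed_files(after)


# Sample outputs (made up for illustration); order differs, content does not.
before = "src/a.py\nsrc/b.py\nCHANGELOG.md\n"
after = "CHANGELOG.md\nsrc/a.py\nsrc/b.py\n"
print(same_file_set(before, after))
```

Matching path sets only show that no file was dropped or added. A stricter check is that `git diff backup/<branch-name> HEAD` prints nothing, which means the rewritten branch ends in exactly the same tree as the backup.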
21
CHANGELOG.md
@@ -7,6 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

<!-- towncrier release notes start -->

## [1.2.1] - 2026-05-15

### Changed

- Changed the default WebSocket endpoints for `GradiumSTTService` and
  `GradiumTTSService` to the region-neutral
  `wss://api.gradium.ai/api/speech/asr` and
  `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
  traffic to the nearest endpoint. Override the url to pin to a specific
  region.
  (PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))

### Fixed

- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
  responded by calling a tool. The user turn never finalized, so the assistant
  aggregator gated the tool-result context push and the LLM continuation never
  ran. Tool calls now finalize the turn the moment they start, before the
  function dispatches.
  (PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))

## [1.2.0] - 2026-05-14

### Added
@@ -92,10 +92,10 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | -------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
1
changelog/4052.added.md
Normal file
@@ -0,0 +1 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.
1
changelog/4306.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
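
The ordering guarantee described in this entry can be illustrated with a small Python sketch. It is not the Azure TTS service code: the `_DONE` sentinel, the `consume` coroutine and the string stand-ins for frames are assumptions made for illustration. The point is only that a completion marker placed on the same FIFO queue as the words cannot overtake them.

```python
import asyncio

# Hypothetical sentinel marking "synthesis finished".
_DONE = object()


async def consume(queue: asyncio.Queue, out: list) -> None:
    """Drain the queue in FIFO order; the stop event is emitted only at the sentinel."""
    while True:
        item = await queue.get()
        if item is _DONE:
            out.append("TTSStoppedFrame")
            return
        out.append(f"TTSTextFrame({item})")


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    out: list = []
    consumer = asyncio.create_task(consume(queue, out))
    for word in ["hello", "world"]:
        await queue.put(word)
    # Completion travels through the same queue instead of a separate signal.
    await queue.put(_DONE)
    await consumer
    return out


events = asyncio.run(main())
print(events)
```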
1
changelog/4380.fixed.2.md
Normal file
@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
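
One common way to keep insertion order among equal timestamps is to add a monotonically increasing counter as a tie-breaker in a heap. The sketch below shows that general technique in Python; `StablePTSQueue` is a hypothetical class, and nothing here is taken from the actual `BaseOutputTransport` implementation.

```python
import heapq
import itertools


class StablePTSQueue:
    """Min-heap keyed by (pts, insertion index), so equal PTS values pop in insertion order."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()

    def push(self, pts: int, frame: str) -> None:
        # The counter breaks ties, so frames themselves are never compared.
        heapq.heappush(self._heap, (pts, next(self._counter), frame))

    def pop(self) -> tuple:
        pts, _, frame = heapq.heappop(self._heap)
        return pts, frame

    def __len__(self) -> int:
        return len(self._heap)


queue = StablePTSQueue()
queue.push(10, "audio")
queue.push(10, "text")
queue.push(5, "early")
order = [queue.pop() for _ in range(len(queue))]
print(order)
```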
1
changelog/4380.fixed.3.md
Normal file
@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
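
As a rough illustration of stripping markup before word processing, the Python sketch below removes angle-bracket tags with a regular expression and then splits the remainder into words. The pattern and the `words_without_tags` helper are assumptions for illustration; the actual fix may handle tags differently.

```python
import re

# Matches opening, closing, and self-closing tags such as <spell>, </spell>, <break time="1s"/>.
_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def words_without_tags(text: str) -> list[str]:
    """Remove SSML-like tags, then split the remaining text on whitespace."""
    return _TAG_RE.sub("", text).split()


sample = '<spell>AB12</spell> is ready <break time="1s"/> now'
print(words_without_tags(sample))
```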
1
changelog/4380.fixed.4.md
Normal file
@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
1
changelog/4380.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
1
changelog/4423.added.md
Normal file
@@ -0,0 +1 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
1
changelog/4442.added.2.md
Normal file
@@ -0,0 +1 @@
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).
1
changelog/4442.added.md
Normal file
@@ -0,0 +1 @@
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.
1
changelog/4442.changed.md
Normal file
@@ -0,0 +1 @@
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.
@@ -1 +0,0 @@
- Changed the default WebSocket endpoints for `GradiumSTTService` and `GradiumTTSService` to the region-neutral `wss://api.gradium.ai/api/speech/asr` and `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes traffic to the nearest endpoint. Override the url to pin to a specific region.
1
changelog/4507.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
1
changelog/4514.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
1
changelog/4521.added.md
Normal file
@@ -0,0 +1 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
1
changelog/4521.changed.md
Normal file
@@ -0,0 +1 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
1
changelog/4521.removed.md
Normal file
@@ -0,0 +1 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
1
changelog/4522.changed.md
Normal file
@@ -0,0 +1 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.
1
changelog/4524.changed.md
Normal file
@@ -0,0 +1 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
1
changelog/4524.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
1
changelog/4527.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
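
The gating idea in this entry, sending a keepalive only for contexts whose initialization message has already gone out, might look like the following Python sketch. `KeepaliveGate` and its method names are hypothetical and do not come from `ElevenLabsTTSService`.

```python
class KeepaliveGate:
    """Tracks which contexts have had their init message sent (hypothetical helper)."""

    def __init__(self) -> None:
        self._initialized: set[str] = set()

    def mark_init_sent(self, context_id: str) -> None:
        # Call right after the context-init message is written to the socket.
        self._initialized.add(context_id)

    def forget(self, context_id: str) -> None:
        # Call when the context is closed, so stale IDs never receive keepalives.
        self._initialized.discard(context_id)

    def can_send_keepalive(self, context_id: str) -> bool:
        return context_id in self._initialized


gate = KeepaliveGate()
before_init = gate.can_send_keepalive("ctx-1")  # new turn, init not sent yet
gate.mark_init_sent("ctx-1")
after_init = gate.can_send_keepalive("ctx-1")
print(before_init, after_init)
```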
1
changelog/4531.changed.md
Normal file
@@ -0,0 +1 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.
@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...

# Inception
INCEPTION_API_KEY=...

# Inworld
INWORLD_API_KEY=...

@@ -211,6 +214,11 @@ TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...

# Vonage
VONAGE_APPLICATION_ID=...
VONAGE_SESSION_ID=...
VONAGE_TOKEN=...

# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
177
examples/function-calling/function-calling-inception.py
Normal file
@@ -0,0 +1,177 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#


import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inception.llm import InceptionLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

load_dotenv(override=True)


async def fetch_weather_from_api(params: FunctionCallParams):
    await params.result_callback({"conditions": "nice", "temperature": "75"})


async def fetch_restaurant_recommendation(params: FunctionCallParams):
    await params.result_callback({"name": "The Golden Dragon"})


# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )

    llm = InceptionLLMService(
        api_key=os.environ["INCEPTION_API_KEY"],
        settings=InceptionLLMService.Settings(
            reasoning_effort="instant",
            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
        ),
    )
    # You can also register a function_name of None to get all functions
    # sent to the same callback with an additional function_name parameter.
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

    @llm.event_handler("on_function_calls_started")
    async def on_function_calls_started(service, function_calls):
        await tts.queue_frame(TTSSpeakFrame("Let me check on that."))

    weather_function = FunctionSchema(
        name="get_current_weather",
        description="Get the current weather",
        properties={
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
        required=["location", "format"],
    )

    restaurant_function = FunctionSchema(
        name="get_restaurant_recommendation",
        description="Get a restaurant recommendation",
        properties={
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        required=["location"],
    )
    tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])

    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
        context.add_message(
            {"role": "developer", "content": "Please introduce yourself to the user."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
@@ -68,9 +68,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
tts = OpenAITTSService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
settings=OpenAITTSService.Settings(
|
||||
instructions="Please speak clearly and at a moderate pace.",
|
||||
voice="ballad",
|
||||
),
|
||||
instructions="Please speak clearly and at a moderate pace.",
|
||||
)
|
||||
|
||||
llm = OpenAILLMService(
|
||||
|
||||
134
examples/transports/transports-vonage.py
Normal file
@@ -0,0 +1,134 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Example of using OpenAI Realtime voice LLM service with Vonage Video Connector transport."""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import sys
|
||||
from collections.abc import Callable
|
||||
from typing import Any
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.frames.frames import LLMRunFrame
|
||||
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_context import LLMContext
|
||||
from pipecat.processors.aggregators.llm_response_universal import (
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
)
|
||||
from pipecat.runner.vonage import configure
|
||||
from pipecat.services.openai.realtime.events import (
|
||||
AudioConfiguration,
|
||||
AudioInput,
|
||||
InputAudioNoiseReduction,
|
||||
InputAudioTranscription,
|
||||
SemanticTurnDetection,
|
||||
SessionProperties,
|
||||
)
|
||||
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
|
||||
from pipecat.transports.vonage.video_connector import (
|
||||
VonageVideoConnectorTransport,
|
||||
VonageVideoConnectorTransportParams,
|
||||
)
|
||||
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level="DEBUG")
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
"""Main entry point for the OpenAI Realtime Vonage Video Connector example."""
|
||||
(application_id, session_id, token) = await configure()
|
||||
|
||||
transport = VonageVideoConnectorTransport(
|
||||
application_id,
|
||||
session_id,
|
||||
token,
|
||||
VonageVideoConnectorTransportParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
publisher_name="Bot",
|
||||
),
|
||||
)
|
||||
|
||||
llm = OpenAIRealtimeLLMService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
settings=OpenAIRealtimeLLMService.Settings(
|
||||
system_instruction="""You are a helpful and friendly AI.
|
||||
|
||||
Act like a human, but remember that you aren't a human and that you can't do human
|
||||
things in the real world. Your voice and personality should be warm and engaging, with a lively and
|
||||
playful tone.
|
||||
|
||||
If interacting in a non-English language, start by using the standard accent or dialect familiar to
|
||||
the user. Talk quickly.
|
||||
|
||||
You are participating in a voice conversation. Keep your responses concise, short, and to the point
|
||||
unless specifically asked to elaborate on a topic.
|
||||
|
||||
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
|
||||
session_properties=SessionProperties(
|
||||
audio=AudioConfiguration(
|
||||
input=AudioInput(
|
||||
transcription=InputAudioTranscription(),
|
||||
turn_detection=SemanticTurnDetection(),
|
||||
noise_reduction=InputAudioNoiseReduction(type="near_field"),
|
||||
)
|
||||
),
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
context = LLMContext(
|
||||
[{"role": "developer", "content": "Say hello!"}],
|
||||
)
|
||||
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
|
||||
)
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(),
|
||||
user_aggregator,
|
||||
llm,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
]
|
||||
)
|
||||
|
||||
task = PipelineTask(
|
||||
pipeline,
|
||||
params=PipelineParams(
|
||||
enable_metrics=True,
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
observers=[TranscriptionLogObserver()],
|
||||
)
|
||||
|
||||
event_handler: Callable[[str], Callable[[Any], Any]] = transport.event_handler
|
||||
|
||||
@event_handler("on_client_connected")
|
||||
async def on_client_connected(transport: VonageVideoConnectorTransport, client: object) -> None:
|
||||
logger.info("Client connected")
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
runner = PipelineRunner()
|
||||
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1,201 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Example 22: Filter Incomplete Turns
|
||||
|
||||
Demonstrates LLM-based turn completion detection to suppress bot responses when
|
||||
the user was cut off mid-thought. The LLM outputs one of three markers:
|
||||
- ✓ (complete): User finished their thought, respond normally
|
||||
- ○ (incomplete short): User was cut off, wait ~5s then prompt
|
||||
- ◐ (incomplete long): User needs time to think, wait ~10s then prompt
|
||||
|
||||
When incomplete is detected, the bot's response is suppressed. After the timeout
|
||||
expires, the LLM is automatically prompted to re-engage the user.
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.adapters.schemas.tools_schema import ToolsSchema
|
||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.frames.frames import LLMRunFrame
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_context import LLMContext
|
||||
from pipecat.processors.aggregators.llm_response_universal import (
|
||||
AssistantTurnStoppedMessage,
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
UserTurnStoppedMessage,
|
||||
)
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
||||
from pipecat.services.llm_service import FunctionCallParams
|
||||
from pipecat.services.openai.llm import OpenAILLMService
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
from pipecat.transports.daily.transport import DailyParams
|
||||
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
|
||||
from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies
|
||||
|
||||
load_dotenv(override=True)
|
||||
|
||||
|
||||
# We use lambdas to defer transport parameter creation until the transport
|
||||
# type is selected at runtime.
|
||||
transport_params = {
|
||||
"daily": lambda: DailyParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
"twilio": lambda: FastAPIWebsocketParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
"webrtc": lambda: TransportParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
async def get_weather(params: FunctionCallParams, location: str):
|
||||
"""Return the current weather for a location.
|
||||
|
||||
A stub that always reports the same conditions — replace with a real
|
||||
weather API in production.
|
||||
|
||||
Args:
|
||||
location (str): The city and state or country, e.g. "Paris, France".
|
||||
"""
|
||||
await params.result_callback(
|
||||
{
|
||||
"location": location,
|
||||
"temperature_celsius": 22,
|
||||
"conditions": "partly cloudy",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
logger.info(f"Starting bot")
|
||||
|
||||
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
|
||||
|
||||
llm = OpenAILLMService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
settings=OpenAILLMService.Settings(
|
||||
system_instruction=(
|
||||
"You are a helpful assistant in a voice conversation. Your "
|
||||
"responses will be spoken aloud, so avoid emojis, bullet "
|
||||
"points, or other formatting that can't be spoken. Respond to "
|
||||
"what the user said in a creative, helpful, and brief way. "
|
||||
"If the user asks about the weather, call the get_weather "
|
||||
"tool and speak the result back naturally."
|
||||
),
|
||||
),
|
||||
)
|
||||
llm.register_direct_function(get_weather)
|
||||
|
||||
tts = CartesiaTTSService(
|
||||
api_key=os.environ["CARTESIA_API_KEY"],
|
||||
settings=CartesiaTTSService.Settings(
|
||||
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
||||
),
|
||||
)
|
||||
|
||||
context = LLMContext(tools=ToolsSchema(standard_tools=[get_weather]))
|
||||
# `FilterIncompleteUserTurnStrategies` pairs the default detector
|
||||
# chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
|
||||
# trigger LLM inference but the public `on_user_turn_stopped` event
|
||||
# fires only when the LLM confirms ✓. The LLM marks each response
|
||||
# with one of:
|
||||
# ✓ = complete (respond normally)
|
||||
# ○ = incomplete short (wait 5s, then prompt)
|
||||
# ◐ = incomplete long (wait 10s, then prompt)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
vad_analyzer=SileroVADAnalyzer(),
|
||||
user_turn_strategies=FilterIncompleteUserTurnStrategies(
|
||||
# Optional: customize turn completion behavior
|
||||
# config=UserTurnCompletionConfig(
|
||||
# incomplete_short_timeout=5.0,
|
||||
# incomplete_long_timeout=10.0,
|
||||
# incomplete_short_prompt="Custom prompt...",
|
||||
# incomplete_long_prompt="Custom prompt...",
|
||||
# instructions="Custom turn completion instructions...",
|
||||
# ),
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
task = PipelineTask(
|
||||
pipeline,
|
||||
params=PipelineParams(
|
||||
enable_metrics=True,
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
|
||||
)
|
||||
|
||||
@transport.event_handler("on_client_connected")
|
||||
async def on_client_connected(transport, client):
|
||||
logger.info(f"Client connected")
|
||||
# Kick off the conversation.
|
||||
context.add_message(
|
||||
{"role": "developer", "content": "Please introduce yourself to the user."}
|
||||
)
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
@transport.event_handler("on_client_disconnected")
|
||||
async def on_client_disconnected(transport, client):
|
||||
logger.info(f"Client disconnected")
|
||||
await task.cancel()
|
||||
|
||||
@user_aggregator.event_handler("on_user_turn_stopped")
|
||||
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
|
||||
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
|
||||
line = f"{timestamp}user: {message.content}"
|
||||
logger.info(f"Transcript: {line}")
|
||||
|
||||
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
|
||||
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
|
||||
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
|
||||
line = f"{timestamp}assistant: {message.content}"
|
||||
logger.info(f"Transcript: {line}")
|
||||
|
||||
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
|
||||
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
async def bot(runner_args: RunnerArguments):
|
||||
"""Main bot entry point compatible with Pipecat Cloud."""
|
||||
transport = await create_transport(runner_args, transport_params)
|
||||
await run_bot(transport, runner_args)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from pipecat.runner.run import main
|
||||
|
||||
main()
|
||||
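The three turn-completion markers described in this example's docstring (✓, ○, ◐) can be illustrated with a small standalone sketch. It does not use Pipecat: `MARKERS`, `TurnDecision`, and `classify_turn` are hypothetical names chosen for illustration, the timeouts are the approximate 5 s and 10 s values quoted in the docstring, and the sketch assumes the marker appears at the start of the LLM output.

```python
from dataclasses import dataclass

# Marker -> (turn is complete, seconds to wait before re-prompting).
# Values follow the approximate defaults quoted in the example docstring.
MARKERS = {
    "✓": (True, 0.0),    # complete: respond normally
    "○": (False, 5.0),   # incomplete short: user was cut off
    "◐": (False, 10.0),  # incomplete long: user needs time to think
}


@dataclass(frozen=True)
class TurnDecision:
    complete: bool
    wait_secs: float
    text: str  # LLM output with the marker removed


def classify_turn(llm_output: str) -> TurnDecision:
    """Strip a leading marker from the LLM output and decide what to do."""
    stripped = llm_output.lstrip()
    for marker, (complete, wait_secs) in MARKERS.items():
        if stripped.startswith(marker):
            return TurnDecision(complete, wait_secs, stripped[len(marker):].strip())
    # Assumption: output without any marker is treated as a complete turn.
    return TurnDecision(True, 0.0, stripped)
```

With this shape, a caller would speak `text` only when `complete` is true and otherwise start a timer for `wait_secs` before prompting the user again.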
@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
|
||||
)
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
||||
from pipecat.services.openai.llm import OpenAILLMService
|
||||
from pipecat.services.soniox.stt import SonioxSTTService
|
||||
from pipecat.services.soniox.tts import SonioxTTSService
|
||||
from pipecat.transcriptions.language import Language
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
from pipecat.transports.daily.transport import DailyParams
|
||||
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
|
||||
stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])
|
||||
|
||||
tts = CartesiaTTSService(
|
||||
api_key=os.environ["CARTESIA_API_KEY"],
|
||||
settings=CartesiaTTSService.Settings(
|
||||
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
||||
),
|
||||
)
|
||||
tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])
|
||||
|
||||
llm = OpenAILLMService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
await asyncio.sleep(10)
|
||||
logger.info("Updating Soniox STT settings: language=es")
|
||||
logger.info("Updating Soniox STT settings: language_hints=[es]")
|
||||
await task.queue_frame(
|
||||
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
|
||||
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
|
||||
)
|
||||
|
||||
@transport.event_handler("on_client_disconnected")
|
||||
|
||||
@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
|
||||
gstreamer = [ "pygobject~=3.50.0" ]
|
||||
heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
|
||||
hume = [ "hume>=0.11.2,<1" ]
|
||||
inception = []
|
||||
inworld = [ "pipecat-ai[websockets-base]" ]
|
||||
koala = [ "pvkoala~=2.0.3" ]
|
||||
kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
|
||||
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
|
||||
qwen = []
|
||||
resembleai = [ "pipecat-ai[websockets-base]" ]
|
||||
rime = [ "pipecat-ai[websockets-base]" ]
|
||||
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-small-webrtc-prebuilt>=2.5.0"]
|
||||
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
|
||||
sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
|
||||
sambanova = []
|
||||
sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
|
||||
@@ -119,6 +120,7 @@ tavus = [ "pipecat-ai[daily]" ]
|
||||
together = []
|
||||
tracing = [ "opentelemetry-sdk>=1.33.0,<2", "opentelemetry-api>=1.33.0,<2", "opentelemetry-instrumentation>=0.54b0,<1" ]
|
||||
ultravox = [ "pipecat-ai[websockets-base]" ]
|
||||
vonage-video-connector = [ "vonage-video-connector~=0.2.3b0; python_full_version>='3.13' and python_full_version<'3.14' and platform_system=='Linux'" ]
|
||||
webrtc = [ "aiortc>=1.14.0,<2", "opencv-python>=4.11.0.86,<5" ]
|
||||
websocket = [ "pipecat-ai[websockets-base]", "fastapi>=0.115.6,<1" ]
|
||||
websockets-base = [ "websockets>=13.1,<16.0" ]
|
||||
|
||||
@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
|
||||
("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
|
||||
("function-calling/function-calling-novita.py", EVAL_WEATHER),
|
||||
("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
|
||||
("function-calling/function-calling-inception.py", EVAL_WEATHER),
|
||||
# Video
|
||||
("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
|
||||
("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
|
||||
@@ -242,6 +243,7 @@ TESTS_VIDEO_AVATAR = [
|
||||
|
||||
TESTS_TURN_MANAGEMENT = [
|
||||
("turn-management/turn-management-filter-incomplete-turns.py", EVAL_COMPLETE_TURN),
|
||||
("turn-management/turn-management-filter-incomplete-turns-function-calling.py", EVAL_WEATHER),
|
||||
]
|
||||
|
||||
TESTS_THINKING = [
|
||||
|
||||
@@ -383,10 +383,14 @@ class AggregatedTextFrame(TextFrame):
|
||||
Parameters:
|
||||
aggregated_by: Method used to aggregate the text frames.
|
||||
context_id: Unique identifier for the TTS context that generated this text.
|
||||
raw_text: The full matched text including start/end pattern delimiters, set when
|
||||
this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
|
||||
None for ordinary sentence aggregations.
|
||||
"""
|
||||
|
||||
aggregated_by: AggregationType | str
|
||||
context_id: str | None = None
|
||||
raw_text: str | None = None
|
||||
|
||||
|
||||
@dataclass
|
||||
|
||||
@@ -25,6 +25,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
|
||||
from pipecat.audio.vad.vad_analyzer import VADAnalyzer
|
||||
from pipecat.audio.vad.vad_controller import VADController
|
||||
from pipecat.frames.frames import (
|
||||
AggregatedTextFrame,
|
||||
AssistantImageRawFrame,
|
||||
BotStartedSpeakingFrame,
|
||||
BotStoppedSpeakingFrame,
|
||||
@@ -1496,9 +1497,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
|
||||
if len(frame.text) == 0:
|
||||
return
|
||||
|
||||
text = (
|
||||
frame.raw_text
|
||||
if isinstance(frame, AggregatedTextFrame) and frame.raw_text
|
||||
else frame.text
|
||||
)
|
||||
self._aggregation.append(
|
||||
TextPartForConcatenation(
|
||||
frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
|
||||
text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
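The `raw_text` fallback in the hunk above can be sketched in isolation. The classes below are simplified stand-ins, not the real Pipecat frames, and `text_for_context` is a hypothetical helper name. The sketch only shows the selection rule: prefer `raw_text` when the frame is an aggregated frame that carries one, otherwise use `text`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextFrame:
    """Simplified stand-in for a text frame."""

    text: str


@dataclass
class AggregatedTextFrame(TextFrame):
    """Simplified stand-in; raw_text keeps the pattern delimiters when present."""

    aggregated_by: str = "sentence"
    raw_text: Optional[str] = None


def text_for_context(frame: TextFrame) -> str:
    """Return the text that should be appended to the assistant context."""
    if isinstance(frame, AggregatedTextFrame) and frame.raw_text:
        return frame.raw_text
    return frame.text
```

Under these assumptions, a `<code>...</code>` match keeps its delimiters in the context, while ordinary sentence aggregations fall back to `text`.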
@@ -23,6 +23,7 @@ from pipecat.frames.frames import (
|
||||
)
|
||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||
from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
|
||||
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
|
||||
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
|
||||
|
||||
|
||||
@@ -85,7 +86,11 @@ class LLMTextProcessor(FrameProcessor):
|
||||
out_frame = AggregatedTextFrame(
|
||||
text=aggregation.text,
|
||||
aggregated_by=aggregation.type,
|
||||
raw_text=aggregation.full_match
|
||||
if isinstance(aggregation, PatternMatch)
|
||||
else aggregation.text,
|
||||
)
|
||||
out_frame.append_to_context = True
|
||||
out_frame.skip_tts = in_frame.skip_tts
|
||||
await self.push_frame(out_frame)
|
||||
|
||||
@@ -96,6 +101,9 @@ class LLMTextProcessor(FrameProcessor):
|
||||
out_frame = AggregatedTextFrame(
|
||||
text=remaining.text,
|
||||
aggregated_by=remaining.type,
|
||||
raw_text=remaining.full_match
|
||||
if isinstance(remaining, PatternMatch)
|
||||
else remaining.text,
|
||||
)
|
||||
out_frame.skip_tts = skip_tts
|
||||
await self.push_frame(out_frame)
|
||||
|
||||
@@ -528,6 +528,9 @@ class RTVIObserver(BaseObserver):
|
||||
text = await transform(text, agg_type)
|
||||
|
||||
isTTS = isinstance(frame, TTSTextFrame)
|
||||
if agg_type is not AggregationType.WORD:
|
||||
logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
|
||||
|
||||
if self._params.bot_output_enabled:
|
||||
message = RTVI.BotOutputMessage(
|
||||
data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)
|
||||
|
||||
@@ -19,6 +19,10 @@ All bots must implement a `bot(runner_args)` async function as the entry point.
|
||||
The server automatically discovers and executes this function when connections
|
||||
are established.
|
||||
|
||||
By default the runner starts a single FastAPI server that supports WebRTC, Daily,
|
||||
and telephony transports simultaneously. Clients declare which transport they want
|
||||
via the ``transport`` field in the ``/start`` request body (default: ``"webrtc"``).
|
||||
|
||||
Single transport example::
|
||||
|
||||
async def bot(runner_args: RunnerArguments):
|
||||
@@ -55,18 +59,38 @@ Supported transports:
|
||||
- WebRTC - Provides local WebRTC interface with prebuilt UI
|
||||
- Telephony - Handles webhook and WebSocket connections for Twilio, Telnyx, Plivo, Exotel
|
||||
|
||||
The ``/start`` endpoint accepts::
|
||||
|
||||
{
|
||||
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
|
||||
// "plivo" | "exotel" — default: "webrtc"
|
||||
|
||||
// WebRTC-specific
|
||||
"enableDefaultIceServers": false,
|
||||
"body": {...},
|
||||
|
||||
// Daily-specific
|
||||
"createDailyRoom": true,
|
||||
"dailyRoomProperties": {...},
|
||||
"dailyMeetingTokenProperties": {...},
|
||||
"body": {...}
|
||||
}
|
||||
|
||||
To run locally:
|
||||
|
||||
- WebRTC: `python bot.py -t webrtc`
|
||||
- ESP32: `python bot.py -t webrtc --esp32 --host 192.168.1.100`
|
||||
- Daily (server): `python bot.py -t daily`
|
||||
- Daily (direct, testing only): `python bot.py -d`
|
||||
- Telephony: `python bot.py -t twilio -x your_username.ngrok.io`
|
||||
- Exotel: `python bot.py -t exotel` (no proxy needed, but ngrok connection to HTTP 7860 is required)
|
||||
- All transports (default): ``python bot.py``
|
||||
- WebRTC only: ``python bot.py -t webrtc``
|
||||
- ESP32: ``python bot.py -t webrtc --esp32 --host 192.168.1.100``
|
||||
- Daily only: ``python bot.py -t daily``
|
||||
- Daily (direct, testing only): ``python bot.py -d``
|
||||
- Telephony: ``python bot.py -t twilio -x your_username.ngrok.io``
|
||||
- Exotel: ``python bot.py -t exotel`` (no proxy needed, but ngrok connection to HTTP 7860 is required)
|
||||
- WhatsApp: ``python bot.py --whatsapp``
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import importlib.util
|
||||
import mimetypes
|
||||
import os
|
||||
import sys
|
||||
@@ -85,8 +109,10 @@ from pipecat.runner.types import (
|
||||
DailyRunnerArguments,
|
||||
RunnerArguments,
|
||||
SmallWebRTCRunnerArguments,
|
||||
VonageRunnerArguments,
|
||||
WebSocketRunnerArguments,
|
||||
)
|
||||
from pipecat.runner.vonage import configure as configure_vonage
|
||||
|
||||
try:
|
||||
import uvicorn
|
||||
@@ -106,6 +132,18 @@ load_dotenv(override=True)
|
||||
os.environ["ENV"] = "local"
|
||||
|
||||
TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
|
||||
TRANSPORT_ROUTE_DEPENDENCIES = {
|
||||
"daily": ("daily",),
|
||||
"webrtc": ("aiortc",),
|
||||
"telephony": ("fastapi", "websockets"),
|
||||
"websocket": ("fastapi", "websockets"),
|
||||
}
|
||||
TRANSPORT_INSTALL_HINTS = {
|
||||
"daily": "install pipecat-ai[daily]",
|
||||
"webrtc": "install pipecat-ai[webrtc]",
|
||||
"telephony": "install pipecat-ai[websocket]",
|
||||
"websocket": "install pipecat-ai[websocket]",
|
||||
}
|
||||
|
||||
# Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
|
||||
PIPECAT_ROOM_EXP_HOURS = 4.0
|
||||
@@ -131,6 +169,120 @@ Import this to add custom routes from other packages before calling
|
||||
"""
|
||||
|
||||
|
||||
def _is_module_available(module: str) -> bool:
|
||||
"""Check whether a module can be imported without importing it.
|
||||
|
||||
Args:
|
||||
module: Fully-qualified module name to check.
|
||||
|
||||
Returns:
|
||||
``True`` if Python can resolve the module, ``False`` otherwise.
|
||||
"""
|
||||
try:
|
||||
return importlib.util.find_spec(module) is not None
|
||||
except (ImportError, ModuleNotFoundError, ValueError):
|
||||
return False
|
||||
|
||||
|
||||
def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
|
||||
"""Return module dependencies required for a transport route.
|
||||
|
||||
Args:
|
||||
transport: Transport name from the runner request or CLI.
|
||||
|
||||
Returns:
|
||||
Module names required to enable the transport route.
|
||||
"""
|
||||
if transport in TELEPHONY_TRANSPORTS:
|
||||
return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
|
||||
return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())
|
||||
|
||||
|
||||
def _transport_routes_enabled(transport: str) -> bool:
|
||||
"""Return whether a transport route can run in this environment.
|
||||
|
||||
Args:
|
||||
transport: Transport name from the runner request or CLI.
|
||||
|
||||
Returns:
|
||||
``True`` if the requested transport is enabled.
|
||||
"""
|
||||
return all(_is_module_available(module) for module in _transport_route_dependencies(transport))
|
||||
|
||||
|
||||
def _runner_url(args: argparse.Namespace) -> str:
|
||||
"""Return the browser URL for the runner prebuilt client."""
|
||||
return f"http://{args.host}:{args.port}"
|
||||
|
||||
|
||||
def _transport_status_lists() -> tuple[list[str], list[str]]:
|
||||
"""Return enabled and disabled transport labels for the startup banner."""
|
||||
transports = ["daily", "webrtc", "telephony", "websocket"]
|
||||
enabled = []
|
||||
disabled = []
|
||||
|
||||
for label in transports:
|
||||
if _transport_routes_enabled(label):
|
||||
enabled.append(label)
|
||||
else:
|
||||
disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
|
||||
|
||||
return enabled, disabled
|
||||
|
||||
|
||||
def _format_transport_status(labels: list[str]) -> str:
|
||||
"""Format a startup banner transport status list."""
|
||||
return ", ".join(labels) if labels else "none"
|
||||
|
||||
|
||||
def _print_startup_message(args: argparse.Namespace):
|
||||
"""Print connection information for the development runner."""
|
||||
print()
|
||||
if args.transport is None:
|
||||
enabled, disabled = _transport_status_lists()
|
||||
print("🚀 Bot ready!")
|
||||
print(f" → Open: {_runner_url(args)}")
|
||||
print(f" → Enabled transports: {_format_transport_status(enabled)}")
|
||||
if disabled:
|
||||
print(f" → Disabled transports: {_format_transport_status(disabled)}")
|
||||
elif args.transport == "webrtc":
|
||||
if args.esp32:
|
||||
print("🚀 Bot ready! (ESP32 mode)")
|
||||
elif args.whatsapp:
|
||||
print("🚀 Bot ready! (WhatsApp)")
|
||||
else:
|
||||
print("🚀 Bot ready! (WebRTC)")
|
||||
if _transport_routes_enabled("webrtc"):
|
||||
print(f" → Open: {_runner_url(args)}")
|
||||
else:
|
||||
print(f" → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
|
||||
elif args.transport == "daily":
|
||||
print("🚀 Bot ready! (Daily)")
|
||||
if not _transport_routes_enabled("daily"):
|
||||
print(f" → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
|
||||
else:
|
||||
print(f" → Open: {_runner_url(args)}")
|
||||
if args.dialin:
|
||||
print(
|
||||
f" → Daily dial-in webhook: "
|
||||
f"http://{args.host}:{args.port}/daily-dialin-webhook"
|
||||
)
|
||||
print(" → Configure this URL in your Daily phone number settings")
|
||||
elif args.transport in TELEPHONY_TRANSPORTS:
|
||||
print(f"🚀 Bot ready! ({args.transport.capitalize()})")
|
||||
if not _transport_routes_enabled(args.transport):
|
||||
print(f" → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
|
||||
else:
|
||||
print(f" → Open: {_runner_url(args)}")
|
||||
if args.proxy:
|
||||
print(f" → XML webhook: http://{args.host}:{args.port}/")
|
||||
print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
|
||||
elif args.transport == "vonage":
|
||||
print()
|
||||
print("🚀 Bot ready!")
|
||||
print()
|
||||
|
||||
|
||||
def _get_bot_module():
|
||||
"""Get the bot module from the calling script."""
|
||||
import importlib.util
|
||||
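The dependency gating added in the hunk above (`_is_module_available`, `_transport_routes_enabled`, `_transport_status_lists`) can be tried out with a minimal standalone sketch. The route names and module lists here are made up for illustration, with one deliberately missing module, and the function names are shortened stand-ins rather than the runner's own.

```python
import importlib.util

# Hypothetical route table: one route with only stdlib modules,
# one that names a module assumed not to be installed.
ROUTE_DEPENDENCIES = {
    "stdlib": ("json", "asyncio"),
    "missing": ("json", "surely_not_installed_module_xyz"),
}


def is_module_available(module: str) -> bool:
    """Check whether a module can be resolved without importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def route_enabled(name: str) -> bool:
    """A route is enabled when every module it depends on can be resolved."""
    return all(is_module_available(m) for m in ROUTE_DEPENDENCIES.get(name, ()))


def status_lists() -> tuple[list[str], list[str]]:
    """Split the known routes into enabled and disabled labels."""
    enabled: list[str] = []
    disabled: list[str] = []
    for name in ROUTE_DEPENDENCIES:
        (enabled if route_enabled(name) else disabled).append(name)
    return enabled, disabled
```

One consequence worth noting: a name with no entry in the table has an empty dependency tuple, so `all(())` reports it as enabled. The same holds for the `.get(transport, ())` lookup in the diff.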
@@ -186,8 +338,35 @@ async def _run_telephony_bot(websocket: WebSocket, args: argparse.Namespace):
|
||||
await bot_module.bot(runner_args)
|
||||
|
||||
|
||||
async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):
|
||||
"""Run a bot for plain WebSocket transport."""
|
||||
bot_module = _get_bot_module()
|
||||
|
||||
runner_args = WebSocketRunnerArguments(
|
||||
websocket=websocket,
|
||||
transport_type="websocket",
|
||||
session_id=str(uuid.uuid4()),
|
||||
)
|
||||
runner_args.cli_args = args
|
||||
|
||||
await bot_module.bot(runner_args)
|
||||
|
||||
|
||||
def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
|
||||
"""Set up the plain WebSocket route at ``/ws-client``."""
|
||||
if not _transport_routes_enabled("websocket"):
|
||||
return
|
||||
|
||||
@app.websocket("/ws-client")
|
||||
async def websocket_client_endpoint(websocket: WebSocket):
|
||||
"""Handle plain WebSocket connections (non-telephony)."""
|
||||
await websocket.accept()
|
||||
logger.debug("Plain WebSocket connection accepted")
|
||||
await _run_websocket_bot(websocket, args)
|
||||
|
||||
|
||||
def _configure_server_app(args: argparse.Namespace):
|
||||
"""Configure the module-level FastAPI app with transport-specific routes."""
|
||||
"""Configure the module-level FastAPI app with routes for all transports."""
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
@@ -196,17 +375,207 @@ def _configure_server_app(args: argparse.Namespace):
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# Set up transport-specific routes
|
||||
if args.transport == "webrtc":
|
||||
_setup_webrtc_routes(app, args)
|
||||
if args.whatsapp:
|
||||
_setup_whatsapp_routes(app, args)
|
||||
elif args.transport == "daily":
|
||||
_setup_daily_routes(app, args)
|
||||
elif args.transport in TELEPHONY_TRANSPORTS:
|
||||
_setup_telephony_routes(app, args)
|
||||
else:
|
||||
logger.warning(f"Unknown transport type: {args.transport}")
|
||||
# Shared session store: session_id -> body data. Used by the WebRTC /start
|
||||
# flow and the /sessions/{session_id}/... proxy routes.
|
||||
active_sessions: dict[str, dict[str, Any]] = {}
|
||||
|
||||
_setup_frontend_routes(app)
|
||||
_setup_webrtc_routes(app, args, active_sessions)
|
||||
_setup_daily_routes(app, args)
|
||||
_setup_telephony_routes(app, args)
|
||||
_setup_websocket_routes(app, args)
|
||||
_setup_unified_start_route(app, args, active_sessions)
|
||||
|
||||
if args.whatsapp:
|
||||
_setup_whatsapp_routes(app, args)
|
||||
|
||||
|
||||
def _setup_unified_start_route(
|
||||
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
|
||||
):
|
||||
"""Register the unified POST /start and GET /status endpoints.
|
||||
|
||||
Handles WebRTC, Daily, and telephony transport start flows. Clients specify
|
||||
which transport they want via the ``transport`` field in the request body.
|
||||
When ``-t`` was passed on the command line, requests for any other transport
|
||||
are rejected with HTTP 400.
|
||||
"""
|
||||
ALL_TRANSPORTS = ["webrtc", "daily", *TELEPHONY_TRANSPORTS, "websocket"]
|
||||
|
||||
@app.get("/status")
|
||||
async def status():
|
||||
"""Return the transports supported by this runner instance."""
|
||||
transports = [args.transport] if args.transport is not None else ALL_TRANSPORTS
|
||||
return {"status": "ready", "transports": transports}
|
||||
|
||||
class IceServer(TypedDict, total=False):
|
||||
urls: str | list[str]
|
||||
|
||||
class IceConfig(TypedDict):
|
||||
iceServers: list[IceServer]
|
||||
|
||||
class StartBotResult(TypedDict, total=False):
|
||||
sessionId: str
|
||||
iceConfig: IceConfig | None
|
||||
dailyRoom: str | None
|
||||
dailyToken: str | None
|
||||
wsUrl: str | None
|
||||
token: str | None
|
||||
|
||||
@app.post("/start")
|
||||
async def start_agent(request: Request):
|
||||
"""Start a bot session.
|
||||
|
||||
Accepts::
|
||||
|
||||
{
|
||||
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
|
||||
// "plivo" | "exotel" | "websocket"; default: the -t value if set, else "webrtc"
|
||||
|
||||
// WebRTC-specific
|
||||
"enableDefaultIceServers": false,
|
||||
"body": {...},
|
||||
|
||||
// Daily-specific
|
||||
"createDailyRoom": true,
|
||||
"dailyRoomProperties": {...},
|
||||
"dailyMeetingTokenProperties": {...},
|
||||
"body": {...}
|
||||
}
|
||||
"""
|
||||
try:
|
||||
request_data = await request.json()
|
||||
logger.debug(f"Received request: {request_data}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse request body: {e}")
|
||||
request_data = {}
|
||||
|
||||
# Determine transport: explicit field → legacy Daily hint → CLI default → webrtc
|
||||
transport = request_data.get("transport")
|
||||
if transport is None and request_data.get("createDailyRoom", False):
|
||||
transport = "daily"
|
||||
if transport is None:
|
||||
transport = args.transport or "webrtc"
|
||||
|
||||
# Enforce restriction when -t was explicitly set on the command line
|
||||
if args.transport is not None and transport != args.transport:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=(
|
||||
f"Transport '{transport}' is not allowed. "
|
||||
f"Server is configured for '{args.transport}' only (-t {args.transport})."
|
||||
),
|
||||
)
|
||||
|
||||
if not _transport_routes_enabled(transport):
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=(
|
||||
f"Transport '{transport}' is disabled in this runner environment. "
|
||||
"Check the startup banner for enabled transports."
|
||||
),
|
||||
)
|
||||
|
||||
if transport == "webrtc":
|
||||
# WebRTC: register the session; the bot starts when the WebRTC offer arrives.
|
||||
session_id = str(uuid.uuid4())
|
||||
active_sessions[session_id] = request_data.get("body", {})
|
||||
|
||||
result = StartBotResult(
|
||||
sessionId=session_id,
|
||||
)
|
||||
if request_data.get("enableDefaultIceServers"):
|
||||
result["iceConfig"] = IceConfig(
|
||||
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
|
||||
)
|
||||
return result
|
||||
|
||||
elif transport == "daily":
|
||||
create_daily_room = request_data.get("createDailyRoom", False)
|
||||
body = request_data.get("body", {})
|
||||
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
|
||||
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
|
||||
|
||||
bot_module = _get_bot_module()
|
||||
|
||||
existing_room_url = os.getenv("DAILY_ROOM_URL")
|
||||
session_id = str(uuid.uuid4())
|
||||
result: StartBotResult | None = None
|
||||
|
||||
if create_daily_room or existing_room_url:
|
||||
from pipecat.runner.daily import configure
|
||||
from pipecat.transports.daily.utils import (
|
||||
DailyMeetingTokenProperties,
|
||||
DailyRoomProperties,
|
||||
)
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
room_properties = None
|
||||
if daily_room_properties_dict:
|
||||
daily_room_properties_dict.setdefault(
|
||||
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
|
||||
)
|
||||
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
|
||||
try:
|
||||
room_properties = DailyRoomProperties(**daily_room_properties_dict)
|
||||
logger.debug(f"Using custom room properties: {room_properties}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse dailyRoomProperties: {e}")
|
||||
|
||||
token_properties = None
|
||||
if daily_token_properties_dict:
|
||||
try:
|
||||
token_properties = DailyMeetingTokenProperties(
|
||||
**daily_token_properties_dict
|
||||
)
|
||||
logger.debug(f"Using custom token properties: {token_properties}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
|
||||
|
||||
room_url, token = await configure(
|
||||
session,
|
||||
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
|
||||
room_properties=room_properties,
|
||||
token_properties=token_properties,
|
||||
)
|
||||
runner_args = DailyRunnerArguments(
|
||||
room_url=room_url, token=token, body=body, session_id=session_id
|
||||
)
|
||||
result = StartBotResult(
|
||||
dailyRoom=room_url,
|
||||
dailyToken=token,
|
||||
sessionId=session_id,
|
||||
)
|
||||
else:
|
||||
runner_args = RunnerArguments(body=body, session_id=session_id)
|
||||
|
||||
runner_args.cli_args = args
|
||||
asyncio.create_task(bot_module.bot(runner_args))
|
||||
return result
|
||||
|
||||
elif transport in TELEPHONY_TRANSPORTS:
|
||||
# Telephony: the bot starts when the provider connects to /ws.
|
||||
# Return the WebSocket URL so the caller knows where to point their provider.
|
||||
scheme = "wss" if args.host != "localhost" else "ws"
|
||||
return StartBotResult(
|
||||
wsUrl=f"{scheme}://{args.host}:{args.port}/ws",
|
||||
)
|
||||
|
||||
elif transport == "websocket":
|
||||
# Plain WebSocket: the bot starts when the client connects to /ws-client.
|
||||
scheme = "wss" if args.host != "localhost" else "ws"
|
||||
session_id = str(uuid.uuid4())
|
||||
return StartBotResult(
|
||||
wsUrl=f"{scheme}://{args.host}:{args.port}/ws-client",
|
||||
sessionId=session_id,
|
||||
token="mock_token",
|
||||
)
|
||||
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unknown transport '{transport}'.",
|
||||
)
|
||||
|
||||
|
||||
def _resolve_download_path(folder: str, filename: str) -> Path:
|
||||
@@ -220,11 +589,30 @@ def _resolve_download_path(folder: str, filename: str) -> Path:
|
||||
return file_path
|
||||
|
||||
|
||||
def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
|
||||
"""Set up WebRTC-specific routes."""
|
||||
def _setup_frontend_routes(app: FastAPI):
|
||||
"""Mount the prebuilt frontend UI and root redirect for all transports."""
|
||||
try:
|
||||
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
|
||||
from pipecat_ai_prebuilt.frontend import PipecatPrebuiltUI
|
||||
except ImportError as e:
|
||||
logger.error(f"Prebuilt frontend not available: {e}")
|
||||
return
|
||||
|
||||
app.mount("/client", PipecatPrebuiltUI)
|
||||
|
||||
@app.get("/", include_in_schema=False)
|
||||
async def root_redirect():
|
||||
"""Redirect root requests to client interface."""
|
||||
return RedirectResponse(url="/client/")
|
||||
|
||||
|
||||
def _setup_webrtc_routes(
|
||||
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
|
||||
):
|
||||
"""Set up WebRTC-specific routes."""
|
||||
if not _transport_routes_enabled("webrtc"):
|
||||
return
|
||||
|
||||
try:
|
||||
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
|
||||
from pipecat.transports.smallwebrtc.request_handler import (
|
||||
IceCandidate,
|
||||
@@ -233,30 +621,9 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
|
||||
SmallWebRTCRequestHandler,
|
||||
)
|
||||
except ImportError as e:
|
||||
logger.error(f"WebRTC transport dependencies not installed: {e}")
|
||||
logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
|
||||
return
|
||||
|
||||
class IceServer(TypedDict, total=False):
|
||||
urls: str | list[str]
|
||||
|
||||
class IceConfig(TypedDict):
|
||||
iceServers: list[IceServer]
|
||||
|
||||
class StartBotResult(TypedDict, total=False):
|
||||
sessionId: str
|
||||
iceConfig: IceConfig | None
|
||||
|
||||
# In-memory store of active sessions: session_id -> session info
|
||||
active_sessions: dict[str, dict[str, Any]] = {}
|
||||
|
||||
# Mount the frontend
|
||||
app.mount("/client", SmallWebRTCPrebuiltUI)
|
||||
|
||||
@app.get("/", include_in_schema=False)
|
||||
async def root_redirect():
|
||||
"""Redirect root requests to client interface."""
|
||||
return RedirectResponse(url="/client/")
|
||||
|
||||
@app.get("/files/{filename:path}")
|
||||
async def download_file(filename: str):
|
||||
"""Handle file downloads."""
|
||||
@@ -315,29 +682,6 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
|
||||
await small_webrtc_handler.handle_patch_request(request)
|
||||
return {"status": "success"}
|
||||
|
||||
@app.post("/start")
|
||||
async def rtvi_start(request: Request):
|
||||
"""Mimic Pipecat Cloud's /start endpoint."""
|
||||
# Parse the request body
|
||||
try:
|
||||
request_data = await request.json()
|
||||
logger.debug(f"Received request: {request_data}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse request body: {e}")
|
||||
request_data = {}
|
||||
|
||||
# Store session info immediately in memory, replicate the behavior expected on Pipecat Cloud
|
||||
session_id = str(uuid.uuid4())
|
||||
active_sessions[session_id] = request_data.get("body", {})
|
||||
|
||||
result: StartBotResult = {"sessionId": session_id}
|
||||
if request_data.get("enableDefaultIceServers"):
|
||||
result["iceConfig"] = IceConfig(
|
||||
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
@app.api_route(
|
||||
"/sessions/{session_id}/{path:path}",
|
||||
methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
|
||||
@@ -562,13 +906,13 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):
|
||||
|
||||
def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
|
||||
"""Set up Daily-specific routes."""
|
||||
if not _transport_routes_enabled("daily"):
|
||||
return
|
||||
|
||||
@app.get("/")
|
||||
@app.get("/daily")
|
||||
async def create_room_and_start_agent():
|
||||
"""Launch a Daily bot and redirect to room."""
|
||||
print("Starting bot with Daily transport and redirecting to Daily room")
|
||||
|
||||
import aiohttp
|
||||
logger.debug("Starting bot with Daily transport and redirecting to Daily room")
|
||||
|
||||
from pipecat.runner.daily import configure
|
||||
|
||||
@@ -584,105 +928,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
|
||||
asyncio.create_task(bot_module.bot(runner_args))
|
||||
return RedirectResponse(room_url)
|
||||
|
||||
@app.post("/start")
|
||||
async def start_agent(request: Request):
|
||||
"""Handler for /start endpoints.
|
||||
|
||||
Expects POST body like::
|
||||
{
|
||||
"createDailyRoom": true,
|
||||
"dailyRoomProperties": { "start_video_off": true },
|
||||
"dailyMeetingTokenProperties": { "is_owner": true, "user_name": "Bot" },
|
||||
"body": { "custom_data": "value" }
|
||||
}
|
||||
"""
|
||||
print("Starting bot with Daily transport")
|
||||
|
||||
# Parse the request body
|
||||
try:
|
||||
request_data = await request.json()
|
||||
logger.debug(f"Received request: {request_data}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse request body: {e}")
|
||||
request_data = {}
|
||||
|
||||
create_daily_room = request_data.get("createDailyRoom", False)
|
||||
body = request_data.get("body", {})
|
||||
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
|
||||
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
|
||||
|
||||
bot_module = _get_bot_module()
|
||||
|
||||
existing_room_url = os.getenv("DAILY_ROOM_URL")
|
||||
|
||||
session_id = str(uuid.uuid4())
|
||||
result = None
|
||||
|
||||
# Configure room if:
|
||||
# 1. Explicitly requested via createDailyRoom in payload
|
||||
# 2. Using pre-configured room from DAILY_ROOM_URL env var
|
||||
if create_daily_room or existing_room_url:
|
||||
import aiohttp
|
||||
|
||||
from pipecat.runner.daily import configure
|
||||
from pipecat.transports.daily.utils import (
|
||||
DailyMeetingTokenProperties,
|
||||
DailyRoomProperties,
|
||||
)
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
# Parse dailyRoomProperties if provided
|
||||
room_properties = None
|
||||
if daily_room_properties_dict:
|
||||
# Apply Pipecat Cloud's session policy if caller didn't override.
|
||||
daily_room_properties_dict.setdefault(
|
||||
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
|
||||
)
|
||||
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
|
||||
try:
|
||||
room_properties = DailyRoomProperties(**daily_room_properties_dict)
|
||||
logger.debug(f"Using custom room properties: {room_properties}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse dailyRoomProperties: {e}")
|
||||
# Continue without custom properties
|
||||
|
||||
# Parse dailyMeetingTokenProperties if provided
|
||||
token_properties = None
|
||||
if daily_token_properties_dict:
|
||||
try:
|
||||
token_properties = DailyMeetingTokenProperties(
|
||||
**daily_token_properties_dict
|
||||
)
|
||||
logger.debug(f"Using custom token properties: {token_properties}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
|
||||
# Continue without custom properties
|
||||
|
||||
room_url, token = await configure(
|
||||
session,
|
||||
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
|
||||
room_properties=room_properties,
|
||||
token_properties=token_properties,
|
||||
)
|
||||
runner_args = DailyRunnerArguments(
|
||||
room_url=room_url, token=token, body=body, session_id=session_id
|
||||
)
|
||||
result = {
|
||||
"dailyRoom": room_url,
|
||||
"dailyToken": token,
|
||||
"sessionId": session_id,
|
||||
}
|
||||
else:
|
||||
runner_args = RunnerArguments(body=body, session_id=session_id)
|
||||
|
||||
# Update CLI args.
|
||||
runner_args.cli_args = args
|
||||
|
||||
# Start the bot in the background
|
||||
asyncio.create_task(bot_module.bot(runner_args))
|
||||
|
||||
return result
|
||||
|
||||
if args.dialin:
|
||||
|
||||
@app.post("/daily-dialin-webhook")
|
||||
@@ -731,8 +976,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
|
||||
detail="Missing required fields: From, To, callId, callDomain",
|
||||
)
|
||||
|
||||
import aiohttp
|
||||
|
||||
from pipecat.runner.daily import configure
|
||||
from pipecat.runner.types import DailyDialinRequest, DialinSettings
|
||||
|
||||
@@ -801,44 +1044,54 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
|
||||
|
||||
|
||||
def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
|
||||
"""Set up telephony-specific routes."""
|
||||
# XML response templates (Exotel doesn't use XML webhooks)
|
||||
XML_TEMPLATES = {
|
||||
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
"""Set up telephony-specific routes.
|
||||
|
||||
The WebSocket endpoint (``/ws``) is always registered so providers can
|
||||
connect directly. The XML webhook (``POST /``) is only registered when a
|
||||
specific telephony transport is chosen via ``-t`` because the XML template
|
||||
is provider-specific and requires a proxy hostname (``--proxy``).
|
||||
"""
|
||||
if not _transport_routes_enabled("telephony"):
|
||||
return
|
||||
|
||||
if args.transport in TELEPHONY_TRANSPORTS:
|
||||
# XML response templates (Exotel doesn't use XML webhooks)
|
||||
XML_TEMPLATES = {
|
||||
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
<Response>
|
||||
<Connect>
|
||||
<Stream url="wss://{args.proxy}/ws"></Stream>
|
||||
</Connect>
|
||||
<Pause length="40"/>
|
||||
</Response>""",
|
||||
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
<Response>
|
||||
<Connect>
|
||||
<Stream url="wss://{args.proxy}/ws" bidirectionalMode="rtp"></Stream>
|
||||
</Connect>
|
||||
<Pause length="40"/>
|
||||
</Response>""",
|
||||
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
|
||||
<Response>
|
||||
<Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">wss://{args.proxy}/ws</Stream>
|
||||
</Response>""",
|
||||
}
|
||||
}
|
||||
|
||||
@app.post("/")
|
||||
async def start_call():
|
||||
"""Handle telephony webhook and return XML response."""
|
||||
if args.transport == "exotel":
|
||||
# Exotel doesn't use POST webhooks - redirect to proper documentation
|
||||
logger.debug("POST Exotel endpoint - not used")
|
||||
return {
|
||||
"error": "Exotel doesn't use POST webhooks",
|
||||
"websocket_url": f"wss://{args.proxy}/ws",
|
||||
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
|
||||
}
|
||||
else:
|
||||
logger.debug(f"POST {args.transport.upper()} XML")
|
||||
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
|
||||
return HTMLResponse(content=xml_content, media_type="application/xml")
|
||||
@app.post("/")
|
||||
async def start_call():
|
||||
"""Handle telephony webhook and return XML response."""
|
||||
if args.transport == "exotel":
|
||||
# Exotel doesn't use POST webhooks - redirect to proper documentation
|
||||
logger.debug("POST Exotel endpoint - not used")
|
||||
return {
|
||||
"error": "Exotel doesn't use POST webhooks",
|
||||
"websocket_url": f"wss://{args.proxy}/ws",
|
||||
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
|
||||
}
|
||||
else:
|
||||
logger.debug(f"POST {args.transport.upper()} XML")
|
||||
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
|
||||
return HTMLResponse(content=xml_content, media_type="application/xml")
|
||||
|
||||
@app.websocket("/ws")
|
||||
async def websocket_endpoint(websocket: WebSocket):
|
||||
@@ -847,11 +1100,6 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
|
||||
logger.debug("WebSocket connection accepted")
|
||||
await _run_telephony_bot(websocket, args)
|
||||
|
||||
@app.get("/")
|
||||
async def start_agent():
|
||||
"""Simple status endpoint for telephony transports."""
|
||||
return {"status": f"Bot started with {args.transport}"}
|
||||
|
||||
|
||||
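To see what a provider receives from the `POST /` webhook, the Twilio template can be rendered and parsed outside the server. This is a sketch only: `twilio_stream_xml` is a hypothetical function name and `example-proxy.test` is a placeholder for the `--proxy` hostname; the XML body is copied from the template in the diff.

```python
from xml.etree import ElementTree


def twilio_stream_xml(proxy: str) -> str:
    """Render the Twilio webhook response for a given public proxy hostname."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://{proxy}/ws"></Stream>
  </Connect>
  <Pause length="40"/>
</Response>"""


# Placeholder hostname; in the runner this comes from -x/--proxy.
xml_body = twilio_stream_xml("example-proxy.test")

# Parse the bytes form so the encoding declaration is honoured.
root = ElementTree.fromstring(xml_body.encode("utf-8"))
stream = root.find("./Connect/Stream")
pause = root.find("./Pause")
print(stream.get("url"))
print(pause.get("length"))
```

The `Stream` URL always points at `/ws`, which is why the WebSocket endpoint is registered even when the XML webhook is not.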
async def _run_daily_direct(args: argparse.Namespace):
|
||||
"""Run Daily bot with direct connection (no FastAPI server)."""
|
||||
@@ -883,6 +1131,25 @@ async def _run_daily_direct(args: argparse.Namespace):
|
||||
await bot_module.bot(runner_args)
|
||||
|
||||
|
||||
async def _run_vonage():
|
||||
"""Run Vonage bot (no FastAPI server)."""
|
||||
logger.info("Running Vonage transport...")
|
||||
|
||||
application_id, session_id, token = await configure_vonage()
|
||||
runner_args = VonageRunnerArguments(
|
||||
application_id=application_id, vonage_session_id=session_id, token=token
|
||||
)
|
||||
runner_args.handle_sigint = True
|
||||
|
||||
# Get the bot module and run it directly
|
||||
bot_module = _get_bot_module()
|
||||
|
||||
print(f"Joining Vonage session: {runner_args.vonage_session_id}")
|
||||
print()
|
||||
|
||||
await bot_module.bot(runner_args)
|
||||
|
||||
|
||||
def _validate_and_clean_proxy(proxy: str) -> str:
|
||||
"""Validate and clean proxy hostname, removing protocol if present."""
|
||||
if not proxy:
|
||||
@@ -922,22 +1189,27 @@ def runner_port() -> int:
|
||||
def main(parser: argparse.ArgumentParser | None = None):
|
||||
"""Start the Pipecat development runner.
|
||||
|
||||
Parses command-line arguments and starts a FastAPI server configured
|
||||
for the specified transport type.
|
||||
Parses command-line arguments and starts a FastAPI server that supports
|
||||
WebRTC, Daily, and telephony transports simultaneously. Clients declare
|
||||
which transport to use via the ``transport`` field in the ``/start`` body.
|
||||
|
||||
When ``-t`` is provided, the server restricts ``/start`` to that transport
|
||||
only and displays transport-specific startup information.
|
||||
|
||||
The runner discovers and runs any ``bot(runner_args)`` function found in the
|
||||
calling module.
|
||||
|
||||
Command-line arguments:
|
||||
- --host: Server host address (default: localhost) 879
|
||||
- --host: Server host address (default: localhost)
|
||||
- --port: Server port (default: 7860)
|
||||
- -t/--transport: Transport type (daily, webrtc, twilio, telnyx, plivo, exotel)
|
||||
- -t/--transport: Restrict to a single transport and set as default for /start
|
||||
(daily, vonage, webrtc, twilio, telnyx, plivo, exotel). Omit to support all transports.
|
||||
- -x/--proxy: Public proxy hostname for telephony webhooks
|
||||
- -d/--direct: Connect directly to Daily room (automatically sets transport to daily)
|
||||
- -f/--folder: Path to downloads folder
|
||||
- --dialin: Enable Daily PSTN dial-in webhook handling (requires Daily transport)
|
||||
- --dialin: Enable Daily PSTN dial-in webhook handling
|
||||
- --esp32: Enable SDP munging for ESP32 compatibility (requires --host with IP address)
|
||||
- --whatsapp: Ensure requried WhatsApp environment variables are present
|
||||
- --whatsapp: Ensure required WhatsApp environment variables are present
|
||||
- -v/--verbose: Increase logging verbosity
|
||||
|
||||
Args:
|
||||
@@ -957,9 +1229,12 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
"-t",
|
||||
"--transport",
|
||||
type=str,
|
||||
choices=["daily", "webrtc", *TELEPHONY_TRANSPORTS],
|
||||
default="webrtc",
|
||||
help="Transport type",
|
||||
choices=["daily", "vonage", "webrtc", *TELEPHONY_TRANSPORTS],
|
||||
default=None,
|
||||
help=(
|
||||
"Restrict the server to a single transport and set it as the default for /start. "
|
||||
"Omit to support all transports simultaneously (default behaviour)."
|
||||
),
|
||||
)
|
||||
parser.add_argument("-x", "--proxy", help="Public proxy host name")
|
||||
parser.add_argument(
|
||||
@@ -977,7 +1252,7 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
"--dialin",
|
||||
action="store_true",
|
||||
default=False,
|
||||
help="Enable Daily PSTN dial-in webhook handling (requires Daily transport)",
|
||||
help="Enable Daily PSTN dial-in webhook handling",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--esp32",
|
||||
@@ -989,7 +1264,7 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
"--whatsapp",
|
||||
action="store_true",
|
||||
default=False,
|
||||
help="Ensure requried WhatsApp environment variables are present",
|
||||
help="Ensure required WhatsApp environment variables are present",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
@@ -998,12 +1273,13 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
if args.proxy:
|
||||
args.proxy = _validate_and_clean_proxy(args.proxy)
|
||||
|
||||
# Auto-set transport to daily if --direct is used without explicit transport
|
||||
if args.direct and args.transport == "webrtc": # webrtc is the default
|
||||
args.transport = "daily"
|
||||
elif args.direct and args.transport != "daily":
|
||||
logger.error("--direct flag only works with Daily transport (-t daily)")
|
||||
return
|
||||
# --direct implies Daily transport
|
||||
if args.direct:
|
||||
if args.transport is None or args.transport == "daily":
|
||||
args.transport = "daily"
|
||||
else:
|
||||
logger.error("--direct flag only works with Daily transport (-t daily)")
|
||||
return
|
||||
|
||||
# Validate ESP32 requirements
|
||||
if args.esp32 and args.host == "localhost":
|
||||
@@ -1011,7 +1287,7 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
return
|
||||
|
||||
# Validate dial-in requirements
|
||||
if args.dialin and args.transport != "daily":
|
||||
if args.dialin and args.transport is not None and args.transport != "daily":
|
||||
logger.error("--dialin flag only works with Daily transport (-t daily)")
|
||||
return
|
||||
|
||||
@@ -1029,28 +1305,12 @@ def main(parser: argparse.ArgumentParser | None = None):
|
||||
asyncio.run(_run_daily_direct(args))
|
||||
return
|
||||
|
||||
# Print startup message for server-based transports
|
||||
if args.transport == "webrtc":
|
||||
print()
|
||||
if args.esp32:
|
||||
print(f"🚀 Bot ready! (ESP32 mode)")
|
||||
elif args.whatsapp:
|
||||
print(f"🚀 Bot ready! (WhatsApp)")
|
||||
else:
|
||||
print(f"🚀 Bot ready!")
|
||||
print(f" → Open http://{args.host}:{args.port}/client in your browser")
|
||||
print()
|
||||
elif args.transport == "daily":
|
||||
print()
|
||||
print(f"🚀 Bot ready!")
|
||||
if args.dialin:
|
||||
print(
|
||||
f" → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
|
||||
)
|
||||
print(f" → Configure this URL in your Daily phone number settings")
|
||||
else:
|
||||
print(f" → Open http://{args.host}:{args.port} in your browser to start a session")
|
||||
# Print startup message
|
||||
_print_startup_message(args)
|
||||
if args.transport == "vonage":
|
||||
asyncio.run(_run_vonage())
|
||||
print()
|
||||
return
|
||||
|
||||
RUNNER_DOWNLOADS_FOLDER = args.folder
|
||||
RUNNER_HOST = args.host
|
||||
|
||||
@@ -99,16 +99,35 @@ class DailyRunnerArguments(RunnerArguments):
|
||||
token: str | None = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class VonageRunnerArguments(RunnerArguments):
|
||||
"""Vonage transport session arguments for the runner.
|
||||
|
||||
Parameters:
|
||||
application_id: Vonage application ID
|
||||
vonage_session_id: Vonage session ID
|
||||
token: Vonage Session Token
|
||||
"""
|
||||
|
||||
application_id: str
|
||||
vonage_session_id: str
|
||||
token: str
|
||||
|
||||
|
||||
@dataclass
|
||||
class WebSocketRunnerArguments(RunnerArguments):
|
||||
"""WebSocket transport session arguments for the runner.
|
||||
|
||||
Parameters:
|
||||
websocket: WebSocket connection for audio streaming
|
||||
transport_type: Transport type identifier. Set to ``"websocket"`` for plain
|
||||
WebSocket connections; ``None`` triggers auto-detection from the first
|
||||
telephony provider message.
|
||||
body: Additional request data
|
||||
"""
|
||||
|
||||
websocket: WebSocket
|
||||
transport_type: str | None = None
|
||||
|
||||
|
||||
@dataclass
|
||||
|
||||
@@ -33,7 +33,7 @@ import json
|
||||
import os
|
||||
import re
|
||||
from collections.abc import Callable
|
||||
from typing import Any
|
||||
from typing import Any, cast
|
||||
|
||||
from fastapi import WebSocket
|
||||
from loguru import logger
|
||||
@@ -42,9 +42,10 @@ from pipecat.runner.types import (
|
||||
DailyRunnerArguments,
|
||||
LiveKitRunnerArguments,
|
||||
SmallWebRTCRunnerArguments,
|
||||
VonageRunnerArguments,
|
||||
WebSocketRunnerArguments,
|
||||
)
|
||||
from pipecat.transports.base_transport import BaseTransport
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
|
||||
|
||||
def _detect_transport_type_from_message(message_data: dict) -> str:
|
||||
@@ -271,6 +272,14 @@ def get_transport_client_id(transport: BaseTransport, client: Any) -> str:
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
try:
|
||||
from pipecat.transports.vonage.video_connector import VonageVideoConnectorTransport
|
||||
|
||||
if isinstance(transport, VonageVideoConnectorTransport):
|
||||
return client["streamId"]
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
logger.warning(f"Unable to get client id from unsupported transport {type(transport)}")
|
||||
return ""
|
||||
|
||||
@@ -303,6 +312,24 @@ async def maybe_capture_participant_camera(
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
try:
|
||||
from pipecat.transports.vonage.video_connector import (
|
||||
SubscribeSettings,
|
||||
VonageVideoConnectorTransport,
|
||||
)
|
||||
|
||||
if isinstance(transport, VonageVideoConnectorTransport):
|
||||
await transport.subscribe_to_stream(
|
||||
client["streamId"],
|
||||
SubscribeSettings(
|
||||
subscribe_to_audio=True,
|
||||
subscribe_to_video=True,
|
||||
preferred_framerate=framerate if framerate != 0 else None,
|
||||
),
|
||||
)
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
|
||||
async def maybe_capture_participant_screen(
|
||||
transport: BaseTransport, client: Any, framerate: int = 0
|
||||
@@ -534,6 +561,10 @@ async def create_transport(
|
||||
audio_out_enabled=True,
|
||||
# add_wav_header and serializer will be set automatically
|
||||
),
|
||||
"vonage": lambda: VonageVideoConnectorTransportParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True
|
||||
),
|
||||
}
|
||||
|
||||
transport = await create_transport(runner_args, transport_params)
|
||||
@@ -562,6 +593,12 @@ async def create_transport(
|
||||
)
|
||||
|
||||
elif isinstance(runner_args, WebSocketRunnerArguments):
|
||||
if runner_args.transport_type == "websocket":
|
||||
params = _get_transport_params("websocket", transport_params)
|
||||
from pipecat.transports.websocket.fastapi import FastAPIWebsocketTransport
|
||||
|
||||
return FastAPIWebsocketTransport(websocket=runner_args.websocket, params=params)
|
||||
|
||||
# Parse once to determine the provider and get data
|
||||
transport_type, call_data = await parse_telephony_websocket(runner_args.websocket)
|
||||
params = _get_transport_params(transport_type, transport_params)
|
||||
@@ -581,6 +618,31 @@ async def create_transport(
|
||||
runner_args.room_name,
|
||||
params=params,
|
||||
)
|
||||
elif isinstance(runner_args, VonageRunnerArguments):
|
||||
from pipecat.transports.vonage.video_connector import (
|
||||
VonageVideoConnectorTransport,
|
||||
VonageVideoConnectorTransportParams,
|
||||
)
|
||||
|
||||
try:
|
||||
params = cast(
|
||||
VonageVideoConnectorTransportParams,
|
||||
_get_transport_params("vonage", transport_params),
|
||||
)
|
||||
except ValueError:
|
||||
webrtc_params: TransportParams = cast(
|
||||
TransportParams, _get_transport_params("webrtc", transport_params)
|
||||
)
|
||||
params = VonageVideoConnectorTransportParams(
|
||||
**webrtc_params.model_dump(),
|
||||
video_in_auto_subscribe=True,
|
||||
)
|
||||
|
||||
return VonageVideoConnectorTransport(
|
||||
runner_args.application_id,
|
||||
runner_args.vonage_session_id,
|
||||
runner_args.token,
|
||||
params=params,
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unsupported runner arguments type: {type(runner_args)}")
|
||||
|
||||
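The Vonage branch of `create_transport` falls back to the caller's `"webrtc"` parameters when no `"vonage"` entry was supplied. The sketch below reproduces that pattern with plain dataclasses; `BaseParams`, `VonageParams`, `get_params`, and `vonage_params` are illustrative stand-ins, not pipecat classes, and `dataclasses.asdict` plays the role of pydantic's `model_dump()`.

```python
from collections.abc import Callable
from dataclasses import asdict, dataclass


@dataclass
class BaseParams:
    """Stand-in for TransportParams."""
    audio_in_enabled: bool = False
    audio_out_enabled: bool = False


@dataclass
class VonageParams(BaseParams):
    """Stand-in for VonageVideoConnectorTransportParams."""
    video_in_auto_subscribe: bool = False


def get_params(key: str, factories: dict[str, Callable[[], BaseParams]]) -> BaseParams:
    # Raises ValueError for a missing key, which is what the fallback relies on.
    if key not in factories:
        raise ValueError(f"No transport params for '{key}'")
    return factories[key]()


def vonage_params(factories: dict[str, Callable[[], BaseParams]]) -> BaseParams:
    try:
        return get_params("vonage", factories)
    except ValueError:
        # Reuse the WebRTC settings and turn on automatic video subscription.
        base = get_params("webrtc", factories)
        return VonageParams(**asdict(base), video_in_auto_subscribe=True)


params = vonage_params(
    {"webrtc": lambda: BaseParams(audio_in_enabled=True, audio_out_enabled=True)}
)
print(params)
```

If neither key is present, the second `get_params` call raises `ValueError` and the error propagates to the caller.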
52
src/pipecat/runner/vonage.py
Normal file
@@ -0,0 +1,52 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

"""Vonage session configuration utilities.

This module extracts the necessary parameters to connect to a Vonage Video session.

Required environment variables:

- VONAGE_APPLICATION_ID - Vonage application ID
- VONAGE_SESSION_ID - Vonage session ID
- VONAGE_TOKEN - Vonage token

Example:
    from pipecat.runner.vonage import configure

    application_id, session_id, token = await configure()
"""

import os


async def configure() -> tuple[str, str, str]:
    """Configure Vonage application ID, session ID and token from environment.

    Returns:
        Tuple containing the application_id, session_id and token.

    Raises:
        Exception: If required Vonage configuration is not provided.
    """
    application_id = os.getenv("VONAGE_APPLICATION_ID")
    session_id = os.getenv("VONAGE_SESSION_ID")
    token = os.getenv("VONAGE_TOKEN")

    if not application_id:
        raise Exception(
            "No Vonage application ID specified. Set VONAGE_APPLICATION_ID in your environment."
        )

    if not session_id:
        raise Exception(
            "No Vonage session ID specified. Set VONAGE_SESSION_ID in your environment."
        )

    if not token:
        raise Exception("No Vonage token specified. Set VONAGE_TOKEN in your environment.")

    return (application_id, session_id, token)
|
||||
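A caller would presumably run `configure()` once at startup and hand the three values to the transport. The sketch below is a self-contained stand-in rather than the module itself: `load_vonage_config` and `demo` are hypothetical names that mirror the environment lookup without importing pipecat, and the credential values are placeholders.

```python
import asyncio
import os

REQUIRED_VARS = ("VONAGE_APPLICATION_ID", "VONAGE_SESSION_ID", "VONAGE_TOKEN")


async def load_vonage_config() -> tuple[str, str, str]:
    """Hypothetical stand-in for configure(): read the three required variables."""
    values = []
    for name in REQUIRED_VARS:
        value = os.getenv(name)
        if not value:
            # Fail early and name the missing variable.
            raise Exception(f"{name} is not set in your environment.")
        values.append(value)
    return (values[0], values[1], values[2])


def demo() -> tuple[str, str, str]:
    # Placeholder values for illustration only, not real credentials.
    os.environ["VONAGE_APPLICATION_ID"] = "app-123"
    os.environ["VONAGE_SESSION_ID"] = "session-456"
    os.environ["VONAGE_TOKEN"] = "token-789"
    return asyncio.run(load_vonage_config())
```

Calling `demo()` returns the tuple in the same order as `configure()`: application ID, session ID, token.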
@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
|
||||
await self._call_event_handler("on_connected")
|
||||
logger.debug(f"{self} Connected to AssemblyAI WebSocket")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
self._connected = False
|
||||
await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
|
||||
raise
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
"""Close the websocket connection to AssemblyAI."""
|
||||
|
||||
@@ -339,10 +339,10 @@ class AWSTranscribeSTTService(WebsocketSTTService):
|
||||
await self._call_event_handler("on_connected")
|
||||
logger.info(f"{self} Successfully connected to AWS Transcribe")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
await self.push_error(
|
||||
error_msg=f"Unable to connect to AWS Transcribe: {e}", exception=e
|
||||
)
|
||||
raise
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
"""Close the websocket connection to AWS Transcribe."""
|
||||
|
||||
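Several STT hunks in this comparison apply the same change: clear `self._websocket` before reporting a connection failure, so later "is connected" checks do not see a stale handle. A minimal sketch of that pattern follows, assuming a hypothetical `SketchSTTConnection` class and helper coroutines that are not part of pipecat.

```python
import asyncio


class SketchSTTConnection:
    """Hypothetical stand-in for a websocket-based STT service."""

    def __init__(self, connector, on_connected):
        self._connector = connector
        self._on_connected = on_connected
        self._websocket = None
        self.errors: list[str] = []

    async def push_error(self, error_msg: str, exception: Exception) -> None:
        # The real service pushes an error frame; here we only record the message.
        self.errors.append(error_msg)

    async def connect(self) -> None:
        try:
            self._websocket = await self._connector()
            await self._on_connected()
        except Exception as e:
            # Drop the half-initialized handle before reporting the failure.
            self._websocket = None
            await self.push_error(error_msg=f"Unable to connect: {e}", exception=e)
            raise


async def open_socket():
    return object()  # placeholder for a real websocket handle


async def failing_handler():
    raise RuntimeError("handshake rejected")
```

With `failing_handler`, `connect()` re-raises the error, leaves `_websocket` as `None`, and records one error message.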
@@ -540,14 +540,25 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
|
||||
self._last_timestamp = timestamp
|
||||
|
||||
async def _word_processor_task_handler(self):
|
||||
"""Process word timestamps from the queue and call add_word_timestamps."""
|
||||
"""Process word timestamps from the queue and call add_word_timestamps.
|
||||
|
||||
Also handles a None sentinel from _handle_completed: once all pending
|
||||
words have been drained, it signals audio stream completion via
|
||||
_audio_queue so that run_tts exits only after the last word has been
|
||||
processed.
|
||||
"""
|
||||
while True:
|
||||
try:
|
||||
word, timestamp_seconds = await self._word_boundary_queue.get()
|
||||
if self._current_context_id:
|
||||
await self.add_word_timestamps(
|
||||
[(word, timestamp_seconds)], self._current_context_id
|
||||
)
|
||||
item = await self._word_boundary_queue.get()
|
||||
if item is None:
|
||||
# All words drained — now signal audio completion.
|
||||
self._audio_queue.put_nowait(None)
|
||||
else:
|
||||
word, timestamp_seconds = item
|
||||
if self._current_context_id:
|
||||
await self.add_word_timestamps(
|
||||
[(word, timestamp_seconds)], self._current_context_id
|
||||
)
|
||||
self._word_boundary_queue.task_done()
|
||||
except asyncio.CancelledError:
|
||||
break
|
||||
@@ -569,17 +580,21 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
|
||||
Args:
|
||||
evt: Completion event from Azure Speech SDK.
|
||||
"""
|
||||
# Store duration for cumulative offset calculation
|
||||
if evt.result and evt.result.audio_duration:
|
||||
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
|
||||
|
||||
# Flush any pending word before completing
|
||||
if self._last_word is not None:
|
||||
self._word_boundary_queue.put_nowait((self._last_word, self._last_timestamp))
|
||||
self._last_word = None
|
||||
self._last_timestamp = None
|
||||
|
||||
# Store duration for cumulative offset calculation
|
||||
if evt.result and evt.result.audio_duration:
|
||||
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
|
||||
|
||||
self._audio_queue.put_nowait(None) # Signal completion
|
||||
# Route completion through the word boundary queue so the word processor
|
||||
# task drains all pending words before signaling audio stream completion.
|
||||
# Without this, the last word's TTSTextFrame may arrive after
|
||||
# TTSStoppedFrame, causing it to be missed by observers and the UI.
|
||||
self._word_boundary_queue.put_nowait(None)
|
||||
|
||||
def _handle_canceled(self, evt):
|
||||
"""Handle synthesis cancellation.
|
||||
|
||||
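The Azure change routes the completion signal through the same queue as the words, so ordering is guaranteed by the queue itself. The sketch below illustrates that idea with plain `asyncio.Queue` objects; `word_processor` and `run_demo` are hypothetical names, and unlike the service's task this worker exits after the sentinel so the demo terminates.

```python
import asyncio


async def word_processor(word_queue, audio_queue, emitted):
    """Drain words in order; forward completion only after the last word."""
    while True:
        item = await word_queue.get()
        if item is None:
            # Every word queued before the sentinel has already been handled.
            audio_queue.put_nowait(None)
            word_queue.task_done()
            break
        word, timestamp = item
        emitted.append(("word", word, timestamp))
        word_queue.task_done()


async def run_demo():
    word_queue = asyncio.Queue()
    audio_queue = asyncio.Queue()
    emitted = []
    task = asyncio.create_task(word_processor(word_queue, audio_queue, emitted))

    # Synthesis callbacks enqueue words, then the completion sentinel.
    word_queue.put_nowait(("hello", 0.0))
    word_queue.put_nowait(("world", 0.4))
    word_queue.put_nowait(None)

    # The audio consumer sees completion only once both words were processed.
    await audio_queue.get()
    emitted.append(("stopped",))
    await task
    return emitted
```

Running `asyncio.run(run_demo())` yields both word entries before the `("stopped",)` marker, which is the ordering the hunk is trying to enforce.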
@@ -354,7 +354,8 @@ class CartesiaSTTService(WebsocketSTTService):
|
||||
self._websocket = await websocket_connect(ws_url, additional_headers=headers)
|
||||
await self._call_event_handler("on_connected")
|
||||
except Exception as e:
|
||||
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
|
||||
self._websocket = None
|
||||
await self.push_error(error_msg=f"Unable to connect to Cartesia: {e}", exception=e)
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
ws = self._websocket
|
||||
|
||||
@@ -8,6 +8,7 @@
|
||||
|
||||
import base64
|
||||
import json
|
||||
import re
|
||||
from collections.abc import AsyncGenerator
|
||||
from dataclasses import dataclass, field
|
||||
from enum import StrEnum
|
||||
@@ -431,10 +432,20 @@ class CartesiaTTSService(WebsocketTTSService):
|
||||
base_lang = language.split("-")[0].lower()
|
||||
return base_lang in {"zh", "ja"}
|
||||
|
||||
def _process_word_timestamps_for_language(
|
||||
_CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)
|
||||
|
||||
def _strip_cartesia_tags(self, text: str) -> str:
|
||||
text = self._CARTESIA_TAG_RE.sub(" ", text)
|
||||
text = re.sub(r"\s+", " ", text)
|
||||
return text.strip()
|
||||
|
||||
def _normalize_word_timestamps(
|
||||
self, words: list[str], starts: list[float]
|
||||
) -> list[tuple[str, float]]:
|
||||
"""Process word timestamps based on the current language.
|
||||
"""Normalize raw word timestamps from Cartesia before further processing.
|
||||
|
||||
Strips Cartesia SSML tags (spell, emotion, break, volume, speed) from each word
|
||||
and drops entries that become empty after stripping.
|
||||
|
||||
For Chinese and Japanese, Cartesia groups related characters in the same timestamp
|
||||
message.
|
||||
@@ -458,14 +469,18 @@ class CartesiaTTSService(WebsocketTTSService):
|
||||
# For Chinese/Japanese, combine all characters in this message into one word
|
||||
# using the first character's start time.
|
||||
if words and starts:
|
||||
combined_word = "".join(words)
|
||||
combined_word = "".join(self._strip_cartesia_tags(w) for w in words)
|
||||
first_start = starts[0]
|
||||
return [(combined_word, first_start)]
|
||||
return [(combined_word, first_start)] if combined_word else []
|
||||
else:
|
||||
return []
|
||||
else:
|
||||
# For non-CJK languages, use as-is
|
||||
return list(zip(words, starts))
|
||||
result = []
|
||||
for word, start in zip(words, starts):
|
||||
cleaned = self._strip_cartesia_tags(word)
|
||||
if cleaned:
|
||||
result.append((cleaned, start))
|
||||
return result
|
||||
|
||||
def _word_timestamps_include_inter_frame_spaces(self) -> bool:
|
||||
"""Whether timestamp text should be treated as carrying its own spacing."""
|
||||
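To see what the tag stripping does to a timestamp message, the following sketch reuses the regular expression from the hunk above in standalone functions. The function names and the `is_cjk` flag are assumptions made for illustration; the service decides CJK handling from its configured language.

```python
import re

# Same pattern as _CARTESIA_TAG_RE in the hunk above.
CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)


def strip_cartesia_tags(text: str) -> str:
    text = CARTESIA_TAG_RE.sub(" ", text)  # replace each tag with a space
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip()


def normalize_word_timestamps(
    words: list[str], starts: list[float], is_cjk: bool
) -> list[tuple[str, float]]:
    if is_cjk:
        # Combine all characters into one word at the first start time.
        if words and starts:
            combined = "".join(strip_cartesia_tags(w) for w in words)
            return [(combined, starts[0])] if combined else []
        return []
    result = []
    for word, start in zip(words, starts):
        cleaned = strip_cartesia_tags(word)
        if cleaned:  # drop entries that were only a tag
            result.append((cleaned, start))
    return result
```

For example, a `<break time='1s'/>` entry disappears entirely, while `<emotion value='happy'>Hello` is reduced to `Hello` and keeps its start time.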
@@ -662,7 +677,7 @@ class CartesiaTTSService(WebsocketTTSService):
|
||||
await self.remove_audio_context(ctx_id)
|
||||
elif msg["type"] == "timestamps":
|
||||
# Process the timestamps based on language before adding them
|
||||
processed_timestamps = self._process_word_timestamps_for_language(
|
||||
processed_timestamps = self._normalize_word_timestamps(
|
||||
msg["word_timestamps"]["words"], msg["word_timestamps"]["start"]
|
||||
)
|
||||
await self.add_word_timestamps(
|
||||
|
||||
@@ -358,7 +358,8 @@ class ElevenLabsSTTService(SegmentedSTTService):
|
||||
|
||||
# Add required model_id and language_code
|
||||
data.add_field("model_id", self._settings.model)
|
||||
data.add_field("language_code", self._settings.language)
|
||||
if self._settings.language:
|
||||
data.add_field("language_code", self._settings.language)
|
||||
if self._settings.tag_audio_events is not None:
|
||||
data.add_field("tag_audio_events", str(self._settings.tag_audio_events).lower())
|
||||
keyterms = self._settings.keyterms
|
||||
@@ -822,6 +823,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
|
||||
await self._call_event_handler("on_connected")
|
||||
logger.debug("Connected to ElevenLabs Realtime STT")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
await self.push_error(
|
||||
error_msg=f"Unable to connect to ElevenLabs Realtime STT: {e}", exception=e
|
||||
)
|
||||
|
||||
@@ -594,6 +594,10 @@ class ElevenLabsTTSService(WebsocketTTSService):
|
||||
self._partial_word_start_time = 0.0
|
||||
self._alignment_started_context_ids: set[str | None] = set()
|
||||
|
||||
# Context IDs whose context-init has been sent, so the keepalive knows
|
||||
# which contexts are safe to target.
|
||||
self._context_init_sent: set[str] = set()
|
||||
|
||||
# Context management for v1 multi API
|
||||
self._receive_task = None
|
||||
self._keepalive_task = None
|
||||
@@ -792,6 +796,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
|
||||
finally:
|
||||
await self.remove_active_audio_context()
|
||||
self._websocket = None
|
||||
self._context_init_sent.clear()
|
||||
await self._call_event_handler("on_disconnected")
|
||||
|
||||
def _get_websocket(self):
|
||||
@@ -822,6 +827,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
|
||||
self._partial_word = ""
|
||||
self._partial_word_start_time = 0.0
|
||||
self._alignment_started_context_ids.discard(context_id)
|
||||
self._context_init_sent.discard(context_id)
|
||||
|
||||
async def on_audio_context_interrupted(self, context_id: str):
|
||||
"""Close the ElevenLabs context when the bot is interrupted."""
|
||||
@@ -914,26 +920,35 @@ class ElevenLabsTTSService(WebsocketTTSService):
|
||||
while True:
|
||||
await asyncio.sleep(KEEPALIVE_SLEEP)
|
||||
try:
|
||||
if self._websocket and self._websocket.state is State.OPEN:
|
||||
context_id = self.get_active_audio_context_id()
|
||||
if context_id:
|
||||
# Send keepalive with context ID to keep the connection alive
|
||||
keepalive_message = {
|
||||
"text": "",
|
||||
"context_id": context_id,
|
||||
}
|
||||
logger.trace(f"Sending keepalive for context {context_id}")
|
||||
else:
|
||||
# It's possible to have a user interruption which clears the context
|
||||
# without generating a new TTS response. In this case, we'll just send
|
||||
# an empty message to keep the connection alive.
|
||||
keepalive_message = {"text": ""}
|
||||
logger.trace("Sending keepalive without context")
|
||||
await self._websocket.send(json.dumps(keepalive_message))
|
||||
await self._send_keepalive()
|
||||
except websockets.ConnectionClosed as e:
|
||||
logger.warning(f"{self} keepalive error: {e}")
|
||||
break
|
||||
|
||||
async def _send_keepalive(self):
|
||||
"""Send a single keepalive message to keep the WebSocket connection alive.
|
||||
|
||||
Only stamps a ``context_id`` once its context-init (carrying
|
||||
``voice_settings``) has been sent. Otherwise the keepalive would be the
|
||||
context's first message, with no ``voice_settings``, and ElevenLabs would
|
||||
reject the later context-init with a 1008 policy violation. A context-less
|
||||
keepalive is sufficient until the context-init is sent.
|
||||
"""
|
||||
if not self._websocket or self._websocket.state is not State.OPEN:
|
||||
return
|
||||
|
||||
context_id = self.get_active_audio_context_id()
|
||||
if context_id and context_id in self._context_init_sent:
|
||||
# The context's voice_settings context-init has been sent, so it's
|
||||
# safe to keep that context alive.
|
||||
keepalive_message = {"text": "", "context_id": context_id}
|
||||
else:
|
||||
# No active context, or the active context's context-init hasn't been
|
||||
# sent yet. A context-less keepalive keeps the connection alive without
|
||||
# opening the context prematurely.
|
||||
keepalive_message = {"text": ""}
|
||||
await self._websocket.send(json.dumps(keepalive_message))
|
||||
|
||||
async def _send_text(self, text: str, context_id: str):
|
||||
"""Send text to the WebSocket for synthesis."""
|
||||
if self._websocket and context_id:
|
||||
@@ -980,6 +995,9 @@ class ElevenLabsTTSService(WebsocketTTSService):
|
||||
locator.model_dump()
|
||||
for locator in self._pronunciation_dictionary_locators
|
||||
]
|
||||
# Mark the context-init as sent so the keepalive may now
|
||||
# target this context_id.
|
||||
self._context_init_sent.add(context_id)
|
||||
await self._websocket.send(json.dumps(msg))
|
||||
logger.trace(f"Created new context {context_id}")
|
||||
|
||||
|
||||
@@ -558,8 +558,9 @@ class GladiaSTTService(WebsocketSTTService):
|
||||
|
||||
logger.debug(f"{self} Connected to Gladia WebSocket")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
self._connection_active = False
|
||||
await self.push_error(error_msg=f"Unable to connect to Gladia: {e}", exception=e)
|
||||
raise
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
"""Close the websocket connection to Gladia."""
|
||||
|
||||
@@ -423,8 +423,8 @@ class GradiumSTTService(WebsocketSTTService):
|
||||
logger.debug("Connected to Gradium STT")
|
||||
|
||||
except Exception as e:
|
||||
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
|
||||
raise
|
||||
self._websocket = None
|
||||
await self.push_error(error_msg=f"Unable to connect to Gradium: {e}", exception=e)
|
||||
|
||||
async def _disconnect(self):
|
||||
await super()._disconnect()
|
||||
|
||||
0
src/pipecat/services/inception/__init__.py
Normal file
124
src/pipecat/services/inception/llm.py
Normal file
@@ -0,0 +1,124 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

"""Inception LLM service implementation using OpenAI-compatible interface."""

from dataclasses import dataclass, field
from typing import Literal

from loguru import logger

from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.settings import NOT_GIVEN as _NOT_GIVEN
from pipecat.services.settings import _NotGiven, is_given


@dataclass
class InceptionLLMSettings(BaseOpenAILLMService.Settings):
    """Settings for InceptionLLMService.

    Parameters:
        reasoning_effort: Controls how much reasoning the model applies.
            One of "instant", "low", "medium", or "high". When unset, the
            parameter is omitted and Inception's server-side default applies.
        realtime: When True, reduces time to first diffusion block (TTFT).
    """

    reasoning_effort: Literal["instant", "low", "medium", "high"] | None | _NotGiven = field(
        default_factory=lambda: _NOT_GIVEN
    )
    realtime: bool | None | _NotGiven = field(default_factory=lambda: _NOT_GIVEN)


class InceptionLLMService(OpenAILLMService):
    """A service for interacting with Inception's API using the OpenAI-compatible interface.

    This service extends OpenAILLMService to connect to Inception's API endpoint while
    maintaining full compatibility with OpenAI's interface and functionality.
    Supports Mercury-2, Inception's diffusion-based reasoning model.
    """

    # Inception doesn't support the "developer" message role.
    supports_developer_role = False

    Settings = InceptionLLMSettings
    _settings: Settings

    def __init__(
        self,
        *,
        api_key: str,
        base_url: str = "https://api.inceptionlabs.ai/v1",
        settings: Settings | None = None,
        **kwargs,
    ):
        """Initialize the Inception LLM service.

        Args:
            api_key: The API key for accessing Inception's API.
            base_url: The base URL for Inception API. Defaults to "https://api.inceptionlabs.ai/v1".
            settings: Runtime-updatable settings.
            **kwargs: Additional keyword arguments passed to OpenAILLMService.
        """
        default_settings = self.Settings(
            model="mercury-2",
            reasoning_effort=None,
            realtime=None,
        )

        if settings is not None:
            default_settings.apply_update(settings)

        super().__init__(api_key=api_key, base_url=base_url, settings=default_settings, **kwargs)

    def create_client(self, api_key=None, base_url=None, **kwargs):
        """Create OpenAI-compatible client for Inception API endpoint.

        Args:
            api_key: The API key for authentication. If None, uses instance default.
            base_url: The base URL for the API. If None, uses instance default.
            **kwargs: Additional keyword arguments for client configuration.

        Returns:
            An OpenAI-compatible client configured for Inception's API.
        """
        logger.debug(f"Creating Inception client with api {base_url}")
        return super().create_client(api_key, base_url, **kwargs)

    def build_chat_completion_params(self, params_from_context: OpenAILLMInvocationParams) -> dict:
        """Build parameters for Inception chat completion request.

        Extends the base OpenAI parameters with Inception-specific options
        such as reasoning_effort and realtime.

        Args:
            params_from_context: Parameters, derived from the LLM context, to
                use for the chat completion. Contains messages, tools, and tool
                choice.

        Returns:
            Dictionary of parameters for the chat completion request.
        """
        params = super().build_chat_completion_params(params_from_context)

        if (
            is_given(self._settings.reasoning_effort)
            and self._settings.reasoning_effort is not None
        ):
            params["reasoning_effort"] = self._settings.reasoning_effort

        # realtime is Inception-specific and unknown to the OpenAI SDK,
        # so it must be passed via extra_body to avoid validation errors.
        extra_body = {}
        if is_given(self._settings.realtime) and self._settings.realtime is not None:
            extra_body["realtime"] = self._settings.realtime

        if extra_body:
            params["extra_body"] = extra_body

        return params
|
||||
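The parameter-building logic of `build_chat_completion_params` can be tried without pipecat. The sketch below uses a locally defined sentinel in place of the library's `NOT_GIVEN` and a hypothetical `add_inception_params` function; it only illustrates how unset or `None` values are omitted and how `realtime` ends up under `extra_body`.

```python
class _NotGiven:
    """Hypothetical sentinel meaning the caller never set this field."""


NOT_GIVEN = _NotGiven()


def is_given(value) -> bool:
    return not isinstance(value, _NotGiven)


def add_inception_params(base_params: dict, reasoning_effort=NOT_GIVEN, realtime=NOT_GIVEN) -> dict:
    params = dict(base_params)  # leave the caller's dict untouched

    # Omit the field entirely unless it was set to a concrete value.
    if is_given(reasoning_effort) and reasoning_effort is not None:
        params["reasoning_effort"] = reasoning_effort

    # Vendor-specific options travel in extra_body rather than as top-level keys.
    extra_body = {}
    if is_given(realtime) and realtime is not None:
        extra_body["realtime"] = realtime
    if extra_body:
        params["extra_body"] = extra_body

    return params
```

With both options unset, the base parameters pass through unchanged, so the server-side defaults described in the settings docstring apply.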
@@ -155,7 +155,6 @@ def language_to_soniox_language(language: Language) -> str:
|
||||
Language.ID: "id",
|
||||
Language.IT: "it",
|
||||
Language.JA: "ja",
|
||||
Language.KA: "ka",
|
||||
Language.KK: "kk",
|
||||
Language.KN: "kn",
|
||||
Language.KO: "ko",
|
||||
@@ -232,6 +231,7 @@ class SonioxSTTSettings(STTSettings):
|
||||
context_version 2.
|
||||
enable_speaker_diarization: Whether to enable speaker diarization.
|
||||
enable_language_identification: Whether to enable language identification.
|
||||
max_endpoint_delay_ms: Max ms before endpoint detection finalizes the turn (500-3000).
|
||||
client_reference_id: Client reference ID to use for transcription.
|
||||
"""
|
||||
|
||||
@@ -242,6 +242,7 @@ class SonioxSTTSettings(STTSettings):
|
||||
enable_language_identification: bool | None | _NotGiven = field(
|
||||
default_factory=lambda: NOT_GIVEN
|
||||
)
|
||||
max_endpoint_delay_ms: int | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
|
||||
client_reference_id: str | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
|
||||
|
||||
|
||||
@@ -309,6 +310,7 @@ class SonioxSTTService(WebsocketSTTService):
|
||||
context=None,
|
||||
enable_speaker_diarization=False,
|
||||
enable_language_identification=False,
|
||||
max_endpoint_delay_ms=None,
|
||||
client_reference_id=None,
|
||||
)
|
||||
|
||||
@@ -390,8 +392,7 @@ class SonioxSTTService(WebsocketSTTService):
|
||||
changed = await super()._update_settings(delta)
|
||||
|
||||
if changed:
|
||||
await self._disconnect()
|
||||
await self._connect()
|
||||
await self._request_reconnect()
|
||||
|
||||
return changed
|
||||
|
||||
@@ -522,6 +523,7 @@ class SonioxSTTService(WebsocketSTTService):
|
||||
"audio_format": self._audio_format,
|
||||
"num_channels": self._num_channels,
|
||||
"enable_endpoint_detection": enable_endpoint_detection,
|
||||
"max_endpoint_delay_ms": s.max_endpoint_delay_ms,
|
||||
"sample_rate": self.sample_rate,
|
||||
"language_hints": _prepare_language_hints(assert_given(s.language_hints)),
|
||||
"language_hints_strict": s.language_hints_strict,
|
||||
@@ -537,8 +539,8 @@ class SonioxSTTService(WebsocketSTTService):
|
||||
await self._call_event_handler("on_connected")
|
||||
logger.debug("Connected to Soniox STT")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
await self.push_error(error_msg=f"Unable to connect to Soniox: {e}", exception=e)
|
||||
raise
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
"""Close the websocket connection to Soniox."""
|
||||
|
||||
@@ -44,17 +44,15 @@ GLADIA_TTFS_P99: float = 1.49
|
||||
GOOGLE_TTFS_P99: float = 1.57
|
||||
GRADIUM_TTFS_P99: float = 1.61
|
||||
GROQ_TTFS_P99: float = 1.54
|
||||
MISTRAL_TTFS_P99: float = 1.89
|
||||
OPENAI_TTFS_P99: float = 2.01
|
||||
OPENAI_REALTIME_TTFS_P99: float = 1.66
|
||||
SARVAM_TTFS_P99: float = 1.17
|
||||
SMALLEST_TTFS_P99: float = 1.59
|
||||
SONIOX_TTFS_P99: float = 0.35
|
||||
SPEECHMATICS_TTFS_P99: float = 0.74
|
||||
XAI_TTFS_P99: float = 2.14
|
||||
|
||||
# These services run locally and should be replaced with measured values
|
||||
NVIDIA_TTFS_P99: float = DEFAULT_TTFS_P99
|
||||
WHISPER_TTFS_P99: float = DEFAULT_TTFS_P99
|
||||
|
||||
# No benchmark available yet; using conservative default
|
||||
MISTRAL_TTFS_P99: float = DEFAULT_TTFS_P99
|
||||
SMALLEST_TTFS_P99: float = DEFAULT_TTFS_P99
|
||||
XAI_TTFS_P99: float = DEFAULT_TTFS_P99
|
||||
|
||||
@@ -50,9 +50,13 @@ from pipecat.services.ai_service import AIService
|
||||
from pipecat.services.settings import TTSSettings, is_given
|
||||
from pipecat.services.websocket_service import WebsocketService
|
||||
from pipecat.transcriptions.language import Language
|
||||
from pipecat.utils.context.aggregated_frame_sequencer import AggregatedFrameSequencer
|
||||
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
|
||||
from pipecat.utils.frame_queue import FrameQueue
|
||||
from pipecat.utils.text.base_text_filter import BaseTextFilter
|
||||
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
|
||||
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
|
||||
from pipecat.utils.text.word_timestamp_utils import merge_punct_tokens
|
||||
from pipecat.utils.time import seconds_to_nanoseconds
|
||||
|
||||
|
||||
@@ -97,7 +101,6 @@ class _WordTimestampEntry:
|
||||
word: str
|
||||
timestamp: float
|
||||
context_id: str
|
||||
includes_inter_frame_spaces: bool | None = None
|
||||
|
||||
|
||||
class TTSService(AIService):
|
||||
@@ -289,6 +292,16 @@ class TTSService(AIService):
|
||||
self._text_filters: Sequence[BaseTextFilter] = text_filters or []
|
||||
self._transport_destination: str | None = transport_destination
|
||||
|
||||
# Ordered sequence of every AggregatedTextFrame slot that passes through
|
||||
# _push_tts_frames (both spoken and skipped). Skipped frames are held here
|
||||
# until all preceding spoken slots are complete, then flushed downstream so
|
||||
# their append_to_context=True arrives at the assistant aggregator in the
|
||||
# correct order relative to the TTSTextFrames from spoken sentences.
|
||||
# Tracks all AggregatedTextFrame slots (spoken and skipped) in order.
|
||||
# Skipped frames are held until preceding spoken slots complete, ensuring
|
||||
# append_to_context=True reaches the assistant aggregator in the right order.
|
||||
self._aggregated_frame_sequencer = AggregatedFrameSequencer(name=str(self))
|
||||
|
||||
self._resampler = create_stream_resampler()
|
||||
|
||||
self._processing_text: bool = False
|
||||
@@ -690,7 +703,15 @@ class TTSService(AIService):
|
||||
# Stop the aggregation metric (no-op if already stopped on first sentence).
|
||||
await self.stop_text_aggregation_metrics()
|
||||
if remaining:
|
||||
await self._push_tts_frames(AggregatedTextFrame(remaining.text, remaining.type))
|
||||
await self._push_tts_frames(
|
||||
AggregatedTextFrame(
|
||||
remaining.text,
|
||||
remaining.type,
|
||||
raw_text=remaining.full_match
|
||||
if isinstance(remaining, PatternMatch)
|
||||
else remaining.text,
|
||||
)
|
||||
)
|
||||
|
||||
# We pause processing incoming frames if the LLM response included
|
||||
# text (it might be that it's only a function calling response). We
|
||||
@@ -733,7 +754,7 @@ class TTSService(AIService):
|
||||
push_assistant_aggregation = frame.append_to_context and not self._llm_response_started
|
||||
# Assumption: text in TTSSpeakFrame does not include inter-frame spaces
|
||||
await self._push_tts_frames(
|
||||
AggregatedTextFrame(frame.text, AggregationType.SENTENCE),
|
||||
AggregatedTextFrame(frame.text, AggregationType.SENTENCE, raw_text=frame.text),
|
||||
append_tts_text_to_context=frame.append_to_context,
|
||||
push_assistant_aggregation=push_assistant_aggregation,
|
||||
)
|
||||
@@ -887,6 +908,7 @@ class TTSService(AIService):
|
||||
self._llm_response_started = False
|
||||
self._streamed_text = ""
|
||||
self._text_aggregation_metrics_started = False
|
||||
self._aggregated_frame_sequencer.clear() # discard all pending slots on interruption
|
||||
await self.reset_word_timestamps()
|
||||
|
||||
await self._stop_audio_context_task()
|
||||
@@ -930,9 +952,23 @@ class TTSService(AIService):
|
||||
if aggregate.type != AggregationType.TOKEN:
|
||||
# Stop the aggregation metric on the first sentence only.
|
||||
await self.stop_text_aggregation_metrics()
|
||||
await self._push_tts_frames(
|
||||
AggregatedTextFrame(aggregate.text, aggregate.type), includes_inter_frame_spaces
|
||||
raw_text = (
|
||||
aggregate.full_match if isinstance(aggregate, PatternMatch) else aggregate.text
|
||||
)
|
||||
await self._push_tts_frames(
|
||||
AggregatedTextFrame(aggregate.text, aggregate.type, raw_text=raw_text),
|
||||
includes_inter_frame_spaces,
|
||||
)
|
||||
|
||||
async def _push_frame_respecting_previous_aggregated_frame(
|
||||
self, frame: AggregatedTextFrame, context_id: str
|
||||
):
|
||||
# Enqueue the skipped frame; returns it immediately if no spoken slot
|
||||
# precedes it, or holds it until the sequencer can flush it in order.
|
||||
for f in self._aggregated_frame_sequencer.register_skipped(
|
||||
frame, context_id, self._transport_destination
|
||||
):
|
||||
await self.push_frame(f)
|
||||
|
||||
async def _push_tts_frames(
|
||||
self,
|
||||
@@ -944,10 +980,13 @@ class TTSService(AIService):
|
||||
type = src_frame.aggregated_by
|
||||
text = src_frame.text
|
||||
|
||||
# Create context ID and store metadata
|
||||
context_id = self.create_context_id()
|
||||
|
||||
# Skip sending to TTS if the aggregation type is in the skip list. Simply
|
||||
# push the original frame downstream.
|
||||
if type in self._skip_aggregator_types:
|
||||
await self.push_frame(src_frame)
|
||||
await self._push_frame_respecting_previous_aggregated_frame(src_frame, context_id)
|
||||
return
|
||||
|
||||
# Whitespace gating depends on aggregation mode:
|
||||
@@ -998,9 +1037,6 @@ class TTSService(AIService):
|
||||
await self.stop_processing_metrics()
|
||||
return
|
||||
|
||||
# Create context ID and store metadata
|
||||
context_id = self.create_context_id()
|
||||
|
||||
# To support use cases that may want to know the text before it's spoken, we
|
||||
# push the AggregatedTextFrame version before transforming and sending to TTS.
|
||||
# However, we do not want to add this text to the assistant context until it
|
||||
@@ -1045,6 +1081,21 @@ class TTSService(AIService):
|
||||
await self.start_ttfb_metrics()
|
||||
await self.append_to_audio_context(context_id, TTSStartedFrame(context_id=context_id))
|
||||
|
||||
# Register this spoken frame so the sequencer can track its completion
|
||||
# and unblock any skipped frames queued behind it. Word-timestamp services
|
||||
# complete the slot via process_word; push_text_frames services complete it
|
||||
# below after the TTSTextFrame is appended to the audio context.
|
||||
self._aggregated_frame_sequencer.register_spoken(
|
||||
src_frame,
|
||||
context_id,
|
||||
tracker=WordCompletionTracker(
|
||||
prepared_text, llm_text=src_frame.raw_text or src_frame.text
|
||||
)
|
||||
if not self._push_text_frames
|
||||
else None,
|
||||
append_to_context=self._tts_contexts[context_id].append_to_context,
|
||||
)
|
||||
|
||||
await self.tts_process_generator(context_id, self.run_tts(prepared_text, context_id))
|
||||
|
||||
if not self._is_streaming_tokens:
|
||||
@@ -1066,6 +1117,10 @@ class TTSService(AIService):
|
||||
frame.append_to_context = append_tts_text_to_context
|
||||
# Appending to the context, so it preserves the ordering.
|
||||
await self.append_to_audio_context(context_id, frame)
|
||||
# TTSTextFrame is queued; mark the spoken slot complete so any skipped
|
||||
# frames (e.g. code blocks) waiting behind it can be flushed in order.
|
||||
for f in self._aggregated_frame_sequencer.complete_spoken_slot():
|
||||
await self.push_frame(f)
|
||||
|
||||
async def tts_process_generator(
|
||||
self, context_id: str, generator: AsyncGenerator[Frame | None, None]
|
||||
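The ordering guarantee described in these hunks (skipped frames are held until every spoken slot ahead of them completes) can be pictured with a small queue. The sketch below is a deliberately simplified toy, not the `AggregatedFrameSequencer` from the diff: `ToySequencer` is a hypothetical class, it tracks plain strings instead of frames, and it has no word tracker.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class _Slot:
    text: str
    spoken: bool
    done: bool = False


class ToySequencer:
    """Hypothetical, simplified model of in-order release of skipped items."""

    def __init__(self) -> None:
        self._slots: deque[_Slot] = deque()

    def register_spoken(self, text: str) -> None:
        self._slots.append(_Slot(text, spoken=True))

    def register_skipped(self, text: str) -> list[str]:
        # Skipped items need no audio, so they are complete on arrival.
        self._slots.append(_Slot(text, spoken=False, done=True))
        return self._flush()

    def complete_spoken_slot(self) -> list[str]:
        # Mark the oldest unfinished spoken slot as complete.
        for slot in self._slots:
            if slot.spoken and not slot.done:
                slot.done = True
                break
        return self._flush()

    def clear(self) -> None:
        self._slots.clear()

    def _flush(self) -> list[str]:
        # Release from the front while the head slot is complete.
        released = []
        while self._slots and self._slots[0].done:
            slot = self._slots.popleft()
            if not slot.spoken:
                released.append(slot.text)
        return released
```

A skipped item registered with nothing ahead of it is released immediately; one registered behind a spoken slot is released only when that slot completes.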
@@ -1114,10 +1169,8 @@ class TTSService(AIService):
|
||||
if self._initial_word_times:
|
||||
cached = self._initial_word_times.copy()
|
||||
self._initial_word_times = []
|
||||
for word, timestamp_seconds, ctx_id, ifs in cached:
|
||||
await self._add_word_timestamps(
|
||||
[(word, timestamp_seconds)], ctx_id, includes_inter_frame_spaces=ifs
|
||||
)
|
||||
for word, timestamp_seconds, ctx_id in cached:
|
||||
await self._add_word_timestamps([(word, timestamp_seconds)], ctx_id)
|
||||
|
||||
async def reset_word_timestamps(self):
|
||||
"""Reset word timestamp tracking."""
|
||||
@@ -1139,6 +1192,11 @@ class TTSService(AIService):
|
||||
playback order by _handle_audio_context. Otherwise they are processed immediately
|
||||
via _add_word_timestamps.
|
||||
|
||||
When ``includes_inter_frame_spaces`` is True (e.g. Inworld TTS), punctuation and
|
||||
space-only tokens are merged into the preceding word via ``merge_punct_tokens``
|
||||
before queuing, so the tracker always receives words with trailing punctuation
|
||||
already attached. ``includes_inter_frame_spaces`` is reset to None after merging.
|
||||
|
||||
Args:
|
||||
word_times: List of (word, timestamp) tuples where timestamp is in seconds.
|
||||
context_id: Unique identifier for the TTS context.
|
||||
@@ -1147,29 +1205,22 @@ class TTSService(AIService):
|
||||
consumers must not inject additional spaces between tokens. None leaves
|
||||
the frame's own default unchanged.
|
||||
"""
|
||||
if includes_inter_frame_spaces:
|
||||
word_times = merge_punct_tokens(word_times)
|
||||
|
||||
if context_id and self.audio_context_available(context_id):
|
||||
for word, timestamp in word_times:
|
||||
await self.append_to_audio_context(
|
||||
context_id,
|
||||
_WordTimestampEntry(
|
||||
word=word,
|
||||
timestamp=timestamp,
|
||||
context_id=context_id,
|
||||
includes_inter_frame_spaces=includes_inter_frame_spaces,
|
||||
),
|
||||
_WordTimestampEntry(word=word, timestamp=timestamp, context_id=context_id),
|
||||
)
|
||||
else:
|
||||
await self._add_word_timestamps(
|
||||
word_times=word_times,
|
||||
context_id=context_id,
|
||||
includes_inter_frame_spaces=includes_inter_frame_spaces,
|
||||
)
|
||||
await self._add_word_timestamps(word_times=word_times, context_id=context_id)
|
||||
|
||||
async def _add_word_timestamps(
|
||||
self,
|
||||
word_times: list[tuple[str, float]],
|
||||
context_id: str | None = None,
|
||||
includes_inter_frame_spaces: bool | None = None,
|
||||
):
|
||||
"""Process word timestamps directly, building and pushing TTSTextFrames inline.
|
||||
|
||||
@@ -1185,19 +1236,15 @@ class TTSService(AIService):
|
||||
ts_ns = seconds_to_nanoseconds(timestamp)
|
||||
if self._initial_word_timestamp == -1:
|
||||
# Cache until we have audio and can compute PTS.
|
||||
self._initial_word_times.append(
|
||||
(word, timestamp, context_id, includes_inter_frame_spaces)
|
||||
)
|
||||
self._initial_word_times.append((word, timestamp, context_id))
|
||||
else:
|
||||
frame = TTSTextFrame(word, aggregated_by=AggregationType.WORD)
|
||||
if includes_inter_frame_spaces is not None:
|
||||
frame.includes_inter_frame_spaces = includes_inter_frame_spaces
|
||||
frame.pts = self._initial_word_timestamp + ts_ns
|
||||
frame.context_id = context_id
|
||||
if context_id in self._tts_contexts:
|
||||
frame.append_to_context = self._tts_contexts[context_id].append_to_context
|
||||
self._word_last_pts = frame.pts
|
||||
await self.push_frame(frame)
|
||||
pts = self._initial_word_timestamp + ts_ns
|
||||
# Build TTSTextFrame(s) for this word token, advancing the active
|
||||
# slot's tracker and flushing any skipped frames now unblocked.
|
||||
for f in self._aggregated_frame_sequencer.process_word(word, pts, context_id):
|
||||
if isinstance(f, TTSTextFrame):
|
||||
self._word_last_pts = f.pts
|
||||
await self.push_frame(f)
|
||||
|
||||
#
|
||||
# Audio context methods (active when using websocket-based TTS with context management)
|
||||
@@ -1382,6 +1429,18 @@ class TTSService(AIService):
|
||||
frame.pts = self._word_last_pts
|
||||
await self.push_frame(frame)
|
||||
|
||||
async def _apply_force_complete(self):
|
||||
"""Force-complete all incomplete spoken slots and push any unblocked skipped frames.
|
||||
|
||||
Called at end-of-context to handle TTS providers that silently drop word-timestamp
|
||||
events. Emits a TTSTextFrame for any remaining unspoken text, then flushes skipped
|
||||
frames that were blocked by those incomplete slots.
|
||||
"""
|
||||
for f in self._aggregated_frame_sequencer.force_complete(self._word_last_pts):
|
||||
if isinstance(f, TTSTextFrame):
|
||||
self._word_last_pts = f.pts
|
||||
await self.push_frame(f)
|
||||
|
||||
async def _handle_audio_context(self, context_id: str):
|
||||
"""Process items from an audio context queue until it is exhausted."""
|
||||
queue = self._audio_contexts[context_id]
|
||||
@@ -1402,7 +1461,6 @@ class TTSService(AIService):
|
||||
await self._add_word_timestamps(
|
||||
[(frame.word, frame.timestamp)],
|
||||
frame.context_id,
|
||||
includes_inter_frame_spaces=frame.includes_inter_frame_spaces,
|
||||
)
|
||||
continue
|
||||
elif isinstance(frame, TTSAudioRawFrame):
|
||||
@@ -1416,6 +1474,9 @@ class TTSService(AIService):
|
||||
if isinstance(frame, TTSStartedFrame):
|
||||
should_push_stop_frame = self._push_stop_frames
|
||||
elif isinstance(frame, TTSStoppedFrame):
|
||||
# Force-complete any remaining spoken slots before pushing the TTSStoppedFrame
|
||||
await self._apply_force_complete()
|
||||
|
||||
should_push_stop_frame = False
|
||||
# Setting the last word timestamp as the TTSStoppedFrame PTS
|
||||
if not frame.pts:
|
||||
@@ -1433,8 +1494,11 @@ class TTSService(AIService):
|
||||
should_push_stop_frame = False
|
||||
break
|
||||
|
||||
await self._apply_force_complete()
|
||||
|
||||
if should_push_stop_frame and self._push_stop_frames:
|
||||
await self.push_frame(TTSStoppedFrame(context_id=context_id))
|
||||
|
||||
await self._maybe_reset_word_timestamps()
|
||||
|
||||
async def on_audio_context_interrupted(self, context_id: str):
|
||||
|
||||
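The docstring in the hunk above says punctuation-only and space-only tokens are merged into the preceding word before queuing. The body of `merge_punct_tokens` is not part of this diff, so the following is only a minimal sketch of that idea under stated assumptions: the function name `merge_punct_tokens_sketch` is hypothetical, "punctuation" is assumed to mean ASCII `string.punctuation`, and the merged token is assumed to keep the timestamp of the word it attaches to.

```python
import string


def merge_punct_tokens_sketch(word_times):
    """Hypothetical stand-in for merge_punct_tokens (real body not shown in the diff).

    Folds punctuation-only and whitespace-only tokens into the preceding word,
    keeping that word's timestamp.
    """
    merged = []
    for token, timestamp in word_times:
        stripped = token.strip()
        # An empty string (whitespace-only token) also counts as mergeable here.
        is_punct_or_space = all(ch in string.punctuation for ch in stripped)
        if merged and is_punct_or_space:
            prev_word, prev_ts = merged[-1]
            merged[-1] = (prev_word + stripped, prev_ts)
        elif stripped:
            # Regular word, or leading punctuation with nothing to attach to.
            merged.append((stripped, timestamp))
    return merged


tokens = [("Hello", 0.0), (",", 0.1), (" ", 0.15), ("world", 0.2), ("!", 0.4)]
merged_tokens = merge_punct_tokens_sketch(tokens)
```

With this input the tracker would see two entries, `("Hello,", 0.0)` and `("world!", 0.2)`, which matches the docstring's description of words arriving with trailing punctuation already attached.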
@@ -76,7 +76,9 @@ class WebsocketService(ABC):
|
||||
logger.warning(f"{self} reconnecting (attempt: {attempt_number})")
|
||||
await self._disconnect_websocket()
|
||||
await self._connect_websocket()
|
||||
return await self._verify_connection()
|
||||
if not await self._verify_connection():
|
||||
raise ConnectionError(f"{self} websocket reconnection failed verification")
|
||||
return True
|
||||
|
||||
async def _try_reconnect(
|
||||
self,
|
||||
|
||||
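The `WebsocketService` hunk above replaces a bare `return await self._verify_connection()` with a check that raises `ConnectionError` when verification fails. The sketch below illustrates that control flow in isolation. `ReconnectSketch` and its `healthy` flag are hypothetical and only imitate the three method names visible in the diff; the surrounding retry logic is not shown in the source and is not modelled here.

```python
import asyncio


class ReconnectSketch:
    """Hypothetical class imitating the reconnect-then-verify flow from the hunk."""

    def __init__(self, healthy: bool):
        self.healthy = healthy  # assumption: stands in for a real connection check
        self.connected = False

    async def _disconnect_websocket(self):
        self.connected = False

    async def _connect_websocket(self):
        self.connected = True

    async def _verify_connection(self) -> bool:
        return self.connected and self.healthy

    async def reconnect(self) -> bool:
        await self._disconnect_websocket()
        await self._connect_websocket()
        # A failed verification now surfaces as an exception instead of False.
        if not await self._verify_connection():
            raise ConnectionError("websocket reconnection failed verification")
        return True
```

One plausible benefit of raising is that a caller which already handles exceptions from a failed connect can treat a failed verification the same way, rather than checking a boolean separately.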
@@ -293,8 +293,9 @@ class XAISTTService(WebsocketSTTService):
|
||||
await self._call_event_handler("on_connected")
|
||||
logger.debug(f"{self} connected to xAI STT WebSocket")
|
||||
except Exception as e:
|
||||
self._websocket = None
|
||||
self._session_ready.clear()
|
||||
await self.push_error(error_msg=f"Unable to connect to xAI STT: {e}", exception=e)
|
||||
raise
|
||||
|
||||
async def _disconnect_websocket(self):
|
||||
"""Close the WebSocket connection."""
|
||||
|
||||
@@ -448,6 +448,9 @@ class BaseOutputTransport(FrameProcessor):
|
||||
self._video_task: asyncio.Task | None = None
|
||||
self._clock_task: asyncio.Task | None = None
|
||||
|
||||
# If timestamps are equal, use this count to preserve the insertion order
|
||||
self._clock_queue_counter = itertools.count()
|
||||
|
||||
@property
|
||||
def sample_rate(self) -> int:
|
||||
"""Get the audio sample rate.
|
||||
@@ -498,7 +501,7 @@ class BaseOutputTransport(FrameProcessor):
|
||||
frame: The end frame signaling sender shutdown.
|
||||
"""
|
||||
# Let the sink tasks process the queue until they reach this EndFrame.
|
||||
await self._clock_queue.put((float("inf"), frame.id, frame))
|
||||
await self._clock_queue.put((float("inf"), next(self._clock_queue_counter), frame))
|
||||
await self._audio_queue.put(frame)
|
||||
|
||||
# At this point we have enqueued an EndFrame and we need to wait for
|
||||
@@ -610,7 +613,7 @@ class BaseOutputTransport(FrameProcessor):
|
||||
Args:
|
||||
frame: The frame with timing information to handle.
|
||||
"""
|
||||
await self._clock_queue.put((frame.pts, frame.id, frame))
|
||||
await self._clock_queue.put((frame.pts, next(self._clock_queue_counter), frame))
|
||||
|
||||
async def handle_sync_frame(self, frame: Frame):
|
||||
"""Handle frames that need synchronized processing.
|
||||
|
||||
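The `BaseOutputTransport` hunks above swap `frame.id` for `next(self._clock_queue_counter)` as the second tuple element, with the comment that the count preserves insertion order when timestamps are equal. A minimal sketch of that tie-breaking pattern follows, assuming the clock queue behaves like `asyncio.PriorityQueue` (the queue's construction is not shown in this diff). `FakeFrame` and `drain_in_order` are hypothetical names.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class FakeFrame:
    """Hypothetical frame; deliberately not orderable."""

    name: str


async def drain_in_order(items):
    queue = asyncio.PriorityQueue()
    counter = itertools.count()
    for pts, frame in items:
        # (pts, counter, frame): the counter is unique and increasing, so tuple
        # comparison never reaches the frame and equal pts keep insertion order.
        await queue.put((pts, next(counter), frame))
    names = []
    while not queue.empty():
        _, _, frame = await queue.get()
        names.append(frame.name)
    return names


order = asyncio.run(
    drain_in_order(
        [
            (10, FakeFrame("a")),
            (5, FakeFrame("b")),
            (10, FakeFrame("c")),
            (float("inf"), FakeFrame("end")),
        ]
    )
)
```

Here `order` is `["b", "a", "c", "end"]`: the two frames at `pts == 10` come out in the order they were enqueued, and the `float("inf")` entry (as used for the `EndFrame` in the hunk) sorts last.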
0
src/pipecat/transports/vonage/__init__.py
Normal file
1090
src/pipecat/transports/vonage/client.py
Normal file
File diff suppressed because it is too large
150
src/pipecat/transports/vonage/utils.py
Normal file
@@ -0,0 +1,150 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
"""Vonage Video Connector utils."""
|
||||
|
||||
from dataclasses import dataclass, replace
|
||||
from enum import StrEnum
|
||||
|
||||
import numpy as np
|
||||
import numpy.typing as npt
|
||||
|
||||
from pipecat.audio.resamplers.base_audio_resampler import BaseAudioResampler
|
||||
|
||||
|
||||
@dataclass
|
||||
class AudioProps:
|
||||
"""Audio properties for normalization.
|
||||
|
||||
Parameters:
|
||||
sample_rate: The sample rate of the audio.
|
||||
is_stereo: Whether the audio is stereo (True) or mono (False).
|
||||
"""
|
||||
|
||||
sample_rate: int
|
||||
is_stereo: bool
|
||||
|
||||
|
||||
class ImageFormat(StrEnum):
|
||||
"""Enum for image formats."""
|
||||
|
||||
PLANAR_YUV420 = "PLANAR_YUV420"
|
||||
PACKED_YUV444 = "PACKED_YUV444"
|
||||
RGB = "RGB"
|
||||
RGBA = "RGBA"
|
||||
BGR = "BGR"
|
||||
BGRA = "BGRA"
|
||||
|
||||
|
||||
def check_audio_data(
|
||||
buffer: bytes | memoryview, number_of_frames: int, number_of_channels: int
|
||||
) -> None:
|
||||
"""Check the audio sample width based on buffer size, number of frames and channels."""
|
||||
if number_of_channels not in (1, 2):
|
||||
raise ValueError(f"We only accept mono or stereo audio, got {number_of_channels}")
|
||||
|
||||
if isinstance(buffer, memoryview):
|
||||
bytes_per_sample = buffer.itemsize
|
||||
else:
|
||||
bytes_per_sample = len(buffer) // (number_of_frames * number_of_channels)
|
||||
|
||||
if bytes_per_sample != 2:
|
||||
raise ValueError(f"We only accept 16 bit PCM audio, got {bytes_per_sample * 8} bit")
|
||||
|
||||
|
||||
def process_audio_channels(
|
||||
audio: npt.NDArray[np.int16], current: AudioProps, target: AudioProps
|
||||
) -> npt.NDArray[np.int16]:
|
||||
"""Normalize audio channels to the target properties."""
|
||||
if current.is_stereo != target.is_stereo:
|
||||
if target.is_stereo:
|
||||
audio = np.repeat(audio, 2)
|
||||
else:
|
||||
audio = audio.reshape(-1, 2).mean(axis=1).astype(np.int16)
|
||||
|
||||
return audio
|
||||
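`process_audio_channels` above converts between mono and interleaved stereo with two NumPy one-liners. The sketch below isolates them so the effect on sample order and rounding is visible. The helper names `mono_to_stereo` and `stereo_to_mono` are hypothetical; the expressions themselves are taken from the function above, and 16-bit interleaved PCM is assumed.

```python
import numpy as np


def mono_to_stereo(audio: np.ndarray) -> np.ndarray:
    # Duplicate every sample so left and right carry the same signal (L R L R ...).
    return np.repeat(audio, 2)


def stereo_to_mono(audio: np.ndarray) -> np.ndarray:
    # Pair up interleaved samples, average each pair, and cast back to int16.
    # The cast truncates toward zero, so a mean of 1.5 becomes 1.
    return audio.reshape(-1, 2).mean(axis=1).astype(np.int16)


mono = np.array([100, -200, 300], dtype=np.int16)
stereo = mono_to_stereo(mono)
round_trip = stereo_to_mono(stereo)
```

Upmixing followed by downmixing returns the original samples exactly, because each pair holds two identical values. Downmixing genuine stereo is lossy, and the stereo buffer must contain an even number of samples for the reshape to succeed.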
|
||||
|
||||
async def process_audio(
|
||||
resampler: BaseAudioResampler,
|
||||
audio: npt.NDArray[np.int16],
|
||||
current: AudioProps,
|
||||
target: AudioProps,
|
||||
) -> npt.NDArray[np.int16]:
|
||||
"""Normalize audio to the target properties."""
|
||||
res_audio = audio
|
||||
if current.sample_rate != target.sample_rate:
|
||||
# first normalize channels to mono if needed, then resample, then normalize channels to target
|
||||
res_audio = process_audio_channels(res_audio, current, replace(current, is_stereo=False))
|
||||
current = replace(current, is_stereo=False)
|
||||
res_audio_bytes: bytes = await resampler.resample(
|
||||
res_audio.tobytes(), current.sample_rate, target.sample_rate
|
||||
)
|
||||
res_audio = np.frombuffer(res_audio_bytes, dtype=np.int16)
|
||||
|
||||
res_audio = process_audio_channels(res_audio, current, target)
|
||||
|
||||
return res_audio
|
||||
|
||||
|
||||
def image_colorspace_conversion(
|
||||
image: bytes, size: tuple[int, int], from_format: ImageFormat, to_format: ImageFormat
|
||||
) -> bytes | None:
|
||||
"""Convert image colorspace from one format to another."""
|
||||
match (from_format, to_format):
|
||||
case (fmt1, fmt2) if fmt1 == fmt2:
|
||||
return image
|
||||
case (ImageFormat.RGB, ImageFormat.BGR) | (ImageFormat.BGR, ImageFormat.RGB):
|
||||
np_input = np.frombuffer(image, dtype=np.uint8)
|
||||
np_output = np_input.reshape(size[1], size[0], 3)[:, :, ::-1]
|
||||
return np_output.tobytes()
|
||||
case (ImageFormat.RGBA, ImageFormat.BGRA) | (ImageFormat.BGRA, ImageFormat.RGBA):
|
||||
np_input = np.frombuffer(image, dtype=np.uint8)
|
||||
np_output = np_input.reshape(size[1], size[0], 4)[:, :, [2, 1, 0, 3]]
|
||||
return np_output.tobytes()
|
||||
case (ImageFormat.PLANAR_YUV420, ImageFormat.PACKED_YUV444):
|
||||
# YUV420 (I420) has Y plane of size width*height, U and V planes of size (width/2)*(height/2)
|
||||
# Packed YUV444 interleaves Y, U, V values for each pixel (YUVYUVYUV...)
|
||||
width, height = size
|
||||
y_plane_size = width * height
|
||||
uv_plane_size_420 = (width // 2) * (height // 2)
|
||||
|
||||
np_input = np.frombuffer(image, dtype=np.uint8)
|
||||
y_plane = np_input[:y_plane_size].reshape(height, width)
|
||||
u_plane_420 = np_input[y_plane_size : y_plane_size + uv_plane_size_420].reshape(
|
||||
height // 2, width // 2
|
||||
)
|
||||
v_plane_420 = np_input[
|
||||
y_plane_size + uv_plane_size_420 : y_plane_size + 2 * uv_plane_size_420
|
||||
].reshape(height // 2, width // 2)
|
||||
|
||||
# Upsample U and V planes by repeating each pixel in 2x2 blocks
|
||||
u_plane_444 = np.repeat(np.repeat(u_plane_420, 2, axis=0), 2, axis=1)
|
||||
v_plane_444 = np.repeat(np.repeat(v_plane_420, 2, axis=0), 2, axis=1)
|
||||
|
||||
# Interleave Y, U, V values for packed format (YUVYUVYUV...)
|
||||
np_output = np.stack([y_plane, u_plane_444, v_plane_444], axis=-1)
|
||||
return np_output.tobytes()
|
||||
case (ImageFormat.PACKED_YUV444, ImageFormat.PLANAR_YUV420):
|
||||
# Packed YUV444 has Y, U, V interleaved (YUVYUVYUV...)
|
||||
# YUV420 (I420) has Y plane of size width*height, U and V planes of size (width/2)*(height/2)
|
||||
width, height = size
|
||||
|
||||
np_input = np.frombuffer(image, dtype=np.uint8).reshape(height, width, 3)
|
||||
y_plane = np_input[:, :, 0].reshape(height, width)
|
||||
u_plane_444 = np_input[:, :, 1]
|
||||
v_plane_444 = np_input[:, :, 2]
|
||||
|
||||
# Downsample U and V planes by keeping the top-left pixel of each 2x2 block (no averaging)
|
||||
u_plane_420 = u_plane_444[::2, ::2].reshape(height // 2, width // 2)
|
||||
v_plane_420 = v_plane_444[::2, ::2].reshape(height // 2, width // 2)
|
||||
|
||||
# Concatenate Y, U, V planes
|
||||
np_output = np.concatenate(
|
||||
[y_plane.flatten(), u_plane_420.flatten(), v_plane_420.flatten()]
|
||||
)
|
||||
return np_output.tobytes()
|
||||
case _:
|
||||
return None
|
||||
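The two YUV branches of `image_colorspace_conversion` above are inverses of each other for frames with even width and height. The following sketch restates them as standalone functions to make the plane layout concrete. The function names are hypothetical, and even dimensions are an assumption (the source does not show how odd sizes are handled).

```python
import numpy as np


def i420_to_packed444(image: bytes, width: int, height: int) -> bytes:
    y_size = width * height
    uv_size = (width // 2) * (height // 2)
    data = np.frombuffer(image, dtype=np.uint8)
    y = data[:y_size].reshape(height, width)
    u = data[y_size : y_size + uv_size].reshape(height // 2, width // 2)
    v = data[y_size + uv_size : y_size + 2 * uv_size].reshape(height // 2, width // 2)
    # Upsample chroma: each U/V sample covers a 2x2 block of pixels.
    u444 = np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)
    v444 = np.repeat(np.repeat(v, 2, axis=0), 2, axis=1)
    # Interleave per pixel: Y U V Y U V ...
    return np.stack([y, u444, v444], axis=-1).tobytes()


def packed444_to_i420(image: bytes, width: int, height: int) -> bytes:
    data = np.frombuffer(image, dtype=np.uint8).reshape(height, width, 3)
    y = data[:, :, 0]
    # Subsample chroma by keeping the top-left sample of each 2x2 block.
    u = data[::2, ::2, 1]
    v = data[::2, ::2, 2]
    return np.concatenate([y.flatten(), u.flatten(), v.flatten()]).tobytes()


# 4x2 frame: 8 luma bytes, then 2 U bytes, then 2 V bytes.
i420 = bytes(range(8)) + bytes([100, 101]) + bytes([200, 201])
packed = i420_to_packed444(i420, 4, 2)
restored = packed444_to_i420(packed, 4, 2)
```

An I420 frame occupies `width * height * 3 // 2` bytes and the packed form `width * height * 3`. The round trip is lossless in this direction only; starting from a genuine 4:4:4 image, the subsampling step discards three of every four chroma samples.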
483
src/pipecat/transports/vonage/video_connector.py
Normal file
@@ -0,0 +1,483 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
"""Vonage Video Connector transport."""
|
||||
|
||||
from typing import Optional
|
||||
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.frames.frames import (
|
||||
CancelFrame,
|
||||
EndFrame,
|
||||
Frame,
|
||||
InputAudioRawFrame,
|
||||
InterruptionFrame,
|
||||
OutputAudioRawFrame,
|
||||
OutputImageRawFrame,
|
||||
StartFrame,
|
||||
UserImageRawFrame,
|
||||
)
|
||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor, FrameProcessorSetup
|
||||
from pipecat.transports.base_input import BaseInputTransport
|
||||
from pipecat.transports.base_output import BaseOutputTransport
|
||||
from pipecat.transports.base_transport import BaseTransport
|
||||
from pipecat.transports.vonage.client import (
|
||||
Session, # type: ignore[attr-defined]
|
||||
Stream, # type: ignore[attr-defined]
|
||||
Subscriber, # type: ignore[attr-defined]
|
||||
VonageClient,
|
||||
VonageClientListener,
|
||||
)
|
||||
|
||||
# the following "as" imports help to re-export these types and avoid type checking warnings
|
||||
# when importing these types from the main transport module
|
||||
from pipecat.transports.vonage.client import (
|
||||
SubscribeSettings as SubscribeSettings,
|
||||
)
|
||||
from pipecat.transports.vonage.client import (
|
||||
VonageException as VonageException,
|
||||
)
|
||||
from pipecat.transports.vonage.client import (
|
||||
VonageVideoConnectorTransportParams as VonageVideoConnectorTransportParams,
|
||||
)
|
||||
|
||||
|
||||
class VonageVideoConnectorInputTransport(BaseInputTransport):
|
||||
"""Input transport for Vonage, handling audio and video input from the Vonage session.
|
||||
|
||||
Receives audio and video from a Vonage Video session and pushes them as input frames.
|
||||
"""
|
||||
|
||||
_params: VonageVideoConnectorTransportParams
|
||||
|
||||
def __init__(self, client: VonageClient, params: VonageVideoConnectorTransportParams):
|
||||
"""Initialize the Vonage input transport.
|
||||
|
||||
Args:
|
||||
client: The VonageClient instance to use.
|
||||
params: Transport parameters for input configuration.
|
||||
"""
|
||||
super().__init__(params)
|
||||
self._initialized: bool = False
|
||||
self._client: VonageClient = client
|
||||
self._listener_id: int = -1
|
||||
self._connected: bool = False
|
||||
|
||||
async def start(self, frame: StartFrame) -> None:
|
||||
"""Start the Vonage input transport.
|
||||
|
||||
Args:
|
||||
frame: The StartFrame to initiate the transport.
|
||||
"""
|
||||
await super().start(frame)
|
||||
|
||||
if self._initialized:
|
||||
return
|
||||
|
||||
self._initialized = True
|
||||
|
||||
if self._params.audio_in_enabled or self._params.video_in_enabled:
|
||||
self._listener_id = self._client.add_listener(
|
||||
VonageClientListener(
|
||||
on_audio_in=self._audio_in_cb,
|
||||
on_video_in=self._video_in_cb,
|
||||
on_error=self._on_error_cb,
|
||||
)
|
||||
)
|
||||
try:
|
||||
await self._client.connect(frame)
|
||||
self._connected = True
|
||||
except Exception as exc:
|
||||
logger.error(f"Error connecting to Vonage session: {exc}")
|
||||
await self.push_error("Vonage video connector connection error", fatal=True)
|
||||
return
|
||||
|
||||
await self.set_transport_ready(frame)
|
||||
|
||||
async def setup(self, setup: FrameProcessorSetup) -> None:
|
||||
"""Set up the processor with required components.
|
||||
|
||||
Args:
|
||||
setup: Configuration object containing setup parameters.
|
||||
"""
|
||||
await super().setup(setup)
|
||||
await self._client.setup(setup)
|
||||
|
||||
async def cleanup(self) -> None:
|
||||
"""Cleanup input transport."""
|
||||
await super().cleanup() # type: ignore
|
||||
await self._client.cleanup()
|
||||
|
||||
async def _audio_in_cb(self, _session: Session, audio: InputAudioRawFrame) -> None:
|
||||
if self._connected and self._params.audio_in_enabled:
|
||||
await self.push_audio_frame(audio)
|
||||
|
||||
async def _video_in_cb(self, _subscriber: Subscriber, video: UserImageRawFrame) -> None:
|
||||
if self._connected and self._params.video_in_enabled:
|
||||
await self.push_video_frame(video)
|
||||
|
||||
async def _on_error_cb(self, session: Session, description: str, code: int) -> None:
|
||||
logger.error(
|
||||
f"Vonage input transport error session={session.id} code={code} description={description}"
|
||||
)
|
||||
if self._connected:
|
||||
await self.push_error("Vonage video connector error", fatal=True)
|
||||
|
||||
async def stop(self, frame: EndFrame) -> None:
|
||||
"""Stop the Vonage input transport.
|
||||
|
||||
Args:
|
||||
frame: The EndFrame to stop the transport.
|
||||
"""
|
||||
await super().stop(frame)
|
||||
await self._stop_client()
|
||||
|
||||
async def cancel(self, frame: CancelFrame) -> None:
|
||||
"""Cancel the Vonage input transport.
|
||||
|
||||
Args:
|
||||
frame: The CancelFrame to cancel the transport.
|
||||
"""
|
||||
await super().cancel(frame)
|
||||
await self._stop_client()
|
||||
|
||||
async def _stop_client(self) -> None:
|
||||
if self._connected:
|
||||
self._client.remove_listener(self._listener_id)
|
||||
self._connected = False
|
||||
try:
|
||||
await self._client.disconnect()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
async def subscribe_to_stream(self, stream_id: str, params: SubscribeSettings) -> None:
|
||||
"""Subscribe to a participant's stream.
|
||||
|
||||
Args:
|
||||
stream_id: The ID of the stream to subscribe to.
|
||||
params: Settings for the subscription.
|
||||
"""
|
||||
await self._client.subscribe_to_stream(stream_id, params)
|
||||
|
||||
|
||||
class VonageVideoConnectorOutputTransport(BaseOutputTransport):
|
||||
"""Output transport for Vonage, handling audio and video output to the Vonage session.
|
||||
|
||||
Sends audio and video frames to a Vonage Video session as output.
|
||||
"""
|
||||
|
||||
_params: VonageVideoConnectorTransportParams
|
||||
|
||||
def __init__(self, client: VonageClient, params: VonageVideoConnectorTransportParams):
|
||||
"""Initialize the Vonage output transport.
|
||||
|
||||
Args:
|
||||
client: The VonageClient instance to use.
|
||||
params: Transport parameters for output configuration.
|
||||
"""
|
||||
super().__init__(params)
|
||||
self._initialized: bool = False
|
||||
self._client = client
|
||||
self._connected: bool = False
|
||||
self._listener_id: int = -1
|
||||
|
||||
async def start(self, frame: StartFrame) -> None:
|
||||
"""Start the Vonage output transport.
|
||||
|
||||
Args:
|
||||
frame: The StartFrame to initiate the transport.
|
||||
"""
|
||||
await super().start(frame)
|
||||
|
||||
if self._initialized:
|
||||
return
|
||||
|
||||
self._initialized = True
|
||||
|
||||
if self._params.audio_out_enabled or self._params.video_out_enabled:
|
||||
self._listener_id = self._client.add_listener(
|
||||
VonageClientListener(on_error=self._on_error_cb)
|
||||
)
|
||||
try:
|
||||
await self._client.connect(frame)
|
||||
self._connected = True
|
||||
except Exception as exc:
|
||||
logger.error(f"Error connecting to Vonage session: {exc}")
|
||||
await self.push_error("Vonage video connector connection error", fatal=True)
|
||||
return
|
||||
|
||||
await self.set_transport_ready(frame)
|
||||
|
||||
async def setup(self, setup: FrameProcessorSetup) -> None:
|
||||
"""Set up the processor with required components.
|
||||
|
||||
Args:
|
||||
setup: Configuration object containing setup parameters.
|
||||
"""
|
||||
await super().setup(setup)
|
||||
await self._client.setup(setup)
|
||||
|
||||
async def cleanup(self) -> None:
|
||||
"""Cleanup output transport."""
|
||||
await super().cleanup() # type: ignore
|
||||
await self._client.cleanup()
|
||||
|
||||
async def process_frame(self, frame: Frame, direction: FrameDirection) -> None:
|
||||
"""Process a frame for the Vonage output transport.
|
||||
|
||||
Args:
|
||||
frame: The frame to process.
|
||||
direction: The direction of frame flow in the pipeline.
|
||||
"""
|
||||
await super().process_frame(frame, direction)
|
||||
|
||||
# if we get an interruption frame, we need to ensure the buffers inside Vonage Video Connector are cleared
|
||||
if (
|
||||
self._connected
|
||||
and isinstance(frame, InterruptionFrame)
|
||||
and self._params.clear_buffers_on_interruption
|
||||
):
|
||||
logger.info("Clearing Vonage media buffers due to interruption frame")
|
||||
self._client.clear_media_buffers()
|
||||
|
||||
async def write_audio_frame(self, frame: OutputAudioRawFrame) -> bool:
|
||||
"""Write an audio frame to the Vonage session.
|
||||
|
||||
Args:
|
||||
frame: The OutputAudioRawFrame to send.
|
||||
"""
|
||||
result = False
|
||||
if self._connected and self._params.audio_out_enabled:
|
||||
result = await self._client.write_audio(frame)
|
||||
|
||||
return result
|
||||
|
||||
async def write_video_frame(self, frame: OutputImageRawFrame) -> bool:
|
||||
"""Write a video frame to the transport.
|
||||
|
||||
Args:
|
||||
frame: The output video frame to write.
|
||||
"""
|
||||
result = False
|
||||
if self._connected and self._params.video_out_enabled:
|
||||
result = await self._client.write_video(frame)
|
||||
|
||||
return result
|
||||
|
||||
async def stop(self, frame: EndFrame) -> None:
|
||||
"""Stop the Vonage output transport.
|
||||
|
||||
Args:
|
||||
frame: The EndFrame to stop the transport.
|
||||
"""
|
||||
await super().stop(frame)
|
||||
await self._stop_client()
|
||||
|
||||
async def cancel(self, frame: CancelFrame) -> None:
|
||||
"""Cancel the Vonage output transport.
|
||||
|
||||
Args:
|
||||
frame: The CancelFrame to cancel the transport.
|
||||
"""
|
||||
await super().cancel(frame)
|
||||
await self._stop_client()
|
||||
|
||||
async def _stop_client(self) -> None:
|
||||
if self._connected:
|
||||
self._client.remove_listener(self._listener_id)
|
||||
self._connected = False
|
||||
try:
|
||||
await self._client.disconnect()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
async def _on_error_cb(self, session: Session, description: str, code: int) -> None:
|
||||
logger.error(
|
||||
f"Vonage output transport error session={session.id} code={code} description={description}"
|
||||
)
|
||||
if self._connected:
|
||||
await self.push_error("Vonage video connector error", fatal=True)
|
||||
|
||||
|
||||
class VonageVideoConnectorTransport(BaseTransport):
|
||||
"""Vonage Video Connector transport implementation for Pipecat.
|
||||
|
||||
Provides audio and video input/output transport for Vonage Video sessions, supporting
|
||||
event handling for session and participant lifecycle.
|
||||
|
||||
Supported features:
|
||||
|
||||
- Audio and video input and output transport for Vonage Video sessions
|
||||
- Event handler registration for session and participant events
|
||||
- Publisher and subscriber management
|
||||
- Configurable audio and migration parameters
|
||||
"""
|
||||
|
||||
_params: VonageVideoConnectorTransportParams
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
application_id: str,
|
||||
session_id: str,
|
||||
token: str,
|
||||
params: VonageVideoConnectorTransportParams,
|
||||
):
|
||||
"""Initialize the Vonage Video Connector transport.
|
||||
|
||||
Args:
|
||||
application_id: The Vonage Video application ID.
|
||||
session_id: The session ID to connect to.
|
||||
token: The authentication token for the session.
|
||||
params: Transport parameters for input/output configuration.
|
||||
"""
|
||||
super().__init__()
|
||||
self._params = params
|
||||
|
||||
self._client = VonageClient(application_id, session_id, token, params)
|
||||
|
||||
# Register supported handlers.
|
||||
self._register_event_handler("on_joined")
|
||||
self._register_event_handler("on_left")
|
||||
self._register_event_handler("on_error")
|
||||
self._register_event_handler("on_client_connected", sync=True)
|
||||
self._register_event_handler("on_client_disconnected")
|
||||
self._register_event_handler("on_first_participant_joined", sync=True)
|
||||
self._register_event_handler("on_participant_joined", sync=True)
|
||||
self._register_event_handler("on_participant_left")
|
||||
|
||||
self._client.add_listener(
|
||||
VonageClientListener(
|
||||
on_connected=self._on_connected,
|
||||
on_disconnected=self._on_disconnected,
|
||||
on_error=self._on_error,
|
||||
on_stream_received=self._on_stream_received,
|
||||
on_stream_dropped=self._on_stream_dropped,
|
||||
on_subscriber_connected=self._on_subscriber_connected,
|
||||
on_subscriber_disconnected=self._on_subscriber_disconnected,
|
||||
)
|
||||
)
|
||||
|
||||
self._input: VonageVideoConnectorInputTransport | None = None
|
||||
self._output: VonageVideoConnectorOutputTransport | None = None
|
||||
self._one_stream_received: bool = False
|
||||
|
||||
def input(self) -> FrameProcessor:
|
||||
"""Get the input transport for Vonage.
|
||||
|
||||
Returns:
|
||||
The VonageVideoConnectorInputTransport instance.
|
||||
"""
|
||||
if not self._input:
|
||||
self._input = VonageVideoConnectorInputTransport(self._client, self._params)
|
||||
return self._input
|
||||
|
||||
def output(self) -> FrameProcessor:
|
||||
"""Get the output transport for Vonage.
|
||||
|
||||
Returns:
|
||||
The VonageVideoConnectorOutputTransport instance.
|
||||
"""
|
||||
if not self._output:
|
||||
self._output = VonageVideoConnectorOutputTransport(self._client, self._params)
|
||||
return self._output
|
||||
|
||||
async def subscribe_to_stream(self, stream_id: str, params: SubscribeSettings) -> None:
|
||||
"""Subscribe to a participant's stream.
|
||||
|
||||
Args:
|
||||
stream_id: The ID of the stream to subscribe to.
|
||||
params: Settings for the subscription.
|
||||
"""
|
||||
if self._input:
|
||||
await self._input.subscribe_to_stream(stream_id, params)
|
||||
|
||||
async def _on_connected(self, session: Session) -> None:
|
||||
"""Handle session connected event.
|
||||
|
||||
Args:
|
||||
session: The connected Session object.
|
||||
"""
|
||||
await self._call_event_handler("on_joined", {"sessionId": session.id})
|
||||
|
||||
async def _on_disconnected(self, session: Session) -> None:
|
||||
"""Handle session disconnected event.
|
||||
|
||||
Args:
|
||||
session: The disconnected Session object.
|
||||
"""
|
||||
await self._call_event_handler("on_left", {"sessionId": session.id})
|
||||
|
||||
async def _on_error(self, _session: Session, description: str, _code: int) -> None:
|
||||
"""Handle session error event.
|
||||
|
||||
Args:
|
||||
_session: The Session object.
|
||||
description: Error description.
|
||||
_code: Error code.
|
||||
"""
|
||||
await self._call_event_handler("on_error", description)
|
||||
|
||||
async def _on_stream_received(self, session: Session, stream: Stream) -> None:
|
||||
"""Handle stream received event.
|
||||
|
||||
Args:
|
||||
session: The Session object.
|
||||
stream: The received Stream object.
|
||||
"""
|
||||
client = {
|
||||
"sessionId": session.id,
|
||||
"streamId": stream.id,
|
||||
"connectionData": stream.connection.data,
|
||||
}
|
||||
if not self._one_stream_received:
|
||||
self._one_stream_received = True
|
||||
await self._call_event_handler("on_first_participant_joined", client)
|
||||
|
||||
await self._call_event_handler("on_participant_joined", client)
|
||||
|
||||
async def _on_stream_dropped(self, session: Session, stream: Stream) -> None:
|
||||
"""Handle stream dropped event.
|
||||
|
||||
Args:
|
||||
session: The Session object.
|
||||
stream: The dropped Stream object.
|
||||
"""
|
||||
client = {
|
||||
"sessionId": session.id,
|
||||
"streamId": stream.id,
|
||||
"connectionData": stream.connection.data,
|
||||
}
|
||||
await self._call_event_handler("on_participant_left", client)
|
||||
|
||||
async def _on_subscriber_connected(self, subscriber: Subscriber) -> None:
|
||||
"""Handle subscriber connected event.
|
||||
|
||||
Args:
|
||||
subscriber: The connected Subscriber object.
|
||||
"""
|
||||
await self._call_event_handler(
|
||||
"on_client_connected",
|
||||
{
|
||||
"subscriberId": subscriber.stream.id,
|
||||
"streamId": subscriber.stream.id,
|
||||
"connectionData": subscriber.stream.connection.data,
|
||||
},
|
||||
)
|
||||
|
||||
async def _on_subscriber_disconnected(self, subscriber: Subscriber) -> None:
|
||||
"""Handle subscriber disconnected event.
|
||||
|
||||
Args:
|
||||
subscriber: The disconnected Subscriber object.
|
||||
"""
|
||||
await self._call_event_handler(
|
||||
"on_client_disconnected",
|
||||
{
|
||||
"subscriberId": subscriber.stream.id,
|
||||
"streamId": subscriber.stream.id,
|
||||
"connectionData": subscriber.stream.connection.data,
|
||||
},
|
||||
)
|
||||
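In `_on_stream_received` above, the first stream fires both `on_first_participant_joined` and `on_participant_joined`, while later streams fire only the latter. The sketch below reproduces just that dispatch logic without any Vonage or Pipecat dependency. `ParticipantEvents` and its recording `_call_event_handler` are hypothetical stand-ins, and the payload is reduced to the two ID fields.

```python
import asyncio


class ParticipantEvents:
    """Hypothetical stand-in that records which events would be dispatched."""

    def __init__(self):
        self._one_stream_received = False
        self.fired = []

    async def _call_event_handler(self, name, payload):
        # Assumption: the real handler invokes user callbacks; here we only record.
        self.fired.append((name, payload["streamId"]))

    async def on_stream_received(self, session_id, stream_id):
        client = {"sessionId": session_id, "streamId": stream_id}
        if not self._one_stream_received:
            self._one_stream_received = True
            await self._call_event_handler("on_first_participant_joined", client)
        await self._call_event_handler("on_participant_joined", client)


async def demo():
    events = ParticipantEvents()
    await events.on_stream_received("session", "s1")
    await events.on_stream_received("session", "s2")
    return events.fired


fired = asyncio.run(demo())
```

Because the flag is never cleared in the code shown, `on_first_participant_joined` fires once per transport instance, not once per time the session becomes empty and refills.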
@@ -20,6 +20,7 @@ from loguru import logger
|
||||
|
||||
from pipecat.frames.frames import (
|
||||
Frame,
|
||||
FunctionCallsStartedFrame,
|
||||
InterruptionFrame,
|
||||
LLMFullResponseEndFrame,
|
||||
LLMMarkerFrame,
|
||||
@@ -222,6 +223,14 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
|
||||
# ensures graceful degradation if the LLM disobeys and outputs additional text.
|
||||
self._turn_suppressed = False
|
||||
self._turn_complete_found = False # True when ✓ (COMPLETE) is detected
|
||||
# Set when the LLM made a tool call during this turn. Informational
|
||||
# only — broadcasting is idempotency-gated by
|
||||
# ``_turn_completion_broadcasted``.
|
||||
self._turn_had_function_call = False
|
||||
# True once ``UserTurnInferenceCompletedFrame`` has been broadcast
|
||||
# for this turn. Prevents double-broadcast when ✓ and a tool call
|
||||
# both occur in the same turn.
|
||||
self._turn_completion_broadcasted = False
|
||||
|
||||
# Timeout handling
|
||||
self._user_turn_completion_config = UserTurnCompletionConfig()
|
||||
@@ -236,6 +245,27 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
|
||||
"""
|
||||
self._user_turn_completion_config = config
|
||||
|
||||
async def _broadcast_turn_completion(self):
|
||||
"""Broadcast ``UserTurnInferenceCompletedFrame`` at most once per turn.
|
||||
|
||||
Called from the two places we know the LLM has committed to a
|
||||
response for the current user turn:
|
||||
|
||||
- the ``✓`` marker is detected in the text stream
|
||||
- a ``FunctionCallsStartedFrame`` is emitted — the LLM committed
|
||||
to a tool call before producing (or instead of) a marker.
|
||||
|
||||
Broadcasting on the tool-call path matters for races: the
|
||||
downstream ``UserStoppedSpeakingFrame`` needs to propagate
|
||||
before the function actually executes and a
|
||||
``FunctionCallResultFrame`` flows back to the assistant
|
||||
aggregator.
|
||||
"""
|
||||
if self._turn_completion_broadcasted:
|
||||
return
|
||||
self._turn_completion_broadcasted = True
|
||||
await self.broadcast_frame(UserTurnInferenceCompletedFrame)
|
||||
|
||||
async def _start_incomplete_timeout(self, incomplete_type: Literal["short", "long"]):
|
||||
"""Start a timeout task for incomplete turn handling.
|
||||
|
||||
@@ -325,6 +355,8 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
|
||||
self._turn_text_buffer = ""
|
||||
self._turn_suppressed = False
|
||||
self._turn_complete_found = False
|
||||
self._turn_had_function_call = False
|
||||
self._turn_completion_broadcasted = False
|
||||
|
||||
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
||||
"""Process frames, handling turn completion state resets.
|
||||
@@ -351,7 +383,14 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
|
||||
frame: The frame to push downstream.
|
||||
direction: The direction of frame flow. Defaults to downstream.
|
||||
"""
|
||||
-if isinstance(frame, LLMFullResponseEndFrame):
|
||||
if isinstance(frame, FunctionCallsStartedFrame):
|
||||
self._turn_had_function_call = True
|
||||
# Broadcast turn completion now, before the function dispatches
|
||||
# — gives ``UserStoppedSpeakingFrame`` maximum time to propagate
|
||||
# so the assistant aggregator's ``_user_speaking`` is False by
|
||||
# the time a ``FunctionCallResultFrame`` arrives.
|
||||
await self._broadcast_turn_completion()
|
||||
elif isinstance(frame, LLMFullResponseEndFrame):
|
||||
await self._turn_reset()
|
||||
|
||||
await super().push_frame(frame, direction)
|
||||
@@ -427,7 +466,9 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
|
||||
# LLMTurnCompletionUserTurnStopStrategy) can fire
|
||||
# `on_user_turn_stopped`. Must fire before the marker so
|
||||
# downstream consumers see the signal before the response.
|
||||
-await self.broadcast_frame(UserTurnInferenceCompletedFrame)
|
||||
# Idempotent: a tool call earlier in the turn may have
|
||||
# already broadcast.
|
||||
await self._broadcast_turn_completion()
|
||||
|
||||
# Push the marker as a sideband signal that the assistant
|
||||
# aggregator will prepend to the upcoming aggregated text,
|
||||
|
||||
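The once-per-turn guard introduced in the hunks above can be sketched in isolation. This is a minimal illustration, not the mixin itself: `TurnCompletionGuard`, its `sent` list, and the `source` argument are hypothetical names that stand in for the real `broadcast_frame` call and frame types.

```python
import asyncio


class TurnCompletionGuard:
    """Hypothetical stand-in for the mixin's once-per-turn broadcast guard."""

    def __init__(self) -> None:
        self.broadcasted = False
        self.sent: list[str] = []  # records each broadcast, for illustration only

    async def broadcast_turn_completion(self, source: str) -> None:
        # Return early if an earlier trigger in this turn already broadcast.
        if self.broadcasted:
            return
        self.broadcasted = True
        self.sent.append(source)

    def turn_reset(self) -> None:
        # Re-arm the guard for the next turn.
        self.broadcasted = False


async def demo() -> list[str]:
    guard = TurnCompletionGuard()
    # Turn 1: a tool call starts, then the marker appears in the text stream.
    await guard.broadcast_turn_completion("tool_call")
    await guard.broadcast_turn_completion("marker")
    guard.turn_reset()
    # Turn 2: only the marker is seen.
    await guard.broadcast_turn_completion("marker")
    return guard.sent


result = asyncio.run(demo())
print(result)
```

In this sketch the second trigger within a turn is a no-op, and the reset is what makes the next turn's broadcast possible.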
354
src/pipecat/utils/context/aggregated_frame_sequencer.py
Normal file
@@ -0,0 +1,354 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Ordered sequencer for AggregatedTextFrame slots through TTS processing."""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.frames.frames import (
|
||||
AggregatedTextFrame,
|
||||
AggregationType,
|
||||
Frame,
|
||||
TTSTextFrame,
|
||||
)
|
||||
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
|
||||
|
||||
|
||||
@dataclass
|
||||
class _AggregatedFrameSlot:
|
||||
"""Ordered slot tracking one AggregatedTextFrame through TTS processing.
|
||||
|
||||
Every frame that passes through _push_tts_frames — whether spoken or skipped —
|
||||
occupies a slot in the sequencer. Skipped frames wait at their position and are
|
||||
emitted downstream only after all preceding spoken slots are complete, preserving
|
||||
correct context ordering.
|
||||
"""
|
||||
|
||||
frame: AggregatedTextFrame
|
||||
context_id: str
|
||||
spoken: bool
|
||||
tracker: WordCompletionTracker | None = None
|
||||
transport_destination: str | None = None
|
||||
complete: bool = False
|
||||
|
||||
|
||||
class AggregatedFrameSequencer:
|
||||
"""Sequences AggregatedTextFrame slots to preserve TTS context ordering.
|
||||
|
||||
Manages an ordered queue of spoken and skipped TTS slots. Spoken slots are tracked
|
||||
via a :class:`WordCompletionTracker`; skipped slots (e.g. code blocks excluded from
|
||||
TTS synthesis) wait in-place until all preceding spoken slots are complete, then are
|
||||
flushed downstream with ``append_to_context=True``.
|
||||
|
||||
All methods are synchronous and return lists of frames the caller should push
|
||||
downstream, making the sequencer fully testable without any async machinery.
|
||||
|
||||
Example::
|
||||
|
||||
sequencer = AggregatedFrameSequencer()
|
||||
sequencer.register_spoken(frame, ctx_id, tracker, append_to_context=True)
|
||||
for f in sequencer.process_word("hello", pts=1000, context_id=ctx_id):
|
||||
await self.push_frame(f)
|
||||
"""
|
||||
|
||||
def __init__(self, name: str = "AggregatedFrameSequencer"):
|
||||
"""Initialize the sequencer.
|
||||
|
||||
Args:
|
||||
name: Label used in log messages (typically the owning TTS service name).
|
||||
"""
|
||||
self._name = name
|
||||
self._slots: list[_AggregatedFrameSlot] = []
|
||||
self._context_append_to_context: dict[str, bool] = {}
|
||||
|
||||
def register_spoken(
|
||||
self,
|
||||
frame: AggregatedTextFrame,
|
||||
context_id: str,
|
||||
tracker: WordCompletionTracker | None,
|
||||
append_to_context: bool,
|
||||
) -> None:
|
||||
"""Register a spoken AggregatedTextFrame slot.
|
||||
|
||||
Called from _push_tts_frames for frames sent to the TTS service. The slot is
|
||||
marked complete either via :meth:`process_word` (word-timestamp services) or
|
||||
:meth:`complete_spoken_slot` (push_text_frames=True services).
|
||||
|
||||
Args:
|
||||
frame: The AggregatedTextFrame being spoken.
|
||||
context_id: The TTS context ID assigned to this frame.
|
||||
tracker: WordCompletionTracker for word-timestamp services; None for
|
||||
push_text_frames=True services (they complete via complete_spoken_slot).
|
||||
append_to_context: Whether word frames built for this context should carry
|
||||
append_to_context=True.
|
||||
"""
|
||||
self._context_append_to_context[context_id] = append_to_context
|
||||
self._slots.append(
|
||||
_AggregatedFrameSlot(
|
||||
frame=frame,
|
||||
context_id=context_id,
|
||||
spoken=True,
|
||||
tracker=tracker,
|
||||
)
|
||||
)
|
||||
|
||||
def register_skipped(
|
||||
self,
|
||||
frame: AggregatedTextFrame,
|
||||
context_id: str,
|
||||
transport_destination: str | None,
|
||||
) -> list[Frame]:
|
||||
"""Register a skipped AggregatedTextFrame and attempt an immediate flush.
|
||||
|
||||
The frame is appended as a skipped slot. If no incomplete spoken slot precedes
|
||||
it, the frame is returned right away; otherwise it waits until a later
|
||||
:meth:`flush` unblocks it.
|
||||
|
||||
Args:
|
||||
frame: The skipped AggregatedTextFrame (e.g. a code block).
|
||||
context_id: The context ID assigned in _push_tts_frames.
|
||||
transport_destination: Transport routing value to attach at flush time.
|
||||
|
||||
Returns:
|
||||
Frames to push downstream (empty when blocked by a preceding spoken slot).
|
||||
"""
|
||||
frame.context_id = context_id
|
||||
self._slots.append(
|
||||
_AggregatedFrameSlot(
|
||||
frame=frame,
|
||||
context_id=context_id,
|
||||
spoken=False,
|
||||
transport_destination=transport_destination,
|
||||
)
|
||||
)
|
||||
return self.flush()
|
||||
|
||||
def process_word(
|
||||
self,
|
||||
word: str,
|
||||
pts: int,
|
||||
context_id: str | None,
|
||||
) -> list[Frame]:
|
||||
"""Process one word-timestamp event and return frames to push downstream.
|
||||
|
||||
Locates the active (first incomplete spoken) slot with a tracker, advances it
|
||||
by the incoming word, and builds a :class:`TTSTextFrame`. Handles:
|
||||
|
||||
- Normal words that fit entirely within the active slot.
|
||||
- Overflow words straddling two slot boundaries.
|
||||
- Force-complete when the TTS drops an event (word belongs to the next slot).
|
||||
- Passthrough for words not recognised by any slot.
|
||||
- Flushes any skipped slots unblocked by slot completion.
|
||||
|
||||
Args:
|
||||
word: A word token from the TTS service word-timestamp stream.
|
||||
pts: Presentation timestamp (nanoseconds) to assign to the frame.
|
||||
context_id: TTS context ID from the word-timestamp event.
|
||||
|
||||
Returns:
|
||||
Ordered list of frames (TTSTextFrame and/or AggregatedTextFrame) to push.
|
||||
"""
|
||||
active = self._get_active_slot()
|
||||
is_complete = False
|
||||
raw_overflow_word = None
|
||||
|
||||
if active and active.tracker:
|
||||
if not active.tracker.word_belongs_here(word):
|
||||
next_slot = self._get_next_active_slot(active)
|
||||
word_fits_next = (
|
||||
next_slot is not None
|
||||
and next_slot.tracker is not None
|
||||
and next_slot.tracker.word_belongs_here(word)
|
||||
)
|
||||
if not word_fits_next:
|
||||
logger.warning(
|
||||
f"{self._name} Word '{word}' not recognised by any slot, "
|
||||
"emitting as passthrough"
|
||||
)
|
||||
return [self._build_word_frame(word, pts, context_id)]
|
||||
|
||||
is_complete = active.tracker.add_word_and_check_complete(word)
|
||||
raw_overflow_word = active.tracker.get_overflow_word()
|
||||
|
||||
frame_text = (
|
||||
active.tracker.get_word_for_frame() if (active and active.tracker) else word
|
||||
) or word
|
||||
raw_text = active.tracker.get_llm_consumed() if (active and active.tracker) else None
|
||||
emit_context_id = active.context_id if active else context_id
|
||||
|
||||
# logger.debug(f"{self._name} Word '{word}' → frame_text='{frame_text}', raw='{raw_text}'")
|
||||
frames: list[Frame] = [
|
||||
self._build_word_frame(frame_text, pts, emit_context_id, raw_text=raw_text)
|
||||
]
|
||||
|
||||
if is_complete and active:
|
||||
active.complete = True
|
||||
frames.extend(self.flush(last_word_pts=pts))
|
||||
if raw_overflow_word:
|
||||
logger.debug(f"{self._name} Emitting overflow word '{raw_overflow_word}'")
|
||||
frames.extend(self._process_overflow(raw_overflow_word, pts))
|
||||
|
||||
return frames
|
||||
|
||||
def complete_spoken_slot(self) -> list[Frame]:
|
||||
"""Mark the first pending spoken slot complete and flush unblocked skipped frames.
|
||||
|
||||
Used by push_text_frames=True services: after the TTSTextFrame has been appended
|
||||
to the audio context, this marks the spoken slot done and releases any skipped
|
||||
frames waiting behind it.
|
||||
|
||||
Returns:
|
||||
AggregatedTextFrame(s) that are now unblocked and should be pushed.
|
||||
"""
|
||||
slot = next((s for s in self._slots if s.spoken and not s.complete), None)
|
||||
if slot:
|
||||
slot.complete = True
|
||||
return self.flush()
|
||||
|
||||
def flush(self, last_word_pts: int | None = None) -> list[Frame]:
|
||||
"""Walk the slot queue and return all skipped frames that are now unblocked.
|
||||
|
||||
Removes complete spoken slots from the head of the queue, then emits (and
|
||||
removes) skipped slots whose preceding spoken slots are all done. Stops at
|
||||
the first incomplete spoken slot.
|
||||
|
||||
Args:
|
||||
last_word_pts: When provided, skipped frames receive this PTS so they
|
||||
appear immediately after the last spoken word in the timeline.
|
||||
|
||||
Returns:
|
||||
AggregatedTextFrame(s) ready to be pushed downstream.
|
||||
"""
|
||||
frames: list[Frame] = []
|
||||
while self._slots:
|
||||
slot = self._slots[0]
|
||||
if slot.spoken and slot.complete:
|
||||
self._slots.pop(0)
|
||||
elif not slot.spoken and not slot.complete:
|
||||
slot.frame.append_to_context = True
|
||||
slot.frame.transport_destination = slot.transport_destination
|
||||
if last_word_pts is not None:
|
||||
slot.frame.pts = last_word_pts
|
||||
logger.debug(f"{self._name}: Flushing Aggregated Frame {slot.frame}")
|
||||
frames.append(slot.frame)
|
||||
slot.complete = True
|
||||
self._slots.pop(0)
|
||||
else:
|
||||
break # spoken but not yet complete — wait
|
||||
return frames
|
||||
|
||||
def force_complete(self, last_word_pts: int) -> list[Frame]:
|
||||
"""Force-complete all incomplete spoken slots and flush skipped frames.
|
||||
|
||||
Called at the end of an audio context to handle TTS providers that silently drop
|
||||
word-timestamp events. Emits a TTSTextFrame for any remaining unspoken text in
|
||||
each incomplete slot, marks it complete, then flushes all now-unblocked skipped
|
||||
frames.
|
||||
|
||||
Args:
|
||||
last_word_pts: PTS of the last received word frame, used as the PTS for
|
||||
force-completed frames and forwarded to :meth:`flush`.
|
||||
|
||||
Returns:
|
||||
Combined list of TTSTextFrames (for incomplete spoken slots) and
|
||||
AggregatedTextFrames (skipped slots now unblocked), in emission order.
|
||||
"""
|
||||
frames: list[Frame] = []
|
||||
for slot in self._slots:
|
||||
if slot.spoken and not slot.complete:
|
||||
if slot.tracker:
|
||||
remaining_text = slot.tracker.get_remaining_tts_text()
|
||||
raw_remaining = slot.tracker.get_remaining_llm_text()
|
||||
if raw_remaining and remaining_text and remaining_text not in raw_remaining:
|
||||
logger.warning(
|
||||
f"{self._name} force-complete: raw_remaining {repr(raw_remaining)} "
|
||||
f"does not contain remaining_text {repr(remaining_text)}, discarding"
|
||||
)
|
||||
raw_remaining = None
|
||||
if remaining_text:
|
||||
logger.debug(
|
||||
f"{self._name} force-completing slot with remaining text "
|
||||
f"{repr(remaining_text)}"
|
||||
)
|
||||
frames.append(
|
||||
self._build_word_frame(
|
||||
remaining_text,
|
||||
last_word_pts,
|
||||
slot.context_id,
|
||||
raw_text=raw_remaining,
|
||||
)
|
||||
)
|
||||
slot.complete = True
|
||||
frames.extend(self.flush(last_word_pts=last_word_pts))
|
||||
return frames
|
||||
|
||||
def clear(self) -> None:
|
||||
"""Clear all slots and context metadata (called on interruption/reset)."""
|
||||
self._slots.clear()
|
||||
self._context_append_to_context.clear()
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def _get_active_slot(self) -> _AggregatedFrameSlot | None:
|
||||
"""Return the first incomplete spoken slot that has a tracker."""
|
||||
return next(
|
||||
(s for s in self._slots if s.spoken and not s.complete and s.tracker is not None),
|
||||
None,
|
||||
)
|
||||
|
||||
def _get_next_active_slot(self, current: _AggregatedFrameSlot) -> _AggregatedFrameSlot | None:
|
||||
"""Return the first incomplete spoken slot with a tracker after *current*."""
|
||||
found = False
|
||||
for s in self._slots:
|
||||
if s is current:
|
||||
found = True
|
||||
continue
|
||||
if found and s.spoken and not s.complete and s.tracker is not None:
|
||||
return s
|
||||
return None
|
||||
|
||||
def _build_word_frame(
|
||||
self,
|
||||
text: str,
|
||||
pts: int,
|
||||
context_id: str | None,
|
||||
raw_text: str | None = None,
|
||||
) -> Frame:
|
||||
"""Build a TTSTextFrame with all standard word-timestamp attributes set."""
|
||||
frame = TTSTextFrame(text, aggregated_by=AggregationType.WORD)
|
||||
frame.pts = pts
|
||||
frame.context_id = context_id
|
||||
frame.append_to_context = (
|
||||
self._context_append_to_context.get(context_id, True)
|
||||
if context_id is not None
|
||||
else True
|
||||
)
|
||||
frame.raw_text = raw_text
|
||||
return frame
|
||||
|
||||
def _process_overflow(self, raw_overflow_word: str, pts: int) -> list[Frame]:
|
||||
"""Feed an overflow suffix into the next active slot and return resulting frames."""
|
||||
frames: list[Frame] = []
|
||||
next_active = self._get_active_slot()
|
||||
if not next_active or not next_active.tracker:
|
||||
return frames
|
||||
overflow_complete = next_active.tracker.add_word_and_check_complete(raw_overflow_word)
|
||||
frames.append(
|
||||
self._build_word_frame(
|
||||
raw_overflow_word,
|
||||
pts,
|
||||
next_active.context_id,
|
||||
raw_text=next_active.tracker.get_llm_consumed(),
|
||||
)
|
||||
)
|
||||
if overflow_complete:
|
||||
next_active.complete = True
|
||||
frames.extend(self.flush(last_word_pts=pts))
|
||||
return frames
|
||||
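The ordering rule that `flush` enforces can be shown with a much smaller model. The sketch below is an assumption-laden simplification: `Slot` and this standalone `flush` are hypothetical, carry only a label instead of real frames, and omit trackers, PTS handling, and transport destinations.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical, simplified slot: a label plus spoken/complete flags."""

    label: str
    spoken: bool
    complete: bool = False


def flush(slots: list[Slot]) -> list[str]:
    """Pop finished spoken slots and release skipped slots until blocked."""
    released: list[str] = []
    while slots:
        head = slots[0]
        if head.spoken and head.complete:
            slots.pop(0)  # finished spoken slot: nothing to emit
        elif not head.spoken:
            released.append(head.label)  # skipped slot is now unblocked
            slots.pop(0)
        else:
            break  # spoken but incomplete: everything behind it waits
    return released


slots = [
    Slot("intro", spoken=True),
    Slot("code-block", spoken=False),
    Slot("outro", spoken=True),
]
blocked = flush(slots)       # "intro" is still being spoken, nothing is released
slots[0].complete = True     # the last word of "intro" has arrived
unblocked = flush(slots)     # "code-block" is released; "outro" now heads the queue
remaining = [s.label for s in slots]
```

The skipped slot is held until the spoken slot ahead of it completes, which is what keeps skipped text from reaching the context before the speech that precedes it.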
489
src/pipecat/utils/context/word_completion_tracker.py
Normal file
@@ -0,0 +1,489 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Word completion tracker for TTS context ordering."""
|
||||
|
||||
import re
|
||||
import unicodedata
|
||||
|
||||
from loguru import logger
|
||||
|
||||
|
||||
class WordCompletionTracker:
|
||||
"""Tracks whether all words from a source AggregatedTextFrame have been spoken.
|
||||
|
||||
Compares normalized alphanumeric character counts between the TTS text and
|
||||
accumulated spoken words, making the check robust to punctuation, spacing,
|
||||
and XML/HTML tags (e.g. SSML tags like ``<spell>...</spell>`` returned by some
|
||||
TTS providers in word-timestamp events).
|
||||
|
||||
When ``llm_text`` is provided (e.g. the original pattern-matched text including
|
||||
delimiters like ``<card>4111 1111 1111 1111</card>``), the tracker additionally
|
||||
maps each spoken word back to its corresponding span in that LLM text. This
|
||||
lets callers attach the original text to ``TTSTextFrame`` entries so the
|
||||
conversation context receives properly-tagged content rather than the cleaned
|
||||
words received from the TTS provider.
|
||||
|
||||
Background: TTS providers apply their own SSML tags to the text before
|
||||
synthesis and return word-timestamp events containing the raw spoken words
|
||||
(e.g. ``"4111"``, ``"1111"``). Without LLM-text tracking, the conversation
|
||||
context would only see those cleaned words and lose the original structure
|
||||
(e.g. ``<card>4111 1111 1111 1111</card>``). By mapping normalized char counts
|
||||
back to positions in ``llm_text``, each TTSTextFrame can carry the exact span
|
||||
of original text it represents.
|
||||
|
||||
Overflow handling: TTS providers sometimes return a single word token that
|
||||
spans the boundary between two AggregatedTextFrames (e.g. ``"1111</spell>And"``
|
||||
when one frame ends with ``1111</card>`` and the next begins with ``And``). The
|
||||
tracker detects this and exposes the raw overflow suffix via ``get_overflow_word()``,
|
||||
so callers can feed the remainder into the next frame's tracker and emit a
|
||||
correctly-attributed TTSTextFrame for each part.
|
||||
|
||||
Example::
|
||||
|
||||
tracker = WordCompletionTracker("Hello, world!")
|
||||
tracker.add_word_and_check_complete("Hello") # False
|
||||
tracker.add_word_and_check_complete("world") # True — normalized "helloworld" >= "helloworld"
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
tts_text: str,
|
||||
llm_text: str | None = None,
|
||||
):
|
||||
"""Initialize the tracker with the text of the frame being spoken.
|
||||
|
||||
Args:
|
||||
tts_text: Full text of the AggregatedTextFrame sent to TTS (may include
|
||||
TTS-specific SSML tags). Used for normalized char-count completion
|
||||
tracking and as the cursor reference for the TTS word stream.
|
||||
llm_text: Original LLM-produced text including pattern delimiters (e.g.
|
||||
``<card>4111 1111 1111 1111</card>``). When provided, each
|
||||
``add_word_and_check_complete`` call also returns the corresponding
|
||||
LLM span via ``get_llm_consumed()``. Both texts normalize to the
|
||||
same alphanumeric sequence, so the same char-count cursor drives
|
||||
position tracking in both.
|
||||
"""
|
||||
self._tts_normalized = self._normalize(tts_text)
|
||||
self._received = ""
|
||||
|
||||
# _tts_text is the original tts_text before normalization.
|
||||
# _tts_pos is a cursor into it, advanced by the same alnum count
|
||||
# as the TTS word stream, so the force-complete path can emit the remaining
|
||||
# unspoken text as a TTSTextFrame instead of silently dropping it.
|
||||
self._tts_text = tts_text
|
||||
self._tts_pos = 0
|
||||
|
||||
# _llm_text is the original LLM-produced text (with pattern delimiters like
|
||||
# <card>...</card>). We track _llm_pos as a cursor into it, advancing
|
||||
# by the same number of alphanumeric chars consumed from the TTS word stream.
|
||||
self._llm_text = llm_text
|
||||
self._llm_pos = 0
|
||||
|
||||
self._overflow_word: str | None = None
|
||||
self._llm_consumed: str | None = None
|
||||
self._frame_word: str | None = None
|
||||
|
||||
@staticmethod
|
||||
def _normalize(text: str) -> str:
|
||||
"""Strip XML/HTML tags then keep only lowercase alphanumeric characters.
|
||||
|
||||
Accented letters (e.g. ã, é) are reduced to their base letter so TTS output
|
||||
can be matched against LLM text even when the provider strips diacritics.
|
||||
Non-Latin scripts (CJK, Hangul) are kept as-is — each original character
|
||||
contributes exactly one char to the result, keeping normalized length in sync
|
||||
with raw alnum counts used by _advance_by_alnums.
|
||||
"""
|
||||
text = re.sub(r"<[^>]+>", "", text)
|
||||
result = []
|
||||
for char in text:
|
||||
# Ignore punctuation, spaces, emojis, etc.
|
||||
# Keep only letters and numbers.
|
||||
if not char.isalnum():
|
||||
continue
|
||||
# NFD decomposes accented characters into:
|
||||
# é -> e + ◌́
|
||||
# ã -> a + ◌̃
|
||||
#
|
||||
# Non-accented characters usually stay unchanged.
|
||||
nfd = unicodedata.normalize("NFD", char)
|
||||
# Unicode category "Mn" means:
|
||||
# Mark, Nonspacing
|
||||
#
|
||||
# These are combining accent marks that modify
|
||||
# the previous character but are not standalone.
|
||||
#
|
||||
# Example:
|
||||
# "é" becomes:
|
||||
# nfd[0] = "e"
|
||||
# nfd[1] = "◌́" (category = "Mn")
|
||||
#
|
||||
# If the second character is a combining accent,
|
||||
# keep only the base letter.
|
||||
if len(nfd) >= 2 and unicodedata.category(nfd[1]) == "Mn":
|
||||
# Accented letter: keep the base character only (drops the combining mark).
|
||||
result.append(nfd[0].lower())
|
||||
else:
|
||||
# Regular ASCII, numbers, CJK, Hangul, etc.
|
||||
# are kept unchanged (except lowercase conversion).
|
||||
result.append(char.lower())
|
||||
return "".join(result)
|
||||
|
||||
# Typographic variants that LLMs commonly emit but TTS services normalize away.
|
||||
_TYPOGRAPHY_FOLD = str.maketrans(
|
||||
{
|
||||
"‘": "'", # ' LEFT SINGLE QUOTATION MARK
|
||||
"’": "'", # ' RIGHT SINGLE QUOTATION MARK
|
||||
"ʼ": "'", # ʼ MODIFIER LETTER APOSTROPHE
|
||||
"“": '"', # " LEFT DOUBLE QUOTATION MARK
|
||||
"”": '"', # " RIGHT DOUBLE QUOTATION MARK
|
||||
"–": "-", # – EN DASH
|
||||
"—": "-", # — EM DASH
|
||||
}
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _fold_typography(text: str) -> str:
|
||||
"""Replace typographic punctuation variants with their ASCII equivalents."""
|
||||
return text.translate(WordCompletionTracker._TYPOGRAPHY_FOLD)
|
||||
|
||||
@staticmethod
|
||||
def _remove_trailing_punctuation(text: str) -> str:
|
||||
"""Remove punctuation only at the very end of the given text."""
|
||||
i = len(text)
|
||||
while i > 0 and unicodedata.category(text[i - 1]).startswith("P"):
|
||||
i -= 1
|
||||
return text[:i]
|
||||
|
||||
@staticmethod
|
||||
def _advance_by_alnums(text: str, start_pos: int, n: int) -> int:
|
||||
"""Return the position in *text* after advancing past *n* alphanumeric chars.
|
||||
|
||||
Moves through the text one character at a time, counting only alphanumeric
|
||||
characters. XML/HTML tags (``<...>``) are skipped entirely — their content
|
||||
is not counted against the budget, so the returned span includes the full tag.
|
||||
Other non-alphanumeric characters (spaces, punctuation) are also passed over
|
||||
without decrementing the budget.
|
||||
|
||||
After the *n* alnum chars are consumed, advances further past any immediately
|
||||
following punctuation (e.g. the ``,`` in ``"questions,"`` or the ``.`` in
|
||||
``"done."``), stopping before the next space, alnum char, or XML tag.
|
||||
|
||||
Args:
|
||||
text: The source text to scan.
|
||||
start_pos: Starting position in *text*.
|
||||
n: Number of alphanumeric characters to consume.
|
||||
"""
|
||||
pos = start_pos
|
||||
count = 0
|
||||
while pos < len(text) and count < n:
|
||||
if text[pos] == "<":
|
||||
end = text.find(">", pos)
|
||||
pos = end + 1 if end != -1 else pos + 1
|
||||
elif text[pos].isalnum():
|
||||
count += 1
|
||||
pos += 1
|
||||
else:
|
||||
pos += 1
|
||||
|
||||
while pos < len(text):
|
||||
if text[pos] == "<":
|
||||
break
|
||||
if text[pos].isalnum() or text[pos].isspace():
|
||||
break
|
||||
pos += 1
|
||||
|
||||
return pos
|
||||
|
||||
def add_word_and_check_complete(self, word: str) -> bool:
|
||||
"""Record a spoken word from a word-timestamp event.
|
||||
|
||||
Normalizes ``word``, appends it to the running total, and checks whether
|
||||
all expected alphanumeric characters have been covered.
|
||||
|
||||
Before advancing, checks whether the word belongs to this frame via
|
||||
``word_belongs_here``. If it does not (e.g. the TTS provider silently
|
||||
dropped a word-timestamp), the slot is force-completed: the remaining
|
||||
unspoken text from ``tts_text`` is stored in ``_frame_word`` so a
|
||||
TTSTextFrame can still be emitted for the dropped portion, all remaining
|
||||
``llm_text`` is consumed, and the entire incoming word is set as overflow
|
||||
so the caller's overflow path routes it to the next slot unchanged.
|
||||
|
||||
If ``llm_text`` was provided at construction time, also advances the LLM
|
||||
cursor by the same number of alphanumeric chars consumed from this word and
|
||||
stores the corresponding LLM span in ``_llm_consumed``. When this word
|
||||
completes the frame, the entire remaining LLM text (including any closing
|
||||
tags) is consumed so nothing is lost.
|
||||
|
||||
If the word overshoots the expected length (overflow), the raw suffix of
|
||||
the word (everything after the last char belonging to this frame) is stored
|
||||
in ``_overflow_word``, so the caller can attribute it to the next
|
||||
AggregatedTextFrame.
|
||||
|
||||
Args:
|
||||
word: A single word token returned by the TTS service. TTS services that
|
||||
emit spaces and punctuation as separate tokens (e.g. Inworld) must
|
||||
pre-merge those tokens into the preceding word before calling this
|
||||
method (see ``TTSService._merge_punct_tokens``).
|
||||
|
||||
Returns:
|
||||
True when all expected content has been covered.
|
||||
"""
|
||||
normalized = self._normalize(word)
|
||||
|
||||
prev_len = len(self._received)
|
||||
expected_len = len(self._tts_normalized)
|
||||
|
||||
self._overflow_word = None
|
||||
self._llm_consumed = None
|
||||
self._frame_word = None
|
||||
|
||||
if prev_len > expected_len:
|
||||
logger.warning(f"{self}, trying to add a word in an already complete frame")
|
||||
return True
|
||||
|
||||
# If the word doesn't match the next expected chars, the TTS provider
|
||||
# likely dropped a word-timestamp event. Force-complete this slot: emit the
|
||||
# remaining TTS text as _frame_word so a TTSTextFrame is still produced
|
||||
# for the unspoken portion, consume all remaining llm_text, and route the
|
||||
# entire incoming word as overflow for the next slot.
|
||||
if not self.word_belongs_here(word):
|
||||
self._frame_word = self._tts_text[self._tts_pos :]
|
||||
if self._llm_text is not None:
|
||||
self._llm_consumed = self._llm_text[self._llm_pos :]
|
||||
self._llm_pos = len(self._llm_text)
|
||||
# This should not happen: force-complete sweeps all remaining
|
||||
# llm_text, so the span must contain the frame word. If it
|
||||
# doesn't, tts_text and llm_text are out of sync in an
|
||||
# unexpected way — discard rather than returning a corrupt span.
|
||||
# Trailing punctuation is stripped from the frame word before the
# comparison, since some TTS services may add punctuation that is
# not in the raw text.
|
||||
word_without_punctuation = self._remove_trailing_punctuation(self._frame_word)
|
||||
if word_without_punctuation and word_without_punctuation not in self._llm_consumed:
|
||||
logger.warning(
|
||||
f"WordCompletionTracker: force-complete llm_consumed {repr(self._llm_consumed)!s} "
|
||||
f"does not contain frame_word {repr(self._frame_word)!s}, discarding"
|
||||
)
|
||||
self._llm_consumed = None
|
||||
self._received = self._tts_normalized # force-complete
|
||||
self._overflow_word = word
|
||||
return True
|
||||
|
||||
self._received += normalized
|
||||
|
||||
# How many normalized chars from this word belong to the current frame.
|
||||
chars_for_frame = min(len(normalized), expected_len - prev_len)
|
||||
|
||||
if prev_len + len(normalized) > expected_len:
|
||||
# This word straddles the frame boundary. Split into:
|
||||
# - _frame_word: the prefix of `word` up to the split point, used
|
||||
# for the TTSTextFrame of the current slot.
|
||||
# - raw overflow word: the raw suffix after the split point, used
|
||||
# to build a TTSTextFrame attributed to the next AggregatedTextFrame.
|
||||
split_pos = self._advance_by_alnums(word, 0, chars_for_frame)
|
||||
self._frame_word = word[:split_pos]
|
||||
self._overflow_word = word[split_pos:]
|
||||
else:
|
||||
# Word fits entirely in this frame.
|
||||
self._frame_word = word
|
||||
|
||||
# Advance the TTS cursor by the same alnum count so the force-complete
|
||||
# path knows where in _tts_text to start from.
|
||||
self._tts_pos = self._advance_by_alnums(self._tts_text, self._tts_pos, chars_for_frame)
|
||||
|
||||
if self._llm_text is not None:
|
||||
if self.is_complete:
|
||||
# Consume ALL remaining LLM text: closing tags (e.g. </card>)
|
||||
# and any trailing punctuation that the TTS will not send separately.
|
||||
self._llm_consumed = self._llm_text[self._llm_pos :]
|
||||
self._llm_pos = len(self._llm_text)
|
||||
else:
|
||||
if chars_for_frame == 0:
|
||||
# Consume exactly the raw word in llm_text, skipping any
|
||||
# leading spaces that belong to the previous token's span.
|
||||
start = self._llm_pos
|
||||
while start < len(self._llm_text) and self._llm_text[start].isspace():
|
||||
start += 1
|
||||
end = start + len(word)
|
||||
self._llm_consumed = self._llm_text[start:end]
|
||||
self._llm_pos = end
|
||||
else:
|
||||
# Advance through llm_text by exactly chars_for_frame alphanumeric
|
||||
# chars. Non-alnum chars (spaces, opening tags) are included in the
|
||||
# slice, preserving the original formatting for the context.
|
||||
new_pos = self._advance_by_alnums(
|
||||
self._llm_text, self._llm_pos, chars_for_frame
|
||||
)
|
||||
self._llm_consumed = self._llm_text[self._llm_pos : new_pos]
|
||||
self._llm_pos = new_pos
|
||||
# This should not happen: the LLM cursor is driven by the same
|
||||
# alnum count as the word stream, so the consumed span must contain
|
||||
# the frame word. If it doesn't, the cursors drifted out of sync
|
||||
# in an unexpected way — discard rather than returning a corrupt span.
|
||||
# Trailing punctuation is stripped from the frame word before the
# comparison, since some TTS services may add punctuation that is
# not in the raw text.
|
||||
word_without_punctuation = self._remove_trailing_punctuation(self._frame_word)
|
||||
if word_without_punctuation and self._fold_typography(
|
||||
word_without_punctuation
|
||||
) not in self._fold_typography(self._llm_consumed):
|
||||
logger.warning(
|
||||
f"WordCompletionTracker: llm_consumed {repr(self._llm_consumed)!s} "
|
||||
f"does not contain frame_word {repr(self._frame_word)!s}, discarding"
|
||||
)
|
||||
self._llm_consumed = None
|
||||
|
||||
return self.is_complete
|
||||
|
||||
def word_belongs_here(self, word: str) -> bool:
|
||||
"""Return True if this word plausibly belongs to the remaining TTS text.
|
||||
|
||||
Dispatches to one of two checks depending on whether the word contains
|
||||
any alphanumeric characters after normalization:
|
||||
|
||||
- Alnum words: prefix-match against the remaining expected chars.
|
||||
- Symbol/punctuation words (empty after normalization): literal substring
|
||||
search in the remaining raw TTS text, with a fallback for TTS providers
|
||||
that substitute Unicode symbols with ASCII punctuation.
|
||||
|
||||
Used to detect when the TTS provider silently dropped a word-timestamp
|
||||
event: if the incoming word does not match this slot's remaining content,
|
||||
the caller should force-complete this slot and route the word to the next.
|
||||
"""
|
||||
normalized = self._normalize(word)
|
||||
if normalized:
|
||||
return self._alnum_word_belongs_here(normalized)
|
||||
else:
|
||||
return self._symbol_word_belongs_here(word)
|
||||
|
||||
def _alnum_word_belongs_here(self, normalized: str) -> bool:
|
||||
"""Return True if an alnum-containing word matches this frame's remaining expected chars.
|
||||
|
||||
Accepts both full words and partial tokens — the word belongs here as long
|
||||
as its normalized characters are a prefix of what is still expected. This
|
||||
also handles the overflow case where the word is longer than the remaining
|
||||
content (the excess is detected and split in ``add_word_and_check_complete``).
|
||||
"""
|
||||
remaining = self._tts_normalized[len(self._received) :]
|
||||
if not remaining:
|
||||
return False
|
||||
check_len = min(len(normalized), len(remaining))
|
||||
return remaining.startswith(normalized[:check_len])
|
||||
|
||||
def _symbol_word_belongs_here(self, word: str) -> bool:
|
||||
"""Return True if a non-alnum word (emoji, punctuation, symbol) belongs to this frame.
|
||||
|
||||
Two checks are applied in order:
|
||||
|
||||
1. **Literal substring**: search for the raw word in the remaining TTS text.
|
||||
``_advance_by_alnums`` may have already moved ``_tts_pos`` past some trailing
|
||||
punctuation, so the search window is backed up to include those characters.
|
||||
|
||||
2. **Symbol substitution fallback**: some TTS providers substitute Unicode symbols
|
||||
with ASCII punctuation in word-timestamp events (e.g. ElevenLabs reports ``→``
|
||||
as ``-``), so check 1 always fails even though the word belongs here. If alnum
|
||||
content still remains unconsumed and the next non-space character in the TTS
|
||||
text is itself a non-alnum symbol, accept the word as a substitution.
|
||||
"""
|
||||
search_start = self._tts_pos
|
||||
while search_start > 0:
|
||||
ch = self._tts_text[search_start - 1]
|
||||
if ch.isalnum() or ch.isspace() or ch == ">":
|
||||
break
|
||||
search_start -= 1
|
||||
if word in self._tts_text[search_start:]:
|
||||
return True
|
||||
|
||||
if len(self._received) >= len(self._tts_normalized):
|
||||
return False
|
||||
|
||||
pos = self._tts_pos
|
||||
while pos < len(self._tts_text) and self._tts_text[pos].isspace():
|
||||
pos += 1
|
||||
return pos < len(self._tts_text) and not self._tts_text[pos].isalnum()
|
||||
|
||||
def get_word_for_frame(self) -> str | None:
|
||||
"""Return the portion of the last word that belongs to this frame.
|
||||
|
||||
- Normal word (no overflow): the full word.
|
||||
- Straddling word: the prefix up to the frame boundary (e.g. ``"1111"``
|
||||
from ``"1111 And"``).
|
||||
- Force-completed (word didn't belong): the remaining unspoken text from
|
||||
``tts_text`` so a TTSTextFrame can still be emitted for the dropped
|
||||
portion. The incoming word is routed as overflow to the next slot.
|
||||
"""
|
||||
return self._frame_word.strip() if self._frame_word else self._frame_word
|
||||
|
||||
def get_overflow_word(self) -> str | None:
|
||||
"""Return the raw suffix of the last word that overflows into the next frame.
|
||||
|
||||
Preserves the original casing and any non-alphanumeric characters so the
|
||||
overflow TTSTextFrame has natural word text. Returns None when there is no
|
||||
overflow (the word fit entirely within this frame).
|
||||
"""
|
||||
return self._overflow_word.strip() if self._overflow_word else self._overflow_word
|
||||
|
||||
def get_llm_consumed(self) -> str | None:
|
||||
"""Return the LLM text span consumed for the last added word.
|
||||
|
||||
Returns None if no llm_text was provided at construction time.
|
||||
"""
|
||||
return self._llm_consumed.strip() if self._llm_consumed else self._llm_consumed
|
||||
|
||||
def get_accumulated_tts_text(self) -> str:
|
||||
"""Return all consumed text from tts_text up to the current cursor position.
|
||||
|
||||
Unlike ``get_word_for_frame()`` (which reflects only the last word), this returns
|
||||
everything that has been consumed since construction or the last ``reset()``.
|
||||
"""
|
||||
return self._tts_text[: self._tts_pos]
|
||||
|
||||
def get_accumulated_llm_text(self) -> str | None:
|
||||
"""Return all consumed text from llm_text up to the current cursor position.
|
||||
|
||||
Unlike ``get_llm_consumed()`` (which reflects only the last word), this returns
|
||||
everything that has been consumed since construction or the last ``reset()``.
|
||||
Returns None if no llm_text was provided at construction time.
|
||||
"""
|
||||
if self._llm_text is None:
|
||||
return None
|
||||
return self._llm_text[: self._llm_pos]
|
||||
|
||||
def get_remaining_tts_text(self) -> str:
|
||||
"""Return the unspoken portion of tts_text, stripped of leading/trailing whitespace.
|
||||
|
||||
This is the text that the TTS provider has not yet confirmed via word-timestamp
|
||||
events. Useful for force-completing a slot when the audio context ends before all
|
||||
word-timestamp events have arrived.
|
||||
"""
|
||||
return self._tts_text[self._tts_pos :].strip()
|
||||
|
||||
def get_remaining_llm_text(self) -> str | None:
|
||||
"""Return the unspoken portion of llm_text, stripped of leading/trailing whitespace.
|
||||
|
||||
Returns None if no llm_text was provided at construction time or nothing remains. Like
|
||||
``get_remaining_tts_text()``, intended for force-completing a slot so that the
|
||||
conversation context receives the full original text.
|
||||
"""
|
||||
if self._llm_text is None:
|
||||
return None
|
||||
remaining = self._llm_text[self._llm_pos :].strip()
|
||||
return remaining if remaining else None
|
||||
|
||||
@property
|
||||
def is_complete(self) -> bool:
|
||||
"""True when accumulated normalized chars >= expected normalized chars."""
|
||||
return len(self._received) >= len(self._tts_normalized)
|
||||
|
||||
def reset(self):
|
||||
"""Reset received word accumulation without changing the expected text."""
|
||||
self._received = ""
|
||||
self._tts_pos = 0
|
||||
self._llm_pos = 0
|
||||
self._overflow_word = None
|
||||
self._llm_consumed = None
|
||||
self._frame_word = None
|
||||
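The prefix check performed by `_alnum_word_belongs_here` can be tried in isolation. The following is a minimal sketch, not the tracker itself: `alnum_word_belongs` is a hypothetical free function, and its arguments are assumed to be already normalized, since the behaviour of `_normalize` is not shown in this part of the diff.

```python
def alnum_word_belongs(expected: str, received: str, word: str) -> bool:
    """Hypothetical standalone version of the prefix check.

    expected: normalized text the slot should eventually receive.
    received: normalized characters accumulated so far.
    word:     normalized incoming word (assumed non-empty).
    """
    # Everything the slot is still waiting for.
    remaining = expected[len(received):]
    if not remaining:
        # A slot that is already complete accepts nothing more.
        return False
    # Compare only as many characters as both sides have, so a word that
    # overflows past the end of the slot still counts as belonging here.
    check_len = min(len(word), len(remaining))
    return remaining.startswith(word[:check_len])


next_word = alnum_word_belongs("helloworld", "hello", "world")  # True
stray_word = alnum_word_belongs("helloworld", "hello", "zzz")   # False
overflow = alnum_word_belongs("abc", "", "abcdef")              # True
after_done = alnum_word_belongs("abc", "abc", "def")            # False
```

The overflow case returns True because only the first three characters are compared; splitting off the excess is left to the caller, as the docstring describes for `add_word_and_check_complete`.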
53
src/pipecat/utils/text/word_timestamp_utils.py
Normal file
@@ -0,0 +1,53 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Utilities for normalizing word-timestamp streams from TTS services."""
|
||||
|
||||
import re
|
||||
|
||||
|
||||
def merge_punct_tokens(
|
||||
word_times: list[tuple[str, float]],
|
||||
) -> list[tuple[str, float]]:
|
||||
"""Merge punctuation/space-only tokens into the preceding word.
|
||||
|
||||
Some TTS services (e.g. Inworld) emit spaces and punctuation as separate
|
||||
word-timestamp tokens rather than attaching them to the adjacent word.
|
||||
This function collapses those tokens so downstream consumers always receive
|
||||
words with trailing punctuation already attached — identical to the format
|
||||
produced by ElevenLabs or Cartesia.
|
||||
|
||||
A token is considered punct/space-only when its text contains no alphanumeric
|
||||
characters after stripping XML/HTML tags. Such tokens are appended to the
|
||||
preceding word's text and their timestamp is discarded (the preceding word's
|
||||
timestamp is kept). Leading punct/space tokens with no preceding word are
|
||||
silently discarded. Every output token is stripped of leading and trailing
|
||||
whitespace (spaces, tabs, newlines).
|
||||
|
||||
Args:
|
||||
word_times: Raw list of ``(word, timestamp)`` pairs from the TTS service.
|
||||
|
||||
Returns:
|
||||
Merged list where every entry contains at least one alphanumeric character
|
||||
and has no leading or trailing whitespace.
|
||||
|
||||
Example::
|
||||
|
||||
merge_punct_tokens([("questions", 1.0), (", ", 1.2), ("explain", 1.4)])
|
||||
# → [("questions,", 1.0), ("explain", 1.4)]
|
||||
"""
|
||||
merged: list[tuple[str, float]] = []
|
||||
for word, ts in word_times:
|
||||
stripped = re.sub(r"<[^>]+>", "", word)
|
||||
has_alnum = any(c.isalnum() for c in stripped)
|
||||
if not has_alnum:
|
||||
if merged:
|
||||
prev_word, prev_ts = merged[-1]
|
||||
merged[-1] = (prev_word + word, prev_ts)
|
||||
# else: leading punct/space with no preceding word → discard
|
||||
else:
|
||||
merged.append((word, ts))
|
||||
return [(word.strip(), ts) for word, ts in merged]
|
||||
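The merging rules in the docstring can be checked with a short standalone run. The function body below is copied from the diff (type hints dropped) so that it runs without pipecat installed; the sample tokens and timestamps are made up for illustration and do not come from any real TTS service.

```python
import re


def merge_punct_tokens(word_times):
    # Same logic as the function in the diff above.
    merged = []
    for word, ts in word_times:
        # Ignore XML/HTML tags when deciding whether the token has content.
        stripped = re.sub(r"<[^>]+>", "", word)
        if not any(c.isalnum() for c in stripped):
            if merged:
                # Attach the punct/space token to the previous word and
                # keep that word's timestamp.
                prev_word, prev_ts = merged[-1]
                merged[-1] = (prev_word + word, prev_ts)
            # Leading punct/space with no previous word is dropped.
        else:
            merged.append((word, ts))
    return [(word.strip(), ts) for word, ts in merged]


# Invented input: a leading space, then punctuation and spaces as separate tokens.
raw = [(" ", 0.0), ("Hello", 0.1), (",", 0.3), (" ", 0.35), ("world", 0.4), ("!", 0.7)]
merged_tokens = merge_punct_tokens(raw)
# merged_tokens == [("Hello,", 0.1), ("world!", 0.4)]
```

The leading space is discarded, the comma and the following space fold into `"Hello"` (the trailing space is removed by the final `strip()`), and each merged word keeps the timestamp of its first token.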
612
tests/test_aggregated_frame_sequencer.py
Normal file
@@ -0,0 +1,612 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Tests for AggregatedFrameSequencer.
|
||||
|
||||
All methods on the sequencer are synchronous and return lists of frames,
|
||||
so no async machinery is needed here.
|
||||
|
||||
Test groups:
|
||||
- register_skipped: immediate flush vs. blocked by a preceding spoken slot
|
||||
- register_spoken / complete_spoken_slot: push_text_frames=True path
|
||||
- flush: pts propagation, transport_destination, stops at incomplete spoken slot
|
||||
- process_word: normal, completing, passthrough, raw_text propagation
|
||||
- process_word overflow: single token spanning two slot boundaries
|
||||
- process_word force-complete via word_belongs_here failure
|
||||
- force_complete: remaining text emission, raw_text, corrupt raw discard, slot ordering
|
||||
- clear: resets all state
|
||||
"""
|
||||
|
||||
import unittest
|
||||
|
||||
from pipecat.frames.frames import AggregatedTextFrame, AggregationType, TTSTextFrame
|
||||
from pipecat.utils.context.aggregated_frame_sequencer import AggregatedFrameSequencer
|
||||
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _seq() -> AggregatedFrameSequencer:
|
||||
return AggregatedFrameSequencer(name="test")
|
||||
|
||||
|
||||
def _spoken_frame(text: str) -> AggregatedTextFrame:
|
||||
return AggregatedTextFrame(text, AggregationType.SENTENCE)
|
||||
|
||||
|
||||
def _skipped_frame(text: str) -> AggregatedTextFrame:
|
||||
return AggregatedTextFrame(text, "code")
|
||||
|
||||
|
||||
def _tracker(tts_text: str, llm_text: str | None = None) -> WordCompletionTracker:
|
||||
return WordCompletionTracker(tts_text, llm_text=llm_text)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# register_skipped
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRegisterSkipped(unittest.TestCase):
|
||||
def test_emits_immediately_with_empty_queue(self):
|
||||
seq = _seq()
|
||||
frame = _skipped_frame("code block")
|
||||
result = seq.register_skipped(frame, "ctx1", None)
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIs(result[0], frame)
|
||||
|
||||
def test_sets_append_to_context_true(self):
|
||||
seq = _seq()
|
||||
frame = _skipped_frame("code")
|
||||
seq.register_skipped(frame, "ctx1", None)
|
||||
self.assertTrue(frame.append_to_context)
|
||||
|
||||
def test_sets_context_id_on_frame(self):
|
||||
seq = _seq()
|
||||
frame = _skipped_frame("code")
|
||||
seq.register_skipped(frame, "ctx42", None)
|
||||
self.assertEqual(frame.context_id, "ctx42")
|
||||
|
||||
def test_sets_transport_destination(self):
|
||||
seq = _seq()
|
||||
frame = _skipped_frame("code")
|
||||
result = seq.register_skipped(frame, "ctx1", "dest-A")
|
||||
self.assertEqual(result[0].transport_destination, "dest-A")
|
||||
|
||||
def test_blocked_by_incomplete_spoken_slot(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
|
||||
result = seq.register_skipped(_skipped_frame("code"), "ctx2", None)
|
||||
self.assertEqual(result, [])
|
||||
|
||||
def test_emits_immediately_after_already_complete_spoken_slot(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hi"), "ctx1", tracker=None, append_to_context=True)
|
||||
seq.complete_spoken_slot()
|
||||
result = seq.register_skipped(_skipped_frame("code"), "ctx2", None)
|
||||
self.assertEqual(len(result), 1)
|
||||
|
||||
def test_multiple_skipped_before_any_spoken_all_emit(self):
|
||||
seq = _seq()
|
||||
r1 = seq.register_skipped(_skipped_frame("code1"), "ctx1", None)
|
||||
r2 = seq.register_skipped(_skipped_frame("code2"), "ctx2", None)
|
||||
self.assertEqual(len(r1), 1)
|
||||
self.assertEqual(len(r2), 1)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# register_spoken / complete_spoken_slot (push_text_frames=True path)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestCompleteSpokenSlot(unittest.TestCase):
|
||||
def test_noop_with_empty_queue(self):
|
||||
seq = _seq()
|
||||
self.assertEqual(seq.complete_spoken_slot(), [])
|
||||
|
||||
def test_marks_slot_complete_and_flushes_skipped(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None) # blocked
|
||||
|
||||
result = seq.complete_spoken_slot()
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIs(result[0], skipped)
|
||||
self.assertTrue(skipped.append_to_context)
|
||||
|
||||
def test_only_first_pending_slot_is_marked(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("one"), "ctx1", tracker=None, append_to_context=True)
|
||||
seq.register_spoken(_spoken_frame("two"), "ctx2", tracker=None, append_to_context=True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx3", None)
|
||||
|
||||
# ctx2 still blocks the skipped frame
|
||||
result = seq.complete_spoken_slot()
|
||||
self.assertEqual(result, [])
|
||||
|
||||
def test_skipped_flushes_after_all_preceding_spoken_complete(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("one"), "ctx1", tracker=None, append_to_context=True)
|
||||
seq.register_spoken(_spoken_frame("two"), "ctx2", tracker=None, append_to_context=True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx3", None)
|
||||
|
||||
seq.complete_spoken_slot() # completes ctx1
|
||||
result = seq.complete_spoken_slot() # completes ctx2 → flush skipped
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIs(result[0], skipped)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# flush
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestFlush(unittest.TestCase):
|
||||
def test_empty_queue_returns_empty(self):
|
||||
self.assertEqual(_seq().flush(), [])
|
||||
|
||||
def test_stops_at_incomplete_spoken_slot(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
|
||||
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
|
||||
self.assertEqual(seq.flush(), [])
|
||||
|
||||
def test_last_word_pts_assigned_to_skipped_frame(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
# process_word("hello") completes the spoken slot and calls flush(last_word_pts=77)
|
||||
result = seq.process_word("hello", pts=77, context_id="ctx1")
|
||||
flushed = [f for f in result if isinstance(f, AggregatedTextFrame) and f.text == "code"]
|
||||
self.assertEqual(len(flushed), 1)
|
||||
self.assertEqual(flushed[0].pts, 77)
|
||||
|
||||
def test_complete_spoken_slots_are_swept(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
|
||||
seq.complete_spoken_slot()
|
||||
# Queue should be empty after sweeping the complete spoken slot
|
||||
self.assertEqual(seq._slots, [])
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# process_word — basic
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestProcessWordBasic(unittest.TestCase):
|
||||
def _seq_with_spoken(self, text: str, ctx: str = "ctx1", append: bool = True):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame(text), ctx, _tracker(text), append)
|
||||
return seq
|
||||
|
||||
def test_returns_tts_text_frame(self):
|
||||
seq = self._seq_with_spoken("hello")
|
||||
result = seq.process_word("hello", pts=100, context_id="ctx1")
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], TTSTextFrame)
|
||||
|
||||
def test_frame_text_and_pts(self):
|
||||
seq = self._seq_with_spoken("hello")
|
||||
result = seq.process_word("hello", pts=100, context_id="ctx1")
|
||||
self.assertEqual(result[0].text, "hello")
|
||||
self.assertEqual(result[0].pts, 100)
|
||||
|
||||
def test_frame_context_id(self):
|
||||
seq = self._seq_with_spoken("hello", ctx="ctx99")
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx99")
|
||||
self.assertEqual(result[0].context_id, "ctx99")
|
||||
|
||||
def test_append_to_context_true(self):
|
||||
seq = self._seq_with_spoken("hello", append=True)
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx1")
|
||||
self.assertTrue(result[0].append_to_context)
|
||||
|
||||
def test_append_to_context_false(self):
|
||||
seq = self._seq_with_spoken("hello", append=False)
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx1")
|
||||
self.assertFalse(result[0].append_to_context)
|
||||
|
||||
def test_non_completing_word_does_not_flush_skipped(self):
|
||||
seq = self._seq_with_spoken("hello world")
|
||||
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
|
||||
result = seq.process_word("hello", pts=10, context_id="ctx1")
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], TTSTextFrame)
|
||||
|
||||
def test_completing_word_flushes_blocked_skipped_frame(self):
|
||||
seq = self._seq_with_spoken("hello")
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
result = seq.process_word("hello", pts=50, context_id="ctx1")
|
||||
self.assertEqual(len(result), 2)
|
||||
self.assertIsInstance(result[0], TTSTextFrame)
|
||||
self.assertIs(result[1], skipped)
|
||||
|
||||
def test_last_of_multiple_words_flushes_skipped(self):
|
||||
seq = self._seq_with_spoken("hello world")
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
seq.process_word("hello", pts=10, context_id="ctx1")
|
||||
result = seq.process_word("world", pts=20, context_id="ctx1")
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
|
||||
def test_no_active_slot_emits_passthrough(self):
|
||||
seq = _seq()
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx-unknown")
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], TTSTextFrame)
|
||||
self.assertEqual(result[0].text, "hello")
|
||||
self.assertEqual(result[0].context_id, "ctx-unknown")
|
||||
|
||||
def test_passthrough_uses_default_append_to_context_true(self):
|
||||
seq = _seq()
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx-unknown")
|
||||
self.assertTrue(result[0].append_to_context)
|
||||
|
||||
def test_unrecognised_word_emits_passthrough(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
|
||||
# "zzz" doesn't belong to "hello world" and there is no next slot
|
||||
result = seq.process_word("zzz", pts=5, context_id="ctx1")
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertEqual(result[0].text, "zzz")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# process_word — raw_text propagation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestProcessWordRawText(unittest.TestCase):
|
||||
def test_raw_text_split_across_word_frames(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(
|
||||
_spoken_frame("4111 1111"),
|
||||
"ctx1",
|
||||
WordCompletionTracker("4111 1111", llm_text="<card>4111 1111</card>"),
|
||||
append_to_context=True,
|
||||
)
|
||||
r1 = seq.process_word("4111", pts=10, context_id="ctx1")
|
||||
r2 = seq.process_word("1111", pts=20, context_id="ctx1")
|
||||
self.assertEqual(r1[0].raw_text, "<card>4111")
|
||||
last_word_frames = [f for f in r2 if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(last_word_frames[0].raw_text, "1111</card>")
|
||||
|
||||
def test_raw_text_none_when_no_llm_text(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx1")
|
||||
self.assertIsNone(result[0].raw_text)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# process_word — overflow (single token spanning two slots)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestProcessWordOverflow(unittest.TestCase):
|
||||
def test_overflow_produces_two_tts_text_frames(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
|
||||
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
|
||||
|
||||
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
|
||||
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(word_frames), 2)
|
||||
self.assertEqual(word_frames[0].text, "abc")
|
||||
self.assertEqual(word_frames[1].text, "def")
|
||||
|
||||
def test_overflow_assigns_correct_context_ids(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
|
||||
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
|
||||
|
||||
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
|
||||
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(word_frames[0].context_id, "ctx1")
|
||||
self.assertEqual(word_frames[1].context_id, "ctx2")
|
||||
|
||||
def test_overflow_completing_next_slot_flushes_skipped(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
|
||||
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx3", None) # blocked behind ctx2
|
||||
|
||||
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
|
||||
def test_overflow_not_completing_next_slot_does_not_flush_skipped(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
|
||||
seq.register_spoken(_spoken_frame("def ghi"), "ctx2", _tracker("def ghi"), True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx3", None)
|
||||
|
||||
# "abcdef" overflows: "def" goes to ctx2, but ctx2 still expects " ghi"
|
||||
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
|
||||
self.assertFalse(any(f is skipped for f in result))
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# process_word — force-complete via word_belongs_here failure
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestProcessWordForcesComplete(unittest.TestCase):
|
||||
def test_word_for_next_slot_force_completes_current(self):
|
||||
"""When a word belongs to the next slot but not the current, the current
|
||||
slot is force-completed and the word is routed to the next slot."""
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
|
||||
|
||||
# "world" doesn't belong to ctx1 but belongs to ctx2
|
||||
result = seq.process_word("world", pts=50, context_id="ctx2")
|
||||
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
texts = {f.text for f in word_frames}
|
||||
self.assertIn("world", texts)
|
||||
|
||||
def test_force_complete_then_overflow_flushes_skipped(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx3", None)
|
||||
|
||||
# "world" force-completes ctx1 and completes ctx2 via overflow
|
||||
result = seq.process_word("world", pts=50, context_id="ctx2")
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# force_complete
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestForceComplete(unittest.TestCase):
|
||||
def test_emits_remaining_text_when_word_dropped(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
|
||||
seq.process_word("hello", pts=10, context_id="ctx1") # "world" never arrives
|
||||
|
||||
result = seq.force_complete(last_word_pts=50)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "world")
|
||||
self.assertEqual(tts_frames[0].pts, 50)
|
||||
|
||||
def test_emits_full_text_when_no_words_arrived(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
|
||||
|
||||
result = seq.force_complete(last_word_pts=0)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "hello world")
|
||||
|
||||
def test_already_complete_slot_emits_nothing(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hi"), "ctx1", _tracker("hi"), True)
|
||||
seq.process_word("hi", pts=5, context_id="ctx1") # completes normally
|
||||
|
||||
result = seq.force_complete(last_word_pts=10)
|
||||
self.assertEqual(result, [])
|
||||
|
||||
def test_flushes_skipped_frames_after_completing(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
result = seq.force_complete(last_word_pts=20)
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
self.assertTrue(skipped.append_to_context)
|
||||
|
||||
def test_propagates_raw_text(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(
|
||||
_spoken_frame("4111 1111"),
|
||||
"ctx1",
|
||||
WordCompletionTracker("4111 1111", llm_text="<card>4111 1111</card>"),
|
||||
append_to_context=True,
|
||||
)
|
||||
seq.process_word("4111", pts=10, context_id="ctx1") # "1111" never arrives
|
||||
|
||||
result = seq.force_complete(last_word_pts=20)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(tts_frames[0].text, "1111")
|
||||
self.assertEqual(tts_frames[0].raw_text, "1111</card>")
|
||||
|
||||
def test_discards_corrupt_raw_remaining(self):
|
||||
"""raw_remaining is discarded when it does not contain remaining_text."""
|
||||
seq = _seq()
|
||||
# llm_text "xyz" shares no characters with tts_text "abc", so raw_remaining cannot contain remaining_text
|
||||
seq.register_spoken(
|
||||
_spoken_frame("abc"),
|
||||
"ctx1",
|
||||
WordCompletionTracker("abc", llm_text="xyz"),
|
||||
append_to_context=True,
|
||||
)
|
||||
result = seq.force_complete(last_word_pts=0)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "abc")
|
||||
self.assertIsNone(tts_frames[0].raw_text) # discarded due to corruption
|
||||
|
||||
def test_slot_without_tracker_just_marks_complete_and_flushes(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
|
||||
skipped = _skipped_frame("code")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
result = seq.force_complete(last_word_pts=0)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(tts_frames, []) # no tracker → no word frame
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
|
||||
def test_multiple_incomplete_slots_all_emitted(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
|
||||
|
||||
result = seq.force_complete(last_word_pts=0)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
texts = {f.text for f in tts_frames}
|
||||
self.assertIn("hello", texts)
|
||||
self.assertIn("world", texts)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# clear
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestClear(unittest.TestCase):
|
||||
def test_clears_slots(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
|
||||
seq.clear()
|
||||
self.assertEqual(seq._slots, [])
|
||||
|
||||
def test_clears_context_map(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.clear()
|
||||
self.assertEqual(seq._context_append_to_context, {})
|
||||
|
||||
def test_after_clear_skipped_emits_immediately(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.clear()
|
||||
frame = _skipped_frame("code")
|
||||
result = seq.register_skipped(frame, "ctx2", None)
|
||||
self.assertEqual(len(result), 1)
|
||||
|
||||
def test_after_clear_process_word_uses_passthrough(self):
|
||||
seq = _seq()
|
||||
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
|
||||
seq.clear()
|
||||
result = seq.process_word("hello", pts=1, context_id="ctx1")
|
||||
# No active slot after clear → passthrough
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertEqual(result[0].text, "hello")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CJK languages — Korean, Japanese, Chinese
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestCJKLanguages(unittest.TestCase):
|
||||
"""Sequencer behaviour for CJK language scenarios.
|
||||
|
||||
Korean: Cartesia returns each word as a separate timestamp event (one word
|
||||
per process_word call). Japanese/Chinese: Cartesia merges all characters
|
||||
in one timestamp message into a single combined token before calling
|
||||
process_word.
|
||||
"""
|
||||
|
||||
# --- Korean ---
|
||||
|
||||
def test_korean_word_by_word_completes_slot_and_flushes_skipped(self):
|
||||
"""Korean words fed one at a time complete the spoken slot and unblock a skipped frame."""
|
||||
seq = _seq()
|
||||
sentence = "저는 여러분의 AI 어시스턴트입니다."
|
||||
words = ["저는", "여러분의", "AI", "어시스턴트입니다."]
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
skipped = _skipped_frame("[code]")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
# Skipped stays blocked until the last word arrives
|
||||
for word in words[:-1]:
|
||||
partial = seq.process_word(word, pts=100, context_id="ctx1")
|
||||
self.assertFalse(any(f is skipped for f in partial))
|
||||
|
||||
result = seq.process_word(words[-1], pts=200, context_id="ctx1")
|
||||
self.assertTrue(any(f is skipped for f in result))
|
||||
|
||||
def test_korean_force_complete_emits_correct_remaining_text(self):
|
||||
"""After one Korean word, force_complete emits the correct unspoken suffix."""
|
||||
seq = _seq()
|
||||
sentence = "저는 여러분의 AI 어시스턴트입니다."
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
seq.process_word("저는", pts=10, context_id="ctx1")
|
||||
|
||||
result = seq.force_complete(last_word_pts=50)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "여러분의 AI 어시스턴트입니다.")
|
||||
self.assertEqual(tts_frames[0].pts, 50)
|
||||
|
||||
# --- Japanese ---
|
||||
|
||||
def test_japanese_combined_groups_complete_spoken_slot(self):
|
||||
"""Two Cartesia-style combined Japanese groups complete the slot and flush skipped."""
|
||||
seq = _seq()
|
||||
sentence = "こんにちは、私はあなたの"
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
skipped = _skipped_frame("[skipped]")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
r1 = seq.process_word("こんにちは、私", pts=100, context_id="ctx1")
|
||||
self.assertFalse(any(f is skipped for f in r1))
|
||||
|
||||
r2 = seq.process_word("はあなたの", pts=200, context_id="ctx1")
|
||||
self.assertTrue(any(f is skipped for f in r2))
|
||||
|
||||
def test_japanese_force_complete_emits_remaining_chars(self):
|
||||
"""After the first Japanese combined group, force_complete emits the rest."""
|
||||
seq = _seq()
|
||||
sentence = "こんにちは、私はあなたの"
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
seq.process_word("こんにちは、私", pts=10, context_id="ctx1")
|
||||
|
||||
result = seq.force_complete(last_word_pts=50)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "はあなたの")
|
||||
|
||||
# --- Chinese ---
|
||||
|
||||
def test_chinese_combined_groups_complete_spoken_slot(self):
|
||||
"""Two Cartesia-style combined Chinese groups complete the slot and flush skipped."""
|
||||
seq = _seq()
|
||||
sentence = "你好,我是你的智能"
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
skipped = _skipped_frame("[skipped]")
|
||||
seq.register_skipped(skipped, "ctx2", None)
|
||||
|
||||
r1 = seq.process_word("你好,我是", pts=100, context_id="ctx1")
|
||||
self.assertFalse(any(f is skipped for f in r1))
|
||||
|
||||
r2 = seq.process_word("你的智能", pts=200, context_id="ctx1")
|
||||
self.assertTrue(any(f is skipped for f in r2))
|
||||
|
||||
def test_chinese_force_complete_emits_remaining_chars(self):
|
||||
"""After the first Chinese combined group, force_complete emits the rest."""
|
||||
seq = _seq()
|
||||
sentence = "你好,我是你的智能"
|
||||
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
|
||||
seq.process_word("你好,我是", pts=10, context_id="ctx1")
|
||||
|
||||
result = seq.force_complete(last_word_pts=50)
|
||||
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
|
||||
self.assertEqual(len(tts_frames), 1)
|
||||
self.assertEqual(tts_frames[0].text, "你的智能")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
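The CJK tests above depend on working out which part of a sentence is still unspoken after a partial run of tokens, whether those tokens are space-separated Korean words or combined Japanese/Chinese groups. A minimal sketch of that idea follows. `remaining_after` is a hypothetical helper written for illustration; it is not the tracker used by the code under test, and matching by non-whitespace characters is an assumption.

```python
def remaining_after(sentence: str, spoken_tokens: list[str]) -> str:
    """Return the part of `sentence` not yet covered by `spoken_tokens`.

    Hypothetical helper, for illustration only. Tokens are matched character
    by character while skipping whitespace on both sides, so the same logic
    handles word-per-token input and combined character groups.
    """
    pos = 0
    for token in spoken_tokens:
        for ch in token:
            if ch.isspace():
                continue
            # Skip whitespace in the sentence before comparing characters.
            while pos < len(sentence) and sentence[pos].isspace():
                pos += 1
            if pos >= len(sentence) or sentence[pos] != ch:
                raise ValueError(f"token {token!r} does not match the sentence at index {pos}")
            pos += 1
    # Drop the separator that may precede the next unspoken word.
    return sentence[pos:].lstrip()
```

With the sentences from the tests, one Korean word leaves the other three words, and one combined Japanese or Chinese group leaves the second group.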
|
||||
45
tests/test_cartesia_stt.py
Normal file
@@ -0,0 +1,45 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
from unittest.mock import AsyncMock
|
||||
|
||||
import pytest
|
||||
from websockets.protocol import State
|
||||
|
||||
from pipecat.services.cartesia.stt import CartesiaSTTService
|
||||
|
||||
|
||||
class _FakeWebsocket:
|
||||
def __init__(self, *, state=State.OPEN, send_side_effect=None):
|
||||
self.state = state
|
||||
self.send = AsyncMock(side_effect=send_side_effect)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cartesia_connect_failure_clears_stale_websocket(monkeypatch):
|
||||
async def fake_websocket_connect(*args, **kwargs):
|
||||
raise RuntimeError("connection failed")
|
||||
|
||||
monkeypatch.setattr("pipecat.services.cartesia.stt.websocket_connect", fake_websocket_connect)
|
||||
|
||||
service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
|
||||
service._websocket = _FakeWebsocket(state=State.CLOSED)
|
||||
|
||||
await service._connect_websocket()
|
||||
|
||||
assert service._websocket is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cartesia_run_stt_logs_send_failure_without_clearing_websocket():
|
||||
service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
|
||||
websocket = _FakeWebsocket(send_side_effect=RuntimeError("websocket closed"))
|
||||
service._websocket = websocket
|
||||
|
||||
async for _ in service.run_stt(b"\x00" * 160):
|
||||
pass
|
||||
|
||||
assert service._websocket is websocket
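The two tests above pin down two behaviours: a failed connect must drop any stale socket without raising, and a failed send must be logged without clearing the socket. A minimal sketch of that contract follows. `ReconnectingClient` and its method names are hypothetical and written for illustration; this is not the `CartesiaSTTService` implementation.

```python
import asyncio
import logging

logger = logging.getLogger("stale_websocket_sketch")


class ReconnectingClient:
    """Hypothetical client, for illustration only."""

    def __init__(self, connect):
        self._connect = connect  # async callable returning a socket-like object
        self._websocket = None

    async def connect_websocket(self) -> None:
        try:
            self._websocket = await self._connect()
        except Exception as exc:
            # Drop any stale handle so callers never reuse a dead socket.
            logger.error("connect failed: %s", exc)
            self._websocket = None

    async def send_audio(self, chunk: bytes) -> None:
        if self._websocket is None:
            return
        try:
            await self._websocket.send(chunk)
        except Exception as exc:
            # Log only; the socket is left in place for reconnect logic to handle.
            logger.error("send failed: %s", exc)
```

Leaving the socket in place on a send failure keeps ownership of its lifecycle in one spot (the connect/disconnect path), which is the design the second test appears to guard.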
|
||||
@@ -18,7 +18,7 @@ def _service(language: str) -> CartesiaTTSService:
|
||||
def _process_word_timestamps(
|
||||
words: list[str], starts: list[float], language: str
|
||||
) -> list[tuple[str, float]]:
|
||||
- return _service(language)._process_word_timestamps_for_language(words, starts)
+ return _service(language)._normalize_word_timestamps(words, starts)
|
||||
|
||||
|
||||
def _concatenate_processed_timestamps(
|
||||
@@ -27,7 +27,7 @@ def _concatenate_processed_timestamps(
|
||||
service = _service(language)
|
||||
text_parts = []
|
||||
for words, starts in timestamp_groups:
|
||||
- processed_timestamps = service._process_word_timestamps_for_language(words, starts)
+ processed_timestamps = service._normalize_word_timestamps(words, starts)
|
||||
includes_inter_frame_spaces = service._word_timestamps_include_inter_frame_spaces()
|
||||
text_parts.extend(
|
||||
TextPartForConcatenation(
|
||||
|
||||
@@ -6,9 +6,14 @@
|
||||
|
||||
"""Tests for ElevenLabs TTS alignment handling."""
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
from websockets.protocol import State
|
||||
|
||||
from pipecat.services.elevenlabs.tts import (
|
||||
ElevenLabsTTSService,
|
||||
_select_alignment,
|
||||
_strip_utterance_leading_spaces,
|
||||
calculate_word_times,
|
||||
@@ -200,3 +205,87 @@ def test_select_alignment_works_with_http_field_names():
|
||||
)
|
||||
assert selected is not None
|
||||
assert selected["characters"] == list(" Hi")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Keepalive vs context-init race
|
||||
#
|
||||
# The keepalive must only stamp a context_id once its context-init (carrying
|
||||
# voice_settings) has been sent. Stamping it earlier makes the keepalive the
|
||||
# context's first message, with no voice_settings, and ElevenLabs rejects the
|
||||
# later context-init with a 1008 policy violation.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class _FakeWebSocket:
|
||||
"""Minimal stand-in for the ElevenLabs websocket that records sends."""
|
||||
|
||||
def __init__(self):
|
||||
self.state = State.OPEN
|
||||
self.sent: list[dict] = []
|
||||
|
||||
async def send(self, data: str):
|
||||
self.sent.append(json.loads(data))
|
||||
|
||||
|
||||
def _make_service() -> ElevenLabsTTSService:
|
||||
return ElevenLabsTTSService(
|
||||
api_key="test-key",
|
||||
settings=ElevenLabsTTSService.Settings(
|
||||
voice="test-voice",
|
||||
stability=0.55,
|
||||
similarity_boost=0.85,
|
||||
use_speaker_boost=True,
|
||||
speed=0.81,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_keepalive_does_not_stamp_context_before_init():
|
||||
"""During the pre-init window the keepalive must not stamp the new context_id."""
|
||||
service = _make_service()
|
||||
ws = _FakeWebSocket()
|
||||
service._websocket = ws
|
||||
|
||||
# Simulate the start of an LLM turn: TTSService sets the turn context id on
|
||||
# LLMFullResponseStartFrame, before run_tts sends the voice_settings init.
|
||||
service._turn_context_id = "ctx-1"
|
||||
service._playing_context_id = None
|
||||
assert "ctx-1" not in service._context_init_sent
|
||||
|
||||
await service._send_keepalive()
|
||||
|
||||
# Context-less keepalive: the real context-init stays the context's first
|
||||
# message, so ElevenLabs won't reject it with 1008.
|
||||
assert ws.sent == [{"text": ""}]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_keepalive_stamps_context_after_init():
|
||||
"""Once the context-init has been sent, the keepalive targets that context."""
|
||||
service = _make_service()
|
||||
ws = _FakeWebSocket()
|
||||
service._websocket = ws
|
||||
service._turn_context_id = "ctx-1"
|
||||
service._playing_context_id = None
|
||||
# run_tts records the context once its voice_settings init has gone out.
|
||||
service._context_init_sent.add("ctx-1")
|
||||
|
||||
await service._send_keepalive()
|
||||
|
||||
assert ws.sent == [{"text": "", "context_id": "ctx-1"}]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_keepalive_without_active_context_sends_empty():
|
||||
"""With no active context, the keepalive sends a plain empty message."""
|
||||
service = _make_service()
|
||||
ws = _FakeWebSocket()
|
||||
service._websocket = ws
|
||||
service._turn_context_id = None
|
||||
service._playing_context_id = None
|
||||
|
||||
await service._send_keepalive()
|
||||
|
||||
assert ws.sent == [{"text": ""}]
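The rule the three keepalive tests check can be sketched as a pure function: stamp a `context_id` only once that context's init message has gone out. `build_keepalive` is a hypothetical helper written for illustration, and the precedence of the playing context over the turn context is an assumption; neither is taken from `ElevenLabsTTSService`.

```python
from __future__ import annotations


def build_keepalive(
    turn_context_id: str | None,
    playing_context_id: str | None,
    context_init_sent: set[str],
) -> dict:
    """Build a keepalive payload (hypothetical helper, illustration only)."""
    message: dict = {"text": ""}
    # Assumption: prefer the context that is currently playing, if any.
    context_id = playing_context_id or turn_context_id
    # Only target a context whose init (with voice_settings) was already sent,
    # so the keepalive can never become that context's first message.
    if context_id is not None and context_id in context_init_sent:
        message["context_id"] = context_id
    return message
```

During the pre-init window the function falls back to a context-less keepalive, which still keeps the connection alive without touching the new context.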
|
||||
|
||||
324
tests/test_runner_run.py
Normal file
@@ -0,0 +1,324 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
import argparse
|
||||
import io
|
||||
import sys
|
||||
import types
|
||||
import unittest
|
||||
from contextlib import redirect_stdout
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
from fastapi import FastAPI
|
||||
from fastapi.testclient import TestClient
|
||||
from pydantic import BaseModel
|
||||
|
||||
from pipecat.runner.run import (
|
||||
_print_startup_message,
|
||||
_setup_daily_routes,
|
||||
_setup_telephony_routes,
|
||||
_setup_unified_start_route,
|
||||
_setup_webrtc_routes,
|
||||
_setup_websocket_routes,
|
||||
_transport_route_dependencies,
|
||||
_transport_routes_enabled,
|
||||
)
|
||||
|
||||
|
||||
class TestRunnerRun(unittest.TestCase):
|
||||
def _capture_startup_message(self, args: argparse.Namespace) -> str:
|
||||
buffer = io.StringIO()
|
||||
with redirect_stdout(buffer):
|
||||
_print_startup_message(args)
|
||||
return buffer.getvalue()
|
||||
|
||||
def test_transport_route_dependencies_maps_transports_to_modules(self):
|
||||
self.assertEqual(_transport_route_dependencies("daily"), ("daily",))
|
||||
self.assertEqual(_transport_route_dependencies("webrtc"), ("aiortc",))
|
||||
self.assertEqual(_transport_route_dependencies("websocket"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("telephony"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("twilio"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("telnyx"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("plivo"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("exotel"), ("fastapi", "websockets"))
|
||||
self.assertEqual(_transport_route_dependencies("vonage"), ())
|
||||
|
||||
def test_transport_routes_enabled_maps_transports_to_dependency_checks(self):
|
||||
def module_available(module: str) -> bool:
|
||||
return module in {"fastapi", "websockets"}
|
||||
|
||||
with patch("pipecat.runner.run._is_module_available", side_effect=module_available):
|
||||
self.assertFalse(_transport_routes_enabled("daily"))
|
||||
self.assertFalse(_transport_routes_enabled("webrtc"))
|
||||
self.assertTrue(_transport_routes_enabled("websocket"))
|
||||
self.assertTrue(_transport_routes_enabled("telephony"))
|
||||
self.assertTrue(_transport_routes_enabled("twilio"))
|
||||
self.assertTrue(_transport_routes_enabled("vonage"))
|
||||
|
||||
def test_setup_webrtc_routes_skips_when_aiortc_is_missing(self):
|
||||
"""WebRTC routes should be optional when the webrtc extra is not installed."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(folder=None, esp32=False, host="localhost")
|
||||
|
||||
with (
|
||||
patch("pipecat.runner.run._transport_routes_enabled", return_value=False),
|
||||
patch("pipecat.runner.run.logger") as logger,
|
||||
):
|
||||
_setup_webrtc_routes(app, args, {})
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertNotIn("/api/offer", paths)
|
||||
logger.info.assert_not_called()
|
||||
|
||||
def test_setup_webrtc_routes_registers_routes_when_webrtc_is_available(self):
|
||||
"""WebRTC routes should be registered when dependencies are available."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(folder=None, esp32=False, host="localhost")
|
||||
|
||||
connection_module = types.ModuleType("pipecat.transports.smallwebrtc.connection")
|
||||
connection_module.SmallWebRTCConnection = MagicMock()
|
||||
|
||||
request_handler_module = types.ModuleType("pipecat.transports.smallwebrtc.request_handler")
|
||||
|
||||
class IceCandidate(BaseModel):
|
||||
candidate: str
|
||||
sdp_mid: str
|
||||
sdp_mline_index: int
|
||||
|
||||
class SmallWebRTCPatchRequest(BaseModel):
|
||||
pc_id: str
|
||||
candidates: list[IceCandidate] = []
|
||||
|
||||
class SmallWebRTCRequest(BaseModel):
|
||||
sdp: str
|
||||
type: str
|
||||
pc_id: str | None = None
|
||||
restart_pc: bool | None = None
|
||||
request_data: dict | None = None
|
||||
|
||||
request_handler_module.IceCandidate = IceCandidate
|
||||
request_handler_module.SmallWebRTCPatchRequest = SmallWebRTCPatchRequest
|
||||
request_handler_module.SmallWebRTCRequest = SmallWebRTCRequest
|
||||
|
||||
class MockSmallWebRTCRequestHandler:
|
||||
def __init__(self, *args, **kwargs):
|
||||
pass
|
||||
|
||||
async def close(self):
|
||||
pass
|
||||
|
||||
request_handler_module.SmallWebRTCRequestHandler = MockSmallWebRTCRequestHandler
|
||||
|
||||
with (
|
||||
patch("pipecat.runner.run._transport_routes_enabled", return_value=True),
|
||||
patch.dict(
|
||||
sys.modules,
|
||||
{
|
||||
"pipecat.transports.smallwebrtc.connection": connection_module,
|
||||
"pipecat.transports.smallwebrtc.request_handler": request_handler_module,
|
||||
},
|
||||
),
|
||||
):
|
||||
_setup_webrtc_routes(app, args, {})
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertIn("/api/offer", paths)
|
||||
self.assertIn("/files/{filename:path}", paths)
|
||||
|
||||
def test_setup_websocket_routes_skips_when_websocket_is_missing(self):
|
||||
"""Plain WebSocket routes should be optional."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace()
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
|
||||
_setup_websocket_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertNotIn("/ws-client", paths)
|
||||
|
||||
def test_setup_websocket_routes_registers_when_websocket_is_available(self):
|
||||
"""Plain WebSocket route should be registered when dependencies are available."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace()
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
_setup_websocket_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertIn("/ws-client", paths)
|
||||
|
||||
def test_setup_telephony_routes_skips_when_websocket_is_missing(self):
|
||||
"""Telephony WebSocket routes should be optional."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(transport=None)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
|
||||
_setup_telephony_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertNotIn("/ws", paths)
|
||||
|
||||
def test_setup_telephony_routes_registers_when_websocket_is_available(self):
|
||||
"""Telephony WebSocket route should be registered when dependencies are available."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(transport=None)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
_setup_telephony_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertIn("/ws", paths)
|
||||
|
||||
def test_setup_telephony_routes_registers_provider_webhook_for_selected_transport(self):
|
||||
"""Provider webhook route should be registered for selected telephony transports."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(transport="twilio", proxy="example.ngrok.io")
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
_setup_telephony_routes(app, args)
|
||||
|
||||
post_root_routes = [
|
||||
route for route in app.routes if route.path == "/" and "POST" in route.methods
|
||||
]
|
||||
self.assertEqual(len(post_root_routes), 1)
|
||||
|
||||
def test_setup_daily_routes_skips_when_daily_is_missing(self):
|
||||
"""Daily routes should be optional."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(dialin=False)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
|
||||
_setup_daily_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertNotIn("/daily", paths)
|
||||
|
||||
def test_setup_daily_routes_registers_when_daily_is_available(self):
|
||||
"""Daily route should be registered when dependencies are available."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(dialin=False)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
_setup_daily_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertIn("/daily", paths)
|
||||
|
||||
def test_setup_daily_routes_registers_dialin_route_when_enabled(self):
|
||||
"""Daily dial-in route should be registered when requested and available."""
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(dialin=True)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
_setup_daily_routes(app, args)
|
||||
|
||||
paths = {route.path for route in app.routes}
|
||||
self.assertIn("/daily", paths)
|
||||
self.assertIn("/daily-dialin-webhook", paths)
|
||||
|
||||
def test_websocket_routes_require_fastapi_and_websockets(self):
|
||||
with patch(
|
||||
"pipecat.runner.run._is_module_available",
|
||||
side_effect=lambda module: module == "fastapi",
|
||||
) as is_module_available:
|
||||
self.assertFalse(_transport_routes_enabled("websocket"))
|
||||
|
||||
self.assertEqual(
|
||||
[call.args[0] for call in is_module_available.call_args_list],
|
||||
["fastapi", "websockets"],
|
||||
)
|
||||
|
||||
def test_start_rejects_disabled_transport_before_running_bot(self):
|
||||
app = FastAPI()
|
||||
args = argparse.Namespace(transport=None)
|
||||
_setup_unified_start_route(app, args, {})
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
|
||||
response = TestClient(app).post("/start", json={"transport": "daily"})
|
||||
|
||||
self.assertEqual(response.status_code, 400)
|
||||
self.assertEqual(
|
||||
response.json()["detail"],
|
||||
(
|
||||
"Transport 'daily' is disabled in this runner environment. "
|
||||
"Check the startup banner for enabled transports."
|
||||
),
|
||||
)
|
||||
|
||||
def test_startup_message_all_transports_shows_open_url_and_transport_status(self):
|
||||
args = argparse.Namespace(transport=None, host="localhost", port=7860)
|
||||
|
||||
def routes_enabled(transport: str) -> bool:
|
||||
return transport in {"telephony", "websocket"}
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", side_effect=routes_enabled):
|
||||
output = self._capture_startup_message(args)
|
||||
|
||||
self.assertEqual(
|
||||
output,
|
||||
(
|
||||
"\n"
|
||||
"🚀 Bot ready!\n"
|
||||
" → Open: http://localhost:7860\n"
|
||||
" → Enabled transports: telephony, websocket\n"
|
||||
" → Disabled transports: daily (install pipecat-ai[daily]), "
|
||||
"webrtc (install pipecat-ai[webrtc])\n"
|
||||
"\n"
|
||||
),
|
||||
)
|
||||
|
||||
def test_startup_message_all_transports_omits_disabled_status_when_all_enabled(self):
|
||||
args = argparse.Namespace(transport=None, host="localhost", port=7860)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
output = self._capture_startup_message(args)
|
||||
|
||||
self.assertEqual(
|
||||
output,
|
||||
(
|
||||
"\n"
|
||||
"🚀 Bot ready!\n"
|
||||
" → Open: http://localhost:7860\n"
|
||||
" → Enabled transports: daily, webrtc, telephony, websocket\n"
|
||||
"\n"
|
||||
),
|
||||
)
|
||||
|
||||
def test_startup_message_webrtc_uses_root_open_url(self):
|
||||
args = argparse.Namespace(
|
||||
transport="webrtc", host="localhost", port=7860, esp32=False, whatsapp=False
|
||||
)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
output = self._capture_startup_message(args)
|
||||
|
||||
self.assertIn(" → Open: http://localhost:7860\n", output)
|
||||
self.assertNotIn("/client", output)
|
||||
|
||||
def test_startup_message_daily_uses_root_open_url(self):
|
||||
args = argparse.Namespace(transport="daily", host="localhost", port=7860, dialin=False)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
output = self._capture_startup_message(args)
|
||||
|
||||
self.assertIn(" → Open: http://localhost:7860\n", output)
|
||||
self.assertNotIn("/daily in your browser", output)
|
||||
|
||||
def test_startup_message_telephony_keeps_provider_endpoint_details(self):
|
||||
args = argparse.Namespace(
|
||||
transport="twilio", host="localhost", port=7860, proxy="example.ngrok.io"
|
||||
)
|
||||
|
||||
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
|
||||
output = self._capture_startup_message(args)
|
||||
|
||||
self.assertIn(" → Open: http://localhost:7860\n", output)
|
||||
self.assertIn(" → XML webhook: http://localhost:7860/\n", output)
|
||||
self.assertIn(" → WebSocket: ws://localhost:7860/ws\n", output)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
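The dependency tests above describe a mapping from transport name to required modules, with a transport counting as enabled only when every listed module can be imported. A minimal sketch under those assumptions follows. The names here are hypothetical stand-ins, the mapping is trimmed to four entries, and this is not the code in `pipecat.runner.run`.

```python
import importlib.util
from typing import Callable

# Illustrative mapping; unknown transports have no requirements.
_TRANSPORT_DEPENDENCIES: dict[str, tuple[str, ...]] = {
    "daily": ("daily",),
    "webrtc": ("aiortc",),
    "websocket": ("fastapi", "websockets"),
    "telephony": ("fastapi", "websockets"),
}


def is_module_available(module: str) -> bool:
    """Return True if `module` can be located without importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def transport_routes_enabled(
    transport: str,
    is_available: Callable[[str], bool] = is_module_available,
) -> bool:
    """A transport is enabled when all of its dependencies are available."""
    dependencies = _TRANSPORT_DEPENDENCIES.get(transport, ())
    # all() stops at the first missing module and is True for an empty tuple.
    return all(is_available(module) for module in dependencies)
```

Passing the availability check as a parameter makes the function easy to exercise without patching, which mirrors what the tests do with `patch(...)`.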
|
||||
@@ -5,8 +5,10 @@
|
||||
#
|
||||
|
||||
import json
|
||||
+ from unittest.mock import AsyncMock

import pytest
+ from websockets.protocol import State
|
||||
|
||||
from pipecat.frames.frames import TranscriptionFrame
|
||||
from pipecat.services.soniox.stt import END_TOKEN, SonioxSTTService, _language_from_tokens
|
||||
@@ -14,8 +16,10 @@ from pipecat.transcriptions.language import Language
|
||||
|
||||
|
||||
class _FakeWebsocket:
|
||||
- def __init__(self, messages):
+ def __init__(self, messages, *, state=State.OPEN, send_side_effect=None):
self._messages = messages
+ self.state = state
+ self.send = AsyncMock(side_effect=send_side_effect)
|
||||
|
||||
def __aiter__(self):
|
||||
return self._iter_messages()
|
||||
@@ -25,6 +29,21 @@ class _FakeWebsocket:
|
||||
yield message
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_connect_failure_clears_stale_websocket_without_raising(monkeypatch):
|
||||
async def fake_websocket_connect(*args, **kwargs):
|
||||
raise RuntimeError("connection failed")
|
||||
|
||||
monkeypatch.setattr("pipecat.services.soniox.stt.websocket_connect", fake_websocket_connect)
|
||||
|
||||
service = SonioxSTTService(api_key="test-key")
|
||||
service._websocket = _FakeWebsocket([], state=State.CLOSED)
|
||||
|
||||
await service._connect_websocket()
|
||||
|
||||
assert service._websocket is None
|
||||
|
||||
|
||||
def test_language_from_tokens_uses_single_recognized_language():
|
||||
tokens = [
|
||||
{"text": "Hello", "language": "en"},
|
||||
|
||||
@@ -21,6 +21,13 @@ repeated for each TTSSpeakFrame, with no cross-group contamination.
|
||||
Also covers LLM response flow with push_text_frames=True (non-word-timestamp TTS):
|
||||
verifies TTSTextFrame ordering relative to LLMFullResponseEndFrame.
|
||||
|
||||
Also covers smart-text / WordCompletionTracker features:
|
||||
- Skipped frames (skip_aggregator_types) held until preceding spoken slots complete.
|
||||
- raw_text on AggregatedTextFrame propagated as spans to TTSTextFrames.
|
||||
- Overflow: a single TTS word straddling two AggregatedTextFrame boundaries produces
|
||||
two correctly-attributed TTSTextFrames.
|
||||
- Force-complete safety net: skipped frames flush even when TTS drops word timestamps.
|
||||
|
||||
Also covers the interruption-during-pause deadlock scenario (see test_no_deadlock_on_interrupt_*).
|
||||
"""
|
||||
|
||||
@@ -50,6 +57,7 @@ from pipecat.frames.frames import (
|
||||
)
|
||||
from pipecat.services.tts_service import TTSService
|
||||
from pipecat.tests.utils import SleepFrame, run_test
|
||||
from pipecat.utils.text.base_text_aggregator import AggregationType
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test-only frame
|
||||
@@ -422,7 +430,7 @@ def _assert_group_ordering(
|
||||
# All frames between TTSStartedFrame and TTSStoppedFrame must be audio.
|
||||
mid_types = types[started_idx + 1 : stopped_idx]
|
||||
for t in mid_types:
|
||||
- assert t is TTSAudioRawFrame, (
+ assert t in (TTSAudioRawFrame, TTSTextFrame), (
|
||||
f"Group {foo_label!r}: unexpected frame {t.__name__!r} between "
|
||||
f"TTSStartedFrame and TTSStoppedFrame. Got: {type_names}"
|
||||
)
|
||||
@@ -551,7 +559,7 @@ async def test_http_push_text_llm_response_end_after_tts_text():
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_http_word_timestamps_verbatim_tokens():
- """HTTP path: text, PTS order, flag, and text-before-audio are all verified.
+ """HTTP path: text, PTS order, and text-before-audio are all verified.
|
||||
|
||||
Word timestamps arrive in the audio context queue before the audio frame.
|
||||
_handle_audio_context caches them, then flushes when the first audio frame
|
||||
@@ -572,7 +580,6 @@ async def test_http_word_timestamps_verbatim_tokens():
|
||||
audio_frames = [f for f in down if isinstance(f, TTSAudioRawFrame)]
|
||||
|
||||
assert [f.text for f in tts_text_frames] == ["hello", "world"]
|
||||
- assert all(f.includes_inter_frame_spaces is True for f in tts_text_frames)
|
||||
|
||||
pts_values = [f.pts for f in tts_text_frames]
|
||||
assert pts_values == sorted(pts_values) and len(set(pts_values)) == len(pts_values), (
|
||||
@@ -590,15 +597,14 @@ async def test_http_word_timestamps_verbatim_tokens():
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_http_word_timestamps_punctuation_tokens():
- """Verbatim punctuation tokens are preserved with flag=True; default flag is False.
+ """Punct-only tokens are merged into the preceding word when includes_inter_frame_spaces=True.

- Models the Inworld API scenario: the TTS returns tokens exactly as sent.
- Space placement rule:
- - word-follows-word: space is the leading char of the next word (e.g. " world")
- - word-follows-punctuation: space is the trailing char of the punctuation token
- (e.g. "! "), so the following word token carries no leading space.
- The flag must reach every frame and the text must not be modified.
- Also acts as a regression guard that flag=False is the default.
+ Models the Inworld API scenario: the TTS returns separate space and punctuation
+ tokens. add_word_timestamps calls merge_punct_tokens when includes_inter_frame_spaces
+ is True, collapsing those tokens into the preceding word before the tracker sees them.
+
+ With flag=False (default) tokens are forwarded as-is; the tracker strips leading/
+ trailing whitespace from each frame word via get_word_for_frame().
|
||||
"""
|
||||
verbatim_tokens = [
|
||||
("hello", 0.0),
|
||||
@@ -609,9 +615,9 @@ async def test_http_word_timestamps_punctuation_tokens():
|
||||
(" you", 0.75),
|
||||
("?", 0.9),
|
||||
]
|
||||
- expected_texts = ["hello", " world", "! ", "How", " are", " you", "?"]

- # With flag=True: all tokens verbatim, all frames carry the flag.
+ # With flag=True: punct-only tokens ("! " and "?") are merged into the preceding
+ # words (" world" → " world! " and " you" → " you?"), then stripped by the tracker.
|
||||
tts_ifs = _MockWordTimestampHttpTTSService(
|
||||
includes_inter_frame_spaces=True,
|
||||
word_times=verbatim_tokens,
|
||||
@@ -621,12 +627,11 @@ async def test_http_word_timestamps_punctuation_tokens():
|
||||
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
|
||||
)
|
||||
text_frames_ifs = [f for f in frames_ifs[0] if isinstance(f, TTSTextFrame)]
|
||||
- assert [f.text for f in text_frames_ifs] == expected_texts, (
- "Verbatim tokens must not be modified"
+ assert [f.text for f in text_frames_ifs] == ["hello", "world!", "How", "are", "you?"], (
+ "Punct-only tokens must be merged into the preceding word"
)
- assert all(f.includes_inter_frame_spaces is True for f in text_frames_ifs)

- # With flag=False (default): same tokens, flag must be False on every frame.
+ # With flag=False (default): no merging; tracker strips leading/trailing spaces.
|
||||
tts_plain = _MockWordTimestampHttpTTSService(
|
||||
word_times=verbatim_tokens,
|
||||
)
|
||||
@@ -635,13 +640,12 @@ async def test_http_word_timestamps_punctuation_tokens():
|
||||
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
|
||||
)
|
||||
text_frames_plain = [f for f in frames_plain[0] if isinstance(f, TTSTextFrame)]
|
||||
- assert [f.text for f in text_frames_plain] == expected_texts
- assert all(f.includes_inter_frame_spaces is False for f in text_frames_plain)
+ assert [f.text for f in text_frames_plain] == ["hello", "world", "!", "How", "are", "you", "?"]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_websocket_word_timestamps_verbatim_tokens():
- """WebSocket path: _WordTimestampEntry carries verbatim text, PTS, and flag.
+ """WebSocket path: text, PTS order, and text-before-audio are all verified.
|
||||
|
||||
Unlike the HTTP path the word timestamps are sent asynchronously from a
|
||||
background task. They arrive before the audio frame and are cached until
|
||||
@@ -662,7 +666,6 @@ async def test_websocket_word_timestamps_verbatim_tokens():
|
||||
audio_frames = [f for f in down if isinstance(f, TTSAudioRawFrame)]
|
||||
|
||||
assert [f.text for f in tts_text_frames] == ["hello", "world"]
|
||||
- assert all(f.includes_inter_frame_spaces is True for f in tts_text_frames)
|
||||
|
||||
pts_values = [f.pts for f in tts_text_frames]
|
||||
assert pts_values == sorted(pts_values) and len(set(pts_values)) == len(pts_values), (
|
||||
@@ -678,7 +681,7 @@ async def test_websocket_word_timestamps_verbatim_tokens():
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_websocket_word_timestamps_punctuation_tokens():
- """WebSocket path: verbatim punctuation tokens reach TTSTextFrame unchanged."""
+ """WebSocket path: punct-only tokens are merged into the preceding word."""
|
||||
verbatim_tokens = [
|
||||
("hello", 0.0),
|
||||
(" world", 0.15),
|
||||
@@ -697,10 +700,443 @@ async def test_websocket_word_timestamps_punctuation_tokens():
|
||||
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
|
||||
)
|
||||
text_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
|
||||
assert [f.text for f in text_frames] == ["hello", " world", "! ", "How", " are", " you", "?"], (
|
||||
"Verbatim tokens must not be modified"
|
||||
assert [f.text for f in text_frames] == ["hello", "world!", "How", "are", "you?"], (
|
||||
"Punct-only tokens must be merged into the preceding word"
|
||||
)


# ---------------------------------------------------------------------------
# Per-call word-timestamp mock (for overflow tests)
# ---------------------------------------------------------------------------


class _MockPerCallWordTimestampHttpTTSService(TTSService):
    """HTTP-style TTS where each run_tts() call consumes its own word-time list.

    Designed for tests that need different word tokens per sentence. The
    ``word_times_per_call`` list is consumed in order; an empty inner list means
    no word-timestamp events are emitted for that call.
    """

    def __init__(
        self,
        word_times_per_call: list[list[tuple[str, float]]],
        **kwargs,
    ):
        super().__init__(
            push_start_frame=True,
            push_stop_frames=True,
            push_text_frames=False,
            sample_rate=_SAMPLE_RATE,
            **kwargs,
        )
        self._word_times_queue = list(word_times_per_call)

    def can_generate_metrics(self) -> bool:
        return False

    async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
        word_times = self._word_times_queue.pop(0) if self._word_times_queue else []
        if word_times:
            await self.add_word_timestamps(word_times, context_id=context_id)
        yield TTSAudioRawFrame(
            audio=_FAKE_AUDIO,
            sample_rate=_SAMPLE_RATE,
            num_channels=1,
            context_id=context_id,
        )


# ---------------------------------------------------------------------------
# Tests: skipped frame ordering (skip_aggregator_types)
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_http_skipped_frame_waits_for_spoken_words():
    """Skipped frames are held until the preceding spoken slot's word timestamps
    are all processed, then flushed in order (HTTP / synchronous audio path).

    Sequence sent:
        AggregatedTextFrame("hello world", SENTENCE) — spoken; yields 2 TTSTextFrames
        AggregatedTextFrame("some code", "code") — in skip_aggregator_types; must wait

    Expected downstream order:
        TTSTextFrame("hello")
        TTSTextFrame("world")
        AggregatedTextFrame("some code", append_to_context=True)
    """
    tts = _MockWordTimestampHttpTTSService(skip_aggregator_types=["code"])
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame("hello world", AggregationType.SENTENCE),
            AggregatedTextFrame("some code", "code"),
        ],
    )
    down = frames_received[0]

    word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
    skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]

    assert [f.text for f in word_frames] == ["hello", "world"]
    assert len(skipped) == 1
    assert skipped[0].append_to_context is True

    last_word_idx = max(down.index(f) for f in word_frames)
    skipped_idx = down.index(skipped[0])
    assert skipped_idx > last_word_idx, (
        f"Skipped frame (pos {skipped_idx}) must appear after last word frame (pos {last_word_idx})"
    )


@pytest.mark.asyncio
async def test_ws_skipped_frame_waits_for_spoken_words():
    """Same ordering guarantee on the WebSocket / async audio delivery path.

    Because audio is delivered from a background task after asyncio.sleep(), the
    skipped frame arrives at _push_frame_respecting_previous_aggregated_frame
    *before* the spoken slot's word timestamps have been processed, directly
    exercising the hold-and-flush path.
    """
    tts = _MockWordTimestampWSTTSService(skip_aggregator_types=["code"])
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame("hello world", AggregationType.SENTENCE),
            AggregatedTextFrame("some code", "code"),
        ],
    )
    down = frames_received[0]

    word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
    skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]

    assert [f.text for f in word_frames] == ["hello", "world"]
    assert len(skipped) == 1
    assert skipped[0].append_to_context is True

    last_word_idx = max(down.index(f) for f in word_frames)
    skipped_idx = down.index(skipped[0])
    assert skipped_idx > last_word_idx, (
        f"Skipped frame (pos {skipped_idx}) must appear after last word frame (pos {last_word_idx})"
    )


@pytest.mark.asyncio
async def test_skipped_frame_before_spoken_emits_immediately():
    """A skipped frame with no preceding spoken slot is emitted right away.

    Sequence:
        AggregatedTextFrame("some code", "code") — no spoken slot before it → emits now
        AggregatedTextFrame("hello world", SENTENCE) — spoken; TTSTextFrames follow

    Expected: AggregatedTextFrame("some code") appears *before* TTSTextFrame("hello").
    """
    tts = _MockWordTimestampHttpTTSService(skip_aggregator_types=["code"])
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame("some code", "code"),
            AggregatedTextFrame("hello world", AggregationType.SENTENCE),
        ],
    )
    down = frames_received[0]

    word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
    skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]

    assert len(skipped) == 1
    assert skipped[0].append_to_context is True
    assert len(word_frames) >= 1

    skipped_idx = down.index(skipped[0])
    first_word_idx = down.index(word_frames[0])
    assert skipped_idx < first_word_idx, (
        f"Skipped frame (pos {skipped_idx}) must appear before first word frame (pos {first_word_idx})"
    )


@pytest.mark.asyncio
async def test_skipped_frame_flushed_when_word_timestamps_incomplete():
    """Force-complete path: skipped frame still emits when the TTS drops word timestamps.

    Only one of the two expected tokens ("hello") is returned. The spoken slot never
    reaches its expected character count through the normal path. When
    on_audio_context_done fires it force-completes any remaining spoken slots and
    flushes the waiting skipped frame.
    """
    tts = _MockWordTimestampHttpTTSService(
        word_times=[("hello", 0.0)],  # "world" is never sent
        skip_aggregator_types=["code"],
    )
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame("hello world", AggregationType.SENTENCE),
            AggregatedTextFrame("some code", "code"),
        ],
    )
    down = frames_received[0]

    skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]
    assert len(skipped) == 1, "Skipped frame must be flushed via force-complete safety net"
    assert skipped[0].append_to_context is True


# ---------------------------------------------------------------------------
# Tests: raw_text propagation through WordCompletionTracker
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_raw_text_propagated_to_tts_text_frames():
    """raw_text on AggregatedTextFrame is split across TTSTextFrames by the tracker.

    The frame carries raw_text="<card>4111 1111</card>" while the TTS-prepared
    text is "4111 1111". The WordCompletionTracker advances a cursor through the
    raw text in step with incoming word tokens, so each TTSTextFrame receives the
    exact raw span it represents.

    Expected (trailing whitespace stripped because includes_inter_frame_spaces=False):
        TTSTextFrame("4111").raw_text == "<card>4111"
        TTSTextFrame("1111").raw_text == "1111</card>"
    """
    tts = _MockWordTimestampHttpTTSService()
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame(
                "4111 1111", AggregationType.SENTENCE, raw_text="<card>4111 1111</card>"
            )
        ],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert [f.text for f in word_frames] == ["4111", "1111"]
    # get_raw_consumed() strips trailing whitespace when includes_inter_frame_spaces=False
    assert word_frames[0].raw_text == "<card>4111"
    assert word_frames[1].raw_text == "1111</card>"


# ---------------------------------------------------------------------------
# Tests: overflow — TTS word spanning two AggregatedTextFrame boundaries
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_overflow_word_spanning_two_aggregated_frames():
    """A single TTS token straddling two AggregatedTextFrame boundaries produces
    two correctly-attributed TTSTextFrames.

    Setup:
        Frame 1: AggregatedTextFrame("abc", SENTENCE)
        Frame 2: AggregatedTextFrame("def", SENTENCE)

    The TTS for frame 1 returns the single token "abcdef", which overshoots
    frame 1 by three characters. _emit_overflow_word splits it:
        TTSTextFrame("abc") — frame 1's portion (context_id = ctx1)
        TTSTextFrame("def") — overflow attributed to frame 2 (context_id = ctx2)

    Frame 2 receives no word-timestamp events because the overflow already
    consumed its expected text.
    """
    tts = _MockPerCallWordTimestampHttpTTSService(
        word_times_per_call=[
            [("abcdef", 0.0)],  # frame 1: single token spanning both frames
            [],  # frame 2: no word timestamps (overflow already covered it)
        ]
    )
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame("abc", AggregationType.SENTENCE),
            AggregatedTextFrame("def", AggregationType.SENTENCE),
        ],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert [f.text for f in word_frames] == ["abc", "def"], (
        f"Expected ['abc', 'def'] but got {[f.text for f in word_frames]}"
    )
    assert word_frames[0].context_id != word_frames[1].context_id, (
        "Overflow TTSTextFrame must carry frame 2's context_id, not frame 1's"
    )


# ---------------------------------------------------------------------------
# Per-call word-timestamp mock for WebSocket path (for force-complete tests)
# ---------------------------------------------------------------------------


class _MockPerCallWordTimestampWSTTSService(TTSService):
    """WebSocket-style TTS where each run_tts() call consumes its own word-time list.

    Mirrors _MockPerCallWordTimestampHttpTTSService but uses the async audio-context
    delivery pattern so it exercises _handle_audio_context (the WebSocket path).
    An empty inner list means no word-timestamp events are emitted for that call.
    """

    def __init__(
        self,
        word_times_per_call: list[list[tuple[str, float]]],
        **kwargs,
    ):
        super().__init__(
            push_start_frame=True,
            push_text_frames=False,
            pause_frame_processing=False,
            sample_rate=_SAMPLE_RATE,
            **kwargs,
        )
        self._word_times_queue = list(word_times_per_call)

    def can_generate_metrics(self) -> bool:
        return False

    async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
        word_times = self._word_times_queue.pop(0) if self._word_times_queue else []

        async def _deliver():
            await asyncio.sleep(0.01)
            if word_times:
                await self.add_word_timestamps(word_times, context_id=context_id)
            await self.append_to_audio_context(
                context_id,
                TTSAudioRawFrame(
                    audio=_FAKE_AUDIO,
                    sample_rate=_SAMPLE_RATE,
                    num_channels=1,
                    context_id=context_id,
                ),
            )
            await self.append_to_audio_context(context_id, TTSStoppedFrame(context_id=context_id))
            await self.remove_audio_context(context_id)

        self.create_task(_deliver(), name=f"mock_ws_per_call_deliver_{context_id}")
        if False:
            yield


# ---------------------------------------------------------------------------
# Tests: _force_complete_spoken_slots — TTSTextFrame emission for dropped timestamps
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_http_force_complete_partial_timestamps_emits_remaining_text():
    """_force_complete_spoken_slots emits a TTSTextFrame for the unspoken word suffix.

    Only the first token ("hello") is delivered as a word-timestamp event; "world"
    is never sent. When the audio context ends _force_complete_spoken_slots fires,
    reads get_remaining_text() from the tracker, and emits TTSTextFrame("world").

    Expected TTSTextFrames in order: ["hello", "world"].
    """
    tts = _MockWordTimestampHttpTTSService(word_times=[("hello", 0.0)])
    frames_received = await run_test(
        tts,
        frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert [f.text for f in word_frames] == ["hello", "world"], (
        f"Expected ['hello', 'world'] but got {[f.text for f in word_frames]}"
    )


@pytest.mark.asyncio
async def test_http_force_complete_no_timestamps_emits_full_text():
    """_force_complete_spoken_slots emits the full text when no word timestamps arrive.

    No word-timestamp events are sent for "hello world". The slot remains incomplete
    when the audio context ends; force-complete reads the full remaining text from the
    tracker and emits TTSTextFrame("hello world").
    """
    tts = _MockPerCallWordTimestampHttpTTSService(word_times_per_call=[[]])
    frames_received = await run_test(
        tts,
        frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert len(word_frames) == 1, (
        f"Expected exactly 1 TTSTextFrame, got {len(word_frames)}: {[f.text for f in word_frames]}"
    )
    assert word_frames[0].text == "hello world", (
        f"Expected TTSTextFrame('hello world'), got {word_frames[0].text!r}"
    )


@pytest.mark.asyncio
async def test_http_force_complete_raw_text_propagated():
    """force-complete carries the correct raw_text span on the emitted TTSTextFrame.

    AggregatedTextFrame carries raw_text="<card>4111 1111</card>". Only "4111" arrives
    as a word-timestamp; "1111" is force-completed.

    Expected:
        TTSTextFrame("4111").raw_text == "<card>4111" — from normal word path
        TTSTextFrame("1111").raw_text == "1111</card>" — from force-complete path
    """
    tts = _MockPerCallWordTimestampHttpTTSService(word_times_per_call=[[("4111", 0.0)]])
    frames_received = await run_test(
        tts,
        frames_to_send=[
            AggregatedTextFrame(
                "4111 1111", AggregationType.SENTENCE, raw_text="<card>4111 1111</card>"
            )
        ],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert [f.text for f in word_frames] == ["4111", "1111"], (
        f"Expected ['4111', '1111'] but got {[f.text for f in word_frames]}"
    )
    assert word_frames[0].raw_text == "<card>4111", (
        f"Expected raw_text '<card>4111' on first frame, got {word_frames[0].raw_text!r}"
    )
    assert word_frames[1].raw_text == "1111</card>", (
        f"Expected raw_text '1111</card>' on force-complete frame, got {word_frames[1].raw_text!r}"
    )


@pytest.mark.asyncio
async def test_ws_force_complete_partial_timestamps_emits_remaining_text():
    """WebSocket path: _force_complete_spoken_slots emits TTSTextFrame for dropped token.

    Mirrors test_http_force_complete_partial_timestamps_emits_remaining_text on the
    async audio delivery path to confirm force-complete fires correctly from
    _handle_audio_context when TTSStoppedFrame arrives before all word timestamps.
    """
    tts = _MockWordTimestampWSTTSService(word_times=[("hello", 0.0)])
    frames_received = await run_test(
        tts,
        frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert [f.text for f in word_frames] == ["hello", "world"], (
        f"Expected ['hello', 'world'] but got {[f.text for f in word_frames]}"
    )


@pytest.mark.asyncio
async def test_ws_force_complete_no_timestamps_emits_full_text():
    """WebSocket path: full text emitted as single TTSTextFrame when no timestamps arrive."""
    tts = _MockPerCallWordTimestampWSTTSService(word_times_per_call=[[]])
    frames_received = await run_test(
        tts,
        frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
    )
    word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]

    assert len(word_frames) == 1, (
        f"Expected exactly 1 TTSTextFrame, got {len(word_frames)}: {[f.text for f in word_frames]}"
    )
    assert word_frames[0].text == "hello world", (
        f"Expected TTSTextFrame('hello world'), got {word_frames[0].text!r}"
    )
|
||||
assert all(f.includes_inter_frame_spaces is True for f in text_frames)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
|
||||
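The docstring of `test_raw_text_propagated_to_tts_text_frames` in this file describes a cursor that advances through `raw_text` in step with the incoming word tokens. The tracker itself is not shown in this diff, so the following is only a rough sketch of that idea under stated assumptions: it uses a plain substring search, gives the last word whatever raw text remains, and strips surrounding whitespace from each span. The function name `split_raw_text_sketch` is hypothetical and is not the `WordCompletionTracker` API.

```python
def split_raw_text_sketch(raw_text: str, words: list[str]) -> list[str]:
    """Attribute a span of raw_text to each word (illustrative sketch only)."""
    spans: list[str] = []
    cursor = 0  # index of the first raw character not yet attributed to a word
    for i, word in enumerate(words):
        found = raw_text.find(word, cursor)
        if found == -1:
            # Assumption: a word missing from the raw text gets an empty span
            # and leaves the cursor where it was.
            spans.append("")
            continue
        is_last = i == len(words) - 1
        # The last word also takes any trailing markup such as a closing tag.
        end = len(raw_text) if is_last else found + len(word)
        spans.append(raw_text[cursor:end].strip())
        cursor = end
    return spans


if __name__ == "__main__":
    print(split_raw_text_sketch("<card>4111 1111</card>", ["4111", "1111"]))
```

With the input used in the test, the opening tag travels with the first word and the closing tag with the last one, which matches the `raw_text` values the test asserts.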
3104
tests/test_vonage_video_connector.py
Normal file
File diff suppressed because it is too large
@@ -165,6 +165,19 @@ async def test_reconnect_exhausted_emits_non_fatal_error(service, report_error):
    assert "Connection refused" in final_error.error


@pytest.mark.asyncio
async def test_reconnect_exhausted_when_connect_does_not_raise(service, report_error):
    """A non-raising failed connect is treated as a failed reconnect attempt."""
    result = await service._try_reconnect(report_error=report_error)

    assert result is False
    assert report_error.call_count == 4
    final_error = report_error.call_args_list[-1][0][0]
    assert isinstance(final_error, ErrorFrame)
    assert final_error.fatal is False
    assert "websocket reconnection failed verification" in final_error.error


# ---------------------------------------------------------------------------
# Quick failure detection — accept then immediately close
# ---------------------------------------------------------------------------
1602
tests/test_word_completion_tracker.py
Normal file
File diff suppressed because it is too large
90
tests/test_word_timestamp_utils.py
Normal file
@@ -0,0 +1,90 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import unittest

from pipecat.utils.text.word_timestamp_utils import merge_punct_tokens


class TestMergePunctTokens(unittest.TestCase):
    def test_empty_list(self):
        self.assertEqual(merge_punct_tokens([]), [])

    def test_all_alnum_words_pass_through(self):
        input = [("hello", 0.0), ("world", 1.0)]
        self.assertEqual(merge_punct_tokens(input), [("hello", 0.0), ("world", 1.0)])

    def test_trailing_space_merged_and_stripped(self):
        input = [("I", 0.0), (" ", 0.2)]
        self.assertEqual(merge_punct_tokens(input), [("I", 0.0)])

    def test_comma_space_merged_and_stripped(self):
        input = [("questions", 1.0), (", ", 1.2), ("explain", 1.4)]
        self.assertEqual(merge_punct_tokens(input), [("questions,", 1.0), ("explain", 1.4)])

    def test_leading_space_with_no_preceding_word_discarded(self):
        input = [(" ", 0.0), ("hello", 0.5)]
        self.assertEqual(merge_punct_tokens(input), [("hello", 0.5)])

    def test_leading_empty_string_discarded(self):
        input = [("", 0.0), ("hello", 0.5)]
        self.assertEqual(merge_punct_tokens(input), [("hello", 0.5)])

    def test_multiple_consecutive_punct_tokens_merged_and_stripped(self):
        input = [("word", 0.0), (",", 0.1), (" ", 0.2), ("next", 0.3)]
        self.assertEqual(merge_punct_tokens(input), [("word,", 0.0), ("next", 0.3)])

    def test_timestamp_of_preceding_word_is_kept(self):
        """Merged punct tokens adopt the preceding word's timestamp."""
        input = [("hello", 2.5), (",", 2.7)]
        result = merge_punct_tokens(input)
        self.assertEqual(result, [("hello,", 2.5)])

    def test_xml_tag_only_token_is_treated_as_punct(self):
        """A token that is only an XML tag (no alnum chars) merges into the preceding word."""
        input = [("word", 0.0), ("<break/>", 0.1), ("next", 0.3)]
        self.assertEqual(merge_punct_tokens(input), [("word<break/>", 0.0), ("next", 0.3)])

    def test_xml_tag_with_alnum_content_passes_through(self):
        """A token like '<spell>123</spell>' has alnum chars after stripping tags."""
        input = [("<spell>123</spell>", 0.0), ("and", 0.5)]
        self.assertEqual(merge_punct_tokens(input), [("<spell>123</spell>", 0.0), ("and", 0.5)])

    def test_inworld_style_full_stream(self):
        """Full Inworld-style raw stream produces expected merged and stripped output."""
        raw = [
            ("", 0.0),
            ("I", 0.1),
            (" ", 0.2),
            ("can", 0.3),
            (" ", 0.4),
            ("answer", 0.5),
            (" ", 0.6),
            ("questions", 0.7),
            (", ", 0.8),
            ("explain", 0.9),
            (" ", 1.0),
            ("things", 1.1),
            (".", 1.2),
        ]
        expected = [
            ("I", 0.1),
            ("can", 0.3),
            ("answer", 0.5),
            ("questions,", 0.7),
            ("explain", 0.9),
            ("things.", 1.1),
        ]
        self.assertEqual(merge_punct_tokens(raw), expected)

    def test_only_punct_tokens_returns_empty(self):
        """A list containing only punct/space tokens produces an empty result."""
        input = [(" ", 0.0), (",", 0.1), (".", 0.2)]
        self.assertEqual(merge_punct_tokens(input), [])


if __name__ == "__main__":
    unittest.main()
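The test cases in `tests/test_word_timestamp_utils.py` pin down the observable behaviour of `merge_punct_tokens`, but the implementation in `pipecat.utils.text.word_timestamp_utils` is not part of this view. A minimal sketch that is consistent with those cases might look like the following. It assumes a token counts as a "word" when it still contains an alphanumeric character after XML-style tags are removed. The names `merge_punct_tokens_sketch`, `_TAG_RE`, and `_has_alnum` are hypothetical and may differ from the real module.

```python
import re

# Hypothetical pattern: anything between '<' and '>' is treated as a tag.
_TAG_RE = re.compile(r"<[^>]*>")


def _has_alnum(token: str) -> bool:
    """Return True if the token has an alphanumeric character outside tags."""
    return any(ch.isalnum() for ch in _TAG_RE.sub("", token))


def merge_punct_tokens_sketch(
    word_times: list[tuple[str, float]],
) -> list[tuple[str, float]]:
    """Fold punctuation-only tokens into the preceding word (sketch only)."""
    merged: list[tuple[str, float]] = []
    for text, timestamp in word_times:
        if _has_alnum(text):
            merged.append((text, timestamp))
        elif merged:
            # Append to the previous word and keep that word's timestamp.
            prev_text, prev_timestamp = merged[-1]
            merged[-1] = (prev_text + text, prev_timestamp)
        # A punctuation-only token with no preceding word is discarded.
    # Strip the whitespace that merging (or the provider) left on each token.
    return [(text.strip(), timestamp) for text, timestamp in merged]


if __name__ == "__main__":
    print(merge_punct_tokens_sketch([("word", 0.0), (",", 0.1), (" ", 0.2), ("next", 0.3)]))
```

Stripping only after all merges are done is a deliberate choice in this sketch: it lets `", "` contribute its comma to the preceding word while the trailing space is dropped at the end.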
224
uv.lock
generated
@@ -307,7 +307,8 @@ dependencies = [
|
||||
{ name = "docstring-parser" },
|
||||
{ name = "httpx" },
|
||||
{ name = "jiter" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "sniffio" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
@@ -616,7 +617,8 @@ version = "1.5.11"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "websocket-client" },
|
||||
{ name = "websockets" },
|
||||
@@ -1268,8 +1270,10 @@ version = "6.1.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic-core" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "websockets" },
|
||||
]
|
||||
@@ -1394,7 +1398,8 @@ version = "0.136.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "annotated-doc" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "starlette" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "typing-inspection" },
|
||||
@@ -1445,7 +1450,8 @@ source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "fastar" },
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic", extra = ["email"] },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, extra = ["email"], marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, extra = ["email"], marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "rich-toolkit" },
|
||||
{ name = "rignore" },
|
||||
{ name = "sentry-sdk" },
|
||||
@@ -1865,7 +1871,8 @@ dependencies = [
|
||||
{ name = "distro" },
|
||||
{ name = "google-auth", extra = ["requests"] },
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "requests" },
|
||||
{ name = "sniffio" },
|
||||
{ name = "tenacity" },
|
||||
@@ -1944,7 +1951,8 @@ dependencies = [
|
||||
{ name = "anyio" },
|
||||
{ name = "distro" },
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "sniffio" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
@@ -2249,8 +2257,10 @@ dependencies = [
|
||||
{ name = "eval-type-backport" },
|
||||
{ name = "exceptiongroup" },
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic-core" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "websockets" },
|
||||
]
|
||||
@@ -2728,7 +2738,8 @@ source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "langchain-core" },
|
||||
{ name = "langgraph" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/a6/74/03fd4c07993c49c4b80635bb4c723643ff78af81c9471d1266f879f68df1/langchain-1.3.0.tar.gz", hash = "sha256:8ec70ee0cef94255f3e522423b254093a3dd34509638d353c50f3d9dd498debc", size = 580604, upload-time = "2026-05-12T14:45:50.7Z" }
|
||||
wheels = [
|
||||
@@ -2743,7 +2754,8 @@ dependencies = [
|
||||
{ name = "langchain-core" },
|
||||
{ name = "langchain-text-splitters" },
|
||||
{ name = "langsmith" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pyyaml" },
|
||||
{ name = "requests" },
|
||||
{ name = "sqlalchemy" },
|
||||
@@ -2785,7 +2797,8 @@ dependencies = [
|
||||
{ name = "langchain-protocol" },
|
||||
{ name = "langsmith" },
|
||||
{ name = "packaging" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pyyaml" },
|
||||
{ name = "tenacity" },
|
||||
{ name = "typing-extensions" },
|
||||
@@ -2843,7 +2856,8 @@ dependencies = [
|
||||
{ name = "langgraph-checkpoint" },
|
||||
{ name = "langgraph-prebuilt" },
|
||||
{ name = "langgraph-sdk" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "xxhash" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/58/61/d5d25e783035aa307d289b37e082258a6061c0fb4caa4a284f3bf1e87169/langgraph-1.2.0.tar.gz", hash = "sha256:4a9baaf62afc5d5f63144a50095140a34b9aa9b7cea695d25326d564775348e7", size = 690248, upload-time = "2026-05-12T03:46:39.164Z" }
|
||||
@@ -2898,7 +2912,8 @@ dependencies = [
|
||||
{ name = "httpx" },
|
||||
{ name = "orjson", marker = "platform_python_implementation != 'PyPy'" },
|
||||
{ name = "packaging" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "requests" },
|
||||
{ name = "requests-toolbelt" },
|
||||
{ name = "uuid-utils" },
|
||||
@@ -3188,7 +3203,8 @@ dependencies = [
|
||||
{ name = "httpx" },
|
||||
{ name = "httpx-sse" },
|
||||
{ name = "jsonschema" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pydantic-settings" },
|
||||
{ name = "pyjwt", extra = ["crypto"] },
|
||||
{ name = "python-multipart" },
|
||||
@@ -3227,7 +3243,8 @@ dependencies = [
|
||||
{ name = "openai" },
|
||||
{ name = "posthog" },
|
||||
{ name = "protobuf" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pytz" },
|
||||
{ name = "qdrant-client" },
|
||||
{ name = "sqlalchemy" },
|
||||
@@ -3247,7 +3264,8 @@ dependencies = [
|
||||
{ name = "jsonpath-python" },
|
||||
{ name = "opentelemetry-api" },
|
||||
{ name = "opentelemetry-semantic-conventions" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "python-dateutil" },
|
||||
{ name = "typing-inspection" },
|
||||
]
|
||||
@@ -3818,7 +3836,8 @@ dependencies = [
|
||||
{ name = "distro" },
|
||||
{ name = "httpx" },
|
||||
{ name = "jiter" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "sniffio" },
|
||||
{ name = "tqdm" },
|
||||
{ name = "typing-extensions" },
|
||||
@@ -4195,7 +4214,8 @@ dependencies = [
|
||||
{ name = "openai" },
|
||||
{ name = "pillow" },
|
||||
{ name = "protobuf" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pyloudnorm" },
|
||||
{ name = "resampy" },
|
||||
{ name = "soxr" },
|
||||
@@ -4351,7 +4371,7 @@ rnnoise = [
|
||||
]
|
||||
runner = [
|
||||
{ name = "fastapi" },
|
||||
{ name = "pipecat-ai-small-webrtc-prebuilt" },
|
||||
{ name = "pipecat-ai-prebuilt" },
|
||||
{ name = "python-dotenv" },
|
||||
{ name = "uvicorn" },
|
||||
]
|
||||
@@ -4394,6 +4414,9 @@ tracing = [
|
||||
ultravox = [
|
||||
{ name = "websockets" },
|
||||
]
|
||||
vonage-video-connector = [
|
||||
{ name = "vonage-video-connector", marker = "python_full_version == '3.13.*' and sys_platform == 'linux'" },
|
||||
]
|
||||
webrtc = [
|
||||
{ name = "aiortc" },
|
||||
{ name = "opencv-python" },
|
||||
@@ -4516,7 +4539,7 @@ requires-dist = [
|
||||
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'ultravox'" },
|
||||
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'websocket'" },
|
||||
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'xai'" },
|
||||
{ name = "pipecat-ai-small-webrtc-prebuilt", marker = "extra == 'runner'", specifier = ">=2.5.0" },
|
||||
{ name = "pipecat-ai-prebuilt", marker = "extra == 'runner'", specifier = ">=1.0.1" },
|
||||
{ name = "piper-tts", marker = "extra == 'piper'", specifier = ">=1.3.0,<2" },
|
||||
{ name = "protobuf", specifier = ">=5.29.6,<7" },
|
||||
{ name = "protobuf", marker = "extra == 'nvidia'", specifier = ">=6.31.1,<7" },
|
||||
@@ -4547,10 +4570,11 @@ requires-dist = [
|
||||
{ name = "transformers", marker = "extra == 'local-smart-turn'", specifier = ">=4.48.0,<6" },
|
||||
{ name = "transformers", marker = "extra == 'moondream'", specifier = ">=4.48.0,<6" },
|
||||
{ name = "uvicorn", marker = "extra == 'runner'", specifier = ">=0.32.0,<1.0.0" },
|
||||
{ name = "vonage-video-connector", marker = "python_full_version == '3.13.*' and sys_platform == 'linux' and extra == 'vonage-video-connector'", specifier = "~=0.2.3b0" },
|
||||
{ name = "wait-for2", marker = "python_full_version < '3.12'", specifier = ">=0.4.1,<1" },
|
||||
{ name = "websockets", marker = "extra == 'websockets-base'", specifier = ">=13.1,<16.0" },
|
||||
]
|
||||
provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "webrtc", "websocket", "websockets-base", "whisper", "xai"]
|
||||
provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inception", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "vonage-video-connector", "webrtc", "websocket", "websockets-base", "whisper", "xai"]
|
||||
|
||||
[package.metadata.requires-dev]
|
||||
dev = [
|
||||
@@ -4578,15 +4602,15 @@ docs = [
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pipecat-ai-small-webrtc-prebuilt"
|
||||
version = "2.5.0"
|
||||
name = "pipecat-ai-prebuilt"
|
||||
version = "1.0.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "fastapi", extra = ["all"] },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/2c/4f/40bfc9fc1a13f9b1f2657e292c51ff3e3516c530ca722effdcf342d465d9/pipecat_ai_small_webrtc_prebuilt-2.5.0.tar.gz", hash = "sha256:51481506b7b5dff10eff0357ff929cba504a5198c3393697178d2be9895ad9e6", size = 474299, upload-time = "2026-04-22T18:05:16.494Z" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/fa/27/91857cd93661922687e51f4141583dbeb71f9a6c8d0d6379bae1aa467522/pipecat_ai_prebuilt-1.0.1.tar.gz", hash = "sha256:9453136fcb994802f9b650b5175f3ce1d0476849a9e609fefe52ecc1c3299680", size = 601771, upload-time = "2026-05-20T16:08:14.485Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/34/58/1a2e10c1fb7b44e47558cb6c0954e24a60f98afe912fe55c74fdee66f080/pipecat_ai_small_webrtc_prebuilt-2.5.0-py3-none-any.whl", hash = "sha256:23b1eee95662a0072d9ee5128b8567108eda10d5a54ad71f279730afbb678bfe", size = 474308, upload-time = "2026-04-22T18:05:14.552Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ec/4f/a636e47967c3aa885ae912502d73a46d1e824a67992e405ea1e94b78bd94/pipecat_ai_prebuilt-1.0.1-py3-none-any.whl", hash = "sha256:45d78d3fd2ac8193626a5dabb5f45d0ff2d35bfc92098b4bcea308ae612196aa", size = 601994, upload-time = "2026-05-20T16:08:12.4Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4920,15 +4944,43 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/0c/c3/44f3fbbfa403ea2a7c779186dc20772604442dde72947e7d01069cbe98e3/pycparser-3.0-py3-none-any.whl", hash = "sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992", size = 48172, upload-time = "2026-01-21T14:26:50.693Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic"
|
||||
version = "2.11.10"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
resolution-markers = [
|
||||
"python_full_version == '3.13.*'",
|
||||
]
|
||||
dependencies = [
|
||||
{ name = "annotated-types", marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "typing-extensions", marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "typing-inspection", marker = "python_full_version == '3.13.*'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ae/54/ecab642b3bed45f7d5f59b38443dcb36ef50f85af192e6ece103dbfe9587/pydantic-2.11.10.tar.gz", hash = "sha256:dc280f0982fbda6c38fada4e476dc0a4f3aeaf9c6ad4c28df68a666ec3c61423", size = 788494, upload-time = "2025-10-04T10:40:41.338Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/bd/1f/73c53fcbfb0b5a78f91176df41945ca466e71e9d9d836e5c522abda39ee7/pydantic-2.11.10-py3-none-any.whl", hash = "sha256:802a655709d49bd004c31e865ef37da30b540786a46bfce02333e0e24b5fe29a", size = 444823, upload-time = "2025-10-04T10:40:39.055Z" },
|
||||
]
|
||||
|
||||
[package.optional-dependencies]
|
||||
email = [
|
||||
{ name = "email-validator", marker = "python_full_version == '3.13.*'" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic"
|
||||
version = "2.13.4"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
resolution-markers = [
|
||||
"python_full_version >= '3.14'",
|
||||
"python_full_version == '3.12.*'",
|
||||
"python_full_version < '3.12'",
|
||||
]
|
||||
dependencies = [
|
||||
{ name = "annotated-types" },
|
||||
{ name = "pydantic-core" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "typing-inspection" },
|
||||
{ name = "annotated-types", marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions", marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-inspection", marker = "python_full_version != '3.13.*'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/18/a5/b60d21ac674192f8ab0ba4e9fd860690f9b4a6e51ca5df118733b487d8d6/pydantic-2.13.4.tar.gz", hash = "sha256:c40756b57adaa8b1efeeced5c196f3f3b7c435f90e84ea7f443901bec8099ef6", size = 844775, upload-time = "2026-05-06T13:43:05.343Z" }
|
||||
wheels = [
|
||||
@@ -4937,15 +4989,88 @@ wheels = [
|
||||
|
||||
[package.optional-dependencies]
|
||||
email = [
|
||||
{ name = "email-validator" },
|
||||
{ name = "email-validator", marker = "python_full_version != '3.13.*'" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic-core"
|
||||
version = "2.33.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
resolution-markers = [
|
||||
"python_full_version == '3.13.*'",
|
||||
]
|
||||
dependencies = [
|
||||
{ name = "typing-extensions", marker = "python_full_version == '3.13.*'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/3f/8d/71db63483d518cbbf290261a1fc2839d17ff89fce7089e08cad07ccfce67/pydantic_core-2.33.2-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:4c5b0a576fb381edd6d27f0a85915c6daf2f8138dc5c267a57c08a62900758c7", size = 2028584, upload-time = "2025-04-23T18:31:03.106Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/24/2f/3cfa7244ae292dd850989f328722d2aef313f74ffc471184dc509e1e4e5a/pydantic_core-2.33.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:e799c050df38a639db758c617ec771fd8fb7a5f8eaaa4b27b101f266b216a246", size = 1855071, upload-time = "2025-04-23T18:31:04.621Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b3/d3/4ae42d33f5e3f50dd467761304be2fa0a9417fbf09735bc2cce003480f2a/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:dc46a01bf8d62f227d5ecee74178ffc448ff4e5197c756331f71efcc66dc980f", size = 1897823, upload-time = "2025-04-23T18:31:06.377Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f4/f3/aa5976e8352b7695ff808599794b1fba2a9ae2ee954a3426855935799488/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a144d4f717285c6d9234a66778059f33a89096dfb9b39117663fd8413d582dcc", size = 1983792, upload-time = "2025-04-23T18:31:07.93Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/d5/7a/cda9b5a23c552037717f2b2a5257e9b2bfe45e687386df9591eff7b46d28/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:73cf6373c21bc80b2e0dc88444f41ae60b2f070ed02095754eb5a01df12256de", size = 2136338, upload-time = "2025-04-23T18:31:09.283Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/2b/9f/b8f9ec8dd1417eb9da784e91e1667d58a2a4a7b7b34cf4af765ef663a7e5/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:3dc625f4aa79713512d1976fe9f0bc99f706a9dee21dfd1810b4bbbf228d0e8a", size = 2730998, upload-time = "2025-04-23T18:31:11.7Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/47/bc/cd720e078576bdb8255d5032c5d63ee5c0bf4b7173dd955185a1d658c456/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:881b21b5549499972441da4758d662aeea93f1923f953e9cbaff14b8b9565aef", size = 2003200, upload-time = "2025-04-23T18:31:13.536Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ca/22/3602b895ee2cd29d11a2b349372446ae9727c32e78a94b3d588a40fdf187/pydantic_core-2.33.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:bdc25f3681f7b78572699569514036afe3c243bc3059d3942624e936ec93450e", size = 2113890, upload-time = "2025-04-23T18:31:15.011Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ff/e6/e3c5908c03cf00d629eb38393a98fccc38ee0ce8ecce32f69fc7d7b558a7/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:fe5b32187cbc0c862ee201ad66c30cf218e5ed468ec8dc1cf49dec66e160cc4d", size = 2073359, upload-time = "2025-04-23T18:31:16.393Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/12/e7/6a36a07c59ebefc8777d1ffdaf5ae71b06b21952582e4b07eba88a421c79/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:bc7aee6f634a6f4a95676fcb5d6559a2c2a390330098dba5e5a5f28a2e4ada30", size = 2245883, upload-time = "2025-04-23T18:31:17.892Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/16/3f/59b3187aaa6cc0c1e6616e8045b284de2b6a87b027cce2ffcea073adf1d2/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:235f45e5dbcccf6bd99f9f472858849f73d11120d76ea8707115415f8e5ebebf", size = 2241074, upload-time = "2025-04-23T18:31:19.205Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e0/ed/55532bb88f674d5d8f67ab121a2a13c385df382de2a1677f30ad385f7438/pydantic_core-2.33.2-cp311-cp311-win32.whl", hash = "sha256:6368900c2d3ef09b69cb0b913f9f8263b03786e5b2a387706c5afb66800efd51", size = 1910538, upload-time = "2025-04-23T18:31:20.541Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/fe/1b/25b7cccd4519c0b23c2dd636ad39d381abf113085ce4f7bec2b0dc755eb1/pydantic_core-2.33.2-cp311-cp311-win_amd64.whl", hash = "sha256:1e063337ef9e9820c77acc768546325ebe04ee38b08703244c1309cccc4f1bab", size = 1952909, upload-time = "2025-04-23T18:31:22.371Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/49/a9/d809358e49126438055884c4366a1f6227f0f84f635a9014e2deb9b9de54/pydantic_core-2.33.2-cp311-cp311-win_arm64.whl", hash = "sha256:6b99022f1d19bc32a4c2a0d544fc9a76e3be90f0b3f4af413f87d38749300e65", size = 1897786, upload-time = "2025-04-23T18:31:24.161Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/18/8a/2b41c97f554ec8c71f2a8a5f85cb56a8b0956addfe8b0efb5b3d77e8bdc3/pydantic_core-2.33.2-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a7ec89dc587667f22b6a0b6579c249fca9026ce7c333fc142ba42411fa243cdc", size = 2009000, upload-time = "2025-04-23T18:31:25.863Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a1/02/6224312aacb3c8ecbaa959897af57181fb6cf3a3d7917fd44d0f2917e6f2/pydantic_core-2.33.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3c6db6e52c6d70aa0d00d45cdb9b40f0433b96380071ea80b09277dba021ddf7", size = 1847996, upload-time = "2025-04-23T18:31:27.341Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/d6/46/6dcdf084a523dbe0a0be59d054734b86a981726f221f4562aed313dbcb49/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e61206137cbc65e6d5256e1166f88331d3b6238e082d9f74613b9b765fb9025", size = 1880957, upload-time = "2025-04-23T18:31:28.956Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ec/6b/1ec2c03837ac00886ba8160ce041ce4e325b41d06a034adbef11339ae422/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:eb8c529b2819c37140eb51b914153063d27ed88e3bdc31b71198a198e921e011", size = 1964199, upload-time = "2025-04-23T18:31:31.025Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/2d/1d/6bf34d6adb9debd9136bd197ca72642203ce9aaaa85cfcbfcf20f9696e83/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c52b02ad8b4e2cf14ca7b3d918f3eb0ee91e63b3167c32591e57c4317e134f8f", size = 2120296, upload-time = "2025-04-23T18:31:32.514Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e0/94/2bd0aaf5a591e974b32a9f7123f16637776c304471a0ab33cf263cf5591a/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:96081f1605125ba0855dfda83f6f3df5ec90c61195421ba72223de35ccfb2f88", size = 2676109, upload-time = "2025-04-23T18:31:33.958Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f57a69461af2a5fa6e6bbd7a5f60d3b7e6cebb687f55106933188e79ad155c1", size = 2002028, upload-time = "2025-04-23T18:31:39.095Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/cb/d5/7bb781bf2748ce3d03af04d5c969fa1308880e1dca35a9bd94e1a96a922e/pydantic_core-2.33.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:572c7e6c8bb4774d2ac88929e3d1f12bc45714ae5ee6d9a788a9fb35e60bb04b", size = 2100044, upload-time = "2025-04-23T18:31:41.034Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/fe/36/def5e53e1eb0ad896785702a5bbfd25eed546cdcf4087ad285021a90ed53/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:db4b41f9bd95fbe5acd76d89920336ba96f03e149097365afe1cb092fceb89a1", size = 2058881, upload-time = "2025-04-23T18:31:42.757Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/01/6c/57f8d70b2ee57fc3dc8b9610315949837fa8c11d86927b9bb044f8705419/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:fa854f5cf7e33842a892e5c73f45327760bc7bc516339fda888c75ae60edaeb6", size = 2227034, upload-time = "2025-04-23T18:31:44.304Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/27/b9/9c17f0396a82b3d5cbea4c24d742083422639e7bb1d5bf600e12cb176a13/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:5f483cfb75ff703095c59e365360cb73e00185e01aaea067cd19acffd2ab20ea", size = 2234187, upload-time = "2025-04-23T18:31:45.891Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b0/6a/adf5734ffd52bf86d865093ad70b2ce543415e0e356f6cacabbc0d9ad910/pydantic_core-2.33.2-cp312-cp312-win32.whl", hash = "sha256:9cb1da0f5a471435a7bc7e439b8a728e8b61e59784b2af70d7c169f8dd8ae290", size = 1892628, upload-time = "2025-04-23T18:31:47.819Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/43/e4/5479fecb3606c1368d496a825d8411e126133c41224c1e7238be58b87d7e/pydantic_core-2.33.2-cp312-cp312-win_amd64.whl", hash = "sha256:f941635f2a3d96b2973e867144fde513665c87f13fe0e193c158ac51bfaaa7b2", size = 1955866, upload-time = "2025-04-23T18:31:49.635Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/0d/24/8b11e8b3e2be9dd82df4b11408a67c61bb4dc4f8e11b5b0fc888b38118b5/pydantic_core-2.33.2-cp312-cp312-win_arm64.whl", hash = "sha256:cca3868ddfaccfbc4bfb1d608e2ccaaebe0ae628e1416aeb9c4d88c001bb45ab", size = 1888894, upload-time = "2025-04-23T18:31:51.609Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7b/27/d4ae6487d73948d6f20dddcd94be4ea43e74349b56eba82e9bdee2d7494c/pydantic_core-2.33.2-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:dd14041875d09cc0f9308e37a6f8b65f5585cf2598a53aa0123df8b129d481f8", size = 2025200, upload-time = "2025-04-23T18:33:14.199Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f1/b8/b3cb95375f05d33801024079b9392a5ab45267a63400bf1866e7ce0f0de4/pydantic_core-2.33.2-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:d87c561733f66531dced0da6e864f44ebf89a8fba55f31407b00c2f7f9449593", size = 1859123, upload-time = "2025-04-23T18:33:16.555Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/05/bc/0d0b5adeda59a261cd30a1235a445bf55c7e46ae44aea28f7bd6ed46e091/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2f82865531efd18d6e07a04a17331af02cb7a651583c418df8266f17a63c6612", size = 1892852, upload-time = "2025-04-23T18:33:18.513Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3e/11/d37bdebbda2e449cb3f519f6ce950927b56d62f0b84fd9cb9e372a26a3d5/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2bfb5112df54209d820d7bf9317c7a6c9025ea52e49f46b6a2060104bba37de7", size = 2067484, upload-time = "2025-04-23T18:33:20.475Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/8c/55/1f95f0a05ce72ecb02a8a8a1c3be0579bbc29b1d5ab68f1378b7bebc5057/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:64632ff9d614e5eecfb495796ad51b0ed98c453e447a76bcbeeb69615079fc7e", size = 2108896, upload-time = "2025-04-23T18:33:22.501Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/53/89/2b2de6c81fa131f423246a9109d7b2a375e83968ad0800d6e57d0574629b/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:f889f7a40498cc077332c7ab6b4608d296d852182211787d4f3ee377aaae66e8", size = 2069475, upload-time = "2025-04-23T18:33:24.528Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b8/e9/1f7efbe20d0b2b10f6718944b5d8ece9152390904f29a78e68d4e7961159/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:de4b83bb311557e439b9e186f733f6c645b9417c84e2eb8203f3f820a4b988bf", size = 2239013, upload-time = "2025-04-23T18:33:26.621Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3c/b2/5309c905a93811524a49b4e031e9851a6b00ff0fb668794472ea7746b448/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:82f68293f055f51b51ea42fafc74b6aad03e70e191799430b90c13d643059ebb", size = 2238715, upload-time = "2025-04-23T18:33:28.656Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/32/56/8a7ca5d2cd2cda1d245d34b1c9a942920a718082ae8e54e5f3e5a58b7add/pydantic_core-2.33.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:329467cecfb529c925cf2bbd4d60d2c509bc2fb52a20c1045bf09bb70971a9c1", size = 2066757, upload-time = "2025-04-23T18:33:30.645Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic-core"
|
||||
version = "2.46.4"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
resolution-markers = [
|
||||
"python_full_version >= '3.14'",
|
||||
"python_full_version == '3.12.*'",
|
||||
"python_full_version < '3.12'",
|
||||
]
|
||||
dependencies = [
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "typing-extensions", marker = "python_full_version != '3.13.*'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/9d/56/921726b776ace8d8f5db44c4ef961006580d91dc52b803c489fafd1aa249/pydantic_core-2.46.4.tar.gz", hash = "sha256:62f875393d7f270851f20523dd2e29f082bcc82292d66db2b64ea71f64b6e1c1", size = 471464, upload-time = "2026-05-06T13:37:06.98Z" }
|
||||
wheels = [
|
||||
@@ -5047,7 +5172,8 @@ name = "pydantic-extra-types"
|
||||
version = "2.11.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/66/71/dba38ee2651f84f7842206adbd2233d8bbdb59fb85e9fa14232486a8c471/pydantic_extra_types-2.11.1.tar.gz", hash = "sha256:46792d2307383859e923d8fcefa82108b1a141f8a9c0198982b3832ab5ef1049", size = 172002, upload-time = "2026-03-16T08:08:03.92Z" }
|
||||
@@ -5060,7 +5186,8 @@ name = "pydantic-settings"
|
||||
version = "2.14.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "python-dotenv" },
|
||||
{ name = "typing-inspection" },
|
||||
]
|
||||
@@ -5424,7 +5551,8 @@ dependencies = [
|
||||
{ name = "numpy" },
|
||||
{ name = "portalocker" },
|
||||
{ name = "protobuf" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "urllib3" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/65/45/5b1bdd15a3c7730eefb9c113600829e20d689b82b5a23f9e07d107094004/qdrant_client-1.18.0.tar.gz", hash = "sha256:52e8ece1a7d40519801bf0b70713bfa0f6b7ae28c7275bbe0b0286fbed7f6db4", size = 352580, upload-time = "2026-05-11T14:12:38.702Z" }
|
||||
@@ -5915,8 +6043,10 @@ version = "0.1.28"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic-core" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "websockets" },
|
||||
]
|
||||
@@ -6244,7 +6374,8 @@ version = "0.2.8"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "numpy" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "speechmatics-rt" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/e4/b2/72b5b2203bbefbd22e7692adaca0dd7c2feebed1aaea5599ec579f74fbbf/speechmatics_voice-0.2.8.tar.gz", hash = "sha256:b2d9cbf773fd94400c744734662e2b16b5bdc4271d0dafde46ac032c438fe000", size = 61419, upload-time = "2026-01-26T16:26:09.082Z" }
|
||||
@@ -6544,7 +6675,8 @@ dependencies = [
|
||||
{ name = "opentelemetry-api" },
|
||||
{ name = "opentelemetry-instrumentation-threading" },
|
||||
{ name = "opentelemetry-sdk" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
|
||||
{ name = "pyyaml" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "watchdog" },
|
||||
@@ -7131,6 +7263,18 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/f4/34/a9dbe051de88a63eb7408ea66630bac38e72f7f6077d4be58737106860d9/virtualenv-21.3.3-py3-none-any.whl", hash = "sha256:7d5987d8369e098e41406efb780a3d4ca79280097293899e351a6407ee153ab3", size = 7594554, upload-time = "2026-05-13T18:01:27.815Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "vonage-video-connector"
|
||||
version = "0.2.3b0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
|
||||
]
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/8a/db/385df7fd618b31f0def554aca568d87b4b2f9ccc3a1457ae7eea5e8bf775/vonage_video_connector-0.2.3b0-py3-none-manylinux_2_35_aarch64.whl", hash = "sha256:9d1ffa93f3aadd24a980294df2b63b0f853b8dfa25b277690e0864e7586f8bb7", size = 12101114, upload-time = "2026-03-02T15:34:45.007Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/9f/4e/03b183599370473c3277140e9ecbb33621449935a02042ecbcf8c555ebad/vonage_video_connector-0.2.3b0-py3-none-manylinux_2_35_x86_64.whl", hash = "sha256:718e39e7e488ac50fecda75e24ab01c9d16d4078bb4f79ee7857e282493e2e4e", size = 13971535, upload-time = "2026-03-02T15:34:47.186Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wait-for2"
|
||||
version = "0.4.1"
|
||||
|
||||