Compare commits

73 Commits

Author SHA1 Message Date
Mark Backman
780c004168 Merge pull request #4423 from joycech333/feat/inception-llm-service
feat: add Inception LLM service with Mercury 2 support
2026-05-21 12:02:27 -04:00
Mark Backman
28f9203401 Code review fixes 2026-05-21 11:45:17 -04:00
joycech333
77cc314a08 feat: add Inception LLM service with Mercury-2 support
Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
2026-05-21 11:23:23 -04:00
Mark Backman
4a8d1d0b5e Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling
Clean up smart text logging
2026-05-21 08:35:46 -04:00
Mark Backman
87f5d60693 Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1
chore: bump pipecat-ai-prebuilt to 1.0.1
2026-05-21 08:35:31 -04:00
Mark Backman
c699b31daa Merge pull request #4534 from pipecat-ai/mb/changelog-4521
Add changelog for #4521
2026-05-21 08:35:15 -04:00
Mark Backman
ee674ffb01 Add changelog for #4521 2026-05-20 17:57:43 -04:00
mihafabcic-soniox
86a5710801 Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521) 2026-05-20 17:54:48 -04:00
Mark Backman
4a96b2a9e6 Clean up smart text logging 2026-05-20 15:38:59 -04:00
Mark Backman
105d6f27da Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling
Align websocket STT connection failures
2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter
e0e3cd336a Merge pull request #4529 from pipecat-ai/filipi/squash_skill
New skill to squash commits.
2026-05-20 16:06:23 -03:00
Mark Backman
9586db5b50 Preserve websocket reconnect failure retries 2026-05-20 14:45:29 -04:00
Mark Backman
a890ab7b21 Add changelog for PR #4531 2026-05-20 12:18:03 -04:00
Mark Backman
c1bf7dbb4a chore: bump pipecat-ai-prebuilt to 1.0.1 2026-05-20 12:15:09 -04:00
Mark Backman
709a0ce839 Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008
Fix ElevenLabs keepalive racing context-init (1008 disconnects)
2026-05-20 11:21:17 -04:00
Mark Backman
be93350eae Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest
Add P99 latency for Smallest AI, Mistral, XAI STT
2026-05-20 11:21:00 -04:00
Mark Backman
4a96ab7073 Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports
Improve runner optional transport handling
2026-05-20 11:16:16 -04:00
filipi87
c321f50e76 New skill to squash commits. 2026-05-20 10:29:03 -03:00
Filipi da Silva Fuchter
bca337f97e Merge pull request #4380 from pipecat-ai/filipi/smart_text
Smart Text Handling
2026-05-20 10:18:30 -03:00
filipi87
5d9e8c5ac5 Removing debug log. 2026-05-20 10:13:46 -03:00
Mark Backman
70773bce0a Add changelog for PR #4527 2026-05-20 09:08:47 -04:00
filipi87
8bdb49bd1a chore: add changelogs for word-timestamp and frame-ordering fixes 2026-05-20 10:03:30 -03:00
filipi87
81bb81c1d0 test: add automated tests for word tracking, frame sequencing, and Cartesia TTS
Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and
word_timestamp_utils (including CJK language scenarios). Updates existing
Cartesia TTS and TTS frame ordering tests to cover the new behaviours.
2026-05-20 10:03:26 -03:00
filipi87
e1bdee598c fix: preserve raw_text through TTS pipeline for correct LLM context attribution
TTSTextFrame entries were losing their original text structure when word
timestamps were enabled. AggregatedTextFrame now carries a raw_text field with
the original LLM-produced text (including pattern delimiters such as
<card>...</card>). The assistant context receives properly-tagged content
rather than the cleaned words returned by the TTS provider. Also handles words
that straddle two sentence boundaries by splitting and attributing each part
to its correct source frame.
2026-05-20 10:03:21 -03:00
filipi87
185a89bb3b fix: strip Cartesia SSML tags from word timestamp entries
SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries
returned by the Cartesia word-timestamps API. Tags are now stripped before
processing so word-to-text attribution remains accurate when SSML is present
in the TTS input.
2026-05-20 10:03:15 -03:00
filipi87
6b9deefbe3 fix: preserve frame insertion order in BaseOutputTransport for equal PTS values
Frames sharing the same presentation timestamp were being reordered by the
priority queue. Adds a monotonic counter as a tiebreaker so frames with equal
PTS are always emitted in insertion order, preventing subtle audio/text
sequencing bugs.
2026-05-20 10:03:08 -03:00
filipi87
deefc32faf fix: hold skipped TTS frames in position until preceding spoken frames complete
Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were
emitted to the assistant context immediately instead of waiting for preceding
spoken frames to finish. Introduces AggregatedFrameSequencer to hold each
frame's slot and flush only after all earlier spoken sentences are complete,
keeping context ordering correct.
2026-05-20 10:03:03 -03:00
Mark Backman
a5e6886b80 Fix ElevenLabs keepalive racing context-init (1008 disconnects)
The keepalive could fire for a new turn's context before that context's
voice_settings context-init was sent, making the keepalive the context's
first message (no voice_settings) and causing ElevenLabs to reject the
later init with a 1008 policy violation. The keepalive now only targets a
context once its context-init has been sent (tracked in _context_init_sent).
2026-05-20 08:59:01 -04:00
Mark Backman
d11a4ba0cd Use shared telephony route availability checks 2026-05-20 08:57:48 -04:00
Mark Backman
38407e091d Add p99 values for Mistral and XAI 2026-05-19 22:51:33 -04:00
Mark Backman
82cd931efa Merge pull request #4306 from YFortin/fix/azure-tts-last-word-race
fix(azure-tts): Route completion through word boundary queue to prevent last word from being missed
2026-05-19 22:27:50 -04:00
Mark Backman
33e5d1f89b Add changelog for PR #4522 2026-05-19 18:33:58 -04:00
Mark Backman
861dd23873 Add changelog for runner updates 2026-05-19 17:31:07 -04:00
Mark Backman
b825dd779e Clarify runner startup banner 2026-05-19 17:31:07 -04:00
Mark Backman
1487da53a9 Improve runner optional transport handling 2026-05-19 17:03:16 -04:00
Mark Backman
aff84a5d9e Add P99 latency for Smallest AI STT 2026-05-19 11:05:15 -04:00
Mark Backman
c09f6d5adb Merge pull request #4052 from Vonage/vonage_video_connector_transport
Vonage WebRTC Transport Integration
2026-05-19 10:56:20 -04:00
asilvestre
e2d249e5d9 adding uv.lock 2026-05-19 16:33:38 +02:00
asilvestre
956b39b0dc remove extraenous await in cleanup 2026-05-19 16:33:04 +02:00
Mark Backman
e298491068 Add changelog for websocket STT failure handling 2026-05-18 12:41:56 -04:00
Mark Backman
97b00042df Align websocket STT connection failures 2026-05-18 12:35:01 -04:00
asilvestre
bc769eaa82 Changing the example to use OpenAI 2026-05-18 14:40:56 +02:00
asilvestre
ee5aa4dc71 SubscribeSettings to be pydantic and comment fixes 2026-05-18 14:40:56 +02:00
asilvestre
dd38fbc735 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
a1c40df471 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
c4ff9300c9 fix linting and typechecking 2026-05-18 14:40:56 +02:00
asilvestre
cab4585cbb added changelog 2026-05-18 14:40:56 +02:00
Antoni Silvestre
18368d047e Linting and changes to adapt to v1.0 2026-05-18 14:40:56 +02:00
asilvestre
e3abb4b6d7 apply suggestions in PR 2026-05-18 14:40:56 +02:00
Antoni Silvestre
0fd971d59d Update src/pipecat/runner/types.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-18 14:40:56 +02:00
asilvestre
c61672194d Vonage Video Connector Transport 2026-05-18 14:40:49 +02:00
Filipi da Silva Fuchter
c51a817efa Merge pull request #4442 from pipecat-ai/filipi/runner_all_transports
Unified start route to make all transports available
2026-05-18 09:27:44 -03:00
Bismeet singh
d85eda6da8 Merge pull request #4507 from BismeetSingh/fix/elevenlabs-stt-service-crash-language
Fix/elevenlabs stt service crash language
2026-05-17 10:17:07 -04:00
Aleix Conchillo Flaqué
71feb42711 Merge pull request #4503 from pipecat-ai/changelog-1.2.1
Release 1.2.1 - Changelog Update
2026-05-15 15:19:55 -07:00
aconchillo
6b93ca0cb6 Update changelog for version 1.2.1 2026-05-15 22:18:46 +00:00
Aleix Conchillo Flaqué
b6ecce754b Merge pull request #4501 from pipecat-ai/aleix/fix-filter-incomplete-tool-calls
Fix filter-incomplete + function-calling deadlock
2026-05-15 15:11:45 -07:00
Aleix Conchillo Flaqué
d39e6bf921 Add changelog for #4501 2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
63064860ef Move OpenAITTSService instructions into Settings in the example
Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions``
is now a Settings field. The constructor still accepts it for backward
compatibility but the canonical path is through ``Settings``.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
f5158d51e7 Add filter-incomplete + function-calling turn-management example
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
94dbd2fa68 Broadcast UserTurnInferenceCompletedFrame on tool calls in filter-incomplete
With ``filter_incomplete_user_turns`` enabled, an LLM that responded to
a user turn by calling a tool (without first emitting a ✓ marker)
never finalized the user turn. ``UserStoppedSpeakingFrame`` stayed
deferred, the assistant aggregator kept ``_user_speaking=True``, and
when ``FunctionCallResultFrame`` arrived its ``not self._user_speaking``
gate dropped the context push — the LLM continuation never ran and
the call hung silently.

Broadcast ``UserTurnInferenceCompletedFrame`` on
``FunctionCallsStartedFrame`` (i.e. the moment the LLM commits to a
tool call, before the function dispatches), gated by a new
``_turn_completion_broadcasted`` flag so the ✓ path and the tool-call
path don't both fire. The flag resets in ``_turn_reset`` alongside
the other per-turn state.

Emitting on the start frame rather than ``LLMFullResponseEndFrame``
also shrinks the race window — ``UserStoppedSpeakingFrame`` (a
``SystemFrame``) has the maximum possible head start over the
``FunctionCallResultFrame`` (``DataFrame``) that follows.
2026-05-15 14:50:35 -07:00
filipi87
b493ed8d3a Removing the websocket transport from elevenlabs example. 2026-05-15 10:11:38 -03:00
filipi87
c3338667b1 Mounting the prebuilt frontend UI and root redirect for all transports. 2026-05-15 10:06:47 -03:00
filipi87
c8efe319b3 Adding the changelog for the changes. 2026-05-14 11:10:33 -03:00
filipi87
d6655e7a5e Fixing ruff format. 2026-05-12 10:40:09 -03:00
filipi87
33b73df6ec Changing the websocket route to return the same data as PCC. 2026-05-12 10:38:15 -03:00
filipi87
c9f0172e9f Example supporting plain websocket. 2026-05-08 09:46:18 -03:00
filipi87
2638885c62 Adding support for the plain websocket transport. 2026-05-08 09:37:07 -03:00
filipi87
cb426cbb14 Fixing format. 2026-05-07 16:04:43 -03:00
filipi87
d39beff817 Fixing format. 2026-05-07 16:01:54 -03:00
filipi87
1eade184f1 Creating a status endpoint to return the available transports. 2026-05-07 15:53:15 -03:00
filipi87
3fa193b983 Unified start route to make all transports available. 2026-05-07 15:34:32 -03:00
Yan Fortin
6feeee515f chore: rename changelog fragment to match PR #4306 2026-04-14 18:49:35 -04:00
Yan Fortin
55fb4b0845 fix(azure-tts): route completion through word boundary queue to prevent last word from being missed
The Azure TTS _handle_completed callback was putting the audio stream
completion signal (None) directly into _audio_queue while the last word
was still pending in _word_boundary_queue. This caused a race condition
where run_tts could exit and TTSStoppedFrame could be emitted before the
word processor task had a chance to process and emit the final word's
TTSTextFrame.

The fix routes the completion signal through _word_boundary_queue as a
None sentinel. The word processor task now recognizes this sentinel and
only signals _audio_queue after all pending words have been drained.
This guarantees the last word's TTSTextFrame is always emitted before
TTSStoppedFrame.

The cancellation/interruption path (_handle_canceled) is unchanged and
still signals _audio_queue directly, which is correct since word ordering
does not matter when speech is interrupted.
2026-04-14 18:48:40 -04:00
77 changed files with 10864 additions and 416 deletions


@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.


@@ -7,6 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [1.2.1] - 2026-05-15
### Changed
- Changed the default WebSocket endpoints for `GradiumSTTService` and
`GradiumTTSService` to the region-neutral
`wss://api.gradium.ai/api/speech/asr` and
`wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
traffic to the nearest endpoint. Override the url to pin to a specific
region.
(PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))
### Fixed
- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
responded by calling a tool. The user turn never finalized, so the assistant
aggregator gated the tool-result context push and the LLM continuation never
ran. Tool calls now finalize the turn the moment they start, before the
function dispatches.
(PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))
## [1.2.0] - 2026-05-14
### Added


@@ -92,10 +92,10 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | -------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |

changelog/4052.added.md

@@ -0,0 +1 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.

changelog/4306.fixed.md

@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
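
The sentinel routing described in this entry can be illustrated with a minimal sketch. It is not the Azure TTS service code: `word_processor`, `demo`, and the string events are hypothetical stand-ins, and only the idea of sending the `None` completion signal through the word queue comes from the entry above.

```python
import asyncio


async def word_processor(word_queue: asyncio.Queue, audio_queue: asyncio.Queue, emitted: list) -> None:
    """Drain pending words, then forward the end-of-stream signal (hypothetical helper)."""
    while True:
        item = await word_queue.get()
        if item is None:
            # Sentinel reached: every earlier word has been emitted,
            # so it is now safe to signal the end of the audio stream.
            await audio_queue.put(None)
            return
        emitted.append(f"text:{item}")


async def demo() -> list:
    word_queue: asyncio.Queue = asyncio.Queue()
    audio_queue: asyncio.Queue = asyncio.Queue()
    emitted: list = []

    for word in ("hello", "world"):
        word_queue.put_nowait(word)
    # Completion is queued behind the words instead of going
    # straight to the audio queue.
    word_queue.put_nowait(None)

    task = asyncio.create_task(word_processor(word_queue, audio_queue, emitted))
    assert await audio_queue.get() is None  # wait for end of stream
    emitted.append("stopped")
    await task
    return emitted


print(asyncio.run(demo()))
```

Because the completion signal shares a FIFO queue with the words, it cannot overtake the final word, which is the ordering guarantee the fix relies on.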


@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
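
A minimal sketch of the tiebreaker idea, assuming a heap-based queue keyed on PTS. `OrderedFrameQueue` is a hypothetical class written for illustration and is not the `BaseOutputTransport` implementation; the monotonic counter is the only detail taken from the commit description.

```python
import heapq
import itertools


class OrderedFrameQueue:
    """Priority queue keyed on PTS that keeps insertion order for ties (hypothetical)."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # monotonic tiebreaker

    def put(self, pts: int, frame: object) -> None:
        # The counter is compared before the frame, so equal PTS values
        # fall back to insertion order and frames are never compared.
        heapq.heappush(self._heap, (pts, next(self._counter), frame))

    def get(self) -> object:
        _pts, _seq, frame = heapq.heappop(self._heap)
        return frame

    def __len__(self) -> int:
        return len(self._heap)


queue = OrderedFrameQueue()
queue.put(10, "audio-1")
queue.put(10, "text-1")
queue.put(5, "early")
queue.put(10, "audio-2")

order = [queue.get() for _ in range(len(queue))]
print(order)  # ['early', 'audio-1', 'text-1', 'audio-2']
```

Without the counter, tuples with equal PTS would be ordered by comparing the frames themselves, which either reorders them or raises `TypeError` for objects that do not define ordering.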


@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
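
As a rough illustration of stripping markup before attribution, the sketch below removes anything that looks like an SSML tag from a list of word strings. The function name, the regular expression, and the sample input are assumptions made for this example; the actual shape of the Cartesia word-timestamp payload and the service's own stripping logic may differ.

```python
import re

# Matches anything shaped like <tag>, </tag>, or <tag attr="..."/>.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_ssml_tags(words: list) -> list:
    """Remove SSML-like tags and drop entries that become empty (hypothetical helper)."""
    cleaned = []
    for word in words:
        text = _TAG_RE.sub("", word).strip()
        if text:
            cleaned.append(text)
    return cleaned


print(strip_ssml_tags(["<spell>A1B2</spell>", "is", '<break time="1s"/>', "ready"]))
# ['A1B2', 'is', 'ready']
```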


@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.

changelog/4380.fixed.md

@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
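
The slot-holding behaviour can be sketched as follows. `SlotSequencer` is a hypothetical, simplified stand-in and not the `AggregatedFrameSequencer` from this change: it assumes each frame is identified by its text, that skipped frames are complete as soon as they arrive, and that spoken frames are completed by an explicit call.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Slot:
    text: str
    done: bool


class SlotSequencer:
    """Releases frames in arrival order once all earlier frames are done (hypothetical)."""

    def __init__(self) -> None:
        self._slots: deque = deque()

    def add(self, text: str, spoken: bool) -> list:
        # A skipped frame produces no audio, so it is done immediately,
        # but it still waits behind any unfinished spoken frame.
        self._slots.append(Slot(text, done=not spoken))
        return self._flush()

    def mark_spoken(self, text: str) -> list:
        for slot in self._slots:
            if slot.text == text and not slot.done:
                slot.done = True
                break
        return self._flush()

    def _flush(self) -> list:
        released = []
        while self._slots and self._slots[0].done:
            released.append(self._slots.popleft().text)
        return released


seq = SlotSequencer()
first = seq.add("Here is the code.", spoken=True)    # still being spoken
second = seq.add("print(1)", spoken=False)           # skipped, but held
third = seq.mark_spoken("Here is the code.")         # releases both, in order
print(first, second, third)
```

The skipped frame is released only when the sentence before it finishes, so the context sees the two entries in the order the LLM produced them.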

changelog/4423.added.md

@@ -0,0 +1 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.


@@ -0,0 +1 @@
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).

changelog/4442.added.md

@@ -0,0 +1 @@
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.


@@ -0,0 +1 @@
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.


@@ -1 +0,0 @@
- Changed the default WebSocket endpoints for `GradiumSTTService` and `GradiumTTSService` to the region-neutral `wss://api.gradium.ai/api/speech/asr` and `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes traffic to the nearest endpoint. Override the URL to pin traffic to a specific region.

1
changelog/4507.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
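
One way to picture the fix is to omit the language parameter entirely when it is unset, instead of serializing `None`. The sketch below is an assumption-laden illustration: `build_stt_query`, the parameter names `model_id` and `language_code`, and the model name are hypothetical and are not taken from the service's code.

```python
from typing import Optional
from urllib.parse import urlencode


def build_stt_query(model: str, language: Optional[str] = None) -> str:
    """Build a connection query string (hypothetical parameter names)."""
    params = {"model_id": model}
    if language is not None:
        # Only pin the language when the caller asked for one; leaving the
        # key out lets the provider auto-detect the audio language.
        params["language_code"] = language
    return urlencode(params)


auto_detect = build_stt_query("example-model")
pinned = build_stt_query("example-model", "es")
print(auto_detect)
print(pinned)
```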

1
changelog/4514.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
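
The behavior described above can be approximated as: on a failed connection attempt, drop any stale socket reference and report a non-fatal error so a failover mechanism can take over. The class below is a hypothetical sketch (`SketchSTTConnection` and its fields are not Pipecat APIs) that uses plain dictionaries in place of error frames.

```python
import asyncio


class SketchSTTConnection:
    """Hypothetical model of websocket setup with non-fatal failures."""

    def __init__(self, connect_fn) -> None:
        self._connect_fn = connect_fn
        self.websocket = None
        self.errors = []

    async def connect(self) -> None:
        try:
            self.websocket = await self._connect_fn()
        except Exception as exc:
            # Clear stale state so later reconnect attempts start clean,
            # and record the failure as non-fatal instead of raising.
            self.websocket = None
            self.errors.append({"error": str(exc), "fatal": False})


async def failing_connect():
    raise ConnectionError("handshake failed")


service = SketchSTTConnection(failing_connect)
service.websocket = object()  # simulate a stale socket from an earlier session
asyncio.run(service.connect())
print(service.websocket, service.errors)
```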

1
changelog/4521.added.md Normal file

@@ -0,0 +1 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
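
A small guard for the documented range could look like this. It is a sketch only: `check_max_endpoint_delay_ms` is a hypothetical helper, the bounds are assumed to be inclusive, and nothing here implies that the service itself validates the value client-side.

```python
MIN_ENDPOINT_DELAY_MS = 500  # lower bound from the changelog entry
MAX_ENDPOINT_DELAY_MS = 3000  # upper bound from the changelog entry


def check_max_endpoint_delay_ms(value: int) -> int:
    """Return the value unchanged if it is inside the documented range."""
    if not MIN_ENDPOINT_DELAY_MS <= value <= MAX_ENDPOINT_DELAY_MS:
        raise ValueError(
            f"max_endpoint_delay_ms must be between {MIN_ENDPOINT_DELAY_MS} "
            f"and {MAX_ENDPOINT_DELAY_MS} ms, got {value}"
        )
    return value


delay_ms = check_max_endpoint_delay_ms(1500)
print(delay_ms)
```

The checked value would then be passed as `max_endpoint_delay_ms` when constructing `SonioxSTTService.Settings`.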


@@ -0,0 +1 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.


@@ -0,0 +1 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.


@@ -0,0 +1 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.


@@ -0,0 +1 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.

1
changelog/4524.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.

1
changelog/4527.fixed.md Normal file

@@ -0,0 +1 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
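
The gating rule can be sketched as a set of context IDs whose initialization message has already gone out. This is a minimal model, not the service's code: `KeepaliveGate` and its methods are hypothetical, and messages are recorded in a list instead of being written to a WebSocket.

```python
class KeepaliveGate:
    """Hypothetical tracker that orders context-init before keepalive."""

    def __init__(self) -> None:
        self._initialized = set()
        self.sent = []  # (kind, context_id) tuples, in send order

    def send_context_init(self, context_id: str) -> None:
        # The first message for a context carries voice_settings.
        self.sent.append(("init", context_id))
        self._initialized.add(context_id)

    def send_keepalive(self, context_id: str) -> bool:
        # Skip contexts whose init has not been sent yet, so a keepalive
        # can never be the first message the server sees for a context.
        if context_id not in self._initialized:
            return False
        self.sent.append(("keepalive", context_id))
        return True


gate = KeepaliveGate()
skipped = gate.send_keepalive("turn-2")  # new turn, init not sent yet
gate.send_context_init("turn-2")
delivered = gate.send_keepalive("turn-2")
print(skipped, delivered, gate.sent)
```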


@@ -0,0 +1 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.


@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inception
INCEPTION_API_KEY=...
# Inworld
INWORLD_API_KEY=...
@@ -211,6 +214,11 @@ TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...
# Vonage
VONAGE_APPLICATION_ID=...
VONAGE_SESSION_ID=...
VONAGE_TOKEN=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...


@@ -0,0 +1,177 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inception.llm import InceptionLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

load_dotenv(override=True)


async def fetch_weather_from_api(params: FunctionCallParams):
    await params.result_callback({"conditions": "nice", "temperature": "75"})


async def fetch_restaurant_recommendation(params: FunctionCallParams):
    await params.result_callback({"name": "The Golden Dragon"})


# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )

    llm = InceptionLLMService(
        api_key=os.environ["INCEPTION_API_KEY"],
        settings=InceptionLLMService.Settings(
            reasoning_effort="instant",
            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
        ),
    )

    # You can also register a function_name of None to get all functions
    # sent to the same callback with an additional function_name parameter.
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

    @llm.event_handler("on_function_calls_started")
    async def on_function_calls_started(service, function_calls):
        await tts.queue_frame(TTSSpeakFrame("Let me check on that."))

    weather_function = FunctionSchema(
        name="get_current_weather",
        description="Get the current weather",
        properties={
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
        required=["location", "format"],
    )
    restaurant_function = FunctionSchema(
        name="get_restaurant_recommendation",
        description="Get a restaurant recommendation",
        properties={
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        required=["location"],
    )
    tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])

    context = LLMContext(tools=tools)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
        context.add_message(
            {"role": "developer", "content": "Please introduce yourself to the user."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()


@@ -68,9 +68,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = OpenAITTSService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAITTSService.Settings(
instructions="Please speak clearly and at a moderate pace.",
voice="ballad",
),
instructions="Please speak clearly and at a moderate pace.",
)
llm = OpenAILLMService(


@@ -0,0 +1,134 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example of using OpenAI Realtime voice LLM service with Vonage Video Connector transport."""
import asyncio
import os
import sys
from collections.abc import Callable
from typing import Any
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.vonage import configure
from pipecat.services.openai.realtime.events import (
AudioConfiguration,
AudioInput,
InputAudioNoiseReduction,
InputAudioTranscription,
SemanticTurnDetection,
SessionProperties,
)
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
from pipecat.transports.vonage.video_connector import (
VonageVideoConnectorTransport,
VonageVideoConnectorTransportParams,
)
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main() -> None:
"""Main entry point for the OpenAI Realtime Vonage Video Connector example."""
(application_id, session_id, token) = await configure()
transport = VonageVideoConnectorTransport(
application_id,
session_id,
token,
VonageVideoConnectorTransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
publisher_name="Bot",
),
)
llm = OpenAIRealtimeLLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeLLMService.Settings(
system_instruction="""You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly.
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
session_properties=SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
transcription=InputAudioTranscription(),
turn_detection=SemanticTurnDetection(),
noise_reduction=InputAudioNoiseReduction(type="near_field"),
)
),
),
),
)
context = LLMContext(
[{"role": "developer", "content": "Say hello!"}],
)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[TranscriptionLogObserver()],
)
event_handler: Callable[[str], Callable[[Any], Any]] = transport.event_handler
@event_handler("on_client_connected")
async def on_client_connected(transport: VonageVideoConnectorTransport, client: object) -> None:
logger.info("Client connected")
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,201 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example 22: Filter Incomplete Turns
Demonstrates LLM-based turn completion detection to suppress bot responses when
the user was cut off mid-thought. The LLM outputs one of three markers:
- ✓ (complete): User finished their thought, respond normally
- ○ (incomplete short): User was cut off, wait ~5s then prompt
- ◐ (incomplete long): User needs time to think, wait ~10s then prompt
When incomplete is detected, the bot's response is suppressed. After the timeout
expires, the LLM is automatically prompted to re-engage the user.
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
AssistantTurnStoppedMessage,
LLMContextAggregatorPair,
LLMUserAggregatorParams,
UserTurnStoppedMessage,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def get_weather(params: FunctionCallParams, location: str):
"""Return the current weather for a location.
A stub that always reports the same conditions — replace with a real
weather API in production.
Args:
location (str): The city and state or country, e.g. "Paris, France".
"""
await params.result_callback(
{
"location": location,
"temperature_celsius": 22,
"conditions": "partly cloudy",
}
)
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=(
"You are a helpful assistant in a voice conversation. Your "
"responses will be spoken aloud, so avoid emojis, bullet "
"points, or other formatting that can't be spoken. Respond to "
"what the user said in a creative, helpful, and brief way. "
"If the user asks about the weather, call the get_weather "
"tool and speak the result back naturally."
),
),
)
llm.register_direct_function(get_weather)
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
context = LLMContext(tools=ToolsSchema(standard_tools=[get_weather]))
# `FilterIncompleteUserTurnStrategies` pairs the default detector
# chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
# trigger LLM inference but the public `on_user_turn_stopped` event
# fires only when the LLM confirms ✓. The LLM marks each response
# with one of:
# ✓ = complete (respond normally)
# ○ = incomplete short (wait 5s, then prompt)
# ◐ = incomplete long (wait 10s, then prompt)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
vad_analyzer=SileroVADAnalyzer(),
user_turn_strategies=FilterIncompleteUserTurnStrategies(
# Optional: customize turn completion behavior
# config=UserTurnCompletionConfig(
# incomplete_short_timeout=5.0,
# incomplete_long_timeout=10.0,
# incomplete_short_prompt="Custom prompt...",
# incomplete_long_prompt="Custom prompt...",
# instructions="Custom turn completion instructions...",
# ),
),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}user: {message.content}"
logger.info(f"Transcript: {line}")
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}assistant: {message.content}"
logger.info(f"Transcript: {line}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()


@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await task.queue_frames([LLMRunFrame()])
await asyncio.sleep(10)
logger.info("Updating Soniox STT settings: language=es")
logger.info("Updating Soniox STT settings: language_hints=[es]")
await task.queue_frame(
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
)
@transport.event_handler("on_client_disconnected")


@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
gstreamer = [ "pygobject~=3.50.0" ]
heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
hume = [ "hume>=0.11.2,<1" ]
inception = []
inworld = [ "pipecat-ai[websockets-base]" ]
koala = [ "pvkoala~=2.0.3" ]
kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
qwen = []
resembleai = [ "pipecat-ai[websockets-base]" ]
rime = [ "pipecat-ai[websockets-base]" ]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-small-webrtc-prebuilt>=2.5.0"]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
sambanova = []
sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
@@ -119,6 +120,7 @@ tavus = [ "pipecat-ai[daily]" ]
together = []
tracing = [ "opentelemetry-sdk>=1.33.0,<2", "opentelemetry-api>=1.33.0,<2", "opentelemetry-instrumentation>=0.54b0,<1" ]
ultravox = [ "pipecat-ai[websockets-base]" ]
vonage-video-connector = [ "vonage-video-connector~=0.2.3b0; python_full_version>='3.13' and python_full_version<'3.14' and platform_system=='Linux'" ]
webrtc = [ "aiortc>=1.14.0,<2", "opencv-python>=4.11.0.86,<5" ]
websocket = [ "pipecat-ai[websockets-base]", "fastapi>=0.115.6,<1" ]
websockets-base = [ "websockets>=13.1,<16.0" ]


@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
("function-calling/function-calling-novita.py", EVAL_WEATHER),
("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
("function-calling/function-calling-inception.py", EVAL_WEATHER),
# Video
("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
@@ -242,6 +243,7 @@ TESTS_VIDEO_AVATAR = [
TESTS_TURN_MANAGEMENT = [
("turn-management/turn-management-filter-incomplete-turns.py", EVAL_COMPLETE_TURN),
("turn-management/turn-management-filter-incomplete-turns-function-calling.py", EVAL_WEATHER),
]
TESTS_THINKING = [


@@ -383,10 +383,14 @@ class AggregatedTextFrame(TextFrame):
Parameters:
aggregated_by: Method used to aggregate the text frames.
context_id: Unique identifier for the TTS context that generated this text.
raw_text: The full matched text including start/end pattern delimiters, set when
this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
None for ordinary sentence aggregations.
"""
aggregated_by: AggregationType | str
context_id: str | None = None
raw_text: str | None = None
@dataclass


@@ -25,6 +25,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.vad_analyzer import VADAnalyzer
from pipecat.audio.vad.vad_controller import VADController
from pipecat.frames.frames import (
AggregatedTextFrame,
AssistantImageRawFrame,
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -1496,9 +1497,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
if len(frame.text) == 0:
return
text = (
frame.raw_text
if isinstance(frame, AggregatedTextFrame) and frame.raw_text
else frame.text
)
self._aggregation.append(
TextPartForConcatenation(
frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
)
)


@@ -23,6 +23,7 @@ from pipecat.frames.frames import (
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
@@ -85,7 +86,11 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=aggregation.text,
aggregated_by=aggregation.type,
raw_text=aggregation.full_match
if isinstance(aggregation, PatternMatch)
else aggregation.text,
)
out_frame.append_to_context = True
out_frame.skip_tts = in_frame.skip_tts
await self.push_frame(out_frame)
@@ -96,6 +101,9 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=remaining.text,
aggregated_by=remaining.type,
raw_text=remaining.full_match
if isinstance(remaining, PatternMatch)
else remaining.text,
)
out_frame.skip_tts = skip_tts
await self.push_frame(out_frame)


@@ -528,6 +528,9 @@ class RTVIObserver(BaseObserver):
text = await transform(text, agg_type)
isTTS = isinstance(frame, TTSTextFrame)
if agg_type is not AggregationType.WORD:
logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
if self._params.bot_output_enabled:
message = RTVI.BotOutputMessage(
data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)


@@ -19,6 +19,10 @@ All bots must implement a `bot(runner_args)` async function as the entry point.
The server automatically discovers and executes this function when connections
are established.
By default the runner starts a single FastAPI server that supports WebRTC, Daily,
and telephony transports simultaneously. Clients declare which transport they want
via the ``transport`` field in the ``/start`` request body (default: ``"webrtc"``).
Single transport example::
async def bot(runner_args: RunnerArguments):
@@ -55,18 +59,38 @@ Supported transports:
- WebRTC - Provides local WebRTC interface with prebuilt UI
- Telephony - Handles webhook and WebSocket connections for Twilio, Telnyx, Plivo, Exotel
The ``/start`` endpoint accepts::
{
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
// "plivo" | "exotel" — default: "webrtc"
// WebRTC-specific
"enableDefaultIceServers": false,
"body": {...},
// Daily-specific
"createDailyRoom": true,
"dailyRoomProperties": {...},
"dailyMeetingTokenProperties": {...},
"body": {...}
}
To run locally:
- WebRTC: `python bot.py -t webrtc`
- ESP32: `python bot.py -t webrtc --esp32 --host 192.168.1.100`
- Daily (server): `python bot.py -t daily`
- Daily (direct, testing only): `python bot.py -d`
- Telephony: `python bot.py -t twilio -x your_username.ngrok.io`
- Exotel: `python bot.py -t exotel` (no proxy needed, but ngrok connection to HTTP 7860 is required)
- All transports (default): ``python bot.py``
- WebRTC only: ``python bot.py -t webrtc``
- ESP32: ``python bot.py -t webrtc --esp32 --host 192.168.1.100``
- Daily only: ``python bot.py -t daily``
- Daily (direct, testing only): ``python bot.py -d``
- Telephony: ``python bot.py -t twilio -x your_username.ngrok.io``
- Exotel: ``python bot.py -t exotel`` (no proxy needed, but ngrok connection to HTTP 7860 is required)
- WhatsApp: ``python bot.py --whatsapp``
"""
import argparse
import asyncio
import importlib.util
import mimetypes
import os
import sys
@@ -85,8 +109,10 @@ from pipecat.runner.types import (
DailyRunnerArguments,
RunnerArguments,
SmallWebRTCRunnerArguments,
VonageRunnerArguments,
WebSocketRunnerArguments,
)
from pipecat.runner.vonage import configure as configure_vonage
try:
import uvicorn
@@ -106,6 +132,18 @@ load_dotenv(override=True)
os.environ["ENV"] = "local"
TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
TRANSPORT_ROUTE_DEPENDENCIES = {
"daily": ("daily",),
"webrtc": ("aiortc",),
"telephony": ("fastapi", "websockets"),
"websocket": ("fastapi", "websockets"),
}
TRANSPORT_INSTALL_HINTS = {
"daily": "install pipecat-ai[daily]",
"webrtc": "install pipecat-ai[webrtc]",
"telephony": "install pipecat-ai[websocket]",
"websocket": "install pipecat-ai[websocket]",
}
# Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
PIPECAT_ROOM_EXP_HOURS = 4.0
@@ -131,6 +169,120 @@ Import this to add custom routes from other packages before calling
"""


def _is_module_available(module: str) -> bool:
    """Check whether a module can be imported without importing it.

    Args:
        module: Fully-qualified module name to check.

    Returns:
        ``True`` if Python can resolve the module, ``False`` otherwise.
    """
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ModuleNotFoundError, ValueError):
        return False


def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
    """Return module dependencies required for a transport route.

    Args:
        transport: Transport name from the runner request or CLI.

    Returns:
        Module names required to enable the transport route.
    """
    if transport in TELEPHONY_TRANSPORTS:
        return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
    return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())


def _transport_routes_enabled(transport: str) -> bool:
    """Return whether a transport route can run in this environment.

    Args:
        transport: Transport name from the runner request or CLI.

    Returns:
        ``True`` if the requested transport is enabled.
    """
    return all(_is_module_available(module) for module in _transport_route_dependencies(transport))


def _runner_url(args: argparse.Namespace) -> str:
    """Return the browser URL for the runner prebuilt client."""
    return f"http://{args.host}:{args.port}"


def _transport_status_lists() -> tuple[list[str], list[str]]:
    """Return enabled and disabled transport labels for the startup banner."""
    transports = ["daily", "webrtc", "telephony", "websocket"]
    enabled = []
    disabled = []
    for label in transports:
        if _transport_routes_enabled(label):
            enabled.append(label)
        else:
            disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
    return enabled, disabled


def _format_transport_status(labels: list[str]) -> str:
    """Format a startup banner transport status list."""
    return ", ".join(labels) if labels else "none"


def _print_startup_message(args: argparse.Namespace):
"""Print connection information for the development runner."""
print()
if args.transport is None:
enabled, disabled = _transport_status_lists()
print("🚀 Bot ready!")
print(f" → Open: {_runner_url(args)}")
print(f" → Enabled transports: {_format_transport_status(enabled)}")
if disabled:
print(f" → Disabled transports: {_format_transport_status(disabled)}")
elif args.transport == "webrtc":
if args.esp32:
print("🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print("🚀 Bot ready! (WhatsApp)")
else:
print("🚀 Bot ready! (WebRTC)")
if _transport_routes_enabled("webrtc"):
print(f" → Open: {_runner_url(args)}")
else:
print(f" → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
elif args.transport == "daily":
print("🚀 Bot ready! (Daily)")
if not _transport_routes_enabled("daily"):
print(f" → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.dialin:
print(
f" → Daily dial-in webhook: "
f"http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(" → Configure this URL in your Daily phone number settings")
elif args.transport in TELEPHONY_TRANSPORTS:
print(f"🚀 Bot ready! ({args.transport.capitalize()})")
if not _transport_routes_enabled(args.transport):
print(f" → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.proxy:
print(f" → XML webhook: http://{args.host}:{args.port}/")
print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
elif args.transport == "vonage":
print()
print("🚀 Bot ready!")
print()
def _get_bot_module():
"""Get the bot module from the calling script."""
import importlib.util
@@ -186,8 +338,35 @@ async def _run_telephony_bot(websocket: WebSocket, args: argparse.Namespace):
await bot_module.bot(runner_args)


async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):
    """Run a bot for plain WebSocket transport."""
    bot_module = _get_bot_module()
    runner_args = WebSocketRunnerArguments(
        websocket=websocket,
        transport_type="websocket",
        session_id=str(uuid.uuid4()),
    )
    runner_args.cli_args = args
    await bot_module.bot(runner_args)


def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
    """Set up the plain WebSocket route at ``/ws-client``."""
    if not _transport_routes_enabled("websocket"):
        return

    @app.websocket("/ws-client")
    async def websocket_client_endpoint(websocket: WebSocket):
        """Handle plain WebSocket connections (non-telephony)."""
        await websocket.accept()
        logger.debug("Plain WebSocket connection accepted")
        await _run_websocket_bot(websocket, args)


def _configure_server_app(args: argparse.Namespace):
"""Configure the module-level FastAPI app with transport-specific routes."""
"""Configure the module-level FastAPI app with routes for all transports."""
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
@@ -196,17 +375,207 @@ def _configure_server_app(args: argparse.Namespace):
allow_headers=["*"],
)
# Set up transport-specific routes
if args.transport == "webrtc":
_setup_webrtc_routes(app, args)
if args.whatsapp:
_setup_whatsapp_routes(app, args)
elif args.transport == "daily":
_setup_daily_routes(app, args)
elif args.transport in TELEPHONY_TRANSPORTS:
_setup_telephony_routes(app, args)
else:
logger.warning(f"Unknown transport type: {args.transport}")
# Shared session store: session_id -> body data. Used by the WebRTC /start
# flow and the /sessions/{session_id}/... proxy routes.
active_sessions: dict[str, dict[str, Any]] = {}
_setup_frontend_routes(app)
_setup_webrtc_routes(app, args, active_sessions)
_setup_daily_routes(app, args)
_setup_telephony_routes(app, args)
_setup_websocket_routes(app, args)
_setup_unified_start_route(app, args, active_sessions)
if args.whatsapp:
_setup_whatsapp_routes(app, args)
def _setup_unified_start_route(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Register the unified POST /start and GET /status endpoints.
Handles WebRTC, Daily, telephony, and plain WebSocket transport start flows. Clients specify
which transport they want via the ``transport`` field in the request body.
When ``-t`` was passed on the command line, requests for any other transport
are rejected with HTTP 400.
"""
ALL_TRANSPORTS = ["webrtc", "daily", *TELEPHONY_TRANSPORTS, "websocket"]
@app.get("/status")
async def status():
"""Return the transports supported by this runner instance."""
transports = [args.transport] if args.transport is not None else ALL_TRANSPORTS
return {"status": "ready", "transports": transports}
class IceServer(TypedDict, total=False):
urls: str | list[str]
class IceConfig(TypedDict):
iceServers: list[IceServer]
class StartBotResult(TypedDict, total=False):
sessionId: str
iceConfig: IceConfig | None
dailyRoom: str | None
dailyToken: str | None
wsUrl: str | None
token: str | None
@app.post("/start")
async def start_agent(request: Request):
"""Start a bot session.
Accepts::
{
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
// "plivo" | "exotel" | "websocket" (default: "webrtc")
// WebRTC-specific
"enableDefaultIceServers": false,
"body": {...},
// Daily-specific
"createDailyRoom": true,
"dailyRoomProperties": {...},
"dailyMeetingTokenProperties": {...},
"body": {...}
}
"""
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
# Determine transport: explicit field → legacy Daily hint → CLI default → webrtc
transport = request_data.get("transport")
if transport is None and request_data.get("createDailyRoom", False):
transport = "daily"
if transport is None:
transport = args.transport or "webrtc"
# Enforce restriction when -t was explicitly set on the command line
if args.transport is not None and transport != args.transport:
raise HTTPException(
status_code=400,
detail=(
f"Transport '{transport}' is not allowed. "
f"Server is configured for '{args.transport}' only (-t {args.transport})."
),
)
if not _transport_routes_enabled(transport):
raise HTTPException(
status_code=400,
detail=(
f"Transport '{transport}' is disabled in this runner environment. "
"Check the startup banner for enabled transports."
),
)
if transport == "webrtc":
# WebRTC: register the session; the bot starts when the WebRTC offer arrives.
session_id = str(uuid.uuid4())
active_sessions[session_id] = request_data.get("body", {})
result = StartBotResult(
sessionId=session_id,
)
if request_data.get("enableDefaultIceServers"):
result["iceConfig"] = IceConfig(
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
)
return result
elif transport == "daily":
create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
bot_module = _get_bot_module()
existing_room_url = os.getenv("DAILY_ROOM_URL")
session_id = str(uuid.uuid4())
result: StartBotResult | None = None
if create_daily_room or existing_room_url:
from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)
async with aiohttp.ClientSession() as session:
room_properties = None
if daily_room_properties_dict:
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")
token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = StartBotResult(
dailyRoom=room_url,
dailyToken=token,
sessionId=session_id,
)
else:
runner_args = RunnerArguments(body=body, session_id=session_id)
runner_args.cli_args = args
asyncio.create_task(bot_module.bot(runner_args))
return result
elif transport in TELEPHONY_TRANSPORTS:
# Telephony: the bot starts when the provider connects to /ws.
# Return the WebSocket URL so the caller knows where to point their provider.
scheme = "wss" if args.host != "localhost" else "ws"
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws",
)
elif transport == "websocket":
# Plain WebSocket: the bot starts when the client connects to /ws-client.
scheme = "wss" if args.host != "localhost" else "ws"
session_id = str(uuid.uuid4())
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws-client",
sessionId=session_id,
token="mock_token",
)
else:
raise HTTPException(
status_code=400,
detail=f"Unknown transport '{transport}'.",
)
def _resolve_download_path(folder: str, filename: str) -> Path:
@@ -220,11 +589,30 @@ def _resolve_download_path(folder: str, filename: str) -> Path:
return file_path
def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
"""Set up WebRTC-specific routes."""
def _setup_frontend_routes(app: FastAPI):
"""Mount the prebuilt frontend UI and root redirect for all transports."""
try:
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat_ai_prebuilt.frontend import PipecatPrebuiltUI
except ImportError as e:
logger.error(f"Prebuilt frontend not available: {e}")
return
app.mount("/client", PipecatPrebuiltUI)
@app.get("/", include_in_schema=False)
async def root_redirect():
"""Redirect root requests to client interface."""
return RedirectResponse(url="/client/")
def _setup_webrtc_routes(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Set up WebRTC-specific routes."""
if not _transport_routes_enabled("webrtc"):
return
try:
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
IceCandidate,
@@ -233,30 +621,9 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
SmallWebRTCRequestHandler,
)
except ImportError as e:
logger.error(f"WebRTC transport dependencies not installed: {e}")
logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
return
class IceServer(TypedDict, total=False):
urls: str | list[str]
class IceConfig(TypedDict):
iceServers: list[IceServer]
class StartBotResult(TypedDict, total=False):
sessionId: str
iceConfig: IceConfig | None
# In-memory store of active sessions: session_id -> session info
active_sessions: dict[str, dict[str, Any]] = {}
# Mount the frontend
app.mount("/client", SmallWebRTCPrebuiltUI)
@app.get("/", include_in_schema=False)
async def root_redirect():
"""Redirect root requests to client interface."""
return RedirectResponse(url="/client/")
@app.get("/files/{filename:path}")
async def download_file(filename: str):
"""Handle file downloads."""
@@ -315,29 +682,6 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
await small_webrtc_handler.handle_patch_request(request)
return {"status": "success"}
@app.post("/start")
async def rtvi_start(request: Request):
"""Mimic Pipecat Cloud's /start endpoint."""
# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
# Store session info immediately in memory, replicate the behavior expected on Pipecat Cloud
session_id = str(uuid.uuid4())
active_sessions[session_id] = request_data.get("body", {})
result: StartBotResult = {"sessionId": session_id}
if request_data.get("enableDefaultIceServers"):
result["iceConfig"] = IceConfig(
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
)
return result
@app.api_route(
"/sessions/{session_id}/{path:path}",
methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
@@ -562,13 +906,13 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):
def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
"""Set up Daily-specific routes."""
if not _transport_routes_enabled("daily"):
return
@app.get("/")
@app.get("/daily")
async def create_room_and_start_agent():
"""Launch a Daily bot and redirect to room."""
print("Starting bot with Daily transport and redirecting to Daily room")
import aiohttp
logger.debug("Starting bot with Daily transport and redirecting to Daily room")
from pipecat.runner.daily import configure
@@ -584,105 +928,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
asyncio.create_task(bot_module.bot(runner_args))
return RedirectResponse(room_url)
@app.post("/start")
async def start_agent(request: Request):
"""Handler for /start endpoints.
Expects POST body like::
{
"createDailyRoom": true,
"dailyRoomProperties": { "start_video_off": true },
"dailyMeetingTokenProperties": { "is_owner": true, "user_name": "Bot" },
"body": { "custom_data": "value" }
}
"""
print("Starting bot with Daily transport")
# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
bot_module = _get_bot_module()
existing_room_url = os.getenv("DAILY_ROOM_URL")
session_id = str(uuid.uuid4())
result = None
# Configure room if:
# 1. Explicitly requested via createDailyRoom in payload
# 2. Using pre-configured room from DAILY_ROOM_URL env var
if create_daily_room or existing_room_url:
import aiohttp
from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)
async with aiohttp.ClientSession() as session:
# Parse dailyRoomProperties if provided
room_properties = None
if daily_room_properties_dict:
# Apply Pipecat Cloud's session policy if caller didn't override.
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")
# Continue without custom properties
# Parse dailyMeetingTokenProperties if provided
token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
# Continue without custom properties
room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = {
"dailyRoom": room_url,
"dailyToken": token,
"sessionId": session_id,
}
else:
runner_args = RunnerArguments(body=body, session_id=session_id)
# Update CLI args.
runner_args.cli_args = args
# Start the bot in the background
asyncio.create_task(bot_module.bot(runner_args))
return result
if args.dialin:
@app.post("/daily-dialin-webhook")
@@ -731,8 +976,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
detail="Missing required fields: From, To, callId, callDomain",
)
import aiohttp
from pipecat.runner.daily import configure
from pipecat.runner.types import DailyDialinRequest, DialinSettings
@@ -801,44 +1044,54 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
"""Set up telephony-specific routes."""
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
"""Set up telephony-specific routes.
The WebSocket endpoint (``/ws``) is always registered so providers can
connect directly. The XML webhook (``POST /``) is only registered when a
specific telephony transport is chosen via ``-t`` because the XML template
is provider-specific and requires a proxy hostname (``--proxy``).
"""
if not _transport_routes_enabled("telephony"):
return
if args.transport in TELEPHONY_TRANSPORTS:
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws" bidirectionalMode="rtp"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">wss://{args.proxy}/ws</Stream>
</Response>""",
}
}
@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")
@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
@@ -847,11 +1100,6 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
logger.debug("WebSocket connection accepted")
await _run_telephony_bot(websocket, args)
@app.get("/")
async def start_agent():
"""Simple status endpoint for telephony transports."""
return {"status": f"Bot started with {args.transport}"}
async def _run_daily_direct(args: argparse.Namespace):
"""Run Daily bot with direct connection (no FastAPI server)."""
@@ -883,6 +1131,25 @@ async def _run_daily_direct(args: argparse.Namespace):
await bot_module.bot(runner_args)
async def _run_vonage():
"""Run Vonage bot (no FastAPI server)."""
logger.info("Running Vonage transport...")
application_id, session_id, token = await configure_vonage()
runner_args = VonageRunnerArguments(
application_id=application_id, vonage_session_id=session_id, token=token
)
runner_args.handle_sigint = True
# Get the bot module and run it directly
bot_module = _get_bot_module()
print(f"Joining Vonage session: {runner_args.vonage_session_id}")
print()
await bot_module.bot(runner_args)
def _validate_and_clean_proxy(proxy: str) -> str:
"""Validate and clean proxy hostname, removing protocol if present."""
if not proxy:
@@ -922,22 +1189,27 @@ def runner_port() -> int:
def main(parser: argparse.ArgumentParser | None = None):
"""Start the Pipecat development runner.
Parses command-line arguments and starts a FastAPI server configured
for the specified transport type.
Parses command-line arguments and starts a FastAPI server that supports
WebRTC, Daily, and telephony transports simultaneously. Clients declare
which transport to use via the ``transport`` field in the ``/start`` body.
When ``-t`` is provided, the server restricts ``/start`` to that transport
only and displays transport-specific startup information.
The runner discovers and runs any ``bot(runner_args)`` function found in the
calling module.
Command-line arguments:
- --host: Server host address (default: localhost)
- --host: Server host address (default: localhost)
- --port: Server port (default: 7860)
- -t/--transport: Transport type (daily, webrtc, twilio, telnyx, plivo, exotel)
- -t/--transport: Restrict to a single transport and set as default for /start
(daily, webrtc, twilio, telnyx, plivo, exotel). Omit to support all transports.
- -x/--proxy: Public proxy hostname for telephony webhooks
- -d/--direct: Connect directly to Daily room (automatically sets transport to daily)
- -f/--folder: Path to downloads folder
- --dialin: Enable Daily PSTN dial-in webhook handling (requires Daily transport)
- --dialin: Enable Daily PSTN dial-in webhook handling
- --esp32: Enable SDP munging for ESP32 compatibility (requires --host with IP address)
- --whatsapp: Ensure requried WhatsApp environment variables are present
- --whatsapp: Ensure required WhatsApp environment variables are present
- -v/--verbose: Increase logging verbosity
Args:
@@ -957,9 +1229,12 @@ def main(parser: argparse.ArgumentParser | None = None):
"-t",
"--transport",
type=str,
choices=["daily", "webrtc", *TELEPHONY_TRANSPORTS],
default="webrtc",
help="Transport type",
choices=["daily", "vonage", "webrtc", *TELEPHONY_TRANSPORTS],
default=None,
help=(
"Restrict the server to a single transport and set it as the default for /start. "
"Omit to support all transports simultaneously (default behaviour)."
),
)
parser.add_argument("-x", "--proxy", help="Public proxy host name")
parser.add_argument(
@@ -977,7 +1252,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--dialin",
action="store_true",
default=False,
help="Enable Daily PSTN dial-in webhook handling (requires Daily transport)",
help="Enable Daily PSTN dial-in webhook handling",
)
parser.add_argument(
"--esp32",
@@ -989,7 +1264,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--whatsapp",
action="store_true",
default=False,
help="Ensure requried WhatsApp environment variables are present",
help="Ensure required WhatsApp environment variables are present",
)
args = parser.parse_args()
@@ -998,12 +1273,13 @@ def main(parser: argparse.ArgumentParser | None = None):
if args.proxy:
args.proxy = _validate_and_clean_proxy(args.proxy)
# Auto-set transport to daily if --direct is used without explicit transport
if args.direct and args.transport == "webrtc": # webrtc is the default
args.transport = "daily"
elif args.direct and args.transport != "daily":
logger.error("--direct flag only works with Daily transport (-t daily)")
return
# --direct implies Daily transport
if args.direct:
if args.transport is None or args.transport == "daily":
args.transport = "daily"
else:
logger.error("--direct flag only works with Daily transport (-t daily)")
return
# Validate ESP32 requirements
if args.esp32 and args.host == "localhost":
@@ -1011,7 +1287,7 @@ def main(parser: argparse.ArgumentParser | None = None):
return
# Validate dial-in requirements
if args.dialin and args.transport != "daily":
if args.dialin and args.transport is not None and args.transport != "daily":
logger.error("--dialin flag only works with Daily transport (-t daily)")
return
@@ -1029,28 +1305,12 @@ def main(parser: argparse.ArgumentParser | None = None):
asyncio.run(_run_daily_direct(args))
return
# Print startup message for server-based transports
if args.transport == "webrtc":
print()
if args.esp32:
print(f"🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print(f"🚀 Bot ready! (WhatsApp)")
else:
print(f"🚀 Bot ready!")
print(f" → Open http://{args.host}:{args.port}/client in your browser")
print()
elif args.transport == "daily":
print()
print(f"🚀 Bot ready!")
if args.dialin:
print(
f" → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(f" → Configure this URL in your Daily phone number settings")
else:
print(f" → Open http://{args.host}:{args.port} in your browser to start a session")
# Print startup message
_print_startup_message(args)
if args.transport == "vonage":
asyncio.run(_run_vonage())
print()
return
RUNNER_DOWNLOADS_FOLDER = args.folder
RUNNER_HOST = args.host
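
The transport precedence applied by the unified /start handler above (explicit ``transport`` field, then the legacy ``createDailyRoom`` hint, then the ``-t`` CLI value, then ``webrtc``) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the runner's code: ``resolve_transport`` and ``TransportNotAllowed`` are hypothetical names introduced for illustration, and the real handler raises ``HTTPException`` with status 400 instead of a custom exception.

```python
from __future__ import annotations


class TransportNotAllowed(ValueError):
    """Hypothetical stand-in for the HTTP 400 response in the real handler."""


def resolve_transport(request_data: dict, cli_transport: str | None) -> str:
    """Pick the transport for a /start request (illustrative only)."""
    # 1. An explicit "transport" field in the request body wins.
    transport = request_data.get("transport")
    # 2. Legacy Daily clients only send createDailyRoom.
    if transport is None and request_data.get("createDailyRoom", False):
        transport = "daily"
    # 3. Fall back to the -t value, then to webrtc.
    if transport is None:
        transport = cli_transport or "webrtc"
    # 4. When -t was given, any other transport is rejected.
    if cli_transport is not None and transport != cli_transport:
        raise TransportNotAllowed(
            f"Transport '{transport}' is not allowed (server runs with -t {cli_transport})."
        )
    return transport
```

One consequence worth noting: with ``-t webrtc`` on the command line, a body that only carries ``createDailyRoom: true`` resolves to ``daily`` first and is then rejected.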

View File

@@ -99,16 +99,35 @@ class DailyRunnerArguments(RunnerArguments):
token: str | None = None
@dataclass
class VonageRunnerArguments(RunnerArguments):
"""Vonage transport session arguments for the runner.
Parameters:
application_id: Vonage application ID
vonage_session_id: Vonage session ID
token: Vonage Session Token
"""
application_id: str
vonage_session_id: str
token: str
@dataclass
class WebSocketRunnerArguments(RunnerArguments):
"""WebSocket transport session arguments for the runner.
Parameters:
websocket: WebSocket connection for audio streaming
transport_type: Transport type identifier. Set to ``"websocket"`` for plain
WebSocket connections; ``None`` triggers auto-detection from the first
telephony provider message.
body: Additional request data
"""
websocket: WebSocket
transport_type: str | None = None
@dataclass

View File

@@ -33,7 +33,7 @@ import json
import os
import re
from collections.abc import Callable
from typing import Any
from typing import Any, cast
from fastapi import WebSocket
from loguru import logger
@@ -42,9 +42,10 @@ from pipecat.runner.types import (
DailyRunnerArguments,
LiveKitRunnerArguments,
SmallWebRTCRunnerArguments,
VonageRunnerArguments,
WebSocketRunnerArguments,
)
from pipecat.transports.base_transport import BaseTransport
from pipecat.transports.base_transport import BaseTransport, TransportParams
def _detect_transport_type_from_message(message_data: dict) -> str:
@@ -271,6 +272,14 @@ def get_transport_client_id(transport: BaseTransport, client: Any) -> str:
except ImportError:
pass
try:
from pipecat.transports.vonage.video_connector import VonageVideoConnectorTransport
if isinstance(transport, VonageVideoConnectorTransport):
return client["streamId"]
except ImportError:
pass
logger.warning(f"Unable to get client id from unsupported transport {type(transport)}")
return ""
@@ -303,6 +312,24 @@ async def maybe_capture_participant_camera(
except ImportError:
pass
try:
from pipecat.transports.vonage.video_connector import (
SubscribeSettings,
VonageVideoConnectorTransport,
)
if isinstance(transport, VonageVideoConnectorTransport):
await transport.subscribe_to_stream(
client["streamId"],
SubscribeSettings(
subscribe_to_audio=True,
subscribe_to_video=True,
preferred_framerate=framerate if framerate != 0 else None,
),
)
except ImportError:
pass
async def maybe_capture_participant_screen(
transport: BaseTransport, client: Any, framerate: int = 0
@@ -534,6 +561,10 @@ async def create_transport(
audio_out_enabled=True,
# add_wav_header and serializer will be set automatically
),
"vonage": lambda: VonageVideoConnectorTransportParams(
audio_in_enabled=True,
audio_out_enabled=True
),
}
transport = await create_transport(runner_args, transport_params)
@@ -562,6 +593,12 @@ async def create_transport(
)
elif isinstance(runner_args, WebSocketRunnerArguments):
if runner_args.transport_type == "websocket":
params = _get_transport_params("websocket", transport_params)
from pipecat.transports.websocket.fastapi import FastAPIWebsocketTransport
return FastAPIWebsocketTransport(websocket=runner_args.websocket, params=params)
# Parse once to determine the provider and get data
transport_type, call_data = await parse_telephony_websocket(runner_args.websocket)
params = _get_transport_params(transport_type, transport_params)
@@ -581,6 +618,31 @@ async def create_transport(
runner_args.room_name,
params=params,
)
elif isinstance(runner_args, VonageRunnerArguments):
from pipecat.transports.vonage.video_connector import (
VonageVideoConnectorTransport,
VonageVideoConnectorTransportParams,
)
try:
params = cast(
VonageVideoConnectorTransportParams,
_get_transport_params("vonage", transport_params),
)
except ValueError:
webrtc_params: TransportParams = cast(
TransportParams, _get_transport_params("webrtc", transport_params)
)
params = VonageVideoConnectorTransportParams(
**webrtc_params.model_dump(),
video_in_auto_subscribe=True,
)
return VonageVideoConnectorTransport(
runner_args.application_id,
runner_args.vonage_session_id,
runner_args.token,
params=params,
)
else:
raise ValueError(f"Unsupported runner arguments type: {type(runner_args)}")

View File

@@ -0,0 +1,52 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Vonage session configuration utilities.
This module extracts the necessary parameters to connect to a Vonage Video session.
Required environment variables:
- VONAGE_APPLICATION_ID - Vonage application ID
- VONAGE_SESSION_ID - Vonage session ID
- VONAGE_TOKEN - Vonage token
Example:
from pipecat.runner.vonage import configure
application_id, session_id, token = await configure()
"""
import os
async def configure() -> tuple[str, str, str]:
"""Configure Vonage application ID, session ID and token from environment.
Returns:
Tuple containing the server application_id, session_id and token.
Raises:
Exception: If required Vonage configuration is not provided.
"""
application_id = os.getenv("VONAGE_APPLICATION_ID")
session_id = os.getenv("VONAGE_SESSION_ID")
token = os.getenv("VONAGE_TOKEN")
if not application_id:
raise Exception(
"No Vonage application ID specified. Set VONAGE_APPLICATION_ID in your environment."
)
if not session_id:
raise Exception(
"No Vonage Session ID specified. Set VONAGE_SESSION_ID in your environment."
)
if not token:
raise Exception("No Vonage token specified. Set VONAGE_TOKEN in your environment.")
return (application_id, session_id, token)

View File

@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug(f"{self} Connected to AssemblyAI WebSocket")
except Exception as e:
self._websocket = None
self._connected = False
await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to AssemblyAI."""

View File

@@ -339,10 +339,10 @@ class AWSTranscribeSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.info(f"{self} Successfully connected to AWS Transcribe")
except Exception as e:
self._websocket = None
await self.push_error(
error_msg=f"Unable to connect to AWS Transcribe: {e}", exception=e
)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to AWS Transcribe."""

View File

@@ -540,14 +540,25 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
self._last_timestamp = timestamp
async def _word_processor_task_handler(self):
"""Process word timestamps from the queue and call add_word_timestamps."""
"""Process word timestamps from the queue and call add_word_timestamps.
Also handles a None sentinel from _handle_completed: once all pending
words have been drained, it signals audio stream completion via
_audio_queue so that run_tts exits only after the last word has been
processed.
"""
while True:
try:
word, timestamp_seconds = await self._word_boundary_queue.get()
if self._current_context_id:
await self.add_word_timestamps(
[(word, timestamp_seconds)], self._current_context_id
)
item = await self._word_boundary_queue.get()
if item is None:
# All words drained — now signal audio completion.
self._audio_queue.put_nowait(None)
else:
word, timestamp_seconds = item
if self._current_context_id:
await self.add_word_timestamps(
[(word, timestamp_seconds)], self._current_context_id
)
self._word_boundary_queue.task_done()
except asyncio.CancelledError:
break
@@ -569,17 +580,21 @@ class AzureTTSService(TTSService, AzureBaseTTSService):
Args:
evt: Completion event from Azure Speech SDK.
"""
# Store duration for cumulative offset calculation
if evt.result and evt.result.audio_duration:
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
# Flush any pending word before completing
if self._last_word is not None:
self._word_boundary_queue.put_nowait((self._last_word, self._last_timestamp))
self._last_word = None
self._last_timestamp = None
# Store duration for cumulative offset calculation
if evt.result and evt.result.audio_duration:
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
self._audio_queue.put_nowait(None) # Signal completion
# Route completion through the word boundary queue so the word processor
# task drains all pending words before signaling audio stream completion.
# Without this, the last word's TTSTextFrame may arrive after
# TTSStoppedFrame, causing it to be missed by observers and the UI.
self._word_boundary_queue.put_nowait(None)
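
The ordering guarantee described in the comment above (completion is signaled only after every queued word has been handled) can be illustrated with two plain ``asyncio.Queue`` objects. This is a minimal sketch, not the Azure service: ``demo``, ``word_queue``, ``audio_queue`` and the ``events`` list are hypothetical names, and the real task calls ``add_word_timestamps`` rather than appending to a list.

```python
import asyncio


async def demo() -> list[str]:
    """Show that a None sentinel on the word queue orders completion last."""
    events: list[str] = []
    word_queue: asyncio.Queue = asyncio.Queue()
    audio_queue: asyncio.Queue = asyncio.Queue()

    async def word_processor() -> None:
        while True:
            item = await word_queue.get()
            if item is None:
                # Every earlier word has been handled; now signal completion.
                audio_queue.put_nowait(None)
            else:
                word, _timestamp = item
                events.append(f"word:{word}")
            word_queue.task_done()

    task = asyncio.create_task(word_processor())

    # Simulated completion callback: flush pending words, then the sentinel.
    word_queue.put_nowait(("hello", 0.0))
    word_queue.put_nowait(("world", 0.4))
    word_queue.put_nowait(None)

    # The consumer of the audio stream stops only once the sentinel is forwarded.
    await audio_queue.get()
    events.append("stopped")

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because both queues are FIFO and a single task forwards the sentinel, the "stopped" event cannot be recorded before the last word.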
def _handle_canceled(self, evt):
"""Handle synthesis cancellation.

View File

@@ -354,7 +354,8 @@ class CartesiaSTTService(WebsocketSTTService):
self._websocket = await websocket_connect(ws_url, additional_headers=headers)
await self._call_event_handler("on_connected")
except Exception as e:
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Cartesia: {e}", exception=e)
async def _disconnect_websocket(self):
ws = self._websocket

View File

@@ -8,6 +8,7 @@
import base64
import json
import re
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from enum import StrEnum
@@ -431,10 +432,20 @@ class CartesiaTTSService(WebsocketTTSService):
base_lang = language.split("-")[0].lower()
return base_lang in {"zh", "ja"}
def _process_word_timestamps_for_language(
_CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)
def _strip_cartesia_tags(self, text: str) -> str:
text = self._CARTESIA_TAG_RE.sub(" ", text)
text = re.sub(r"\s+", " ", text)
return text.strip()
def _normalize_word_timestamps(
self, words: list[str], starts: list[float]
) -> list[tuple[str, float]]:
"""Process word timestamps based on the current language.
"""Normalize raw word timestamps from Cartesia before further processing.
Strips Cartesia SSML tags (spell, emotion, break, volume, speed) from each word
and drops entries that become empty after stripping.
For Chinese and Japanese, Cartesia groups related characters in the same timestamp
message.
@@ -458,14 +469,18 @@ class CartesiaTTSService(WebsocketTTSService):
# For Chinese/Japanese, combine all characters in this message into one word
# using the first character's start time.
if words and starts:
combined_word = "".join(words)
combined_word = "".join(self._strip_cartesia_tags(w) for w in words)
first_start = starts[0]
return [(combined_word, first_start)]
return [(combined_word, first_start)] if combined_word else []
else:
return []
else:
# For non-CJK languages, use as-is
return list(zip(words, starts))
result = []
for word, start in zip(words, starts):
cleaned = self._strip_cartesia_tags(word)
if cleaned:
result.append((cleaned, start))
return result
def _word_timestamps_include_inter_frame_spaces(self) -> bool:
"""Whether timestamp text should be treated as carrying its own spacing."""
@@ -662,7 +677,7 @@ class CartesiaTTSService(WebsocketTTSService):
await self.remove_audio_context(ctx_id)
elif msg["type"] == "timestamps":
# Process the timestamps based on language before adding them
processed_timestamps = self._process_word_timestamps_for_language(
processed_timestamps = self._normalize_word_timestamps(
msg["word_timestamps"]["words"], msg["word_timestamps"]["start"]
)
await self.add_word_timestamps(

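Note on the tag stripping above: a standalone sketch of the same normalization may help when reviewing the regex. The pattern is copied from the diff; the function names and the sample input are illustrative assumptions, not real Cartesia output.

```python
import re

# Same pattern as in the diff above; everything else here is illustrative.
CARTESIA_TAG_RE = re.compile(r"</?(?:spell|emotion|break|volume|speed)\b[^>]*>", re.IGNORECASE)


def strip_cartesia_tags(text: str) -> str:
    """Replace known tags with a space, then collapse and trim whitespace."""
    text = CARTESIA_TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(words: list[str], starts: list[float]) -> list[tuple[str, float]]:
    """Non-CJK path: clean each word and drop entries that end up empty."""
    result = []
    for word, start in zip(words, starts):
        cleaned = strip_cartesia_tags(word)
        if cleaned:
            result.append((cleaned, start))
    return result
```

With a hypothetical input of `["<spell>ABC</spell>", '<break time="1s"/>', "hello"]`, the tag-only entry is dropped and the spelled word keeps its original start time.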

@@ -358,7 +358,8 @@ class ElevenLabsSTTService(SegmentedSTTService):
# Add required model_id and language_code
data.add_field("model_id", self._settings.model)
data.add_field("language_code", self._settings.language)
if self._settings.language:
data.add_field("language_code", self._settings.language)
if self._settings.tag_audio_events is not None:
data.add_field("tag_audio_events", str(self._settings.tag_audio_events).lower())
keyterms = self._settings.keyterms
@@ -822,6 +823,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug("Connected to ElevenLabs Realtime STT")
except Exception as e:
self._websocket = None
await self.push_error(
error_msg=f"Unable to connect to ElevenLabs Realtime STT: {e}", exception=e
)


@@ -594,6 +594,10 @@ class ElevenLabsTTSService(WebsocketTTSService):
self._partial_word_start_time = 0.0
self._alignment_started_context_ids: set[str | None] = set()
# Context IDs whose context-init has been sent, so the keepalive knows
# which contexts are safe to target.
self._context_init_sent: set[str] = set()
# Context management for v1 multi API
self._receive_task = None
self._keepalive_task = None
@@ -792,6 +796,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
finally:
await self.remove_active_audio_context()
self._websocket = None
self._context_init_sent.clear()
await self._call_event_handler("on_disconnected")
def _get_websocket(self):
@@ -822,6 +827,7 @@ class ElevenLabsTTSService(WebsocketTTSService):
self._partial_word = ""
self._partial_word_start_time = 0.0
self._alignment_started_context_ids.discard(context_id)
self._context_init_sent.discard(context_id)
async def on_audio_context_interrupted(self, context_id: str):
"""Close the ElevenLabs context when the bot is interrupted."""
@@ -914,26 +920,35 @@ class ElevenLabsTTSService(WebsocketTTSService):
while True:
await asyncio.sleep(KEEPALIVE_SLEEP)
try:
if self._websocket and self._websocket.state is State.OPEN:
context_id = self.get_active_audio_context_id()
if context_id:
# Send keepalive with context ID to keep the connection alive
keepalive_message = {
"text": "",
"context_id": context_id,
}
logger.trace(f"Sending keepalive for context {context_id}")
else:
# It's possible to have a user interruption which clears the context
# without generating a new TTS response. In this case, we'll just send
# an empty message to keep the connection alive.
keepalive_message = {"text": ""}
logger.trace("Sending keepalive without context")
await self._websocket.send(json.dumps(keepalive_message))
await self._send_keepalive()
except websockets.ConnectionClosed as e:
logger.warning(f"{self} keepalive error: {e}")
break
async def _send_keepalive(self):
"""Send a single keepalive message to keep the WebSocket connection alive.
Only stamps a ``context_id`` once its context-init (carrying
``voice_settings``) has been sent. Otherwise the keepalive would be the
context's first message, with no ``voice_settings``, and ElevenLabs would
reject the later context-init with a 1008 policy violation. A context-less
keepalive is sufficient until the context-init is sent.
"""
if not self._websocket or self._websocket.state is not State.OPEN:
return
context_id = self.get_active_audio_context_id()
if context_id and context_id in self._context_init_sent:
# The context's voice_settings context-init has been sent, so it's
# safe to keep that context alive.
keepalive_message = {"text": "", "context_id": context_id}
else:
# No active context, or the active context's context-init hasn't been
# sent yet. A context-less keepalive keeps the connection alive without
# opening the context prematurely.
keepalive_message = {"text": ""}
await self._websocket.send(json.dumps(keepalive_message))
async def _send_text(self, text: str, context_id: str):
"""Send text to the WebSocket for synthesis."""
if self._websocket and context_id:
@@ -980,6 +995,9 @@ class ElevenLabsTTSService(WebsocketTTSService):
locator.model_dump()
for locator in self._pronunciation_dictionary_locators
]
# Mark the context-init as sent so the keepalive may now
# target this context_id.
self._context_init_sent.add(context_id)
await self._websocket.send(json.dumps(msg))
logger.trace(f"Created new context {context_id}")


@@ -558,8 +558,9 @@ class GladiaSTTService(WebsocketSTTService):
logger.debug(f"{self} Connected to Gladia WebSocket")
except Exception as e:
self._websocket = None
self._connection_active = False
await self.push_error(error_msg=f"Unable to connect to Gladia: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to Gladia."""


@@ -423,8 +423,8 @@ class GradiumSTTService(WebsocketSTTService):
logger.debug("Connected to Gradium STT")
except Exception as e:
await self.push_error(error_msg=f"Unknown error occurred: {e}", exception=e)
raise
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Gradium: {e}", exception=e)
async def _disconnect(self):
await super()._disconnect()


@@ -0,0 +1,124 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Inception LLM service implementation using OpenAI-compatible interface."""
from dataclasses import dataclass, field
from typing import Literal
from loguru import logger
from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.settings import NOT_GIVEN as _NOT_GIVEN
from pipecat.services.settings import _NotGiven, is_given
@dataclass
class InceptionLLMSettings(BaseOpenAILLMService.Settings):
"""Settings for InceptionLLMService.
Parameters:
reasoning_effort: Controls how much reasoning the model applies.
One of "instant", "low", "medium", or "high". When unset, the
parameter is omitted and Inception's server-side default applies.
realtime: When True, reduces time to first diffusion block (TTFT).
"""
reasoning_effort: Literal["instant", "low", "medium", "high"] | None | _NotGiven = field(
default_factory=lambda: _NOT_GIVEN
)
realtime: bool | None | _NotGiven = field(default_factory=lambda: _NOT_GIVEN)
class InceptionLLMService(OpenAILLMService):
"""A service for interacting with Inception's API using the OpenAI-compatible interface.
This service extends OpenAILLMService to connect to Inception's API endpoint while
maintaining full compatibility with OpenAI's interface and functionality.
Supports Mercury-2, Inception's diffusion-based reasoning model.
"""
# Inception doesn't support the "developer" message role.
supports_developer_role = False
Settings = InceptionLLMSettings
_settings: Settings
def __init__(
self,
*,
api_key: str,
base_url: str = "https://api.inceptionlabs.ai/v1",
settings: Settings | None = None,
**kwargs,
):
"""Initialize the Inception LLM service.
Args:
api_key: The API key for accessing Inception's API.
base_url: The base URL for Inception API. Defaults to "https://api.inceptionlabs.ai/v1".
settings: Runtime-updatable settings.
**kwargs: Additional keyword arguments passed to OpenAILLMService.
"""
default_settings = self.Settings(
model="mercury-2",
reasoning_effort=None,
realtime=None,
)
if settings is not None:
default_settings.apply_update(settings)
super().__init__(api_key=api_key, base_url=base_url, settings=default_settings, **kwargs)
def create_client(self, api_key=None, base_url=None, **kwargs):
"""Create OpenAI-compatible client for Inception API endpoint.
Args:
api_key: The API key for authentication. If None, uses instance default.
base_url: The base URL for the API. If None, uses instance default.
**kwargs: Additional keyword arguments for client configuration.
Returns:
An OpenAI-compatible client configured for Inception's API.
"""
logger.debug(f"Creating Inception client with api {base_url}")
return super().create_client(api_key, base_url, **kwargs)
def build_chat_completion_params(self, params_from_context: OpenAILLMInvocationParams) -> dict:
"""Build parameters for Inception chat completion request.
Extends the base OpenAI parameters with Inception-specific options
such as reasoning_effort and realtime.
Args:
params_from_context: Parameters, derived from the LLM context, to
use for the chat completion. Contains messages, tools, and tool
choice.
Returns:
Dictionary of parameters for the chat completion request.
"""
params = super().build_chat_completion_params(params_from_context)
if (
is_given(self._settings.reasoning_effort)
and self._settings.reasoning_effort is not None
):
params["reasoning_effort"] = self._settings.reasoning_effort
# realtime is Inception-specific and unknown to the OpenAI SDK,
# so it must be passed via extra_body to avoid validation errors.
extra_body = {}
if is_given(self._settings.realtime) and self._settings.realtime is not None:
extra_body["realtime"] = self._settings.realtime
if extra_body:
params["extra_body"] = extra_body
return params


@@ -155,7 +155,6 @@ def language_to_soniox_language(language: Language) -> str:
Language.ID: "id",
Language.IT: "it",
Language.JA: "ja",
Language.KA: "ka",
Language.KK: "kk",
Language.KN: "kn",
Language.KO: "ko",
@@ -232,6 +231,7 @@ class SonioxSTTSettings(STTSettings):
context_version 2.
enable_speaker_diarization: Whether to enable speaker diarization.
enable_language_identification: Whether to enable language identification.
max_endpoint_delay_ms: Max ms before endpoint detection finalizes the turn (500-3000).
client_reference_id: Client reference ID to use for transcription.
"""
@@ -242,6 +242,7 @@ class SonioxSTTSettings(STTSettings):
enable_language_identification: bool | None | _NotGiven = field(
default_factory=lambda: NOT_GIVEN
)
max_endpoint_delay_ms: int | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
client_reference_id: str | None | _NotGiven = field(default_factory=lambda: NOT_GIVEN)
@@ -309,6 +310,7 @@ class SonioxSTTService(WebsocketSTTService):
context=None,
enable_speaker_diarization=False,
enable_language_identification=False,
max_endpoint_delay_ms=None,
client_reference_id=None,
)
@@ -390,8 +392,7 @@ class SonioxSTTService(WebsocketSTTService):
changed = await super()._update_settings(delta)
if changed:
await self._disconnect()
await self._connect()
await self._request_reconnect()
return changed
@@ -522,6 +523,7 @@ class SonioxSTTService(WebsocketSTTService):
"audio_format": self._audio_format,
"num_channels": self._num_channels,
"enable_endpoint_detection": enable_endpoint_detection,
"max_endpoint_delay_ms": s.max_endpoint_delay_ms,
"sample_rate": self.sample_rate,
"language_hints": _prepare_language_hints(assert_given(s.language_hints)),
"language_hints_strict": s.language_hints_strict,
@@ -537,8 +539,8 @@ class SonioxSTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug("Connected to Soniox STT")
except Exception as e:
self._websocket = None
await self.push_error(error_msg=f"Unable to connect to Soniox: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the websocket connection to Soniox."""


@@ -44,17 +44,15 @@ GLADIA_TTFS_P99: float = 1.49
GOOGLE_TTFS_P99: float = 1.57
GRADIUM_TTFS_P99: float = 1.61
GROQ_TTFS_P99: float = 1.54
MISTRAL_TTFS_P99: float = 1.89
OPENAI_TTFS_P99: float = 2.01
OPENAI_REALTIME_TTFS_P99: float = 1.66
SARVAM_TTFS_P99: float = 1.17
SMALLEST_TTFS_P99: float = 1.59
SONIOX_TTFS_P99: float = 0.35
SPEECHMATICS_TTFS_P99: float = 0.74
XAI_TTFS_P99: float = 2.14
# These services run locally and should be replaced with measured values
NVIDIA_TTFS_P99: float = DEFAULT_TTFS_P99
WHISPER_TTFS_P99: float = DEFAULT_TTFS_P99
# No benchmark available yet; using conservative default
MISTRAL_TTFS_P99: float = DEFAULT_TTFS_P99
SMALLEST_TTFS_P99: float = DEFAULT_TTFS_P99
XAI_TTFS_P99: float = DEFAULT_TTFS_P99


@@ -50,9 +50,13 @@ from pipecat.services.ai_service import AIService
from pipecat.services.settings import TTSSettings, is_given
from pipecat.services.websocket_service import WebsocketService
from pipecat.transcriptions.language import Language
from pipecat.utils.context.aggregated_frame_sequencer import AggregatedFrameSequencer
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
from pipecat.utils.frame_queue import FrameQueue
from pipecat.utils.text.base_text_filter import BaseTextFilter
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
from pipecat.utils.text.word_timestamp_utils import merge_punct_tokens
from pipecat.utils.time import seconds_to_nanoseconds
@@ -97,7 +101,6 @@ class _WordTimestampEntry:
word: str
timestamp: float
context_id: str
includes_inter_frame_spaces: bool | None = None
class TTSService(AIService):
@@ -289,6 +292,16 @@ class TTSService(AIService):
self._text_filters: Sequence[BaseTextFilter] = text_filters or []
self._transport_destination: str | None = transport_destination
# Ordered sequence of every AggregatedTextFrame slot that passes through
# _push_tts_frames (both spoken and skipped). Skipped frames are held here
# until all preceding spoken slots are complete, then flushed downstream so
# their append_to_context=True arrives at the assistant aggregator in the
# correct order relative to the TTSTextFrames from spoken sentences.
self._aggregated_frame_sequencer = AggregatedFrameSequencer(name=str(self))
self._resampler = create_stream_resampler()
self._processing_text: bool = False
@@ -690,7 +703,15 @@ class TTSService(AIService):
# Stop the aggregation metric (no-op if already stopped on first sentence).
await self.stop_text_aggregation_metrics()
if remaining:
await self._push_tts_frames(AggregatedTextFrame(remaining.text, remaining.type))
await self._push_tts_frames(
AggregatedTextFrame(
remaining.text,
remaining.type,
raw_text=remaining.full_match
if isinstance(remaining, PatternMatch)
else remaining.text,
)
)
# We pause processing incoming frames if the LLM response included
# text (it might be that it's only a function calling response). We
@@ -733,7 +754,7 @@ class TTSService(AIService):
push_assistant_aggregation = frame.append_to_context and not self._llm_response_started
# Assumption: text in TTSSpeakFrame does not include inter-frame spaces
await self._push_tts_frames(
AggregatedTextFrame(frame.text, AggregationType.SENTENCE),
AggregatedTextFrame(frame.text, AggregationType.SENTENCE, raw_text=frame.text),
append_tts_text_to_context=frame.append_to_context,
push_assistant_aggregation=push_assistant_aggregation,
)
@@ -887,6 +908,7 @@ class TTSService(AIService):
self._llm_response_started = False
self._streamed_text = ""
self._text_aggregation_metrics_started = False
self._aggregated_frame_sequencer.clear() # discard all pending slots on interruption
await self.reset_word_timestamps()
await self._stop_audio_context_task()
@@ -930,9 +952,23 @@ class TTSService(AIService):
if aggregate.type != AggregationType.TOKEN:
# Stop the aggregation metric on the first sentence only.
await self.stop_text_aggregation_metrics()
await self._push_tts_frames(
AggregatedTextFrame(aggregate.text, aggregate.type), includes_inter_frame_spaces
raw_text = (
aggregate.full_match if isinstance(aggregate, PatternMatch) else aggregate.text
)
await self._push_tts_frames(
AggregatedTextFrame(aggregate.text, aggregate.type, raw_text=raw_text),
includes_inter_frame_spaces,
)
async def _push_frame_respecting_previous_aggregated_frame(
self, frame: AggregatedTextFrame, context_id: str
):
# Enqueue the skipped frame; returns it immediately if no spoken slot
# precedes it, or holds it until the sequencer can flush it in order.
for f in self._aggregated_frame_sequencer.register_skipped(
frame, context_id, self._transport_destination
):
await self.push_frame(f)
async def _push_tts_frames(
self,
@@ -944,10 +980,13 @@ class TTSService(AIService):
type = src_frame.aggregated_by
text = src_frame.text
# Create context ID and store metadata
context_id = self.create_context_id()
# Skip sending to TTS if the aggregation type is in the skip list. Simply
# push the original frame downstream.
if type in self._skip_aggregator_types:
await self.push_frame(src_frame)
await self._push_frame_respecting_previous_aggregated_frame(src_frame, context_id)
return
# Whitespace gating depends on aggregation mode:
@@ -998,9 +1037,6 @@ class TTSService(AIService):
await self.stop_processing_metrics()
return
# Create context ID and store metadata
context_id = self.create_context_id()
# To support use cases that may want to know the text before it's spoken, we
# push the AggregatedTextFrame version before transforming and sending to TTS.
# However, we do not want to add this text to the assistant context until it
@@ -1045,6 +1081,21 @@ class TTSService(AIService):
await self.start_ttfb_metrics()
await self.append_to_audio_context(context_id, TTSStartedFrame(context_id=context_id))
# Register this spoken frame so the sequencer can track its completion
# and unblock any skipped frames queued behind it. Word-timestamp services
# complete the slot via process_word; push_text_frames services complete it
# below after the TTSTextFrame is appended to the audio context.
self._aggregated_frame_sequencer.register_spoken(
src_frame,
context_id,
tracker=WordCompletionTracker(
prepared_text, llm_text=src_frame.raw_text or src_frame.text
)
if not self._push_text_frames
else None,
append_to_context=self._tts_contexts[context_id].append_to_context,
)
await self.tts_process_generator(context_id, self.run_tts(prepared_text, context_id))
if not self._is_streaming_tokens:
@@ -1066,6 +1117,10 @@ class TTSService(AIService):
frame.append_to_context = append_tts_text_to_context
# Appending to the context, so it preserves the ordering.
await self.append_to_audio_context(context_id, frame)
# TTSTextFrame is queued; mark the spoken slot complete so any skipped
# frames (e.g. code blocks) waiting behind it can be flushed in order.
for f in self._aggregated_frame_sequencer.complete_spoken_slot():
await self.push_frame(f)
async def tts_process_generator(
self, context_id: str, generator: AsyncGenerator[Frame | None, None]
@@ -1114,10 +1169,8 @@ class TTSService(AIService):
if self._initial_word_times:
cached = self._initial_word_times.copy()
self._initial_word_times = []
for word, timestamp_seconds, ctx_id, ifs in cached:
await self._add_word_timestamps(
[(word, timestamp_seconds)], ctx_id, includes_inter_frame_spaces=ifs
)
for word, timestamp_seconds, ctx_id in cached:
await self._add_word_timestamps([(word, timestamp_seconds)], ctx_id)
async def reset_word_timestamps(self):
"""Reset word timestamp tracking."""
@@ -1139,6 +1192,11 @@ class TTSService(AIService):
playback order by _handle_audio_context. Otherwise they are processed immediately
via _add_word_timestamps.
When ``includes_inter_frame_spaces`` is True (e.g. Inworld TTS), punctuation and
space-only tokens are merged into the preceding word via ``merge_punct_tokens``
before queuing, so the tracker always receives words with trailing punctuation
already attached. The flag is not forwarded past this point.
Args:
word_times: List of (word, timestamp) tuples where timestamp is in seconds.
context_id: Unique identifier for the TTS context.
@@ -1147,29 +1205,22 @@ class TTSService(AIService):
consumers must not inject additional spaces between tokens. None leaves
the frame's own default unchanged.
"""
if includes_inter_frame_spaces:
word_times = merge_punct_tokens(word_times)
if context_id and self.audio_context_available(context_id):
for word, timestamp in word_times:
await self.append_to_audio_context(
context_id,
_WordTimestampEntry(
word=word,
timestamp=timestamp,
context_id=context_id,
includes_inter_frame_spaces=includes_inter_frame_spaces,
),
_WordTimestampEntry(word=word, timestamp=timestamp, context_id=context_id),
)
else:
await self._add_word_timestamps(
word_times=word_times,
context_id=context_id,
includes_inter_frame_spaces=includes_inter_frame_spaces,
)
await self._add_word_timestamps(word_times=word_times, context_id=context_id)
async def _add_word_timestamps(
self,
word_times: list[tuple[str, float]],
context_id: str | None = None,
includes_inter_frame_spaces: bool | None = None,
):
"""Process word timestamps directly, building and pushing TTSTextFrames inline.
@@ -1185,19 +1236,15 @@ class TTSService(AIService):
ts_ns = seconds_to_nanoseconds(timestamp)
if self._initial_word_timestamp == -1:
# Cache until we have audio and can compute PTS.
self._initial_word_times.append(
(word, timestamp, context_id, includes_inter_frame_spaces)
)
self._initial_word_times.append((word, timestamp, context_id))
else:
frame = TTSTextFrame(word, aggregated_by=AggregationType.WORD)
if includes_inter_frame_spaces is not None:
frame.includes_inter_frame_spaces = includes_inter_frame_spaces
frame.pts = self._initial_word_timestamp + ts_ns
frame.context_id = context_id
if context_id in self._tts_contexts:
frame.append_to_context = self._tts_contexts[context_id].append_to_context
self._word_last_pts = frame.pts
await self.push_frame(frame)
pts = self._initial_word_timestamp + ts_ns
# Build TTSTextFrame(s) for this word token, advancing the active
# slot's tracker and flushing any skipped frames now unblocked.
for f in self._aggregated_frame_sequencer.process_word(word, pts, context_id):
if isinstance(f, TTSTextFrame):
self._word_last_pts = f.pts
await self.push_frame(f)
#
# Audio context methods (active when using websocket-based TTS with context management)
@@ -1382,6 +1429,18 @@ class TTSService(AIService):
frame.pts = self._word_last_pts
await self.push_frame(frame)
async def _apply_force_complete(self):
"""Force-complete all incomplete spoken slots and push any unblocked skipped frames.
Called at end-of-context to handle TTS providers that silently drop word-timestamp
events. Emits a TTSTextFrame for any remaining unspoken text, then flushes skipped
frames that were blocked by those incomplete slots.
"""
for f in self._aggregated_frame_sequencer.force_complete(self._word_last_pts):
if isinstance(f, TTSTextFrame):
self._word_last_pts = f.pts
await self.push_frame(f)
async def _handle_audio_context(self, context_id: str):
"""Process items from an audio context queue until it is exhausted."""
queue = self._audio_contexts[context_id]
@@ -1402,7 +1461,6 @@ class TTSService(AIService):
await self._add_word_timestamps(
[(frame.word, frame.timestamp)],
frame.context_id,
includes_inter_frame_spaces=frame.includes_inter_frame_spaces,
)
continue
elif isinstance(frame, TTSAudioRawFrame):
@@ -1416,6 +1474,9 @@ class TTSService(AIService):
if isinstance(frame, TTSStartedFrame):
should_push_stop_frame = self._push_stop_frames
elif isinstance(frame, TTSStoppedFrame):
# Force-complete any remaining spoken slots before pushing the TTSStoppedFrame
await self._apply_force_complete()
should_push_stop_frame = False
# Setting the last word timestamp as the TTSStoppedFrame PTS
if not frame.pts:
@@ -1433,8 +1494,11 @@ class TTSService(AIService):
should_push_stop_frame = False
break
await self._apply_force_complete()
if should_push_stop_frame and self._push_stop_frames:
await self.push_frame(TTSStoppedFrame(context_id=context_id))
await self._maybe_reset_word_timestamps()
async def on_audio_context_interrupted(self, context_id: str):

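Note on the sequencer usage above: the implementation of `AggregatedFrameSequencer` is not part of this view, so the following is only a simplified, hypothetical model of the ordering rule the comments describe (skipped frames wait until every spoken slot ahead of them is complete). `SlotSequencer` and its internals are assumptions; the real class also tracks words, context IDs, and PTS.

```python
from collections import deque


class SlotSequencer:
    """Toy model: release skipped payloads only once earlier spoken slots are done."""

    def __init__(self):
        self._slots = deque()  # each entry: [kind, payload, done]

    def register_spoken(self, payload):
        self._slots.append(["spoken", payload, False])

    def register_skipped(self, payload):
        # Skipped slots need no completion, but they may be blocked by earlier slots.
        self._slots.append(["skipped", payload, True])
        return self._flush()

    def complete_spoken_slot(self):
        # Mark the oldest incomplete spoken slot as done.
        for slot in self._slots:
            if slot[0] == "spoken" and not slot[2]:
                slot[2] = True
                break
        return self._flush()

    def clear(self):
        self._slots.clear()

    def _flush(self):
        released = []
        while self._slots and self._slots[0][2]:
            kind, payload, _ = self._slots.popleft()
            if kind == "skipped":
                released.append(payload)
        return released
```

In this model a skipped frame registered with nothing ahead of it is returned immediately, which matches the comment in `_push_frame_respecting_previous_aggregated_frame`.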

@@ -76,7 +76,9 @@ class WebsocketService(ABC):
logger.warning(f"{self} reconnecting (attempt: {attempt_number})")
await self._disconnect_websocket()
await self._connect_websocket()
return await self._verify_connection()
if not await self._verify_connection():
raise ConnectionError(f"{self} websocket reconnection failed verification")
return True
async def _try_reconnect(
self,

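Note on the reconnect change above: raising on failed verification matters if the surrounding retry logic only retries on exceptions. `_try_reconnect` is not shown in this view, so the loop below is a hypothetical stand-in that illustrates that assumption.

```python
import asyncio


async def try_reconnect(reconnect_once, max_retries: int = 3) -> bool:
    """Hypothetical retry loop: an attempt is retried only when it raises."""
    for attempt in range(1, max_retries + 1):
        try:
            return await reconnect_once(attempt)
        except Exception:
            await asyncio.sleep(0)  # placeholder for a real backoff delay
    return False


def make_reconnect(verify_results: list[bool]):
    """Build a fake reconnect whose verification outcome is scripted per attempt."""

    async def reconnect_once(attempt: int) -> bool:
        if not verify_results[attempt - 1]:
            # Mirrors the change above: failed verification raises instead of returning False.
            raise ConnectionError("websocket reconnection failed verification")
        return True

    return reconnect_once
```

Under this assumption, returning `False` from the first attempt would have ended the loop early, while raising lets the later attempts run.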

@@ -293,8 +293,9 @@ class XAISTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug(f"{self} connected to xAI STT WebSocket")
except Exception as e:
self._websocket = None
self._session_ready.clear()
await self.push_error(error_msg=f"Unable to connect to xAI STT: {e}", exception=e)
raise
async def _disconnect_websocket(self):
"""Close the WebSocket connection."""


@@ -448,6 +448,9 @@ class BaseOutputTransport(FrameProcessor):
self._video_task: asyncio.Task | None = None
self._clock_task: asyncio.Task | None = None
# If timestamps are equal, use this count to preserve the insertion order
self._clock_queue_counter = itertools.count()
@property
def sample_rate(self) -> int:
"""Get the audio sample rate.
@@ -498,7 +501,7 @@ class BaseOutputTransport(FrameProcessor):
frame: The end frame signaling sender shutdown.
"""
# Let the sink tasks process the queue until they reach this EndFrame.
await self._clock_queue.put((float("inf"), frame.id, frame))
await self._clock_queue.put((float("inf"), next(self._clock_queue_counter), frame))
await self._audio_queue.put(frame)
# At this point we have enqueued an EndFrame and we need to wait for
@@ -610,7 +613,7 @@ class BaseOutputTransport(FrameProcessor):
Args:
frame: The frame with timing information to handle.
"""
await self._clock_queue.put((frame.pts, frame.id, frame))
await self._clock_queue.put((frame.pts, next(self._clock_queue_counter), frame))
async def handle_sync_frame(self, frame: Frame):
"""Handle frames that need synchronized processing.

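Note on the clock queue change above: a monotonically increasing counter as the second tuple element is a common tie-breaker for priority queues. The sketch below uses `heapq` and assumes the clock queue orders entries as `(pts, counter, frame)` tuples; `drain_in_order` is a hypothetical helper.

```python
import heapq
import itertools


def drain_in_order(items):
    """Pop (pts, payload) pairs by ascending pts, keeping insertion order on ties."""
    counter = itertools.count()
    heap = []
    for pts, payload in items:
        # The counter is unique, so tuple comparison never reaches the payload.
        heapq.heappush(heap, (pts, next(counter), payload))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Since the counter never repeats, payloads that do not define an ordering (plain dicts here, frames in the transport) are never compared.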
File diff suppressed because it is too large


@@ -0,0 +1,150 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Vonage Video Connector utils."""
from dataclasses import dataclass, replace
from enum import StrEnum
import numpy as np
import numpy.typing as npt
from pipecat.audio.resamplers.base_audio_resampler import BaseAudioResampler
@dataclass
class AudioProps:
"""Audio properties for normalization.
Parameters:
sample_rate: The sample rate of the audio.
is_stereo: Whether the audio is stereo (True) or mono (False).
"""
sample_rate: int
is_stereo: bool
class ImageFormat(StrEnum):
"""Enum for image formats."""
PLANAR_YUV420 = "PLANAR_YUV420"
PACKED_YUV444 = "PACKED_YUV444"
RGB = "RGB"
RGBA = "RGBA"
BGR = "BGR"
BGRA = "BGRA"
def check_audio_data(
buffer: bytes | memoryview, number_of_frames: int, number_of_channels: int
) -> None:
"""Check the audio sample width based on buffer size, number of frames and channels."""
if number_of_channels not in (1, 2):
raise ValueError(f"We only accept mono or stereo audio, got {number_of_channels}")
if isinstance(buffer, memoryview):
bytes_per_sample = buffer.itemsize
else:
bytes_per_sample = len(buffer) // (number_of_frames * number_of_channels)
if bytes_per_sample != 2:
raise ValueError(f"We only accept 16 bit PCM audio, got {bytes_per_sample * 8} bit")
def process_audio_channels(
audio: npt.NDArray[np.int16], current: AudioProps, target: AudioProps
) -> npt.NDArray[np.int16]:
"""Normalize audio channels to the target properties."""
if current.is_stereo != target.is_stereo:
if target.is_stereo:
audio = np.repeat(audio, 2)
else:
audio = audio.reshape(-1, 2).mean(axis=1).astype(np.int16)
return audio
async def process_audio(
resampler: BaseAudioResampler,
audio: npt.NDArray[np.int16],
current: AudioProps,
target: AudioProps,
) -> npt.NDArray[np.int16]:
"""Normalize audio to the target properties."""
res_audio = audio
if current.sample_rate != target.sample_rate:
# first normalize channels to mono if needed, then resample, then normalize channels to target
res_audio = process_audio_channels(res_audio, current, replace(current, is_stereo=False))
current = replace(current, is_stereo=False)
res_audio_bytes: bytes = await resampler.resample(
res_audio.tobytes(), current.sample_rate, target.sample_rate
)
res_audio = np.frombuffer(res_audio_bytes, dtype=np.int16)
res_audio = process_audio_channels(res_audio, current, target)
return res_audio
def image_colorspace_conversion(
image: bytes, size: tuple[int, int], from_format: ImageFormat, to_format: ImageFormat
) -> bytes | None:
"""Convert image colorspace from one format to another."""
match (from_format, to_format):
case (fmt1, fmt2) if fmt1 == fmt2:
return image
case (ImageFormat.RGB, ImageFormat.BGR) | (ImageFormat.BGR, ImageFormat.RGB):
np_input = np.frombuffer(image, dtype=np.uint8)
np_output = np_input.reshape(size[1], size[0], 3)[:, :, ::-1]
return np_output.tobytes()
case (ImageFormat.RGBA, ImageFormat.BGRA) | (ImageFormat.BGRA, ImageFormat.RGBA):
np_input = np.frombuffer(image, dtype=np.uint8)
np_output = np_input.reshape(size[1], size[0], 4)[:, :, [2, 1, 0, 3]]
return np_output.tobytes()
case (ImageFormat.PLANAR_YUV420, ImageFormat.PACKED_YUV444):
# YUV420 (I420) has Y plane of size width*height, U and V planes of size (width/2)*(height/2)
# Packed YUV444 interleaves Y, U, V values for each pixel (YUVYUVYUV...)
width, height = size
y_plane_size = width * height
uv_plane_size_420 = (width // 2) * (height // 2)
np_input = np.frombuffer(image, dtype=np.uint8)
y_plane = np_input[:y_plane_size].reshape(height, width)
u_plane_420 = np_input[y_plane_size : y_plane_size + uv_plane_size_420].reshape(
height // 2, width // 2
)
v_plane_420 = np_input[
y_plane_size + uv_plane_size_420 : y_plane_size + 2 * uv_plane_size_420
].reshape(height // 2, width // 2)
# Upsample U and V planes by repeating each pixel in 2x2 blocks
u_plane_444 = np.repeat(np.repeat(u_plane_420, 2, axis=0), 2, axis=1)
v_plane_444 = np.repeat(np.repeat(v_plane_420, 2, axis=0), 2, axis=1)
# Interleave Y, U, V values for packed format (YUVYUVYUV...)
np_output = np.stack([y_plane, u_plane_444, v_plane_444], axis=-1)
return np_output.tobytes()
case (ImageFormat.PACKED_YUV444, ImageFormat.PLANAR_YUV420):
# Packed YUV444 has Y, U, V interleaved (YUVYUVYUV...)
# YUV420 (I420) has Y plane of size width*height, U and V planes of size (width/2)*(height/2)
width, height = size
np_input = np.frombuffer(image, dtype=np.uint8).reshape(height, width, 3)
y_plane = np_input[:, :, 0].reshape(height, width)
u_plane_444 = np_input[:, :, 1]
v_plane_444 = np_input[:, :, 2]
# Downsample U and V planes by keeping the top-left sample of each 2x2 block (no averaging)
u_plane_420 = u_plane_444[::2, ::2].reshape(height // 2, width // 2)
v_plane_420 = v_plane_444[::2, ::2].reshape(height // 2, width // 2)
# Concatenate Y, U, V planes
np_output = np.concatenate(
[y_plane.flatten(), u_plane_420.flatten(), v_plane_420.flatten()]
)
return np_output.tobytes()
case _:
return None
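
The two YUV cases above depend on the I420 plane layout, which is easy to get wrong. The following is a minimal, NumPy-free sketch of the same index arithmetic, assuming even width and height. The function names `i420_to_packed_yuv444` and `packed_yuv444_to_i420` are hypothetical and are not part of the module; the sketch is meant as a reference for checking the vectorized code, not as a replacement for it.

```python
def i420_to_packed_yuv444(image: bytes, width: int, height: int) -> bytes:
    """Upsample planar I420 to packed YUV444 (YUVYUV...), even dimensions assumed."""
    y_size = width * height
    uv_width = width // 2
    uv_size = uv_width * (height // 2)
    y_plane = image[:y_size]
    u_plane = image[y_size : y_size + uv_size]
    v_plane = image[y_size + uv_size : y_size + 2 * uv_size]
    out = bytearray()
    for row in range(height):
        for col in range(width):
            # Each chroma sample covers a 2x2 block of luma samples.
            chroma_index = (row // 2) * uv_width + (col // 2)
            out += bytes((y_plane[row * width + col], u_plane[chroma_index], v_plane[chroma_index]))
    return bytes(out)


def packed_yuv444_to_i420(image: bytes, width: int, height: int) -> bytes:
    """Subsample packed YUV444 to planar I420 by keeping the top-left chroma of each 2x2 block."""
    y_plane, u_plane, v_plane = bytearray(), bytearray(), bytearray()
    for row in range(height):
        for col in range(width):
            base = (row * width + col) * 3
            y_plane.append(image[base])
            if row % 2 == 0 and col % 2 == 0:
                u_plane.append(image[base + 1])
                v_plane.append(image[base + 2])
    return bytes(y_plane + u_plane + v_plane)


# A 4x2 frame: 8 luma bytes, then 2 U bytes, then 2 V bytes.
i420_frame = bytes(range(8)) + bytes([10, 11]) + bytes([20, 21])
packed_frame = i420_to_packed_yuv444(i420_frame, 4, 2)
round_trip = packed_yuv444_to_i420(packed_frame, 4, 2)
```

Because upsampling repeats each chroma sample and downsampling keeps the top-left one, a round trip that starts from I420 is lossless. Starting from arbitrary YUV444 data it is not.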


@@ -0,0 +1,483 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Vonage Video Connector transport."""
from typing import Optional
from loguru import logger
from pipecat.frames.frames import (
CancelFrame,
EndFrame,
Frame,
InputAudioRawFrame,
InterruptionFrame,
OutputAudioRawFrame,
OutputImageRawFrame,
StartFrame,
UserImageRawFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor, FrameProcessorSetup
from pipecat.transports.base_input import BaseInputTransport
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.transports.base_transport import BaseTransport
from pipecat.transports.vonage.client import (
Session, # type: ignore[attr-defined]
Stream, # type: ignore[attr-defined]
Subscriber, # type: ignore[attr-defined]
VonageClient,
VonageClientListener,
)
# the following "as" imports help to re-export these types and avoid type checking warnings
# when importing these types from the main transport module
from pipecat.transports.vonage.client import (
SubscribeSettings as SubscribeSettings,
)
from pipecat.transports.vonage.client import (
VonageException as VonageException,
)
from pipecat.transports.vonage.client import (
VonageVideoConnectorTransportParams as VonageVideoConnectorTransportParams,
)
class VonageVideoConnectorInputTransport(BaseInputTransport):
"""Input transport for Vonage, handling audio and video input from the Vonage session.
Receives audio and video from a Vonage Video session and pushes them as input frames.
"""
_params: VonageVideoConnectorTransportParams
def __init__(self, client: VonageClient, params: VonageVideoConnectorTransportParams):
"""Initialize the Vonage input transport.
Args:
client: The VonageClient instance to use.
params: Transport parameters for input configuration.
"""
super().__init__(params)
self._initialized: bool = False
self._client: VonageClient = client
self._listener_id: int = -1
self._connected: bool = False
async def start(self, frame: StartFrame) -> None:
"""Start the Vonage input transport.
Args:
frame: The StartFrame to initiate the transport.
"""
await super().start(frame)
if self._initialized:
return
self._initialized = True
if self._params.audio_in_enabled or self._params.video_in_enabled:
self._listener_id = self._client.add_listener(
VonageClientListener(
on_audio_in=self._audio_in_cb,
on_video_in=self._video_in_cb,
on_error=self._on_error_cb,
)
)
try:
await self._client.connect(frame)
self._connected = True
except Exception as exc:
logger.error(f"Error connecting to Vonage session: {exc}")
await self.push_error("Vonage video connector connection error", fatal=True)
return
await self.set_transport_ready(frame)
async def setup(self, setup: FrameProcessorSetup) -> None:
"""Set up the processor with required components.
Args:
setup: Configuration object containing setup parameters.
"""
await super().setup(setup)
await self._client.setup(setup)
async def cleanup(self) -> None:
"""Cleanup input transport."""
await super().cleanup() # type: ignore
await self._client.cleanup()
async def _audio_in_cb(self, _session: Session, audio: InputAudioRawFrame) -> None:
if self._connected and self._params.audio_in_enabled:
await self.push_audio_frame(audio)
async def _video_in_cb(self, _subscriber: Subscriber, video: UserImageRawFrame) -> None:
if self._connected and self._params.video_in_enabled:
await self.push_video_frame(video)
async def _on_error_cb(self, session: Session, description: str, code: int) -> None:
logger.error(
f"Vonage input transport error session={session.id} code={code} description={description}"
)
if self._connected:
await self.push_error("Vonage video connector error", fatal=True)
async def stop(self, frame: EndFrame) -> None:
"""Stop the Vonage input transport.
Args:
frame: The EndFrame to stop the transport.
"""
await super().stop(frame)
await self._stop_client()
async def cancel(self, frame: CancelFrame) -> None:
"""Cancel the Vonage input transport.
Args:
frame: The CancelFrame to cancel the transport.
"""
await super().cancel(frame)
await self._stop_client()
async def _stop_client(self) -> None:
if self._connected:
self._client.remove_listener(self._listener_id)
self._connected = False
try:
await self._client.disconnect()
except Exception:
pass
async def subscribe_to_stream(self, stream_id: str, params: SubscribeSettings) -> None:
"""Subscribe to a participant's stream.
Args:
stream_id: The ID of the stream to subscribe to.
params: Settings for the subscription.
"""
await self._client.subscribe_to_stream(stream_id, params)
class VonageVideoConnectorOutputTransport(BaseOutputTransport):
"""Output transport for Vonage, handling audio and video output to the Vonage session.
Sends audio and video frames to a Vonage Video session as output.
"""
_params: VonageVideoConnectorTransportParams
def __init__(self, client: VonageClient, params: VonageVideoConnectorTransportParams):
"""Initialize the Vonage output transport.
Args:
client: The VonageClient instance to use.
params: Transport parameters for output configuration.
"""
super().__init__(params)
self._initialized: bool = False
self._client = client
self._connected: bool = False
self._listener_id: int = -1
async def start(self, frame: StartFrame) -> None:
"""Start the Vonage output transport.
Args:
frame: The StartFrame to initiate the transport.
"""
await super().start(frame)
if self._initialized:
return
self._initialized = True
if self._params.audio_out_enabled or self._params.video_out_enabled:
self._listener_id = self._client.add_listener(
VonageClientListener(on_error=self._on_error_cb)
)
try:
await self._client.connect(frame)
self._connected = True
except Exception as exc:
logger.error(f"Error connecting to Vonage session: {exc}")
await self.push_error("Vonage video connector connection error", fatal=True)
return
await self.set_transport_ready(frame)
async def setup(self, setup: FrameProcessorSetup) -> None:
"""Set up the processor with required components.
Args:
setup: Configuration object containing setup parameters.
"""
await super().setup(setup)
await self._client.setup(setup)
async def cleanup(self) -> None:
"""Cleanup output transport."""
await super().cleanup() # type: ignore
await self._client.cleanup()
async def process_frame(self, frame: Frame, direction: FrameDirection) -> None:
"""Process a frame for the Vonage output transport.
Args:
frame: The frame to process.
direction: The direction of frame flow in the pipeline.
"""
await super().process_frame(frame, direction)
# if we get an interruption frame, we need to ensure the buffers inside Vonage Video Connector are cleared
if (
self._connected
and isinstance(frame, InterruptionFrame)
and self._params.clear_buffers_on_interruption
):
logger.info("Clearing Vonage media buffers due to interruption frame")
self._client.clear_media_buffers()
async def write_audio_frame(self, frame: OutputAudioRawFrame) -> bool:
"""Write an audio frame to the Vonage session.
Args:
frame: The OutputAudioRawFrame to send.
"""
result = False
if self._connected and self._params.audio_out_enabled:
result = await self._client.write_audio(frame)
return result
async def write_video_frame(self, frame: OutputImageRawFrame) -> bool:
"""Write a video frame to the transport.
Args:
frame: The output video frame to write.
"""
result = False
if self._connected and self._params.video_out_enabled:
result = await self._client.write_video(frame)
return result
async def stop(self, frame: EndFrame) -> None:
"""Stop the Vonage output transport.
Args:
frame: The EndFrame to stop the transport.
"""
await super().stop(frame)
await self._stop_client()
async def cancel(self, frame: CancelFrame) -> None:
"""Cancel the Vonage output transport.
Args:
frame: The CancelFrame to cancel the transport.
"""
await super().cancel(frame)
await self._stop_client()
async def _stop_client(self) -> None:
if self._connected:
self._client.remove_listener(self._listener_id)
self._connected = False
try:
await self._client.disconnect()
except Exception:
pass
async def _on_error_cb(self, session: Session, description: str, code: int) -> None:
logger.error(
f"Vonage output transport error session={session.id} code={code} description={description}"
)
if self._connected:
await self.push_error("Vonage video connector error", fatal=True)
class VonageVideoConnectorTransport(BaseTransport):
"""Vonage Video Connector transport implementation for Pipecat.
Provides audio and video input and output transport for Vonage Video sessions, supporting event handling
for session and participant lifecycle.
Supported features:
- Audio and video input and output transport for Vonage Video sessions
- Event handler registration for session and participant events
- Publisher and subscriber management
- Configurable audio and migration parameters
"""
_params: VonageVideoConnectorTransportParams
def __init__(
self,
application_id: str,
session_id: str,
token: str,
params: VonageVideoConnectorTransportParams,
):
"""Initialize the Vonage Video Connector transport.
Args:
application_id: The Vonage Video application ID.
session_id: The session ID to connect to.
token: The authentication token for the session.
params: Transport parameters for input/output configuration.
"""
super().__init__()
self._params = params
self._client = VonageClient(application_id, session_id, token, params)
# Register supported handlers.
self._register_event_handler("on_joined")
self._register_event_handler("on_left")
self._register_event_handler("on_error")
self._register_event_handler("on_client_connected", sync=True)
self._register_event_handler("on_client_disconnected")
self._register_event_handler("on_first_participant_joined", sync=True)
self._register_event_handler("on_participant_joined", sync=True)
self._register_event_handler("on_participant_left")
self._client.add_listener(
VonageClientListener(
on_connected=self._on_connected,
on_disconnected=self._on_disconnected,
on_error=self._on_error,
on_stream_received=self._on_stream_received,
on_stream_dropped=self._on_stream_dropped,
on_subscriber_connected=self._on_subscriber_connected,
on_subscriber_disconnected=self._on_subscriber_disconnected,
)
)
self._input: VonageVideoConnectorInputTransport | None = None
self._output: VonageVideoConnectorOutputTransport | None = None
self._one_stream_received: bool = False
def input(self) -> FrameProcessor:
"""Get the input transport for Vonage.
Returns:
The VonageVideoConnectorInputTransport instance.
"""
if not self._input:
self._input = VonageVideoConnectorInputTransport(self._client, self._params)
return self._input
def output(self) -> FrameProcessor:
"""Get the output transport for Vonage.
Returns:
The VonageVideoConnectorOutputTransport instance.
"""
if not self._output:
self._output = VonageVideoConnectorOutputTransport(self._client, self._params)
return self._output
async def subscribe_to_stream(self, stream_id: str, params: SubscribeSettings) -> None:
"""Subscribe to a participant's stream.
Args:
stream_id: The ID of the stream to subscribe to.
params: Settings for the subscription.
"""
if self._input:
await self._input.subscribe_to_stream(stream_id, params)
async def _on_connected(self, session: Session) -> None:
"""Handle session connected event.
Args:
session: The connected Session object.
"""
await self._call_event_handler("on_joined", {"sessionId": session.id})
async def _on_disconnected(self, session: Session) -> None:
"""Handle session disconnected event.
Args:
session: The disconnected Session object.
"""
await self._call_event_handler("on_left", {"sessionId": session.id})
async def _on_error(self, _session: Session, description: str, _code: int) -> None:
"""Handle session error event.
Args:
_session: The Session object.
description: Error description.
_code: Error code.
"""
await self._call_event_handler("on_error", description)
async def _on_stream_received(self, session: Session, stream: Stream) -> None:
"""Handle stream received event.
Args:
session: The Session object.
stream: The received Stream object.
"""
client = {
"sessionId": session.id,
"streamId": stream.id,
"connectionData": stream.connection.data,
}
if not self._one_stream_received:
self._one_stream_received = True
await self._call_event_handler("on_first_participant_joined", client)
await self._call_event_handler("on_participant_joined", client)
async def _on_stream_dropped(self, session: Session, stream: Stream) -> None:
"""Handle stream dropped event.
Args:
session: The Session object.
stream: The dropped Stream object.
"""
client = {
"sessionId": session.id,
"streamId": stream.id,
"connectionData": stream.connection.data,
}
await self._call_event_handler("on_participant_left", client)
async def _on_subscriber_connected(self, subscriber: Subscriber) -> None:
"""Handle subscriber connected event.
Args:
subscriber: The connected Subscriber object.
"""
await self._call_event_handler(
"on_client_connected",
{
"subscriberId": subscriber.stream.id,
"streamId": subscriber.stream.id,
"connectionData": subscriber.stream.connection.data,
},
)
async def _on_subscriber_disconnected(self, subscriber: Subscriber) -> None:
"""Handle subscriber disconnected event.
Args:
subscriber: The disconnected Subscriber object.
"""
await self._call_event_handler(
"on_client_disconnected",
{
"subscriberId": subscriber.stream.id,
"streamId": subscriber.stream.id,
"connectionData": subscriber.stream.connection.data,
},
)
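
The participant events above follow a fire-once pattern: `on_first_participant_joined` fires for the first stream only, and `on_participant_joined` fires for every stream. The following is a minimal sketch of that dispatch logic, assuming a hypothetical `ParticipantEvents` class that records calls instead of invoking registered handlers. It does not use the Vonage client or any pipecat class.

```python
import asyncio


class ParticipantEvents:
    """Hypothetical stand-in for the transport's stream-received handling."""

    def __init__(self) -> None:
        self._one_stream_received = False
        self.calls: list[tuple[str, dict]] = []

    async def _call_event_handler(self, name: str, payload: dict) -> None:
        # The real transport dispatches to registered handlers; this sketch only records.
        self.calls.append((name, payload))

    async def on_stream_received(self, session_id: str, stream_id: str) -> None:
        client = {"sessionId": session_id, "streamId": stream_id}
        if not self._one_stream_received:
            self._one_stream_received = True
            await self._call_event_handler("on_first_participant_joined", client)
        await self._call_event_handler("on_participant_joined", client)


async def demo() -> list[str]:
    events = ParticipantEvents()
    await events.on_stream_received("session-1", "stream-a")
    await events.on_stream_received("session-1", "stream-b")
    return [name for name, _ in events.calls]


event_names = asyncio.run(demo())
```

Note that the flag is never reset in the transport shown above, so the first-participant event fires at most once per transport instance, even if every participant leaves and a new one joins.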


@@ -20,6 +20,7 @@ from loguru import logger
from pipecat.frames.frames import (
Frame,
FunctionCallsStartedFrame,
InterruptionFrame,
LLMFullResponseEndFrame,
LLMMarkerFrame,
@@ -222,6 +223,14 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
# ensures graceful degradation if the LLM disobeys and outputs additional text.
self._turn_suppressed = False
self._turn_complete_found = False # True when ✓ (COMPLETE) is detected
# Set when the LLM made a tool call during this turn. Informational
# only — broadcasting is idempotency-gated by
# ``_turn_completion_broadcasted``.
self._turn_had_function_call = False
# True once ``UserTurnInferenceCompletedFrame`` has been broadcast
# for this turn. Prevents double-broadcast when ✓ and a tool call
# both occur in the same turn.
self._turn_completion_broadcasted = False
# Timeout handling
self._user_turn_completion_config = UserTurnCompletionConfig()
@@ -236,6 +245,27 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
"""
self._user_turn_completion_config = config
async def _broadcast_turn_completion(self):
"""Broadcast ``UserTurnInferenceCompletedFrame`` at most once per turn.
Called from the two places we know the LLM has committed to a
response for the current user turn:
- the ``✓`` marker is detected in the text stream
- a ``FunctionCallsStartedFrame`` is emitted — the LLM committed
to a tool call before producing (or instead of) a marker.
Broadcasting on the tool-call path matters for races: the
downstream ``UserStoppedSpeakingFrame`` needs to propagate
before the function actually executes and a
``FunctionCallResultFrame`` flows back to the assistant
aggregator.
"""
if self._turn_completion_broadcasted:
return
self._turn_completion_broadcasted = True
await self.broadcast_frame(UserTurnInferenceCompletedFrame)
async def _start_incomplete_timeout(self, incomplete_type: Literal["short", "long"]):
"""Start a timeout task for incomplete turn handling.
@@ -325,6 +355,8 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
self._turn_text_buffer = ""
self._turn_suppressed = False
self._turn_complete_found = False
self._turn_had_function_call = False
self._turn_completion_broadcasted = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process frames, handling turn completion state resets.
@@ -351,7 +383,14 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
frame: The frame to push downstream.
direction: The direction of frame flow. Defaults to downstream.
"""
if isinstance(frame, LLMFullResponseEndFrame):
if isinstance(frame, FunctionCallsStartedFrame):
self._turn_had_function_call = True
# Broadcast turn completion now, before the function dispatches
# — gives ``UserStoppedSpeakingFrame`` maximum time to propagate
# so the assistant aggregator's ``_user_speaking`` is False by
# the time a ``FunctionCallResultFrame`` arrives.
await self._broadcast_turn_completion()
elif isinstance(frame, LLMFullResponseEndFrame):
await self._turn_reset()
await super().push_frame(frame, direction)
@@ -427,7 +466,9 @@ class UserTurnCompletionLLMServiceMixin(FrameProcessor):
# LLMTurnCompletionUserTurnStopStrategy) can fire
# `on_user_turn_stopped`. Must fire before the marker so
# downstream consumers see the signal before the response.
await self.broadcast_frame(UserTurnInferenceCompletedFrame)
# Idempotent: a tool call earlier in the turn may have
# already broadcast.
await self._broadcast_turn_completion()
# Push the marker as a sideband signal that the assistant
# aggregator will prepend to the upcoming aggregated text,

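
The change in this file routes both the tool-call path and the marker path through one idempotent helper, so the completion frame is broadcast at most once per turn. The following is a minimal sketch of that gate, assuming a hypothetical `TurnCompletionGate` class in which a counter stands in for `broadcast_frame(UserTurnInferenceCompletedFrame)`.

```python
import asyncio


class TurnCompletionGate:
    """Hypothetical model of the once-per-turn broadcast guard."""

    def __init__(self) -> None:
        self._turn_completion_broadcasted = False
        self.broadcast_count = 0

    async def _broadcast_turn_completion(self) -> None:
        if self._turn_completion_broadcasted:
            return
        self._turn_completion_broadcasted = True
        self.broadcast_count += 1  # stands in for the real frame broadcast

    async def _turn_reset(self) -> None:
        # Mirrors the reset performed at the end of the LLM response.
        self._turn_completion_broadcasted = False


async def demo() -> tuple[int, int]:
    gate = TurnCompletionGate()
    await gate._broadcast_turn_completion()  # tool call starts
    await gate._broadcast_turn_completion()  # marker detected later in the same turn
    after_first_turn = gate.broadcast_count
    await gate._turn_reset()
    await gate._broadcast_turn_completion()  # next turn broadcasts again
    return after_first_turn, gate.broadcast_count


first_turn_count, total_count = asyncio.run(demo())
```

Keeping the guard inside the helper means neither call site needs to know whether the other has already fired.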

@@ -0,0 +1,354 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Ordered sequencer for AggregatedTextFrame slots through TTS processing."""
from dataclasses import dataclass, field
from loguru import logger
from pipecat.frames.frames import (
AggregatedTextFrame,
AggregationType,
Frame,
TTSTextFrame,
)
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
@dataclass
class _AggregatedFrameSlot:
"""Ordered slot tracking one AggregatedTextFrame through TTS processing.
Every frame that passes through _push_tts_frames — whether spoken or skipped —
occupies a slot in the sequencer. Skipped frames wait at their position and are
emitted downstream only after all preceding spoken slots are complete, preserving
correct context ordering.
"""
frame: AggregatedTextFrame
context_id: str
spoken: bool
tracker: WordCompletionTracker | None = None
transport_destination: str | None = None
complete: bool = False
class AggregatedFrameSequencer:
"""Sequences AggregatedTextFrame slots to preserve TTS context ordering.
Manages an ordered queue of spoken and skipped TTS slots. Spoken slots are tracked
via a :class:`WordCompletionTracker`; skipped slots (e.g. code blocks excluded from
TTS synthesis) wait in-place until all preceding spoken slots are complete, then are
flushed downstream with ``append_to_context=True``.
All methods are synchronous and return lists of frames the caller should push
downstream, making the sequencer fully testable without any async machinery.
Example::
sequencer = AggregatedFrameSequencer()
sequencer.register_spoken(frame, ctx_id, tracker, append_to_context=True)
for f in sequencer.process_word("hello", pts=1000, context_id=ctx_id):
await self.push_frame(f)
"""
def __init__(self, name: str = "AggregatedFrameSequencer"):
"""Initialize the sequencer.
Args:
name: Label used in log messages (typically the owning TTS service name).
"""
self._name = name
self._slots: list[_AggregatedFrameSlot] = []
self._context_append_to_context: dict[str, bool] = {}
def register_spoken(
self,
frame: AggregatedTextFrame,
context_id: str,
tracker: WordCompletionTracker | None,
append_to_context: bool,
) -> None:
"""Register a spoken AggregatedTextFrame slot.
Called from _push_tts_frames for frames sent to the TTS service. The slot is
marked complete either via :meth:`process_word` (word-timestamp services) or
:meth:`complete_spoken_slot` (push_text_frames=True services).
Args:
frame: The AggregatedTextFrame being spoken.
context_id: The TTS context ID assigned to this frame.
tracker: WordCompletionTracker for word-timestamp services; None for
push_text_frames=True services (they complete via complete_spoken_slot).
append_to_context: Whether word frames built for this context should carry
append_to_context=True.
"""
self._context_append_to_context[context_id] = append_to_context
self._slots.append(
_AggregatedFrameSlot(
frame=frame,
context_id=context_id,
spoken=True,
tracker=tracker,
)
)
def register_skipped(
self,
frame: AggregatedTextFrame,
context_id: str,
transport_destination: str | None,
) -> list[Frame]:
"""Register a skipped AggregatedTextFrame and attempt an immediate flush.
The frame is appended as a skipped slot. If no incomplete spoken slot precedes
it, the frame is returned right away; otherwise it waits until a later
:meth:`flush` unblocks it.
Args:
frame: The skipped AggregatedTextFrame (e.g. a code block).
context_id: The context ID assigned in _push_tts_frames.
transport_destination: Transport routing value to attach at flush time.
Returns:
Frames to push downstream (empty when blocked by a preceding spoken slot).
"""
frame.context_id = context_id
self._slots.append(
_AggregatedFrameSlot(
frame=frame,
context_id=context_id,
spoken=False,
transport_destination=transport_destination,
)
)
return self.flush()
def process_word(
self,
word: str,
pts: int,
context_id: str | None,
) -> list[Frame]:
"""Process one word-timestamp event and return frames to push downstream.
Locates the active (first incomplete spoken) slot with a tracker, advances it
by the incoming word, and builds a :class:`TTSTextFrame`. Handles:
- Normal words that fit entirely within the active slot.
- Overflow words straddling the boundary between two slots.
- Force-complete when the TTS drops an event (word belongs to the next slot).
- Passthrough for words not recognised by any slot.
- Flushes any skipped slots unblocked by slot completion.
Args:
word: A word token from the TTS service word-timestamp stream.
pts: Presentation timestamp (nanoseconds) to assign to the frame.
context_id: TTS context ID from the word-timestamp event.
Returns:
Ordered list of frames (TTSTextFrame and/or AggregatedTextFrame) to push.
"""
active = self._get_active_slot()
is_complete = False
raw_overflow_word = None
if active and active.tracker:
if not active.tracker.word_belongs_here(word):
next_slot = self._get_next_active_slot(active)
word_fits_next = (
next_slot is not None
and next_slot.tracker is not None
and next_slot.tracker.word_belongs_here(word)
)
if not word_fits_next:
logger.warning(
f"{self._name} Word '{word}' not recognised by any slot, "
"emitting as passthrough"
)
return [self._build_word_frame(word, pts, context_id)]
is_complete = active.tracker.add_word_and_check_complete(word)
raw_overflow_word = active.tracker.get_overflow_word()
frame_text = (
active.tracker.get_word_for_frame() if (active and active.tracker) else word
) or word
raw_text = active.tracker.get_llm_consumed() if (active and active.tracker) else None
emit_context_id = active.context_id if active else context_id
# logger.debug(f"{self._name} Word '{word}' → frame_text='{frame_text}', raw='{raw_text}'")
frames: list[Frame] = [
self._build_word_frame(frame_text, pts, emit_context_id, raw_text=raw_text)
]
if is_complete and active:
active.complete = True
frames.extend(self.flush(last_word_pts=pts))
if raw_overflow_word:
logger.debug(f"{self._name} Emitting overflow word '{raw_overflow_word}'")
frames.extend(self._process_overflow(raw_overflow_word, pts))
return frames
def complete_spoken_slot(self) -> list[Frame]:
"""Mark the first pending spoken slot complete and flush unblocked skipped frames.
Used by push_text_frames=True services: after the TTSTextFrame has been appended
to the audio context, this marks the spoken slot done and releases any skipped
frames waiting behind it.
Returns:
AggregatedTextFrame(s) that are now unblocked and should be pushed.
"""
slot = next((s for s in self._slots if s.spoken and not s.complete), None)
if slot:
slot.complete = True
return self.flush()
def flush(self, last_word_pts: int | None = None) -> list[Frame]:
"""Walk the slot queue and return all skipped frames that are now unblocked.
Removes complete spoken slots from the head of the queue, then emits (and
removes) skipped slots whose preceding spoken slots are all done. Stops at
the first incomplete spoken slot.
Args:
last_word_pts: When provided, skipped frames receive this PTS so they
appear immediately after the last spoken word in the timeline.
Returns:
AggregatedTextFrame(s) ready to be pushed downstream.
"""
frames: list[Frame] = []
while self._slots:
slot = self._slots[0]
if slot.spoken and slot.complete:
self._slots.pop(0)
elif not slot.spoken and not slot.complete:
slot.frame.append_to_context = True
slot.frame.transport_destination = slot.transport_destination
if last_word_pts is not None:
slot.frame.pts = last_word_pts
logger.debug(f"{self._name}: Flushing Aggregated Frame {slot.frame}")
frames.append(slot.frame)
slot.complete = True
self._slots.pop(0)
else:
break # spoken but not yet complete — wait
return frames
def force_complete(self, last_word_pts: int) -> list[Frame]:
"""Force-complete all incomplete spoken slots and flush skipped frames.
Called at the end of an audio context to handle TTS providers that silently drop
word-timestamp events. Emits a TTSTextFrame for any remaining unspoken text in
each incomplete slot, marks it complete, then flushes all now-unblocked skipped
frames.
Args:
last_word_pts: PTS of the last received word frame, used as the PTS for
force-completed frames and forwarded to :meth:`flush`.
Returns:
Combined list of TTSTextFrames (for incomplete spoken slots) and
AggregatedTextFrames (skipped slots now unblocked), in emission order.
"""
frames: list[Frame] = []
for slot in self._slots:
if slot.spoken and not slot.complete:
if slot.tracker:
remaining_text = slot.tracker.get_remaining_tts_text()
raw_remaining = slot.tracker.get_remaining_llm_text()
if raw_remaining and remaining_text and remaining_text not in raw_remaining:
logger.warning(
f"{self._name} force-complete: raw_remaining {repr(raw_remaining)} "
f"does not contain remaining_text {repr(remaining_text)}, discarding"
)
raw_remaining = None
if remaining_text:
logger.debug(
f"{self._name} force-completing slot with remaining text "
f"{repr(remaining_text)}"
)
frames.append(
self._build_word_frame(
remaining_text,
last_word_pts,
slot.context_id,
raw_text=raw_remaining,
)
)
slot.complete = True
frames.extend(self.flush(last_word_pts=last_word_pts))
return frames
def clear(self) -> None:
"""Clear all slots and context metadata (called on interruption/reset)."""
self._slots.clear()
self._context_append_to_context.clear()
# -------------------------------------------------------------------------
# Internal helpers
# -------------------------------------------------------------------------
def _get_active_slot(self) -> _AggregatedFrameSlot | None:
"""Return the first incomplete spoken slot that has a tracker."""
return next(
(s for s in self._slots if s.spoken and not s.complete and s.tracker is not None),
None,
)
def _get_next_active_slot(self, current: _AggregatedFrameSlot) -> _AggregatedFrameSlot | None:
"""Return the first incomplete spoken slot with a tracker after *current*."""
found = False
for s in self._slots:
if s is current:
found = True
continue
if found and s.spoken and not s.complete and s.tracker is not None:
return s
return None
def _build_word_frame(
self,
text: str,
pts: int,
context_id: str | None,
raw_text: str | None = None,
) -> Frame:
"""Build a TTSTextFrame with all standard word-timestamp attributes set."""
frame = TTSTextFrame(text, aggregated_by=AggregationType.WORD)
frame.pts = pts
frame.context_id = context_id
frame.append_to_context = (
self._context_append_to_context.get(context_id, True)
if context_id is not None
else True
)
frame.raw_text = raw_text
return frame
def _process_overflow(self, raw_overflow_word: str, pts: int) -> list[Frame]:
"""Feed an overflow suffix into the next active slot and return resulting frames."""
frames: list[Frame] = []
next_active = self._get_active_slot()
if not next_active or not next_active.tracker:
return frames
overflow_complete = next_active.tracker.add_word_and_check_complete(raw_overflow_word)
frames.append(
self._build_word_frame(
raw_overflow_word,
pts,
next_active.context_id,
raw_text=next_active.tracker.get_llm_consumed(),
)
)
if overflow_complete:
next_active.complete = True
frames.extend(self.flush(last_word_pts=pts))
return frames
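
The ordering rule that `flush` enforces can be seen in isolation with a reduced model: completed spoken slots are dropped from the head of the queue, skipped slots are released, and the walk stops at the first spoken slot that is still pending. The following is a minimal sketch assuming hypothetical `Slot` and `flush_slots` names. It omits trackers, PTS handling, and frame attributes.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical reduced slot: a label plus the two flags that drive ordering."""

    label: str
    spoken: bool
    complete: bool = False


def flush_slots(slots: list[Slot]) -> list[str]:
    """Return labels of skipped slots that are no longer blocked, in queue order."""
    released: list[str] = []
    while slots:
        head = slots[0]
        if head.spoken and head.complete:
            slots.pop(0)  # finished speech no longer blocks anything
        elif not head.spoken:
            released.append(head.label)
            slots.pop(0)
        else:
            break  # spoken but still pending: everything behind it waits
    return released


queue = [
    Slot("intro sentence", spoken=True),
    Slot("code block", spoken=False),
    Slot("closing sentence", spoken=True),
]
blocked = flush_slots(queue)  # the intro is still being spoken
queue[0].complete = True
released = flush_slots(queue)  # the intro finished, so the code block is released
remaining = [slot.label for slot in queue]
```

The skipped code block reaches the context only after the sentence before it has finished, which is the ordering guarantee described in the class docstring.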


@@ -0,0 +1,489 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Word completion tracker for TTS context ordering."""
import re
import unicodedata
from loguru import logger
class WordCompletionTracker:
"""Tracks whether all words from a source AggregatedTextFrame have been spoken.
Compares normalized alphanumeric character counts between the TTS text and
accumulated spoken words, making the check robust to punctuation, spacing,
and XML/HTML tags (e.g. SSML tags like ``<spell>...</spell>`` returned by some
TTS providers in word-timestamp events).
When ``llm_text`` is provided (e.g. the original pattern-matched text including
delimiters like ``<card>4111 1111 1111 1111</card>``), the tracker additionally
maps each spoken word back to its corresponding span in that LLM text. This
lets callers attach the original text to ``TTSTextFrame`` entries so the
conversation context receives properly-tagged content rather than the cleaned
words received from the TTS provider.
Background: TTS providers apply their own SSML tags to the text before
synthesis and return word-timestamp events containing the raw spoken words
(e.g. ``"4111"``, ``"1111"``). Without LLM-text tracking, the conversation
context would only see those cleaned words and lose the original structure
(e.g. ``<card>4111 1111 1111 1111</card>``). By mapping normalized char counts
back to positions in ``llm_text``, each TTSTextFrame can carry the exact span
of original text it represents.
Overflow handling: TTS providers sometimes return a single word token that
spans the boundary between two AggregatedTextFrames (e.g. ``"1111</spell>And"``
when one frame ends with ``1111</card>`` and the next begins with ``And``). The
tracker detects this and exposes the raw overflow suffix via ``get_overflow_word()``,
so callers can feed the remainder into the next frame's tracker and emit a
correctly-attributed TTSTextFrame for each part.
Example::
tracker = WordCompletionTracker("Hello, world!")
tracker.add_word_and_check_complete("Hello") # False
tracker.add_word_and_check_complete("world") # True — normalized "helloworld" >= "helloworld"
"""
def __init__(
self,
tts_text: str,
llm_text: str | None = None,
):
"""Initialize the tracker with the text of the frame being spoken.
Args:
tts_text: Full text of the AggregatedTextFrame sent to TTS (may include
TTS-specific SSML tags). Used for normalized char-count completion
tracking and as the cursor reference for the TTS word stream.
llm_text: Original LLM-produced text including pattern delimiters (e.g.
``<card>4111 1111 1111 1111</card>``). When provided, each
``add_word_and_check_complete`` call also returns the corresponding
LLM span via ``get_llm_consumed()``. Both texts normalize to the
same alphanumeric sequence, so the same char-count cursor drives
position tracking in both.
"""
self._tts_normalized = self._normalize(tts_text)
self._received = ""
# _tts_text is the original tts_text before normalization.
# _tts_pos is a cursor into it, advanced by the same alnum count
# as the TTS word stream, so the force-complete path can emit the remaining
# unspoken text as a TTSTextFrame instead of silently dropping it.
self._tts_text = tts_text
self._tts_pos = 0
# _llm_text is the original LLM-produced text (with pattern delimiters like
# <card>...</card>). We track _llm_pos as a cursor into it, advancing
# by the same number of alphanumeric chars consumed from the TTS word stream.
self._llm_text = llm_text
self._llm_pos = 0
self._overflow_word: str | None = None
self._llm_consumed: str | None = None
self._frame_word: str | None = None
@staticmethod
def _normalize(text: str) -> str:
"""Strip XML/HTML tags then keep only lowercase alphanumeric characters.
Accented letters (e.g. ã, é) are reduced to their base letter so TTS output
can be matched against LLM text even when the provider strips diacritics.
Non-Latin scripts (CJK, Hangul) are kept as-is — each original character
contributes exactly one char to the result, keeping normalized length in sync
with raw alnum counts used by _advance_by_alnums.
"""
text = re.sub(r"<[^>]+>", "", text)
result = []
for char in text:
# Ignore punctuation, spaces, emojis, etc.
# Keep only letters and numbers.
if not char.isalnum():
continue
# NFD decomposes accented characters into:
# é -> e + ◌́
# ã -> a + ◌̃
#
# Non-accented characters usually stay unchanged.
nfd = unicodedata.normalize("NFD", char)
# Unicode category "Mn" means:
# Mark, Nonspacing
#
# These are combining accent marks that modify
# the previous character but are not standalone.
#
# Example:
# "é" becomes:
# nfd[0] = "e"
# nfd[1] = "◌́" (category = "Mn")
#
# If the second character is a combining accent,
# keep only the base letter.
if len(nfd) >= 2 and unicodedata.category(nfd[1]) == "Mn":
# Accented letter: keep the base character only (drops the combining mark).
result.append(nfd[0].lower())
else:
# Regular ASCII, numbers, CJK, Hangul, etc.
# are kept unchanged (except lowercase conversion).
result.append(char.lower())
return "".join(result)
# Typographic variants that LLMs commonly emit but TTS services normalize away.
_TYPOGRAPHY_FOLD = str.maketrans(
{
"\u2018": "'", # LEFT SINGLE QUOTATION MARK
"\u2019": "'", # RIGHT SINGLE QUOTATION MARK
"\u02bc": "'", # MODIFIER LETTER APOSTROPHE
"\u201c": '"', # LEFT DOUBLE QUOTATION MARK
"\u201d": '"', # RIGHT DOUBLE QUOTATION MARK
"\u2013": "-", # EN DASH
"\u2014": "-", # EM DASH
}
)
@staticmethod
def _fold_typography(text: str) -> str:
"""Replace typographic punctuation variants with their ASCII equivalents."""
return text.translate(WordCompletionTracker._TYPOGRAPHY_FOLD)
@staticmethod
def _remove_trailing_punctuation(text: str) -> str:
"""Remove punctuation only at the very end of the given text."""
i = len(text)
while i > 0 and unicodedata.category(text[i - 1]).startswith("P"):
i -= 1
return text[:i]
@staticmethod
def _advance_by_alnums(text: str, start_pos: int, n: int) -> int:
"""Return the position in *text* after advancing past *n* alphanumeric chars.
Moves through the text one character at a time, counting only alphanumeric
characters. XML/HTML tags (``<...>``) are skipped entirely — their content
is not counted against the budget, so the returned span includes the full tag.
Other non-alphanumeric characters (spaces, punctuation) are also passed over
without decrementing the budget.
After the *n* alnum chars are consumed, advances further past any immediately
following punctuation (e.g. the ``,`` in ``"questions,"`` or the ``.`` in
``"done."``), stopping before the next space, alnum char, or XML tag.
Args:
text: The source text to scan.
start_pos: Starting position in *text*.
n: Number of alphanumeric characters to consume.
"""
pos = start_pos
count = 0
while pos < len(text) and count < n:
if text[pos] == "<":
end = text.find(">", pos)
pos = end + 1 if end != -1 else pos + 1
elif text[pos].isalnum():
count += 1
pos += 1
else:
pos += 1
while pos < len(text):
if text[pos] == "<":
break
if text[pos].isalnum() or text[pos].isspace():
break
pos += 1
return pos
def add_word_and_check_complete(self, word: str) -> bool:
"""Record a spoken word from a word-timestamp event.
Normalizes ``word``, appends it to the running total, and checks whether
all expected alphanumeric characters have been covered.
Before advancing, checks whether the word belongs to this frame via
``word_belongs_here``. If it does not (e.g. the TTS provider silently
dropped a word-timestamp), the slot is force-completed: the remaining
unspoken text from ``tts_text`` is stored in ``_frame_word`` so a
TTSTextFrame can still be emitted for the dropped portion, all remaining
``llm_text`` is consumed, and the entire incoming word is set as overflow
so the caller's overflow path routes it to the next slot unchanged.
If ``llm_text`` was provided at construction time, also advances the LLM
cursor by the same number of alphanumeric chars consumed from this word and
stores the corresponding LLM span in ``_llm_consumed``. When this word
completes the frame, the entire remaining LLM text (including any closing
tags) is consumed so nothing is lost.
If the word overshoots the expected length (overflow), the raw suffix of
the word (everything after the last char belonging to this frame) is stored
in ``_overflow_word``, so the caller can attribute it to the next
AggregatedTextFrame.
Args:
word: A single word token returned by the TTS service. TTS services that
emit spaces and punctuation as separate tokens (e.g. Inworld) must
pre-merge those tokens into the preceding word before calling this
method (see ``TTSService._merge_punct_tokens``).
Returns:
True when all expected content has been covered.
"""
normalized = self._normalize(word)
prev_len = len(self._received)
expected_len = len(self._tts_normalized)
self._overflow_word = None
self._llm_consumed = None
self._frame_word = None
if prev_len > expected_len:
logger.warning(f"{self}, trying to add a word in an already complete frame")
return True
# If the word doesn't match the next expected chars, the TTS provider
# likely dropped a word-timestamp event. Force-complete this slot: emit the
# remaining TTS text as _frame_word so a TTSTextFrame is still produced
# for the unspoken portion, consume all remaining llm_text, and route the
# entire incoming word as overflow for the next slot.
if not self.word_belongs_here(word):
self._frame_word = self._tts_text[self._tts_pos :]
if self._llm_text is not None:
self._llm_consumed = self._llm_text[self._llm_pos :]
self._llm_pos = len(self._llm_text)
# This should not happen: force-complete sweeps all remaining
# llm_text, so the span must contain the frame word. If it
# doesn't, tts_text and llm_text are out of sync in an
# unexpected way — discard rather than returning a corrupt span.
# Trailing punctuation is stripped from the frame word before the
# check, since some TTS services add punctuation that is not
# present in the raw text.
word_without_punctuation = self._remove_trailing_punctuation(self._frame_word)
if word_without_punctuation and word_without_punctuation not in self._llm_consumed:
logger.warning(
f"WordCompletionTracker: force-complete llm_consumed {self._llm_consumed!r} "
f"does not contain frame_word {self._frame_word!r}, discarding"
)
self._llm_consumed = None
self._received = self._tts_normalized # force-complete
self._overflow_word = word
return True
self._received += normalized
# How many normalized chars from this word belong to the current frame.
chars_for_frame = min(len(normalized), expected_len - prev_len)
if prev_len + len(normalized) > expected_len:
# This word straddles the frame boundary. Split into:
# - _frame_word: the prefix of `word` up to the split point, used
# for the TTSTextFrame of the current slot.
# - raw overflow word: the raw suffix after the split point, used
# to build a TTSTextFrame attributed to the next AggregatedTextFrame.
split_pos = self._advance_by_alnums(word, 0, chars_for_frame)
self._frame_word = word[:split_pos]
self._overflow_word = word[split_pos:]
else:
# Word fits entirely in this frame.
self._frame_word = word
# Advance the TTS cursor by the same alnum count so the force-complete
# path knows where in _tts_text to start from.
self._tts_pos = self._advance_by_alnums(self._tts_text, self._tts_pos, chars_for_frame)
if self._llm_text is not None:
if self.is_complete:
# Consume ALL remaining LLM text: closing tags (e.g. </card>)
# and any trailing punctuation that the TTS will not send separately.
self._llm_consumed = self._llm_text[self._llm_pos :]
self._llm_pos = len(self._llm_text)
else:
if chars_for_frame == 0:
# Consume exactly the raw word in llm_text, skipping any
# leading spaces that belong to the previous token's span.
start = self._llm_pos
while start < len(self._llm_text) and self._llm_text[start].isspace():
start += 1
end = start + len(word)
self._llm_consumed = self._llm_text[start:end]
self._llm_pos = end
else:
# Advance through llm_text by exactly chars_for_frame alphanumeric
# chars. Non-alnum chars (spaces, opening tags) are included in the
# slice, preserving the original formatting for the context.
new_pos = self._advance_by_alnums(
self._llm_text, self._llm_pos, chars_for_frame
)
self._llm_consumed = self._llm_text[self._llm_pos : new_pos]
self._llm_pos = new_pos
# This should not happen: the LLM cursor is driven by the same
# alnum count as the word stream, so the consumed span must contain
# the frame word. If it doesn't, the cursors drifted out of sync
# in an unexpected way — discard rather than returning a corrupt span.
# Trailing punctuation is stripped from the frame word before the
# check, since some TTS services add punctuation that is not
# present in the raw text.
word_without_punctuation = self._remove_trailing_punctuation(self._frame_word)
if word_without_punctuation and self._fold_typography(
word_without_punctuation
) not in self._fold_typography(self._llm_consumed):
logger.warning(
f"WordCompletionTracker: llm_consumed {self._llm_consumed!r} "
f"does not contain frame_word {self._frame_word!r}, discarding"
)
self._llm_consumed = None
return self.is_complete
def word_belongs_here(self, word: str) -> bool:
"""Return True if this word plausibly belongs to the remaining TTS text.
Dispatches to one of two checks depending on whether the word contains
any alphanumeric characters after normalization:
- Alnum words: prefix-match against the remaining expected chars.
- Symbol/punctuation words (empty after normalization): literal substring
search in the remaining raw TTS text, with a fallback for TTS providers
that substitute Unicode symbols with ASCII punctuation.
Used to detect when the TTS provider silently dropped a word-timestamp
event: if the incoming word does not match this slot's remaining content,
the caller should force-complete this slot and route the word to the next.
"""
normalized = self._normalize(word)
if normalized:
return self._alnum_word_belongs_here(normalized)
else:
return self._symbol_word_belongs_here(word)
def _alnum_word_belongs_here(self, normalized: str) -> bool:
"""Return True if an alnum-containing word matches this frame's remaining expected chars.
Accepts both full words and partial tokens — the word belongs here as long
as its normalized characters are a prefix of what is still expected. This
also handles the overflow case where the word is longer than the remaining
content (the excess is detected and split in ``add_word_and_check_complete``).
"""
remaining = self._tts_normalized[len(self._received) :]
if not remaining:
return False
check_len = min(len(normalized), len(remaining))
return remaining.startswith(normalized[:check_len])
def _symbol_word_belongs_here(self, word: str) -> bool:
"""Return True if a non-alnum word (emoji, punctuation, symbol) belongs to this frame.
Two checks are applied in order:
1. **Literal substring**: search for the raw word in the remaining TTS text.
``_advance_by_alnums`` may have already moved ``_tts_pos`` past some trailing
punctuation, so the search window is backed up to include those characters.
2. **Symbol substitution fallback**: some TTS providers substitute Unicode symbols
with ASCII punctuation in word-timestamp events (e.g. ElevenLabs reports ``→``
as ``-``), so check 1 always fails even though the word belongs here. If alnum
content still remains unconsumed and the next non-space character in the TTS
text is itself a non-alnum symbol, accept the word as a substitution.
"""
search_start = self._tts_pos
while search_start > 0:
ch = self._tts_text[search_start - 1]
if ch.isalnum() or ch.isspace() or ch == ">":
break
search_start -= 1
if word in self._tts_text[search_start:]:
return True
if len(self._received) >= len(self._tts_normalized):
return False
pos = self._tts_pos
while pos < len(self._tts_text) and self._tts_text[pos].isspace():
pos += 1
return pos < len(self._tts_text) and not self._tts_text[pos].isalnum()
def get_word_for_frame(self) -> str | None:
"""Return the portion of the last word that belongs to this frame.
- Normal word (no overflow): the full word.
- Straddling word: the prefix up to the frame boundary (e.g. ``"1111"``
from ``"1111 And"``).
- Force-completed (word didn't belong): the remaining unspoken text from
``tts_text`` so a TTSTextFrame can still be emitted for the dropped
portion. The incoming word is routed as overflow to the next slot.
"""
return self._frame_word.strip() if self._frame_word else self._frame_word
def get_overflow_word(self) -> str | None:
"""Return the raw suffix of the last word that overflows into the next frame.
Preserves the original casing and any non-alphanumeric characters so the
overflow TTSTextFrame has natural word text. Returns None when there is no
overflow (the word fit entirely within this frame).
"""
return self._overflow_word.strip() if self._overflow_word else self._overflow_word
def get_llm_consumed(self) -> str | None:
"""Return the LLM text span consumed for the last added word.
Returns None if no llm_text was provided at construction time, or if the
consumed span was discarded because it did not contain the frame word.
"""
return self._llm_consumed.strip() if self._llm_consumed else self._llm_consumed
def get_accumulated_tts_text(self) -> str:
"""Return all consumed text from tts_text up to the current cursor position.
Unlike ``get_word_for_frame()`` (which reflects only the last word), this returns
everything that has been consumed since construction or the last ``reset()``.
"""
return self._tts_text[: self._tts_pos]
def get_accumulated_llm_text(self) -> str | None:
"""Return all consumed text from llm_text up to the current cursor position.
Unlike ``get_llm_consumed()`` (which reflects only the last word), this returns
everything that has been consumed since construction or the last ``reset()``.
Returns None if no llm_text was provided at construction time.
"""
if self._llm_text is None:
return None
return self._llm_text[: self._llm_pos]
def get_remaining_tts_text(self) -> str:
"""Return the unspoken portion of tts_text, stripped of leading/trailing whitespace.
This is the text that the TTS provider has not yet confirmed via word-timestamp
events. Useful for force-completing a slot when the audio context ends before all
word-timestamp events have arrived.
"""
return self._tts_text[self._tts_pos :].strip()
def get_remaining_llm_text(self) -> str | None:
"""Return the unspoken portion of llm_text, stripped of leading/trailing whitespace.
Returns None if no llm_text was provided at construction time. Like
``get_remaining_tts_text()``, intended for force-completing a slot so that the
conversation context receives the full original text.
"""
if self._llm_text is None:
return None
remaining = self._llm_text[self._llm_pos :].strip()
return remaining if remaining else None
@property
def is_complete(self) -> bool:
"""True when accumulated normalized chars >= expected normalized chars."""
return len(self._received) >= len(self._tts_normalized)
def reset(self):
"""Reset received word accumulation without changing the expected text."""
self._received = ""
self._tts_pos = 0
self._llm_pos = 0
self._overflow_word = None
self._llm_consumed = None
self._frame_word = None
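
The cursor mapping described in the class docstring can be tried in isolation. The sketch below is a standalone illustration, not part of the change: `normalize` and `advance_by_alnums` are module-level copies of the private static methods above (the free-function names are assumptions made for this example), and the card-number input is the same sample used in the docstrings. It shows how counting normalized alphanumeric characters in each TTS word moves a cursor through the original LLM text, so each word maps to a span that still carries its opening tag.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Strip tags, drop non-alphanumerics, fold accents, lowercase."""
    text = re.sub(r"<[^>]+>", "", text)
    out = []
    for ch in text:
        if not ch.isalnum():
            continue
        nfd = unicodedata.normalize("NFD", ch)
        if len(nfd) >= 2 and unicodedata.category(nfd[1]) == "Mn":
            # Accented letter: keep only the base character.
            out.append(nfd[0].lower())
        else:
            out.append(ch.lower())
    return "".join(out)


def advance_by_alnums(text: str, start: int, n: int) -> int:
    """Return the index in text after consuming n alphanumeric chars from start."""
    pos, count = start, 0
    while pos < len(text) and count < n:
        if text[pos] == "<":
            # Skip a whole tag without spending any of the budget.
            end = text.find(">", pos)
            pos = end + 1 if end != -1 else pos + 1
        elif text[pos].isalnum():
            count += 1
            pos += 1
        else:
            pos += 1
    # Absorb trailing punctuation, stopping at a space, alnum char, or tag.
    while pos < len(text):
        ch = text[pos]
        if ch == "<" or ch.isalnum() or ch.isspace():
            break
        pos += 1
    return pos


llm_text = "<card>4111 1111</card>"
words = ["4111", "1111"]  # word tokens as a TTS service might report them

pos = 0
spans = []
for w in words:
    new_pos = advance_by_alnums(llm_text, pos, len(normalize(w)))
    spans.append(llm_text[pos:new_pos])
    pos = new_pos

print(spans)           # spans of llm_text attributed to each word
print(llm_text[pos:])  # text left after the last word
```

In this sketch the closing tag is left over after the last word. The tracker itself avoids that by consuming all remaining `llm_text` once the frame is complete, which is why the tests expect `1111</card>` as the raw text of the final word.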

@@ -0,0 +1,53 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Utilities for normalizing word-timestamp streams from TTS services."""
import re
def merge_punct_tokens(
word_times: list[tuple[str, float]],
) -> list[tuple[str, float]]:
"""Merge punctuation/space-only tokens into the preceding word.
Some TTS services (e.g. Inworld) emit spaces and punctuation as separate
word-timestamp tokens rather than attaching them to the adjacent word.
This function collapses those tokens so downstream consumers always receive
words with trailing punctuation already attached — identical to the format
produced by ElevenLabs or Cartesia.
A token is considered punct/space-only when its text contains no alphanumeric
characters after stripping XML/HTML tags. Such tokens are appended to the
preceding word's text and their timestamp is discarded (the preceding word's
timestamp is kept). Leading punct/space tokens with no preceding word are
silently discarded. Every output token is stripped of leading and trailing
whitespace (spaces, tabs, newlines).
Args:
word_times: Raw list of ``(word, timestamp)`` pairs from the TTS service.
Returns:
Merged list where every entry contains at least one alphanumeric character
and has no leading or trailing whitespace.
Example::
merge_punct_tokens([("questions", 1.0), (", ", 1.2), ("explain", 1.4)])
# → [("questions,", 1.0), ("explain", 1.4)]
"""
merged: list[tuple[str, float]] = []
for word, ts in word_times:
stripped = re.sub(r"<[^>]+>", "", word)
has_alnum = any(c.isalnum() for c in stripped)
if not has_alnum:
if merged:
prev_word, prev_ts = merged[-1]
merged[-1] = (prev_word + word, prev_ts)
# else: leading punct/space with no preceding word → discard
else:
merged.append((word, ts))
return [(word.strip(), ts) for word, ts in merged]
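
A slightly longer input than the docstring example makes the three rules easier to see: leading tokens are dropped, space and punctuation tokens attach to the preceding word, and the preceding word's timestamp wins. The block below is a minimal sketch that repeats the function body so it runs without the package installed; the token timings in `raw` are invented for illustration.

```python
import re


def merge_punct_tokens(word_times):
    """Standalone copy of the helper above, for illustration only."""
    merged = []
    for word, ts in word_times:
        stripped = re.sub(r"<[^>]+>", "", word)
        if any(c.isalnum() for c in stripped):
            merged.append((word, ts))
        elif merged:
            # Punctuation or space: attach to the previous word, keep its timestamp.
            prev_word, prev_ts = merged[-1]
            merged[-1] = (prev_word + word, prev_ts)
        # Otherwise: a leading punct/space token with nothing before it is dropped.
    return [(word.strip(), ts) for word, ts in merged]


# Hypothetical stream from a service that emits spaces and punctuation separately.
raw = [(" ", 0.0), ("Hello", 0.1), (",", 0.3), (" ", 0.35), ("world", 0.4), ("!", 0.7)]
result = merge_punct_tokens(raw)
print(result)
```

The merged tokens are in the shape that `add_word_and_check_complete` expects, with each word carrying its own trailing punctuation.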

@@ -0,0 +1,612 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Tests for AggregatedFrameSequencer.
All methods on the sequencer are synchronous and return lists of frames,
so no async machinery is needed here.
Test groups:
- register_skipped: immediate flush vs. blocked by a preceding spoken slot
- register_spoken / complete_spoken_slot: push_text_frames=True path
- flush: pts propagation, transport_destination, stops at incomplete spoken slot
- process_word: normal, completing, passthrough, raw_text propagation
- process_word overflow: single token spanning two slot boundaries
- process_word force-complete via word_belongs_here failure
- force_complete: remaining text emission, raw_text, corrupt raw discard, slot ordering
- clear: resets all state
"""
import unittest
from pipecat.frames.frames import AggregatedTextFrame, AggregationType, TTSTextFrame
from pipecat.utils.context.aggregated_frame_sequencer import AggregatedFrameSequencer
from pipecat.utils.context.word_completion_tracker import WordCompletionTracker
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _seq() -> AggregatedFrameSequencer:
return AggregatedFrameSequencer(name="test")
def _spoken_frame(text: str) -> AggregatedTextFrame:
return AggregatedTextFrame(text, AggregationType.SENTENCE)
def _skipped_frame(text: str) -> AggregatedTextFrame:
return AggregatedTextFrame(text, "code")
def _tracker(tts_text: str, llm_text: str | None = None) -> WordCompletionTracker:
return WordCompletionTracker(tts_text, llm_text=llm_text)
# ---------------------------------------------------------------------------
# register_skipped
# ---------------------------------------------------------------------------
class TestRegisterSkipped(unittest.TestCase):
def test_emits_immediately_with_empty_queue(self):
seq = _seq()
frame = _skipped_frame("code block")
result = seq.register_skipped(frame, "ctx1", None)
self.assertEqual(len(result), 1)
self.assertIs(result[0], frame)
def test_sets_append_to_context_true(self):
seq = _seq()
frame = _skipped_frame("code")
seq.register_skipped(frame, "ctx1", None)
self.assertTrue(frame.append_to_context)
def test_sets_context_id_on_frame(self):
seq = _seq()
frame = _skipped_frame("code")
seq.register_skipped(frame, "ctx42", None)
self.assertEqual(frame.context_id, "ctx42")
def test_sets_transport_destination(self):
seq = _seq()
frame = _skipped_frame("code")
result = seq.register_skipped(frame, "ctx1", "dest-A")
self.assertEqual(result[0].transport_destination, "dest-A")
def test_blocked_by_incomplete_spoken_slot(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
result = seq.register_skipped(_skipped_frame("code"), "ctx2", None)
self.assertEqual(result, [])
def test_emits_immediately_after_already_complete_spoken_slot(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hi"), "ctx1", tracker=None, append_to_context=True)
seq.complete_spoken_slot()
result = seq.register_skipped(_skipped_frame("code"), "ctx2", None)
self.assertEqual(len(result), 1)
def test_multiple_skipped_before_any_spoken_all_emit(self):
seq = _seq()
r1 = seq.register_skipped(_skipped_frame("code1"), "ctx1", None)
r2 = seq.register_skipped(_skipped_frame("code2"), "ctx2", None)
self.assertEqual(len(r1), 1)
self.assertEqual(len(r2), 1)
# ---------------------------------------------------------------------------
# register_spoken / complete_spoken_slot (push_text_frames=True path)
# ---------------------------------------------------------------------------
class TestCompleteSpokenSlot(unittest.TestCase):
def test_noop_with_empty_queue(self):
seq = _seq()
self.assertEqual(seq.complete_spoken_slot(), [])
def test_marks_slot_complete_and_flushes_skipped(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None) # blocked
result = seq.complete_spoken_slot()
self.assertEqual(len(result), 1)
self.assertIs(result[0], skipped)
self.assertTrue(skipped.append_to_context)
def test_only_first_pending_slot_is_marked(self):
seq = _seq()
seq.register_spoken(_spoken_frame("one"), "ctx1", tracker=None, append_to_context=True)
seq.register_spoken(_spoken_frame("two"), "ctx2", tracker=None, append_to_context=True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx3", None)
# ctx2 still blocks the skipped frame
result = seq.complete_spoken_slot()
self.assertEqual(result, [])
def test_skipped_flushes_after_all_preceding_spoken_complete(self):
seq = _seq()
seq.register_spoken(_spoken_frame("one"), "ctx1", tracker=None, append_to_context=True)
seq.register_spoken(_spoken_frame("two"), "ctx2", tracker=None, append_to_context=True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx3", None)
seq.complete_spoken_slot() # completes ctx1
result = seq.complete_spoken_slot() # completes ctx2 → flush skipped
self.assertEqual(len(result), 1)
self.assertIs(result[0], skipped)
# ---------------------------------------------------------------------------
# flush
# ---------------------------------------------------------------------------
class TestFlush(unittest.TestCase):
def test_empty_queue_returns_empty(self):
self.assertEqual(_seq().flush(), [])
def test_stops_at_incomplete_spoken_slot(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
self.assertEqual(seq.flush(), [])
def test_last_word_pts_assigned_to_skipped_frame(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None)
# process_word("hello") completes the spoken slot and calls flush(last_word_pts=77)
result = seq.process_word("hello", pts=77, context_id="ctx1")
flushed = [f for f in result if isinstance(f, AggregatedTextFrame) and f.text == "code"]
self.assertEqual(len(flushed), 1)
self.assertEqual(flushed[0].pts, 77)
def test_complete_spoken_slots_are_swept(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
seq.complete_spoken_slot()
# Queue should be empty after sweeping the complete spoken slot
self.assertEqual(seq._slots, [])
# ---------------------------------------------------------------------------
# process_word — basic
# ---------------------------------------------------------------------------
class TestProcessWordBasic(unittest.TestCase):
def _seq_with_spoken(self, text: str, ctx: str = "ctx1", append: bool = True):
seq = _seq()
seq.register_spoken(_spoken_frame(text), ctx, _tracker(text), append)
return seq
def test_returns_tts_text_frame(self):
seq = self._seq_with_spoken("hello")
result = seq.process_word("hello", pts=100, context_id="ctx1")
self.assertEqual(len(result), 1)
self.assertIsInstance(result[0], TTSTextFrame)
def test_frame_text_and_pts(self):
seq = self._seq_with_spoken("hello")
result = seq.process_word("hello", pts=100, context_id="ctx1")
self.assertEqual(result[0].text, "hello")
self.assertEqual(result[0].pts, 100)
def test_frame_context_id(self):
seq = self._seq_with_spoken("hello", ctx="ctx99")
result = seq.process_word("hello", pts=1, context_id="ctx99")
self.assertEqual(result[0].context_id, "ctx99")
def test_append_to_context_true(self):
seq = self._seq_with_spoken("hello", append=True)
result = seq.process_word("hello", pts=1, context_id="ctx1")
self.assertTrue(result[0].append_to_context)
def test_append_to_context_false(self):
seq = self._seq_with_spoken("hello", append=False)
result = seq.process_word("hello", pts=1, context_id="ctx1")
self.assertFalse(result[0].append_to_context)
def test_non_completing_word_does_not_flush_skipped(self):
seq = self._seq_with_spoken("hello world")
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
result = seq.process_word("hello", pts=10, context_id="ctx1")
self.assertEqual(len(result), 1)
self.assertIsInstance(result[0], TTSTextFrame)
def test_completing_word_flushes_blocked_skipped_frame(self):
seq = self._seq_with_spoken("hello")
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None)
result = seq.process_word("hello", pts=50, context_id="ctx1")
self.assertEqual(len(result), 2)
self.assertIsInstance(result[0], TTSTextFrame)
self.assertIs(result[1], skipped)
def test_last_of_multiple_words_flushes_skipped(self):
seq = self._seq_with_spoken("hello world")
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None)
seq.process_word("hello", pts=10, context_id="ctx1")
result = seq.process_word("world", pts=20, context_id="ctx1")
self.assertTrue(any(f is skipped for f in result))
def test_no_active_slot_emits_passthrough(self):
seq = _seq()
result = seq.process_word("hello", pts=1, context_id="ctx-unknown")
self.assertEqual(len(result), 1)
self.assertIsInstance(result[0], TTSTextFrame)
self.assertEqual(result[0].text, "hello")
self.assertEqual(result[0].context_id, "ctx-unknown")
def test_passthrough_uses_default_append_to_context_true(self):
seq = _seq()
result = seq.process_word("hello", pts=1, context_id="ctx-unknown")
self.assertTrue(result[0].append_to_context)
def test_unrecognised_word_emits_passthrough(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
# "zzz" doesn't belong to "hello world" and there is no next slot
result = seq.process_word("zzz", pts=5, context_id="ctx1")
self.assertEqual(len(result), 1)
self.assertEqual(result[0].text, "zzz")
# ---------------------------------------------------------------------------
# process_word — raw_text propagation
# ---------------------------------------------------------------------------
class TestProcessWordRawText(unittest.TestCase):
def test_raw_text_split_across_word_frames(self):
seq = _seq()
seq.register_spoken(
_spoken_frame("4111 1111"),
"ctx1",
WordCompletionTracker("4111 1111", llm_text="<card>4111 1111</card>"),
append_to_context=True,
)
r1 = seq.process_word("4111", pts=10, context_id="ctx1")
r2 = seq.process_word("1111", pts=20, context_id="ctx1")
self.assertEqual(r1[0].raw_text, "<card>4111")
last_word_frames = [f for f in r2 if isinstance(f, TTSTextFrame)]
self.assertEqual(last_word_frames[0].raw_text, "1111</card>")
def test_raw_text_none_when_no_llm_text(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
result = seq.process_word("hello", pts=1, context_id="ctx1")
self.assertIsNone(result[0].raw_text)
# ---------------------------------------------------------------------------
# process_word — overflow (single token spanning two slots)
# ---------------------------------------------------------------------------
class TestProcessWordOverflow(unittest.TestCase):
def test_overflow_produces_two_tts_text_frames(self):
seq = _seq()
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(word_frames), 2)
self.assertEqual(word_frames[0].text, "abc")
self.assertEqual(word_frames[1].text, "def")
def test_overflow_assigns_correct_context_ids(self):
seq = _seq()
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(word_frames[0].context_id, "ctx1")
self.assertEqual(word_frames[1].context_id, "ctx2")
def test_overflow_completing_next_slot_flushes_skipped(self):
seq = _seq()
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
seq.register_spoken(_spoken_frame("def"), "ctx2", _tracker("def"), True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx3", None) # blocked behind ctx2
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
self.assertTrue(any(f is skipped for f in result))
def test_overflow_not_completing_next_slot_does_not_flush_skipped(self):
seq = _seq()
seq.register_spoken(_spoken_frame("abc"), "ctx1", _tracker("abc"), True)
seq.register_spoken(_spoken_frame("def ghi"), "ctx2", _tracker("def ghi"), True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx3", None)
# "abcdef" overflows: "def" goes to ctx2, but ctx2 still expects " ghi"
result = seq.process_word("abcdef", pts=100, context_id="ctx1")
self.assertFalse(any(f is skipped for f in result))
# ---------------------------------------------------------------------------
# process_word — force-complete via word_belongs_here failure
# ---------------------------------------------------------------------------
class TestProcessWordForcesComplete(unittest.TestCase):
def test_word_for_next_slot_force_completes_current(self):
"""When a word belongs to the next slot but not the current, the current
slot is force-completed and the word is routed to the next slot."""
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
# "world" doesn't belong to ctx1 but belongs to ctx2
result = seq.process_word("world", pts=50, context_id="ctx2")
word_frames = [f for f in result if isinstance(f, TTSTextFrame)]
texts = {f.text for f in word_frames}
self.assertIn("world", texts)
def test_force_complete_then_overflow_flushes_skipped(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx3", None)
# "world" force-completes ctx1 and completes ctx2 via overflow
result = seq.process_word("world", pts=50, context_id="ctx2")
self.assertTrue(any(f is skipped for f in result))
# ---------------------------------------------------------------------------
# force_complete
# ---------------------------------------------------------------------------
class TestForceComplete(unittest.TestCase):
def test_emits_remaining_text_when_word_dropped(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
seq.process_word("hello", pts=10, context_id="ctx1") # "world" never arrives
result = seq.force_complete(last_word_pts=50)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "world")
self.assertEqual(tts_frames[0].pts, 50)
def test_emits_full_text_when_no_words_arrived(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello world"), "ctx1", _tracker("hello world"), True)
result = seq.force_complete(last_word_pts=0)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "hello world")
def test_already_complete_slot_emits_nothing(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hi"), "ctx1", _tracker("hi"), True)
seq.process_word("hi", pts=5, context_id="ctx1") # completes normally
result = seq.force_complete(last_word_pts=10)
self.assertEqual(result, [])
def test_flushes_skipped_frames_after_completing(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None)
result = seq.force_complete(last_word_pts=20)
self.assertTrue(any(f is skipped for f in result))
self.assertTrue(skipped.append_to_context)
def test_propagates_raw_text(self):
seq = _seq()
seq.register_spoken(
_spoken_frame("4111 1111"),
"ctx1",
WordCompletionTracker("4111 1111", llm_text="<card>4111 1111</card>"),
append_to_context=True,
)
seq.process_word("4111", pts=10, context_id="ctx1") # "1111" never arrives
result = seq.force_complete(last_word_pts=20)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(tts_frames[0].text, "1111")
self.assertEqual(tts_frames[0].raw_text, "1111</card>")
def test_discards_corrupt_raw_remaining(self):
"""raw_remaining is discarded when it does not contain remaining_text."""
seq = _seq()
# "abc" normalized ≠ "xyz" normalized — any remaining won't be in raw_remaining
seq.register_spoken(
_spoken_frame("abc"),
"ctx1",
WordCompletionTracker("abc", llm_text="xyz"),
append_to_context=True,
)
result = seq.force_complete(last_word_pts=0)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "abc")
self.assertIsNone(tts_frames[0].raw_text) # discarded due to corruption
def test_slot_without_tracker_just_marks_complete_and_flushes(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", tracker=None, append_to_context=True)
skipped = _skipped_frame("code")
seq.register_skipped(skipped, "ctx2", None)
result = seq.force_complete(last_word_pts=0)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(tts_frames, []) # no tracker → no word frame
self.assertTrue(any(f is skipped for f in result))
def test_multiple_incomplete_slots_all_emitted(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.register_spoken(_spoken_frame("world"), "ctx2", _tracker("world"), True)
result = seq.force_complete(last_word_pts=0)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
texts = {f.text for f in tts_frames}
self.assertIn("hello", texts)
self.assertIn("world", texts)
# ---------------------------------------------------------------------------
# clear
# ---------------------------------------------------------------------------
class TestClear(unittest.TestCase):
def test_clears_slots(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.register_skipped(_skipped_frame("code"), "ctx2", None)
seq.clear()
self.assertEqual(seq._slots, [])
def test_clears_context_map(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.clear()
self.assertEqual(seq._context_append_to_context, {})
def test_after_clear_skipped_emits_immediately(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.clear()
frame = _skipped_frame("code")
result = seq.register_skipped(frame, "ctx2", None)
self.assertEqual(len(result), 1)
def test_after_clear_process_word_uses_passthrough(self):
seq = _seq()
seq.register_spoken(_spoken_frame("hello"), "ctx1", _tracker("hello"), True)
seq.clear()
result = seq.process_word("hello", pts=1, context_id="ctx1")
# No active slot after clear → passthrough
self.assertEqual(len(result), 1)
self.assertEqual(result[0].text, "hello")
# ---------------------------------------------------------------------------
# CJK languages — Korean, Japanese, Chinese
# ---------------------------------------------------------------------------
class TestCJKLanguages(unittest.TestCase):
"""Sequencer behaviour for CJK language scenarios.
Korean: Cartesia returns each word as a separate timestamp event (one word
per process_word call). Japanese/Chinese: Cartesia merges all characters
in one timestamp message into a single combined token before calling
process_word.
"""
# --- Korean ---
def test_korean_word_by_word_completes_slot_and_flushes_skipped(self):
"""Korean words fed one at a time complete the spoken slot and unblock a skipped frame."""
seq = _seq()
sentence = "저는 여러분의 AI 어시스턴트입니다."
words = ["저는", "여러분의", "AI", "어시스턴트입니다."]
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
skipped = _skipped_frame("[code]")
seq.register_skipped(skipped, "ctx2", None)
# Skipped stays blocked until the last word arrives
for word in words[:-1]:
partial = seq.process_word(word, pts=100, context_id="ctx1")
self.assertFalse(any(f is skipped for f in partial))
result = seq.process_word(words[-1], pts=200, context_id="ctx1")
self.assertTrue(any(f is skipped for f in result))
def test_korean_force_complete_emits_correct_remaining_text(self):
"""After one Korean word, force_complete emits the correct unspoken suffix."""
seq = _seq()
sentence = "저는 여러분의 AI 어시스턴트입니다."
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
seq.process_word("저는", pts=10, context_id="ctx1")
result = seq.force_complete(last_word_pts=50)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "여러분의 AI 어시스턴트입니다.")
self.assertEqual(tts_frames[0].pts, 50)
# --- Japanese ---
def test_japanese_combined_groups_complete_spoken_slot(self):
"""Two Cartesia-style combined Japanese groups complete the slot and flush skipped."""
seq = _seq()
sentence = "こんにちは、私はあなたの"
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
skipped = _skipped_frame("[skipped]")
seq.register_skipped(skipped, "ctx2", None)
r1 = seq.process_word("こんにちは、私", pts=100, context_id="ctx1")
self.assertFalse(any(f is skipped for f in r1))
r2 = seq.process_word("はあなたの", pts=200, context_id="ctx1")
self.assertTrue(any(f is skipped for f in r2))
def test_japanese_force_complete_emits_remaining_chars(self):
"""After the first Japanese combined group, force_complete emits the rest."""
seq = _seq()
sentence = "こんにちは、私はあなたの"
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
seq.process_word("こんにちは、私", pts=10, context_id="ctx1")
result = seq.force_complete(last_word_pts=50)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "はあなたの")
# --- Chinese ---
def test_chinese_combined_groups_complete_spoken_slot(self):
"""Two Cartesia-style combined Chinese groups complete the slot and flush skipped."""
seq = _seq()
sentence = "你好,我是你的智能"
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
skipped = _skipped_frame("[skipped]")
seq.register_skipped(skipped, "ctx2", None)
r1 = seq.process_word("你好,我是", pts=100, context_id="ctx1")
self.assertFalse(any(f is skipped for f in r1))
r2 = seq.process_word("你的智能", pts=200, context_id="ctx1")
self.assertTrue(any(f is skipped for f in r2))
def test_chinese_force_complete_emits_remaining_chars(self):
"""After the first Chinese combined group, force_complete emits the rest."""
seq = _seq()
sentence = "你好,我是你的智能"
seq.register_spoken(_spoken_frame(sentence), "ctx1", _tracker(sentence), True)
seq.process_word("你好,我是", pts=10, context_id="ctx1")
result = seq.force_complete(last_word_pts=50)
tts_frames = [f for f in result if isinstance(f, TTSTextFrame)]
self.assertEqual(len(tts_frames), 1)
self.assertEqual(tts_frames[0].text, "你的智能")
if __name__ == "__main__":
unittest.main()
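
The force_complete tests in this file assert that the unspoken suffix of a slot is emitted when word timestamps stop arriving, including for Korean, Japanese and Chinese text. A minimal sketch of that bookkeeping is shown below. `RemainingTextTracker` is a hypothetical stand-in written for illustration, not pipecat's `WordCompletionTracker`, and it assumes words are matched character by character with whitespace ignored.

```python
class RemainingTextTracker:
    """Hypothetical, simplified tracker; not pipecat's WordCompletionTracker."""

    def __init__(self, text: str):
        self._text = text
        self._pos = 0  # index of the first character not yet spoken

    def add_word(self, word: str) -> bool:
        """Consume `word` from the expected text; return False if it does not fit."""
        pos = self._pos
        for ch in word:
            if ch.isspace():
                continue
            # Skip whitespace in the expected text before comparing.
            while pos < len(self._text) and self._text[pos].isspace():
                pos += 1
            if pos >= len(self._text) or self._text[pos] != ch:
                return False  # state is left untouched on a mismatch
            pos += 1
        self._pos = pos
        return True

    @property
    def remaining(self) -> str:
        """Text that has not been matched by any word yet."""
        return self._text[self._pos:].strip()

    @property
    def is_complete(self) -> bool:
        return not self.remaining


tracker = RemainingTextTracker("hello world")
tracker.add_word("hello")  # "world" never arrives
print(tracker.remaining)  # world
```

Because matching is per character rather than per space-separated word, the same sketch handles a combined Japanese or Chinese token that covers several characters of the sentence at once.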


@@ -0,0 +1,45 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from unittest.mock import AsyncMock
import pytest
from websockets.protocol import State
from pipecat.services.cartesia.stt import CartesiaSTTService
class _FakeWebsocket:
def __init__(self, *, state=State.OPEN, send_side_effect=None):
self.state = state
self.send = AsyncMock(side_effect=send_side_effect)
@pytest.mark.asyncio
async def test_cartesia_connect_failure_clears_stale_websocket(monkeypatch):
async def fake_websocket_connect(*args, **kwargs):
raise RuntimeError("connection failed")
monkeypatch.setattr("pipecat.services.cartesia.stt.websocket_connect", fake_websocket_connect)
service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
service._websocket = _FakeWebsocket(state=State.CLOSED)
await service._connect_websocket()
assert service._websocket is None
@pytest.mark.asyncio
async def test_cartesia_run_stt_logs_send_failure_without_clearing_websocket():
service = CartesiaSTTService(api_key="test-key", sample_rate=16000)
websocket = _FakeWebsocket(send_side_effect=RuntimeError("websocket closed"))
service._websocket = websocket
async for _ in service.run_stt(b"\x00" * 160):
pass
assert service._websocket is websocket
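
The first test above expects a failed connection attempt to clear a stale websocket handle without raising. A minimal sketch of that pattern follows. `StaleSocketClient` and its injected `connect` coroutine are hypothetical names used for illustration; the real `CartesiaSTTService._connect_websocket` may be structured differently.

```python
import asyncio


class StaleSocketClient:
    """Hypothetical client; `connect` is an injected coroutine function."""

    def __init__(self, connect):
        self._connect = connect
        self._websocket = None
        self.last_error = None

    async def connect_websocket(self) -> None:
        try:
            self._websocket = await self._connect()
        except Exception as exc:
            # Connection failures are recorded instead of raised, and any
            # stale handle from a previous session is dropped.
            self._websocket = None
            self.last_error = exc


async def failing_connect():
    raise RuntimeError("connection failed")


client = StaleSocketClient(failing_connect)
client._websocket = object()  # pretend an old, closed socket is still attached
asyncio.run(client.connect_websocket())
print(client._websocket is None)  # True
```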


@@ -18,7 +18,7 @@ def _service(language: str) -> CartesiaTTSService:
def _process_word_timestamps(
words: list[str], starts: list[float], language: str
) -> list[tuple[str, float]]:
- return _service(language)._process_word_timestamps_for_language(words, starts)
+ return _service(language)._normalize_word_timestamps(words, starts)
def _concatenate_processed_timestamps(
@@ -27,7 +27,7 @@ def _concatenate_processed_timestamps(
service = _service(language)
text_parts = []
for words, starts in timestamp_groups:
- processed_timestamps = service._process_word_timestamps_for_language(words, starts)
+ processed_timestamps = service._normalize_word_timestamps(words, starts)
includes_inter_frame_spaces = service._word_timestamps_include_inter_frame_spaces()
text_parts.extend(
TextPartForConcatenation(


@@ -6,9 +6,14 @@
"""Tests for ElevenLabs TTS alignment handling."""
import json
from typing import Any
import pytest
from websockets.protocol import State
from pipecat.services.elevenlabs.tts import (
ElevenLabsTTSService,
_select_alignment,
_strip_utterance_leading_spaces,
calculate_word_times,
@@ -200,3 +205,87 @@ def test_select_alignment_works_with_http_field_names():
)
assert selected is not None
assert selected["characters"] == list(" Hi")
# ---------------------------------------------------------------------------
# Keepalive vs context-init race
#
# The keepalive must only stamp a context_id once its context-init (carrying
# voice_settings) has been sent. Stamping it earlier makes the keepalive the
# context's first message, with no voice_settings, and ElevenLabs rejects the
# later context-init with a 1008 policy violation.
# ---------------------------------------------------------------------------
class _FakeWebSocket:
"""Minimal stand-in for the ElevenLabs websocket that records sends."""
def __init__(self):
self.state = State.OPEN
self.sent: list[dict] = []
async def send(self, data: str):
self.sent.append(json.loads(data))
def _make_service() -> ElevenLabsTTSService:
return ElevenLabsTTSService(
api_key="test-key",
settings=ElevenLabsTTSService.Settings(
voice="test-voice",
stability=0.55,
similarity_boost=0.85,
use_speaker_boost=True,
speed=0.81,
),
)
@pytest.mark.asyncio
async def test_keepalive_does_not_stamp_context_before_init():
"""During the pre-init window the keepalive must not stamp the new context_id."""
service = _make_service()
ws = _FakeWebSocket()
service._websocket = ws
# Simulate the start of an LLM turn: TTSService sets the turn context id on
# LLMFullResponseStartFrame, before run_tts sends the voice_settings init.
service._turn_context_id = "ctx-1"
service._playing_context_id = None
assert "ctx-1" not in service._context_init_sent
await service._send_keepalive()
# Context-less keepalive: the real context-init stays the context's first
# message, so ElevenLabs won't reject it with 1008.
assert ws.sent == [{"text": ""}]
@pytest.mark.asyncio
async def test_keepalive_stamps_context_after_init():
"""Once the context-init has been sent, the keepalive targets that context."""
service = _make_service()
ws = _FakeWebSocket()
service._websocket = ws
service._turn_context_id = "ctx-1"
service._playing_context_id = None
# run_tts records the context once its voice_settings init has gone out.
service._context_init_sent.add("ctx-1")
await service._send_keepalive()
assert ws.sent == [{"text": "", "context_id": "ctx-1"}]
@pytest.mark.asyncio
async def test_keepalive_without_active_context_sends_empty():
"""With no active context, the keepalive sends a plain empty message."""
service = _make_service()
ws = _FakeWebSocket()
service._websocket = ws
service._turn_context_id = None
service._playing_context_id = None
await service._send_keepalive()
assert ws.sent == [{"text": ""}]
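
The three keepalive tests above pin down one rule: a keepalive may carry a `context_id` only after that context's init message has been sent. A minimal sketch of the payload decision follows. `build_keepalive` is a hypothetical helper, and the precedence of the playing context over the turn context is an assumption; the tests only exercise the case where the playing context is `None`.

```python
def build_keepalive(turn_context_id, playing_context_id, context_init_sent):
    """Hypothetical helper: build the keepalive payload for the current state."""
    # Assumption: the playing context takes precedence over the turn context.
    context_id = playing_context_id or turn_context_id
    message = {"text": ""}
    # Only stamp the context once its init (with voice_settings) has gone out,
    # so the keepalive never becomes the context's first message.
    if context_id is not None and context_id in context_init_sent:
        message["context_id"] = context_id
    return message


print(build_keepalive("ctx-1", None, set()))      # {'text': ''}
print(build_keepalive("ctx-1", None, {"ctx-1"}))  # {'text': '', 'context_id': 'ctx-1'}
```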

tests/test_runner_run.py Normal file

@@ -0,0 +1,324 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import io
import sys
import types
import unittest
from contextlib import redirect_stdout
from unittest.mock import MagicMock, patch
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel
from pipecat.runner.run import (
_print_startup_message,
_setup_daily_routes,
_setup_telephony_routes,
_setup_unified_start_route,
_setup_webrtc_routes,
_setup_websocket_routes,
_transport_route_dependencies,
_transport_routes_enabled,
)
class TestRunnerRun(unittest.TestCase):
def _capture_startup_message(self, args: argparse.Namespace) -> str:
buffer = io.StringIO()
with redirect_stdout(buffer):
_print_startup_message(args)
return buffer.getvalue()
def test_transport_route_dependencies_maps_transports_to_modules(self):
self.assertEqual(_transport_route_dependencies("daily"), ("daily",))
self.assertEqual(_transport_route_dependencies("webrtc"), ("aiortc",))
self.assertEqual(_transport_route_dependencies("websocket"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("telephony"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("twilio"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("telnyx"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("plivo"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("exotel"), ("fastapi", "websockets"))
self.assertEqual(_transport_route_dependencies("vonage"), ())
def test_transport_routes_enabled_maps_transports_to_dependency_checks(self):
def module_available(module: str) -> bool:
return module in {"fastapi", "websockets"}
with patch("pipecat.runner.run._is_module_available", side_effect=module_available):
self.assertFalse(_transport_routes_enabled("daily"))
self.assertFalse(_transport_routes_enabled("webrtc"))
self.assertTrue(_transport_routes_enabled("websocket"))
self.assertTrue(_transport_routes_enabled("telephony"))
self.assertTrue(_transport_routes_enabled("twilio"))
self.assertTrue(_transport_routes_enabled("vonage"))
def test_setup_webrtc_routes_skips_when_aiortc_is_missing(self):
"""WebRTC routes should be optional when the webrtc extra is not installed."""
app = FastAPI()
args = argparse.Namespace(folder=None, esp32=False, host="localhost")
with (
patch("pipecat.runner.run._transport_routes_enabled", return_value=False),
patch("pipecat.runner.run.logger") as logger,
):
_setup_webrtc_routes(app, args, {})
paths = {route.path for route in app.routes}
self.assertNotIn("/api/offer", paths)
logger.info.assert_not_called()
def test_setup_webrtc_routes_registers_routes_when_webrtc_is_available(self):
"""WebRTC routes should be registered when dependencies are available."""
app = FastAPI()
args = argparse.Namespace(folder=None, esp32=False, host="localhost")
connection_module = types.ModuleType("pipecat.transports.smallwebrtc.connection")
connection_module.SmallWebRTCConnection = MagicMock()
request_handler_module = types.ModuleType("pipecat.transports.smallwebrtc.request_handler")
class IceCandidate(BaseModel):
candidate: str
sdp_mid: str
sdp_mline_index: int
class SmallWebRTCPatchRequest(BaseModel):
pc_id: str
candidates: list[IceCandidate] = []
class SmallWebRTCRequest(BaseModel):
sdp: str
type: str
pc_id: str | None = None
restart_pc: bool | None = None
request_data: dict | None = None
request_handler_module.IceCandidate = IceCandidate
request_handler_module.SmallWebRTCPatchRequest = SmallWebRTCPatchRequest
request_handler_module.SmallWebRTCRequest = SmallWebRTCRequest
class MockSmallWebRTCRequestHandler:
def __init__(self, *args, **kwargs):
pass
async def close(self):
pass
request_handler_module.SmallWebRTCRequestHandler = MockSmallWebRTCRequestHandler
with (
patch("pipecat.runner.run._transport_routes_enabled", return_value=True),
patch.dict(
sys.modules,
{
"pipecat.transports.smallwebrtc.connection": connection_module,
"pipecat.transports.smallwebrtc.request_handler": request_handler_module,
},
),
):
_setup_webrtc_routes(app, args, {})
paths = {route.path for route in app.routes}
self.assertIn("/api/offer", paths)
self.assertIn("/files/{filename:path}", paths)
def test_setup_websocket_routes_skips_when_websocket_is_missing(self):
"""Plain WebSocket routes should be optional."""
app = FastAPI()
args = argparse.Namespace()
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
_setup_websocket_routes(app, args)
paths = {route.path for route in app.routes}
self.assertNotIn("/ws-client", paths)
def test_setup_websocket_routes_registers_when_websocket_is_available(self):
"""Plain WebSocket route should be registered when dependencies are available."""
app = FastAPI()
args = argparse.Namespace()
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
_setup_websocket_routes(app, args)
paths = {route.path for route in app.routes}
self.assertIn("/ws-client", paths)
def test_setup_telephony_routes_skips_when_websocket_is_missing(self):
"""Telephony WebSocket routes should be optional."""
app = FastAPI()
args = argparse.Namespace(transport=None)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
_setup_telephony_routes(app, args)
paths = {route.path for route in app.routes}
self.assertNotIn("/ws", paths)
def test_setup_telephony_routes_registers_when_websocket_is_available(self):
"""Telephony WebSocket route should be registered when dependencies are available."""
app = FastAPI()
args = argparse.Namespace(transport=None)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
_setup_telephony_routes(app, args)
paths = {route.path for route in app.routes}
self.assertIn("/ws", paths)
def test_setup_telephony_routes_registers_provider_webhook_for_selected_transport(self):
"""Provider webhook route should be registered for selected telephony transports."""
app = FastAPI()
args = argparse.Namespace(transport="twilio", proxy="example.ngrok.io")
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
_setup_telephony_routes(app, args)
post_root_routes = [
route for route in app.routes if route.path == "/" and "POST" in route.methods
]
self.assertEqual(len(post_root_routes), 1)
def test_setup_daily_routes_skips_when_daily_is_missing(self):
"""Daily routes should be optional."""
app = FastAPI()
args = argparse.Namespace(dialin=False)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
_setup_daily_routes(app, args)
paths = {route.path for route in app.routes}
self.assertNotIn("/daily", paths)
def test_setup_daily_routes_registers_when_daily_is_available(self):
"""Daily route should be registered when dependencies are available."""
app = FastAPI()
args = argparse.Namespace(dialin=False)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
_setup_daily_routes(app, args)
paths = {route.path for route in app.routes}
self.assertIn("/daily", paths)
def test_setup_daily_routes_registers_dialin_route_when_enabled(self):
"""Daily dial-in route should be registered when requested and available."""
app = FastAPI()
args = argparse.Namespace(dialin=True)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
_setup_daily_routes(app, args)
paths = {route.path for route in app.routes}
self.assertIn("/daily", paths)
self.assertIn("/daily-dialin-webhook", paths)
def test_websocket_routes_require_fastapi_and_websockets(self):
with patch(
"pipecat.runner.run._is_module_available",
side_effect=lambda module: module == "fastapi",
) as is_module_available:
self.assertFalse(_transport_routes_enabled("websocket"))
self.assertEqual(
[call.args[0] for call in is_module_available.call_args_list],
["fastapi", "websockets"],
)
def test_start_rejects_disabled_transport_before_running_bot(self):
app = FastAPI()
args = argparse.Namespace(transport=None)
_setup_unified_start_route(app, args, {})
with patch("pipecat.runner.run._transport_routes_enabled", return_value=False):
response = TestClient(app).post("/start", json={"transport": "daily"})
self.assertEqual(response.status_code, 400)
self.assertEqual(
response.json()["detail"],
(
"Transport 'daily' is disabled in this runner environment. "
"Check the startup banner for enabled transports."
),
)
def test_startup_message_all_transports_shows_open_url_and_transport_status(self):
args = argparse.Namespace(transport=None, host="localhost", port=7860)
def routes_enabled(transport: str) -> bool:
return transport in {"telephony", "websocket"}
with patch("pipecat.runner.run._transport_routes_enabled", side_effect=routes_enabled):
output = self._capture_startup_message(args)
self.assertEqual(
output,
(
"\n"
"🚀 Bot ready!\n"
" → Open: http://localhost:7860\n"
" → Enabled transports: telephony, websocket\n"
" → Disabled transports: daily (install pipecat-ai[daily]), "
"webrtc (install pipecat-ai[webrtc])\n"
"\n"
),
)
def test_startup_message_all_transports_omits_disabled_status_when_all_enabled(self):
args = argparse.Namespace(transport=None, host="localhost", port=7860)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
output = self._capture_startup_message(args)
self.assertEqual(
output,
(
"\n"
"🚀 Bot ready!\n"
" → Open: http://localhost:7860\n"
" → Enabled transports: daily, webrtc, telephony, websocket\n"
"\n"
),
)
def test_startup_message_webrtc_uses_root_open_url(self):
args = argparse.Namespace(
transport="webrtc", host="localhost", port=7860, esp32=False, whatsapp=False
)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
output = self._capture_startup_message(args)
self.assertIn(" → Open: http://localhost:7860\n", output)
self.assertNotIn("/client", output)
def test_startup_message_daily_uses_root_open_url(self):
args = argparse.Namespace(transport="daily", host="localhost", port=7860, dialin=False)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
output = self._capture_startup_message(args)
self.assertIn(" → Open: http://localhost:7860\n", output)
self.assertNotIn("/daily in your browser", output)
def test_startup_message_telephony_keeps_provider_endpoint_details(self):
args = argparse.Namespace(
transport="twilio", host="localhost", port=7860, proxy="example.ngrok.io"
)
with patch("pipecat.runner.run._transport_routes_enabled", return_value=True):
output = self._capture_startup_message(args)
self.assertIn(" → Open: http://localhost:7860\n", output)
self.assertIn(" → XML webhook: http://localhost:7860/\n", output)
self.assertIn(" → WebSocket: ws://localhost:7860/ws\n", output)
if __name__ == "__main__":
unittest.main()
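
The dependency tests in this file imply a simple design: each transport maps to a tuple of importable modules, and its routes are enabled only when every module in that tuple is available. The sketch below illustrates that idea under stated assumptions. The function names mirror the tested helpers without the leading underscore, but the bodies are a guess at the behaviour, not the code in `pipecat.runner.run`; in particular, returning an empty tuple for any unlisted transport (such as `vonage`) is an assumption.

```python
from importlib.util import find_spec

_WEBSOCKET_DEPS = ("fastapi", "websockets")

# Mapping taken from the assertions in the tests above.
_TRANSPORT_DEPS = {
    "daily": ("daily",),
    "webrtc": ("aiortc",),
    "websocket": _WEBSOCKET_DEPS,
    "telephony": _WEBSOCKET_DEPS,
    "twilio": _WEBSOCKET_DEPS,
    "telnyx": _WEBSOCKET_DEPS,
    "plivo": _WEBSOCKET_DEPS,
    "exotel": _WEBSOCKET_DEPS,
}


def transport_route_dependencies(transport: str) -> tuple:
    # Assumption: transports without an entry need no optional modules.
    return _TRANSPORT_DEPS.get(transport, ())


def is_module_available(module: str) -> bool:
    try:
        return find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def transport_routes_enabled(transport: str, is_available=is_module_available) -> bool:
    # all() stops at the first missing module and is True for an empty tuple.
    return all(is_available(m) for m in transport_route_dependencies(transport))


def only_websocket_stack(module: str) -> bool:
    return module in {"fastapi", "websockets"}


print(transport_routes_enabled("daily", only_websocket_stack))      # False
print(transport_routes_enabled("websocket", only_websocket_stack))  # True
```

Injecting `is_available` keeps the sketch testable without patching, which plays the same role as the `patch("pipecat.runner.run._is_module_available", ...)` calls in the tests.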


@@ -5,8 +5,10 @@
#
import json
from unittest.mock import AsyncMock
import pytest
from websockets.protocol import State
from pipecat.frames.frames import TranscriptionFrame
from pipecat.services.soniox.stt import END_TOKEN, SonioxSTTService, _language_from_tokens
@@ -14,8 +16,10 @@ from pipecat.transcriptions.language import Language
class _FakeWebsocket:
- def __init__(self, messages):
+ def __init__(self, messages, *, state=State.OPEN, send_side_effect=None):
self._messages = messages
+ self.state = state
+ self.send = AsyncMock(side_effect=send_side_effect)
def __aiter__(self):
return self._iter_messages()
@@ -25,6 +29,21 @@ class _FakeWebsocket:
yield message
@pytest.mark.asyncio
async def test_connect_failure_clears_stale_websocket_without_raising(monkeypatch):
async def fake_websocket_connect(*args, **kwargs):
raise RuntimeError("connection failed")
monkeypatch.setattr("pipecat.services.soniox.stt.websocket_connect", fake_websocket_connect)
service = SonioxSTTService(api_key="test-key")
service._websocket = _FakeWebsocket([], state=State.CLOSED)
await service._connect_websocket()
assert service._websocket is None
def test_language_from_tokens_uses_single_recognized_language():
tokens = [
{"text": "Hello", "language": "en"},


@@ -21,6 +21,13 @@ repeated for each TTSSpeakFrame, with no cross-group contamination.
Also covers LLM response flow with push_text_frames=True (non-word-timestamp TTS):
verifies TTSTextFrame ordering relative to LLMFullResponseEndFrame.
Also covers smart-text / WordCompletionTracker features:
- Skipped frames (skip_aggregator_types) held until preceding spoken slots complete.
- raw_text on AggregatedTextFrame propagated as spans to TTSTextFrames.
- Overflow: a single TTS word straddling two AggregatedTextFrame boundaries produces
two correctly-attributed TTSTextFrames.
- Force-complete safety net: skipped frames flush even when TTS drops word timestamps.
Also covers the interruption-during-pause deadlock scenario (see test_no_deadlock_on_interrupt_*).
"""
@@ -50,6 +57,7 @@ from pipecat.frames.frames import (
)
from pipecat.services.tts_service import TTSService
from pipecat.tests.utils import SleepFrame, run_test
from pipecat.utils.text.base_text_aggregator import AggregationType
# ---------------------------------------------------------------------------
# Test-only frame
@@ -422,7 +430,7 @@ def _assert_group_ordering(
# All frames between TTSStartedFrame and TTSStoppedFrame must be audio.
mid_types = types[started_idx + 1 : stopped_idx]
for t in mid_types:
- assert t is TTSAudioRawFrame, (
+ assert t in (TTSAudioRawFrame, TTSTextFrame), (
f"Group {foo_label!r}: unexpected frame {t.__name__!r} between "
f"TTSStartedFrame and TTSStoppedFrame. Got: {type_names}"
)
@@ -551,7 +559,7 @@ async def test_http_push_text_llm_response_end_after_tts_text():
@pytest.mark.asyncio
async def test_http_word_timestamps_verbatim_tokens():
- """HTTP path: text, PTS order, flag, and text-before-audio are all verified.
+ """HTTP path: text, PTS order, and text-before-audio are all verified.
Word timestamps arrive in the audio context queue before the audio frame.
_handle_audio_context caches them, then flushes when the first audio frame
@@ -572,7 +580,6 @@ async def test_http_word_timestamps_verbatim_tokens():
audio_frames = [f for f in down if isinstance(f, TTSAudioRawFrame)]
assert [f.text for f in tts_text_frames] == ["hello", "world"]
- assert all(f.includes_inter_frame_spaces is True for f in tts_text_frames)
pts_values = [f.pts for f in tts_text_frames]
assert pts_values == sorted(pts_values) and len(set(pts_values)) == len(pts_values), (
@@ -590,15 +597,14 @@ async def test_http_word_timestamps_verbatim_tokens():
@pytest.mark.asyncio
async def test_http_word_timestamps_punctuation_tokens():
- """Verbatim punctuation tokens are preserved with flag=True; default flag is False.
+ """Punct-only tokens are merged into the preceding word when includes_inter_frame_spaces=True.
- Models the Inworld API scenario: the TTS returns tokens exactly as sent.
- Space placement rule:
- - word-follows-word: space is the leading char of the next word (e.g. " world")
- - word-follows-punctuation: space is the trailing char of the punctuation token
- (e.g. "! "), so the following word token carries no leading space.
- The flag must reach every frame and the text must not be modified.
- Also acts as a regression guard that flag=False is the default.
+ Models the Inworld API scenario: the TTS returns separate space and punctuation
+ tokens. add_word_timestamps calls merge_punct_tokens when includes_inter_frame_spaces
+ is True, collapsing those tokens into the preceding word before the tracker sees them.
+ With flag=False (default) tokens are forwarded as-is; the tracker strips leading/
+ trailing whitespace from each frame word via get_word_for_frame().
"""
verbatim_tokens = [
("hello", 0.0),
@@ -609,9 +615,9 @@ async def test_http_word_timestamps_punctuation_tokens():
(" you", 0.75),
("?", 0.9),
]
- expected_texts = ["hello", " world", "! ", "How", " are", " you", "?"]
- # With flag=True: all tokens verbatim, all frames carry the flag.
+ # With flag=True: punct-only tokens ("! " and "?") are merged into the preceding
+ # words (" world" → " world! " and " you" → " you?"), then stripped by the tracker.
tts_ifs = _MockWordTimestampHttpTTSService(
includes_inter_frame_spaces=True,
word_times=verbatim_tokens,
@@ -621,12 +627,11 @@ async def test_http_word_timestamps_punctuation_tokens():
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
)
text_frames_ifs = [f for f in frames_ifs[0] if isinstance(f, TTSTextFrame)]
- assert [f.text for f in text_frames_ifs] == expected_texts, (
- "Verbatim tokens must not be modified"
+ assert [f.text for f in text_frames_ifs] == ["hello", "world!", "How", "are", "you?"], (
+ "Punct-only tokens must be merged into the preceding word"
)
- assert all(f.includes_inter_frame_spaces is True for f in text_frames_ifs)
- # With flag=False (default): same tokens, flag must be False on every frame.
+ # With flag=False (default): no merging; tracker strips leading/trailing spaces.
tts_plain = _MockWordTimestampHttpTTSService(
word_times=verbatim_tokens,
)
@@ -635,13 +640,12 @@ async def test_http_word_timestamps_punctuation_tokens():
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
)
text_frames_plain = [f for f in frames_plain[0] if isinstance(f, TTSTextFrame)]
- assert [f.text for f in text_frames_plain] == expected_texts
- assert all(f.includes_inter_frame_spaces is False for f in text_frames_plain)
+ assert [f.text for f in text_frames_plain] == ["hello", "world", "!", "How", "are", "you", "?"]
@pytest.mark.asyncio
async def test_websocket_word_timestamps_verbatim_tokens():
- """WebSocket path: _WordTimestampEntry carries verbatim text, PTS, and flag.
+ """WebSocket path: text, PTS order, and text-before-audio are all verified.
Unlike the HTTP path the word timestamps are sent asynchronously from a
background task. They arrive before the audio frame and are cached until
@@ -662,7 +666,6 @@ async def test_websocket_word_timestamps_verbatim_tokens():
audio_frames = [f for f in down if isinstance(f, TTSAudioRawFrame)]
assert [f.text for f in tts_text_frames] == ["hello", "world"]
- assert all(f.includes_inter_frame_spaces is True for f in tts_text_frames)
pts_values = [f.pts for f in tts_text_frames]
assert pts_values == sorted(pts_values) and len(set(pts_values)) == len(pts_values), (
@@ -678,7 +681,7 @@ async def test_websocket_word_timestamps_verbatim_tokens():
@pytest.mark.asyncio
async def test_websocket_word_timestamps_punctuation_tokens():
"""WebSocket path: verbatim punctuation tokens reach TTSTextFrame unchanged."""
"""WebSocket path: punct-only tokens are merged into the preceding word."""
verbatim_tokens = [
("hello", 0.0),
(" world", 0.15),
@@ -697,10 +700,443 @@ async def test_websocket_word_timestamps_punctuation_tokens():
frames_to_send=[TTSSpeakFrame(text="hello world! How are you?", append_to_context=False)],
)
text_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in text_frames] == ["hello", " world", "! ", "How", " are", " you", "?"], (
"Verbatim tokens must not be modified"
assert [f.text for f in text_frames] == ["hello", "world!", "How", "are", "you?"], (
"Punct-only tokens must be merged into the preceding word"
)
# ---------------------------------------------------------------------------
# Per-call word-timestamp mock (for overflow tests)
# ---------------------------------------------------------------------------
class _MockPerCallWordTimestampHttpTTSService(TTSService):
"""HTTP-style TTS where each run_tts() call consumes its own word-time list.
Designed for tests that need different word tokens per sentence. The
``word_times_per_call`` list is consumed in order; an empty inner list means
no word-timestamp events are emitted for that call.
"""
def __init__(
self,
word_times_per_call: list[list[tuple[str, float]]],
**kwargs,
):
super().__init__(
push_start_frame=True,
push_stop_frames=True,
push_text_frames=False,
sample_rate=_SAMPLE_RATE,
**kwargs,
)
self._word_times_queue = list(word_times_per_call)
def can_generate_metrics(self) -> bool:
return False
async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
word_times = self._word_times_queue.pop(0) if self._word_times_queue else []
if word_times:
await self.add_word_timestamps(word_times, context_id=context_id)
yield TTSAudioRawFrame(
audio=_FAKE_AUDIO,
sample_rate=_SAMPLE_RATE,
num_channels=1,
context_id=context_id,
)
# ---------------------------------------------------------------------------
# Tests: skipped frame ordering (skip_aggregator_types)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_http_skipped_frame_waits_for_spoken_words():
"""Skipped frames are held until the preceding spoken slot's word timestamps
are all processed, then flushed in order (HTTP / synchronous audio path).
Sequence sent:
AggregatedTextFrame("hello world", SENTENCE) — spoken; yields 2 TTSTextFrames
AggregatedTextFrame("some code", "code") — in skip_aggregator_types; must wait
Expected downstream order:
TTSTextFrame("hello")
TTSTextFrame("world")
AggregatedTextFrame("some code", append_to_context=True)
"""
tts = _MockWordTimestampHttpTTSService(skip_aggregator_types=["code"])
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame("hello world", AggregationType.SENTENCE),
AggregatedTextFrame("some code", "code"),
],
)
down = frames_received[0]
word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]
assert [f.text for f in word_frames] == ["hello", "world"]
assert len(skipped) == 1
assert skipped[0].append_to_context is True
last_word_idx = max(down.index(f) for f in word_frames)
skipped_idx = down.index(skipped[0])
assert skipped_idx > last_word_idx, (
f"Skipped frame (pos {skipped_idx}) must appear after last word frame (pos {last_word_idx})"
)
@pytest.mark.asyncio
async def test_ws_skipped_frame_waits_for_spoken_words():
"""Same ordering guarantee on the WebSocket / async audio delivery path.
Because audio is delivered from a background task after asyncio.sleep(), the
skipped frame arrives at _push_frame_respecting_previous_aggregated_frame
*before* the spoken slot's word timestamps have been processed, directly
exercising the hold-and-flush path.
"""
tts = _MockWordTimestampWSTTSService(skip_aggregator_types=["code"])
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame("hello world", AggregationType.SENTENCE),
AggregatedTextFrame("some code", "code"),
],
)
down = frames_received[0]
word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]
assert [f.text for f in word_frames] == ["hello", "world"]
assert len(skipped) == 1
assert skipped[0].append_to_context is True
last_word_idx = max(down.index(f) for f in word_frames)
skipped_idx = down.index(skipped[0])
assert skipped_idx > last_word_idx, (
f"Skipped frame (pos {skipped_idx}) must appear after last word frame (pos {last_word_idx})"
)
@pytest.mark.asyncio
async def test_skipped_frame_before_spoken_emits_immediately():
"""A skipped frame with no preceding spoken slot is emitted right away.
Sequence:
AggregatedTextFrame("some code", "code") — no spoken slot before it → emits now
AggregatedTextFrame("hello world", SENTENCE) — spoken; TTSTextFrames follow
Expected: AggregatedTextFrame("some code") appears *before* TTSTextFrame("hello").
"""
tts = _MockWordTimestampHttpTTSService(skip_aggregator_types=["code"])
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame("some code", "code"),
AggregatedTextFrame("hello world", AggregationType.SENTENCE),
],
)
down = frames_received[0]
word_frames = [f for f in down if isinstance(f, TTSTextFrame)]
skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]
assert len(skipped) == 1
assert skipped[0].append_to_context is True
assert len(word_frames) >= 1
skipped_idx = down.index(skipped[0])
first_word_idx = down.index(word_frames[0])
assert skipped_idx < first_word_idx, (
f"Skipped frame (pos {skipped_idx}) must appear before first word frame (pos {first_word_idx})"
)
@pytest.mark.asyncio
async def test_skipped_frame_flushed_when_word_timestamps_incomplete():
"""Force-complete path: skipped frame still emits when the TTS drops word timestamps.
Only one of the two expected tokens ("hello") is returned. The spoken slot never
reaches its expected character count through the normal path. When
on_audio_context_done fires it force-completes any remaining spoken slots and
flushes the waiting skipped frame.
"""
tts = _MockWordTimestampHttpTTSService(
word_times=[("hello", 0.0)], # "world" is never sent
skip_aggregator_types=["code"],
)
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame("hello world", AggregationType.SENTENCE),
AggregatedTextFrame("some code", "code"),
],
)
down = frames_received[0]
skipped = [f for f in down if isinstance(f, AggregatedTextFrame) and f.text == "some code"]
assert len(skipped) == 1, "Skipped frame must be flushed via force-complete safety net"
assert skipped[0].append_to_context is True
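Reviewer note: the four ordering tests above describe a hold-and-flush behaviour. The toy model below is only a sketch of that idea, not the TTSService implementation. The class `HoldAndFlushQueue` and all of its method names are hypothetical and exist only for illustration. It assumes that a skipped frame is released once every spoken slot queued before it has been marked complete, whether by the normal path or by a force-complete.

```python
from collections import deque


class HoldAndFlushQueue:
    """Toy model (hypothetical): skipped frames wait for earlier spoken slots."""

    def __init__(self):
        self._slots = deque()  # entries are [kind, payload, done]
        self.emitted = []      # downstream order, kept for inspection

    def add_spoken(self, slot_id):
        # A spoken slot blocks later skipped frames until it is completed.
        self._slots.append(["spoken", slot_id, False])

    def add_skipped(self, frame):
        # A skipped frame is queued, then emitted at once if nothing blocks it.
        self._slots.append(["skipped", frame, False])
        self._flush()

    def emit_word(self, word):
        # Word frames go downstream as soon as their timestamps arrive.
        self.emitted.append(word)

    def complete_spoken(self, slot_id):
        # Called when all words arrived, or when the slot is force-completed.
        for slot in self._slots:
            if slot[0] == "spoken" and slot[1] == slot_id:
                slot[2] = True
        self._flush()

    def _flush(self):
        # Pop from the head until an incomplete spoken slot is found.
        while self._slots:
            kind, payload, done = self._slots[0]
            if kind == "skipped":
                self.emitted.append(payload)
            elif not done:
                break
            self._slots.popleft()
```

In this model, queuing a spoken slot and then a skipped frame leaves the skipped frame held until `complete_spoken` runs, while a skipped frame queued first is emitted right away.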
# ---------------------------------------------------------------------------
# Tests: raw_text propagation through WordCompletionTracker
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_raw_text_propagated_to_tts_text_frames():
"""raw_text on AggregatedTextFrame is split across TTSTextFrames by the tracker.
The frame carries raw_text="<card>4111 1111</card>" while the TTS-prepared
text is "4111 1111". The WordCompletionTracker advances a cursor through the
raw text in step with incoming word tokens, so each TTSTextFrame receives the
exact raw span it represents.
Expected (trailing whitespace stripped because includes_inter_frame_spaces=False):
TTSTextFrame("4111").raw_text == "<card>4111"
TTSTextFrame("1111").raw_text == "1111</card>"
"""
tts = _MockWordTimestampHttpTTSService()
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame(
"4111 1111", AggregationType.SENTENCE, raw_text="<card>4111 1111</card>"
)
],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in word_frames] == ["4111", "1111"]
# get_raw_consumed() strips trailing whitespace when includes_inter_frame_spaces=False
assert word_frames[0].raw_text == "<card>4111"
assert word_frames[1].raw_text == "1111</card>"
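Reviewer note: the docstring above says the tracker advances a cursor through the raw text in step with incoming word tokens. A minimal sketch of that idea follows. The function `split_raw_text` is hypothetical and is not the WordCompletionTracker API. It assumes each word occurs in the raw text at or after the cursor, and that the last word takes whatever raw text remains.

```python
def split_raw_text(raw_text, words):
    """Hypothetical sketch: map each spoken word to the raw span it covers."""
    spans = []
    cursor = 0
    for i, word in enumerate(words):
        idx = raw_text.find(word, cursor)
        if idx == -1:
            # Word not found in the raw text: no span can be attributed.
            spans.append("")
            continue
        end = idx + len(word)
        if i == len(words) - 1:
            # The final word also absorbs any trailing markup.
            end = len(raw_text)
        # Strip surrounding whitespace, mirroring the
        # includes_inter_frame_spaces=False case described in the test.
        spans.append(raw_text[cursor:end].strip())
        cursor = end
    return spans
```

Applied to the values used in the test, `split_raw_text("<card>4111 1111</card>", ["4111", "1111"])` gives the two spans the assertions expect.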
# ---------------------------------------------------------------------------
# Tests: overflow — TTS word spanning two AggregatedTextFrame boundaries
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_overflow_word_spanning_two_aggregated_frames():
"""A single TTS token straddling two AggregatedTextFrame boundaries produces
two correctly-attributed TTSTextFrames.
Setup:
Frame 1: AggregatedTextFrame("abc", SENTENCE)
Frame 2: AggregatedTextFrame("def", SENTENCE)
The TTS for frame 1 returns the single token "abcdef", which overshoots
frame 1 by three characters. _emit_overflow_word splits it:
TTSTextFrame("abc") — frame 1's portion (context_id = ctx1)
TTSTextFrame("def") — overflow attributed to frame 2 (context_id = ctx2)
Frame 2 receives no word-timestamp events because the overflow already
consumed its expected text.
"""
tts = _MockPerCallWordTimestampHttpTTSService(
word_times_per_call=[
[("abcdef", 0.0)], # frame 1: single token spanning both frames
[], # frame 2: no word timestamps (overflow already covered it)
]
)
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame("abc", AggregationType.SENTENCE),
AggregatedTextFrame("def", AggregationType.SENTENCE),
],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in word_frames] == ["abc", "def"], (
f"Expected ['abc', 'def'] but got {[f.text for f in word_frames]}"
)
assert word_frames[0].context_id != word_frames[1].context_id, (
"Overflow TTSTextFrame must carry frame 2's context_id, not frame 1's"
)
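Reviewer note: the overflow case above can be pictured as attributing token characters to frames by expected length. The sketch below is an illustration only. `attribute_tokens` is a hypothetical helper, not `_emit_overflow_word`, and it assumes tokens match the expected text character for character with no whitespace to skip.

```python
def attribute_tokens(expected_texts, tokens):
    """Hypothetical sketch: assign token text to frames, splitting at boundaries.

    Returns a list of (frame_index, text) pairs in emission order.
    """
    out = []
    frame = 0
    remaining = len(expected_texts[0]) if expected_texts else 0
    for token in tokens:
        while token and frame < len(expected_texts):
            # Take only as many characters as the current frame still expects.
            head, token = token[:remaining], token[remaining:]
            out.append((frame, head))
            remaining -= len(head)
            if remaining == 0:
                # Current frame is full: the rest of the token overflows
                # into the next frame.
                frame += 1
                if frame < len(expected_texts):
                    remaining = len(expected_texts[frame])
    return out
```

With the test's inputs, the single token "abcdef" is split into "abc" for the first frame and "def" for the second, which is why the second call needs no word timestamps of its own.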
# ---------------------------------------------------------------------------
# Per-call word-timestamp mock for WebSocket path (for force-complete tests)
# ---------------------------------------------------------------------------
class _MockPerCallWordTimestampWSTTSService(TTSService):
"""WebSocket-style TTS where each run_tts() call consumes its own word-time list.
Mirrors _MockPerCallWordTimestampHttpTTSService but uses the async audio-context
delivery pattern so it exercises _handle_audio_context (the WebSocket path).
An empty inner list means no word-timestamp events are emitted for that call.
"""
def __init__(
self,
word_times_per_call: list[list[tuple[str, float]]],
**kwargs,
):
super().__init__(
push_start_frame=True,
push_text_frames=False,
pause_frame_processing=False,
sample_rate=_SAMPLE_RATE,
**kwargs,
)
self._word_times_queue = list(word_times_per_call)
def can_generate_metrics(self) -> bool:
return False
async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
word_times = self._word_times_queue.pop(0) if self._word_times_queue else []
async def _deliver():
await asyncio.sleep(0.01)
if word_times:
await self.add_word_timestamps(word_times, context_id=context_id)
await self.append_to_audio_context(
context_id,
TTSAudioRawFrame(
audio=_FAKE_AUDIO,
sample_rate=_SAMPLE_RATE,
num_channels=1,
context_id=context_id,
),
)
await self.append_to_audio_context(context_id, TTSStoppedFrame(context_id=context_id))
await self.remove_audio_context(context_id)
self.create_task(_deliver(), name=f"mock_ws_per_call_deliver_{context_id}")
if False:
yield
# ---------------------------------------------------------------------------
# Tests: _force_complete_spoken_slots — TTSTextFrame emission for dropped timestamps
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_http_force_complete_partial_timestamps_emits_remaining_text():
"""_force_complete_spoken_slots emits a TTSTextFrame for the unspoken word suffix.
Only the first token ("hello") is delivered as a word-timestamp event; "world"
is never sent. When the audio context ends _force_complete_spoken_slots fires,
reads get_remaining_text() from the tracker, and emits TTSTextFrame("world").
Expected TTSTextFrames in order: ["hello", "world"].
"""
tts = _MockWordTimestampHttpTTSService(word_times=[("hello", 0.0)])
frames_received = await run_test(
tts,
frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in word_frames] == ["hello", "world"], (
f"Expected ['hello', 'world'] but got {[f.text for f in word_frames]}"
)
@pytest.mark.asyncio
async def test_http_force_complete_no_timestamps_emits_full_text():
"""_force_complete_spoken_slots emits the full text when no word timestamps arrive.
No word-timestamp events are sent for "hello world". The slot remains incomplete
when the audio context ends; force-complete reads the full remaining text from the
tracker and emits TTSTextFrame("hello world").
"""
tts = _MockPerCallWordTimestampHttpTTSService(word_times_per_call=[[]])
frames_received = await run_test(
tts,
frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert len(word_frames) == 1, (
f"Expected exactly 1 TTSTextFrame, got {len(word_frames)}: {[f.text for f in word_frames]}"
)
assert word_frames[0].text == "hello world", (
f"Expected TTSTextFrame('hello world'), got {word_frames[0].text!r}"
)
@pytest.mark.asyncio
async def test_http_force_complete_raw_text_propagated():
"""force-complete carries the correct raw_text span on the emitted TTSTextFrame.
AggregatedTextFrame carries raw_text="<card>4111 1111</card>". Only "4111" arrives
as a word-timestamp; "1111" is force-completed.
Expected:
TTSTextFrame("4111").raw_text == "<card>4111" — from normal word path
TTSTextFrame("1111").raw_text == "1111</card>" — from force-complete path
"""
tts = _MockPerCallWordTimestampHttpTTSService(word_times_per_call=[[("4111", 0.0)]])
frames_received = await run_test(
tts,
frames_to_send=[
AggregatedTextFrame(
"4111 1111", AggregationType.SENTENCE, raw_text="<card>4111 1111</card>"
)
],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in word_frames] == ["4111", "1111"], (
f"Expected ['4111', '1111'] but got {[f.text for f in word_frames]}"
)
assert word_frames[0].raw_text == "<card>4111", (
f"Expected raw_text '<card>4111' on first frame, got {word_frames[0].raw_text!r}"
)
assert word_frames[1].raw_text == "1111</card>", (
f"Expected raw_text '1111</card>' on force-complete frame, got {word_frames[1].raw_text!r}"
)
@pytest.mark.asyncio
async def test_ws_force_complete_partial_timestamps_emits_remaining_text():
"""WebSocket path: _force_complete_spoken_slots emits TTSTextFrame for dropped token.
Mirrors test_http_force_complete_partial_timestamps_emits_remaining_text on the
async audio delivery path to confirm force-complete fires correctly from
_handle_audio_context when TTSStoppedFrame arrives before all word timestamps.
"""
tts = _MockWordTimestampWSTTSService(word_times=[("hello", 0.0)])
frames_received = await run_test(
tts,
frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert [f.text for f in word_frames] == ["hello", "world"], (
f"Expected ['hello', 'world'] but got {[f.text for f in word_frames]}"
)
@pytest.mark.asyncio
async def test_ws_force_complete_no_timestamps_emits_full_text():
"""WebSocket path: full text emitted as single TTSTextFrame when no timestamps arrive."""
tts = _MockPerCallWordTimestampWSTTSService(word_times_per_call=[[]])
frames_received = await run_test(
tts,
frames_to_send=[TTSSpeakFrame(text="hello world", append_to_context=False)],
)
word_frames = [f for f in frames_received[0] if isinstance(f, TTSTextFrame)]
assert len(word_frames) == 1, (
f"Expected exactly 1 TTSTextFrame, got {len(word_frames)}: {[f.text for f in word_frames]}"
)
assert word_frames[0].text == "hello world", (
f"Expected TTSTextFrame('hello world'), got {word_frames[0].text!r}"
)
assert all(f.includes_inter_frame_spaces is True for f in text_frames)
@pytest.mark.asyncio

File diff suppressed because it is too large

@@ -165,6 +165,19 @@ async def test_reconnect_exhausted_emits_non_fatal_error(service, report_error):
assert "Connection refused" in final_error.error
@pytest.mark.asyncio
async def test_reconnect_exhausted_when_connect_does_not_raise(service, report_error):
"""A non-raising failed connect is treated as a failed reconnect attempt."""
result = await service._try_reconnect(report_error=report_error)
assert result is False
assert report_error.call_count == 4
final_error = report_error.call_args_list[-1][0][0]
assert isinstance(final_error, ErrorFrame)
assert final_error.fatal is False
assert "websocket reconnection failed verification" in final_error.error
# ---------------------------------------------------------------------------
# Quick failure detection — accept then immediately close
# ---------------------------------------------------------------------------

File diff suppressed because it is too large

@@ -0,0 +1,90 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import unittest
from pipecat.utils.text.word_timestamp_utils import merge_punct_tokens
class TestMergePunctTokens(unittest.TestCase):
def test_empty_list(self):
self.assertEqual(merge_punct_tokens([]), [])
def test_all_alnum_words_pass_through(self):
input = [("hello", 0.0), ("world", 1.0)]
self.assertEqual(merge_punct_tokens(input), [("hello", 0.0), ("world", 1.0)])
def test_trailing_space_merged_and_stripped(self):
input = [("I", 0.0), (" ", 0.2)]
self.assertEqual(merge_punct_tokens(input), [("I", 0.0)])
def test_comma_space_merged_and_stripped(self):
input = [("questions", 1.0), (", ", 1.2), ("explain", 1.4)]
self.assertEqual(merge_punct_tokens(input), [("questions,", 1.0), ("explain", 1.4)])
def test_leading_space_with_no_preceding_word_discarded(self):
input = [(" ", 0.0), ("hello", 0.5)]
self.assertEqual(merge_punct_tokens(input), [("hello", 0.5)])
def test_leading_empty_string_discarded(self):
input = [("", 0.0), ("hello", 0.5)]
self.assertEqual(merge_punct_tokens(input), [("hello", 0.5)])
def test_multiple_consecutive_punct_tokens_merged_and_stripped(self):
input = [("word", 0.0), (",", 0.1), (" ", 0.2), ("next", 0.3)]
self.assertEqual(merge_punct_tokens(input), [("word,", 0.0), ("next", 0.3)])
def test_timestamp_of_preceding_word_is_kept(self):
"""Merged punct tokens adopt the preceding word's timestamp."""
input = [("hello", 2.5), (",", 2.7)]
result = merge_punct_tokens(input)
self.assertEqual(result, [("hello,", 2.5)])
def test_xml_tag_only_token_is_treated_as_punct(self):
"""A token that is only an XML tag (no alnum chars) merges into the preceding word."""
input = [("word", 0.0), ("<break/>", 0.1), ("next", 0.3)]
self.assertEqual(merge_punct_tokens(input), [("word<break/>", 0.0), ("next", 0.3)])
def test_xml_tag_with_alnum_content_passes_through(self):
"""A token like '<spell>123</spell>' has alnum chars after stripping tags."""
input = [("<spell>123</spell>", 0.0), ("and", 0.5)]
self.assertEqual(merge_punct_tokens(input), [("<spell>123</spell>", 0.0), ("and", 0.5)])
def test_inworld_style_full_stream(self):
"""Full Inworld-style raw stream produces expected merged and stripped output."""
raw = [
("", 0.0),
("I", 0.1),
(" ", 0.2),
("can", 0.3),
(" ", 0.4),
("answer", 0.5),
(" ", 0.6),
("questions", 0.7),
(", ", 0.8),
("explain", 0.9),
(" ", 1.0),
("things", 1.1),
(".", 1.2),
]
expected = [
("I", 0.1),
("can", 0.3),
("answer", 0.5),
("questions,", 0.7),
("explain", 0.9),
("things.", 1.1),
]
self.assertEqual(merge_punct_tokens(raw), expected)
def test_only_punct_tokens_returns_empty(self):
"""A list containing only punct/space tokens produces an empty result."""
input = [(" ", 0.0), (",", 0.1), (".", 0.2)]
self.assertEqual(merge_punct_tokens(input), [])
if __name__ == "__main__":
unittest.main()
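Reviewer note: the implementation of `merge_punct_tokens` is not shown in this part of the diff. The sketch below is one way the behaviour pinned down by these tests could be satisfied. It is an assumption, not the code in `pipecat.utils.text.word_timestamp_utils`, and `merge_punct_tokens_sketch` is a hypothetical name. It treats a token as a word if it contains an alphanumeric character once XML-style tags are removed.

```python
import re

# Matches XML-style tags such as <break/> or <spell>.
_TAG_RE = re.compile(r"<[^>]*>")


def _has_alnum(token):
    """True if the token has an alphanumeric character outside of tags."""
    return any(ch.isalnum() for ch in _TAG_RE.sub("", token))


def merge_punct_tokens_sketch(tokens):
    """Hypothetical sketch: merge punct-only tokens into the preceding word."""
    merged = []
    for text, timestamp in tokens:
        if _has_alnum(text):
            merged.append((text, timestamp))
        elif merged:
            # Punct, space, or tag-only token: append it to the previous
            # word and keep that word's timestamp.
            prev_text, prev_ts = merged[-1]
            merged[-1] = (prev_text + text, prev_ts)
        # Otherwise there is no preceding word, so the token is discarded.
    # Strip leading and trailing whitespace left over from the merge.
    return [(text.strip(), ts) for text, ts in merged]
```

Merging first and stripping afterwards keeps a case like `("word", ",", " ")` simple: the pieces concatenate to `"word, "` and a single strip yields `"word,"`.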

224
uv.lock generated

@@ -307,7 +307,8 @@ dependencies = [
{ name = "docstring-parser" },
{ name = "httpx" },
{ name = "jiter" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "sniffio" },
{ name = "typing-extensions" },
]
@@ -616,7 +617,8 @@ version = "1.5.11"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions" },
{ name = "websocket-client" },
{ name = "websockets" },
@@ -1268,8 +1270,10 @@ version = "6.1.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic-core" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions" },
{ name = "websockets" },
]
@@ -1394,7 +1398,8 @@ version = "0.136.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-doc" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "starlette" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
@@ -1445,7 +1450,8 @@ source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "fastar" },
{ name = "httpx" },
{ name = "pydantic", extra = ["email"] },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, extra = ["email"], marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, extra = ["email"], marker = "python_full_version != '3.13.*'" },
{ name = "rich-toolkit" },
{ name = "rignore" },
{ name = "sentry-sdk" },
@@ -1865,7 +1871,8 @@ dependencies = [
{ name = "distro" },
{ name = "google-auth", extra = ["requests"] },
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "requests" },
{ name = "sniffio" },
{ name = "tenacity" },
@@ -1944,7 +1951,8 @@ dependencies = [
{ name = "anyio" },
{ name = "distro" },
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "sniffio" },
{ name = "typing-extensions" },
]
@@ -2249,8 +2257,10 @@ dependencies = [
{ name = "eval-type-backport" },
{ name = "exceptiongroup" },
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic-core" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions" },
{ name = "websockets" },
]
@@ -2728,7 +2738,8 @@ source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "langchain-core" },
{ name = "langgraph" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a6/74/03fd4c07993c49c4b80635bb4c723643ff78af81c9471d1266f879f68df1/langchain-1.3.0.tar.gz", hash = "sha256:8ec70ee0cef94255f3e522423b254093a3dd34509638d353c50f3d9dd498debc", size = 580604, upload-time = "2026-05-12T14:45:50.7Z" }
wheels = [
@@ -2743,7 +2754,8 @@ dependencies = [
{ name = "langchain-core" },
{ name = "langchain-text-splitters" },
{ name = "langsmith" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pyyaml" },
{ name = "requests" },
{ name = "sqlalchemy" },
@@ -2785,7 +2797,8 @@ dependencies = [
{ name = "langchain-protocol" },
{ name = "langsmith" },
{ name = "packaging" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pyyaml" },
{ name = "tenacity" },
{ name = "typing-extensions" },
@@ -2843,7 +2856,8 @@ dependencies = [
{ name = "langgraph-checkpoint" },
{ name = "langgraph-prebuilt" },
{ name = "langgraph-sdk" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "xxhash" },
]
sdist = { url = "https://files.pythonhosted.org/packages/58/61/d5d25e783035aa307d289b37e082258a6061c0fb4caa4a284f3bf1e87169/langgraph-1.2.0.tar.gz", hash = "sha256:4a9baaf62afc5d5f63144a50095140a34b9aa9b7cea695d25326d564775348e7", size = 690248, upload-time = "2026-05-12T03:46:39.164Z" }
@@ -2898,7 +2912,8 @@ dependencies = [
{ name = "httpx" },
{ name = "orjson", marker = "platform_python_implementation != 'PyPy'" },
{ name = "packaging" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "requests" },
{ name = "requests-toolbelt" },
{ name = "uuid-utils" },
@@ -3188,7 +3203,8 @@ dependencies = [
{ name = "httpx" },
{ name = "httpx-sse" },
{ name = "jsonschema" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pydantic-settings" },
{ name = "pyjwt", extra = ["crypto"] },
{ name = "python-multipart" },
@@ -3227,7 +3243,8 @@ dependencies = [
{ name = "openai" },
{ name = "posthog" },
{ name = "protobuf" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pytz" },
{ name = "qdrant-client" },
{ name = "sqlalchemy" },
@@ -3247,7 +3264,8 @@ dependencies = [
{ name = "jsonpath-python" },
{ name = "opentelemetry-api" },
{ name = "opentelemetry-semantic-conventions" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "python-dateutil" },
{ name = "typing-inspection" },
]
@@ -3818,7 +3836,8 @@ dependencies = [
{ name = "distro" },
{ name = "httpx" },
{ name = "jiter" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "sniffio" },
{ name = "tqdm" },
{ name = "typing-extensions" },
@@ -4195,7 +4214,8 @@ dependencies = [
{ name = "openai" },
{ name = "pillow" },
{ name = "protobuf" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pyloudnorm" },
{ name = "resampy" },
{ name = "soxr" },
@@ -4351,7 +4371,7 @@ rnnoise = [
]
runner = [
{ name = "fastapi" },
{ name = "pipecat-ai-small-webrtc-prebuilt" },
{ name = "pipecat-ai-prebuilt" },
{ name = "python-dotenv" },
{ name = "uvicorn" },
]
@@ -4394,6 +4414,9 @@ tracing = [
ultravox = [
{ name = "websockets" },
]
vonage-video-connector = [
{ name = "vonage-video-connector", marker = "python_full_version == '3.13.*' and sys_platform == 'linux'" },
]
webrtc = [
{ name = "aiortc" },
{ name = "opencv-python" },
@@ -4516,7 +4539,7 @@ requires-dist = [
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'ultravox'" },
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'websocket'" },
{ name = "pipecat-ai", extras = ["websockets-base"], marker = "extra == 'xai'" },
{ name = "pipecat-ai-small-webrtc-prebuilt", marker = "extra == 'runner'", specifier = ">=2.5.0" },
{ name = "pipecat-ai-prebuilt", marker = "extra == 'runner'", specifier = ">=1.0.1" },
{ name = "piper-tts", marker = "extra == 'piper'", specifier = ">=1.3.0,<2" },
{ name = "protobuf", specifier = ">=5.29.6,<7" },
{ name = "protobuf", marker = "extra == 'nvidia'", specifier = ">=6.31.1,<7" },
@@ -4547,10 +4570,11 @@ requires-dist = [
{ name = "transformers", marker = "extra == 'local-smart-turn'", specifier = ">=4.48.0,<6" },
{ name = "transformers", marker = "extra == 'moondream'", specifier = ">=4.48.0,<6" },
{ name = "uvicorn", marker = "extra == 'runner'", specifier = ">=0.32.0,<1.0.0" },
{ name = "vonage-video-connector", marker = "python_full_version == '3.13.*' and sys_platform == 'linux' and extra == 'vonage-video-connector'", specifier = "~=0.2.3b0" },
{ name = "wait-for2", marker = "python_full_version < '3.12'", specifier = ">=0.4.1,<1" },
{ name = "websockets", marker = "extra == 'websockets-base'", specifier = ">=13.1,<16.0" },
]
provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "webrtc", "websocket", "websockets-base", "whisper", "xai"]
provides-extras = ["aic", "anthropic", "assemblyai", "asyncai", "aws", "aws-nova-sonic", "azure", "cartesia", "camb", "cerebras", "daily", "deepgram", "deepseek", "elevenlabs", "fal", "fireworks", "fish", "gladia", "google", "gradium", "grok", "groq", "gstreamer", "heygen", "hume", "inception", "inworld", "koala", "kokoro", "langchain", "lemonslice", "livekit", "lmnt", "local", "local-smart-turn", "mcp", "mem0", "mistral", "mlx-whisper", "moondream", "nebius", "neuphonic", "novita", "nvidia", "openai", "rnnoise", "openrouter", "perplexity", "piper", "qwen", "resembleai", "rime", "runner", "sagemaker", "sambanova", "sarvam", "sentry", "silero", "simli", "smallest", "soniox", "soundfile", "speechmatics", "strands", "tavus", "together", "tracing", "ultravox", "vonage-video-connector", "webrtc", "websocket", "websockets-base", "whisper", "xai"]
[package.metadata.requires-dev]
dev = [
@@ -4578,15 +4602,15 @@ docs = [
]
[[package]]
name = "pipecat-ai-small-webrtc-prebuilt"
version = "2.5.0"
name = "pipecat-ai-prebuilt"
version = "1.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "fastapi", extra = ["all"] },
]
sdist = { url = "https://files.pythonhosted.org/packages/2c/4f/40bfc9fc1a13f9b1f2657e292c51ff3e3516c530ca722effdcf342d465d9/pipecat_ai_small_webrtc_prebuilt-2.5.0.tar.gz", hash = "sha256:51481506b7b5dff10eff0357ff929cba504a5198c3393697178d2be9895ad9e6", size = 474299, upload-time = "2026-04-22T18:05:16.494Z" }
sdist = { url = "https://files.pythonhosted.org/packages/fa/27/91857cd93661922687e51f4141583dbeb71f9a6c8d0d6379bae1aa467522/pipecat_ai_prebuilt-1.0.1.tar.gz", hash = "sha256:9453136fcb994802f9b650b5175f3ce1d0476849a9e609fefe52ecc1c3299680", size = 601771, upload-time = "2026-05-20T16:08:14.485Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/34/58/1a2e10c1fb7b44e47558cb6c0954e24a60f98afe912fe55c74fdee66f080/pipecat_ai_small_webrtc_prebuilt-2.5.0-py3-none-any.whl", hash = "sha256:23b1eee95662a0072d9ee5128b8567108eda10d5a54ad71f279730afbb678bfe", size = 474308, upload-time = "2026-04-22T18:05:14.552Z" },
{ url = "https://files.pythonhosted.org/packages/ec/4f/a636e47967c3aa885ae912502d73a46d1e824a67992e405ea1e94b78bd94/pipecat_ai_prebuilt-1.0.1-py3-none-any.whl", hash = "sha256:45d78d3fd2ac8193626a5dabb5f45d0ff2d35bfc92098b4bcea308ae612196aa", size = 601994, upload-time = "2026-05-20T16:08:12.4Z" },
]
[[package]]
@@ -4920,15 +4944,43 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/0c/c3/44f3fbbfa403ea2a7c779186dc20772604442dde72947e7d01069cbe98e3/pycparser-3.0-py3-none-any.whl", hash = "sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992", size = 48172, upload-time = "2026-01-21T14:26:50.693Z" },
]
[[package]]
name = "pydantic"
version = "2.11.10"
source = { registry = "https://pypi.org/simple" }
resolution-markers = [
"python_full_version == '3.13.*'",
]
dependencies = [
{ name = "annotated-types", marker = "python_full_version == '3.13.*'" },
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "typing-extensions", marker = "python_full_version == '3.13.*'" },
{ name = "typing-inspection", marker = "python_full_version == '3.13.*'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ae/54/ecab642b3bed45f7d5f59b38443dcb36ef50f85af192e6ece103dbfe9587/pydantic-2.11.10.tar.gz", hash = "sha256:dc280f0982fbda6c38fada4e476dc0a4f3aeaf9c6ad4c28df68a666ec3c61423", size = 788494, upload-time = "2025-10-04T10:40:41.338Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/bd/1f/73c53fcbfb0b5a78f91176df41945ca466e71e9d9d836e5c522abda39ee7/pydantic-2.11.10-py3-none-any.whl", hash = "sha256:802a655709d49bd004c31e865ef37da30b540786a46bfce02333e0e24b5fe29a", size = 444823, upload-time = "2025-10-04T10:40:39.055Z" },
]
[package.optional-dependencies]
email = [
{ name = "email-validator", marker = "python_full_version == '3.13.*'" },
]
[[package]]
name = "pydantic"
version = "2.13.4"
source = { registry = "https://pypi.org/simple" }
resolution-markers = [
"python_full_version >= '3.14'",
"python_full_version == '3.12.*'",
"python_full_version < '3.12'",
]
dependencies = [
{ name = "annotated-types" },
{ name = "pydantic-core" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
{ name = "annotated-types", marker = "python_full_version != '3.13.*'" },
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions", marker = "python_full_version != '3.13.*'" },
{ name = "typing-inspection", marker = "python_full_version != '3.13.*'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/18/a5/b60d21ac674192f8ab0ba4e9fd860690f9b4a6e51ca5df118733b487d8d6/pydantic-2.13.4.tar.gz", hash = "sha256:c40756b57adaa8b1efeeced5c196f3f3b7c435f90e84ea7f443901bec8099ef6", size = 844775, upload-time = "2026-05-06T13:43:05.343Z" }
wheels = [
@@ -4937,15 +4989,88 @@ wheels = [
[package.optional-dependencies]
email = [
{ name = "email-validator" },
{ name = "email-validator", marker = "python_full_version != '3.13.*'" },
]
[[package]]
name = "pydantic-core"
version = "2.33.2"
source = { registry = "https://pypi.org/simple" }
resolution-markers = [
"python_full_version == '3.13.*'",
]
dependencies = [
{ name = "typing-extensions", marker = "python_full_version == '3.13.*'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3f/8d/71db63483d518cbbf290261a1fc2839d17ff89fce7089e08cad07ccfce67/pydantic_core-2.33.2-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:4c5b0a576fb381edd6d27f0a85915c6daf2f8138dc5c267a57c08a62900758c7", size = 2028584, upload-time = "2025-04-23T18:31:03.106Z" },
{ url = "https://files.pythonhosted.org/packages/24/2f/3cfa7244ae292dd850989f328722d2aef313f74ffc471184dc509e1e4e5a/pydantic_core-2.33.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:e799c050df38a639db758c617ec771fd8fb7a5f8eaaa4b27b101f266b216a246", size = 1855071, upload-time = "2025-04-23T18:31:04.621Z" },
{ url = "https://files.pythonhosted.org/packages/b3/d3/4ae42d33f5e3f50dd467761304be2fa0a9417fbf09735bc2cce003480f2a/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:dc46a01bf8d62f227d5ecee74178ffc448ff4e5197c756331f71efcc66dc980f", size = 1897823, upload-time = "2025-04-23T18:31:06.377Z" },
{ url = "https://files.pythonhosted.org/packages/f4/f3/aa5976e8352b7695ff808599794b1fba2a9ae2ee954a3426855935799488/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a144d4f717285c6d9234a66778059f33a89096dfb9b39117663fd8413d582dcc", size = 1983792, upload-time = "2025-04-23T18:31:07.93Z" },
{ url = "https://files.pythonhosted.org/packages/d5/7a/cda9b5a23c552037717f2b2a5257e9b2bfe45e687386df9591eff7b46d28/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:73cf6373c21bc80b2e0dc88444f41ae60b2f070ed02095754eb5a01df12256de", size = 2136338, upload-time = "2025-04-23T18:31:09.283Z" },
{ url = "https://files.pythonhosted.org/packages/2b/9f/b8f9ec8dd1417eb9da784e91e1667d58a2a4a7b7b34cf4af765ef663a7e5/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:3dc625f4aa79713512d1976fe9f0bc99f706a9dee21dfd1810b4bbbf228d0e8a", size = 2730998, upload-time = "2025-04-23T18:31:11.7Z" },
{ url = "https://files.pythonhosted.org/packages/47/bc/cd720e078576bdb8255d5032c5d63ee5c0bf4b7173dd955185a1d658c456/pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:881b21b5549499972441da4758d662aeea93f1923f953e9cbaff14b8b9565aef", size = 2003200, upload-time = "2025-04-23T18:31:13.536Z" },
{ url = "https://files.pythonhosted.org/packages/ca/22/3602b895ee2cd29d11a2b349372446ae9727c32e78a94b3d588a40fdf187/pydantic_core-2.33.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:bdc25f3681f7b78572699569514036afe3c243bc3059d3942624e936ec93450e", size = 2113890, upload-time = "2025-04-23T18:31:15.011Z" },
{ url = "https://files.pythonhosted.org/packages/ff/e6/e3c5908c03cf00d629eb38393a98fccc38ee0ce8ecce32f69fc7d7b558a7/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:fe5b32187cbc0c862ee201ad66c30cf218e5ed468ec8dc1cf49dec66e160cc4d", size = 2073359, upload-time = "2025-04-23T18:31:16.393Z" },
{ url = "https://files.pythonhosted.org/packages/12/e7/6a36a07c59ebefc8777d1ffdaf5ae71b06b21952582e4b07eba88a421c79/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:bc7aee6f634a6f4a95676fcb5d6559a2c2a390330098dba5e5a5f28a2e4ada30", size = 2245883, upload-time = "2025-04-23T18:31:17.892Z" },
{ url = "https://files.pythonhosted.org/packages/16/3f/59b3187aaa6cc0c1e6616e8045b284de2b6a87b027cce2ffcea073adf1d2/pydantic_core-2.33.2-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:235f45e5dbcccf6bd99f9f472858849f73d11120d76ea8707115415f8e5ebebf", size = 2241074, upload-time = "2025-04-23T18:31:19.205Z" },
{ url = "https://files.pythonhosted.org/packages/e0/ed/55532bb88f674d5d8f67ab121a2a13c385df382de2a1677f30ad385f7438/pydantic_core-2.33.2-cp311-cp311-win32.whl", hash = "sha256:6368900c2d3ef09b69cb0b913f9f8263b03786e5b2a387706c5afb66800efd51", size = 1910538, upload-time = "2025-04-23T18:31:20.541Z" },
{ url = "https://files.pythonhosted.org/packages/fe/1b/25b7cccd4519c0b23c2dd636ad39d381abf113085ce4f7bec2b0dc755eb1/pydantic_core-2.33.2-cp311-cp311-win_amd64.whl", hash = "sha256:1e063337ef9e9820c77acc768546325ebe04ee38b08703244c1309cccc4f1bab", size = 1952909, upload-time = "2025-04-23T18:31:22.371Z" },
{ url = "https://files.pythonhosted.org/packages/49/a9/d809358e49126438055884c4366a1f6227f0f84f635a9014e2deb9b9de54/pydantic_core-2.33.2-cp311-cp311-win_arm64.whl", hash = "sha256:6b99022f1d19bc32a4c2a0d544fc9a76e3be90f0b3f4af413f87d38749300e65", size = 1897786, upload-time = "2025-04-23T18:31:24.161Z" },
{ url = "https://files.pythonhosted.org/packages/18/8a/2b41c97f554ec8c71f2a8a5f85cb56a8b0956addfe8b0efb5b3d77e8bdc3/pydantic_core-2.33.2-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a7ec89dc587667f22b6a0b6579c249fca9026ce7c333fc142ba42411fa243cdc", size = 2009000, upload-time = "2025-04-23T18:31:25.863Z" },
{ url = "https://files.pythonhosted.org/packages/a1/02/6224312aacb3c8ecbaa959897af57181fb6cf3a3d7917fd44d0f2917e6f2/pydantic_core-2.33.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3c6db6e52c6d70aa0d00d45cdb9b40f0433b96380071ea80b09277dba021ddf7", size = 1847996, upload-time = "2025-04-23T18:31:27.341Z" },
{ url = "https://files.pythonhosted.org/packages/d6/46/6dcdf084a523dbe0a0be59d054734b86a981726f221f4562aed313dbcb49/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e61206137cbc65e6d5256e1166f88331d3b6238e082d9f74613b9b765fb9025", size = 1880957, upload-time = "2025-04-23T18:31:28.956Z" },
{ url = "https://files.pythonhosted.org/packages/ec/6b/1ec2c03837ac00886ba8160ce041ce4e325b41d06a034adbef11339ae422/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:eb8c529b2819c37140eb51b914153063d27ed88e3bdc31b71198a198e921e011", size = 1964199, upload-time = "2025-04-23T18:31:31.025Z" },
{ url = "https://files.pythonhosted.org/packages/2d/1d/6bf34d6adb9debd9136bd197ca72642203ce9aaaa85cfcbfcf20f9696e83/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c52b02ad8b4e2cf14ca7b3d918f3eb0ee91e63b3167c32591e57c4317e134f8f", size = 2120296, upload-time = "2025-04-23T18:31:32.514Z" },
{ url = "https://files.pythonhosted.org/packages/e0/94/2bd0aaf5a591e974b32a9f7123f16637776c304471a0ab33cf263cf5591a/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:96081f1605125ba0855dfda83f6f3df5ec90c61195421ba72223de35ccfb2f88", size = 2676109, upload-time = "2025-04-23T18:31:33.958Z" },
{ url = "https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f57a69461af2a5fa6e6bbd7a5f60d3b7e6cebb687f55106933188e79ad155c1", size = 2002028, upload-time = "2025-04-23T18:31:39.095Z" },
{ url = "https://files.pythonhosted.org/packages/cb/d5/7bb781bf2748ce3d03af04d5c969fa1308880e1dca35a9bd94e1a96a922e/pydantic_core-2.33.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:572c7e6c8bb4774d2ac88929e3d1f12bc45714ae5ee6d9a788a9fb35e60bb04b", size = 2100044, upload-time = "2025-04-23T18:31:41.034Z" },
{ url = "https://files.pythonhosted.org/packages/fe/36/def5e53e1eb0ad896785702a5bbfd25eed546cdcf4087ad285021a90ed53/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:db4b41f9bd95fbe5acd76d89920336ba96f03e149097365afe1cb092fceb89a1", size = 2058881, upload-time = "2025-04-23T18:31:42.757Z" },
{ url = "https://files.pythonhosted.org/packages/01/6c/57f8d70b2ee57fc3dc8b9610315949837fa8c11d86927b9bb044f8705419/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:fa854f5cf7e33842a892e5c73f45327760bc7bc516339fda888c75ae60edaeb6", size = 2227034, upload-time = "2025-04-23T18:31:44.304Z" },
{ url = "https://files.pythonhosted.org/packages/27/b9/9c17f0396a82b3d5cbea4c24d742083422639e7bb1d5bf600e12cb176a13/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:5f483cfb75ff703095c59e365360cb73e00185e01aaea067cd19acffd2ab20ea", size = 2234187, upload-time = "2025-04-23T18:31:45.891Z" },
{ url = "https://files.pythonhosted.org/packages/b0/6a/adf5734ffd52bf86d865093ad70b2ce543415e0e356f6cacabbc0d9ad910/pydantic_core-2.33.2-cp312-cp312-win32.whl", hash = "sha256:9cb1da0f5a471435a7bc7e439b8a728e8b61e59784b2af70d7c169f8dd8ae290", size = 1892628, upload-time = "2025-04-23T18:31:47.819Z" },
{ url = "https://files.pythonhosted.org/packages/43/e4/5479fecb3606c1368d496a825d8411e126133c41224c1e7238be58b87d7e/pydantic_core-2.33.2-cp312-cp312-win_amd64.whl", hash = "sha256:f941635f2a3d96b2973e867144fde513665c87f13fe0e193c158ac51bfaaa7b2", size = 1955866, upload-time = "2025-04-23T18:31:49.635Z" },
{ url = "https://files.pythonhosted.org/packages/0d/24/8b11e8b3e2be9dd82df4b11408a67c61bb4dc4f8e11b5b0fc888b38118b5/pydantic_core-2.33.2-cp312-cp312-win_arm64.whl", hash = "sha256:cca3868ddfaccfbc4bfb1d608e2ccaaebe0ae628e1416aeb9c4d88c001bb45ab", size = 1888894, upload-time = "2025-04-23T18:31:51.609Z" },
{ url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
{ url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
{ url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
{ url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
{ url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
{ url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
{ url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
{ url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
{ url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
{ url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
{ url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
{ url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
{ url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
{ url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
{ url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
{ url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
{ url = "https://files.pythonhosted.org/packages/7b/27/d4ae6487d73948d6f20dddcd94be4ea43e74349b56eba82e9bdee2d7494c/pydantic_core-2.33.2-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:dd14041875d09cc0f9308e37a6f8b65f5585cf2598a53aa0123df8b129d481f8", size = 2025200, upload-time = "2025-04-23T18:33:14.199Z" },
{ url = "https://files.pythonhosted.org/packages/f1/b8/b3cb95375f05d33801024079b9392a5ab45267a63400bf1866e7ce0f0de4/pydantic_core-2.33.2-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:d87c561733f66531dced0da6e864f44ebf89a8fba55f31407b00c2f7f9449593", size = 1859123, upload-time = "2025-04-23T18:33:16.555Z" },
{ url = "https://files.pythonhosted.org/packages/05/bc/0d0b5adeda59a261cd30a1235a445bf55c7e46ae44aea28f7bd6ed46e091/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2f82865531efd18d6e07a04a17331af02cb7a651583c418df8266f17a63c6612", size = 1892852, upload-time = "2025-04-23T18:33:18.513Z" },
{ url = "https://files.pythonhosted.org/packages/3e/11/d37bdebbda2e449cb3f519f6ce950927b56d62f0b84fd9cb9e372a26a3d5/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2bfb5112df54209d820d7bf9317c7a6c9025ea52e49f46b6a2060104bba37de7", size = 2067484, upload-time = "2025-04-23T18:33:20.475Z" },
{ url = "https://files.pythonhosted.org/packages/8c/55/1f95f0a05ce72ecb02a8a8a1c3be0579bbc29b1d5ab68f1378b7bebc5057/pydantic_core-2.33.2-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:64632ff9d614e5eecfb495796ad51b0ed98c453e447a76bcbeeb69615079fc7e", size = 2108896, upload-time = "2025-04-23T18:33:22.501Z" },
{ url = "https://files.pythonhosted.org/packages/53/89/2b2de6c81fa131f423246a9109d7b2a375e83968ad0800d6e57d0574629b/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:f889f7a40498cc077332c7ab6b4608d296d852182211787d4f3ee377aaae66e8", size = 2069475, upload-time = "2025-04-23T18:33:24.528Z" },
{ url = "https://files.pythonhosted.org/packages/b8/e9/1f7efbe20d0b2b10f6718944b5d8ece9152390904f29a78e68d4e7961159/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:de4b83bb311557e439b9e186f733f6c645b9417c84e2eb8203f3f820a4b988bf", size = 2239013, upload-time = "2025-04-23T18:33:26.621Z" },
{ url = "https://files.pythonhosted.org/packages/3c/b2/5309c905a93811524a49b4e031e9851a6b00ff0fb668794472ea7746b448/pydantic_core-2.33.2-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:82f68293f055f51b51ea42fafc74b6aad03e70e191799430b90c13d643059ebb", size = 2238715, upload-time = "2025-04-23T18:33:28.656Z" },
{ url = "https://files.pythonhosted.org/packages/32/56/8a7ca5d2cd2cda1d245d34b1c9a942920a718082ae8e54e5f3e5a58b7add/pydantic_core-2.33.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:329467cecfb529c925cf2bbd4d60d2c509bc2fb52a20c1045bf09bb70971a9c1", size = 2066757, upload-time = "2025-04-23T18:33:30.645Z" },
]
[[package]]
name = "pydantic-core"
version = "2.46.4"
source = { registry = "https://pypi.org/simple" }
resolution-markers = [
"python_full_version >= '3.14'",
"python_full_version == '3.12.*'",
"python_full_version < '3.12'",
]
dependencies = [
{ name = "typing-extensions" },
{ name = "typing-extensions", marker = "python_full_version != '3.13.*'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9d/56/921726b776ace8d8f5db44c4ef961006580d91dc52b803c489fafd1aa249/pydantic_core-2.46.4.tar.gz", hash = "sha256:62f875393d7f270851f20523dd2e29f082bcc82292d66db2b64ea71f64b6e1c1", size = 471464, upload-time = "2026-05-06T13:37:06.98Z" }
wheels = [
@@ -5047,7 +5172,8 @@ name = "pydantic-extra-types"
version = "2.11.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/66/71/dba38ee2651f84f7842206adbd2233d8bbdb59fb85e9fa14232486a8c471/pydantic_extra_types-2.11.1.tar.gz", hash = "sha256:46792d2307383859e923d8fcefa82108b1a141f8a9c0198982b3832ab5ef1049", size = 172002, upload-time = "2026-03-16T08:08:03.92Z" }
@@ -5060,7 +5186,8 @@ name = "pydantic-settings"
version = "2.14.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "python-dotenv" },
{ name = "typing-inspection" },
]
@@ -5424,7 +5551,8 @@ dependencies = [
{ name = "numpy" },
{ name = "portalocker" },
{ name = "protobuf" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "urllib3" },
]
sdist = { url = "https://files.pythonhosted.org/packages/65/45/5b1bdd15a3c7730eefb9c113600829e20d689b82b5a23f9e07d107094004/qdrant_client-1.18.0.tar.gz", hash = "sha256:52e8ece1a7d40519801bf0b70713bfa0f6b7ae28c7275bbe0b0286fbed7f6db4", size = 352580, upload-time = "2026-05-11T14:12:38.702Z" }
@@ -5915,8 +6043,10 @@ version = "0.1.28"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
{ name = "pydantic" },
{ name = "pydantic-core" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pydantic-core", version = "2.33.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic-core", version = "2.46.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "typing-extensions" },
{ name = "websockets" },
]
@@ -6244,7 +6374,8 @@ version = "0.2.8"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "speechmatics-rt" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e4/b2/72b5b2203bbefbd22e7692adaca0dd7c2feebed1aaea5599ec579f74fbbf/speechmatics_voice-0.2.8.tar.gz", hash = "sha256:b2d9cbf773fd94400c744734662e2b16b5bdc4271d0dafde46ac032c438fe000", size = 61419, upload-time = "2026-01-26T16:26:09.082Z" }
@@ -6544,7 +6675,8 @@ dependencies = [
{ name = "opentelemetry-api" },
{ name = "opentelemetry-instrumentation-threading" },
{ name = "opentelemetry-sdk" },
{ name = "pydantic" },
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
{ name = "pydantic", version = "2.13.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version != '3.13.*'" },
{ name = "pyyaml" },
{ name = "typing-extensions" },
{ name = "watchdog" },
@@ -7131,6 +7263,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f4/34/a9dbe051de88a63eb7408ea66630bac38e72f7f6077d4be58737106860d9/virtualenv-21.3.3-py3-none-any.whl", hash = "sha256:7d5987d8369e098e41406efb780a3d4ca79280097293899e351a6407ee153ab3", size = 7594554, upload-time = "2026-05-13T18:01:27.815Z" },
]
[[package]]
name = "vonage-video-connector"
version = "0.2.3b0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pydantic", version = "2.11.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.13.*'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/8a/db/385df7fd618b31f0def554aca568d87b4b2f9ccc3a1457ae7eea5e8bf775/vonage_video_connector-0.2.3b0-py3-none-manylinux_2_35_aarch64.whl", hash = "sha256:9d1ffa93f3aadd24a980294df2b63b0f853b8dfa25b277690e0864e7586f8bb7", size = 12101114, upload-time = "2026-03-02T15:34:45.007Z" },
{ url = "https://files.pythonhosted.org/packages/9f/4e/03b183599370473c3277140e9ecbb33621449935a02042ecbcf8c555ebad/vonage_video_connector-0.2.3b0-py3-none-manylinux_2_35_x86_64.whl", hash = "sha256:718e39e7e488ac50fecda75e24ab01c9d16d4078bb4f79ee7857e282493e2e4e", size = 13971535, upload-time = "2026-03-02T15:34:47.186Z" },
]
[[package]]
name = "wait-for2"
version = "0.4.1"