Compare commits

..

97 Commits

Author SHA1 Message Date
James Hush
c9f5f44287 Add claude code files 2026-01-27 15:23:33 +08:00
Mark Backman
ecf2e69f3f Merge pull request #3536 from surapuramakhil/main
LLMAssistantAggregator: preserve non-ASCII characters in JSON output
2026-01-26 16:42:05 -05:00
Mark Backman
1542d922e7 Merge pull request #3546 from pipecat-ai/pk/changelog-fragment-for-pr-3406
Added a changelog fragment for PR 3406
2026-01-26 16:31:57 -05:00
Paul Kompfner
15d5d1159e Added a changelog fragment for PR 3406 2026-01-26 16:27:33 -05:00
Mark Backman
884630a6bd Merge pull request #3559 from pipecat-ai/aleix/transport-broadcast-fixes
transports: fix broadcast_frame_class reference
2026-01-26 16:25:31 -05:00
Mark Backman
1cf137c6a8 Merge pull request #3565 from pipecat-ai/markbackman-patch-1 2026-01-26 15:49:35 -05:00
Mark Backman
9c6b11cecf Update README links to use absolute URLs 2026-01-26 13:03:39 -05:00
Mark Backman
061a0dc43d Merge pull request #3498 from pipecat-ai/mb/azure-tts-8khz-workaround
AzureTTSService 8khz workaround
2026-01-26 09:48:22 -05:00
Mark Backman
328bbe069f Merge pull request #3554 from pipecat-ai/mb/simplify-stt-ttfb
Simplify STT finalize handling
2026-01-26 08:00:04 -05:00
Mark Backman
dc32ecc872 Merge pull request #3555 from pipecat-ai/mb/speechmatics-stt-ttfb
Align Speechmatics STT TTFB metrics with STT classes
2026-01-26 07:59:34 -05:00
Aleix Conchillo Flaqué
f94a60f381 transports: fix broadcast_frame_class reference 2026-01-25 15:42:09 -08:00
Mark Backman
a4acc12f91 Align Speechmatics STT TTFB metrics with STT classes 2026-01-24 18:26:34 -05:00
Mark Backman
e93112e76e Simplify STT finalize handling 2026-01-24 15:28:27 -05:00
Mark Backman
680bcaac66 Merge pull request #3550 from pipecat-ai/mb/update-smart-turn-data-env-var
Update env var to PIPECAT_SMART_TURN_LOG_DATA
2026-01-24 13:52:36 -05:00
Mark Backman
d2ac9006a2 Update env var to PIPECAT_SMART_TURN_LOG_DATA 2026-01-24 12:50:42 -05:00
Mark Backman
bcb019e8ab Add TTFB metrics for STT services (#3495) 2026-01-23 18:47:34 -05:00
kompfner
4ea546785f Merge pull request #3406 from omChauhanDev/fix/openrouter-gemini-messages
fix(openrouter): handle multiple system messages for Gemini models
2026-01-23 14:53:59 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Akhil
3b3c7aa8cc LLMAssistantAggregator: preserve non-ASCII characters in JSON output
Add ensure_ascii=False to json.dumps() calls for tool call arguments
and function call results to prevent unnecessary unicode escaping.
2026-01-22 15:37:44 -06:00
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is enabled (enabled by default) and RTVI processor will be
added automatically to the pipeline. Also, and RTVI observer will be
registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog fo 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errrors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errrors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exception are not
handled, execution will not continue and therefore the originator frames will
not be pushed. This would cause the pipeline to not be started (i.e. StartFrame
would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Mark Backman
0b1a4792b8 Bump to latest azure-cognitiveservices-speech version, 1.47.0 2026-01-19 09:52:28 -05:00
Mark Backman
14bd3b1b32 Set Azure TTS default prosody rate to None 2026-01-19 09:19:57 -05:00
Mark Backman
f733e77496 AzureTTS: work around word ordering issue at 8khz sample rate 2026-01-19 09:13:41 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Om Chauhan
38506f51f7 fix(openrouter): handle multiple system messages for Gemini models 2026-01-11 21:19:47 +05:30
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
152 changed files with 2342 additions and 461 deletions

8
.claude/.gitignore vendored Normal file
View File

@@ -0,0 +1,8 @@
# Claude Code temporary files
*.tmp
*.log
.claude-cache/
# OS files
.DS_Store
Thumbs.db

200
.claude/QUICKSTART.md Normal file
View File

@@ -0,0 +1,200 @@
# Claude Code Quick Start for Pipecat
This guide helps you get started using Claude Code with the Pipecat project.
## Initial Setup
1. **Install Claude Code** (if not already installed):
```bash
# Follow instructions at https://claude.ai/claude-code
```
2. **Install project dependencies**:
```bash
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp --no-extra local
```
3. **Install pre-commit hooks**:
```bash
uv run pre-commit install
```
## Common Commands
### Testing
- "Run all tests"
- "Run tests for [specific file]"
- "Run tests and show coverage"
### Code Quality
- "Format the code"
- "Fix linting issues"
- "Run type checking"
- "Run pre-commit hooks"
### Development
- "Add a new TTS service for [provider]"
- "Create a new processor that [does something]"
- "Add a new frame type for [purpose]"
- "Document the [ClassName] class" (uses `/docstring` skill)
### Documentation
- "Document this module using Google style"
- "Add docstrings to [file/class]"
- Use `/docstring ClassName` for comprehensive class documentation
### Git Operations
- "Create a commit for these changes"
- "Create a pull request"
- Use `/pr-description` skill for detailed PR descriptions
- Use `/changelog` skill for changelog entries
## Custom Skills
### `/docstring [ClassName]`
Automatically documents a Python class and its methods following Google-style conventions.
**Example:**
```
/docstring AudioProcessor
```
This will:
- Find the class in the codebase
- Add module docstring if missing
- Add class docstring with purpose and event handlers
- Document all public methods
- Document constructor parameters
- Skip private methods and already-documented code
### `/changelog`
Generates changelog entries using towncrier.
### `/pr-description`
Creates comprehensive pull request descriptions based on your changes.
## Project-Specific Tips
### Understanding Pipecat Architecture
When asking Claude Code to help with development:
1. **Frame-Based System**: All data flows through frames
- Ask: "Explain how frames work in this pipeline"
- Reference: `src/pipecat/frames/frames.py`
2. **Processor Pattern**: Everything is a processor
- Ask: "Show me how to create a custom processor"
- Reference: `src/pipecat/processors/frame_processor.py`
3. **Service Integrations**: Many AI service integrations
- Ask: "How do I add a new TTS service?"
- Reference: `src/pipecat/services/tts/`
### Working with Examples
- "Show me examples of [feature]"
- "Create a simple example that [does something]"
- Examples are in `examples/foundational/` (building blocks) and `examples/` (complete apps)
### Debugging
- "Help me debug this pipeline"
- "Why isn't my processor receiving frames?"
- "Trace the flow of this frame type through the pipeline"
## Best Practices
1. **Be Specific**: Instead of "fix this", say "fix the audio dropouts in the TTS processor"
2. **Context**: Provide context about what you're building
- "I'm building a voice assistant that needs to interrupt TTS"
- "I want to add vision capabilities to this chatbot"
3. **Reference Examples**: Point to existing patterns
- "Similar to how DeepgramTTS works"
- "Following the pattern in OpenAILLMService"
4. **Test-Driven**: Ask for tests
- "Create tests for this processor"
- "Add test coverage for the error handling"
5. **Documentation**: Keep docs updated
- "Update the docstrings for these changes"
- "Add a usage example to the class docstring"
## Example Conversations
### Adding a New Feature
```
You: "I need to add a processor that detects when the user says 'hello' and triggers an event"
Claude Code will:
1. Create the processor class
2. Implement frame processing logic
3. Add event emission
4. Create tests
5. Add documentation
```
### Debugging an Issue
```
You: "The audio is cutting out in my pipeline. Here's the code: [paste code]"
Claude Code will:
1. Analyze the pipeline structure
2. Check for common issues (buffer sizes, async handling, etc.)
3. Suggest fixes
4. Explain the root cause
```
### Refactoring
```
You: "Refactor the XYZ service to use the new WebSocket pattern from ABC service"
Claude Code will:
1. Analyze both services
2. Identify the pattern differences
3. Apply the refactoring
4. Update tests
5. Maintain backward compatibility if needed
```
## Useful Prompts
- "Explain how [feature] works in this codebase"
- "Add error handling for [scenario]"
- "Create an example that demonstrates [feature]"
- "Optimize this processor for [use case]"
- "Add logging to help debug [issue]"
- "Make this code more maintainable"
- "Add type hints to this file"
- "Create a comprehensive test suite for [component]"
## Configuration Reference
All Claude Code settings are in [.claude/settings.json](.claude/settings.json):
- Project commands (test, lint, format, etc.)
- Coding standards
- File patterns
- Important files and directories
For detailed architecture info, see [.claude/README.md](.claude/README.md).
## Getting Help
- **Project docs**: https://docs.pipecat.ai
- **Discord**: https://discord.gg/pipecat
- **GitHub Issues**: https://github.com/pipecat-ai/pipecat/issues
- **Examples**: https://github.com/pipecat-ai/pipecat-examples
## Tips for Success
1. Start with small, specific tasks
2. Use the custom skills (`/docstring`, `/pr-description`, etc.)
3. Reference existing code patterns
4. Ask for explanations when confused
5. Request tests and documentation
6. Run pre-commit hooks before committing
Happy coding with Claude! 🎙️🤖

177
.claude/README.md Normal file
View File

@@ -0,0 +1,177 @@
# Claude Code Setup for Pipecat
This directory contains configuration and custom skills for working with the Pipecat project using Claude Code.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. It provides a composable, frame-based architecture for orchestrating audio, video, AI services, and conversation pipelines.
## Architecture
### Core Concepts
1. **Frames** - The fundamental data units in Pipecat (audio, text, images, system messages, etc.)
- Located in: `src/pipecat/frames/frames.py`
- Different frame types for different data: `AudioRawFrame`, `TextFrame`, `ImageRawFrame`, etc.
2. **Processors** - Processing units that receive, transform, and emit frames
- Base class: `src/pipecat/processors/frame_processor.py`
- Can be chained to form pipelines
- Examples: STT services, LLMs, TTS services, aggregators, etc.
3. **Pipelines** - Chains of processors that define data flow
- Created using the `Pipeline` class
- Processors linked using `link()` method or `|` operator
4. **Transports** - Handle input/output for audio/video streams
- WebRTC (Daily), WebSocket, Local audio, etc.
- Located in: `src/pipecat/transports/`
### Key Directories
- `src/pipecat/` - Main source code
- `frames/` - Frame definitions and utilities
- `processors/` - Base processors and common processors
- `services/` - AI service integrations (STT, TTS, LLM, etc.)
- `transports/` - Transport implementations
- `audio/` - Audio processing utilities
- `examples/` - Example applications and foundational examples
- `tests/` - Test suite
- `docs/` - Documentation source
## Development Workflow
### Setup
```bash
# Install dependencies
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp --no-extra local
# Install pre-commit hooks
uv run pre-commit install
```
### Running Tests
```bash
# All tests
uv run pytest
# Specific test file
uv run pytest tests/test_name.py
# With coverage
uv run coverage run --module pytest
uv run coverage report
```
### Code Quality
```bash
# Format code
uv run ruff format .
# Lint code
uv run ruff check .
# Fix linting issues
uv run ruff check --fix .
# Type checking
uv run pyright
# Run all pre-commit hooks
uv run pre-commit run --all-files
```
### Building
```bash
# Build package
uv build
```
## Custom Skills
This project includes custom Claude Code skills:
### `/docstring`
Document Python modules and classes using Google-style docstrings.
Usage: `/docstring ClassName`
### `/changelog`
Generate changelog entries using towncrier.
### `/pr-description`
Generate comprehensive PR descriptions based on changes.
## Coding Standards
1. **Docstrings** - Use Google-style docstrings for all public APIs
- Module docstrings required
- Class docstrings with purpose and event handlers
- Method docstrings with Args/Returns/Raises
- Constructor (`__init__`) must document all parameters
2. **Type Hints** - Required for all function signatures
- Use `from typing import ...` for complex types
- Dataclasses should have field type annotations
3. **Async/Await** - Consistent use of async patterns
- Most processors use async methods
- Tests use pytest-asyncio
4. **Code Style**
- Line length: 100 characters max
- Ruff for linting and formatting
- Follow existing patterns in the codebase
5. **Testing**
- Write tests for new features
- Use pytest fixtures for common setups
- Mock external services when appropriate
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make changes following coding standards
4. Add tests for new functionality
5. Run pre-commit hooks: `uv run pre-commit run --all-files`
6. Submit a pull request
## Common Tasks
### Adding a New Service Integration
1. Create service file in `src/pipecat/services/<category>/`
2. Inherit from appropriate base class (e.g., `TTSService`, `LLMService`)
3. Implement required abstract methods
4. Add service to `pyproject.toml` optional dependencies
5. Add documentation
6. Add tests in `tests/`
### Adding a New Processor
1. Create processor in `src/pipecat/processors/`
2. Inherit from `FrameProcessor` or appropriate subclass
3. Override `process_frame()` method
4. Handle relevant frame types
5. Emit frames using `await self.push_frame()`
6. Add tests
### Adding a New Frame Type
1. Add frame definition to `src/pipecat/frames/frames.py`
2. Inherit from appropriate base frame class
3. Use `@dataclass` decorator for data frames
4. Document the frame type and its fields
5. Update processors that should handle this frame type
## Resources
- [Documentation](https://docs.pipecat.ai)
- [GitHub Repository](https://github.com/pipecat-ai/pipecat)
- [Examples](https://github.com/pipecat-ai/pipecat-examples)
- [Discord Community](https://discord.gg/pipecat)

81
.claude/settings.json Normal file
View File

@@ -0,0 +1,81 @@
{
"description": "Pipecat - Open-source Python framework for real-time voice and multimodal AI agents",
"conventions": {
"language": "Python",
"version": ">=3.10",
"package_manager": "uv",
"code_style": "Google docstrings, Ruff formatting",
"test_framework": "pytest",
"async": true
},
"project_info": {
"type": "python_library",
"framework": "pipecat",
"main_source": "src/pipecat",
"examples": "examples/",
"tests": "tests/",
"docs": "docs/"
},
"commands": {
"install": "uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp --no-extra local",
"test": "uv run pytest",
"test_file": "uv run pytest {file}",
"lint": "uv run ruff check .",
"lint_fix": "uv run ruff check --fix .",
"format": "uv run ruff format .",
"format_check": "uv run ruff format --check .",
"type_check": "uv run pyright",
"pre_commit": "uv run pre-commit run --all-files",
"build": "uv build",
"changelog": "uv run towncrier build --version {version}"
},
"coding_standards": [
"Use Google-style docstrings for all public classes and methods",
"Follow Ruff linting rules (see pyproject.toml)",
"Maintain type hints for all function signatures",
"Use async/await patterns consistently",
"Keep line length at 100 characters maximum",
"Use dataclasses with type annotations for configuration classes",
"Prefer composition over inheritance where appropriate",
"Write comprehensive pytest tests with asyncio support",
"Document event handlers in class docstrings with Example:: sections"
],
"file_patterns": {
"source_files": "src/pipecat/**/*.py",
"test_files": "tests/**/*.py",
"example_files": "examples/**/*.py",
"config_files": "pyproject.toml"
},
"important_files": [
"pyproject.toml - Project configuration and dependencies",
"CONTRIBUTING.md - Contributing guidelines",
"README.md - Project overview and quick start",
"src/pipecat/__init__.py - Main package exports",
"src/pipecat/frames/frames.py - Core frame definitions",
"src/pipecat/processors/frame_processor.py - Base processor class"
],
"documentation": {
"style": "Google",
"build_command": "cd docs && make html",
"skip_private_methods": true,
"skip_simple_dunders": true,
"require_module_docstrings": true,
"require_class_docstrings": true,
"require_init_docstrings": true
},
"git": {
"main_branch": "main",
"commit_style": "Conventional Commits",
"pre_commit_hooks": true
},
"ai_assistance_notes": [
"This project is a real-time voice and multimodal AI framework",
"Core concepts: Frames (data units), Processors (processing units), Pipelines (chains of processors)",
"Heavy use of async/await for real-time processing",
"WebRTC and WebSocket transports for audio/video streaming",
"Integration with many AI services (OpenAI, Anthropic, Deepgram, ElevenLabs, etc.)",
"Frame-based architecture allows composable, modular pipeline construction",
"Tests use pytest-asyncio for async test support",
"Pre-commit hooks enforce code quality (run 'uv run pre-commit install')"
]
}

View File

@@ -0,0 +1,40 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
2. For each significant change, create a changelog file in the `changelog/` folder using the format:
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional new features
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
3. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
4. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
5. Use ⚠️ emoji prefix for breaking changes.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```

View File

@@ -0,0 +1,257 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
3. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].
[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.
This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
"""One-line summary describing what the class does.
[Longer description explaining purpose, behavior, and key features.
Use action-oriented language.]
[Optional: Event handlers, usage notes, or important caveats.]
"""
```
Example:
```python
class FrameProcessor(BaseObject):
"""Base class for all frame processors in the pipeline.
Frame processors are the building blocks of Pipecat pipelines, they can be
linked to form complex processing pipelines. They receive frames, process
them, and pass them to the next or previous processor in the chain.
Event handlers available:
- on_before_process_frame: Called before a frame is processed
- on_after_process_frame: Called after a frame is processed
Example::
@processor.event_handler("on_before_process_frame")
async def on_before_process_frame(processor, frame):
...
@processor.event_handler("on_after_process_frame")
async def on_after_process_frame(processor, frame):
...
"""
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
"""Initialize the [ClassName].
Args:
param1: Description of param1 and its purpose.
param2: Description of param2. Defaults to [default].
**kwargs: Additional arguments passed to parent class.
"""
```
Example:
```python
def __init__(
self,
*,
api_key: str,
voice_id: Optional[str] = None,
sample_rate: Optional[int] = 22050,
**kwargs,
):
"""Initialize the Neuphonic TTS service.
Args:
api_key: Neuphonic API key for authentication.
voice_id: ID of the voice to use for synthesis.
sample_rate: Audio sample rate in Hz. Defaults to 22050.
**kwargs: Additional arguments passed to parent InterruptibleTTSService.
"""
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
"""One-line summary of what method does.
[Longer description if behavior isn't obvious.]
Args:
param1: Description of param1.
Returns:
Description of return value.
Raises:
ExceptionType: When this exception is raised.
"""
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
"""Put an item into the priority queue.
System frames (`SystemFrame`) have higher priority than any other
frames. If a non-frame item is provided it will have the highest priority.
Args:
item: The item to enqueue.
"""
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
"""One-line description of configuration.
[Explanation of when/how to use this config.]
Parameters:
field1: Description of field1.
field2: Description of field2. Defaults to [default].
"""
field1: Type
field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
"""Configuration parameters for frame processor initialization.
Parameters:
clock: The clock instance for timing operations.
task_manager: The task manager for handling async operations.
observer: Optional observer for monitoring frame processing events.
"""
clock: BaseClock
task_manager: BaseTaskManager
observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
"""One-line description of the enum purpose.
[Longer description of how the enum is used.]
Parameters:
VALUE1: Description of VALUE1.
VALUE2: Description of VALUE2.
"""
VALUE1 = 1
VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].
.. deprecated:: X.X.X
`ClassName` is deprecated and will be removed in a future version.
Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after copyright header and imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged

View File

@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable

View File

@@ -33,7 +33,7 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
- name: Run tests with coverage
run: |

View File

@@ -37,7 +37,7 @@ jobs:
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
- name: Test with pytest
run: |

18
.gitignore vendored
View File

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
/.idea
.idea
.gradle
.next
next-env.d.ts
local.properties
*.log
*.lock
smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
.env
.env*
fly.toml
# Examples
@@ -54,4 +61,9 @@ docs/api/api
.python-version
# Pipecat
whisker_setup.py
whisker_setup.py
# Claude Code - exclude temporary files but keep configuration
.claude/.claude-cache/
.claude/**/*.tmp
.claude/**/*.log

View File

@@ -7,6 +7,129 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.100] - 2026-01-20
### Added
- Added Hathora service to support Hathora-hosted TTS and STT models (only
non-streaming)
(PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
(mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
synthesis.
(PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))
- Added the `additional_headers` param to `WebsocketClientParams`, allowing
`WebsocketClientTransport` to send custom headers on connect, for cases such
as authentication.
(PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))
- Added `UserIdleController` for detecting user idle state, integrated into
`LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
parameter. Emits `on_user_turn_idle` event for application-level handling.
Deprecated `UserIdleProcessor` in favor of the new compositional approach.
(PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
`LLMUserAggregator` for tracking user mute state changes.
(PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))
### Changed
- Enhanced interruption handling in `AsyncAITTSService` by supporting
multi-context WebSocket sessions for more robust context management.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Throttle `UserSpeakingFrame` to broadcast at most every 200ms instead of on
every audio chunk, reducing frame processing overhead during user speech.
(PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))
### Deprecated
- For consistency with other package names, we just deprecated
`pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of
`pipecat.turns.user_mute`.
(PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))
### Fixed
- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
(PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
realtime (speech-to-speech) services:
- `AWSNovaSonicLLMService`
- `GeminiLiveLLMService`
- `OpenAIRealtimeLLMService`
- `GrokRealtimeLLMService`
The issue was that these services weren't pushing `LLMTextFrame`s. Now
they do.
(PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
talking when using `ExternalUserTurnStrategies`.
(PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))
- Fixed an issue where user turn start strategies were not being reset after a
user turn started, causing incorrect strategy behavior.
(PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
preventing incorrect turn starts when words are spoken with pauses between
them.
(PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))
- Fixed an issue where Grok Realtime would error out when running with
SmallWebRTC transport.
(PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
causing an error. See
https://docs.mem0.ai/platform/features/async-mode-default-change.
(PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))
- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
error out. Now it successfully reconnects and "rehydrates" from the context
object.
(PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))
- Fixed `AzureTTSService` transcript formatting issues:
- Punctuation now appears without extra spaces (e.g., "Hello!" instead of
"Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
between characters
(PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))
- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
some cases.
(PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))
- Fixed memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
(PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))
- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
`stop()`, or `cancel()` implementations would prevent `process_frame()` to
continue and therefore `StartFrame`, `EndFrame`, or `CancelFrame` from being
pushed downstream, causing the pipeline to not start or stop properly.
(PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))
- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
constructor to `start()` for better error handling.
(PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))
- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
(PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))
- Fixed a `CambTTSService` issue where client was being initialized in the
constructor which wouldn't allow for proper Pipeline error handling.
(PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))
## [0.0.99] - 2026-01-13
### Added

View File

@@ -81,7 +81,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |

View File

@@ -1 +0,0 @@
- Added Hathora service to support Hathora-hosted TTS and STT models (only non-streaming)

View File

@@ -1 +0,0 @@
- Enhanced interruption handling in `AsyncAITTSService` by supporting multi-context WebSocket sessions for more robust context management.

View File

@@ -1 +0,0 @@
- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.

View File

@@ -1 +0,0 @@
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis.

1
changelog/3406.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue where if you were using `OpenRouterLLMService` with a Gemini model, it wouldn't handle multiple `"system"` messages as expected (and as we do in `GoogleLLMService`), which is to convert subsequent ones into `"user"` messages. Instead, the latest `"system"` message would overwrite the previous ones.

View File

@@ -1,8 +0,0 @@
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
- `AWSNovaSonicLLMService`
- `GeminiLiveLLMService`
- `OpenAIRealtimeLLMService`
- `GrokRealtimeLLMService`
The issue was that these services weren't pushing `LLMTextFrame`s. Now they do.

View File

@@ -1 +0,0 @@
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is talking when using `ExternalUserTurnStrategies`.

View File

@@ -1 +0,0 @@
- Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior.

View File

@@ -1 +0,0 @@
- Added the `additional_headers` param to `WebsocketClientParams`, allowing `WebsocketClientTransport` to send custom headers on connect, for cases such as authentication.

View File

@@ -1 +0,0 @@
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them.

View File

@@ -1 +0,0 @@
- For consistency with other package names, we just deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of `pipecat.turns.user_mute`.

View File

@@ -1 +0,0 @@
- Fixed an issue where Grok Realtime would error out when running with SmallWebRTC transport.

View File

@@ -1 +0,0 @@
- Added `UserIdleController` for detecting user idle state, integrated into `LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout` parameter. Emits `on_user_turn_idle` event for application-level handling. Deprecated `UserIdleProcessor` in favor of the new compositional approach.

View File

@@ -1 +0,0 @@
- Throttle `UserSpeakingFrame` to broadcast at most every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech.

View File

@@ -1 +0,0 @@
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was causing an error. See https://docs.mem0.ai/platform/features/async-mode-default-change.

View File

@@ -1,3 +0,0 @@
- Fixed `AzureTTSService` transcript formatting issues:
- Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters

View File

@@ -1 +0,0 @@
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to `LLMUserAggregator` for tracking user mute state changes.

View File

@@ -0,0 +1 @@
- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to `None` (omitted from connection parameters), improving latency by ~300ms compared to the previous defaults.

View File

@@ -0,0 +1 @@
- Improved the STT TTFB (Time To First Byte) measurement, reporting the delay between when the user stops speaking and when the final transcription is received. Note: Unlike traditional TTFB which measures from a discrete request, STT services receive continuous audio input—so we measure from speech end to final transcript, which captures the latency that matters for voice AI applications. In support of this change, added `finalized` field to `TranscriptionFrame` to indicate when a transcript is the final result for an utterance.

View File

@@ -0,0 +1 @@
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types.

1
changelog/3510.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `reached_upstream_types` and `reached_downstream_types` read-only properties to `PipelineTask` for inspecting current frame filters.

View File

@@ -0,0 +1 @@
- Changed frame filter storage from tuples to sets in `PipelineTask`.

View File

@@ -0,0 +1 @@
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers.

View File

@@ -0,0 +1 @@
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a frame instance by extracting its fields and creating new instances for each direction.

1
changelog/3519.added.md Normal file
View File

@@ -0,0 +1 @@
- `PipelineTask` now automatically adds `RTVIProcessor` and registers `RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.

View File

@@ -0,0 +1 @@
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances.

1
changelog/3519.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Transports now properly broadcast `InputTransportMessageFrame` frames both upstream and downstream instead of only pushing downstream.

1
changelog/3520.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `video_out_codec` parameter to `TransportParams` allowing configuration of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video output in `DailyTransport`.

1
changelog/3523.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.

1
changelog/3525.added.md Normal file
View File

@@ -0,0 +1 @@
- Added new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes Smart Turn input data to be saved to disk

View File

@@ -0,0 +1,2 @@
- Changed default Inworld TTS model from `inworld-tts-1` to
`inworld-tts-1.5-max`.

View File

@@ -10,7 +10,6 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams

View File

@@ -45,7 +45,6 @@ from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

View File

@@ -28,7 +28,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.krisp_viva_turn import KrispTurnParams, KrispVivaTurn
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame

View File

@@ -23,7 +23,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -93,12 +92,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
)
rtvi = RTVIProcessor()
pipeline = Pipeline(
[
transport.input(),
rtvi,
stt,
user_aggregator,
llm,
@@ -115,7 +111,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
observers=[
RTVIObserver(rtvi),
DebugLogObserver(
frame_types={
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),

View File

@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -88,12 +87,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
)
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
stt,
user_aggregator,
llm,
@@ -110,7 +106,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
observers=[
RTVIObserver(rtvi),
DebugLogObserver(
frame_types={
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),

View File

@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -90,12 +89,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
)
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(), # Transport user input
rtvi,
stt,
user_aggregator, # User responses
llm, # LLM
@@ -114,7 +110,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[
RTVIObserver(rtvi),
DebugLogObserver(
frame_types={
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
@@ -123,10 +118,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
],
)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")

View File

@@ -22,7 +22,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,

View File

@@ -17,7 +17,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame, UserImageRequestFrame
from pipecat.frames.frames import LLMRunFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

View File

@@ -22,7 +22,6 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.nova_sonic.llm import AWSNovaSonicLLMService
@@ -114,6 +113,14 @@ async def load_conversation(params: FunctionCallParams):
# "content": f"{AWSNovaSonicLLMService.AWAIT_TRIGGER_ASSISTANT_RESPONSE_INSTRUCTION}",
# }
# )
# If the last message isn't from the user, add a message asking for a recap
if messages and messages[-1].get("role") != "user":
messages.append(
{
"role": "user",
"content": "Can you catch me up on what we were talking about?",
}
)
params.context.set_messages(messages)
await params.llm.reset_conversation()
# await params.llm.trigger_assistant_response()

View File

@@ -59,7 +59,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -255,12 +254,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
),
)
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
stt,
user_aggregator,
memory,
@@ -278,12 +275,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[RTVIObserver(rtvi)],
)
@rtvi.event_handler("on_client_ready")
@task.rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Get personalized greeting based on user memories. Can pass agent_id and run_id as per requirement of the application to manage short term memory or agent specific memory.
greeting = await get_initial_greeting(
memory_client=memory.memory_client, user_id=USER_ID, agent_id=None, run_id=None

View File

@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -87,8 +86,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
)
rtvi = RTVIProcessor()
pipeline = Pipeline(
[
transport.input(), # Transport user input
@@ -108,13 +105,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@rtvi.event_handler("on_client_ready")
@task.rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])

View File

@@ -9,7 +9,6 @@ import asyncio
import io
import json
import os
import re
import shutil
import aiohttp

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2025, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
@@ -125,14 +124,10 @@ async def run_bot(pipecat_transport):
),
)
# RTVI events for Pipecat client UI
rtvi = RTVIProcessor()
pipeline = Pipeline(
[
pipecat_transport.input(),
user_aggregator,
rtvi,
llm, # LLM
EdgeDetectionProcessor(
pipecat_transport._params.video_out_width,
@@ -149,13 +144,11 @@ async def run_bot(pipecat_transport):
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
@rtvi.event_handler("on_client_ready")
@task.rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
logger.info("Pipecat client ready.")
await rtvi.set_bot_ready()
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])

View File

@@ -13,7 +13,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame, ThoughtTranscriptionMessage, TranscriptionMessage
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask

View File

@@ -53,8 +53,6 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.grok.realtime.events import (
SessionProperties,
WebSearchTool,
XSearchTool,
)
from pipecat.services.grok.realtime.llm import GrokRealtimeLLMService
from pipecat.services.llm_service import FunctionCallParams

View File

@@ -4,7 +4,7 @@ This directory contains examples showing how to build voice and multimodal agent
## Setup
1. Follow the [README](../../README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured.
1. Follow the [README](https://github.com/pipecat-ai/pipecat/blob/main/README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured.
> **Run from root directory**: Make sure you are running the steps from the root directory.
@@ -140,4 +140,4 @@ uv run python <example-name> --host 0.0.0.0 --port 8080
- **Connection errors**: Verify API keys in `.env` file
- **Port conflicts**: Use `--port` to change the port
For more examples, visit our the [`pipecat-examples repository](https://github.com/pipecat-ai/pipecat-examples).
For more examples, visit our the [pipecat-examples repository](https://github.com/pipecat-ai/pipecat-examples).

View File

@@ -1,11 +1,11 @@
agent_name = "quickstart"
image = "your_username/quickstart:0.1"
secret_set = "quickstart-secrets"
agent_name = "quickstart-test"
image = "markatdaily/quickstart-test:latest"
secret_set = "quickstart-test-secrets"
agent_profile = "agent-1x"
# RECOMMENDED: Set an image pull secret:
# https://docs.pipecat.ai/deployment/pipecat-cloud/fundamentals/secrets#image-pull-secrets
# image_credentials = "your_image_pull_secret"
image_credentials = "dockerhub-access"
[scaling]
min_agents = 1

View File

@@ -54,7 +54,7 @@ assemblyai = [ "pipecat-ai[websockets-base]" ]
asyncai = [ "pipecat-ai[websockets-base]" ]
aws = [ "aioboto3~=15.5.0", "pipecat-ai[websockets-base]" ]
aws-nova-sonic = [ "aws_sdk_bedrock_runtime~=0.2.0; python_version>='3.12'" ]
azure = [ "azure-cognitiveservices-speech~=1.44.0"]
azure = [ "azure-cognitiveservices-speech~=1.47.0"]
cartesia = [ "cartesia~=2.0.3", "pipecat-ai[websockets-base]" ]
camb = [ "camb-sdk>=1.5.4" ]
cerebras = []

View File

@@ -293,12 +293,13 @@ async def run_eval_pipeline(
"You should only call the eval function if:\n"
"- The user explicitly attempts to answer the question, AND\n"
f"- Their answer can be cleanly evaluated using: {eval_config.eval}\n"
"Ignore greetings, comments, non-answers, or requests for clarification."
"Ignore greetings, comments, non-answers, or requests for clarification.\n"
"Numerical word answers are allowed (e.g., 'five' is the same as '5').\n"
)
if eval_config.eval_speaks_first:
system_prompt = f"You are an evaluation agent, be extremly brief. Numerical word answers are allowed. You will start the conversation by saying: '{example_prompt}'. {common_system_prompt}"
system_prompt = f"You are an evaluation agent, be extremly brief. You will start the conversation by saying: '{example_prompt}'. {common_system_prompt}"
else:
system_prompt = f"You are an evaluation agent, be extremly brief. Numerical word answers are allowed. First, ask one question: {example_prompt}. {common_system_prompt}"
system_prompt = f"You are an evaluation agent, be extremly brief. First, ask one question: {example_prompt}. {common_system_prompt}"
messages = [
{

View File

@@ -137,6 +137,7 @@ TESTS_07 = [
# ("07zd-interruptible-aicoustics.py", EVAL_SIMPLE_MATH),
("07ze-interruptible-hume.py", EVAL_SIMPLE_MATH),
("07zf-interruptible-gradium.py", EVAL_SIMPLE_MATH),
("07zg-interruptible-camb.py", EVAL_SIMPLE_MATH),
("07zh-interruptible-hathora.py", EVAL_SIMPLE_MATH),
# Needs a local XTTS docker instance running.
# ("07i-interruptible-xtts.py", EVAL_SIMPLE_MATH),

View File

@@ -22,7 +22,7 @@ from pathlib import Path
try:
import numpy as np
import soundfile as sf
import soundfile as sf # noqa: F401
from audio_file_utils import calculate_audio_stats, read_audio_file, write_audio_file
except ImportError as e:
print(f"Error: Missing required dependencies: {e}")

View File

@@ -23,7 +23,7 @@ from pathlib import Path
try:
import numpy as np
import soundfile as sf
import soundfile as sf # noqa: F401
from audio_file_utils import read_audio_file
except ImportError as e:
print(f"Error: Missing required dependencies: {e}")

View File

@@ -10,7 +10,7 @@ import base64
import copy
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional, TypedDict
from typing import Any, Dict, List, Optional, TypedDict
from loguru import logger

View File

@@ -9,7 +9,7 @@
import base64
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, TypedDict
from typing import Any, Dict, List, Optional, TypedDict
from loguru import logger
from openai import NotGiven

View File

@@ -7,10 +7,8 @@
"""OpenAI LLM adapter for Pipecat."""
import copy
import json
from typing import Any, Dict, List, TypedDict
from openai._types import NOT_GIVEN as OPEN_AI_NOT_GIVEN
from openai._types import NotGiven as OpenAINotGiven
from openai.types.chat import (
ChatCompletionMessageParam,

View File

@@ -9,7 +9,6 @@
This module provides an audio filter implementation using Krisp VIVA SDK.
"""
import asyncio
import os
import numpy as np

View File

@@ -16,6 +16,7 @@ import numpy as np
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import BaseSmartTurn
from pipecat.utils.env import env_truthy
try:
import onnxruntime as ort
@@ -48,6 +49,8 @@ class LocalSmartTurnAnalyzerV3(BaseSmartTurn):
"""
super().__init__(**kwargs)
self._log_data = env_truthy("PIPECAT_SMART_TURN_LOG_DATA", default=False)
if not smart_turn_model_path:
# Load bundled model
model_name = "smart-turn-v3.2-cpu.onnx"
@@ -81,6 +84,49 @@ class LocalSmartTurnAnalyzerV3(BaseSmartTurn):
logger.debug("Loaded Local Smart Turn v3.x")
def _write_audio_to_wav(
self, audio_array: np.ndarray, sample_rate: int = 16000, suffix: str = ""
) -> None:
"""Write audio data to a WAV file in a background thread.
Args:
audio_array: The audio data as a numpy array (float32, normalized to [-1, 1]).
sample_rate: The sample rate of the audio data.
suffix: Optional suffix to append to the filename (e.g., "_raw", "_padded").
"""
import os
import threading
import wave
from datetime import datetime
# Generate filename with current timestamp (millisecond precision)
timestamp = datetime.now().strftime("%Y-%m-%d__%H:%M:%S.%f")[:-3]
log_dir = "./smart_turn_audio_log"
os.makedirs(log_dir, exist_ok=True)
filename = os.path.join(log_dir, f"{timestamp}{suffix}.wav")
# Make a copy of the audio data to avoid issues with the array being modified
audio_copy = audio_array.copy()
def write_wav():
try:
# Convert float32 audio to int16 for WAV file
audio_int16 = (audio_copy * 32767).astype(np.int16)
with wave.open(filename, "wb") as wav_file:
wav_file.setnchannels(1) # Mono
wav_file.setsampwidth(2) # 2 bytes for int16
wav_file.setframerate(sample_rate)
wav_file.writeframes(audio_int16.tobytes())
logger.debug(f"Wrote audio to {filename}")
except Exception as e:
logger.error(f"Failed to write audio to {filename}: {e}")
# Start background thread to write the WAV file
thread = threading.Thread(target=write_wav, daemon=True)
thread.start()
def _predict_endpoint(self, audio_array: np.ndarray) -> Dict[str, Any]:
"""Predict end-of-turn using local ONNX model."""
@@ -95,6 +141,8 @@ class LocalSmartTurnAnalyzerV3(BaseSmartTurn):
return np.pad(audio_array, (padding, 0), mode="constant", constant_values=0)
return audio_array
audio_for_logging = audio_array
# Truncate to 8 seconds (keeping the end) or pad to 8 seconds
audio_array = truncate_audio_to_last_n_seconds(audio_array, n_seconds=8)
@@ -122,6 +170,10 @@ class LocalSmartTurnAnalyzerV3(BaseSmartTurn):
# Make prediction (1 for Complete, 0 for Incomplete)
prediction = 1 if probability > 0.5 else 0
if self._log_data:
suffix = "_complete" if prediction == 1 else "_incomplete"
self._write_audio_to_wav(audio_for_logging, sample_rate=16000, suffix=suffix)
return {
"prediction": prediction,
"probability": probability,

View File

@@ -426,12 +426,15 @@ class TranscriptionFrame(TextFrame):
timestamp: When the transcription occurred.
language: Detected or specified language of the speech.
result: Raw result from the STT service.
finalized: Whether this is the final transcription for an utterance.
Set by STT services that support commit/finalize signals.
"""
user_id: str
timestamp: str
language: Optional[Language] = None
result: Optional[Any] = None
finalized: bool = False
def __str__(self):
return f"{self.name}(user: {self.user_id}, text: [{self.text}], language: {self.language}, timestamp: {self.timestamp})"

View File

@@ -15,7 +15,7 @@ import asyncio
import importlib.util
import os
from pathlib import Path
from typing import Any, AsyncIterable, Dict, Iterable, List, Optional, Tuple, Type
from typing import Any, AsyncIterable, Dict, Iterable, List, Optional, Set, Tuple, Type
from loguru import logger
from pydantic import BaseModel, ConfigDict, Field
@@ -49,6 +49,7 @@ from pipecat.pipeline.pipeline import Pipeline, PipelineSink, PipelineSource
from pipecat.pipeline.task_observer import TaskObserver
from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor, FrameProcessorSetup
from pipecat.processors.frameworks.rtvi import RTVIObserverParams, RTVIProcessor
from pipecat.utils.asyncio.task_manager import BaseTaskManager, TaskManager, TaskManagerParams
from pipecat.utils.tracing.setup import is_tracing_available
from pipecat.utils.tracing.turn_trace_observer import TurnTraceObserver
@@ -225,9 +226,12 @@ class PipelineTask(BasePipelineTask):
conversation_id: Optional[str] = None,
enable_tracing: bool = False,
enable_turn_tracking: bool = True,
enable_rtvi: bool = True,
idle_timeout_frames: Tuple[Type[Frame], ...] = (BotSpeakingFrame, UserSpeakingFrame),
idle_timeout_secs: Optional[float] = IDLE_TIMEOUT_SECS,
observers: Optional[List[BaseObserver]] = None,
rtvi_processor: Optional[RTVIProcessor] = None,
rtvi_observer_params: Optional[RTVIObserverParams] = None,
task_manager: Optional[BaseTaskManager] = None,
):
"""Initialize the PipelineTask.
@@ -244,6 +248,7 @@ class PipelineTask(BasePipelineTask):
check_dangling_tasks: Whether to check for processors' tasks finishing properly.
clock: Clock implementation for timing operations.
conversation_id: Optional custom ID for the conversation.
enable_rtvi: Whether to automatically add RTVI support to the pipeline.
enable_tracing: Whether to enable tracing.
enable_turn_tracking: Whether to enable turn tracking.
idle_timeout_frames: A tuple with the frames that should trigger an idle
@@ -252,6 +257,8 @@ class PipelineTask(BasePipelineTask):
None. If a pipeline is idle the pipeline task will be cancelled
automatically.
observers: List of observers for monitoring pipeline execution.
rtvi_observer_params: The RTVI observer parameter to use if RTVI is enabled.
rtvi_processor: The RTVI processor to add if RTVI is enabled.
task_manager: Optional task manager for handling asyncio tasks.
"""
super().__init__()
@@ -306,6 +313,16 @@ class PipelineTask(BasePipelineTask):
self._heartbeat_push_task: Optional[asyncio.Task] = None
self._heartbeat_monitor_task: Optional[asyncio.Task] = None
# RTVI support
self._rtvi = None
if enable_rtvi:
self._rtvi = rtvi_processor or RTVIProcessor()
observers.append(self._rtvi.create_rtvi_observer(params=rtvi_observer_params))
@self.rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi: RTVIProcessor):
await rtvi.set_bot_ready()
# This is the idle event. When selected frames are pushed from any
# processor we consider the pipeline is not idle. We use an observer
# which will be listening any part of the pipeline.
@@ -335,7 +352,8 @@ class PipelineTask(BasePipelineTask):
# allows us to receive and react to downstream frames.
source = PipelineSource(self._source_push_frame, name=f"{self}::Source")
sink = PipelineSink(self._sink_push_frame, name=f"{self}::Sink")
self._pipeline = Pipeline([pipeline], source=source, sink=sink)
processors = [self._rtvi, pipeline] if self._rtvi else [pipeline]
self._pipeline = Pipeline(processors, source=source, sink=sink)
# The task observer acts as a proxy to the provided observers. This way,
# we only need to pass a single observer (using the StartFrame) which
@@ -348,8 +366,8 @@ class PipelineTask(BasePipelineTask):
# in. This is mainly for efficiency reason because each event handler
# creates a task and most likely you only care about one or two frame
# types.
self._reached_upstream_types: Tuple[Type[Frame], ...] = ()
self._reached_downstream_types: Tuple[Type[Frame], ...] = ()
self._reached_upstream_types: Set[Type[Frame]] = set()
self._reached_downstream_types: Set[Type[Frame]] = set()
self._register_event_handler("on_frame_reached_upstream")
self._register_event_handler("on_frame_reached_downstream")
self._register_event_handler("on_idle_timeout")
@@ -398,6 +416,35 @@ class PipelineTask(BasePipelineTask):
"""
return self._turn_trace_observer
@property
def rtvi(self) -> RTVIProcessor:
"""Get the RTVI processor if RTVI is enabled.
Returns:
The RTVI processor added to the pipeline when RTVI is enabled.
"""
if not self._rtvi:
raise Exception(f"{self} RTVI is not enabled.")
return self._rtvi
@property
def reached_upstream_types(self) -> Tuple[Type[Frame], ...]:
"""Get the currently configured upstream frame type filters.
Returns:
Tuple of frame types that trigger the on_frame_reached_upstream event.
"""
return tuple(self._reached_upstream_types)
@property
def reached_downstream_types(self) -> Tuple[Type[Frame], ...]:
"""Get the currently configured downstream frame type filters.
Returns:
Tuple of frame types that trigger the on_frame_reached_downstream event.
"""
return tuple(self._reached_downstream_types)
def event_handler(self, event_name: str):
"""Decorator for registering event handlers.
@@ -441,7 +488,7 @@ class PipelineTask(BasePipelineTask):
Args:
types: Tuple of frame types to monitor for upstream events.
"""
self._reached_upstream_types = types
self._reached_upstream_types = set(types)
def set_reached_downstream_filter(self, types: Tuple[Type[Frame], ...]):
"""Set which frame types trigger the on_frame_reached_downstream event.
@@ -449,7 +496,23 @@ class PipelineTask(BasePipelineTask):
Args:
types: Tuple of frame types to monitor for downstream events.
"""
self._reached_downstream_types = types
self._reached_downstream_types = set(types)
def add_reached_upstream_filter(self, types: Tuple[Type[Frame], ...]):
"""Add frame types to trigger the on_frame_reached_upstream event.
Args:
types: Tuple of frame types to add to upstream monitoring.
"""
self._reached_upstream_types.update(types)
def add_reached_downstream_filter(self, types: Tuple[Type[Frame], ...]):
"""Add frame types to trigger the on_frame_reached_downstream event.
Args:
types: Tuple of frame types to add to downstream monitoring.
"""
self._reached_downstream_types.update(types)
def has_finished(self) -> bool:
"""Check if the pipeline task has finished execution.
@@ -749,7 +812,7 @@ class PipelineTask(BasePipelineTask):
pipeline to be stopped (e.g. EndTaskFrame) in which case we would send
an EndFrame down the pipeline.
"""
if isinstance(frame, self._reached_upstream_types):
if isinstance(frame, tuple(self._reached_upstream_types)):
await self._call_event_handler("on_frame_reached_upstream", frame)
if isinstance(frame, EndTaskFrame):
@@ -788,7 +851,7 @@ class PipelineTask(BasePipelineTask):
processors have handled the EndFrame and therefore we can exit the task
cleanly.
"""
if isinstance(frame, self._reached_downstream_types):
if isinstance(frame, tuple(self._reached_downstream_types)):
await self._call_event_handler("on_frame_reached_downstream", frame)
if isinstance(frame, StartFrame):

View File

@@ -833,7 +833,7 @@ class LLMAssistantAggregator(LLMContextAggregator):
"id": frame.tool_call_id,
"function": {
"name": frame.function_name,
"arguments": json.dumps(frame.arguments),
"arguments": json.dumps(frame.arguments, ensure_ascii=False),
},
"type": "function",
}
@@ -866,7 +866,7 @@ class LLMAssistantAggregator(LLMContextAggregator):
# Update context with the function call result
if frame.result:
result = json.dumps(frame.result)
result = json.dumps(frame.result, ensure_ascii=False)
self._update_function_call_result(frame.function_name, frame.tool_call_id, result)
else:
self._update_function_call_result(frame.function_name, frame.tool_call_id, "COMPLETED")

View File

@@ -34,7 +34,6 @@ from PIL import Image
from pipecat.adapters.base_llm_adapter import BaseLLMAdapter
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.frames.frames import AudioRawFrame, Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
# JSON custom encoder to handle bytes arrays so that we can log contexts
# with images to the console.

View File

@@ -18,7 +18,7 @@ from typing import List
from loguru import logger
from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

View File

@@ -12,7 +12,9 @@ management, and frame flow control mechanisms.
"""
import asyncio
import dataclasses
import traceback
from copy import deepcopy
from dataclasses import dataclass
from enum import Enum
from typing import (
@@ -779,8 +781,40 @@ class FrameProcessor(BaseObject):
frame_cls: The class of the frame to be broadcasted.
**kwargs: Keyword arguments to be passed to the frame's constructor.
"""
await self.push_frame(frame_cls(**kwargs))
await self.push_frame(frame_cls(**kwargs), FrameDirection.UPSTREAM)
await self.push_frame(frame_cls(**deepcopy(kwargs)))
await self.push_frame(frame_cls(**deepcopy(kwargs)), FrameDirection.UPSTREAM)
async def broadcast_frame_instance(self, frame: Frame):
"""Broadcasts a frame instance upstream and downstream.
This method creates two new frame instances copying all fields from the
original frame except `id` and `name`, which get fresh values.
Args:
frame: The frame instance to broadcast.
Note:
Prefer using `broadcast_frame()` when possible, as it is more
efficient. This method should only be used when you are not the
creator of the frame and need to broadcast an existing instance.
"""
frame_cls = type(frame)
init_fields = {f.name: getattr(frame, f.name) for f in dataclasses.fields(frame) if f.init}
extra_fields = {
f.name: getattr(frame, f.name)
for f in dataclasses.fields(frame)
if not f.init and f.name not in ("id", "name")
}
new_frame = frame_cls(**deepcopy(init_fields))
for k, v in deepcopy(extra_fields).items():
setattr(new_frame, k, v)
await self.push_frame(new_frame)
new_frame = frame_cls(**deepcopy(init_fields))
for k, v in deepcopy(extra_fields).items():
setattr(new_frame, k, v)
await self.push_frame(new_frame, FrameDirection.UPSTREAM)
async def __start(self, frame: StartFrame):
"""Handle the start frame to initialize processor state.
@@ -950,7 +984,8 @@ class FrameProcessor(BaseObject):
# Process current queue and keep UninterruptibleFrame frames.
while not self.__process_queue.empty():
item = self.__process_queue.get_nowait()
if isinstance(item, UninterruptibleFrame):
frame = item[0]
if isinstance(frame, UninterruptibleFrame):
new_queue.put_nowait(item)
self.__process_queue.task_done()

View File

@@ -1100,13 +1100,11 @@ class RTVIObserver(BaseObserver):
if (
isinstance(frame, (UserStartedSpeakingFrame, UserStoppedSpeakingFrame))
and (direction == FrameDirection.DOWNSTREAM)
and self._params.user_speaking_enabled
):
await self._handle_interruptions(frame)
elif (
isinstance(frame, (BotStartedSpeakingFrame, BotStoppedSpeakingFrame))
and (direction == FrameDirection.UPSTREAM)
and self._params.bot_speaking_enabled
):
await self._handle_bot_speaking(frame)
@@ -1413,6 +1411,18 @@ class RTVIProcessor(FrameProcessor):
self._registered_services[service.name] = service
def create_rtvi_observer(self, *, params: Optional[RTVIObserverParams] = None, **kwargs):
"""Creates a new RTVI Observer.
Args:
params: Settings to enable/disable specific messages.
**kwargs: Additional arguments passed to the observer.
Returns:
A new RTVI observer.
"""
return RTVIObserver(self, params=params, **kwargs)
async def set_client_ready(self):
"""Mark the client as ready and trigger the ready event."""
self._client_ready = True

View File

@@ -263,7 +263,7 @@ def _setup_webrtc_routes(
"""Handle WebRTC offer requests via SmallWebRTCRequestHandler."""
# Prepare runner arguments with the callback to run your bot
async def webrtc_connection_callback(connection):
async def webrtc_connection_callback(connection: SmallWebRTCConnection):
bot_module = _get_bot_module()
runner_args = SmallWebRTCRunnerArguments(
@@ -406,13 +406,7 @@ def _setup_whatsapp_routes(app: FastAPI):
return
try:
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
SmallWebRTCRequest,
SmallWebRTCRequestHandler,
)
from pipecat.transports.whatsapp.api import WhatsAppWebhookRequest
from pipecat.transports.whatsapp.client import WhatsAppClient
except ImportError as e:

View File

@@ -126,7 +126,7 @@ class ProtobufFrameSerializer(FrameSerializer):
if "pts" in args_dict:
del args_dict["pts"]
# Special handling for MessageFrame -> OutputTransportMessageUrgentFrame
# Special handling for MessageFrame -> InputTransportMessageFrame
if class_name == MessageFrame:
try:
msg = json.loads(args_dict["data"])

View File

@@ -148,11 +148,11 @@ class AIService(FrameProcessor):
await super().process_frame(frame, direction)
if isinstance(frame, StartFrame):
await self.start(frame)
elif isinstance(frame, CancelFrame):
await self.cancel(frame)
await self._start(frame)
elif isinstance(frame, EndFrame):
await self.stop(frame)
await self._stop(frame)
elif isinstance(frame, CancelFrame):
await self._cancel(frame)
async def process_generator(self, generator: AsyncGenerator[Frame | None, None]):
"""Process frames from an async generator.
@@ -169,3 +169,21 @@ class AIService(FrameProcessor):
await self.push_error_frame(f)
else:
await self.push_frame(f)
async def _start(self, frame: StartFrame):
try:
await self.start(frame)
except Exception as e:
logger.error(f"{self}: exception processing {frame}: {e}")
async def _stop(self, frame: EndFrame):
try:
await self.stop(frame)
except Exception as e:
logger.error(f"{self}: exception processing {frame}: {e}")
async def _cancel(self, frame: CancelFrame):
try:
await self.cancel(frame)
except Exception as e:
logger.error(f"{self}: exception processing {frame}: {e}")

View File

@@ -161,7 +161,7 @@ class AssemblyAISTTService(WebsocketSTTService):
"""
await super().process_frame(frame, direction)
if isinstance(frame, VADUserStartedSpeakingFrame):
await self.start_ttfb_metrics()
pass
elif isinstance(frame, VADUserStoppedSpeakingFrame):
if (
self._vad_force_turn_endpoint
@@ -354,7 +354,6 @@ class AssemblyAISTTService(WebsocketSTTService):
"""Handle transcription results."""
if not message.transcript:
return
await self.stop_ttfb_metrics()
if message.end_of_turn and (
not self._connection_params.formatted_finals or message.turn_is_formatted
):

View File

@@ -296,6 +296,7 @@ class AWSNovaSonicLLMService(LLMService):
self._user_text_buffer = ""
self._assistant_text_buffer = ""
self._completed_tool_calls = set()
self._audio_input_started = False
file_path = files("pipecat.services.aws.nova_sonic").joinpath("ready.wav")
with wave.open(file_path.open("rb"), "rb") as wav_file:
@@ -532,14 +533,30 @@ class AWSNovaSonicLLMService(LLMService):
if system_instruction:
await self._send_text_event(text=system_instruction, role=Role.SYSTEM)
# Send conversation history
for message in llm_connection_params["messages"]:
# Send conversation history (except for the last message if it's from the
# user, which we'll send as interactive after starting audio input)
messages = llm_connection_params["messages"]
last_user_message = None
for i, message in enumerate(messages):
# logger.debug(f"Seeding conversation history with message: {message}")
await self._send_text_event(text=message.text, role=message.role)
is_last_message = i == len(messages) - 1
if is_last_message and message.role == Role.USER:
# Save for sending after audio input starts
last_user_message = message
else:
await self._send_text_event(text=message.text, role=message.role)
# Start audio input
await self._send_audio_input_start_event()
# Now send the last user message as interactive to trigger bot response
if last_user_message:
# logger.debug(
# f"Sending last user message as interactive to trigger bot response: {last_user_message}")
await self._send_text_event(
text=last_user_message.text, role=last_user_message.role, interactive=True
)
# Start receiving events
self._receive_task = self.create_task(self._receive_task_handler())
@@ -602,6 +619,7 @@ class AWSNovaSonicLLMService(LLMService):
self._user_text_buffer = ""
self._assistant_text_buffer = ""
self._completed_tool_calls = set()
self._audio_input_started = False
logger.info("Finished disconnecting")
except Exception as e:
@@ -727,8 +745,18 @@ class AWSNovaSonicLLMService(LLMService):
}}
'''
await self._send_client_event(audio_content_start)
self._audio_input_started = True
async def _send_text_event(self, text: str, role: Role):
async def _send_text_event(self, text: str, role: Role, interactive: bool = False):
"""Send a text event to the LLM.
Args:
text: The text content to send.
role: The role associated with the text (e.g., USER, ASSISTANT, SYSTEM).
interactive: Whether the content is interactive. Defaults to False.
False: conversation history or system instruction, sent prior to interactive audio
True: text input sent during (or at the start of) interactive audio
"""
if not self._stream or not self._prompt_name or not text:
return
@@ -741,7 +769,7 @@ class AWSNovaSonicLLMService(LLMService):
"promptName": "{self._prompt_name}",
"contentName": "{content_name}",
"type": "TEXT",
"interactive": true,
"interactive": {json.dumps(interactive)},
"role": "{role.value}",
"textInputConfiguration": {{
"mediaType": "text/plain"
@@ -779,7 +807,7 @@ class AWSNovaSonicLLMService(LLMService):
await self._send_client_event(text_content_end)
async def _send_user_audio_event(self, audio: bytes):
if not self._stream:
if not self._stream or not self._audio_input_started:
return
blob = base64.b64encode(audio)

View File

@@ -10,7 +10,6 @@ This module provides a WebSocket-based connection to AWS Transcribe for real-tim
speech-to-text transcription with support for multiple languages and audio formats.
"""
import asyncio
import json
import os
import random
@@ -159,7 +158,6 @@ class AWSTranscribeSTTService(WebsocketSTTService):
await self._websocket.send(event_message)
# Start metrics after first chunk sent
await self.start_processing_metrics()
await self.start_ttfb_metrics()
except Exception as e:
yield ErrorFrame(error=f"Error sending audio: {e}")
@@ -471,7 +469,6 @@ class AWSTranscribeSTTService(WebsocketSTTService):
is_final = not result.get("IsPartial", True)
if transcript:
await self.stop_ttfb_metrics()
if is_final:
await self.push_frame(
TranscriptionFrame(

View File

@@ -10,7 +10,6 @@ This module provides integration with Amazon Polly for text-to-speech synthesis,
supporting multiple languages, voices, and SSML features.
"""
import asyncio
import os
from typing import AsyncGenerator, List, Optional

View File

@@ -17,3 +17,8 @@ with warnings.catch_warnings():
DeprecationWarning,
stacklevel=2,
)
__all__ = [
"AWSNovaSonicLLMService",
"Params",
]

View File

@@ -8,8 +8,6 @@
from typing import Optional
from loguru import logger
from pipecat.transcriptions.language import Language, resolve_language

View File

@@ -15,7 +15,6 @@ import io
from typing import AsyncGenerator
import aiohttp
from loguru import logger
from PIL import Image
from pipecat.frames.frames import ErrorFrame, Frame, URLImageRawFrame

View File

@@ -116,7 +116,6 @@ class AzureSTTService(STTService):
"""
try:
await self.start_processing_metrics()
await self.start_ttfb_metrics()
if self._audio_stream:
self._audio_stream.write(audio)
yield None
@@ -191,7 +190,6 @@ class AzureSTTService(STTService):
self, transcript: str, is_final: bool, language: Optional[Language] = None
):
"""Handle a transcription result with tracing."""
await self.stop_ttfb_metrics()
await self.stop_processing_metrics()
def _on_handle_recognized(self, event):

View File

@@ -90,7 +90,7 @@ class AzureBaseTTSService:
emphasis: Emphasis level for speech ("strong", "moderate", "reduced").
language: Language for synthesis. Defaults to English (US).
pitch: Voice pitch adjustment (e.g., "+10%", "-5Hz", "high").
rate: Speech rate multiplier. Defaults to "1.05".
rate: Speech rate adjustment (e.g., "1.0", "1.25", "slow", "fast").
role: Voice role for expression (e.g., "YoungAdultFemale").
style: Speaking style (e.g., "cheerful", "sad", "excited").
style_degree: Intensity of the speaking style (0.01 to 2.0).
@@ -100,7 +100,7 @@ class AzureBaseTTSService:
emphasis: Optional[str] = None
language: Optional[Language] = Language.EN_US
pitch: Optional[str] = None
rate: Optional[str] = "1.05"
rate: Optional[str] = None
role: Optional[str] = None
style: Optional[str] = None
style_degree: Optional[str] = None
@@ -185,7 +185,9 @@ class AzureBaseTTSService:
if self._settings["volume"]:
prosody_attrs.append(f"volume='{self._settings['volume']}'")
ssml += f"<prosody {' '.join(prosody_attrs)}>"
# Only wrap in prosody tag if there are prosody attributes
if prosody_attrs:
ssml += f"<prosody {' '.join(prosody_attrs)}>"
if self._settings["emphasis"]:
ssml += f"<emphasis level='{self._settings['emphasis']}'>"
@@ -195,7 +197,8 @@ class AzureBaseTTSService:
if self._settings["emphasis"]:
ssml += "</emphasis>"
ssml += "</prosody>"
if prosody_attrs:
ssml += "</prosody>"
if self._settings["style"]:
ssml += "</mstts:express-as>"
@@ -277,6 +280,11 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
self._started = False
self._first_chunk = True
self._cumulative_audio_offset: float = 0.0 # Cumulative audio duration in seconds
self._current_sentence_base_offset: float = 0.0 # Base offset for current sentence
self._current_sentence_duration: float = 0.0 # Duration from Azure callback
self._current_sentence_max_word_offset: float = (
0.0 # Max word boundary offset seen in current sentence (for 8kHz workaround)
)
self._last_word: Optional[str] = None # Track last word for punctuation merging
self._last_timestamp: Optional[float] = None # Track last timestamp
@@ -386,8 +394,14 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
word = evt.text
sentence_relative_seconds = evt.audio_offset / 10_000_000.0
# Add cumulative offset to get absolute timestamp across sentences
absolute_seconds = self._cumulative_audio_offset + sentence_relative_seconds
# Use base offset captured at start of run_tts to avoid race conditions
# with callbacks from overlapping TTS requests
absolute_seconds = self._current_sentence_base_offset + sentence_relative_seconds
# Track max word offset for accurate cumulative timing
# (audio_duration from Azure doesn't always match word boundary offsets at 8kHz)
if sentence_relative_seconds > self._current_sentence_max_word_offset:
self._current_sentence_max_word_offset = sentence_relative_seconds
if not word:
return
@@ -492,9 +506,9 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
self._last_word = None
self._last_timestamp = None
# Update cumulative audio offset for next sentence
# Store duration for cumulative offset calculation
if evt.result and evt.result.audio_duration:
self._cumulative_audio_offset += evt.result.audio_duration.total_seconds()
self._current_sentence_duration = evt.result.audio_duration.total_seconds()
self._audio_queue.put_nowait(None) # Signal completion
@@ -530,6 +544,9 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
self._started = False
self._first_chunk = True
self._cumulative_audio_offset = 0.0
self._current_sentence_base_offset = 0.0
self._current_sentence_duration = 0.0
self._current_sentence_max_word_offset = 0.0
self._last_word = None
self._last_timestamp = None
@@ -604,6 +621,12 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
self._started = True
self._first_chunk = True
# Capture base offset BEFORE starting synthesis to avoid race conditions
# Word boundary callbacks will use this value
self._current_sentence_base_offset = self._cumulative_audio_offset
self._current_sentence_duration = 0.0
self._current_sentence_max_word_offset = 0.0
ssml = self._construct_ssml(text)
self._speech_synthesizer.speak_ssml_async(ssml)
await self.start_tts_usage_metrics(text)
@@ -627,6 +650,16 @@ class AzureTTSService(WordTTSService, AzureBaseTTSService):
)
yield frame
# Update cumulative offset for next sentence
# At 8kHz, Azure's audio_duration doesn't match word boundary offsets,
# so we use max_word_offset as a workaround. At other sample rates,
# audio_duration is accurate.
# TODO: Remove after Azure fixes word boundary timing at 8kHz
if self.sample_rate == 8000:
self._cumulative_audio_offset += self._current_sentence_max_word_offset
else:
self._cumulative_audio_offset += self._current_sentence_duration
except Exception as e:
yield ErrorFrame(error=f"Unknown error occurred: {e}")
yield TTSStoppedFrame()

View File

@@ -199,9 +199,10 @@ class CambTTSService(TTSService):
"""
super().__init__(sample_rate=sample_rate, **kwargs)
params = params or CambTTSService.InputParams()
self._api_key = api_key
self._timeout = timeout
self._client = AsyncCambAI(api_key=api_key, timeout=timeout)
params = params or CambTTSService.InputParams()
# Warn if sample rate doesn't match model's supported rate
if sample_rate and sample_rate != MODEL_SAMPLE_RATES.get(model):
@@ -222,6 +223,8 @@ class CambTTSService(TTSService):
self.set_voice(str(voice_id))
self._voice_id = voice_id
self._client = None
def can_generate_metrics(self) -> bool:
"""Check if this service can generate processing metrics.
@@ -249,6 +252,8 @@ class CambTTSService(TTSService):
"""
await super().start(frame)
self._client = AsyncCambAI(api_key=self._api_key, timeout=self._timeout)
# Use model-specific sample rate if not explicitly specified
if not self._init_sample_rate:
self._sample_rate = MODEL_SAMPLE_RATES.get(self.model_name, 22050)
@@ -289,6 +294,8 @@ class CambTTSService(TTSService):
await self.start_tts_usage_metrics(text)
yield TTSStartedFrame()
assert self._client is not None, "Camb.ai TTS service not initialized"
# Buffer for aligning chunks to 2-byte boundaries (16-bit PCM)
audio_buffer = b""

View File

@@ -207,9 +207,8 @@ class CartesiaSTTService(WebsocketSTTService):
await super().cancel(frame)
await self._disconnect()
async def start_metrics(self):
async def _start_metrics(self):
"""Start performance metrics collection for transcription processing."""
await self.start_ttfb_metrics()
await self.start_processing_metrics()
async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -222,7 +221,7 @@ class CartesiaSTTService(WebsocketSTTService):
await super().process_frame(frame, direction)
if isinstance(frame, VADUserStartedSpeakingFrame):
await self.start_metrics()
await self._start_metrics()
elif isinstance(frame, VADUserStoppedSpeakingFrame):
# Send finalize command to flush the transcription session
if self._websocket and self._websocket.state is State.OPEN:
@@ -342,7 +341,6 @@ class CartesiaSTTService(WebsocketSTTService):
pass
if len(transcript) > 0:
await self.stop_ttfb_metrics()
if is_final:
await self.push_frame(
TranscriptionFrame(

View File

@@ -6,8 +6,6 @@
"""Cerebras LLM service implementation using OpenAI-compatible interface."""
from typing import List
from loguru import logger
from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams

View File

@@ -27,7 +27,6 @@ from pipecat.frames.frames import (
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.stt_service import WebsocketSTTService
from pipecat.transcriptions.language import Language
from pipecat.utils.time import time_now_iso8601
@@ -660,6 +659,8 @@ class DeepgramFluxSTTService(WebsocketSTTService):
average_confidence = self._calculate_average_confidence(data)
if not self._params.min_confidence or average_confidence > self._params.min_confidence:
# EndOfTurn means Flux has determined the turn is complete,
# so this TranscriptionFrame is always finalized
await self.push_frame(
TranscriptionFrame(
transcript,
@@ -667,6 +668,7 @@ class DeepgramFluxSTTService(WebsocketSTTService):
time_now_iso8601(),
self._language,
result=data,
finalized=True,
)
)
else:

View File

@@ -276,9 +276,8 @@ class DeepgramSTTService(STTService):
# GH issue: https://github.com/deepgram/deepgram-python-sdk/issues/570
await self._connection.finish()
async def start_metrics(self):
"""Start TTFB and processing metrics collection."""
await self.start_ttfb_metrics()
async def _start_metrics(self):
"""Start processing metrics collection for this utterance."""
await self.start_processing_metrics()
async def _on_error(self, *args, **kwargs):
@@ -292,7 +291,7 @@ class DeepgramSTTService(STTService):
await self._connect()
async def _on_speech_started(self, *args, **kwargs):
await self.start_metrics()
await self._start_metrics()
await self._call_event_handler("on_speech_started", *args, **kwargs)
await self.broadcast_frame(UserStartedSpeakingFrame)
if self._should_interrupt:
@@ -320,8 +319,12 @@ class DeepgramSTTService(STTService):
language = result.channel.alternatives[0].languages[0]
language = Language(language)
if len(transcript) > 0:
await self.stop_ttfb_metrics()
if is_final:
# Check if this response is from a finalize() call.
# Only mark as finalized when both we requested it AND Deepgram confirms it.
from_finalize = getattr(result, "from_finalize", False)
if from_finalize:
self.confirm_finalize()
await self.push_frame(
TranscriptionFrame(
transcript,
@@ -356,8 +359,10 @@ class DeepgramSTTService(STTService):
if isinstance(frame, VADUserStartedSpeakingFrame) and not self.vad_enabled:
# Start metrics if Deepgram VAD is disabled & pipeline VAD has detected speech
await self.start_metrics()
await self._start_metrics()
elif isinstance(frame, VADUserStoppedSpeakingFrame):
# https://developers.deepgram.com/docs/finalize
# Mark that we're awaiting a from_finalize response
self.request_finalize()
await self._connection.finalize()
logger.trace(f"Triggered finalize event on: {frame.name=}, {direction=}")

View File

@@ -363,9 +363,6 @@ class DeepgramSageMakerSTTService(STTService):
if not transcript.strip():
return
# Stop TTFB metrics on first transcript
await self.stop_ttfb_metrics()
is_final = parsed.get("is_final", False)
speech_final = parsed.get("speech_final", False)
@@ -417,9 +414,8 @@ class DeepgramSageMakerSTTService(STTService):
"""
pass
async def start_metrics(self):
"""Start TTFB and processing metrics collection."""
await self.start_ttfb_metrics()
async def _start_metrics(self):
"""Start processing metrics collection."""
await self.start_processing_metrics()
async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -433,7 +429,7 @@ class DeepgramSageMakerSTTService(STTService):
# Start metrics when user starts speaking (if VAD is not provided by Deepgram)
if isinstance(frame, VADUserStartedSpeakingFrame):
await self.start_metrics()
await self._start_metrics()
elif isinstance(frame, VADUserStoppedSpeakingFrame):
# Send finalize message to Deepgram when user stops speaking
# This tells Deepgram to flush any remaining audio and return final results

View File

@@ -6,8 +6,6 @@
"""DeepSeek LLM service implementation using OpenAI-compatible interface."""
from typing import List
from loguru import logger
from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams

View File

@@ -310,7 +310,6 @@ class ElevenLabsSTTService(SegmentedSTTService):
self, transcript: str, is_final: bool, language: Optional[str] = None
):
"""Handle a transcription result with tracing."""
await self.stop_ttfb_metrics()
await self.stop_processing_metrics()
async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
@@ -328,7 +327,6 @@ class ElevenLabsSTTService(SegmentedSTTService):
"""
try:
await self.start_processing_metrics()
await self.start_ttfb_metrics()
# Upload audio and get transcription result directly
result = await self._transcribe_audio(audio)
@@ -539,9 +537,8 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
await super().cancel(frame)
await self._disconnect()
async def start_metrics(self):
async def _start_metrics(self):
"""Start performance metrics collection for transcription processing."""
await self.start_ttfb_metrics()
await self.start_processing_metrics()
async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -555,7 +552,7 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
if isinstance(frame, VADUserStartedSpeakingFrame):
# Start metrics when user starts speaking
await self.start_metrics()
await self._start_metrics()
elif isinstance(frame, VADUserStoppedSpeakingFrame):
# Send commit when user stops speaking (manual commit mode)
if self._params.commit_strategy == CommitStrategy.MANUAL:
@@ -764,8 +761,6 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
if not text:
return
await self.stop_ttfb_metrics()
# Get language if provided
language = data.get("language_code")
@@ -803,7 +798,6 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
if not text:
return
await self.stop_ttfb_metrics()
await self.stop_processing_metrics()
# Get language if provided
@@ -845,7 +839,6 @@ class ElevenLabsRealtimeSTTService(WebsocketSTTService):
if not text:
return
await self.stop_ttfb_metrics()
await self.stop_processing_metrics()
# Get language if provided

View File

@@ -249,7 +249,6 @@ class FalSTTService(SegmentedSTTService):
self, transcript: str, is_final: bool, language: Optional[str] = None
):
"""Handle a transcription result with tracing."""
await self.stop_ttfb_metrics()
await self.stop_processing_metrics()
async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
@@ -267,7 +266,6 @@ class FalSTTService(SegmentedSTTService):
"""
try:
await self.start_processing_metrics()
await self.start_ttfb_metrics()
# Send to Fal directly (audio is already in WAV format from base class)
data_uri = fal_client.encode(audio, "audio/x-wav")

View File

@@ -6,8 +6,6 @@
"""Fireworks AI service implementation using OpenAI-compatible interface."""
from typing import List
from loguru import logger
from pipecat.adapters.services.open_ai_adapter import OpenAILLMInvocationParams

View File

@@ -1,2 +1,7 @@
from .file_api import GeminiFileAPI
from .gemini import GeminiMultimodalLiveLLMService
__all__ = [
"GeminiFileAPI",
"GeminiMultimodalLiveLLMService",
]

Some files were not shown because too many files have changed in this diff Show More