Compare commits
3 Commits
pk/optiona
...
mb/static-
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5a6cc4d35c | ||
|
|
28be775740 | ||
|
|
bc730e4069 |
@@ -1 +0,0 @@
|
||||
../../.claude/skills/changelog
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/cleanup
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/code-review
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/docstring
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/pr-description
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/pr-submit
|
||||
@@ -1 +0,0 @@
|
||||
../../.claude/skills/update-docs
|
||||
@@ -1,27 +0,0 @@
|
||||
{
|
||||
"name": "pipecat-dev-skills",
|
||||
"owner": {
|
||||
"name": "Pipecat"
|
||||
},
|
||||
"metadata": {
|
||||
"description": "Development workflow skills for contributing to the Pipecat project",
|
||||
"version": "1.0.0"
|
||||
},
|
||||
"plugins": [
|
||||
{
|
||||
"name": "pipecat-dev",
|
||||
"description": "Development workflow skills for contributing to the Pipecat project",
|
||||
"version": "1.0.0",
|
||||
"source": "./",
|
||||
"skills": [
|
||||
"./.claude/skills/changelog",
|
||||
"./.claude/skills/cleanup",
|
||||
"./.claude/skills/code-review",
|
||||
"./.claude/skills/docstring",
|
||||
"./.claude/skills/pr-description",
|
||||
"./.claude/skills/pr-submit",
|
||||
"./.claude/skills/update-docs"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -1,5 +0,0 @@
|
||||
{
|
||||
"attribution": {
|
||||
"commit": ""
|
||||
}
|
||||
}
|
||||
@@ -26,26 +26,12 @@ Create changelog files for the important commits in this PR. The PR number is pr
|
||||
- `{PR_NUMBER}.performance.md` - for performance improvements
|
||||
- `{PR_NUMBER}.other.md` - for other changes
|
||||
|
||||
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change. No line wrapping.
|
||||
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
|
||||
|
||||
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
|
||||
|
||||
6. Use ⚠️ emoji prefix for breaking changes.
|
||||
|
||||
7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
|
||||
|
||||
**Good** (user-facing first, implementation detail as context):
|
||||
```
|
||||
- Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
|
||||
```
|
||||
|
||||
**Bad** (implementation detail only, no user-facing framing):
|
||||
```
|
||||
- Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
|
||||
```
|
||||
|
||||
Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
|
||||
|
||||
## Example
|
||||
|
||||
For PR #3519 with a new feature and a bug fix:
|
||||
@@ -57,5 +43,5 @@ For PR #3519 with a new feature and a bug fix:
|
||||
|
||||
`changelog/3519.fixed.md`:
|
||||
```
|
||||
- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
|
||||
- Fixed an issue where something was not working correctly.
|
||||
```
|
||||
|
||||
@@ -1,312 +0,0 @@
|
||||
---
|
||||
name: cleanup
|
||||
description: Review, refactor, document, and validate code changes in the current branch
|
||||
---
|
||||
|
||||
# Code Cleanup Skill
|
||||
|
||||
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
|
||||
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
|
||||
|
||||
---
|
||||
|
||||
## Skill Overview
|
||||
|
||||
This skill analyzes all changes introduced in your branch and performs the following actions:
|
||||
|
||||
1. **Analyze Branch Changes**
|
||||
- Review uncommitted changes and outgoing commits
|
||||
2. **Refactor for Readability**
|
||||
- Improve clarity, naming, structure, and modern Python usage
|
||||
3. **Enhance Performance**
|
||||
- Identify safe, conservative optimization opportunities
|
||||
4. **Add Documentation**
|
||||
- Apply Pipecat-style, Google-format docstrings
|
||||
5. **Ensure Pattern Consistency**
|
||||
- Match existing Pipecat services, pipelines, and examples
|
||||
6. **Validate Examples**
|
||||
- Ensure examples follow foundational patterns (e.g. `07-interruptible.py`)
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
Invoke the skill using any of the following commands:
|
||||
|
||||
- "Clean up my branch code"
|
||||
- "Refactor the changes in my branch"
|
||||
- "Review and improve my branch code"
|
||||
- `/cleanup`
|
||||
|
||||
---
|
||||
|
||||
## What This Skill Does
|
||||
|
||||
### 1. Analyze Branch Changes
|
||||
|
||||
The skill retrieves all uncommitted changes and outgoing commits to understand:
|
||||
|
||||
- New files added
|
||||
- Modified files
|
||||
- Code additions and deletions
|
||||
- Overall scope and intent of changes
|
||||
|
||||
---
|
||||
|
||||
### 2. Code Refactoring
|
||||
|
||||
#### Readability Improvements
|
||||
|
||||
- Replace tuples with named classes or dataclasses
|
||||
- Improve variable, method, and class naming
|
||||
- Extract complex logic into well-named helper methods
|
||||
- Add missing type hints
|
||||
- Simplify nested or complex conditionals
|
||||
- Replace deprecated methods and features
|
||||
- Normalize formatting to match Pipecat style
|
||||
|
||||
#### Performance Enhancements
|
||||
|
||||
- Identify inefficient loops or repeated work
|
||||
- Suggest appropriate data structures
|
||||
- Optimize async workflows and I/O
|
||||
- Remove redundant operations
|
||||
|
||||
> Performance changes are conservative and non-breaking.
|
||||
|
||||
---
|
||||
|
||||
### 3. Documentation
|
||||
|
||||
Documentation follows **Google-style docstrings**, consistent with Pipecat conventions.
|
||||
|
||||
#### Class Documentation
|
||||
|
||||
```python
|
||||
class ExampleService:
|
||||
"""Brief one-line description.
|
||||
|
||||
Detailed explanation of the class purpose, responsibilities,
|
||||
and important behaviors.
|
||||
|
||||
Supported features:
|
||||
|
||||
- Feature 1
|
||||
- Feature 2
|
||||
- Feature 3
|
||||
"""
|
||||
```
|
||||
|
||||
#### Method Documentation
|
||||
|
||||
```python
|
||||
def process_data(self, data: str, options: Optional[dict] = None) -> bool:
|
||||
"""Process incoming data with optional configuration.
|
||||
|
||||
Args:
|
||||
data: The input data to process.
|
||||
options: Optional configuration dictionary.
|
||||
|
||||
Returns:
|
||||
True if processing succeeded, False otherwise.
|
||||
|
||||
Raises:
|
||||
ValueError: If data is empty or invalid.
|
||||
"""
|
||||
```
|
||||
|
||||
#### Pydantic Model Parameters
|
||||
|
||||
```python
|
||||
class InputParams(BaseModel):
|
||||
"""Configuration parameters for the service.
|
||||
|
||||
Parameters:
|
||||
timeout: Request timeout in seconds.
|
||||
retry_count: Number of retry attempts.
|
||||
enable_logging: Whether to enable debug logging.
|
||||
"""
|
||||
|
||||
timeout: Optional[float] = None
|
||||
retry_count: int = 3
|
||||
enable_logging: bool = False
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Pattern Consistency Checks
|
||||
|
||||
#### Service Classes
|
||||
|
||||
- Correct inheritance (`TTSService`, `STTService`, `LLMService`)
|
||||
- Consistent constructor signatures
|
||||
- Frame emission patterns
|
||||
- Metrics support:
|
||||
- `can_generate_metrics()`
|
||||
- TTFB metrics
|
||||
- Usage metrics
|
||||
- Alignment with similar existing services
|
||||
|
||||
#### Examples
|
||||
|
||||
Validated against `examples/07-interruptible.py`:
|
||||
|
||||
- Proper `create_transport()` usage
|
||||
- Correct pipeline structure
|
||||
- Task setup and observers
|
||||
- Event handler registration
|
||||
- Runner and bot entrypoint consistency
|
||||
|
||||
---
|
||||
|
||||
### 5. Specific Implementation Patterns
|
||||
|
||||
#### Service Implementation
|
||||
|
||||
```python
|
||||
class ExampleTTSService(TTSService):
|
||||
|
||||
def __init__(self, *, api_key: Optional[str] = None, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self._api_key = api_key or os.getenv("SERVICE_API_KEY")
|
||||
|
||||
def can_generate_metrics(self) -> bool:
|
||||
return True
|
||||
|
||||
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
|
||||
try:
|
||||
await self.start_ttfb_metrics()
|
||||
yield TTSStartedFrame()
|
||||
# ... processing ...
|
||||
yield TTSAudioRawFrame(...)
|
||||
finally:
|
||||
await self.stop_ttfb_metrics()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### Example Structure Pattern
|
||||
|
||||
```python
|
||||
transport_params = {
|
||||
"daily": lambda: DailyParams(...),
|
||||
"twilio": lambda: FastAPIWebsocketParams(...),
|
||||
"webrtc": lambda: TransportParams(...),
|
||||
}
|
||||
|
||||
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
stt = DeepgramSTTService(...)
|
||||
tts = SomeTTSService(...)
|
||||
llm = OpenAILLMService(...)
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(...)
|
||||
|
||||
pipeline = Pipeline([...])
|
||||
task = PipelineTask(pipeline, params=..., observers=[...])
|
||||
|
||||
@transport.event_handler("on_client_connected")
|
||||
async def on_client_connected(transport, client):
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
|
||||
await runner.run(task)
|
||||
|
||||
async def bot(runner_args: RunnerArguments):
|
||||
"""Main bot entry point compatible with Pipecat Cloud."""
|
||||
transport = await create_transport(runner_args, transport_params)
|
||||
await run_bot(transport, runner_args)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Execution Flow
|
||||
|
||||
1. Fetch uncommitted and outgoing changes
|
||||
2. Categorize files (services, examples, tests, utilities)
|
||||
3. Analyze each file:
|
||||
- Readability
|
||||
- Performance
|
||||
- Documentation
|
||||
- Pattern consistency
|
||||
4. Generate actionable recommendations
|
||||
5. Apply Pipecat standards
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Before: Tuple Usage
|
||||
|
||||
```python
|
||||
def get_audio_info(self) -> Tuple[int, int]:
|
||||
return (48000, 1)
|
||||
```
|
||||
|
||||
### After: Named Class
|
||||
|
||||
```python
|
||||
class AudioInfo:
|
||||
"""Audio configuration information.
|
||||
|
||||
Parameters:
|
||||
sample_rate: Sample rate in Hz.
|
||||
num_channels: Number of audio channels.
|
||||
"""
|
||||
|
||||
sample_rate: int
|
||||
num_channels: int
|
||||
|
||||
def get_audio_info(self) -> AudioInfo:
|
||||
return AudioInfo(sample_rate=48000, num_channels=1)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Before: Missing Documentation
|
||||
|
||||
```python
|
||||
class NewTTSService(TTSService):
|
||||
def __init__(self, api_key: str, voice: str):
|
||||
self._api_key = api_key
|
||||
self._voice = voice
|
||||
```
|
||||
|
||||
### After: Fully Documented
|
||||
|
||||
```python
|
||||
class NewTTSService(TTSService):
|
||||
"""Text-to-speech service using NewProvider API.
|
||||
|
||||
Streams PCM audio and emits TTSAudioRawFrame frames compatible
|
||||
with Pipecat transports.
|
||||
|
||||
Supported features:
|
||||
- Text-to-speech synthesis
|
||||
- Streaming PCM audio
|
||||
- Voice customization
|
||||
- TTFB metrics
|
||||
"""
|
||||
|
||||
def __init__(self, *, api_key: str, voice: str, **kwargs):
|
||||
"""Initialize the NewTTSService.
|
||||
|
||||
Args:
|
||||
api_key: API key for authentication.
|
||||
voice: Voice identifier to use.
|
||||
**kwargs: Additional arguments passed to the parent service.
|
||||
"""
|
||||
super().__init__(**kwargs)
|
||||
self._api_key = api_key
|
||||
self.set_voice(voice)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- Non-breaking improvements only
|
||||
- Backward compatibility preserved
|
||||
- Conservative performance changes
|
||||
- Google-style docstrings
|
||||
- Pattern checks follow recent Pipecat code
|
||||
@@ -1,107 +0,0 @@
|
||||
---
|
||||
name: code-review
|
||||
description: Automated code review for pull requests using multiple specialized agents
|
||||
disable-model-invocation: true
|
||||
allowed-tools: Bash(gh issue view:*), Bash(gh search:*), Bash(gh issue list:*), Bash(gh pr comment:*), Bash(gh pr diff:*), Bash(gh pr view:*), Bash(gh pr list:*)
|
||||
---
|
||||
|
||||
Provide a code review for the given pull request.
|
||||
|
||||
**Agent assumptions (applies to all agents and subagents):**
|
||||
|
||||
- All tools are functional and will work without error. Do not test tools or make exploratory calls. Make sure this is clear to every subagent that is launched.
|
||||
- Only call a tool if it is required to complete the task. Every tool call should have a clear purpose.
|
||||
|
||||
To do this, follow these steps precisely:
|
||||
|
||||
1. Launch a haiku agent to check if any of the following are true:
|
||||
- The pull request is closed
|
||||
- The pull request is a draft
|
||||
- The pull request does not need code review (e.g. automated PR, trivial change that is obviously correct)
|
||||
- Claude has already commented on this PR (check `gh pr view <PR> --comments` for comments left by claude)
|
||||
|
||||
If any condition is true, stop and do not proceed.
|
||||
|
||||
Note: Still review Claude generated PR's.
|
||||
|
||||
2. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including:
|
||||
- The root CLAUDE.md file, if it exists
|
||||
- Any CLAUDE.md files in directories containing files modified by the pull request
|
||||
|
||||
3. Launch a sonnet agent to view the pull request and return a summary of the changes
|
||||
|
||||
4. Launch 4 agents in parallel to independently review the changes. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following:
|
||||
|
||||
Agents 1 + 2: CLAUDE.md compliance sonnet agents
|
||||
Audit changes for CLAUDE.md compliance in parallel. Note: When evaluating CLAUDE.md compliance for a file, you should only consider CLAUDE.md files that share a file path with the file or parents.
|
||||
|
||||
Agent 3: Opus bug agent (parallel subagent with agent 4)
|
||||
Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff.
|
||||
|
||||
Agent 4: Opus bug agent (parallel subagent with agent 3)
|
||||
Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code.
|
||||
|
||||
**CRITICAL: We only want HIGH SIGNAL issues.** Flag issues where:
|
||||
- The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references)
|
||||
- The code will definitely produce wrong results regardless of inputs (clear logic errors)
|
||||
- Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken
|
||||
|
||||
Do NOT flag:
|
||||
- Code style or quality concerns
|
||||
- Potential issues that depend on specific inputs or state
|
||||
- Subjective suggestions or improvements
|
||||
|
||||
If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time.
|
||||
|
||||
In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent.
|
||||
|
||||
5. For each issue found in the previous step by agents 3 and 4, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations.
|
||||
|
||||
6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review.
|
||||
|
||||
7. If issues were found, skip to step 8 to post comments.
|
||||
|
||||
If NO issues were found, post a summary comment using `gh pr comment` (if `--comment` argument is provided):
|
||||
"No issues found. Checked for bugs and CLAUDE.md compliance."
|
||||
|
||||
8. Create a list of all comments that you plan on leaving. This is only for you to make sure you are comfortable with the comments. Do not post this list anywhere.
|
||||
|
||||
9. Post inline comments for each issue using `gh pr review` with inline comments. For each comment:
|
||||
- Provide a brief description of the issue
|
||||
- For small, self-contained fixes, include a committable suggestion block
|
||||
- For larger fixes (6+ lines, structural changes, or changes spanning multiple locations), describe the issue and suggested fix without a suggestion block
|
||||
- Never post a committable suggestion UNLESS committing the suggestion fixes the issue entirely. If follow up steps are required, do not leave a committable suggestion.
|
||||
|
||||
**IMPORTANT: Only post ONE comment per unique issue. Do not post duplicate comments.**
|
||||
|
||||
Use this list when evaluating issues in Steps 4 and 5 (these are false positives, do NOT flag):
|
||||
|
||||
- Pre-existing issues
|
||||
- Something that appears to be a bug but is actually correct
|
||||
- Pedantic nitpicks that a senior engineer would not flag
|
||||
- Issues that a linter will catch (do not run the linter to verify)
|
||||
- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md
|
||||
- Issues mentioned in CLAUDE.md but explicitly silenced in the code (e.g., via a lint ignore comment)
|
||||
|
||||
Notes:
|
||||
|
||||
- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch.
|
||||
- Create a todo list before starting.
|
||||
- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md, include a link to it).
|
||||
- If no issues are found, post a comment with the following format:
|
||||
|
||||
---
|
||||
|
||||
## Code review
|
||||
|
||||
No issues found. Checked for bugs and CLAUDE.md compliance.
|
||||
|
||||
---
|
||||
|
||||
- When linking to code in inline comments, follow the following format precisely, otherwise the Markdown preview won't render correctly: `https://github.com/OWNER/REPO/blob/FULL_SHA/path/to/file.py#L10-L15`
|
||||
- Requires full git sha
|
||||
- You must provide the full sha. Commands like `https://github.com/owner/repo/blob/$(git rev-parse HEAD)/foo/bar` will not work, since your comment will be directly rendered in Markdown.
|
||||
- Repo name must match the repo you're code reviewing
|
||||
- # sign after the file name
|
||||
- Line range format is L[start]-L[end]
|
||||
- Provide at least 1 line of context before and after, centered on the line you are commenting about (eg. if you are commenting about lines 5-6, you should link to `L4-7`)
|
||||
@@ -3,20 +3,21 @@ name: docstring
|
||||
description: Document a Python module and its classes using Google style
|
||||
---
|
||||
|
||||
Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.
|
||||
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Determine what to document based on the argument:
|
||||
1. First, find the class in the codebase:
|
||||
```
|
||||
Search for "class ClassName" in src/pipecat/
|
||||
```
|
||||
|
||||
**If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
|
||||
- Use that file directly
|
||||
2. If multiple files contain that class name:
|
||||
- List all matches with their file paths
|
||||
- Ask the user which one they want to document
|
||||
- Wait for confirmation before proceeding
|
||||
|
||||
**If a class name is provided** (e.g. `VADAnalyzer`):
|
||||
- Search for `class ClassName` in `src/pipecat/`
|
||||
- If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
|
||||
|
||||
2. Once the file is identified, read the module to understand its structure:
|
||||
3. Once the file is identified, read the module to understand its structure:
|
||||
- Identify all classes, functions, and important type aliases
|
||||
- Understand the purpose of each component
|
||||
|
||||
|
||||
@@ -1,28 +0,0 @@
|
||||
---
|
||||
name: pr-submit
|
||||
description: Create and submit a GitHub PR from the current branch
|
||||
---
|
||||
|
||||
Submit the current changes as a GitHub pull request.
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Check the current state of the repository:
|
||||
- Run `git status` to see staged, unstaged, and untracked changes
|
||||
- Run `git diff` to see current changes
|
||||
- Run `git log --oneline -10` to see recent commits
|
||||
|
||||
2. If there are uncommitted changes relevant to the PR:
|
||||
- Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
|
||||
- Create a new branch based on the current branch
|
||||
- Commit the changes using multiple commits if the changes are unrelated
|
||||
|
||||
3. Push the branch and create the PR:
|
||||
- Push with `-u` flag to set upstream tracking
|
||||
- Create the PR using `gh pr create`
|
||||
|
||||
4. After the PR is created:
|
||||
- Run `/changelog <pr_number>` to generate changelog files, then commit and push them
|
||||
- Run `/pr-description <pr_number>` to update the PR description
|
||||
|
||||
5. Return the PR URL to the user.
|
||||
@@ -1,306 +0,0 @@
|
||||
---
|
||||
name: update-docs
|
||||
description: Update documentation pages to match source code changes on the current branch
|
||||
---
|
||||
|
||||
Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
|
||||
|
||||
## Arguments
|
||||
|
||||
```
|
||||
/update-docs [DOCS_PATH]
|
||||
```
|
||||
|
||||
- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
|
||||
|
||||
Examples:
|
||||
- `/update-docs /Users/me/src/docs`
|
||||
- `/update-docs`
|
||||
|
||||
## Instructions
|
||||
|
||||
### Step 1: Resolve docs path
|
||||
|
||||
If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
|
||||
|
||||
Verify the path exists and contains `server/services/` subdirectory.
|
||||
|
||||
### Step 2: Create docs branch
|
||||
|
||||
Get the current pipecat branch name:
|
||||
```bash
|
||||
git rev-parse --abbrev-ref HEAD
|
||||
```
|
||||
|
||||
In the docs repo, create a new branch off main with a matching name:
|
||||
```bash
|
||||
cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
|
||||
```
|
||||
|
||||
For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
|
||||
|
||||
All doc edits in subsequent steps are made on this branch.
|
||||
|
||||
### Step 3: Detect changed source files
|
||||
|
||||
Run:
|
||||
```bash
|
||||
git diff main..HEAD --name-only
|
||||
```
|
||||
|
||||
Filter to files that could affect documentation:
|
||||
- `src/pipecat/services/**/*.py` (service implementations)
|
||||
- `src/pipecat/transports/**/*.py` (transport implementations)
|
||||
- `src/pipecat/serializers/**/*.py` (serializer implementations)
|
||||
- `src/pipecat/processors/**/*.py` (processor implementations)
|
||||
- `src/pipecat/audio/**/*.py` (audio utilities)
|
||||
- `src/pipecat/turns/**/*.py` (turn management)
|
||||
- `src/pipecat/observers/**/*.py` (observers)
|
||||
- `src/pipecat/pipeline/**/*.py` (pipeline core)
|
||||
|
||||
Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
|
||||
|
||||
### Step 4: Map source files to doc pages
|
||||
|
||||
For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
|
||||
|
||||
### Step 5: Analyze each source-doc pair
|
||||
|
||||
For each mapped pair:
|
||||
|
||||
1. **Read the full source file** to understand current state
|
||||
2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
|
||||
3. **Read the current doc page** in full
|
||||
|
||||
Identify what changed by comparing source to docs:
|
||||
|
||||
- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
|
||||
- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
|
||||
- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
|
||||
- **Class names / imports**: Check if Usage examples reference correct names
|
||||
- **Behavioral changes**: Check if Notes section needs updating
|
||||
|
||||
### Step 6: Make targeted edits
|
||||
|
||||
For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
|
||||
|
||||
#### Rules
|
||||
|
||||
- **Never remove content** unless the corresponding source code was removed
|
||||
- **Never rewrite sections** that are already accurate
|
||||
- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
|
||||
- **Keep descriptions concise** — match the tone and length of surrounding content
|
||||
- **Preserve CardGroup, links, and examples** unless they reference removed functionality
|
||||
- **Don't touch frontmatter** unless the class was renamed
|
||||
|
||||
#### Section-specific guidance
|
||||
|
||||
**Configuration** (constructor params):
|
||||
- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
|
||||
- Add new params in logical order (required first, then optional)
|
||||
- Remove params that no longer exist in source
|
||||
- Update types/defaults that changed
|
||||
|
||||
**InputParams** (runtime settings):
|
||||
- Use markdown table format: `| Parameter | Type | Default | Description |`
|
||||
- Match the field names and types from the `InputParams(BaseModel)` class
|
||||
- Include the default values from the source
|
||||
|
||||
**Usage** (code examples):
|
||||
- Update import paths, class names, and parameter names
|
||||
- Only modify examples if they would break or be misleading with the new API
|
||||
- Don't rewrite working examples just to add new optional params
|
||||
|
||||
**Notes**:
|
||||
- Add notes for new behavioral gotchas or breaking changes
|
||||
- Remove notes about limitations that were fixed
|
||||
- Keep existing notes that are still accurate
|
||||
|
||||
**Event Handlers**:
|
||||
- Update the event table and example code
|
||||
- Add new events, remove deleted ones
|
||||
- Update handler signatures if they changed
|
||||
|
||||
**Overview / Key Features / Prerequisites**:
|
||||
- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
|
||||
- Most PRs will NOT need changes to these sections
|
||||
|
||||
### Step 7: Update guides
|
||||
|
||||
Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
|
||||
|
||||
For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
|
||||
```bash
|
||||
grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
|
||||
```
|
||||
|
||||
For each guide that references changed code:
|
||||
1. Read the full guide
|
||||
2. Update class names, parameter names, import paths, and code examples that are now incorrect
|
||||
3. **Don't rewrite prose** — only fix the specific references that changed
|
||||
4. Leave guides alone if they reference the service generally but don't use any changed APIs
|
||||
|
||||
Guide directories:
|
||||
- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
|
||||
- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
|
||||
- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
|
||||
- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
|
||||
|
||||
### Step 8: Identify doc gaps
|
||||
|
||||
After processing all mapped pairs, check for two kinds of gaps:
|
||||
|
||||
**Missing pages**: Source files that had no doc page mapping (neither tier 1, 2, nor 3) and are not marked as "(skip)". For each, tell the user:
|
||||
- The source file path
|
||||
- The main class(es) it defines
|
||||
- Whether a new doc page should be created
|
||||
|
||||
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
|
||||
|
||||
If the user wants a new page, do all three of the following:
|
||||
|
||||
#### 8a: Create the doc page
|
||||
|
||||
Create the new `.mdx` file using this template structure:
|
||||
```
|
||||
---
|
||||
title: "Service Name"
|
||||
description: "Brief description"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
[Description from class docstring or source analysis]
|
||||
|
||||
<CardGroup cols={2}>
|
||||
[Cards for API reference and examples if available]
|
||||
</CardGroup>
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install "pipecat-ai[package-name]"
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
[Environment variables and account setup]
|
||||
|
||||
## Configuration
|
||||
|
||||
[ParamField entries for constructor params]
|
||||
|
||||
## InputParams
|
||||
|
||||
[Table of InputParams fields, if the service has them]
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Setup
|
||||
|
||||
```python
|
||||
[Minimal working example]
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
[Important caveats]
|
||||
|
||||
## Event Handlers
|
||||
|
||||
[Event table and example code]
|
||||
```
|
||||
|
||||
#### 8b: Add to docs.json
|
||||
|
||||
Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
|
||||
|
||||
Find the matching group in the navigation structure:
|
||||
- **STT** → `"group": "Speech-to-Text"` under Services
|
||||
- **TTS** → `"group": "Text-to-Speech"` under Services
|
||||
- **LLM** → `"group": "LLM"` under Services
|
||||
- **S2S** → `"group": "Speech-to-Speech"` under Services
|
||||
- **Transport** → `"group": "Transport"` under Services
|
||||
- **Serializer** → `"group": "Serializers"` under Services
|
||||
- **Image generation** → `"group": "Image Generation"` under Services
|
||||
- **Video** → `"group": "Video"` under Services
|
||||
- **Memory** → `"group": "Memory"` under Services
|
||||
- **Vision** → `"group": "Vision"` under Services
|
||||
- **Analytics** → `"group": "Analytics & Monitoring"` under Services
|
||||
|
||||
Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
|
||||
```json
|
||||
{
|
||||
"group": "Speech-to-Text",
|
||||
"pages": [
|
||||
"server/services/stt/assemblyai",
|
||||
"server/services/stt/aws",
|
||||
...
|
||||
"server/services/stt/foo",
|
||||
...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### 8c: Add to supported-services.mdx
|
||||
|
||||
Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
|
||||
|
||||
Use this format:
|
||||
```
|
||||
| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
|
||||
```
|
||||
|
||||
To determine the correct values:
|
||||
- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
|
||||
- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
|
||||
- If no pip dependencies are required, use `No dependencies required` instead.
|
||||
|
||||
Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
|
||||
|
||||
### Step 9: Output summary
|
||||
|
||||
After all edits are complete, print a summary:
|
||||
|
||||
```
|
||||
## Documentation Updates
|
||||
|
||||
### Updated reference pages
|
||||
- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
|
||||
- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
|
||||
|
||||
### Updated guides
|
||||
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
|
||||
|
||||
### New service pages
|
||||
- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
|
||||
|
||||
### Unmapped source files
|
||||
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
|
||||
|
||||
### Skipped files
|
||||
- `src/pipecat/services/ai_service.py` — internal base class
|
||||
```
|
||||
|
||||
## Guidelines
|
||||
|
||||
- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
|
||||
- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
|
||||
- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
|
||||
- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
|
||||
- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
|
||||
- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
|
||||
|
||||
## Checklist
|
||||
|
||||
Before finishing, verify:
|
||||
|
||||
- [ ] All changed source files were checked against the mapping table
|
||||
- [ ] Each doc page edit matches the actual source code change (not guessed)
|
||||
- [ ] No content was removed unless the corresponding source was removed
|
||||
- [ ] New parameters have accurate types and defaults from source
|
||||
- [ ] Formatting matches the existing page style
|
||||
- [ ] Guides referencing changed APIs were checked and updated
|
||||
- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
|
||||
- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
|
||||
- [ ] Unmapped files were reported to the user
|
||||
@@ -1,79 +0,0 @@
|
||||
# Source-to-Doc Mapping
|
||||
|
||||
Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
|
||||
|
||||
## Name mismatches
|
||||
|
||||
These source paths don't follow the standard `services/{provider}/{type}.py` → `server/services/{type}/{provider}.mdx` pattern.
|
||||
|
||||
| Source path | Doc page |
|
||||
|---|---|
|
||||
| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
|
||||
| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
|
||||
| `services/google/google.py` | (shared base — check which services use it) |
|
||||
| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
|
||||
| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
|
||||
| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
|
||||
| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
|
||||
| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
|
||||
| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
|
||||
| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
|
||||
| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
|
||||
| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
|
||||
| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
|
||||
| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
|
||||
| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
|
||||
| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
|
||||
|
||||
## Skip list
|
||||
|
||||
These files should never trigger doc updates.
|
||||
|
||||
| Pattern | Reason |
|
||||
|---|---|
|
||||
| `services/ai_service.py` | Internal base class |
|
||||
| `services/stt_service.py` | Internal base class |
|
||||
| `services/tts_service.py` | Internal base class |
|
||||
| `services/llm_service.py` | Internal base class |
|
||||
| `services/websocket_service.py` | Internal base class |
|
||||
| `services/openai_realtime_beta/**` | Deprecated |
|
||||
| `services/openai_realtime/**` | Deprecated |
|
||||
| `services/gemini_multimodal_live/**` | Deprecated |
|
||||
| `services/aws/agent_core.py` | Internal |
|
||||
| `services/aws/sagemaker/**` | No doc page |
|
||||
| `transports/base_input.py` | Internal base class |
|
||||
| `transports/base_output.py` | Internal base class |
|
||||
| `transports/websocket/client.py` | No doc page |
|
||||
| `serializers/base_serializer.py` | Internal base class |
|
||||
| `serializers/protobuf.py` | Internal |
|
||||
| `processors/audio/**` | Internal |
|
||||
| `pipeline/pipeline.py` | Core architecture, not a service doc |
|
||||
|
||||
## Pattern matching
|
||||
|
||||
For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
|
||||
|
||||
| Source pattern | Doc pattern |
|
||||
|---|---|
|
||||
| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
|
||||
| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
|
||||
| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
|
||||
| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
|
||||
| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
|
||||
| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
|
||||
| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
|
||||
| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
|
||||
| `observers/**` | `server/utilities/observers/` (match by class name) |
|
||||
| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
|
||||
| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
|
||||
| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
|
||||
| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
|
||||
|
||||
If the doc file doesn't exist at the resolved path, the file is **unmapped**.
|
||||
|
||||
## Search fallback
|
||||
|
||||
For files that don't match any table or pattern above:
|
||||
1. Extract the main class name(s) from the source file
|
||||
2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
|
||||
3. If found in a doc page, use that as the mapping
|
||||
30
.dockerignore
Normal file
30
.dockerignore
Normal file
@@ -0,0 +1,30 @@
|
||||
# flyctl launch added from .gitignore
|
||||
**/.vscode
|
||||
**/env
|
||||
**/__pycache__
|
||||
**/*~
|
||||
**/venv
|
||||
#*#
|
||||
|
||||
# Distribution / packaging
|
||||
**/.Python
|
||||
**/build
|
||||
**/develop-eggs
|
||||
**/dist
|
||||
**/downloads
|
||||
**/eggs
|
||||
**/.eggs
|
||||
**/lib
|
||||
**/lib64
|
||||
**/parts
|
||||
**/sdist
|
||||
**/var
|
||||
**/wheels
|
||||
**/share/python-wheels
|
||||
**/*.egg-info
|
||||
**/.installed.cfg
|
||||
**/*.egg
|
||||
**/MANIFEST
|
||||
**/.DS_Store
|
||||
**/.env
|
||||
fly.toml
|
||||
5
.github/workflows/coverage.yaml
vendored
5
.github/workflows/coverage.yaml
vendored
@@ -29,7 +29,6 @@ jobs:
|
||||
|
||||
- name: Install system packages
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y portaudio19-dev
|
||||
|
||||
- name: Install dependencies
|
||||
@@ -37,14 +36,10 @@ jobs:
|
||||
uv sync --group dev \
|
||||
--extra anthropic \
|
||||
--extra aws \
|
||||
--extra deepgram \
|
||||
--extra google \
|
||||
--extra langchain \
|
||||
--extra livekit \
|
||||
--extra piper \
|
||||
--extra runner \
|
||||
--extra sagemaker \
|
||||
--extra tracing \
|
||||
--extra websocket
|
||||
|
||||
- name: Run tests with coverage
|
||||
|
||||
8
.github/workflows/format.yaml
vendored
8
.github/workflows/format.yaml
vendored
@@ -32,9 +32,7 @@ jobs:
|
||||
run: uv python install 3.12
|
||||
|
||||
- name: Install development dependencies
|
||||
# `--all-extras` (matching the dev setup in README.md) so pyright can
|
||||
# resolve types from various optional dependencies.
|
||||
run: uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
|
||||
run: uv sync --group dev
|
||||
|
||||
- name: Ruff formatter
|
||||
id: ruff-format
|
||||
@@ -43,7 +41,3 @@ jobs:
|
||||
- name: Ruff linter (all rules)
|
||||
id: ruff-check
|
||||
run: uv run ruff check
|
||||
|
||||
- name: Type check (pyright)
|
||||
id: pyright
|
||||
run: uv run pyright
|
||||
|
||||
2
.github/workflows/generate-changelog.yml
vendored
2
.github/workflows/generate-changelog.yml
vendored
@@ -86,7 +86,7 @@ jobs:
|
||||
fi
|
||||
|
||||
# Validate fragment types
|
||||
VALID_TYPES="added changed deprecated removed fixed performance security other"
|
||||
VALID_TYPES="added changed deprecated removed fixed security other"
|
||||
INVALID_FRAGMENTS=""
|
||||
|
||||
for file in changelog/*.md; do
|
||||
|
||||
16
.github/workflows/python-compatibility.yaml
vendored
16
.github/workflows/python-compatibility.yaml
vendored
@@ -14,7 +14,7 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
python-version: ['3.11.15', '3.12.13', '3.13.12', '3.14.3']
|
||||
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
|
||||
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
@@ -40,9 +40,19 @@ jobs:
|
||||
uv python install ${{ matrix.python-version }}
|
||||
uv python pin ${{ matrix.python-version }}
|
||||
|
||||
- name: Test uv sync with all extras
|
||||
- name: Test uv sync with all extras (Python < 3.13)
|
||||
if: "!startsWith(matrix.python-version, '3.13.')"
|
||||
run: |
|
||||
uv sync --group dev --all-extras
|
||||
uv sync --group dev --all-extras --no-extra krisp
|
||||
|
||||
- name: Test uv sync without PyTorch extras (Python 3.13+)
|
||||
if: startsWith(matrix.python-version, '3.13.')
|
||||
run: |
|
||||
uv sync --group dev --all-extras \
|
||||
--no-extra krisp \
|
||||
--no-extra local-smart-turn \
|
||||
--no-extra moondream \
|
||||
--no-extra mlx-whisper
|
||||
|
||||
- name: Verify installation
|
||||
run: |
|
||||
|
||||
51
.github/workflows/sync-quickstart.yaml
vendored
Normal file
51
.github/workflows/sync-quickstart.yaml
vendored
Normal file
@@ -0,0 +1,51 @@
|
||||
name: Sync Quickstart to pipecat-quickstart repo
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main]
|
||||
paths:
|
||||
- 'examples/quickstart/**'
|
||||
workflow_dispatch: # Manual trigger
|
||||
|
||||
jobs:
|
||||
sync-quickstart:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout main repo
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Checkout quickstart repo
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
repository: pipecat-ai/pipecat-quickstart
|
||||
token: ${{ secrets.QUICKSTART_SYNC_TOKEN }}
|
||||
path: quickstart-repo
|
||||
|
||||
- name: Sync files (excluding uv.lock and README.md)
|
||||
run: |
|
||||
# Copy all files except uv.lock and README.md
|
||||
find examples/quickstart -type f \
|
||||
-not -name "README.md" \
|
||||
-not -name "uv.lock" \
|
||||
-exec cp {} quickstart-repo/ \;
|
||||
|
||||
- name: Commit and push changes
|
||||
run: |
|
||||
cd quickstart-repo
|
||||
git config user.name "GitHub Action"
|
||||
git config user.email "action@github.com"
|
||||
git add .
|
||||
|
||||
# Only commit if there are changes
|
||||
if ! git diff --staged --quiet; then
|
||||
git commit -m "Sync from pipecat main repo
|
||||
|
||||
Updated files from examples/quickstart/
|
||||
Commit: ${{ github.sha }}
|
||||
"
|
||||
git push
|
||||
else
|
||||
echo "No changes to sync"
|
||||
fi
|
||||
5
.github/workflows/tests.yaml
vendored
5
.github/workflows/tests.yaml
vendored
@@ -33,7 +33,6 @@ jobs:
|
||||
|
||||
- name: Install system packages
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y portaudio19-dev
|
||||
|
||||
- name: Install dependencies
|
||||
@@ -41,14 +40,10 @@ jobs:
|
||||
uv sync --group dev \
|
||||
--extra anthropic \
|
||||
--extra aws \
|
||||
--extra deepgram \
|
||||
--extra google \
|
||||
--extra langchain \
|
||||
--extra livekit \
|
||||
--extra piper \
|
||||
--extra runner \
|
||||
--extra sagemaker \
|
||||
--extra tracing \
|
||||
--extra websocket
|
||||
|
||||
- name: Test with pytest
|
||||
|
||||
148
.github/workflows/update-docs.yml
vendored
148
.github/workflows/update-docs.yml
vendored
@@ -1,148 +0,0 @@
|
||||
name: Update Documentation on PR Merge
|
||||
|
||||
on:
|
||||
pull_request_target:
|
||||
types: [closed]
|
||||
branches: [main]
|
||||
paths:
|
||||
- "src/pipecat/services/**"
|
||||
- "src/pipecat/transports/**"
|
||||
- "src/pipecat/serializers/**"
|
||||
- "src/pipecat/processors/**"
|
||||
- "src/pipecat/audio/**"
|
||||
- "src/pipecat/turns/**"
|
||||
- "src/pipecat/observers/**"
|
||||
- "src/pipecat/pipeline/**"
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
pr_number:
|
||||
description: "PR number to generate docs for"
|
||||
required: true
|
||||
type: string
|
||||
|
||||
jobs:
|
||||
update-docs:
|
||||
if: >-
|
||||
github.event_name == 'workflow_dispatch' ||
|
||||
github.event.pull_request.merged == true
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 15
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: read
|
||||
id-token: write
|
||||
steps:
|
||||
- name: Checkout pipecat
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
|
||||
- name: Checkout docs
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
repository: pipecat-ai/docs
|
||||
token: ${{ secrets.DOCS_SYNC_TOKEN }}
|
||||
path: _docs
|
||||
|
||||
- name: Resolve PR number
|
||||
id: pr
|
||||
run: |
|
||||
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||
echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
|
||||
- name: Update documentation
|
||||
uses: anthropics/claude-code-action@v1
|
||||
env:
|
||||
DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
|
||||
with:
|
||||
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
github_token: ${{ secrets.GITHUB_TOKEN }}
|
||||
prompt: |
|
||||
You are updating documentation for the pipecat-ai/docs repository based on
|
||||
changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
|
||||
2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
|
||||
3. The docs repository is checked out at `./_docs/`
|
||||
|
||||
## Get the diff
|
||||
|
||||
Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
|
||||
Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
|
||||
Filter to source files matching the directories listed in SKILL.md Step 3.
|
||||
|
||||
If no relevant source files were changed, exit with "No documentation changes needed."
|
||||
|
||||
## Follow the skill instructions
|
||||
|
||||
Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
|
||||
|
||||
### Docs path
|
||||
Use `./_docs/` — it's already checked out. Do not ask for a path.
|
||||
|
||||
### Branch management
|
||||
- Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
|
||||
- Work inside `./_docs/` for all doc edits and git operations
|
||||
- Check if the branch already exists on the remote:
|
||||
```bash
|
||||
cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
|
||||
```
|
||||
- If it exists: check it out (supports workflow re-runs)
|
||||
- If not: create it from main
|
||||
|
||||
### Git config
|
||||
Before committing in `_docs`, set:
|
||||
```bash
|
||||
git config user.name "github-actions[bot]"
|
||||
git config user.email "github-actions[bot]@users.noreply.github.com"
|
||||
```
|
||||
|
||||
### No interactive questions
|
||||
Do not ask questions. If you encounter gaps (unmapped files, missing sections,
|
||||
ambiguous changes), note them in the PR body under "## Gaps identified".
|
||||
|
||||
### Creating the docs PR
|
||||
After committing all changes in `_docs`, push and create a PR:
|
||||
```bash
|
||||
cd _docs
|
||||
git push -u origin docs/pr-${{ steps.pr.outputs.number }}
|
||||
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
|
||||
--repo pipecat-ai/docs \
|
||||
--label auto-docs \
|
||||
--label pipecat \
|
||||
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
|
||||
--body "$(cat <<'BODY'
|
||||
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
|
||||
|
||||
## Changes
|
||||
<summarize each doc page updated and what changed>
|
||||
|
||||
## Gaps identified
|
||||
<any unmapped files, missing doc pages, or missing sections — or "None">
|
||||
BODY
|
||||
)"
|
||||
```
|
||||
|
||||
### Re-run handling
|
||||
If `gh pr create` fails because a PR from that branch already exists,
|
||||
push the updated commits and use `gh pr edit` to update the body instead.
|
||||
|
||||
### No-op
|
||||
If after analyzing the diff you determine no documentation changes are needed
|
||||
(e.g., only skip-listed files changed, or changes don't affect public API docs),
|
||||
exit cleanly without creating a branch or PR. Output "No documentation changes needed."
|
||||
|
||||
## Important rules
|
||||
- Only modify files inside `./_docs/` — never modify pipecat source code
|
||||
- Follow the conservative editing rules from SKILL.md Step 6
|
||||
- Read each doc page fully before editing (SKILL.md Guidelines)
|
||||
- Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
|
||||
claude_args: |
|
||||
--model claude-sonnet-4-5-20250929
|
||||
--max-turns 30
|
||||
--allowedTools "Read,Write,Edit,Glob,Grep,Bash"
|
||||
@@ -1,13 +1,8 @@
|
||||
repos:
|
||||
- repo: local
|
||||
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||
rev: v0.12.1
|
||||
hooks:
|
||||
- id: ruff
|
||||
name: ruff
|
||||
entry: uv run ruff check --fix
|
||||
language: system
|
||||
types: [python]
|
||||
language_version: python3
|
||||
args: [--fix]
|
||||
- id: ruff-format
|
||||
name: ruff-format
|
||||
entry: uv run ruff format
|
||||
language: system
|
||||
types: [python]
|
||||
|
||||
@@ -11,7 +11,7 @@ build:
|
||||
jobs:
|
||||
post_install:
|
||||
- pip install uv
|
||||
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra mlx-whisper
|
||||
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
|
||||
|
||||
sphinx:
|
||||
configuration: docs/api/conf.py
|
||||
|
||||
174
AGENTS.md
174
AGENTS.md
@@ -1,174 +0,0 @@
|
||||
# AGENTS.md
|
||||
|
||||
This file provides guidance to AI coding agents when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
|
||||
|
||||
## Common Commands
|
||||
|
||||
```bash
|
||||
# Setup development environment
|
||||
uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
|
||||
|
||||
# Install pre-commit hooks
|
||||
uv run pre-commit install
|
||||
|
||||
# Run all tests
|
||||
uv run pytest
|
||||
|
||||
# Run a single test file
|
||||
uv run pytest tests/test_name.py
|
||||
|
||||
# Run a specific test
|
||||
uv run pytest tests/test_name.py::test_function_name
|
||||
|
||||
# Preview changelog
|
||||
uv run towncrier build --draft --version Unreleased
|
||||
|
||||
# Lint and format check
|
||||
uv run ruff check
|
||||
uv run ruff format --check
|
||||
|
||||
# Update dependencies (after editing pyproject.toml)
|
||||
uv lock && uv sync
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Frame-Based Pipeline Processing
|
||||
|
||||
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
|
||||
|
||||
```
|
||||
[Processor1] → [Processor2] → ... → [ProcessorN]
|
||||
```
|
||||
|
||||
**Key components:**
|
||||
|
||||
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
|
||||
|
||||
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
|
||||
|
||||
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
|
||||
|
||||
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
|
||||
|
||||
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
|
||||
|
||||
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
|
||||
|
||||
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
|
||||
|
||||
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
|
||||
|
||||
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
|
||||
|
||||
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
|
||||
|
||||
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
|
||||
|
||||
### Important Patterns
|
||||
|
||||
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
|
||||
|
||||
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
|
||||
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
|
||||
|
||||
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
|
||||
|
||||
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
|
||||
|
||||
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
|
||||
|
||||
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
|
||||
|
||||
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
|
||||
|
||||
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
|
||||
|
||||
### Key Directories
|
||||
|
||||
| Directory | Purpose |
|
||||
| -------------------------- | -------------------------------------------------- |
|
||||
| `src/pipecat/frames/` | Frame definitions (100+ types) |
|
||||
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
|
||||
| `src/pipecat/pipeline/` | Pipeline orchestration |
|
||||
| `src/pipecat/services/` | AI service integrations (60+ providers) |
|
||||
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
|
||||
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
|
||||
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
|
||||
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
|
||||
| `src/pipecat/turns/` | User turn management |
|
||||
|
||||
## Code Style
|
||||
|
||||
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
|
||||
- **Deprecations**: Use the `.. deprecated:: <version>` Sphinx directive in docstrings (never inline tags like `[DEPRECATED]`), and pair it with a runtime `warnings.warn(..., DeprecationWarning)` at the call site. See `CONTRIBUTING.md` for full conventions.
|
||||
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
|
||||
- **Type hints**: Required for complex async code.
|
||||
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
|
||||
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
|
||||
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
|
||||
|
||||
### Docstring Example
|
||||
|
||||
```python
|
||||
class MyService(LLMService):
|
||||
"""Description of what the service does.
|
||||
|
||||
More detailed description.
|
||||
|
||||
Event handlers available:
|
||||
|
||||
- on_connected: Called when we are connected
|
||||
|
||||
Example::
|
||||
|
||||
@service.event_handler("on_connected")
|
||||
async def on_connected(service, frame):
|
||||
...
|
||||
"""
|
||||
|
||||
def __init__(self, param1: str, **kwargs):
|
||||
"""Initialize the service.
|
||||
|
||||
Args:
|
||||
param1: Description of param1.
|
||||
**kwargs: Additional arguments passed to parent.
|
||||
"""
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
# Pydantic params class with a deprecated field
|
||||
class MyParams(BaseModel):
|
||||
"""Configuration parameters for MyService.
|
||||
|
||||
Parameters:
|
||||
new_setting: Replacement for ``old_setting``.
|
||||
old_setting: Legacy setting, no longer used.
|
||||
|
||||
.. deprecated:: 1.2.0
|
||||
Use ``new_setting`` instead. Will be removed in 2.0.0.
|
||||
"""
|
||||
|
||||
new_setting: str = "default"
|
||||
old_setting: str | None = None
|
||||
```
|
||||
|
||||
## Service Implementation
|
||||
|
||||
When adding a new service:
|
||||
|
||||
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
|
||||
2. Implement required abstract methods
|
||||
3. Handle necessary frames
|
||||
4. By default, all frames should be pushed in the direction they came
|
||||
5. Push `ErrorFrame` on failures
|
||||
6. Add metrics tracking via `MetricsData` if relevant
|
||||
7. Follow the pattern of existing services in `src/pipecat/services/`
|
||||
|
||||
## Testing
|
||||
|
||||
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
|
||||
2832
CHANGELOG.md
2832
CHANGELOG.md
File diff suppressed because it is too large
Load Diff
62
CHANGELOG.md.template
Normal file
62
CHANGELOG.md.template
Normal file
@@ -0,0 +1,62 @@
|
||||
# Changelog
|
||||
|
||||
All notable changes to the **<project name>** SDK will be documented in this file.
|
||||
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
Please make sure to add your changes to the appropriate categories:
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
### Added
|
||||
|
||||
<!-- for new functionality -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Changed
|
||||
|
||||
<!-- for changed functionality -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Deprecated
|
||||
|
||||
<!-- for soon-to-be removed functionality -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Removed
|
||||
|
||||
<!-- for removed functionality -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Fixed
|
||||
|
||||
<!-- for fixed bugs -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Performance
|
||||
|
||||
<!-- for performance-relevant changes -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Security
|
||||
|
||||
<!-- for security-relevant changes -->
|
||||
|
||||
- n/a
|
||||
|
||||
### Other
|
||||
|
||||
<!-- for everything else -->
|
||||
|
||||
- n/a
|
||||
|
||||
## [0.1.0] - YYYY-MM-DD
|
||||
|
||||
Initial release.
|
||||
144
CLAUDE.md
144
CLAUDE.md
@@ -1 +1,143 @@
|
||||
@AGENTS.md
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
|
||||
|
||||
## Common Commands
|
||||
|
||||
```bash
|
||||
# Setup development environment
|
||||
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
|
||||
|
||||
# Install pre-commit hooks
|
||||
uv run pre-commit install
|
||||
|
||||
# Run all tests
|
||||
uv run pytest
|
||||
|
||||
# Run a single test file
|
||||
uv run pytest tests/test_name.py
|
||||
|
||||
# Run a specific test
|
||||
uv run pytest tests/test_name.py::test_function_name
|
||||
|
||||
# Preview changelog
|
||||
towncrier build --draft --version Unreleased
|
||||
|
||||
# Lint and format check
|
||||
uv run ruff check
|
||||
uv run ruff format --check
|
||||
|
||||
# Update dependencies (after editing pyproject.toml)
|
||||
uv lock && uv sync
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Frame-Based Pipeline Processing
|
||||
|
||||
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
|
||||
|
||||
```
|
||||
Transport Input → Pipeline Source → [Processor1] → [Processor2] → ... → Pipeline Sink → Transport Output
|
||||
```
|
||||
|
||||
**Key components:**
|
||||
|
||||
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
|
||||
|
||||
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
|
||||
|
||||
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
|
||||
|
||||
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
|
||||
|
||||
- **Transports** (`src/pipecat/transports/`): External I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`.
|
||||
|
||||
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
|
||||
|
||||
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
|
||||
|
||||
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
|
||||
|
||||
### Important Patterns
|
||||
|
||||
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
|
||||
|
||||
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
|
||||
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
|
||||
|
||||
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
|
||||
|
||||
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
|
||||
|
||||
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
|
||||
|
||||
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to exectue fast.
|
||||
|
||||
### Key Directories
|
||||
|
||||
| Directory | Purpose |
|
||||
|---------------------------|----------------------------------------------------|
|
||||
| `src/pipecat/frames/` | Frame definitions (100+ types) |
|
||||
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
|
||||
| `src/pipecat/pipeline/` | Pipeline orchestration |
|
||||
| `src/pipecat/services/` | AI service integrations (60+ providers) |
|
||||
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
|
||||
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
|
||||
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
|
||||
| `src/pipecat/turns/` | User turn management |
|
||||
|
||||
## Code Style
|
||||
|
||||
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
|
||||
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
|
||||
- **Type hints**: Required for complex async code.
|
||||
|
||||
### Docstring Example
|
||||
|
||||
```python
|
||||
class MyService(LLMService):
|
||||
"""Description of what the service does.
|
||||
|
||||
More detailed description.
|
||||
|
||||
Event handlers available:
|
||||
|
||||
- on_connected: Called when we are connected
|
||||
|
||||
Example::
|
||||
|
||||
@service.event_handler("on_connected")
|
||||
async def on_connected(service, frame):
|
||||
...
|
||||
"""
|
||||
|
||||
def __init__(self, param1: str, **kwargs):
|
||||
"""Initialize the service.
|
||||
|
||||
Args:
|
||||
param1: Description of param1.
|
||||
**kwargs: Additional arguments passed to parent.
|
||||
"""
|
||||
super().__init__(**kwargs)
|
||||
```
|
||||
|
||||
## Service Implementation
|
||||
|
||||
When adding a new service:
|
||||
|
||||
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
|
||||
2. Implement required abstract methods
|
||||
3. Handle necessary frames
|
||||
4. By default, all frames should be pushed in the direction they came
|
||||
5. Push `ErrorFrame` on failures
|
||||
6. Add metrics tracking via `MetricsData` if relevant
|
||||
7. Follow the pattern of existing services in `src/pipecat/services/`
|
||||
|
||||
## Pull Requests
|
||||
|
||||
After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.
|
||||
|
||||
@@ -23,8 +23,9 @@ Create your integration following the patterns and examples shown in the "Integr
|
||||
Your repository must contain these components:
|
||||
|
||||
- **Source code** - Complete implementation following Pipecat patterns
|
||||
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples))
|
||||
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
|
||||
- **README.md** - Must include:
|
||||
|
||||
- Introduction and explanation of your integration
|
||||
- Installation instructions
|
||||
- Usage instructions with Pipecat Pipeline
|
||||
@@ -65,25 +66,12 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
|
||||
|
||||
#### Websocket-based Services
|
||||
|
||||
**Base class:** `WebsocketSTTService`
|
||||
|
||||
**Use for:** Services where you manage the websocket connection directly. Combines `STTService` with `WebsocketService` for automatic reconnection and keepalive support.
|
||||
|
||||
**Examples:**
|
||||
|
||||
- [CartesiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/stt.py)
|
||||
- [ElevenLabsRealtimeSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/stt.py)
|
||||
|
||||
#### SDK-based Streaming Services
|
||||
|
||||
**Base class:** `STTService`
|
||||
|
||||
**Use for:** Streaming services where the provider's Python SDK manages the connection internally.
|
||||
|
||||
**Examples:**
|
||||
|
||||
- [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
|
||||
- [GoogleSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/stt.py)
|
||||
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
|
||||
|
||||
#### File-based Services
|
||||
|
||||
@@ -121,59 +109,56 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
|
||||
|
||||
#### Key requirements:
|
||||
|
||||
- **`_process_context(self, context: LLMContext)`** — The main method that processes an LLM context and generates a response. Each LLM service overrides `process_frame` to extract context from `LLMContextFrame` and calls `_process_context`.
|
||||
|
||||
- **`adapter_class`** — Class attribute pointing to a `BaseLLMAdapter` subclass. Defaults to `OpenAILLMAdapter`. Non-OpenAI services must implement their own adapter (see `src/pipecat/adapters/base_llm_adapter.py`) with methods:
|
||||
- `get_llm_invocation_params(context)` — Extract provider-specific params from universal context
|
||||
- `to_provider_tools_format(tools_schema)` — Convert standard tools to provider format
|
||||
- `get_messages_for_logging(context)` — Format messages for logging
|
||||
- Reference adapters: `src/pipecat/adapters/services/` (anthropic, gemini, bedrock, etc.)
|
||||
|
||||
- **Frame sequence:** Output must follow this frame sequence pattern:
|
||||
- `LLMFullResponseStartFrame` — Signals the start of an LLM response
|
||||
- `LLMTextFrame` — Contains LLM content, typically streamed as tokens
|
||||
- `LLMFullResponseEndFrame` — Signals the end of an LLM response
|
||||
|
||||
- **Thought frames (reasoning models):** If the model supports extended thinking / chain-of-thought, emit thought frames alongside the response:
|
||||
- `LLMThoughtStartFrame` — Signals the start of a thought
|
||||
- `LLMThoughtTextFrame` — Contains thought content, streamed as tokens
|
||||
- `LLMThoughtEndFrame` — Signals the end of a thought
|
||||
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
|
||||
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
|
||||
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
|
||||
|
||||
- **Context aggregation** is handled by the framework via `LLMContext` + `LLMContextAggregatorPair`. The LLM service just processes context it receives — no need to implement aggregators.
|
||||
- **Context aggregation:** Implement context aggregation to collect user and assistant content:
|
||||
- Aggregators come in pairs with a `user()` instance and `assistant()` instance
|
||||
- Context must adhere to the `LLMContext` universal format
|
||||
- Aggregators should handle adding messages, function calls, and images to the context
|
||||
|
||||
### TTS (Text-to-Speech) Services
|
||||
|
||||
#### WebsocketTTSService
|
||||
#### AudioContextWordTTSService
|
||||
|
||||
**Use for:** Websocket-based streaming services (with or without word timestamps)
|
||||
**Use for:** Websocket-based services supporting word/timestamp alignment
|
||||
|
||||
**Examples:**
|
||||
**Example:**
|
||||
|
||||
- [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
|
||||
- [ElevenLabsTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
|
||||
|
||||
#### InterruptibleTTSService
|
||||
|
||||
**Use for:** Websocket-based services without word timestamps that reconnect on interruption (e.g. don't support a context ID or interruption message)
|
||||
**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
|
||||
|
||||
**Example:**
|
||||
|
||||
- [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)
|
||||
|
||||
#### WordTTSService
|
||||
|
||||
**Use for:** HTTP-based services supporting word/timestamp alignment
|
||||
|
||||
**Example:**
|
||||
|
||||
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
|
||||
|
||||
#### TTSService
|
||||
|
||||
**Use for:** HTTP-based services (word timestamps are supported in the base class)
|
||||
**Use for:** HTTP-based services without word/timestamp alignment
|
||||
|
||||
**Examples:**
|
||||
**Example:**
|
||||
|
||||
- [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
|
||||
- [OpenAITTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/openai/tts.py)
|
||||
|
||||
#### Key requirements:
|
||||
|
||||
- For websocket services, use asyncio WebSocket implementation
|
||||
- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
|
||||
- Handle idle service timeouts with keepalives
|
||||
- TTS services push both audio (`TTSAudioRawFrame`) and text (`TTSTextFrame`) frames
|
||||
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
|
||||
|
||||
### Telephony Serializers
|
||||
|
||||
@@ -217,25 +202,14 @@ Vision services process images and provide analysis such as descriptions, object
|
||||
|
||||
#### Key requirements:
|
||||
|
||||
- Must implement `run_vision` method that takes a `UserImageRawFrame` and returns an `AsyncGenerator[Frame, None]`
|
||||
- The method processes the image frame and yields frames with analysis results
|
||||
- Must yield the frame sequence: `VisionFullResponseStartFrame`, `VisionTextFrame`, `VisionFullResponseEndFrame`
|
||||
- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
|
||||
- The method processes the latest image in the context and yields frames with analysis results
|
||||
- Typically yields `TextFrame` objects containing descriptions or answers
|
||||
|
||||
## Implementation Guidelines
|
||||
|
||||
### Naming Conventions
|
||||
|
||||
#### Package and Repository Naming
|
||||
|
||||
Use the `pipecat-{vendor}` naming convention for your PyPI package and repository:
|
||||
|
||||
- `pipecat-{vendor}` — for single-service integrations (e.g., `pipecat-deepdub`)
|
||||
- `pipecat-{vendor}-{type}` — when a vendor offers multiple service types (e.g., `pipecat-upliftai-stt`, `pipecat-upliftai-tts`)
|
||||
|
||||
This convention makes community packages easily discoverable via PyPI search and clearly identifies them as part of the Pipecat ecosystem.
|
||||
|
||||
#### Class Naming
|
||||
|
||||
- **STT:** `VendorSTTService`
|
||||
- **LLM:** `VendorLLMService`
|
||||
- **TTS:**
|
||||
@@ -259,137 +233,24 @@ def can_generate_metrics(self) -> bool:
|
||||
return True
|
||||
```
|
||||
|
||||
### Service Settings
|
||||
### Dynamic Settings Updates
|
||||
|
||||
Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
|
||||
|
||||
1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
|
||||
2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
|
||||
|
||||
#### Defining your Settings class
|
||||
|
||||
Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:
|
||||
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass, field
|
||||
async def set_language(self, language: Language):
|
||||
"""Set the recognition language and reconnect.
|
||||
|
||||
from pipecat.services.settings import TTSSettings, NOT_GIVEN
|
||||
|
||||
@dataclass
|
||||
class MyTTSSettings(TTSSettings):
|
||||
"""Settings for MyTTS service.
|
||||
|
||||
Parameters:
|
||||
speaking_rate: Speed multiplier (0.5–2.0).
|
||||
Args:
|
||||
language: The language to use for speech recognition.
|
||||
"""
|
||||
|
||||
speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
|
||||
```
|
||||
|
||||
**What goes in Settings vs. `__init__` params:**
|
||||
|
||||
| Belongs in Settings | Stays as `__init__` params |
|
||||
| -------------------------------------------------------- | ----------------------------------------- |
|
||||
| Model name, voice, language | API keys, auth tokens |
|
||||
| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides |
|
||||
| Anything users may want to change mid-session | Audio encoding, sample format |
|
||||
| | Connection parameters (timeouts, retries) |
|
||||
|
||||
The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
|
||||
|
||||
#### Wiring settings into `__init__`
|
||||
|
||||
Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
|
||||
|
||||
Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
|
||||
|
||||
```python
|
||||
from typing import Optional
|
||||
|
||||
class MyTTSService(TTSService):
|
||||
Settings = MyTTSSettings
|
||||
_settings: Settings
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
api_key: str,
|
||||
settings: Optional[Settings] = None,
|
||||
**kwargs,
|
||||
):
|
||||
# 1. Defaults — every field has a real value (store mode).
|
||||
default_settings = self.Settings(
|
||||
model="my-model-v1",
|
||||
voice="default-voice",
|
||||
language="en",
|
||||
speaking_rate=1.0,
|
||||
)
|
||||
|
||||
# 2. Merge caller overrides (only given fields win).
|
||||
if settings is not None:
|
||||
default_settings.apply_update(settings)
|
||||
|
||||
# 3. Pass the fully-populated settings to the base class.
|
||||
super().__init__(settings=default_settings, **kwargs)
|
||||
|
||||
# 4. Init-only config stored separately.
|
||||
self._api_key = api_key
|
||||
```
|
||||
|
||||
This pattern lets callers override only what they care about:
|
||||
|
||||
```python
|
||||
# Uses all defaults
|
||||
svc = MyTTSService(api_key="sk-xxx")
|
||||
|
||||
# Overrides just the voice — access Settings through the service class
|
||||
svc = MyTTSService(
|
||||
api_key="sk-xxx",
|
||||
settings=MyTTSService.Settings(voice="custom-voice"),
|
||||
)
|
||||
```
|
||||
|
||||
#### Reacting to runtime changes
|
||||
|
||||
AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
|
||||
|
||||
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
|
||||
|
||||
```python
|
||||
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
|
||||
"""Apply a settings update, reconfiguring the connection if needed."""
|
||||
changed = await super()._update_settings(update)
|
||||
|
||||
if not changed:
|
||||
return changed
|
||||
|
||||
logger.info(f"Switching STT language to: [{language}]")
|
||||
self._settings["language"] = language
|
||||
await self._disconnect()
|
||||
await self._connect()
|
||||
|
||||
return changed
|
||||
```
|
||||
|
||||
The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
|
||||
|
||||
Note that, in this example, the service requires a reconnect to apply the new language. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
|
||||
|
||||
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
|
||||
|
||||
```python
|
||||
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
|
||||
changed = await super()._update_settings(update)
|
||||
|
||||
if not changed:
|
||||
return changed
|
||||
|
||||
if "language" in changed:
|
||||
await self._update_language()
|
||||
else:
|
||||
# TODO: this should be temporary - handle changes to other settings soon!
|
||||
self._warn_unhandled_updated_settings(changed.keys() - {"language"})
|
||||
|
||||
return changed
|
||||
```
|
||||
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
|
||||
|
||||
### Sample Rate Handling
|
||||
|
||||
@@ -399,7 +260,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
|
||||
async def start(self, frame: StartFrame):
|
||||
"""Start the service."""
|
||||
await super().start(frame)
|
||||
self._settings.output_sample_rate = self.sample_rate
|
||||
self._settings["output_format"]["sample_rate"] = self.sample_rate
|
||||
await self._connect()
|
||||
```
|
||||
|
||||
@@ -409,7 +270,7 @@ Note that `self.sample_rate` is a `@property` set in the TTSService base class,
|
||||
|
||||
Use Pipecat's tracing decorators:
|
||||
|
||||
- **STT:** `@traced_stt` - decorate `_handle_transcription(self, transcript, is_final, language)` (the standard method name convention)
|
||||
- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
|
||||
- **LLM:** `@traced_llm` - decorate the `_process_context()` method
|
||||
- **TTS:** `@traced_tts` - decorate the `run_tts()` method
|
||||
|
||||
@@ -417,9 +278,8 @@ Use Pipecat's tracing decorators:
|
||||
|
||||
### Packaging and Distribution
|
||||
|
||||
- Name your package `pipecat-{vendor}` (see [Naming Conventions](#naming-conventions))
|
||||
- Use [uv](https://docs.astral.sh/uv/) for packaging (encouraged)
|
||||
- Publish to PyPI for easier installation
|
||||
- Consider releasing to PyPI for easier installation
|
||||
- Follow semantic versioning principles
|
||||
- Maintain a changelog
|
||||
|
||||
@@ -432,15 +292,17 @@ For REST-based communication, use aiohttp. Pipecat includes this as a required d
|
||||
- Wrap API calls in appropriate try/catch blocks
|
||||
- Handle rate limits and network failures gracefully
|
||||
- Provide meaningful error messages
|
||||
- When errors occur, raise exceptions AND push errors to notify the pipeline:
|
||||
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
|
||||
|
||||
```python
|
||||
from pipecat.frames.frames import ErrorFrame
|
||||
|
||||
try:
|
||||
# Your API call
|
||||
result = await self._make_api_call()
|
||||
except Exception as e:
|
||||
# Push error upstream to notify the pipeline
|
||||
await self.push_error(f"{self} error: {e}", exception=e)
|
||||
# Push error frame to pipeline
|
||||
await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
|
||||
# Raise or handle as appropriate
|
||||
raise
|
||||
```
|
||||
|
||||
@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
|
||||
```
|
||||
|
||||
2. Choose the appropriate type:
|
||||
|
||||
- `added.md` - New features
|
||||
- `changed.md` - Changes in existing functionality
|
||||
- `deprecated.md` - Soon-to-be removed features
|
||||
- `removed.md` - Removed features
|
||||
- `fixed.md` - Bug fixes
|
||||
- `performance.md` - Performance improvements
|
||||
- `security.md` - Security fixes
|
||||
- `other.md` - Other changes (documentation, dependencies, etc.)
|
||||
|
||||
@@ -80,6 +80,7 @@ Every pull request that makes a user-facing change should include a changelog en
|
||||
|
||||
```markdown
|
||||
- Updated service configuration:
|
||||
|
||||
- Changed default timeout to 30 seconds
|
||||
- Added retry logic for failed connections
|
||||
```
|
||||
@@ -104,6 +105,7 @@ changelog/1234.changed.2.md
|
||||
|
||||
```markdown
|
||||
- Updated service configuration:
|
||||
|
||||
- Changed default timeout to 30 seconds
|
||||
- Added retry logic for failed connections
|
||||
```
|
||||
|
||||
69
README.md
69
README.md
@@ -8,7 +8,7 @@
|
||||
|
||||
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
|
||||
|
||||
> Want to dive right in? Run `pipecat init quickstart` or follow the [quickstart guide](https://docs.pipecat.ai/getting-started/quickstart).
|
||||
> Want to dive right in? Try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
|
||||
|
||||
## 🚀 What You Can Build
|
||||
|
||||
@@ -28,10 +28,6 @@
|
||||
|
||||
## 🌐 Pipecat Ecosystem
|
||||
|
||||
### 🧩 Multi-agent systems
|
||||
|
||||
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
|
||||
|
||||
### 📱 Client SDKs
|
||||
|
||||
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
|
||||
@@ -59,20 +55,6 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt
|
||||
|
||||
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
|
||||
|
||||
### 🤖 Claude Code Skills
|
||||
|
||||
Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
|
||||
|
||||
```
|
||||
claude plugin marketplace add pipecat-ai/skills
|
||||
```
|
||||
|
||||
and install any of the available plugins.
|
||||
|
||||
### 🧩 Community Integrations
|
||||
|
||||
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
|
||||
|
||||
### 📺️ Pipecat TV Channel
|
||||
|
||||
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
|
||||
@@ -83,28 +65,27 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
|
||||
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/simple-chatbot/image.png" width="400" /></a>
|
||||
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
|
||||
<br/>
|
||||
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/daily-multi-translation"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/daily-multi-translation/image.png" width="400" /></a>
|
||||
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/vision/vision-moondream.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/assets/moondream.png" width="400" /></a>
|
||||
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>
|
||||
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12-describe-video.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/assets/moondream.png" width="400" /></a>
|
||||
</p>
|
||||
|
||||
## 🧩 Available services
|
||||
|
||||
| Category | Services |
|
||||
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
|
||||
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
|
||||
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
|
||||
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
|
||||
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
|
||||
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
|
||||
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
|
||||
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
|
||||
| Vision & Image | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream) |
|
||||
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter) |
|
||||
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry) |
|
||||
| Community | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |
|
||||
| Category | Services |
|
||||
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
|
||||
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
|
||||
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
|
||||
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
|
||||
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
|
||||
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
|
||||
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
|
||||
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
|
||||
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
|
||||
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
|
||||
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
|
||||
|
||||
📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)
|
||||
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
|
||||
|
||||
## ⚡ Getting started
|
||||
|
||||
@@ -146,15 +127,15 @@ You can get started with Pipecat running on your local machine, then move your a
|
||||
|
||||
## 🧪 Code examples
|
||||
|
||||
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples) — small snippets that build on each other, introducing one or two concepts at a time
|
||||
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
|
||||
- [Example apps](https://github.com/pipecat-ai/pipecat-examples) — complete applications that you can use as starting points for development
|
||||
|
||||
## 🛠️ Contributing to the framework
|
||||
|
||||
### Prerequisites
|
||||
|
||||
**Minimum Python Version:** 3.11
|
||||
**Recommended Python Version:** >= 3.12
|
||||
**Minimum Python Version:** 3.10
|
||||
**Recommended Python Version:** 3.12
|
||||
|
||||
### Setup Steps
|
||||
|
||||
@@ -170,6 +151,7 @@ You can get started with Pipecat running on your local machine, then move your a
|
||||
```bash
|
||||
uv sync --group dev --all-extras \
|
||||
--no-extra gstreamer \
|
||||
--no-extra krisp \
|
||||
--no-extra local \
|
||||
```
|
||||
|
||||
@@ -181,15 +163,6 @@ You can get started with Pipecat running on your local machine, then move your a
|
||||
|
||||
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
|
||||
|
||||
### Claude Code Skills
|
||||
|
||||
Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
|
||||
|
||||
```
|
||||
claude plugin marketplace add pipecat-ai/pipecat
|
||||
claude plugin install pipecat-dev@pipecat-dev-skills
|
||||
```
|
||||
|
||||
### Running tests
|
||||
|
||||
To run all tests, from the root directory:
|
||||
|
||||
1
changelog/3134.added.md
Normal file
1
changelog/3134.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.
|
||||
1
changelog/3355.added.md
Normal file
1
changelog/3355.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans.
|
||||
1
changelog/3355.deprecated.md
Normal file
1
changelog/3355.deprecated.md
Normal file
@@ -0,0 +1 @@
|
||||
- Deprecated `UserBotLatencyLogObserver`. Use `UserBotLatencyObserver` directly with its `on_latency_measured` event handler instead.
|
||||
1
changelog/3542.fixed.md
Normal file
1
changelog/3542.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed pipeline freeze when `InterruptionFrame` discards `EndFrame` or `StopFrame` by making terminal frames uninterruptible.
|
||||
1
changelog/3589.fixed.md
Normal file
1
changelog/3589.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets.
|
||||
1
changelog/3593.added.md
Normal file
1
changelog/3593.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added support for Inworld TTS Websocket Auto Mode for improved latency
|
||||
1
changelog/3593.changed.md
Normal file
1
changelog/3593.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Updated timestamps to be cumulative within an agent turn, using flushCompleted message as an indication of when timestamps from the server are reset to 0
|
||||
1
changelog/3610.fixed.md
Normal file
1
changelog/3610.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `PipelineTask` adding duplicate `RTVIProcessor` and `RTVIObserver` when they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations.
|
||||
1
changelog/3612.changed.md
Normal file
1
changelog/3612.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine.
|
||||
1
changelog/3616.fixed.md
Normal file
1
changelog/3616.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed function call timeout task not being cancelled when the handler completes without calling `result_callback` or is cancelled externally, which caused `RuntimeWarning: coroutine was never awaited`.
|
||||
5
changelog/3617.fixed.md
Normal file
5
changelog/3617.fixed.md
Normal file
@@ -0,0 +1,5 @@
|
||||
- Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin
|
||||
languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK
|
||||
languages, causing text to accumulate until flush instead of being split at
|
||||
sentence boundaries. Added fallback detection for unambiguous non-Latin
|
||||
sentence-ending punctuation (e.g., `。`, `?`, `!`).
|
||||
1
changelog/3623.fixed.md
Normal file
1
changelog/3623.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `PipelineTask` to also call `set_bot_ready()` when an external `RTVIProcessor` is provided.
|
||||
1
changelog/3628.fixed.md
Normal file
1
changelog/3628.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `VADController` not broadcasting `SpeechControlParamsFrame` on startup, which prevented STT services from receiving VAD params needed for TTFB measurement.
|
||||
1
changelog/3629.fixed.md
Normal file
1
changelog/3629.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `StopAsyncIteration` exceptions in `parse_telephony_websocket()` when WebSocket connections close before sending expected messages.
|
||||
1
changelog/3630.added.md
Normal file
1
changelog/3630.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`).
|
||||
1
changelog/3630.deprecated.md
Normal file
1
changelog/3630.deprecated.md
Normal file
@@ -0,0 +1 @@
|
||||
- Deprecated `RTVILLMFunctionCallMessage`, `RTVILLMFunctionCallMessageData`, and `RTVIProcessor.handle_function_call()`. Use the new `llm-function-call-in-progress` event sent automatically by `RTVIObserver` instead.
|
||||
1
changelog/3635.fixed.md
Normal file
1
changelog/3635.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed WebSocket transport error when broadcasting `InputTransportMessageFrame` by correctly instantiating the frame with its message parameter.
|
||||
1
changelog/3649.fixed.md
Normal file
1
changelog/3649.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing.
|
||||
1
changelog/3652.changed.md
Normal file
1
changelog/3652.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0.
|
||||
1
changelog/3656.added.md
Normal file
1
changelog/3656.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.
|
||||
10
changelog/3659.changed.md
Normal file
10
changelog/3659.changed.md
Normal file
@@ -0,0 +1,10 @@
|
||||
- ⚠️ The default `VADParams` `stop_secs` default is changing from `0.8` seconds
|
||||
to `0.2` seconds. This change both simplifies the developer experience and
|
||||
improves the performance of STT services. With a shorter `stop_secs` value,
|
||||
STT services using a local VAD can finalize sooner, resulting in faster
|
||||
transcription.
|
||||
|
||||
- `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for
|
||||
additional user speech using `user_speech_timeout` (default: 0.6 sec).
|
||||
- `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts
|
||||
the user wait time based on the audio input.
|
||||
1
changelog/3660.changed.md
Normal file
1
changelog/3660.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Moved interruption wait event from per-processor instance state to `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()`. A warning is logged if completion does not happen within 2 seconds.
|
||||
1
changelog/3663.fixed.md
Normal file
1
changelog/3663.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `SambaNovaLLMService` and `GoogleLLMOpenAIBetaService` streams not being closed on cancellation/exception, which could leak sockets.
|
||||
1
changelog/3664.changed.md
Normal file
1
changelog/3664.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Update the default model to `scribe_v2` for `ElevenLabsSTTService`.
|
||||
1
changelog/3666.changed.md
Normal file
1
changelog/3666.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Changed the `DeepgramSTTService` default setting for `smart_format` to `False`, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well.
|
||||
1
changelog/3667.fixed.md
Normal file
1
changelog/3667.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed an issue in `InworldTTSService` where punctuation was pronounced. Now, the `InworldTTSService` ensures proper spacing between sentences, resolving pronunciation issues.
|
||||
1
changelog/3668.fixed.md
Normal file
1
changelog/3668.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `ParallelPipeline` allowing frames pushed by internal processors to escape during lifecycle frame (`StartFrame`/`EndFrame`/`CancelFrame`) synchronization. These frames are now buffered and flushed after all branches complete.
|
||||
1
changelog/3678.added.md
Normal file
1
changelog/3678.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added pyright basic type checking configuration for the core framework.
|
||||
@@ -1 +0,0 @@
|
||||
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.
|
||||
@@ -1 +0,0 @@
|
||||
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering — avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to override.
|
||||
@@ -1 +0,0 @@
|
||||
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
|
||||
@@ -1 +0,0 @@
|
||||
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response — one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used.
|
||||
@@ -1 +0,0 @@
|
||||
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources` → `app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.
|
||||
@@ -1 +0,0 @@
|
||||
- Lowered the per-message log in `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App messages can be high-frequency and were noisy at debug level; set the loguru level to `TRACE` to see them again.
|
||||
@@ -1 +0,0 @@
|
||||
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default model for `GrokRealtimeLLMService` to `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is being removed.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was stored in `Settings` but never sent to xAI, so every session silently fell back to xAI's server-side default. The model is now passed via the `?model=` query parameter on the WebSocket URL as xAI's Voice Agent API requires.
|
||||
@@ -1 +0,0 @@
|
||||
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that appends a developer-role message to the context whenever `LLMSetToolsFrame` changes the set of advertised standard tools. Helps the LLM stay coherent across mid-conversation tool changes, mitigating several flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, and hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`. When installed, the strategy gates `on_user_turn_stopped` on a `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by any component that can judge turn completeness — e.g. the `UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout` provides a safety net if no completion frame ever arrives.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the inference-triggered event and suppresses `on_user_turn_stopped`, leaving finalization to another strategy in the chain such as `LLMTurnCompletionUserTurnStopStrategy`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `FilterIncompleteUserTurnStrategies` in `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization that wraps the detector chain with `deferred(...)` and appends `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case: `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` — a generic stop strategy that finalizes the user turn whenever a `UserTurnInferenceCompletedFrame` arrives, regardless of which component produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base; future producers (Flux, custom end-of-turn classifiers, etc.) can use the base directly or subclass it to add producer-specific setup.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `on_user_turn_inference_triggered`, a new event on the user turn controller, processor, aggregator and stop strategies that fires when a strategy has enough signal to start LLM inference. By default it fires together with `on_user_turn_stopped`; a gating strategy can fire only the inference-triggered event and defer finalization to a peer.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add `LLMTurnCompletionUserTurnStopStrategy` to a custom `user_turn_strategies.stop`) instead. Setting the legacy flag still works for one release: the aggregator emits a `DeprecationWarning` and rewires the strategies as if you had passed `FilterIncompleteUserTurnStrategies` directly.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `on_user_turn_stopped` firing prematurely when `filter_incomplete_user_turns` was enabled. The event now fires only after the LLM confirms the user turn is complete (`✓`); previously the smart-turn detector's tentative stop was bubbling up before the LLM had a chance to veto it, causing observers, transcript appenders and UI indicators to receive an early — and sometimes duplicated — signal.
|
||||
@@ -1,6 +0,0 @@
|
||||
- Added first-class RTVI support for the UI Agent Protocol:
|
||||
- Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server messages, plus `ui-command` and `ui-task` server-to-client messages, with paired `*Data` / `*Message` pydantic models.
|
||||
- Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`, `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching default handlers live in `@pipecat-ai/client-react`.
|
||||
- Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`, and `ui-cancel-task` messages.
|
||||
- Adds five UI pipeline frames, mirroring the `client-message` frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` / `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` / `UITaskMessage` envelopes, while the processor pushes inbound `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame` alongside `on_ui_message`.
|
||||
- Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting across two assistant messages in the LLM context and not surfacing in `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted at the end of a TTS context now carries a PTS just past the last word so it can't overtake clock-queued `TTSTextFrame`s in the transport's output, and `LLMAssistantAggregator` now triggers `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting transcripts).
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged words (e.g. `bookLook`) when using Flash models. Flash often splits sentences mid-stream into alignment chunks that begin with a real inter-word space, but the previous fix unconditionally stripped that space from every chunk. Leading spaces are now stripped only on the first alignment chunk of an utterance, so subsequent chunks correctly flush partial words across boundaries.
|
||||
@@ -1 +0,0 @@
|
||||
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor now resolve credentials via the standard boto3 provider chain (EC2 instance profiles, EKS pod roles / IRSA, ECS task roles, SSO, `~/.aws/credentials`) when explicit credentials and `AWS_*` environment variables are absent. Services running with IAM roles no longer need to export static credentials.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` was set in the environment. The half-populated kwargs are no longer forwarded to aioboto3; partial env-var configurations now fall through to the boto3 credential chain like fully-unset configurations do.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed a path traversal issue in the development runner's `/files/{filename:path}` download endpoint. Previously, when the runner was started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could escape the configured folder because `%2F`-encoded separators bypassed Starlette's path normalisation. The endpoint now resolves the joined path and rejects any filename that escapes the allowed base with a 403, and also returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`, `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing users can pin the prior model explicitly via the `model`/`tts_model` argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid model IDs.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing romanized/normalized text to the LLM context. With non-Latin input (e.g., Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The services now consume `alignment` by default and only switch to `normalizedAlignment` / `normalized_alignment` when `pronunciation_dictionary_locators` is configured (where `alignment` has overlapping restarts that produce duplicated/garbled words, per #4316). Both fields are read with preferred-with-fallback semantics since each is nullable per the API schema.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can bias transcription for both file-based and realtime transcription.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the `create_file_resampler()` / `create_stream_resampler()` factories). Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class will be removed in Pipecat 2.0 along with the default `resampy` and `numba` dependencies.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default model for `GrokLLMService` from `grok-3` to `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum silence duration before the watchdog sends a silence packet to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts automatically to the audio chunk size in use.
|
||||
@@ -1 +0,0 @@
|
||||
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic: `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms. This prevents false silence injections when large audio chunks are sent at lower frequency.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed a deadlock in `TTSService` that could permanently stall pipeline processing when all three conditions occurred together: `pause_frame_processing=True`, an interruption arrived before any TTS audio was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`, `FunctionCallResultFrame`) was in the processing queue at that moment. The process task would block on `__process_event.wait()` indefinitely because `BotStoppedSpeakingFrame` never arrives (no audio was played) and the interruption handler did not resume processing. Affects services using `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and ResembleAI.
|
||||
@@ -1 +0,0 @@
|
||||
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the turn is complete (on `on_turn_context_completed`) rather than waiting until all audio has finished playing back. The `isFinal` message from ElevenLabs is now used to signal `TTSStoppedFrame` and clean up the audio context, improving turn transition timing.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed interruptions being delayed when a slow non-uninterruptible frame was processing and an uninterruptible frame was waiting in the queue. The bot would stall until the slow frame finished instead of cancelling it immediately on interruption.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `TTSService` dropping uninterruptible frames (e.g. `FunctionCallResultFrame`) from its internal serialization queue when an interruption occurs. Previously, the queue was recreated on every interruption, silently discarding any queued frames. The queue is now reset instead of recreated, preserving uninterruptible frames so they are always delivered downstream.
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user