Compare commits
4 Commits
kompfner-p
...
aleix/fram
| Author | SHA1 | Date | |
|---|---|---|---|
| | ba86fc2f08 | | |
| | d459465eb6 | | |
| | 74aea65f17 | | |
| | bd7b24596e | | |
@@ -1,40 +0,0 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---

Create changelog files for the important commits in this PR. The PR number is provided as an argument.

## Instructions

1. First, check what commits are on the current branch compared to main:
   ```
   git log main..HEAD --oneline
   ```

2. For each significant change, create a changelog file in the `changelog/` folder using the format:
   - `{PR_NUMBER}.added.md` - for new features
   - `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional new features
   - `{PR_NUMBER}.changed.md` - for changes to existing functionality
   - `{PR_NUMBER}.fixed.md` - for bug fixes
   - `{PR_NUMBER}.deprecated.md` - for deprecations

3. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.

4. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.

5. Use ⚠️ emoji prefix for breaking changes.

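
The naming scheme in step 2 can be sketched as a small helper. This is only an illustration, not part of the skill: `changelog_path` and `fragment_body` are hypothetical names, and placing the ⚠️ prefix directly after `- ` is an assumption, since the instructions do not say exactly where it goes.

```python
from pathlib import Path

# Fragment types listed in step 2 of the instructions.
FRAGMENT_TYPES = ("added", "changed", "fixed", "deprecated")


def changelog_path(pr_number: int, kind: str, index: int = 1, root: str = "changelog") -> Path:
    """Return the fragment path for the index-th change of one kind (hypothetical helper)."""
    if kind not in FRAGMENT_TYPES:
        raise ValueError(f"unknown fragment type: {kind}")
    if index < 1:
        raise ValueError("index starts at 1")
    # The first fragment has no counter; later ones get ".2", ".3", ...
    counter = "" if index == 1 else f".{index}"
    return Path(root) / f"{pr_number}.{kind}{counter}.md"


def fragment_body(description: str, breaking: bool = False) -> str:
    """Build the main line of a fragment (prefix position is an assumption)."""
    prefix = "⚠️ " if breaking else ""
    return f"- {prefix}{description}\n"
```

For example, `changelog_path(3519, "added", 2)` yields `changelog/3519.added.2.md`, matching the pattern for additional new features.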
## Example

For PR #3519 with a new feature and a bug fix:

`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```

`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
```
@@ -1,257 +0,0 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---

Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.

## Instructions

1. First, find the class in the codebase:
   ```
   Search for "class ClassName" in src/pipecat/
   ```

2. If multiple files contain that class name:
   - List all matches with their file paths
   - Ask the user which one they want to document
   - Wait for confirmation before proceeding

3. Once the file is identified, read the module to understand its structure:
   - Identify all classes, functions, and important type aliases
   - Understand the purpose of each component

4. Apply documentation in this order:
   - Module docstring (at top, before imports)
   - Class docstrings
   - `__init__` methods (always document constructor parameters)
   - Public methods (not starting with `_`)
   - Dataclass/config classes with field descriptions

5. Skip documentation for:
   - Private methods (starting with `_`)
   - Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
   - Very simple pass-through properties
   - **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
     - A one-line summary
     - Args section (if it has parameters)
     - Returns section (if it returns something meaningful)
   - Only add or improve documentation where it is missing or incomplete

## Module Docstring Format

```python
"""[One-line description of module purpose].

[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```

Example:
```python
"""Neuphonic text-to-speech service implementations.

This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```

## Class Docstring Format

```python
class ClassName:
    """One-line summary describing what the class does.

    [Longer description explaining purpose, behavior, and key features.
    Use action-oriented language.]

    [Optional: Event handlers, usage notes, or important caveats.]
    """
```

Example:
```python
class FrameProcessor(BaseObject):
    """Base class for all frame processors in the pipeline.

    Frame processors are the building blocks of Pipecat pipelines, they can be
    linked to form complex processing pipelines. They receive frames, process
    them, and pass them to the next or previous processor in the chain.

    Event handlers available:

    - on_before_process_frame: Called before a frame is processed
    - on_after_process_frame: Called after a frame is processed

    Example::

        @processor.event_handler("on_before_process_frame")
        async def on_before_process_frame(processor, frame):
            ...

        @processor.event_handler("on_after_process_frame")
        async def on_after_process_frame(processor, frame):
            ...
    """
```

Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.

## Constructor (`__init__`) Format

```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
    """Initialize the [ClassName].

    Args:
        param1: Description of param1 and its purpose.
        param2: Description of param2. Defaults to [default].
        **kwargs: Additional arguments passed to parent class.
    """
```

Example:
```python
def __init__(
    self,
    *,
    api_key: str,
    voice_id: Optional[str] = None,
    sample_rate: Optional[int] = 22050,
    **kwargs,
):
    """Initialize the Neuphonic TTS service.

    Args:
        api_key: Neuphonic API key for authentication.
        voice_id: ID of the voice to use for synthesis.
        sample_rate: Audio sample rate in Hz. Defaults to 22050.
        **kwargs: Additional arguments passed to parent InterruptibleTTSService.
    """
```

## Method Docstring Format

```python
async def method_name(self, param1: Type) -> ReturnType:
    """One-line summary of what method does.

    [Longer description if behavior isn't obvious.]

    Args:
        param1: Description of param1.

    Returns:
        Description of return value.

    Raises:
        ExceptionType: When this exception is raised.
    """
```

Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
    """Put an item into the priority queue.

    System frames (`SystemFrame`) have higher priority than any other
    frames. If a non-frame item is provided it will have the highest priority.

    Args:
        item: The item to enqueue.
    """
```

## Dataclass/Config Format

```python
@dataclass
class ConfigName:
    """One-line description of configuration.

    [Explanation of when/how to use this config.]

    Parameters:
        field1: Description of field1.
        field2: Description of field2. Defaults to [default].
    """

    field1: Type
    field2: Type = default_value
```

Example:
```python
@dataclass
class FrameProcessorSetup:
    """Configuration parameters for frame processor initialization.

    Parameters:
        clock: The clock instance for timing operations.
        task_manager: The task manager for handling async operations.
        observer: Optional observer for monitoring frame processing events.
    """

    clock: BaseClock
    task_manager: BaseTaskManager
    observer: Optional[BaseObserver] = None
```

## Enum Documentation Format

```python
class EnumName(Enum):
    """One-line description of the enum purpose.

    [Longer description of how the enum is used.]

    Parameters:
        VALUE1: Description of VALUE1.
        VALUE2: Description of VALUE2.
    """

    VALUE1 = 1
    VALUE2 = 2
```

## Writing Style Guidelines

- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks

Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."

Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."

## Deprecation Notice Format

When documenting deprecated code:

```python
"""[Description].

.. deprecated:: X.X.X
    `ClassName` is deprecated and will be removed in a future version.
    Use `NewClassName` instead.
"""
```

## Checklist

Before finishing, verify:

- [ ] Module has a docstring at the top (after the copyright header, before imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged
@@ -1,128 +0,0 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---

Update a GitHub pull request description based on the changes in the PR.

## Arguments

```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```

- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)

Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`

## Instructions

1. First, gather information about the PR:
   - Use GitHub plugin to get PR details (title, current description, base branch)
   - Use local git to get commits: `git log main..HEAD --oneline`
   - Use local git to get the diff: `git diff main..HEAD`
   - Parse any `--fixes` argument for issue numbers

2. Check the existing PR description:
   - If it already has a complete, accurate description that reflects the changes, do nothing
   - If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
   - If it only has the template placeholder text, generate a full description

3. Analyze the changes:
   - Understand the purpose of each commit
   - Identify any breaking changes (API changes, removed features, behavior changes)
   - Look for new features, bug fixes, refactoring, or documentation changes
   - Collect issue numbers from:
     - The `--fixes` argument (if provided)
     - Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")

4. Generate or update the PR description with these sections:

## PR Description Format

### Summary (always include)

Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.

```markdown
## Summary

- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```

### Breaking Changes (include only if applicable)

Document any changes that affect existing users or APIs.

```markdown
## Breaking Changes

- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```

### Testing (include when non-obvious)

How to verify the changes work. Skip for trivial changes.

```markdown
## Testing

- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```

### Fixes (include if issues are provided or found in commits)

List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.

```markdown
## Fixes

- Fixes #123
- Fixes #456
```

Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.

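
Collecting issue numbers from the `--fixes` argument and from commit messages, then normalizing them to the "Fixes #X" format, might look like the sketch below. The function names are hypothetical and the regular expression only covers the three patterns named in the instructions.

```python
import re
from typing import Iterable, List, Optional

# Matches "Fixes #123", "Closes #456", "Resolves #789" (case-insensitive).
ISSUE_PATTERN = re.compile(r"\b(?:fixes|closes|resolves)\s+#(\d+)", re.IGNORECASE)


def collect_issue_numbers(fixes_arg: Optional[str], commit_messages: Iterable[str]) -> List[int]:
    """Merge issue numbers from --fixes and commit messages, keeping first-seen order."""
    numbers: List[int] = []
    if fixes_arg:
        numbers += [int(part) for part in fixes_arg.split(",") if part.strip()]
    for message in commit_messages:
        numbers += [int(match) for match in ISSUE_PATTERN.findall(message)]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(numbers))


def render_fixes_section(numbers: List[int]) -> str:
    """Render the Fixes section, one "Fixes #X" line per issue; empty if no issues."""
    if not numbers:
        return ""
    lines = ["## Fixes", ""] + [f"- Fixes #{n}" for n in numbers]
    return "\n".join(lines) + "\n"
```

Returning an empty string when there are no issues keeps the section out of the description entirely, in line with the "skip empty sections" guideline.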
## Guidelines

- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed

## Example Output

```markdown
## Summary

- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes

## Testing

/docstring ClassName

## Fixes

- Fixes #123
```

## Checklist

Before updating the PR:

- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable
2
.github/workflows/coverage.yaml
vendored
@@ -33,7 +33,7 @@ jobs:

- name: Install dependencies
  run: |
    uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
    uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain

- name: Run tests with coverage
  run: |
2
.github/workflows/tests.yaml
vendored
@@ -37,7 +37,7 @@ jobs:

- name: Install dependencies
  run: |
    uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra livekit --extra websocket
    uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain

- name: Test with pytest
  run: |
16
.gitignore
vendored
@@ -4,14 +4,7 @@ __pycache__/
*~
venv
.venv
.idea
.gradle
.next
next-env.d.ts
local.properties
*.log
*.lock
smart_turn_audio_log
/.idea
#*#

# Distribution / Packaging
@@ -34,7 +27,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
.env*
.env
fly.toml

# Examples
@@ -58,7 +51,4 @@ docs/api/_build/
docs/api/api

# uv
.python-version

# Pipecat
whisker_setup.py
.python-version
658
CHANGELOG.md
@@ -7,664 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

<!-- towncrier release notes start -->

## [0.0.100] - 2026-01-20

### Added

- Added Hathora service to support Hathora-hosted TTS and STT models (only
  non-streaming)
  (PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))

- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models
  (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech
  synthesis.
  (PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))

- Added the `additional_headers` param to `WebsocketClientParams`, allowing
  `WebsocketClientTransport` to send custom headers on connect, for cases such
  as authentication.
  (PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))

- Added `UserIdleController` for detecting user idle state, integrated into
  `LLMUserAggregator` and `UserTurnProcessor` via optional `user_idle_timeout`
  parameter. Emits `on_user_turn_idle` event for application-level handling.
  Deprecated `UserIdleProcessor` in favor of the new compositional approach.
  (PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))

- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to
  `LLMUserAggregator` for tracking user mute state changes.
  (PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))

### Changed

- Enhanced interruption handling in `AsyncAITTSService` by supporting
  multi-context WebSocket sessions for more robust context management.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Throttle `UserSpeakingFrame` to broadcast at most every 200ms instead of on
  every audio chunk, reducing frame processing overhead during user speech.
  (PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))
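
The 200ms throttling described in the last entry can be illustrated with a minimal time-based gate. This is a generic sketch, not Pipecat's implementation: the `Throttle` class and its injectable `clock` parameter are assumptions made so the behavior is easy to test.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Allow an action at most once per interval (hypothetical helper)."""

    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._interval_s = interval_s
        self._clock = clock
        self._last: Optional[float] = None

    def allow(self) -> bool:
        """Return True if enough time has passed since the last allowed call."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval_s:
            self._last = now
            return True
        return False


# Usage: gate a per-audio-chunk broadcast to once every 200 ms.
throttle = Throttle(0.2)
if throttle.allow():
    pass  # broadcast the frame here
```

Calls that arrive inside the interval are simply dropped rather than queued, which is what reduces the per-chunk overhead.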

### Deprecated

- For consistency with other package names, deprecated `pipecat.turns.mute`
  (introduced in Pipecat 0.0.99) in favor of `pipecat.turns.user_mute`.
  (PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))

### Fixed

- Corrected TTFB metric calculation in `AsyncAIHttpTTSService`.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Fixed an issue where the "bot-llm-text" RTVI event would not fire for
  realtime (speech-to-speech) services:

  - `AWSNovaSonicLLMService`
  - `GeminiLiveLLMService`
  - `OpenAIRealtimeLLMService`
  - `GrokRealtimeLLMService`

  The issue was that these services weren't pushing `LLMTextFrame`s. Now
  they do.
  (PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))

- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is
  talking when using `ExternalUserTurnStrategies`.
  (PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))

- Fixed an issue where user turn start strategies were not being reset after a
  user turn started, causing incorrect strategy behavior.
  (PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))

- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions,
  preventing incorrect turn starts when words are spoken with pauses between
  them.
  (PR [#3462](https://github.com/pipecat-ai/pipecat/pull/3462))

- Fixed an issue where Grok Realtime would error out when running with
  SmallWebRTC transport.
  (PR [#3480](https://github.com/pipecat-ai/pipecat/pull/3480))

- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was
  causing an error. See
  https://docs.mem0.ai/platform/features/async-mode-default-change.
  (PR [#3484](https://github.com/pipecat-ai/pipecat/pull/3484))

- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously
  error out. Now it successfully reconnects and "rehydrates" from the context
  object.
  (PR [#3486](https://github.com/pipecat-ai/pipecat/pull/3486))

- Fixed `AzureTTSService` transcript formatting issues:
  - Punctuation now appears without extra spaces (e.g., "Hello!" instead of
    "Hello !")
  - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces
    between characters
  (PR [#3489](https://github.com/pipecat-ai/pipecat/pull/3489))

- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in
  some cases.
  (PR [#3494](https://github.com/pipecat-ai/pipecat/pull/3494))

- Fixed memory leak in `LiveKitTransport` when `video_in_enabled` is `False`.
  (PR [#3499](https://github.com/pipecat-ai/pipecat/pull/3499))

- Fixed an issue in `AIService` where unhandled exceptions in `start()`,
  `stop()`, or `cancel()` implementations would prevent `process_frame()` from
  continuing, and therefore prevent `StartFrame`, `EndFrame`, or `CancelFrame`
  from being pushed downstream, causing the pipeline to not start or stop
  properly.
  (PR [#3503](https://github.com/pipecat-ai/pipecat/pull/3503))

- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from
  constructor to `start()` for better error handling.
  (PR [#3504](https://github.com/pipecat-ai/pipecat/pull/3504))

- Optimized `NVIDIATTSService` to process incoming audio frames immediately.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Optimized `NVIDIASTTService` by removing unnecessary queue and task.
  (PR [#3509](https://github.com/pipecat-ai/pipecat/pull/3509))

- Fixed a `CambTTSService` issue where the client was initialized in the
  constructor, which prevented proper pipeline error handling.
  (PR [#3511](https://github.com/pipecat-ai/pipecat/pull/3511))

## [0.0.99] - 2026-01-13

### Added

- Introducing user turn strategies. User turn strategies indicate when the user
  turn starts or stops. In conversational agents, these are often referred to
  as start/stop speaking or turn-taking plans or policies.

  User turn start strategies indicate when the user starts speaking (e.g.
  using VAD events or when a user says one or more words).

  User turn stop strategies indicate when the user stops speaking (e.g. using
  an end-of-turn detection model or by observing incoming transcriptions).

  A list of strategies can be specified for both start and stop; strategies
  are evaluated in order until one evaluates to true.

  Available user turn start strategies:

  - VADUserTurnStartStrategy
  - TranscriptionUserTurnStartStrategy
  - MinWordsUserTurnStartStrategy
  - ExternalUserTurnStartStrategy

  Available user turn stop strategies:

  - TranscriptionUserTurnStopStrategy
  - TurnAnalyzerUserTurnStopStrategy
  - ExternalUserTurnStopStrategy

  The default strategies are:

  - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
  - stop: [TranscriptionUserTurnStopStrategy]

  Turn strategies are configured when setting up `LLMContextAggregatorPair`.
  For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              stop=[
                  TurnAnalyzerUserTurnStopStrategy(
                      turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                  )
              ],
          )
      ),
  )
  ```

  In order to use the user turn strategies you must update to the new
  universal `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural
  network via pyrnnoise library.
  (PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))

- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
  voice conversations:

  - Support for real-time audio streaming with WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)
  (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))

- Added an approximation of TTFB for Ultravox.
  (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))

- Added a new `AudioContextTTSService` to the TTS service base classes. The
  `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and
  `WebsocketWordTTSService`.
  (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))

- `LLMUserAggregator` now exposes the following events:

  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
    and times out
  (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))

- Introducing user mute strategies. User mute strategies indicate when user
  input should be muted based on the current system state.

  In conversational agents, user mute strategies are used to prevent user
  input from interrupting bot speech, tool execution, or other critical system
  operations.

  A list of strategies can be specified; all strategies are evaluated for
  every frame so that each strategy can maintain its internal state. A user
  frame is muted if any of the configured strategies indicates it should be
  muted.

  Available user mute strategies:

  - `FirstSpeechUserMuteStrategy`
  - `MuteUntilFirstBotCompleteUserMuteStrategy`
  - `AlwaysUserMuteStrategy`
  - `FunctionCallUserMuteStrategy`

  User mute strategies replace the legacy `STTMuteFilter` and provide a more
  flexible and composable approach to muting user input.

  User mute strategies are configured when setting up the
  `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_mute_strategies=[
              FirstSpeechUserMuteStrategy(),
          ]
      ),
  )
  ```

  In order to use user mute strategies you should update to the new universal
  `LLMContext` and `LLMContextAggregatorPair`.
  (PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))

- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService`
|
||||
and `NvidiaTTSService`.
|
||||
(PR [#3300](https://github.com/pipecat-ai/pipecat/pull/3300))
|
||||
|
||||
- Added `enable_interruptions` constructor argument to all user turn
|
||||
strategies. This tells the `LLMUserAggregator` to push or not push an
|
||||
`InterruptionFrame`.
|
||||
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
|
||||
|
||||
- Added `split_sentences` parameter to `SpeechmaticsSTTService` to control
|
||||
sentence splitting behavior for finals on sentence boundaries.
|
||||
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
|
||||
|
||||
- Added word-level timestamp support to `AzureTTSService` for accurate
|
||||
text-to-audio synchronization.
|
||||
(PR [#3334](https://github.com/pipecat-ai/pipecat/pull/3334))
|
||||
|
||||
- Added `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams`
|
||||
and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation
|
||||
dictionary feature for custom pronunciations.
|
||||
(PR [#3346](https://github.com/pipecat-ai/pipecat/pull/3346))
|
||||
|
||||
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport`
|
||||
(see https://www.liveavatar.com/).
|
||||
(PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
|
||||
|
||||
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
|
||||
|
||||
- New `start_video_paused` parameter to control initial video input state
|
||||
- New `video_frame_detail` parameter to set image processing quality
|
||||
("auto",
|
||||
"low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
|
||||
parameter.
|
||||
- `set_video_input_paused()` method to pause/resume video input at runtime
|
||||
- `set_video_frame_detail()` method to adjust video frame quality
|
||||
dynamically
|
||||
- Automatic rate limiting (1 frame per second) to prevent API overload
|
||||
(PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
|
||||
|
||||
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
|
||||
that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
|
||||
and interruptions based on the controller's user turn strategies.
|
||||
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
|
||||
|
||||
- Added `UserTurnController` to manage user turns. It emits
|
||||
`on_user_turn_started`, `on_user_turn_stopped`, and
|
||||
`on_user_turn_stop_timeout` events, and can be integrated into processors to
|
||||
detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are
|
||||
implemented using this controller.
|
||||
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
|
||||
|
||||
- Added `should_interrupt` property to `DeepgramFluxSTTService`,
|
||||
`DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the
|
||||
bot should be interrupted when the external service detects user speech.
|
||||
(PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
|
||||
|
||||
- `LLMAssistantAggregator` now exposes the following events:
|
||||
|
||||
- `on_assistant_turn_started`: triggered when the assistant turn starts
|
||||
- `on_assistant_turn_stopped`: triggered when the assistant turn ends
|
||||
- `on_assistant_thought`: triggered when there's an assistant thought
|
||||
available
|
||||
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
|
||||
|
||||
- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
|
||||
SDK (requires `krisp_audio`).
|
||||
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
|
||||
|
||||
- Added support for setting up a pipeline task from external files. You can now
|
||||
register custom pipeline task setup files by setting the
|
||||
`PIPECAT_SETUP_FILES` environment variable. This variable should contain a
|
||||
colon-separated list of Python files (e.g. `export
|
||||
PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
|
||||
function with the following signature:
|
||||
|
||||
```python
|
||||
async def setup_pipeline_task(task: PipelineTask):
|
||||
...
|
||||
```
|
||||
|
||||
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
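As a rough illustration of the mechanism described above, the sketch below parses a colon-separated `PIPECAT_SETUP_FILES` value and calls each file's `setup_pipeline_task` function in order. The helper names `load_setup_functions` and `run_setup_files` are hypothetical, and this is not Pipecat's actual loader; the `task` argument is typed as `Any` so the sketch runs without Pipecat installed.

```python
import asyncio
import importlib.util
import inspect
import os
from pathlib import Path
from typing import Any, Awaitable, Callable, List

SetupFn = Callable[[Any], Awaitable[None]]


def load_setup_functions(env_value: str) -> List[SetupFn]:
    """Load `setup_pipeline_task` from each file in a colon-separated list."""
    functions: List[SetupFn] = []
    paths = [p for p in env_value.split(":") if p]
    for index, raw_path in enumerate(paths):
        path = Path(raw_path)
        # Give every file a unique module name so files do not shadow each other.
        spec = importlib.util.spec_from_file_location(f"_setup_file_{index}", path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load setup file: {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        fn = getattr(module, "setup_pipeline_task", None)
        if not inspect.iscoroutinefunction(fn):
            raise TypeError(f"{path} must define `async def setup_pipeline_task(task)`")
        functions.append(fn)
    return functions


async def run_setup_files(task: Any) -> None:
    """Call every setup function named in PIPECAT_SETUP_FILES, in order."""
    for fn in load_setup_functions(os.environ.get("PIPECAT_SETUP_FILES", "")):
        await fn(task)
```

Splitting on `:` matches the format quoted in the entry; a real loader might prefer `os.pathsep` for portability, which is an assumption outside what the changelog states.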
|
||||
|
||||
- Added a keepalive task for `InworldTTSService` to keep the service connected
|
||||
in the event of no generations for longer periods of time.
|
||||
(PR [#3403](https://github.com/pipecat-ai/pipecat/pull/3403))
|
||||
|
||||
- Added `enable_vad` to `Params` for use in the `GladiaSTTService`. When
|
||||
enabled, `GladiaSTTService` acts as the turn controller, emitting
|
||||
`UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally
|
||||
`InterruptionFrame`.
|
||||
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
|
||||
|
||||
- Added `should_interrupt` property to `GladiaSTTService` to configure whether
|
||||
the bot should be interrupted when the external service detects user speech.
|
||||
(PR [#3404](https://github.com/pipecat-ai/pipecat/pull/3404))
|
||||
|
||||
- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector
|
||||
WebSocket protocol.
|
||||
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
|
||||
|
||||
- Added `append_trailing_space` parameter to `TTSService` to automatically
|
||||
append a trailing space to text before sending to TTS, helping prevent some
|
||||
services from vocalizing trailing punctuation.
|
||||
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
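The effect of `append_trailing_space` can be pictured with a small standalone helper. This is a sketch only: `prepare_tts_text` is a hypothetical name, and skipping text that already ends in a space is an assumption rather than documented `TTSService` behavior.

```python
def prepare_tts_text(text: str, append_trailing_space: bool = False) -> str:
    """Return the text to send to TTS, optionally ending with one space."""
    # Assumption: empty text and text already ending in a space are left alone.
    if append_trailing_space and text and not text.endswith(" "):
        return text + " "
    return text
```

With the flag enabled, `"Hello."` becomes `"Hello. "`, so the final period is no longer the last character the TTS service receives.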
|
||||
|
||||
### Changed
|
||||
|
||||
- Updated `ElevenLabsRealtimeSTTService` to accept the
|
||||
`include_language_detection` parameter to detect language.
|
||||
|
||||
```python
|
||||
stt = ElevenLabsRealtimeSTTService(
|
||||
api_key=os.getenv("ELEVENLABS_API_KEY"),
|
||||
include_language_detection=True
|
||||
)
|
||||
```
|
||||
|
||||
(PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
|
||||
|
||||
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK with improved
|
||||
VAD and Smart Turn capabilities, bringing dramatic improvements to latency
|
||||
without any impact on accuracy. Use the `turn_detection_mode` parameter to control
|
||||
the endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
|
||||
`TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
|
||||
|
||||
```python
|
||||
stt = SpeechmaticsSTTService(
|
||||
api_key=os.getenv("SPEECHMATICS_API_KEY"),
|
||||
params=SpeechmaticsSTTService.InputParams(
|
||||
language=Language.EN,
|
||||
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
|
||||
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
|
||||
),
|
||||
)
|
||||
```
|
||||
|
||||
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
|
||||
|
||||
- `daily-python` updated to 0.23.0.
|
||||
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
|
||||
|
||||
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by
|
||||
`DailyTransport` now include the transport source (i.e., the originating
|
||||
audio track).
|
||||
(PR [#3257](https://github.com/pipecat-ai/pipecat/pull/3257))
|
||||
|
||||
- Updates to Inworld TTS services:
|
||||
|
||||
- Improved `InworldTTSService`'s websocket implementation to better flush
|
||||
and close the context, improving the handling of long inputs.
|
||||
- Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
|
||||
(PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
|
||||
|
||||
- Improved the error handling and reconnection logic for `WebsocketServer` by
|
||||
distinguishing between errors when disconnecting and websocket communication
|
||||
errors.
|
||||
(PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))
|
||||
|
||||
- Updated `DeepgramSTTService` to push user started/stopped speaking and
|
||||
interruption frames when `vad_enabled` is set to true. This centralizes the
|
||||
frames into the service, removing the need to have your application code
|
||||
handle Deepgram's events and push these frames.
|
||||
(PR [#3314](https://github.com/pipecat-ai/pipecat/pull/3314))
|
||||
|
||||
- Added encoding validation to `DeepgramTTSService` to prevent unsupported
|
||||
encodings from reaching the API. The service now raises `ValueError` at
|
||||
initialization with a clear error message.
|
||||
(PR [#3329](https://github.com/pipecat-ai/pipecat/pull/3329))
|
||||
|
||||
- Updated `read_audio_frame` & `read_video_frame` methods in
|
||||
`SmallWebRTCClient` to check if the track is enabled before logging a
|
||||
warning.
|
||||
(PR [#3336](https://github.com/pipecat-ai/pipecat/pull/3336))
|
||||
|
||||
- Updated `CartesiaTTSService` to support setting `language=None`, resulting in
|
||||
Cartesia auto-detecting the language of the conversation.
|
||||
(PR [#3366](https://github.com/pipecat-ai/pipecat/pull/3366))
|
||||
|
||||
- The bundled Smart Turn weights are now updated to v3.2, which has better
|
||||
handling of short utterances, and is more robust against background noise.
|
||||
(PR [#3367](https://github.com/pipecat-ai/pipecat/pull/3367))
|
||||
|
||||
- Updated the `SpeechmaticsSTTService` dependency to `speechmatics-voice[smart]>=0.2.6`.
|
||||
(PR [#3371](https://github.com/pipecat-ai/pipecat/pull/3371))
|
||||
|
||||
- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
|
||||
meaning that the start of the turn audio is not cut off. This improves
|
||||
accuracy for short utterances.
|
||||
|
||||
- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
|
||||
(PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
|
||||
|
||||
- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
|
||||
to share a single SDK instance within the same process.
|
||||
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
|
||||
|
||||
- Updated default model for `GroqTTSService` to `canopylabs/orpheus-v1-english`
|
||||
and voice ID to `autumn`.
|
||||
(PR [#3399](https://github.com/pipecat-ai/pipecat/pull/3399))
|
||||
|
||||
- Enhanced `FastAPIWebsocketTransport` with optional protocol-level audio
|
||||
packetization via the `fixed_audio_packet_size` parameter to support media
|
||||
endpoints requiring strict framing and real-time pacing.
|
||||
(PR [#3410](https://github.com/pipecat-ai/pipecat/pull/3410))
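Fixed-size packetization, as named in the `fixed_audio_packet_size` entry above, can be sketched as a small buffer that only releases complete packets. The class below is a hypothetical illustration, not the code in `FastAPIWebsocketTransport`; padding the final packet with zero bytes is an assumption, and real-time pacing is left out.

```python
from typing import List


class FixedSizePacketizer:
    """Buffer raw audio bytes and release them in packets of one fixed size."""

    def __init__(self, packet_size: int):
        if packet_size <= 0:
            raise ValueError("packet_size must be positive")
        self._packet_size = packet_size
        self._buffer = bytearray()

    def push(self, audio: bytes) -> List[bytes]:
        """Add audio and return every complete packet now available."""
        self._buffer.extend(audio)
        packets = []
        while len(self._buffer) >= self._packet_size:
            packets.append(bytes(self._buffer[: self._packet_size]))
            del self._buffer[: self._packet_size]
        return packets

    def flush(self, pad_byte: int = 0) -> List[bytes]:
        """Pad the remainder to a full packet; return it, or [] if empty."""
        if not self._buffer:
            return []
        padding = bytes([pad_byte]) * (self._packet_size - len(self._buffer))
        packet = bytes(self._buffer) + padding
        self._buffer.clear()
        return [packet]
```

A sender would typically call `push()` for each outgoing audio frame and `flush()` when the stream ends, so the endpoint never sees a short packet.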
|
||||
|
||||
- `DeepgramTTSService` and `RimeTTSService` now set `append_trailing_space` to
|
||||
`True` to prevent punctuation (e.g., “dot”) from being pronounced.
|
||||
(PR [#3424](https://github.com/pipecat-ai/pipecat/pull/3424))
|
||||
|
||||
- Updated `GeminiLiveLLMService` to push `LLMThoughtStartFrame`,
|
||||
`LLMThoughtTextFrame`, and `LLMThoughtEndFrame` when the model returns
|
||||
thought content.
|
||||
(PR [#3431](https://github.com/pipecat-ai/pipecat/pull/3431))
|
||||
|
||||
### Deprecated
|
||||
|
||||
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use
|
||||
`pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with
|
||||
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- `FrameProcessor.interruption_strategies` is deprecated. Use
|
||||
`LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in
|
||||
`pipecat.processors.aggregators.llm_response` are now deprecated. Use the new
|
||||
universal `LLMContext` and `LLMContextAggregatorPair` instead.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and
|
||||
`UserStoppedSpeakingFrame` frames.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame`
|
||||
frames are deprecated.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in
|
||||
unexpected behavior, use `LLMUserAggregator`'s new `user_turn_strategies`
|
||||
parameter instead.
|
||||
(PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))
|
||||
|
||||
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is
|
||||
deprecated. Use the new `turn_detection_mode` parameter instead, with
|
||||
`TurnDetectionMode.EXTERNAL`, `TurnDetectionMode.ADAPTIVE`, or
|
||||
`TurnDetectionMode.SMART_TURN`. The `enable_vad` parameter is also
|
||||
deprecated and is inferred from the `turn_detection_mode`.
|
||||
(PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
|
||||
|
||||
- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are
|
||||
now deprecated in favor of the universal `LLMContext` and its associated
|
||||
classes.
|
||||
|
||||
From the developer's point of view, switching to using `LLMContext`
|
||||
machinery will usually be a matter of going from this:
|
||||
|
||||
```python
|
||||
context = OpenAILLMContext(messages, tools)
|
||||
context_aggregator = llm.create_context_aggregator(context)
|
||||
```
|
||||
|
||||
To this:
|
||||
|
||||
```python
|
||||
context = LLMContext(messages, tools)
|
||||
context_aggregator = LLMContextAggregatorPair(context)
|
||||
```
|
||||
|
||||
(PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
|
||||
|
||||
- `STTMuteFilter` is deprecated and will be removed in a future version. Use
|
||||
`LLMUserAggregator`'s new `user_mute_strategies` instead.
|
||||
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
|
||||
|
||||
- `FrameProcessor.interruptions_allowed` is now deprecated. Use
|
||||
`LLMUserAggregator`'s new parameter `user_mute_strategies` instead.
|
||||
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
|
||||
|
||||
- `PipelineParams.allow_interruptions` is now deprecated. Use
|
||||
`LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
|
||||
example, to disable interruptions but still get user turns you can do:
|
||||
|
||||
```python
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
|
||||
),
|
||||
),
|
||||
)
|
||||
```
|
||||
|
||||
(PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
|
||||
|
||||
- `TranscriptProcessor` and related data classes and frames
|
||||
(`TranscriptionMessage`, `ThoughtTranscriptionMessage`,
|
||||
`TranscriptionUpdateFrame`) are deprecated. Use `LLMUserAggregator`'s and
|
||||
`LLMAssistantAggregator`'s new events (`on_user_turn_stopped` and
|
||||
`on_assistant_turn_stopped`) instead.
|
||||
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
|
||||
|
||||
- Deprecated support for the `vad_events` `LiveOptions` in
|
||||
`DeepgramSTTService`. Instead, use a local Silero VAD for VAD events.
|
||||
Additionally, deprecated `should_interrupt` which will be removed along with
|
||||
`vad_events` support in a future release.
|
||||
(PR [#3386](https://github.com/pipecat-ai/pipecat/pull/3386))
|
||||
|
||||
- Loading external observers from files is deprecated, use the new pipeline
|
||||
task setup files and `PIPECAT_SETUP_FILES` environment variable instead.
|
||||
(PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
|
||||
|
||||
### Fixed
|
||||
|
||||
- Improved error handling in `ElevenLabsRealtimeSTTService`.
|
||||
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
|
||||
|
||||
- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
|
||||
that blocks the process if the websocket disconnects due to an error.
|
||||
(PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
|
||||
|
||||
- Fixed a bug in `STTMuteFilter` where the user was not always muted during
|
||||
function calls, especially when there were multiple simultaneous calls.
|
||||
(PR [#3292](https://github.com/pipecat-ai/pipecat/pull/3292))
|
||||
|
||||
- Fixed a `RNNoiseFilter` issue that would cause a "[Errno 12] Cannot allocate
|
||||
memory" error when processing silence audio frames.
|
||||
(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
|
||||
|
||||
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
|
||||
|
||||
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
|
||||
in order to finalize transcription.
|
||||
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
|
||||
detection.
|
||||
- Only emit VAD + interruption frames if VAD is enabled within the plugin
|
||||
(modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
|
||||
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
|
||||
|
||||
- Fixed an issue with function calling where a handler failing to invoke its
|
||||
result callback could leave the context stuck in IN_PROGRESS, causing LLM
|
||||
inference for subsequent function call results to block while waiting on the
|
||||
unresolved call.
|
||||
(PR [#3343](https://github.com/pipecat-ai/pipecat/pull/3343))
|
||||
|
||||
- Fixed an issue with `DeepgramTTSService` where the model would output "Dot"
|
||||
instead of a period in some circumstances.
|
||||
(PR [#3345](https://github.com/pipecat-ai/pipecat/pull/3345))
|
||||
|
||||
- Fixed an issue in `traced_stt` where `model_name` in OpenTelemetry appears as
|
||||
`unknown`.
|
||||
(PR [#3351](https://github.com/pipecat-ai/pipecat/pull/3351))
|
||||
|
||||
- Fixed an issue in `GeminiLiveLLMService` where `TranscriptionFrame`s were
|
||||
occasionally not pushed.
|
||||
(PR [#3356](https://github.com/pipecat-ai/pipecat/pull/3356))
|
||||
|
||||
- Fixed potential memory leaks and initialization issues in `KrispVivaFilter`
|
||||
by improving SDK lifecycle management.
|
||||
(PR [#3391](https://github.com/pipecat-ai/pipecat/pull/3391))
|
||||
|
||||
- Fixed timing issue in `BaseOutputTransport` where the bot speaking flag was
|
||||
set after awaiting, allowing the event loop to re-enter the method before the
|
||||
guard was set.
|
||||
(PR [#3400](https://github.com/pipecat-ai/pipecat/pull/3400))
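The race described in this entry can be reproduced in a few lines of plain `asyncio`. The sketch below is a generic illustration with hypothetical names (`SpeakingTracker`, `started_racy`, `started_safe`); it is not the `BaseOutputTransport` code. It shows why the guard flag should be set before the first `await`.

```python
import asyncio
from typing import List


class SpeakingTracker:
    """Emit a single 'started' event even when called concurrently."""

    def __init__(self) -> None:
        self.speaking = False
        self.events: List[str] = []

    async def _notify(self, event: str) -> None:
        await asyncio.sleep(0)  # Yield to the event loop, like real I/O would.
        self.events.append(event)

    async def started_racy(self) -> None:
        if self.speaking:
            return
        await self._notify("started")  # Another caller can pass the guard here.
        self.speaking = True

    async def started_safe(self) -> None:
        if self.speaking:
            return
        self.speaking = True  # Set the guard before the first await.
        await self._notify("started")


async def count_events(method_name: str) -> int:
    """Call the chosen method twice concurrently and count emitted events."""
    tracker = SpeakingTracker()
    method = getattr(tracker, method_name)
    await asyncio.gather(method(), method())
    return len(tracker.events)
```

Running `count_events("started_racy")` records the event twice, while `count_events("started_safe")` records it once, because the second caller already sees the flag.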
|
||||
|
||||
- Fixed parallel function calling when using Gemini thinking.
|
||||
(PR [#3420](https://github.com/pipecat-ai/pipecat/pull/3420))
|
||||
|
||||
- Fixed an issue in `traced_llm` where `model_name` in OpenTelemetry appears as
|
||||
`unknown`.
|
||||
(PR [#3422](https://github.com/pipecat-ai/pipecat/pull/3422))
|
||||
|
||||
- Fixed an issue in `traced_tts`, `traced_gemini_live`, and
|
||||
`traced_openai_realtime` where `model_name` in OpenTelemetry appears as
|
||||
`unknown`.
|
||||
(PR [#3428](https://github.com/pipecat-ai/pipecat/pull/3428))
|
||||
|
||||
- Fixed `request_image_frame` (for backwards compatibility) and restored
|
||||
function-call–related fields in `UserImageRequestFrame` and
|
||||
`UserImageRawFrame`, preventing a case where adding a non-LLM message to the
|
||||
context could trigger duplicate LLM inferences (on image arrival and on
|
||||
function-call result), potentially causing an infinite inference loop.
|
||||
(PR [#3430](https://github.com/pipecat-ai/pipecat/pull/3430))
|
||||
|
||||
- Fixed `LLMContext.create_audio_message()` by correcting an internal helper
|
||||
that was incorrectly declared async while being run in `asyncio.to_thread()`.
|
||||
(PR [#3435](https://github.com/pipecat-ai/pipecat/pull/3435))
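The class of bug fixed here is easy to demonstrate in isolation. In the sketch below, both helper functions are hypothetical stand-ins, not the Pipecat helper itself. `asyncio.to_thread()` calls the given function in a worker thread, so passing it an `async def` function only yields an un-awaited coroutine object instead of a result.

```python
import asyncio
import inspect


async def helper_declared_async(data: bytes) -> int:
    return len(data)


def helper_declared_sync(data: bytes) -> int:
    return len(data)


async def main():
    # Calling an async function in the worker thread only creates a coroutine
    # object; its body never runs, so no result is produced.
    wrong = await asyncio.to_thread(helper_declared_async, b"abc")
    is_coroutine = inspect.iscoroutine(wrong)
    wrong.close()  # Avoid a "coroutine was never awaited" warning.

    # A plain function runs to completion in the thread and returns its value.
    right = await asyncio.to_thread(helper_declared_sync, b"abc")
    return is_coroutine, right
```

Declaring the helper as a regular function, as the fix describes, is what lets `asyncio.to_thread()` return the computed value.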
|
||||
|
||||
### Other
|
||||
|
||||
- Added `52-live-transcription.py` foundational example demonstrating live
|
||||
transcription and translation from English to Spanish. In this example, the
|
||||
bot is not interruptible: as the user continues speaking, English
|
||||
transcriptions are queued, and the bot continuously translates and speaks
|
||||
each queued sentence in Spanish without being interrupted by new user speech.
|
||||
(PR [#3316](https://github.com/pipecat-ai/pipecat/pull/3316))
|
||||
|
||||
- Added a new foundational example `53-concurrent-llm-evaluation.py` that shows
|
||||
how to use `UserTurnProcessor`.
|
||||
(PR [#3372](https://github.com/pipecat-ai/pipecat/pull/3372))
|
||||
|
||||
- Added a new foundational example `28-user-assistant-turns.py` that shows how
|
||||
to use the new `LLMUserAggregator` and `LLMAssistantAggregator` events to
|
||||
gather a conversation transcript.
|
||||
(PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
|
||||
|
||||
## [0.0.98] - 2025-12-17
|
||||
|
||||
### Added
|
||||
|
||||
2
LICENSE
@@ -1,6 +1,6 @@
|
||||
BSD 2-Clause License
|
||||
|
||||
Copyright (c) 2024–2026, Daily
|
||||
Copyright (c) 2024–2025, Daily
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
@@ -73,15 +73,15 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
|
||||
|
||||
| Category | Services |
|
||||
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
|
||||
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
|
||||
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
|
||||
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
|
||||
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
|
||||
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
|
||||
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
|
||||
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
|
||||
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
|
||||
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
|
||||
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
|
||||
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
|
||||
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
|
||||
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
|
||||
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
|
||||
|
||||
|
||||
42
changelog/3045.added.md
Normal file
@@ -0,0 +1,42 @@
|
||||
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.
|
||||
|
||||
User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words).
|
||||
|
||||
User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions).
|
||||
|
||||
A list of strategies can be specified for both start and stop; each list is evaluated in order until one strategy evaluates to true.
|
||||
|
||||
Available user turn start strategies:
|
||||
- VADUserTurnStartStrategy
|
||||
- TranscriptionUserTurnStartStrategy
|
||||
- MinWordsUserTurnStartStrategy
|
||||
- ExternalUserTurnStartStrategy
|
||||
|
||||
Available user turn stop strategies:
|
||||
- TranscriptionUserTurnStopStrategy
|
||||
- TurnAnalyzerUserTurnStopStrategy
|
||||
- ExternalUserTurnStopStrategy
|
||||
|
||||
The default strategies are:
|
||||
|
||||
- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
|
||||
- stop: [TranscriptionUserTurnStopStrategy]
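The "evaluated in order until one evaluates to true" rule can be sketched independently of Pipecat. Everything below is hypothetical: strategies are modeled as `(name, predicate)` pairs over a plain dict event, which is a simplification and not the real `UserTurnStrategies` API.

```python
from typing import Callable, List, Optional, Tuple

# A strategy is a (name, predicate) pair; the predicate inspects an event.
Strategy = Tuple[str, Callable[[dict], bool]]


def first_matching_strategy(strategies: List[Strategy], event: dict) -> Optional[str]:
    """Evaluate strategies in order and return the first one that is true."""
    for name, predicate in strategies:
        if predicate(event):
            return name  # Later strategies are not evaluated.
    return None


# Mirrors the default start order: VAD first, then transcription.
start_strategies: List[Strategy] = [
    ("vad", lambda e: e.get("vad_speaking", False)),
    ("transcription", lambda e: bool(e.get("transcript"))),
]
```

With this ordering, an event carrying both a VAD signal and a transcript is attributed to the VAD strategy, since it comes first in the list.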
|
||||
|
||||
Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:
|
||||
|
||||
```python
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
stop=[
|
||||
TurnAnalyzerUserTurnStopStrategy(
|
||||
turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
|
||||
)
|
||||
],
|
||||
)
|
||||
),
|
||||
)
|
||||
```
|
||||
|
||||
To use user turn strategies, you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
|
||||
1
changelog/3045.deprecated.2.md
Normal file
@@ -0,0 +1 @@
|
||||
- ⚠️ `TransportParams.turn_analyzer` is deprecated and might result in unexpected behavior. Use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
|
||||
1
changelog/3045.deprecated.3.md
Normal file
@@ -0,0 +1 @@
|
||||
- `FrameProcessor.interruption_strategies` is deprecated. Use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
|
||||
1
changelog/3045.deprecated.4.md
Normal file
@@ -0,0 +1 @@
|
||||
- `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames are deprecated.
|
||||
1
changelog/3045.deprecated.5.md
Normal file
@@ -0,0 +1 @@
|
||||
- Deprecated the `emulated` field in the `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames.
|
||||
1
changelog/3045.deprecated.6.md
Normal file
@@ -0,0 +1 @@
|
||||
- The `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` classes in `pipecat.processors.aggregators.llm_response` are now deprecated. Use the new universal `LLMContext` and `LLMContextAggregatorPair` instead.
|
||||
1
changelog/3045.deprecated.md
Normal file
@@ -0,0 +1 @@
|
||||
- `pipecat.audio.interruptions.MinWordsInterruptionStrategy` is deprecated. Use `pipecat.turns.user_start.MinWordsUserTurnStartStrategy` with `LLMUserAggregator`'s new `user_turn_strategies` parameter instead.
|
||||
1
changelog/3205.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `RNNoiseFilter` for real-time noise suppression using RNNoise neural network via pyrnnoise library.
|
||||
15
changelog/3225.changed.md
Normal file
@@ -0,0 +1,15 @@
|
||||
- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which brings improved VAD,
  Smart Turn capabilities, and dramatic improvements to latency without
  any impact on accuracy. Use the `turn_detection_mode` parameter to control the
  endpointing of speech, with `TurnDetectionMode.EXTERNAL` (default),
  `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
  ```python
  stt = SpeechmaticsSTTService(
      api_key=os.getenv("SPEECHMATICS_API_KEY"),
      params=SpeechmaticsSTTService.InputParams(
          language=Language.EN,
          turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
          speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
      ),
  )
  ```
|
||||
4
changelog/3225.deprecated.md
Normal file
@@ -0,0 +1,4 @@
|
||||
- For `SpeechmaticsSTTService`, the `end_of_utterance_mode` parameter is deprecated.
  Use the new `turn_detection_mode` parameter instead, with `TurnDetectionMode.EXTERNAL`,
  `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`. The `enable_vad`
  parameter is also deprecated and is inferred from the `turn_detection_mode`.
|
||||
2
changelog/3233.fixed.md
Normal file
@@ -0,0 +1,2 @@
|
||||
- Improved error handling in `ElevenLabsRealtimeSTTService`.
- Fixed an issue in `ElevenLabsRealtimeSTTService` that caused an infinite loop, blocking the process, when the websocket disconnected due to an error.
|
||||
1
changelog/3257.changed.2.md
Normal file
@@ -0,0 +1 @@
|
||||
- `TranscriptionFrame` and `InterimTranscriptionFrame` produced by `DailyTransport` now include the transport source (i.e., the originating audio track).
|
||||
1
changelog/3257.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- `daily-python` updated to 0.23.0.
|
||||
15
changelog/3263.deprecated.md
Normal file
@@ -0,0 +1,15 @@
|
||||
- `OpenAILLMContext` and its associated classes (context aggregators, etc.) are now deprecated in favor of the universal `LLMContext` and its associated classes.

  From the developer's point of view, switching to the `LLMContext` machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```
|
||||
8
changelog/3267.added.md
Normal file
@@ -0,0 +1,8 @@
|
||||
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:

  - Support for real-time audio streaming over a WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)
|
||||
1
changelog/3268.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added an approximation of TTFB for Ultravox.
|
||||
5
changelog/3288.changed.md
Normal file
@@ -0,0 +1,5 @@
|
||||
- Updates to Inworld TTS services:

  - Improved `InworldTTSService`'s websocket implementation to flush and
    close contexts more reliably, which improves handling of long inputs.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
|
||||
1
changelog/3289.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added a new `AudioContextTTSService` to the TTS service base classes. The `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`.
|
||||
4
changelog/3291.added.md
Normal file
@@ -0,0 +1,4 @@
|
||||
- `LLMUserAggregator` now exposes the following events:
  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out
|
||||
29
changelog/3292.added.md
Normal file
@@ -0,0 +1,29 @@
|
||||
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state.

  In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.

  A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.

  Available user mute strategies:

  * `FirstSpeechUserMuteStrategy`
  * `MuteUntilFirstBotCompleteUserMuteStrategy`
  * `AlwaysUserMuteStrategy`
  * `FunctionCallUserMuteStrategy`

  User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input.

  User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_mute_strategies=[
              FirstSpeechUserMuteStrategy(),
          ]
      ),
  )
  ```

  In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`.
|
||||
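Reviewer note: the evaluation rule described in `changelog/3292.added.md` (every strategy sees every frame, and the user is muted if any strategy says so) can be sketched in plain Python. The classes and the `should_mute` helper below are hypothetical stand-ins, not Pipecat's strategy classes, and events are modeled as strings rather than frames.

```python
class MuteStrategy:
    """Hypothetical stand-in for a user mute strategy (not Pipecat's base class)."""

    def process(self, event: str) -> bool:
        """Update internal state from `event`; return True if the user should be muted."""
        raise NotImplementedError


class MuteWhileBotSpeaks(MuteStrategy):
    def __init__(self) -> None:
        self.bot_speaking = False

    def process(self, event: str) -> bool:
        if event == "bot_started":
            self.bot_speaking = True
        elif event == "bot_stopped":
            self.bot_speaking = False
        return self.bot_speaking


class MuteDuringFunctionCalls(MuteStrategy):
    def __init__(self) -> None:
        self.calls_in_progress = 0

    def process(self, event: str) -> bool:
        if event == "call_started":
            self.calls_in_progress += 1
        elif event == "call_finished":
            self.calls_in_progress = max(0, self.calls_in_progress - 1)
        return self.calls_in_progress > 0


def should_mute(strategies: list, event: str) -> bool:
    # Build the full list first so every strategy sees the event.
    # any() over a generator would short-circuit and skip state updates.
    results = [strategy.process(event) for strategy in strategies]
    return any(results)


strategies = [MuteWhileBotSpeaks(), MuteDuringFunctionCalls()]
events = ["bot_started", "call_started", "bot_stopped", "user_audio", "call_finished", "user_audio"]
decisions = [should_mute(strategies, event) for event in events]
```

In this sketch the function call starts while the bot is still speaking. Because both strategies are evaluated on every event, the call counter is updated even though the first strategy already voted to mute, so the user stays muted after the bot stops and until the call finishes.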
1
changelog/3292.deprecated.md
Normal file
@@ -0,0 +1 @@
|
||||
- `STTMuteFilter` is deprecated and will be removed in a future version. Use `LLMUserAggregator`'s new `user_mute_strategies` instead.
|
||||
1
changelog/3292.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed a bug in `STTMuteFilter` where the user was not always muted during function calls, especially when there were multiple simultaneous calls.
|
||||
1
changelog/3297.deprecated.2.md
Normal file
@@ -0,0 +1 @@
|
||||
- `FrameProcessor.interruptions_allowed` is now deprecated; use `LLMUserAggregator`'s new `user_mute_strategies` parameter instead.
|
||||
12
changelog/3297.deprecated.md
Normal file
@@ -0,0 +1,12 @@
|
||||
- `PipelineParams.allow_interruptions` is now deprecated; use `LLMUserAggregator`'s new `user_turn_strategies` parameter instead. For example, to disable interruptions but still get user turns you can do:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
          ),
      ),
  )
  ```
|
||||
1
changelog/3300.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`.
|
||||
1
changelog/3314.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Updated `DeepgramSTTService` to push user started/stopped speaking and interruption frames when `vad_enabled` is set to true. This centralizes the frames into the service, removing the need to have your application code handle Deepgram's events and push these frames.
|
||||
1
changelog/3316.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` whether to push an `InterruptionFrame`.
|
||||
1
changelog/3316.other.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `52-live-transcription.py` foundational example demonstrating live transcription and translation from English to Spanish. In this example, the bot is not interruptible: as the user continues speaking, English transcriptions are queued, and the bot continuously translates and speaks each queued sentence in Spanish without being interrupted by new user speech.
|
||||
1
changelog/3326.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Frame processors can now push frames from the top of the pipeline using new methods `queue_task_frame()` and `queue_task_frames()`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `reached_upstream_types` and `reached_downstream_types` read-only properties to `PipelineTask` for inspecting current frame filters.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed frame filter storage from tuples to sets in `PipelineTask`.
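Reviewer note: a minimal sketch of why set-based storage suits an append-style filter API like the one these removed entries describe. `FrameFilter` and the frame classes are hypothetical and do not reflect `PipelineTask`'s actual implementation; the sketch only shows that a set deduplicates repeated additions and can be converted to a tuple for `isinstance` checks.

```python
class Frame:
    """Hypothetical base frame type for this illustration."""


class AudioFrame(Frame):
    pass


class TextFrame(Frame):
    pass


class FrameFilter:
    """Hypothetical filter holding the frame types that should be reported."""

    def __init__(self, types=()):
        # A set ignores duplicates, so appending the same type twice is harmless.
        self._types = set(types)

    def add(self, *types):
        self._types.update(types)

    @property
    def types(self):
        # Return an immutable snapshot so callers cannot mutate the filter.
        return frozenset(self._types)

    def matches(self, frame):
        # isinstance() accepts a tuple of classes, not a set.
        return isinstance(frame, tuple(self._types))


frame_filter = FrameFilter((AudioFrame,))
frame_filter.add(AudioFrame, TextFrame)
```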
|
||||
@@ -1 +0,0 @@
|
||||
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a frame instance by extracting its fields and creating new instances for each direction.
|
||||
@@ -1 +0,0 @@
|
||||
- `PipelineTask` now automatically adds `RTVIProcessor` and registers `RTVIObserver` when `enable_rtvi=True` (default), simplifying pipeline setup.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances.
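Reviewer note: the shared-reference problem this removed entry refers to can be reproduced in plain Python. `MessageFrame`, `broadcast`, and `broadcast_shared` are hypothetical names for illustration only; this is a sketch of the general technique (deep copying keyword arguments per instance), not the actual `FrameProcessor.broadcast_frame()` code.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MessageFrame:
    """Hypothetical frame type used only for this illustration."""

    payload: dict = field(default_factory=dict)


def broadcast(frame_cls, **kwargs):
    # Each direction gets its own deep copy of the arguments.
    downstream = frame_cls(**copy.deepcopy(kwargs))
    upstream = frame_cls(**copy.deepcopy(kwargs))
    return downstream, upstream


def broadcast_shared(frame_cls, **kwargs):
    # Without copying, both frames reference the same mutable objects.
    return frame_cls(**kwargs), frame_cls(**kwargs)


down, up = broadcast(MessageFrame, payload={"tags": []})
down.payload["tags"].append("seen-downstream")

shared_down, shared_up = broadcast_shared(MessageFrame, payload={"tags": []})
shared_down.payload["tags"].append("seen-downstream")
```

With deep copies, mutating the downstream frame leaves the upstream frame untouched; in the shared variant the same mutation is visible on both frames.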
|
||||
@@ -1 +0,0 @@
|
||||
- Transports now properly broadcast `InputTransportMessageFrame` frames both upstream and downstream instead of only pushing downstream.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `video_out_codec` parameter to `TransportParams` allowing configuration of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video output in `DailyTransport`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `location` parameter to Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support.
|
||||
@@ -1 +0,0 @@
|
||||
- Added new `SMART_TURN_LOG_DATA` environment variable, which causes Smart Turn input data to be saved to disk.
|
||||
@@ -1,2 +0,0 @@
|
||||
- Changed default Inworld TTS model from `inworld-tts-1` to
  `inworld-tts-1.5-max`.
|
||||
@@ -91,25 +91,6 @@ autodoc_mock_imports = [
|
||||
# MLX dependencies (Apple Silicon specific)
|
||||
"mlx",
|
||||
"mlx_whisper", # Note: might need underscore format too
|
||||
# Pydantic v2 compatibility issues in third-party SDKs
|
||||
"hume",
|
||||
"hume.tts",
|
||||
"hume.tts.types",
|
||||
"cartesia",
|
||||
"camb",
|
||||
"sarvamai",
|
||||
"openpipe",
|
||||
"openai.types.beta.realtime",
|
||||
"langchain_core",
|
||||
"langchain_core.messages",
|
||||
# FastAPI - Pydantic v2 compatibility issues during Sphinx autodoc
|
||||
"fastapi",
|
||||
"fastapi.applications",
|
||||
"fastapi.routing",
|
||||
"fastapi.params",
|
||||
"fastapi.middleware",
|
||||
"fastapi.responses",
|
||||
"uvicorn",
|
||||
]
|
||||
|
||||
# HTML output settings
|
||||
|
||||
@@ -31,9 +31,6 @@ AZURE_DALLE_API_KEY=...
|
||||
AZURE_DALLE_ENDPOINT=https://...
|
||||
AZURE_DALLE_MODEL=...
|
||||
|
||||
# Camb.ai
|
||||
CAMB_API_KEY=...
|
||||
|
||||
# Cartesia
|
||||
CARTESIA_API_KEY=...
|
||||
CARTESIA_VOICE_ID=...
|
||||
@@ -85,9 +82,6 @@ GROK_API_KEY=...
|
||||
# Groq
|
||||
GROQ_API_KEY=...
|
||||
|
||||
# Hathora
|
||||
HATHORA_API_KEY=...
|
||||
|
||||
# Heygen
|
||||
HEYGEN_API_KEY=...
|
||||
HEYGEN_LIVE_AVATAR_API_KEY=...
|
||||
@@ -103,8 +97,7 @@ INWORLD_API_KEY=...
|
||||
KRISP_MODEL_PATH=...
|
||||
|
||||
# Krisp Viva
|
||||
KRISP_VIVA_FILTER_MODEL_PATH=...
|
||||
KRISP_VIVA_TURN_MODEL_PATH=...
|
||||
KRISP_VIVA_MODEL_PATH=...
|
||||
|
||||
# LiveKit
|
||||
LIVEKIT_API_KEY=...
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -85,7 +85,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -98,11 +98,11 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
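Reviewer note: the example hunks in this compare switch between two ways of consuming the aggregator pair, tuple unpacking (`user_aggregator, assistant_aggregator = ...`) and accessor calls (`context_aggregator.user()` / `context_aggregator.assistant()`). The sketch below shows how one object could support both styles. `AggregatorPair` is a hypothetical class and makes no claim about how `LLMContextAggregatorPair` is implemented.

```python
class AggregatorPair:
    """Hypothetical pair type; not Pipecat's `LLMContextAggregatorPair`."""

    def __init__(self, user, assistant):
        self._user = user
        self._assistant = assistant

    def user(self):
        return self._user

    def assistant(self):
        return self._assistant

    def __iter__(self):
        # Yielding both members enables `user, assistant = pair`.
        yield self._user
        yield self._assistant


pair = AggregatorPair("user-aggregator", "assistant-aggregator")

# Accessor style.
user_via_accessor = pair.user()
assistant_via_accessor = pair.assistant()

# Unpacking style.
user_aggregator, assistant_aggregator = pair
```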
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -68,7 +68,7 @@ async def main():
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -82,11 +82,11 @@ async def main():
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -78,7 +78,7 @@ async def main():
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -91,11 +91,11 @@ async def main():
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -106,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -119,12 +119,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(),
|
||||
stt,
|
||||
user_aggregator,
|
||||
context_aggregator.user(),
|
||||
llm,
|
||||
tts,
|
||||
ml,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
context_aggregator.assistant(),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -120,7 +120,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -138,12 +138,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(),
|
||||
stt,
|
||||
user_aggregator,
|
||||
context_aggregator.user(),
|
||||
llm,
|
||||
tts,
|
||||
image_sync_aggregator,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
context_aggregator.assistant(),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -90,11 +90,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -131,7 +131,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
|
||||
)
|
||||
@@ -140,11 +140,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -117,7 +117,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -132,11 +132,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt, # STT
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
task = PipelineTask(
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -23,6 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
)
|
||||
from pipecat.processors.frameworks.rtvi import RTVIObserver, RTVIProcessor
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
||||
@@ -81,7 +82,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -92,15 +93,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
),
|
||||
)
|
||||
|
||||
rtvi = RTVIProcessor()
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(),
|
||||
rtvi,
|
||||
stt,
|
||||
user_aggregator,
|
||||
context_aggregator.user(),
|
||||
llm,
|
||||
tts,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
context_aggregator.assistant(),
|
||||
]
|
||||
)
|
||||
|
||||
@@ -111,6 +115,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
observers=[
|
||||
RTVIObserver(rtvi),
|
||||
DebugLogObserver(
|
||||
frame_types={
|
||||
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -22,6 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
)
|
||||
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
||||
@@ -78,7 +79,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -87,15 +88,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
),
|
||||
)
|
||||
|
||||
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(),
|
||||
rtvi,
|
||||
stt,
|
||||
user_aggregator,
|
||||
context_aggregator.user(),
|
||||
llm,
|
||||
tts,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
context_aggregator.assistant(),
|
||||
]
|
||||
)
|
||||
|
||||
@@ -106,6 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
observers=[
|
||||
RTVIObserver(rtvi),
|
||||
DebugLogObserver(
|
||||
frame_types={
|
||||
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -82,7 +82,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt,
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
# Copyright (c) 2024–2025, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
@@ -105,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
]
|
||||
|
||||
context = LLMContext(messages)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(
|
||||
user_turn_strategies=UserTurnStrategies(
|
||||
@@ -118,12 +118,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
[
|
||||
transport.input(), # Transport user input
|
||||
stt, # STT
|
||||
user_aggregator, # User responses
|
||||
context_aggregator.user(), # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
audiobuffer, # write audio data to a file
|
||||
assistant_aggregator, # Assistant spoken responses
|
||||
context_aggregator.assistant(), # Assistant spoken responses
|
||||
]
|
||||
)
|
||||
|
||||
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -22,6 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
     LLMContextAggregatorPair,
     LLMUserAggregatorParams,
 )
+from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.deepgram.stt import DeepgramSTTService
@@ -80,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -89,15 +90,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         ),
     )
 
+    rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
+
     pipeline = Pipeline(
         [
             transport.input(),  # Transport user input
+            rtvi,
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS (HumeTTSService with word timestamps)
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -110,6 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         ),
         idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
         observers=[
+            RTVIObserver(rtvi),
             DebugLogObserver(
                 frame_types={
                     TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
@@ -118,6 +123,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         ],
     )
 
+    @rtvi.event_handler("on_client_ready")
+    async def on_client_ready(rtvi):
+        await rtvi.set_bot_ready()
+
     @transport.event_handler("on_client_connected")
     async def on_client_connected(transport, client):
         logger.info(f"Client connected")
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -103,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     lc = LangchainProcessor(history_chain)
 
     context = LLMContext()
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -116,11 +116,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             lc,  # Langchain
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -71,7 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
     )
@@ -80,11 +80,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -86,7 +86,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -99,11 +99,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -72,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
     )
@@ -81,11 +81,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -75,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -88,11 +88,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -85,7 +85,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -93,11 +93,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -97,11 +97,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -91,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -83,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -81,7 +81,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -96,11 +96,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,140 +0,0 @@
-#
-# Copyright (c) 2024-2026, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import (
-    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
-)
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
-from pipecat.services.gladia.stt import GladiaSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transcriptions.language import Language
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
-
-load_dotenv(override=True)
-
-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    logger.info(f"Starting bot")
-
-    stt = GladiaSTTService(
-        api_key=os.getenv("GLADIA_API_KEY", ""),
-        region=os.getenv("GLADIA_REGION"),
-        params=GladiaInputParams(
-            language_config=LanguageConfig(
-                languages=[Language.EN],
-            ),
-            enable_vad=True,
-        ),
-    )
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY", ""),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
-
-    messages = [
-        {
-            "role": "system",
-            "content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
-    )
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -87,7 +87,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -100,11 +100,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,  # STT
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -10,6 +10,7 @@ import os
 from dotenv import load_dotenv
 from loguru import logger
 
+from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
 from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.audio.vad.vad_analyzer import VADParams
@@ -74,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -87,11 +88,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User respones
+            context_aggregator.user(),  # User respones
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024-2026, Daily
+# Copyright (c) 2024–2025, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -76,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     ]
 
     context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+    context_aggregator = LLMContextAggregatorPair(
         context,
         user_params=LLMUserAggregatorParams(
             user_turn_strategies=UserTurnStrategies(
@@ -89,11 +89,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
         [
             transport.input(),  # Transport user input
             stt,
-            user_aggregator,  # User responses
+            context_aggregator.user(),  # User responses
             llm,  # LLM
             tts,  # TTS
             transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
+            context_aggregator.assistant(),  # Assistant spoken responses
         ]
     )
 
Some files were not shown because too many files have changed in this diff.