Compare commits


1 commit

Author      SHA1        Message                  Date
James Hush  181cc43724  Background blur example  2025-04-08 14:09:19 +08:00
223 changed files with 7892 additions and 10935 deletions

@@ -1,87 +0,0 @@
name: Bug report
description: Report a bug or unexpected behavior
type: Bug
body:
  - type: markdown
    attributes:
      value: |
        ## Bug Report
        Thank you for taking the time to fill out this bug report.
  - type: markdown
    attributes:
      value: |
        ### Environment
  - type: input
    id: pipecat-version
    attributes:
      label: pipecat version
      description: Which version are you using?
      placeholder: e.g., 0.0.63
    validations:
      required: true
  - type: input
    id: python-version
    attributes:
      label: Python version
      description: Which Python version are you using?
      placeholder: e.g., 3.12.8
    validations:
      required: true
  - type: input
    id: os
    attributes:
      label: Operating System
      description: Which OS are you using?
      placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Issue description
      description: Provide a clear description of the issue.
    validations:
      required: true
  - type: textarea
    id: repro
    attributes:
      label: Reproduction steps
      description: List the steps to reproduce the issue.
      placeholder: |
        1. Do this...
        2. Then do that...
        3. Observe the error...
    validations:
      required: true
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: What did you expect to happen?
    validations:
      required: true
  - type: textarea
    id: actual
    attributes:
      label: Actual behavior
      description: What actually happened?
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Logs
      description: If applicable, include any relevant logs or error messages
      render: shell
    validations:
      required: false
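Issue forms like the one above are easy to break when edited by hand. As we understand GitHub's issue-form syntax, a duplicated `id`, or an input element without a `label`, makes the form invalid. The sketch below is a minimal illustration of such checks. It assumes the form has already been parsed into plain Python dicts, for example with a YAML loader. The helper `check_issue_form` and its rule set are hypothetical and cover only a small subset of the real schema. They are not part of pipecat or GitHub's tooling.

```python
from typing import Any

# Hypothetical helper, not part of pipecat or GitHub's tooling.
# Element types in the forms above that collect input and need a label.
INPUT_TYPES = {"input", "textarea", "dropdown", "checkboxes"}


def check_issue_form(form: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed issue form."""
    problems: list[str] = []
    for key in ("name", "description", "body"):
        if key not in form:
            problems.append(f"missing top-level key: {key}")

    seen_ids: set[str] = set()
    for index, element in enumerate(form.get("body", [])):
        kind = element.get("type")
        attributes = element.get("attributes", {})
        where = f"body[{index}]"

        if kind == "markdown":
            if "value" not in attributes:
                problems.append(f"{where}: markdown element without value")
            continue
        if kind not in INPUT_TYPES:
            problems.append(f"{where}: unknown type {kind!r}")
            continue

        # ids are optional, but when present they must be unique within the form.
        element_id = element.get("id")
        if element_id is not None:
            if element_id in seen_ids:
                problems.append(f"{where}: duplicate id {element_id!r}")
            seen_ids.add(element_id)

        if not attributes.get("label"):
            problems.append(f"{where}: {kind} element without label")
        if kind == "checkboxes" and not attributes.get("options"):
            problems.append(f"{where}: checkboxes element without options")

        required = element.get("validations", {}).get("required", False)
        if not isinstance(required, bool):
            problems.append(f"{where}: validations.required must be true or false")
    return problems


# A trimmed-down version of the bug report form with a deliberate duplicate id.
sample = {
    "name": "Bug report",
    "description": "Report a bug or unexpected behavior",
    "body": [
        {"type": "markdown", "attributes": {"value": "## Bug Report"}},
        {"type": "input", "id": "os",
         "attributes": {"label": "Operating System"},
         "validations": {"required": True}},
        {"type": "textarea", "id": "os", "attributes": {"label": "Logs"}},
    ],
}
print(check_issue_form(sample))  # ["body[2]: duplicate id 'os'"]
```

The checker returns a list instead of raising on the first error, so every problem in a form is reported in a single pass.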

@@ -1,67 +0,0 @@
name: Question
description: Ask a question or get help
type: Question
body:
  - type: markdown
    attributes:
      value: |
        ## Question
        Use this form to ask a question about pipecat.
  - type: markdown
    attributes:
      value: |
        ### Environment (if applicable)
  - type: input
    id: pipecat-version
    attributes:
      label: pipecat version
      description: Which version are you using? (if applicable)
      placeholder: e.g., 0.0.63
    validations:
      required: false
  - type: input
    id: python-version
    attributes:
      label: Python version
      description: Which Python version are you using? (if applicable)
      placeholder: e.g., 3.12.8
    validations:
      required: false
  - type: input
    id: os
    attributes:
      label: Operating System
      description: Which OS are you using? (if applicable)
      placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
    validations:
      required: false
  - type: textarea
    id: question
    attributes:
      label: Question
      description: Provide your question in detail here.
    validations:
      required: true
  - type: textarea
    id: tried
    attributes:
      label: What I've tried
      description: Describe what you've already tried or research you've done.
      placeholder: I've looked at the documentation and tried...
    validations:
      required: false
  - type: textarea
    id: context
    attributes:
      label: Context
      description: Any additional context or information that might help others understand your question better.
    validations:
      required: false

@@ -1,52 +0,0 @@
name: Feature request
description: Suggest an enhancement or new feature
type: Enhancement
body:
  - type: markdown
    attributes:
      value: |
        ## Feature Request
        Thank you for suggesting an enhancement to pipecat.
  - type: textarea
    id: problem
    attributes:
      label: Problem Statement
      description: A clear description of the problem this feature would solve.
      placeholder: I'm always frustrated when...
    validations:
      required: true
  - type: textarea
    id: solution
    attributes:
      label: Proposed Solution
      description: A clear and concise description of what you want to happen.
    validations:
      required: true
  - type: textarea
    id: alternatives
    attributes:
      label: Alternative Solutions
      description: Any alternative solutions or features you've considered.
    validations:
      required: false
  - type: textarea
    id: context
    attributes:
      label: Additional Context
      description: Add any other context, mockups, or screenshots about the feature request here.
      placeholder: You can drag and drop images here to include them.
    validations:
      required: false
  - type: checkboxes
    id: contribution
    attributes:
      label: Would you be willing to help implement this feature?
      options:
        - label: Yes, I'd like to contribute
        - label: No, I'm just suggesting

@@ -1,82 +0,0 @@
name: Service Issue
description: An issue with a third-party service
type: Service Issue
body:
  - type: markdown
    attributes:
      value: |
        ## Service Issue
        Use this form to report an issue with a third-party service integration.
  - type: input
    id: pipecat-version
    attributes:
      label: pipecat version
      description: Which version are you using?
      placeholder: e.g., 0.0.63
    validations:
      required: true
  - type: input
    id: service-name
    attributes:
      label: Service Name
      description: Which third-party service is having issues?
      placeholder: e.g., OpenAI, ElevenLabs, Anthropic
    validations:
      required: true
  - type: input
    id: service-version
    attributes:
      label: Service or model version
      description: Which version of the service API or model are you using?
      placeholder: e.g., v1, gpt-4.1
    validations:
      required: false
  - type: textarea
    id: description
    attributes:
      label: Issue Description
      description: Provide a clear description of the service issue.
    validations:
      required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction Steps
      description: Provide steps to reproduce the issue.
      placeholder: |
        1. Configure service X
        2. Call method Y
        3. See error Z
    validations:
      required: true
  - type: textarea
    id: expected
    attributes:
      label: Expected Behavior
      description: What did you expect to happen?
    validations:
      required: true
  - type: textarea
    id: actual
    attributes:
      label: Actual Behavior
      description: What actually happened?
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Error Logs
      description: If available, include any error messages or logs.
      render: shell
    validations:
      required: false

@@ -1,56 +0,0 @@
name: New Service
description: Request to support a new third-party service
type: New Service
body:
  - type: markdown
    attributes:
      value: |
        ## New Service Request
        Use this form to request support for a new third-party service in pipecat.
  - type: input
    id: service-name
    attributes:
      label: Service Name
      description: What is the name of the third-party service?
      placeholder: e.g., NewAPI, SomeService
    validations:
      required: true
  - type: input
    id: service-website
    attributes:
      label: Service Website
      description: Link to the service's website or documentation
      placeholder: e.g., https://newapi.com
    validations:
      required: true
  - type: textarea
    id: service-description
    attributes:
      label: Service Description
      description: Briefly describe what this service does and how it works.
    validations:
      required: true
  - type: textarea
    id: api-info
    attributes:
      label: API Information
      description: If available, provide details about the service's API.
      placeholder: |
        - API documentation link
        - Authentication method
        - Key endpoints you'd like supported
    validations:
      required: false
  - type: checkboxes
    id: contribution
    attributes:
      label: Would you be willing to help implement this service?
      options:
        - label: Yes, I'd like to contribute
        - label: No, I'm just suggesting

@@ -1,74 +0,0 @@
name: Dependency Issue
description: An issue with a Pipecat dependency (not a third-party service)
type: Dependency Issue
body:
  - type: markdown
    attributes:
      value: |
        ## Dependency Issue
        Use this form to report an issue with a Pipecat dependency.
  - type: input
    id: pipecat-version
    attributes:
      label: pipecat version
      description: Which version are you using?
      placeholder: e.g., 0.0.63
    validations:
      required: true
  - type: input
    id: dependency-name
    attributes:
      label: Dependency Name
      description: Which Pipecat dependency is causing the issue?
      placeholder: e.g., openai, anthropic, fastapi
    validations:
      required: true
  - type: input
    id: dependency-version
    attributes:
      label: Dependency Version
      description: Which version of the dependency are you using?
      placeholder: e.g., 1.2.3
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Issue Description
      description: Provide a clear description of the dependency issue.
    validations:
      required: true
  - type: textarea
    id: impact
    attributes:
      label: Impact
      description: How is this dependency issue affecting your usage of pipecat?
    validations:
      required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction Steps
      description: If applicable, provide steps to reproduce the issue.
      placeholder: |
        1. Install dependency X
        2. Run command Y
        3. See error Z
    validations:
      required: false
  - type: textarea
    id: logs
    attributes:
      label: Error Logs
      description: If applicable, include any relevant error messages or logs.
      render: shell
    validations:
      required: false
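This form and several of the others ask for the pipecat, Python, OS and dependency versions. The sketch below collects those values using only the standard library (`importlib.metadata` and `platform`). It assumes the distribution is named `pipecat-ai`, as in the README's `pip install` commands. The helper names `package_version` and `environment_report` are hypothetical.

```python
import platform
from importlib import metadata


def package_version(distribution: str) -> str:
    """Installed version of a distribution, or "not installed" if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(dependencies: list[str]) -> dict[str, str]:
    """Collect the values the issue forms ask for, keyed by the forms' labels."""
    report = {
        # Assumption: "pipecat-ai" is the distribution name used by the README's pip commands.
        "pipecat version": package_version("pipecat-ai"),
        "Python version": platform.python_version(),
        "Operating System": f"{platform.system()} {platform.release()}",
    }
    for name in dependencies:
        report[f"{name} version"] = package_version(name)
    return report


if __name__ == "__main__":
    # Example dependency names taken from the form's placeholder text.
    for label, value in environment_report(["openai", "anthropic", "fastapi"]).items():
        print(f"{label}: {value}")
```

Reporting "not installed" instead of raising keeps the report usable when the issue itself is a missing dependency.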

@@ -1,70 +0,0 @@
name: Troubleshooting
description: Help with a specific use case
type: Troubleshooting
body:
  - type: markdown
    attributes:
      value: |
        ## Troubleshooting Request
        Use this form to get help with a specific use case or implementation.
  - type: input
    id: pipecat-version
    attributes:
      label: pipecat version
      description: Which version are you using?
      placeholder: e.g., 0.0.63
    validations:
      required: true
  - type: input
    id: python-version
    attributes:
      label: Python version
      description: Which version of Python are you using?
      placeholder: e.g., 3.12.8
    validations:
      required: true
  - type: input
    id: os
    attributes:
      label: Operating System
      description: Which OS are you using?
      placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
    validations:
      required: true
  - type: textarea
    id: use-case
    attributes:
      label: Use Case Description
      description: Describe what you're trying to accomplish with pipecat.
    validations:
      required: true
  - type: textarea
    id: current-approach
    attributes:
      label: Current Approach
      description: What have you tried so far? Include code snippets if relevant.
      render: python
    validations:
      required: true
  - type: textarea
    id: errors
    attributes:
      label: Errors or Unexpected Behavior
      description: Describe any errors or unexpected behavior you're encountering.
    validations:
      required: true
  - type: textarea
    id: additional-context
    attributes:
      label: Additional Context
      description: Any other information that might help us understand your situation.
    validations:
      required: false

@@ -1 +0,0 @@
blank_issues_enabled: false

@@ -9,101 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
-- Added support for Application Default Credentials in Google services,
-  `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.
-- Added support for Smart Turn Detection via the `turn_analyzer` transport
-  parameter. You can now choose between `SmartTurnAnalyzer()` for remote
-  inference or `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using
-  Core ML.
-- `DeepgramTTSService` accepts `base_url` argument again, allowing you to
-  connect to an on-prem service.
-- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` which allow
-  you to control aggregator settings. You can now pass these arguments when
-  creating aggregator pairs with `create_context_aggregator()`.
-- Added `previous_text` context support to ElevenLabsHttpTTSService, improving
-  speech consistency across sentences within an LLM response.
-- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.
-- It is now possible to disable `SoundfileMixer` when created. You can then use
-  `MixerEnableFrame` to dynamically enable it when necessary.
-- Added `on_client_connected` and `on_client_disconnected` event handlers to
-  the `DailyTransport` class. These handlers map to the same underlying Daily
-  events as `on_participant_joined` and `on_participant_left`, respectively.
-  This makes it easier to write a single bot pipeline that can also use other
-  transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.
-### Changed
-- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects
-  the user when their token expires. This new parameter defaults to False.
-  Also, the default value for `enable_prejoin_ui` changed to False and
-  `eject_at_room_exp` changed to False.
-- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their
-  default model.
-- `SoundfileMixer` constructor arguments need to be keywords.
-### Deprecated
-- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url`
-  instead.
-### Removed
-- Parameters `user_kwargs` and `assistant_kwargs` when creating a context
-  aggregator pair using `create_context_aggregator()` have been removed. Use
-  `user_params` and `assistant_params` instead.
-### Fixed
-- Fixed an issue that would cause TTS websocket-based services to not cleanup
-  resources properly when disconnecting.
-- Fixed a `TavusVideoService` issue that was causing audio choppiness.
-- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
-  client did not create a video transceiver.
-- Fixed an issue where LLM input parameters were not working and applied correctly in `GoogleVertexLLMService`, causing
-  unexpected behavior during inference.
-## [0.0.63] - 2025-04-11
-### Added
-- Added media resolution control to `GeminiMultimodalLiveLLMService` with
-  `GeminiMediaResolution` enum, allowing configuration of token usage for
-  image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
-  with 256 tokens).
-- Added Gemini's Voice Activity Detection (VAD) configuration to
-  `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine
-  control over speech detection sensitivity and timing, including:
-  - Start sensitivity (how quickly speech is detected)
-  - End sensitivity (how quickly turns end after pauses)
-  - Prefix padding (milliseconds of audio to keep before speech is detected)
-  - Silence duration (milliseconds of silence required to end a turn)
-- Added comprehensive language support to `GeminiMultimodalLiveLLMService`,
-  supporting over 30 languages via the `language` parameter, with proper
-  mapping between Pipecat's `Language` enum and Gemini's language codes.
-- Added support in `SmallWebRTCTransport` to detect when remote tracks are
-  muted.
-- Added support for image capture from a video stream to the
-  `SmallWebRTCTransport`.
-- Added a new iOS client option to the `SmallWebRTCTransport`
-  **video-transform** example.
+- Added a new iOS client option to the `SmallWebRTCTransport` **video-transform** example.
 - Added new processors `ProducerProcessor` and `ConsumerProcessor`. The
   producer processor processes frames from the pipeline and decides whether the
@@ -119,26 +25,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   type was incorrectly handled as a codec retransmission.
 - Avoid initial video delays.
-### Changed
-- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio`
-  parameter in favor of Gemini Live's native output transcription support. Now
-  text transcriptions are produced directly by the model. No configuration is
-  required.
-- Updated `GeminiMultimodalLiveLLMService`s default `model` to
-  `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket
-  URL.
 ### Fixed
-- Updated `daily-python` to 0.17.0 to fix an issue that was preventing to run on
-  older platforms.
-- Fixed an issue where `CartesiaTTSService`'s spell feature would result in
-  the spelled word in the context appearing as "F,O,O,B,A,R" instead of
-  "FOOBAR".
 - Fixed an issue in the Azure TTS services where the language was being set
   incorrectly.
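The `GeminiVADParams` entry in the 0.0.63 notes lists prefix padding and silence duration as timing controls. The sketch below makes those two settings concrete by turning per-frame speech flags into turns. It is a generic illustration, not Gemini's algorithm. It assumes an upstream detector has already produced one boolean per fixed-length frame, and it does not model the start or end sensitivity settings. `TurnParams`, `segment_turns` and the default values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnParams:
    # Illustrative values only; these are not Gemini's defaults.
    frame_ms: int = 10              # duration of one analysis frame
    prefix_padding_ms: int = 30     # audio kept before the first speech frame
    silence_duration_ms: int = 50   # silence required before a turn ends


def _turn_bounds(first: int, last: int, params: TurnParams) -> tuple[int, int]:
    """Convert frame indices of a turn into (start_ms, end_ms), padding the start."""
    start_ms = max(0, first * params.frame_ms - params.prefix_padding_ms)
    end_ms = (last + 1) * params.frame_ms
    return start_ms, end_ms


def segment_turns(speech: list[bool], params: TurnParams) -> list[tuple[int, int]]:
    """Group per-frame speech flags into (start_ms, end_ms) turns."""
    turns: list[tuple[int, int]] = []
    first: Optional[int] = None   # first speech frame of the open turn
    last: Optional[int] = None    # latest speech frame of the open turn
    frames_to_end = max(1, params.silence_duration_ms // params.frame_ms)

    for i, is_speech in enumerate(speech):
        if is_speech:
            if first is None:
                first = i
            last = i
        elif first is not None and i - last >= frames_to_end:
            # Enough consecutive silent frames: close the turn.
            turns.append(_turn_bounds(first, last, params))
            first = last = None

    if first is not None:  # input ended while a turn was still open
        turns.append(_turn_bounds(first, last, params))
    return turns


# 10 ms frames: 40 ms silence, 30 ms speech, 50 ms silence, 20 ms speech, 10 ms silence.
flags = [False] * 4 + [True] * 3 + [False] * 5 + [True] * 2 + [False]
print(segment_turns(flags, TurnParams()))  # [(10, 70), (90, 140)]
```

Padding applies only to the start of a turn, which matches the description of prefix padding as audio kept before speech is detected. The trailing silence that ends a turn is not included in its end time.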
@@ -146,9 +34,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms
   chunks.
-- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant
-  context messages had no space between words.
 - Fixed an issue where `LLMAssistantContextAggregator` would prevent a
   `BotStoppedSpeakingFrame` from moving through the pipeline.
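The last hunk mentions `TransportParams.audio_out_10ms_chunks` and output that previously only worked with 20 ms chunks. The arithmetic relating chunk duration to buffer size is shown below. It assumes 16-bit signed PCM (2 bytes per sample) and does not try to reproduce how pipecat uses that setting internally. The function names are hypothetical.

```python
def chunk_size_bytes(sample_rate: int, chunk_ms: int,
                     channels: int = 1, sample_width: int = 2) -> int:
    """Bytes in one chunk of `chunk_ms` milliseconds of PCM audio.

    Assumes 16-bit samples by default (sample_width=2 bytes).
    """
    samples_per_channel = sample_rate * chunk_ms // 1000
    return samples_per_channel * channels * sample_width


def split_into_chunks(audio: bytes, chunk_bytes: int) -> list[bytes]:
    """Split a PCM buffer into fixed-size chunks; the final chunk may be shorter."""
    return [audio[i:i + chunk_bytes] for i in range(0, len(audio), chunk_bytes)]


ten_ms = chunk_size_bytes(24000, 10)     # 240 samples * 2 bytes = 480
twenty_ms = chunk_size_bytes(24000, 20)  # 960
print(ten_ms, twenty_ms)                 # 480 960

# 100 ms of silence at 24 kHz mono, 16-bit: 4800 bytes -> ten 10 ms chunks.
chunks = split_into_chunks(bytes(4800), ten_ms)
print(len(chunks))                       # 10
```

At 24 kHz mono, a 10 ms chunk is 240 samples, or 480 bytes. That is half the 960 bytes of a 20 ms chunk, so code that assumes a fixed 20 ms chunk size will mis-slice 10 ms output.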

README.md
@@ -1,72 +1,43 @@
 <h1><div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
 </div></h1>
 [![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
-# 🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents
-**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
-## 🚀 What You Can Build
-- **Voice Assistants** natural, streaming conversations with AI
-- **AI Companions** coaches, meeting assistants, characters
-- **Multimodal Interfaces** voice, video, images, and more
-- **Interactive Storytelling** creative tools with generative media
-- **Business Agents** customer intake, support bots, guided flows
-- **Complex Dialog Systems** design logic with structured conversations
-🧭 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
-## 🧠 Why Pipecat?
-- **Voice-first**: Integrates speech recognition, text-to-speech, and conversation handling
-- **Pluggable**: Supports many AI services and tools
-- **Composable Pipelines**: Build complex behavior from modular components
-- **Real-Time**: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)
-## 🎬 See it in action
+Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
+## What you can build
+- **Voice Assistants**: [Natural, real-time conversations with AI](https://demo.dailybots.ai/)
+- **Interactive Agents**: Personal coaches and meeting assistants
+- **Multimodal Apps**: Combine voice, video, images, and text
+- **Creative Tools**: [Story-telling experiences](https://storytelling-chatbot.fly.dev/) and social companions
+- **Business Solutions**: [Customer intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0) and support bots
+- **Complex conversational flows**: [Refer to Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) to learn more
+## See it in action
 <p float="left">
-<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="400" /></a>&nbsp;
-<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="400" /></a>
+<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
+<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="280" /></a>
 <br/>
-<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="400" /></a>&nbsp;
-<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="400" /></a>
+<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="280" /></a>&nbsp;
+<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
 </p>
-## 📱 Client SDKs
-You can connect to Pipecat from any platform using our official SDKs:
-| Platform | SDK Repo | Description |
-| -------- | ------------------------------------------------------------------------------ | -------------------------------- |
-| Web | [pipecat-client-web](https://github.com/pipecat-ai/pipecat-client-web) | JavaScript and React client SDKs |
-| iOS | [pipecat-client-ios](https://github.com/pipecat-ai/pipecat-client-ios) | Swift SDK for iOS |
-| Android | [pipecat-client-android](https://github.com/pipecat-ai/pipecat-client-android) | Kotlin SDK for Android |
-| C++ | [pipecat-client-cxx](https://github.com/pipecat-ai/pipecat-client-cxx) | C++ client SDK |
-## 🧩 Available services
-| Category | Services |
-| ------------------- | ------------------------------------------------------------------------------------------------ |
-| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
-| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
-| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
-| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
-| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
-| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
-| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
-| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
-| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
-| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
-📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
-## ⚡ Getting started
-You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready.
+## Key features
+- **Voice-first Design**: Built-in speech recognition, TTS, and conversation handling
+- **Flexible Integration**: Works with popular AI services (OpenAI, ElevenLabs, etc.)
+- **Pipeline Architecture**: Build complex apps from simple, reusable components
+- **Real-time Processing**: Frame-based pipeline architecture for fluid interactions
+- **Production Ready**: Enterprise-grade WebRTC and Websocket support
+💡 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
+## Getting started
+You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
 ```shell
 # Install the module
@@ -82,51 +53,141 @@ To keep things lightweight, only the core framework is included by default. If y
 pip install "pipecat-ai[option,...]"
 ```
-## 🧪 Code examples
+### Available services
+| Category | Services | Install Command Example |
+| ------------------- | ------------------------------------------------------------------------------------------------ | --------------------------------------- |
+| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
+| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
+| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
+| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
+| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
+| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
+| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) | `pip install "pipecat-ai[mem0]"` |
+| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) | `pip install "pipecat-ai[moondream]"` |
+| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
+| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
+📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
+## Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
## 🛠️ Hacking on the framework itself ## A simple voice agent running locally
1. Set up a virtual environment before following these instructions. From the root of the repo: Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [Cartesia](https://cartesia.ai/) for text-to-speech.
```shell ```python
python3 -m venv venv import asyncio
source venv/bin/activate
```
2. Install the development dependencies: from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
```shell async def main():
pip install -r dev-requirements.txt # Use Daily as a real-time media transport (WebRTC)
``` transport = DailyTransport(
room_url=...,
token="",  # leave empty. Note: token is _not_ your API key
bot_name="Bot Name",
params=DailyParams(audio_out_enabled=True))
3. Install the git pre-commit hooks (these help ensure your code follows project rules): # Use Cartesia for Text-to-Speech
tts = CartesiaTTSService(
api_key=...,
voice_id=...
)
```shell # Simple pipeline that will process text to speech and output the result
pre-commit install pipeline = Pipeline([tts, transport.output()])
```
4. Install the `pipecat-ai` package locally in editable mode: # Create a Pipecat runner that can run one or more pipeline tasks
runner = PipelineRunner()
```shell # Create a task that runs the pipeline
pip install -e . task = PipelineTask(pipeline)
```
> The `-e` or `--editable` option allows you to modify the code without reinstalling. # Register an event handler to play audio when a
# participant joins the transport WebRTC session
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
# Queue a TextFrame that will get spoken by the TTS service (Cartesia)
await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))
5. Include optional dependencies as needed. For example: # Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
```shell # Run the pipeline task
pip install -e ".[daily,deepgram,cartesia,openai,silero]" await runner.run(task)
```
6. (Optional) If you want to use this package from another directory: if __name__ == "__main__":
asyncio.run(main())
```
```shell Run it with:
pip install "path_to_this_repo[option,...]"
``` ```shell
python app.py
```
Daily provides a prebuilt WebRTC user interface. While the app is running, open `https://<yourdomain>.daily.co/<room_name>` in your browser to join the room and hear the bot say hello!
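The greeting handler in the voice agent example reads the display name with `participant.get("info", {}).get("userName", "")`, so a missing field yields an empty string rather than an exception. The stdlib-only toy below sketches that decorator-based registration pattern. `ToyTransport`, `emit()` and `participant_display_name()` are hypothetical names for illustration, not Pipecat's API, and participants are assumed to be plain dicts.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class ToyTransport:
    """Hypothetical stand-in for a transport that exposes event_handler()."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Decorator factory: registers the coroutine under event_name.
        def decorator(func: Handler) -> Handler:
            self._handlers[event_name].append(func)
            return func

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Await every handler registered for this event, in registration order.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


def participant_display_name(participant: dict) -> str:
    # Same defensive lookup as the example: missing keys yield "".
    return participant.get("info", {}).get("userName", "")


transport = ToyTransport()
greetings: list[str] = []


@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    greetings.append(f"Hello there, {participant_display_name(participant)}!")


asyncio.run(transport.emit("on_first_participant_joined", {"info": {"userName": "Ada"}}))
asyncio.run(transport.emit("on_first_participant_joined", {}))
print(greetings)  # ['Hello there, Ada!', 'Hello there, !']
```

The second greeting shows the cost of the empty-string default: the bot would say "Hello there, !", so a real handler may want a fallback name.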
## WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. For production client-server audio, however, you'll need a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/#webrtc))
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
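Rooms can also be created programmatically. The sketch below only builds the HTTP request, assuming Daily's REST endpoint `POST https://api.daily.co/v1/rooms` authenticated with your API key as a bearer token; the linked REST docs are the authority on the current request shape. `build_create_room_request()` is a hypothetical helper name, and the `exp` (expiry) property is used only as an example.

```python
import json
import time
import urllib.request
from typing import Optional

DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_create_room_request(
    api_key: str, properties: Optional[dict] = None
) -> urllib.request.Request:
    # Build (but do not send) a JSON POST asking Daily to create a room.
    body = json.dumps({"properties": properties or {}}).encode("utf-8")
    return urllib.request.Request(
        DAILY_ROOMS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # your Daily API key, not a meeting token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: a room that expires one hour from now ("exp" is a Unix timestamp).
req = build_create_room_request("YOUR_DAILY_API_KEY", {"exp": int(time.time()) + 3600})
```

Passing `req` to `urllib.request.urlopen()` should return a JSON room object whose `url` field is the value the bot takes as `room_url`.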
## Hacking on the framework itself
_Note: You may need to set up a virtual environment before following these instructions. From the root of the repo:_
```shell
python3 -m venv venv
source venv/bin/activate
```
Install the development dependencies:
```shell
pip install -r dev-requirements.txt
```
Install the git pre-commit hooks (these help ensure your code follows project rules):
```shell
pre-commit install
```
Install the `pipecat-ai` package locally in editable mode:
```shell
pip install -e .
```
The `-e` or `--editable` option allows you to modify the code without reinstalling.
To include optional dependencies, add them to the install command. For example:
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]"  # adjust the extras to match the services you use
```
If you want to use this package from another directory:
```shell
pip install "path_to_this_repo[option,...]"
```
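After an editable install, one way to confirm that Python is picking up the `pipecat-ai` distribution is the standard library's `importlib.metadata`. A minimal sketch; `installed_version()` is our own helper name, not part of Pipecat:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    # Return the installed version string, or None if the distribution is absent.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pipecat-ai:", installed_version("pipecat-ai"))
```

Run it with the virtual environment active; `None` suggests the package was installed into a different interpreter.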
### Running tests
@@ -136,11 +197,11 @@ From the root directory, run:
pytest pytest
``` ```
### Setting up your editor ## Setting up your editor
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting via [Ruff](https://github.com/astral-sh/ruff).
#### Emacs ### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install the [emacs-lazy-ruff](https://github.com/christophermadsen/emacs-lazy-ruff) package and configure `ruff` arguments:
@@ -162,7 +223,7 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
:hook ((python-mode . pyvenv-auto-run))) :hook ((python-mode . pyvenv-auto-run)))
``` ```
#### Visual Studio Code ### Visual Studio Code
Install the
[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`), set Ruff as the default Python formatter, and enable formatting on save:
@@ -174,7 +235,7 @@ Install the
} }
``` ```
#### PyCharm ### PyCharm
`ruff` was installed in the `venv` environment described earlier. To enable autoformatting on save, go to `File` -> `Settings` -> `Tools` -> `File Watchers` and add a new watcher with the following settings:
@@ -184,7 +245,7 @@ Install the
4. **Arguments**: `format $FilePath$`
5. **Program**: `$PyInterpreterDirectory$/ruff`
## 🤝 Contributing ## Contributing
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:
@@ -197,7 +258,7 @@ Before submitting a pull request, please check existing issues and PRs to avoid
We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.
## 🛟 Getting help ## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)

docs/ISSUE_TEMPLATE.md

@@ -0,0 +1,22 @@
# Description
Is this reporting a bug or feature request?
If reporting a bug, please fill out the following:
### Environment
- pipecat-ai version:
- python version:
- OS:
### Issue description
Provide a clear description of the issue.
### Repro steps
List the steps to reproduce the issue.
### Expected behavior
### Actual behavior
### Logs

@@ -92,8 +92,4 @@ ASSEMBLYAI_API_KEY=...
OPENROUTER_API_KEY=... OPENROUTER_API_KEY=...
# Piper # Piper
PIPER_BASE_URL=... PIPER_BASE_URL=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=
REMOTE_SMART_TURN_URL=

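Entries such as `LOCAL_SMART_TURN_MODEL_PATH=` have nothing after the `=`, so once the file is loaded with `load_dotenv()` the variable exists but holds an empty string, and `os.getenv(name, default)` returns `""` instead of the default. A small sketch of treating unset and empty the same way; `env_or_none()` and the `DEMO_SMART_TURN_PATH` variable are hypothetical:

```python
import os
from typing import Optional


def env_or_none(name: str) -> Optional[str]:
    # Unset and empty variables both come back as None.
    value = os.getenv(name)
    return value if value else None


# Simulate a .env line like `DEMO_SMART_TURN_PATH=` after load_dotenv().
os.environ["DEMO_SMART_TURN_PATH"] = ""
print(repr(os.getenv("DEMO_SMART_TURN_PATH", "fallback")))  # ''
print(env_or_none("DEMO_SMART_TURN_PATH"))  # None
```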
@@ -72,7 +72,7 @@ async def main():
# voice_id="gD1IexrzCvsXPHUuT0s3", # voice_id="gD1IexrzCvsXPHUuT0s3",
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {

@@ -95,7 +95,7 @@ async def main():
# voice_id="gD1IexrzCvsXPHUuT0s3", # voice_id="gD1IexrzCvsXPHUuT0s3",
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {

@@ -53,7 +53,7 @@ async def main(room_url: str, token: str):
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""), voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {

@@ -43,7 +43,7 @@ async def main(room_url: str, token: str):
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121" api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {

@@ -141,7 +141,6 @@ async def dial(request: RoomRequest, raw_request: Request):
"display_name": request.From, "display_name": request.From,
"sip_mode": "dial-in", "sip_mode": "dial-in",
"num_endpoints": 2 if request.call_transfer is not None else 1, "num_endpoints": 2 if request.call_transfer is not None else 1,
"codecs": {"audio": ["OPUS"]},
} }
daily_room_properties["sip"] = sip_config daily_room_properties["sip"] = sip_config

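After this change the dial-in `sip_config` no longer pins `codecs`. The Python sketch below restates the remaining fields as a pure function so the endpoint rule is easy to test; `build_sip_config()` is a hypothetical helper, and the `call_transfer` payload in the usage line is made up for illustration.

```python
from typing import Any, Dict, Optional


def build_sip_config(caller: str, call_transfer: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Same fields as the dial-in handler above, minus the removed "codecs" entry.
    # A second SIP endpoint is requested only when a call transfer is configured.
    return {
        "display_name": caller,
        "sip_mode": "dial-in",
        "num_endpoints": 2 if call_transfer is not None else 1,
    }


daily_room_properties: Dict[str, Any] = {}
daily_room_properties["sip"] = build_sip_config("+15550100", call_transfer={"hypothetical": True})
```

Because the check is `is not None`, even an empty `call_transfer` dict asks for two endpoints.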
@@ -103,7 +103,6 @@ export default async function handler(req, res) {
display_name: From, display_name: From,
sip_mode: 'dial-in', sip_mode: 'dial-in',
num_endpoints: call_transfer !== null ? 2 : 1, num_endpoints: call_transfer !== null ? 2 : 1,
codecs: {"audio": ["OPUS"]},
}; };
daily_room_properties.sip = sip_config; daily_room_properties.sip = sip_config;
} }
@@ -173,4 +172,4 @@ export const config = {
sizeLimit: '1mb', sizeLimit: '1mb',
}, },
}, },
}; };

@@ -61,7 +61,7 @@ async def main(room_url: str, token: str):
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22" api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {

@@ -4,54 +4,54 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TTSSpeakFrame from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.services.piper.tts import PiperTTSService from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
tts = PiperTTSService( tts = PiperTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000 base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
) )
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()])) task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the client joins # Register an event handler so we can play the audio when the
@transport.event_handler("on_client_connected") # participant joins.
async def on_client_connected(transport, client): @transport.event_handler("on_first_participant_joined")
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()]) async def on_first_participant_joined(transport, participant):
await task.queue_frames(
runner = PipelineRunner(handle_sigint=False) [TTSSpeakFrame(f"Hello there, how are you today?"), EndFrame()]
)
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -1,59 +0,0 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
)
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

@@ -4,52 +4,56 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TTSSpeakFrame from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport( async with aiohttp.ClientSession() as session:
webrtc_connection=webrtc_connection, (room_url, _) = await configure(session)
params=TransportParams(
audio_out_enabled=True,
),
)
tts = CartesiaTTSService( transport = DailyTransport(
api_key=os.getenv("CARTESIA_API_KEY"), room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady )
)
task = PipelineTask(Pipeline([tts, transport.output()])) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Register an event handler so we can play the audio when the client joins runner = PipelineRunner()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False) task = PipelineTask(Pipeline([tts, transport.output()]))
await runner.run(task) # Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
await task.queue_frames(
[TTSSpeakFrame(f"Hello there, {participant_name}!"), EndFrame()]
)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,49 +4,51 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, TTSSpeakFrame from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.services.riva.tts import FastPitchTTSService from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport( async with aiohttp.ClientSession() as session:
webrtc_connection=webrtc_connection, (room_url, _) = await configure(session)
params=TransportParams(
audio_out_enabled=True,
),
)
tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY")) transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
task = PipelineTask(Pipeline([tts, transport.output()])) tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
# Register an event handler so we can play the audio when the client joins runner = PipelineRunner()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False) task = PipelineTask(Pipeline([tts, transport.output()]))
await runner.run(task) # Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
await task.queue_frames([TTSSpeakFrame(f"Aloha, {participant_name}!"), EndFrame()])
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import EndFrame, LLMMessagesFrame from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -15,51 +19,46 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport( async with aiohttp.ClientSession() as session:
webrtc_connection=webrtc_connection, (room_url, _) = await configure(session)
params=TransportParams(
audio_out_enabled=True,
),
)
tts = CartesiaTTSService( transport = DailyTransport(
api_key=os.getenv("CARTESIA_API_KEY"), room_url, None, "Say One Thing From an LLM", DailyParams(audio_out_enabled=True)
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady )
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}
]
task = PipelineTask(Pipeline([llm, tts, transport.output()])) messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}
]
# Register an event handler so we can play the audio when the client joins runner = PipelineRunner()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
runner = PipelineRunner(handle_sigint=False) task = PipelineTask(Pipeline([llm, tts, transport.output()]))
await runner.run(task) @transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,67 +4,59 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import TextFrame from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal.image import FalImageGenService from pipecat.services.fal.image import FalImageGenService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
imagegen = FalImageGenService( imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"), params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session, aiohttp_session=session,
key=os.getenv("FAL_KEY"), key=os.getenv("FAL_KEY"),
) )
runner = PipelineRunner()
task = PipelineTask(Pipeline([imagegen, transport.output()])) task = PipelineTask(Pipeline([imagegen, transport.output()]))
# Register an event handler so we can play the audio when the client joins @transport.event_handler("on_first_participant_joined")
@transport.event_handler("on_client_connected") async def on_first_participant_joined(transport, participant):
async def on_client_connected(transport, client):
await task.queue_frame(TextFrame("a cat in the style of picasso")) await task.queue_frame(TextFrame("a cat in the style of picasso"))
@transport.event_handler("on_client_disconnected") @transport.event_handler("on_participant_left")
async def on_client_disconnected(transport, client): async def on_participant_left(transport, participant, reason):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel() await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,67 +4,62 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import TextFrame from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.google.image import GoogleImageGenService from pipecat.services.google.image import GoogleImageGenService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection async def main():
transport = SmallWebRTCTransport( async with aiohttp.ClientSession() as session:
webrtc_connection=webrtc_connection, (room_url, _) = await configure(session)
params=TransportParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
),
)
imagegen = GoogleImageGenService( transport = DailyTransport(
api_key=os.getenv("GOOGLE_API_KEY"), room_url,
) None,
"Show a still frame image",
DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
task = PipelineTask( imagegen = GoogleImageGenService(
Pipeline([imagegen, transport.output()]), api_key=os.getenv("GOOGLE_API_KEY"),
params=PipelineParams(enable_metrics=True), )
)
# Register an event handler so we can play the audio when the client joins runner = PipelineRunner()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frame(TextFrame("a cat in the style of picasso"))
await task.queue_frame(TextFrame("a dog in the style of picasso"))
await task.queue_frame(TextFrame("a fish in the style of picasso"))
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): Pipeline([imagegen, transport.output()]),
logger.info(f"Client disconnected") params=PipelineParams(enable_metrics=True),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await task.queue_frame(TextFrame("a cat in the style of picasso"))
await task.cancel() await task.queue_frame(TextFrame("a dog in the style of picasso"))
await task.queue_frame(TextFrame("a fish in the style of picasso"))
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -13,9 +13,9 @@ import os
import sys import sys
import aiohttp import aiohttp
from daily_runner import configure
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.merge_pipeline import SequentialMergePipeline from pipecat.pipeline.merge_pipeline import SequentialMergePipeline

@@ -4,12 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from dataclasses import dataclass from dataclasses import dataclass
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import ( from pipecat.frames.frames import (
DataFrame, DataFrame,
@@ -27,12 +30,13 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass @dataclass
class MonthFrame(DataFrame): class MonthFrame(DataFrame):
@@ -63,29 +67,23 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
"""Run the Calendar Month Narration bot using WebRTC transport.
Args:
webrtc_connection: The WebRTC connection to use
room_name: Optional room name for display purposes
"""
logger.info(f"Starting bot")
# Create a transport using the WebRTC connection
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
),
)
# Create an HTTP session for API calls
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) (room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Month Narration Bot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = CartesiaHttpTTSService( tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
@@ -146,30 +144,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
frames.append(MonthFrame(month=month)) frames.append(MonthFrame(month=month))
frames.append(LLMMessagesFrame(messages)) frames.append(LLMMessagesFrame(messages))
runner = PipelineRunner()
task = PipelineTask(pipeline) task = PipelineTask(pipeline)
# Set up transport event handlers await task.queue_frames(frames)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Start the month narration once connected
await task.queue_frames(frames)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
# Run the pipeline
runner = PipelineRunner(handle_sigint=False)
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -93,7 +93,7 @@ async def main():
self.frame = frame self.frame = frame
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = CartesiaHttpTTSService( tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, MetricsFrame from pipecat.frames.frames import Frame, MetricsFrame
@@ -23,14 +27,14 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class MetricsLogger(FrameProcessor): class MetricsLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -52,83 +56,76 @@ class MetricsLogger(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) ml = MetricsLogger()
ml = MetricsLogger() messages = [
{
messages = [ "role": "system",
{ "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
"role": "system", },
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
ml,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm,
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts,
ml,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
await runner.run(task) async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from PIL import Image from PIL import Image
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import ( from pipecat.frames.frames import (
@@ -16,6 +20,7 @@ from pipecat.frames.frames import (
BotStoppedSpeakingFrame, BotStoppedSpeakingFrame,
Frame, Frame,
OutputImageRawFrame, OutputImageRawFrame,
TextFrame,
) )
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
@@ -23,14 +28,14 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class ImageSyncAggregator(FrameProcessor): class ImageSyncAggregator(FrameProcessor):
def __init__(self, speaking_path: str, waiting_path: str): def __init__(self, speaking_path: str, waiting_path: str):
@@ -67,90 +72,83 @@ class ImageSyncAggregator(FrameProcessor):
await self.push_frame(frame) await self.push_frame(frame)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
camera_out_enabled=True, audio_out_enabled=True,
camera_out_width=1024, camera_out_enabled=True,
camera_out_height=1024, camera_out_width=1024,
vad_enabled=True, camera_out_height=1024,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) messages = [
{
messages = [ "role": "system",
{ "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
"role": "system", },
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") image_sync_aggregator = ImageSyncAggregator(
async def on_client_connected(transport, client): os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
logger.info(f"Client connected") os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
# Kick off the conversation. )
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") pipeline = Pipeline(
async def on_client_disconnected(transport, client): [
logger.info(f"Client disconnected") transport.input(),
context_aggregator.user(),
llm,
tts,
image_sync_aggregator,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_closed") task = PipelineTask(
async def on_client_closed(transport, client): pipeline,
logger.info(f"Client closed connection") params=PipelineParams(
await task.cancel() allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_first_participant_joined")
await runner.run(task) async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -0,0 +1,104 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.vad.silero import SileroVAD
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
),
)
vad = SileroVAD()
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
vad,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -15,92 +19,84 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -0,0 +1,106 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-opus-20240229"
)
# todo: think more about how to handle system prompts in a more general way. OpenAI,
# Google, and Anthropic all have slightly different approaches to providing a system
# prompt.
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way. Say hello.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

@@ -1,106 +0,0 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.vad.silero import SileroVAD
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
vad = SileroVAD()
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
vad,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

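The `@transport.event_handler(...)` decorators above register coroutines that the transport calls when a client connects, disconnects, or closes the connection. The toy registry below is a minimal sketch of that decorator pattern only; it assumes nothing about pipecat internals, and `EventEmitter`, `emit`, and `demo` are hypothetical names, not pipecat's API.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable


class EventEmitter:
    """Hypothetical stand-in for a transport's event-handler registry (not pipecat's class)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., Awaitable[None]]]] = defaultdict(list)

    def event_handler(self, name: str):
        # Decorator factory: registers the coroutine under `name` and returns it unchanged.
        def decorator(func):
            self._handlers[name].append(func)
            return func

        return decorator

    async def emit(self, name: str, *args) -> None:
        # Await every handler registered for this event, in registration order.
        for handler in self._handlers[name]:
            await handler(self, *args)


async def demo() -> list[str]:
    transport = EventEmitter()
    log: list[str] = []

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        log.append(f"connected:{client}")

    @transport.event_handler("on_client_closed")
    async def on_client_closed(transport, client):
        log.append(f"closed:{client}")

    await transport.emit("on_client_connected", "peer-1")
    await transport.emit("on_client_closed", "peer-1")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Returning `func` unchanged from the decorator keeps each handler directly callable, which is the usual reason registries take this shape.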
@@ -4,8 +4,11 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory from langchain_community.chat_message_histories import ChatMessageHistory
@@ -13,6 +16,7 @@ from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI from langchain_openai import ChatOpenAI
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame from pipecat.frames.frames import LLMMessagesFrame
@@ -25,14 +29,14 @@ from pipecat.processors.aggregators.llm_response import (
) )
from pipecat.processors.frameworks.langchain import LangchainProcessor from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
message_store = {} message_store = {}
@@ -42,97 +46,90 @@ def get_session_history(session_id: str) -> BaseChatMessageHistory:
return message_store[session_id] return message_store[session_id]
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
), ),
MessagesPlaceholder("chat_history"), )
("human", "{input}"),
]
)
chain = prompt | ChatOpenAI(model="gpt-4.1", temperature=0.7)
history_chain = RunnableWithMessageHistory(
chain,
get_session_history,
history_messages_key="chat_history",
input_messages_key="input",
)
lc = LangchainProcessor(history_chain)
tma_in = LLMUserResponseAggregator() tts = CartesiaTTSService(
tma_out = LLMAssistantResponseAggregator() api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
pipeline = Pipeline( prompt = ChatPromptTemplate.from_messages(
[ [
transport.input(), # Transport user input (
stt, "system",
tma_in, # User responses "Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
lc, # Langchain "Your response will be synthesized to voice and those characters will create unnatural sounds.",
tts, # TTS ),
transport.output(), # Transport bot output MessagesPlaceholder("chat_history"),
tma_out, # Assistant spoken responses ("human", "{input}"),
] ]
) )
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0.7)
history_chain = RunnableWithMessageHistory(
chain,
get_session_history,
history_messages_key="chat_history",
input_messages_key="input",
)
lc = LangchainProcessor(history_chain)
task = PipelineTask( tma_in = LLMUserResponseAggregator()
pipeline, tma_out = LLMAssistantResponseAggregator()
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. tma_in, # User responses
# the `LLMMessagesFrame` will be picked up by the LangchainProcessor using lc, # Langchain
# only the content of the last message to inject it in the prompt defined tts, # TTS
# above. So no role is required here. transport.output(), # Transport bot output
messages = [({"content": "Please briefly introduce yourself to the user."})] tma_out, # Assistant spoken responses
await task.queue_frames([LLMMessagesFrame(messages)]) ]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() lc.set_participant_id(participant["id"])
# Kick off the conversation.
# the `LLMMessagesFrame` will be picked up by the LangchainProcessor using
# only the content of the last message to inject it in the prompt defined
# above. So no role is required here.
messages = [({"content": "Please briefly introduce yourself to the user."})]
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

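The LangChain example keys chat history by session through `message_store` and `get_session_history`, but the diff hides everything in that function except its final `return message_store[session_id]`. The sketch below assumes the common lazy-create pattern for the hidden lines; `InMemoryHistory` is a hypothetical stand-in for `ChatMessageHistory`, not LangChain's class.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryHistory:
    """Hypothetical stand-in for langchain_community's ChatMessageHistory."""

    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


message_store: dict[str, InMemoryHistory] = {}


def get_session_history(session_id: str) -> InMemoryHistory:
    # Assumed body: create one history per session id on first use, then reuse it.
    if session_id not in message_store:
        message_store[session_id] = InMemoryHistory()
    return message_store[session_id]
```

Because the same object comes back for a given `session_id`, earlier turns stay visible to the prompt's `chat_history` placeholder on later turns.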
@@ -4,11 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from deepgram import LiveOptions from deepgram import LiveOptions
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import ( from pipecat.frames.frames import (
BotInterruptionFrame, BotInterruptionFrame,
@@ -23,95 +27,91 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, _) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
),
)
stt = DeepgramSTTService( transport = DailyTransport(
api_key=os.getenv("DEEPGRAM_API_KEY"), room_url,
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"), None,
) "Respond bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en") stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@stt.event_handler("on_speech_started") pipeline = Pipeline(
async def on_speech_started(stt, *args, **kwargs): [
await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()]) transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@stt.event_handler("on_utterance_end") task = PipelineTask(
async def on_utterance_end(stt, *args, **kwargs): pipeline,
await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()]) params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") @stt.event_handler("on_speech_started")
async def on_client_connected(transport, client): async def on_speech_started(stt, *args, **kwargs):
logger.info(f"Client connected") await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") @stt.event_handler("on_utterance_end")
async def on_client_disconnected(transport, client): async def on_utterance_end(stt, *args, **kwargs):
logger.info(f"Client disconnected") await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()])
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") # Kick off the conversation.
await task.cancel() messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

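In the Deepgram example, `LiveOptions(vad_events=True, utterance_end_ms="1000")` appears to be what lets the STT service raise speech events, and the two handlers translate them into frames: speech start queues `BotInterruptionFrame` and `UserStartedSpeakingFrame`, utterance end queues `StopInterruptionFrame` and `UserStoppedSpeakingFrame`. A minimal sketch of that mapping, assuming a hypothetical `FakeTask` that records frame names as strings instead of pipecat's `PipelineTask` and frame classes:

```python
import asyncio


class FakeTask:
    """Hypothetical stand-in for PipelineTask: records queued frame names in order."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def queue_frames(self, frames: list[str]) -> None:
        for frame in frames:
            await self.queue.put(frame)


async def simulate_speech_turn() -> list[str]:
    task = FakeTask()

    # Mirrors the diff: speech start interrupts the bot, utterance end releases it.
    async def on_speech_started(stt, *args, **kwargs):
        await task.queue_frames(["BotInterruptionFrame", "UserStartedSpeakingFrame"])

    async def on_utterance_end(stt, *args, **kwargs):
        await task.queue_frames(["StopInterruptionFrame", "UserStoppedSpeakingFrame"])

    await on_speech_started(None)
    await on_utterance_end(None)

    # Drain the queue to show the order the pipeline would receive the frames in.
    drained = []
    while not task.queue.empty():
        drained.append(task.queue.get_nowait())
    return drained


if __name__ == "__main__":
    print(asyncio.run(simulate_speech_turn()))
```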
@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,87 +21,82 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, _) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en") stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") # Kick off the conversation.
await task.cancel() messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,44 +4,45 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) (room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsHttpTTSService( tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""), api_key=os.getenv("ELEVENLABS_API_KEY", ""),
@@ -49,7 +50,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
aiohttp_session=session, aiohttp_session=session,
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
@@ -64,7 +65,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
pipeline = Pipeline( pipeline = Pipeline(
[ [
transport.input(), # Transport user input transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses context_aggregator.user(), # User responses
llm, # LLM llm, # LLM
tts, # TTS tts, # TTS
@@ -83,28 +83,21 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
), ),
) )
@transport.event_handler("on_client_connected") @transport.event_handler("on_first_participant_joined")
async def on_client_connected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client connected") await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation. # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()]) await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") @transport.event_handler("on_participant_left")
async def on_client_disconnected(transport, client): async def on_participant_left(transport, participant, reason):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel() await task.cancel()
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

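Each `Pipeline([...])` above is an ordered list of stages, and frames move through them in list order. In the Daily variant of the ElevenLabs file the `stt` stage is absent, apparently because `transcription_enabled=True` and `capture_participant_transcription` let Daily supply the transcripts. The toy sketch below models only the ordering idea with hypothetical string-transforming stages; real pipecat processors are asynchronous `FrameProcessor`s, and `run_pipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are illustrative names.

```python
from typing import Callable

# A stage takes one frame and returns the frame for the next stage (hypothetical model).
Stage = Callable[[str], str]


def run_pipeline(stages: list[Stage], frame: str) -> str:
    # Feed each stage the previous stage's output, strictly in list order.
    for stage in stages:
        frame = stage(frame)
    return frame


def fake_stt(frame: str) -> str:
    # Pretend speech-to-text: turn an "audio:" frame into a "text:" frame.
    return frame.replace("audio:", "text:", 1)


def fake_llm(frame: str) -> str:
    # Pretend LLM: append a reply marker to the transcript.
    return frame + " -> reply"


def fake_tts(frame: str) -> str:
    # Pretend text-to-speech: wrap the reply as audio.
    return f"audio({frame})"


if __name__ == "__main__":
    print(run_pipeline([fake_stt, fake_llm, fake_tts], "audio:hello"))
```

Dropping `fake_stt` and feeding in text directly, as the Daily variant effectively does, leaves the rest of the chain unchanged.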
@@ -4,103 +4,99 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService( tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""), api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""), voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,104 +4,100 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.playht.tts import PlayHTHttpTTSService from pipecat.services.playht.tts import PlayHTHttpTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = PlayHTHttpTTSService( tts = PlayHTHttpTTSService(
user_id=os.getenv("PLAYHT_USER_ID"), user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"), api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json", voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,106 +4,102 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.playht.tts import PlayHTTTSService from pipecat.services.playht.tts import PlayHTTTSService
from pipecat.transcriptions.language import Language from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = PlayHTTTSService( tts = PlayHTTTSService(
user_id=os.getenv("PLAYHT_USER_ID"), user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"), api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/e46b4027-b38d-4d24-b292-38fbca2be0ef/original/manifest.json", voice_url="s3://voice-cloning-zero-shot/e46b4027-b38d-4d24-b292-38fbca2be0ef/original/manifest.json",
params=PlayHTTTSService.InputParams(language=Language.EN), params=PlayHTTTSService.InputParams(language=Language.EN),
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,97 +21,93 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.azure.llm import AzureLLMService from pipecat.services.azure.llm import AzureLLMService
from pipecat.services.azure.stt import AzureSTTService from pipecat.services.azure.stt import AzureSTTService
from pipecat.services.azure.tts import AzureTTSService from pipecat.services.azure.tts import AzureTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = AzureSTTService( transport = DailyTransport(
api_key=os.getenv("AZURE_SPEECH_API_KEY"), room_url,
region=os.getenv("AZURE_SPEECH_REGION"), token,
) "Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = AzureTTSService( stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"), api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"), region=os.getenv("AZURE_SPEECH_REGION"),
) )
llm = AzureLLMService( tts = AzureTTSService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"), api_key=os.getenv("AZURE_SPEECH_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"), region=os.getenv("AZURE_SPEECH_REGION"),
model=os.getenv("AZURE_CHATGPT_MODEL"), )
)
messages = [ llm = AzureLLMService(
{ api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
"role": "system", endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", model=os.getenv("AZURE_CHATGPT_MODEL"),
}, )
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,92 +21,95 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = OpenAISTTService( transport = DailyTransport(
api_key=os.getenv("OPENAI_API_KEY"), room_url,
model="gpt-4o-transcribe", token,
prompt="Expect words related to dogs, such as breed names.", "Respond bot",
) DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad") # You can use the OpenAI compatible API like Groq.
# stt = OpenAISTTService(
# base_url="https://api.groq.com/openai/v1",
# api_key="gsk_***",
# model="whisper-large-v3",
# )
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe-latest",
prompt="Expect words related to dogs, such as breed names.",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are very knowledgable about dogs. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
audio_out_sample_rate=24000,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import time import time
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -16,97 +20,90 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openpipe.llm import OpenPipeLLMService from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
timestamp = int(time.time()) timestamp = int(time.time())
llm = OpenPipeLLMService( llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"), api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"), openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"}, model="gpt-4o",
) tags={"conversation_id": f"pipecat-{timestamp}"},
)
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,44 +4,45 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.xtts.tts import XTTSService from pipecat.services.xtts.tts import XTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) (room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = XTTSService( tts = XTTSService(
aiohttp_session=session, aiohttp_session=session,
@@ -49,7 +50,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
base_url="http://localhost:8000", base_url="http://localhost:8000",
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
@@ -64,7 +65,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
pipeline = Pipeline( pipeline = Pipeline(
[ [
transport.input(), # Transport user input transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses context_aggregator.user(), # User responses
llm, # LLM llm, # LLM
tts, # TTS tts, # TTS
@@ -83,28 +83,21 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
), ),
) )
@transport.event_handler("on_client_connected") @transport.event_handler("on_first_participant_joined")
async def on_client_connected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client connected") await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation. # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()]) await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") @transport.event_handler("on_participant_left")
async def on_client_disconnected(transport, client): async def on_participant_left(transport, participant, reason):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel() await task.cancel()
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -19,96 +23,94 @@ from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GladiaSTTService( transport = DailyTransport(
api_key=os.getenv("GLADIA_API_KEY", ""), room_url,
params=GladiaInputParams( token,
language_config=LanguageConfig( "Respond bot",
languages=[Language.EN], DailyParams(
) audio_out_enabled=True,
), vad_enabled=True,
) vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = CartesiaTTSService( stt = GladiaSTTService(
api_key=os.getenv("CARTESIA_API_KEY", ""), api_key=os.getenv("GLADIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady params=GladiaInputParams(
) language_config=LanguageConfig(
languages=[Language.EN],
)
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", "")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) # Register an event handler to exit the application when the user leaves.
await runner.run(task) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,100 +4,96 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.lmnt.tts import LmntTTSService from pipecat.services.lmnt.tts import LmntTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan") tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -1,102 +0,0 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.groq.llm import GroqLLMService
from pipecat.services.groq.stt import GroqSTTService
from pipecat.services.groq.tts import GroqTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

@@ -0,0 +1,115 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.together.llm import TogetherLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
params=TogetherLLMService.InputParams(
temperature=1.0,
top_p=0.9,
top_k=40,
extra={
"frequency_penalty": 2.0,
"presence_penalty": 0.0,
},
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond in plain language. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
user_aggregator = context_aggregator.user()
assistant_aggregator = context_aggregator.assistant()
pipeline = Pipeline(
[
transport.input(), # Transport user input
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,93 +21,89 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.aws.tts import PollyTTSService from pipecat.services.aws.tts import PollyTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, _) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = PollyTTSService( stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"),
voice_id="Amy",
params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = PollyTTSService(
api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"),
voice_id="Amy",
params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
)
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -18,94 +22,88 @@ from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.stt import GoogleSTTService from pipecat.services.google.stt import GoogleSTTService
from pipecat.services.google.tts import GoogleTTSService from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, _) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GoogleSTTService( transport = DailyTransport(
params=GoogleSTTService.InputParams(languages=Language.EN_US), room_url,
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"), None,
) "Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = GoogleTTSService( stt = GoogleSTTService(
voice_id="en-US-Chirp3-HD-Charon", params=GoogleSTTService.InputParams(languages=Language.EN_US),
params=GoogleTTSService.InputParams(language=Language.EN_US), )
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY")) tts = GoogleTTSService(
voice_id="en-US-Journey-F",
params=GoogleTTSService.InputParams(language=Language.EN_US),
)
messages = [ llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,92 +21,88 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.assemblyai.stt import AssemblyAISTTService from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = AssemblyAISTTService( transport = DailyTransport(
api_key=os.getenv("ASSEMBLYAI_API_KEY"), room_url,
) token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = CartesiaTTSService( stt = AssemblyAISTTService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("ASSEMBLYAI_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady )
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
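Both sides of this diff register lifecycle callbacks with `@transport.event_handler("<event name>")`. The SmallWebRTC version listens for `on_client_connected`, `on_client_disconnected` and `on_client_closed`. The Daily version listens for `on_first_participant_joined` and `on_participant_left`. The sketch below models that decorator-based registry with a hypothetical `ToyTransport` class. It is not pipecat's implementation. It only illustrates how named async handlers can be registered and later dispatched with the `(transport, client)` arguments seen in the examples.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[..., Awaitable[None]]


class ToyTransport:
    """Hypothetical stand-in for a transport; not pipecat's class."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Decorator factory: registers the coroutine under the event name
        # and returns it unchanged so it can still be called directly.
        def decorator(handler: Handler) -> Handler:
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Await every handler registered for this event, in registration order.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


transport = ToyTransport()
log: List[str] = []


@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    log.append(f"connected:{client}")


@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
    log.append(f"closed:{client}")


async def demo() -> None:
    await transport.emit("on_client_connected", "peer-1")
    await transport.emit("on_client_closed", "peer-1")
    await transport.emit("on_unknown_event", "peer-1")  # no handlers: no-op


asyncio.run(demo())
print(log)  # ['connected:peer-1', 'closed:peer-1']
```

Swapping transports in this sketch would only change which event names the handlers are registered under; the dispatch mechanism stays the same.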


@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.filters.krisp_filter import KrispFilter from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
@@ -18,88 +22,83 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
audio_in_filter=KrispFilter(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
audio_in_filter=KrispFilter(),
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en") stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") # Kick off the conversation.
await task.cancel() messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()


@@ -4,44 +4,45 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeHttpTTSService from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) (room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = RimeHttpTTSService( tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""), api_key=os.getenv("RIME_API_KEY", ""),
@@ -49,7 +50,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
aiohttp_session=session, aiohttp_session=session,
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
@@ -64,7 +65,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
pipeline = Pipeline( pipeline = Pipeline(
[ [
transport.input(), # Transport user input transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses context_aggregator.user(), # User responses
llm, # LLM llm, # LLM
tts, # TTS tts, # TTS
@@ -83,28 +83,21 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
), ),
) )
@transport.event_handler("on_client_connected") @transport.event_handler("on_first_participant_joined")
async def on_client_connected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client connected") await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation. # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()]) await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") @transport.event_handler("on_participant_left")
async def on_client_disconnected(transport, client): async def on_participant_left(transport, participant, reason):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel() await task.cancel()
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
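In the Daily version of the Rime example, `stt` is dropped from the `Pipeline` list and `transcription_enabled=True` is set on `DailyParams`, so transcripts come from the transport instead of a separate STT service. Because a pipeline is an ordered chain, each processor only sees what the stage before it emits. The toy sketch below (hypothetical `run_chain`, `fake_stt` and `user_aggregator`, not pipecat's `Pipeline` or aggregators) shows why anything that produces text must sit before `context_aggregator.user()`.

```python
from typing import Callable, Iterable, List

Processor = Callable[[str], List[str]]


def run_chain(processors: Iterable[Processor], frames: List[str]) -> List[str]:
    # Push frames through each processor in order; each stage sees only
    # what the previous stage produced, mirroring a linear pipeline.
    for process in processors:
        next_frames: List[str] = []
        for frame in frames:
            next_frames.extend(process(frame))
        frames = next_frames
    return frames


def fake_stt(frame: str) -> List[str]:
    # Turn raw audio frames into text frames; pass everything else through.
    if frame.startswith("audio:"):
        return [f"text:{frame[len('audio:'):]}"]
    return [frame]


def user_aggregator(frame: str) -> List[str]:
    # Turn text frames into user messages; pass everything else through.
    if frame.startswith("text:"):
        return [f"user_msg:{frame[len('text:'):]}"]
    return [frame]


print(run_chain([fake_stt, user_aggregator], ["audio:hello"]))  # ['user_msg:hello']
print(run_chain([user_aggregator, fake_stt], ["audio:hello"]))  # ['text:hello']
```

With the stages reversed, the aggregator only ever sees audio, so no user message is produced even though text eventually appears downstream.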


@@ -4,103 +4,99 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeTTSService from pipecat.services.rime.tts import RimeTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = RimeTTSService( tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY", ""), api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex", voice_id="rex",
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()


@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,87 +21,76 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.nim.llm import NimLLMService from pipecat.services.nim.llm import NimLLMService
from pipecat.services.riva.stt import ParakeetSTTService from pipecat.services.riva.stt import ParakeetSTTService
from pipecat.services.riva.tts import FastPitchTTSService from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, _) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY")) transport = DailyTransport(
room_url,
None,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
llm = NimLLMService(api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct") stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))
tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY")) llm = NimLLMService(
api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
)
messages = [ tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") # Kick off the conversation.
await task.cancel() messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()


@@ -4,12 +4,16 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from dataclasses import dataclass from dataclasses import dataclass
import aiohttp
import google.ai.generativelanguage as glm import google.ai.generativelanguage as glm
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import ( from pipecat.frames.frames import (
@@ -28,15 +32,14 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameProcessor from pipecat.processors.frame_processor import FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.google.llm import GoogleLLMService from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.google.tts import GoogleTTSService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
marker = "|----|" marker = "|----|"
system_message = f""" system_message = f"""
@@ -190,92 +193,85 @@ class TanscriptionContextFixup(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
# No transcription at all. just audio input to Gemini! audio_out_enabled=True,
# transcription_enabled=True, # No transcription at all. just audio input to Gemini!
vad_enabled=True, # transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(), vad_enabled=True,
vad_audio_passthrough=True, vad_analyzer=SileroVADAnalyzer(),
), vad_audio_passthrough=True,
) ),
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001") tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = GoogleTTSService( llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": system_message, "content": system_message,
}, },
{ {
"role": "user", "role": "user",
"content": "Start by saying hello.", "content": "Start by saying hello.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
audio_collector = UserAudioCollector(context, context_aggregator.user())
pull_transcript_out_of_llm_output = TranscriptExtractor(context)
fixup_context_messages = TanscriptionContextFixup(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
audio_collector,
context_aggregator.user(), # User responses
llm, # LLM
pull_transcript_out_of_llm_output,
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
fixup_context_messages,
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams( audio_collector = UserAudioCollector(context, context_aggregator.user())
allow_interruptions=True, pull_transcript_out_of_llm_output = TranscriptExtractor(context)
enable_metrics=True, fixup_context_messages = TanscriptionContextFixup(context)
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. audio_collector,
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
pull_transcript_out_of_llm_output,
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
fixup_context_messages,
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()


@@ -4,103 +4,99 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.fish.tts import FishAudioTTSService from pipecat.services.fish.tts import FishAudioTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = FishAudioTTSService( tts = FishAudioTTSService(
api_key=os.getenv("FISH_API_KEY"), api_key=os.getenv("FISH_API_KEY"),
model="4ce7e917cedd4bc2bb2e6ff3a46acaa1", # Barack Obama model="4ce7e917cedd4bc2bb2e6ff3a46acaa1", # Barack Obama
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
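
Each pipeline in these examples is a list of processors, and frames move through them in order: transport input, speech-to-text (when present), the user context aggregator, the LLM, TTS, transport output, and finally the assistant context aggregator, which per its comment records the assistant's spoken responses. Below is a minimal, library-free sketch of that in-order chaining. `run_pipeline`, the string "frames", and the `fake_*` stages are hypothetical stand-ins, not Pipecat's `Pipeline` API, and real pipelines stream frames concurrently rather than one call at a time.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical: a stage takes one frame and returns zero or more frames.
# Frames are plain strings here purely for illustration.
Stage = Callable[[str], Awaitable[List[str]]]


async def run_pipeline(stages: List[Stage], frames: List[str]) -> List[str]:
    """Send each input frame through every stage in list order."""
    results: List[str] = []
    for frame in frames:
        pending = [frame]
        for stage in stages:
            produced: List[str] = []
            for item in pending:
                produced.extend(await stage(item))
            pending = produced
        results.extend(pending)
    return results


async def fake_stt(frame: str) -> List[str]:
    # Stand-in for speech-to-text: "transcribe" the audio frame.
    return [f"text({frame})"]


async def fake_llm(frame: str) -> List[str]:
    # Stand-in for the LLM: produce a reply to the transcript.
    return [f"reply({frame})"]


async def fake_tts(frame: str) -> List[str]:
    # Stand-in for text-to-speech: synthesize audio for the reply.
    return [f"audio({frame})"]


if __name__ == "__main__":
    # prints ['audio(reply(text(hello)))']
    print(asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts], ["hello"])))
```

Because each stage only sees what the previous one produced, moving a processor (for example, placing the assistant aggregator after `transport.output()`) changes what it observes.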


@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams from pipecat.audio.vad.vad_analyzer import VADParams
@@ -16,9 +20,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.ultravox.stt import UltravoxSTTService from pipecat.services.ultravox.stt import UltravoxSTTService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
@@ -26,6 +28,8 @@ load_dotenv(override=True)
# The Ultravox model is compute-intensive and performs best with GPU acceleration. # The Ultravox model is compute-intensive and performs best with GPU acceleration.
# This can be deployed on cloud GPU providers like Cerebrium.ai for optimal performance. # This can be deployed on cloud GPU providers like Cerebrium.ai for optimal performance.
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Want to initialize the ultravox processor since it takes time to load the model and dont # Want to initialize the ultravox processor since it takes time to load the model and dont
# want to load it every time the pipeline is run # want to load it every time the pipeline is run
@@ -35,61 +39,53 @@ ultravox_processor = UltravoxSTTService(
) )
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)), transcription_enabled=False,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
) vad_audio_passthrough=True,
),
)
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.environ.get("CARTESIA_API_KEY"), api_key=os.environ.get("CARTESIA_API_KEY"),
voice_id="97f4b8fb-f2fe-444b-bb9a-c109783a857a", voice_id="97f4b8fb-f2fe-444b-bb9a-c109783a857a",
) )
pipeline = Pipeline( pipeline = Pipeline(
[ [
transport.input(), # Transport user input transport.input(), # Transport user input
ultravox_processor, ultravox_processor,
tts, # TTS tts, # TTS
transport.output(), # Transport bot output transport.output(), # Transport bot output
] ]
) )
task = PipelineTask( task = PipelineTask(
pipeline, pipeline,
params=PipelineParams( params=PipelineParams(
allow_interruptions=True, allow_interruptions=True,
enable_metrics=True, enable_metrics=True,
), ),
) )
@transport.event_handler("on_client_connected") @transport.event_handler("on_participant_left")
async def on_client_connected(transport, client): async def on_participant_left(transport, participant, reason):
logger.info(f"Client connected") await task.cancel()
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
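
The Ultravox example passes `VADParams(stop_secs=0.2)` to the Silero analyzer. Judging by the name, `stop_secs` is how long the user must stay silent before speech is treated as finished. A small value lets the bot reply sooner but can cut a speaker off during a short pause. The sketch below illustrates that trade-off with a simple energy threshold in place of Silero's neural model. `detect_end_of_speech`, the `0.1` threshold, and the 20 ms frame length are assumptions for illustration only.

```python
from typing import List, Optional


def detect_end_of_speech(
    energies: List[float],
    frame_ms: int = 20,
    threshold: float = 0.1,
    stop_secs: float = 0.2,
) -> Optional[int]:
    """Return the index of the frame where speech is declared finished, or None.

    Speech must first start (a frame at or above `threshold`) and then be
    followed by at least `stop_secs` worth of consecutive quiet frames.
    """
    frames_needed = max(1, round(stop_secs * 1000 / frame_ms))
    speaking = False
    quiet_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            speaking = True
            quiet_run = 0  # any loud frame resets the silence timer
        elif speaking:
            quiet_run += 1
            if quiet_run >= frames_needed:
                return i
    return None


if __name__ == "__main__":
    # Speech, a 100 ms pause, more speech, then 200 ms of silence.
    paused = [0.5] * 5 + [0.0] * 5 + [0.5] * 5 + [0.0] * 10
    print(detect_end_of_speech(paused))  # prints 24
    print(detect_end_of_speech(paused, stop_secs=0.1))  # prints 9
```

With `stop_secs=0.2` the 100 ms pause does not end the turn, while `stop_secs=0.1` declares end of speech at index 9, in the middle of the pause.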


@@ -4,103 +4,99 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = NeuphonicHttpTTSService( tts = NeuphonicHttpTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"), api_key=os.getenv("NEUPHONIC_API_KEY"),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
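
The LLM examples seed `messages` with a system prompt and wrap it in `OpenAILLMContext`. When the first participant joins, they append a second system message and queue `context_aggregator.user().get_context_frame()` so the bot speaks first. Appending to `messages` after the context exists only affects the LLM if the context keeps a reference to the same list. The hypothetical `ConversationContext` below assumes exactly that and models how user and assistant turns grow the shared history. It is an illustration, not the `OpenAILLMContext` implementation.

```python
from typing import Dict, List

Message = Dict[str, str]


class ConversationContext:
    """Hypothetical shared LLM history (not Pipecat's OpenAILLMContext)."""

    def __init__(self, messages: List[Message]) -> None:
        # Keep a reference, not a copy, so later appends to `messages`
        # (like the kick-off prompt) are visible to the context.
        self.messages = messages

    def add_user_text(self, text: str) -> None:
        # What a user aggregator would do with a finished transcription.
        self.messages.append({"role": "user", "content": text})

    def add_assistant_text(self, text: str) -> None:
        # What an assistant aggregator would do with the spoken reply.
        self.messages.append({"role": "assistant", "content": text})

    def snapshot(self) -> List[Message]:
        # Copy of the history that would be sent for the next completion.
        return [dict(m) for m in self.messages]


if __name__ == "__main__":
    messages = [{"role": "system", "content": "You are a helpful LLM in a WebRTC call."}]
    context = ConversationContext(messages)
    # Kick off the conversation, as the event handler does in the examples.
    messages.append({"role": "system", "content": "Please introduce yourself to the user."})
    context.add_assistant_text("Hi, I'm your assistant.")
    print([m["role"] for m in context.snapshot()])  # prints ['system', 'system', 'assistant']
```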


@@ -4,103 +4,99 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.neuphonic.tts import NeuphonicTTSService from pipecat.services.neuphonic.tts import NeuphonicTTSService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = NeuphonicTTSService( tts = NeuphonicTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"), api_key=os.getenv("NEUPHONIC_API_KEY"),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User responses
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) llm, # LLM
await task.queue_frames([context_aggregator.user().get_context_frame()]) tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()


@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -17,92 +21,89 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.fal.stt import FalSTTService from pipecat.services.fal.stt import FalSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = FalSTTService( transport = DailyTransport(
api_key=os.getenv("FAL_KEY"), room_url,
) token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
tts = CartesiaTTSService( stt = FalSTTService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("FAL_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady )
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [ llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages) messages = [
context_aggregator = llm.create_context_aggregator(context) {
"role": "system",
pipeline = Pipeline( "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
[ },
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(), # Transport user input
# Kick off the conversation. stt, # STT
messages.append({"role": "system", "content": "Please introduce yourself to the user."}) context_aggregator.user(), # User responses
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) # Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
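
Transports expose events through the `@transport.event_handler("...")` decorator. The Daily versions of these examples use `on_first_participant_joined` to start the conversation and `on_participant_left` to call `task.cancel()` so the application exits when the user leaves. Below is a minimal sketch of how such a decorator can register async handlers by name and later await them with the event arguments. `EventEmitter` and `emit` are hypothetical names, and the `(transport, participant, reason)` handler shape is copied from the examples rather than from the transport's source.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[..., Awaitable[None]]


class EventEmitter:
    """Hypothetical event registry (not Pipecat's transport implementation)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Decorator factory: @emitter.event_handler("on_participant_left")
        def decorator(handler: Handler) -> Handler:
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Handlers receive the emitter first, mirroring the (transport, ...)
        # signature in the examples. Events with no handlers are ignored.
        for handler in self._handlers[event_name]:
            await handler(self, *args)


async def demo() -> List[Any]:
    transport = EventEmitter()
    calls: List[Any] = []

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        # In the examples, this is where `await task.cancel()` would run.
        calls.append((participant["id"], reason))

    await transport.emit("on_participant_left", {"id": "abc"}, "left")
    await transport.emit("on_unknown_event")  # no handlers registered
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints [('abc', 'left')]
```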


@@ -45,7 +45,7 @@ async def main():
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {


@@ -0,0 +1,103 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.groq.llm import GroqLLMService
from pipecat.services.groq.stt import GroqSTTService
from pipecat.services.groq.tts import GroqTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
# transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())


@@ -4,8 +4,8 @@ import os
from typing import Tuple from typing import Tuple
import aiohttp import aiohttp
from daily_runner import configure
from dotenv import load_dotenv from dotenv import load_dotenv
from runner import configure
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -72,8 +72,7 @@ async def main():
async def get_text_and_audio(messages) -> Tuple[str, bytearray]: async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert """This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. that text to speech as it's received."""
"""
source_queue = asyncio.Queue() source_queue = asyncio.Queue()
sink_queue = asyncio.Queue() sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator() sentence_aggregator = SentenceAggregator()
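
`get_text_and_audio` streams text from the LLM and converts it to speech as it arrives, using source and sink `asyncio.Queue`s around a `SentenceAggregator`. Aggregating lets speech synthesis start on complete sentences instead of waiting for the whole reply. Below is a simplified sketch of that idea. It assumes fragments arrive as strings, `None` marks end of stream, and a fragment ending in `.`, `!`, or `?` closes a sentence. `aggregate_sentences` and `collect_sentences` are hypothetical and are not the library's `SentenceAggregator`.

```python
import asyncio
from typing import List, Optional

SENTENCE_END = (".", "!", "?")


async def aggregate_sentences(
    source: "asyncio.Queue[Optional[str]]",
    sink: "asyncio.Queue[Optional[str]]",
) -> None:
    """Read text fragments from `source` and push whole sentences to `sink`.

    A `None` on `source` marks end of stream: leftover text is flushed and
    `None` is forwarded so downstream consumers know to stop.
    """
    buffer = ""
    while True:
        fragment = await source.get()
        if fragment is None:
            if buffer.strip():
                await sink.put(buffer.strip())
            await sink.put(None)
            return
        buffer += fragment
        if buffer.rstrip().endswith(SENTENCE_END):
            await sink.put(buffer.strip())
            buffer = ""


async def collect_sentences(fragments: List[str]) -> List[str]:
    source: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    sink: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    for fragment in fragments:
        source.put_nowait(fragment)
    source.put_nowait(None)
    await aggregate_sentences(source, sink)
    sentences: List[str] = []
    while (item := sink.get_nowait()) is not None:
        sentences.append(item)
    return sentences


if __name__ == "__main__":
    tokens = ["Hello", " world.", " How are", " you?", " Bye"]
    print(asyncio.run(collect_sentences(tokens)))
    # prints ['Hello world.', 'How are you?', 'Bye']
```

This naive rule splits after abbreviations such as "Dr." and ignores punctuation in the middle of a fragment; a production aggregator needs smarter sentence-boundary handling.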


@@ -4,9 +4,13 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import ( from pipecat.frames.frames import (
Frame, Frame,
@@ -19,12 +23,13 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class MirrorProcessor(FrameProcessor): class MirrorProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -39,7 +44,6 @@ class MirrorProcessor(FrameProcessor):
) )
) )
elif isinstance(frame, InputImageRawFrame): elif isinstance(frame, InputImageRawFrame):
print(f"Received image frame: {frame.size} {frame.format}")
await self.push_frame( await self.push_frame(
OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format) OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
) )
@@ -47,48 +51,42 @@ class MirrorProcessor(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Test",
audio_out_enabled=True, DailyParams(
camera_in_enabled=True, audio_in_enabled=True,
camera_out_enabled=True, audio_out_enabled=True,
camera_out_is_live=True, camera_out_enabled=True,
camera_out_width=1280, camera_out_is_live=True,
camera_out_height=720, camera_out_width=1280,
), camera_out_height=720,
) ),
)
pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()]) @transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_video(participant["id"])
task = PipelineTask( pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()])
pipeline,
params=PipelineParams(),
)
@transport.event_handler("on_client_connected") runner = PipelineRunner()
async def on_client_connected(transport, client):
logger.info(f"Client connected")
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
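
`MirrorProcessor` echoes the camera back. Each `InputImageRawFrame` becomes an `OutputImageRawFrame` with the same `image`, `size`, and `format`, and any other frame is pushed through unchanged. The sketch below captures just that transform-or-pass-through rule as a pure function. The dataclasses are hypothetical stand-ins that keep only the fields visible in the diff; they are not Pipecat's frame classes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class InputImageRawFrame:
    # Hypothetical stand-in: raw camera bytes plus (width, height) and pixel format.
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class OutputImageRawFrame:
    image: bytes
    size: Tuple[int, int]
    format: str


@dataclass
class TextFrame:
    text: str


Frame = Union[InputImageRawFrame, OutputImageRawFrame, TextFrame]


def mirror(frame: Frame) -> Frame:
    """Echo camera input back as output; forward anything else untouched."""
    if isinstance(frame, InputImageRawFrame):
        return OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
    return frame


if __name__ == "__main__":
    out = mirror(InputImageRawFrame(b"\x00\x01", (1280, 720), "RGB"))
    print(type(out).__name__, out.size)  # prints OutputImageRawFrame (1280, 720)
```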


@@ -5,10 +5,13 @@
# #
import asyncio import asyncio
import sys
import tkinter as tk import tkinter as tk
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import ( from pipecat.frames.frames import (
Frame, Frame,
@@ -21,13 +24,14 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class MirrorProcessor(FrameProcessor): class MirrorProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -49,59 +53,52 @@ class MirrorProcessor(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
p2p_transport = SmallWebRTCTransport( tk_root = tk.Tk()
webrtc_connection=webrtc_connection, tk_root.title("Local Mirror")
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_in_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
camera_out_width=1280,
camera_out_height=720,
),
)
tk_root = tk.Tk() daily_transport = DailyTransport(
tk_root.title("Local Mirror") room_url, token, "Test", DailyParams(audio_in_enabled=True)
)
tk_transport = TkLocalTransport( tk_transport = TkLocalTransport(
tk_root, tk_root,
TkTransportParams( TkTransportParams(
audio_out_enabled=True, audio_out_enabled=True,
camera_out_enabled=True, camera_out_enabled=True,
camera_out_is_live=True, camera_out_is_live=True,
camera_out_width=1280, camera_out_width=1280,
camera_out_height=720, camera_out_height=720,
), ),
) )
@p2p_transport.event_handler("on_client_connected") @daily_transport.event_handler("on_first_participant_joined")
async def on_client_connected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client connected") await transport.capture_participant_video(participant["id"])
pipeline = Pipeline([p2p_transport.input(), MirrorProcessor(), tk_transport.output()]) pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])
task = PipelineTask( task = PipelineTask(
pipeline, pipeline,
params=PipelineParams(), params=PipelineParams(
) audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
async def run_tk(): async def run_tk():
while not task.has_finished(): while not task.has_finished():
tk_root.update() tk_root.update()
tk_root.update_idletasks() tk_root.update_idletasks()
await asyncio.sleep(0.1) await asyncio.sleep(0.1)
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await asyncio.gather(runner.run(task), run_tk()) await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
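The Tk mirror example keeps the window responsive by running `runner.run(task)` and a `run_tk()` polling loop side by side with `asyncio.gather`. The loop calls `tk_root.update()` and `tk_root.update_idletasks()` every 0.1 s until `task.has_finished()` returns true. The sketch below shows the same cooperative pattern. `FakeTask`, `pump_ui` and `demo` are hypothetical stand-ins so it runs without pipecat or a display; they are not pipecat APIs.

```python
import asyncio


class FakeTask:
    """Hypothetical stand-in for a pipeline task that finishes after a delay."""

    def __init__(self) -> None:
        self._finished = False

    def has_finished(self) -> bool:
        return self._finished

    async def run(self, seconds: float) -> None:
        # Simulates the runner doing work, then marks the task finished.
        await asyncio.sleep(seconds)
        self._finished = True


async def pump_ui(task: FakeTask, ticks: list, interval: float = 0.01) -> None:
    # In the real example this body would call tk_root.update() and
    # tk_root.update_idletasks(); here we only count iterations.
    while not task.has_finished():
        ticks.append(1)
        # Yield to the event loop so the task coroutine can make progress.
        await asyncio.sleep(interval)


async def demo() -> tuple:
    task = FakeTask()
    ticks: list = []
    # Both coroutines share one event loop; gather returns once both exit.
    await asyncio.gather(task.run(0.05), pump_ui(task, ticks))
    return task.has_finished(), len(ticks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `await asyncio.sleep(...)` inside the UI loop matters. Without it, the loop would never yield control, so the pipeline coroutine could not run.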

@@ -4,99 +4,89 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.wake_check_filter import WakeCheckFilter from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Robot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.", "content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
}, },
]
hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
hey_robot_filter, # Filter out speech not directed at the robot
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
] ]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True)) hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
@transport.event_handler("on_client_connected") context = OpenAILLMContext(messages)
async def on_client_connected(transport, client): context_aggregator = llm.create_context_aggregator(context)
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frame(TTSSpeakFrame("Hi! If you want to talk to me, just say 'Hey Robot'"))
@transport.event_handler("on_client_disconnected") pipeline = Pipeline(
async def on_client_disconnected(transport, client): [
logger.info(f"Client disconnected") transport.input(), # Transport user input
hey_robot_filter, # Filter out speech not directed at the robot
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
@transport.event_handler("on_client_closed") task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False) @transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")
await runner.run(task) runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
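The wake-word example places `WakeCheckFilter(["hey robot", "hey, robot"])` ahead of the context aggregator, so speech not addressed to the bot never reaches the LLM. The sketch below illustrates that gating idea under stated assumptions. `WakePhraseGate`, `check` and `keepalive_secs` are hypothetical names and do not describe pipecat's `WakeCheckFilter`. The sketch matches phrases case-insensitively while tolerating punctuation between words. It also stays "awake" for a short keepalive window after a match.

```python
import re
from typing import Optional


class WakePhraseGate:
    """Hypothetical, simplified wake-phrase gate (not pipecat's WakeCheckFilter)."""

    def __init__(self, phrases, keepalive_secs: float = 3.0) -> None:
        self._patterns = []
        for phrase in phrases:
            words = re.findall(r"\w+", phrase)
            # Require at least one non-word char between words, so "hey robot"
            # also matches "Hey, robot!" but not "heyrobot".
            pattern = r"\b" + r"\W+".join(map(re.escape, words)) + r"\b"
            self._patterns.append(re.compile(pattern, re.IGNORECASE))
        self._keepalive = keepalive_secs
        self._awake_until = float("-inf")

    def check(self, transcript: str, now: float) -> Optional[str]:
        """Return text to forward downstream, or None to drop it."""
        if now < self._awake_until:
            # Still inside the keepalive window: pass through and extend it.
            self._awake_until = now + self._keepalive
            return transcript
        for pattern in self._patterns:
            match = pattern.search(transcript)
            if match:
                self._awake_until = now + self._keepalive
                # Forward only what follows the wake phrase.
                return transcript[match.end():].lstrip(" ,.!?")
        return None


if __name__ == "__main__":
    gate = WakePhraseGate(["hey robot", "hey, robot"])
    print(gate.check("Hey, robot! What time is it?", now=0.0))
```

Passing `now` explicitly, rather than reading a clock inside `check`, keeps the gate deterministic and easy to test.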

@@ -4,18 +4,21 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import wave import wave
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import ( from pipecat.frames.frames import (
Frame, Frame,
LLMFullResponseEndFrame, LLMFullResponseEndFrame,
OutputAudioRawFrame, OutputAudioRawFrame,
TTSSpeakFrame,
) )
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
@@ -27,14 +30,14 @@ from pipecat.processors.aggregators.openai_llm_context import (
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger from pipecat.processors.logger import FrameLogger
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {} sounds = {}
sound_files = ["ding1.wav", "ding2.wav"] sound_files = ["ding1.wav", "ding2.wav"]
@@ -77,83 +80,70 @@ class InboundSoundEffectWrapper(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( messages = [
api_key=os.getenv("CARTESIA_API_KEY"), {
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady "role": "system",
) "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask(pipeline) context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frame(TTSSpeakFrame("Hi, I'm listening!")) in_sound,
await transport.send_audio(sounds["ding1.wav"]) fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") @transport.event_handler("on_first_participant_joined")
async def on_client_disconnected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client disconnected") await transport.capture_participant_transcription(participant["id"])
await tts.say("Hi, I'm listening!")
await transport.send_audio(sounds["ding1.wav"])
@transport.event_handler("on_client_closed") runner = PipelineRunner()
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False) task = PipelineTask(pipeline)
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from typing import Optional from typing import Optional
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
@@ -19,14 +23,14 @@ from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.moondream.vision import MoondreamService from pipecat.services.moondream.vision import MoondreamService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor): class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None): def __init__(self, participant_id: Optional[str] = None):
@@ -46,81 +50,61 @@ class UserImageRequester(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
# Get WebRTC peer connection ID async with aiohttp.ClientSession() as session:
webrtc_peer_id = webrtc_connection.pc_id (room_url, token) = await configure(session)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
transport = SmallWebRTCTransport( user_response = UserResponseAggregator()
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_in_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
user_response = UserResponseAggregator() image_requester = UserImageRequester()
# Initialize the image requester without setting the participant ID yet vision_aggregator = VisionImageFrameAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator() # If you run into weird description, try with use_cpu=True
moondream = MoondreamService()
# If you run into weird description, try with use_cpu=True tts = CartesiaTTSService(
moondream = MoondreamService() api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) @transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
await transport.capture_participant_video(participant["id"], framerate=0)
await transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
tts = CartesiaTTSService( pipeline = Pipeline(
api_key=os.getenv("CARTESIA_API_KEY"), [
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady transport.input(),
) user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output(),
]
)
pipeline = Pipeline( task = PipelineTask(pipeline)
[
transport.input(),
stt,
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output(),
]
)
task = PipelineTask(pipeline) runner = PipelineRunner()
@transport.event_handler("on_client_connected") await runner.run(task)
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Welcome message
await tts.say("Hi there! Feel free to ask me what I see.")
# Set the participant ID in the image requester
image_requester.set_participant_id(webrtc_peer_id)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,29 +4,33 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from typing import Optional from typing import Optional
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService from pipecat.services.google.llm import GoogleLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor): class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None): def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,61 @@ class UserImageRequester(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
# Get WebRTC peer connection ID async with aiohttp.ClientSession() as session:
webrtc_peer_id = webrtc_connection.pc_id (room_url, token) = await configure(session)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_in_enabled=True, # This is so Silero VAD can get audio data
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
transport = SmallWebRTCTransport( user_response = UserResponseAggregator()
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_in_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
user_response = UserResponseAggregator() image_requester = UserImageRequester()
# Initialize the image requester without setting the participant ID yet vision_aggregator = VisionImageFrameAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator() google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Google Gemini model for vision analysis @transport.event_handler("on_first_participant_joined")
google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY")) async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
await transport.capture_participant_video(participant["id"], framerate=0)
await transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
tts = CartesiaTTSService( pipeline = Pipeline(
api_key=os.getenv("CARTESIA_API_KEY"), [
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady transport.input(),
) user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output(),
]
)
pipeline = Pipeline( task = PipelineTask(pipeline)
[
transport.input(),
stt,
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output(),
]
)
task = PipelineTask( runner = PipelineRunner()
pipeline,
params=PipelineParams(allow_interruptions=True),
)
@transport.event_handler("on_client_connected") await runner.run(task)
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Welcome message
await tts.say("Hi there! Feel free to ask me what I see.")
# Set the participant ID in the image requester
image_requester.set_participant_id(webrtc_peer_id)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,29 +4,33 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from typing import Optional from typing import Optional
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor): class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None): def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,60 @@ class UserImageRequester(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
# Get WebRTC peer connection ID async with aiohttp.ClientSession() as session:
webrtc_peer_id = webrtc_connection.pc_id (room_url, token) = await configure(session)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
transport = SmallWebRTCTransport( user_response = UserResponseAggregator()
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_in_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
user_response = UserResponseAggregator() image_requester = UserImageRequester()
# Initialize the image requester without setting the participant ID yet vision_aggregator = VisionImageFrameAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator() openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# OpenAI GPT-4o for vision analysis @transport.event_handler("on_first_participant_joined")
openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
await transport.capture_participant_video(participant["id"], framerate=0)
await transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
tts = CartesiaTTSService( pipeline = Pipeline(
api_key=os.getenv("CARTESIA_API_KEY"), [
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady transport.input(),
) user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output(),
]
)
pipeline = Pipeline( task = PipelineTask(pipeline)
[
transport.input(),
stt,
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output(),
]
)
task = PipelineTask( runner = PipelineRunner()
pipeline,
params=PipelineParams(allow_interruptions=True),
)
@transport.event_handler("on_client_connected") await runner.run(task)
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Welcome message
await tts.say("Hi there! Feel free to ask me what I see.")
# Set the participant ID in the image requester
image_requester.set_participant_id(webrtc_peer_id)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,29 +4,33 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
from typing import Optional from typing import Optional
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.anthropic.llm import AnthropicLLMService from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor): class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None): def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,60 @@ class UserImageRequester(FrameProcessor):
await self.push_frame(frame, direction) await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
# Get WebRTC peer connection ID async with aiohttp.ClientSession() as session:
webrtc_peer_id = webrtc_connection.pc_id (room_url, token) = await configure(session)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") transport = DailyTransport(
room_url,
token,
"Describe participant video",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
transport = SmallWebRTCTransport( user_response = UserResponseAggregator()
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_in_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
user_response = UserResponseAggregator() image_requester = UserImageRequester()
# Initialize the image requester without setting the participant ID yet vision_aggregator = VisionImageFrameAggregator()
image_requester = UserImageRequester()
vision_aggregator = VisionImageFrameAggregator() anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Anthropic for vision analysis @transport.event_handler("on_first_participant_joined")
anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY")) async def on_first_participant_joined(transport, participant):
await tts.say("Hi there! Feel free to ask me what I see.")
await transport.capture_participant_video(participant["id"], framerate=0)
await transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
tts = CartesiaTTSService( pipeline = Pipeline(
api_key=os.getenv("CARTESIA_API_KEY"), [
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady transport.input(),
) user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output(),
]
)
pipeline = Pipeline( task = PipelineTask(pipeline)
[
transport.input(),
stt,
user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output(),
]
)
task = PipelineTask( runner = PipelineRunner()
pipeline,
params=PipelineParams(allow_interruptions=True),
)
@transport.event_handler("on_client_connected") await runner.run(task)
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Welcome message
await tts.say("Hi there! Feel free to ask me what I see.")
# Set the participant ID in the image requester
image_requester.set_participant_id(webrtc_peer_id)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,9 +4,13 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, TranscriptionFrame from pipecat.frames.frames import Frame, TranscriptionFrame
@@ -15,12 +19,13 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper.stt import WhisperSTTService from pipecat.services.whisper.stt import WhisperSTTService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor): class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,42 +35,34 @@ class TranscriptionLogger(FrameProcessor):
print(f"Transcription: {frame.text}") print(f"Transcription: {frame.text}")
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( None,
audio_in_enabled=True, "Transcription bot",
vad_enabled=True, DailyParams(
vad_analyzer=SileroVADAnalyzer(), audio_in_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) vad_audio_passthrough=True,
),
)
stt = WhisperSTTService() stt = WhisperSTTService()
tl = TranscriptionLogger() tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl]) pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline) task = PipelineTask(pipeline)
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, TranscriptionFrame from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -16,12 +19,13 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram.stt import DeepgramSTTService, Language, LiveOptions from pipecat.services.deepgram.stt import DeepgramSTTService, Language, LiveOptions
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor): class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -31,40 +35,29 @@ class TranscriptionLogger(FrameProcessor):
print(f"Transcription: {frame.text}") print(f"Transcription: {frame.text}")
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
params=TransportParams(audio_in_enabled=True), )
)
stt = DeepgramSTTService( stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"), api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(language=Language.EN), # live_options=LiveOptions(language=Language.FR),
) )
tl = TranscriptionLogger() tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl]) pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline) task = PipelineTask(pipeline)
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, TranscriptionFrame from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -15,12 +19,13 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.gladia import GladiaSTTService from pipecat.services.gladia import GladiaSTTService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor): class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,40 +35,29 @@ class TranscriptionLogger(FrameProcessor):
print(f"Transcription: {frame.text}") print(f"Transcription: {frame.text}")
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
params=TransportParams(audio_in_enabled=True), )
)
stt = GladiaSTTService( stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"), api_key=os.getenv("GLADIA_API_KEY"),
# live_options=LiveOptions(language=Language.FR), # live_options=LiveOptions(language=Language.FR),
) )
tl = TranscriptionLogger() tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl]) pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline) task = PipelineTask(pipeline)
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, TranscriptionFrame from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.pipeline import Pipeline
@@ -15,12 +19,13 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.assemblyai.stt import AssemblyAISTTService from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor): class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection): async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,39 +35,28 @@ class TranscriptionLogger(FrameProcessor):
print(f"Transcription: {frame.text}") print(f"Transcription: {frame.text}")
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
params=TransportParams(audio_in_enabled=True), )
)
stt = AssemblyAISTTService( stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"), api_key=os.getenv("ASSEMBLYAI_API_KEY"),
) )
tl = TranscriptionLogger() tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl]) pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline) task = PipelineTask(pipeline)
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import sys
import time import time
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams from pipecat.audio.vad.vad_analyzer import VADParams
@@ -18,12 +21,13 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
STOP_SECS = 2.0 STOP_SECS = 2.0
@@ -52,48 +56,40 @@ class TranscriptionLogger(FrameProcessor):
self._last_transcription_time = time.time() self._last_transcription_time = time.time()
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( None,
audio_in_enabled=True, "Transcription bot",
vad_enabled=True, DailyParams(
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=STOP_SECS)), audio_in_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=STOP_SECS)),
) vad_audio_passthrough=True,
),
)
stt = WhisperSTTServiceMLX(model=MLXModel.LARGE_V3_TURBO) stt = WhisperSTTServiceMLX(model=MLXModel.LARGE_V3_TURBO)
tl = TranscriptionLogger() tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl]) pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask( task = PipelineTask(
pipeline, pipeline,
params=PipelineParams( params=PipelineParams(
enable_metrics=True, enable_metrics=True,
report_only_initial_ttfb=False, report_only_initial_ttfb=False,
), ),
) )
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,118 +22,106 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) # You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
# You can also register a function_name of None to get all functions weather_function = FunctionSchema(
# sent to the same callback with an additional function_name parameter. name="get_current_weather",
llm.register_function("get_current_weather", fetch_weather_from_api) description="Get the current weather",
properties={
weather_function = FunctionSchema( "location": {
name="get_current_weather", "type": "string",
description="Get the current weather", "description": "The city and state, e.g. San Francisco, CA",
properties={ },
"location": { "format": {
"type": "string", "type": "string",
"description": "The city and state, e.g. San Francisco, CA", "enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.",
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,111 +22,99 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.anthropic.llm import AnthropicLLMService from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
location = arguments["location"] location = arguments["location"]
await result_callback(f"The weather in {location} is currently 72 degrees and sunny.") await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = AnthropicLLMService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-7-sonnet-latest"
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady )
) llm.register_function("get_weather", get_weather)
llm = AnthropicLLMService( weather_function = FunctionSchema(
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-7-sonnet-latest" name="get_weather",
) description="Get the current weather",
llm.register_function("get_weather", get_weather) properties={
"location": {
weather_function = FunctionSchema( "type": "string",
name="get_weather", "description": "The city and state, e.g. San Francisco, CA",
description="Get the current weather", },
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
}, required=["location"],
required=["location"], )
) tools = ToolsSchema(standard_tools=[weather_function])
tools = ToolsSchema(standard_tools=[weather_function])
# todo: test with very short initial user message # todo: test with very short initial user message
# messages = [{"role": "system", # messages = [{"role": "system",
# "content": "You are a helpful assistant who can report the weather in any location in the universe. Respond concisely. Your response will be turned into speech so use only simple words and punctuation."}, # "content": "You are a helpful assistant who can report the weather in any location in the universe. Respond concisely. Your response will be turned into speech so use only simple words and punctuation."},
# {"role": "user", # {"role": "user",
# "content": " Start the conversation by introducing yourself."}] # "content": " Start the conversation by introducing yourself."}]
messages = [{"role": "user", "content": "Say 'hello' to start the conversation."}] messages = [{"role": "user", "content": "Say 'hello' to start the conversation."}]
context = OpenAILLMContext(messages, tools) context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context) context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline( pipeline = Pipeline(
[ [
transport.input(), # Transport user input transport.input(), # Transport user input
stt, context_aggregator.user(), # User spoken responses
context_aggregator.user(), # User spoken responses llm, # LLM
llm, # LLM tts, # TTS
tts, # TTS transport.output(), # Transport bot output
transport.output(), # Transport bot output context_aggregator.assistant(), # Assistant spoken responses and tool context
context_aggregator.assistant(), # Assistant spoken responses and tool context ]
] )
)
task = PipelineTask( task = PipelineTask(
pipeline, pipeline,
params=PipelineParams( params=PipelineParams(
allow_interruptions=True, allow_interruptions=True,
enable_metrics=True, enable_metrics=True,
), ),
) )
@transport.event_handler("on_client_connected") @transport.event_handler("on_first_participant_joined")
async def on_client_connected(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client connected") await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation. # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()]) await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") runner = PipelineRunner()
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") await runner.run(task)
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -6,9 +6,12 @@
import asyncio import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -19,16 +22,14 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.anthropic.llm import AnthropicLLMService from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Global variable to store the peer connection ID video_participant_id = None
webrtc_peer_id = None
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
@@ -38,83 +39,72 @@ async def get_weather(function_name, tool_call_id, arguments, llm, context, resu
async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
question = arguments["question"] question = arguments["question"]
logger.debug(f"Requesting image with user_id={webrtc_peer_id}, question={question}")
# Request the image frame
await llm.request_image_frame( await llm.request_image_frame(
user_id=webrtc_peer_id, user_id=video_participant_id,
function_name=function_name, function_name=function_name,
tool_call_id=tool_call_id, tool_call_id=tool_call_id,
text_content=question, text_content=question,
) )
# Wait a short time for the frame to be processed
await asyncio.sleep(0.5)
# Return a result to complete the function call async def main():
await result_callback( global llm
f"I've captured an image from your camera and I'm analyzing what you asked about: {question}"
)
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
async def run_bot(webrtc_connection: SmallWebRTCConnection): transport = DailyTransport(
global webrtc_peer_id room_url,
webrtc_peer_id = webrtc_connection.pc_id token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
transport = SmallWebRTCTransport( llm = AnthropicLLMService(
webrtc_connection=webrtc_connection, api_key=os.getenv("ANTHROPIC_API_KEY"),
params=TransportParams( model="claude-3-7-sonnet-latest",
audio_in_enabled=True, enable_prompt_caching_beta=True,
audio_out_enabled=True, )
camera_in_enabled=True, # Make sure camera input is enabled llm.register_function("get_weather", get_weather)
vad_enabled=True, llm.register_function("get_image", get_image)
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) weather_function = FunctionSchema(
name="get_weather",
tts = CartesiaTTSService( description="Get the current weather",
api_key=os.getenv("CARTESIA_API_KEY"), properties={
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady "location": {
) "type": "string",
"description": "The city and state, e.g. San Francisco, CA",
llm = AnthropicLLMService( },
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-7-sonnet-latest",
enable_prompt_caching_beta=True,
)
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
}, required=["location"],
required=["location"], )
) get_image_function = FunctionSchema(
get_image_function = FunctionSchema( name="get_image",
name="get_image", description="Get an image from the video stream.",
description="Get an image from the video stream.", properties={
properties={ "question": {
"question": { "type": "string",
"type": "string", "description": "The question that the user is asking about the image.",
"description": "The question that the user is asking about the image.", }
} },
}, required=["question"],
required=["question"], )
) tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
system_prompt = """\ # todo: test with very short initial user message
system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions. You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
Your response will be turned into speech so use only simple words and punctuation. Your response will be turned into speech so use only simple words and punctuation.
@@ -125,73 +115,63 @@ You can respond to questions about the weather using the get_weather tool.
You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \ You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
indicate you should use the get_image tool are: indicate you should use the get_image tool are:
- What do you see? - What do you see?
- What's in the video? - What's in the video?
- Can you describe the video? - Can you describe the video?
- Tell me about what you see. - Tell me about what you see.
- Tell me something interesting about what you see. - Tell me something interesting about what you see.
- What's happening in the video? - What's happening in the video?
If you need to use a tool, simply use the tool. Do not tell the user the tool you are using. Be brief and concise. If you need to use a tool, simply use the tool. Do not tell the user the tool you are using. Be brief and concise.
""" """
messages = [ messages = [
{ {
"role": "system", "role": "system",
"content": [ "content": [
{ {
"type": "text", "type": "text",
"text": system_prompt, "text": system_prompt,
} }
], ],
}, },
{"role": "user", "content": "Start the conversation by introducing yourself."}, {"role": "user", "content": "Start the conversation by introducing yourself."},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User speech to text
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses and tool context
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected: {client}") transport.input(), # Transport user input
# Kick off the conversation. context_aggregator.user(), # User speech to text
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses and tool context
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") global video_participant_id
await task.cancel() video_participant_id = participant["id"]
await transport.capture_participant_transcription(video_participant_id)
await transport.capture_participant_video(video_participant_id, framerate=0)
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task)
await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,111 +22,99 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.together.llm import TogetherLLMService from pipecat.services.together.llm import TogetherLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = TogetherLLMService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("TOGETHER_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
) )
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm = TogetherLLMService( weather_function = FunctionSchema(
api_key=os.getenv("TOGETHER_API_KEY"), name="get_current_weather",
model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", description="Get the current weather",
) properties={
# You can also register a function_name of None to get all functions "location": {
# sent to the same callback with an additional function_name parameter. "type": "string",
llm.register_function("get_current_weather", fetch_weather_from_api) "description": "The city and state, e.g. San Francisco, CA",
},
weather_function = FunctionSchema( "format": {
name="get_current_weather", "type": "string",
description="Get the current weather", "enum": ["celsius", "fahrenheit"],
properties={ "description": "The temperature unit to use. Infer this from the user's location.",
"location": { },
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask(pipeline) context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(pipeline)
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

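The Together example contains a comment saying that registering a `function_name` of `None` sends every function call to the same callback, with `function_name` passed as an extra parameter. The sketch below models that catch-all dispatch pattern without importing pipecat. `ToyFunctionRegistry`, `catch_all`, and `demo` are hypothetical names. The lookup order (exact name first, then the `None` fallback) is an assumption about the behavior, not pipecat's actual implementation. The handler signature mirrors the six-argument handlers used in these examples.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Mirrors the example handlers: (function_name, tool_call_id, args, llm, context, result_callback)
Handler = Callable[..., Awaitable[None]]


class ToyFunctionRegistry:
    """Hypothetical stand-in for an LLM service's function registry (not pipecat's class)."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Handler] = {}

    def register_function(self, function_name: Optional[str], handler: Handler) -> None:
        # A key of None acts as the catch-all handler (assumed behavior).
        self._handlers[function_name] = handler

    async def call(self, function_name: str, tool_call_id: str, args: Dict[str, Any]) -> Any:
        # Prefer an exact-name handler, then fall back to the catch-all.
        handler = self._handlers.get(function_name) or self._handlers.get(None)
        if handler is None:
            raise KeyError(f"no handler registered for {function_name!r}")
        results: list = []

        async def result_callback(result: Any) -> None:
            # Collect what the handler reports back, as the LLM service would.
            results.append(result)

        # llm and context are not modelled here, so None is passed for both.
        await handler(function_name, tool_call_id, args, None, None, result_callback)
        return results[0] if results else None


async def catch_all(function_name, tool_call_id, args, llm, context, result_callback):
    # A single callback serves every tool and branches on function_name.
    if function_name == "get_current_weather":
        await result_callback({"conditions": "nice", "temperature": "75"})
    else:
        await result_callback({"error": f"unknown function {function_name}"})


async def demo() -> None:
    registry = ToyFunctionRegistry()
    registry.register_function(None, catch_all)
    print(await registry.call("get_current_weather", "call_1", {"location": "San Francisco, CA"}))
    print(await registry.call("get_time", "call_2", {}))


if __name__ == "__main__":
    asyncio.run(demo())
```

Registering handlers per name, as the examples do, keeps each tool's logic isolated. A catch-all is mainly convenient when many tools share plumbing such as logging or a common backend.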
@@ -6,9 +6,12 @@
import asyncio import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,17 +21,15 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Global variable to store the peer connection ID video_participant_id = None
webrtc_peer_id = None
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
@@ -37,85 +38,71 @@ async def get_weather(function_name, tool_call_id, arguments, llm, context, resu
async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
logger.debug(f"!!! IN get_image {video_participant_id}, {arguments}")
question = arguments["question"] question = arguments["question"]
logger.debug(f"Requesting image with user_id={webrtc_peer_id}, question={question}")
# Request the image frame
await llm.request_image_frame( await llm.request_image_frame(
user_id=webrtc_peer_id, user_id=video_participant_id,
function_name=function_name, function_name=function_name,
tool_call_id=tool_call_id, tool_call_id=tool_call_id,
text_content=question, text_content=question,
) )
# Wait a short time for the frame to be processed
await asyncio.sleep(0.5)
# Return a result to complete the function call async def main():
await result_callback( async with aiohttp.ClientSession() as session:
f"I've captured an image from your camera and I'm analyzing what you asked about: {question}" (room_url, token) = await configure(session)
)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
async def run_bot(webrtc_connection: SmallWebRTCConnection): tts = CartesiaTTSService(
global webrtc_peer_id api_key=os.getenv("CARTESIA_API_KEY"),
webrtc_peer_id = webrtc_connection.pc_id voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
transport = SmallWebRTCTransport( weather_function = FunctionSchema(
webrtc_connection=webrtc_connection, name="get_weather",
params=TransportParams( description="Get the current weather",
audio_in_enabled=True, properties={
audio_out_enabled=True, "location": {
camera_in_enabled=True, # Make sure camera input is enabled "type": "string",
vad_enabled=True, "description": "The city and state, e.g. San Francisco, CA",
vad_analyzer=SileroVADAnalyzer(), },
vad_audio_passthrough=True, "format": {
), "type": "string",
) "enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) },
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
"format": { required=["location"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], get_image_function = FunctionSchema(
"description": "The temperature unit to use. Infer this from the user's location.", name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
}, },
}, required=["question"],
required=["location"], )
) tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
get_image_function = FunctionSchema(
name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
required=["question"],
)
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
system_prompt = """\ system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions. You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
Your response will be turned into speech so use only simple words and punctuation. Your response will be turned into speech so use only simple words and punctuation.
@@ -126,55 +113,46 @@ You can respond to questions about the weather using the get_weather tool.
You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \ You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
indicate you should use the get_image tool are: indicate you should use the get_image tool are:
- What do you see? - What do you see?
- What's in the video? - What's in the video?
- Can you describe the video? - Can you describe the video?
- Tell me about what you see. - Tell me about what you see.
- Tell me something interesting about what you see. - Tell me something interesting about what you see.
- What's happening in the video? - What's happening in the video?
""" """
messages = [ messages = [
{"role": "system", "content": system_prompt}, {"role": "system", "content": system_prompt},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask(pipeline) context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(pipeline)
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") global video_participant_id
await task.cancel() video_participant_id = participant["id"]
await transport.capture_participant_transcription(participant["id"])
await transport.capture_participant_video(video_participant_id, framerate=0)
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -6,9 +6,12 @@
import asyncio import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -19,17 +22,15 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService from pipecat.services.google.llm import GoogleLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Global variable to store the peer connection ID video_participant_id = None
webrtc_peer_id = None
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
@@ -39,85 +40,71 @@ async def get_weather(function_name, tool_call_id, arguments, llm, context, resu
async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback): async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
logger.debug(f"!!! IN get_image {video_participant_id}, {arguments}")
question = arguments["question"] question = arguments["question"]
logger.debug(f"Requesting image with user_id={webrtc_peer_id}, question={question}")
# Request the image frame
await llm.request_image_frame( await llm.request_image_frame(
user_id=webrtc_peer_id, user_id=video_participant_id,
function_name=function_name, function_name=function_name,
tool_call_id=tool_call_id, tool_call_id=tool_call_id,
text_content=question, text_content=question,
) )
# Wait a short time for the frame to be processed
await asyncio.sleep(0.5)
# Return a result to complete the function call async def main():
await result_callback( async with aiohttp.ClientSession() as session:
f"I've captured an image from your camera and I'm analyzing what you asked about: {question}" (room_url, token) = await configure(session)
)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
async def run_bot(webrtc_connection: SmallWebRTCConnection): tts = CartesiaTTSService(
global webrtc_peer_id api_key=os.getenv("CARTESIA_API_KEY"),
webrtc_peer_id = webrtc_connection.pc_id voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
logger.info(f"Starting bot with peer_id: {webrtc_peer_id}") llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
transport = SmallWebRTCTransport( weather_function = FunctionSchema(
webrtc_connection=webrtc_connection, name="get_weather",
params=TransportParams( description="Get the current weather",
audio_in_enabled=True, properties={
audio_out_enabled=True, "location": {
camera_in_enabled=True, # Make sure camera input is enabled "type": "string",
vad_enabled=True, "description": "The city and state, e.g. San Francisco, CA",
vad_analyzer=SileroVADAnalyzer(), },
vad_audio_passthrough=True, "format": {
), "type": "string",
) "enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) },
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
weather_function = FunctionSchema(
name="get_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], get_image_function = FunctionSchema(
"description": "The temperature unit to use. Infer this from the user's location.", name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
}, },
}, required=["question"],
required=["location", "format"], )
) tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
get_image_function = FunctionSchema(
name="get_image",
description="Get an image from the video stream.",
properties={
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
required=["question"],
)
tools = ToolsSchema(standard_tools=[weather_function, get_image_function])
system_prompt = """\ system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions. You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
Your response will be turned into speech so use only simple words and punctuation. Your response will be turned into speech so use only simple words and punctuation.
@@ -128,63 +115,54 @@ You can respond to questions about the weather using the get_weather tool.
You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \ You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
indicate you should use the get_image tool are: indicate you should use the get_image tool are:
- What do you see? - What do you see?
- What's in the video? - What's in the video?
- Can you describe the video? - Can you describe the video?
- Tell me about what you see. - Tell me about what you see.
- Tell me something interesting about what you see. - Tell me something interesting about what you see.
- What's happening in the video? - What's happening in the video?
""" """
messages = [ messages = [
{"role": "system", "content": system_prompt}, {"role": "system", "content": system_prompt},
{"role": "user", "content": "Say hello."}, {"role": "user", "content": "Say hello."},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected: {client}") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") global video_participant_id
await task.cancel() video_participant_id = participant["id"]
await transport.capture_participant_transcription(participant["id"])
await transport.capture_participant_video(video_participant_id, framerate=0)
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
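Every example in this diff registers a tool handler with `llm.register_function(name, handler)`. The handler has the signature `(function_name, tool_call_id, args, llm, context, result_callback)` and reports its result by awaiting `result_callback(...)` instead of returning it. The sketch below is a minimal, self-contained illustration of that callback pattern. `ToyFunctionRegistry` is a hypothetical stand-in written for this note, not pipecat's implementation. It passes `None` for `llm` and `context`, which is why the handler skips the `TTSSpeakFrame` push used in some of the examples. It also models the "register a function_name of None to get all functions" comment as a catch-all entry.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical handler type mirroring the examples' signature:
# (function_name, tool_call_id, args, llm, context, result_callback)
Handler = Callable[..., Awaitable[None]]


class ToyFunctionRegistry:
    """Hypothetical stand-in for an LLM service's function registry (not pipecat's API)."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Handler] = {}

    def register_function(self, function_name: Optional[str], handler: Handler) -> None:
        # A name of None acts as a catch-all for functions without their own handler.
        self._handlers[function_name] = handler

    async def call(self, function_name: str, tool_call_id: str, args: Dict[str, Any]) -> Any:
        handler = self._handlers.get(function_name, self._handlers.get(None))
        if handler is None:
            raise KeyError(f"no handler registered for {function_name!r}")
        future: asyncio.Future = asyncio.get_running_loop().create_future()

        async def result_callback(result: Any) -> None:
            # The handler delivers its result here instead of returning it.
            future.set_result(result)

        # llm and context are None in this toy; a real service passes its own objects.
        await handler(function_name, tool_call_id, args, None, None, result_callback)
        # Note: this waits forever if the handler never calls result_callback.
        return await future


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    # Canned response, the same one the examples in this diff return.
    await result_callback({"conditions": "nice", "temperature": "75"})


async def demo() -> Any:
    registry = ToyFunctionRegistry()
    registry.register_function("get_current_weather", fetch_weather_from_api)
    return await registry.call("get_current_weather", "call_1", {"location": "San Francisco, CA"})


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Routing results through a callback rather than a return value lets a handler do other work first, such as speaking a "Let me check on that." filler, before it hands the result back.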

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -20,113 +24,105 @@ from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.groq.llm import GroqLLMService from pipecat.services.groq.llm import GroqLLMService
from pipecat.services.groq.stt import GroqSTTService from pipecat.services.groq.stt import GroqSTTService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), vad_enabled=True,
vad_audio_passthrough=True, vad_analyzer=SileroVADAnalyzer(),
), vad_audio_passthrough=True,
) ),
)
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"), model="distil-whisper-large-v3-en") stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"), model="distil-whisper-large-v3-en")
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile") llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
# You can also register a function_name of None to get all functions # You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter. # sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api) llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema( weather_function = FunctionSchema(
name="get_current_weather", name="get_current_weather",
description="Get the current weather", description="Get the current weather",
properties={ properties={
"location": { "location": {
"type": "string", "type": "string",
"description": "The city and state, e.g. San Francisco, CA", "description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. stt,
await task.queue_frames([context_aggregator.user().get_context_frame()]) context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
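The `get_current_weather` schemas in these examples differ in one detail. The Groq example declares `required=["location"]`, while the Grok, Azure and Fireworks examples declare `required=["location", "format"]`. The sketch below shows what that difference means for a tool call's arguments. It uses a plain-dict mirror of the schema's `properties` and a hypothetical `validate_args` helper written for this note. It does not reproduce `FunctionSchema` or any pipecat validation logic; it only checks required keys, string types and `enum` membership.

```python
from typing import Any, Dict, List

# Plain-dict mirror of the get_current_weather properties used in these examples.
WEATHER_PROPERTIES: Dict[str, Dict[str, Any]] = {
    "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
    "format": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "The temperature unit to use. Infer this from the user's location.",
    },
}


def validate_args(
    args: Dict[str, Any], properties: Dict[str, Dict[str, Any]], required: List[str]
) -> List[str]:
    """Hypothetical helper: return problems with a tool call's arguments (empty means OK)."""
    problems: List[str] = []
    for name in required:
        if name not in args:
            problems.append(f"missing required argument {name!r}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument {name!r}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name!r} should be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name!r} must be one of {spec['enum']}")
    return problems


if __name__ == "__main__":
    call = {"location": "Austin, TX"}
    print(validate_args(call, WEATHER_PROPERTIES, ["location"]))            # Groq-style schema
    print(validate_args(call, WEATHER_PROPERTIES, ["location", "format"]))  # other examples
```

With only `location` required, a call that omits `format` passes. With both required, the same call is reported as missing `format`. The prompt text "Infer this from the user's location" is then what steers the model to supply a unit.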

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -17,114 +21,102 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.grok.llm import GrokLLMService from pipecat.services.grok.llm import GrokLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = GrokLLMService(api_key=os.getenv("GROK_API_KEY"))
api_key=os.getenv("CARTESIA_API_KEY"), # You can also register a function_name of None to get all functions
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady # sent to the same callback with an additional function_name parameter.
) llm.register_function("get_current_weather", fetch_weather_from_api)
llm = GrokLLMService(api_key=os.getenv("GROK_API_KEY")) weather_function = FunctionSchema(
# You can also register a function_name of None to get all functions name="get_current_weather",
# sent to the same callback with an additional function_name parameter. description="Get the current weather",
llm.register_function("get_current_weather", fetch_weather_from_api) properties={
"location": {
weather_function = FunctionSchema( "type": "string",
name="get_current_weather", "description": "The city and state, e.g. San Francisco, CA",
description="Get the current weather", },
properties={ "format": {
"location": { "type": "string",
"type": "string", "enum": ["celsius", "fahrenheit"],
"description": "The city and state, e.g. San Francisco, CA", "description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -19,118 +23,106 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.azure.llm import AzureLLMService from pipecat.services.azure.llm import AzureLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = AzureLLMService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
) model=os.getenv("AZURE_CHATGPT_MODEL"),
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm = AzureLLMService( weather_function = FunctionSchema(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"), name="get_current_weather",
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"), description="Get the current weather",
model=os.getenv("AZURE_CHATGPT_MODEL"), properties={
) "location": {
# You can also register a function_name of None to get all functions "type": "string",
# sent to the same callback with an additional function_name parameter. "description": "The city and state, e.g. San Francisco, CA",
llm.register_function("get_current_weather", fetch_weather_from_api) },
"format": {
weather_function = FunctionSchema( "type": "string",
name="get_current_weather", "enum": ["celsius", "fahrenheit"],
description="Get the current weather", "description": "The temperature unit to use. Infer this from the user's location.",
properties={ },
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,118 +22,106 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.fireworks.llm import FireworksLLMService from pipecat.services.fireworks.llm import FireworksLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = FireworksLLMService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("FIREWORKS_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady model="accounts/fireworks/models/llama-v3p1-405b-instruct",
) )
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm = FireworksLLMService( weather_function = FunctionSchema(
api_key=os.getenv("FIREWORKS_API_KEY"), name="get_current_weather",
model="accounts/fireworks/models/llama-v3p1-405b-instruct", description="Get the current weather",
) properties={
# You can also register a function_name of None to get all functions "location": {
# sent to the same callback with an additional function_name parameter. "type": "string",
llm.register_function("get_current_weather", fetch_weather_from_api) "description": "The city and state, e.g. San Francisco, CA",
},
weather_function = FunctionSchema( "format": {
name="get_current_weather", "type": "string",
description="Get the current weather", "enum": ["celsius", "fahrenheit"],
properties={ "description": "The temperature unit to use. Infer this from the user's location.",
"location": { },
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,10 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,116 +22,106 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.nim.llm import NimLLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+

 async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
     await llm.push_frame(TTSSpeakFrame("Let me check on that."))
     await result_callback({"conditions": "nice", "temperature": "75"})


-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-        # text_filters=[MarkdownTextFilter()],
-    )
-
-    llm = NimLLMService(api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.3-70b-instruct")
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-
-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            llm,
-            tts,
-            transport.output(),
-            context_aggregator.assistant(),
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            # text_filters=[MarkdownTextFilter()],
+        )
+
+        llm = NimLLMService(
+            api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.3-70b-instruct"
+        )
+        # You can also register a function_name of None to get all functions
+        # sent to the same callback with an additional function_name parameter.
+        llm.register_function("get_current_weather", fetch_weather_from_api)
+
+        weather_function = FunctionSchema(
+            name="get_current_weather",
+            description="Get the current weather",
+            properties={
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "format": {
+                    "type": "string",
+                    "enum": ["celsius", "fahrenheit"],
+                    "description": "The temperature unit to use. Infer this from the user's location.",
+                },
+            },
+            required=["location", "format"],
+        )
+        tools = ToolsSchema(standard_tools=[weather_function])
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

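The comment above `llm.register_function(...)` says a `function_name` of `None` routes all functions to one callback, and every handler reports its result through `result_callback` rather than a return value. A toy dispatcher showing that shape; `ToyLLM` and `call_function` are hypothetical, and preferring an exact-name handler over the `None` entry is an assumption of this sketch, not confirmed pipecat behavior:

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Handlers follow the signature of fetch_weather_from_api in the diff above.
Handler = Callable[..., Awaitable[None]]


class ToyLLM:
    """Hypothetical registry: exact function names first, then a catch-all under None."""

    def __init__(self) -> None:
        self._functions: dict[Optional[str], Handler] = {}

    def register_function(self, function_name: Optional[str], handler: Handler) -> None:
        self._functions[function_name] = handler

    async def call_function(self, function_name: str, tool_call_id: str, args: dict) -> Any:
        handler = self._functions.get(function_name) or self._functions.get(None)
        if handler is None:
            raise KeyError(f"no handler registered for {function_name!r}")

        results: list[Any] = []

        async def result_callback(result: Any) -> None:
            # The handler reports its result here instead of returning it.
            results.append(result)

        # This toy model has no conversation context, so None is passed for `context`.
        await handler(function_name, tool_call_id, args, self, None, result_callback)
        return results[-1] if results else None


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def catch_all(function_name, tool_call_id, args, llm, context, result_callback):
    # A None registration sees calls for many names, so it reports which one it got.
    await result_callback({"unhandled": function_name})


async def demo() -> tuple[Any, Any]:
    llm = ToyLLM()
    llm.register_function("get_current_weather", fetch_weather_from_api)
    llm.register_function(None, catch_all)
    weather = await llm.call_function("get_current_weather", "call_1", {"location": "San Francisco, CA"})
    other = await llm.call_function("get_time", "call_2", {})
    return weather, other


if __name__ == "__main__":
    print(asyncio.run(demo()))
```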
@@ -4,10 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -19,66 +23,66 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.cerebras.llm import CerebrasLLMService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+

 async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
     await llm.push_frame(TTSSpeakFrame("Let me check on that."))
     await result_callback({"conditions": "nice", "temperature": "75"})


-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = CerebrasLLMService(api_key=os.getenv("CEREBRAS_API_KEY"), model="llama-3.3-70b")
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-
-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-    messages = [
-        {
-            "role": "system",
-            "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        )
+
+        llm = CerebrasLLMService(api_key=os.getenv("CEREBRAS_API_KEY"), model="llama-3.3-70b")
+        # You can also register a function_name of None to get all functions
+        # sent to the same callback with an additional function_name parameter.
+        llm.register_function("get_current_weather", fetch_weather_from_api)
+
+        weather_function = FunctionSchema(
+            name="get_current_weather",
+            description="Get the current weather",
+            properties={
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "format": {
+                    "type": "string",
+                    "enum": ["celsius", "fahrenheit"],
+                    "description": "The temperature unit to use. Infer this from the user's location.",
+                },
+            },
+            required=["location", "format"],
+        )
+        tools = ToolsSchema(standard_tools=[weather_function])
+        messages = [
+            {
+                "role": "system",
+                "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.

 You have one functions available:

@@ -88,56 +92,44 @@ Infer whether to use Fahrenheit or Celsius automatically based on the location,
 
 Start by asking me for my location. Then, use 'get_weather_current' to give me a forecast.

 Respond to what the user said in a creative and helpful way.""",
-        },
-    ]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            llm,
-            tts,
-            transport.output(),
-            context_aggregator.assistant(),
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+            },
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

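The `FunctionSchema` in these examples describes the weather tool once, and `ToolsSchema(standard_tools=[...])` hands it to whichever LLM service the file uses. For OpenAI-compatible services that description plausibly ends up in the Chat Completions `tools` format. A sketch of such a conversion plus a required-argument check, assuming a hypothetical `ToyFunctionSchema` dataclass and hypothetical `to_openai_tool` / `missing_required` helpers (this is not pipecat's adapter code):

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyFunctionSchema:
    """Hypothetical mirror of the FunctionSchema fields used in the diff."""

    name: str
    description: str
    properties: dict[str, Any]
    required: list[str] = field(default_factory=list)


def to_openai_tool(schema: ToyFunctionSchema) -> dict[str, Any]:
    # OpenAI-style function tools wrap a JSON Schema "object" under "parameters".
    return {
        "type": "function",
        "function": {
            "name": schema.name,
            "description": schema.description,
            "parameters": {
                "type": "object",
                "properties": schema.properties,
                "required": list(schema.required),
            },
        },
    }


def missing_required(schema: ToyFunctionSchema, args: dict[str, Any]) -> list[str]:
    # Names listed in `required` that a model-produced tool call left out.
    return [name for name in schema.required if name not in args]


weather_function = ToyFunctionSchema(
    name="get_current_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location.",
        },
    },
    required=["location", "format"],
)

if __name__ == "__main__":
    print(json.dumps(to_openai_tool(weather_function), indent=2))
    print(missing_required(weather_function, {"location": "San Francisco, CA"}))
```

Because `format` is in `required`, a call that only supplies `location` is flagged; the system prompt's instruction to infer the unit from the location is what is expected to prevent that.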
@@ -4,10 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,67 +22,67 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.deepseek.llm import DeepSeekLLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+

 async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
     await llm.push_frame(TTSSpeakFrame("Let me check on that."))
     await result_callback({"conditions": "nice", "temperature": "75"})


-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = DeepSeekLLMService(api_key=os.getenv("DEEPSEEK_API_KEY"), model="deepseek-chat")
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-
-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-    messages = [
-        {
-            "role": "system",
-            "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        )
+
+        llm = DeepSeekLLMService(api_key=os.getenv("DEEPSEEK_API_KEY"), model="deepseek-chat")
+        # You can also register a function_name of None to get all functions
+        # sent to the same callback with an additional function_name parameter.
+        llm.register_function("get_current_weather", fetch_weather_from_api)
+
+        weather_function = FunctionSchema(
+            name="get_current_weather",
+            description="Get the current weather",
+            properties={
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "format": {
+                    "type": "string",
+                    "enum": ["celsius", "fahrenheit"],
+                    "description": "The temperature unit to use. Infer this from the user's location.",
+                },
+            },
+            required=["location", "format"],
+        )
+        tools = ToolsSchema(standard_tools=[weather_function])
+        messages = [
+            {
+                "role": "system",
+                "content": """You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.

 You have one functions available:

@@ -88,56 +92,44 @@ Infer whether to use Fahrenheit or Celsius automatically based on the location,
 
 Start by asking me for my location. Then, use 'get_weather_current' to give me a forecast.

 Respond to what the user said in a creative and helpful way.""",
-        },
-    ]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            llm,
-            tts,
-            transport.output(),
-            context_aggregator.assistant(),
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+            },
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

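In the old layout the transport delivers raw audio, so `stt = DeepgramSTTService(...)` sits between `transport.input()` and `context_aggregator.user()` in the `Pipeline([...])` list. The Daily version drops `stt`, sets `transcription_enabled=True`, and calls `capture_participant_transcription`, which suggests transcripts arrive from the transport itself. A toy model of frames travelling through processors in list order; `Frame`, `toy_stt`, `toy_llm`, and `run_pipeline` are hypothetical, and real pipecat frames and processors are far richer:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Frame:
    kind: str  # e.g. "audio", "transcription", "text"
    payload: str


Processor = Callable[[Frame], Awaitable[Frame]]


async def toy_stt(frame: Frame) -> Frame:
    # Hypothetical STT: converts audio frames to transcription frames, passes others through.
    if frame.kind == "audio":
        return Frame("transcription", f"<transcript of {frame.payload}>")
    return frame


async def toy_llm(frame: Frame) -> Frame:
    # Hypothetical LLM step: only reacts to transcription frames.
    if frame.kind == "transcription":
        return Frame("text", f"reply to {frame.payload}")
    return frame


async def run_pipeline(processors: list[Processor], frame: Frame) -> Frame:
    # Each frame visits the processors in list order, like the Pipeline([...]) lists above.
    for process in processors:
        frame = await process(frame)
    return frame


async def demo() -> tuple[Frame, Frame]:
    # Old layout: raw audio in, so an STT stage must come before the LLM.
    with_stt = await run_pipeline([toy_stt, toy_llm], Frame("audio", "hello.wav"))
    # New layout (assumed): the transport already yields transcriptions, so no STT stage.
    without_stt = await run_pipeline([toy_llm], Frame("transcription", "hello"))
    return with_stt, without_stt


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If `toy_stt` were removed from the old layout, the audio frame would pass through `toy_llm` untouched, which is why the STT stage can only be dropped when something upstream already produces transcriptions.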
@@ -4,10 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.adapters.schemas.function_schema import FunctionSchema
 from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,120 +22,108 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.services.azure.tts import AzureTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openrouter.llm import OpenRouterLLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+

 async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
     await llm.push_frame(TTSSpeakFrame("Let me check on that."))
     await result_callback({"conditions": "nice", "temperature": "75"})


-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = AzureTTSService(
-        api_key=os.getenv("AZURE_API_KEY"),
-        region="eastus",
-        voice="en-US-JennyNeural",
-        params=AzureTTSService.InputParams(language="en-US", rate="1.1", style="cheerful"),
-    )
-
-    llm = OpenRouterLLMService(
-        api_key=os.getenv("OPENROUTER_API_KEY"), model="openai/gpt-4o-2024-11-20"
-    )
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-
-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            llm,
-            tts,
-            transport.output(),
-            context_aggregator.assistant(),
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = AzureTTSService(
+            api_key=os.getenv("AZURE_API_KEY"),
+            region="eastus",
+            voice="en-US-JennyNeural",
+            params=AzureTTSService.InputParams(language="en-US", rate="1.1", style="cheerful"),
+        )
+
+        llm = OpenRouterLLMService(
+            api_key=os.getenv("OPENROUTER_API_KEY"), model="openai/gpt-4o-2024-11-20"
+        )
+        # You can also register a function_name of None to get all functions
+        # sent to the same callback with an additional function_name parameter.
+        llm.register_function("get_current_weather", fetch_weather_from_api)
+
+        weather_function = FunctionSchema(
+            name="get_current_weather",
+            description="Get the current weather",
+            properties={
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "format": {
+                    "type": "string",
+                    "enum": ["celsius", "fahrenheit"],
+                    "description": "The temperature unit to use. Infer this from the user's location.",
+                },
+            },
+            required=["location", "format"],
+        )
+        tools = ToolsSchema(standard_tools=[weather_function])
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

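The old version's `on_client_closed` handler stops the pipeline with `await task.cancel()`, after which `await runner.run(task)` can return. The same shutdown idea expressed with plain `asyncio`; note this sketch uses `asyncio.Task.cancel()` (an ordinary method) rather than pipecat's `PipelineTask.cancel()` (awaited in the diff), and `fake_pipeline` is a hypothetical stand-in for the running pipeline:

```python
import asyncio


async def fake_pipeline(log: list[str]) -> None:
    """Hypothetical stand-in for a long-running pipeline task."""
    try:
        while True:
            await asyncio.sleep(0.01)  # pretend to keep processing frames
    except asyncio.CancelledError:
        log.append("pipeline cancelled")
        raise  # re-raise so the task is actually marked as cancelled


async def demo() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(fake_pipeline(log))

    async def on_client_closed() -> None:
        # Mirrors the intent of the old handler: stop the pipeline when the client goes away.
        log.append("client closed")
        task.cancel()

    await asyncio.sleep(0.05)  # let the pipeline run briefly
    await on_client_closed()
    try:
        await task  # plays the role of `await runner.run(task)` returning
    except asyncio.CancelledError:
        log.append("runner returned")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Re-raising `CancelledError` inside the pipeline coroutine matters: swallowing it would leave the task looking like it finished normally.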
@@ -11,10 +11,14 @@ currently support function calling. The example shows basic chat completion func
 using Perplexity's API while maintaining compatibility with the OpenAI interface.
 """

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
@@ -22,91 +26,79 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.perplexity.llm import PerplexityLLMService from pipecat.services.perplexity.llm import PerplexityLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport( async def main():
webrtc_connection=webrtc_connection, async with aiohttp.ClientSession() as session:
params=TransportParams( (room_url, token) = await configure(session)
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService( tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
) )
llm = PerplexityLLMService(api_key=os.getenv("PERPLEXITY_API_KEY"), model="sonar") llm = PerplexityLLMService(api_key=os.getenv("PERPLEXITY_API_KEY"), model="sonar")
messages = [ messages = [
{ {
"role": "user", "role": "user",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
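Both sides of this diff attach coroutines with `@transport.event_handler("...")`. The old side listens for `on_client_connected` and `on_client_closed`, and the new side listens for `on_first_participant_joined`. The transport then awaits those coroutines when the named event fires. Below is a minimal, hypothetical sketch of that decorator-registry pattern in plain Python. It assumes handlers receive the emitter as their first argument, as `transport` does above. `EventEmitter` and `emit` are illustrative names only, not pipecat's implementation.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class EventEmitter:
    """Hypothetical stand-in for a transport that dispatches named events."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Decorator factory: @emitter.event_handler("name") registers the coroutine.
        def decorator(handler: Handler) -> Handler:
            self._handlers[event_name].append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Await every handler registered for this event, in registration order,
        # passing the emitter itself first (mirroring `transport` in the diff).
        for handler in self._handlers[event_name]:
            await handler(self, *args)


async def demo() -> list[str]:
    emitter = EventEmitter()
    log: list[str] = []

    @emitter.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        log.append(f"joined:{participant['id']}")

    await emitter.emit("on_first_participant_joined", {"id": "abc"})
    await emitter.emit("on_client_closed", {"id": "abc"})  # no handlers: no-op
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, an event with no registered handlers is silently ignored. That is one plausible design choice, not a claim about how pipecat handles unknown events.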

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -17,116 +21,104 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.google.llm_openai import GoogleLLMOpenAIBetaService from pipecat.services.google.llm_openai import GoogleLLMOpenAIBetaService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
tts = ElevenLabsTTSService( llm = GoogleLLMOpenAIBetaService(api_key=os.getenv("GEMINI_API_KEY"))
api_key=os.getenv("ELEVENLABS_API_KEY", ""), # You can also register a function_name of None to get all functions
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""), # sent to the same callback with an additional function_name parameter.
) llm.register_function("get_current_weather", fetch_weather_from_api)
llm = GoogleLLMOpenAIBetaService(api_key=os.getenv("GEMINI_API_KEY")) weather_function = FunctionSchema(
# You can also register a function_name of None to get all functions name="get_current_weather",
# sent to the same callback with an additional function_name parameter. description="Get the current weather",
llm.register_function("get_current_weather", fetch_weather_from_api) properties={
"location": {
weather_function = FunctionSchema( "type": "string",
name="get_current_weather", "description": "The city and state, e.g. San Francisco, CA",
description="Get the current weather", },
properties={ "format": {
"location": { "type": "string",
"type": "string", "enum": ["celsius", "fahrenheit"],
"description": "The city and state, e.g. San Francisco, CA", "description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.", messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
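The comment in this file says a handler registered under a `function_name` of `None` receives every function call, and uses its `function_name` parameter to tell the calls apart. The following is a hypothetical plain-Python sketch of that dispatch rule. It assumes an exact-name registration takes precedence over the `None` catch-all, and that the handler reports its result through `result_callback`, as `fetch_weather_from_api` does. `FunctionRegistry` and `call` are illustrative names, not pipecat's API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Handler signature modeled on fetch_weather_from_api in the diff.
Handler = Callable[..., Awaitable[None]]


class FunctionRegistry:
    """Hypothetical registry: exact-name handlers first, then a catch-all under None."""

    def __init__(self) -> None:
        self._handlers: dict[Optional[str], Handler] = {}

    def register_function(self, function_name: Optional[str], handler: Handler) -> None:
        self._handlers[function_name] = handler

    async def call(self, function_name: str, tool_call_id: str, args: dict) -> Any:
        # Assumption: an exact-name registration wins over the None catch-all.
        handler = self._handlers.get(function_name) or self._handlers.get(None)
        if handler is None:
            raise KeyError(f"no handler registered for {function_name!r}")

        results: list[Any] = []

        async def result_callback(result: Any) -> None:
            results.append(result)

        # The llm and context arguments are passed as None in this sketch.
        await handler(function_name, tool_call_id, args, None, None, result_callback)
        return results[0] if results else None


async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "nice", "temperature": "75"})


async def catch_all(function_name, tool_call_id, args, llm, context, result_callback):
    # The catch-all learns which tool was requested from function_name.
    await result_callback({"unhandled": function_name})


async def demo() -> tuple[Any, Any]:
    registry = FunctionRegistry()
    registry.register_function("get_current_weather", fetch_weather)
    registry.register_function(None, catch_all)
    weather = await registry.call(
        "get_current_weather", "call_1", {"location": "San Francisco, CA", "format": "fahrenheit"}
    )
    other = await registry.call("get_time", "call_2", {})
    return weather, other


if __name__ == "__main__":
    print(asyncio.run(demo()))
```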

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -17,122 +21,110 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.google.llm_vertex import GoogleVertexLLMService from pipecat.services.google.llm_vertex import GoogleVertexLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = GoogleVertexLLMService(
# credentials="<json-credentials>",
params=GoogleVertexLLMService.InputParams(
project_id="<google-project-id>",
) )
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema( tts = ElevenLabsTTSService(
name="get_current_weather", api_key=os.getenv("ELEVENLABS_API_KEY", ""),
description="Get the current weather", voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
properties={ )
"location": {
"type": "string", llm = GoogleVertexLLMService(
"description": "The city and state, e.g. San Francisco, CA", # credentials="<json-credentials>",
params=GoogleVertexLLMService.InputParams(
project_id="<google-project-id>",
)
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.",
messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "user",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
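These examples declare `get_current_weather` with the provider-neutral `FunctionSchema`/`ToolsSchema` pair. The voice-switching file later in this diff instead writes its `switch_voice` tool directly as an OpenAI `ChatCompletionToolParam` dict. The sketch below shows how the same name, description, properties, and required list map onto that OpenAI layout. `to_openai_tool` is a hypothetical helper, not pipecat's adapter. The layout is taken from the `switch_voice` declaration in this diff.

```python
import json
from typing import Any


def to_openai_tool(
    name: str, description: str, properties: dict[str, Any], required: list[str]
) -> dict[str, Any]:
    """Hypothetical helper: wrap a function description in the OpenAI tool layout."""
    missing = [key for key in required if key not in properties]
    if missing:
        # Every required argument must also be declared under properties.
        raise ValueError(f"required keys not declared in properties: {missing}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


weather_tool = to_openai_tool(
    name="get_current_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location.",
        },
    },
    required=["location", "format"],
)

print(json.dumps(weather_tool, indent=2))
```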

@@ -4,10 +4,14 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from runner import configure
from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema
@@ -18,118 +22,106 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.qwen.llm import QwenLLMService from pipecat.services.qwen.llm import QwenLLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback): async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await llm.push_frame(TTSSpeakFrame("Let me check on that.")) await llm.push_frame(TTSSpeakFrame("Let me check on that."))
await result_callback({"conditions": "nice", "temperature": "75"}) await result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Respond bot",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
) ),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY")) tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaTTSService( llm = QwenLLMService(api_key=os.getenv("QWEN_API_KEY"), model="qwen2.5-72b-instruct")
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = QwenLLMService(api_key=os.getenv("QWEN_API_KEY"), model="qwen2.5-72b-instruct") # You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
# You can also register a function_name of None to get all functions weather_function = FunctionSchema(
# sent to the same callback with an additional function_name parameter. name="get_current_weather",
llm.register_function("get_current_weather", fetch_weather_from_api) description="Get the current weather",
properties={
weather_function = FunctionSchema( "location": {
name="get_current_weather", "type": "string",
description="Get the current weather", "description": "The city and state, e.g. San Francisco, CA",
properties={ },
"location": { "format": {
"type": "string", "type": "string",
"description": "The city and state, e.g. San Francisco, CA", "enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
}, },
"format": { required=["location", "format"],
"type": "string", )
"enum": ["celsius", "fahrenheit"], tools = ToolsSchema(standard_tools=[weather_function])
"description": "The temperature unit to use. Infer this from the user's location.",
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
}, },
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
] ]
)
task = PipelineTask( context = OpenAILLMContext(messages, tools)
pipeline, context_aggregator = llm.create_context_aggregator(context)
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected") pipeline = Pipeline(
async def on_client_connected(transport, client): [
logger.info(f"Client connected") transport.input(),
# Kick off the conversation. context_aggregator.user(),
await task.queue_frames([context_aggregator.user().get_context_frame()]) llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
@transport.event_handler("on_client_disconnected") task = PipelineTask(
async def on_client_disconnected(transport, client): pipeline,
logger.info(f"Client disconnected") params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()

@@ -4,11 +4,15 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from openai.types.chat import ChatCompletionToolParam from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.parallel_pipeline import ParallelPipeline from pipecat.pipeline.parallel_pipeline import ParallelPipeline
@@ -18,14 +22,13 @@ from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_voice = "News Lady" current_voice = "News Lady"
@@ -52,117 +55,105 @@ async def barbershop_man_filter(frame) -> bool:
return current_voice == "Barbershop Man" return current_voice == "Barbershop Man"
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Pipecat",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True,
vad_audio_passthrough=True, vad_enabled=True,
), vad_analyzer=SileroVADAnalyzer(),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm.register_function("switch_voice", switch_voice)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_voice",
"description": "Switch your voice only when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"voice": {
"type": "string",
"description": "The voice the user wants you to use",
},
},
"required": ["voice"],
},
},
)
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (one of the following voices)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[
FunctionFilter(british_lady_filter),
british_lady,
], # British Reading Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
), ),
transport.output(), # Transport bot output )
context_aggregator.assistant(), # Assistant spoken responses
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("switch_voice", switch_voice)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_voice",
"description": "Switch your voice only when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"voice": {
"type": "string",
"description": "The voice the user wants you to use",
},
},
"required": ["voice"],
},
},
)
] ]
) messages = [
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append(
{ {
"role": "system", "role": "system",
"content": f"Please introduce yourself to the user and let them know the voices you can do. Your initial responses should be as if you were a {current_voice}.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
} },
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (one of the following voices)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[
FunctionFilter(british_lady_filter),
british_lady,
], # British Reading Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
),
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
) )
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the voices you can do. Your initial responses should be as if you were a {current_voice}.",
}
)
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
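The voice-switching example routes LLM output through a `ParallelPipeline` with one branch per voice. Each branch is gated by a `FunctionFilter` whose async predicate compares the module-level `current_voice` against that branch's voice. Only `barbershop_man_filter` is visible in this hunk, and the body of `switch_voice` falls outside it. The sketch below therefore assumes the other two filters follow the same shape and that `switch_voice` reassigns `current_voice`. It uses plain strings and lists in place of pipecat frames and TTS services, so it illustrates the gating idea only. It does not reproduce `ParallelPipeline` or `FunctionFilter`.

```python
import asyncio
from typing import Awaitable, Callable

# Module-level state, as in the diff; switch_voice presumably reassigns it.
current_voice = "News Lady"


async def news_lady_filter(frame: str) -> bool:
    return current_voice == "News Lady"


async def british_lady_filter(frame: str) -> bool:
    return current_voice == "British Lady"


async def barbershop_man_filter(frame: str) -> bool:
    return current_voice == "Barbershop Man"


# Each branch pairs an async filter with a list standing in for a TTS service.
Branch = tuple[Callable[[str], Awaitable[bool]], list[str]]


async def route(frame: str, branches: dict[str, Branch]) -> None:
    # Offer the frame to every branch; only branches whose filter passes receive it.
    for predicate, sink in branches.values():
        if await predicate(frame):
            sink.append(frame)


async def demo() -> dict[str, list[str]]:
    global current_voice
    current_voice = "News Lady"  # start from the default voice on every run
    branches: dict[str, Branch] = {
        "News Lady": (news_lady_filter, []),
        "British Lady": (british_lady_filter, []),
        "Barbershop Man": (barbershop_man_filter, []),
    }
    await route("Hello there", branches)
    current_voice = "Barbershop Man"  # what a switch_voice call would change
    await route("Howdy", branches)
    return {name: sink for name, (_, sink) in branches.items()}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the filters read shared state at the moment each frame passes through, a voice switch affects only output produced after the change.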

@@ -4,12 +4,16 @@
# SPDX-License-Identifier: BSD 2-Clause License # SPDX-License-Identifier: BSD 2-Clause License
# #
import asyncio
import os import os
import sys
import aiohttp
from deepgram import LiveOptions from deepgram import LiveOptions
from dotenv import load_dotenv from dotenv import load_dotenv
from loguru import logger from loguru import logger
from openai.types.chat import ChatCompletionToolParam from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.parallel_pipeline import ParallelPipeline from pipecat.pipeline.parallel_pipeline import ParallelPipeline
@@ -21,12 +25,12 @@ from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia.tts import CartesiaTTSService from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True) load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_language = "English" current_language = "English"
@@ -45,110 +49,101 @@ async def spanish_filter(frame) -> bool:
return current_language == "Spanish" return current_language == "Spanish"
async def run_bot(webrtc_connection: SmallWebRTCConnection): async def main():
logger.info(f"Starting bot") async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = SmallWebRTCTransport( transport = DailyTransport(
webrtc_connection=webrtc_connection, room_url,
params=TransportParams( token,
audio_in_enabled=True, "Pipecat",
audio_out_enabled=True, DailyParams(
vad_enabled=True, audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(), vad_enabled=True,
vad_audio_passthrough=True, vad_analyzer=SileroVADAnalyzer(),
), vad_audio_passthrough=True,
)
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"), live_options=LiveOptions(language="multi")
)
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
spanish_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="d4db5fb9-f44b-4bd1-85fa-192e0f0d75f9", # Spanish-speaking Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm.register_function("switch_language", switch_language)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_language",
"description": "Switch to another language when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "The language the user wants you to speak",
},
},
"required": ["language"],
},
},
)
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can speak the following languages: 'English' and 'Spanish'.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (bot will speak the chosen language)
[FunctionFilter(english_filter), english_tts], # English
[FunctionFilter(spanish_filter), spanish_tts], # Spanish
), ),
transport.output(), # Transport bot output )
context_aggregator.assistant(), # Assistant spoken responses
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"), live_options=LiveOptions(language="multi")
)
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
spanish_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="846d6cb0-2301-48b6-9683-48f5618ea2f6", # Spanish-speaking Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("switch_language", switch_language)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_language",
"description": "Switch to another language when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "The language the user wants you to speak",
},
},
"required": ["language"],
},
},
)
] ]
) messages = [
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append(
{ {
"role": "system", "role": "system",
"content": f"Please introduce yourself to the user and let them know the languages you speak. Your initial responses should be in {current_language}.", "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can speak the following languages: 'English' and 'Spanish'.",
} },
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (bot will speak the chosen language)
[FunctionFilter(english_filter), english_tts], # English
[FunctionFilter(spanish_filter), spanish_tts], # Spanish
),
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
) )
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected") task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed") @transport.event_handler("on_first_participant_joined")
async def on_client_closed(transport, client): async def on_first_participant_joined(transport, participant):
logger.info(f"Client closed connection") await transport.capture_participant_transcription(participant["id"])
await task.cancel() # Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the languages you speak. Your initial responses should be in {current_language}.",
}
)
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner(handle_sigint=False) runner = PipelineRunner()
await runner.run(task) await runner.run(task)
if __name__ == "__main__": if __name__ == "__main__":
from run import main asyncio.run(main())
main()
