Compare commits

..

1 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
41695806e8 process SystemFrames in a queue as high priority frames 2025-05-07 17:05:35 -07:00
232 changed files with 1280 additions and 14585 deletions

View File

@@ -5,147 +5,6 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Added `SarvamTTSService`, which implements Sarvam AI's TTS API:
https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.
- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to
allow mangaging observers at runtime. This is useful for cases where the task
is passed around to other code components that might want to observe the
pipeline dynamically.
- Added `user_id` field to `TranscriptionMessage`. This allows identifying the
user in a multi-user scenario. Note that this requires that
`TranscriptionFrame` has the `user_id` properly set.
- Added new `PipelineTask` event handlers `on_pipeline_started`,
`on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which
correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame`
respectively.
- Added additional languages to `LmntTTSService`. Languages include: `hi`, `id`,
`it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.
- Added a `model` parameter to the `LmntTTSService` constructor, allowing
switching between LMNT models.
- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS.
Learn more: https://www.minimax.io/platform_overview
- A new function `FrameProcessor.setup()` has been added to allow setting up
frame processors before receiving a `StartFrame`. This is what's happening
internally: `FrameProcessor.setup()` is called, `StartFrame` is pushed from
the beginning of the pipeline, your regular pipeline operations, `EndFrame`
or `CancelFrame` are pushed from the beginning of the pipeline and finally
`FrameProcessor.cleanup()` is called.
- Added support for OpenTelemetry tracing in Pipecat. This initial
implementation includes:
- A `setup_tracing` method where you can specify your OpenTelemetry exporter
- Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS
(`@traced_tts`) which trace the execution and collect properties and
metrics (TTFB, token usage, character counts, etc.)
- Class decorators that provide execution tracking; these are generic and can
be used for service tracking as needed
- Spans that help track traces on a per conversations and turn basis:
```
conversation-uuid
├── turn-1
│ ├── stt_deepgramsttservice
│ ├── llm_openaillmservice
│ └── tts_cartesiattsservice
...
└── turn-n
└── ...
```
By default, Pipecat has implemented service decorators to trace execution of
STT, LLM, and TTS services. You can enable tracing by setting `enable_tracing`
to `True` in the PipelineTask.
- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot
turn pair and emits events `on_turn_started` and `on_turn_stopped`
corresponding to the start and end of a turn, respectively.
- Allow passing observers to `run_test()` while running unit tests.
### Changed
- Updated the default model for `AnthropicLLMService` to
`claude-sonnet-4-20250514`.
- Updated the default model for `GeminiMultimodalLiveLLMService` to
`models/gemini-2.5-flash-preview-native-audio-dialog`.
- `BaseTextFilter` methods `filter()`, `update_settings()`,
`handle_interruption()` and `reset_interruption()` are now async.
- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and
`reset()` are now async.
- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has
been updated. Also, the `cartesia` dependency has been updated to 2.x.
- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new
`speed` parameter which accepts values of `slow`, `normal`, and `fast`.
- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage
metrics provided by Gemini Live.
- `GoogleLLMService` has been updated to use `google-genai` instead of the
deprecated `google-generativeai`.
### Deprecated
- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been
deprecated by Cartesia. Pipecat is following suit and deprecating `emotion`
as well.
### Removed
- Since `GeminiMultimodalLiveLLMService` now transcribes it's own audio, the
`transcribe_user_audio` arg has been removed. Audio is now transcribed
automatically.
- Removed `SileroVAD` frame processor, just use `SileroVADAnalyzer`
instead. Also removed, `07a-interruptible-vad.py` example.
### Fixed
- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice
while the service is running wasn't working.
- Fixed an issue that would cause multiple instances of the same class to behave
incorrectly if any of the given constructor arguments defaulted to a mutable
value (e.g. lists, dictionaries, objects).
- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't
being emitted when the model was set to `sonic`. This resulted in the
assistant context not being updated with assistant messages.
### Performance
- Don't create event handler tasks if no user event handlers have been
registered.
### Other
- Added foundation examples `07y-interruptible-minimax.py` and
`07z-interruptible-sarvam.py`to show how to use the `MiniMaxHttpTTSService`
and `SarvamTTSService`, respectively.
- Added an `open-telemetry-tracing` example, showing how to setup tracing. The
example also includes Jaeger as an open source OpenTelemetry client to review
traces from the example runs.
- Added foundational example `29-turn-tracking-observer.py` to show how to use
the `TurnTrackingObserver`.
## [0.0.67] - 2025-05-07
### Added

View File

@@ -49,18 +49,18 @@ You can connect to Pipecat from any platform using our official SDKs:
## 🧩 Available services
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

View File

@@ -95,16 +95,9 @@ OPENROUTER_API_KEY=...
PIPER_BASE_URL=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=...
LOCAL_SMART_TURN_MODEL_PATH=
FAL_SMART_TURN_API_KEY=...
# Twilio
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
# MiniMax
MINIMAX_API_KEY=...
MINIMAX_GROUP_ID=...
# Sarvam AI
SARVAM_API_KEY=...
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=

View File

@@ -12,7 +12,7 @@
"@daily-co/daily-js": "0.74.0"
},
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
}
},
"node_modules/@babel/runtime": {
@@ -999,9 +999,9 @@
}
},
"node_modules/vite": {
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -12,7 +12,7 @@
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
},
"dependencies": {
"@daily-co/daily-js": "0.74.0"

View File

@@ -1,6 +1,3 @@
# Modal clone
modal-examples
# Python
__pycache__/
*.py[cod]

View File

@@ -1,91 +1,37 @@
# Deploying Pipecat to Modal.com
Deployment example for [modal.com](https://www.modal.com). This example demonstrates how to deploy a FastAPI webapp to Modal with an RTVI compatible `/connect` endpoint that launches a Pipecat pipeline in a separate Modal container and returns a room/token for the client to join. This example also supports providing a parameter to the `/connect` endpoint for specifying which Pipecat pipeline to launch; openai, gemini, or vllm. The vllm pipeline points to a self-hosted OpenAI compatible LLM, using a llama model (neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16), deployed to Modal.
Barebones deployment example for [modal.com](https://www.modal.com)
![](diagram.jpg)
1. Install dependencies
# Running this Example
```bash
python -m venv venv
source venv/bin/active # or OS equivalent
pip install -r requirements.txt
```
## Install the Modal CLI
2. Setup .env
Setup a Modal account and install it on your machine if you have not already, following their easy 3 steps in their [Getting Started Guide](https://modal.com/docs/guide#getting-started)
```bash
cp env.example .env
```
## Deploy a self-serve LLM
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
1. Deploy Modal's OpenAI-compatible LLM service:
3. Test the app locally
```bash
git clone https://github.com/modal-labs/modal-examples
cd modal-examples
modal deploy 06_gpu_and_ml/llm-serving/vllm_inference.py
```
Refer to Modal's guide and example for [Deploying an OpenAI-compatible LLM service with vLLM](https://modal.com/docs/examples/vllm_inference) for more details.
2. Take note of the endpoint URL from the previous step, which will look like:
```
https://{your-workspace}--example-vllm-openai-compatible-serve.modal.run
```
You'll need this for the `bot_vllm.py` file in the next section.
**Note:** The default Modal LLM example uses Llama-3.1 and will shut down after 15 minutes of inactivity. Cold starts take 5-10 minutes. To prepare the service, we recommend visiting the `/docs` endpoint (`https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs`) for your deployed LLM and wait for it to fully load before connecting your client.
## Deploy FastAPI App and Pipecat pipeline to Modal
1. Setup environment variables
```bash
cd server
cp env.example .env
# Modify .env to provide your service API Keys
```
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
2. Update the `modal_url` in `server/src/bot_vllm.py` to point to the url produced from the self-serve llm deploy, mentioned above.
3. From within the `server` directory, test the app locally:
```bash
modal serve app.py
```
```bash
modal serve app.py
```
4. Deploy to production
```bash
modal deploy app.py
```
```bash
modal deploy app.py
```
5. Note the endpoint URL produced from this deployment. It will look like:
## Configuration options
```bash
https://{your-workspace}--pipecat-modal-fastapi-app.modal.run
```
This app sets some sensible defaults for reducing cold starts, such as `minkeep_warm=1`, which will keep at least 1 warm instance ready for your bot function.
You'll need this URL for the client's `app.js` configuration mentioned in its README.
## Launch your bots on Modal
### Option 1: Direct Link
Simply click on the url displayed after running the server or deploy step to launch an agent and be redirected to a Daily room to talk with the launched bot. This will use the OpenAI pipeline.
### Option 2: Connect via an RTVI Client
Follow the instructions provided in the [client folder's README](client/javascript/README.md) for building and running a custom client that connects to your Modal endpoint. The provided client provides a dropdown for choosing which bot pipeline to run.
# Navigating your llm, server, and Pipecat logs
In your [Modal dashboard](https://modal.com/apps), you should have two Apps listed under Live Apps:
1. `example-vllm-openai-compatible`: This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: `serve`. Click on this function to view logs for your LLM.
2. `pipecat-modal`: This App contains the containers and logs used to run your `connect` endpoints and Pipecat pipelines. It will list two App Functions:
1. `fastapi_app`: This function is running the endpoints that your client will interact with and initiate starting a new pipeline (`/`, `/connect`, `/status`). Click on this function to see logs for each endpoint hit.
2. `bot_runner`: This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each run's logs.
# Modal + Pipecat Tips
- In most other Pipecat examples, we use `Popen` to launch the pipeline process from the `/connect` endpoint. In this example, we use a Modal function instead. This allows us to run the pipelines using a separately defined Modal image as well as run each pipeline in an isolated container.
- For the FastAPI and most common Pipecat Pipeline containers, a default `debian_slim` CPU-only should be all that's required to run. GPU containers are needed for self-hosted services.
- To minimize cold starts of the pipeline and reduce latency for users, set `min_containers=1` on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available.
- For next steps on running a self-hosted llm and reducing latency, check out all of [Modal's LLM examples](https://modal.com/docs/examples/vllm_inference).
It has been configured to only allow a concurrency of 1 (`max_inputs=1`) as each user will require their own running function.

View File

@@ -0,0 +1,80 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
import modal
from bot import _voice_bot_process
from fastapi import HTTPException
from fastapi.responses import JSONResponse
from loguru import logger
MAX_SESSION_TIME = 15 * 60 # 15 minutes
app = modal.App("pipecat-modal")
image = modal.Image.debian_slim(python_version="3.12").pip_install_from_requirements(
"requirements.txt"
)
@app.function(
image=image,
cpu=1.0,
secrets=[modal.Secret.from_dotenv()],
keep_warm=1,
enable_memory_snapshot=True,
max_inputs=1, # Do not reuse instances across requests
retries=0,
)
def launch_bot_process(room_url: str, token: str):
_voice_bot_process(room_url, token)
@app.function(
image=image,
secrets=[modal.Secret.from_dotenv()],
)
@modal.web_endpoint(method="POST")
async def start():
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomParams,
)
logger.info("Request received")
async with aiohttp.ClientSession() as session:
daily_rest_helper = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=session,
)
# Create new Daily room
room = await daily_rest_helper.create_room(DailyRoomParams())
if not room.url:
raise HTTPException(
status_code=500,
detail="Unable to create room",
)
logger.info(f"Created room: {room.url}")
# Create bot token for room
token = await daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
logger.info(f"Bot token created: {token}")
# Spawn a new bot process
launch_bot_process.spawn(room_url=room.url, token=token)
# Return room URL to the user to join
# Note: in production, you would want to return a token to the user
return JSONResponse(content={"room_url": room.url, token: token})

View File

@@ -0,0 +1,95 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token: str):
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
def _voice_bot_process(room_url: str, token: str):
asyncio.run(main(room_url, token))

View File

@@ -1 +0,0 @@
node_modules

View File

@@ -1,29 +0,0 @@
# JavaScript Implementation
Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/client/js/introduction).
## Setup
1. Deploy the Modal server. See the main [README](../../README).
2. Navigate to the `client/javascript` directory:
```bash
cd client/javascript
```
3. Modify the baseUrl in src/app.js to point to your deployed Modal endpoint
4. Install dependencies:
```bash
npm install
```
5. Run the client app:
```
npm run dev
```
6. Visit http://localhost:5173 in your browser.

View File

@@ -1,49 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<select id="bot-selector">
<option value="openai">OpenAI</option>
<option value="gemini">Gemini</option>
<option value="vllm">Llama</option>
</select>
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<div class="main-content">
<div class="bot-container">
<div id="bot-video-container"></div>
<audio id="bot-audio" autoplay></audio>
</div>
</div>
<div class="device-bar">
<div class="device-controls">
<select id="device-selector"></select>
<button id="mic-toggle-btn">Mute Mic</button>
</div>
</div>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css" />
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -1,21 +0,0 @@
{
"name": "client",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"keywords": [],
"author": "",
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10"
}
}

View File

@@ -1,381 +0,0 @@
/**
* Copyright (c) 20242025, Daily
*
* SPDX-License-Identifier: BSD 2-Clause License
*/
/**
* RTVI Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
*
* Requirements:
* - A running RTVI bot server (defaults to http://localhost:7860)
* - The server must implement the /connect endpoint that returns Daily.co room credentials
* - Browser with WebRTC support
*/
import { RTVIClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
/**
* ChatbotClient handles the connection and media management for a real-time
* voice and video interaction with an AI bot.
*/
class ChatbotClient {
constructor() {
// Initialize client state
this.rtviClient = null;
this.setupDOMElements();
this.initializeClientAndTransport();
this.setupEventListeners();
}
/**
* Set up references to DOM elements and create necessary media elements
*/
setupDOMElements() {
// Get references to UI control elements
this.connectBtn = document.getElementById('connect-btn');
this.disconnectBtn = document.getElementById('disconnect-btn');
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
this.botVideoContainer = document.getElementById('bot-video-container');
this.deviceSelector = document.getElementById('device-selector');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
this.botAudio.playsInline = true;
document.body.appendChild(this.botAudio);
}
/**
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.disconnectBtn.addEventListener('click', () => this.disconnect());
// Populate device selector
this.rtviClient.getAllMics().then((mics) => {
console.log('Available mics:', mics);
mics.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.textContent = device.label || `Microphone ${device.deviceId}`;
this.deviceSelector.appendChild(option);
});
});
this.deviceSelector.addEventListener('change', (event) => {
const selectedDeviceId = event.target.value;
console.log('Selected device ID:', selectedDeviceId);
this.rtviClient.updateMic(selectedDeviceId);
});
// Handle mic mute/unmute toggle
const micToggleBtn = document.getElementById('mic-toggle-btn');
micToggleBtn.addEventListener('click', () => {
let micEnabled = this.rtviClient.isMicEnabled;
micToggleBtn.textContent = micEnabled ? 'Unmute Mic' : 'Mute Mic';
this.rtviClient.enableMic(!micEnabled);
// Add logic to mute/unmute the mic
if (micEnabled) {
console.log('Mic muted');
// Add code to mute the mic
} else {
console.log('Mic unmuted');
// Add code to unmute the mic
}
});
}
/**
* Set up the RTVI client and Daily transport
*/
async initializeClientAndTransport() {
// Initialize the RTVI client with a DailyTransport and our configuration
this.rtviClient = new RTVIClient({
transport: new DailyTransport(),
params: {
// REPLACE WITH YOUR MODAL URL ENDPOINT
baseUrl:
'https://<Modal workspace>--pipecat-modal-bot-launcher.modal.run',
endpoints: {
connect: '/connect',
},
requestData: {
bot_name: 'openai',
},
},
enableMic: true, // Enable microphone for user input
enableCam: false,
callbacks: {
// Handle connection state changes
onConnected: () => {
this.updateStatus('Connected');
this.connectBtn.disabled = true;
this.disconnectBtn.disabled = false;
this.log('Client connected');
},
onDisconnected: () => {
this.updateStatus('Disconnected');
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
},
// Handle transport state changes
onTransportStateChanged: (state) => {
this.updateStatus(`Transport: ${state}`);
this.log(`Transport state changed: ${state}`);
if (state === 'connecting') {
window.startTime = Date.now();
}
if (state === 'ready') {
this.setupMediaTracks();
console.warn('TIME TO BOT READY:', Date.now() - window.startTime);
}
},
// Handle bot connection events
onBotConnected: (participant) => {
this.log(`Bot connected: ${JSON.stringify(participant)}`);
},
onBotDisconnected: (participant) => {
this.log(`Bot disconnected: ${JSON.stringify(participant)}`);
},
onBotReady: (data) => {
this.log(`Bot ready: ${JSON.stringify(data)}`);
this.setupMediaTracks();
},
// Transcript events
onUserTranscript: (data) => {
// Only log final transcripts
if (data.final) {
this.log(`User: ${data.text}`);
}
},
onBotTranscript: (data) => {
this.log(`Bot: ${data.text}`);
},
// Error handling
onMessageError: (error) => {
console.log('Message error:', error);
},
onMicUpdated: (data) => {
console.log('Mic updated:', data);
this.deviceSelector.value = data.deviceId;
},
onError: (error) => {
console.log('Error:', JSON.stringify(error));
},
},
});
// Set up listeners for media track events
this.setupTrackListeners();
await this.rtviClient.initDevices();
window.client = this.rtviClient;
}
/**
* Add a timestamped message to the debug log
*/
log(message) {
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
console.log(message);
}
/**
* Update the connection status display
*/
updateStatus(status) {
this.statusSpan.textContent = status;
this.log(`Status: ${status}`);
}
/**
* Check for available media tracks and set them up if present
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
// Get current tracks from the client
const tracks = this.rtviClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
this.setupAudioTrack(tracks.bot.audio);
}
if (tracks.bot?.video) {
this.setupVideoTrack(tracks.bot.video);
}
}
/**
* Set up listeners for track events (start/stop)
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local) {
if (track.kind === 'audio') {
this.setupAudioTrack(track);
} else if (track.kind === 'video') {
this.setupVideoTrack(track);
}
this.log(
`Track started event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
} else {
this.log('Local mic unmuted');
}
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
if (participant.local) {
this.log('Local mic muted');
return;
}
this.log(
`Track stopped event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
});
}
/**
* Set up an audio track for playback
* Handles both initial setup and track updates
*/
setupAudioTrack(track) {
this.log('Setting up audio track');
// Check if we're already playing this track
if (this.botAudio.srcObject) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the audio source
this.botAudio.srcObject = new MediaStream([track]);
}
/**
* Set up a video track for display
* Handles both initial setup and track updates
*/
setupVideoTrack(track) {
this.log('Setting up video track');
const videoEl = document.createElement('video');
videoEl.autoplay = true;
videoEl.playsInline = true;
videoEl.muted = true;
videoEl.style.width = '100%';
videoEl.style.height = '100%';
videoEl.style.objectFit = 'cover';
// Check if we're already displaying this track
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const oldTrack = this.botVideoContainer
.querySelector('video')
.srcObject.getVideoTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the video source
videoEl.srcObject = new MediaStream([track]);
this.botVideoContainer.innerHTML = '';
this.botVideoContainer.appendChild(videoEl);
}
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
*/
async connect() {
try {
const botSelector = document.getElementById('bot-selector');
const selectedBot = botSelector.value;
this.rtviClient.params.requestData.bot_name = selectedBot;
// Initialize audio/video devices
this.log('Initializing devices...');
await this.rtviClient.initDevices();
// Connect to the bot
this.log(`Connecting to bot: ${selectedBot}`);
await this.rtviClient.connect();
this.log('Connection complete');
} catch (error) {
// Handle any errors during connection
console.error('Connection error:', error);
this.log(`Error connecting: ${JSON.stringify(error.message)}`);
this.log(`Error stack: ${error.stack}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
try {
await this.rtviClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
}
}
}
/**
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.rtviClient) {
try {
// Disconnect the RTVI client
await this.rtviClient.disconnect();
// Clean up audio
if (this.botAudio.srcObject) {
this.botAudio.srcObject.getTracks().forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
// Clean up video
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const video = this.botVideoContainer.querySelector('video');
video.srcObject.getTracks().forEach((track) => track.stop());
video.srcObject = null;
}
this.botVideoContainer.innerHTML = '';
} catch (error) {
this.log(`Error disconnecting: ${error.message}`);
}
}
}
}
// Initialize the client when the page loads
window.addEventListener('DOMContentLoaded', () => {
new ChatbotClient();
});

View File

@@ -1,135 +0,0 @@
body {
margin: 0;
padding: 20px;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.status-bar,
.device-bar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px;
background-color: #fff;
border-radius: 8px;
margin-bottom: 20px;
}
.controls,
.device-controls {
display: flex;
align-items: center;
gap: 10px; /* Adds spacing between elements */
}
.device-controls {
margin-left: auto;
}
.controls button,
.device-controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
border-radius: 4px;
cursor: pointer;
}
#bot-selector,
#device-selector {
padding: 8px 16px;
padding-right: 40px;
border: none;
border-radius: 4px;
background-color: #6c757d; /* Gray background */
color: white; /* White text */
cursor: pointer;
appearance: none; /* Removes default browser styling for dropdowns */
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='white'%3E%3Cpath d='M7 10l5 5 5-5z'/%3E%3C/svg%3E"); /* Custom arrow */
background-repeat: no-repeat;
background-position: right 8px center; /* Position the arrow */
}
#bot-selector:focus,
#device-selector:focus {
outline: none;
box-shadow: 0 0 4px rgba(0, 0, 0, 0.3); /* Add a subtle focus effect */
}
#connect-btn {
background-color: #4caf50;
color: white;
}
#disconnect-btn {
background-color: #f44336;
color: white;
}
#mic-toggle-btn {
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.main-content {
background-color: #fff;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.bot-container {
display: flex;
flex-direction: column;
align-items: center;
}
#bot-video-container {
width: 640px;
height: 360px;
background-color: #e0e0e0;
border-radius: 8px;
margin: 20px auto;
overflow: hidden;
display: flex;
align-items: center;
justify-content: center;
}
#bot-video-container video {
width: 100%;
height: 100%;
object-fit: cover;
}
.debug-panel {
background-color: #fff;
border-radius: 8px;
padding: 20px;
}
.debug-panel h3 {
margin: 0 0 10px 0;
font-size: 16px;
font-weight: bold;
}
#debug-log {
height: 200px;
overflow-y: auto;
background-color: #f8f8f8;
padding: 10px;
border-radius: 4px;
font-family: monospace;
font-size: 12px;
line-height: 1.4;
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 114 KiB

View File

@@ -0,0 +1,3 @@
DAILY_API_KEY=
OPENAI_API_KEY=
CARTESIA_API_KEY=

View File

@@ -0,0 +1,4 @@
python-dotenv==1.0.1
modal==0.71.3
pipecat-ai[daily,silero,cartesia,openai]
fastapi==0.115.6

View File

@@ -1,307 +0,0 @@
"""modal_example.
This module shows a simple example of how to deploy a bot using Modal and FastAPI.
It includes:
- FastAPI endpoints for starting agents and checking bot statuses.
- Dynamic loading of bot implementations.
- Use of a Daily transport for bot communication.
"""
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import importlib
import os
from contextlib import asynccontextmanager
from typing import Any, Dict, Literal
import aiohttp
import modal
from fastapi import APIRouter, FastAPI, HTTPException
from fastapi.responses import JSONResponse, RedirectResponse
from pydantic import BaseModel
# container specifications for the FastAPI web server
web_image = (
modal.Image.debian_slim(python_version="3.13")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily]")
.add_local_dir("src", remote_path="/root/src")
)
# container specifications for the Pipecat pipeline
bot_image = (
modal.Image.debian_slim(python_version="3.13")
.apt_install("ffmpeg")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily,elevenlabs,openai,silero,google]")
.add_local_dir("src", remote_path="/root/src")
)
app = modal.App("pipecat-modal", secrets=[modal.Secret.from_dotenv()])
router = APIRouter()
bot_jobs = {}
daily_helpers = {}
# Names of all supported bot implementations
# These correspond to the bot files in the src directory
BotName = Literal["openai", "gemini", "vllm"]
def cleanup():
"""Cleanup function to terminate all bot processes.
Called during server shutdown.
"""
for entry in bot_jobs.values():
func = modal.FunctionCall.from_id(entry[0])
if func:
func.cancel()
def get_bot_file(bot_name: BotName) -> str:
"""Retrieve the bot file name corresponding to the provided bot_name.
Args:
bot_name (BotName): The name of the bot (e.g., 'openai', 'gemini', 'vllm').
Returns:
str: The file name corresponding to the bot implementation.
Raises:
ValueError: If the bot name is invalid or not supported.
"""
# bot_implementation = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
bot_implementation = bot_name.lower().strip()
if not bot_implementation:
bot_implementation = "openai"
if bot_implementation not in ["openai", "gemini", "vllm"]:
raise ValueError(
f"Invalid BOT_IMPLEMENTATION: {bot_implementation}. Must be 'openai' or 'gemini' or 'vllm'"
)
return f"bot_{bot_implementation}"
def get_runner(path: str, bot_file: str) -> callable:
"""Dynamically import the run_bot function based on the bot name.
Args:
path (str): The path to the bot files (e.g., 'src').
bot_file (str): The file name of the bot implementation (e.g., 'openai', 'gemini', 'vllm').
Returns:
function: The run_bot function from the specified bot module.
Raises:
ImportError: If the specified bot module or run_bot function is not found.
"""
try:
# Dynamically construct the module name
module_name = f"{path}.{bot_file}"
# Import the module
module = importlib.import_module(module_name)
# Get the run_bot function from the module
return getattr(module, "run_bot")
except (ImportError, AttributeError) as e:
raise ImportError(f"Failed to import run_bot from {module_name}: {e}")
async def create_room_and_token() -> tuple[str, str]:
"""Create a Daily room and generate an authentication token.
This function checks for existing room URL and token in the environment variables.
If not found, it creates a new room using the Daily API and generates a token for it.
Returns:
tuple[str, str]: A tuple containing the room URL and the authentication token.
Raises:
HTTPException: If room creation or token generation fails.
"""
from pipecat.transports.services.helpers.daily_rest import DailyRoomParams
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
token = os.getenv("DAILY_SAMPLE_ROOM_TOKEN", None)
if not room_url:
room = await daily_helpers["rest"].create_room(DailyRoomParams())
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
room_url = room.url
token = await daily_helpers["rest"].get_token(room_url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room_url}")
return room_url, token
@app.function(image=bot_image, min_containers=1)
async def bot_runner(room_url, token, bot_name: BotName = "openai"):
"""Launch the provided bot process, providing the given room URL and token for the bot to join.
Args:
room_url (str): The URL of the Daily room where the bot and client will communicate.
token (str): The authentication token for the room.
bot_name (BotName): The name of the bot implementation to use. Defaults to "openai".
Raises:
HTTPException: If the bot pipeline fails to start.
"""
try:
path = "src"
bot_file = get_bot_file(bot_name)
run_bot = get_runner(path, bot_file)
print(f"Starting bot process: {bot_file} -u {room_url} -t {token}")
await run_bot(room_url, token)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start bot pipeline: {e}")
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan manager that handles startup and shutdown tasks.
- Creates aiohttp session
- Initializes Daily API helper
- Cleans up resources on shutdown
"""
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
class ConnectData(BaseModel):
"""Data provided by client to specify the bot pipeline.
Attributes:
bot_name (BotName): The name of the bot to connect to. Defaults to "openai".
"""
bot_name: BotName = "openai"
async def start(data: ConnectData):
"""Internal method to start a bot agent and return the room URL and token.
Args:
data (ConnectData): The data containing the bot name to use.
Returns:
tuple[str, str]: A tuple containing the room URL and token.
"""
room_url, token = await create_room_and_token()
launch_bot_func = modal.Function.from_name("pipecat-modal", "bot_runner")
function_id = launch_bot_func.spawn(room_url, token, data.bot_name)
bot_jobs[function_id] = (function_id, room_url)
return room_url, token
@router.get("/")
async def start_agent():
"""A user endpoint for launching a bot agent and redirecting to the created room URL.
This function retrieves the bot implementation from the environment,
starts the bot agent, and redirects the user to the room URL to
interact with the bot through a Daily Prebuilt Interface.
Returns:
RedirectResponse: A response that redirects to the room URL.
"""
bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
print(f"Starting bot: {bot_name}")
room_url, token = await start(ConnectData(bot_name=bot_name))
return RedirectResponse(room_url)
@router.post("/connect")
async def rtvi_connect(data: ConnectData) -> Dict[Any, Any]:
"""A user endpoint for launching a bot agent and retrieving the room/token credentials.
This function retrieves the bot implementation from the request, if provided,
starts the bot agent, and returns the room URL and token for the bot. This allows the
client to then connect to the bot using their own RTVI interface.
Args:
data (ConnectData): Optional. The data containing the bot name to use.
Returns:
Dict[Any, Any]: A dictionary containing the room URL and token.
"""
print(f"Starting bot: {data.bot_name}")
if data is None or not data.bot_name:
data.bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
room_url, token = await start(data)
return {"room_url": room_url, "token": token}
@router.get("/status/{fid}")
def get_status(fid: str):
"""Retrieve the status of a bot process by its function ID.
Args:
fid (str): The function ID of the bot process.
Returns:
JSONResponse: A JSON response containing the bot's status and result code.
Raises:
HTTPException: If the bot process with the given ID is not found.
"""
func = modal.FunctionCall.from_id(fid)
if not func:
raise HTTPException(status_code=404, detail=f"Bot with process id: {fid} not found")
try:
result = func.get(timeout=0)
return JSONResponse({"bot_id": fid, "status": "finished", "code": result})
except modal.exception.OutputExpiredError:
return JSONResponse({"bot_id": fid, "status": "finished", "code": 404})
except TimeoutError:
return JSONResponse({"bot_id": fid, "status": "running", "code": 202})
@app.function(image=web_image, min_containers=1)
@modal.concurrent(max_inputs=1)
@modal.asgi_app()
def fastapi_app():
"""Create and configure the FastAPI application.
This function initializes the FastAPI app with middleware, routes, and lifespan management.
It is decorated to be used as a Modal ASGI app.
"""
from fastapi.middleware.cors import CORSMiddleware
# Initialize FastAPI app
web_app = FastAPI(lifespan=lifespan)
web_app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include the endpoints from endpoints.py
web_app.include_router(router)
return web_app

View File

@@ -1,14 +0,0 @@
DAILY_API_KEY=
# determines which bot file to default to: 'openai', 'gemini', or 'vllm'
BOT_IMPLEMENTATION=openai
# needed for the openai bot pipeline
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
# needed for the gemini live bot pipeline
GOOGLE_API_KEY=
# needed if you modified the API Key for your self-hosted LLM
VLLM_API_KEY=

View File

@@ -1,2 +0,0 @@
python-dotenv==1.0.1
modal==0.71.3

Binary file not shown.

Before

Width:  |  Height:  |  Size: 759 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 884 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 876 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 881 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 874 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 882 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 885 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 888 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 890 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 898 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 836 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 905 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 849 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 864 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 858 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 875 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 881 KiB

View File

@@ -1,198 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Gemini Bot Implementation.
This module implements a chatbot using Google's Gemini Multimodal Live model.
It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Speech-to-speech model
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow using Gemini's streaming capabilities.
"""
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
sprites = []
script_dir = os.path.dirname(__file__)
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport with specific audio parameters
- Gemini Live multimodal model integration
- Voice activity detection
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with specific audio/video parameters for Gemini
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_audio_passthrough=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
),
)
# Initialize the Gemini Multimodal Live model
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
voice_id="Puck", # Aoede, Charon, Fenrir, Kore, Puck
transcribe_user_audio=True,
)
messages = [
{
"role": "user",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -1,226 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""OpenAI Bot Implementation.
This module implements a chatbot using OpenAI's GPT-4 model for natural language
processing. It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Text-to-speech using ElevenLabs
- Support for both English and Spanish
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow.
"""
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
sprites = []
script_dir = os.path.dirname(__file__)
# Load sequential animation frames
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport
- Speech-to-text and text-to-speech services
- Language model integration
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with video/audio parameters
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
# Initialize text-to-speech service
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="SAz9YHcvj6GT2YYXdXww",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
# Initialize LLM service
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
#
# English
#
"content": "You are an incessant one-upper. Start by asking the user how their day is going.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
tts,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -1,239 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""OpenAI Bot Implementation.
This module implements a chatbot using OpenAI's GPT-4 model for natural language
processing. It includes:
- Real-time audio/video interaction through Daily
- Animated robot avatar
- Text-to-speech using ElevenLabs
- Support for both English and Spanish
The bot runs as part of a pipeline that processes audio/video frames and manages
the conversation flow.
"""
import os
import sys
from typing import List
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionMessageParam
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import RTVIConfig, RTVIObserver, RTVIProcessor
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
try:
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
except ValueError:
# Handle the case where logger is already initialized
pass
# REPLACE WITH YOUR MODAL URL ENDPOINT
modal_url = "https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run"
api_key = os.getenv("VLLM_API_KEY", "super-secret-key")
sprites = []
script_dir = os.path.dirname(__file__)
# Load sequential animation frames
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
async def run_bot(room_url: str, token: str):
"""Main bot execution function.
Sets up and runs the bot pipeline including:
- Daily video transport
- Speech-to-text and text-to-speech services
- Language model integration
- Animation processing
- RTVI event handling
"""
# Set up Daily transport with video/audio parameters
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=576,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
# Initialize text-to-speech service
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="D38z5RcWu1voky8WS1ja",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
# Initialize LLM service
llm = OpenAILLMService(
# To use OpenAI
api_key=api_key,
# Or, to use a local vLLM (or similar) api server
model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16",
base_url=f"{modal_url}/v1",
)
messages = [
{
"role": "system",
#
# English
#
"content": "You are a salesman for Modal, the cloud-native serverless Python computing platform.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
#
# RTVI events for Pipecat client UI
#
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
pipeline = Pipeline(
[
transport.input(),
rtvi,
context_aggregator.user(),
llm,
tts,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
await task.queue_frame(quiet_frame)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)

View File

@@ -1,84 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import importlib
import os
def get_bot_file(arg_bot: str | None) -> str:
bot_implementation = arg_bot or os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
if not bot_implementation:
bot_implementation = "openai"
if bot_implementation not in ["openai", "gemini", "vllm"]:
raise ValueError(
f"Invalid BOT_IMPLEMENTATION: {bot_implementation}. Must be 'openai' or 'gemini'"
)
return f"bot_{bot_implementation}"
def get_runner(bot_file: str):
"""Dynamically import the run_bot function based on the bot name.
Args:
bot_name (str): The name of the bot implementation (e.g., 'openai', 'gemini').
Returns:
function: The run_bot function from the specified bot module.
Raises:
ImportError: If the specified bot module or run_bot function is not found.
"""
try:
# Dynamically construct the module name
module_name = f"{bot_file}"
# Import the module
module = importlib.import_module(module_name)
# Get the run_bot function from the module
return getattr(module, "run_bot")
except (ImportError, AttributeError) as e:
raise ImportError(f"Failed to import run_bot from {module_name}: {e}")
def main():
"""Parse the args to launch the appropriate bot using the given room/token."""
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-t",
"--token",
type=str,
required=False,
help="Daily room token",
)
parser.add_argument(
"-b",
"--bot",
type=str,
required=False,
help="Bot runner to use (e.g., openai, gemini)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
token = args.token or os.getenv("DAILY_SAMPLE_ROOM_TOKEN")
bot_file = get_bot_file(args.bot)
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
run_bot = get_runner(bot_file)
asyncio.run(run_bot(url, token))
if __name__ == "__main__":
main()

View File

@@ -100,28 +100,7 @@ phone numbers with valid values for your use case.
### Dialin Request
The server will receive a request when a call is received from Daily.
The payload that the webhook received is as follows:
```json
{
// for dial-in from webhook
"To": "+14152251493",
"From": "+14158483432",
"callId": "string-contains-uuid",
"callDomain": "string-contains-uuid",
"sipHeaders": {
"X-My-Custom-Header": "value",
"x-caller": "+1234567890",
"x-called": "+1987654321",
},
}
```
The `To`, `From`, `callId`, `callDomain` fields are converted to
`snake_case` and mapped to `dialin_settings`. In addition, `sipHeader`
contains any custom SIP headers received by Daily on the SIP
interconnect address (`sip_uri`). These are headers sent from
Twilio or other external SIP platforms, for example, to send the
caller's phone number.
The server will receive a request when a call is received from Daily.
### Dialout Request
@@ -179,7 +158,6 @@ curl -X POST http://localhost:3000/api/dial \
"From": "+1987654321",
"callId": "call-uuid-123",
"callDomain": "domain-uuid-456",
"sipHeader": {},
"dialout_settings": [
{
"phoneNumber": "+1234567890",

View File

@@ -39,11 +39,6 @@ class RoomRequest(BaseModel):
None, description="A flag to perform voicemail or answeing-machine detection"
)
call_transfer: Optional[Dict[str, Any]] = Field(None, description="to initiate a call transfer")
sipHeaders: Optional[Dict[str, Any]] = Field(
None,
alias="sip_headers",
description="Custom SIP headers received from the external SIP provider",
)
class Config:
populate_by_name = True
@@ -62,14 +57,6 @@ class RoomRequest(BaseModel):
"callDomain": "string-contains-uuid"
These need to be remapped to dialin_settings
In addition, we may receive in the body that can be
sent to the bot as a custom field, sip_headers
"sipHeaders": {
"X-My-Custom-Header": "value",
"x-caller": "+14158483432",
"x-called": "+14152251493",
},
"dialout_settings": [
{"phoneNumber": "+14158483432", "callerId": "+14152251493"},
{"sipUri": "sip:username@sip.hostname"}
@@ -170,7 +157,6 @@ async def dial(request: RoomRequest, raw_request: Request):
"dialout_settings": request.dialout_settings,
"voicemail_detection": request.voicemail_detection,
"call_transfer": request.call_transfer,
"sip_headers": request.sipHeaders, # passing the SIP headers to the bot
},
}

View File

@@ -65,7 +65,6 @@ export default async function handler(req, res) {
From,
callId,
callDomain,
sipHeaders,
dialout_settings,
voicemail_detection,
call_transfer
@@ -118,7 +117,6 @@ export default async function handler(req, res) {
dialout_settings,
voicemail_detection,
call_transfer,
sip_headers: sipHeaders,
},
};

View File

@@ -1,111 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from daily_runner import configure
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService, Language, LiveOptions
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_in_passthrough=False,
audio_out_enabled=True,
audio_out_sample_rate=16000,
transcription_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(language=Language.EN),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_audio(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -10,12 +10,12 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.processors.audio.vad.silero import SileroVAD
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
@@ -33,13 +33,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaHttpTTSService(
vad = SileroVAD()
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
@@ -58,13 +59,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
pipeline = Pipeline(
[
transport.input(), # Transport user input
transport.input(),
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
vad,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)

View File

@@ -1,111 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = MiniMaxHttpTTSService(
api_key=os.getenv("MINIMAX_API_KEY", ""),
group_id=os.getenv("MINIMAX_GROUP_ID", ""),
aiohttp_session=session,
params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

View File

@@ -1,109 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
# Create an HTTP session
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
aiohttp_session=session,
params=SarvamTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

View File

@@ -1,165 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
from typing import Optional
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
EndFrame,
Frame,
InputImageRawFrame,
OutputImageRawFrame,
TextFrame,
TTSTextFrame,
UserImageRequestFrame,
UserStartedSpeakingFrame,
)
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import (
RTVIConfig,
RTVIObserver,
RTVIProcessor,
RTVIServerMessageFrame,
)
from pipecat.processors.gstreamer.pipeline_source import GStreamerPipelineSource
from pipecat.services.moondream.vision import MoondreamService
from pipecat.transports.base_input import BaseInputTransport
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
class AlertProcessor(FrameProcessor):
def __init__(self, connection: SmallWebRTCConnection):
super().__init__()
self._connection = connection
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
text = frame.text.strip().upper()
message_frame = RTVIServerMessageFrame(data=text)
await self.push_frame(message_frame)
await self.push_frame(frame, direction)
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None):
super().__init__()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, OutputImageRawFrame):
await self.push_frame(frame)
# logger.info(f"UserImageRequester received image frame with size: {frame.size}")
text_frame = TextFrame(
"Are there people in the bottom right corner of the image? Only answer with YES or NO."
)
await self.push_frame(text_frame)
input_frame = InputImageRawFrame(
image=frame.image,
size=frame.size,
format=frame.format,
)
await self.push_frame(input_frame)
else:
await self.push_frame(frame, direction)
async def run_bot(webrtc_connection: SmallWebRTCConnection, args: argparse.Namespace):
logger.info(f"Starting bot with video input: {args.input}")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_is_live=True,
video_out_width=1280,
video_out_height=720,
),
)
gst = GStreamerPipelineSource(
pipeline=(f"rtspsrc location={args.input} ! decodebin ! autovideosink"),
out_params=GStreamerPipelineSource.OutputParams(
video_width=1280,
video_height=720,
),
)
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
# If you run into weird description, try with use_cpu=True
moondream = MoondreamService()
ir = UserImageRequester()
va = VisionImageFrameAggregator()
alert = AlertProcessor(connection=webrtc_connection)
pipeline = Pipeline(
[
gst, # GStreamer file source
rtvi,
ir,
# debug,
va,
moondream,
alert, # Send an email alert or something if the door is open
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
observers=[
RTVIObserver(rtvi),
DebugLogObserver(
frame_types={
# TextFrame: None,
TextFrame: (MoondreamService, FrameEndpoint.SOURCE),
# InputImageRawFrame: None,
EndFrame: None,
}
),
],
)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
logger.info(f"Bot ready: {rtvi}")
await rtvi.set_bot_ready()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("-i", "--input", type=str, required=True, help="Input video file")
main(parser)

View File

@@ -53,6 +53,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,
voice_id="Puck", # Aoede, Charon, Fenrir, Kore, Puck
transcribe_user_audio=True,
)
# Build the pipeline

View File

@@ -47,6 +47,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
api_key=os.getenv("GOOGLE_API_KEY"),
voice_id="Aoede", # Puck, Charon, Kore, Fenrir, Aoede
# system_instruction="Talk like a pirate."
transcribe_user_audio=True,
# inference_on_context_initialization=False,
)

View File

@@ -89,6 +89,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,
transcribe_user_audio=True,
tools=tools,
)

View File

@@ -51,6 +51,7 @@ async def main():
api_key=os.getenv("GOOGLE_API_KEY"),
voice_id="Aoede", # Puck, Charon, Kore, Fenrir, Aoede
# system_instruction="Talk like a pirate."
transcribe_user_audio=True,
# inference_on_context_initialization=False,
)

View File

@@ -59,6 +59,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
transcribe_user_audio=True,
system_instruction=SYSTEM_INSTRUCTION,
tools=[{"google_search": {}}, {"code_execution": {}}],
params=InputParams(modalities=GeminiMultimodalModalities.TEXT),

View File

@@ -58,6 +58,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
voice_id="Puck", # Aoede, Charon, Fenrir, Kore, Puck
transcribe_user_audio=True,
system_instruction=system_instruction,
tools=tools,
)

View File

@@ -1,119 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
turn_observer = task.turn_tracking_observer
if turn_observer:
@turn_observer.event_handler("on_turn_started")
async def on_turn_started(observer, turn_number):
logger.info(f"🔄 Turn {turn_number} started")
@turn_observer.event_handler("on_turn_ended")
async def on_turn_ended(observer, turn_number, duration, was_interrupted):
if was_interrupted:
logger.info(f"🔄 Turn {turn_number} interrupted after {duration:.2f}s")
else:
logger.info(f"🏁 Turn {turn_number} completed in {duration:.2f}s")
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

View File

@@ -11,17 +11,18 @@ from pathlib import Path
from dotenv import load_dotenv
from loguru import logger
from openai import audio
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.observers.base_observer import BaseObserver, FramePushed
from pipecat.frames.frames import Frame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService, LLMSearchResponseFrame
from pipecat.services.llm_service import LLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
@@ -32,7 +33,7 @@ load_dotenv(override=True)
# Function handlers for the LLM
search_tool = {"google_search": {}}
search_tool = {"google_search_retrieval": {}}
tools = [search_tool]
system_instruction = """
@@ -49,22 +50,14 @@ Start each interaction by asking the user about which place they would like to k
"""
class LLMSearchLoggerObserver(BaseObserver):
async def on_push_frame(self, data: FramePushed):
src = data.source
dst = data.destination
frame = data.frame
timestamp = data.timestamp
if not isinstance(src, LLMService) and not isinstance(dst, LLMService):
return
time_sec = timestamp / 1_000_000_000
arrow = ""
class LLMSearchLoggerProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMSearchResponseFrame):
logger.debug(f"🧠 {arrow} {dst} LLM SEARCH RESPONSE FRAME: {frame} at {time_sec:.2f}s")
print(f"LLMSearchLoggerProcessor: {frame}")
await self.push_frame(frame)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
@@ -91,6 +84,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,
tools=tools,
model="gemini-1.5-flash-002",
)
context = OpenAILLMContext(
@@ -103,23 +97,22 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
)
context_aggregator = llm.create_context_aggregator(context)
llm_search_logger = LLMSearchLoggerProcessor()
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
llm_search_logger,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(allow_interruptions=True),
observers=[LLMSearchLoggerObserver()],
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):

View File

@@ -99,14 +99,21 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
)
# Register handler for voice switching
async def on_voice_tag(match: PatternMatch):
def on_voice_tag(match: PatternMatch):
voice_name = match.content.strip().lower()
if voice_name in VOICE_IDS:
# First flush any existing audio to finish the current context
await tts.flush_audio()
# Then set the new voice
tts.set_voice(VOICE_IDS[voice_name])
logger.info(f"Switched to {voice_name} voice")
voice_id = VOICE_IDS[voice_name]
# Create task to reset the TTS context after voice change
async def change_voice():
# First flush any existing audio to finish the current context
await tts.flush_audio()
# Then set the new voice
tts.set_voice(voice_id)
logger.info(f"Switched to {voice_name} voice")
# Schedule the voice change task
asyncio.create_task(change_voice())
else:
logger.warning(f"Unknown voice: {voice_name}")

View File

@@ -4,7 +4,6 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import io
import os
@@ -81,7 +80,7 @@ class UrlToImageProcessor(FrameProcessor):
logger.error(error_msg)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(

View File

@@ -4,7 +4,6 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import sys
@@ -29,7 +28,7 @@ from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(

View File

@@ -4,7 +4,6 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import io
import os
@@ -82,7 +81,7 @@ class UrlToImageProcessor(FrameProcessor):
logger.error(error_msg)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
async def run_bot(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(

View File

@@ -16,7 +16,7 @@
"@types/node": "^22.13.1",
"@vitejs/plugin-react-swc": "^3.7.2",
"typescript": "^5.7.3",
"vite": "^6.3.5"
"vite": "^6.0.2"
}
},
"node_modules/@babel/runtime": {
@@ -1370,9 +1370,9 @@
}
},
"node_modules/vite": {
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -15,7 +15,7 @@
"@types/node": "^22.13.1",
"@vitejs/plugin-react-swc": "^3.7.2",
"typescript": "^5.7.3",
"vite": "^6.3.5"
"vite": "^6.0.2"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",

View File

@@ -13,7 +13,7 @@
"@pipecat-ai/daily-transport": "^0.3.8"
},
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
}
},
"node_modules/@babel/runtime": {
@@ -1114,9 +1114,9 @@
}
},
"node_modules/vite": {
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -12,7 +12,7 @@
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",

View File

@@ -102,9 +102,9 @@ async def main():
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-1.5-flash-002",
system_instruction=system_instruction,
tools=tools,
model="gemini-1.5-flash",
)
context = OpenAILLMContext(
@@ -153,6 +153,7 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.debug("First participant joined: {}", participant["id"])
await transport.capture_participant_transcription(participant["id"])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):

View File

@@ -1,69 +0,0 @@
# OpenTelemetry Tracing with Pipecat
This repository demonstrates OpenTelemetry tracing integration for Pipecat services, with examples for different backends.
## Tracing Features in Pipecat
- **Hierarchical Tracing**: Track entire conversations, turns, and service calls
- **Service Tracing**: Detailed spans for TTS, STT, and LLM services with rich context
- **TTFB Metrics**: Capture Time To First Byte metrics for latency analysis
- **Usage Statistics**: Track character counts for TTS and token usage for LLMs
## Trace Structure
Traces are organized hierarchically:
```
Conversation (conversation)
├── turn
│ ├── stt_deepgramsttservice
│ ├── llm_openaillmservice
│ └── tts_cartesiattsservice
└── turn
├── stt_deepgramsttservice
├── llm_openaillmservice
└── tts_cartesiattsservice
turn
└── ...
```
This organization helps you track conversation-to-conversation and turn-to-turn interactions.
## Available Demos
| Demo | Description |
| ------------------------------- | ------------------------------------------------------------------------- |
| [Jaeger Tracing](./jaeger/) | Tracing with Jaeger, an open-source end-to-end distributed tracing system |
| [Langfuse Tracing](./langfuse/) | Tracing with Langfuse, a specialized platform for LLM observability |
## Common Requirements
- Python 3.10+
- Pipecat and its dependencies
- API keys for the services used (Deepgram, Cartesia, OpenAI)
- The appropriate OpenTelemetry exporters
## How Tracing Works
The tracing system consists of:
1. **TurnTrackingObserver**: Detects conversation turns
2. **TurnTraceObserver**: Creates spans for turns and conversations
3. **Service Decorators**: `@traced_tts`, `@traced_stt`, `@traced_llm` for service-specific tracing
4. **Context Providers**: Share context between different parts of the pipeline
## Getting Started
1. Choose one of the demos from the table above
2. Follow the README instructions in the respective directory
## Common Troubleshooting
- **Debugging Traces**: Set `OTEL_CONSOLE_EXPORT=true` to print traces to the console for debugging
- **Missing Metrics**: Check that `enable_metrics=True` in PipelineParams
- **API Key Issues**: Verify your API keys are set correctly in the .env file
## References
- [OpenTelemetry Python Documentation](https://opentelemetry-python.readthedocs.io/)
- [Pipecat Documentation](https://docs.pipecat.ai/server/utilities/opentelemetry)

View File

@@ -1,80 +0,0 @@
# Jaeger Tracing for Pipecat
This demo showcases OpenTelemetry tracing integration for Pipecat services using Jaeger, allowing you to visualize service calls, performance metrics, and dependencies.
## Setup Instructions
### 1. Start the Jaeger Container
Run Jaeger in Docker to collect and visualize traces:
```bash
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
```
### 2. Environment Configuration
Create a `.env` file with your API keys and enable tracing:
```
ENABLE_TRACING=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 # Point to your Jaeger backend
# OTEL_CONSOLE_EXPORT=true # Set to any value for debug output to console
# Service API keys
DEEPGRAM_API_KEY=your_key_here
CARTESIA_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Run the Demo
```bash
python bot.py
```
### 5. View Traces in Jaeger
Open your browser to [http://localhost:16686](http://localhost:16686) and select the "pipecat-demo" service to view traces.
## Jaeger-Specific Configuration
In the `bot.py` file, note the GRPC exporter configuration:
```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Create the exporter
otlp_exporter = OTLPSpanExporter(
endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
insecure=True,
)
# Set up tracing with the exporter
setup_tracing(
service_name="pipecat-demo",
exporter=otlp_exporter,
console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
)
```
## Troubleshooting
- **No Traces in Jaeger**: Ensure the Docker container is running and the OTLP endpoint is correct
- **Connection Errors**: Verify network connectivity to the Jaeger container
- **Exporter Issues**: Try the Console exporter (`OTEL_CONSOLE_EXPORT=true`) to verify tracing works
## References
- [Jaeger Documentation](https://www.jaegertracing.io/docs/latest/)

View File

@@ -1,161 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.utils.tracing.setup import setup_tracing
load_dotenv(override=True)
IS_TRACING_ENABLED = bool(os.getenv("ENABLE_TRACING"))
# Initialize tracing if enabled
if IS_TRACING_ENABLED:
# Create the exporter
otlp_exporter = OTLPSpanExporter(
endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
insecure=True,
)
# Set up tracing with the exporter
setup_tracing(
service_name="pipecat-demo",
exporter=otlp_exporter,
console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
)
logger.info("OpenTelemetry tracing initialized")
async def fetch_weather_from_api(params: FunctionCallParams):
await params.llm.push_frame(TTSSpeakFrame("Let me check on that."))
await params.result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"), params=OpenAILLMService.InputParams(temperature=0.5)
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
enable_tracing=IS_TRACING_ENABLED,
# Optionally, add a conversation ID to track the conversation
# conversation_id="8df26cc1-6db0-4a7a-9930-1e037c8f1fa2",
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
from run import main
main()

View File

@@ -1,10 +0,0 @@
DEEPGRAM_API_KEY=your_deepgram_key
CARTESIA_API_KEY=your_cartesia_key
OPENAI_API_KEY=your_openai_key
# Set to any value to enable tracing
ENABLE_TRACING=true
# OTLP endpoint (defaults to localhost:4317 if not set)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Set to any value to enable console output for debugging
# OTEL_CONSOLE_EXPORT=true

View File

@@ -1,6 +0,0 @@
fastapi
uvicorn
python-dotenv
pipecat-ai[webrtc,silero,cartesia,deepgram,openai,tracing]
pipecat-ai-small-webrtc-prebuilt
opentelemetry-exporter-otlp-proto-grpc

View File

@@ -1,82 +0,0 @@
# Langfuse Tracing for Pipecat
This demo showcases [Langfuse](https://langfuse.com) tracing integration for Pipecat services via OpenTelemetry, allowing you to visualize service calls, performance metrics, and dependencies with a focus on LLM observability.
Pipecat trace in Langfuse:
https://github.com/user-attachments/assets/13dd7431-bf5e-42e3-8d6d-2ed84c51195d
## Setup Instructions
### 1. Create a Langfuse Project and get API keys
[Self-host](https://langfuse.com/self-hosting) Langfuse or create a free [Langfuse Cloud](https://cloud.langfuse.com) account.
Create a new project and get the API keys.
### 2. Environment Configuration
Base64 encode your Langfuse public and secret key:
```bash
echo -n "pk-lf-1234567890:sk-lf-1234567890" | base64
```
Create a `.env` file with your API keys to enable tracing:
```
ENABLE_TRACING=true
# OTLP endpoint for Langfuse
OTEL_EXPORTER_OTLP_ENDPOINT=http://cloud.langfuse.com/api/public/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic%20<base64_encoded_api_key>
# Set to any value to enable console output for debugging
# OTEL_CONSOLE_EXPORT=true
# Service API keys
DEEPGRAM_API_KEY=your_key_here
CARTESIA_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Run the Demo
```bash
python bot.py
```
### 5. View Traces in Langfuse
Open your browser to [https://cloud.langfuse.com](https://cloud.langfuse.com) to view traces.
## Langfuse-Specific Configuration
In the `bot.py` file, note the HTTP exporter configuration:
```python
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# Create the exporter - configured from environment variables
otlp_exporter = OTLPSpanExporter()
# Set up tracing with the exporter
setup_tracing(
service_name="pipecat-demo",
exporter=otlp_exporter,
console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
)
```
## Troubleshooting
- **No Traces in Langfuse**: Ensure that your credentials are correct and follow this [troubleshooting guide](https://langfuse.com/faq/all/missing-traces)
- **Connection Errors**: Verify network connectivity to Langfuse
- **Authorization Issues**: Check that your base64 encoding is correct and the API keys are valid
## References
- [Langfuse OpenTelemetry Documentation](https://langfuse.com/docs/opentelemetry/get-started)

View File

@@ -1,158 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
from pipecat.utils.tracing.setup import setup_tracing
load_dotenv(override=True)
IS_TRACING_ENABLED = bool(os.getenv("ENABLE_TRACING"))
# Initialize tracing if enabled
if IS_TRACING_ENABLED:
# Create the exporter
otlp_exporter = OTLPSpanExporter()
# Set up tracing with the exporter
setup_tracing(
service_name="pipecat-demo",
exporter=otlp_exporter,
console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
)
logger.info("OpenTelemetry tracing initialized")
async def fetch_weather_from_api(params: FunctionCallParams):
await params.llm.push_frame(TTSSpeakFrame("Let me check on that."))
await params.result_callback({"conditions": "nice", "temperature": "75"})
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"), params=OpenAILLMService.InputParams(temperature=0.5)
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
enable_tracing=IS_TRACING_ENABLED,
# Optionally, add a conversation ID to track the conversation
# conversation_id="8df26cc1-6db0-4a7a-9930-1e037c8f1fa2",
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
from run import main
main()

View File

@@ -1,11 +0,0 @@
DEEPGRAM_API_KEY=your_deepgram_key
CARTESIA_API_KEY=your_cartesia_key
OPENAI_API_KEY=your_openai_key
# Set to any value to enable tracing
ENABLE_TRACING=true
# OTLP endpoint (change to us.cloud.langfuse.com if you use the US data region)
OTEL_EXPORTER_OTLP_ENDPOINT=http://cloud.langfuse.com/api/public/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic%20<base64_encoded_api_keys>
# Set to any value to enable console output for debugging
# OTEL_CONSOLE_EXPORT=true

View File

@@ -1,6 +0,0 @@
fastapi
uvicorn
python-dotenv
pipecat-ai[webrtc,silero,cartesia,deepgram,openai,tracing]
pipecat-ai-small-webrtc-prebuilt
opentelemetry-exporter-otlp-proto-http

View File

@@ -1,205 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import importlib.util
import os
import sys
from contextlib import asynccontextmanager
from inspect import iscoroutinefunction, signature
from typing import Any, Callable, Dict, Optional, Tuple
import uvicorn
from dotenv import load_dotenv
from fastapi import BackgroundTasks, FastAPI
from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.transports.network.webrtc_connection import IceServer, SmallWebRTCConnection
# Load environment variables
load_dotenv(override=True)
app = FastAPI()
# Store connections by pc_id
pcs_map: Dict[str, SmallWebRTCConnection] = {}
ice_servers = [
IceServer(
urls="stun:stun.l.google.com:19302",
)
]
# Mount the frontend at /
app.mount("/client", SmallWebRTCPrebuiltUI)
# Store program arguments
args: argparse.Namespace = argparse.Namespace()
# Store the bot module and function info
bot_module: Any = None
run_bot_func: Optional[Callable] = None
is_webrtc_bot: bool = True
def import_bot_file(file_path: str) -> Tuple[Any, Callable, bool]:
"""Dynamically import the bot file and determine how to run it.
Returns:
tuple: (module, run_function, is_webrtc_bot)
- module: The imported module
- run_function: Either run_bot or main function
- is_webrtc_bot: True if run_bot function exists and accepts a WebRTC connection
"""
if not os.path.exists(file_path):
raise FileNotFoundError(f"Bot file not found: {file_path}")
# Extract module name without extension
module_name = os.path.splitext(os.path.basename(file_path))[0]
# Load the module
spec = importlib.util.spec_from_file_location(module_name, file_path)
if not spec or not spec.loader:
raise ImportError(f"Could not load spec for {file_path}")
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
# Check for run_bot function first
if hasattr(module, "run_bot"):
run_func = module.run_bot
# Check if the function accepts a WebRTC connection
sig = signature(run_func)
is_webrtc = len(sig.parameters) > 0
return module, run_func, is_webrtc
# Fall back to main function
if hasattr(module, "main") and iscoroutinefunction(module.main):
return module, module.main, False
raise AttributeError(f"No run_bot or async main function found in {file_path}")
@app.get("/", include_in_schema=False)
async def root_redirect():
return RedirectResponse(url="/client/")
@app.post("/api/offer")
async def offer(request: dict, background_tasks: BackgroundTasks):
global run_bot_func, is_webrtc_bot
if not run_bot_func:
raise RuntimeError("No bot file has been loaded")
if not is_webrtc_bot:
return {
"error": "This bot doesn't support WebRTC connections, it's running in standalone mode"
}
pc_id = request.get("pc_id")
if pc_id and pc_id in pcs_map:
pipecat_connection = pcs_map[pc_id]
logger.info(f"Reusing existing connection for pc_id: {pc_id}")
await pipecat_connection.renegotiate(
sdp=request["sdp"], type=request["type"], restart_pc=request.get("restart_pc", False)
)
else:
pipecat_connection = SmallWebRTCConnection(ice_servers)
await pipecat_connection.initialize(sdp=request["sdp"], type=request["type"])
@pipecat_connection.event_handler("closed")
async def handle_disconnected(webrtc_connection: SmallWebRTCConnection):
logger.info(f"Discarding peer connection for pc_id: {webrtc_connection.pc_id}")
pcs_map.pop(webrtc_connection.pc_id, None)
# We've already checked that run_bot_func exists
assert run_bot_func is not None
background_tasks.add_task(run_bot_func, pipecat_connection, args)
answer = pipecat_connection.get_answer()
# Updating the peer connection inside the map
pcs_map[answer["pc_id"]] = pipecat_connection
return answer
@asynccontextmanager
async def lifespan(app: FastAPI):
yield # Run app
coros = [pc.close() for pc in pcs_map.values()]
await asyncio.gather(*coros)
pcs_map.clear()
async def run_standalone_bot() -> None:
"""Run a standalone bot that doesn't require WebRTC"""
global run_bot_func
if run_bot_func is not None:
await run_bot_func()
else:
raise RuntimeError("No bot function available to run")
def main(parser: Optional[argparse.ArgumentParser] = None):
global args
if not parser:
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("bot_file", nargs="?", help="Path to the bot file", default=None)
parser.add_argument(
"--host", default="localhost", help="Host for HTTP server (default: localhost)"
)
parser.add_argument(
"--port", type=int, default=7860, help="Port for HTTP server (default: 7860)"
)
parser.add_argument("--verbose", "-v", action="count", default=0)
args = parser.parse_args()
logger.remove(0)
if args.verbose:
logger.add(sys.stderr, level="TRACE")
else:
logger.add(sys.stderr, level="DEBUG")
# Infer the bot file from the caller if not provided explicitly
bot_file = args.bot_file
if bot_file is None:
# Get the __file__ of the script that called main()
import inspect
caller_frame = inspect.stack()[1]
caller_globals = caller_frame.frame.f_globals
bot_file = caller_globals.get("__file__")
if not bot_file:
print("❌ Could not determine the bot file. Pass it explicitly to main().")
sys.exit(1)
# Import the bot file
try:
global run_bot_func, bot_module, is_webrtc_bot
bot_module, run_bot_func, is_webrtc_bot = import_bot_file(bot_file)
logger.info(f"Successfully loaded bot from {bot_file}")
if is_webrtc_bot:
logger.info("Detected WebRTC-compatible bot, starting web server...")
uvicorn.run(app, host=args.host, port=args.port)
else:
logger.info("Detected standalone bot, running directly...")
asyncio.run(run_standalone_bot())
except Exception as e:
logger.error(f"Error loading bot file: {e}")
sys.exit(1)
if __name__ == "__main__":
main()

View File

@@ -16,7 +16,7 @@
"@types/node": "^22.13.1",
"@vitejs/plugin-react-swc": "^3.7.2",
"typescript": "^5.7.3",
"vite": "^6.3.5"
"vite": "^6.0.2"
}
},
"node_modules/@babel/runtime": {
@@ -1371,9 +1371,9 @@
}
},
"node_modules/vite": {
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -15,7 +15,7 @@
"@types/node": "^22.13.1",
"@vitejs/plugin-react-swc": "^3.7.2",
"typescript": "^5.7.3",
"vite": "^6.3.5"
"vite": "^6.0.2"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.2",

View File

@@ -13,7 +13,7 @@
"@pipecat-ai/daily-transport": "^0.3.8"
},
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
}
},
"node_modules/@babel/runtime": {
@@ -1114,9 +1114,9 @@
}
},
"node_modules/vite": {
"version": "6.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.5.tgz",
"integrity": "sha512-cZn6NDFE7wdTpINgs++ZJ4N49W2vRp8LCKrn3Ob1kYNtOo21vfDoaV5GzBfLU4MovSAB8uNRm4jgzVQZ+mBzPQ==",
"version": "6.3.3",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.3.3.tgz",
"integrity": "sha512-5nXH+QsELbFKhsEfWLkHrvgRpTdGJzqOZ+utSdmPTvwHmvU6ITTm3xx+mRusihkcI8GeC7lCDyn3kDtiki9scw==",
"dev": true,
"dependencies": {
"esbuild": "^0.25.0",

View File

@@ -12,7 +12,7 @@
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
"vite": "^6.0.9"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",

View File

@@ -1,5 +0,0 @@
ios
android
node_modules
.expo
.env

View File

@@ -1 +0,0 @@
22.14

View File

@@ -1,21 +0,0 @@
// Disabling the logs from react-native-webrtc
import debug from 'debug';
debug.disable('rn-webrtc:*');
// Ignoring the warnings from react-native-background-timer while they don't fix this issue:
// https://github.com/ocetnik/react-native-background-timer/issues/366
import { LogBox } from 'react-native';
LogBox.ignoreLogs([
"`new NativeEventEmitter()` was called with a non-null argument without the required `addListener` method.",
"`new NativeEventEmitter()` was called with a non-null argument without the required `removeListeners` method."
]);
// Enable debug logs
/*window.localStorage = window.localStorage || {};
window.localStorage.debug = '*';
window.localStorage.getItem = (itemName) => {
console.log('Requesting the localStorage item ', itemName);
return window.localStorage[itemName];
};*/
export { default } from './src/App';

View File

@@ -1,89 +0,0 @@
# React Native implementation
Basic implementation using the [Pipecat RN SDK](https://docs.pipecat.ai/client/react-native/introduction).
## Usage
### Expo requirements
This project cannot be used with an [Expo Go](https://docs.expo.dev/workflow/expo-go/) app because [it requires custom native code](https://docs.expo.io/workflow/customizing/).
When a project requires custom native code or a config plugin, we need to transition from using [Expo Go](https://docs.expo.dev/workflow/expo-go/)
to a [development build](https://docs.expo.dev/development/introduction/).
More details about the custom native code used by this demo can be found in [rn-daily-js-expo-config-plugin](https://github.com/daily-co/rn-daily-js-expo-config-plugin).
### Building remotely
If you do not have experience with Xcode and Android Studio builds or do not have them installed locally on your computer, you will need to follow [this guide from Expo to use EAS Build](https://docs.expo.dev/development/create-development-builds/#create-and-install-eas-build).
### Building locally
You will need to have installed locally on your computer:
- [Xcode](https://developer.apple.com/xcode/) to build for iOS;
- [Android Studio](https://developer.android.com/studio) to build for Android;
#### Install the demo dependencies
```bash
# Use the version of node specified in .nvmrc
nvm i
# Install dependencies
yarn install
# Before a native app can be compiled, the native source code must be generated.
npx expo prebuild
```
#### Running on Android
After plugging in an Android device [configured for debugging](https://developer.android.com/studio/debug/dev-options), run the following command:
```
npm run android
```
#### Running on iOS
First, you'll need to do a one-time setup. This is required to build to a physical device.
If you're familiar with Xcode, open `ios/RNSimpleChatbot.xcworkspace` and, in the target settings, provide a development team registered with Apple.
If you're newer to Xcode, here are some more detailed instructions to get you started.
First, open the project in Xcode. Make sure to specifically select `RNSimpleChatbot.xcworkspace` from `/ios`. The `/ios` directory will have been generated by running `npx expo prebuild` as instructed above. This is also a good time to plug in your iOS device to be sure the following steps are successful.
From the main menu, select `Settings` and then `Accounts`. Click the `+` sign to add an account (e.g. an Apple ID).
![xcode-accounts.png](./docsAssets/xcode-accounts.png)
Once an account is added, perform the following steps:
1. Close `Settings`.
1. Select the folder icon in the top left corner.
1. Select `RNSimpleChatbot` from the side panel
1. Navigate to `Signing & Capabilities` in the top nav bar.
1. Open the "Team" dropdown
1. Select the account added in the previous step.
The "Signing Certificate" section should update accordingly with your account information.
![xcode-signing.png](./docsAssets/xcode-signing.png)
**Troubleshooting common errors:**
- If you see the error `Change your bundle identifier to a unique string to try again`, update the "Bundle Identifier" input in `Signing & Capabilities` to make it unique. This should resolve the error.
- If you see an error that says `Xcode was unable to launch because it has an invalid code signature, inadequate entitlements or its profile has not been explicitly trusted by the user`, you may need to update the settings on your iPhone to enable the required permissions as follows:
1. Open `Settings` on your iPhone
1. Select `General`, then `Device Management`
1. Click `Trust` for DailyPlayground
- You may also be prompted to enter you login keychain password. Be sure to click `Always trust` to avoid the prompt showing multiple times.
After, run the following command:
```
npm run ios
```

View File

@@ -1,75 +0,0 @@
{
"expo": {
"name": "RN Simple Chatbot",
"slug": "simple-chatbot-demo",
"newArchEnabled": false,
"version": "1.0.0",
"orientation": "portrait",
"icon": "./assets/images/pipecat.png",
"userInterfaceStyle": "light",
"splash": {
"image": "./assets/images/splash.png",
"resizeMode": "contain",
"backgroundColor": "#ffffff"
},
"updates": {
"fallbackToCacheTimeout": 0
},
"assetBundlePatterns": [
"**/*"
],
"ios": {
"supportsTablet": true,
"bitcode": false,
"bundleIdentifier": "co.daily.SimpleChatbot",
"infoPlist": {
"UIBackgroundModes": [
"voip"
]
}
},
"android": {
"adaptiveIcon": {
"foregroundImage": "./assets/images/pipecat.png",
"backgroundColor": "#FFFFFF"
},
"package": "co.daily.SimpleChatbot",
"permissions": [
"android.permission.ACCESS_NETWORK_STATE",
"android.permission.BLUETOOTH",
"android.permission.CAMERA",
"android.permission.INTERNET",
"android.permission.MODIFY_AUDIO_SETTINGS",
"android.permission.RECORD_AUDIO",
"android.permission.SYSTEM_ALERT_WINDOW",
"android.permission.WAKE_LOCK",
"android.permission.FOREGROUND_SERVICE",
"android.permission.FOREGROUND_SERVICE_CAMERA",
"android.permission.FOREGROUND_SERVICE_MICROPHONE",
"android.permission.FOREGROUND_SERVICE_MEDIA_PROJECTION",
"android.permission.POST_NOTIFICATIONS"
]
},
"web": {
"favicon": "./assets/images/pipecat.png"
},
"plugins": [
"@config-plugins/react-native-webrtc",
"@daily-co/config-plugin-rn-daily-js",
[
"expo-build-properties",
{
"android": {
"minSdkVersion": 24,
"compileSdkVersion": 35,
"targetSdkVersion": 35,
"buildToolsVersion": "35.0.0"
},
"ios": {
"deploymentTarget": "15.1"
}
}
]
]
}
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.7 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 29 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 46 KiB

Some files were not shown because too many files have changed in this diff Show More