Compare commits

..

2 Commits

Author SHA1 Message Date
Chad Bailey
0369733f9c explicit MetadataFrame 2025-01-31 19:27:01 +00:00
Chad Bailey
74b85a450f wip 2025-01-31 19:27:01 +00:00
287 changed files with 2966 additions and 26666 deletions

View File

@@ -1,54 +0,0 @@
name: coverage
on:
workflow_dispatch:
push:
branches:
- main
pull_request:
branches:
- "**"
paths-ignore:
- "docs/**"
jobs:
coverage:
name: "Coverage"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing dev-requirements.txt and test-requirements.txt which
# contain all dependencies needed to run the tests.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('test-requirements.txt') }}
path: .venv
- name: Install system packages
id: install_system_packages
run: |
sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt -r test-requirements.txt
- name: Run tests with coverage
run: |
source .venv/bin/activate
coverage run
coverage xml
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
slug: pipecat-ai/pipecat

15
.gitignore vendored
View File

@@ -32,21 +32,6 @@ fly.toml
# Example files
pipecat/examples/twilio-chatbot/templates/streams.xml
pipecat/examples/bot-ready-signalling/client/react-native/node_modules/
pipecat/examples/bot-ready-signalling/client/react-native/.expo/
pipecat/examples/bot-ready-signalling/client/react-native/dist/
pipecat/examples/bot-ready-signalling/client/react-native/npm-debug.*
pipecat/examples/bot-ready-signalling/client/react-native/*.jks
pipecat/examples/bot-ready-signalling/client/react-native/*.p8
pipecat/examples/bot-ready-signalling/client/react-native/*.p12
pipecat/examples/bot-ready-signalling/client/react-native/*.key
pipecat/examples/bot-ready-signalling/client/react-native/*.mobileprovision
pipecat/examples/bot-ready-signalling/client/react-native/*.orig.*
pipecat/examples/bot-ready-signalling/client/react-native/web-build/
# macOS
.DS_Store
# Documentation
docs/api/_build/

View File

@@ -9,321 +9,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Pipecat version will now be logged on every application startup. This will
help us identify what version we are running in case of any issues.
- Added a new `StopFrame` which can be used to stop a pipeline task while
keeping the frame processors running. The frame processors could then be used
in a different pipeline. The difference between a `StopFrame` and a
`StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the
`StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream
inside the pipeline by any processor.
- Added a new `PipelineTask` parameter `observers` that replaces the previous
`PipelineParams.observers`.
- Added a new `PipelineTask` parameter `check_dangling_tasks` to enable or
disable checking for frame processors' dangling tasks when the Pipeline
finishes running.
- Added new `on_completion_timeout` event for LLM services (all OpenAI-based
services, Anthropic and Google). Note that this event will only get triggered
if LLM timeouts are setup and if the timeout was reached. It can be useful to
retrigger another completion and see if the timeout was just a blip.
- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that
can be useful for debugging your pipelines.
- Added `room_url` property to `DailyTransport`.
- Added `addons` argument to `DeepgramSTTService`.
- Added `exponential_backoff_time()` to `utils.network` module.
### Changed
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
the pipeline).
- The base `TTSService` class now strips leading newlines before sending text
to the TTS provider. This change is to solve issues where some TTS providers,
like Azure, would not output text due to newlines.
- `GrokLLMSService` now uses `grok-2` as the default model.
- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default
model.
- `RimeHttpTTSService` needs an `aiohttp.ClientSession` to be passed to the
constructor as all the other HTTP-based services.
- `RimeHttpTTSService` doesn't use a default voice anymore.
- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want
to use the previous model you can pass `LiveOptions(model="nova-2-general")`.
(see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)
```python
stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
```
### Deprecated
- `PipelineParams.observers` is now deprecated, you the new `PipelineTask`
parameter `observers`.
### Removed
- Remove `TransportParams.audio_out_is_live` since it was not being used at all.
### Fixed
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
disconnect from the TTS service before audio from all the contexts was
received. This affected services like Cartesia and Rime.
- Fixed an issue that was not allowing to pass an `OpenAILLMContext` to create
`GoogleLLMService`'s context aggregators.
- Fixed a `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService` and
`PlayHTTTSService` issue that was resulting in audio requested before an
interruption being played after an interruption.
- Fixed `match_endofsentence` support for ellipses.
- Fixed an issue that would cause undesired interruptions via
`EmulateUserStartedSpeakingFrame` when only interim transcriptions (i.e. no
final transcriptions) where received.
- Fixed an issue where `EndTaskFrame` was not triggering
`on_client_disconnected` or closing the WebSocket in FastAPI.
- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the
`LiveOptions` was not being used, causing the service to use the default
sample rate of pipeline.
- Fixed a context aggregator issue that would not append the LLM text response
to the context if a function call happened in the same LLM turn.
- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame`
more than once.
- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being
pushed.
- Fixed an issue that `start_callback` was not invoked for some LLM services.
- Fixed an issue that would cause `DeepgramSTTService` to stop working after an
error occurred (e.g. sudden network loss). If the network recovered we would
not reconnect.
- Fixed a `STTMuteFilter` issue that would not mute user audio frames causing
transcriptions to be generated by the STT service.
### Other
- Added Gemini support to `examples/phone-chatbot`.
## [0.0.57] - 2025-02-14
### Added
- Added new `AudioContextWordTTSService`. This is a TTS base class for TTS
services that handling multiple separate audio requests.
- Added new frames `EmulateUserStartedSpeakingFrame` and
`EmulateUserStoppedSpeakingFrame` which can be used to emulated VAD behavior
without VAD being present or not being triggered.
- Added a new `audio_in_stream_on_start` field to `TransportParams`.
- Added a new method `start_audio_in_streaming` in the `BaseInputTransport`.
- This method should be used to start receiving the input audio in case the
field `audio_in_stream_on_start` is set to `false`.
- Added support for the `RTVIProcessor` to handle buffered audio in `base64`
format, converting it into InputAudioRawFrame for transport.
- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming`
only after the `client-ready` message.
- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This
strategy starts muted and remains muted until the first bot speech completes,
ensuring the bot's first response cannot be interrupted. This complements the
existing `FIRST_SPEECH` strategy which only mutes during the first detected
bot speech.
- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.
- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational
example `07q-interruptible-rime.py` to use `RimeTTSService`.
- Added support for Groq's Whisper API through the new `GroqSTTService` and
OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new
base class `BaseWhisperSTTService` to handle common Whisper API
functionality.
- Added `PerplexityLLMService` for Perplexity NIM API integration, with an
OpenAI-compatible interface. Also, added foundational example
`14n-function-calling-perplexity.py`.
- Added `DailyTransport.update_remote_participants()`. This allows you to update
remote participant's settings, like their permissions or which of their
devices are enabled. Requires that the local participant have participant
admin permission.
### Changed
- We don't consider a colon `:` and end of sentence any more.
- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field,
ensuring it only starts receiving the audio input if it is enabled.
- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and
`TransportMessageUrgentFrame` to the serializer.
- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and
`TransportMessageUrgentFrame` to the serializer.
- Enhanced `STTMuteConfig` to validate strategy combinations, preventing
`MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together
as they handle first bot speech differently.
- Updated foundational example `07n-interruptible-google.py` to use all Google
services.
- `RimeHttpTTSService` now uses the `mistv2` model by default.
- Improved error handling in `AzureTTSService` to properly detect and log
synthesis cancellation errors.
- Enhanced `WhisperSTTService` with full language support and improved model
documentation.
- Updated foundation example `14f-function-calling-groq.py` to use
`GroqSTTService` for transcription.
- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default
model.
- `RTVIObserver` doesn't handle `LLMSearchResponseFrame` frames anymore. For
now, to handle those frames you need to create a `GoogleRTVIObserver`
instead.
### Deprecated
- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and
will be removed in a future version. The filter now manages mute state
internally instead of querying the STT service.
- `RTVI.observer()` is now deprecated, instantiate an `RTVIObserver` directly
instead.
- All RTVI frame processors (e.g. `RTVISpeakingProcessor`,
`RTVIBotLLMProcessor`) are now deprecated, instantiate an `RTVIObserver`
instead.
### Fixed
- Fixed a `FalImageGenService` issue that was causing the event loop to be
blocked while loading the downloadded image.
- Fixed a `CartesiaTTSService` service issue that would cause audio overlapping
in some cases.
- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was
preventing a reconnection after the server disconnected cleanly, which was
causing an inifite loop instead.
- Fixed a `BaseOutputTransport` issue that was causing upstream frames to no be
pushed upstream.
- Fixed multiple issue where user transcriptions where not being handled
properly. It was possible for short utterances to not trigger VAD which would
cause user transcriptions to be ignored. It was also possible for one or more
transcriptions to be generated after VAD in which case they would also be
ignored.
- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too
late. This could then cause issues unblocking `STTMuteFilter` later than
desired.
- Fixed an issue that was causing `AudioBufferProcessor` to not record
synchronized audio.
- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent
before being processed by the output transport.
- Fixed an issue[#1192] in 11labs where we are trying to reconnect/disconnect
the websocket connection even when the connection is already closed.
- Fixed an issue where `has_regular_messages` condition was always true in
`GoogleLLMContext` due to `Part` having `function_call` & `function_response`
with `None` values.
### Other
- Added new `instant-voice` example. This example showcases how to enable
instant voice communication as soon as a user connects.
- Added new `local-input-select-stt` example. This examples allows you to play
with local audio inputs by slecting them through a nice text interface.
## [0.0.56] - 2025-02-06
### Changed
- Use `gemini-2.0-flash-001` as the default model for `GoogleLLMSerivce`.
- Improved foundational examples 22b, 22c, and 22d to support function calling.
With these base examples, `FunctionCallInProgressFrame` and
`FunctionCallResultFrame` will no longer be blocked by the gates.
### Fixed
- Fixed a `TkLocalTransport` and `LocalAudioTransport` issues that was causing
errors on cleanup.
- Fixed an issue that was causing `tests.utils` import to fail because of
logging setup.
- Fixed a `SentryMetrics` issue that was preventing any metrics to be sent to
Sentry and also was preventing from metrics frames to be pushed to the
pipeline.
- Fixed an issue in `BaseOutputTransport` where incoming audio would not be
resampled to the desired output sample rate.
- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer`
where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly
initialized to `audio_in_sample_rate`. Those values currently default to 8000
and should be set manually from the serializer constructor if a different
value is needed.
### Other
- Added a new `sentry-metrics` example.
## [0.0.55] - 2025-02-05
### Added
- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
will be set to the initial `StartFrame` being pushed from the `PipelineTask`.
- Added new fields to `PipelineParams` to control audio input and output sample
rates for the whole pipeline. This allows controlling sample rates from a
single place instead of having to specify sample rates in each
service. Setting a sample rate to a service is still possible and will
override the value from `PipelineParams`.
- Introduce audio resamplers (`BaseAudioResampler`). This is just a base class
to implement audio resamplers. Currently, two implementations are provided
`SOXRAudioResampler` and `ResampyResampler`. A new
`create_default_resampler()` has been added (replacing the now deprecated
`resample_audio()`).
- It is now possible to specify the asyncio event loop that a `PipelineTask` and
all the processors should run on by passing it as a new argument to the
`PipelineRunner`. This could allow running pipelines in multiple threads each
@@ -356,12 +41,6 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
### Changed
- `GatedOpenAILLMContextAggregator` now require keyword arguments. Also, a new
`start_open` argument has been added to set the initial state of the gate.
- Added `organization` and `project` level authentication to
`OpenAILLMService`.
- Improved the language checking logic in `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService` to properly handle language codes based on model
compatibility, with appropriate warnings when language codes cannot be
@@ -371,30 +50,8 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
contain a combination of function calls, function call responses, system
messages, or just messages.
- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
`OutputDTMFFrame` frame.
### Deprecated
- `resample_audio()` is now deprecated, use `create_default_resampler()`
instead.
### Removed
- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use
`AudioBufferProcessor.start_recording()` and
`AudioBufferProcessor.stop_recording()` instead.
### Fixed
- Fixed a `AudioBufferProcessor` that would cause crackling in some recordings.
- Fixed an issue in `AudioBufferProcessor` where user callback would not be
called on task cancellation.
- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
padding in some cases.
- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
websocket error by increasing the max message size limit to 16MB.
@@ -410,22 +67,11 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
### Other
- Improved Unit Test `run_test()` to use `PipelineTask` and
`PipelineRunner`. There's now also some control around `StartFrame` and
`EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
necessary with this new approach.
- Updated `twilio-chatbot` with a few new features: use 8000 sample rate and
avoid resampling, a new client useful for stress testing and testing locally
without the need to make phone calls. Also, added audio recording on both the
client and the server to make sure the audio sounds good.
- Updated examples to use `task.cancel()` to immediately exit the example when a
participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
an `EndFrame` causes the bot to run through everything that is internally
queued (which could take some seconds). Note that using `task.cancel()` might
not always be the best option and pushing an `EndFrame` could still be
desirable to make sure all the pipeline is flushed.
- Updated all examples to use `task.cancel()` instead of pushing an `EndFrame`
when a participant leaves/disconnects. If you push an `EndFrame` this will
cause the bot to run through everything that is internally queued (which could
take seconds). Instead, if a participant disconnects there is nothing else to
be sent and therefore we should stop immediately.
## [0.0.54] - 2025-01-27
@@ -449,8 +95,7 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.
- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
to a custom AWS bucket.
- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings to a custom AWS bucket.
### Changed
@@ -1875,9 +1520,6 @@ async def on_connected(processor):
### Changed
- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now
`async`.
- `Filter` has been renamed to `FrameFilter` and it's now under
`processors/filters`.

View File

@@ -2,7 +2,7 @@
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div></h1>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
@@ -55,17 +55,17 @@ pip install "pipecat-ai[option,...]"
### Available services
| Category | Services | Install Command Example |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
| Category | Services | Install Command Example |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[openai]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -149,40 +149,36 @@ Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://
## Hacking on the framework itself
_Note: You may need to set up a virtual environment before following these instructions. From the root of the repo:_
_Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:_
```shell
python3 -m venv venv
source venv/bin/activate
```
Install the development dependencies:
From the root of this repo, run the following:
```shell
pip install -r dev-requirements.txt
```
Install the git pre-commit hooks (these help ensure your code follows project rules):
This will install the necessary development dependencies. Also, make sure you install the git pre-commit hooks:
```shell
pre-commit install
```
Install the `pipecat-ai` package locally in editable mode:
The hooks will just save you time when you submit a PR by making sure your code follows the project rules.
To use the package locally (e.g. to run sample files), run:
```shell
pip install -e .
pip install --editable ".[option,...]"
```
The `-e` or `--editable` option allows you to modify the code without reinstalling.
The `--editable` option makes sure you don't have to run `pip install` again and you can just edit the project files locally.
To include optional dependencies, add them to the install command. For example:
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]" # Updated for the services you're using
```
If you want to use this package from another directory:
If you want to use this package from another directory, you can run:
```shell
pip install "path_to_this_repo[option,...]"

View File

@@ -1,11 +0,0 @@
coverage:
range: 50..90 # coverage lower than 50 is red, higher than 90 green, between color code
status:
project:
default:
target: auto # auto % coverage target
threshold: 5% # allow for 5% reduction of coverage without failing
# do not run coverage on patch nor changes
patch: false

View File

@@ -1,12 +1,11 @@
build~=1.2.2
coverage~=7.6.12
grpcio-tools~=1.67.1
grpcio-tools~=1.69.0
pip-tools~=7.4.1
pre-commit~=4.0.1
pyright~=1.1.394
pyright~=1.1.392
pytest~=8.3.4
pytest-asyncio~=0.25.3
ruff~=0.9.7
setuptools~=70.0.0
pytest-asyncio~=0.25.2
ruff~=0.9.1
setuptools~=75.8.0
setuptools_scm~=8.1.0
python-dotenv~=1.0.1

View File

@@ -18,9 +18,6 @@ AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Cartesia
CARTESIA_API_KEY=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...

View File

@@ -12,7 +12,7 @@
"@daily-co/daily-js": "0.74.0"
},
"devDependencies": {
"vite": "^6.0.9"
"vite": "^6.0.2"
}
},
"node_modules/@babel/runtime": {
@@ -1007,14 +1007,15 @@
}
},
"node_modules/vite": {
"version": "6.1.0",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.1.0.tgz",
"integrity": "sha512-RjjMipCKVoR4hVfPY6GQTgveinjNuyLw+qruksLDvA5ktI1150VmcMBKmQaEWJhg/j6Uaf6dNCNA0AfdzUb/hQ==",
"version": "6.0.7",
"resolved": "https://registry.npmjs.org/vite/-/vite-6.0.7.tgz",
"integrity": "sha512-RDt8r/7qx9940f8FcOIAH9PTViRrghKaK2K1jY3RaAURrEUbm9Du1mJ72G+jlhtG3WwodnfzY8ORQZbBavZEAQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"esbuild": "^0.24.2",
"postcss": "^8.5.1",
"rollup": "^4.30.1"
"postcss": "^8.4.49",
"rollup": "^4.23.0"
},
"bin": {
"vite": "bin/vite.js"

View File

@@ -12,7 +12,7 @@
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.0.9"
"vite": "^6.0.2"
},
"dependencies": {
"@daily-co/daily-js": "0.74.0"

View File

@@ -1,60 +0,0 @@
# React Native Implementation
Basic implementation using the [Pipecat React Native SDK](https://docs.pipecat.ai/client/react-native/introduction).
## Usage
### Expo requirements
This project cannot be used with an [Expo Go](https://docs.expo.dev/workflow/expo-go/) app because [it requires custom native code](https://docs.expo.io/workflow/customizing/).
When a project requires custom native code or a config plugin, we need to transition from using [Expo Go](https://docs.expo.dev/workflow/expo-go/)
to a [development build](https://docs.expo.dev/development/introduction/).
More details about the custom native code used by this demo can be found in [rn-daily-js-expo-config-plugin](https://github.com/daily-co/rn-daily-js-expo-config-plugin).
### Building remotely
If you do not have experience with Xcode and Android Studio builds or do not have them installed locally on your computer, you will need to follow [this guide from Expo to use EAS Build](https://docs.expo.dev/development/create-development-builds/#create-and-install-eas-build).
### Building locally
You will need to have installed locally on your computer:
- [Xcode](https://developer.apple.com/xcode/) to build for iOS;
- [Android Studio](https://developer.android.com/studio) to build for Android;
#### Install the demo dependencies
```bash
# Use the version of node specified in .nvmrc
nvm i
# Install dependencies
npm i
# Before a native app can be compiled, the native source code must be generated.
npx expo prebuild
# Configure the environment variable to connect to the local server
cp env.example .env
# edit .env and add your local ip address, for example: http://192.168.1.16:7860
```
#### Running on Android
After plugging in an Android device [configured for debugging](https://developer.android.com/studio/debug/dev-options), run the following command:
```
npm run android
```
#### Running on iOS
Run the following command:
```
npm run ios
```
#### Connect to the server
Use the http://localhost:5173 in your app.

View File

@@ -1,75 +0,0 @@
{
"expo": {
"name": "bot-ready-rn",
"slug": "bot-ready-rn",
"version": "1.0.0",
"orientation": "portrait",
"icon": "./assets/icon.png",
"userInterfaceStyle": "light",
"splash": {
"image": "./assets/splash.png",
"resizeMode": "contain",
"backgroundColor": "#ffffff"
},
"updates": {
"fallbackToCacheTimeout": 0
},
"assetBundlePatterns": [
"**/*"
],
"ios": {
"supportsTablet": true,
"bitcode": false,
"bundleIdentifier": "co.daily.expo.BotReady",
"infoPlist": {
"UIBackgroundModes": [
"voip"
]
},
"appleTeamId": "EEBGKV9N3N"
},
"android": {
"adaptiveIcon": {
"foregroundImage": "./assets/adaptive-icon.png",
"backgroundColor": "#FFFFFF"
},
"package": "co.daily.expo.BotReady",
"permissions": [
"android.permission.ACCESS_NETWORK_STATE",
"android.permission.BLUETOOTH",
"android.permission.CAMERA",
"android.permission.INTERNET",
"android.permission.MODIFY_AUDIO_SETTINGS",
"android.permission.RECORD_AUDIO",
"android.permission.SYSTEM_ALERT_WINDOW",
"android.permission.WAKE_LOCK",
"android.permission.FOREGROUND_SERVICE",
"android.permission.FOREGROUND_SERVICE_CAMERA",
"android.permission.FOREGROUND_SERVICE_MICROPHONE",
"android.permission.FOREGROUND_SERVICE_MEDIA_PROJECTION",
"android.permission.POST_NOTIFICATIONS"
]
},
"web": {
"favicon": "./assets/favicon.png"
},
"plugins": [
"@config-plugins/react-native-webrtc",
"@daily-co/config-plugin-rn-daily-js",
[
"expo-build-properties",
{
"android": {
"minSdkVersion": 24,
"compileSdkVersion": 35,
"targetSdkVersion": 34,
"buildToolsVersion": "35.0.0"
},
"ios": {
"deploymentTarget": "15.1"
}
}
]
]
}
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.4 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 46 KiB

View File

@@ -1,7 +0,0 @@
module.exports = function(api) {
api.cache(true);
return {
presets: ['babel-preset-expo'],
plugins: [["module:react-native-dotenv"]],
};
};

View File

@@ -1 +0,0 @@
API_BASE_URL=http://YOUR_LOCAL_IP:7860

View File

@@ -1,7 +0,0 @@
import { registerRootComponent } from "expo";
import App from "./src/App";
// registerRootComponent calls AppRegistry.registerComponent('main', () => App);
// It also ensures that the environment is set up appropriately
registerRootComponent(App);

View File

@@ -1,4 +0,0 @@
// Learn more https://docs.expo.io/guides/customizing-metro
const { getDefaultConfig } = require('expo/metro-config');
module.exports = getDefaultConfig(__dirname);

File diff suppressed because it is too large Load Diff

View File

@@ -1,31 +0,0 @@
{
"name": "bot-ready-rn",
"version": "1.0.0",
"scripts": {
"start": "expo start --dev-client",
"android": "expo run:android --device",
"ios": "expo run:ios --device",
"web": "expo start --web"
},
"dependencies": {
"@config-plugins/react-native-webrtc": "^10.0.0",
"@daily-co/config-plugin-rn-daily-js": "0.0.7",
"@daily-co/react-native-daily-js": "^0.70.0",
"@daily-co/react-native-webrtc": "^118.0.3-daily.2",
"@react-native-async-storage/async-storage": "1.23.1",
"expo": "^52.0.0",
"expo-build-properties": "~0.13.1",
"expo-dev-client": "~5.0.5",
"expo-splash-screen": "~0.29.16",
"expo-status-bar": "~2.0.0",
"react": "18.3.1",
"react-native": "0.76.3",
"react-native-background-timer": "^2.4.1",
"react-native-dotenv": "^3.4.11",
"react-native-get-random-values": "^1.11.0"
},
"devDependencies": {
"@babel/core": "^7.12.9"
},
"private": true
}

View File

@@ -1,121 +0,0 @@
import React, { useState, useEffect } from 'react';
import {SafeAreaView, View, Text, Button, StyleSheet, ScrollView} from 'react-native';
import Daily from "@daily-co/react-native-daily-js";
import { API_BASE_URL } from "@env";
const CallScreen = () => {
const [connectionStatus, setConnectionStatus] = useState('Disconnected');
const [isConnected, setIsConnected] = useState(false);
const [callObject, setCallObject] = useState(null);
const [logs, setLogs] = useState([]);
useEffect(() => {
if (callObject) {
setupTrackListeners(callObject);
}
}, [callObject]);
const log = (message) => {
setLogs((prevLogs) => [...prevLogs, `${new Date().toISOString()} - ${message}`]);
console.log(message);
};
const setupTrackListeners = (callObject) => {
callObject.on("joined-meeting", () => {
setConnectionStatus('Connected');
setIsConnected(true);
log('Client connected');
});
callObject.on("left-meeting", () => {
setConnectionStatus('Disconnected');
setIsConnected(false);
log('Client disconnected');
});
callObject.on("participant-left", () => {
// When the bot leaves, we are also disconnecting from the call
disconnect().catch((err) => {
log(`Failed to disconnect ${err}`);
})
});
// Trigger so the bot can start sending audio
callObject.on("track-started", (evt) => {
if (evt.track.kind === "audio" && evt.participant.local === false) {
handleEventToConsole(evt)
log("Sending the message that will trigger the bot to play the audio.")
callObject.sendAppMessage("playable")
}
});
callObject.on("error", (evt) => log(`Error: ${evt.error}`));
// Other events just for awareness
callObject.on("track-stopped", handleEventToConsole);
callObject.on("participant-joined", handleEventToConsole);
callObject.on("participant-updated", handleEventToConsole);
};
const handleEventToConsole = (evt) => {
log(`Received event: ${evt.action}`);
};
const connect = async () => {
try {
const callObject = Daily.createCallObject({ subscribeToTracksAutomatically: true });
setCallObject(callObject);
const connectionUrl = `${API_BASE_URL}/connect`
const res = await fetch(connectionUrl, { method: "POST", headers: { "Content-Type": "application/json" } });
const roomInfo = await res.json();
await callObject.join({ url: roomInfo.room_url });
} catch (error) {
log(`Error connecting: ${error.message}`);
}
};
const disconnect = async () => {
if (callObject) {
try {
await callObject.leave();
await callObject.destroy();
setCallObject(null);
} catch (error) {
log(`Error disconnecting: ${error.message}`);
}
}
};
return (
<SafeAreaView style={styles.safeArea}>
<View style={styles.container}>
<View style={styles.statusBar}>
<Text>Status: <Text style={styles.status}>{connectionStatus}</Text></Text>
<View style={styles.controls}>
<Button
title={isConnected ? "Disconnect" : "Connect"}
onPress={isConnected ? disconnect : connect}
/>
</View>
</View>
<View style={styles.debugPanel}>
<Text style={styles.debugTitle}>Debug Info</Text>
<ScrollView style={styles.debugLog}>
{logs.map((logEntry, index) => (
<Text key={index} style={styles.logText}>{logEntry}</Text>
))}
</ScrollView>
</View>
</View>
</SafeAreaView>
);
};
const styles = StyleSheet.create({
safeArea: { flex: 1, backgroundColor: '#f0f0f0', padding: 20 },
container: { flex: 1, margin: 20 },
statusBar: { flexDirection: 'row', justifyContent: 'space-between', alignItems: 'center', padding: 10, backgroundColor: '#fff', borderRadius: 8, marginBottom: 20 },
status: { fontWeight: 'bold' },
controls: { flexDirection: 'row', gap: 10 },
debugPanel: { height: '80%', backgroundColor: '#fff', borderRadius: 8, padding: 20},
debugTitle: { fontSize: 16, fontWeight: 'bold' },
debugLog: { height: '100%', overflow: 'scroll', backgroundColor: '#f8f8f8', padding: 10, borderRadius: 4, fontFamily: 'monospace', fontSize: 12, lineHeight: 1.4 },
});
export default CallScreen;

View File

@@ -6,7 +6,6 @@
import argparse
import os
from typing import Optional
import aiohttp
@@ -19,7 +18,7 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
async def configure_with_args(
aiohttp_session: aiohttp.ClientSession, parser: Optional[argparse.ArgumentParser] = None
aiohttp_session: aiohttp.ClientSession, parser: argparse.ArgumentParser | None = None
):
if not parser:
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")

View File

@@ -31,15 +31,16 @@ logger.add(sys.stderr, level="DEBUG")
class SilenceFrame(OutputAudioRawFrame):
def __init__(
self,
*,
sample_rate: int,
duration: float,
audio: bytes = None,
sample_rate: int = 16000,
num_channels: int = 1,
duration: float = 0.1,
):
# Initialize the parent class with the silent frame's data
super().__init__(
audio=self.create_silent_audio_frame(sample_rate, 1, duration).audio,
audio=self.create_silent_audio_frame(sample_rate, num_channels, duration).audio,
sample_rate=sample_rate,
num_channels=1,
num_channels=num_channels,
)
@staticmethod
@@ -79,10 +80,7 @@ async def main():
return
await task.queue_frames(
[
SilenceFrame(
sample_rate=task.params.audio_out_sample_rate,
duration=0.5,
),
SilenceFrame(duration=0.5),
TTSSpeakFrame(f"Hello there, how are you doing today ?"),
EndFrame(),
]

View File

@@ -65,6 +65,7 @@ async def main():
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
aiohttp_session=session,
#
# Spanish
#
@@ -119,11 +120,10 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audio_buffer_processor.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])

View File

@@ -82,6 +82,7 @@ async def main():
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
aiohttp_session=session,
#
# Spanish
#
@@ -108,9 +109,8 @@ async def main():
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# NOTE: Watch out! This will save all the conversation in memory. You
# can pass `buffer_size` to get periodic callbacks.
audiobuffer = AudioBufferProcessor()
# Save audio every 10 seconds.
audiobuffer = AudioBufferProcessor(buffer_size=480000)
pipeline = Pipeline(
[
@@ -124,7 +124,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
@@ -132,7 +132,6 @@ async def main():
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audiobuffer.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])

View File

@@ -70,7 +70,7 @@ async def main(room_url: str, token: str):
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -62,7 +62,7 @@ async def main(room_url: str, token: str):
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -16,7 +16,8 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
load_dotenv(override=True)
@@ -25,7 +26,7 @@ logger.add(sys.stderr, level="DEBUG")
async def main():
transport = LocalAudioTransport(LocalAudioTransportParams(audio_out_enabled=True))
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
@@ -40,7 +41,7 @@ async def main():
await asyncio.sleep(1)
await task.queue_frames([TTSSpeakFrame("Hello there, how is it going!"), EndFrame()])
runner = PipelineRunner(handle_sigint=False if sys.platform == "win32" else True)
runner = PipelineRunner()
await asyncio.gather(runner.run(task), say_something())

View File

@@ -18,7 +18,8 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.fal import FalImageGenService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
load_dotenv(override=True)
@@ -33,9 +34,7 @@ async def main():
transport = TkLocalTransport(
tk_root,
TkTransportParams(
camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024
),
TransportParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
imagegen = FalImageGenService(

View File

@@ -44,8 +44,7 @@ async def main():
runner = PipelineRunner()
task = PipelineTask(
Pipeline([imagegen, transport.output()]),
params=PipelineParams(enable_metrics=True),
Pipeline([imagegen, transport.output()]), PipelineParams(enable_metrics=True)
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -51,6 +51,7 @@ async def main():
)
elevenlabs_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)

View File

@@ -30,7 +30,8 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport, TkOutputTransport
load_dotenv(override=True)
@@ -151,7 +152,7 @@ async def main():
transport = TkLocalTransport(
tk_root,
TkTransportParams(
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,

View File

@@ -105,10 +105,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
PipelineParams(enable_metrics=True, enable_usage_metrics=True),
)
@transport.event_handler("on_first_participant_joined")

View File

@@ -27,7 +27,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -91,7 +91,7 @@ async def main():
),
)
tts = CartesiaTTSService(
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@@ -127,7 +127,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -76,7 +76,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -74,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -79,7 +79,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -103,7 +103,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -81,7 +81,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -74,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -74,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -75,7 +75,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -77,7 +77,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -83,7 +83,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -18,7 +18,7 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService, OpenAISTTService, OpenAITTSService
from pipecat.services.openai import OpenAILLMService, OpenAITTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -38,21 +38,12 @@ async def main():
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=False,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
# You can use the OpenAI compatible API like Groq.
# stt = OpenAISTTService(
# base_url="https://api.groq.com/openai/v1",
# api_key="gsk_***",
# model="whisper-large-v3",
# )
stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"), model="whisper-1")
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="alloy")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -70,7 +61,6 @@ async def main():
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
@@ -81,7 +71,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -81,7 +81,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -75,7 +75,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -80,7 +80,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -38,6 +38,7 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
@@ -71,7 +72,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -88,7 +88,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -81,7 +81,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -18,7 +18,9 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.google import GoogleLLMService, GoogleSTTService, GoogleTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.google import GoogleTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -38,22 +40,21 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = GoogleTTSService(
voice_id="en-US-Journey-F",
params=GoogleTTSService.InputParams(language=Language.EN_US),
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -79,7 +80,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -80,7 +80,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -76,7 +76,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -19,7 +19,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.services.rime import RimeTTSService
from pipecat.services.rime import RimeHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -44,9 +44,10 @@ async def main():
),
)
tts = RimeTTSService(
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
params=RimeHttpTTSService.InputParams(reduce_latency=True),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
@@ -74,7 +75,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -74,7 +74,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -216,7 +216,11 @@ async def main():
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
llm = GoogleLLMService(
model="gemini-1.5-flash-latest",
# model="gemini-exp-1114",
api_key=os.getenv("GOOGLE_API_KEY"),
)
messages = [
{
@@ -251,7 +255,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -74,7 +74,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -48,6 +48,7 @@ async def main():
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts2 = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)

View File

@@ -21,7 +21,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -61,6 +61,7 @@ async def main():
"Test",
DailyParams(
audio_in_enabled=True,
audio_in_sample_rate=24000,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
@@ -77,13 +78,7 @@ async def main():
runner = PipelineRunner()
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
task = PipelineTask(pipeline)
await runner.run(task)

View File

@@ -22,9 +22,10 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
@@ -61,12 +62,12 @@ async def main():
tk_root.title("Local Mirror")
daily_transport = DailyTransport(
room_url, token, "Test", DailyParams(audio_in_enabled=True)
room_url, token, "Test", DailyParams(audio_in_enabled=True, audio_in_sample_rate=24000)
)
tk_transport = TkLocalTransport(
tk_root,
TkTransportParams(
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
@@ -81,13 +82,7 @@ async def main():
pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=24000,
audio_out_sample_rate=24000,
),
)
task = PipelineTask(pipeline)
async def run_tk():
while not task.has_finished():

View File

@@ -76,7 +76,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -7,7 +7,6 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -33,7 +32,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id

View File

@@ -7,7 +7,6 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -33,7 +32,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
@@ -73,7 +72,9 @@ async def main():
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
google = GoogleLLMService(
model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY")
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),

View File

@@ -7,7 +7,6 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -33,7 +32,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id

View File

@@ -7,7 +7,6 @@
import asyncio
import os
import sys
from typing import Optional
import aiohttp
from dotenv import load_dotenv
@@ -33,7 +32,7 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: Optional[str] = None):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id

View File

@@ -16,7 +16,8 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
load_dotenv(override=True)
@@ -33,7 +34,7 @@ class TranscriptionLogger(FrameProcessor):
async def main():
transport = LocalAudioTransport(LocalAudioTransportParams(audio_in_enabled=True))
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
@@ -43,7 +44,7 @@ async def main():
task = PipelineTask(pipeline)
runner = PipelineRunner(handle_sigint=False if sys.platform == "win32" else True)
runner = PipelineRunner()
await runner.run(task)

View File

@@ -112,7 +112,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -99,13 +99,7 @@ async def main():
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -153,13 +153,7 @@ If you need to use a tool, simply use the tool. Do not tell the user the tool yo
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -14,7 +14,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -31,12 +30,6 @@ logger.add(sys.stderr, level="DEBUG")
video_participant_id = None
async def start_fetch_weather(function_name, llm, context):
"""Push a frame to the LLM; this is handy when the LLM response might take a while."""
await llm.push_frame(TTSSpeakFrame("Let me check on that."))
logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
location = arguments["location"]
await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
@@ -69,8 +62,12 @@ async def main():
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
llm.register_function("get_weather", get_weather, start_fetch_weather)
llm = GoogleLLMService(
model="gemini-1.5-flash-latest",
# model="gemini-exp-1114",
api_key=os.getenv("GOOGLE_API_KEY"),
)
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
tools = [
@@ -152,7 +149,7 @@ indicate you should use the get_image tool are:
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -20,7 +20,7 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.groq import GroqLLMService, GroqSTTService
from pipecat.services.groq import GroqLLMService
from pipecat.services.openai import OpenAILLMContext
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -50,20 +50,20 @@ async def main():
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"), model="distil-whisper-large-v3-en")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="llama3-groq-70b-8192-tool-use-preview"
)
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
@@ -105,7 +105,6 @@ async def main():
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
@@ -116,7 +115,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -113,7 +113,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -117,7 +117,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -116,7 +116,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -116,7 +116,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -123,7 +123,7 @@ Start by asking me for my location. Then, use 'get_weather_current' to give me a
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -123,7 +123,7 @@ Start by asking me for my location. Then, use 'get_weather_current' to give me a
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -117,7 +117,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -1,106 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""This example demonstrates using the Perplexity API as a drop-in replacement for OpenAI.
Note that while this file is in the function-calling examples, Perplexity's API does not
currently support function calling. The example shows basic chat completion functionality
using Perplexity's API while maintaining compatibility with the OpenAI interface.
"""
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.services.perplexity import PerplexityLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = PerplexityLLMService(api_key=os.getenv("PERPLEXITY_API_KEY"), model="sonar")
messages = [
{
"role": "user",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -133,7 +133,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -126,7 +126,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -85,13 +85,7 @@ async def main():
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
),
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
# When a participant joins, start transcription for that participant so the
# bot can "hear" and respond to them.

View File

@@ -108,7 +108,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
report_only_initial_ttfb=True,

View File

@@ -38,6 +38,7 @@ async def main():
"GStreamer",
DailyParams(
audio_out_enabled=True,
audio_out_is_live=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720,
@@ -50,6 +51,8 @@ async def main():
out_params=GStreamerPipelineSource.OutputParams(
video_width=1280,
video_height=720,
audio_sample_rate=24000,
audio_channels=1,
),
)

View File

@@ -16,13 +16,10 @@ from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import TranscriptionMessage, TranscriptionUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai_realtime_beta import (
InputAudioTranscription,
OpenAIRealtimeBetaLLMService,
@@ -83,7 +80,9 @@ async def main():
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_in_sample_rate=24000,
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
@@ -143,29 +142,21 @@ Remember, your responses should be short. Just one or two sentences, usually."""
tools,
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
# Create transcript processor and handler
transcript = TranscriptProcessor()
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
transcript.user(), # User transcripts
context_aggregator.user(),
llm, # LLM
context_aggregator.assistant(),
transcript.assistant(), # Assistant transcripts
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
@@ -173,16 +164,9 @@ Remember, your responses should be short. Just one or two sentences, usually."""
),
)
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
logger.debug(f"Received transcript update with {len(frame.messages)} new messages")
for msg in frame.messages:
logger.debug(msg)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# await transport.capture_participant_transcription(participant["id"])
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])

View File

@@ -212,7 +212,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -177,7 +177,9 @@ async def main():
"Respond bot",
DailyParams(
audio_in_enabled=True,
audio_in_sample_rate=24000,
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
@@ -237,7 +239,7 @@ Remember, your responses should be short. Just one or two sentences, usually."""
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -209,7 +209,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -237,7 +237,7 @@ async def main():
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
llm = GoogleLLMService(model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY"))
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
@@ -263,7 +263,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -87,11 +87,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
# We just use 16000 because that's what Tavus is expecting and
# we avoid resampling.
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -104,11 +104,8 @@ async def main():
)
# This processor keeps the last context and will let it through once the
# notifier is woken up. We start with the gate open because we send an
# initial context frame to start the conversation.
gated_context_aggregator = GatedOpenAILLMContextAggregator(
notifier=notifier, start_open=True
)
# notifier is woken up.
gated_context_aggregator = GatedOpenAILLMContextAggregator(notifier)
# Notify if the user hasn't said anything.
async def user_idle_notifier(frame):
@@ -145,7 +142,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -12,7 +12,6 @@ import time
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
@@ -20,8 +19,6 @@ from pipecat.frames.frames import (
CancelFrame,
EndFrame,
Frame,
FunctionCallInProgressFrame,
FunctionCallResultFrame,
LLMMessagesFrame,
StartFrame,
StartInterruptionFrame,
@@ -29,7 +26,6 @@ from pipecat.frames.frames import (
SystemFrame,
TextFrame,
TranscriptionFrame,
TTSSpeakFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
@@ -133,12 +129,11 @@ class CompletenessCheck(FrameProcessor):
class OutputGate(FrameProcessor):
def __init__(self, *, notifier: BaseNotifier, start_open: bool = False, **kwargs):
def __init__(self, notifier: BaseNotifier, **kwargs):
super().__init__(**kwargs)
self._gate_open = start_open
self._gate_open = False
self._frames_buffer = []
self._notifier = notifier
self._gate_task = None
def close_gate(self):
self._gate_open = False
@@ -161,11 +156,6 @@ class OutputGate(FrameProcessor):
await self.push_frame(frame, direction)
return
# Don't block function call frames
if isinstance(frame, (FunctionCallInProgressFrame, FunctionCallResultFrame)):
await self.push_frame(frame, direction)
return
# Ignore frames that are not following the direction of this gate.
if direction != FrameDirection.DOWNSTREAM:
await self.push_frame(frame, direction)
@@ -179,13 +169,10 @@ class OutputGate(FrameProcessor):
async def _start(self):
self._frames_buffer = []
if not self._gate_task:
self._gate_task = self.create_task(self._gate_task_handler())
self._gate_task = self.create_task(self._gate_task_handler())
async def _stop(self):
if self._gate_task:
await self.cancel_task(self._gate_task)
self._gate_task = None
await self.cancel_task(self._gate_task)
async def _gate_task_handler(self):
while True:
@@ -199,16 +186,6 @@ class OutputGate(FrameProcessor):
break
async def start_fetch_weather(function_name, llm, context):
"""Push a frame to the LLM; this is handy when the LLM response might take a while."""
await llm.push_frame(TTSSpeakFrame("Let me check on that."))
logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"})
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
@@ -239,34 +216,6 @@ async def main():
# This is the regular LLM.
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
)
]
messages = [
{
@@ -275,7 +224,7 @@ async def main():
},
]
context = OpenAILLMContext(messages, tools)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# We have instructed the LLM to return 'YES' if it thinks the user
@@ -303,9 +252,7 @@ async def main():
# sentence, this will wake up the notifier if that happens.
user_idle = UserIdleProcessor(callback=user_idle_notifier, timeout=5.0)
# We start with the gate open because we send an initial context frame
# to start the conversation.
bot_output_gate = OutputGate(notifier=notifier, start_open=True)
bot_output_gate = OutputGate(notifier=notifier)
async def block_user_stopped_speaking(frame):
return not isinstance(frame, UserStoppedSpeakingFrame)
@@ -316,8 +263,6 @@ async def main():
or isinstance(frame, LLMMessagesFrame)
or isinstance(frame, StartInterruptionFrame)
or isinstance(frame, StopInterruptionFrame)
or isinstance(frame, FunctionCallInProgressFrame)
or isinstance(frame, FunctionCallResultFrame)
)
pipeline = Pipeline(
@@ -355,7 +300,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -12,7 +12,6 @@ import time
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
@@ -20,8 +19,6 @@ from pipecat.frames.frames import (
CancelFrame,
EndFrame,
Frame,
FunctionCallInProgressFrame,
FunctionCallResultFrame,
LLMMessagesFrame,
StartFrame,
StartInterruptionFrame,
@@ -29,7 +26,6 @@ from pipecat.frames.frames import (
SystemFrame,
TextFrame,
TranscriptionFrame,
TTSSpeakFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
)
@@ -337,12 +333,11 @@ class CompletenessCheck(FrameProcessor):
class OutputGate(FrameProcessor):
def __init__(self, *, notifier: BaseNotifier, start_open: bool = False, **kwargs):
def __init__(self, notifier: BaseNotifier, **kwargs):
super().__init__(**kwargs)
self._gate_open = start_open
self._gate_open = False
self._frames_buffer = []
self._notifier = notifier
self._gate_task = None
def close_gate(self):
self._gate_open = False
@@ -365,11 +360,6 @@ class OutputGate(FrameProcessor):
await self.push_frame(frame, direction)
return
# Don't block function call frames
if isinstance(frame, (FunctionCallInProgressFrame, FunctionCallResultFrame)):
await self.push_frame(frame, direction)
return
# Ignore frames that are not following the direction of this gate.
if direction != FrameDirection.DOWNSTREAM:
await self.push_frame(frame, direction)
@@ -383,13 +373,10 @@ class OutputGate(FrameProcessor):
async def _start(self):
self._frames_buffer = []
if not self._gate_task:
self._gate_task = self.create_task(self._gate_task_handler())
self._gate_task = self.create_task(self._gate_task_handler())
async def _stop(self):
if self._gate_task:
await self.cancel_task(self._gate_task)
self._gate_task = None
await self.cancel_task(self._gate_task)
async def _gate_task_handler(self):
while True:
@@ -403,16 +390,6 @@ class OutputGate(FrameProcessor):
break
async def start_fetch_weather(function_name, llm, context):
"""Push a frame to the LLM; this is handy when the LLM response might take a while."""
await llm.push_frame(TTSSpeakFrame("Let me check on that."))
logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"})
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
@@ -449,34 +426,6 @@ async def main():
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o",
)
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
)
]
messages = [
{
@@ -485,7 +434,7 @@ async def main():
},
]
context = OpenAILLMContext(messages, tools)
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# We have instructed the LLM to return 'YES' if it thinks the user
@@ -512,9 +461,7 @@ async def main():
# sentence, this will wake up the notifier if that happens.
user_idle = UserIdleProcessor(callback=user_idle_notifier, timeout=5.0)
# We start with the gate open because we send an initial context frame
# to start the conversation.
bot_output_gate = OutputGate(notifier=notifier, start_open=True)
bot_output_gate = OutputGate(notifier=notifier)
async def block_user_stopped_speaking(frame):
return not isinstance(frame, UserStoppedSpeakingFrame)
@@ -525,8 +472,6 @@ async def main():
or isinstance(frame, LLMMessagesFrame)
or isinstance(frame, StartInterruptionFrame)
or isinstance(frame, StopInterruptionFrame)
or isinstance(frame, FunctionCallInProgressFrame)
or isinstance(frame, FunctionCallResultFrame)
)
pipeline = Pipeline(
@@ -564,7 +509,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -20,13 +20,13 @@ from pipecat.frames.frames import (
CancelFrame,
EndFrame,
Frame,
FunctionCallInProgressFrame,
FunctionCallResultFrame,
InputAudioRawFrame,
LLMFullResponseEndFrame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
StartFrame,
StartInterruptionFrame,
StopInterruptionFrame,
SystemFrame,
TextFrame,
TranscriptionFrame,
@@ -55,9 +55,13 @@ load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
TRANSCRIBER_MODEL = "gemini-2.0-flash-001"
CLASSIFIER_MODEL = "gemini-2.0-flash-001"
CONVERSATION_MODEL = "gemini-2.0-flash-001"
# TRANSCRIBER_MODEL = "gemini-1.5-flash-latest"
# CLASSIFIER_MODEL = "gemini-1.5-flash-latest"
# CONVERSATION_MODEL = "gemini-1.5-flash-latest"
TRANSCRIBER_MODEL = "gemini-2.0-flash-exp"
CLASSIFIER_MODEL = "gemini-2.0-flash-exp"
CONVERSATION_MODEL = "gemini-2.0-flash-exp"
transcriber_system_instruction = """You are an audio transcriber. You are receiving audio from a user. Your job is to
transcribe the input audio to text exactly as it was said by the user.
@@ -495,7 +499,7 @@ class UserAggregatorBuffer(LLMResponseAggregator):
if isinstance(frame, UserStartedSpeakingFrame):
self._transcription = ""
async def push_aggregation(self):
async def _push_aggregation(self):
if self._aggregation:
self._transcription = self._aggregation
self._aggregation = ""
@@ -553,7 +557,6 @@ class OutputGate(FrameProcessor):
self._notifier = notifier
self._context = context
self._transcription_buffer = user_transcription_buffer
self._gate_task = None
def close_gate(self):
self._gate_open = False
@@ -576,11 +579,6 @@ class OutputGate(FrameProcessor):
await self.push_frame(frame, direction)
return
# Don't block function call frames
if isinstance(frame, (FunctionCallInProgressFrame, FunctionCallResultFrame)):
await self.push_frame(frame, direction)
return
# Ignore frames that are not following the direction of this gate.
if direction != FrameDirection.DOWNSTREAM:
await self.push_frame(frame, direction)
@@ -601,13 +599,10 @@ class OutputGate(FrameProcessor):
async def _start(self):
self._frames_buffer = []
if not self._gate_task:
self._gate_task = self.create_task(self._gate_task_handler())
self._gate_task = self.create_task(self._gate_task_handler())
async def _stop(self):
if self._gate_task:
await self.cancel_task(self._gate_task)
self._gate_task = None
await self.cancel_task(self._gate_task)
async def _gate_task_handler(self):
while True:
@@ -644,6 +639,7 @@ async def main():
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
audio_in_sample_rate=16000,
),
)
@@ -681,6 +677,12 @@ async def main():
context = OpenAILLMContext()
context_aggregator = conversation_llm.create_context_aggregator(context)
# We have instructed the LLM to return 'True' if it thinks the user
# completed a sentence. So, if it's 'True' we will return true in this
# predicate which will wake up the notifier.
async def wake_check_filter(frame):
return frame.text == "True"
# This is a notifier that we use to synchronize the two LLMs.
notifier = EventNotifier()
@@ -697,6 +699,14 @@ async def main():
async def block_user_stopped_speaking(frame):
return not isinstance(frame, UserStoppedSpeakingFrame)
async def pass_only_llm_trigger_frames(frame):
return (
isinstance(frame, OpenAILLMContextFrame)
or isinstance(frame, LLMMessagesFrame)
or isinstance(frame, StartInterruptionFrame)
or isinstance(frame, StopInterruptionFrame)
)
conversation_audio_context_assembler = ConversationAudioContextAssembler(context=context)
user_aggregator_buffer = UserAggregatorBuffer()
@@ -742,7 +752,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -87,7 +87,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

View File

@@ -61,11 +61,9 @@ async def main():
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
# Configure the mute processor with both strategies
stt_mute_processor = STTMuteFilter(
stt_service=stt,
config=STTMuteConfig(
strategies={
STTMuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE,
STTMuteStrategy.FUNCTION_CALL,
}
strategies={STTMuteStrategy.FIRST_SPEECH, STTMuteStrategy.FUNCTION_CALL}
),
)
@@ -122,7 +120,7 @@ async def main():
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -143,14 +143,7 @@ class InputTranscriptionContextFilter(FrameProcessor):
return
try:
# Make sure we're working with a GoogleLLMContext
context = GoogleLLMContext.upgrade_to_google(frame.context)
message = context.messages[-1]
if not isinstance(message, glm.Content):
logger.error(f"Expected glm.Content, got {type(message)}")
return
message = frame.context.messages[-1]
last_part = message.parts[-1]
if not (
message.role == "user"
@@ -299,7 +292,7 @@ async def main():
conversation_llm = GoogleLLMService(
name="Conversation",
model="gemini-2.0-flash-001",
model="gemini-1.5-flash-latest",
# model="gemini-exp-1121",
api_key=os.getenv("GOOGLE_API_KEY"),
# we can give the GoogleLLMService a system instruction to use directly
@@ -310,7 +303,7 @@ async def main():
input_transcription_llm = GoogleLLMService(
name="Transcription",
model="gemini-2.0-flash-001",
model="gemini-1.5-flash-latest",
# model="gemini-exp-1121",
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=transcriber_system_message,
@@ -354,7 +347,7 @@ async def main():
task = PipelineTask(
pipeline,
params=PipelineParams(
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,

Some files were not shown because too many files have changed in this diff Show More