Compare commits

1 commit: hush/custo ... hush/realt

| Author | SHA1 | Date |
|---|---|---|
| | 230d92850a | |
@@ -1,8 +1,7 @@

| Before | After |
|---|---|
| `repos:` | `repos:` |
| `- repo: https://github.com/astral-sh/ruff-pre-commit` | `- repo: local` |
| `rev: v0.9.7` | |
| `hooks:` | `hooks:` |
| `- id: ruff` | `- id: ruff-format-hook` |
| `language_version: python3` | `name: Check ruff formatting` |
| `args: [ --select, I, ]` | `entry: sh scripts/pre-commit.sh` |
| `- id: ruff-format` | `language: system` |
648
CHANGELOG.md
@@ -9,635 +9,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Added support for Smart Turn Detection via the `turn_analyzer` transport
  parameter. You can now choose between `SmartTurnAnalyzer()` for remote
  inference or `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using
  Core ML.

- `DeepgramTTSService` now accepts the `base_url` argument again, allowing you
  to connect to an on-prem service.

- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` which allow
  you to control aggregator settings. You can now pass these arguments when
  creating aggregator pairs with `create_context_aggregator()`.

- Added `previous_text` context support to `ElevenLabsHttpTTSService`, improving
  speech consistency across sentences within an LLM response.

- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.

- It is now possible to disable `SoundfileMixer` when created. You can then use
  `MixerEnableFrame` to dynamically enable it when necessary.

- Added `on_client_connected` and `on_client_disconnected` event handlers to
  the `DailyTransport` class. These handlers map to the same underlying Daily
  events as `on_participant_joined` and `on_participant_left`, respectively.
  This makes it easier to write a single bot pipeline that can also use other
  transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.
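The alias relationship between `on_client_connected` and `on_participant_joined` can be pictured with a small event registry in which firing the underlying event also fires its transport-agnostic alias. This is a hypothetical sketch, not `DailyTransport`'s implementation. The `EventEmitter` class, the `ALIASES` table and the `emit()` method are invented for illustration, and the handlers are synchronous for simplicity.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical mapping: each underlying (Daily-style) event also fires
# the transport-agnostic alias registered for it.
ALIASES = {
    "on_participant_joined": "on_client_connected",
    "on_participant_left": "on_client_disconnected",
}


class EventEmitter:
    """Minimal synchronous event registry (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def event_handler(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator that registers `func` for the event `name`.
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._handlers[name].append(func)
            return func

        return decorator

    def emit(self, name: str, *args: Any) -> None:
        # Fire handlers for the underlying event, then for its alias (if any).
        for event in (name, ALIASES.get(name)):
            if event is None:
                continue
            for handler in self._handlers[event]:
                handler(self, *args)


transport = EventEmitter()
log: list[str] = []


@transport.event_handler("on_client_connected")
def on_client_connected(emitter, participant):
    log.append(f"connected: {participant['id']}")


transport.emit("on_participant_joined", {"id": "abc"})
print(log)  # ['connected: abc']
```

A bot written against the alias names never has to know which transport-specific event triggered it.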

### Changed

- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects
  the user when their token expires. This new parameter defaults to False.
  Also, the default values for `enable_prejoin_ui` and `eject_at_room_exp`
  changed to False.

- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their
  default model.

- `SoundfileMixer` constructor arguments must now be passed as keywords.
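In Python, keyword-only constructor arguments are usually enforced with a bare `*` in the signature. The sketch below is hypothetical: `MixerSketch` and its `sound_files`, `volume` and `enabled` parameters are placeholders and not `SoundfileMixer`'s real signature.

```python
class MixerSketch:
    """Hypothetical mixer whose constructor accepts keyword arguments only."""

    def __init__(self, *, sound_files: dict[str, str], volume: float = 0.4, enabled: bool = True):
        # The bare `*` makes every parameter after it keyword-only.
        self.sound_files = sound_files
        self.volume = volume
        self.enabled = enabled


# Keyword call: accepted; the mixer starts disabled.
mixer = MixerSketch(sound_files={"office": "office.wav"}, enabled=False)

# Positional call: rejected with a TypeError.
try:
    MixerSketch({"office": "office.wav"})
except TypeError as exc:
    print("TypeError:", exc)
```

Keyword-only arguments let a library reorder or add parameters later without silently breaking positional callers.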

### Deprecated

- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url`
  instead.
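A renamed parameter is commonly deprecated by accepting the old name, emitting a `DeprecationWarning`, and forwarding the value to the new name. The sketch assumes that pattern. `STTServiceSketch` and `DEFAULT_BASE_URL` are hypothetical, the URLs are placeholders, and this is not `DeepgramSTTService`'s code.

```python
import warnings
from typing import Optional


class STTServiceSketch:
    """Hypothetical service showing how a renamed parameter can be deprecated."""

    DEFAULT_BASE_URL = "wss://example.invalid"  # placeholder, not a real endpoint

    def __init__(self, *, base_url: Optional[str] = None, url: Optional[str] = None):
        if url is not None:
            # The old name still works for now, but tells the caller to migrate.
            warnings.warn(
                "Parameter 'url' is deprecated, use 'base_url' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Prefer the new parameter if both were given.
            base_url = base_url or url
        self.base_url = base_url or self.DEFAULT_BASE_URL


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    svc = STTServiceSketch(url="wss://on-prem.example.invalid")

print(svc.base_url, [w.category.__name__ for w in caught])
```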

### Removed

- Parameters `user_kwargs` and `assistant_kwargs` when creating a context
  aggregator pair using `create_context_aggregator()` have been removed. Use
  `user_params` and `assistant_params` instead.

### Fixed

- Fixed an issue that would cause TTS websocket-based services to not clean up
  resources properly when disconnecting.

- Fixed a `TavusVideoService` issue that was causing audio choppiness.

- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
  client did not create a video transceiver.

- Fixed an issue where LLM input parameters were not being applied correctly in
  `GoogleVertexLLMService`, causing unexpected behavior during inference.

## [0.0.63] - 2025-04-11

### Added

- Added media resolution control to `GeminiMultimodalLiveLLMService` with
  `GeminiMediaResolution` enum, allowing configuration of token usage for
  image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
  with 256 tokens).

- Added Gemini's Voice Activity Detection (VAD) configuration to
  `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine
  control over speech detection sensitivity and timing, including:

  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)

- Added comprehensive language support to `GeminiMultimodalLiveLLMService`,
  supporting over 30 languages via the `language` parameter, with proper
  mapping between Pipecat's `Language` enum and Gemini's language codes.

- Added support in `SmallWebRTCTransport` to detect when remote tracks are
  muted.

- Added support for image capture from a video stream to the
  `SmallWebRTCTransport`.

- Added a new iOS client option to the `SmallWebRTCTransport`
  **video-transform** example.

- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The
  producer processor processes frames from the pipeline and decides whether the
  consumers should consume it or not. If so, the same frame that is received by
  the producer is sent to the consumer. There can be multiple consumers per
  producer. These processors can be useful to push frames from one part of a
  pipeline to a different one (e.g. when using `ParallelPipeline`).

- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
    type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
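The producer/consumer relationship described above is essentially a filtered fan-out. In this hypothetical sketch, a producer applies a predicate to each frame and hands the *same* frame object to every consumer's queue. `ProducerSketch`, `add_consumer()` and string "frames" are stand-ins, not the real `ProducerProcessor`/`ConsumerProcessor` API.

```python
import asyncio
from typing import Any, Callable


class ProducerSketch:
    """Hypothetical producer: forwards selected frames to every consumer queue."""

    def __init__(self, filter_fn: Callable[[Any], bool]):
        self._filter = filter_fn
        self._queues: list[asyncio.Queue] = []

    def add_consumer(self) -> asyncio.Queue:
        # Each consumer gets its own queue, so several consumers can share one producer.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def process(self, frame: Any) -> Any:
        if self._filter(frame):
            # The same frame object is handed to each consumer.
            for queue in self._queues:
                await queue.put(frame)
        return frame  # the frame also continues down the producer's own path


async def main() -> list:
    producer = ProducerSketch(lambda frame: frame.startswith("audio"))
    q1, q2 = producer.add_consumer(), producer.add_consumer()
    for frame in ["audio-1", "text-1", "audio-2"]:
        await producer.process(frame)
    # Drain each consumer queue to see what it received.
    return [[q.get_nowait() for _ in range(q.qsize())] for q in (q1, q2)]


result = asyncio.run(main())
print(result)  # [['audio-1', 'audio-2'], ['audio-1', 'audio-2']]
```

In a real pipeline, each consumer would sit in a different branch (for example, of a `ParallelPipeline`) and read from its queue concurrently.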

### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio`
  parameter in favor of Gemini Live's native output transcription support. Now
  text transcriptions are produced directly by the model. No configuration is
  required.

- Updated `GeminiMultimodalLiveLLMService`’s default `model` to
  `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket
  URL.

### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that was preventing it from
  running on older platforms.

- Fixed an issue where `CartesiaTTSService`'s spell feature would result in
  the spelled word in the context appearing as "F,O,O,B,A,R" instead of
  "FOOBAR".

- Fixed an issue in the Azure TTS services where the language was being set
  incorrectly.

- Fixed `SmallWebRTCTransport` to support dynamic values for
  `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms
  chunks.

- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant
  context messages had no space between words.

- Fixed an issue where `LLMAssistantContextAggregator` would prevent a
  `BotStoppedSpeakingFrame` from moving through the pipeline.

## [0.0.62] - 2025-04-01 "An April Fools' release"

### Added

- Added `TransportParams.audio_out_10ms_chunks` parameter to allow controlling
  the amount of audio being sent by the output transport. It defaults to 4, so
  40ms audio chunks are sent.

- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible
  interface. Added foundational example `14q-function-calling-qwen.py`.

- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM
  applications. Learn more at: https://mem0.ai/.

- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon.
  See example in `examples/foundational/13e-whisper-mlx.py`. Latency of
  completed transcription using Whisper large-v3-turbo on an M4 MacBook is
  ~500ms.

- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.

- Created two examples in `p2p-webrtc`:
  - **video-transform**: Demonstrates sending and receiving audio/video with
    `SmallWebRTCTransport` using `TypeScript`. Includes video frame
    processing with OpenCV.
  - **voice-agent**: A minimal example of creating a voice agent with
    `SmallWebRTCTransport`.

- `GladiaSTTService` now has comprehensive support for the latest API config
  options, including model, language detection, preprocessing, custom
  vocabulary, custom spelling, translation, and message filtering options.

- Added support to `ProtobufFrameSerializer` to send the messages from
  `TransportMessageFrame` and `TransportMessageUrgentFrame`.

- Added support for a new TTS service, `PiperTTSService`.
  (see https://github.com/rhasspy/piper/)

- It is now possible to tell whether `UserStartedSpeakingFrame` or
  `UserStoppedSpeakingFrame` have been generated because of emulation frames.
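Because `audio_out_10ms_chunks` counts 10 ms units, the default of 4 gives 40 ms chunks. The byte size of one chunk follows from the sample rate, the channel count and the sample width. This small sketch assumes 16-bit PCM (2 bytes per sample). The `chunk_bytes` helper is hypothetical and not part of Pipecat.

```python
def chunk_bytes(sample_rate: int, num_channels: int, chunks_10ms: int, sample_width: int = 2) -> int:
    """Bytes per output chunk for PCM audio (16-bit by default)."""
    # Frames in 10 ms = sample_rate / 100; multiply by channels and bytes per sample.
    bytes_per_10ms = (sample_rate // 100) * num_channels * sample_width
    return bytes_per_10ms * chunks_10ms


for rate in (16000, 24000, 48000):
    print(rate, chunk_bytes(rate, 1, 4))
```

For example, 16 kHz mono audio with the default of 4 gives 160 frames × 2 bytes × 4 = 1280 bytes per chunk.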

### Changed

- `FunctionCallResultFrame`s are now system frames. This prevents function
  call results from being discarded during interruptions.

- Pipecat services have been reorganized into packages. Each package can have
  one or more of the following modules (in the future new module names might be
  needed) depending on the services implemented:

  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services

- Base classes for AI services have been reorganized into modules. They can now
  be found in
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- `GladiaSTTService` now uses the `solaria-1` model by default. Other params
  use Gladia's default values. Added support for more language codes.

### Deprecated

- The old import paths for all Pipecat services have been deprecated and a
  warning will be shown when using them. The new import should be
  `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For
  example, `from pipecat.services.openai.llm import OpenAILLMService`.

- Importing AI service base classes from `pipecat.services.ai_services` is now
  deprecated; use one of
  `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.

- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in
  favor of `language_config`, which better aligns with Gladia's API.

- Deprecated using `GladiaSTTService.InputParams` directly. Use the new
  `GladiaInputParams` class instead.

### Fixed

- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that
  would cause the transport to be closed prematurely, preventing the internally
  queued audio from being sent. The same issue could also cause an infinite
  loop while using an output mixer and when sending an `EndFrame`, preventing
  the bot from finishing.

- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed because
  of an interruption to be discarded.

- Fixed an issue that would cause `SegmentedSTTService` based services
  (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing
  invalid transcriptions.

- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.

### Performance

- Output transports now send 40ms audio chunks instead of 20ms. This should
  improve performance.

- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio
  chunks are longer than 200ms, they will be sent with every audio chunk.
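The 200ms `BotSpeakingFrame` cadence can be modeled as a simple accumulator. An event fires once at least 200 ms of audio has gone out since the last one, so chunks longer than the interval fire every time. This is a hypothetical model of the described behavior, not Pipecat's output-transport code. `speaking_frame_schedule` is an invented helper that returns the indices of the chunks after which an event would fire.

```python
def speaking_frame_schedule(chunk_ms: int, num_chunks: int, interval_ms: int = 200) -> list[int]:
    """Return indices of audio chunks after which a "bot speaking" event fires.

    Hypothetical model: events fire once at least `interval_ms` of audio has been
    sent since the last event; chunks longer than the interval fire every time.
    """
    fired, elapsed = [], 0
    for i in range(num_chunks):
        elapsed += chunk_ms
        if elapsed >= interval_ms:
            fired.append(i)
            elapsed = 0  # restart the interval after each event
    return fired


print(speaking_frame_schedule(40, 10))   # [4, 9]
print(speaking_frame_schedule(300, 3))   # [0, 1, 2]
```

With the new 40 ms chunks, this model emits one event per five chunks instead of one per chunk.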

### Other

- Added foundational example `37-mem0.py` demonstrating how to use the
  `Mem0MemoryService`.

- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the
  `WhisperSTTServiceMLX`.

## [0.0.61] - 2025-03-26

### Added

- Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism
  for modifying the `tool_choice` in the context.

- Added `GroqTTSService` which provides text-to-speech functionality using
  Groq's API.

- Added support in `DailyTransport` for updating remote participants'
  `canReceive` permission via the `update_remote_participants()` method, by
  bumping the daily-python dependency to >= 0.16.0.

- ElevenLabs TTS services now support a sample rate of 8000.

- Added support for `instructions` in `OpenAITTSService`.

- Added support for `base_url` in `OpenAIImageGenService` and
  `OpenAITTSService`.

### Fixed

- Fixed an issue in `RTVIObserver` that prevented handling of Google LLM
  context messages. The observer now processes both OpenAI-style and
  Google-style contexts.

- Fixed an issue in Daily involving switching virtual devices, by bumping the
  daily-python dependency to >= 0.16.1.

- Fixed a `GoogleAssistantContextAggregator` issue where function call
  placeholders were not being updated when the function call result was
  something other than a string.

- Fixed an issue that would cause `LLMAssistantContextAggregator` to block
  processing more frames while processing a function call result.

- Fixed an issue where the `RTVIObserver` would report two bot started and
  stopped speaking events for each bot turn.

- Fixed an issue in `UltravoxSTTService` that caused improper audio processing
  and incorrect LLM frame output.
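The two context styles differ in shape:

- **OpenAI-style messages** keep text in a `content` field, which is either a string or a list of `{"type": "text", "text": ...}` parts.
- **Google (Gemini) style messages** keep text in a `parts` list and use the `model` role for the assistant.

The helper below is hypothetical and is not `RTVIObserver`'s code. It only extracts plain text and ignores images, tool calls and other part types.

```python
from typing import Any


def message_text(message: dict[str, Any]) -> tuple[str, str]:
    """Return (role, text) for an OpenAI-style or Google-style chat message."""
    role = message.get("role", "")
    if "parts" in message:
        # Google style: {"role": "model", "parts": [{"text": "..."}]}
        text = "".join(part.get("text", "") for part in message["parts"])
        return ("assistant" if role == "model" else role), text
    content = message.get("content") or ""
    if isinstance(content, list):
        # OpenAI style with content parts: [{"type": "text", "text": "..."}]
        content = "".join(p.get("text", "") for p in content if p.get("type") == "text")
    return role, content


openai_msg = {"role": "assistant", "content": "Hello!"}
google_msg = {"role": "model", "parts": [{"text": "Hel"}, {"text": "lo!"}]}
print(message_text(openai_msg), message_text(google_msg))
```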

### Other

- Added `examples/foundational/07x-interruptible-local.py` to show how a local
  transport can be used.

## [0.0.60] - 2025-03-20

### Added

- Added `default_headers` parameter to `BaseOpenAILLMService` constructor.

### Changed

- Rolled back to `deepgram-sdk` 3.8.0 since 3.10.1 was causing connection issues.

- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe`
  for `OpenAIRealtimeBetaLLMService`.

### Other

- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py`
  examples to use the FunctionSchema format.

## [0.0.59] - 2025-03-20

### Added

- When registering a function call it is now possible to indicate if you want
  the function call to be cancelled if there's a user interruption via
  `cancel_on_interruption` (defaults to False). This is now possible because
  function calls are executed concurrently.

- Added support for detecting idle pipelines. By default, if no activity has
  been detected for 5 minutes, the `PipelineTask` will be automatically
  cancelled. It is possible to override this behavior by passing
  `cancel_on_idle_timeout=False`. It is also possible to change the default
  timeout with `idle_timeout_secs` or the frames that prevent the pipeline from
  being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event
  handler will be triggered if the idle timeout is reached (whether the pipeline
  task is cancelled or not).

- Added `FalSTTService`, which provides STT for Fal's Wizper API.

- Added a `reconnect_on_error` parameter to websocket-based TTS services as well
  as an `on_connection_error` event handler. The `reconnect_on_error` parameter
  indicates whether the TTS service should reconnect on error. The
  `on_connection_error` event handler will always get called if there's any
  error, no matter the value of `reconnect_on_error`. This allows, for example,
  falling back to a different TTS provider if something goes wrong with the
  current one.

- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate
  text and skip end of sentence matching while the aggregated text is between
  start/end tags.

- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to
  identify content between matching pattern pairs in streamed text. This allows
  for detection and processing of structured content like XML-style tags that
  may span across multiple text chunks or sentence boundaries.

- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service
  to aggregate LLM tokens and decide when the aggregated text should be pushed
  to the TTS service. They also allow for the text to be manipulated while it's
  being aggregated. A text aggregator can be passed via `text_aggregator` to the
  TTS service.

- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow
  changing the output sample rate.

- Added new `NeuphonicTTSService`.
  (see https://neuphonic.com)

- Added new `UltravoxSTTService`.
  (see https://github.com/fixie-ai/ultravox)

- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event
  handlers to `PipelineTask`. These events will be called when a frame reaches
  the beginning or end of the pipeline respectively. Note that by default, the
  event handlers will not be called unless a filter is set with
  `PipelineTask.set_reached_upstream_filter()` or
  `PipelineTask.set_reached_downstream_filter()`.

- Added support for Chirp voices in `GoogleTTSService`.

- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.

- Added a `set_language` convenience method for `GoogleSTTService`, allowing
  you to set a single language. This is in addition to the `set_languages`
  method which allows you to set a list of languages.

- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to
  `AudioBufferProcessor`. This gives the ability to grab the audio of only that
  turn for both the user and the bot.

- Added new base class `BaseObject` which is now the base class of
  `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The
  new `BaseObject` adds support for event handlers.

- Added support for a unified format for specifying function calling across all
  LLM services.

  ```python
  from pipecat.adapters.schemas.function_schema import FunctionSchema
  from pipecat.adapters.schemas.tools_schema import ToolsSchema

  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather_function])
  ```

- Added `speech_threshold` parameter to `GladiaSTTService`.

- Allowed passing user (`user_kwargs`) and assistant (`assistant_kwargs`)
  context aggregator parameters when using `create_context_aggregator()`. The
  values are passed as a mapping that will then be converted to arguments.

- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService`.

- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At
  every completion the `on_completion` event handler is triggered.

- Added a new frame, `RTVIServerMessageFrame`, and RTVI message
  `RTVIServerMessage` which provides a generic mechanism for sending custom
  messages from server to client. The `RTVIServerMessageFrame` is processed by
  the `RTVIObserver` and will be delivered to the client's `onServerMessage`
  callback or `ServerMessage` event.

- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an
  OpenAI-compatible interface. Added foundational example
  `14o-function-calling-gemini-openai-format.py`.

- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API.
  Added foundational example `19a-azure-realtime-beta.py`.

- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI
  Gemini models. Added foundational example
  `14p-function-calling-gemini-vertex-ai.py`.

- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:

  - The `'gpt-4o-transcribe'` input audio transcription model, along
    with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        ),
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more
    sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated`
    events to `OpenAIRealtimeBetaLLMService`.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...  # handle the newly created conversation item

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a
    conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
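The idle-pipeline detection described above can be pictured as a watchdog with three properties:

- Activity resets the timer.
- When the timer expires, an `on_idle_timeout` callback always runs.
- The watched task is cancelled only if `cancel_on_idle_timeout` is true.

This is a hypothetical asyncio sketch of that behavior, not `PipelineTask`'s implementation. `IdleWatchdog`, its `activity()` method and the stand-in `pipeline()` coroutine are invented names, and the timeout is shortened to 50 ms so the example runs quickly.

```python
import asyncio
from typing import Awaitable, Callable


class IdleWatchdog:
    """Hypothetical idle monitor, loosely modeled on the behavior described above."""

    def __init__(
        self,
        idle_timeout_secs: float,
        on_idle_timeout: Callable[[], Awaitable[None]],
        cancel_on_idle_timeout: bool = True,
    ):
        self._timeout = idle_timeout_secs
        self._on_idle_timeout = on_idle_timeout
        self._cancel = cancel_on_idle_timeout
        self._activity = asyncio.Event()

    def activity(self) -> None:
        # Call this whenever a frame that counts as "activity" passes through.
        self._activity.set()

    async def run(self, task: asyncio.Task) -> None:
        while not task.done():
            try:
                await asyncio.wait_for(self._activity.wait(), self._timeout)
                self._activity.clear()  # activity seen: restart the timer
            except asyncio.TimeoutError:
                await self._on_idle_timeout()  # fires whether or not we cancel
                if self._cancel:
                    task.cancel()
                return


async def main() -> list[str]:
    events: list[str] = []

    async def pipeline() -> None:
        await asyncio.sleep(10)  # stands in for a pipeline that never finishes

    async def on_idle() -> None:
        events.append("idle")

    task = asyncio.create_task(pipeline())
    watchdog = IdleWatchdog(0.05, on_idle)
    watchdog.activity()  # one burst of activity up front
    await watchdog.run(task)
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events


print(asyncio.run(main()))  # ['idle', 'cancelled']
```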

### Changed

- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default
  transcription model.

- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.

- Function calls are now executed in tasks. This means that the pipeline will
  not be blocked while the function call is being executed.

- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is
  happening in the pipeline. There are a few settings to configure this
  behavior, see `PipelineTask` documentation for more details.

- All event handlers are now executed in separate tasks in order to prevent
  blocking the pipeline. Event handlers can take some time to execute, in which
  case the pipeline would otherwise be blocked waiting for them to complete.

- Updated `TranscriptProcessor` to support text output from
  `OpenAIRealtimeBetaLLMService`.

- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push
  a `TTSTextFrame`.

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` to `sonic-2`.
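The benefit of running handlers in tasks shows up in ordering. An awaited handler holds up the frames behind it, while a handler scheduled with `asyncio.create_task` lets frame processing continue. This is a hypothetical illustration of the general asyncio pattern, not Pipecat's dispatch code. `slow_handler` and `process_frames` are invented names.

```python
import asyncio


async def slow_handler(log: list[str]) -> None:
    await asyncio.sleep(0.05)  # simulates a slow user-supplied event handler
    log.append("handler done")


async def process_frames(run_in_task: bool) -> list[str]:
    log: list[str] = []
    tasks = []
    for i in range(3):
        if i == 0:
            if run_in_task:
                # Schedule the handler and keep processing frames immediately.
                tasks.append(asyncio.create_task(slow_handler(log)))
            else:
                # Awaiting inline blocks frame processing until the handler finishes.
                await slow_handler(log)
        log.append(f"frame {i}")
    if tasks:
        await asyncio.gather(*tasks)  # wait for background handlers before returning
    return log


print(asyncio.run(process_frames(run_in_task=False)))
print(asyncio.run(process_frames(run_in_task=True)))
```

The first call logs `handler done` before any frame. The second logs all three frames first.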

### Deprecated

- Passing a `start_callback` to `LLMService.register_function()` is now
  deprecated; simply move the code from the start callback to the function call.

- `TTSService` parameter `text_filter` is now deprecated, use `text_filters`
  instead, which is now a list. This allows passing multiple filters that will
  be executed in order.
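"Executed in order" means each filter receives the previous filter's output. This minimal sketch assumes filters are plain `str -> str` callables, which is a simplification (Pipecat's text filters are objects, not bare functions). `apply_filters`, `strip_markdown_bold` and `collapse_whitespace` are hypothetical names.

```python
import re
from typing import Callable, List

TextFilter = Callable[[str], str]


def apply_filters(text: str, filters: List[TextFilter]) -> str:
    """Run each filter on the output of the previous one, in list order."""
    for text_filter in filters:
        text = text_filter(text)
    return text


def strip_markdown_bold(text: str) -> str:
    # Hypothetical filter: remove **bold** markers so TTS does not read them.
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)


def collapse_whitespace(text: str) -> str:
    # Hypothetical filter: squeeze runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(apply_filters("  The **answer**\n is   42. ", [strip_markdown_bold, collapse_whitespace]))
# The answer is 42.
```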

### Removed

- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()`
  instead.

- Removed deprecated `stt_service` parameter from `STTMuteFilter`.

- Removed deprecated RTVI processors, use an `RTVIObserver` instead.

- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.

- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model`
  instead.

- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.

### Fixed

- Fixed an assistant aggregator issue that could cause assistant text to be
  split into multiple chunks during function calls.

- Fixed an assistant aggregator issue that was causing assistant text to not be
  added to the context during function calls. This could lead to duplications.

- Fixed a `SegmentedSTTService` issue that was causing audio to be sent
  prematurely to the STT service. Instead of analyzing the volume in this
  service, it now relies on VAD events, which take both VAD and volume into
  account.

- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be
  duplicated in the context when pushing `LLMMessagesAppendFrame` frames.

- Fixed an issue with `SegmentedSTTService` based services
  (e.g. `GroqSTTService`) that was not allowing audio to pass through
  downstream.

- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider
  text between spell-out tags to be an end of sentence.

- Fixed a `match_endofsentence` issue that would cause floating point
  numbers to be considered an end of sentence.

- Fixed a `match_endofsentence` issue that would cause emails to be
  considered an end of sentence.

- Fixed an issue where the RTVI message `disconnect-bot` was pushing an
  `EndFrame`, resulting in the pipeline not shutting down. It now pushes an
  `EndTaskFrame` upstream to shut down the pipeline.

- Fixed an issue with the `GoogleSTTService` where stream timeouts during
  periods of inactivity were causing connection failures. The service now
  properly detects timeout errors and handles reconnection gracefully,
  ensuring continuous operation even after periods of silence or when using an
  `STTMuteFilter`.

- Fixed an issue in `RimeTTSService` where the last line of text sent didn't
  result in an audio output being generated.

- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message,
    which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
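Several of the fixes above concern where a streamed sentence is considered finished. This minimal, hypothetical sketch of such a check is not Pipecat's `match_endofsentence`, whose rules are more elaborate. It assumes the following rules:

- A sentence ends at `.`, `!` or `?` followed by whitespace, so `3.14` and `jane.doe@example.com` never match mid-token.
- Text inside `<spell>...</spell>` is treated as opaque.
- An unclosed `<spell>` tag suppresses any match after it.

```python
import re

# Sentence-ending punctuation that is followed by whitespace. Requiring the
# whitespace means "3.14" and "jane.doe@example.com" never match mid-token.
_END = re.compile(r"[.!?](?=\s)")
_SPELL = re.compile(r"<spell>.*?</spell>", re.DOTALL)


def end_of_sentence(text: str) -> int:
    """Return the index just past the first sentence end in `text`, or 0 if none.

    Hypothetical helper: punctuation inside <spell>...</spell> tags is ignored,
    and an unterminated <spell> tag suppresses matches after it.
    """
    # Spans of complete spell tags, which must never be split.
    protected = [m.span() for m in _SPELL.finditer(text)]
    open_tag = text.rfind("<spell>")
    if open_tag != -1 and text.find("</spell>", open_tag) == -1:
        protected.append((open_tag, len(text)))  # tag still streaming in
    for match in _END.finditer(text):
        pos = match.start()
        if not any(start <= pos < end for start, end in protected):
            return match.end()
    return 0


s = "Pi is 3.14 roughly. Next"
print(s[: end_of_sentence(s)])  # Pi is 3.14 roughly.
```

Because trailing whitespace is required, the last sentence of a response never matches here. The caller is assumed to flush whatever remains in its buffer when the LLM response ends.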

### Other

- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.

- Added a new Ultravox example
  `examples/foundational/07u-interruptible-ultravox.py`.

- Added new Neuphonic examples
  `examples/foundational/07v-interruptible-neuphonic.py` and
  `examples/foundational/07v-interruptible-neuphonic-http.py`.

- Added a new example `examples/foundational/36-user-email-gathering.py` to show
  how to gather user emails. The example uses Cartesia's `<spell></spell>`
  tags and Rime's `spell()` function to spell out the emails for confirmation.

- Updated the `34-audio-recording.py` example to include an STT processor.

- Added foundational example `35-voice-switching.py` showing how to use the new
  `PatternPairAggregator`. This example shows how to encode information for the
  LLM to instruct TTS voice changes, but this can be used to encode any
  information into the LLM response, which you want to parse and use in other
  parts of your application.

- Added a Pipecat Cloud deployment example to the `examples` directory.

- Removed foundational examples 28b and 28c as the TranscriptProcessor no
  longer has an LLM dependency. Renamed foundational example 28a to
  `28-transcript-processor.py`.
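Detecting pattern pairs in streamed text needs a buffer. A start tag may arrive split across chunks, and its closing half may come later. This hypothetical sketch is not `PatternPairAggregator`'s API. `PatternPairSketch` and `feed()` are invented names, and the `<voice>...</voice>` tag is an assumed example that may differ from the one the voice-switching example uses. The sketch does three things:

- It returns text that is safe to speak.
- It returns the contents of any completed pairs.
- It holds back anything from an unclosed or partially received start tag onward.

```python
import re
from typing import List, Tuple


class PatternPairSketch:
    """Hypothetical streaming extractor for <voice>...</voice> pairs.

    Text is buffered across chunks, so a tag split over several chunks is
    still detected once both halves have arrived.
    """

    def __init__(self, start: str = "<voice>", end: str = "</voice>"):
        self._pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
        self._start = start
        self._buffer = ""

    def feed(self, chunk: str) -> Tuple[str, List[str]]:
        """Return (text safe to speak, contents of completed pairs)."""
        self._buffer += chunk
        found = [m.group(1) for m in self._pattern.finditer(self._buffer)]
        text = self._pattern.sub("", self._buffer)  # drop completed pairs
        hold = text.find(self._start)  # an opened but not yet closed pair
        if hold == -1:
            # Also hold back a trailing partial start tag such as "<vo".
            hold = len(text)
            for i in range(1, len(self._start)):
                if text.endswith(self._start[:i]):
                    hold = len(text) - i
        self._buffer = text[hold:]
        return text[:hold], found


agg = PatternPairSketch()
spoken, voices = "", []
for chunk in ["Hello <vo", "ice>narrator</vo", "ice> there."]:
    text, found = agg.feed(chunk)
    spoken += text
    voices += found
print(repr(spoken), voices)  # 'Hello  there.' ['narrator']
```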

## [0.0.58] - 2025-02-26

### Added

- Added track-specific audio event `on_track_audio_data` to
  `AudioBufferProcessor` for accessing separate input and output audio tracks.

- Pipecat version will now be logged on every application startup. This will
  help us identify what version we are running in case of any issues.
@@ -674,10 +45,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|||||||
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
|
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one for
|
||||||
the pipeline).
|
the pipeline).
|
||||||
|
|
||||||
- Updated `PlayHTHttpTTSService` to take a `voice_engine` and `protocol` input
|
|
||||||
in the constructor. The previous method of providing a `voice_engine` input
|
|
||||||
that contains the engine and protocol is deprecated by PlayHT.
|
|
||||||
|
|
||||||
- The base `TTSService` class now strips leading newlines before sending text
|
- The base `TTSService` class now strips leading newlines before sending text
|
||||||
to the TTS provider. This change is to solve issues where some TTS providers,
|
to the TTS provider. This change is to solve issues where some TTS providers,
|
||||||
like Azure, would not output text due to newlines.
|
like Azure, would not output text due to newlines.
|
||||||
@@ -711,12 +78,6 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
|
|||||||
|
|
||||||
### Fixed
|
### Fixed
|
||||||
|
|
||||||
- Fixed an issue that would cause undesired interruptions via
|
|
||||||
`EmulateUserStartedSpeakingFrame`.
|
|
||||||
|
|
||||||
- Fixed a `GoogleLLMService` issue that was causing an exception when sending inline
|
|
||||||
audio in some cases.
|
|
||||||
|
|
||||||
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
|
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to
|
||||||
disconnect from the TTS service before audio from all the contexts was
|
disconnect from the TTS service before audio from all the contexts was
|
||||||
received. This affected services like Cartesia and Rime.
|
received. This affected services like Cartesia and Rime.
|
||||||
@@ -730,6 +91,10 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
|
|||||||
|
|
||||||
- Fixed `match_endofsentence` support for ellipses.
|
- Fixed `match_endofsentence` support for ellipses.
|
||||||
|
|
||||||
|
- Fixed an issue that would cause undesired interruptions via
|
||||||
|
`EmulateUserStartedSpeakingFrame` when only interim transcriptions (i.e. no
|
||||||
|
final transcriptions) were received.
|
||||||
|
|
||||||
- Fixed an issue where `EndTaskFrame` was not triggering
|
- Fixed an issue where `EndTaskFrame` was not triggering
|
||||||
`on_client_disconnected` or closing the WebSocket in FastAPI.
|
`on_client_disconnected` or closing the WebSocket in FastAPI.
|
||||||
|
|
||||||
@@ -759,9 +124,6 @@ stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
|
|||||||
|
|
||||||
- Added Gemini support to `examples/phone-chatbot`.
|
- Added Gemini support to `examples/phone-chatbot`.
|
||||||
|
|
||||||
- Added foundational example `34-audio-recording.py` showing how to use the
|
|
||||||
AudioBufferProcessor callbacks to save merged and track recordings.
|
|
||||||
|
|
||||||
## [0.0.57] - 2025-02-14
|
## [0.0.57] - 2025-02-14
|
||||||
|
|
||||||
### Added
|
### Added
|
||||||
@@ -2394,7 +1756,7 @@ async def on_connected(processor):
|
|||||||
completed. If a task is never run, `has_finished()` will return False.
|
completed. If a task is never run, `has_finished()` will return False.
|
||||||
|
|
||||||
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
|
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
|
||||||
cancelled.
|
canceled.
|
||||||
|
|
||||||
### Fixed
|
### Fixed
|
||||||
|
|
||||||
|
|||||||
@@ -26,52 +26,11 @@ git commit -m "Description of your changes"
|
|||||||
git push origin your-branch-name
|
git push origin your-branch-name
|
||||||
```
|
```
|
||||||
|
|
||||||
8. **Submit a Pull Request (PR)**: Open a PR from your forked repository to the main branch of this repo.
|
9. **Submit a Pull Request (PR)**: Open a PR from your forked repository to the main branch of this repo.
|
||||||
> Important: Describe the changes you've made clearly!
|
> Important: Describe the changes you've made clearly!
|
||||||
|
|
||||||
Our maintainers will review your PR, and once everything is good, your contributions will be merged!
|
Our maintainers will review your PR, and once everything is good, your contributions will be merged!
|
||||||
|
|
||||||
## Code Style and Documentation
|
|
||||||
|
|
||||||
### Python Code Style
|
|
||||||
|
|
||||||
We use Ruff for code linting and formatting. Please ensure your code passes all linting checks before submitting a PR.
|
|
||||||
|
|
||||||
### Docstring Conventions
|
|
||||||
|
|
||||||
We follow Google-style docstrings with these specific conventions:
|
|
||||||
|
|
||||||
- Class docstrings should fully document all parameters used in `__init__`
|
|
||||||
- We don't require separate docstrings for `__init__` methods when parameters are documented in the class docstring
|
|
||||||
- Property methods should have docstrings explaining their purpose and return value
|
|
||||||
|
|
||||||
Example of correctly documented class:
|
|
||||||
|
|
||||||
```python
|
|
||||||
class MyClass:
|
|
||||||
"""Class description.
|
|
||||||
|
|
||||||
Additional details about the class.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
param1: Description of first parameter.
|
|
||||||
param2: Description of second parameter.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, param1, param2):
|
|
||||||
# No docstring required here as parameters are documented above
|
|
||||||
self.param1 = param1
|
|
||||||
self.param2 = param2
|
|
||||||
|
|
||||||
@property
|
|
||||||
def some_property(self) -> str:
|
|
||||||
"""Get the formatted property value.
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
A string representation of the property.
|
|
||||||
"""
|
|
||||||
return f"Property: {self.param1}"
|
|
||||||
```
|
|
||||||
|
|
||||||
# Contributor Covenant Code of Conduct
|
# Contributor Covenant Code of Conduct
|
||||||
|
|
||||||
@@ -92,23 +51,23 @@ diverse, inclusive, and healthy community.
|
|||||||
Examples of behavior that contributes to a positive environment for our
|
Examples of behavior that contributes to a positive environment for our
|
||||||
community include:
|
community include:
|
||||||
|
|
||||||
- Demonstrating empathy and kindness toward other people
|
* Demonstrating empathy and kindness toward other people
|
||||||
- Being respectful of differing opinions, viewpoints, and experiences
|
* Being respectful of differing opinions, viewpoints, and experiences
|
||||||
- Giving and gracefully accepting constructive feedback
|
* Giving and gracefully accepting constructive feedback
|
||||||
- Accepting responsibility and apologizing to those affected by our mistakes,
|
* Accepting responsibility and apologizing to those affected by our mistakes,
|
||||||
and learning from the experience
|
and learning from the experience
|
||||||
- Focusing on what is best not just for us as individuals, but for the overall
|
* Focusing on what is best not just for us as individuals, but for the overall
|
||||||
community
|
community
|
||||||
|
|
||||||
Examples of unacceptable behavior include:
|
Examples of unacceptable behavior include:
|
||||||
|
|
||||||
- The use of sexualized language or imagery, and sexual attention or advances of
|
* The use of sexualized language or imagery, and sexual attention or advances of
|
||||||
any kind
|
any kind
|
||||||
- Trolling, insulting or derogatory comments, and personal or political attacks
|
* Trolling, insulting or derogatory comments, and personal or political attacks
|
||||||
- Public or private harassment
|
* Public or private harassment
|
||||||
- Publishing others' private information, such as a physical or email address,
|
* Publishing others' private information, such as a physical or email address,
|
||||||
without their explicit permission
|
without their explicit permission
|
||||||
- Other conduct which could reasonably be considered inappropriate in a
|
* Other conduct which could reasonably be considered inappropriate in a
|
||||||
professional setting
|
professional setting
|
||||||
|
|
||||||
## Enforcement Responsibilities
|
## Enforcement Responsibilities
|
||||||
@@ -203,4 +162,4 @@ For answers to common questions about this code of conduct, see the FAQ at
|
|||||||
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
|
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
|
||||||
[Mozilla CoC]: https://github.com/mozilla/diversity
|
[Mozilla CoC]: https://github.com/mozilla/diversity
|
||||||
[FAQ]: https://www.contributor-covenant.org/faq
|
[FAQ]: https://www.contributor-covenant.org/faq
|
||||||
[translations]: https://www.contributor-covenant.org/translations
|
[translations]: https://www.contributor-covenant.org/translations
|
||||||
23
README.md
@@ -55,18 +55,17 @@ pip install "pipecat-ai[option,...]"
|
|||||||
|
|
||||||
### Available services
|
### Available services
|
||||||
|
|
||||||
| Category | Services | Install Command Example |
|
| Category | Services | Install Command Example |
|
||||||
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
|
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
|
||||||
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
|
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
|
||||||
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
|
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
|
||||||
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
|
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
|
||||||
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
|
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
|
||||||
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
|
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
|
||||||
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
|
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
|
||||||
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) | `pip install "pipecat-ai[mem0]"` |
|
| Vision & Image | [Moondream](https://docs.pipecat.ai/server/services/vision/moondream), [fal](https://docs.pipecat.ai/server/services/image-generation/fal) | `pip install "pipecat-ai[moondream]"` |
|
||||||
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) | `pip install "pipecat-ai[moondream]"` |
|
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
|
||||||
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
|
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
|
||||||
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
|
|
||||||
|
|
||||||
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
|
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
|
||||||
|
|
||||||
|
|||||||
@@ -3,11 +3,10 @@ coverage~=7.6.12
|
|||||||
grpcio-tools~=1.67.1
|
grpcio-tools~=1.67.1
|
||||||
pip-tools~=7.4.1
|
pip-tools~=7.4.1
|
||||||
pre-commit~=4.0.1
|
pre-commit~=4.0.1
|
||||||
pyright~=1.1.397
|
pyright~=1.1.394
|
||||||
pytest~=8.3.4
|
pytest~=8.3.4
|
||||||
pytest-asyncio~=0.25.3
|
pytest-asyncio~=0.25.3
|
||||||
pytest-aiohttp==1.1.0
|
ruff~=0.9.7
|
||||||
ruff~=0.11.1
|
|
||||||
setuptools~=70.0.0
|
setuptools~=70.0.0
|
||||||
setuptools_scm~=8.1.0
|
setuptools_scm~=8.1.0
|
||||||
python-dotenv~=1.0.1
|
python-dotenv~=1.0.1
|
||||||
|
|||||||
@@ -50,14 +50,6 @@ autodoc_mock_imports = [
|
|||||||
"pyht.protos",
|
"pyht.protos",
|
||||||
"pyht.protos.api_pb2",
|
"pyht.protos.api_pb2",
|
||||||
"pipecat_ai_playht", # PlayHT wrapper
|
"pipecat_ai_playht", # PlayHT wrapper
|
||||||
"vllm",
|
|
||||||
"aiortc",
|
|
||||||
"aiortc.mediastreams",
|
|
||||||
"cv2",
|
|
||||||
"av",
|
|
||||||
"pyneuphonic",
|
|
||||||
"mem0",
|
|
||||||
"mlx_whisper",
|
|
||||||
"anthropic",
|
"anthropic",
|
||||||
"assemblyai",
|
"assemblyai",
|
||||||
"boto3",
|
"boto3",
|
||||||
|
|||||||
@@ -45,10 +45,8 @@ Transport & Serialization
|
|||||||
Utilities
|
Utilities
|
||||||
~~~~~~~~~
|
~~~~~~~~~
|
||||||
|
|
||||||
* :mod:`Adapters <pipecat.adapters>`
|
|
||||||
* :mod:`Clocks <pipecat.clocks>`
|
* :mod:`Clocks <pipecat.clocks>`
|
||||||
* :mod:`Metrics <pipecat.metrics>`
|
* :mod:`Metrics <pipecat.metrics>`
|
||||||
* :mod:`Observers <pipecat.observers>`
|
|
||||||
* :mod:`Sync <pipecat.sync>`
|
* :mod:`Sync <pipecat.sync>`
|
||||||
* :mod:`Transcriptions <pipecat.transcriptions>`
|
* :mod:`Transcriptions <pipecat.transcriptions>`
|
||||||
* :mod:`Utils <pipecat.utils>`
|
* :mod:`Utils <pipecat.utils>`
|
||||||
@@ -58,12 +56,10 @@ Utilities
|
|||||||
:caption: API Reference
|
:caption: API Reference
|
||||||
:hidden:
|
:hidden:
|
||||||
|
|
||||||
Adapters <api/pipecat.adapters>
|
|
||||||
Audio <api/pipecat.audio>
|
Audio <api/pipecat.audio>
|
||||||
Clocks <api/pipecat.clocks>
|
Clocks <api/pipecat.clocks>
|
||||||
Frames <api/pipecat.frames>
|
Frames <api/pipecat.frames>
|
||||||
Metrics <api/pipecat.metrics>
|
Metrics <api/pipecat.metrics>
|
||||||
Observers <api/pipecat.observers>
|
|
||||||
Pipeline <api/pipecat.pipeline>
|
Pipeline <api/pipecat.pipeline>
|
||||||
Processors <api/pipecat.processors>
|
Processors <api/pipecat.processors>
|
||||||
Serializers <api/pipecat.serializers>
|
Serializers <api/pipecat.serializers>
|
||||||
|
|||||||
@@ -12,29 +12,22 @@ pipecat-ai[aws]
|
|||||||
pipecat-ai[azure]
|
pipecat-ai[azure]
|
||||||
pipecat-ai[canonical]
|
pipecat-ai[canonical]
|
||||||
pipecat-ai[cartesia]
|
pipecat-ai[cartesia]
|
||||||
pipecat-ai[cerebras]
|
|
||||||
pipecat-ai[deepseek]
|
|
||||||
pipecat-ai[daily]
|
pipecat-ai[daily]
|
||||||
pipecat-ai[deepgram]
|
pipecat-ai[deepgram]
|
||||||
pipecat-ai[elevenlabs]
|
pipecat-ai[elevenlabs]
|
||||||
pipecat-ai[fal]
|
pipecat-ai[fal]
|
||||||
pipecat-ai[fireworks]
|
pipecat-ai[fireworks]
|
||||||
pipecat-ai[fish]
|
|
||||||
pipecat-ai[gladia]
|
pipecat-ai[gladia]
|
||||||
pipecat-ai[google]
|
pipecat-ai[google]
|
||||||
pipecat-ai[grok]
|
pipecat-ai[grok]
|
||||||
pipecat-ai[groq]
|
pipecat-ai[groq]
|
||||||
# pipecat-ai[krisp] # Mocked
|
# pipecat-ai[krisp] # Mocked instead
|
||||||
pipecat-ai[koala]
|
|
||||||
pipecat-ai[langchain]
|
pipecat-ai[langchain]
|
||||||
pipecat-ai[livekit]
|
pipecat-ai[livekit]
|
||||||
pipecat-ai[lmnt]
|
pipecat-ai[lmnt]
|
||||||
pipecat-ai[local]
|
pipecat-ai[local]
|
||||||
# pipecat-ai[mem0] # Mocked
|
|
||||||
# pipecat-ai[mlx-whisper] # Mocked
|
|
||||||
pipecat-ai[moondream]
|
pipecat-ai[moondream]
|
||||||
pipecat-ai[nim]
|
pipecat-ai[nim]
|
||||||
# pipecat-ai[neuphonic] # Mocked
|
|
||||||
pipecat-ai[noisereduce]
|
pipecat-ai[noisereduce]
|
||||||
pipecat-ai[openai]
|
pipecat-ai[openai]
|
||||||
# pipecat-ai[openpipe]
|
# pipecat-ai[openpipe]
|
||||||
@@ -43,9 +36,5 @@ pipecat-ai[riva]
|
|||||||
pipecat-ai[silero]
|
pipecat-ai[silero]
|
||||||
pipecat-ai[simli]
|
pipecat-ai[simli]
|
||||||
pipecat-ai[soundfile]
|
pipecat-ai[soundfile]
|
||||||
pipecat-ai[tavus]
|
|
||||||
pipecat-ai[together]
|
|
||||||
# pipecat-ai[ultravox] # Mocked
|
|
||||||
# pipecat-ai[webrtc] # Mocked
|
|
||||||
pipecat-ai[websocket]
|
pipecat-ai[websocket]
|
||||||
pipecat-ai[whisper]
|
pipecat-ai[whisper]
|
||||||
@@ -29,9 +29,6 @@ DAILY_SAMPLE_ROOM_URL=https://...
|
|||||||
ELEVENLABS_API_KEY=...
|
ELEVENLABS_API_KEY=...
|
||||||
ELEVENLABS_VOICE_ID=...
|
ELEVENLABS_VOICE_ID=...
|
||||||
|
|
||||||
# Neuphonic
|
|
||||||
NEUPHONIC_API_KEY=...
|
|
||||||
|
|
||||||
# Fal
|
# Fal
|
||||||
FAL_KEY=...
|
FAL_KEY=...
|
||||||
|
|
||||||
@@ -90,10 +87,3 @@ ASSEMBLYAI_API_KEY=...
|
|||||||
|
|
||||||
# OpenRouter
|
# OpenRouter
|
||||||
OPENROUTER_API_KEY=...
|
OPENROUTER_API_KEY=...
|
||||||
|
|
||||||
# Piper
|
|
||||||
PIPER_BASE_URL=...
|
|
||||||
|
|
||||||
# Smart turn
|
|
||||||
LOCAL_SMART_TURN_MODEL_PATH=
|
|
||||||
REMOTE_SMART_TURN_URL=
|
|
||||||
@@ -18,7 +18,7 @@ from pipecat.frames.frames import AudioRawFrame, EndFrame, OutputAudioRawFrame,
|
|||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineTask
|
from pipecat.pipeline.task import PipelineTask
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -64,7 +64,7 @@ async def main():
|
|||||||
|
|
||||||
tts = CartesiaTTSService(
|
tts = CartesiaTTSService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
)
|
)
|
||||||
|
|
||||||
runner = PipelineRunner()
|
runner = PipelineRunner()
|
||||||
|
|||||||
@@ -21,9 +21,9 @@ from pipecat.pipeline.runner import PipelineRunner
|
|||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
|
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
|
||||||
from pipecat.services.canonical.metrics import CanonicalMetricsService
|
from pipecat.services.canonical import CanonicalMetricsService
|
||||||
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
|
from pipecat.services.elevenlabs import ElevenLabsTTSService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -72,7 +72,7 @@ async def main():
|
|||||||
# voice_id="gD1IexrzCvsXPHUuT0s3",
|
# voice_id="gD1IexrzCvsXPHUuT0s3",
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
@@ -113,8 +113,8 @@ async def main():
|
|||||||
llm,
|
llm,
|
||||||
tts,
|
tts,
|
||||||
transport.output(),
|
transport.output(),
|
||||||
canonical, # uploads audio buffer to Canonical AI for metrics
|
|
||||||
audio_buffer_processor, # captures audio into a buffer
|
audio_buffer_processor, # captures audio into a buffer
|
||||||
|
canonical, # uploads audio buffer to Canonical AI for metrics
|
||||||
context_aggregator.assistant(),
|
context_aggregator.assistant(),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -23,8 +23,8 @@ from pipecat.pipeline.runner import PipelineRunner
|
|||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
|
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
|
||||||
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
|
from pipecat.services.elevenlabs import ElevenLabsTTSService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -32,16 +32,10 @@ load_dotenv(override=True)
|
|||||||
logger.remove(0)
|
logger.remove(0)
|
||||||
logger.add(sys.stderr, level="DEBUG")
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
# Create the recordings directory if it doesn't exist
|
|
||||||
os.makedirs("recordings", exist_ok=True)
|
|
||||||
|
|
||||||
|
async def save_audio(audio: bytes, sample_rate: int, num_channels: int):
|
||||||
async def save_audio(audio: bytes, sample_rate: int, num_channels: int, name: str):
|
|
||||||
if len(audio) > 0:
|
if len(audio) > 0:
|
||||||
filename = os.path.join(
|
filename = f"conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav"
|
||||||
"recordings",
|
|
||||||
f"{name}_conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav",
|
|
||||||
)
|
|
||||||
with io.BytesIO() as buffer:
|
with io.BytesIO() as buffer:
|
||||||
with wave.open(buffer, "wb") as wf:
|
with wave.open(buffer, "wb") as wf:
|
||||||
wf.setsampwidth(2)
|
wf.setsampwidth(2)
|
||||||
@@ -95,7 +89,7 @@ async def main():
|
|||||||
# voice_id="gD1IexrzCvsXPHUuT0s3",
|
# voice_id="gD1IexrzCvsXPHUuT0s3",
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
@@ -116,7 +110,7 @@ async def main():
|
|||||||
|
|
||||||
# NOTE: Watch out! This will save all the conversation in memory. You
|
# NOTE: Watch out! This will save all the conversation in memory. You
|
||||||
# can pass `buffer_size` to get periodic callbacks.
|
# can pass `buffer_size` to get periodic callbacks.
|
||||||
audiobuffer = AudioBufferProcessor(enable_turn_audio=True)
|
audiobuffer = AudioBufferProcessor()
|
||||||
|
|
||||||
pipeline = Pipeline(
|
pipeline = Pipeline(
|
||||||
[
|
[
|
||||||
@@ -134,15 +128,7 @@ async def main():
|
|||||||
|
|
||||||
@audiobuffer.event_handler("on_audio_data")
|
@audiobuffer.event_handler("on_audio_data")
|
||||||
async def on_audio_data(buffer, audio, sample_rate, num_channels):
|
async def on_audio_data(buffer, audio, sample_rate, num_channels):
|
||||||
await save_audio(audio, sample_rate, num_channels, "full")
|
await save_audio(audio, sample_rate, num_channels)
|
||||||
|
|
||||||
@audiobuffer.event_handler("on_user_turn_audio_data")
|
|
||||||
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
|
|
||||||
await save_audio(audio, sample_rate, num_channels, "user")
|
|
||||||
|
|
||||||
@audiobuffer.event_handler("on_bot_turn_audio_data")
|
|
||||||
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
|
|
||||||
await save_audio(audio, sample_rate, num_channels, "bot")
|
|
||||||
|
|
||||||
@transport.event_handler("on_first_participant_joined")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_first_participant_joined(transport, participant):
|
async def on_first_participant_joined(transport, participant):
|
||||||
|
|||||||
@@ -1,9 +1,3 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
import asyncio
|
import asyncio
|
||||||
import os
|
import os
|
||||||
@@ -18,8 +12,8 @@ from pipecat.pipeline.pipeline import Pipeline
|
|||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
|
from pipecat.services.elevenlabs import ElevenLabsTTSService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -53,7 +47,7 @@ async def main(room_url: str, token: str):
|
|||||||
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
|
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -1,9 +1,3 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
import os
|
||||||
|
|
||||||
import aiohttp
|
import aiohttp
|
||||||
|
|||||||
@@ -1,9 +1,3 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
import os
|
import os
|
||||||
import sys
|
import sys
|
||||||
@@ -16,8 +10,8 @@ from pipecat.pipeline.pipeline import Pipeline
|
|||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -40,10 +34,10 @@ async def main(room_url: str, token: str):
|
|||||||
)
|
)
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
tts = CartesiaTTSService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
|
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -1,178 +0,0 @@
|
|||||||
# Handling PSTN/SIP Dial-in on Pipecat Cloud
|
|
||||||
|
|
||||||
This repository contains two server implementations for handling
|
|
||||||
the pinless dial-in workflow in Pipecat Cloud. This is the companion to the
|
|
||||||
Pipecat Cloud [pstn_sip starter image](https://github.com/daily-co/pipecat-cloud-images/tree/main/pipecat-starters/pstn_sip).
|
|
||||||
In addition, you can use `/api/dial` to trigger dial-out and,
|
|
||||||
eventually, call transfers.
|
|
||||||
|
|
||||||
1. [FastAPI Server](fastapi-webhook-server/README.md) -
|
|
||||||
A FastAPI implementation that handles PSTN (Public Switched Telephone
|
|
||||||
Network) and SIP (Session Initiation Protocol) calls using the Daily API.
|
|
||||||
|
|
||||||
2. [Next.js Serverless](nextjs-webhook-server/README.md) -
|
|
||||||
A Next.js API implementation designed for deployment on Vercel's
|
|
||||||
serverless platform.
|
|
||||||
|
|
||||||
Both implementations provide:
|
|
||||||
|
|
||||||
- HMAC signature validation for pinless webhook
|
|
||||||
- Structured logging
|
|
||||||
- Support for dial-in and dial-out settings
|
|
||||||
- Voicemail detection and call transfer functionality (coming soon)
|
|
||||||
- Test request handling
|
|
||||||
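Both servers validate the pinless webhook the same way. The base64-decoded `PINLESS_HMAC_SECRET` keys an HMAC-SHA256 over `"{timestamp}.{raw body}"`. The base64 digest is then compared with the `x-pinless-signature` header. The sketch below mirrors that logic from the FastAPI server using only the standard library. The function names are hypothetical, and it uses `hmac.compare_digest` for a constant-time comparison rather than `==`.

```python
import base64
import hashlib
import hmac


def compute_pinless_signature(secret_b64: str, timestamp: str, raw_body: str) -> str:
    """Return the base64 HMAC-SHA256 signature for a pinless webhook request.

    Key: the base64-decoded secret. Message: "<timestamp>.<raw body>".
    """
    key = base64.b64decode(secret_b64)
    message = f"{timestamp}.{raw_body}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def is_valid_pinless_request(secret_b64: str, timestamp: str, raw_body: str, signature: str) -> bool:
    """Compare the expected signature with the received header in constant time."""
    expected = compute_pinless_signature(secret_b64, timestamp, raw_body)
    # Compare bytes so non-ASCII header values cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    # Hypothetical secret and body, for illustration only.
    secret = base64.b64encode(b"example-secret").decode()
    body = '{"To": "+14152251493", "From": "+14158483432"}'
    sig = compute_pinless_signature(secret, "1700000000", body)
    print(is_valid_pinless_request(secret, "1700000000", body, sig))  # True
    print(is_valid_pinless_request(secret, "1700000001", body, sig))  # False
```

The signature covers the exact raw body bytes, so the body must be verified before any parsing or re-serialization.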
|
|
||||||
## Choosing an Implementation
|
|
||||||
|
|
||||||
- Use the **FastAPI Server** if you:
|
|
||||||
|
|
||||||
- Need a standalone server
|
|
||||||
- Prefer Python and FastAPI
|
|
||||||
- Want to deploy to traditional hosting platforms
|
|
||||||
|
|
||||||
- Use the **Next.js Serverless** implementation if you:
|
|
||||||
- Want serverless deployment
|
|
||||||
- Prefer JavaScript/TypeScript
|
|
||||||
- Already use Next.js and Vercel for other projects
|
|
||||||
- Need quick scaling and zero maintenance
|
|
||||||
|
|
||||||
## Prerequisites
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
Both implementations require similar environment variables:
|
|
||||||
|
|
||||||
- `PIPECAT_CLOUD_API_KEY`: Pipecat Cloud API key (begins with `pk_`)
|
|
||||||
- `AGENT_NAME`: Your Pipecat Cloud agent name
|
|
||||||
- `PINLESS_HMAC_SECRET`: Your HMAC secret for request verification
|
|
||||||
- `LOG_LEVEL`: (Optional) Logging level (defaults to 'info')
|
|
||||||
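The following is a minimal sketch of reading these variables. It assumes the same defaults as the example servers: `AGENT_NAME` falls back to `my-first-agent` and `LOG_LEVEL` to `info`. The FastAPI server treats `PINLESS_HMAC_SECRET` as optional and skips validation when it is unset, and the sketch follows that behavior. `WebhookConfig` and `load_webhook_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WebhookConfig:
    api_key: str
    agent_name: str
    hmac_secret: Optional[str]
    log_level: str


def load_webhook_config(env: Mapping[str, str] = os.environ) -> WebhookConfig:
    """Read webhook settings, failing fast when the API key is missing."""
    api_key = env.get("PIPECAT_CLOUD_API_KEY")
    if not api_key:
        raise RuntimeError("PIPECAT_CLOUD_API_KEY environment variable is not set")
    return WebhookConfig(
        api_key=api_key,
        agent_name=env.get("AGENT_NAME", "my-first-agent"),
        # Optional: when unset, the example servers skip HMAC validation.
        hmac_secret=env.get("PINLESS_HMAC_SECRET") or None,
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.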
|
|
||||||
See the individual README files in each implementation directory for
|
|
||||||
specific setup instructions.
|
|
||||||
|
|
||||||
### Phone number setup
|
|
||||||
|
|
||||||
You can buy a phone number through the Pipecat Cloud Dashboard:
|
|
||||||
|
|
||||||
1. Go to `Settings` > `Telephony`
|
|
||||||
2. Follow the UI to purchase a phone number
|
|
||||||
3. Configure the webhook URL to receive incoming calls (e.g. `https://my-webhook-url.com/api/dial`)
|
|
||||||
|
|
||||||
Or purchase the number using Daily's
|
|
||||||
[PhoneNumbers API](https://docs.daily.co/reference/rest-api/phone-numbers), then configure pinless dial-in for it with the `domain-dialin-config` endpoint:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl --request POST \
|
|
||||||
--url https://api.daily.co/v1/domain-dialin-config \
|
|
||||||
--header "Authorization: Bearer $TOKEN" \
|
|
||||||
--header 'Content-Type: application/json' \
|
|
||||||
--data-raw '{
|
|
||||||
"type": "pinless_dialin",
|
|
||||||
"name_prefix": "Customer1",
|
|
||||||
"phone_number": "+1PURCHASED_NUM",
|
|
||||||
"room_creation_api": "https://example.com/api/dial",
|
|
||||||
"hold_music_url": "https://example.com/static/ringtone.mp3",
|
|
||||||
"timeout_config": {
|
|
||||||
"message": "No agent is available right now"
|
|
||||||
}
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
The API will return a static SIP URI (`sip_uri`) that can be called
|
|
||||||
from other SIP services.
|
|
||||||
|
|
||||||
### `room_creation_api`
|
|
||||||
|
|
||||||
Currently, to make and receive calls, you have to host a server that
|
|
||||||
handles incoming calls. In the coming weeks, incoming calls will be
|
|
||||||
directly handled within Daily and we will expose an endpoint similar
|
|
||||||
to `{service}/start` that will manage this for you.
|
|
||||||
|
|
||||||
In the meantime, either server implementation serves as the webhook
|
|
||||||
handler for the `room_creation_api`. Configure your pinless phone
|
|
||||||
number or SIP interconnect to point at the `ngrok` tunnel or
|
|
||||||
the actual server URL, and append `/api/dial` to the webhook URL.
|
|
||||||
|
|
||||||
## Example curl commands
|
|
||||||
|
|
||||||
Note: Replace `http://localhost:3000` with your actual server URL (both example servers listen on port 7860 locally) and
|
|
||||||
phone numbers with valid values for your use case.
|
|
||||||
|
|
||||||
### Dial-in Request
|
|
||||||
|
|
||||||
The server receives a request from Daily whenever an incoming call arrives.
|
|
||||||
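For a dial-in webhook, Daily's request carries `To`, `From`, `callId`, and `callDomain`. Both servers build `dialin_settings` only when all four fields are present. They also rename the camelCase keys to the snake_case `call_id` and `call_domain` that the bot expects. Here is a standalone sketch of that step (`to_dialin_settings` is a hypothetical name):

```python
from typing import Any, Dict, Optional

REQUIRED_DIALIN_FIELDS = ("To", "From", "callId", "callDomain")


def to_dialin_settings(body: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return dialin_settings for a dial-in webhook body, or None if a field is missing."""
    if any(body.get(field) is None for field in REQUIRED_DIALIN_FIELDS):
        return None
    return {
        "From": body["From"],
        "To": body["To"],
        # camelCase in the webhook body -> snake_case for the bot
        "call_id": body["callId"],
        "call_domain": body["callDomain"],
    }
```

A `None` result means the request is not a dial-in call, for example a dial-out-only request.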
|
|
||||||
### Dial-out Request
|
|
||||||
|
|
||||||
Dial a number using any of your purchased numbers as the caller ID:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -X POST http://localhost:3000/api/dial \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"dialout_settings": [
|
|
||||||
{
|
|
||||||
"phoneNumber": "+1234567890"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
Dial a number with a specific `callerId` (the UUID of a purchased number):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -X POST http://localhost:3000/api/dial \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"dialout_settings": [
|
|
||||||
{
|
|
||||||
"phoneNumber": "+1234567890",
|
|
||||||
"callerId": "purchased_phone_uuid"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
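Each `dialout_settings` entry targets one destination, as in the request-format examples. That destination is either a phone number (optionally with a `callerId`, the UUID of a purchased number) or a SIP URI. The hypothetical helpers below build a dial-out body and serialize it as strict JSON, which also avoids trailing-comma mistakes in hand-written payloads.

```python
import json
from typing import Dict, List, Optional


def dialout_entry(
    phone_number: Optional[str] = None,
    caller_id: Optional[str] = None,
    sip_uri: Optional[str] = None,
) -> Dict[str, str]:
    """Build one dialout_settings entry: a phone number or a SIP URI."""
    # Exactly one destination must be given.
    if (phone_number is None) == (sip_uri is None):
        raise ValueError("pass exactly one of phone_number or sip_uri")
    if sip_uri is not None:
        if caller_id is not None:
            raise ValueError("caller_id only applies to phone numbers in these examples")
        return {"sipUri": sip_uri}
    entry = {"phoneNumber": phone_number}
    if caller_id is not None:
        # callerId is the UUID of a purchased number
        entry["callerId"] = caller_id
    return entry


def dialout_body(entries: List[Dict[str, str]]) -> str:
    """Serialize a dial-out request body for POST /api/dial."""
    return json.dumps({"dialout_settings": entries})
```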
|
|
||||||
### Advanced Request with Voicemail Detection
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -X POST http://localhost:3000/api/dial \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"To": "+1234567890",
|
|
||||||
"From": "+1987654321",
|
|
||||||
"callId": "call-uuid-123",
|
|
||||||
"callDomain": "domain-uuid-456",
|
|
||||||
"dialout_settings": [
|
|
||||||
{
|
|
||||||
"phoneNumber": "+1234567890",
|
|
||||||
"callerId": "purchased_phone_uuid"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"voicemail_detection": {
|
|
||||||
"testInPrebuilt": true
|
|
||||||
},
|
|
||||||
"call_transfer": {
|
|
||||||
"mode": "dialout",
|
|
||||||
"speakSummary": true,
|
|
||||||
"storeSummary": true,
|
|
||||||
"operatorNumber": "+1234567890",
|
|
||||||
"testInPrebuilt": true
|
|
||||||
}
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
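Given such a request, both servers derive the Daily room properties the same way:

- `enable_dialout` is set when `dialout_settings` is present.
- A dial-in call gets a SIP config with two endpoints when a `call_transfer` is requested, and one otherwise.
- The room expires five minutes after creation.

The sketch below reproduces that logic. `room_properties` and its `now` parameter are hypothetical; `now` is injectable so the result is deterministic.

```python
import time
from typing import Any, Dict, Optional

ROOM_TTL_SECONDS = 5 * 60  # rooms expire five minutes after creation


def room_properties(body: Dict[str, Any], has_dialin: bool, now: Optional[float] = None) -> Dict[str, Any]:
    """Derive Daily room properties from an /api/dial request body."""
    now = time.time() if now is None else now
    props: Dict[str, Any] = {
        "enable_dialout": body.get("dialout_settings") is not None,
        "exp": int(now) + ROOM_TTL_SECONDS,
    }
    if has_dialin:
        props["sip"] = {
            "display_name": body.get("From"),
            "sip_mode": "dial-in",
            # Two SIP endpoints when a call transfer is requested, as in the example servers
            "num_endpoints": 2 if body.get("call_transfer") is not None else 1,
            "codecs": {"audio": ["OPUS"]},
        }
    return props
```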
@@ -1,98 +0,0 @@
|
|||||||
# FastAPI server for handling Daily PSTN/SIP Webhook
|
|
||||||
|
|
||||||
A FastAPI server that handles PSTN (Public Switched Telephone Network) and SIP (Session Initiation Protocol) calls using the Daily API.
|
|
||||||
|
|
||||||
## Setup
|
|
||||||
|
|
||||||
1. Clone the repository
|
|
||||||
|
|
||||||
2. Navigate to the `fastapi-webhook-server` directory:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd fastapi-webhook-server
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Install dependencies:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pip install -r requirements.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Copy `env.example` to `.env`:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cp env.example .env
|
|
||||||
```
|
|
||||||
|
|
||||||
5. Update `.env` with your credentials:
|
|
||||||
|
|
||||||
- `AGENT_NAME`: Your Pipecat Cloud agent name
|
|
||||||
- `PIPECAT_CLOUD_API_KEY`: Your Pipecat Cloud API key (begins with `pk_`)
|
|
||||||
- `PINLESS_HMAC_SECRET`: Your HMAC secret for request verification
|
|
||||||
|
|
||||||
## Running the Server
|
|
||||||
|
|
||||||
Start the server:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python server.py
|
|
||||||
```
|
|
||||||
|
|
||||||
The server will run on `http://localhost:7860` and you can expose it via ngrok for testing:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ngrok http 7860
|
|
||||||
```
|
|
||||||
|
|
||||||
> Tip: Use a subdomain for a consistent URL (e.g. `ngrok http -subdomain=mydomain http://localhost:7860`)
|
|
||||||
|
|
||||||
## API Endpoints
|
|
||||||
|
|
||||||
### GET /
|
|
||||||
|
|
||||||
Health check endpoint that returns a "Hello, World!" message.
|
|
||||||
|
|
||||||
### POST /api/dial
|
|
||||||
|
|
||||||
Handles dial-in and dial-out requests by starting a Pipecat Cloud agent session. The request body has the following format:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"To": "+14152251493",
|
|
||||||
"From": "+14158483432",
|
|
||||||
"callId": "string-contains-uuid",
|
|
||||||
"callDomain": "string-contains-uuid",
|
|
||||||
"dialout_settings": [
|
|
||||||
{
|
|
||||||
"phoneNumber": "+14158483432",
|
|
||||||
"callerId": "+14152251493"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"voicemail_detection": {
|
|
||||||
"testInPrebuilt": true
|
|
||||||
},
|
|
||||||
"call_transfer": {
|
|
||||||
"mode": "dialout",
|
|
||||||
"speakSummary": true,
|
|
||||||
"storeSummary": true,
|
|
||||||
"operatorNumber": "+14152250006",
|
|
||||||
"testInPrebuilt": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Response
|
|
||||||
|
|
||||||
Returns a JSON object containing:
|
|
||||||
|
|
||||||
- `status`: Success/failure status
|
|
||||||
- `data`: Response from the Pipecat Cloud `start` API
|
|
||||||
- `room_properties`: Properties of the created Daily room
|
|
||||||
|
|
||||||
## Error Handling
|
|
||||||
|
|
||||||
- 401: Invalid signature
|
|
||||||
- 400: Invalid authorization header (e.g. missing Daily API key in bot.py)
|
|
||||||
- 405: Method not allowed (e.g. incorrect route on the webhook URL)
|
|
||||||
- 500: Server errors (missing API key, network issues)
|
|
||||||
- Other status codes are passed through from the Daily API
|
|
||||||
@@ -1,3 +0,0 @@
|
|||||||
AGENT_NAME="your-agent-name"
|
|
||||||
PIPECAT_CLOUD_API_KEY="your-daily-api-key"
|
|
||||||
PINLESS_HMAC_SECRET="hmac-secret-pinless-dialin"
|
|
||||||
@@ -1,6 +0,0 @@
|
|||||||
fastapi
|
|
||||||
uvicorn
|
|
||||||
python-dotenv
|
|
||||||
requests
|
|
||||||
pydantic
|
|
||||||
loguru
|
|
||||||
@@ -1,202 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
# server.py
|
|
||||||
|
|
||||||
|
|
||||||
import base64 # for calculating hmac signature
|
|
||||||
import hmac
|
|
||||||
import os # for accessing environment variables
|
|
||||||
import time # for setting expiration time
|
|
||||||
from typing import Any, Dict, List, Optional
|
|
||||||
|
|
||||||
import requests
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from fastapi import FastAPI, HTTPException, Request
|
|
||||||
from loguru import logger
|
|
||||||
from pydantic import BaseModel, Field
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
app = FastAPI()
|
|
||||||
|
|
||||||
|
|
||||||
class RoomRequest(BaseModel):
|
|
||||||
test: Optional[str] = Field(None, alias="Test", description="Test field")
|
|
||||||
To: Optional[str] = Field(None, alias="to", description="Destination phone number")
|
|
||||||
From: Optional[str] = Field(None, alias="from", description="Source phone number")
|
|
||||||
callId: Optional[str] = Field(None, alias="call_id", description="Unique call identifier")
|
|
||||||
callDomain: Optional[str] = Field(
|
|
||||||
None, alias="call_domain", description="Call domain identifier"
|
|
||||||
)
|
|
||||||
dialout_settings: Optional[List[Dict[str, Any]]] = Field(
|
|
||||||
None, description="An array of phone numbers or SIP URIs to dialout to"
|
|
||||||
)
|
|
||||||
voicemail_detection: Optional[Dict[str, Any]] = Field(
|
|
||||||
None, description="Settings for voicemail or answering-machine detection"
|
|
||||||
)
|
|
||||||
call_transfer: Optional[Dict[str, Any]] = Field(None, description="Settings to initiate a call transfer")
|
|
||||||
|
|
||||||
class Config:
|
|
||||||
populate_by_name = True
|
|
||||||
alias_generator = None
|
|
||||||
|
|
||||||
|
|
||||||
"""
|
|
||||||
body can contain any fields, but for handling PSTN/SIP,
|
|
||||||
we recommend sending the following custom values:
|
|
||||||
dialin, dialout, voicemail detection, and call transfer
|
|
||||||
|
|
||||||
|
|
||||||
"To": "+14152251493",
|
|
||||||
"From": "+14158483432",
|
|
||||||
"callId": "string-contains-uuid",
|
|
||||||
"callDomain": "string-contains-uuid"
|
|
||||||
These need to be remapped to dialin_settings
|
|
||||||
|
|
||||||
"dialout_settings": [
|
|
||||||
{"phoneNumber": "+14158483432", "callerId": "+14152251493"},
|
|
||||||
{"sipUri": "sip:username@sip.hostname"}
|
|
||||||
],
|
|
||||||
},
|
|
||||||
|
|
||||||
"voicemail_detection": {
|
|
||||||
"testInPrebuilt": true
|
|
||||||
},
|
|
||||||
|
|
||||||
"call_transfer": {
|
|
||||||
"mode": "dialout",
|
|
||||||
"speakSummary": true,
|
|
||||||
"storeSummary": true,
|
|
||||||
"operatorNumber": "+14152250006",
|
|
||||||
"testInPrebuilt": true
|
|
||||||
}
|
|
||||||
"""
|
|
||||||
|
|
||||||
|
|
||||||
@app.get("/")
|
|
||||||
async def read_root():
|
|
||||||
return {"message": "Hello, World!"}
|
|
||||||
|
|
||||||
|
|
||||||
@app.post("/api/dial")
|
|
||||||
async def dial(request: RoomRequest, raw_request: Request):
|
|
||||||
logger.info("Incoming request to /api/dial:")
|
|
||||||
logger.info(f"Headers: {dict(raw_request.headers)}")
|
|
||||||
raw_body = await raw_request.body()
|
|
||||||
raw_body_str = raw_body.decode()
|
|
||||||
logger.info(f"Raw body: {raw_body_str}")
|
|
||||||
logger.info(f"Parsed body: {request.dict()}")
|
|
||||||
|
|
||||||
# calculate signature and compare/verify
|
|
||||||
hmac_secret = os.getenv("PINLESS_HMAC_SECRET")
|
|
||||||
timestamp = raw_request.headers.get("x-pinless-timestamp")
|
|
||||||
signature = raw_request.headers.get("x-pinless-signature")
|
|
||||||
|
|
||||||
if not hmac_secret:
|
|
||||||
logger.debug("Skipping HMAC validation - PINLESS_HMAC_SECRET not set")
|
|
||||||
elif timestamp and signature:
|
|
||||||
message = timestamp + "." + raw_body_str
|
|
||||||
|
|
||||||
base64_decoded_secret = base64.b64decode(hmac_secret)
|
|
||||||
computed_signature = base64.b64encode(
|
|
||||||
hmac.new(base64_decoded_secret, message.encode(), "sha256").digest()
|
|
||||||
).decode()
|
|
||||||
|
|
||||||
if not hmac.compare_digest(computed_signature.encode(), signature.encode()):
|
|
||||||
logger.error(f"Invalid signature. Expected {computed_signature}, got {signature}")
|
|
||||||
raise HTTPException(status_code=401, detail="Invalid signature")
|
|
||||||
else:
|
|
||||||
logger.debug("Skipping HMAC validation - no signature headers present")
|
|
||||||
|
|
||||||
if request.test == "test":
|
|
||||||
logger.debug("Test request received")
|
|
||||||
return {"status": "success", "message": "Test request received"}
|
|
||||||
|
|
||||||
dialin_settings = None
|
|
||||||
# these fields are camelCase in the request
|
|
||||||
required_fields = ["To", "From", "callId", "callDomain"]
|
|
||||||
if all(
|
|
||||||
field in request.dict() and request.dict()[field] is not None for field in required_fields
|
|
||||||
):
|
|
||||||
# transform from camelCase to snake_case because daily-python expects snake_case
|
|
||||||
dialin_settings = {
|
|
||||||
"From": request.From,
|
|
||||||
"To": request.To,
|
|
||||||
"call_id": request.callId,
|
|
||||||
"call_domain": request.callDomain,
|
|
||||||
# transform from camelCase to snake_case
|
|
||||||
}
|
|
||||||
logger.debug(f"Populated dialin_settings from request: {dialin_settings}")
|
|
||||||
|
|
||||||
daily_room_properties = {
|
|
||||||
"enable_dialout": request.dialout_settings is not None,
|
|
||||||
}
|
|
||||||
|
|
||||||
if dialin_settings is not None:
|
|
||||||
sip_config = {
|
|
||||||
"display_name": request.From,
|
|
||||||
"sip_mode": "dial-in",
|
|
||||||
"num_endpoints": 2 if request.call_transfer is not None else 1,
|
|
||||||
"codecs": {"audio": ["OPUS"]},
|
|
||||||
}
|
|
||||||
daily_room_properties["sip"] = sip_config
|
|
||||||
|
|
||||||
# Setting default expiry to 5 minutes from now
|
|
||||||
daily_room_properties["exp"] = int(time.time()) + (5 * 60)
|
|
||||||
|
|
||||||
logger.debug(f"Daily room properties: {daily_room_properties}")
|
|
||||||
payload = {
|
|
||||||
"createDailyRoom": True,
|
|
||||||
"dailyRoomProperties": daily_room_properties,
|
|
||||||
"body": {
|
|
||||||
"dialin_settings": dialin_settings,
|
|
||||||
"dialout_settings": request.dialout_settings,
|
|
||||||
"voicemail_detection": request.voicemail_detection,
|
|
||||||
"call_transfer": request.call_transfer,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
pcc_api_key = os.getenv("PIPECAT_CLOUD_API_KEY")
|
|
||||||
agent_name = os.getenv("AGENT_NAME", "my-first-agent")
|
|
||||||
|
|
||||||
if not pcc_api_key:
|
|
||||||
raise HTTPException(status_code=500, detail="PIPECAT_CLOUD_API_KEY environment variable is not set")
|
|
||||||
|
|
||||||
headers = {"Authorization": f"Bearer {pcc_api_key}", "Content-Type": "application/json"}
|
|
||||||
|
|
||||||
url = f"https://api.pipecat.daily.co/v1/public/{agent_name}/start"
|
|
||||||
|
|
||||||
logger.debug(f"Making API call to Pipecat Cloud: {url} {payload}")  # headers omitted: they contain the bearer token
|
|
||||||
|
|
||||||
try:
|
|
||||||
response = requests.post(url, json=payload, headers=headers)
|
|
||||||
response.raise_for_status()
|
|
||||||
response_data = response.json()
|
|
||||||
logger.debug(f"Response: {response_data}")
|
|
||||||
return {
|
|
||||||
"status": "success",
|
|
||||||
"data": response_data,
|
|
||||||
"room_properties": daily_room_properties,
|
|
||||||
}
|
|
||||||
except requests.exceptions.HTTPError as e:
|
|
||||||
# Pass through the status code and error details from the Daily API
|
|
||||||
status_code = e.response.status_code
|
|
||||||
error_detail = e.response.json() if e.response.content else str(e)
|
|
||||||
logger.error(f"HTTP error: {error_detail}")
|
|
||||||
raise HTTPException(status_code=status_code, detail=error_detail)
|
|
||||||
except requests.exceptions.RequestException as e:
|
|
||||||
logger.error(f"Request error: {str(e)}")
|
|
||||||
raise HTTPException(status_code=500, detail=str(e))
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
try:
|
|
||||||
import uvicorn
|
|
||||||
|
|
||||||
uvicorn.run(app, host="0.0.0.0", port=7860)
|
|
||||||
except KeyboardInterrupt:
|
|
||||||
logger.info("Server stopped manually")
|
|
||||||
@@ -1,53 +0,0 @@
|
|||||||
# dependencies
|
|
||||||
/node_modules
|
|
||||||
/.pnp
|
|
||||||
.pnp.js
|
|
||||||
|
|
||||||
# testing
|
|
||||||
/coverage
|
|
||||||
|
|
||||||
# next.js
|
|
||||||
/.next/
|
|
||||||
/out/
|
|
||||||
|
|
||||||
# production
|
|
||||||
/build
|
|
||||||
|
|
||||||
# misc
|
|
||||||
.DS_Store
|
|
||||||
*.pem
|
|
||||||
|
|
||||||
# debug
|
|
||||||
npm-debug.log*
|
|
||||||
yarn-debug.log*
|
|
||||||
yarn-error.log*
|
|
||||||
.pnpm-debug.log*
|
|
||||||
|
|
||||||
# local env files
|
|
||||||
.env*.local
|
|
||||||
|
|
||||||
# vercel
|
|
||||||
.vercel
|
|
||||||
|
|
||||||
# typescript
|
|
||||||
*.tsbuildinfo
|
|
||||||
next-env.d.ts
|
|
||||||
|
|
||||||
# IDE specific files
|
|
||||||
.idea/
|
|
||||||
.vscode/
|
|
||||||
*.swp
|
|
||||||
*.swo
|
|
||||||
|
|
||||||
# Logs
|
|
||||||
logs
|
|
||||||
*.log
|
|
||||||
|
|
||||||
# OS generated files
|
|
||||||
.DS_Store
|
|
||||||
.DS_Store?
|
|
||||||
._*
|
|
||||||
.Spotlight-V100
|
|
||||||
.Trashes
|
|
||||||
ehthumbs.db
|
|
||||||
Thumbs.db
|
|
||||||
@@ -1,115 +0,0 @@
|
|||||||
# Next.js server for handling Daily PSTN/SIP Webhook
|
|
||||||
|
|
||||||
Next.js API routes for handling Daily PSTN/SIP Pipecat requests.
|
|
||||||
|
|
||||||
## Features
|
|
||||||
|
|
||||||
- API endpoint for handling Daily PSTN/SIP Pipecat requests
|
|
||||||
- HMAC signature validation
|
|
||||||
- Structured logging with Pino
|
|
||||||
- Support for dial-in and dial-out settings
|
|
||||||
- Voicemail detection and call transfer functionality
|
|
||||||
- Test request handling
|
|
||||||
|
|
||||||
## Setup
|
|
||||||
|
|
||||||
1. Clone the repository
|
|
||||||
|
|
||||||
2. Navigate to the `nextjs-webhook-server` directory:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd nextjs-webhook-server
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Install dependencies:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
npm install
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Create a `.env.local` file with your credentials:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cp env.local.example .env.local
|
|
||||||
```
|
|
||||||
|
|
||||||
5. Update `.env.local` with your secrets:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
PIPECAT_CLOUD_API_KEY=pk_*
|
|
||||||
AGENT_NAME=my-first-agent
|
|
||||||
PINLESS_HMAC_SECRET=your_hmac_secret
|
|
||||||
LOG_LEVEL=info
|
|
||||||
```
|
|
||||||
|
|
||||||
### Running the server
|
|
||||||
|
|
||||||
Run the development server:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
npm run dev
|
|
||||||
```
|
|
||||||
|
|
||||||
The server will run on `http://localhost:7860` and you can expose it via ngrok for testing:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ngrok http 7860
|
|
||||||
```
|
|
||||||
|
|
||||||
> Tip: Use a subdomain for a consistent URL (e.g. `ngrok http -subdomain=mydomain http://localhost:7860`)
|
|
||||||
|
|
||||||
## API Endpoints
|
|
||||||
|
|
||||||
### GET /api
|
|
||||||
|
|
||||||
Returns a simple "Hello, World!" message with a cute cat emoji to verify the server is running.
|
|
||||||
|
|
||||||
### POST /api/dial
|
|
||||||
|
|
||||||
Handles dial-in and dial-out requests for Pipecat Cloud.
|
|
||||||
|
|
||||||
#### Test Requests
|
|
||||||
|
|
||||||
The endpoint handles test requests when a webhook is configured. Send a request with `"Test": "test"` to verify your setup:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"Test": "test"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Production Request Format
|
|
||||||
|
|
||||||
```jsonc
|
|
||||||
{
|
|
||||||
// for dial-in from webhook
|
|
||||||
"To": "+14152251493",
|
|
||||||
"From": "+14158483432",
|
|
||||||
"callId": "string-contains-uuid",
|
|
||||||
"callDomain": "string-contains-uuid",
|
|
||||||
// for making a dial out to a phone or SIP
|
|
||||||
"dialout_settings": [
|
|
||||||
{ "phoneNumber": "+14158483432", "callerId": "purchased_phone_uuid" },
|
|
||||||
{ "sipUri": "sip:username@sip.hostname.com" }
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
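The handler forwards this request to Pipecat Cloud's `https://api.pipecat.daily.co/v1/public/{agent_name}/start` endpoint with a bearer token. It asks Pipecat Cloud to create a Daily room and passes the telephony settings through to the bot in `body`. The sketch below builds that request with the standard library's `urllib.request` but does not send it. `build_start_request` is a hypothetical name; the URL and payload shape come from the example servers.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_start_request(
    api_key: str,
    agent_name: str,
    room_properties: Dict[str, Any],
    dialin_settings: Optional[Dict[str, Any]] = None,
    dialout_settings: Optional[List[Dict[str, Any]]] = None,
    voicemail_detection: Optional[Dict[str, Any]] = None,
    call_transfer: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST to the agent's /start endpoint."""
    payload = {
        "createDailyRoom": True,
        "dailyRoomProperties": room_properties,
        # Passed through to the bot unchanged
        "body": {
            "dialin_settings": dialin_settings,
            "dialout_settings": dialout_settings,
            "voicemail_detection": voicemail_detection,
            "call_transfer": call_transfer,
        },
    }
    return urllib.request.Request(
        f"https://api.pipecat.daily.co/v1/public/{agent_name}/start",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`. An HTTP error status raises `urllib.error.HTTPError`, whose code a server could pass through to the caller as both example servers do.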
|
|
||||||
## Deployment
|
|
||||||
|
|
||||||
The application is configured for Vercel deployment:
|
|
||||||
|
|
||||||
1. Push your code to a Git repository
|
|
||||||
2. Import your project in Vercel dashboard
|
|
||||||
3. Configure environment variables:
|
|
||||||
- `PIPECAT_CLOUD_API_KEY`
|
|
||||||
- `AGENT_NAME`
|
|
||||||
- `PINLESS_HMAC_SECRET`
|
|
||||||
- `LOG_LEVEL` (optional, defaults to 'info')
|
|
||||||
4. Deploy!
|
|
||||||
|
|
||||||
## Security
|
|
||||||
|
|
||||||
- HMAC signature validation for request authentication
|
|
||||||
- Environment variables for sensitive credentials
|
|
||||||
- Method validation (POST only for /api/dial)
|
|
||||||
@@ -1,4 +0,0 @@
|
|||||||
AGENT_NAME=my-first-agent
|
|
||||||
PIPECAT_CLOUD_API_KEY=your_daily_api_key
|
|
||||||
PINLESS_HMAC_SECRET=your_hmac_secret
|
|
||||||
LOG_LEVEL="info"
|
|
||||||
File diff suppressed because it is too large
@@ -1,22 +0,0 @@
|
|||||||
{
|
|
||||||
"name": "my-daily-app",
|
|
||||||
"version": "0.1.0",
|
|
||||||
"private": true,
|
|
||||||
"scripts": {
|
|
||||||
"dev": "next dev -p 7860",
|
|
||||||
"build": "next build",
|
|
||||||
"start": "next start -p 7860",
|
|
||||||
"lint": "next lint"
|
|
||||||
},
|
|
||||||
"dependencies": {
|
|
||||||
"axios": "^1.6.0",
|
|
||||||
"next": "^14.0.0",
|
|
||||||
"pino": "^8.15.0",
|
|
||||||
"react": "^18.2.0",
|
|
||||||
"react-dom": "^18.2.0"
|
|
||||||
},
|
|
||||||
"devDependencies": {
|
|
||||||
"eslint": "^8.46.0",
|
|
||||||
"eslint-config-next": "^14.0.0"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1,176 +0,0 @@
|
|||||||
import { logger } from '../../lib/utils';
|
|
||||||
import axios from 'axios';
|
|
||||||
import crypto from 'crypto';
|
|
||||||
|
|
||||||
const validateSignature = (body, signature, timestamp, secret) => {
|
|
||||||
// Skip if any required fields are missing
|
|
||||||
if (!signature || !timestamp || !secret) {
|
|
||||||
logger.warn('Missing required fields for HMAC validation');
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
|
||||||
const decodedSecret = Buffer.from(secret, 'base64');
|
|
||||||
const hmac = crypto.createHmac('sha256', decodedSecret);
|
|
||||||
const signatureData = `${timestamp}.${body}`;
|
|
||||||
const computedSignature = hmac.update(signatureData).digest('base64');
|
|
||||||
|
|
||||||
logger.debug('Signature validation:', {
|
|
||||||
timestamp,
|
|
||||||
signatureData: signatureData.substring(0, 50) + '...',
|
|
||||||
computedSignature,
|
|
||||||
receivedSignature: signature
|
|
||||||
});
|
|
||||||
|
|
||||||
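return Buffer.byteLength(signature) === Buffer.byteLength(computedSignature) && crypto.timingSafeEqual(Buffer.from(computedSignature), Buffer.from(signature)); // constant-time comparison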
|
|
||||||
} catch (error) {
|
|
||||||
logger.error('Error validating signature:', error);
|
|
||||||
return false; // Fail closed: treat validation errors as an invalid signature
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
export default async function handler(req, res) {
|
|
||||||
// Only allow POST requests
|
|
||||||
if (req.method !== 'POST') {
|
|
||||||
return res.status(405).json({ error: 'Method not allowed' });
|
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
|
||||||
logger.info('Incoming request to /api/dial:');
|
|
||||||
logger.info(`Headers: ${JSON.stringify(req.headers)}`);
|
|
||||||
|
|
||||||
const rawBody = JSON.stringify(req.body); // NOTE: re-serialized from the parsed body, so it may not byte-match the payload that was signed
|
|
||||||
logger.info(`Raw body: ${rawBody}`);
|
|
||||||
|
|
||||||
const signature = req.headers['x-pinless-signature'];
|
|
||||||
const timestamp = req.headers['x-pinless-timestamp'];
|
|
||||||
|
|
||||||
if (signature && timestamp) {
|
|
||||||
logger.info('Validating HMAC signature');
|
|
||||||
if (!validateSignature(rawBody, signature, timestamp, process.env.PINLESS_HMAC_SECRET)) {
|
|
||||||
logger.error('Invalid HMAC signature', { signature, timestamp });
|
|
||||||
return res.status(401).json({
|
|
||||||
error: 'Invalid signature',
|
|
||||||
message: 'Invalid HMAC signature'
|
|
||||||
});
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
logger.info('Skipping HMAC validation - no signature headers present');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Extract request data
|
|
||||||
const {
|
|
||||||
Test: test,
|
|
||||||
To,
|
|
||||||
From,
|
|
||||||
callId,
|
|
||||||
callDomain,
|
|
||||||
dialout_settings,
|
|
||||||
voicemail_detection,
|
|
||||||
call_transfer
|
|
||||||
} = req.body;
|
|
||||||
|
|
||||||
// Handle test requests when a webhook is configured
|
|
||||||
if (test === 'test') {
|
|
||||||
logger.debug('Test request received');
|
|
||||||
return res.status(200).json({ status: 'success', message: 'Test request received' });
|
|
||||||
}
|
|
||||||
|
|
||||||
// Process dialin settings
|
|
||||||
let dialin_settings = null;
|
|
||||||
const requiredFields = ['To', 'From', 'callId', 'callDomain'];
|
|
||||||
|
|
||||||
if (requiredFields.every(field => req.body[field] !== undefined && req.body[field] !== null)) {
|
|
||||||
dialin_settings = {
|
|
||||||
// snake_case because pipecat expects this format
|
|
||||||
From,
|
|
||||||
To,
|
|
||||||
call_id: callId,
|
|
||||||
call_domain: callDomain,
|
|
||||||
};
|
|
||||||
logger.debug(`Populated dialin_settings from request: ${JSON.stringify(dialin_settings)}`);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Set up Daily room properties
|
|
||||||
const daily_room_properties = {
|
|
||||||
enable_dialout: dialout_settings !== undefined && dialout_settings !== null,
|
|
||||||
exp: Math.floor(Date.now() / 1000) + (5 * 60), // 5 minutes from now
|
|
||||||
};
|
|
||||||
|
|
||||||
// Configure SIP if dialin settings are provided
|
|
||||||
if (dialin_settings !== null) {
|
|
||||||
const sip_config = {
|
|
||||||
display_name: From,
|
|
||||||
sip_mode: 'dial-in',
|
|
||||||
num_endpoints: call_transfer !== null ? 2 : 1,
|
|
||||||
codecs: {"audio": ["OPUS"]},
|
|
||||||
};
|
|
||||||
daily_room_properties.sip = sip_config;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Prepare payload for {service}/start API call
|
|
||||||
const payload = {
|
|
||||||
createDailyRoom: true,
|
|
||||||
dailyRoomProperties: daily_room_properties,
|
|
||||||
body: {
|
|
||||||
dialin_settings,
|
|
||||||
dialout_settings,
|
|
||||||
voicemail_detection,
|
|
||||||
call_transfer,
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
logger.debug(`Daily room properties: ${JSON.stringify(daily_room_properties)}`);
|
|
||||||
|
|
||||||
// Get Daily API key and agent name from environment variables
|
|
||||||
const pccApiKey = process.env.PIPECAT_CLOUD_API_KEY;
|
|
||||||
const agentName = process.env.AGENT_NAME || 'my-first-agent';
|
|
||||||
|
|
||||||
if (!pccApiKey) {
|
|
||||||
throw new Error('PIPECAT_CLOUD_API_KEY environment variable is not set');
|
|
||||||
}
|
|
||||||
|
|
||||||
// Set up headers for Daily API call
|
|
||||||
const headers = {
|
|
||||||
'Authorization': `Bearer ${pccApiKey}`,
|
|
||||||
'Content-Type': 'application/json',
|
|
||||||
};
|
|
||||||
|
|
||||||
const url = `https://api.pipecat.daily.co/v1/public/${agentName}/start`;
|
|
||||||
logger.debug(`Making API call to Daily: ${url} ${JSON.stringify(headers)} ${JSON.stringify(payload)}`);
|
|
||||||
|
|
||||||
try {
|
|
||||||
const response = await axios.post(url, payload, { headers });
|
|
||||||
logger.debug(`Response: ${JSON.stringify(response.data)}`);
|
|
||||||
|
|
||||||
return res.status(200).json({
|
|
||||||
status: 'success',
|
|
||||||
data: response.data,
|
|
||||||
room_properties: daily_room_properties,
|
|
||||||
});
|
|
||||||
} catch (error) {
|
|
||||||
if (error.response) {
|
|
||||||
// Pass through status code and error details from the Daily API
|
|
||||||
const statusCode = error.response.status;
|
|
||||||
const errorDetail = error.response.data || error.message;
|
|
||||||
logger.error(`HTTP error: ${JSON.stringify(errorDetail)}`);
|
|
||||||
return res.status(statusCode).json(errorDetail);
|
|
||||||
} else {
|
|
||||||
logger.error(`Request error: ${error.message}`);
|
|
||||||
return res.status(500).json({ error: error.message });
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} catch (error) {
|
|
||||||
logger.error(`Unexpected error: ${error.message}`);
|
|
||||||
return res.status(500).json({ error: 'Internal server error', message: error.message });
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Configure body parser to preserve raw body text
|
|
||||||
export const config = {
|
|
||||||
api: {
|
|
||||||
bodyParser: {
|
|
||||||
sizeLimit: '1mb',
|
|
||||||
},
|
|
||||||
},
|
|
||||||
};
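In the handler above, `call_transfer` comes from destructuring `req.body`, so it is `undefined` whenever the caller omits the field. Because `undefined !== null` is `true`, the expression `call_transfer !== null ? 2 : 1` requests two SIP endpoints even when no transfer was asked for, which looks unintended; a loose check (`call_transfer != null`) treats a missing field and an explicit `null` alike. The sketch below restates the payload construction in Python, where both cases become `None`. It is an illustration under stated assumptions, not the project's code: the name `build_start_payload` and the injectable `now` argument are hypothetical, and unlike `JSON.stringify`, which drops `undefined` fields, this version keeps absent optional fields as `None`.

```python
import time
from typing import Any, Optional

# Fields the handler requires before it builds dial-in settings.
REQUIRED_DIALIN_FIELDS = ("To", "From", "callId", "callDomain")


def build_start_payload(body: dict[str, Any], now: Optional[float] = None) -> dict[str, Any]:
    """Hypothetical Python restatement of the handler's payload construction."""
    now = time.time() if now is None else now

    dialin_settings = None
    if all(body.get(field) is not None for field in REQUIRED_DIALIN_FIELDS):
        # snake_case keys, matching what the handler forwards to pipecat
        dialin_settings = {
            "From": body["From"],
            "To": body["To"],
            "call_id": body["callId"],
            "call_domain": body["callDomain"],
        }

    dialout_settings = body.get("dialout_settings")
    call_transfer = body.get("call_transfer")

    room_properties: dict[str, Any] = {
        "enable_dialout": dialout_settings is not None,
        "exp": int(now) + 5 * 60,  # room expires five minutes from now
    }

    if dialin_settings is not None:
        room_properties["sip"] = {
            "display_name": body["From"],
            "sip_mode": "dial-in",
            # A missing key and an explicit null both mean "no transfer".
            "num_endpoints": 2 if call_transfer is not None else 1,
            "codecs": {"audio": ["OPUS"]},
        }

    return {
        "createDailyRoom": True,
        "dailyRoomProperties": room_properties,
        "body": {
            "dialin_settings": dialin_settings,
            "dialout_settings": dialout_settings,
            "voicemail_detection": body.get("voicemail_detection"),
            "call_transfer": call_transfer,
        },
    }
```

With a dial-in request that carries no `call_transfer`, this yields `num_endpoints == 1`, whereas the JS expression above would produce 2.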

@@ -1,6 +0,0 @@
import { logger } from '../../lib/utils';

export default function handler(req, res) {
  logger.info('Received request to /api');
  res.status(200).json({ message: 'Hello, World! from ᓚᘏᗢ' });
}

@@ -1,6 +0,0 @@
module.exports = {
  version: 2,
  buildCommand: "next build",
  outputDirectory: ".next",
  cleanUrls: true
};

@@ -1,94 +0,0 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.installed.cfg
.eggs/
downloads/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
MANIFEST

# Virtual Environments
venv/
env/
.env
.venv/
ENV/
env.bak/
venv.bak/

# IDE
.idea/
.vscode/
.spyderproject
.spyproject
.ropeproject

# Testing and Coverage
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.tox/
.nox/
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
cover/

# Logs and Databases
*.log
*.db
db.sqlite3
db.sqlite3-journal
pip-log.txt

# System Files
.DS_Store
Thumbs.db
desktop.ini
*.swp
*.swo
*.bak
*.tmp
*~

# Build and Documentation
docs/_build/
.pybuilder/
target/
instance/
.webassets-cache
.pdm.toml
.pdm-python
.pdm-build/
__pypackages__/

# Other
*.mo
*.pot
*.sage.py
.mypy_cache/
.dmypy.json
dmypy.json
.pyre/
.pytype/
cython_debug/
.ipynb_checkpoints

# Pipecat cloud
.pcc-deploy.toml

@@ -1,7 +0,0 @@
FROM dailyco/pipecat-base:latest

COPY ./requirements.txt requirements.txt

RUN pip install --no-cache-dir --upgrade -r requirements.txt

COPY ./bot.py bot.py

@@ -1,196 +0,0 @@
# Pipecat Cloud Starter Project

[Docs](https://docs.pipecat.daily.co) [Discord](https://discord.gg/dailyco)

A template voice agent for [Pipecat Cloud](https://www.daily.co/products/pipecat-cloud/) that demonstrates building and deploying a conversational AI agent.

> **For a detailed step-by-step guide, see our [Quickstart Documentation](https://docs.pipecat.daily.co/quickstart).**

## Prerequisites

- Python 3.10+
- Linux, macOS, or Windows Subsystem for Linux (WSL)
- [Docker](https://www.docker.com) and a Docker repository (e.g., [Docker Hub](https://hub.docker.com))
- A Docker Hub account (or other container registry account)
- A [Pipecat Cloud](https://pipecat.daily.co) account

> **Note**: If you haven't installed Docker yet, follow the official installation guides for your platform ([Linux](https://docs.docker.com/engine/install/), [Mac](https://docs.docker.com/desktop/setup/install/mac-install/), [Windows](https://docs.docker.com/desktop/setup/install/windows-install/)). For Docker Hub, [create a free account](https://hub.docker.com/signup) and log in from the terminal with `docker login`.

## Get Started

### 1. Get the starter project

Clone the starter project from GitHub:

```bash
git clone https://github.com/daily-co/pipecat-cloud-starter
cd pipecat-cloud-starter
```

### 2. Set up your Python environment

We recommend using a virtual environment to manage your Python dependencies.

```bash
# Create a virtual environment
python -m venv .venv

# Activate it
source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install the Pipecat Cloud CLI
pip install pipecatcloud
```

### 3. Authenticate with Pipecat Cloud

```bash
pcc auth login
```

### 4. Acquire required API keys

This starter requires the following API keys:

- **OpenAI API Key**: Get from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **Cartesia API Key**: Get from [play.cartesia.ai/keys](https://play.cartesia.ai/keys)
- **Daily API Key**: Automatically provided through your Pipecat Cloud account

### 5. Configure to run locally (optional)

You can test your agent locally before deploying to Pipecat Cloud:

```bash
# Set environment variables with your API keys
export CARTESIA_API_KEY="your_cartesia_key"
export DAILY_API_KEY="your_daily_key"
export OPENAI_API_KEY="your_openai_key"
```

> Your `DAILY_API_KEY` can be found at [https://pipecat.daily.co](https://pipecat.daily.co) under `Settings`, in the `Daily (WebRTC)` tab.

First, install the requirements:

```bash
pip install -r requirements.txt
```

Then launch `bot.py` locally:

```bash
LOCAL_RUN=1 python bot.py
```

## Deploy & Run

### 1. Build and push your Docker image

```bash
# Build the image (targeting ARM architecture for cloud deployment)
docker build --platform=linux/arm64 -t my-first-agent:latest .

# Tag with your Docker username and version
docker tag my-first-agent:latest your-username/my-first-agent:0.1

# Push to Docker Hub
docker push your-username/my-first-agent:0.1
```

### 2. Create a secret set for your API keys

The starter project requires API keys for OpenAI and Cartesia:

```bash
# Copy the example env file
cp env.example .env

# Edit .env to add your API keys:
# CARTESIA_API_KEY=your_cartesia_key
# OPENAI_API_KEY=your_openai_key

# Create a secret set from your .env file
pcc secrets set my-first-agent-secrets --file .env
```

Alternatively, you can create secrets directly via the CLI:

```bash
pcc secrets set my-first-agent-secrets \
  CARTESIA_API_KEY=your_cartesia_key \
  OPENAI_API_KEY=your_openai_key
```

### 3. Deploy to Pipecat Cloud

```bash
pcc deploy my-first-agent your-username/my-first-agent:0.1 --secrets my-first-agent-secrets
```

> **Note (Optional)**: For a more maintainable approach, you can use the included `pcc-deploy.toml` file:
>
> ```toml
> agent_name = "my-first-agent"
> image = "your-username/my-first-agent:0.1"
> secret_set = "my-first-agent-secrets"
>
> [scaling]
> min_instances = 0
> ```
>
> Then simply run `pcc deploy` without additional arguments.

> **Note**: If your repository is private, you'll need to add credentials:
>
> ```bash
> # Create pull secret (you'll be prompted for credentials)
> pcc secrets image-pull-secret pull-secret https://index.docker.io/v1/
>
> # Deploy with credentials
> pcc deploy my-first-agent your-username/my-first-agent:0.1 --credentials pull-secret
> ```

### 4. Check deployment and scaling (optional)

By default, your agent uses a "scale-to-zero" configuration, so it may have a cold start of around 10 seconds when first used. With scale-to-zero, idle instances are kept for 5 minutes before being terminated.

For more responsive testing, you can scale your deployment to keep a minimum of one instance warm:

```bash
# Ensure at least one warm instance is always available
pcc deploy my-first-agent your-username/my-first-agent:0.1 --min-instances 1

# Check the status of your deployment
pcc agent status my-first-agent
```

### 5. Create an API key

```bash
# Create a public API key for accessing your agent
pcc organizations keys create

# Set it as the default key to use with your agent
pcc organizations keys use
```

### 6. Start your agent

```bash
# Start a session with your agent in a Daily room
pcc agent start my-first-agent --use-daily
```

This returns a URL you can use to connect to your running agent.

## Documentation

For more details on Pipecat Cloud and its capabilities:

- [Pipecat Cloud Documentation](https://docs.pipecat.daily.co)
- [Pipecat Project Documentation](https://docs.pipecat.ai)

## Support

Join our [Discord community](https://discord.gg/dailyco) for help and discussions.
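Step 6 starts a session through the CLI. The dial-in handler at the top of this diff reaches an agent over plain HTTP instead, posting JSON to `https://api.pipecat.daily.co/v1/public/{agentName}/start` with a `Bearer` API key. The sketch below builds that same request with only the standard library; the URL pattern and headers are copied from that handler, the helper name `build_start_request` is hypothetical, the key is presumably the public key created in step 5, and whether `pcc agent start` issues this exact call is not established here.

```python
import json
import urllib.request

# URL pattern taken from the JS dial-in handler in this diff.
START_URL_TEMPLATE = "https://api.pipecat.daily.co/v1/public/{agent}/start"


def build_start_request(agent_name: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a session for `agent_name`."""
    return urllib.request.Request(
        url=START_URL_TEMPLATE.format(agent=agent_name),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(...)`, for example with a payload of `{"createDailyRoom": True}` as the handler does; keeping construction separate from sending makes the request easy to inspect without network access.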

@@ -1,161 +0,0 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecatcloud.agent import DailySessionArguments

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
if LOCAL_RUN:
    import asyncio
    import webbrowser

    try:
        from local_runner import configure
    except ImportError:
        logger.error("Could not import local_runner module. Local development mode may not work.")

# Load environment variables
load_dotenv(override=True)


async def main(room_url: str, token: str):
    """Main pipeline setup and execution function.

    Args:
        room_url: The Daily room URL
        token: The Daily room token
    """
    logger.debug("Starting bot in room: {}", room_url)

    transport = DailyTransport(
        room_url,
        token,
        "bot",
        DailyParams(
            audio_out_enabled=True,
            transcription_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            allow_interruptions=True,
            enable_metrics=True,
            enable_usage_metrics=True,
            report_only_initial_ttfb=True,
        ),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        logger.info("First participant joined: {}", participant["id"])
        await transport.capture_participant_transcription(participant["id"])
        # Kick off the conversation.
        messages.append(
            {
                "role": "system",
                "content": "Please start with 'Hello World' and introduce yourself to the user.",
            }
        )
        await task.queue_frames([LLMMessagesFrame(messages)])

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        logger.info("Participant left: {}", participant)
        await task.cancel()

    runner = PipelineRunner()

    await runner.run(task)


async def bot(args: DailySessionArguments):
    """Main bot entry point compatible with the FastAPI route handler.

    Args:
        room_url: The Daily room URL
        token: The Daily room token
        body: The configuration object from the request body
        session_id: The session ID for logging
    """
    logger.info(f"Bot process initialized {args.room_url} {args.token}")

    try:
        await main(args.room_url, args.token)
        logger.info("Bot process completed")
    except Exception as e:
        logger.exception(f"Error in bot process: {str(e)}")
        raise


# Local development functions
async def local_main():
    """Function for local development testing."""
    try:
        async with aiohttp.ClientSession() as session:
            (room_url, token) = await configure(session)
            logger.warning("_")
            logger.warning("_")
            logger.warning(f"Talk to your voice agent here: {room_url}")
            logger.warning("_")
            logger.warning("_")
            webbrowser.open(room_url)
            await main(room_url, token)
    except Exception as e:
        logger.exception(f"Error in local development mode: {e}")


# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
    try:
        asyncio.run(local_main())
    except Exception as e:
        logger.exception(f"Failed to run in local mode: {e}")
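`bot.py` decides whether to run locally with `if LOCAL_RUN:` on the raw string returned by `os.getenv("LOCAL_RUN")`. Any non-empty value is truthy in Python, so `LOCAL_RUN=0` or `LOCAL_RUN=false` would still enable local mode. If stricter parsing is wanted, a small helper along these lines would do; `env_flag` is a hypothetical name, not part of pipecat or pipecatcloud, and the accepted spellings are an assumption.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean flag.

    Unlike a bare truthiness check on os.getenv(), only "1", "true",
    "yes" and "on" (case-insensitive, surrounding whitespace ignored)
    count as True; "0", "false", "" and anything else count as False.
    """
    value = os.getenv(name)
    if value is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

With this helper, the module-level check would read `LOCAL_RUN = env_flag("LOCAL_RUN")`, and the existing `LOCAL_RUN=1 python bot.py` command from the README keeps working unchanged.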

@@ -1,2 +0,0 @@
CARTESIA_API_KEY=
OPENAI_API_KEY=
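`env.example` ships both keys empty, and the README copies it to `.env` before uploading it with `pcc secrets set ... --file .env`. A quick pre-flight check can catch keys that were never filled in. The sketch below is a deliberately simplified `KEY=VALUE` reader, assuming plain lines, `#` comments and optional surrounding quotes; it does not implement full dotenv semantics (no `export` prefix, multiline values, or escapes), and the function names are hypothetical.

```python
from pathlib import Path

# Keys the starter's secret set is expected to contain (from env.example).
REQUIRED_KEYS = ("CARTESIA_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one style of wrapping quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read an env file from disk and report which required keys are missing."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")))
```

An empty list from `check_env_file()` means every required key has a value; anything else lists the keys to fill in before creating the secret set.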

@@ -1,46 +0,0 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

import aiohttp

from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams


async def configure(aiohttp_session: aiohttp.ClientSession):
    (url, token) = await configure_with_args(aiohttp_session)
    return (url, token)


async def configure_with_args(aiohttp_session: aiohttp.ClientSession = None):
    key = os.getenv("DAILY_API_KEY")
    if not key:
        raise Exception(
            "No Daily API key specified. set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
        )

    daily_rest_helper = DailyRESTHelper(
        daily_api_key=key,
        daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
        aiohttp_session=aiohttp_session,
    )

    room = await daily_rest_helper.create_room(
        DailyRoomParams(properties={"enable_prejoin_ui": False})
    )
    if not room.url:
        raise HTTPException(status_code=500, detail="Failed to create room")

    url = room.url

    # Create a meeting token for the given room with an expiration 1 hour in
    # the future.
    expiry_time: float = 60 * 60

    token = await daily_rest_helper.get_token(url, expiry_time)

    return (url, token)

@@ -1,6 +0,0 @@
agent_name = "my-first-agent"
image = "your-username/my-first-agent:0.1"
secret_set = "my-first-agent-secrets"

[scaling]
min_instances = 0

@@ -1,3 +0,0 @@
pipecatcloud
pipecat-ai[cartesia,daily,openai,silero]>=0.0.58
python-dotenv~=1.0.1

@@ -1,57 +0,0 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.piper.tts import PiperTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection

load_dotenv(override=True)


async def run_bot(webrtc_connection: SmallWebRTCConnection):
    logger.info(f"Starting bot")

    # Create a transport using the WebRTC connection
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_out_enabled=True,
        ),
    )

    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        tts = PiperTTSService(
            base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
        )

        task = PipelineTask(Pipeline([tts, transport.output()]))

        # Register an event handler so we can play the audio when the client joins
        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])

        runner = PipelineRunner(handle_sigint=False)

        await runner.run(task)


if __name__ == "__main__":
    from run import main

    main()

@@ -1,59 +0,0 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection

load_dotenv(override=True)


async def run_bot(webrtc_connection: SmallWebRTCConnection):
    logger.info(f"Starting bot")

    # Create a transport using the WebRTC connection
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_out_enabled=True,
        ),
    )

    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
            voice_id="rex",
            aiohttp_session=session,
        )

        task = PipelineTask(Pipeline([tts, transport.output()]))

        # Register an event handler so we can play the audio when the client joins
        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])

        runner = PipelineRunner(handle_sigint=False)

        await runner.run(task)


if __name__ == "__main__":
    from run import main

    main()

@@ -4,53 +4,56 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.frames.frames import EndFrame, TranscriptionFrame, TTSSpeakFrame
|
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineTask
|
from pipecat.pipeline.task import PipelineTask
|
||||||
from pipecat.processors.frame_processor import FrameProcessor
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
# Create a transport using the WebRTC connection
|
async def main():
|
||||||
transport = SmallWebRTCTransport(
|
async with aiohttp.ClientSession() as session:
|
||||||
webrtc_connection=webrtc_connection,
|
(room_url, _) = await configure(session)
|
||||||
params=TransportParams(
|
|
||||||
audio_out_enabled=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
transport = DailyTransport(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
)
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(Pipeline([tts, transport.output()]))
|
tts = CartesiaTTSService(
|
||||||
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
# Register an event handler so we can play the audio when the client joins
|
runner = PipelineRunner()
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
task = PipelineTask(Pipeline([tts, transport.output()]))
|
||||||
|
|
||||||
await runner.run(task)
|
# Register an event handler so we can play the audio when the
|
||||||
|
# participant joins.
|
||||||
|
@transport.event_handler("on_first_participant_joined")
|
||||||
|
async def on_first_participant_joined(transport, participant):
|
||||||
|
participant_name = participant.get("info", {}).get("userName", "")
|
||||||
|
await task.queue_frames(
|
||||||
|
[TTSSpeakFrame(f"Hello there, {participant_name}!"), EndFrame()]
|
||||||
|
)
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -15,7 +15,7 @@ from pipecat.frames.frames import EndFrame, TTSSpeakFrame
|
|||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineTask
|
from pipecat.pipeline.task import PipelineTask
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
|
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
@@ -29,7 +29,7 @@ async def main():
|
|||||||
|
|
||||||
tts = CartesiaTTSService(
|
tts = CartesiaTTSService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
)
|
)
|
||||||
|
|
||||||
pipeline = Pipeline([tts, transport.output()])
|
pipeline = Pipeline([tts, transport.output()])
|
||||||
|
|||||||
@@ -1,9 +1,3 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
import asyncio
|
import asyncio
|
||||||
import os
|
import os
|
||||||
@@ -18,7 +12,7 @@ from pipecat.frames.frames import TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.cartesia import CartesiaTTSService
 from pipecat.transports.services.livekit import LiveKitParams, LiveKitTransport
 
 load_dotenv(override=True)
@@ -89,7 +83,7 @@ async def main():
 
     tts = CartesiaTTSService(
         api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
+        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
     )
 
     runner = PipelineRunner()

@@ -4,49 +4,51 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import EndFrame, TTSSpeakFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.services.riva.tts import FastPitchTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.riva import FastPitchTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    # Create a transport using the WebRTC connection
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_out_enabled=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
+        transport = DailyTransport(
+            room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
+        )
 
-    task = PipelineTask(Pipeline([tts, transport.output()]))
+        tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
 
-    # Register an event handler so we can play the audio when the client joins
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
+        runner = PipelineRunner()
 
-    runner = PipelineRunner(handle_sigint=False)
+        task = PipelineTask(Pipeline([tts, transport.output()]))
 
-    await runner.run(task)
+        # Register an event handler so we can play the audio when the
+        # participant joins.
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            participant_name = participant.get("info", {}).get("userName", "")
+            await task.queue_frames([TTSSpeakFrame(f"Aloha, {participant_name}!"), EndFrame()])
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,62 +4,61 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import EndFrame, LLMMessagesFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    # Create a transport using the WebRTC connection
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_out_enabled=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
-    )
+        transport = DailyTransport(
+            room_url, None, "Say One Thing From an LLM", DailyParams(audio_out_enabled=True)
+        )
 
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
+        )
 
-    messages = [
-        {
-            "role": "system",
-            "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
-        }
-    ]
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
-    task = PipelineTask(Pipeline([llm, tts, transport.output()]))
+        messages = [
+            {
+                "role": "system",
+                "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
+            }
+        ]
 
-    # Register an event handler so we can play the audio when the client joins
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
+        runner = PipelineRunner()
 
-    runner = PipelineRunner(handle_sigint=False)
+        task = PipelineTask(Pipeline([llm, tts, transport.output()]))
 
-    await runner.run(task)
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,67 +4,59 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
 import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.services.fal.image import FalImageGenService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.fal import FalImageGenService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    # Create a transport using the WebRTC connection
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            camera_out_enabled=True,
-            camera_out_width=1024,
-            camera_out_height=1024,
-        ),
-    )
-
-    # Create an HTTP session
+async def main():
     async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Show a still frame image",
+            DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
+        )
+
         imagegen = FalImageGenService(
             params=FalImageGenService.InputParams(image_size="square_hd"),
             aiohttp_session=session,
             key=os.getenv("FAL_KEY"),
         )
 
+        runner = PipelineRunner()
+
         task = PipelineTask(Pipeline([imagegen, transport.output()]))
 
-        # Register an event handler so we can play the audio when the client joins
-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
             await task.queue_frame(TextFrame("a cat in the style of picasso"))
 
-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            logger.info(f"Client disconnected")
-
-        @transport.event_handler("on_client_closed")
-        async def on_client_closed(transport, client):
-            logger.info(f"Client closed connection")
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
             await task.cancel()
 
-        runner = PipelineRunner(handle_sigint=False)
-
         await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -17,7 +17,7 @@ from pipecat.frames.frames import TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
-from pipecat.services.fal.image import FalImageGenService
+from pipecat.services.fal import FalImageGenService
 from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
 
 load_dotenv(override=True)

@@ -4,67 +4,62 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
-from pipecat.frames.frames import TextFrame
+from pipecat.frames.frames import EndFrame, TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.services.google.image import GoogleImageGenService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.google import GoogleImageGenService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    # Create a transport using the WebRTC connection
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            camera_out_enabled=True,
-            camera_out_width=1024,
-            camera_out_height=1024,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    imagegen = GoogleImageGenService(
-        api_key=os.getenv("GOOGLE_API_KEY"),
-    )
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Show a still frame image",
+            DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
+        )
 
-    task = PipelineTask(
-        Pipeline([imagegen, transport.output()]),
-        params=PipelineParams(enable_metrics=True),
-    )
+        imagegen = GoogleImageGenService(
+            api_key=os.getenv("GOOGLE_API_KEY"),
+        )
 
-    # Register an event handler so we can play the audio when the client joins
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        await task.queue_frame(TextFrame("a cat in the style of picasso"))
-        await task.queue_frame(TextFrame("a dog in the style of picasso"))
-        await task.queue_frame(TextFrame("a fish in the style of picasso"))
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            Pipeline([imagegen, transport.output()]),
+            params=PipelineParams(enable_metrics=True),
+        )
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await task.queue_frame(TextFrame("a cat in the style of picasso"))
+            await task.queue_frame(TextFrame("a dog in the style of picasso"))
+            await task.queue_frame(TextFrame("a fish in the style of picasso"))
 
-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.queue_frame(EndFrame())
 
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -13,9 +13,9 @@ import os
 import sys
 
 import aiohttp
-from daily_runner import configure
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import EndPipeFrame, LLMMessagesFrame, TextFrame
 from pipecat.pipeline.merge_pipeline import SequentialMergePipeline

@@ -4,12 +4,15 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 from dataclasses import dataclass
 
 import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import (
     DataFrame,
@@ -24,15 +27,16 @@ from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.sentence import SentenceAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaHttpTTSService
-from pipecat.services.fal.image import FalImageGenService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaHttpTTSService
+from pipecat.services.fal import FalImageGenService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 @dataclass
 class MonthFrame(DataFrame):
@@ -63,33 +67,27 @@ class MonthPrepender(FrameProcessor):
             await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    """Run the Calendar Month Narration bot using WebRTC transport.
-
-    Args:
-        webrtc_connection: The WebRTC connection to use
-        room_name: Optional room name for display purposes
-    """
-    logger.info(f"Starting bot")
-
-    # Create a transport using the WebRTC connection
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_out_enabled=True,
-            camera_out_enabled=True,
-            camera_out_width=1024,
-            camera_out_height=1024,
-        ),
-    )
-
-    # Create an HTTP session for API calls
+async def main():
     async with aiohttp.ClientSession() as session:
-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Month Narration Bot",
+            DailyParams(
+                audio_out_enabled=True,
+                camera_out_enabled=True,
+                camera_out_width=1024,
+                camera_out_height=1024,
+            ),
+        )
+
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
         tts = CartesiaHttpTTSService(
             api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
         )
 
         imagegen = FalImageGenService(
@@ -146,30 +144,14 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
             frames.append(MonthFrame(month=month))
             frames.append(LLMMessagesFrame(messages))
 
+        runner = PipelineRunner()
+
         task = PipelineTask(pipeline)
 
-        # Set up transport event handlers
-        @transport.event_handler("on_client_connected")
-        async def on_client_connected(transport, client):
-            logger.info(f"Client connected")
-            # Start the month narration once connected
-            await task.queue_frames(frames)
-
-        @transport.event_handler("on_client_disconnected")
-        async def on_client_disconnected(transport, client):
-            logger.info(f"Client disconnected")
-
-        @transport.event_handler("on_client_closed")
-        async def on_client_closed(transport, client):
-            logger.info(f"Client closed connection")
-            await task.cancel()
-
-        # Run the pipeline
-        runner = PipelineRunner(handle_sigint=False)
+        await task.queue_frames(frames)
+
         await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -27,9 +27,9 @@ from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.sentence import SentenceAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaHttpTTSService
-from pipecat.services.fal.image import FalImageGenService
-from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.cartesia import CartesiaHttpTTSService
+from pipecat.services.fal import FalImageGenService
+from pipecat.services.openai import OpenAILLMService
 from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
 
 load_dotenv(override=True)
@@ -93,11 +93,11 @@ async def main():
                         self.frame = frame
                     await self.push_frame(frame, direction)
 
-            llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+            llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
             tts = CartesiaHttpTTSService(
                 api_key=os.getenv("CARTESIA_API_KEY"),
-                voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
+                voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
             )
 
             imagegen = FalImageGenService(

@@ -4,13 +4,17 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import Frame, MetricsFrame, TranscriptionFrame, TTSSpeakFrame
+from pipecat.frames.frames import Frame, MetricsFrame
 from pipecat.metrics.metrics import (
     LLMUsageMetricsData,
     ProcessingMetricsData,
@@ -22,40 +26,17 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-# Custom processor that prints a message if it receives a TranscriptionFrame that says "banana"
-class BananaProcessor(FrameProcessor):
-    """A custom processor that listens for transcription frames containing the word 'banana'."""
-
-    def __init__(self):
-        super().__init__()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        # Ensure the super method is called first
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, TranscriptionFrame):
-            logger.debug(f"Received transcription frame: {frame.text}")
-            if "banana" in frame.text.lower():
-                logger.info("---- Received 'banana' in transcription frame")
-
-        # Push the frame after processing
-        await self.push_frame(frame)
-
 
 class MetricsLogger(FrameProcessor):
-    def __init__(self):
-        super().__init__()
-
     async def process_frame(self, frame: Frame, direction: FrameDirection):
         await super().process_frame(frame, direction)
 
@@ -75,86 +56,76 @@ class MetricsLogger(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
+        )
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
-    )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        ml = MetricsLogger()
 
-    ml = MetricsLogger()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
 
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
 
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    banana = BananaProcessor()
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            banana,
-            context_aggregator.user(),
-            llm,
-            tts,
-            ml,
-            transport.output(),
-            context_aggregator.assistant(),
-        ]
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                ml,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
await runner.run(task)
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,11 +4,15 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.frames.frames import (
|
from pipecat.frames.frames import (
|
||||||
@@ -16,21 +20,22 @@ from pipecat.frames.frames import (
|
|||||||
BotStoppedSpeakingFrame,
|
BotStoppedSpeakingFrame,
|
||||||
Frame,
|
Frame,
|
||||||
OutputImageRawFrame,
|
OutputImageRawFrame,
|
||||||
|
TextFrame,
|
||||||
)
|
)
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
|
|
||||||
class ImageSyncAggregator(FrameProcessor):
|
class ImageSyncAggregator(FrameProcessor):
|
||||||
def __init__(self, speaking_path: str, waiting_path: str):
|
def __init__(self, speaking_path: str, waiting_path: str):
|
||||||
@@ -67,90 +72,83 @@ class ImageSyncAggregator(FrameProcessor):
|
|||||||
await self.push_frame(frame)
|
await self.push_frame(frame)
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
async def main():
|
||||||
logger.info(f"Starting bot")
|
async with aiohttp.ClientSession() as session:
|
||||||
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
transport = DailyTransport(
|
||||||
webrtc_connection=webrtc_connection,
|
room_url,
|
||||||
params=TransportParams(
|
token,
|
||||||
audio_in_enabled=True,
|
"Respond bot",
|
||||||
audio_out_enabled=True,
|
DailyParams(
|
||||||
camera_out_enabled=True,
|
audio_out_enabled=True,
|
||||||
camera_out_width=1024,
|
camera_out_enabled=True,
|
||||||
camera_out_height=1024,
|
camera_out_width=1024,
|
||||||
vad_enabled=True,
|
camera_out_height=1024,
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
transcription_enabled=True,
|
||||||
vad_audio_passthrough=True,
|
vad_enabled=True,
|
||||||
),
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
)
|
),
|
||||||
|
)
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
tts = CartesiaTTSService(
|
||||||
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
messages = [
|
||||||
|
{
|
||||||
messages = [
|
"role": "system",
|
||||||
{
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
"role": "system",
|
},
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
image_sync_aggregator = ImageSyncAggregator(
|
|
||||||
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
|
|
||||||
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
|
|
||||||
)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(),
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(),
|
|
||||||
llm,
|
|
||||||
tts,
|
|
||||||
image_sync_aggregator,
|
|
||||||
transport.output(),
|
|
||||||
context_aggregator.assistant(),
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
image_sync_aggregator = ImageSyncAggregator(
|
||||||
async def on_client_connected(transport, client):
|
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
|
||||||
logger.info(f"Client connected")
|
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
|
||||||
# Kick off the conversation.
|
)
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
pipeline = Pipeline(
|
||||||
async def on_client_disconnected(transport, client):
|
[
|
||||||
logger.info(f"Client disconnected")
|
transport.input(),
|
||||||
|
context_aggregator.user(),
|
||||||
|
llm,
|
||||||
|
tts,
|
||||||
|
image_sync_aggregator,
|
||||||
|
transport.output(),
|
||||||
|
context_aggregator.assistant(),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
task = PipelineTask(
|
||||||
async def on_client_closed(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client closed connection")
|
params=PipelineParams(
|
||||||
await task.cancel()
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_first_participant_joined")
|
||||||
await runner.run(task)
|
async def on_first_participant_joined(transport, participant):
|
||||||
|
participant_name = participant.get("info", {}).get("userName", "")
|
||||||
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
|
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
|
||||||
|
|
||||||
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
104
examples/foundational/07-interruptible-vad.py
Normal file
@@ -0,0 +1,104 @@
|
|||||||
|
#
|
||||||
|
# Copyright (c) 2024–2025, Daily
|
||||||
|
#
|
||||||
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
|
from pipecat.processors.audio.vad.silero import SileroVAD
|
||||||
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
|
from pipecat.services.openai import OpenAILLMService
|
||||||
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
|
|
||||||
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
|
|
||||||
|
async def main():
|
||||||
|
async with aiohttp.ClientSession() as session:
|
||||||
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
token,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_in_enabled=True,
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
vad = SileroVAD()
|
||||||
|
|
||||||
|
tts = CartesiaTTSService(
|
||||||
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
|
messages = [
|
||||||
|
{
|
||||||
|
"role": "system",
|
||||||
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
context = OpenAILLMContext(messages)
|
||||||
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
|
|
||||||
|
pipeline = Pipeline(
|
||||||
|
[
|
||||||
|
transport.input(),
|
||||||
|
vad,
|
||||||
|
context_aggregator.user(),
|
||||||
|
llm,
|
||||||
|
tts,
|
||||||
|
transport.output(),
|
||||||
|
context_aggregator.assistant(),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
task = PipelineTask(
|
||||||
|
pipeline,
|
||||||
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
@transport.event_handler("on_first_participant_joined")
|
||||||
|
async def on_first_participant_joined(transport, participant):
|
||||||
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
asyncio.run(main())
|
||||||
@@ -4,103 +4,99 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, token) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
token,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
tts = CartesiaTTSService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
},
|
},
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
context_aggregator.user(), # User responses
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
llm, # LLM
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,57 +4,62 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
import aiohttp
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.anthropic import AnthropicLLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.rime.tts import RimeHttpTTSService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create an HTTP session
|
|
||||||
async with aiohttp.ClientSession() as session:
|
async with aiohttp.ClientSession() as session:
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
tts = RimeHttpTTSService(
|
transport = DailyTransport(
|
||||||
api_key=os.getenv("RIME_API_KEY", ""),
|
room_url,
|
||||||
voice_id="rex",
|
token,
|
||||||
aiohttp_session=session,
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
tts = CartesiaTTSService(
|
||||||
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
|
llm = AnthropicLLMService(
|
||||||
|
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-opus-20240229"
|
||||||
|
)
|
||||||
|
|
||||||
|
# todo: think more about how to handle system prompts in a more general way. OpenAI,
|
||||||
|
# Google, and Anthropic all have slightly different approaches to providing a system
|
||||||
|
# prompt.
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative, helpful, and brief way. Say hello.",
|
||||||
},
|
},
|
||||||
]
|
]
|
||||||
|
|
||||||
@@ -64,7 +69,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|||||||
pipeline = Pipeline(
|
pipeline = Pipeline(
|
||||||
[
|
[
|
||||||
transport.input(), # Transport user input
|
transport.input(), # Transport user input
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
context_aggregator.user(), # User responses
|
||||||
llm, # LLM
|
llm, # LLM
|
||||||
tts, # TTS
|
tts, # TTS
|
||||||
@@ -83,28 +87,20 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_connected(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client connected")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
# Kick off the conversation.
|
# Kick off the conversation.
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
@transport.event_handler("on_participant_left")
|
||||||
async def on_client_disconnected(transport, client):
|
async def on_participant_left(transport, participant, reason):
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
await task.cancel()
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
await runner.run(task)
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
@@ -1,106 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.processors.audio.vad.silero import SileroVAD
|
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
vad = SileroVAD()
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(),
|
|
||||||
stt,
|
|
||||||
vad,
|
|
||||||
context_aggregator.user(),
|
|
||||||
llm,
|
|
||||||
tts,
|
|
||||||
transport.output(),
|
|
||||||
context_aggregator.assistant(),
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
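This hunk wires up `on_client_connected`, `on_client_disconnected` and `on_client_closed` with `@transport.event_handler(...)`. A rough mental model of that decorator pattern is a name-keyed registry of async callbacks. The sketch below is only that model. It is not pipecat's implementation, and `EventEmitter` and `emit` are hypothetical names.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class EventEmitter:
    """Hypothetical stand-in for a transport that exposes event_handler()."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def event_handler(self, name: str) -> Callable[[Handler], Handler]:
        # Decorator that registers an async handler under an event name.
        def decorator(fn: Handler) -> Handler:
            self._handlers[name].append(fn)
            return fn

        return decorator

    async def emit(self, name: str, *args: Any) -> None:
        # Await every handler registered for `name`, passing the emitter first,
        # mirroring the (transport, client) signature used in the diff.
        for fn in self._handlers[name]:
            await fn(self, *args)


async def demo() -> list[str]:
    transport = EventEmitter()
    log: list[str] = []

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        log.append(f"connected:{client}")

    @transport.event_handler("on_client_closed")
    async def on_client_closed(transport, client):
        log.append(f"closed:{client}")

    await transport.emit("on_client_connected", "peer-1")
    await transport.emit("on_client_closed", "peer-1")
    await transport.emit("on_unknown", "peer-1")  # no handlers registered: no-op
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the bot itself, the `on_client_closed` handler is where `task.cancel()` is called. That call ends the pipeline task for the peer that went away.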
@@ -4,8 +4,11 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
|
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
|
||||||
from langchain_community.chat_message_histories import ChatMessageHistory
|
from langchain_community.chat_message_histories import ChatMessageHistory
|
||||||
@@ -13,6 +16,7 @@ from langchain_core.chat_history import BaseChatMessageHistory
|
|||||||
from langchain_core.runnables.history import RunnableWithMessageHistory
|
from langchain_core.runnables.history import RunnableWithMessageHistory
|
||||||
from langchain_openai import ChatOpenAI
|
from langchain_openai import ChatOpenAI
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.frames.frames import LLMMessagesFrame
|
from pipecat.frames.frames import LLMMessagesFrame
|
||||||
@@ -24,15 +28,15 @@ from pipecat.processors.aggregators.llm_response import (
|
|||||||
LLMUserResponseAggregator,
|
LLMUserResponseAggregator,
|
||||||
)
|
)
|
||||||
from pipecat.processors.frameworks.langchain import LangchainProcessor
|
from pipecat.processors.frameworks.langchain import LangchainProcessor
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
message_store = {}
|
message_store = {}
|
||||||
|
|
||||||
|
|
||||||
@@ -42,97 +46,90 @@ def get_session_history(session_id: str) -> BaseChatMessageHistory:
|
|||||||
return message_store[session_id]
|
return message_store[session_id]
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
async def main():
|
||||||
logger.info(f"Starting bot")
|
async with aiohttp.ClientSession() as session:
|
||||||
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
transport = DailyTransport(
|
||||||
webrtc_connection=webrtc_connection,
|
room_url,
|
||||||
params=TransportParams(
|
token,
|
||||||
audio_in_enabled=True,
|
"Respond bot",
|
||||||
audio_out_enabled=True,
|
DailyParams(
|
||||||
vad_enabled=True,
|
audio_out_enabled=True,
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
transcription_enabled=True,
|
||||||
vad_audio_passthrough=True,
|
vad_enabled=True,
|
||||||
),
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
|
||||||
)
|
|
||||||
|
|
||||||
prompt = ChatPromptTemplate.from_messages(
|
|
||||||
[
|
|
||||||
(
|
|
||||||
"system",
|
|
||||||
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
|
|
||||||
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
|
|
||||||
),
|
),
|
||||||
MessagesPlaceholder("chat_history"),
|
)
|
||||||
("human", "{input}"),
|
|
||||||
]
|
|
||||||
)
|
|
||||||
chain = prompt | ChatOpenAI(model="gpt-4.1", temperature=0.7)
|
|
||||||
history_chain = RunnableWithMessageHistory(
|
|
||||||
chain,
|
|
||||||
get_session_history,
|
|
||||||
history_messages_key="chat_history",
|
|
||||||
input_messages_key="input",
|
|
||||||
)
|
|
||||||
lc = LangchainProcessor(history_chain)
|
|
||||||
|
|
||||||
tma_in = LLMUserResponseAggregator()
|
tts = CartesiaTTSService(
|
||||||
tma_out = LLMAssistantResponseAggregator()
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
pipeline = Pipeline(
|
prompt = ChatPromptTemplate.from_messages(
|
||||||
[
|
[
|
||||||
transport.input(), # Transport user input
|
(
|
||||||
stt,
|
"system",
|
||||||
tma_in, # User responses
|
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
|
||||||
lc, # Langchain
|
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
|
||||||
tts, # TTS
|
),
|
||||||
transport.output(), # Transport bot output
|
MessagesPlaceholder("chat_history"),
|
||||||
tma_out, # Assistant spoken responses
|
("human", "{input}"),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0.7)
|
||||||
|
history_chain = RunnableWithMessageHistory(
|
||||||
|
chain,
|
||||||
|
get_session_history,
|
||||||
|
history_messages_key="chat_history",
|
||||||
|
input_messages_key="input",
|
||||||
|
)
|
||||||
|
lc = LangchainProcessor(history_chain)
|
||||||
|
|
||||||
task = PipelineTask(
|
tma_in = LLMUserResponseAggregator()
|
||||||
pipeline,
|
tma_out = LLMAssistantResponseAggregator()
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
tma_in, # User responses
|
||||||
# the `LLMMessagesFrame` will be picked up by the LangchainProcessor using
|
lc, # Langchain
|
||||||
# only the content of the last message to inject it in the prompt defined
|
tts, # TTS
|
||||||
# above. So no role is required here.
|
transport.output(), # Transport bot output
|
||||||
messages = [({"content": "Please briefly introduce yourself to the user."})]
|
tma_out, # Assistant spoken responses
|
||||||
await task.queue_frames([LLMMessagesFrame(messages)])
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
lc.set_participant_id(participant["id"])
|
||||||
|
# Kick off the conversation.
|
||||||
|
# the `LLMMessagesFrame` will be picked up by the LangchainProcessor using
|
||||||
|
# only the content of the last message to inject it in the prompt defined
|
||||||
|
# above. So no role is required here.
|
||||||
|
messages = [({"content": "Please briefly introduce yourself to the user."})]
|
||||||
|
await task.queue_frames([LLMMessagesFrame(messages)])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
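In this file, `RunnableWithMessageHistory` calls `get_session_history(session_id)` to find the chat history for a conversation. The diff shows only that function's return line, `return message_store[session_id]`. The sketch below illustrates a lazily created, per-session store. It is an assumption about the general pattern, not the file's elided body. `InMemoryHistory` and its `add_message(role, content)` signature are hypothetical stand-ins for LangChain's `ChatMessageHistory`.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryHistory:
    """Hypothetical stand-in for a chat message history object."""

    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


message_store: dict[str, InMemoryHistory] = {}


def get_session_history(session_id: str) -> InMemoryHistory:
    # Create the history on first use, then return the same object on every
    # later call so each turn in the session sees the earlier messages.
    if session_id not in message_store:
        message_store[session_id] = InMemoryHistory()
    return message_store[session_id]
```

The Daily-side code calls `lc.set_participant_id(participant["id"])`. That id is likely what keys this store, but this is an assumption here.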
@@ -4,11 +4,15 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from deepgram import LiveOptions
|
from deepgram import LiveOptions
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.frames.frames import (
|
from pipecat.frames.frames import (
|
||||||
BotInterruptionFrame,
|
BotInterruptionFrame,
|
||||||
@@ -20,98 +24,93 @@ from pipecat.pipeline.pipeline import Pipeline
|
|||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
|
||||||
from pipecat.services.deepgram.tts import DeepgramTTSService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, _) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(
|
transport = DailyTransport(
|
||||||
api_key=os.getenv("DEEPGRAM_API_KEY"),
|
room_url,
|
||||||
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
|
None,
|
||||||
)
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_in_enabled=True,
|
||||||
|
audio_out_enabled=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
|
stt = DeepgramSTTService(
|
||||||
|
api_key=os.getenv("DEEPGRAM_API_KEY"),
|
||||||
|
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
|
||||||
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
|
||||||
|
|
||||||
messages = [
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
messages = [
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
{
|
||||||
|
"role": "system",
|
||||||
pipeline = Pipeline(
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
[
|
},
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt, # STT
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@stt.event_handler("on_speech_started")
|
pipeline = Pipeline(
|
||||||
async def on_speech_started(stt, *args, **kwargs):
|
[
|
||||||
await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()])
|
transport.input(), # Transport user input
|
||||||
|
stt, # STT
|
||||||
|
context_aggregator.user(), # User responses
|
||||||
|
llm, # LLM
|
||||||
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@stt.event_handler("on_utterance_end")
|
task = PipelineTask(
|
||||||
async def on_utterance_end(stt, *args, **kwargs):
|
pipeline,
|
||||||
await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()])
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
@stt.event_handler("on_speech_started")
|
||||||
async def on_client_connected(transport, client):
|
async def on_speech_started(stt, *args, **kwargs):
|
||||||
logger.info(f"Client connected")
|
await task.queue_frames([BotInterruptionFrame(), UserStartedSpeakingFrame()])
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
@stt.event_handler("on_utterance_end")
|
||||||
async def on_client_disconnected(transport, client):
|
async def on_utterance_end(stt, *args, **kwargs):
|
||||||
logger.info(f"Client disconnected")
|
await task.queue_frames([StopInterruptionFrame(), UserStoppedSpeakingFrame()])
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
# Kick off the conversation.
|
||||||
await task.cancel()
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
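In this variant, Deepgram's own events drive interruptions through `LiveOptions(vad_events=True, utterance_end_ms="1000")`. `on_speech_started` queues `BotInterruptionFrame` and `UserStartedSpeakingFrame`. `on_utterance_end` queues `StopInterruptionFrame` and `UserStoppedSpeakingFrame`. The sketch below models only the ordering of those queued frames. `Frame` and `FakeTask` are hypothetical stand-ins, not pipecat classes.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame carrying only the name of the pipecat frame it mimics."""

    name: str


BOT_INTERRUPTION = Frame("BotInterruptionFrame")
USER_STARTED = Frame("UserStartedSpeakingFrame")
STOP_INTERRUPTION = Frame("StopInterruptionFrame")
USER_STOPPED = Frame("UserStoppedSpeakingFrame")


class FakeTask:
    """Hypothetical task that records queued frames in arrival order."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def queue_frames(self, frames: list[Frame]) -> None:
        for frame in frames:
            await self.queue.put(frame)


async def on_speech_started(task: FakeTask) -> None:
    # The user began talking: interrupt the bot first, then mark speech start.
    await task.queue_frames([BOT_INTERRUPTION, USER_STARTED])


async def on_utterance_end(task: FakeTask) -> None:
    # The STT reported an utterance end: lift the interruption, then mark speech stop.
    await task.queue_frames([STOP_INTERRUPTION, USER_STOPPED])


async def demo() -> list[str]:
    task = FakeTask()
    await on_speech_started(task)
    await on_utterance_end(task)
    names = []
    while not task.queue.empty():
        names.append(task.queue.get_nowait().name)
    return names
```

With `utterance_end_ms="1000"`, Deepgram waits for a gap of roughly one second after the last word before it reports the utterance end. Lowering that value shortens the turn boundary, at the risk of cutting the user off mid-pause.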
@@ -4,100 +4,98 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
|
||||||
from pipecat.services.deepgram.tts import DeepgramTTSService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, _) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
None,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
vad_audio_passthrough=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
|
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
|
||||||
|
|
||||||
messages = [
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
messages = [
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
{
|
||||||
|
"role": "system",
|
||||||
pipeline = Pipeline(
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
[
|
},
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt, # STT
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
stt, # STT
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
context_aggregator.user(), # User responses
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
llm, # LLM
|
||||||
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
# Kick off the conversation.
|
||||||
await task.cancel()
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
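Both sides of this hunk build the same processor order: `transport.input()`, `stt`, `context_aggregator.user()`, `llm`, `tts`, `transport.output()`, `context_aggregator.assistant()`. The sketch below flattens that order into a sequential chain to show why the assistant aggregator sits after the output: it records only what actually reached the transport. This is a deliberate simplification. pipecat processors stream frames and can push them in both directions, and every function here is a hypothetical placeholder.

```python
import asyncio
from typing import Awaitable, Callable

Processor = Callable[[str], Awaitable[str]]
history: list[str] = []


async def run_pipeline(processors: list[Processor], frame: str) -> str:
    # Each processor receives the previous processor's output, in list order.
    for proc in processors:
        frame = await proc(frame)
    return frame


async def stt(audio: str) -> str:
    return audio.split(":", 1)[1]  # placeholder "transcription"


async def user_aggregator(text: str) -> str:
    history.append(f"user:{text}")
    return text


async def llm(text: str) -> str:
    return f"you said {text}"  # placeholder reply


async def tts(text: str) -> str:
    return f"audio:{text}"


async def transport_output(audio: str) -> str:
    return audio  # a real transport would play this to the peer


async def assistant_aggregator(audio: str) -> str:
    # Placed after the output, so only text that was actually sent is recorded.
    history.append(f"assistant:{audio.split(':', 1)[1]}")
    return audio


async def demo() -> tuple[str, list[str]]:
    history.clear()
    chain = [stt, user_aggregator, llm, tts, transport_output, assistant_aggregator]
    result = await run_pipeline(chain, "audio:hello")
    return result, list(history)
```

Note that in this file the Silero VAD is configured on the transport (`vad_enabled=True`, `vad_analyzer=SileroVADAnalyzer()`), not inserted as a pipeline element.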
@@ -1,110 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
import aiohttp
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
|
||||||
from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create an HTTP session
|
|
||||||
async with aiohttp.ClientSession() as session:
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
tts = ElevenLabsHttpTTSService(
|
|
||||||
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
|
|
||||||
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
|
|
||||||
aiohttp_session=session,
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
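The left-hand side of these hunks exposes `async def run_bot(webrtc_connection)` and starts through `from run import main`. The right-hand side opens a Daily room in `main()` and calls `asyncio.run(main())`. The per-connection shape suggests a host process that starts one bot per peer. The sketch below shows that idea with plain `asyncio` tasks. `Connection`, `BotServer`, `on_offer` and `peer_id` are all hypothetical; `run.py` itself is not shown in this diff.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a per-peer WebRTC connection."""

    peer_id: str


async def run_bot(connection: Connection) -> str:
    # Placeholder for building the transport, pipeline and task for one peer.
    await asyncio.sleep(0)
    return f"bot finished for {connection.peer_id}"


class BotServer:
    """Hypothetical host that starts one bot task per incoming connection."""

    def __init__(self) -> None:
        self.tasks: dict = {}

    def on_offer(self, connection: Connection) -> None:
        # Keep a reference so the task is not garbage-collected mid-run.
        self.tasks[connection.peer_id] = asyncio.create_task(run_bot(connection))

    async def wait_all(self) -> list[str]:
        # gather() returns results in the order the tasks were passed in.
        return list(await asyncio.gather(*self.tasks.values()))


async def demo() -> list[str]:
    server = BotServer()
    for peer_id in ("a", "b"):
        server.on_offer(Connection(peer_id))
    return await server.wait_all()
```

When bots run inside a shared host like this, the per-bot runner plausibly should not install its own signal handlers. That is consistent with `PipelineRunner(handle_sigint=False)` on the left-hand side, though the diff does not state the reason.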
@@ -4,103 +4,99 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.elevenlabs import ElevenLabsTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )

     tts = ElevenLabsTTSService(
         api_key=os.getenv("ELEVENLABS_API_KEY", ""),
         voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
     )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

     messages = [
         {
             "role": "system",
             "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
         },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
         ]
-    )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    await runner.run(task)
+        runner = PipelineRunner()

+        await runner.run(task)
+

 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,104 +4,100 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.playht.tts import PlayHTHttpTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.playht import PlayHTHttpTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )

     tts = PlayHTHttpTTSService(
         user_id=os.getenv("PLAYHT_USER_ID"),
         api_key=os.getenv("PLAYHT_API_KEY"),
         voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
     )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

     messages = [
         {
             "role": "system",
             "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
         },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
         ]
-    )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    await runner.run(task)
+        runner = PipelineRunner()

+        await runner.run(task)
+

 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,106 +4,102 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.playht.tts import PlayHTTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.playht import PlayHTTTSService
 from pipecat.transcriptions.language import Language
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )

     tts = PlayHTTTSService(
         user_id=os.getenv("PLAYHT_USER_ID"),
         api_key=os.getenv("PLAYHT_API_KEY"),
-        voice_url="s3://voice-cloning-zero-shot/e46b4027-b38d-4d24-b292-38fbca2be0ef/original/manifest.json",
+            voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
         params=PlayHTTTSService.InputParams(language=Language.EN),
     )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

     messages = [
         {
             "role": "system",
             "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
         },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
         ]
-    )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    await runner.run(task)
+        runner = PipelineRunner()

+        await runner.run(task)
+

 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,110 +4,108 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.azure.llm import AzureLLMService
-from pipecat.services.azure.stt import AzureSTTService
-from pipecat.services.azure.tts import AzureTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.azure import AzureLLMService, AzureSTTService, AzureTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    stt = AzureSTTService(
-        api_key=os.getenv("AZURE_SPEECH_API_KEY"),
-        region=os.getenv("AZURE_SPEECH_REGION"),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )

-    tts = AzureTTSService(
+        stt = AzureSTTService(
         api_key=os.getenv("AZURE_SPEECH_API_KEY"),
         region=os.getenv("AZURE_SPEECH_REGION"),
     )

-    llm = AzureLLMService(
-        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
-        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
-        model=os.getenv("AZURE_CHATGPT_MODEL"),
-    )
+        tts = AzureTTSService(
+            api_key=os.getenv("AZURE_SPEECH_API_KEY"),
+            region=os.getenv("AZURE_SPEECH_REGION"),
+        )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        llm = AzureLLMService(
+            api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
+            endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+        )

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
         ]
-    )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    await runner.run(task)
+        runner = PipelineRunner()

+        await runner.run(task)
+

 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,105 +4,106 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.openai.stt import OpenAISTTService
-from pipecat.services.openai.tts import OpenAITTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.openai import OpenAILLMService, OpenAISTTService, OpenAITTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    stt = OpenAISTTService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4o-transcribe-latest",
-        prompt="Expect words related to dogs, such as breed names.",
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                audio_out_sample_rate=24000,
+                transcription_enabled=False,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )

-    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
+        # You can use the OpenAI compatible API like Groq.
+        # stt = OpenAISTTService(
+        #     base_url="https://api.groq.com/openai/v1",
+        #     api_key="gsk_***",
+        #     model="whisper-large-v3",
+        # )
+        stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"), model="whisper-1")

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="alloy")

-    messages = [
-        {
-            "role": "system",
-            "content": "You are very knowledgable about dogs. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
         ]
-    )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            audio_out_sample_rate=24000,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    await runner.run(task)
+        runner = PipelineRunner()

+        await runner.run(task)
+

 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
|
|||||||
@@ -4,109 +4,106 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.openpipe import OpenPipeLLMService
|
||||||
from pipecat.services.openpipe.llm import OpenPipeLLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, token) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
token,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
tts = CartesiaTTSService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
)
|
)
|
||||||
|
|
||||||
timestamp = int(time.time())
|
timestamp = int(time.time())
|
||||||
llm = OpenPipeLLMService(
|
llm = OpenPipeLLMService(
|
||||||
api_key=os.getenv("OPENAI_API_KEY"),
|
api_key=os.getenv("OPENAI_API_KEY"),
|
||||||
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
|
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
|
||||||
tags={"conversation_id": f"pipecat-{timestamp}"},
|
model="gpt-4o",
|
||||||
)
|
tags={"conversation_id": f"pipecat-{timestamp}"},
|
||||||
|
)
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
},
|
},
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
context_aggregator.user(), # User responses
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
llm, # LLM
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,44 +4,45 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
import aiohttp
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.services.xtts import XTTSService
|
||||||
from pipecat.services.xtts.tts import XTTSService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create an HTTP session
|
|
||||||
async with aiohttp.ClientSession() as session:
|
async with aiohttp.ClientSession() as session:
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
token,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = XTTSService(
|
tts = XTTSService(
|
||||||
aiohttp_session=session,
|
aiohttp_session=session,
|
||||||
@@ -49,7 +50,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|||||||
base_url="http://localhost:8000",
|
base_url="http://localhost:8000",
|
||||||
)
|
)
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
@@ -64,7 +65,6 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|||||||
pipeline = Pipeline(
|
pipeline = Pipeline(
|
||||||
[
|
[
|
||||||
transport.input(), # Transport user input
|
transport.input(), # Transport user input
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
context_aggregator.user(), # User responses
|
||||||
llm, # LLM
|
llm, # LLM
|
||||||
tts, # TTS
|
tts, # TTS
|
||||||
@@ -83,28 +83,21 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_connected(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client connected")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
# Kick off the conversation.
|
# Kick off the conversation.
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
@transport.event_handler("on_participant_left")
|
||||||
async def on_client_disconnected(transport, client):
|
async def on_participant_left(transport, participant, reason):
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
await task.cancel()
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
await runner.run(task)
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,111 +4,106 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
from pipecat.services.cartesia import CartesiaTTSService
|
||||||
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
|
from pipecat.services.gladia import GladiaSTTService
|
||||||
from pipecat.services.gladia.stt import GladiaSTTService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transcriptions.language import Language
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, token) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = GladiaSTTService(
|
transport = DailyTransport(
|
||||||
api_key=os.getenv("GLADIA_API_KEY", ""),
|
room_url,
|
||||||
params=GladiaInputParams(
|
token,
|
||||||
language_config=LanguageConfig(
|
"Respond bot",
|
||||||
languages=[Language.EN],
|
DailyParams(
|
||||||
)
|
audio_out_enabled=True,
|
||||||
),
|
vad_enabled=True,
|
||||||
)
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
vad_audio_passthrough=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
stt = GladiaSTTService(
|
||||||
api_key=os.getenv("CARTESIA_API_KEY", ""),
|
api_key=os.getenv("GLADIA_API_KEY"),
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
)
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
|
tts = CartesiaTTSService(
|
||||||
|
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||||
|
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
|
||||||
|
)
|
||||||
|
|
||||||
messages = [
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
messages = [
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
{
|
||||||
|
"role": "system",
|
||||||
pipeline = Pipeline(
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
[
|
},
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt, # STT
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
stt, # STT
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
context_aggregator.user(), # User responses
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
llm, # LLM
|
||||||
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
# Register an event handler to exit the application when the user leaves.
|
||||||
await runner.run(task)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,100 +4,96 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
from pipecat.services.lmnt import LmntTTSService
|
||||||
from pipecat.services.lmnt.tts import LmntTTSService
|
from pipecat.services.openai import OpenAILLMService
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
async def main():
|
||||||
webrtc_connection=webrtc_connection,
|
async with aiohttp.ClientSession() as session:
|
||||||
params=TransportParams(
|
(room_url, token) = await configure(session)
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
transport = DailyTransport(
|
||||||
|
room_url,
|
||||||
|
token,
|
||||||
|
"Respond bot",
|
||||||
|
DailyParams(
|
||||||
|
audio_out_enabled=True,
|
||||||
|
transcription_enabled=True,
|
||||||
|
vad_enabled=True,
|
||||||
|
vad_analyzer=SileroVADAnalyzer(),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
|
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
|
||||||
|
|
||||||
messages = [
|
messages = [
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||||
},
|
},
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User respones
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
]
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
context = OpenAILLMContext(messages)
|
||||||
pipeline,
|
context_aggregator = llm.create_context_aggregator(context)
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
pipeline = Pipeline(
|
||||||
async def on_client_connected(transport, client):
|
[
|
||||||
logger.info(f"Client connected")
|
transport.input(), # Transport user input
|
||||||
# Kick off the conversation.
|
context_aggregator.user(), # User respones
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
llm, # LLM
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
await transport.capture_participant_transcription(participant["id"])
|
||||||
await task.cancel()
|
# Kick off the conversation.
|
||||||
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -1,102 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.groq.llm import GroqLLMService
-from pipecat.services.groq.stt import GroqSTTService
-from pipecat.services.groq.tts import GroqTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
-
-load_dotenv(override=True)
-
-
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
-
-    llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
-
-    tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
-
-
-if __name__ == "__main__":
-    from run import main
-
-    main()
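All the example bots in this diff use the same processor order: transport input, speech-to-text, user context aggregator, LLM, TTS, transport output, and finally the assistant context aggregator. The toy model below is a hypothetical sketch and is not pipecat's API. `Frame`, `Processor` and `run_pipeline` are invented names, and frames are plain dataclasses passed synchronously down a list. The sketch shows one consequence of that order: because the assistant aggregator sits after `transport.output()`, it records the bot's reply only after the reply has passed the output stage.

```python
# Hypothetical toy model, NOT pipecat's API: Frame, Processor and run_pipeline
# only mirror the processor order used by the example bots in this diff.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Frame:
    kind: str  # "audio" or "text"
    payload: str  # the words carried by this frame
    trace: List[str] = field(default_factory=list)  # processors that handled it


@dataclass
class Processor:
    name: str
    transform: Callable[[Frame], Frame]

    def process(self, frame: Frame) -> Frame:
        frame.trace.append(self.name)  # record the visit, then transform
        return self.transform(frame)


def run_pipeline(processors: List[Processor], frame: Frame) -> Frame:
    # Each processor receives the previous processor's output, in list order.
    for processor in processors:
        frame = processor.process(frame)
    return frame


context: List[Tuple[str, str]] = []  # stands in for the shared LLM context
spoken: List[str] = []  # what reached the (fake) output transport


def to_kind(kind: str) -> Callable[[Frame], Frame]:
    def _convert(frame: Frame) -> Frame:
        frame.kind = kind  # e.g. STT: audio -> text, TTS: text -> audio
        return frame

    return _convert


def record(role: str) -> Callable[[Frame], Frame]:
    def _record(frame: Frame) -> Frame:
        context.append((role, frame.payload))
        return frame

    return _record


def reply(frame: Frame) -> Frame:
    frame.payload = f"You said: {frame.payload}"  # fake LLM response
    return frame


def speak(frame: Frame) -> Frame:
    spoken.append(frame.payload)
    return frame


pipeline = [
    Processor("input", lambda frame: frame),
    Processor("stt", to_kind("text")),
    Processor("user_aggregator", record("user")),
    Processor("llm", reply),
    Processor("tts", to_kind("audio")),
    Processor("output", speak),
    Processor("assistant_aggregator", record("assistant")),
]

result = run_pipeline(pipeline, Frame("audio", "hello"))
```

A real pipeline is asynchronous and carries many more frame types. This sketch keeps only the ordering.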
115
examples/foundational/07l-interruptible-together.py
Normal file
@@ -0,0 +1,115 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import asyncio
+import os
+import sys
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+from runner import configure
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.services.ai_services import OpenAILLMContext
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.together import TogetherLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
+load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = TogetherLLMService(
+            api_key=os.getenv("TOGETHER_API_KEY"),
+            model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
+            params=TogetherLLMService.InputParams(
+                temperature=1.0,
+                top_p=0.9,
+                top_k=40,
+                extra={
+                    "frequency_penalty": 2.0,
+                    "presence_penalty": 0.0,
+                },
+            ),
+        )
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond in plain language. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+        user_aggregator = context_aggregator.user()
+        assistant_aggregator = context_aggregator.assistant()
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
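`TogetherLLMService.InputParams` above sets `temperature`, `top_p` and `top_k` as named fields and passes `frequency_penalty` and `presence_penalty` through the `extra` dict. The sketch below folds both kinds of setting into one request body. It assumes that unset named fields are omitted and that `extra` is merged last. `build_request` is a hypothetical helper and not the service's actual code.

```python
from typing import Any, Dict, List, Optional


def build_request(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an OpenAI-style chat request body."""
    body: Dict[str, Any] = {"model": model, "messages": messages}
    named = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    # Send only the sampling settings that were explicitly set.
    body.update({key: value for key, value in named.items() if value is not None})
    # Provider-specific knobs without a dedicated field are merged in last.
    body.update(extra or {})
    return body


request = build_request(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "system", "content": "Respond in plain language."}],
    temperature=1.0,
    top_p=0.9,
    top_k=40,
    extra={"frequency_penalty": 2.0, "presence_penalty": 0.0},
)
```

Because `extra` is merged last, it can override a named field that uses the same key. Whether the real service behaves this way is an assumption of the sketch.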
@@ -4,106 +4,106 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.aws.tts import PollyTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.aws import PollyTTSService
+from pipecat.services.deepgram import DeepgramSTTService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = PollyTTSService(
-        api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
-        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
-        region=os.getenv("AWS_REGION"),
-        voice_id="Amy",
-        params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+        tts = PollyTTSService(
+            api_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
+            aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
+            region=os.getenv("AWS_REGION"),
+            voice_id="Amy",
+            params=PollyTTSService.InputParams(engine="neural", language="en-GB", rate="1.05"),
+        )
+
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
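Both sides of this diff register callbacks with the `@transport.event_handler("...")` decorator. Only the event names change: `on_client_connected`, `on_client_disconnected` and `on_client_closed` on the SmallWebRTC side, and `on_first_participant_joined` and `on_participant_left` on the Daily side. The sketch below shows the general registry-decorator pattern under stated assumptions. `EventEmitter` and `emit` are hypothetical names, handlers are assumed to be coroutines that receive the emitter as their first argument, and events with no registered handler are ignored. This is not pipecat's implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[..., Awaitable[None]]


class EventEmitter:
    """Hypothetical stand-in for a transport's event-handler registry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def event_handler(self, event_name: str) -> Callable[[Handler], Handler]:
        # Used as @emitter.event_handler("name"); stores the coroutine function.
        def decorator(handler: Handler) -> Handler:
            self._handlers.setdefault(event_name, []).append(handler)
            return handler

        return decorator

    async def emit(self, event_name: str, *args: Any) -> None:
        # Await every handler registered for this event, passing the emitter first.
        for handler in self._handlers.get(event_name, []):
            await handler(self, *args)


transport = EventEmitter()
log: List[str] = []


@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    log.append(f"joined:{participant['id']}")


@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    log.append(f"left:{participant['id']}:{reason}")


async def main() -> None:
    await transport.emit("on_first_participant_joined", {"id": "abc"})
    await transport.emit("on_participant_left", {"id": "abc"}, "left")
    await transport.emit("on_unknown_event")  # no handlers registered: ignored


asyncio.run(main())
```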
@@ -4,108 +4,104 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.google.llm import GoogleLLMService
-from pipecat.services.google.stt import GoogleSTTService
-from pipecat.services.google.tts import GoogleTTSService
+from pipecat.services.google import GoogleLLMService, GoogleSTTService, GoogleTTSService
 from pipecat.transcriptions.language import Language
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US),
-        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
-    )
-
-    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
-        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
-    )
-
-    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User respones
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = GoogleSTTService(
+            params=GoogleSTTService.InputParams(languages=Language.EN_US),
+        )
+
+        tts = GoogleTTSService(
+            voice_id="en-US-Journey-F",
+            params=GoogleTTSService.InputParams(language=Language.EN_US),
+        )
+
+        llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User respones
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,105 +4,105 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.assemblyai.stt import AssemblyAISTTService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.assemblyai import AssemblyAISTTService
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = AssemblyAISTTService(
-        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
-    )
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )
+
+        stt = AssemblyAISTTService(
+            api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        )
+
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
+
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
@@ -4,102 +4,100 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.filters.krisp_filter import KrispFilter
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.deepgram.tts import DeepgramTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+from pipecat.vad.silero import SileroVADAnalyzer
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-            audio_in_filter=KrispFilter(),
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
-
-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
-
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+                audio_in_filter=KrispFilter(),
+            ),
+        )
+
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+        tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
+
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
|
context_aggregator.user(), # User responses
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
llm, # LLM
|
||||||
|
tts, # TTS
|
||||||
|
transport.output(), # Transport bot output
|
||||||
|
context_aggregator.assistant(), # Assistant spoken responses
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
allow_interruptions=True,
|
||||||
|
enable_metrics=True,
|
||||||
|
enable_usage_metrics=True,
|
||||||
|
report_only_initial_ttfb=True,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
@transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_closed(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client closed connection")
|
# Kick off the conversation.
|
||||||
await task.cancel()
|
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
||||||
|
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
@transport.event_handler("on_participant_left")
|
||||||
|
async def on_participant_left(transport, participant, reason):
|
||||||
|
await task.cancel()
|
||||||
|
|
||||||
await runner.run(task)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
|
await runner.run(task)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||

```diff
@@ -4,103 +4,99 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.rime.tts import RimeTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.openai import OpenAILLMService
+from pipecat.services.rime import RimeTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )

-    tts = RimeTTSService(
-        api_key=os.getenv("RIME_API_KEY", ""),
-        voice_id="rex",
-    )
+        tts = RimeTTSService(
+            api_key=os.getenv("RIME_API_KEY", ""),
+            voice_id="rex",
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    runner = PipelineRunner(handle_sigint=False)
+        runner = PipelineRunner()

-    await runner.run(task)
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```

```diff
@@ -4,100 +4,92 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.nim.llm import NimLLMService
-from pipecat.services.riva.stt import ParakeetSTTService
-from pipecat.services.riva.tts import FastPitchTTSService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.nim import NimLLMService
+from pipecat.services.riva import FastPitchTTSService, ParakeetSTTService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            None,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )

-    stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))
+        stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

-    llm = NimLLMService(api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct")
+        llm = NimLLMService(
+            api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
+        )

-    tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
+        tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    runner = PipelineRunner(handle_sigint=False)
+        runner = PipelineRunner()

-    await runner.run(task)
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```

```diff
@@ -4,12 +4,16 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys
 from dataclasses import dataclass

+import aiohttp
 import google.ai.generativelanguage as glm
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import (
@@ -28,15 +32,14 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.frame_processor import FrameProcessor
-from pipecat.services.google.llm import GoogleLLMService
-from pipecat.services.google.tts import GoogleTTSService
-from pipecat.transcriptions.language import Language
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.google import GoogleLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

 marker = "|----|"
 system_message = f"""
@@ -190,92 +193,85 @@ class TanscriptionContextFixup(FrameProcessor):
         await self.push_frame(frame, direction)


-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            # No transcription at all. just audio input to Gemini!
-            # transcription_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                # No transcription at all. just audio input to Gemini!
+                # transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+                vad_audio_passthrough=True,
+            ),
+        )

-    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )

-    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
-        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
-    )
+        llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.0-flash-001")

-    messages = [
-        {
-            "role": "system",
-            "content": system_message,
-        },
-        {
-            "role": "user",
-            "content": "Start by saying hello.",
-        },
-    ]
+        messages = [
+            {
+                "role": "system",
+                "content": system_message,
+            },
+            {
+                "role": "user",
+                "content": "Start by saying hello.",
+            },
+        ]

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-    audio_collector = UserAudioCollector(context, context_aggregator.user())
-    pull_transcript_out_of_llm_output = TranscriptExtractor(context)
-    fixup_context_messages = TanscriptionContextFixup(context)
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+        audio_collector = UserAudioCollector(context, context_aggregator.user())
+        pull_transcript_out_of_llm_output = TranscriptExtractor(context)
+        fixup_context_messages = TanscriptionContextFixup(context)

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            audio_collector,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            pull_transcript_out_of_llm_output,
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-            fixup_context_messages,
-        ]
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                audio_collector,
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                pull_transcript_out_of_llm_output,
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+                fixup_context_messages,
+            ]
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+        )

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    runner = PipelineRunner(handle_sigint=False)
+        runner = PipelineRunner()

-    await runner.run(task)
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```

```diff
@@ -4,103 +4,99 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

+import asyncio
 import os
+import sys

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure

 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.fish.tts import FishAudioTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.fish import FishAudioTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport

 load_dotenv(override=True)

+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")

-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")

-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
+
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )

-    tts = FishAudioTTSService(
-        api_key=os.getenv("FISH_API_KEY"),
-        model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
-    )
+        tts = FishAudioTTSService(
+            api_key=os.getenv("FISH_API_KEY"),
+            model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
+        ]

-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
-        ]
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            await task.queue_frames([context_aggregator.user().get_context_frame()])

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_participant_left")
+        async def on_participant_left(transport, participant, reason):
+            await task.cancel()

-    runner = PipelineRunner(handle_sigint=False)
+        runner = PipelineRunner()

-    await runner.run(task)
+        await runner.run(task)


 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
@@ -1,95 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.audio.vad.vad_analyzer import VADParams
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
|
||||||
from pipecat.services.ultravox.stt import UltravoxSTTService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
# NOTE: This example requires GPU resources to run efficiently.
|
|
||||||
# The Ultravox model is compute-intensive and performs best with GPU acceleration.
|
|
||||||
# This can be deployed on cloud GPU providers like Cerebrium.ai for optimal performance.
|
|
||||||
|
|
||||||
|
|
||||||
# Want to initialize the ultravox processor since it takes time to load the model and dont
|
|
||||||
# want to load it every time the pipeline is run
|
|
||||||
ultravox_processor = UltravoxSTTService(
|
|
||||||
model_name="fixie-ai/ultravox-v0_5-llama-3_1-8b",
|
|
||||||
hf_token=os.getenv("HF_TOKEN"),
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
|
||||||
api_key=os.environ.get("CARTESIA_API_KEY"),
|
|
||||||
voice_id="97f4b8fb-f2fe-444b-bb9a-c109783a857a",
|
|
||||||
)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
ultravox_processor,
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
@@ -1,106 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
|
||||||
from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
tts = NeuphonicHttpTTSService(
|
|
||||||
api_key=os.getenv("NEUPHONIC_API_KEY"),
|
|
||||||
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
@@ -1,106 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
|
||||||
from pipecat.services.neuphonic.tts import NeuphonicTTSService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
tts = NeuphonicTTSService(
|
|
||||||
api_key=os.getenv("NEUPHONIC_API_KEY"),
|
|
||||||
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
@@ -1,108 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
|
||||||
from pipecat.services.fal.stt import FalSTTService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
|
||||||
logger.info(f"Starting bot")
|
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
|
||||||
webrtc_connection=webrtc_connection,
|
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = FalSTTService(
|
|
||||||
api_key=os.getenv("FAL_KEY"),
|
|
||||||
)
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt, # STT
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
# Kick off the conversation.
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
from run import main
|
|
||||||
|
|
||||||
main()
|
|
||||||
@@ -1,91 +0,0 @@
|
|||||||
#
|
|
||||||
# Copyright (c) 2024–2025, Daily
|
|
||||||
#
|
|
||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
|
||||||
#
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
import os
|
|
||||||
import sys
|
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
|
||||||
from loguru import logger
|
|
||||||
|
|
||||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
|
||||||
from pipecat.pipeline.runner import PipelineRunner
|
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
|
||||||
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
|
|
||||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
|
||||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
|
||||||
from pipecat.services.openai.llm import OpenAILLMService
|
|
||||||
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
|
||||||
|
|
||||||
logger.remove(0)
|
|
||||||
logger.add(sys.stderr, level="DEBUG")
|
|
||||||
|
|
||||||
|
|
||||||
async def main():
|
|
||||||
transport = LocalAudioTransport(
|
|
||||||
LocalAudioTransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
vad_enabled=True,
|
|
||||||
vad_analyzer=SileroVADAnalyzer(),
|
|
||||||
vad_audio_passthrough=True,
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
|
|
||||||
|
|
||||||
tts = CartesiaTTSService(
|
|
||||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
|
||||||
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
|
||||||
)
|
|
||||||
|
|
||||||
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
|
|
||||||
|
|
||||||
messages = [
|
|
||||||
{
|
|
||||||
"role": "system",
|
|
||||||
"content": "You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
|
||||||
},
|
|
||||||
]
|
|
||||||
|
|
||||||
context = OpenAILLMContext(messages)
|
|
||||||
context_aggregator = llm.create_context_aggregator(context)
|
|
||||||
|
|
||||||
pipeline = Pipeline(
|
|
||||||
[
|
|
||||||
transport.input(), # Transport user input
|
|
||||||
stt,
|
|
||||||
context_aggregator.user(), # User responses
|
|
||||||
llm, # LLM
|
|
||||||
tts, # TTS
|
|
||||||
transport.output(), # Transport bot output
|
|
||||||
context_aggregator.assistant(), # Assistant spoken responses
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
task = PipelineTask(
|
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(
|
|
||||||
allow_interruptions=True,
|
|
||||||
enable_metrics=True,
|
|
||||||
enable_usage_metrics=True,
|
|
||||||
report_only_initial_ttfb=True,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
|
|
||||||
await task.queue_frames([context_aggregator.user().get_context_frame()])
|
|
||||||
|
|
||||||
runner = PipelineRunner()
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
asyncio.run(main())
|
|
||||||
@@ -4,8 +4,8 @@ import os
|
|||||||
from typing import Tuple
|
from typing import Tuple
|
||||||
|
|
||||||
import aiohttp
|
import aiohttp
|
||||||
from daily_runner import configure
|
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
|
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
|
||||||
from pipecat.pipeline.pipeline import Pipeline
|
from pipecat.pipeline.pipeline import Pipeline
|
||||||
@@ -72,8 +72,7 @@ async def main():
|
|||||||
|
|
||||||
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
|
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
|
||||||
"""This function streams text from the LLM and uses the TTS service to convert
|
"""This function streams text from the LLM and uses the TTS service to convert
|
||||||
that text to speech as it's received.
|
that text to speech as it's received."""
|
||||||
"""
|
|
||||||
source_queue = asyncio.Queue()
|
source_queue = asyncio.Queue()
|
||||||
sink_queue = asyncio.Queue()
|
sink_queue = asyncio.Queue()
|
||||||
sentence_aggregator = SentenceAggregator()
|
sentence_aggregator = SentenceAggregator()
|
||||||
|
|||||||
@@ -4,9 +4,13 @@
|
|||||||
# SPDX-License-Identifier: BSD 2-Clause License
|
# SPDX-License-Identifier: BSD 2-Clause License
|
||||||
#
|
#
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import sys
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.frames.frames import (
|
from pipecat.frames.frames import (
|
||||||
Frame,
|
Frame,
|
||||||
@@ -19,12 +23,13 @@ from pipecat.pipeline.pipeline import Pipeline
|
|||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||||
from pipecat.transports.base_transport import TransportParams
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
|
|
||||||
class MirrorProcessor(FrameProcessor):
|
class MirrorProcessor(FrameProcessor):
|
||||||
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
||||||
@@ -39,7 +44,6 @@ class MirrorProcessor(FrameProcessor):
|
|||||||
)
|
)
|
||||||
)
|
)
|
||||||
elif isinstance(frame, InputImageRawFrame):
|
elif isinstance(frame, InputImageRawFrame):
|
||||||
print(f"Received image frame: {frame.size} {frame.format}")
|
|
||||||
await self.push_frame(
|
await self.push_frame(
|
||||||
OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
|
OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
|
||||||
)
|
)
|
||||||
@@ -47,48 +51,42 @@ class MirrorProcessor(FrameProcessor):
|
|||||||
await self.push_frame(frame, direction)
|
await self.push_frame(frame, direction)
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
async def main():
|
||||||
logger.info(f"Starting bot")
|
async with aiohttp.ClientSession() as session:
|
||||||
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
transport = SmallWebRTCTransport(
|
transport = DailyTransport(
|
||||||
webrtc_connection=webrtc_connection,
|
room_url,
|
||||||
params=TransportParams(
|
token,
|
||||||
audio_in_enabled=True,
|
"Test",
|
||||||
audio_out_enabled=True,
|
DailyParams(
|
||||||
camera_in_enabled=True,
|
audio_in_enabled=True,
|
||||||
camera_out_enabled=True,
|
audio_out_enabled=True,
|
||||||
camera_out_is_live=True,
|
camera_out_enabled=True,
|
||||||
camera_out_width=1280,
|
camera_out_is_live=True,
|
||||||
camera_out_height=720,
|
camera_out_width=1280,
|
||||||
),
|
camera_out_height=720,
|
||||||
)
|
),
|
||||||
|
)
|
||||||
|
|
||||||
pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()])
|
@transport.event_handler("on_first_participant_joined")
|
||||||
|
async def on_first_participant_joined(transport, participant):
|
||||||
|
await transport.capture_participant_video(participant["id"])
|
||||||
|
|
||||||
task = PipelineTask(
|
pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()])
|
||||||
pipeline,
|
|
||||||
params=PipelineParams(),
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
runner = PipelineRunner()
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
task = PipelineTask(
|
||||||
async def on_client_disconnected(transport, client):
|
pipeline,
|
||||||
logger.info(f"Client disconnected")
|
params=PipelineParams(
|
||||||
|
audio_in_sample_rate=24000,
|
||||||
|
audio_out_sample_rate=24000,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
await runner.run(task)
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -5,10 +5,13 @@
|
|||||||
#
|
#
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
import sys
|
||||||
import tkinter as tk
|
import tkinter as tk
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from loguru import logger
|
from loguru import logger
|
||||||
|
from runner import configure
|
||||||
|
|
||||||
from pipecat.frames.frames import (
|
from pipecat.frames.frames import (
|
||||||
Frame,
|
Frame,
|
||||||
@@ -21,13 +24,14 @@ from pipecat.pipeline.pipeline import Pipeline
|
|||||||
from pipecat.pipeline.runner import PipelineRunner
|
from pipecat.pipeline.runner import PipelineRunner
|
||||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||||
from pipecat.transports.base_transport import TransportParams
|
|
||||||
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
|
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
|
||||||
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
|
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||||
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
|
|
||||||
|
|
||||||
load_dotenv(override=True)
|
load_dotenv(override=True)
|
||||||
|
|
||||||
|
logger.remove(0)
|
||||||
|
logger.add(sys.stderr, level="DEBUG")
|
||||||
|
|
||||||
|
|
||||||
class MirrorProcessor(FrameProcessor):
|
class MirrorProcessor(FrameProcessor):
|
||||||
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
||||||
@@ -49,59 +53,52 @@ class MirrorProcessor(FrameProcessor):
|
|||||||
await self.push_frame(frame, direction)
|
await self.push_frame(frame, direction)
|
||||||
|
|
||||||
|
|
||||||
async def run_bot(webrtc_connection: SmallWebRTCConnection):
|
async def main():
|
||||||
logger.info(f"Starting bot")
|
async with aiohttp.ClientSession() as session:
|
||||||
|
(room_url, token) = await configure(session)
|
||||||
|
|
||||||
p2p_transport = SmallWebRTCTransport(
|
tk_root = tk.Tk()
|
||||||
webrtc_connection=webrtc_connection,
|
tk_root.title("Local Mirror")
|
||||||
params=TransportParams(
|
|
||||||
audio_in_enabled=True,
|
|
||||||
audio_out_enabled=True,
|
|
||||||
camera_in_enabled=True,
|
|
||||||
camera_out_enabled=True,
|
|
||||||
camera_out_is_live=True,
|
|
||||||
camera_out_width=1280,
|
|
||||||
camera_out_height=720,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
tk_root = tk.Tk()
|
daily_transport = DailyTransport(
|
||||||
tk_root.title("Local Mirror")
|
room_url, token, "Test", DailyParams(audio_in_enabled=True)
|
||||||
|
)
|
||||||
|
|
||||||
tk_transport = TkLocalTransport(
|
tk_transport = TkLocalTransport(
|
||||||
tk_root,
|
tk_root,
|
||||||
TkTransportParams(
|
TkTransportParams(
|
||||||
audio_out_enabled=True,
|
audio_out_enabled=True,
|
||||||
camera_out_enabled=True,
|
camera_out_enabled=True,
|
||||||
camera_out_is_live=True,
|
camera_out_is_live=True,
|
||||||
camera_out_width=1280,
|
camera_out_width=1280,
|
||||||
camera_out_height=720,
|
camera_out_height=720,
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
@p2p_transport.event_handler("on_client_connected")
|
@daily_transport.event_handler("on_first_participant_joined")
|
||||||
async def on_client_connected(transport, client):
|
async def on_first_participant_joined(transport, participant):
|
||||||
logger.info(f"Client connected")
|
await transport.capture_participant_video(participant["id"])
|
||||||
|
|
||||||
pipeline = Pipeline([p2p_transport.input(), MirrorProcessor(), tk_transport.output()])
|
pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])
|
||||||
|
|
||||||
task = PipelineTask(
|
task = PipelineTask(
|
||||||
pipeline,
|
pipeline,
|
||||||
params=PipelineParams(),
|
params=PipelineParams(
|
||||||
)
|
audio_in_sample_rate=24000,
|
||||||
|
audio_out_sample_rate=24000,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
async def run_tk():
|
async def run_tk():
|
||||||
while not task.has_finished():
|
while not task.has_finished():
|
||||||
tk_root.update()
|
tk_root.update()
|
||||||
tk_root.update_idletasks()
|
tk_root.update_idletasks()
|
||||||
await asyncio.sleep(0.1)
|
await asyncio.sleep(0.1)
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
runner = PipelineRunner()
|
||||||
|
|
||||||
await asyncio.gather(runner.run(task), run_tk())
|
await asyncio.gather(runner.run(task), run_tk())
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
@@ -4,99 +4,89 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import TTSSpeakFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
 from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Robot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
-        },
-    ]
-
-    hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            hey_robot_filter,  # Filter out speech not directed at the robot
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
+            },
         ]
-    )
 
-    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
+        hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frame(TTSSpeakFrame("Hi! If you want to talk to me, just say 'Hey Robot'"))
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                hey_robot_filter,  # Filter out speech not directed at the robot
+                context_aggregator.user(),  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses
+            ]
+        )
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
 
-    runner = PipelineRunner(handle_sigint=False)
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            await tts.say("Hi! If you want to talk to me, just say 'Hey Robot'.")
 
-    await runner.run(task)
+        runner = PipelineRunner()
 
+        await runner.run(task)
 
+
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,18 +4,21 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 import wave
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import (
     Frame,
     LLMFullResponseEndFrame,
     OutputAudioRawFrame,
-    TTSSpeakFrame,
 )
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -26,15 +29,15 @@ from pipecat.processors.aggregators.openai_llm_context import (
 )
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.processors.logger import FrameLogger
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 sounds = {}
 sound_files = ["ding1.wav", "ding2.wav"]
@@ -77,83 +80,70 @@ class InboundSoundEffectWrapper(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages)
-    context_aggregator = llm.create_context_aggregator(context)
-    out_sound = OutboundSoundEffectWrapper()
-    in_sound = InboundSoundEffectWrapper()
-    fl = FrameLogger("LLM Out")
-    fl2 = FrameLogger("Transcription In")
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            in_sound,
-            fl2,
-            llm,
-            fl,
-            tts,
-            out_sound,
-            transport.output(),
-            context_aggregator.assistant(),
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way.",
+            },
         ]
-    )
 
-    task = PipelineTask(pipeline)
+        context = OpenAILLMContext(messages)
+        context_aggregator = llm.create_context_aggregator(context)
+        out_sound = OutboundSoundEffectWrapper()
+        in_sound = InboundSoundEffectWrapper()
+        fl = FrameLogger("LLM Out")
+        fl2 = FrameLogger("Transcription In")
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frame(TTSSpeakFrame("Hi, I'm listening!"))
-        await transport.send_audio(sounds["ding1.wav"])
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                in_sound,
+                fl2,
+                llm,
+                fl,
+                tts,
+                out_sound,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            await tts.say("Hi, I'm listening!")
+            await transport.send_audio(sounds["ding1.wav"])
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        runner = PipelineRunner()
 
-    runner = PipelineRunner(handle_sigint=False)
+        task = PipelineTask(pipeline)
 
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,11 +4,15 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 from typing import Optional
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
@@ -18,15 +22,15 @@ from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
 from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.moondream.vision import MoondreamService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.moondream import MoondreamService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class UserImageRequester(FrameProcessor):
     def __init__(self, participant_id: Optional[str] = None):
@@ -46,81 +50,61 @@ class UserImageRequester(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    # Get WebRTC peer connection ID
-    webrtc_peer_id = webrtc_connection.pc_id
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    logger.info(f"Starting bot with peer_id: {webrtc_peer_id}")
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Describe participant video",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            camera_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        user_response = UserResponseAggregator()
 
-    user_response = UserResponseAggregator()
+        image_requester = UserImageRequester()
 
-    # Initialize the image requester without setting the participant ID yet
-    image_requester = UserImageRequester()
+        vision_aggregator = VisionImageFrameAggregator()
 
-    vision_aggregator = VisionImageFrameAggregator()
+        # If you run into weird description, try with use_cpu=True
+        moondream = MoondreamService()
 
-    # If you run into weird description, try with use_cpu=True
-    moondream = MoondreamService()
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await tts.say("Hi there! Feel free to ask me what I see.")
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            image_requester.set_participant_id(participant["id"])
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                user_response,
+                image_requester,
+                vision_aggregator,
+                moondream,
+                tts,
+                transport.output(),
+            ]
+        )
 
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            user_response,
-            image_requester,
-            vision_aggregator,
-            moondream,
-            tts,
-            transport.output(),
-        ]
-    )
+        task = PipelineTask(pipeline)
 
-    task = PipelineTask(pipeline)
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected: {client}")
-
-        # Welcome message
-        await tts.say("Hi there! Feel free to ask me what I see.")
-
-        # Set the participant ID in the image requester
-        image_requester.set_participant_id(webrtc_peer_id)
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,29 +4,33 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 from typing import Optional
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
 from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.google.llm import GoogleLLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.google import GoogleLLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class UserImageRequester(FrameProcessor):
     def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,61 @@ class UserImageRequester(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    # Get WebRTC peer connection ID
-    webrtc_peer_id = webrtc_connection.pc_id
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    logger.info(f"Starting bot with peer_id: {webrtc_peer_id}")
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Describe participant video",
+            DailyParams(
+                audio_in_enabled=True,  # This is so Silero VAD can get audio data
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            camera_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        user_response = UserResponseAggregator()
 
-    user_response = UserResponseAggregator()
+        image_requester = UserImageRequester()
 
-    # Initialize the image requester without setting the participant ID yet
-    image_requester = UserImageRequester()
+        vision_aggregator = VisionImageFrameAggregator()
 
-    vision_aggregator = VisionImageFrameAggregator()
+        google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    # Google Gemini model for vision analysis
-    google = GoogleLLMService(model="gemini-2.0-flash-001", api_key=os.getenv("GOOGLE_API_KEY"))
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await tts.say("Hi there! Feel free to ask me what I see.")
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            image_requester.set_participant_id(participant["id"])
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                user_response,
+                image_requester,
+                vision_aggregator,
+                google,
+                tts,
+                transport.output(),
+            ]
+        )
 
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            user_response,
-            image_requester,
-            vision_aggregator,
-            google,
-            tts,
-            transport.output(),
-        ]
-    )
+        task = PipelineTask(pipeline)
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(allow_interruptions=True),
-    )
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected: {client}")
-
-        # Welcome message
-        await tts.say("Hi there! Feel free to ask me what I see.")
-
-        # Set the participant ID in the image requester
-        image_requester.set_participant_id(webrtc_peer_id)
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())

@@ -4,29 +4,33 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 from typing import Optional
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
 from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class UserImageRequester(FrameProcessor):
     def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,60 @@ class UserImageRequester(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    # Get WebRTC peer connection ID
-    webrtc_peer_id = webrtc_connection.pc_id
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    logger.info(f"Starting bot with peer_id: {webrtc_peer_id}")
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Describe participant video",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            camera_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        user_response = UserResponseAggregator()
 
-    user_response = UserResponseAggregator()
+        image_requester = UserImageRequester()
 
-    # Initialize the image requester without setting the participant ID yet
-    image_requester = UserImageRequester()
+        vision_aggregator = VisionImageFrameAggregator()
 
-    vision_aggregator = VisionImageFrameAggregator()
+        openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    # OpenAI GPT-4o for vision analysis
-    openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await tts.say("Hi there! Feel free to ask me what I see.")
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            image_requester.set_participant_id(participant["id"])
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                user_response,
+                image_requester,
+                vision_aggregator,
+                openai,
+                tts,
+                transport.output(),
+            ]
+        )
 
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            user_response,
-            image_requester,
-            vision_aggregator,
-            openai,
-            tts,
-            transport.output(),
-        ]
-    )
+        task = PipelineTask(pipeline)
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(allow_interruptions=True),
+        runner = PipelineRunner()
|
|
||||||
)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_connected")
|
await runner.run(task)
|
||||||
async def on_client_connected(transport, client):
|
|
||||||
logger.info(f"Client connected: {client}")
|
|
||||||
|
|
||||||
# Welcome message
|
|
||||||
await tts.say("Hi there! Feel free to ask me what I see.")
|
|
||||||
|
|
||||||
# Set the participant ID in the image requester
|
|
||||||
image_requester.set_participant_id(webrtc_peer_id)
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_disconnected")
|
|
||||||
async def on_client_disconnected(transport, client):
|
|
||||||
logger.info(f"Client disconnected")
|
|
||||||
|
|
||||||
@transport.event_handler("on_client_closed")
|
|
||||||
async def on_client_closed(transport, client):
|
|
||||||
logger.info(f"Client closed connection")
|
|
||||||
await task.cancel()
|
|
||||||
|
|
||||||
runner = PipelineRunner(handle_sigint=False)
|
|
||||||
|
|
||||||
await runner.run(task)
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
from run import main
|
asyncio.run(main())
|
||||||
|
|
||||||
main()
|
|
||||||
|
|||||||
```diff
@@ -4,29 +4,33 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 from typing import Optional
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TextFrame, UserImageRequestFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.user_response import UserResponseAggregator
 from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.anthropic.llm import AnthropicLLMService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.anthropic import AnthropicLLMService
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class UserImageRequester(FrameProcessor):
     def __init__(self, participant_id: Optional[str] = None):
@@ -46,84 +50,60 @@ class UserImageRequester(FrameProcessor):
         await self.push_frame(frame, direction)
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    # Get WebRTC peer connection ID
-    webrtc_peer_id = webrtc_connection.pc_id
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    logger.info(f"Starting bot with peer_id: {webrtc_peer_id}")
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Describe participant video",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            camera_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        user_response = UserResponseAggregator()
 
-    user_response = UserResponseAggregator()
+        image_requester = UserImageRequester()
 
-    # Initialize the image requester without setting the participant ID yet
-    image_requester = UserImageRequester()
+        vision_aggregator = VisionImageFrameAggregator()
 
-    vision_aggregator = VisionImageFrameAggregator()
+        anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    # Anthropic for vision analysis
-    anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await tts.say("Hi there! Feel free to ask me what I see.")
+            await transport.capture_participant_video(participant["id"], framerate=0)
+            await transport.capture_participant_transcription(participant["id"])
+            image_requester.set_participant_id(participant["id"])
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                user_response,
+                image_requester,
+                vision_aggregator,
+                anthropic,
+                tts,
+                transport.output(),
+            ]
+        )
 
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            user_response,
-            image_requester,
-            vision_aggregator,
-            anthropic,
-            tts,
-            transport.output(),
-        ]
-    )
+        task = PipelineTask(pipeline)
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(allow_interruptions=True),
-    )
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected: {client}")
-
-        # Welcome message
-        await tts.say("Hi there! Feel free to ask me what I see.")
-
-        # Set the participant ID in the image requester
-        image_requester.set_participant_id(webrtc_peer_id)
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
```diff
@@ -4,23 +4,27 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.whisper.stt import WhisperSTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.whisper import WhisperSTTService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class TranscriptionLogger(FrameProcessor):
     async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,42 +34,26 @@ class TranscriptionLogger(FrameProcessor):
             print(f"Transcription: {frame.text}")
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
 
-    stt = WhisperSTTService()
+        stt = WhisperSTTService()
 
-    tl = TranscriptionLogger()
+        tl = TranscriptionLogger()
 
-    pipeline = Pipeline([transport.input(), stt, tl])
+        pipeline = Pipeline([transport.input(), stt, tl])
 
-    task = PipelineTask(pipeline)
+        task = PipelineTask(pipeline)
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
```diff
@@ -10,13 +10,12 @@ import sys
 from dotenv import load_dotenv
 from loguru import logger
 
-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.whisper.stt import WhisperSTTService
+from pipecat.services.whisper import WhisperSTTService
 from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
 
 load_dotenv(override=True)
@@ -34,14 +33,7 @@ class TranscriptionLogger(FrameProcessor):
 
 
 async def main():
-    transport = LocalAudioTransport(
-        LocalAudioTransportParams(
-            audio_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        )
-    )
+    transport = LocalAudioTransport(LocalAudioTransportParams(audio_in_enabled=True))
 
     stt = WhisperSTTService()
 
```
```diff
@@ -4,24 +4,28 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import Frame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.deepgram.stt import DeepgramSTTService, Language, LiveOptions
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.deepgram import DeepgramSTTService, Language, LiveOptions
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
 
 class TranscriptionLogger(FrameProcessor):
     async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -31,40 +35,29 @@ class TranscriptionLogger(FrameProcessor):
             print(f"Transcription: {frame.text}")
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(audio_in_enabled=True),
-    )
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
 
-    stt = DeepgramSTTService(
-        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        live_options=LiveOptions(language=Language.EN),
-    )
+        stt = DeepgramSTTService(
+            api_key=os.getenv("DEEPGRAM_API_KEY"),
+            # live_options=LiveOptions(language=Language.FR),
+        )
 
-    tl = TranscriptionLogger()
+        tl = TranscriptionLogger()
 
-    pipeline = Pipeline([transport.input(), stt, tl])
+        pipeline = Pipeline([transport.input(), stt, tl])
 
-    task = PipelineTask(pipeline)
+        task = PipelineTask(pipeline)
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
```diff
@@ -4,10 +4,14 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import Frame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
@@ -15,12 +19,13 @@ from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 from pipecat.services.gladia import GladiaSTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class TranscriptionLogger(FrameProcessor):
     async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,40 +35,29 @@ class TranscriptionLogger(FrameProcessor):
             print(f"Transcription: {frame.text}")
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(audio_in_enabled=True),
-    )
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
 
-    stt = GladiaSTTService(
-        api_key=os.getenv("GLADIA_API_KEY"),
-        # live_options=LiveOptions(language=Language.FR),
-    )
+        stt = GladiaSTTService(
+            api_key=os.getenv("GLADIA_API_KEY"),
+            # live_options=LiveOptions(language=Language.FR),
+        )
 
-    tl = TranscriptionLogger()
+        tl = TranscriptionLogger()
 
-    pipeline = Pipeline([transport.input(), stt, tl])
+        pipeline = Pipeline([transport.input(), stt, tl])
 
-    task = PipelineTask(pipeline)
+        task = PipelineTask(pipeline)
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
```diff
@@ -4,23 +4,28 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
 from pipecat.frames.frames import Frame, TranscriptionFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.assemblyai.stt import AssemblyAISTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.assemblyai import AssemblyAISTTService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
 
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
 
 class TranscriptionLogger(FrameProcessor):
     async def process_frame(self, frame: Frame, direction: FrameDirection):
@@ -30,39 +35,28 @@ class TranscriptionLogger(FrameProcessor):
             print(f"Transcription: {frame.text}")
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, _) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(audio_in_enabled=True),
-    )
+        transport = DailyTransport(
+            room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
+        )
 
-    stt = AssemblyAISTTService(
-        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
-    )
+        stt = AssemblyAISTTService(
+            api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        )
 
-    tl = TranscriptionLogger()
+        tl = TranscriptionLogger()
 
-    pipeline = Pipeline([transport.input(), stt, tl])
+        pipeline = Pipeline([transport.input(), stt, tl])
 
-    task = PipelineTask(pipeline)
+        task = PipelineTask(pipeline)
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        runner = PipelineRunner()
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
```
```diff
@@ -1,99 +0,0 @@
-#
-# Copyright (c) 2024–2025, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import time
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.frames.frames import Frame, TranscriptionFrame, UserStoppedSpeakingFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
-
-load_dotenv(override=True)
-
-
-STOP_SECS = 2.0
-
-
-class TranscriptionLogger(FrameProcessor):
-    """Measures transcription latency.
-
-    Uses the (intentionally) long STOP_SECS parameter to give the transcription time to finish,
-    then outputs the timing between when the VAD first classified audio input as not-speech and
-    the delivery of the last transcription frame.
-    """
-
-    def __init__(self):
-        super().__init__()
-        self._last_transcription_time = time.time()
-
-    async def process_frame(self, frame: Frame, direction: FrameDirection):
-        await super().process_frame(frame, direction)
-
-        if isinstance(frame, UserStoppedSpeakingFrame):
-            logger.debug(
-                f"Transcription latency: {(STOP_SECS - (time.time() - self._last_transcription_time)):.2f}"
-            )
-
-        if isinstance(frame, TranscriptionFrame):
-            self._last_transcription_time = time.time()
-
-
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
-
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=STOP_SECS)),
-            vad_audio_passthrough=True,
-        ),
-    )
-
-    stt = WhisperSTTServiceMLX(model=MLXModel.LARGE_V3_TURBO)
-
-    tl = TranscriptionLogger()
-
-    pipeline = Pipeline([transport.input(), stt, tl])
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            report_only_initial_ttfb=False,
-        ),
-    )
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=False)
-
-    await runner.run(task)
-
-
-if __name__ == "__main__":
-    from run import main
-
-    main()
```
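The deleted `TranscriptionLogger` logs `STOP_SECS - (time.time() - self._last_transcription_time)` when a `UserStoppedSpeakingFrame` arrives. The sketch below shows why that expression measures latency. It assumes the VAD emits the stop frame exactly `stop_secs` after it first classifies audio as not-speech. Under that assumption, silence began at `stopped_at - stop_secs`, and latency is the gap from that point to the last transcription. `transcription_latency` is a hypothetical helper written for illustration, not part of pipecat.

```python
STOP_SECS = 2.0  # same value the deleted example passes to VADParams(stop_secs=...)


def transcription_latency(
    stopped_at: float, last_transcription_at: float, stop_secs: float = STOP_SECS
) -> float:
    """Seconds from the VAD's first not-speech classification to the last transcription.

    Assumption (hypothetical helper): the stop frame is emitted exactly `stop_secs`
    after silence begins, so silence onset is `stopped_at - stop_secs`.
    """
    silence_onset = stopped_at - stop_secs
    # Algebraically identical to: stop_secs - (stopped_at - last_transcription_at)
    return last_transcription_at - silence_onset


# Stop frame at t=10.0s, last transcription at t=8.5s: silence began at t=8.0s.
print(transcription_latency(10.0, 8.5))  # 0.5
```

A negative result would mean the last transcription arrived before the VAD detected silence. This is one reason the example deliberately uses a long `STOP_SECS`.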
@@ -4,132 +4,132 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from openai.types.chat import ChatCompletionToolParam
+from runner import configure
 
-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import TTSSpeakFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
+
+
+async def start_fetch_weather(function_name, llm, context):
+    """Push a frame to the LLM; this is handy when the LLM response might take a while."""
+    await llm.push_frame(TTSSpeakFrame("Let me check on that."))
+    logger.debug(f"Starting fetch_weather_from_api with function_name: {function_name}")
 
 
 async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
-    await llm.push_frame(TTSSpeakFrame("Let me check on that."))
     await result_callback({"conditions": "nice", "temperature": "75"})
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+        # Register a function_name of None to get all functions
+        # sent to the same callback with an additional function_name parameter.
+        llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
 
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    # You can also register a function_name of None to get all functions
-    # sent to the same callback with an additional function_name parameter.
-    llm.register_function("get_current_weather", fetch_weather_from_api)
-
-    weather_function = FunctionSchema(
-        name="get_current_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-            "format": {
-                "type": "string",
-                "enum": ["celsius", "fahrenheit"],
-                "description": "The temperature unit to use. Infer this from the user's location.",
-            },
-        },
-        required=["location", "format"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),
-            stt,
-            context_aggregator.user(),
-            llm,
-            tts,
-            transport.output(),
-            context_aggregator.assistant(),
+        tools = [
+            ChatCompletionToolParam(
+                type="function",
+                function={
+                    "name": "get_current_weather",
+                    "description": "Get the current weather",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "location": {
+                                "type": "string",
+                                "description": "The city and state, e.g. San Francisco, CA",
+                            },
+                            "format": {
+                                "type": "string",
+                                "enum": ["celsius", "fahrenheit"],
+                                "description": "The temperature unit to use. Infer this from the users location.",
+                            },
+                        },
+                        "required": ["location", "format"],
+                    },
+                },
+            )
+        ]
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
+            },
         ]
-    )
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-            enable_usage_metrics=True,
-            report_only_initial_ttfb=True,
-        ),
-    )
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        pipeline = Pipeline(
+            [
+                transport.input(),
+                context_aggregator.user(),
+                llm,
+                tts,
+                transport.output(),
+                context_aggregator.assistant(),
+            ]
+        )
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+                enable_usage_metrics=True,
+                report_only_initial_ttfb=True,
+            ),
+        )
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
 
-    runner = PipelineRunner(handle_sigint=False)
+        runner = PipelineRunner()
 
-    await runner.run(task)
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
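Both sides of the `@@ -4,132 +4,132 @@` hunk declare the same `get_current_weather` tool. The left side uses pipecat's `FunctionSchema` and `ToolsSchema`, while the right side writes out an OpenAI-style `ChatCompletionToolParam` by hand.

The sketch below shows how the two shapes line up. The `to_openai_tool` helper is hypothetical: it is plain Python, not pipecat's own adapter code. It nests a name, a description, a property map and a required list into the layout the right-hand side spells out literally.

```python
from typing import Any


def to_openai_tool(
    name: str, description: str, properties: dict[str, Any], required: list[str]
) -> dict[str, Any]:
    """Hypothetical helper: wrap a FunctionSchema-style description in the
    OpenAI chat-completions tool layout (type/function/parameters)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # The parameters block is a JSON Schema object.
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


weather_properties = {
    "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA",
    },
    "format": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "The temperature unit to use. Infer this from the user's location.",
    },
}

tool = to_openai_tool(
    "get_current_weather",
    "Get the current weather",
    weather_properties,
    ["location", "format"],
)
print(tool["function"]["parameters"]["required"])  # prints ['location', 'format']
```

The property map itself is provider-neutral JSON Schema. Only the outer wrapping changes from one LLM API's tool format to another.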
@@ -4,125 +4,119 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #
 
+import asyncio
 import os
+import sys
 
+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger
+from runner import configure
 
-from pipecat.adapters.schemas.function_schema import FunctionSchema
-from pipecat.adapters.schemas.tools_schema import ToolsSchema
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
 from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
-from pipecat.services.anthropic.llm import AnthropicLLMService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.transports.base_transport import TransportParams
-from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
-from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
+from pipecat.services.anthropic import AnthropicLLMService
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
 
 load_dotenv(override=True)
+
+logger.remove(0)
+logger.add(sys.stderr, level="DEBUG")
 
 
 async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
     location = arguments["location"]
     await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
 
 
-async def run_bot(webrtc_connection: SmallWebRTCConnection):
-    logger.info(f"Starting bot")
+async def main():
+    async with aiohttp.ClientSession() as session:
+        (room_url, token) = await configure(session)
 
-    transport = SmallWebRTCTransport(
-        webrtc_connection=webrtc_connection,
-        params=TransportParams(
-            audio_in_enabled=True,
-            audio_out_enabled=True,
-            vad_enabled=True,
-            vad_analyzer=SileroVADAnalyzer(),
-            vad_audio_passthrough=True,
-        ),
-    )
+        transport = DailyTransport(
+            room_url,
+            token,
+            "Respond bot",
+            DailyParams(
+                audio_out_enabled=True,
+                transcription_enabled=True,
+                vad_enabled=True,
+                vad_analyzer=SileroVADAnalyzer(),
+            ),
+        )
 
-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
+        )
 
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        llm = AnthropicLLMService(
+            api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-5-sonnet-20240620"
+        )
+        llm.register_function("get_weather", get_weather)
 
-    llm = AnthropicLLMService(
-        api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-7-sonnet-latest"
-    )
-    llm.register_function("get_weather", get_weather)
-
-    weather_function = FunctionSchema(
-        name="get_weather",
-        description="Get the current weather",
-        properties={
-            "location": {
-                "type": "string",
-                "description": "The city and state, e.g. San Francisco, CA",
-            },
-        },
-        required=["location"],
-    )
-    tools = ToolsSchema(standard_tools=[weather_function])
-
-    # todo: test with very short initial user message
-
-    # messages = [{"role": "system",
-    # "content": "You are a helpful assistant who can report the weather in any location in the universe. Respond concisely. Your response will be turned into speech so use only simple words and punctuation."},
-    # {"role": "user",
-    # "content": " Start the conversation by introducing yourself."}]
-
-    messages = [{"role": "user", "content": "Say 'hello' to start the conversation."}]
-
-    context = OpenAILLMContext(messages, tools)
-    context_aggregator = llm.create_context_aggregator(context)
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            context_aggregator.user(),  # User spoken responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses and tool context
+        tools = [
+            {
+                "name": "get_weather",
+                "description": "Get the current weather in a given location",
+                "input_schema": {
+                    "type": "object",
+                    "properties": {
+                        "location": {
+                            "type": "string",
+                            "description": "The city and state, e.g. San Francisco, CA",
+                        }
+                    },
+                    "required": ["location"],
+                },
+            }
         ]
-    )
 
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            allow_interruptions=True,
-            enable_metrics=True,
-        ),
-    )
+        # todo: test with very short initial user message
 
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        await task.queue_frames([context_aggregator.user().get_context_frame()])
+        # messages = [{"role": "system",
+        # "content": "You are a helpful assistant who can report the weather in any location in the universe. Respond concisely. Your response will be turned into speech so use only simple words and punctuation."},
+        # {"role": "user",
+        # "content": " Start the conversation by introducing yourself."}]
 
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
+        messages = [{"role": "user", "content": "Say 'hello' to start the conversation."}]
 
-    @transport.event_handler("on_client_closed")
-    async def on_client_closed(transport, client):
-        logger.info(f"Client closed connection")
-        await task.cancel()
+        context = OpenAILLMContext(messages, tools)
+        context_aggregator = llm.create_context_aggregator(context)
 
-    runner = PipelineRunner(handle_sigint=False)
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                context_aggregator.user(),  # User spoken responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                context_aggregator.assistant(),  # Assistant spoken responses and tool context
+            ]
+        )
 
-    await runner.run(task)
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                allow_interruptions=True,
+                enable_metrics=True,
+            ),
+        )
+
+        @transport.event_handler("on_first_participant_joined")
+        async def on_first_participant_joined(transport, participant):
+            await transport.capture_participant_transcription(participant["id"])
+            # Kick off the conversation.
+            await task.queue_frames([context_aggregator.user().get_context_frame()])
+
+        runner = PipelineRunner()
+
+        await runner.run(task)
 
 
 if __name__ == "__main__":
-    from run import main
-
-    main()
+    asyncio.run(main())
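In both versions of this file, `get_weather` does not return its answer. Instead, it hands the answer to `result_callback`. One way to exercise that contract in isolation is to call the handler directly with a stub callback, as sketched below.

The `collect` stub, the `"call_1"` id and the `None` placeholders for `llm` and `context` are illustrative assumptions. This is not how pipecat itself invokes registered functions.

```python
import asyncio
from typing import Any


async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
    # Same body as the handler in the diff above.
    location = arguments["location"]
    await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")


async def demo() -> list[Any]:
    results: list[Any] = []

    async def collect(result: Any) -> None:
        # Stub callback: in a real pipeline the result would go back into the LLM context.
        results.append(result)

    # This handler never touches llm or context, so None stands in for both here.
    await get_weather("get_weather", "call_1", {"location": "San Francisco, CA"}, None, None, collect)
    return results


print(asyncio.run(demo()))
```

Because the result travels through a callback rather than a return value, a handler can report its result at whatever point it is ready. The `start_callback` hook registered in the OpenAI example plays a related role: it lets the bot say something before the result arrives.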