Compare commits

..

4 Commits

Author SHA1 Message Date
James Hush
97c7820372 Fix copyright 2026-03-05 11:17:02 +08:00
James Hush
7d957292e0 Rename to 57, extract content check into helper method 2026-03-05 11:14:36 +08:00
James Hush
218ab01070 Use ParallelPipeline for content filter in example 07
Run the content filter concurrently with LLM text generation
using ParallelPipeline, with a ContentFilterGate that blocks
output until the filter approves or rejects the content.
2026-03-05 11:08:56 +08:00
James Hush
9dbd923cfc Make separate file 2026-03-05 10:41:53 +08:00
550 changed files with 14481 additions and 31974 deletions

View File

@@ -32,20 +32,6 @@ Create changelog files for the important commits in this PR. The PR number is pr
6. Use ⚠️ emoji prefix for breaking changes.
7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
**Good** (user-facing first, implementation detail as context):
```
- Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
```
**Bad** (implementation detail only, no user-facing framing):
```
- Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
```
Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
## Example
For PR #3519 with a new feature and a bug fix:
@@ -57,5 +43,5 @@ For PR #3519 with a new feature and a bug fix:
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
- Fixed an issue where something was not working correctly.
```

View File

@@ -7,883 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.108] - 2026-03-27
### Added
- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`,
`sarvam-105b` and `sarvam-105b-32k`.
(PR [#3978](https://github.com/pipecat-ai/pipecat/pull/3978))
- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override
this to perform provider-specific setup (e.g. eagerly opening a server-side
context) before text starts flowing. Called each time a new turn context ID
is created.
(PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))
- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.
(PR [#4031](https://github.com/pipecat-ai/pipecat/pull/4031))
- Added support for "developer" role messages in conversation context across
all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock),
"developer" messages are converted to "user" messages (use
`system_instruction` to set the system instruction). For OpenAI services,
"developer" messages pass through in conversation history. For the Responses
API, they are kept as "developer" role (matching the existing "system" →
"developer" conversion).
(PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with
Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with
configurable voice, language, speed, consistency, similarity, and enhancement
settings.
(PR [#4092](https://github.com/pipecat-ai/pipecat/pull/4092))
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs
from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`,
which collapses the STT wait timeout to 0s and may cause delayed turn
detection. The warnings guide developers to re-run the
[stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD
settings.
(PR [#4115](https://github.com/pipecat-ai/pipecat/pull/4115))
- Added `domain` parameter to `AssemblyAISTTSettings` for specialized
recognition modes such as Medical Mode (`domain="medical-v1"`).
(PR [#4117](https://github.com/pipecat-ai/pipecat/pull/4117))
- Added `NovitaLLMService` for using Novita AI's LLM models via their
OpenAI-compatible API.
(PR [#4119](https://github.com/pipecat-ai/pipecat/pull/4119))
- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer
resources are properly released when no longer needed. Custom `VADAnalyzer`
subclasses can override `cleanup()` to free any held resources.
(PR [#4120](https://github.com/pipecat-ai/pipecat/pull/4120))
- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires
after the final transcript is pushed, providing a reliable hook for
end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both
Pipecat and AssemblyAI turn detection modes.
(PR [#4128](https://github.com/pipecat-ai/pipecat/pull/4128))
- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux
speech-to-text on AWS SageMaker endpoints. Use with
`ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
(PR [#4143](https://github.com/pipecat-ai/pipecat/pull/4143))
- Added `Mem0MemoryService.get_memories()` convenience method for retrieving
all stored memories outside the pipeline (e.g. to build a personalized
greeting at connection time). This avoids the need to manually handle client
type branching, filter construction, and async wrapping.
(PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
### Changed
- Added context prewarming path for `InworldTTSService` to improve first audio
latency.
(PR [#4013](https://github.com/pipecat-ai/pipecat/pull/4013))
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp
VIVA SDK (requires `krisp_audio`).
(PR [#4022](https://github.com/pipecat-ai/pipecat/pull/4022))
- Modified `InworldTTSService` to close context at end of turn instead of
relying on idle timeout.
(PR [#4028](https://github.com/pipecat-ai/pipecat/pull/4028))
- Added Gemini 3 support to the Gemini Live service.
(PR [#4078](https://github.com/pipecat-ai/pipecat/pull/4078))
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an
automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has
changed from `2.0` to `3.0` seconds.
(PR [#4084](https://github.com/pipecat-ai/pipecat/pull/4084))
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system
message, matching all other adapters. Previously it searched for the first
"system" message anywhere in the conversation history. A "system" message
appearing later in the list will now be converted to "user" instead of being
extracted as the system instruction.
(PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
- Fixed `InworldTtsService` to fallback to full text when TTS timestamps are
not received.
(PR [#4113](https://github.com/pipecat-ai/pipecat/pull/4113))
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova
Sonic) now prefer `system_instruction` from service settings over an initial
system message in the LLM context, matching the behavior of non-realtime
services. Previously, context-provided system instructions took precedence. A
warning is now logged when both are set.
(PR [#4130](https://github.com/pipecat-ai/pipecat/pull/4130))
- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.
(PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).
(PR [#4136](https://github.com/pipecat-ai/pipecat/pull/4136))
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a
warning at startup. The log message has been downgraded to debug level since
these are valid service-specific values that are passed through correctly.
(PR [#4137](https://github.com/pipecat-ai/pipecat/pull/4137))
- `GrokLLMService` and `GrokRealtimeLLMService` now live in the
`pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three
use the same xAI API. Update imports from `pipecat.services.grok.*` to
`pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import
GrokLLMService`).
(PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
- ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the
`mem0` extra will need to update their mem0ai package.
(PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
### Deprecated
- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and
`pipecat.services.grok.realtime.events` are deprecated. The old import paths
still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`,
`pipecat.services.xai.realtime.llm`, and
`pipecat.services.xai.realtime.events` instead.
(PR [#4142](https://github.com/pipecat-ai/pipecat/pull/4142))
### Removed
- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and
`"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that
called `await self.add_word_timestamps([("Reset", 0)])` or `await
self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`,
replace them with `await self.append_to_audio_context(ctx_id,
TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage
the word-timestamp reset automatically.
(PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text
audio models. Use another STT provider instead.
(PR [#4154](https://github.com/pipecat-ai/pipecat/pull/4154))
### Fixed
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring
`settings.system_instruction`. The system instruction was being read from a
deprecated constructor parameter instead of the settings object, causing it
to be silently ignored.
(PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when
the only message in context was a system message. The lone system message is
now converted to "user" role instead of being extracted, matching the
existing Anthropic adapter behavior.
(PR [#4089](https://github.com/pipecat-ai/pipecat/pull/4089))
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was
deferred while waiting for the bot to finish responding and `turn_complete`
never arrived. As a possible root-cause fix, `turn_complete` messages are now
handled even if they lack `usage_metadata`. As a fallback, the deferred
`EndFrame` now has a 30-second safety timeout.
(PR [#4125](https://github.com/pipecat-ai/pipecat/pull/4125))
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous
contexts exceeded") caused by rapid user interruptions. When interruptions
arrived before any TTS text was generated, phantom contexts were created on
the ElevenLabs server that were never closed, eventually exceeding the
5-context limit.
(PR [#4126](https://github.com/pipecat-ai/pipecat/pull/4126))
- Fixed the final sentence being dropped from the conversation context when
using RTVI text input with non-word-timestamp TTS services. The
`LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`,
causing the `LLMAssistantAggregator` to finalize the context before the final
sentence arrived.
(PR [#4127](https://github.com/pipecat-ai/pipecat/pull/4127))
- Fixed audio crackling and popping in recordings when both user and bot are
speaking. `AudioBufferProcessor` no longer injects silence into a track's
buffer while that track is actively producing audio, preventing mid-utterance
interruptions in the recorded output.
(PR [#4135](https://github.com/pipecat-ai/pipecat/pull/4135))
- Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale
words or backward PTS values into later turns.
(PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had
been invoked but `BotStartedSpeakingFrame` had not yet been received, a user
interruption could allow stale audio to leak through.
(PR [#4145](https://github.com/pipecat-ai/pipecat/pull/4145))
- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with
external VAD) not working. The bot now correctly detects user speech and
signals turn boundaries to the Gemini API.
(PR [#4146](https://github.com/pipecat-ai/pipecat/pull/4146))
- Fixed Gemini Live message handling to process all `server_content` fields
independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and
`output_transcription`) on the same message, but the previous `elif` chain
only processed the first match, silently dropping the rest.
(PR [#4147](https://github.com/pipecat-ai/pipecat/pull/4147))
- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly
triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS)
propagated upstream through the switcher. Previously, any non-fatal error
passing through would be misattributed to the active service and trigger an
unwanted service switch. Now only errors originating from the switcher's own
managed services trigger failover.
(PR [#4149](https://github.com/pipecat-ai/pipecat/pull/4149))
- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal
buffer on interruption, causing the bot to continue speaking for several
seconds after being interrupted.
(PR [#4151](https://github.com/pipecat-ai/pipecat/pull/4151))
- Fixed a crash in OpenAI LLM processing when the provider returns
`chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no
attribute 'get'` errors during audio transcript handling.
(PR [#4152](https://github.com/pipecat-ai/pipecat/pull/4152))
- Fixed error floods in `DeepgramSTTService` when the WebSocket connection
drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead
connection instead of silently failing, causing every queued audio frame to
log an error. Now `send_media()` failures are caught gracefully — a single
warning is logged and audio frames are skipped until the existing
reconnection logic restores the connection.
(PR [#4153](https://github.com/pipecat-ai/pipecat/pull/4153))
- `Mem0MemoryService` no longer blocks the event loop during memory storage and
retrieval. All Mem0 API calls now run in a background thread, and message
storage is fire-and-forget so it doesn't delay downstream processing.
(PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
- Fixed `Mem0MemoryService` failing to store messages when the context
contained system or developer role messages. The Mem0 API only accepts user
and assistant roles, so other roles are now filtered out before storing.
(PR [#4156](https://github.com/pipecat-ai/pipecat/pull/4156))
- Added missing `on_dtmf_event` callback to `LemonSliceTransportClient.setup()`
`DailyCallbacks` construction, fixing a `ValidationError` at pipeline setup
time.
(PR [#4161](https://github.com/pipecat-ai/pipecat/pull/4161))
- Fixed an issue in `InworldTTSService` where, in cases of fast interruption,
we would continue receiving audio from the previous context.
(PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))
- Fixed a word timestamp interleaving issue in `InworldTTSService` when
processing multiple sentences.
(PR [#4167](https://github.com/pipecat-ai/pipecat/pull/4167))
- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using
`push_stop_frames=True`. When the stop-frame timeout fired, a second
`TTSStoppedFrame` could be pushed after the normal one at context completion.
(PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))
- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK
now requires explicit message objects for `send_keep_alive()`,
`send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk
version is now 6.1.0.
(PR [#4174](https://github.com/pipecat-ai/pipecat/pull/4174))
- Fixed RTVI events not being delivered to clients when using WebSocket
transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False`
by default.
(PR [#4176](https://github.com/pipecat-ai/pipecat/pull/4176))
- Fixed a timing issue where turn detection timer tasks (idle controller,
speech timeout, turn analyzer, and turn completion) could miss their first
tick because the newly created asyncio task was not yet scheduled when the
caller continued.
(PR [#4183](https://github.com/pipecat-ai/pipecat/pull/4183))
- Fixed `FastAPIWebsocketTransport` intermittently hanging on shutdown when the
remote side (e.g. Twilio) disconnects while audio is being sent. A race
condition between the send and receive paths could cause the
`on_client_disconnected` callback to be skipped, leaving the pipeline waiting
for a disconnect signal that never came.
(PR [#4186](https://github.com/pipecat-ai/pipecat/pull/4186))
### Performance
- `RimeTTSService` now handles Rime's `done` WebSocket message to complete
audio contexts immediately, eliminating the 3-second idle timeout that
previously added latency at the end of each utterance.
(PR [#4172](https://github.com/pipecat-ai/pipecat/pull/4172))
## [0.0.107] - 2026-03-23
### Added
- Added `frame_order` parameter to `SyncParallelPipeline`. Set
`frame_order=FrameOrder.PIPELINE` to push synchronized output frames in
pipeline definition order (all frames from the first pipeline, then the
second, etc.) instead of the default arrival order.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Added `sync_with_audio` field to `OutputImageRawFrame`. When set to `True`,
the output transport queues image frames with audio so they are displayed
only after all preceding audio has been sent, enabling synchronized
audio/image playback.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Added `OpenAIResponsesLLMService`, a new LLM service that uses the OpenAI
Responses API. Supports streaming text, function calling, usage metrics, and
out-of-band inference. Works with the universal `LLMContext` and
`LLMContextAggregatorPair`. See
`examples/foundational/07-interruptible-openai-responses.py` and
`14-function-calling-openai-responses.py`.
(PR [#4074](https://github.com/pipecat-ai/pipecat/pull/4074))
- Added `audio_out_auto_silence` parameter to `TransportParams` (defaults to
`True`). When set to `False`, the transport waits for audio data instead of
inserting silence when the output queue is empty, which is useful for
scenarios that require uninterrupted audio playback without artificial gaps.
(PR [#4104](https://github.com/pipecat-ai/pipecat/pull/4104))
### Changed
- Renamed tracing span attributes to align with OpenTelemetry GenAI semantic
conventions: `gen_ai.system` to `gen_ai.provider.name`, `system` to
`gen_ai.system_instructions`, `gen_ai.usage.cache_read_input_tokens` to
`gen_ai.usage.cache_read.input_tokens`, and
`gen_ai.usage.cache_creation_input_tokens` to
`gen_ai.usage.cache_creation.input_tokens`.
(PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))
- `DeepgramSageMakerTTSService` now correctly routes audio through the base
`TTSService` audio context queue. Audio frames are delivered via
`append_to_audio_context()` instead of being pushed directly, enabling proper
ordering, interruption handling, and start/stop frame lifecycle management.
Interruptions now trigger a `Clear` message to Deepgram (flushing its text
buffer) at the right time via `on_audio_context_interrupted`.
(PR [#4083](https://github.com/pipecat-ai/pipecat/pull/4083))
- `GradiumTTSService` now sends a per-context `setup` message with
`client_req_id` before the first text message for each TTS context, following
Gradium's multiplexing protocol. Previously, a single setup message was sent
at connection time without a `client_req_id`, which prevented Gradium from
associating requests with their sessions when using `close_ws_on_eos=False`.
(PR [#4091](https://github.com/pipecat-ai/pipecat/pull/4091))
### Fixed
- Fixed stale `system_instruction` in LLM tracing spans by reading from
`_settings.system_instruction` instead of the removed `_system_instruction`
attribute.
(PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))
- Fixed `SyncParallelPipeline` breaking the Whisker debugger.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Fixed `SyncParallelPipeline` race condition where concurrent SystemFrame
processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks.
SystemFrames now take a fast path that passes them through without draining
queued output.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Fixed TTS frame ordering so that non-system frames always arrive in correct
order relative to the `TTSStartedFrame`/`TTSAudioRawFrame`/`TTSStoppedFrame`
sequence. Previously these frames could race ahead of or behind audio context
frames, producing out-of-order output downstream.
(PR [#4075](https://github.com/pipecat-ai/pipecat/pull/4075))
- Fixed `SarvamTTSService` audio and error frames now route through
`append_to_audio_context()` instead of `push_frame()`, ensuring correct
behavior with audio contexts and interruptions.
(PR [#4082](https://github.com/pipecat-ai/pipecat/pull/4082))
- Fixed audio frame ordering and interruption handling in Fish Audio, LMNT,
Neuphonic, and Rime NonJson TTS services. These services were bypassing the
base `TTSService` audio context serialization queue by pushing audio frames
directly, which could cause out-of-order frames and broken interruptions
during speech.
(PR [#4090](https://github.com/pipecat-ai/pipecat/pull/4090))
- Fixed Genesys AudioHook serializer to always include the `parameters` field in
protocol messages. The AudioHook protocol requires every message to carry a
`parameters` object (even if empty), but `_create_message` omitted it when no
parameters were provided. This caused clients that validate message structure
(including the Genesys reference implementation) to reject `pong` and
parameter-less `closed` responses, breaking server sequence tracking and
preventing `outputVariables` from reaching the Architect flow.
(PR [#4093](https://github.com/pipecat-ai/pipecat/pull/4093))
## [0.0.106] - 2026-03-18
### Added
- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its
subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`,
`STTUpdateSettingsFrame`) to target a specific service instance. When
`service` is set, only the matching service applies the settings; others
forward the frame unchanged. This enables updating a single service when
multiple services of the same type exist in the pipeline.
(PR [#4004](https://github.com/pipecat-ai/pipecat/pull/4004))
- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily
runner. These convenience parameters let callers specify a SIP provider name
and geographic region directly without manually constructing
`DailyRoomProperties` and `DailyRoomSipParams`.
(PR [#4005](https://github.com/pipecat-ai/pipecat/pull/4005))
- Added `PerplexityLLMAdapter` that automatically transforms conversation
messages to satisfy Perplexity's stricter API constraints (strict role
alternation, no non-initial system messages, last message must be user/tool).
Previously, certain conversation histories could cause Perplexity API errors
that didn't occur with OpenAI (`PerplexityLLMService` subclasses
`OpenAILLMService` since Perplexity uses an OpenAI-compatible API).
(PR [#4009](https://github.com/pipecat-ai/pipecat/pull/4009))
- Added DTMF input event support to the Daily transport. Incoming DTMF tones
are now received via Daily's `on_dtmf_event` callback and pushed into the
pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from
phone callers.
(PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))
- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on
wake phrases, with support for `single_activation` mode. Deprecates
`WakeCheckFilter`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- Added `default_user_turn_start_strategies()` and
`default_user_turn_stop_strategies()` helper functions for composing custom
strategy lists.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
### Changed
- Changed tool result JSON serialization to use `ensure_ascii=False`,
preserving UTF-8 characters instead of escaping them. This reduces context
size and token usage for non-English languages.
(PR [#3457](https://github.com/pipecat-ai/pipecat/pull/3457))
- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of
`OpenAIRealtimeSTTSettings`, making it runtime-updatable via
`STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is
deprecated as of 0.0.106.
(PR [#3991](https://github.com/pipecat-ai/pipecat/pull/3991))
- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable
release).
(PR [#3997](https://github.com/pipecat-ai/pipecat/pull/3997))
- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`,
aligning it with the HeyGen and Tavus video services. It supports
`SimliVideoService.Settings(...)` for configuration and uses
`start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage
(`api_key`, `face_id`, etc.) remains unchanged.
(PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))
- Update `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`.
(PR [#4023](https://github.com/pipecat-ai/pipecat/pull/4023))
- Nova Sonic assistant text transcripts are now delivered in real-time using
speculative text events instead of delayed final text events. Previously,
assistant text only arrived after all audio had finished playing, causing
laggy transcripts in client UIs. Speculative text arrives before each audio
chunk, providing text synchronized with what the bot is saying. This also
simplifies the internal text handling by removing the interruption re-push
hack and assistant text buffer.
(PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))
- Updated `daily-python` dependency to 0.25.0.
(PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))
- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily`
to support dial-out rooms. Also narrowed misleading `Optional` type hints and
deduplicated token expiry calculation.
(PR [#4048](https://github.com/pipecat-ai/pipecat/pull/4048))
- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to
short-circuit evaluation of subsequent strategies by returning `STOP`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- `GradiumSTTService` now takes both an `encoding` and `sample_rate`
constructor argument which is assmebled in the class to form the
`input_format`. PCM accepts `8000`, `16000`, and `24000` Hz sample rates.
(PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))
- Improved `GradiumSTTService` transcription accuracy by reworking how text
fragments are accumulated and finalized. Previously, trailing words could be
dropped when the server's `flushed` response arrived before all text tokens
were delivered. The service now uses a short aggregation delay after flush to
capture trailing tokens, producing complete utterances.
(PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))
### Deprecated
- `SimliVideoService.InputParams` is deprecated. Use the direct constructor
parameters `max_session_length`, `max_idle_time`, and `enable_logging`
instead.
(PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))
- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use
`LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now
emit a `DeprecationWarning`.
(PR [#4012](https://github.com/pipecat-ai/pipecat/pull/4012))
- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
### Fixed
- Fixed an issue where the default model for `OpenAILLMService` and
`AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now
restored to `gpt-4.1`.
(PR [#4000](https://github.com/pipecat-ai/pipecat/pull/4000))
- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut
down before in-flight frames (e.g. LLM function call responses) finished
processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline
as `ControlFrame`s, ensuring all pending work is flushed before shutdown
begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate
(`SystemFrame`).
(PR [#4006](https://github.com/pipecat-ai/pipecat/pull/4006))
- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle
synchronization. Buffered frames are now flushed in the correct order
relative to synchronization frames (`StartFrame` goes first,
`EndFrame`/`CancelFrame` go after), and frames added to the buffer during
flush are also drained.
(PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))
- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The
stop sequence now waits for all queued audio contexts to finish processing
before canceling the stop frame task.
(PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))
- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to
service-specific codes when passed via
`settings=Service.Settings(language=Language.ES)` at init time. This caused
API errors (e.g. 400 from Rime) because the raw enum was sent instead of the
expected language code (e.g. `"spa"`). Runtime updates via
`UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the
base `TTSService` and `STTService` classes so all services handle this
consistently.
(PR [#4024](https://github.com/pipecat-ai/pipecat/pull/4024))
- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://`
or `http://`. Previously these were silently overwritten with `wss://` /
`https://`, breaking air-gapped or private deployments that don't use TLS.
All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or bare
hostname) are now respected.
(PR [#4026](https://github.com/pipecat-ai/pipecat/pull/4026))
- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not
accepting or forwarding the `timeout_secs` parameter.
(PR [#4037](https://github.com/pipecat-ai/pipecat/pull/4037))
- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions.
Previously, an empty transcription could trigger an interruption of the
assistant's response even though the user hadn't actually spoken.
(PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))
- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crash when language
parameters contain plain strings instead of `Language` enum values.
(PR [#4046](https://github.com/pipecat-ai/pipecat/pull/4046))
- Fixed premature user turn stops caused by late transcriptions arriving
between turns. A stale transcript from the previous turn could persist into
the next turn and trigger a stop before the current turn's real transcript
arrived. Stop strategies are now reset at both turn start and turn stop to
prevent state from leaking across turn boundaries.
(PR [#4057](https://github.com/pipecat-ai/pipecat/pull/4057))
- Fixed raw language strings like `"de-DE"` silently failing when passed to
TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go
through the same `Language` enum resolution as enum values, so regional codes
like `"de-DE"` are properly converted to service-expected formats like
`"de"`. Unrecognized strings log a warning instead of failing silently.
(PR [#4058](https://github.com/pipecat-ai/pipecat/pull/4058))
- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`,
`redact`, `replace`) being stringified instead of passed as lists to the SDK,
which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the
WebSocket query params.
(PR [#4063](https://github.com/pipecat-ai/pipecat/pull/4063))
- Fixed `MinWordsUserTurnStartStrategy` including text below the word threshold
in the output by resetting aggregation when the minimum word count is not
met.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- Fixed audio overlap and potential dropped TTS content when multiple assistant
turns occur in quick succession. `TTSService` now flushes remaining text
before pausing frame processing on `LLMFullResponseEndFrame`/`EndFrame`,
instead of pausing first.
(PR [#4071](https://github.com/pipecat-ai/pipecat/pull/4071))
### Security
- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the `livekit` extra to
address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted
unknown `crit` header extensions.
(PR [#4035](https://github.com/pipecat-ai/pipecat/pull/4035))
## [0.0.105] - 2026-03-10
### Added
- Added concurrent audio context support: `CartesiaTTSService` can now
synthesize the next sentence while the previous one is still playing, by
setting `pause_frame_processing=False` and routing each sentence through its
own audio context queue.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Added custom video track support to Daily transport. Use
`video_out_destinations` in `DailyParams` to publish multiple video tracks
simultaneously, mirroring the existing `audio_out_destinations` feature.
(PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
- Added `ServiceSwitcherStrategyFailover` that automatically switches to the
next service when the active service reports a non-fatal error. Recovery
policies can be implemented via the `on_service_switched` event handler.
(PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
- Added optional `timeout_secs` parameter to `register_function()` and
`register_direct_function()` for per-tool function call timeout control,
overriding the global `function_call_timeout_secs` default.
(PR [#3915](https://github.com/pipecat-ai/pipecat/pull/3915))
- Added `cloud-audio-only` recording option to Daily transport's
`enable_recording` property.
(PR [#3916](https://github.com/pipecat-ai/pipecat/pull/3916))
- Wired up `system_instruction` in `BaseOpenAILLMService`,
`AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default
system prompt, matching the behavior of the Google services. This enables
sharing a single `LLMContext` across multiple LLM services, where each
service provides its own system instruction independently.
```python
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant.",
)
context = LLMContext()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context.add_message({"role": "user", "content": "Please introduce yourself."})
await task.queue_frames([LLMRunFrame()])
```
(PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for
configuring voice activity detection sensitivity in U3 Pro. Aligning this
with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone"
where AssemblyAI transcribes speech that VAD hasn't detected yet.
(PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and
`OpenAISTTService` to allow empty transcripts to be pushed downstream as
`TranscriptionFrame` instead of discarding them (the default behavior). This
is intended for situations where VAD fires even though the user did not
speak. In these cases, it is useful to know that nothing was transcribed so
that the agent can resume speaking, instead of waiting longer for a
transcription.
(PR [#3930](https://github.com/pipecat-ai/pipecat/pull/3930))
- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`,
`AWSBedrockLLMService`) now log a warning when both `system_instruction` and
a system message in the context are set. The constructor's
`system_instruction` takes precedence.
(PR [#3932](https://github.com/pipecat-ai/pipecat/pull/3932))
- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for AWS
Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and
Soniox STT services. Previously, changing settings at runtime only stored the
new values without reconnecting.
(PR [#3946](https://github.com/pipecat-ai/pipecat/pull/3946))
- Exposed `on_summary_applied` event on `LLMAssistantAggregator`, allowing
users to listen for context summarization events without accessing private
members.
(PR [#3947](https://github.com/pipecat-ai/pipecat/pull/3947))
- Deepgram Flux STT settings (`keyterm`, `eot_threshold`,
`eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via
`STTUpdateSettingsFrame` without triggering a reconnect. The new values are
sent to Deepgram as a Configure WebSocket message on the existing connection.
(PR [#3953](https://github.com/pipecat-ai/pipecat/pull/3953))
- Added `system_instruction` parameter to `run_inference` across all LLM
services, allowing callers to override the system prompt for one-shot
inference calls. Used by `_generate_summary` to pass the summarization prompt
cleanly.
(PR [#3968](https://github.com/pipecat-ai/pipecat/pull/3968))
### Changed
- Audio context management (previously in `AudioContextTTSService`) is now
built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`,
`asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from
`WebsocketTTSService` directly. Word-timestamp baseline is set automatically
on the first audio chunk of each context instead of requiring each provider
to call `start_word_timestamps()` in their receive loop.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of
`VirtualCameraDevice` for the default camera output, mirroring how audio
already works with `CustomAudioSource`/`CustomAudioTrack`.
(PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions`
class was removed from the SDK and is now provided by pipecat directly;
import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
(PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
- `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for
subclasses to implement error-based switching. `ServiceSwitcher` defaults to
`ServiceSwitcherStrategyManual` and `strategy_type` is now optional.
(PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
- Support for Voice Focus 2.0 models.
- Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
- Cleaned unused `ParameterFixedError` exception handling in `AICFilter`
parameter setup.
(PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
- `max_context_tokens` and `max_unsummarized_messages` in
`LLMAutoContextSummarizationConfig` (and deprecated
`LLMContextSummarizationConfig`) can now be set to `None` independently to
disable that summarization threshold. At least one must remain set.
(PR [#3914](https://github.com/pipecat-ai/pipecat/pull/3914))
- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from
`AssemblyAIConnectionParams` as these were v2 API parameters not supported in
v3. Clarified that `format_turns` only applies to Universal-Streaming models;
U3 Pro has automatic formatting built-in.
(PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
- Changed `DeepgramTTSService` to send a Clear message on interruption instead
of disconnecting and reconnecting the WebSocket, allowing the connection to
persist throughout the session.
(PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
- Re-added `enhancement_level` support to `AICFilter` with runtime
`FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and
`ProcessorParameter.EnhancementLevel` together.
(PR [#3961](https://github.com/pipecat-ai/pipecat/pull/3961))
- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`.
(PR [#3970](https://github.com/pipecat-ai/pipecat/pull/3970))
- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching
Fish Audio's latest recommended model for improved quality and speed.
(PR [#3973](https://github.com/pipecat-ai/pipecat/pull/3973))
- `AzureSTTService` `region` parameter is now optional when `private_endpoint`
is provided. A `ValueError` is raised if neither is given, and a warning is
logged if both are provided (`private_endpoint` takes priority).
(PR [#3974](https://github.com/pipecat-ai/pipecat/pull/3974))
### Deprecated
- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`.
Subclass `WebsocketTTSService` directly instead; audio context management is
now part of the base `TTSService`.
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, and
`InterruptibleWordTTSService`. Word timestamp logic is now always active in
`TTSService` and no longer needs to be opted into via a subclass.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Deprecated `pipecat.services.google.llm_vertex`,
`pipecat.services.google.llm_openai`, and
`pipecat.services.google.gemini_live.llm_vertex` modules. Use
`pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`,
and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import
paths still work but will emit a `DeprecationWarning`.
(PR [#3980](https://github.com/pipecat-ai/pipecat/pull/3980))
### Removed
- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`.
Word timestamp logic is now always active. Remove this argument from any
custom subclass `super().__init__()` calls.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
### Fixed
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The
deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit
`KeepAlive` messages every 5 seconds, within the recommended 35 second
interval before Deepgram's 10-second inactivity timeout.
(PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in
`AICFilter` caused by holding a `memoryview` on the mutable audio buffer
across async yield points.
(PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
- Fixed TTS context not being appended to the assistant message history when
using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
(PR [#3936](https://github.com/pipecat-ai/pipecat/pull/3936))
- Fixed context summarization leaving orphaned tool responses in the kept
context when tool calls were moved to the summarized portion.
(PR [#3937](https://github.com/pipecat-ai/pipecat/pull/3937))
- Fixed turn completion state not resetting at end of LLM responses.
`LLMFullResponseEndFrame` is pushed (not received) by the LLM service, so the
mixin now handles it in `push_frame` instead of `process_frame`.
(PR [#3956](https://github.com/pipecat-ai/pipecat/pull/3956))
- Fixed turn completion instructions being injected as a context system message
instead of using `system_instruction`. This caused warning spam when
`system_instruction` was also set and didn't persist across full context
updates.
(PR [#3957](https://github.com/pipecat-ai/pipecat/pull/3957))
- Fixed `TTSService` audio context queue getting blocked when
`append_to_audio_context()` was called with a `None` context ID, which
prevented subsequent audio from being delivered.
(PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
- Fixed `on_call_state_updated` event handler in LiveKit transport receiving
incorrect number of arguments due to redundant `self` passed to
`_call_event_handler`.
(PR [#3959](https://github.com/pipecat-ai/pipecat/pull/3959))
- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services
treating `conversation_already_has_active_response` as a fatal error. These
services now log it as a non-fatal debug event when a response is already in
progress.
(PR [#3960](https://github.com/pipecat-ai/pipecat/pull/3960))
- Fixed `SmallWebRTCConnection` silently discarding messages sent before the
data channel is open by queuing them and flushing once the channel is ready.
A bounded queue (`MAX_MESSAGE_QUEUE_SIZE = 50`) prevents unbounded memory
growth, and a 10-second timeout after connection clears the queue and falls
back to discard mode if the data channel never opens.
(PR [#3962](https://github.com/pipecat-ai/pipecat/pull/3962))
- Fixed `AzureSTTService` failing to initialize when `private_endpoint` is
provided. The Azure Speech SDK's `SpeechConfig` does not accept both `region`
and `endpoint` simultaneously, so they are now passed conditionally.
(PR [#3967](https://github.com/pipecat-ai/pipecat/pull/3967))
- Fixed `GoogleLLMService` ignoring the `system_instruction` set via
constructor or `GoogleLLMSettings` when a system message was also present in
the context. The settings value now correctly takes priority, and a warning
is logged when both are set.
(PR [#3976](https://github.com/pipecat-ai/pipecat/pull/3976))
### Other
- Updated foundational examples to use `system_instruction` on LLM services
instead of adding system messages to `LLMContext`.
(PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
- Updated AssemblyAI turn detection example to use `keyterms_prompt` list
format instead of `prompt` string for improved clarity.
(PR [#3929](https://github.com/pipecat-ai/pipecat/pull/3929))
- Updated foundational examples and eval scripts to use `"user"` role instead
of `"system"` when adding messages to `LLMContext`, since system prompts
should be set via `system_instruction` on the LLM service.
(PR [#3931](https://github.com/pipecat-ai/pipecat/pull/3931))
## [0.0.104] - 2026-03-02
### Added

View File

@@ -65,25 +65,12 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Websocket-based Services
**Base class:** `WebsocketSTTService`
**Use for:** Services where you manage the websocket connection directly. Combines `STTService` with `WebsocketService` for automatic reconnection and keepalive support.
**Examples:**
- [CartesiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/stt.py)
- [ElevenLabsRealtimeSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/stt.py)
#### SDK-based Streaming Services
**Base class:** `STTService`
**Use for:** Streaming services where the provider's Python SDK manages the connection internally.
**Examples:**
- [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
- [GoogleSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/stt.py)
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
#### File-based Services
@@ -121,59 +108,55 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **`_process_context(self, context: LLMContext)`** — The main method that processes an LLM context and generates a response. Each LLM service overrides `process_frame` to extract context from `LLMContextFrame` and calls `_process_context`.
- **`adapter_class`** — Class attribute pointing to a `BaseLLMAdapter` subclass. Defaults to `OpenAILLMAdapter`. Non-OpenAI services must implement their own adapter (see `src/pipecat/adapters/base_llm_adapter.py`) with methods:
- `get_llm_invocation_params(context)` — Extract provider-specific params from universal context
- `to_provider_tools_format(tools_schema)` — Convert standard tools to provider format
- `get_messages_for_logging(context)` — Format messages for logging
- Reference adapters: `src/pipecat/adapters/services/` (anthropic, gemini, bedrock, etc.)
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` Signals the start of an LLM response
- `LLMTextFrame` Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` Signals the end of an LLM response
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
- **Thought frames (reasoning models):** If the model supports extended thinking / chain-of-thought, emit thought frames alongside the response:
- `LLMThoughtStartFrame` — Signals the start of a thought
- `LLMThoughtTextFrame` — Contains thought content, streamed as tokens
- `LLMThoughtEndFrame` — Signals the end of a thought
- **Context aggregation** is handled by the framework via `LLMContext` + `LLMContextAggregatorPair`. The LLM service just processes context it receives — no need to implement aggregators.
- **Context aggregation:** Implement context aggregation to collect user and assistant content:
- Aggregators come in pairs with a `user()` instance and `assistant()` instance
- Context must adhere to the `LLMContext` universal format
- Aggregators should handle adding messages, function calls, and images to the context
### TTS (Text-to-Speech) Services
#### WebsocketTTSService
#### AudioContextWordTTSService
**Use for:** Websocket-based streaming services (with or without word timestamps)
**Use for:** Websocket-based services supporting word/timestamp alignment
**Examples:**
**Example:**
- [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
- [ElevenLabsTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### InterruptibleTTSService
**Use for:** Websocket-based services without word timestamps that reconnect on interruption (e.g. don't support a context ID or interruption message)
**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
**Example:**
- [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)
#### WordTTSService
**Use for:** HTTP-based services supporting word/timestamp alignment
**Example:**
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### TTSService
**Use for:** HTTP-based services (word timestamps are supported in the base class)
**Use for:** HTTP-based services without word/timestamp alignment
**Examples:**
**Example:**
- [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
- [OpenAITTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/openai/tts.py)
#### Key requirements:
- For websocket services, use asyncio WebSocket implementation
- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
- Handle idle service timeouts with keepalives
- TTS services push both audio (`TTSAudioRawFrame`) and text (`TTSTextFrame`) frames
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
### Telephony Serializers
@@ -217,9 +200,9 @@ Vision services process images and provide analysis such as descriptions, object
#### Key requirements:
- Must implement `run_vision` method that takes a `UserImageRawFrame` and returns an `AsyncGenerator[Frame, None]`
- The method processes the image frame and yields frames with analysis results
- Must yield the frame sequence: `VisionFullResponseStartFrame`, `VisionTextFrame`, `VisionFullResponseEndFrame`
- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
- The method processes the latest image in the context and yields frames with analysis results
- Typically yields `TextFrame` objects containing descriptions or answers
## Implementation Guidelines
@@ -248,105 +231,49 @@ def can_generate_metrics(self) -> bool:
return True
```
### Service Settings
### Dynamic Settings Updates
Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
STT, LLM, and TTS services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
#### Defining your Settings class
Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:
Each service declares a settings dataclass that extends the appropriate base (`STTSettings`, `TTSSettings`, `LLMSettings`). Fields default to `NOT_GIVEN` so that update objects can represent sparse deltas:
```python
from dataclasses import dataclass, field
from pipecat.services.settings import TTSSettings, NOT_GIVEN
from pipecat.services.settings import STTSettings, NOT_GIVEN
@dataclass
class MyTTSSettings(TTSSettings):
"""Settings for MyTTS service.
class MySTTSettings(STTSettings):
"""Settings for my STT service.
Parameters:
speaking_rate: Speed multiplier (0.52.0).
region: Cloud region for the service.
"""
speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
region: str = field(default_factory=lambda: NOT_GIVEN)
```
**What goes in Settings vs. `__init__` params:**
| Belongs in Settings | Stays as `__init__` params |
| -------------------------------------------------------- | ----------------------------------------- |
| Model name, voice, language | API keys, auth tokens |
| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides |
| Anything users may want to change mid-session | Audio encoding, sample format |
| | Connection parameters (timeouts, retries) |
The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
#### Wiring settings into `__init__`
Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
The service stores its current settings in `self._settings` and declares the type with a class-level annotation for editor support:
```python
from typing import Optional
class MySTTService(STTService):
_settings: MySTTSettings
class MyTTSService(TTSService):
Settings = MyTTSSettings
_settings: Settings
def __init__(
self,
*,
api_key: str,
settings: Optional[Settings] = None,
**kwargs,
):
# 1. Defaults — every field has a real value (store mode).
default_settings = self.Settings(
model="my-model-v1",
voice="default-voice",
language="en",
speaking_rate=1.0,
def __init__(self, *, model: str, language: str, region: str, **kwargs):
# An initial value should be provided for every settings field.
# This will be validated at service start.
# (If you track sample_rate, it can be a placeholder value like 0; see
# "Sample Rate Handling").
super().__init__(
settings=MySTTSettings(model=model, language=language, region=region), **kwargs
)
# 2. Merge caller overrides (only given fields win).
if settings is not None:
default_settings.apply_update(settings)
# 3. Pass the fully-populated settings to the base class.
super().__init__(settings=default_settings, **kwargs)
# 4. Init-only config stored separately.
self._api_key = api_key
```
This pattern lets callers override only what they care about:
```python
# Uses all defaults
svc = MyTTSService(api_key="sk-xxx")
# Overrides just the voice — access Settings through the service class
svc = MyTTSService(
api_key="sk-xxx",
settings=MyTTSService.Settings(voice="custom-voice"),
)
```
#### Reacting to runtime changes
AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the connection if needed."""
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the recognizer if needed."""
changed = await super()._update_settings(update)
if not changed:
@@ -365,7 +292,7 @@ Note that, in this example, the service requires a reconnect to apply the new la
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
changed = await super()._update_settings(update)
if not changed:
@@ -398,7 +325,7 @@ Note that `self.sample_rate` is a `@property` set in the TTSService base class,
Use Pipecat's tracing decorators:
- **STT:** `@traced_stt` - decorate `_handle_transcription(self, transcript, is_final, language)` (the standard method name convention)
- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
- **LLM:** `@traced_llm` - decorate the `_process_context()` method
- **TTS:** `@traced_tts` - decorate the `run_tts()` method
@@ -420,15 +347,17 @@ For REST-based communication, use aiohttp. Pipecat includes this as a required d
- Wrap API calls in appropriate try/catch blocks
- Handle rate limits and network failures gracefully
- Provide meaningful error messages
- When errors occur, raise exceptions AND push errors to notify the pipeline:
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
```python
from pipecat.frames.frames import ErrorFrame
try:
# Your API call
result = await self._make_api_call()
except Exception as e:
# Push error upstream to notify the pipeline
await self.push_error(f"{self} error: {e}", exception=e)
# Push error frame to pipeline
await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
# Raise or handle as appropriate
raise
```

View File

@@ -65,10 +65,6 @@ claude plugin marketplace add pipecat-ai/skills
and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -85,20 +81,19 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

View File

@@ -0,0 +1 @@
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`.

1
changelog/3848.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 35 second interval before Deepgram's 10-second inactivity timeout.

View File

@@ -0,0 +1,3 @@
- Support for Voice Focus 2.0 models.
- Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
- Cleaned unused `ParameterFixedError` exception handling in `AICFilter` parameter setup.

1
changelog/3889.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter` caused by holding a `memoryview` on the mutable audio buffer across async yield points.

View File

@@ -0,0 +1 @@
- `max_context_tokens` and `max_unsummarized_messages` in `LLMAutoContextSummarizationConfig` (and deprecated `LLMContextSummarizationConfig`) can now be set to `None` independently to disable that summarization threshold. At least one must remain set.

1
changelog/3915.added.md Normal file
View File

@@ -0,0 +1 @@
- Added optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default.

1
changelog/3916.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `cloud-audio-only` recording option to Daily transport's `enable_recording` property.

15
changelog/3918.added.md Normal file
View File

@@ -0,0 +1,15 @@
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.
```python
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant.",
)
context = LLMContext()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context.add_message({"role": "user", "content": "Please introduce yourself."})
await task.queue_frames([LLMRunFrame()])
```

1
changelog/3918.other.md Normal file
View File

@@ -0,0 +1 @@
- Updated foundational examples to use `system_instruction` on LLM services instead of adding system messages to `LLMContext`.

View File

@@ -80,9 +80,15 @@ GOOGLE_TEST_CREDENTIALS=...
# Gradium
GRAPDIUM_API_KEY=...
# Grok
GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Hathora
HATHORA_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -124,9 +130,6 @@ MISTRAL_API_KEY=...
# Neuphonic
NEUPHONIC_API_KEY=...
# Novita
NOVITA_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
@@ -176,9 +179,6 @@ SENTRY_DSN=...
SIMLI_API_KEY=...
SIMLI_FACE_ID=...
# Smallest
SMALLEST_API_KEY=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=...
FAL_SMART_TURN_API_KEY=...
@@ -212,6 +212,3 @@ WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
# xAI / Grok
XAI_API_KEY=...

View File

@@ -39,9 +39,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"),
aiohttp_session=session,
sample_rate=24000,
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
)
task = PipelineTask(

View File

@@ -39,10 +39,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
settings=RimeHttpTTSService.Settings(
voice="rex",
),
)
task = PipelineTask(

View File

@@ -37,9 +37,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
task = PipelineTask(

View File

@@ -29,9 +29,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
pipeline = Pipeline([tts, transport.output()])

View File

@@ -37,9 +37,7 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()

View File

@@ -39,16 +39,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are an LLM in a WebRTC session, and this is a 'hello world' demo.",
)
task = PipelineTask(
@@ -60,7 +56,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context = LLMContext()
context.add_message({"role": "developer", "content": "Say hello to the world."})
context.add_message({"role": "system", "content": "Say hello to the world."})
await task.queue_frames([LLMContextFrame(context), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

View File

@@ -45,9 +45,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
imagegen = FalImageGenService(
settings=FalImageGenService.Settings(
image_size="square_hd",
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -37,9 +37,7 @@ async def main():
)
imagegen = FalImageGenService(
settings=FalImageGenService.Settings(
image_size="square_hd",
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -67,16 +67,12 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -109,9 +105,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -50,16 +50,13 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="gpt-4o",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -92,7 +89,7 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -57,16 +57,12 @@ async def main():
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
context = LLMContext()

View File

@@ -16,12 +16,11 @@ from pipecat.frames.frames import (
Frame,
LLMContextFrame,
LLMFullResponseStartFrame,
OutputImageRawFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import FrameOrder, SyncParallelPipeline
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
@@ -31,7 +30,6 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -46,18 +44,6 @@ class MonthFrame(DataFrame):
return f"{self.name}(month: {self.month})"
class MarkImageForPlaybackSync(FrameProcessor):
"""Marks output image frames to be synchronized with audio playback."""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, OutputImageRawFrame):
frame.sync_with_audio = True
await self.push_frame(frame, direction)
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
@@ -112,19 +98,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
# No need to aggregate by sentences (the default), as we already know we're getting full sentences
# (Otherwise the service will unnecessarily wait for follow-up input to confirm the sentence is complete,
# which, sadly, actually breaks the synchronization mechanism)
text_aggregation_mode=TextAggregationMode.TOKEN,
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(
settings=FalImageGenService.Settings(
image_size="square_hd",
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -137,26 +115,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# that, each pipeline runs concurrently and `SyncParallelPipeline` will
# wait for the input frame to be processed.
#
# We use `FrameOrder.PIPELINE` so that each synchronized batch of output
# frames is pushed in the order the pipelines are listed: image first,
# then audio. This ensures the transport receives the image before the
# audio frames it should accompany.
#
# Note that `SyncParallelPipeline` requires the last processor in each
# of the pipelines to be synchronous. In this case, we use
# `FalImageGenService` and `CartesiaHttpTTSService` which make HTTP
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
SyncParallelPipeline( # Run pipelines in parallel aggregating the result
[
imagegen, # Generate image
MarkImageForPlaybackSync(), # Mark image as needing sync w/audio during playback
],
[month_prepender, tts], # Create "Month: sentence" and output audio
frame_order=FrameOrder.PIPELINE,
[imagegen], # Generate image
),
transport.output(), # Transport output
]
@@ -179,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]:
messages = [
{
"role": "user",
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]

View File

@@ -0,0 +1,198 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMContextFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
URLImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
self.audio.extend(frame.audio)
self.frame = OutputAudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels
)
await self.push_frame(frame, direction)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
await self.push_frame(frame, direction)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
sentence_aggregator = SentenceAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
# With `SyncParallelPipeline` we synchronize audio and images by
# pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
# I3 A3). To do that, each pipeline runs concurrently and
# `SyncParallelPipeline` will wait for the input frame to be
# processed.
#
# Note that `SyncParallelPipeline` requires the last processor in
# each of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
description, # Store sentence
SyncParallelPipeline(
[tts, audio_grabber], # Generate and store audio for the given sentence
[imagegen, image_grabber], # Generate and storeimage for the given sentence
),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(LLMContextFrame(LLMContext(messages)))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TkTransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify a few months as we create tasks all at once and we
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
]
# We create one task per month. This will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -83,16 +83,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
ml = MetricsLogger()
@@ -129,9 +125,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -100,16 +100,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()

View File

@@ -6,7 +6,6 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
@@ -53,68 +52,60 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
aiohttp_session=session,
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -24,6 +24,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -55,16 +56,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
# Alternatively, you can use TextAggregationMode.TOKEN to stream tokens instead of
# sentencesfor faster response times.
# text_aggregation_mode=TextAggregationMode.TOKEN,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -98,9 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -21,6 +21,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -92,7 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsSTTService.Settings(
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
# focus_speakers=["S1"],
@@ -103,21 +104,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
voice_id="sarah",
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
context = LLMContext()
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
),
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -148,7 +160,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -22,6 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -75,7 +76,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsSTTService.Settings(
params=SpeechmaticsSTTService.InputParams(
language=Language.EN,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
@@ -83,21 +84,31 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
voice_id="sarah",
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
)
context = LLMContext()
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
),
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -128,7 +139,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
messages.append({"role": "system", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -71,16 +71,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),

View File

@@ -1,151 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Initialize Deepgram Flux SageMaker STT Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram Flux model
stt = DeepgramFluxSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramFluxSageMakerSTTService.Settings(
min_confidence=0.3,
),
)
# Initialize Deepgram SageMaker TTS Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram TTS model
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
# Use ExternalUserTurnStrategies since Flux handles turn detection natively
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@stt.event_handler("on_update")
async def on_deepgram_flux_update(stt, transcript):
logger.debug(f"On deepgram flux update: {transcript}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -56,23 +56,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramFluxSTTService.Settings(
min_confidence=0.3,
),
params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
)
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -109,9 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,20 +59,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramHttpTTSService.Settings(
voice="aura-2-andromeda-en",
),
voice="aura-2-andromeda-en",
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
messages = []
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -22,7 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -69,18 +69,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
voice="aura-2-andromeda-en",
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -114,9 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -7,6 +7,7 @@
import os
from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
@@ -55,24 +56,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramSTTService.Settings(
vad_events=True,
utterance_end_ms="1000",
),
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
)
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -106,9 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,18 +55,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -100,9 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -63,17 +63,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
aiohttp_session=session,
settings=ElevenLabsHttpTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -108,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -57,16 +57,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -100,9 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -1,128 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.xai.llm import GrokLLMService
from pipecat.services.xai.tts import XAIHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = XAIHttpTTSService(
api_key=os.getenv("XAI_API_KEY"),
aiohttp_session=session,
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
llm = GrokLLMService(
api_key=os.getenv("XAI_API_KEY"),
settings=GrokLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -65,10 +65,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -102,9 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -65,10 +65,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -102,9 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,24 +54,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -106,9 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,25 +55,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
# Uses local VAD by default.
# To enable server-side VAD, set turn_detection=None or
# a dict with server_vad settings.
# turn_detection={"type": "server_vad", "threshold": 0.5},
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -108,9 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -57,9 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
timestamp = int(time.time())
@@ -67,9 +65,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
settings=OpenPipeLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -103,9 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,17 +59,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = XTTSService(
aiohttp_session=session,
settings=XTTSService.Settings(
voice="Claribel Dervla",
),
voice_id="Claribel Dervla",
base_url="http://localhost:8000",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -104,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
@@ -58,7 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
settings=GladiaSTTService.Settings(
params=GladiaInputParams(
language_config=LanguageConfig(
languages=[Language.EN],
),
@@ -68,19 +68,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
context = LLMContext()
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
@@ -114,9 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
@@ -57,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
settings=GladiaSTTService.Settings(
params=GladiaInputParams(
language_config=LanguageConfig(
languages=[Language.EN],
)
@@ -66,19 +66,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
context = LLMContext()
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,9 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,18 +54,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = LmntTTSService(
api_key=os.getenv("LMNT_API_KEY"),
settings=LmntTTSService.Settings(
voice="morgan",
),
)
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -99,9 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,10 +56,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqLLMService.Settings(
model="llama-3.1-8b-instant",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="meta-llama/llama-4-maverick-17b-128e-instruct",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
@@ -95,9 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -95,16 +95,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
)
# Create Strands agent processor
try:
agent = build_agent(model_id="us.anthropic.claude-sonnet-4-6", max_tokens=8000)
agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
llm = StrandsAgentsProcessor(agent=agent)
logger.info("Successfully created Strands agent for NAB customer service coaching")
except Exception as e:
@@ -151,8 +148,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
LLMMessagesAppendFrame(
messages=[
{
"role": "developer",
"content": f"Greet the user and introduce yourself. Don't use emojis.",
"role": "user",
"content": f"Greet the user and introduce yourself.",
}
],
run_llm=True,

View File

@@ -54,20 +54,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
)
llm = AWSBedrockLLMService(
aws_region="us-west-2",
settings=AWSBedrockLLMService.Settings(
model="us.anthropic.claude-sonnet-4-6",
temperature=0.8,
# system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -101,9 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -70,27 +70,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower,
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -124,9 +118,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,17 +54,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot with Gemini TTS")
stt = GoogleSTTService(
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
tts = GeminiTTSService(
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GeminiTTSService.Settings(
model="gemini-2.5-flash-tts",
voice="Charon",
model="gemini-2.5-flash-tts",
voice_id="Charon",
params=GeminiTTSService.InputParams(
language=Language.EN_US,
prompt="You are a helpful AI assistant. Speak in a natural, conversational tone.",
),
@@ -73,8 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
settings=GoogleLLMService.Settings(
system_instruction="""You are a helpful assistant in a voice conversation.
system_instruction="""You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
- [sigh] - Insert a sigh sound
@@ -91,8 +88,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
- "[whispering] Let me tell you a secret."
- "The answer is... [long pause] ...42!"
Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise. Respond to what the user said in a creative and helpful way.""",
),
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
)
context = LLMContext()
@@ -128,7 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Kick off the conversation
context.add_message(
{
"role": "developer",
"role": "system",
"content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
}
)

View File

@@ -54,31 +54,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleHttpTTSService(
settings=GoogleHttpTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleHttpTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -112,9 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,31 +54,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleTTSService(
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -112,9 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -22,6 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
@@ -93,13 +94,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
vad_force_turn_endpoint=False, # Use AssemblyAI's built-in turn detection
settings=AssemblyAISTTService.Settings(
model="u3-rt-pro",
connection_params=AssemblyAIConnectionParams(
speech_model="u3-rt-pro",
# Optional: Tune turn detection timing (defaults shown below)
# min_turn_silence=100, # Default
# max_turn_silence=1000, # Default
# Optional: Boost accuracy for specific names/terms
# keyterms_prompt=["Xiomara", "Saoirse", "Krzystof", "API", "OAuth"],
# prompt="Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth.",
# Optional: Enable speaker diarization
# speaker_labels=True,
),
@@ -107,16 +108,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -153,9 +150,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,16 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -102,9 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -84,17 +84,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -104,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=KrispVivaTurn())]
),
vad_analyzer=SileroVADAnalyzer(), # or KrispVivaVadAnalyzer
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -134,9 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -58,18 +58,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-helios-en",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -103,9 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,19 +60,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
settings=RimeHttpTTSService.Settings(
voice="luna",
model="arcana",
),
voice_id="luna",
model="arcana",
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -107,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
settings=RimeTTSService.Settings(
voice="luna",
),
voice_id="luna",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -99,9 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,10 +56,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = NvidiaLLMService(
api_key=os.getenv("NVIDIA_API_KEY"),
settings=NvidiaLLMService.Settings(
model="meta/llama-3.3-70b-instruct",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
model="meta/llama-3.3-70b-instruct",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
@@ -95,9 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -48,7 +48,7 @@ load_dotenv(override=True)
marker = "|----|"
system_message = f"""
You are a helpful LLM in a voice call. Your goals are to be helpful and brief in your responses.
You are a helpful LLM in a WebRTC call. Your goals are to be helpful and brief in your responses.
You are expert at transcribing audio to text. You will receive a mixture of audio and text input. When
asked to transcribe what the user said, output an exact, word-for-word transcription.
@@ -216,24 +216,31 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
system_instruction=system_message,
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
)
tts = GoogleTTSService(
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
context = LLMContext()
messages = [
{
"role": "system",
"content": system_message,
},
{
"role": "user",
"content": "Start by saying hello.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -269,9 +276,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -57,16 +57,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = FishAudioTTSService(
api_key=os.getenv("FISH_API_KEY"),
settings=FishAudioTTSService.Settings(
voice="4ce7e917cedd4bc2bb2e6ff3a46acaa1", # Barack Obama
),
model="4ce7e917cedd4bc2bb2e6ff3a46acaa1", # Barack Obama
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -100,9 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,17 +60,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = NeuphonicHttpTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
settings=NeuphonicHttpTTSService.Settings(
voice="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -105,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = NeuphonicTTSService(
api_key=os.getenv("NEUPHONIC_API_KEY"),
settings=NeuphonicTTSService.Settings(
voice="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
),
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -99,9 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -7,7 +7,6 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
@@ -54,70 +53,62 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = FalSTTService(
api_key=os.getenv("FAL_KEY"),
aiohttp_session=session,
)
stt = FalSTTService(
api_key=os.getenv("FAL_KEY"),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -44,16 +44,12 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -82,7 +78,7 @@ async def main():
),
)
context.add_message({"role": "developer", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner()

View File

@@ -63,16 +63,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("MINIMAX_API_KEY", ""),
group_id=os.getenv("MINIMAX_GROUP_ID", ""),
aiohttp_session=session,
settings=MiniMaxHttpTTSService.Settings(
language=Language.EN,
),
params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -107,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.llm import SarvamLLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamHttpTTSService
from pipecat.transcriptions.language import Language
@@ -59,24 +59,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamSTTService.Settings(
model="saarika:v2.5",
),
model="saarika:v2.5",
)
tts = SarvamHttpTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
aiohttp_session=session,
settings=SarvamHttpTTSService.Settings(
language=Language.EN_IN,
),
params=SarvamHttpTTSService.InputParams(language=Language.EN),
)
llm = SarvamLLMService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -111,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -21,7 +21,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.llm import SarvamLLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -54,23 +54,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamSTTService.Settings(
model="saaras:v3",
),
model="saarika:v2.5",
)
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamTTSService.Settings(
model="bulbul:v3",
voice="shubh",
),
model="bulbul:v2",
voice_id="manisha",
)
llm = SarvamLLMService(
api_key=os.getenv("SARVAM_API_KEY"),
settings=SarvamLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -96,7 +90,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
allow_interruptions=True,
),
)
@@ -104,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
# Optionally, you can wait for 30 seconds and then change the voice.

View File

@@ -24,7 +24,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.services.soniox.stt import SonioxInputParams, SonioxSTTService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,9 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = SonioxSTTService(
api_key=os.getenv("SONIOX_API_KEY"),
settings=SonioxSTTService.Settings(
# Add language hints to use a specific language
# Add strict mode to enforce the language hints
params=SonioxInputParams(
language_hints=[Language.EN],
language_hints_strict=True,
),
@@ -63,16 +61,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -105,9 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -58,18 +58,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = InworldHttpTTSService(
api_key=os.getenv("INWORLD_API_KEY", ""),
aiohttp_session=session,
streaming=True,
settings=InworldHttpTTSService.Settings(
voice="Ashley",
),
voice_id="Ashley",
model="inworld-tts-1",
# Set to False for non-streaming mode or True for streaming mode.
streaming=True,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
),
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
)
context = LLMContext()
@@ -111,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -10,7 +10,8 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.frames.frames import LLMRunFrame, TTSTextFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -24,6 +25,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inworld.tts import InworldTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -54,17 +56,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = InworldTTSService(
api_key=os.getenv("INWORLD_API_KEY", ""),
settings=InworldTTSService.Settings(
voice="Ashley",
temperature=1.1,
),
voice_id="Ashley",
model="inworld-tts-1",
temperature=1.1,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
),
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
)
context = LLMContext()
@@ -91,6 +90,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[
DebugLogObserver(
frame_types={
TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
}
),
],
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@@ -98,9 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info("Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,17 +60,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AsyncAIHttpTTSService(
api_key=os.getenv("ASYNCAI_API_KEY", ""),
settings=AsyncAIHttpTTSService.Settings(
voice="e0f39dc4-f691-4e78-bba5-5c636692cc04",
),
voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -105,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -57,16 +57,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AsyncAITTSService(
api_key=os.getenv("ASYNCAI_API_KEY", ""),
settings=AsyncAITTSService.Settings(
voice="e0f39dc4-f691-4e78-bba5-5c636692cc04",
),
voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -100,9 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -77,16 +77,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -128,9 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
await audiobuffer.start_recording()
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@audiobuffer.event_handler("on_audio_data")

View File

@@ -59,16 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = HumeTTSService(
api_key=os.getenv("HUME_API_KEY"),
# Replace with your Hume voice ID
settings=HumeTTSService.Settings(
voice="f898a92e-685f-43fa-985b-a46920f0650b",
),
voice_id="f898a92e-685f-43fa-985b-a46920f0650b",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -113,9 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"💡 Word timestamps are enabled! Watch the console for TTSTextFrame logs showing each word with its PTS."
)
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,24 +55,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GradiumSTTService(
api_key=os.getenv("GRADIUM_API_KEY"),
api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
settings=GradiumSTTService.Settings(
params=GradiumSTTService.InputParams(
language=Language.EN,
),
)
tts = GradiumTTSService(
api_key=os.getenv("GRADIUM_API_KEY"),
voice_id="YTpq7expH9539ERJ",
url="wss://us.api.gradium.ai/api/speech/tts",
settings=GradiumTTSService.Settings(
voice="YTpq7expH9539ERJ",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -106,9 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CambTTSService(
api_key=os.getenv("CAMB_API_KEY"),
settings=CambTTSService.Settings(
model="mars-flash",
),
model="mars-flash",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful voice assistant powered by Camb AI text-to-speech. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise.",
),
system_instruction="You are a helpful voice assistant powered by Camb AI text-to-speech. ",
)
context = LLMContext()
@@ -99,9 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info("Client connected")
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 2024-2026, Daily
# Copyright (c) 20242026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -21,9 +21,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.services.hathora.stt import HathoraSTTService
from pipecat.services.hathora.tts import HathoraTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -51,24 +51,24 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
stt = HathoraSTTService(
model="nvidia-parakeet-tdt-0.6b-v3",
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
tts = HathoraTTSService(
model="hexgrad-kokoro-82m",
)
# See https://models.hathora.dev/model/qwen3-30b-a3b
llm = OpenAILLMService(
base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
api_key=os.getenv("HATHORA_API_KEY"),
model=None,
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
@@ -77,11 +77,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
context_aggregator.assistant(), # Assistant spoken responses
]
)
@@ -98,9 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,17 +54,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = PiperTTSService(
settings=PiperTTSService.Settings(
voice="en_US-ryan-high",
),
)
tts = PiperTTSService(voice_id="en_US-ryan-high")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -98,9 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,17 +54,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = KokoroTTSService(
settings=KokoroTTSService.Settings(
voice="af_heart",
),
)
tts = KokoroTTSService(voice_id="af_heart")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -98,9 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -30,20 +30,24 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
@@ -55,16 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ResembleAITTSService(
api_key=os.getenv("RESEMBLE_API_KEY"),
settings=ResembleAITTSService.Settings(
voice=os.getenv("RESEMBLE_VOICE_UUID"),
),
voice_id=os.getenv("RESEMBLE_VOICE_UUID"),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -98,9 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -1,122 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.smallest.tts import SmallestTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
)
tts = SmallestTTSService(
api_key=os.getenv("SMALLEST_API_KEY"),
settings=SmallestTTSService.Settings(
voice="sophia",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -95,16 +95,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
@@ -141,9 +137,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,7 +10,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,6 +19,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -27,11 +28,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
from pipecat.turns.user_turn_strategies import (
UserTurnStrategies,
default_user_turn_start_strategies,
)
load_dotenv(override=True)
@@ -56,49 +52,31 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramSTTService.Settings(
keyterm=["pipecat"],
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
)
hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
start=[
WakePhraseUserTurnStartStrategy(
phrases=["pipecat"],
# Timeout before wake phrase must be spoken again
timeout=5.0,
),
*default_user_turn_start_strategies(),
]
),
vad_analyzer=SileroVADAnalyzer(),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
stt, # STT
hey_robot_filter, # Filter out speech not directed at the robot
user_aggregator, # User responses
llm, # LLM
tts, # TTS
@@ -121,7 +99,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
{
"role": "system",
"content": "Please introduce yourself. Tell the user they should say 'Hey Robot' before talking to you.",
}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -106,16 +106,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
context = LLMContext()

View File

@@ -1,139 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
if not runner_args.body:
script_dir = os.path.dirname(__file__)
runner_args.body = {
"image_path": os.path.join(script_dir, "assets", "cat.jpg"),
"question": "Describe this image",
}
image_path = runner_args.body["image_path"]
question = runner_args.body["question"]
# Kick off the conversation.
image = Image.open(image_path)
message = await LLMContext.create_image_message(
image=image.tobytes(),
format="RGB",
size=image.size,
text=question,
)
context.add_message(message)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -53,16 +53,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
context = LLMContext()

View File

@@ -53,16 +53,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
settings=AnthropicLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
context = LLMContext()

View File

@@ -53,18 +53,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AWSBedrockLLMService(
aws_region="us-west-2",
settings=AWSBedrockLLMService.Settings(
model="us.anthropic.claude-sonnet-4-6",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
),
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
# Note: usually, prefer providing latency="optimized" param.
# Here we can't because AWS Bedrock doesn't support it for Claude 3.7,
# which we need for image input.
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
context = LLMContext()

View File

@@ -53,16 +53,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
context = LLMContext()

View File

@@ -42,9 +42,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
vision = MoondreamService()

View File

@@ -13,7 +13,6 @@ from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.vad_processor import VADProcessor
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
@@ -41,12 +40,15 @@ class TranscriptionLogger(FrameProcessor):
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
@@ -57,9 +59,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = WhisperSTTService()
tl = TranscriptionLogger()
vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
pipeline = Pipeline([transport.input(), vad_processor, stt, tl])
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(
pipeline,

View File

@@ -15,7 +15,6 @@ from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.audio.vad_processor import VADProcessor
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper.stt import WhisperSTTService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
@@ -41,15 +40,15 @@ async def main():
transport = LocalAudioTransport(
LocalAudioTransportParams(
audio_in_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
)
)
stt = WhisperSTTService()
tl = TranscriptionLogger()
vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
pipeline = Pipeline([transport.input(), vad_processor, stt, tl])
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)

Some files were not shown because too many files have changed in this diff Show More