Compare commits

...

658 Commits

Author SHA1 Message Date
James Hush
b6da5c18b7 Add changelog for #4389 2026-04-30 14:38:30 +08:00
James Hush
4b6881b81d fix(aws): surface fatal errors on missing/invalid credentials
AWS services were silently failing on bad credentials. Nova Sonic was the
worst offender: no audio, no clear log, and an InvalidStateError from
awscrt at shutdown that masked the real cause.

Changes:
- Nova Sonic: connect failure now pushes a fatal ErrorFrame with a
  "check AWS credentials and region" hint. _disconnect wraps stream and
  session-end cleanup so a partially-initialized stream no longer raises
  InvalidStateError on top of the real error.
- Bedrock LLM and Polly TTS: branch on botocore ClientError. Auth-class
  codes (UnrecognizedClientException, InvalidSignatureException,
  AccessDeniedException, ExpiredTokenException, InvalidAccessKeyId,
  SignatureDoesNotMatch, MissingAuthenticationTokenException, AuthFailure)
  push fatal errors. Other client errors stay non-fatal (transient).
- Transcribe STT: _connect_websocket catch-all is now fatal, since
  presigned URL and websocket connect failures don't recover on retry.
2026-04-30 14:36:44 +08:00
Mark Backman
bfdd19464f Merge pull request #4385 from pipecat-ai/mb/runner-session-id
feat(runner): add session_id to RunnerArguments
2026-04-29 13:17:47 -04:00
Mark Backman
1a93ff52f1 Merge pull request #4386 from pipecat-ai/mb/update-soniox-model
feat(soniox): update default TTS model to tts-rt-v1
2026-04-29 13:17:09 -04:00
Mark Backman
6e2008a7a6 Add changelog for #4386 2026-04-29 11:09:38 -04:00
Mark Backman
da8d3a2d80 feat(soniox): update default TTS model to tts-rt-v1
Promotes the Soniox TTS default model from `tts-rt-v1-preview` to the
generally available `tts-rt-v1`.
2026-04-29 11:05:12 -04:00
Mark Backman
6b608e7e22 Add changelog for #4385 2026-04-29 09:53:42 -04:00
Mark Backman
924b9a9d8c feat(runner): add session_id to RunnerArguments
Adds a `session_id: str | None` field to `RunnerArguments` so bots can
log/trace a per-session identifier in local development the same way
they can in Pipecat Cloud (where it is provided via the
`x-daily-session-id` header).

The local runner now mints a UUID at every `*RunnerArguments`
construction site. For paths that already returned a `sessionId` to the
caller (Daily `/start`, dial-in webhook), a single UUID is now generated
and shared between `runner_args.session_id` and the response body
instead of being thrown away. The SmallWebRTC `/api/offer` endpoint
accepts an optional `session_id` so the `/sessions/{session_id}/...`
proxy can thread it through.

This is the prerequisite step for collapsing pipecat-cloud's
`SessionArguments` / `*SessionArguments` hierarchy onto the upstream
runner types.
2026-04-29 09:45:55 -04:00
Aleix Conchillo Flaqué
9411c4b67e Merge pull request #4382 from pipecat-ai/aleix/unfill-changelog-script
chore(changelog): add release-changelog.py and fix (PR line indentation in towncrier template
2026-04-28 13:18:49 -07:00
Mark Backman
ac5eb97670 Merge pull request #4384 from pipecat-ai/mb/nvidia-remove-riva-ref
Update README to remove NVIDIA references to RIVA
2026-04-28 13:18:36 -04:00
Mark Backman
3034f8bb3b Update README to remove NVIDIA references to RIVA 2026-04-28 12:42:58 -04:00
Aleix Conchillo Flaqué
60c66eda48 chore(towncrier): indent (PR ref line by two spaces in template
So the rendered changelog has the (PR [...]) line aligned as a list
continuation under its bullet. Verified with both short and wrapped
entries via `towncrier build --draft`.
2026-04-27 15:07:53 -07:00
Aleix Conchillo Flaqué
ea3585146c chore(scripts): add release-changelog.py
Adds a script to unfill (single-line) entry paragraphs in CHANGELOG.md
while keeping `(PR [...])` on its own continuation line.
2026-04-27 15:07:53 -07:00
Aleix Conchillo Flaqué
9697abe559 Merge pull request #4381 from pipecat-ai/changelog-1.1.0
Release 1.1.0 - Changelog Update
2026-04-27 14:02:20 -07:00
aconchillo
cb0335c82a Update changelog for version 1.1.0 2026-04-27 13:59:17 -07:00
Aleix Conchillo Flaqué
f560614af9 Merge pull request #4379 from pipecat-ai/aleix/bump-daily-python-0.28
chore(daily): bump daily-python to ~=0.28.0
2026-04-27 13:46:00 -07:00
Aleix Conchillo Flaqué
d7a196a3f4 docs(changelog): add entry for daily-python 0.28.0 bump 2026-04-27 13:35:14 -07:00
Aleix Conchillo Flaqué
644e106c03 chore(daily): bump daily-python to ~=0.28.0 2026-04-27 13:35:14 -07:00
Mark Backman
70f83b4a75 Merge pull request #4360 from pipecat-ai/mb/soniox-tts
Add Soniox real-time TTS service
2026-04-27 16:06:24 -04:00
Mark Backman
35ed37c539 chore: add changelog fragment for PR #4360 2026-04-27 16:04:02 -04:00
Mark Backman
58a038ddb2 Add Soniox real-time TTS service
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
2026-04-27 16:04:02 -04:00
Aleix Conchillo Flaqué
de3c1d6e8b Merge pull request #4370 from pipecat-ai/aleix/daily-screen-video-destination
feat(daily): support screenVideo destination and configurable camera send settings
2026-04-27 11:36:22 -07:00
Aleix Conchillo Flaqué
0a9878998f docs(changelog): add entries for camera_out_send_settings and video_out_bitrate deprecation 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
8459c01af8 feat(daily): add camera_out_send_settings and deprecate video_out_bitrate
Replaces the hardcoded camera publishing send settings in
DailyTransport with a new DailyParams.camera_out_send_settings dict that
applications can pass through verbatim to the Daily client. This makes
the encoding/codec/bitrate configuration user-controllable instead of
being driven solely by the generic TransportParams fields.

As a consequence, TransportParams.video_out_bitrate is deprecated for
the Daily transport (now configured via camera_out_send_settings) and
its default is changed to None.
2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
baaabf7d73 docs(changelog): add entry for screenVideo destination support 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
4735b74776 feat(daily): support screenVideo destination for video output
Adds a dedicated screen video track alongside the existing camera track
so applications can publish to Daily's built-in "screenVideo" destination
via video_out_destinations. The track is created at join time and wired
into the client settings (inputs and publishing) when "screenVideo" is
configured; write_video_frame routes frames to the appropriate track
based on the frame's transport_destination.
2026-04-27 11:28:59 -07:00
kompfner
0109aea04c Merge pull request #4377 from pipecat-ai/pk/add-example-for-tool-resources
Add example demonstrating usage of `tool_resources`
2026-04-27 13:02:39 -04:00
kompfner
ce1311f6ba Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle
Fail missing tool calls cleanly
2026-04-27 11:54:43 -04:00
Paul Kompfner
2520243d9d style: apply ruff format 2026-04-27 11:48:27 -04:00
borislav
8869e25142 fix: compare bound method by equality, not identity
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.

Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
2026-04-27 17:34:31 +02:00
borislav
822392b0d4 fix: re-resolve registry item at execution time
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
2026-04-27 17:22:30 +02:00
Paul Kompfner
124863175a Add example demonstrating usage of tool_resources 2026-04-27 11:20:53 -04:00
kompfner
17a5e78fb4 Merge pull request #4376 from pipecat-ai/pk/add-openai-responses-to-readme
Add OpenAI responses to readme
2026-04-27 11:02:39 -04:00
kompfner
bc29bdb95e Merge pull request #4371 from Stoic-Angel/feat-global-context
Add a global context for tool calls: tool_resources
2026-04-27 10:55:03 -04:00
Paul Kompfner
005fe33b25 Update docs URLs in README to reflect new docs site structure and avoid redirects 2026-04-27 10:22:49 -04:00
Paul Kompfner
24154474c9 Add OpenAI Responses to the README's list of LLM services 2026-04-27 10:19:13 -04:00
kompfner
86effc4d10 Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation
feat(nova-sonic): add proactive session continuation for conversation…
2026-04-27 09:36:48 -04:00
Mark Backman
58e50882d8 Merge pull request #4374 from pipecat-ai/mb/fix-daily-runner-room-props
Expire runner-created Daily rooms after 4h
2026-04-27 09:07:31 -04:00
Mark Backman
ef183d0c96 Add changelog for #4374 2026-04-27 09:00:17 -04:00
Mark Backman
f078df7805 runner: expire Daily rooms after 4h to mirror Pipecat Cloud session limit
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.

Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
2026-04-27 09:00:17 -04:00
Mark Backman
815cd44c2a Merge pull request #4372 from pipecat-ai/mb/relax-frames-proto-5x
Relax protobuf pin to support both 5.x and 6.x runtimes
2026-04-27 08:58:23 -04:00
Garegin Harutyunyan
e5941926be Krisp tt demo tool (#4335)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* TT demo tool

* Some improvements for demo scripts, audio recordin, etc.

* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.

* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.

* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.

* Add audio resampling functionality and update demo scripts for improved audio processing

- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.

* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic

- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.

* Refactor audio processing scripts for improved readability and consistency

- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.

* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity

- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.

* more format fixes.

* removed demo scripts.

* reverted wrongly removed file.

* Corrected the IP integration logic.

* style fix.

* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy

- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.

* FIxed formatting

---------

Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
2026-04-27 08:14:00 -04:00
Mark Backman
6266c026a6 Merge pull request #4362 from ai-coustics/ai-coustics/aic-sdk-py-v2.2.0-update
Update aic-sdk to v2.2.0
2026-04-25 06:51:41 -04:00
Gökmen Görgen
e25dccfc6b update aic-sdk to ~=2.2.0 and rename AICOUSTICS_LICENSE_KEY to AIC_LICENSE_KEY. 2026-04-25 10:13:06 +02:00
Gökmen Görgen
3bbfc42854 remove adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 10:05:47 +02:00
Gökmen Görgen
3b2127f912 rename environment variables and references from AICOUSTICS to AIC. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
ea12b10742 rename mcp-aic-adaptive.py to mcp-aicoustics-adaptive.py. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
a2fbed86cf add adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
f75f361629 bump aic-sdk to 2.2.0 and update AICFilter with model_id and enhancement_level changes. 2026-04-25 09:51:23 +02:00
Mark Backman
4c153e5d3c Add changelog for #4372 2026-04-24 21:20:46 -04:00
Mark Backman
4088992d97 Relax protobuf pin to support both 5.x and 6.x runtimes
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.

Changes:

- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
  Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
  and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
  install `pipecat[dev,nvidia]` without resolver conflict. Comment in
  `frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
  compensates for nvidia-riva-client's missing `protobuf` install
  requirement (upstream packaging gap, see
  https://github.com/nvidia-riva/python-clients/issues/172). When that
  issue is resolved, the explicit protobuf entry in the `nvidia` extra
  can be removed.

Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
2026-04-24 21:15:32 -04:00
Osman Ipek
f1b16a672a feat(nova-sonic): add proactive session continuation for conversations >8min
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.

Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
  session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
  buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s

Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
  (independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
  to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
  optional stream/prompt_name params, eliminating duplicate SC methods

Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
2026-04-24 14:55:55 -07:00
Aayush Jain
65b15a8528 add changelog 2026-04-25 02:23:25 +05:30
Aayush Jain
108e32eb72 Add a global context for tool calls - tool_resources, as a parameter to PipelineTask and FrameProcessorSetup 2026-04-25 02:12:40 +05:30
Filipi da Silva Fuchter
38a02271c5 Merge pull request #4368 from pipecat-ai/filipi/stt_service
Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.
2026-04-24 14:31:36 -03:00
filipi87
2ce203aeb8 Renaming the method to _maybe_reconnect_on_user_stopped_speaking. 2026-04-24 13:08:32 -03:00
filipi87
b30df95f13 Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService. 2026-04-24 13:00:38 -03:00
kompfner
6be8deee2a Merge pull request #4361 from pipecat-ai/pk/pyright-fixes
Some pyright fixes
2026-04-24 11:58:28 -04:00
Paul Kompfner
c113cacd59 refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.

Introduce two module-private cast helpers in open_ai_adapter.py:

- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)

Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.

No behavior change, no pyright error delta.
2026-04-24 10:10:03 -04:00
Paul Kompfner
d0495eeef6 fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.

Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
2026-04-23 21:08:47 -04:00
Paul Kompfner
c3eb69165c fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.

Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):

- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
  temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
  max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.

Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.

Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
2026-04-23 18:32:46 -04:00
Paul Kompfner
0302f6d05c chore(pyright): drop newly-clean files from ignore list
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).

- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
2026-04-23 18:18:00 -04:00
Paul Kompfner
b9ff333654 fix(types): admit None on settings fields that accept it as a default
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.

Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.

Fields touched across 11 files:

- services/settings.py: TTSSettings.voice (base class; covers
  asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
  neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
  style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
  max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
  split_sentences, enable_diarization, speaker_*, max_speakers,
  prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.

Clears 94 pyright errors (1035 -> 941).
2026-04-23 18:18:00 -04:00
Paul Kompfner
92610944af chore(pyright): drop newly-clean files from ignore list
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).

- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
2026-04-23 17:44:17 -04:00
Paul Kompfner
6a337f1bc6 fix(types): assert_given at store-mode settings read sites
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).

Two idioms used:

- Inline wrap for single uses:
    func(assert_given(self._settings.enable_prompt_caching), ...)

- Extract-and-reuse when the same value is used multiple times:
    thinking = assert_given(self._settings.thinking)
    if thinking:
        params["thinking"] = thinking.model_dump(...)

43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter
ef7fa07bf7 Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU
2026-04-23 17:49:18 -03:00
filipi87
ce1506792e Linking to the docs instead of full explanation. 2026-04-23 17:46:54 -03:00
Paul Kompfner
70f3d32734 feat(types): add assert_given for narrowing store-mode settings reads
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.

assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.

    resolved_model = assert_given(self._settings.model)  # str | None
2026-04-23 16:40:07 -04:00
Paul Kompfner
356618b448 fix(types): use is_given at call sites pyright flagged
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.

- adapters/services/anthropic_adapter.py: narrow converted.system for
  _resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
  is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
  is_given (aliased as openai_is_given alongside the existing
  settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
2026-04-23 16:15:07 -04:00
Paul Kompfner
1624d7a474 feat(types): add is_given TypeGuard helpers for NotGiven sentinels
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.

Each helper is co-located with the NotGiven flavor it guards:

- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
  LLMContext's NotGiven. Treat LLMContext's re-exported types
  (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
  LLMContext's own — independent definitions that happen to coincide
  with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
  NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
  NotGiven.
2026-04-23 15:33:43 -04:00
Paul Kompfner
092b1dcb0f fix(types): widen TLLMInvocationParams bound to Mapping[str, Any]
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
2026-04-23 14:35:59 -04:00
Mark Backman
b90ea9bf6a Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file
More pyright fixes
2026-04-23 14:14:36 -04:00
kompfner
05c97804d5 Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename
chore: rebind Gemini Live reconnect changelog fragment to PR #4355
2026-04-23 14:10:36 -04:00
Paul Kompfner
7a8357a569 chore: rebind Gemini Live reconnect changelog fragment to PR #4355
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
2026-04-23 12:00:56 -04:00
filipi87
44756de15a Adding changelog for the SmallWebRTC fix. 2026-04-23 12:19:56 -03:00
filipi87
94304ec74e Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU. 2026-04-23 12:18:33 -03:00
kompfner
a3fe34f4a2 Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect
Re-seed Gemini Live context on reconnect without session resumption
2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy
21f6c2afa5 Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269)
* Update NVIDIA STT services for Nemotron Speech defaults and config parity

* Add changelog entry for PR #4269

* initialize boosted LM settings defaults in streaming STT

* Align NVIDIA STT language handling with other STT services

* add finalised flag to Nvidia stt final transcripts, remove processing latency logs

* Changing interim transcription logging to tracing.

---------

Co-authored-by: sathwika <geereddysath@nvidia.com>
Co-authored-by: filipi87 <filipi87@gmail.com>
2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter
4d14251f4a Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces
feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up
2026-04-23 08:49:26 -03:00
Paul Kompfner
1421c4ba22 fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.

On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
2026-04-22 15:58:54 -04:00
filipi87
6b1d8d9fa5 Fixing merge conflicts. 2026-04-22 15:22:32 -03:00
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
filipi87
bba7ca80e3 Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting. 2026-04-22 15:20:37 -03:00
filipi87
79250f1fe0 Making includes_inter_frame_spaces optional for word-timestamp. 2026-04-22 14:20:30 -03:00
Mark Backman
4f6e76e6fd Add changelog entries for #4352 2026-04-22 12:23:33 -04:00
Mark Backman
b0962861c8 Acknowledge Tkinter's GC-reference idiom with a scoped type ignore
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).

Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
2026-04-22 12:19:16 -04:00
Mark Backman
ec7c35fe98 Move Mistral message fixups into MistralLLMAdapter
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.

Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.

No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
2026-04-22 12:17:46 -04:00
Mark Backman
10b86b4bbe Coerce inspect.getdoc() None to empty string before parsing
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.

Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
2026-04-22 12:01:00 -04:00
Mark Backman
8ec56092c0 Remove duplicate ResponseCreated type 2026-04-22 11:58:15 -04:00
Mark Backman
0c3c5e5c7d Widen ToolsSchema.standard_tools to Sequence for covariance
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.

Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.

Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.

This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
2026-04-22 11:54:20 -04:00
Mark Backman
b64ed3f9e2 Narrow settings.model at service boundaries, not via truthiness
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.

- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
  narrowing with an empty-string fallback. Equivalent runtime behavior,
  honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
  before handing it to `_validate_model(str)`. A non-string model at
  this point is a configuration bug; raise `ValueError` so the error
  is clear and survives `python -O` (unlike an assert).
2026-04-22 11:52:20 -04:00
Mark Backman
5872006d6b Encode lazy-init invariants at the right site, not at read sites
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:

- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
  session timeout as an argument. The task-creation site already
  guards on truthiness, so the call can pass the non-None value
  directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
  of asserting. Asserts are stripped under `python -O`; a real raise
  keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
  initialized stream. Callers use the return value and never touch
  the Optional `_soxr_stream` attribute, so narrowing stays inside
  the init method where the invariant is established.
2026-04-22 11:45:18 -04:00
Mark Backman
457eb7aa92 Mark abstract image/vision generators as real async generators
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.

Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
2026-04-22 11:19:23 -04:00
Mark Backman
14cd476b20 Drop pyright ignores for services fixed by run_stt/run_tts widening
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
2026-04-22 11:09:27 -04:00
Mark Backman
3b0affe5b4 Guard run_stt WebSocket sends with try/except
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.

Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
2026-04-22 11:03:41 -04:00
Mark Backman
08fe9157cc Widen run_stt/run_tts return type to AsyncGenerator[Frame | None, None]
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.

Pure annotation change; no runtime behavior difference.
2026-04-22 11:01:50 -04:00
Mark Backman
3f3d3c9203 Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy
Split user-turn stop timeout into independent speech and STT timers
2026-04-22 10:23:03 -04:00
Mark Backman
6b6896a543 Merge pull request #4350 from pipecat-ai/mb/pyright-precise-ignore-list
Expand pyright coverage to full src/pipecat with per-file ignores
2026-04-22 09:56:59 -04:00
Filipi da Silva Fuchter
7858813871 Merge pull request #4270 from sathwikareddy02/nvidia-llm-update
Enhance NVIDIA LLM reasoning tokens handling and allow keyless local …
2026-04-22 10:47:54 -03:00
Mark Backman
7bba74ebd6 Expand pyright coverage to full src/pipecat with per-file ignores
Previously, six modules (adapters, audio, processors, serializers,
services, transports) were ignored wholesale. Many files in those
modules already pass type checking, but we had no way to protect them
from regressions or make the remaining work visible.

Switch the include list to src/pipecat so any new module is checked by
default, and replace directory-level ignores with the 140 specific
files that still fail. This puts 189 previously-untyped files under
type checking immediately and turns the remaining work into a concrete,
shrinking TODO list.
2026-04-22 09:45:31 -04:00
Mark Backman
f425e946eb Merge pull request #4349 from pipecat-ai/mb/serializer-pyright
Fix type errors in serializers and add to pyright checked set
2026-04-22 09:43:31 -04:00
Filipi da Silva Fuchter
75bd1b5b9b Merge pull request #4323 from dakshdua/daksh/allow-noninitial-whitespace-chunks
fix: when aggregating by tokens, allow inter-token whitespace once non-whitespace has been sent
2026-04-22 10:27:08 -03:00
filipi87
d953c201bd Adding changelog entry to the fix. 2026-04-22 10:24:21 -03:00
Mark Backman
263cad41f0 Add changelog for #4349 2026-04-21 18:14:15 -04:00
Mark Backman
df9642eb5a Fix type errors in serializers and add to pyright checked set
Moves src/pipecat/serializers into pyright's include list. Narrows
self._params to each subclass's InputParams in exotel, vonage, plivo,
twilio, genesys, and telnyx. In protobuf.py, renames the reassigned
frame local to avoid clobbering its Frame type and silences two dynamic
attribute accesses on the generated frames_pb2 module.

Also aligns telnyx and plivo hangup validation with twilio: if
auto_hang_up=True (the default) but required credentials are missing,
__init__ now raises ValueError instead of silently logging a warning
at call-end time. Previously a misconfigured serializer would construct
fine and fail to hang up the call later, leaving a phantom billable
session.
2026-04-21 18:12:54 -04:00
Mark Backman
dcbe86d0fc Unify fallback timeout into the user-speech timer
Collapse the separate fallback timer into the existing user_speech_timeout
timer, restarted when a transcript arrives without a VAD stop. stt_timeout
has no meaning on the fallback path, so the stt wait is marked done
immediately. This drops the _fallback_timeout_task / _fallback_expired
bookkeeping and the branched trigger condition.
2026-04-21 17:33:12 -04:00
Mark Backman
7fc79511dd Merge pull request #4348 from pipecat-ai/mb/pyright-scripts-docs
Fix type errors in scripts and add to pyright checked set
2026-04-21 16:56:49 -04:00
Mark Backman
4d9dc64af8 Install all extras in format workflow for pyright
CI was running `uv sync --group dev` without extras. Adds daily and
tracing to extras.
2026-04-21 16:53:57 -04:00
Mark Backman
21f5cfe21a Fix type errors in utils and add to pyright checked set 2026-04-21 16:47:12 -04:00
Mark Backman
308044808d Rename to _user_speech_wait_done 2026-04-21 16:39:30 -04:00
Mark Backman
c244a950eb Add src/pipecat/tests to include list, alphabetize list 2026-04-21 16:24:53 -04:00
Mark Backman
847bd8af4b Remove src/pipecat/sync which doesn't exist 2026-04-21 16:21:46 -04:00
Mark Backman
10e58d6e42 Fix type errors in scripts and add to pyright checked set 2026-04-21 16:17:49 -04:00
Mark Backman
609a0a14e7 Merge pull request #4341 from pipecat-ai/mb/xai-tts
Add XAITTSService for xAI streaming WebSocket TTS
2026-04-21 15:52:37 -04:00
Mark Backman
84891de04d Add voice/xai-http.py to release evals 2026-04-21 15:49:59 -04:00
Mark Backman
9a49517609 Add changelog entry for #4341 2026-04-21 15:48:27 -04:00
Mark Backman
d8f5c0be71 Add XAITTSService for xAI streaming WebSocket TTS
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.

Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.

Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.

Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
2026-04-21 15:48:26 -04:00
Mark Backman
93393ea91c Merge pull request #4338 from pipecat-ai/mb/fix-examples-types
Include examples in type checking
2026-04-21 15:47:10 -04:00
Mark Backman
58a17c7b1b Include examples in type checking
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:

- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
  return type is `str` rather than `str | None`, and misconfiguration
  fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
  before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
  methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
  before use.
2026-04-21 15:43:31 -04:00
Mark Backman
103ced1eaa Merge pull request #4347 from pipecat-ai/mb/deepgram-stt-keepalive-unbound 2026-04-21 15:15:55 -04:00
Mark Backman
ac9bea27aa Merge pull request #4340 from pipecat-ai/mb/xai-stt
Add xAI streaming STT service
2026-04-21 14:52:38 -04:00
Mark Backman
648094da26 Add changelog for #4347 2026-04-21 14:51:30 -04:00
Mark Backman
29d604f608 Fix UnboundLocalError in Deepgram STT connection handler
If the WebSocket handshake is cancelled or fails before `keepalive_task`
is assigned (e.g. an STTUpdateSettingsFrame triggers a reconnect during
initial connect), the `finally` block tried to cancel an unbound local.

Initialize `keepalive_task = None` before the try and guard the cancel.
2026-04-21 14:48:55 -04:00
Mark Backman
b838bd906b Add changelog for #4340 2026-04-21 13:45:34 -04:00
Mark Backman
c091232f2f Add xAI streaming STT service
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.

- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
  `language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
  the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
  example using the new service.
2026-04-21 13:45:34 -04:00
Mark Backman
8e247f395b Merge pull request #4344 from pipecat-ai/mb/11labs-normalized-alignment 2026-04-21 13:41:04 -04:00
Mark Backman
b0e3b69b35 Merge pull request #4342 from pipecat-ai/mb/docs-workflow-label 2026-04-21 13:40:38 -04:00
kompfner
9213b22852 Merge pull request #4346 from pipecat-ai/pk/use-ExternalUserTurnStrategies-in-deepgram-flux-example
Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example
2026-04-21 13:20:27 -04:00
Paul Kompfner
81571beb1b Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example 2026-04-21 10:51:59 -04:00
Mark Backman
a07bee2318 Add changelog for #4344 2026-04-21 09:12:15 -04:00
Mark Backman
a0f79b4700 Use ElevenLabs normalized_alignment so word timestamps match spoken audio 2026-04-21 09:09:19 -04:00
Mark Backman
2c3f051a1f Merge pull request #4325 from radhikagpt1208/fix/sentry-metrics-drop-metricsframe
Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
2026-04-21 07:57:42 -04:00
Mark Backman
c1b3a9f4b5 Add pipecat label to update-docs CI workflow 2026-04-20 20:40:54 -04:00
Mark Backman
9ded7bab1b Merge pull request #4334 from dhruvladia-sarvam/feat/sarvam-stt-vad-parameters-exposed
Sarvam - VAD parameters configurable on saaras:v3
2026-04-20 16:04:23 -04:00
dhruvladia-sarvam
34fb303c44 changelog descriptions 2026-04-21 00:29:38 +05:30
dhruvladia-sarvam
2aec2467cb Deprecated InputParams fix and default model change to saaras:v3 2026-04-21 00:19:49 +05:30
Mark Backman
9d8eefd2a2 Add changelog for #4337 2026-04-20 12:02:20 -04:00
Mark Backman
b59c4775da Split user-turn stop timeout into independent speech and STT timers
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:

- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
  transcript since STT has signaled it has nothing more to send.

The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
2026-04-20 11:55:09 -04:00
Harshita Jain
03bd667f95 Fix Smallest AI TTS WebSocket endpoint URL and remove unsupported flush (#4320)
* Fix Smallest AI TTS WebSocket endpoint URL to match API documentation

Update base URL from waves-api.smallest.ai to api.smallest.ai and
fix path prefix from /api/v1/ to /waves/v1/ per the v4.0.0 docs.

* Update keepalive using silent space message instead of unsupported flush
2026-04-20 11:15:25 -04:00
Mark Backman
e8c3f73968 Merge pull request #4336 from pipecat-ai/mb/pyright-ignore-modules
Silence pyright diagnostics for unchecked modules in IDE
2026-04-20 09:15:02 -04:00
sathwika
91e5b1ad9a Handle NVIDIA LLM reasoning content in stream wrapper 2026-04-20 14:17:39 +05:30
dhruvladia-sarvam
f2a19cb1a3 Initial commit for vad parameters on saaras:v3 2026-04-20 13:52:48 +05:30
sathwika
74becffe55 add changelog 2026-04-20 11:47:20 +05:30
sathwika
995f897b80 Enhance NVIDIA LLM reasoning tokens handling and allow keyless local NIM endpoints 2026-04-20 11:47:16 +05:30
Mark Backman
74d11dc0aa Silence pyright diagnostics for unchecked modules in IDE
Pylance analyzes open files even when they're outside the `include`
set, producing noise in the editor. Adding these paths to `ignore`
suppresses diagnostics without affecting import resolution.
2026-04-19 09:19:15 -04:00
Ian Lee
b435ddfa44 feat(tts): add includes_inter_frame_spaces flag to word-timestamp API
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".

Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.

`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.

Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.

Made-with: Cursor
2026-04-18 12:03:32 -07:00
Mark Backman
6d3dfd8f64 Merge pull request #4329 from pipecat-ai/mb/resolve-krisp-warning
Silence krisp_audio import logs on auto-import
2026-04-17 18:23:01 -04:00
Mark Backman
ce9c214eec Silence krisp_audio import logs on auto-import
The two logger.error lines in krisp_instance.py fired at module-load time
whenever anything transitively imported it (e.g. pipecat.turns.user_start
pulling in krisp_viva_ip_user_turn_start_strategy), producing noisy output
for users who never asked for Krisp. Drop the log calls and raise a more
informative ImportError that names the affected classes so direct
importers still get clear guidance.
2026-04-17 18:18:33 -04:00
Mark Backman
8c8b76e9d2 Merge pull request #4326 from pipecat-ai/mb/flux-multilingual 2026-04-17 15:59:11 -04:00
denxxs
7b3141ba19 chore: update changelog fragment to PR #4328 2026-04-18 01:15:27 +05:30
denxxs
928ade993b fix: re-seed Gemini Live context on reconnect without session resumption 2026-04-18 01:14:05 +05:30
Mark Backman
42a6fc703c Address review feedback
- Fall back to Language.EN in _primary_detected_language when model is
  flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
  now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
  frames still carry Language.EN.
2026-04-17 15:38:14 -04:00
Mark Backman
c5c18335fd Merge pull request #4324 from pipecat-ai/mb/pyright-initial
Add pyright type checking: step 1
2026-04-17 14:04:35 -04:00
Mark Backman
3159503c7f Merge pull request #4327 from pipecat-ai/filipi/pyright_service_switcher
Fixing typecheck for service switcher.
2026-04-17 13:59:40 -04:00
filipi87
0340e25e9f Fixing typecheck for service switcher. 2026-04-17 12:44:57 -03:00
Mark Backman
af861b7975 Add changelog for #4326 2026-04-17 10:31:37 -04:00
Mark Backman
6bb4e8295f Add multilingual support for Deepgram Flux STT
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
2026-04-17 10:30:45 -04:00
Mark Backman
f5f92dea63 Add changelog entries and restore multi-line WhatsApp error log
Add changelog entries for the pyright introduction and the
LiveKitRunnerArguments.token signature tightening. Restore the
indented multi-line format for the WhatsApp missing-env error,
now listing only the vars that are actually missing.
2026-04-17 09:39:55 -04:00
Mark Backman
cb1463f9f1 Fix type errors in runner and add to pyright checked set
Make required parameters non-optional: LiveKitRunnerArguments.token,
_create_telephony_transport args. Use os.environ[] instead of
os.getenv() for required WhatsApp env vars. Guard spec/loader None
in module loading. Tighten sip_caller_phone guard in daily.py.
2026-04-17 09:39:55 -04:00
Garegin Harutyunyan
4c19f5584c VIVA SDK TT v3 support (#4252)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* Typo fix in voice-krisp-viva example to use KrispVivaFilter class

* style fix.

* test run error fixes.

* some test related changes.

* Fixed tests

* Stule fixes.
2026-04-17 07:53:41 -04:00
Radhika Gupta
80fecab4de Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
SentryMetrics.stop_ttfb_metrics and stop_processing_metrics called the
base FrameProcessorMetrics implementation but discarded its return
value (implicit `return None`). FrameProcessorMetrics.stop_ttfb_metrics
/ stop_processing_metrics build and return a MetricsFrame, which
FrameProcessor.stop_ttfb_metrics / stop_processing_metrics then pushes
downstream so observers (e.g. UserBotLatencyObserver,
MetricsLogObserver) can see TTFB / processing metrics.

Because SentryMetrics returned None, the FrameProcessor never pushed
the MetricsFrame, so any pipeline using metrics=SentryMetrics() on STT
/ LLM / TTS services silently lost all downstream TTFB and processing
MetricsFrames. The metrics were still calculated and logged
internally, and Sentry transactions still finished correctly, but
observers never saw them.

Forward the MetricsFrame returned by the base class so FrameProcessor
can push it into the pipeline.
2026-04-17 14:48:36 +05:30
Mark Backman
ab91047300 Fix type errors in pipeline and add to pyright checked set
Use Sequence[FrameProcessor] instead of list[FrameProcessor] in Pipeline,
ServiceSwitcher, and ServiceSwitcherStrategy parameters to accept subtype
lists. Add cast() in LLMSwitcher for narrowed return types. Guard against
None in task_observer._send_to_proxy and replace hasattr with truthiness
check in task._cleanup.
2026-04-16 21:47:11 -04:00
Mark Backman
3127cc6161 Fix type errors in turns and add to pyright checked set
Widen base strategy process_frame return types to ProcessFrameResult |
None to match actual behavior (None treated as CONTINUE). Give
UserTurnCompletionLLMServiceMixin a FrameProcessor base class so pyright
can see create_task, cancel_task, process_frame, and push_frame.
2026-04-16 21:33:43 -04:00
Mark Backman
36319ecbf0 Replace system role message
In UserTurnCompletionMixin, use a developer role message for
LLM messages following an incomplete turn
2026-04-16 21:26:08 -04:00
Mark Backman
c6a1837844 Fix type errors in extensions and add to pyright checked set
Tighten LLMMessagesAppendFrame and LLMMessagesUpdateFrame message fields
from list[dict] to list[LLMContextMessage] to match actual usage. Add
type annotations on inline message lists in IVR navigator and voicemail
detector.
2026-04-16 21:22:46 -04:00
Daksh Dua
31127abd9a Allow inter-token whitespace once non-whitespace has been sent
In token-streaming mode, _push_tts_frames previously stripped only
leading newlines and dropped any pure-whitespace frame. That silently
discarded meaningful inter-token whitespace (e.g. a standalone "\n"
token between "hello" and "world"), losing prosody cues and any
downstream sentence-boundary semantics.

Track whether a non-whitespace character has been sent in the current
context. While the flag is false, strip all leading whitespace; once
true, let whitespace tokens flow through. Reset the flag on
LLMFullResponseEndFrame/EndFrame and on interruption, and save/restore
it around TTSSpeakFrame since each utterance is its own context.

Sentence-aggregation mode preserves the existing behavior.
2026-04-16 15:51:35 -07:00
Mark Backman
aa355e3d32 Fix type errors in observers and add to pyright checked set
Group three co-assigned fields (_start_frame_id, _start_frame_arrival_ns,
_start_wall_clock) into a single _StartFrameInfo dataclass. This makes
the "always set together" invariant structural rather than implicit, and
fixes the incorrect str | None annotation on _start_frame_id (Frame.id
is int).
2026-04-16 18:25:10 -04:00
Mark Backman
9bd51cd88c Add incremental pyright type checking with CI enforcement
Add pyrightconfig.json with basic type checking for zero-error modules
(clocks, metrics, transcriptions, frames) and enforce via CI. The
include list will expand as modules are fixed.
2026-04-16 18:04:42 -04:00
Aleix Conchillo Flaqué
fc1c3b48dc Merge pull request #4322 from pipecat-ai/aleix/readme-subagents
Add Pipecat Subagents to the ecosystem section in README
2026-04-16 10:38:56 -07:00
Aleix Conchillo Flaqué
4278a37ebc Merge pull request #4321 from pipecat-ai/aleix/fix-redundant-type-checks
Remove redundant duplicate type checks in direct_function.py
2026-04-16 10:38:45 -07:00
Mark Backman
7e045257e8 Merge pull request #4314 from pipecat-ai/mb/prudent-system-instruction-logging
Log system instruction once at composition time, not on every LLM call
2026-04-16 13:18:33 -04:00
dyi1
b8a1f45d4c Improve HeyGen LiveAvatar plugin reliability and performance (#4312)
* Improve HeyGen LiveAvatar plugin reliability and performance

- Add WebSocket ready gate: wait for session.state_updated connected
  event before sending commands (prevents silently dropped messages)
- Add keep-alive mechanism: send session.keep_alive every 2.5 min to
  prevent 5-minute inactivity timeout
- Optimize audio chunking: 600ms first chunk for faster initial
  response, 1s subsequent chunks for efficient streaming
- Fix audio buffer flush: send remaining buffered audio on utterance
  end instead of discarding it
- Fix WS state cleanup: properly reset connected/ready state when
  WebSocket drops unexpectedly
- Add livekit_config passthrough in LiveAvatar session token creation
- Replace stray print() with logger.debug()

* Fix HeyGenOutputTransport.start() signature and use 400ms first chunk

- Update transport.py to match new client.start() signature (no
  audio_chunk_size param)
- Change first chunk size from 600ms to 400ms per feedback

* Fix transport audio resampling and client.start() error propagation

- Add audio resampling in HeyGenOutputTransport.write_audio_frame() to
  ensure audio is always 24kHz before sending to HeyGen (was sending
  at pipeline sample rate, causing garbled audio)
- Raise exception on WS ready timeout instead of silently returning,
  preventing transport from appearing ready when WS connection failed

* Fix session readiness gate to work with LITE mode

LITE mode does not send session.state_updated WS events. Instead,
use a dual-signal _session_ready event that fires on either:
- WS session.state_updated connected (FULL mode)
- LiveKit participant connected (LITE mode)

Also reorder start() to connect both WS and LiveKit before waiting,
since the WS events may depend on LiveKit being connected.

Verified with live sandbox session - all tests pass.

* Simplify session readiness to use only WS ready gate

Remove _session_ready dual-signal and use only _ws_ready, which fires
on the session.state_updated connected WS event. Increase timeout to
30s. LiveKit is connected before waiting so the WS event can arrive.

* Reduce WS ready gate timeout back to 10s

* Remove WS ready gate (session.state_updated not reliably received)

The session.state_updated connected event is not reliably received
via the websockets library. Remove the gate for now and assume the
session is ready after WS + LiveKit connect. Keep-alive, chunking,
buffer flush, state cleanup, and other improvements remain.
2026-04-16 12:58:14 -04:00
Aleix Conchillo Flaqué
8ec85f981d Add Pipecat Subagents to the ecosystem section in README 2026-04-16 09:57:23 -07:00
Aleix Conchillo Flaqué
2f52905d32 Remove redundant duplicate type checks in direct_function.py
After the typing modernization, `dict or dict` and `list or list`
were left behind where `Dict`/`List` had been replaced by `dict`/`list`.
2026-04-16 09:51:21 -07:00
Aleix Conchillo Flaqué
f86cf98c6d Merge pull request #4319 from pipecat-ai/aleix/modernize-typing
Modernize Python typing across the codebase
2026-04-16 09:43:17 -07:00
Aleix Conchillo Flaqué
84fcba772d Replace percent format with f-string in daily/utils.py 2026-04-16 09:30:19 -07:00
Aleix Conchillo Flaqué
b3bb6fdaa5 Modernize Python typing across the codebase
Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311):

- Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type`
  with their built-in equivalents (`list`, `dict`, `tuple`, etc.)
- Replace `typing.Optional[X]` with `X | None`
- Replace `typing.Union[X, Y]` with `X | Y`
- Move `Mapping`, `Sequence`, `Callable`, `Awaitable`,
  `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`,
  `AsyncGenerator` imports from `typing` to `collections.abc`
- Remove now-unused `typing` imports
- Add `from __future__ import annotations` to 5 files that use
  forward-reference strings in `X | "Y"` annotations
2026-04-16 09:28:23 -07:00
Aleix Conchillo Flaqué
12b8af3d89 pyproject: use UP ruff linting option 2026-04-16 09:26:12 -07:00
Aleix Conchillo Flaqué
1c4ffb7845 Merge pull request #4313 from pipecat-ai/ac/daily-send-dtmf
Add send_dtmf() to DailyTransport
2026-04-16 08:57:48 -07:00
Aleix Conchillo Flaqué
8d4feede23 Split #4313 changelog into one entry per file 2026-04-16 08:55:03 -07:00
Aleix Conchillo Flaqué
b11a3bc43f Add method field to Daily DTMF output frames
Lets callers specify Daily's DTMF delivery method (e.g. "rfc2833"
or "info") alongside `session_id` and `digit_duration_ms`. Forwarded
to Daily's `send_dtmf` as `method`.
2026-04-16 08:55:03 -07:00
Mark Backman
8dce66933f Merge pull request #4315 from pipecat-ai/mb/update-tavus-transport-on-connected
Update Tavus transport example
2026-04-16 09:20:52 -04:00
Mark Backman
7291026695 Update Tavus transport example
Show how to use on_connected event handler to obtain
Daily room URL
2026-04-15 23:04:31 -04:00
Mark Backman
686e250db1 Add changelog for #4314 2026-04-15 21:03:13 -04:00
Mark Backman
e8d6f611cd Log system_instruction once at composition time 2026-04-15 21:02:20 -04:00
Aleix Conchillo Flaqué
f094ce80fb Add to_string helper on output DTMF frames
Mirrors the existing `from_string` classmethod and lets callers
turn a frame's `buttons` list back into a dial string like `"123#"`.
`__str__` and the Daily transport's native DTMF path reuse it.
2026-04-15 15:14:47 -07:00
Aleix Conchillo Flaqué
9fbe1bf2a3 Document button as a convenience shortcut, not a deprecation
The single-key `button` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` is kept as a first-class ergonomic shortcut
for the common single-keypress case, equivalent to
`buttons=[button]`. `buttons` takes precedence when both are set.
2026-04-15 15:09:01 -07:00
Aleix Conchillo Flaqué
d8b0e78bc8 Represent DTMF sequences as list[KeypadEntry] via buttons field
Replaces the string-based `tones` field with a type-safe
`buttons: list[KeypadEntry]` on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame`, matching the existing singular `button`
field on `InputDTMFFrame`. A `from_string` classmethod builds the
list from a dial string like `"123#"` (invalid characters raise
ValueError from the `KeypadEntry` constructor).

The base output audio fallback now iterates `frame.buttons`
directly, LiveKit sends `frame.buttons[0].value`, and the Daily
transport joins the button values into the single string Daily's
`send_dtmf` expects.
2026-04-15 15:05:45 -07:00
Aleix Conchillo Flaqué
675b7df408 Add tones to OutputDTMFFrame and simplify DTMF frame hierarchy
Introduces a new `tones` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` for sending multi-digit DTMF sequences and
deprecates the existing single-key `button` field. When only `button`
is set, it is used as a single-character `tones` string for backward
compatibility.

`DTMFFrame` is kept as an empty marker class so both input and output
DTMF frames can still be identified via isinstance. `InputDTMFFrame`
keeps its required `button` field (single keypress semantics).

The Daily-specific `DailyOutputDTMFFrame` and
`DailyOutputDTMFUrgentFrame` frames no longer need to override
`button` and simply add `session_id` and `digit_duration_ms`, which
are forwarded to Daily's `send_dtmf` as `sessionId` and
`digitDurationMs`.

The base output audio fallback now iterates `tones` and generates a
tone per character; LiveKit's native DTMF path sends `tones[0]` since
its API is single-tone.
2026-04-15 14:48:02 -07:00
Aleix Conchillo Flaqué
30f39d7395 Add DailyOutputDTMFFrame and DailyOutputDTMFUrgentFrame
Introduces Daily-specific DTMF output frames that carry explicit
`tones`, `session_id` and `digit_duration_ms` fields, forwarded to
Daily's `send_dtmf` as `tones`, `sessionId` and `digitDurationMs`.
The inherited `button` and `transport_destination` fields are
ignored for these frames in the Daily transport.
2026-04-15 14:20:08 -07:00
Aleix Conchillo Flaqué
fe2ef9c712 Add changelog for #4313 2026-04-15 10:43:28 -07:00
Aleix Conchillo Flaqué
173cf39aee Add send_dtmf() to DailyTransport
Exposes the Daily call client's DTMF sending capability so
applications can send tones during a call (e.g. IVR navigation).
2026-04-15 10:43:28 -07:00
Filipi da Silva Fuchter
ac43a70d36 Merge pull request #4311 from pipecat-ai/filipi/reconnect_websocket
New approach to reconnect STT services after updating settings.
2026-04-15 14:39:24 -03:00
filipi87
8e4fd10e0f Removing CancelledError handling from DeepgramSTTService. 2026-04-15 14:36:17 -03:00
filipi87
aeab417cd1 Changelogs for the STT service reconnect improvements. 2026-04-15 13:23:25 -03:00
filipi87
d263ad3c34 Refactoring DeepgramSTT to use request to reconnect. 2026-04-15 13:21:12 -03:00
filipi87
f3c454dc54 Refactoring CartesiaSTT to use request to reconnect. 2026-04-15 13:19:36 -03:00
filipi87
fc63790657 New approach to reconnect STT services after updating settings. 2026-04-15 11:01:58 -03:00
Mark Backman
9ffcccdd84 Merge pull request #4253 from pipecat-ai/mb/mistral-stt
Add Mistral Voxtral Realtime STT service
2026-04-15 09:00:27 -04:00
Mark Backman
503782c8b2 Merge pull request #4304 from pipecat-ai/mb/tavus-deps
Add missing daily-python dependency for tavus extra
2026-04-14 18:14:19 -04:00
Mark Backman
b834a893fe Add changelog for #4304 2026-04-14 17:52:29 -04:00
Mark Backman
ba023248d9 Add missing daily-python dependency for tavus extra 2026-04-14 17:48:37 -04:00
borislav
14cf783647 chore: add changelog for #4301 2026-04-14 22:41:09 +02:00
borislav
86e726107f fix: fail missing tool calls cleanly 2026-04-14 22:40:45 +02:00
Aleix Conchillo Flaqué
457f55e99a Merge pull request #4297 from pipecat-ai/changelog-1.0.0
Release 1.0.0 - Changelog Update
2026-04-14 12:08:35 -07:00
aconchillo
f8318289d4 Update changelog for version 1.0.0 2026-04-14 12:06:43 -07:00
Aleix Conchillo Flaqué
958d90819f Merge pull request #4294 from pipecat-ai/ac/fix-assistant-turn-stopped-event
Fix on_assistant_turn_stopped not firing for tool-call-only responses
2026-04-14 10:09:55 -07:00
Aleix Conchillo Flaqué
403235eb48 Add changelog for #4294 2026-04-14 10:07:19 -07:00
Aleix Conchillo Flaqué
698c2ba92e Fix on_assistant_turn_stopped not firing for empty LLM responses
When the LLM returned zero text tokens (e.g. it was interrupted before producing
tokens or about to push tokens), push_aggregation() returned an empty string and
on_assistant_turn_stopped was never emitted. This left consumers waiting for an
event that would never arrive.

Now on_assistant_turn_stopped always fires, with an empty content string when
the LLM produced no text tokens.

Fixes #4292
2026-04-14 10:07:19 -07:00
Mark Backman
f013d5632b Merge pull request #4293 from pipecat-ai/mb/fix-elevenlabs-tts-enable-logging
Fix ElevenLabs TTS boolean params and add missing features
2026-04-14 12:58:31 -04:00
Mark Backman
570849955c Merge pull request #4295 from pipecat-ai/mb/context-summarization-index-0
Fix context summarization failing with mid-conversation system messages
2026-04-14 12:24:47 -04:00
Mark Backman
84b885682f Add changelog for #4295 2026-04-14 11:49:31 -04:00
Mark Backman
989fb4deaa Fix context summarization failing with mid-conversation system messages
Only treat messages[0] as the initial system prompt when determining the
summarization range. Previously, the code scanned the entire context for
the first system-role message, which caused failures when the only system
message was a mid-conversation injection (e.g. "The user has been quiet").
In that case summary_start exceeded summary_end, producing an empty range
and "No messages to summarize" errors.

Fixes #4286
2026-04-14 11:48:50 -04:00
dhruvladia-sarvam
ab74605a26 Sarvam TTS request id added to agent logs (#4278)
- Added trace logging to correlate Sarvam request_id with context_id
2026-04-14 11:02:05 -04:00
Mark Backman
49998d252b Add changelog for #4293 2026-04-14 10:13:12 -04:00
Mark Backman
84566c1110 Remove unused ElevenLabsOutputFormat and add missing sample rates
Remove dead ElevenLabsOutputFormat type alias. Add pcm_32000 and
pcm_48000 to output_format_from_sample_rate to match the ElevenLabs API.
2026-04-14 10:11:31 -04:00
Mark Backman
45aa95fa10 Fix ElevenLabs boolean query params and add enable_logging to HTTP service
The enable_logging and enable_ssml_parsing URL params used truthy checks,
so False was treated the same as None (both skipped). Also, Python's
str(False) produces "False" but the API expects lowercase "false".

Additionally, add enable_logging support to ElevenLabsHttpTTSService
which was missing entirely.
2026-04-14 10:04:23 -04:00
Mark Backman
d1f7af0330 Merge pull request #4283 from pipecat-ai/mb/user-stop-transcript-improvements 2026-04-13 19:27:05 -04:00
Mark Backman
31b5a64382 Merge pull request #4282 from pipecat-ai/mb/cartesia-stt-settings-update
Reconnect Cartesia STT websocket on settings change
2026-04-13 18:18:36 -04:00
Mark Backman
d20013d7a6 Add changelog for #4283 2026-04-13 18:12:04 -04:00
Mark Backman
804e3ea9ec Trigger turn stop immediately when transcript arrives after p99 timeout
When the STT p99 timeout fires without a transcript, the turn stop
strategy previously did nothing — falling through to the 5-second
user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the
timeout has elapsed so that a late transcript triggers the turn stop
immediately instead of waiting for the fallback.
2026-04-13 18:11:32 -04:00
Aleix Conchillo Flaqué
a14d257cf2 update pytest to >=9 2026-04-13 15:08:47 -07:00
Aleix Conchillo Flaqué
a8660aabfe update uv.lock 2026-04-13 15:06:25 -07:00
Aleix Conchillo Flaqué
7dc763d512 Merge pull request #4272 from pipecat-ai/pk/llm-context-get-messages-elide-large-values
Add truncate_large_values to LLMContext.get_messages()
2026-04-13 15:04:41 -07:00
Mark Backman
36b15c92ef Add changelog for #4282 2026-04-13 17:29:39 -04:00
Mark Backman
64ed0aae13 Reconnect Cartesia STT websocket when settings change at runtime
Previously settings updates were ignored with a TODO comment. Now when
model/language changes via STTUpdateSettingsFrame the service disconnects
and reconnects with the new query parameters.

Key changes:
- Implement _update_settings to disconnect/reconnect on changes
- Check `is not State.OPEN` in run_stt to catch CLOSING state
- Send `done` command before closing for clean session shutdown
- Capture websocket reference in _disconnect_websocket to prevent a
  concurrent _connect from having its new connection nulled by a stale
  finally block
2026-04-13 17:28:34 -04:00
Mark Backman
be81dac723 Merge pull request #4280 from pipecat-ai/mb/resolve-vuln-2026-04-13
Update uv.lock resolving langchain-core and cryptography vulnerabilities
2026-04-13 11:58:25 -04:00
Mark Backman
d942a713af Update uv.lock resolving langchain-core and cryptography vulnerabilities 2026-04-13 11:09:31 -04:00
Filipi da Silva Fuchter
e248c4c049 Merge pull request #4249 from sathwikareddy02/nvidia-tts-update
Add stitching support and enhancements for NvidiaTTSService
2026-04-13 09:39:48 -03:00
filipi87
1d5dcf1698 Invoking to remove the audio context when there is no more audio to receive. 2026-04-13 09:34:13 -03:00
sathwika
f45a410f56 refactor/simplify NvidiaTTSService synthesis stream shutdown 2026-04-13 14:35:17 +05:30
Paul Kompfner
e38647151d Fix language: binary data is replaced with placeholders, not truncated 2026-04-11 14:39:25 -04:00
Paul Kompfner
1a02b5d61a Rename elide_large_values to truncate_large_values 2026-04-11 14:29:05 -04:00
Aleix Conchillo Flaqué
4254c1f0e0 Merge pull request #4273 from pipecat-ai/ac/test-fixes
Fix LLM test constructors and wake phrase test race
2026-04-10 21:27:00 -07:00
Aleix Conchillo Flaqué
f91a113de7 tests: yield in wake phrase strategy setup to let tasks start
The strategy schedules background tasks during setup. Fast-running
tests could observe state before those tasks had a chance to run;
yielding once via asyncio.sleep(0) ensures they do.
2026-04-10 17:37:50 -07:00
Aleix Conchillo Flaqué
e553bb010f tests: migrate LLM tests to Settings-based constructor API
Replace the old `model=` / `params=InputParams(...)` style with the
new `settings=<Service>.Settings(...)` form across LLM service tests.
2026-04-10 17:37:49 -07:00
Paul Kompfner
245339e885 Add changelog for #4272 2026-04-10 16:37:49 -04:00
Paul Kompfner
812cdc6822 Add elide_large_values to LLMContext.get_messages()
Enable callers to get a compact version of context messages suitable
for serialization, logging, and debugging tools. For standard
messages, known binary data (base64 images, audio) is fully elided.
For LLM-specific messages, long string values are recursively
truncated. Adapter get_messages_for_logging() methods now use this.
2026-04-10 16:35:36 -04:00
Aleix Conchillo Flaqué
153814ecc2 scripts/evals: create recording subdirectories when saving audio
Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
2026-04-10 13:19:20 -07:00
Filipi da Silva Fuchter
b1204cc430 Merge pull request #4241 from pipecat-ai/filipi/async_tools_cancellable
Enable async tool cancellation feature.
2026-04-10 15:28:01 -03:00
filipi87
c542167065 Refactored on_function_calls_cancelled to use FunctionCallFromLLM. 2026-04-10 15:06:39 -03:00
Aleix Conchillo Flaqué
02116c58de Merge pull request #4244 from omChauhanDev/fix/vad-stuck-speaking-on-mute
fix VAD stuck in SPEAKING state when audio stops mid-speech
2026-04-10 10:46:53 -07:00
Aleix Conchillo Flaqué
dcd21e7ff4 Rework audio idle detection with timestamp-based adaptive sleep
Replaces the per-frame asyncio.Event signaling with a monotonic
timestamp updated on each audio frame. The handler sleeps until the
next deadline (last_audio_time + timeout), recomputing on each wake-up
to account for audio arriving during sleep.

This avoids waking the handler on every audio frame (~50/s at 20ms
chunks), and guarantees detection latency is bounded by timeout rather
than 2 * timeout.

Also renames audio_starvation_timeout to audio_idle_timeout and
associated identifiers for consistency with existing pipecat naming
(user_idle_timeout, etc.).
2026-04-10 10:35:18 -07:00
Aleix Conchillo Flaqué
5356f3028b Merge pull request #4271 from pipecat-ai/mb/fix-translation-readme
Fix translation example in README
2026-04-10 10:26:27 -07:00
Om Chauhan
cb2c1868b0 fix VAD stuck in SPEAKING state when audio stops mid-speech 2026-04-10 09:54:48 -07:00
Aleix Conchillo Flaqué
dac88c0a47 Merge pull request #4267 from pipecat-ai/ac/fix-observer-cleanup-ordering
Fix observer cleanup ordering to stop proxy tasks before closing resources
2026-04-10 09:05:33 -07:00
kompfner
8e5fe8afda Merge pull request #4067 from omChauhanDev/fix-gemini3-flash-thinking-default
fix: default thinking config for Gemini 3+ Flash models
2026-04-10 10:41:44 -04:00
kompfner
d07eebff20 Merge pull request #4248 from omChauhanDev/add-openai-custom-tools-support
Add custom_tools support for OpenAI adapters
2026-04-10 10:27:28 -04:00
Paul Kompfner
ef4dcca4f1 Update changelog to describe user-facing custom_tools support 2026-04-10 10:23:13 -04:00
Paul Kompfner
fc3307bc63 Use OpenAI SDK types for tool params in adapters and tests
These are TypedDicts (plain dicts at runtime), so no behavioral change
— just more descriptive type hints for readers. Use ToolParam instead
of FunctionToolParam for the Responses adapter to reflect that custom
non-function tools are supported. Use ChatCompletionToolParam instead
of Any for the completions adapter return type. Update tests to use
typed params in expected values.
2026-04-10 10:15:39 -04:00
Mark Backman
da9a55a430 Fix translation example in README 2026-04-10 09:13:42 -04:00
Filipi da Silva Fuchter
094d36904c Merge pull request #4268 from pipecat-ai/filipi/lemonslice_improments
LemonSlice transport updates - new events, extra params
2026-04-10 08:50:39 -03:00
sathwika
746fadc2b5 thread simplification + handling interuption 2026-04-10 17:18:22 +05:30
filipi87
8cce25d2d2 Fixing openai examples. 2026-04-10 08:25:50 -03:00
filipi87
891f00cb5f Using the on_function_calls_cancelled inside the examples. 2026-04-10 07:45:20 -03:00
filipi87
1ca094dad7 Not invoking on_function_calls_started for the cancel function, and creating on_function_calls_cancelled 2026-04-10 07:40:52 -03:00
filipi87
346c585290 Enabling the option to cancel the tools for all the async examples. 2026-04-10 07:31:51 -03:00
jp-lemon
c134110399 LemonSlice transport updates 2026-04-10 07:10:41 -03:00
Aleix Conchillo Flaqué
f9117e6d4a Add changelog for PIPECAT_OBSERVER_FILES removal 2026-04-09 17:39:54 -07:00
Aleix Conchillo Flaqué
360e4480e0 Remove deprecated _load_observer_files in favor of setup files 2026-04-09 17:38:46 -07:00
Aleix Conchillo Flaqué
9b7e15c9bc Add changelog for #4267 2026-04-09 16:55:40 -07:00
Aleix Conchillo Flaqué
00ea86fda8 Fix observer cleanup ordering to stop proxy tasks before closing resources
During pipeline shutdown, proxy tasks must be cancelled before observer
resources are cleaned up. Previously, stop() was called inside
_cancel_tasks() and start() was called in _start_tasks(), which could
lead to proxy tasks still consuming frames after observer resources
were closed.

Now the lifecycle is explicit in _handle_start_frame: start() after all
observers are loaded, and stop() before cleanup() on shutdown.

Also fixes misleading variable name in TaskObserver.cleanup() where
iterating self._proxies yields observer keys, not Proxy values.

Fixes #4195
2026-04-09 16:55:40 -07:00
Aleix Conchillo Flaqué
5f75728207 EventNotifier: update docstring with single-consumer use case 2026-04-09 16:21:42 -07:00
Aleix Conchillo Flaqué
9d274f0fb3 PipelineTask: update dangling task logging 2026-04-09 16:21:05 -07:00
Aleix Conchillo Flaqué
43ddbdf1ec Merge pull request #3797 from iamjr15/fix/idle-processor-event-race
Fix asyncio.Event race conditions in idle processors
2026-04-09 16:04:03 -07:00
iamjr15
565349d332 Fix asyncio.Event race conditions in idle processors
Move event.clear() from finally block to success path in
IdleFrameProcessor and UserIdleProcessor._idle_task_handler().
The finally block unconditionally cleared signals set during
async timeout callbacks, causing false-positive idle detection.

Closes #3402
2026-04-09 13:41:01 -07:00
filipi87
2dd1170229 Updating the Anthropic stream example to allow cancel the location tracking. 2026-04-09 17:26:51 -03:00
filipi87
5cf90cba98 Addressing PR review comments. 2026-04-09 17:11:04 -03:00
Aleix Conchillo Flaqué
981b7bdcb7 Merge pull request #4255 from omChauhanDev/fix/async-gc-collect
PipelineRunner: make _gc_collect async
2026-04-09 12:09:38 -07:00
Filipi da Silva Fuchter
c4320e7f07 Merge pull request #4265 from pipecat-ai/filipi/fix_elevenlabs_token_aggregation
Using the correct default for auto_mode based on text_aggregation_mode.
2026-04-09 15:30:36 -03:00
filipi87
ea0be4d39c Changelog for the elevenlabs fix. 2026-04-09 15:25:06 -03:00
filipi87
dca4e1090a Using the correct default for auto_mode based on text_aggregation_mode. 2026-04-09 15:21:30 -03:00
Cale Shapera
ec574edd53 Add Inworld Realtime Service (#4140)
* Add Inworld Realtime LLM service

Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.

New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py

Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled

* Prefer init-provided system instruction in Inworld Realtime

Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.

* Update changelog entry with PR number

* Fix changelog format to use bullet point

* Polish PR: default model, example cleanup, changelog update

- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog

* Address PR review feedback for Inworld Realtime

- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
  _ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
  OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths

* Simplify example, add provider tracking, remove local VAD path

- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
  for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends

* Default TTS model to inworld-tts-1.5-max

* Remove dead shimmed tools code, set STT/VAD defaults

- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
2026-04-09 13:04:17 -04:00
filipi87
772fb57090 Enable async tool cancellation feature. 2026-04-09 10:29:23 -03:00
Filipi da Silva Fuchter
76601944c6 Merge pull request #4230 from pipecat-ai/filipi/async_tools_stream
Support for streaming multiple responses via function calls
2026-04-09 10:26:33 -03:00
filipi87
178985ec8a Refactoring the frame queue to avoid overhead. 2026-04-09 10:24:22 -03:00
filipi87
edc197d050 Creating a new example for async stream using Google. 2026-04-09 09:50:00 -03:00
filipi87
7ece8e3c4a Creating a new example for async stream using Anthropic. 2026-04-09 09:41:07 -03:00
filipi87
7b45a56119 Changelogs for the new feature and the fix. 2026-04-09 09:04:19 -03:00
filipi87
a544f885a3 Added new examples: function-calling-openai-async-stream.py and function-calling-openai-responses-async-stream.py 2026-04-09 09:04:06 -03:00
filipi87
375deac912 Support for streaming multiple responses via function calls. 2026-04-09 09:03:53 -03:00
filipi87
699ca38dc1 Allowing to check if a specific frame is in the queue. 2026-04-09 09:03:06 -03:00
filipi87
aeda60f761 Refactoring the FrameQueue to be able to track any Frame. 2026-04-09 09:02:47 -03:00
Om Chauhan
b010dd58d2 added changelog 2026-04-08 09:37:58 +05:30
Om Chauhan
225ea907d5 make PipelineRunner._gc_collect async 2026-04-08 09:27:18 +05:30
Om Chauhan
1443dfb070 added changelog 2026-04-08 08:48:26 +05:30
Om Chauhan
4bef85e363 added custom_tools support for OpenAI adapters 2026-04-08 08:40:03 +05:30
Mark Backman
215b2dc7f3 Add voice-mistral to evals 2026-04-07 15:37:07 -04:00
Mark Backman
874e2878be Update README with Mistral services 2026-04-07 15:36:22 -04:00
Mark Backman
9131fa5c12 Add changelog for PR #4253 2026-04-07 15:32:38 -04:00
Mark Backman
68a3070ad4 Add Mistral Voxtral Realtime STT service 2026-04-07 15:26:56 -04:00
Mark Backman
a7bf9f538c Clean up comments in MistralTTSService 2026-04-07 12:56:10 -04:00
Mark Backman
0acfb4dd49 Merge pull request #4251 from pipecat-ai/mb/mistral-tts
Add Mistral Voxtral streaming TTS service
2026-04-07 12:50:48 -04:00
Mark Backman
8594401024 Add changelog for PR #4251 2026-04-07 12:32:06 -04:00
Mark Backman
aa7a014518 Add mistral voice example 2026-04-07 12:32:06 -04:00
Filipi da Silva Fuchter
27a8a973b1 Merge pull request #4201 from pipecat-ai/mb/handle-recurring-disconnects
Fix WebsocketService infinite reconnection loop
2026-04-07 11:02:24 -03:00
sathwika
8abda808ca Add Nvidia copyright header 2026-04-07 19:27:04 +05:30
Mark Backman
7f3f23dcb9 Add Mistral Voxtral streaming TTS service
Integrate with Mistral's Voxtral TTS API (voxtral-mini-tts-2603) using
HTTP streaming with Server-Sent Events. Converts base64-encoded float32
PCM chunks from the API to int16 for the Pipecat pipeline.
2026-04-07 09:39:36 -04:00
Filipi da Silva Fuchter
be509e5647 Merge pull request #4245 from kollaikal-rupesh/fix/mixer-cancel-cleanup
Stop audio mixer on pipeline cancellation
2026-04-07 10:36:18 -03:00
sathwika
9f0b18b03d Add changelog fragments for PR #4249 2026-04-07 18:18:55 +05:30
Filipi da Silva Fuchter
6eccd16543 Merge pull request #4217 from pipecat-ai/filipi/async_tools
Supporting async function calls.
2026-04-07 09:35:03 -03:00
filipi87
d8dc6bc7d0 New example for async function calls using Google. 2026-04-07 09:31:22 -03:00
filipi87
d12a8529e2 New example for async function calls using OpenAI responses. 2026-04-07 09:28:01 -03:00
filipi87
aa061f7e2c Renaming the openai and anthropic examples to async instead of delayed. 2026-04-07 09:23:45 -03:00
Filipi da Silva Fuchter
e863293198 Improving docstring description.
Co-authored-by: kompfner <paul@daily.co>
2026-04-07 08:14:39 -04:00
filipi87
9c7d5a9de2 Improving changelog description to mention group_parallel_tools. 2026-04-07 09:13:08 -03:00
Filipi da Silva Fuchter
a451c42dc7 Merge pull request #4247 from pipecat-ai/filipi/background_sound_example
Fixing the background sound example.
2026-04-07 09:06:14 -03:00
sathwika
bc009d8f98 Add stitching support and enhancements for NvidiaTTSService 2026-04-07 14:49:45 +05:30
Rupesh
67ee802772 Remove changelog entry per review feedback 2026-04-06 21:36:53 -07:00
filipi87
ceaa27ee6e Fixing the background sound example. 2026-04-06 18:25:30 -03:00
filipi87
42335e2ef0 Renaming to async_tool and providing description. 2026-04-06 09:56:48 -03:00
Rupesh
7585864113 Stop audio mixer on pipeline cancellation to prevent 100% CPU usage 2026-04-06 01:51:29 -07:00
kompfner
18852adc28 Merge pull request #4242 from pipecat-ai/pk/gemini-live-fix-session-resumption
Fix Gemini Live session resumption hanging after reconnect
2026-04-04 11:43:24 -04:00
Paul Kompfner
f11b6d7151 Fix Gemini Live session resumption hanging after reconnect
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.

Set the flag in _handle_session_ready when we detect a reconnect, either
via session_resumption_handle (server restores state) or via existing
context (rare case where connection drops before first resumption handle).
2026-04-03 18:27:10 -04:00
Paul Kompfner
9df1e18b43 Fix Gemini Live session resumption hanging after reconnect
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.

Set the flag in _handle_session_ready when context already exists
(i.e. reconnect case) since we don't need to go through
_create_initial_response again.
2026-04-03 16:32:03 -04:00
Mark Backman
b8f9a21e0c Merge pull request #4240 from pipecat-ai/mb/remove-old-files
Remove orphaned .dockerignore and CHANGELOG.md.template
2026-04-03 15:40:57 -04:00
Mark Backman
c18d997ad8 Remove orphaned .dockerignore and CHANGELOG.md.template 2026-04-03 14:55:25 -04:00
Mark Backman
56aaebe1b0 Merge pull request #4239 from pipecat-ai/mb/remove-deprecation-module-proxy
Remove DeprecatedModuleProxy and service re-export shims
2026-04-03 14:03:17 -04:00
Mark Backman
916af84974 Remove DeprecatedModuleProxy and service re-export shims
Remove the deprecation proxy infrastructure that allowed old-style flat
imports (e.g. `from pipecat.services.openai import OpenAILLMService`).
Users must now import from specific submodules
(`from pipecat.services.openai.llm import OpenAILLMService`), which is
already the established pattern across all internal code and 179+ examples.

- Strip 32 proxy `__init__.py` files to empty
- Strip 3 non-proxy files with bare star imports (minimax, sambanova, sarvam)
- Strip google/gemini_live `__init__.py` re-exports
- Remove DeprecatedModuleProxy class and helpers from services/__init__.py
- Remove ruff per-file ignore for services/__init__.py
- Fix 2 examples using old-style imports
2026-04-03 13:43:02 -04:00
Mark Backman
3e911b5fa0 Merge pull request #4236 from pipecat-ai/mb/more-deprecation-removals-2026-04-03
Remove deprecated fields, shims, and backward-compatibility code
2026-04-03 13:28:03 -04:00
Aleix Conchillo Flaqué
7c08779a2f Merge pull request #4234 from pipecat-ai/aleix/export-runner-app
Export FastAPI app from runner for custom routes
2026-04-03 09:45:39 -07:00
Mark Backman
988c08a5b6 Merge pull request #4238 from pipecat-ai/mb/fix-daily-utils-docs
Fix Pydantic v2 + Sphinx autodoc incompatibility for Daily utils
2026-04-03 12:39:09 -04:00
Mark Backman
7351298849 Fix Pydantic v2 + Sphinx autodoc incompatibility for Daily utils
Patch Pydantic's DICT_TYPES check in conf.py to accept Union-wrapped
dict types, fixing the autodoc import failure for models using
ConfigDict(extra="allow").
2026-04-03 12:00:11 -04:00
kompfner
392134be46 Merge pull request #4231 from pipecat-ai/pk/llm-messages-transform-frame
Add a `LLMMessagesTransformFrame` to facilitate programmatically edit…
2026-04-03 11:54:34 -04:00
Paul Kompfner
9266e1e7ad Remove comment referencing removed OpenAILLMContext 2026-04-03 11:53:57 -04:00
Mark Backman
e9eff4626f Merge pull request #4237 from pipecat-ai/mb/docstring-fixes-2026-04-03
Docstring fixes for docs auto-generation
2026-04-03 11:50:20 -04:00
Mark Backman
21aa50283e Update docs build script and README for current workflow
Make -W (warnings as errors) opt-in via --strict flag instead of
default, and update README to reflect uv-based workflow and current
directory structure.
2026-04-03 11:43:44 -04:00
Paul Kompfner
70469e3c0c Assert no LLMContextFrame when run_llm is not set in message frame tests 2026-04-03 11:34:58 -04:00
Paul Kompfner
6111df947e Test LLMAssistantAggregator handling of upstream message frames
Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame,
and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator,
mirroring the existing LLMUserAggregator downstream tests. Add
frames_to_send_direction param to run_test helper to support this.
2026-04-03 11:34:58 -04:00
Paul Kompfner
4eebfd65d9 Add a LLMMessagesTransformFrame to facilitate programmatically editing context in a frame-based way.
The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages *at that point in time*, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.
2026-04-03 11:34:50 -04:00
Mark Backman
c2358b273b Use Parameters instead of Attributes in docstrings to fix duplicate object warnings
Napoleon's Attributes section creates class-level attribute docs that
duplicate the __init__ parameter docs when napoleon_include_init_with_doc
is enabled. Using Parameters avoids the duplication.
2026-04-03 10:36:36 -04:00
Mark Backman
3a10a528c0 Remove deprecated fields, shims, and backward-compatibility code
- Remove expect_stripped_words from LLMAssistantAggregatorParams and related warnings
- Remove old multi-parameter on_push_frame observer signature support in TaskObserver
- Remove deprecated context field from UserImageRequestFrame
- Remove deprecated LiveKitTransportMessageFrame and LiveKitTransportMessageUrgentFrame
- Remove deprecated pipecat.turns.mute shim module
2026-04-03 10:10:51 -04:00
Mark Backman
f078b8b867 Fix Sphinx docstring RST formatting warnings
Replace Markdown code blocks with RST syntax in genesys.py, fix
deprecated directive transitions in nvidia and summarization modules,
remove stray bullet prefix in whisper arg docs, restructure code block
in turn completion mixin, and add deepgram mock to Sphinx conf.
2026-04-03 09:57:20 -04:00
Mark Backman
5490820338 Merge pull request #4235 from pipecat-ai/mb/deprecation-docs-cleanup
Clean up docs config after deprecation pass
2026-04-03 09:57:05 -04:00
Mark Backman
10697636c9 Add changelog for #4235 2026-04-03 09:52:31 -04:00
Mark Backman
e1638a9342 Clean up docs config after riva removal and add missing modules
Remove stale riva mock imports from autodoc_mock_imports since the riva
service was removed and nvidia-riva-client is installed during doc builds.
Add pipecat.turns and pipecat.extensions to import_core_modules() and
add Turns to the index.rst toctree. Regenerate uv.lock to reflect the
riva extra removal from pyproject.toml.
2026-04-03 09:52:31 -04:00
Mark Backman
bfffefa95c Remove leftover riva and remote-smart-turn references
Clean up deprecated extras from pyproject.toml and the docs
build script.
2026-04-03 09:29:29 -04:00
Mark Backman
fbb49ffc8d Merge pull request #4233 from pipecat-ai/mb/remove-unused-imports-2026-04-02
Remove unused imports across codebase
2026-04-03 07:26:13 -04:00
filipi87
eace782752 Renaming from async_tool to tool. 2026-04-03 08:20:14 -03:00
Mark Backman
b94071d37f Merge pull request #4232 from pipecat-ai/mb/more-deprecation-removals 2026-04-03 06:52:56 -04:00
Aleix Conchillo Flaqué
796a10fe9c Add changelog for #4234 2026-04-02 21:16:49 -07:00
Aleix Conchillo Flaqué
1ab07d312f Export FastAPI app from runner so custom routes can be added
Move the FastAPI instance to module level so other packages can import
it and register routes before main() is called. main() now configures
the existing app with transport-specific routes instead of creating a
new one.
2026-04-02 21:16:17 -07:00
Mark Backman
8adb38f87c Remove unused imports across codebase 2026-04-02 22:21:16 -04:00
Mark Backman
33f145d70a Add changelog fragments for #4232 2026-04-02 22:10:09 -04:00
Mark Backman
41e46ee69e Remove deprecated vad_events and should_interrupt from DeepgramSTTService
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
2026-04-02 22:05:49 -04:00
Mark Backman
60933b7a56 Remove deprecated send_transcription_frames param and fix broken _warn_deprecated_param calls
Remove the send_transcription_frames parameter from OpenAI Realtime LLM
(deprecated since 0.0.92). Also fix undefined _warn_deprecated_param
calls in both OpenAI and xAI realtime services, replacing them with the
existing _warn_init_param_moved_to_settings method.
2026-04-02 21:58:57 -04:00
Mark Backman
64e09d592e Remove deprecated TranscriptionUserTurnStopStrategy alias
Replaced by SpeechTimeoutUserTurnStopStrategy since 0.0.102.
2026-04-02 21:57:03 -04:00
Mark Backman
883de8ab08 Remove dangling turn_analyzer docstring and unused imports from TransportParams 2026-04-02 21:56:11 -04:00
Mark Backman
793ed8f9e3 Remove deprecated UserBotLatencyLogObserver and UserIdleProcessor
UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by
UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is
replaced by LLMUserAggregator with user_idle_timeout.
2026-04-02 21:54:36 -04:00
Vanessa Pyne
d8ea33e1a4 Merge pull request #4034 from omChauhanDev/fix/mcp-persistent-session
fixed MCPClient to reuse session across tool calls
2026-04-02 18:51:31 -05:00
vipyne
1d7404ef21 Update MCP examples 2026-04-02 18:15:56 -05:00
Om Chauhan
dc909e2713 add changelog fragments 2026-04-02 18:06:28 -05:00
Om Chauhan
e22f9f84bb fixed MCPClient to reuse session across tool calls 2026-04-02 18:06:28 -05:00
filipi87
7af72eee3e Creating new delayed examples for openai and anthropic. 2026-04-02 18:40:41 -03:00
Aleix Conchillo Flaqué
57068f1b38 Merge pull request #4229 from pipecat-ai/aleix/deprecate-transport-vad-turn-analyzers
Remove deprecated transport VAD/turn analyzers and ExternalUserTurnStrategies
2026-04-02 14:30:12 -07:00
filipi87
bbb605accc Changelog entries for the fixes and improvements. 2026-04-02 16:58:42 -03:00
filipi87
929a0e33f4 Fixing the automated tests. 2026-04-02 16:58:28 -03:00
filipi87
3724ecd378 Supporting async function calls. 2026-04-02 16:58:19 -03:00
filipi87
4c8734c5e1 Fixing an issue where the BotOutputTransport was discarding the UninterruptibleFrames. 2026-04-02 16:57:46 -03:00
filipi87
283f6df205 Creating a FrameQueue so we can properly reset without discarding uninterruptible frames. 2026-04-02 16:57:22 -03:00
Aleix Conchillo Flaqué
a29be38f48 LLMUserAggregator: remove self-queued frame tracking
The _self_queued_frames set and _internal_queue_frame wrapper were used
to prevent re-processing SpeechControlParamsFrame that the aggregator
queued to itself. Now that the frame is no longer special-cased, this
tracking is unnecessary. Also removes unused FrameCallback import.
2026-04-02 12:42:06 -07:00
Aleix Conchillo Flaqué
976c644f90 Fix tests to expect SpeechControlParamsFrame from default turn strategy 2026-04-02 12:42:06 -07:00
Aleix Conchillo Flaqué
34aa37f395 Add changelog for #4229 2026-04-02 11:54:07 -07:00
Aleix Conchillo Flaqué
380867a87a LLMUserAggregator: remove auto ExternalUserTurnStrategies() 2026-04-02 11:52:26 -07:00
Aleix Conchillo Flaqué
cc3af59db4 transports: remove deprecated VAD and turn analyzers 2026-04-02 11:51:08 -07:00
Mark Backman
f93d13efff Merge pull request #4228 from pipecat-ai/mb/remove-turn-deprecations 2026-04-02 14:32:21 -04:00
Mark Backman
c28b7e8f26 Merge pull request #4219 from lukehalley/feat/bedrock-prompt-caching
feat(aws): add prompt caching support for Bedrock ConverseStream
2026-04-02 12:26:28 -04:00
Mark Backman
d1a2dee7a1 fix(aws): initialize enable_prompt_caching in default settings 2026-04-02 12:20:47 -04:00
Luke Halley
da1a1a59a4 feat(aws): handle LLMEnablePromptCachingFrame for runtime toggling
Add LLMEnablePromptCachingFrame handler to process_frame for parity
with AnthropicLLMService, enabling runtime toggling of prompt caching.
2026-04-02 12:13:46 -04:00
Luke Halley
134790b17c chore: add changelog fragment for PR #4219 2026-04-02 12:10:57 -04:00
Luke Halley
e5aa3bbc20 feat(aws): add prompt caching support for Bedrock ConverseStream
Adds `enable_prompt_caching` setting to `AWSBedrockLLMSettings`. When
enabled, appends `cachePoint` markers to system prompts and tool
definitions in ConverseStream requests.

This can reduce TTFT by up to 85% for multi-turn conversations where
the system prompt stays constant (e.g. voice agents, chat assistants).

Follows the same pattern as `AnthropicLLMService.enable_prompt_caching`.

Usage:
```python
llm = AWSBedrockLLMService(
    settings=AWSBedrockLLMSettings(
        model="au.anthropic.claude-haiku-4-5-20251001-v1:0",
        enable_prompt_caching=True,
    ),
)
```

See: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
2026-04-02 12:10:57 -04:00
Mark Backman
3be0ea05ef Add changelog entries for #4228 2026-04-02 11:34:22 -04:00
Mark Backman
0c59819682 Remove allow_interruptions from voice-sarvam example
This was missed from the allow_interruptions removal commit.
2026-04-02 11:32:44 -04:00
Mark Backman
5b67dcd9e7 Remove deprecated EmulateUser{Started,Stopped}SpeakingFrame and emulated field
Remove EmulateUserStartedSpeakingFrame, EmulateUserStoppedSpeakingFrame
(deprecated since v0.0.99), and the emulated field from
UserStartedSpeakingFrame and UserStoppedSpeakingFrame. Clean up the
handling code in base_input.py and a stale comment in nova_sonic/llm.py.
2026-04-02 11:31:29 -04:00
Mark Backman
d503383c23 Remove deprecated interruption_strategies plumbing
The interruption_strategies mechanism was deprecated in v0.0.99 in favor
of LLMUserAggregator's user_turn_strategies. All evaluation logic was
already removed — this removes the remaining field definitions, property,
StartFrame propagation, conditional check in base_input.py, strategy
files, and test.
2026-04-02 11:19:17 -04:00
Mark Backman
fa30268b84 Remove deprecated TranscriptionMessage, ThoughtTranscriptionMessage, and TranscriptionUpdateFrame 2026-04-02 11:03:23 -04:00
Mark Backman
2a118084bd Remove deprecated transcript_processor module 2026-04-02 10:57:05 -04:00
Mark Backman
87e8ed109a Remove deprecated STTMuteFilter, STTMuteConfig, and STTMuteStrategy 2026-04-02 10:52:41 -04:00
Mark Backman
a5e1bbf4a3 Remove deprecated UserResponseAggregator class 2026-04-02 10:50:05 -04:00
Mark Backman
f8267f1ea6 Remove deprecated allow_interruptions parameter
This field was deprecated in v0.0.99 in favor of LLMUserAggregator's
user_turn_strategies / user_mute_strategies parameters. Since the default
was True (interruptions allowed), removing the guards keeps the current
default behavior.
2026-04-02 10:47:44 -04:00
Mark Backman
74acb0b7d0 Remove deprecated class_decorators tracing module 2026-04-02 10:31:15 -04:00
Mark Backman
41e3afbc2f Remove deprecated add_pattern_pair method from PatternPairAggregator 2026-04-02 10:28:01 -04:00
Aleix Conchillo Flaqué
d4824ffe8a Merge pull request #4225 from pipecat-ai/aleix/transport-and-other-deprecations
Remove deprecated transport module aliases and sync package
2026-04-01 19:43:22 -07:00
Mark Backman
2426f80789 Merge pull request #4220 from pipecat-ai/mb/more-service-deprecations
Remove more deprecated service parameters and shims
2026-04-01 22:23:39 -04:00
Mark Backman
5ce46df599 Use self.create_context_id() instead of raw uuid in CartesiaTTSService 2026-04-01 22:18:41 -04:00
Aleix Conchillo Flaqué
a6013ba437 update uv.lock 2026-04-01 19:12:39 -07:00
Aleix Conchillo Flaqué
279ca5a87b Add changelog for #4225 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
c6f79592d8 remove deprecated sync package 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
e74e497b8d transports: remove old deprecated modules 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
d245b79bba Merge pull request #3984 from pipecat-ai/aleix/update-onnxruntime
Update onnxruntime to 1.24.3
2026-04-01 19:03:57 -07:00
Mark Backman
8a794424dd Update uv.lock 2026-04-01 19:05:17 -04:00
Aleix Conchillo Flaqué
f4743a6c91 require python >= 3.11 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
ba32a48510 github: remove python 3.10 from compatibility chart 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
a9cafa2a3b Add changelog for #3984 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
58b1b7249e Update onnxruntime to 1.24.3
This version adds support for Python 3.14.
2026-04-01 19:02:32 -04:00
Aleix Conchillo Flaqué
db8e73e5ca Merge pull request #4224 from pipecat-ai/aleix/optional-function-call-timeout
Make function_call_timeout_secs optional
2026-04-01 14:39:10 -07:00
Mark Backman
170f6dfe8b Add changelog for #4220 2026-04-01 17:03:05 -04:00
Mark Backman
c763abc4ae Add deprecation version to update_options in GoogleSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
197d96fc49 Remove deprecated enable_prompt_caching_beta from Anthropic InputParams 2026-04-01 17:03:05 -04:00
Mark Backman
c8e9bf77fd Remove deprecated simli_config and use_turn_server params from SimliVideoService 2026-04-01 17:03:05 -04:00
Mark Backman
48b25962e2 Remove deprecated english_normalization param from MiniMax TTS InputParams 2026-04-01 17:03:05 -04:00
Mark Backman
5d093c9ad7 Remove deprecated InputParams class from GoogleVertexLLMService
The location and project_id fields were deprecated since 0.0.90 in
favor of direct __init__ parameters. Now that InputParams is removed,
project_id is required and location defaults to "us-east4" directly
in the signature.
2026-04-01 17:03:05 -04:00
Mark Backman
d93f63deb5 Remove deprecated base_url param from GeminiLiveLLMService 2026-04-01 17:03:05 -04:00
Mark Backman
09a57972f5 Remove deprecated api_key param from GeminiTTSService 2026-04-01 17:03:05 -04:00
Mark Backman
f83d062df9 Remove deprecated InputParams alias from GladiaSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
a2a42b8703 Remove deprecated confidence param from GladiaSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
e60a72e2d4 Remove deprecated language param from GladiaInputParams 2026-04-01 17:03:05 -04:00
Mark Backman
83f4989a78 Remove deprecated model param from FishAudioTTSService 2026-04-01 17:03:05 -04:00
Mark Backman
5d2b288274 Remove deprecated url param from DeepgramSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
52ece87ac9 Remove deprecated send_transcription_frames param from AWSNovaSonicLLMService 2026-04-01 17:03:05 -04:00
Mark Backman
bc4bbb1895 Remove deprecated PollyTTSService alias 2026-04-01 17:03:05 -04:00
Mark Backman
eb014fffc4 Flush Cartesia context on voice/model/language changes
Override _update_settings in CartesiaTTSService to flush the current
audio context and assign a new turn context ID when voice, model, or
language settings change. This prevents Context has closed errors
from Cartesia API, which locks these parameters per context.
2026-04-01 17:03:05 -04:00
Mark Backman
e74930b954 Remove deprecated text_aggregator and text_filter params from TTS
Remove the deprecated text_aggregator parameter from TTSService,
CartesiaTTSService, and RimeTTSService, and the deprecated text_filter
parameter from TTSService. Users should use LLMTextProcessor before
the TTS service instead. Update the voice-switching example to use
LLMTextProcessor with PatternPairAggregator.
2026-04-01 17:03:05 -04:00
Aleix Conchillo Flaqué
6ed4109da9 Add changelog for #4224 2026-04-01 13:58:45 -07:00
Aleix Conchillo Flaqué
53f809b7d5 Make function_call_timeout_secs optional and skip timeout task when unset
Change the default from 10s to None so deferred function calls can run
indefinitely when no timeout is configured. Only create the timeout
task when a timeout is actually provided (per-call or service-level).
2026-04-01 13:58:09 -07:00
kompfner
a3c7f6c2af Merge pull request #4215 from pipecat-ai/pk/remove-openaillmcontext
Remove deprecated `OpenAILLMContext` as well as everything (code path…
2026-04-01 14:03:35 -04:00
Paul Kompfner
df68665ec1 Add changelog entries for OpenAILLMContext removal 2026-04-01 14:03:08 -04:00
Harshita Jain
bd6cbd7fe7 feat: add Smallest AI STT service integration (#4162)
Add SmallestSTTService using the Pulse WebSocket API for real-time
transcription. Includes SmallestSTTSettings dataclass, 32-language
support with resolve_language fallback, VAD-driven finalize signal,
and SMALLEST_TTFS_P99 latency constant.   

Also adds X-Source and X-Pipecat-Version headers to Smallest STT
and TTS WebSocket connections.
2026-04-01 13:44:04 -04:00
Mark Backman
33ef6b3174 Merge pull request #4218 from pipecat-ai/mb/rename-all-examples
Rename all examples
2026-04-01 07:15:57 -04:00
Mark Backman
3ca656cae5 Update simli name to match others 2026-03-31 22:54:21 -04:00
Mark Backman
6a84d02156 Update evals
- Removed evals for removed services
- Added eval for function-calling-deepseek.py
2026-03-31 22:13:52 -04:00
Mark Backman
080da8b94c Update eval script paths to match renamed example files 2026-03-31 22:09:42 -04:00
Mark Backman
d3021b4590 Rename example files to prepend parent folder name, preventing package shadowing
Example files like openai.py shadow installed packages when Python adds the
script directory to sys.path. Prepend the parent folder name to each example
file (e.g. openai.py -> function-calling-openai.py). Also split
thinking-and-mcp/ into separate mcp/ and thinking/ directories.
2026-03-31 22:06:01 -04:00
Paul Kompfner
92e34ea6e8 Fix potential UnboundLocalError for system_message in tracing decorator
Restore the `system_message = None` initialization that was dropped
when collapsing the OpenAILLMContext branch.
2026-03-31 21:00:51 -04:00
Paul Kompfner
ebab75765d Fix stream cancellation tests to mock get_chat_completions
The tests were mocking the removed _stream_chat_completions_*_context
methods. Update them to mock get_chat_completions instead.
2026-03-31 18:54:23 -04:00
Paul Kompfner
110c88bf92 Remove stale re-export of deleted google.openai subpackage 2026-03-31 18:53:55 -04:00
Paul Kompfner
19e521b75a Simplify LLMContextFrame handling in process_frame methods
Now that LLMContextFrame is the only frame that provides a context,
remove the intermediate `context = None` / `if context:` pattern
and handle context processing directly in the isinstance branch.
2026-03-31 18:35:48 -04:00
Paul Kompfner
394599d031 Remove deprecated OpenAILLMContext as well as everything (code paths or whole types) dependent on it (all of which were also deprecated) 2026-03-31 18:15:25 -04:00
mattie ruth backman
0f47076703 More RTVI version parsing improvements 2026-03-31 16:05:53 -04:00
mattie ruth backman
3e255f3d21 improve version format check 2026-03-31 16:05:53 -04:00
mattie ruth backman
565b9b961d add tests for rtvi versioning 2026-03-31 16:05:53 -04:00
mattie ruth backman
692c3c74d1 We should now expect clients to be version 1.0.0 with valid versioning info 2026-03-31 16:05:53 -04:00
Mark Backman
7d309b3340 Merge pull request #4208 from pipecat-ai/mb/remove-deprecated-services
Remove deprecated service module shims
2026-03-31 15:37:12 -04:00
Mark Backman
04e8444096 Add changelog for #4208 2026-03-31 15:34:16 -04:00
Mark Backman
7501effad5 Remove deprecated service module shims and old implementations
Delete deprecated import shims that only re-export from new locations:
- services/ai_services.py
- services/gemini_multimodal_live/
- services/aws_nova_sonic/
- services/openai_realtime/
- services/deepgram/{stt,tts}_sagemaker.py
- services/google/{llm_openai,llm_vertex,google}.py
- services/google/gemini_live/llm_vertex.py
- services/riva/
- services/nim/

Remove deprecated implementations replaced by newer services:
- services/openai_realtime_beta/ (use openai.realtime)
- services/google/openai/ (use google.llm)

Also removes associated examples and tests for deleted services.
2026-03-31 15:34:14 -04:00
Mark Backman
0c8ff9c4c3 Merge pull request #4209 from pipecat-ai/mb/grok-3-default
Change GrokLLMService default model to grok-3
2026-03-31 15:29:34 -04:00
Mark Backman
53f6426b0b Merge pull request #4216 from pipecat-ai/mb/add-missing-google-vertex
Add missing google-vertex.py file
2026-03-31 15:29:04 -04:00
Mark Backman
9e32ade44b Merge pull request #4203 from pipecat-ai/mb/fix-json-decode-tool-calls
Handle incomplete function call arguments from interrupted LLM streams
2026-03-31 15:28:53 -04:00
Mark Backman
2574d24400 Merge pull request #4202 from pipecat-ai/mb/fix-inworld-tts-streaming-utf8
Fix UTF-8 decode error in Inworld TTS streaming response
2026-03-31 15:28:37 -04:00
Mark Backman
27cb078716 Add missing google-vertex.py file 2026-03-31 15:25:52 -04:00
Mark Backman
ca636813a8 Merge pull request #4206 from pipecat-ai/mb/flatten-examples-dir
Move foundational examples to examples/
2026-03-31 15:23:49 -04:00
Mark Backman
47b41a0ff7 Rename services/ to voice/ and function-calling/, flatten to top level
Replace the nested services/speech/ and services/function-calling/ with
top-level voice/ and function-calling/ directories. Update eval script
paths and README to match.
2026-03-31 15:20:03 -04:00
Mark Backman
f14638a1fd Revert "Flatten services/ nesting: promote speech and function-calling to top level"
This reverts commit e1939ecd44.
2026-03-31 14:59:23 -04:00
Mark Backman
e1939ecd44 Flatten services/ nesting: promote speech and function-calling to top level
Move services/speech/* directly into services/ and services/function-calling/*
into top-level function-calling/. Update eval script paths and README.
2026-03-31 14:55:22 -04:00
Mark Backman
dc5b94f9e0 Merge pull request #4213 from pipecat-ai/mb/google-imagen-4
Update default Google Imagen model to imagen-4.0
2026-03-31 13:20:20 -04:00
Mark Backman
1d85aedcae Split features/ into audio/, observability/, and rag/ subfolders
Extract focused example groups from the catch-all features/ folder:
- audio/: audio recording, background sound, sound effects
- observability/: observer, heartbeats, sentry metrics
- rag/: mem0, gemini-rag, gemini grounding metadata

Update README to document the new folders.
2026-03-31 13:15:06 -04:00
Mark Backman
e719cbbe6d Reorganize examples into topic-based subfolders
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.

Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.

Update eval script paths and README to match the new structure.
2026-03-31 13:12:24 -04:00
Mark Backman
f2ce7ececc Move foundational examples to examples/ 2026-03-31 13:12:24 -04:00
kompfner
bd7496fa27 Merge pull request #4211 from pipecat-ai/pk/openai-responses-websocket-service-refactor
Introduce WebsocketLLMService and refactor OpenAIResponsesLLMService …
2026-03-31 13:02:45 -04:00
Paul Kompfner
0a8bcf58c4 Register on_connection_error event handler in WebsocketLLMService 2026-03-31 10:52:33 -04:00
Paul Kompfner
0fb45c6114 Guard _drain_cancelled_response against None websocket 2026-03-31 10:32:47 -04:00
Paul Kompfner
657a5def57 Use consistent 'inference' terminology in error messages 2026-03-31 10:17:29 -04:00
Paul Kompfner
30903042e5 Work around OpenAI Python SDK temperature bug in example 2026-03-31 10:16:30 -04:00
Mark Backman
9936ec16cb Add changelog for #4213 2026-03-31 09:28:31 -04:00
Mark Backman
212aff15c9 Update default Google Imagen model to imagen-4.0-generate-001 2026-03-31 09:16:24 -04:00
Paul Kompfner
f2b3f87661 Clarify discrete vs continuous contrast in WebsocketLLMService docstring 2026-03-30 23:46:23 -04:00
Paul Kompfner
77cfb181f6 Clarify per-inference helper usage in WebsocketLLMService docstring 2026-03-30 23:25:56 -04:00
Paul Kompfner
0b256936c6 Add ConnectionClosed to _receive_response_events raises docstring 2026-03-30 23:14:45 -04:00
Paul Kompfner
3922963c7a Extract helpers in _process_context to reduce repeated code 2026-03-30 23:10:38 -04:00
Paul Kompfner
ab9f2a35b6 Clean up TTFB metrics and previous_response state on inference failure 2026-03-30 23:04:06 -04:00
Paul Kompfner
f19d1183d8 Clean up TTFB metrics and previous_response state on retry failure 2026-03-30 23:00:22 -04:00
Paul Kompfner
9ad4fe6344 Use concrete inference language instead of abstract transaction terminology 2026-03-30 22:42:40 -04:00
Paul Kompfner
04882f6f2a Simplify _connect_websocket guard and remove unused State import 2026-03-30 22:32:08 -04:00
Paul Kompfner
712e42533d Introduce WebsocketLLMService and refactor OpenAIResponsesLLMService to use it
Add WebsocketLLMService as a base class for WebSocket-based LLM services,
parallel to WebsocketTTSService/WebsocketSTTService but codifying a
transactional request-response model rather than a continuous background
receive loop.

WebsocketLLMService provides:
- Connection lifecycle (start/stop/cancel → connect/disconnect)
- _ws_send/_ws_recv with transparent ConnectionClosed handling
  (auto-reconnect via exponential backoff → WebsocketReconnectedError)
- _ensure_connected with retry via _try_reconnect

OpenAIResponsesLLMService now inherits from WebsocketLLMService, removing
duplicated connection management code (_connect, _disconnect, _reconnect,
_ensure_connected, _ws_send, start, stop, cancel) and simplifying
_process_context from a loop with attempt tracking to a flat try/except
with a single retry.
2026-03-30 22:26:31 -04:00
Mark Backman
7d8b436018 Add changelog for #4209 2026-03-30 21:40:17 -04:00
Mark Backman
bf1856f610 Change GrokLLMService default model from grok-3-beta to grok-3
The grok-3 model is now generally available, so update the default
from the beta variant.
2026-03-30 21:39:33 -04:00
Mark Backman
248e0a4c90 Merge pull request #4207 from pipecat-ai/mb/remove-krisp
Remove docs uses of krisp optional dependency
2026-03-30 19:54:14 -04:00
Mark Backman
89dcd57577 Remove docs uses of krisp optional dependency 2026-03-30 19:50:40 -04:00
Mark Backman
32022a952e Merge pull request #4205 from pipecat-ai/mb/remove-quickstart
Remove quickstart example from repo
2026-03-30 18:58:49 -04:00
Aleix Conchillo Flaqué
65d9fcc315 Merge pull request #4204 from pipecat-ai/aleix/remove-some-deprecations
Remove deprecated APIs and modules
2026-03-30 15:32:53 -07:00
Mark Backman
b78ae40d3c Remove quickstart example from repo 2026-03-30 18:20:41 -04:00
Aleix Conchillo Flaqué
ece4d0661e update uv.lock 2026-03-30 15:06:05 -07:00
Aleix Conchillo Flaqué
82a852c1ff Add changelog for #4204 2026-03-30 15:06:05 -07:00
Aleix Conchillo Flaqué
5be1b9c8cb LLMService: remove deprecated request_image_frame() 2026-03-30 15:06:05 -07:00
Aleix Conchillo Flaqué
7913d4e188 FrameProcessor: remove deprecated wait_for_task() 2026-03-30 14:45:42 -07:00
Aleix Conchillo Flaqué
c8dd7c2b57 rtvi: remove old deprecations 2026-03-30 14:44:32 -07:00
Aleix Conchillo Flaqué
77e5f4acc1 runner(daily): remove deprecated configure_with_args() 2026-03-30 14:31:39 -07:00
Aleix Conchillo Flaqué
be8d4dfd87 TTSService: remove deprecated say() function 2026-03-30 14:29:30 -07:00
Aleix Conchillo Flaqué
bb2c60a998 transports: remove deprecated vad_enabled and vad_audio_passthrough 2026-03-30 14:28:34 -07:00
Aleix Conchillo Flaqué
7c644ed810 RTVIObserver: remove deprecated errors_enabled 2026-03-30 14:26:53 -07:00
Aleix Conchillo Flaqué
96ceec2a43 transports: remove deprecated camera_in_* and camera_out_* params 2026-03-30 14:24:40 -07:00
Aleix Conchillo Flaqué
d249473f0b AudioBufferProcessor: remove deprecated user_continuous_stream 2026-03-30 14:22:21 -07:00
Aleix Conchillo Flaqué
1da2018c85 PipelineTask: remove deprecated on_pipeline_ended/cancelled/stopped 2026-03-30 14:20:45 -07:00
Aleix Conchillo Flaqué
af126ec7cf PipelineParams: remove deprecated observers field 2026-03-30 14:18:07 -07:00
Aleix Conchillo Flaqué
340e58bf5c LLMService: remove old function call single argument 2026-03-30 14:16:18 -07:00
Aleix Conchillo Flaqué
7873159d0f LLMService: remove start_callback 2026-03-30 14:13:23 -07:00
Aleix Conchillo Flaqué
c783101741 frames: remove deprecated interruption frames 2026-03-30 14:08:42 -07:00
Aleix Conchillo Flaqué
73b8bbf963 frames: remove deprecated transport frames 2026-03-30 14:08:24 -07:00
Aleix Conchillo Flaqué
ebbe5acc8f frames: remove deprecated KeypadEntryFrame 2026-03-30 14:07:54 -07:00
Aleix Conchillo Flaqué
dd1bea2a5f audio(turn): remove FalSmartTurnAnalyzer and LocalSmartTurnAnalyzer 2026-03-30 14:04:29 -07:00
Aleix Conchillo Flaqué
136e6a58be audio(utils): remove create_default_resampler 2026-03-30 14:02:13 -07:00
Aleix Conchillo Flaqué
f0d04dde1c audio(filters): remove KrispFilter 2026-03-30 14:01:06 -07:00
Aleix Conchillo Flaqué
742a278c05 audio(filters): remove NoisereduceFilter 2026-03-30 13:58:35 -07:00
Aleix Conchillo Flaqué
b16befc9e9 transports(daily): remove deprecated frames 2026-03-30 13:56:25 -07:00
kompfner
0c11eb6fd0 Merge pull request #4141 from pipecat-ai/pk/openai-responses-websocket-service
feat: add WebSocket-based OpenAI Responses LLM service
2026-03-30 15:25:32 -04:00
Mark Backman
ea39389e03 Add changelog for #4203 2026-03-30 14:24:49 -04:00
Mark Backman
4adf0fd585 Handle incomplete function call arguments from interrupted LLM streams
When a user interruption causes the LLM chunk stream to exit early,
function call arguments may be incomplete JSON. Wrap json.loads() in
try/except JSONDecodeError to skip malformed function calls with a
warning instead of crashing. Fixes #2461.
2026-03-30 14:24:04 -04:00
Mark Backman
465b9bcbc6 Add changelog for #4202 2026-03-30 14:16:21 -04:00
Mark Backman
3f4814cf84 Fix UTF-8 decode error in Inworld TTS streaming response
Buffer raw bytes and only decode after splitting on newline boundaries,
preventing multi-byte UTF-8 characters from being split at chunk edges.

Fixes #3538
2026-03-30 14:15:06 -04:00
Mark Backman
f6a3678f93 Improve tests 2026-03-30 12:46:30 -04:00
Mark Backman
3af93ed257 Add changelog for #4201 2026-03-30 12:31:26 -04:00
Mark Backman
f37bf989dd Make reconnection failure error non-fatal to allow service failover
A single service failing to reconnect should not kill the entire
pipeline. Non-fatal errors flow through the pipeline so application
code (e.g. ServiceSwitcher) can handle failover to a backup service.
2026-03-30 12:29:53 -04:00
Mark Backman
86a16d53bc Detect quick connection failures in WebsocketService to prevent infinite reconnection loops
When a WebSocket server accepts the handshake but immediately closes the
connection (e.g. invalid API key returning close code 1008), the existing
exponential backoff does not help because the handshake keeps succeeding.
This tracks how long each connection survives and emits a non-fatal
ErrorFrame after 3 consecutive sub-5s failures, allowing ServiceSwitcher
failover instead of killing the pipeline.

Fixes #3711
2026-03-30 12:23:11 -04:00
Paul Kompfner
0efef19d60 Fix code review issues in WebSocket Responses service
- Use finally block in _disconnect to ensure state is always cleaned
  up, even if websocket.close() throws — prevents stale cancellation
  state (e.g. _cancel_pending_response) from polluting a new connection
- Catch ConnectionClosed in _drain_cancelled_response alongside
  TimeoutError — prevents _needs_drain from staying True and bricking
  the service on every subsequent inference attempt
- Fall back to OPENAI_API_KEY env var when api_key is not passed,
  since the WebSocket connection uses raw websockets (not the
  AsyncOpenAI client which handles this automatically)
- Use _clear_cancellation_state() instead of piecemeal resets where
  appropriate
2026-03-30 10:54:47 -04:00
Mark Backman
87b8f38a48 Merge pull request #4198 from pipecat-ai/mb/readme-update-2026-03-30
Add missing services to README available services table
2026-03-30 10:46:52 -04:00
Mark Backman
e1a3ddbb57 Add missing services to README available services table
Adds Kokoro (TTS), LiveKit and WhatsApp (Transport), Genesys
(Serializers), and Krisp Viva and RNNoise (Audio Processing).
2026-03-30 10:06:14 -04:00
Paul Kompfner
b5683556d4 Remove duplicate entries in run-release-evals.py, which appeared after a rebase 2026-03-30 10:03:43 -04:00
Paul Kompfner
26f85687d6 Handle response cancellation by draining before next inference
Instead of trying to filter stale events inline (unreliable — the API
doesn't provide a way to correlate events to a specific response),
drain remaining events from a cancelled response before starting the
next one. On cancellation, send response.cancel and set a drain flag.
At the start of the next _process_context, read and discard events
until a terminal event arrives, ensuring a clean connection. Falls
back to reconnecting if draining times out.
2026-03-30 09:59:03 -04:00
Paul Kompfner
670ce30a1c Document why HTTP variant doesn't use previous_response_id
Over HTTP, previous_response_id requires store=True (30-day OpenAI-side
conversation storage). The WebSocket variant avoids this via a
connection-local in-memory cache that works with store=False. Add
comments explaining this in both class docstrings, at the store=False
parameter, and in the adapter's previous_response_id note.
2026-03-30 09:59:03 -04:00
Paul Kompfner
1c8d31de70 Add trace logging for previous_response_id decisions and fix example
Add detailed trace-level logging to _apply_previous_response_optimization
showing why the optimization was applied or fell back to full context,
including the relevant data for debugging.

Use append_to_context=False for the filler TTSSpeakFrame in the
function-calling example to avoid altering the conversation history
and breaking the previous_response_id prefix match.
2026-03-30 09:59:03 -04:00
Paul Kompfner
9defff2a34 Skip server-known output items in previous_response_id optimization
When using previous_response_id, the server already knows its own
output from the previous response. Store the raw response output and,
on the next call, compare it against the items following the matched
input prefix — checking role and text content for messages, and call_id
for function calls. If the items match, skip them and send only truly
new input (user messages, tool results). Falls back to full context if
either the prefix or the output comparison fails.
2026-03-30 09:59:03 -04:00
Paul Kompfner
59d28f9fd2 Add changelog for WebSocket OpenAI Responses service 2026-03-30 09:59:03 -04:00
Paul Kompfner
f2a8a9e753 Add WebSocket-based OpenAI Responses LLM service with previous_response_id optimization
Introduce a WebSocket variant of the OpenAI Responses API service that
maintains a persistent connection to wss://api.openai.com/v1/responses
for lower-latency inference. The WebSocket variant automatically uses
previous_response_id to send only incremental context when possible,
falling back to full context on reconnection or cache miss.

The WebSocket variant becomes the new default OpenAIResponsesLLMService,
and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both
share a private base class with common settings, parameter building,
and run_inference (always HTTP) logic.
2026-03-30 09:58:56 -04:00
Mark Backman
d1eb2699f3 Merge pull request #4192 from pipecat-ai/mb/update-langchain
Update langchain dependencies to latest major versions
2026-03-30 08:54:41 -04:00
Mark Backman
2e0f5fc6e9 Merge pull request #4194 from pipecat-ai/mb/update-community-integrations-package-convention
Add pipecat-{vendor} package naming convention to community guide
2026-03-30 08:52:28 -04:00
Mark Backman
dd3ca6fbba Merge pull request #4191 from pipecat-ai/mb/remove-openpipe
Remove OpenPipe integration
2026-03-30 08:52:14 -04:00
Mark Backman
171692aa30 Add pipecat-{vendor} package naming convention to community guide
Formalizes the package naming pattern that most community contributors
already follow organically, improving discoverability on PyPI.
2026-03-29 12:39:20 -04:00
Mark Backman
81ddd103f9 Fix KeyError on context messages without role in RTVI observer
Use dict.get() instead of direct key access to handle context messages
that don't have a 'role' key, such as tool results.
2026-03-29 10:28:00 -04:00
Mark Backman
8c9e189394 Fix langchain imports for langchain 1.x compatibility
ChatPromptTemplate moved from langchain.prompts to langchain_core.prompts
in langchain 1.x.
2026-03-29 10:27:48 -04:00
Mark Backman
b6579dc763 Update uv lock with latest versions of Pygments and cryptography 2026-03-29 10:20:45 -04:00
Mark Backman
abd63336e4 Add changelog for #4192 2026-03-29 10:18:52 -04:00
Mark Backman
ccb9dc20f8 Update langchain dependencies to latest major versions
Update langchain 0.3→1.2, langchain-community 0.3→0.4, and
langchain-openai 0.3→1.1. This also unblocks openai>=2.26 which
was previously constrained by the now-removed openpipe package.
2026-03-29 10:17:28 -04:00
Mark Backman
2177e28ee1 Remove OpenPipe integration
OpenPipe was acquired by CoreWeave in September 2025. The Python package
hasn't been updated since June 2025 and the repo since 2024. The openpipe
package caps openai<=1.97.1, creating dependency conflicts with other
extras. Remove the dead integration to clean up the codebase.
2026-03-29 10:12:35 -04:00
Mark Backman
3eb7c2bcd9 Merge pull request #4187 from OmerCohenAviv/fix/heartbeat-monitor-configurable
Fix heartbeat monitor timeout not respecting custom heartbeat interval
2026-03-29 09:31:12 -04:00
Mark Backman
878940f94e Merge pull request #4189 from Arindam200/main
Add NebiusLLMService for Nebius Token Factory
2026-03-29 09:03:06 -04:00
Mark Backman
a3aeafcb2d Alphabetize nebius entry in pyproject.toml extras 2026-03-29 08:58:01 -04:00
Mark Backman
63254fe337 Add NebiusLLMService with developer role and tool support fixes
- Add Nebius LLM service wrapping OpenAI-compatible Token Factory API
- Set supports_developer_role = False (Nebius rejects developer role)
- Default to openai/gpt-oss-120b model (supports function calling)
- Add Nebius function-calling example and env.example entry
- Fix Sarvam developer role support
- Update examples to use developer role for intro messages
2026-03-29 08:50:11 -04:00
Arindam200
39919f7889 Add NebiusLLMService for Nebius Token Factory
Adds an OpenAI-compatible LLM service for Nebius Token Factory, supporting
open-source models (Meta Llama, Qwen, DeepSeek) via their OpenAI-compatible
REST API at https://api.tokenfactory.nebius.com/v1/.
2026-03-29 14:35:46 +05:30
OmercohenAviv
f2e0f5d20c move wait_time out of loop 2026-03-29 00:05:21 +03:00
OmercohenAviv
2724ef6d6f non optional 2026-03-28 12:12:02 +03:00
OmercohenAviv
33fb8852e6 ruff 2026-03-28 12:05:30 +03:00
OmercohenAviv
5fe48da2fb Merge branch 'main' into fix/heartbeat-monitor-configurable 2026-03-28 11:57:23 +03:00
OmercohenAviv
dccd98ec8a test 2026-03-28 11:53:51 +03:00
Aleix Conchillo Flaqué
a84c69858e Merge pull request #4185 from pipecat-ai/changelog-0.0.108
Release 0.0.108 - Changelog Update
2026-03-27 21:47:53 -07:00
aconchillo
ca224219dc Update changelog for version 0.0.108 2026-03-27 21:43:37 -07:00
Aleix Conchillo Flaqué
83dc979d19 Merge pull request #4186 from pipecat-ai/mb/fix-websocket-disconnect-race-condition
Fix FastAPI WebSocket disconnect race condition
2026-03-27 21:40:21 -07:00
Aleix Conchillo Flaqué
fc76b3f2fb update pyproject.toml and uv.lock 2026-03-27 21:36:03 -07:00
Mark Backman
4670370dbb Add changelog for #4186 2026-03-28 00:02:44 -04:00
Mark Backman
47e53890e3 Fix FastAPI WebSocket disconnect race condition causing pipeline hang
When the remote side disconnects while send() is in flight, send() was
setting _closing=True. This prevented the receive loop from firing
on_client_disconnected, causing the pipeline to hang waiting for a
disconnect signal that never came.

The fix removes _closing from send() (that flag means we initiated the
close) and instead checks Starlette application_state in _can_send()
to suppress subsequent sends after a failure.

Fixes #3912
2026-03-28 00:01:25 -04:00
Aleix Conchillo Flaqué
195180b6f4 Merge pull request #4184 from pipecat-ai/aleix/fix-sarvam-examples-role
Fix Sarvam examples to use 'user' role instead of 'developer'
2026-03-27 20:34:59 -07:00
Aleix Conchillo Flaqué
8b64166bb7 Fix Sarvam examples to use 'user' role instead of 'developer'
Sarvam uses the OpenAI-compatible API but does not support the
'developer' role, causing errors. Use 'user' role instead.
2026-03-27 20:33:25 -07:00
Aleix Conchillo Flaqué
1d18995435 Merge pull request #4183 from pipecat-ai/aleix/fix-task-scheduling
Yield after create_task to ensure timer tasks are scheduled
2026-03-27 20:32:32 -07:00
Aleix Conchillo Flaqué
ea7324b2ba Add changelog for #4183 2026-03-27 19:03:55 -07:00
Aleix Conchillo Flaqué
52ed7137af Yield after create_task to ensure timer tasks are scheduled
Add `await asyncio.sleep(0)` after `create_task()` calls in
UserIdleController, SpeechTimeoutUserTurnStopStrategy,
TurnAnalyzerUserTurnStopStrategy, and UserTurnCompletionLLMServiceMixin
so the event loop schedules the newly created timer tasks before the
caller continues.
2026-03-27 19:03:23 -07:00
kompfner
b33df03724 Merge pull request #4179 from pipecat-ai/pk/fix-gemini-live-vertex
Don't send history_config for Gemini Live Vertex (unsupported)
2026-03-27 17:34:29 -04:00
Paul Kompfner
28fbe1db08 Don't send history_config for Gemini Live Vertex (unsupported) 2026-03-27 17:30:47 -04:00
kompfner
9240e92d9f Merge pull request #4177 from pipecat-ai/pk/tweak-26i-for-gemini-3.1-flash-live-support
Tweak 26i example system instruction for Gemini 3.1 Flash Live compat…
2026-03-27 17:20:06 -04:00
Paul Kompfner
5caf53f086 Tweak 26i example system instruction for Gemini 3.1 Flash Live compatibility
Gemini 3.1 Flash Live won't reliably report ending its turn until
after it says something following a tool call. Restructure the system
instruction so the model says goodbye *after* calling
end_conversation, and add a comment explaining the deferred EndFrame
behavior that makes this work.
2026-03-27 17:13:17 -04:00
Mark Backman
ac2716811c Merge pull request #4176 from pipecat-ai/mb/fix-websocket-rtvi-messages
Fix RTVI events not delivered over WebSocket transports
2026-03-27 16:50:37 -04:00
Mark Backman
d313d56776 Fix RTVI events not delivered over WebSocket transports
The base serializer filters out RTVI protocol messages by default
(ignore_rtvi_messages=True) to prevent them from being sent over
telephony media streams. ProtobufFrameSerializer is used by WebSocket
transports, which are the delivery channel for these messages, so
disable the filter there.
2026-03-27 16:47:11 -04:00
kompfner
159776f106 Merge pull request #4175 from pipecat-ai/pk/gemini-live-dropped-support-for-text-modality
Warn when TEXT modality is set for Gemini Live, and remove 26d text example
2026-03-27 16:26:36 -04:00
kompfner
a23803478f Merge pull request #4171 from pipecat-ai/pk/fix-gemini-3.1-flash-live-video
Gate Gemini Live sending real-time input messages to the API until it…
2026-03-27 16:26:03 -04:00
Mark Backman
bae193ab4d Merge pull request #4172 from pipecat-ai/mb/rime-tts-fixes
Fix Rime TTS stop-frame handling and handle done message
2026-03-27 16:22:25 -04:00
Paul Kompfner
04adb697be Warn when TEXT modality is set for Gemini Live, and remove 26d text example
All recent Gemini Live models (including the default
gemini-2.5-flash-native-audio-preview-12-2025, and going at least as
far back as gemini-2.5-flash-native-audio-preview-09-2025) only
support AUDIO as a response modality. We considered using
`modalities=TEXT` as a Pipecat-level signal to suppress audio output
frames (so developers could pair Gemini Live with an external TTS),
but the output transcription from the API arrives too late relative
to the audio to be useful for driving an external TTS service.

For now, just log a warning when a TEXT modality is configured
(at init or via set_model_modalities) and proceed as normal. The 26d
text-modality example is removed since it no longer represents a
viable configuration.
2026-03-27 16:21:15 -04:00
Mark Backman
4f9c8a6860 Merge pull request #4174 from pipecat-ai/fix/deepgram-sdk-6.1.0-compat
Fix Deepgram STT compatibility with deepgram-sdk 6.1.0
2026-03-27 15:11:43 -04:00
Mark Backman
a1a29b3933 Add changelog for #4174 2026-03-27 14:50:12 -04:00
Mark Backman
0798803c70 Bump deepgram-sdk minimum version to 6.1.0 2026-03-27 14:46:17 -04:00
Mark Backman
6422661d08 Fix Deepgram STT compatibility with deepgram-sdk 6.1.0
The SDK now requires explicit message objects for send_keep_alive,
send_close_stream, and send_finalize instead of no-arg calls.
2026-03-27 14:40:48 -04:00
Mark Backman
ed94b65d83 Merge pull request #4173 from pipecat-ai/filipi/updating_inworld_examples
Removing the models from the Inworld example so we can use the default model.
2026-03-27 14:02:55 -04:00
filipi87
f9670b9601 Removing the models from the Inworld example so we can use the default model. 2026-03-27 14:23:20 -03:00
OmercohenAviv
de8ba68589 Fix heartbeat monitor timeout not respecting custom heartbeat interval
The heartbeat monitor timeout (`HEARTBEAT_MONITOR_SECS`) was a static
module-level constant that never derived from the user-configurable
`heartbeats_period_secs`. This meant overriding the heartbeat interval
had no effect on the monitor window, causing spurious warnings or
delayed detection depending on the configured interval.

Add a new `heartbeats_monitor_secs` parameter to `PipelineParams` so
the monitor timeout is independently configurable (defaults to 10s).
The monitor handler now reads from the instance param instead of the
hard-coded constant.

Made-with: Cursor
2026-03-27 19:41:06 +03:00
Paul Kompfner
5b2991f47f Gate Gemini Live sending real-time input messages to the API until it's ready, i.e. after we've sent the initial conversation history (or determined that we don't need to).
This fixes the 26c example when using Gemini 3.1 Flash Live, which seems to be more strict about not receiving real-time input (at least, video messages) before conversation history.
2026-03-27 12:41:05 -04:00
Mark Backman
fc3186dc0d Add changelog entries for PR #4172 2026-03-27 12:38:53 -04:00
Mark Backman
1808b447c9 Handle done message from Rime TTS to avoid stop-frame timeout
Rime's WebSocket API sends a done message when synthesis completes.
Handle it to stop TTFB metrics, push TTSStoppedFrame, and remove the
audio context immediately instead of relying on the 3-second
stop_frame_timeout_s fallback.
2026-03-27 12:37:03 -04:00
Mark Backman
70df9d3fe4 Fix duplicate TTSStoppedFrame in TTS service timeout path 2026-03-27 12:07:37 -04:00
Filipi da Silva Fuchter
a8bfc23d3a Merge pull request #4167 from pipecat-ai/filipi/inworld_improvements
InworldTTSService improvements.
2026-03-27 11:15:14 -04:00
filipi87
e2870fc2ac Changing to debug the log when we are not able to append audio to the context. 2026-03-27 12:12:16 -03:00
filipi87
e851f8c1d5 Adding changelog entry for the fix. 2026-03-27 12:11:35 -03:00
filipi87
b31bece617 Not trying to recreate the context. 2026-03-27 12:06:21 -03:00
kompfner
9e350bcc2f Merge pull request #4147 from pipecat-ai/cb/gemini-transcript-fixes
Fix Gemini Live to handle bundled server_content fields
2026-03-27 11:00:10 -04:00
Paul Kompfner
9c2594c484 Remove brittle test 2026-03-27 10:56:39 -04:00
Mark Backman
900fc88430 Merge pull request #4128 from pipecat-ai/mb/end-of-turn-assembly 2026-03-27 10:47:09 -04:00
filipi87
4ef5ac6f0c InworldTTSService improvements. 2026-03-27 11:33:32 -03:00
Mark Backman
cbb3d99493 Merge pull request #4166 from pipecat-ai/mb/fix-example-ordering-56
Fix example numbering, add LemonSlice to evals
2026-03-27 10:29:07 -04:00
Filipi da Silva Fuchter
fb1996cedc Merge pull request #4143 from pipecat-ai/cb/sagemaker-flux
Add Deepgram Flux STT service for AWS SageMaker
2026-03-27 10:27:49 -04:00
Filipi da Silva Fuchter
95c55ec6c3 Merge pull request #4145 from pipecat-ai/filipi/tts_improvements_remove_reset
TTS improvements.
2026-03-27 10:24:59 -04:00
Mark Backman
a45de9af7f Merge pull request #4161 from tanmayc25/fix/lemonslice-missing-dtmf-callback
fix(lemonslice): add missing on_dtmf_event callback in DailyCallbacks construction
2026-03-27 10:19:54 -04:00
Mark Backman
5e61a57582 Fix changelog entry for #4161 2026-03-27 10:16:25 -04:00
Mark Backman
d8b0ed18fd Fix example numbering, add LemonSlice to evals 2026-03-27 10:11:37 -04:00
Mark Backman
789275a57b Merge pull request #4164 from pipecat-ai/mb/update-community-integrations-guide
docs: update COMMUNITY_INTEGRATIONS.md for accuracy
2026-03-27 09:38:31 -04:00
Filipi da Silva Fuchter
38c961a363 Merge pull request #4113 from inworld-ai/ian/lang-timestamps
fix(inworld): fallback to full text when TTS timestamps are not received
2026-03-27 09:34:05 -04:00
Mark Backman
41a86a51bf docs: update COMMUNITY_INTEGRATIONS.md for accuracy
- Replace deprecated TTS classes (AudioContextWordTTSService, WordTTSService)
  with current hierarchy (WebsocketTTSService, InterruptibleTTSService, TTSService)
- Add WebsocketSTTService and SDK-based STTService categories
- Fix LLM section: document _process_context, adapter_class, remove deprecated
  create_context_aggregator guidance, add thought frames for reasoning models
- Fix Vision section: run_vision takes UserImageRawFrame not LLMContext,
  yields Vision*Frame types not TextFrame
- Fix push_error API: takes (error_msg, exception) not ErrorFrame
- Fix frame name: TTSRawAudioFrame → TTSAudioRawFrame
- Remove stale v13+ version reference
- Clarify @traced_stt method convention
2026-03-27 09:22:32 -04:00
Filipi da Silva Fuchter
e1bfa4cf21 Merge pull request #4152 from vpalmisano/vpalmisano-patch-1
Fix audio transcript check in base_llm.py
2026-03-27 08:34:15 -04:00
filipi87
537d57449e Fixing the format and including the changelog. 2026-03-27 09:29:46 -03:00
Tanmay Chaudhari
33e146decd fix(lemonslice): add missing on_dtmf_event callback in DailyCallbacks construction
DailyCallbacks gained a required on_dtmf_event field in PR #4047.
PR #4079 fixed this for TavusTransportClient but
LemonSliceTransportClient.setup() was not updated, causing a pydantic
ValidationError at pipeline setup time.
2026-03-27 12:06:26 +05:30
Mark Backman
eee47deb34 Merge pull request #4060 from alpsencer/fix/empty-tool-call-arguments
fix(openai): handle tool calls with empty/null arguments
2026-03-26 22:04:37 -04:00
Mark Backman
21a729ae5d Merge pull request #4146 from pipecat-ai/mb/gemini-live-local-vad 2026-03-26 17:48:21 -04:00
Filipi da Silva Fuchter
1870f4010e Merge pull request #4158 from pipecat-ai/filipi/flux_refactor
Creating a base class, DeepgramFluxSTTBase, to reuse Deepgram Flux logic
2026-03-26 17:33:35 -04:00
filipi87
28683a7296 Moving flux_stt.py to deepgram/flux/sagemaker/stt.py 2026-03-26 17:43:51 -03:00
filipi87
0e504d876d Creating a base class DeepgramFluxSTTBase so we can reuse Deepgram Flux logic. 2026-03-26 17:37:37 -03:00
Mark Backman
5c51981207 Merge pull request #4149 from pipecat-ai/mb/fix-service-switcher-passthrough-errors
Fix ServiceSwitcher reacting to pass-through ErrorFrames
2026-03-26 16:34:45 -04:00
Mark Backman
a13c4d1248 Narrow ServiceSwitcher error check to active service only
Only trigger handle_error for ErrorFrames originating from the active
service, not any managed service. This prevents edge cases where errors
from a non-active service could incorrectly trigger failover.
2026-03-26 15:28:19 -04:00
filipi87
ca1b4ad124 Organizing the methods from Deepgram Flux and Flux SageMaker in the same position. 2026-03-26 16:05:17 -03:00
Mark Backman
533dcdba3f Merge pull request #4154 from pipecat-ai/mb/deprecate-sambanova-stt
Remove SambaNovaSTTService
2026-03-26 14:10:14 -04:00
Mark Backman
7eec03cb77 Merge pull request #4156 from pipecat-ai/mb/mem0-improvements
fix(mem0): improve Mem0 service reliability and add get_memories() method
2026-03-26 14:09:34 -04:00
Mark Backman
83911dced6 docs: add changelog entries for #4156 2026-03-26 13:30:00 -04:00
Mark Backman
4e4a8c45d5 build(mem0): bump mem0ai dependency to >=1.0.8,<2 2026-03-26 13:28:41 -04:00
Mark Backman
9c6d51c570 feat(mem0): add get_memories() convenience method to Mem0MemoryService
Expose a public method for retrieving all stored memories outside the
pipeline, avoiding the need for callers to reimplement client branching,
OR filter construction, and asyncio.to_thread wrapping. Simplify the
example get_initial_greeting() to use it.
2026-03-26 13:28:41 -04:00
Mark Backman
9152d85824 fix(mem0): filter to user/assistant roles before storing in Mem0
Mem0 API only accepts user and assistant roles. Filter out system,
developer, and other roles before calling add() to avoid 400 errors.
2026-03-26 13:28:41 -04:00
Mark Backman
6a87d0e87d fix(mem0): make memory service non-blocking and use position parameter
Move blocking Mem0 API calls off the event loop using asyncio.to_thread().
Store messages as a fire-and-forget background task via create_task() since
the result is not needed. Insert memory messages at the configured position
in the context instead of always appending.

Closes #1741
2026-03-26 13:28:41 -04:00
Mark Backman
fe0633ecd1 Add 14s to release evals 2026-03-26 12:27:27 -04:00
Mark Backman
ca2bfd6f12 Remove SambaNovaSTTService
SambaNova no longer offers speech-to-text audio models.
2026-03-26 12:22:06 -04:00
kompfner
345ccc0abe Merge pull request #4148 from pipecat-ai/khk/gemini-transcription-fixes-addon
Fix bundled Gemini Live transcription ordering
2026-03-26 11:33:10 -04:00
namanbansal013
800fd6a916 Changelog entry for the websocket word context leak. 2026-03-26 11:52:34 -03:00
filipi87
d286991257 Changelog entry for the changes involving add_word_timestamp. 2026-03-26 11:51:31 -03:00
namanbansal013
a06bf47ed2 Discard any pre-audio word timestamps from the interrupted turn. 2026-03-26 11:42:24 -03:00
Mark Backman
5ad4aa9bea Merge pull request #4153 from pipecat-ai/mb/deepgram-stt-try-except
Handle Deepgram SDK 6.x send_media() exceptions
2026-03-26 10:15:21 -04:00
filipi87
c4466ba678 Adding changelog for the InterruptibleTTSService race condition fix 2026-03-26 10:58:57 -03:00
filipi87
df602b900d Preventing a race condition in the InterruptibleTTSServices in cases where run_tts has been invoked but the BotStartedSpeakingFrame has not yet been received. 2026-03-26 10:39:44 -03:00
Mark Backman
c331c75d66 Add tests for send_media() exception handling in DeepgramSTTService 2026-03-26 09:20:58 -04:00
filipi87
f7ec6befe1 Invoking superclass method when audio context is interrupted or completed. 2026-03-26 10:14:21 -03:00
Mark Backman
6a6ee8d563 Merge pull request #4150 from pipecat-ai/mb/resolve-dependabot-2026-03-25
Bump nltk minimum version to 3.9.4 to resolve CVE-2026-33230
2026-03-26 09:10:47 -04:00
Mark Backman
259f5e124c Add changelog for #4153 2026-03-26 08:48:45 -04:00
Mark Backman
cfe91d11ec Handle Deepgram SDK 6.x send_media() exceptions
Deepgram SDK 6.x surfaces connection errors from send_media() instead
of silently swallowing them. This causes error floods when the WebSocket
disconnects since every queued audio frame hits the dead connection.

Wrap send_media() in try/except: on failure, log one warning and set
self._connection = None so subsequent frames skip until the existing
_connection_handler reconnects.
2026-03-26 08:45:42 -04:00
Vittorio Palmisano
467184e63e Fix audio transcript check in base_llm.py
In some cases the openai provider could answer with a `chunk.choices[0].delta.audio = None`, so the process context fails with error:
```
pipecat/services/openai/base_llm.py:552): Error during completion: 'NoneType' object has no attribute 'get'
```
2026-03-26 13:09:36 +01:00
Mark Backman
af566ac936 Merge pull request #4151 from ajmeraharsh/fix/livekit-clear-audio-queue-on-interruption
fix(livekit): clear AudioSource buffer on interruption
2026-03-26 00:52:26 -04:00
ajmeraharsh
62484a4fc3 fix(livekit): clear AudioSource buffer on interruption
When an InterruptionFrame arrives, the Python-side audio task is
cancelled but frames already submitted to rtc.AudioSource continue
playing from its internal buffer. This causes the bot to keep speaking
for several seconds after being interrupted.

Fix by overriding process_frame in LiveKitOutputTransport to call
audio_source.clear_queue() on InterruptionFrame, immediately flushing
the buffered audio.
2026-03-26 09:47:00 +05:30
Mark Backman
7fef3b01eb Merge pull request #4142 from pipecat-ai/mb/grok-move-to-xai-module
Consolidate Grok services into xai module
2026-03-25 23:32:18 -04:00
Mark Backman
6d1918f12a Update GROK_API_KEY to XAI_API_KEY 2026-03-25 23:23:58 -04:00
Mark Backman
e58740e948 Bump nltk minimum version to 3.9.4 to resolve CVE-2026-33230 2026-03-25 23:16:46 -04:00
Mark Backman
ddfe44940d Add changelog for #4149 2026-03-25 22:54:25 -04:00
Mark Backman
fdbdbc8be3 Fix ServiceSwitcher reacting to pass-through ErrorFrames from other pipeline stages
ErrorFrames propagating upstream from downstream processors (e.g. TTS) would
enter the ServiceSwitcher via process_frame, traverse the active service sub-pipeline,
and reach push_frame where they incorrectly triggered failover. Now only errors whose
processor is one of the managed services trigger handle_error. Also fix the log in
handle_error to attribute errors to the actual source processor rather than the
current active_service.

Closes #4139
2026-03-25 22:53:04 -04:00
Kwindla Hultman Kramer
3cd7d882fb Fix bundled Gemini Live transcription ordering 2026-03-25 18:56:00 -07:00
Chad Bailey
2d78533d77 Add changelog for Gemini Live server_content fix 2026-03-25 23:42:42 +00:00
Chad Bailey
c1dd44f947 Fix Gemini Live message handling to process all server_content fields
Gemini 3.x can bundle multiple fields (e.g. model_turn and
output_transcription) on the same server_content message. The previous
elif chain would only process the first matching field and silently
drop the rest. Switch to independent if checks so every field is
handled.
2026-03-25 23:42:07 +00:00
Mark Backman
9db15e7942 Add changelog entry for #4146 2026-03-25 18:07:05 -04:00
Mark Backman
503e5e9106 Fix Gemini Live local VAD by sending correct activity events to server
When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.

Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.
2026-03-25 18:00:13 -04:00
filipi87
2ff4b3f4a3 Improving docstring based on the recent changes. 2026-03-25 17:52:05 -03:00
filipi87
b4096f9a11 Refactoring to remove "Reset" and "TTSStoppedFrame" from word. 2026-03-25 17:47:24 -03:00
filipi87
c4253a7d98 Refactoring to invoke append_to_audio_context instead of direct queue put. 2026-03-25 17:21:55 -03:00
Filipi da Silva Fuchter
2441c4f801 Merge pull request #4135 from pipecat-ai/filipi/audio_buffer
Fixed audio crackling and popping artifacts in AudioBufferProcessor
2026-03-25 15:40:17 -04:00
Mark Backman
a7a55dd30e Merge pull request #4136 from pipecat-ai/mb/bump-package-version-nvidia
Upgrade protobuf to 6.x for nvidia-riva-client compatibility
2026-03-25 15:27:48 -04:00
Mark Backman
de6a7223ba Suppress verbose gRPC C-core logging in nvidia services
Set GRPC_VERBOSITY=ERROR by default so users do not see noisy fork
handler and abseil warnings from the gRPC C library. Users can still
override by setting GRPC_VERBOSITY themselves.
2026-03-25 15:23:54 -04:00
Mark Backman
165932e1cc Add changelog for #4136 2026-03-25 15:23:54 -04:00
Mark Backman
1f0d9ad01a Upgrade protobuf to 6.x for nvidia-riva-client 2.25.1 compatibility
nvidia-riva-client 2.25.1 ships with gencode compiled against protobuf
6.31.1, which requires a runtime >= 6.31.1. Update protobuf from 5.29.6
to >=6.31.1,<7 and grpcio-tools from 1.67.1 to 1.78.0 to match.
Regenerate frames_pb2.py with the new compiler.
2026-03-25 15:23:53 -04:00
Chad Bailey
052075c244 updated changelog 2026-03-25 19:12:37 +00:00
Chad Bailey
a8d0e1de9f Update changelog filename with PR number 2026-03-25 19:10:20 +00:00
Chad Bailey
4f0b2066c0 Add Deepgram Flux STT service for AWS SageMaker
Add DeepgramFluxSageMakerSTTService that combines SageMaker's HTTP/2
transport with Flux's JSON turn detection protocol (StartOfTurn,
EndOfTurn, EagerEndOfTurn, TurnResumed). Includes mid-stream Configure
support, silence watchdog, and an example bot.
2026-03-25 19:09:52 +00:00
filipi87
413dbaf974 Automated tests to validate the silence injection guards. 2026-03-25 16:05:58 -03:00
Ian Lee
5645909d34 [inworld] add falbback for empty timestamps from server 2026-03-25 11:55:09 -07:00
filipi87
da3f184316 Automated tests to validate the silence injection guards. 2026-03-25 15:38:21 -03:00
filipi87
e5a2723632 Fixed audio crackling and popping artifacts in AudioBufferProcessor. 2026-03-25 15:29:50 -03:00
Mark Backman
4ee4002d5d Merge pull request #4137 from pipecat-ai/mb/language-string-log-level-debug
Downgrade unrecognized language string log from warning to debug
2026-03-25 12:26:46 -04:00
Mark Backman
54a17ab1f3 Add changelog for #4142 2026-03-25 12:22:37 -04:00
Mark Backman
1c99a537b2 Consolidate Grok services into xai module
Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai),
so move Grok source files into the xai module. Leave deprecation shims in
the old grok/ paths for backward compatibility.
2026-03-25 12:07:40 -04:00
Mark Backman
ff5d055b3c Merge pull request #4031 from niczy/xai-tts-service
Add xAI TTS service
2026-03-25 10:57:08 -04:00
Mark Backman
adc003d6c7 Code review cleanup 2026-03-25 10:53:07 -04:00
Nicholas Zhao
bbd14de9c5 Address PR review: rename to XAIHttpTTSService, add language map, clean up API
- Rename XAITTSService → XAIHttpTTSService and XAITTSSettings → XAIHttpTTSSettings
- Add language_to_xai_language() with explicit LANGUAGE_MAP using resolve_language()
- Remove deprecated InputParams, params, voice, language init params
- Remove XAI_DEFAULT_SAMPLE_RATE and XAI_PCM_CODEC constants; add encoding param
- Set sample_rate=None default (picked up from PipelineParams or user)
- Use Language.EN enum instead of string "en" for default language
- Add changelog/4031.added.md
- Add 07e-interruptible-xai.py foundational example
- Update 14g-function-calling-grok.py to use XAIHttpTTSService
- Register 07e in run-release-evals.py
2026-03-25 10:46:54 -04:00
Nicholas Zhao
02b97035f8 Add xAI TTS service 2026-03-25 10:45:15 -04:00
Mark Backman
f470ff193e Update language tests to expect debug instead of warning 2026-03-25 10:26:10 -04:00
Mark Backman
7bc8b89a54 Add changelog for #4137 2026-03-25 10:21:44 -04:00
Mark Backman
a8eff6fbbf Downgrade unrecognized language string log from warning to debug
Service-specific language strings like Deepgram's "multi" are valid
pass-through values, not issues worth warning about.
2026-03-25 10:20:36 -04:00
Mark Backman
b66c892100 Add changelog for #4128 2026-03-24 16:15:00 -04:00
Mark Backman
6c30371295 Fix Deepgram Flux event handler docstring to match implementation
Update documented event signatures to include transcript argument
where the code actually passes it. Remove stale on_speech_started
and on_utterance_end entries that were never registered.
2026-03-24 16:12:25 -04:00
Mark Backman
ddf6a41854 Add on_end_of_turn event handler to AssemblyAI STT
Fires after the final transcript is pushed in both Pipecat and
AssemblyAI turn detection modes, giving users a reliable hook
that arrives after all transcript frames. Matches the existing
Deepgram Flux on_end_of_turn pattern.
2026-03-24 16:11:35 -04:00
Om Chauhan
fa982a05c0 added changelog 2026-03-18 09:46:15 +05:30
Om Chauhan
419c7d4450 fix: default thinking config for Gemini 3+ Flash models 2026-03-18 09:33:54 +05:30
Yavuz Alp Sencer ÖZTÜRK
9a55eb67cf fix(openai): handle tool calls with empty/null arguments
When an LLM returns a tool call with no arguments (arguments=null in
the streaming chunks), the tool call is silently dropped because:

1. `tool_call.function.arguments` is None, so nothing is accumulated
   and `arguments` stays as "" (empty string)
2. `if function_name and arguments:` treats "" as falsy, skipping the
   entire tool call execution

OpenAI always sends arguments="{}" even for parameterless tools,
masking this bug. But vLLM, Ollama, and other OpenAI-compatible
providers may omit arguments entirely when the tool schema has no
required parameters, causing tool calls to be silently ignored.

Fix: check only `function_name` (not `arguments`) and default empty
arguments to "{}" so `json.loads` produces an empty dict. Apply the
same fallback for intermediate tool calls in multi-tool responses.
2026-03-17 19:44:59 +03:00
842 changed files with 31583 additions and 32807 deletions

View File

@@ -144,7 +144,7 @@ class InputParams(BaseModel):
#### Examples
Validated against `examples/foundational/07-interruptible.py`:
Validated against `examples/07-interruptible.py`:
- Proper `create_transport()` usage
- Correct pipeline structure

View File

@@ -1,30 +0,0 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

View File

@@ -32,7 +32,7 @@ jobs:
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
run: uv sync --group dev --extra daily --extra tracing
- name: Ruff formatter
id: ruff-format
@@ -41,3 +41,7 @@ jobs:
- name: Ruff linter (all rules)
id: ruff-check
run: uv run ruff check
- name: Type check (pyright)
id: pyright
run: uv run pyright

View File

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.10.19', '3.11.14', '3.12.12', '3.13.12']
python-version: ['3.11.15', '3.12.13', '3.13.12', '3.14.3']
name: Python ${{ matrix.python-version }}
steps:
@@ -42,7 +42,7 @@ jobs:
- name: Test uv sync with all extras
run: |
uv sync --group dev --all-extras --no-extra krisp
uv sync --group dev --all-extras
- name: Verify installation
run: |

View File

@@ -1,51 +0,0 @@
name: Sync Quickstart to pipecat-quickstart repo
on:
push:
branches: [main]
paths:
- 'examples/quickstart/**'
workflow_dispatch: # Manual trigger
jobs:
sync-quickstart:
runs-on: ubuntu-latest
steps:
- name: Checkout main repo
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Checkout quickstart repo
uses: actions/checkout@v4
with:
repository: pipecat-ai/pipecat-quickstart
token: ${{ secrets.QUICKSTART_SYNC_TOKEN }}
path: quickstart-repo
- name: Sync files (excluding uv.lock and README.md)
run: |
# Copy all files except uv.lock and README.md
find examples/quickstart -type f \
-not -name "README.md" \
-not -name "uv.lock" \
-exec cp {} quickstart-repo/ \;
- name: Commit and push changes
run: |
cd quickstart-repo
git config user.name "GitHub Action"
git config user.email "action@github.com"
git add .
# Only commit if there are changes
if ! git diff --staged --quiet; then
git commit -m "Sync from pipecat main repo
Updated files from examples/quickstart/
Commit: ${{ github.sha }}
"
git push
else
echo "No changes to sync"
fi

View File

@@ -114,6 +114,7 @@ jobs:
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
--repo pipecat-ai/docs \
--label auto-docs \
--label pipecat \
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
--body "$(cat <<'BODY'
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).

View File

@@ -1,8 +1,13 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.12.1
- repo: local
hooks:
- id: ruff
language_version: python3
args: [--fix]
name: ruff
entry: uv run ruff check --fix
language: system
types: [python]
- id: ruff-format
name: ruff-format
entry: uv run ruff format
language: system
types: [python]

View File

@@ -11,7 +11,7 @@ build:
jobs:
post_install:
- pip install uv
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra mlx-whisper
sphinx:
configuration: docs/api/conf.py

File diff suppressed because it is too large Load Diff

View File

@@ -1,62 +0,0 @@
# Changelog
All notable changes to the **&lt;project name&gt;** SDK will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Please make sure to add your changes to the appropriate categories:
## [Unreleased]
### Added
<!-- for new functionality -->
- n/a
### Changed
<!-- for changed functionality -->
- n/a
### Deprecated
<!-- for soon-to-be removed functionality -->
- n/a
### Removed
<!-- for removed functionality -->
- n/a
### Fixed
<!-- for fixed bugs -->
- n/a
### Performance
<!-- for performance-relevant changes -->
- n/a
### Security
<!-- for security-relevant changes -->
- n/a
### Other
<!-- for everything else -->
- n/a
## [0.1.0] - YYYY-MM-DD
Initial release.

View File

@@ -10,7 +10,7 @@ Pipecat is an open-source Python framework for building real-time voice and mult
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
uv sync --group dev --all-extras --no-extra gstreamer
# Install pre-commit hooks
uv run pre-commit install

View File

@@ -23,7 +23,7 @@ Create your integration following the patterns and examples shown in the "Integr
Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
@@ -65,12 +65,25 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Websocket-based Services
**Base class:** `WebsocketSTTService`
**Use for:** Services where you manage the websocket connection directly. Combines `STTService` with `WebsocketService` for automatic reconnection and keepalive support.
**Examples:**
- [CartesiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/stt.py)
- [ElevenLabsRealtimeSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/stt.py)
#### SDK-based Streaming Services
**Base class:** `STTService`
**Use for:** Streaming services where the provider's Python SDK manages the connection internally.
**Examples:**
- [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
- [GoogleSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/stt.py)
#### File-based Services
@@ -108,55 +121,59 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
- **`_process_context(self, context: LLMContext)`** — The main method that processes an LLM context and generates a response. Each LLM service overrides `process_frame` to extract context from `LLMContextFrame` and calls `_process_context`.
- **Context aggregation:** Implement context aggregation to collect user and assistant content:
- Aggregators come in pairs with a `user()` instance and `assistant()` instance
- Context must adhere to the `LLMContext` universal format
- Aggregators should handle adding messages, function calls, and images to the context
- **`adapter_class`** — Class attribute pointing to a `BaseLLMAdapter` subclass. Defaults to `OpenAILLMAdapter`. Non-OpenAI services must implement their own adapter (see `src/pipecat/adapters/base_llm_adapter.py`) with methods:
- `get_llm_invocation_params(context)` — Extract provider-specific params from universal context
- `to_provider_tools_format(tools_schema)` — Convert standard tools to provider format
- `get_messages_for_logging(context)` — Format messages for logging
- Reference adapters: `src/pipecat/adapters/services/` (anthropic, gemini, bedrock, etc.)
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` — Signals the start of an LLM response
- `LLMTextFrame` — Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` — Signals the end of an LLM response
- **Thought frames (reasoning models):** If the model supports extended thinking / chain-of-thought, emit thought frames alongside the response:
- `LLMThoughtStartFrame` — Signals the start of a thought
- `LLMThoughtTextFrame` — Contains thought content, streamed as tokens
- `LLMThoughtEndFrame` — Signals the end of a thought
- **Context aggregation** is handled by the framework via `LLMContext` + `LLMContextAggregatorPair`. The LLM service just processes context it receives — no need to implement aggregators.
### TTS (Text-to-Speech) Services
#### AudioContextWordTTSService
#### WebsocketTTSService
**Use for:** Websocket-based services supporting word/timestamp alignment
**Use for:** Websocket-based streaming services (with or without word timestamps)
**Example:**
**Examples:**
- [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
- [ElevenLabsTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### InterruptibleTTSService
**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
**Use for:** Websocket-based services without word timestamps that reconnect on interruption (e.g. don't support a context ID or interruption message)
**Example:**
- [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)
#### WordTTSService
**Use for:** HTTP-based services supporting word/timestamp alignment
**Example:**
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### TTSService
**Use for:** HTTP-based services without word/timestamp alignment
**Use for:** HTTP-based services (word timestamps are supported in the base class)
**Example:**
**Examples:**
- [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
- [OpenAITTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/openai/tts.py)
#### Key requirements:
- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
- For websocket services, use asyncio WebSocket implementation
- Handle idle service timeouts with keepalives
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
- TTS services push both audio (`TTSAudioRawFrame`) and text (`TTSTextFrame`) frames
### Telephony Serializers
@@ -200,14 +217,25 @@ Vision services process images and provide analysis such as descriptions, object
#### Key requirements:
- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
- The method processes the latest image in the context and yields frames with analysis results
- Typically yields `TextFrame` objects containing descriptions or answers
- Must implement `run_vision` method that takes a `UserImageRawFrame` and returns an `AsyncGenerator[Frame, None]`
- The method processes the image frame and yields frames with analysis results
- Must yield the frame sequence: `VisionFullResponseStartFrame`, `VisionTextFrame`, `VisionFullResponseEndFrame`
## Implementation Guidelines
### Naming Conventions
#### Package and Repository Naming
Use the `pipecat-{vendor}` naming convention for your PyPI package and repository:
- `pipecat-{vendor}` — for single-service integrations (e.g., `pipecat-deepdub`)
- `pipecat-{vendor}-{type}` — when a vendor offers multiple service types (e.g., `pipecat-upliftai-stt`, `pipecat-upliftai-tts`)
This convention makes community packages easily discoverable via PyPI search and clearly identifies them as part of the Pipecat ecosystem.
#### Class Naming
- **STT:** `VendorSTTService`
- **LLM:** `VendorLLMService`
- **TTS:**
@@ -381,7 +409,7 @@ Note that `self.sample_rate` is a `@property` set in the TTSService base class,
Use Pipecat's tracing decorators:
- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
- **STT:** `@traced_stt` - decorate `_handle_transcription(self, transcript, is_final, language)` (the standard method name convention)
- **LLM:** `@traced_llm` - decorate the `_process_context()` method
- **TTS:** `@traced_tts` - decorate the `run_tts()` method
@@ -389,8 +417,9 @@ Use Pipecat's tracing decorators:
### Packaging and Distribution
- Name your package `pipecat-{vendor}` (see [Naming Conventions](#naming-conventions))
- Use [uv](https://docs.astral.sh/uv/) for packaging (encouraged)
- Consider releasing to PyPI for easier installation
- Publish to PyPI for easier installation
- Follow semantic versioning principles
- Maintain a changelog
@@ -403,17 +432,15 @@ For REST-based communication, use aiohttp. Pipecat includes this as a required d
- Wrap API calls in appropriate try/catch blocks
- Handle rate limits and network failures gracefully
- Provide meaningful error messages
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
- When errors occur, raise exceptions AND push errors to notify the pipeline:
```python
from pipecat.frames.frames import ErrorFrame
try:
# Your API call
result = await self._make_api_call()
except Exception as e:
# Push error frame to pipeline
await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
# Push error upstream to notify the pipeline
await self.push_error(f"{self} error: {e}", exception=e)
# Raise or handle as appropriate
raise
```

View File

@@ -8,7 +8,7 @@
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
> Want to dive right in? Try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
> Want to dive right in? Run `pipecat init quickstart` or follow the [quickstart guide](https://docs.pipecat.ai/getting-started/quickstart).
## 🚀 What You Can Build
@@ -28,6 +28,10 @@
## 🌐 Pipecat Ecosystem
### 🧩 Multi-agent systems
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
@@ -67,7 +71,7 @@ and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
@@ -79,28 +83,28 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/simple-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12-describe-video.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/assets/moondream.png" width="400" /></a>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/daily-multi-translation"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/daily-multi-translation/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/vision/vision-moondream.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/assets/moondream.png" width="400" /></a>
</p>
## 🧩 Available services
| Category | Services |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)
## ⚡ Getting started
@@ -142,15 +146,15 @@ You can get started with Pipecat running on your local machine, then move your a
## 🧪 Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat-examples) — complete applications that you can use as starting points for development
## 🛠️ Contributing to the framework
### Prerequisites
**Minimum Python Version:** 3.10
**Recommended Python Version:** 3.12
**Minimum Python Version:** 3.11
**Recommended Python Version:** >= 3.12
### Setup Steps
@@ -166,7 +170,6 @@ You can get started with Pipecat running on your local machine, then move your a
```bash
uv sync --group dev --all-extras \
--no-extra gstreamer \
--no-extra krisp \
--no-extra local \
```

View File

@@ -1 +0,0 @@
- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`, `sarvam-105b` and `sarvam-105b-32k`

View File

@@ -1 +0,0 @@
- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created.

View File

@@ -1 +0,0 @@
- Added context prewarming path for `InworldTTSService` to improve first audio latency

View File

@@ -1 +0,0 @@
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp VIVA SDK (requires `krisp_audio`).

View File

@@ -1 +0,0 @@
- Modeified `InworldTTSService` to close context at end of turn instead of relying on idle timeout

View File

@@ -1 +0,0 @@
- Added Gemini 3 support to the Gemini Live service.

View File

@@ -1 +0,0 @@
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has changed from `2.0` to `3.0` seconds.

View File

@@ -1 +0,0 @@
- Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use `system_instruction` to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion).

View File

@@ -1 +0,0 @@
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction.

View File

@@ -1 +0,0 @@
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring `settings.system_instruction`. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored.

View File

@@ -1 +0,0 @@
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior.

View File

@@ -1 +0,0 @@
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings.

View File

@@ -1 +0,0 @@
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD settings.

View File

@@ -1 +0,0 @@
- Added `domain` parameter to `AssemblyAISTTSettings` for specialized recognition modes such as Medical Mode (`domain="medical-v1"`).

View File

@@ -1 +0,0 @@
- Added `NovitaLLMService` for using Novita AI's LLM models via their OpenAI-compatible API.

View File

@@ -1 +0,0 @@
- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer resources are properly released when no longer needed. Custom `VADAnalyzer` subclasses can override `cleanup()` to free any held resources.

View File

@@ -1 +0,0 @@
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was deferred while waiting for the bot to finish responding and `turn_complete` never arrived. As a possible root-cause fix, `turn_complete` messages are now handled even if they lack `usage_metadata`. As a fallback, the deferred `EndFrame` now has a 30-second safety timeout.

View File

@@ -1 +0,0 @@
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit.

View File

@@ -1 +0,0 @@
- Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`, causing the `LLMAssistantAggregator` to finalize the context before the final sentence arrived.

View File

@@ -1 +0,0 @@
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer `system_instruction` from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set.

1
changelog/4385.added.md Normal file
View File

@@ -0,0 +1 @@
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.

View File

@@ -0,0 +1 @@
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`.

9
changelog/4389.fixed.md Normal file
View File

@@ -0,0 +1,9 @@
- Fixed AWS services failing silently on missing or invalid credentials.
`AWSNovaSonicLLMService`, `AWSBedrockLLMService`, `AWSPollyTTSService`,
and `AWSTranscribeSTTService` now push a fatal `ErrorFrame` with a
"check AWS credentials and region" hint on auth-class failures, so the
pipeline cancels promptly instead of continuing to run with no output.
- Fixed `AWSNovaSonicLLMService._disconnect` raising `InvalidStateError`
from `awscrt/aio/http.py` when cleanup ran on a stream from a failed
`invoke_model_with_bidirectional_stream` call. The error was masking
the real connect-time auth failure in the logs.

View File

@@ -5,7 +5,7 @@
{% for text, values in sections[section][category].items() %}
{{ text }}
(PR {{ values|join(', ') }})
(PR {{ values|join(', ') }})
{% endfor %}
{% endfor %}

View File

@@ -1,108 +1,60 @@
# Pipecat Documentation
# Pipecat API Documentation
This directory contains the source files for auto-generating Pipecat's server API reference documentation.
## Setup
1. Install documentation dependencies:
```bash
pip install -r requirements.txt
```
2. Make the build scripts executable:
```bash
chmod +x build-docs.sh rtd-test.py
```
This directory contains the source files for auto-generating Pipecat's API reference documentation.
## Building Documentation
From this directory, you can build the documentation in several ways:
### Local Build
From this directory:
```bash
# Using the build script (automatically opens docs when done)
./build-docs.sh
# Build docs (warnings shown but don't fail the build)
cd docs/api && uv run ./build-docs.sh
# Or directly with sphinx-build
sphinx-build -b html . _build/html -W --keep-going
# Build with strict mode (warnings treated as errors)
cd docs/api && uv run ./build-docs.sh --strict
```
### ReadTheDocs Test Build
The build script will:
To test the documentation build process exactly as it would run on ReadTheDocs:
```bash
./rtd-test.py
```
This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
Use this script to verify your documentation will build correctly on ReadTheDocs before pushing changes.
## Viewing Documentation
The built documentation will be available at `_build/html/index.html`. To open:
```bash
# On MacOS
open _build/html/index.html
# On Linux
xdg-open _build/html/index.html
# On Windows
start _build/html/index.html
```
1. Install documentation dependencies via `uv sync --group docs`
2. Clean previous build output
3. Run `sphinx-build` to generate HTML documentation
4. Open the result in your browser (macOS)
## Directory Structure
```
.
├── api/ # Auto-generated API documentation
├── _build/ # Built documentation
├── _static/ # Static files (images, css, etc.)
├── conf.py # Sphinx configuration
├── api/ # Auto-generated API documentation (created during build)
├── _build/ # Built documentation output
├── conf.py # Sphinx configuration (mock imports, extensions, etc.)
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
└── rtd-test.sh # ReadTheDocs test build script (uses pip, not uv)
```
## Notes
## How It Works
- Documentation is auto-generated from Python docstrings
- Service modules are automatically detected and included
- The build process matches our ReadTheDocs configuration
- Warnings are treated as errors (-W flag) to maintain consistency
- The --keep-going flag ensures all errors are reported
- Dependencies are split into multiple requirements files to handle version conflicts
- `conf.py` runs `sphinx-apidoc` during Sphinx's `setup()` phase to generate `.rst` files from Python source
- Sphinx autodoc imports each module to extract docstrings
- Modules with unavailable dependencies are listed in `autodoc_mock_imports` in `conf.py`
- Napoleon extension converts Google-style docstrings to reStructuredText
## Troubleshooting
If you encounter missing service modules:
**Module not appearing in docs:**
1. Verify the service is installed with its extras: `pip install pipecat-ai[service-name]`
2. Check the build logs for import errors
3. Ensure the service module is properly initialized in the package
4. Run `./rtd-test.py` to test in an isolated environment matching ReadTheDocs
1. Check the build output for `autodoc: failed to import` warnings
2. If the module has an unresolvable import dependency, add it to `autodoc_mock_imports` in `conf.py`
3. Verify the module is importable: `uv run python -c "import pipecat.module.name"`
For dependency conflicts:
**Duplicate object warnings:**
1. Check the requirements files for version specifications
2. Use `rtd-test.py` to verify dependency resolution
3. Consider adding service-specific requirements files if needed
These come from re-export modules or Sphinx discovering the same class through multiple import paths. Usually cosmetic.
For more information:
**Docstring formatting warnings:**
- [ReadTheDocs Configuration](.readthedocs.yaml)
- [Sphinx Documentation](https://www.sphinx-doc.org/)
Docstrings use reStructuredText, not Markdown. Common issues:
- Use `Example::` with indented code blocks, not `` ```python ``
- Ensure blank lines between directive content and subsequent sections
- Use `Parameters:` (not `Attributes:`) for dataclass field documentation to avoid duplicate entries

View File

@@ -1,8 +1,16 @@
#!/bin/bash
# Usage: ./build-docs.sh [--strict]
# --strict: Treat warnings as errors (default: warnings only)
SPHINX_OPTS=""
if [ "$1" = "--strict" ]; then
SPHINX_OPTS="-W --keep-going"
fi
# Build docs using uv
echo "Installing dependencies with uv..."
uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra mlx-whisper
# Check if sphinx-build is available
if ! uv run sphinx-build --version &> /dev/null; then
@@ -14,8 +22,7 @@ fi
rm -rf _build
echo "Building documentation..."
# Build docs matching ReadTheDocs configuration
uv run sphinx-build -b html -d _build/doctrees . _build/html -W --keep-going
uv run sphinx-build -b html -d _build/doctrees . _build/html $SPHINX_OPTS
if [ $? -eq 0 ]; then
echo "Documentation built successfully!"

View File

@@ -4,6 +4,19 @@ import sys
from datetime import datetime
from pathlib import Path
# Fix Pydantic v2 + Sphinx autodoc incompatibility: ConfigDict(extra="allow") fails
# during Sphinx's import because __pydantic_extra__ annotation on BaseModel resolves to
# `Dict[str, Any] | None` whose get_origin() is Union, not dict. Patch the check to
# accept Union-wrapped dict types (i.e., Optional[Dict[str, Any]]).
import pydantic._internal._generate_schema as _pydantic_gs
_ORIG_DICT_TYPES = _pydantic_gs.DICT_TYPES
# Expand the accepted types to include Union (Optional[Dict[str, Any]])
import types
import typing
_pydantic_gs.DICT_TYPES = [*_ORIG_DICT_TYPES, typing.Union, types.UnionType]
# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger("sphinx-build")
@@ -48,8 +61,6 @@ autodoc_default_options = {
# Mock imports for optional dependencies
autodoc_mock_imports = [
# Krisp - has build issues on some platforms
"pipecat_ai_krisp",
"krisp",
"krisp_audio",
# System-specific GUI libraries
"_tkinter",
@@ -78,16 +89,6 @@ autodoc_mock_imports = [
"einops",
"intel_extension_for_pytorch",
"huggingface_hub",
# riva dependencies
"riva",
"riva.client",
"riva.client.Auth",
"riva.client.ASRService",
"riva.client.StreamingRecognitionConfig",
"riva.client.RecognitionConfig",
"riva.client.AudioEncoding",
"riva.client.proto.riva_tts_pb2",
"riva.client.SpeechSynthesisService",
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
@@ -98,7 +99,6 @@ autodoc_mock_imports = [
"cartesia",
"camb",
"sarvamai",
"openpipe",
"openai.types.beta.realtime",
"langchain_core",
"langchain_core.messages",
@@ -110,6 +110,8 @@ autodoc_mock_imports = [
"fastapi.middleware",
"fastapi.responses",
"uvicorn",
# Deepgram dependencies
"deepgram",
]
# HTML output settings
@@ -136,6 +138,8 @@ def import_core_modules():
"pipecat.runner",
"pipecat.serializers",
"pipecat.transcriptions",
"pipecat.turns",
"pipecat.extensions",
"pipecat.utils",
]
@@ -180,7 +184,6 @@ def setup(app):
logger.info(f"Source directory: {source_dir}")
excludes = [
str(project_root / "src/pipecat/pipeline/to_be_updated"),
str(project_root / "src/pipecat/examples"),
str(project_root / "src/pipecat/tests"),
"**/test_*.py",

View File

@@ -32,4 +32,5 @@ Quick Links
Services <api/pipecat.services>
Transcriptions <api/pipecat.transcriptions>
Transports <api/pipecat.transports>
Turns <api/pipecat.turns>
Utils <api/pipecat.utils>

View File

@@ -1,5 +1,5 @@
# AI-COUSTICS
AICOUSTICS_LICENSE_KEY=...
AIC_LICENSE_KEY=...
# Anthropic
ANTHROPIC_API_KEY=...
@@ -80,9 +80,6 @@ GOOGLE_TEST_CREDENTIALS=...
# Gradium
GRAPDIUM_API_KEY=...
# Grok
GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
@@ -124,6 +121,9 @@ MINIMAX_GROUP_ID=...
# Mistral
MISTRAL_API_KEY=...
# Nebius
NEBIUS_API_KEY=...
# Neuphonic
NEUPHONIC_API_KEY=...
@@ -136,9 +136,6 @@ NVIDIA_API_KEY=...
# OpenAI
OPENAI_API_KEY=...
# OpenPipe
OPENPIPE_API_KEY=...
# OpenRouter
OPENROUTER_API_KEY=...
@@ -215,3 +212,12 @@ WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
# xAI / Grok
XAI_API_KEY=...
# PIPECAT_SCTP_MAX_CHUNK_SIZE controls the maximum SCTP DATA-chunk payload
# size (bytes) used by aiortc's data channel. The default is 1100.
# All the details here:
# https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc#pipecat_sctp_max_chunk_size
#PIPECAT_SCTP_MAX_CHUNK_SIZE=1100

View File

@@ -1,31 +1,150 @@
# Pipecat Examples
This directory contains examples to help you learn how to build with Pipecat.
This directory contains examples showing how to build voice and multimodal agents with Pipecat.
## Getting Started
## Setup
New to Pipecat? Start here:
1. Follow the [README](https://github.com/pipecat-ai/pipecat/blob/main/README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured.
- **[Quickstart](quickstart/)** - Get your first voice AI bot running in 5 minutes _(coming soon)_
- **[Client/Server Web](client-server-web/)** - Learn to build web applications with Pipecat's client SDKs _(coming soon)_
- **[Phone Bot with Twilio](phone-bot-twilio/)** - Connect your bot to a phone number _(coming soon)_
> **Run from root directory**: Make sure you are running the steps from the root directory.
## Foundational Examples
> **Using local audio?**: The `LocalAudioTransport` requires a system dependency for `portaudio`. Install the dependency to use the transport.
Single-file examples that introduce core Pipecat concepts one at a time. These examples:
2. Copy the [`env.example`](../env.example) file and add API keys for services you plan to use:
- Build on each other progressively
- Focus on specific features or integrations
- Are used for testing with every Pipecat release
```bash
cp env.example .env
# Edit .env with your API keys
```
See the **[Foundational Examples README](foundational/)** for the complete list.
3. Run any example:
## More Advanced Examples
```bash
uv run python getting-started/01-say-one-thing.py
```
Ready to explore complex use cases? Visit **[pipecat-examples](https://github.com/pipecat-ai/pipecat-examples)** for:
4. Open the web interface at http://localhost:7860/client/ and click "Connect"
- Production-ready applications
- Multi-platform client implementations
- Telephony integrations
- Multimodal and creative applications
- Deployment and monitoring examples
## Running examples with other transports
Most examples support running with other transports, like Twilio or Daily.
### Daily
You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables `DAILY_ROOM_URL` and `DAILY_API_KEY`. Alternatively, you can let the example create a room for you (still needs `DAILY_API_KEY` environment variable). Then, start any example with `-t daily`:
```bash
uv run getting-started/06-voice-agent.py -t daily
```
### Twilio
It is also possible to run the example through a Twilio phone number. You will need to setup a few things:
1. Install and run [ngrok](https://ngrok.com/download).
```bash
ngrok http 7860
```
2. Configure your Twilio phone number. One way is to setup a TwiML app and set the request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app.
Then, run the example with:
```bash
uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME
```
## Directory Structure
### [`getting-started/`](./getting-started/)
Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.
### [`voice/`](./voice/)
Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)
### [`function-calling/`](./function-calling/)
Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)
### [`transcription/`](./transcription/)
Speech-to-text examples with various STT providers.
### [`vision/`](./vision/)
Image description and vision capabilities with different multimodal LLMs.
### [`realtime/`](./realtime/)
Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).
### [`persistent-context/`](./persistent-context/)
Maintaining conversation context across sessions with different providers.
### [`context-summarization/`](./context-summarization/)
Summarizing conversation context to manage token limits.
### [`update-settings/`](./update-settings/)
Changing service settings at runtime, organized by service type:
- **[`stt/`](./update-settings/stt/)** — Speech-to-text settings
- **[`tts/`](./update-settings/tts/)** — Text-to-speech settings
- **[`llm/`](./update-settings/llm/)** — LLM settings
### [`turn-management/`](./turn-management/)
Turn detection, interruption handling, and user input management.
### [`thinking-and-mcp/`](./thinking-and-mcp/)
LLM thinking/reasoning modes and MCP (Model Context Protocol) tool server integration.
### [`transports/`](./transports/)
Transport layer examples (WebRTC, Daily, LiveKit).
### [`video-avatar/`](./video-avatar/)
Video avatar integrations (Tavus, HeyGen, Simli, LemonSlice).
### [`video-processing/`](./video-processing/)
Video processing, mirroring, GStreamer, and custom video tracks.
### [`audio/`](./audio/)
Audio recording, background sounds, and sound effects.
### [`observability/`](./observability/)
Pipeline monitoring: observers, heartbeats, and Sentry metrics.
### [`rag/`](./rag/)
Retrieval-augmented generation, grounding, and long-term memory (Mem0, Gemini).
### [`features/`](./features/)
Miscellaneous features: wake phrases, live translation, service switching, voice switching, and more.
## Advanced Usage
### Customizing Network Settings
```bash
uv run python <example-name> --host 0.0.0.0 --port 8080
```
### Troubleshooting
- **No audio/video**: Check browser permissions for microphone and camera
- **Connection errors**: Verify API keys in `.env` file
- **Port conflicts**: Use `--port` to change the port
For more examples, visit the [pipecat-examples repository](https://github.com/pipecat-ai/pipecat-examples).

View File

Before

Width:  |  Height:  |  Size: 63 KiB

After

Width:  |  Height:  |  Size: 63 KiB

View File

Before

Width:  |  Height:  |  Size: 1.1 MiB

After

Width:  |  Height:  |  Size: 1.1 MiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 870 KiB

After

Width:  |  Height:  |  Size: 870 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 871 KiB

After

Width:  |  Height:  |  Size: 871 KiB

View File

Before

Width:  |  Height:  |  Size: 872 KiB

After

Width:  |  Height:  |  Size: 872 KiB

View File

Before

Width:  |  Height:  |  Size: 868 KiB

After

Width:  |  Height:  |  Size: 868 KiB

View File

Before

Width:  |  Height:  |  Size: 33 KiB

After

Width:  |  Height:  |  Size: 33 KiB

View File

Before

Width:  |  Height:  |  Size: 30 KiB

After

Width:  |  Height:  |  Size: 30 KiB

View File

@@ -34,7 +34,7 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
OFFICE_SOUND_FILE = os.path.join(
os.path.dirname(__file__), "assets", "office-ambience-24000-mono.mp3"
os.path.dirname(__file__), "../assets", "office-ambience-24000-mono.mp3"
)
# We use lambdas to defer transport parameter creation until the transport
@@ -71,17 +71,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -108,17 +108,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True)
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"], audio_passthrough=True)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -102,17 +102,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -89,10 +89,10 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -109,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Primary LLM for conversation (could be any provider)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),
@@ -117,7 +117,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Dedicated cheap/fast LLM for summarization only
summarization_llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
),

View File

@@ -36,7 +36,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google import GoogleLLMService
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -77,17 +77,17 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You have access to tools to get the current weather - use them when relevant.",
),

View File

@@ -72,10 +72,10 @@ async def summarize_conversation(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -91,7 +91,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -77,17 +77,17 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You have access to tools to get the current weather - use them when relevant.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -58,24 +58,24 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
openai_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
groq_llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
api_key=os.environ["GROQ_API_KEY"],
settings=GroqLLMService.Settings(
system_instruction="You are a very helpful assistant. Your goal is to demonstrate your capabilities in detail in a creative and helpful way.",
),

View File

@@ -63,10 +63,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -74,7 +74,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Main LLM — drives the conversation. Its RTVI events reach the client.
main_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
@@ -83,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Evaluator LLM — silently grades the user's message in the background.
# Its RTVI events will be suppressed so the client is unaware of this branch.
evaluator_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
name="EvaluatorLLM",
settings=OpenAILLMService.Settings(
system_instruction="You are a silent quality evaluator. When given a user message, respond with a single JSON object: {'score': <1-5>, 'reason': '<brief reason>'}. Do not respond conversationally.",

View File

@@ -91,17 +91,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -56,10 +56,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramTTSService.Settings(
voice="aura-asteria-en",
),
@@ -68,7 +68,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = OpenAILLMService(
# To use OpenAI
# api_key=os.getenv("OPENAI_API_KEY"),
# api_key=os.environ["OPENAI_API_KEY"],
# Or, to use a local vLLM (or similar) api server
settings=OpenAILLMService.Settings(
model="meta-llama/Meta-Llama-3-8B-Instruct",

View File

@@ -55,17 +55,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="d4db5fb9-f44b-4bd1-85fa-192e0f0d75f9", # Spanish-speaking Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a live translation assistant. Your sole purpose is to translate English text into Spanish. When you receive English text from the user, immediately translate it into natural, fluent Spanish. Do not add explanations, commentary, or extra information—only provide the Spanish translation of the text you receive.",
),

View File

@@ -45,7 +45,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.frames.frames import LLMRunFrame, TTSUpdateSettingsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -54,6 +54,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.llm_text_processor import LLMTextProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -100,39 +101,43 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create pattern pair aggregator for voice switching
pattern_aggregator = PatternPairAggregator()
llm_text_aggregator = PatternPairAggregator()
# Add pattern for voice switching
pattern_aggregator.add_pattern(
llm_text_aggregator.add_pattern(
type="voice",
start_pattern="<voice>",
end_pattern="</voice>",
action=MatchAction.REMOVE, # Remove tags from final text
action=MatchAction.AGGREGATE,
)
# Register handler for voice switching
async def on_voice_tag(match: PatternMatch):
voice_name = match.text.strip().lower()
if voice_name in VOICE_IDS:
# First flush any existing audio to finish the current context
await tts.flush_audio()
# Then set the new voice
await tts.set_voice(VOICE_IDS[voice_name])
await llm_text_processor.push_frame(
TTSUpdateSettingsFrame(
delta=CartesiaTTSService.Settings(voice=VOICE_IDS[voice_name])
)
)
logger.info(f"Switched to {voice_name} voice")
else:
logger.warning(f"Unknown voice: {voice_name}")
pattern_aggregator.on_pattern_match("voice", on_voice_tag)
llm_text_aggregator.on_pattern_match("voice", on_voice_tag)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Process LLM text through the pattern aggregator before TTS
llm_text_processor = LLMTextProcessor(text_aggregator=llm_text_aggregator)
# Initialize TTS with narrator voice as default
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice=VOICE_IDS["narrator"],
),
text_aggregator=pattern_aggregator,
skip_aggregator_types=["voice"], # Skip voice tags in TTS speech
)
# System prompt for storytelling with voice switching
@@ -185,7 +190,7 @@ Remember: Use narrator voice for EVERYTHING except the actual quoted dialogue.""
# Initialize LLM
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),
@@ -204,7 +209,8 @@ Remember: Use narrator voice for EVERYTHING except the actual quoted dialogue.""
stt,
user_aggregator,
llm,
tts, # TTS with pattern aggregator
llm_text_processor,
tts,
transport.output(),
assistant_aggregator,
]

View File

@@ -94,19 +94,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
required=["location", "format"],
)
stt_cartesia = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
stt_deepgram = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt_cartesia = CartesiaSTTService(api_key=os.environ["CARTESIA_API_KEY"])
stt_deepgram = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Uses ServiceSwitcherStrategyManual by default
stt_switcher = ServiceSwitcher(services=[stt_cartesia, stt_deepgram])
tts_cartesia = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts_deepgram = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramTTSService.Settings(
voice="aura-2-helena-en",
),
@@ -117,11 +117,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
system_prompt = "You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way."
llm_openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(system_instruction=system_prompt),
)
llm_google = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(system_instruction=system_prompt),
)
# Uses ServiceSwitcherStrategyManual by default

View File

@@ -42,14 +42,14 @@ class SwitchLanguage(ParallelPipeline):
self._current_language = "English"
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
spanish_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="d4db5fb9-f44b-4bd1-85fa-192e0f0d75f9", # Spanish-speaking Lady
),
@@ -101,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramSTTService.Settings(
language="multi",
),
@@ -110,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SwitchLanguage()
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You can speak the following languages: 'English' and 'Spanish'.",
),

View File

@@ -42,21 +42,21 @@ class SwitchVoices(ParallelPipeline):
self._current_voice = "News Lady"
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
),
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
),
@@ -114,12 +114,12 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = SwitchVoices()
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative and helpful way. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
),

View File

@@ -60,13 +60,13 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Cartesia offers a `<spell></spell>` tags that we can use to ask the user
# to confirm the emails.
# (see https://docs.cartesia.ai/build-with-sonic/formatting-text-for-sonic/spelling-out-input-text)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# )
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You need to gather a valid email or emails from the user. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. If the user provides one or more email addresses confirm them with the user. Enclose all emails with <spell> tags, for example <spell>a@a.com</spell>.",
),

View File

@@ -52,22 +52,22 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
classifier_llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
classifier_llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"])
voicemail = VoicemailDetector(llm=classifier_llm)

View File

@@ -57,21 +57,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramSTTService.Settings(
keyterm=["pipecat"],
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -1,71 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.piper.tts import PiperHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"),
aiohttp_session=session,
sample_rate=24000,
)
task = PipelineTask(
Pipeline([tts, transport.output()]),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,72 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.rime.tts import RimeHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
aiohttp_session=session,
settings=RimeHttpTTSService.Settings(
voice="rex",
),
)
task = PipelineTask(
Pipeline([tts, transport.output()]),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,64 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.livekit import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
(url, token, room_name) = await configure()
transport = LiveKitTransport(
url=url,
token=token,
room_name=room_name,
params=LiveKitParams(audio_out_enabled=True),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when the
# participant joins.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant_id):
await asyncio.sleep(1)
await task.queue_frame(
TTSSpeakFrame(
"Hello there! How are you doing today? Would you like to talk about the weather?"
)
)
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,64 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.nvidia.tts import NvidiaTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_out_enabled=True),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
task = PipelineTask(
Pipeline([tts, transport.output()]),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([TTSSpeakFrame(f"Hello there!"), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,84 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.fal.image import FalImageGenService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
"webrtc": lambda: TransportParams(
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session
async with aiohttp.ClientSession() as session:
imagegen = FalImageGenService(
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
task = PipelineTask(
Pipeline([imagegen, transport.output()]),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frame(TextFrame("a cat in the style of picasso"))
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,155 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
ProcessingMetricsData,
TTFBMetricsData,
TTSUsageMetricsData,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
class MetricsLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, MetricsFrame):
for d in frame.data:
if isinstance(d, TTFBMetricsData):
print(f"!!! MetricsFrame: {frame}, ttfb: {d.value}")
elif isinstance(d, ProcessingMetricsData):
print(f"!!! MetricsFrame: {frame}, processing: {d.value}")
elif isinstance(d, LLMUsageMetricsData):
tokens = d.value
print(
f"!!! MetricsFrame: {frame}, tokens: {tokens.prompt_tokens}, characters: {tokens.completion_tokens}"
)
elif isinstance(d, TTSUsageMetricsData):
print(f"!!! MetricsFrame: {frame}, characters: {d.value}")
await self.push_frame(frame, direction)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
ml = MetricsLogger()
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
ml,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,133 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramSTTService.Settings(
vad_events=True,
utterance_end_ms="1000",
),
)
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,130 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
timestamp = int(time.time())
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
settings=OpenPipeLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,130 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispFilter(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispFilter(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispFilter(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-helios-en",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,121 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import Frame, TranscriptionFrame, UserStoppedSpeakingFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.audio.vad_processor import VADProcessor
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sambanova.stt import SambaNovaSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
STOP_SECS = 2.0
class TranscriptionLogger(FrameProcessor):
"""Measures transcription latency.
Uses the (intentionally) long STOP_SECS parameter to give the transcription time to finish,
then outputs the timing between when the VAD first classified audio input as not-speech and
the delivery of the last transcription frame.
"""
def __init__(self):
super().__init__()
self._last_transcription_time = time.time()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, UserStoppedSpeakingFrame):
logger.debug(
f"Transcription latency: {(STOP_SECS - (time.time() - self._last_transcription_time)):.2f}"
)
if isinstance(frame, TranscriptionFrame):
self._last_transcription_time = time.time()
# Push all frames through
await self.push_frame(frame, direction)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = SambaNovaSTTService(
settings=SambaNovaSTTService.Settings(
model="Whisper-Large-v3",
),
api_key=os.getenv("SAMBANOVA_API_KEY"),
)
tl = TranscriptionLogger()
vad_processor = VADProcessor(
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=STOP_SECS))
)
pipeline = Pipeline([transport.input(), vad_processor, stt, tl])
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,151 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.grok.llm import GrokLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
await params.result_callback({"conditions": "nice", "temperature": "75"})
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GrokLLMService(
api_key=os.getenv("GROK_API_KEY"),
settings=GrokLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,162 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.google.openai.llm import GoogleLLMOpenAIBetaService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
await params.result_callback({"conditions": "nice", "temperature": "75"})
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = GoogleLLMOpenAIBetaService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GoogleLLMOpenAIBetaService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
# You can aslo register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "developer",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = OpenAILLMContext(messages, tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,219 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TranscriptionMessage
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai_realtime_beta import (
InputAudioNoiseReduction,
InputAudioTranscription,
OpenAIRealtimeBetaLLMService,
SemanticTurnDetection,
SessionProperties,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
# Create tools schema
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
session_properties = SessionProperties(
input_audio_transcription=InputAudioTranscription(),
# Set openai TurnDetection parameters. Not setting this at all will turn it
# on by default
turn_detection=SemanticTurnDetection(),
# Or set to False to disable openai turn detection and use transport VAD
# turn_detection=False,
input_audio_noise_reduction=InputAudioNoiseReduction(type="near_field"),
# tools=tools,
instructions="""You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
You have access to the following tools:
- get_current_weather: Get the current weather for a given location.
- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
)
llm = OpenAIRealtimeBetaLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
session_properties=session_properties,
)
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
transcript = TranscriptProcessor()
# Create a standard OpenAI LLM context object using the normal messages format. The
# OpenAIRealtimeBetaLLMService will convert this internally to messages that the
# openai WebSocket API can understand.
context = OpenAILLMContext(
[{"role": "developer", "content": "Say hello!"}],
tools,
)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(),
llm, # LLM
transcript.user(), # Placed after the LLM, as LLM pushes TranscriptionFrames downstream
transport.output(), # Transport bot output
transcript.assistant(), # After the transcript output, to time with the audio output
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
for msg in frame.messages:
if isinstance(msg, TranscriptionMessage):
timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
line = f"{timestamp}{msg.role}: {msg.content}"
logger.info(f"Transcript: {line}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,214 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai_realtime_beta import (
AzureRealtimeBetaLLMService,
InputAudioTranscription,
SessionProperties,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
# Define weather function using standardized schema
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
# Create tools schema
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
session_properties = SessionProperties(
input_audio_transcription=InputAudioTranscription(model="whisper-1"),
# Set openai TurnDetection parameters. Not setting this at all will turn it
# on by default
# turn_detection=TurnDetection(silence_duration_ms=1000),
# Or set to False to disable openai turn detection and use transport VAD
# turn_detection=False,
# tools=tools,
instructions="""You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
-
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
You have access to the following tools:
- get_current_weather: Get the current weather for a given location.
- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
)
llm = AzureRealtimeBetaLLMService(
api_key=os.getenv("AZURE_REALTIME_API_KEY"),
base_url=os.getenv("AZURE_REALTIME_BASE_URL"),
session_properties=session_properties,
)
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
# Create a standard OpenAI LLM context object using the normal messages format. The
# OpenAIRealtimeBetaLLMService will convert this internally to messages that the
# openai WebSocket API can understand.
context = OpenAILLMContext(
[{"role": "developer", "content": "Say hello!"}],
# [{"role": "developer", "content": [{"type": "text", "text": "Say hello!"}]}],
# [
# {
# "role": "developer",
# "content": [
# {"type": "text", "text": "Say"},
# {"type": "text", "text": "yo what's up!"},
# ],
# }
# ],
tools,
)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(),
llm, # LLM
transport.output(), # Transport bot output
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,215 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai_realtime_beta import (
InputAudioNoiseReduction,
InputAudioTranscription,
OpenAIRealtimeBetaLLMService,
SemanticTurnDetection,
SessionProperties,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
# Create tools schema
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
session_properties = SessionProperties(
input_audio_transcription=InputAudioTranscription(),
modalities=["text"],
# Set openai TurnDetection parameters. Not setting this at all will turn it
# on by default
turn_detection=SemanticTurnDetection(),
# Or set to False to disable openai turn detection and use transport VAD
# turn_detection=False,
input_audio_noise_reduction=InputAudioNoiseReduction(type="near_field"),
# tools=tools,
instructions="""You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
You have access to the following tools:
- get_current_weather: Get the current weather for a given location.
- get_restaurant_recommendation: Get a restaurant recommendation for a given location.
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
)
llm = OpenAIRealtimeBetaLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
session_properties=session_properties,
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
# Create a standard OpenAI LLM context object using the normal messages format. The
# OpenAIRealtimeBetaLLMService will convert this internally to messages that the
# openai WebSocket API can understand.
context = OpenAILLMContext(
[{"role": "developer", "content": "Say hello!"}],
tools,
)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(),
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,267 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import glob
import json
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai_realtime_beta import (
InputAudioTranscription,
OpenAIRealtimeBetaLLMService,
SessionProperties,
TurnDetection,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
BASE_FILENAME = "/tmp/pipecat_conversation_"
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def get_saved_conversation_filenames(params: FunctionCallParams):
# Construct the full pattern including the BASE_FILENAME
full_pattern = f"{BASE_FILENAME}*.json"
# Use glob to find all matching files
matching_files = glob.glob(full_pattern)
logger.debug(f"matching files: {matching_files}")
await params.result_callback({"filenames": matching_files})
async def save_conversation(params: FunctionCallParams):
timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
filename = f"{BASE_FILENAME}{timestamp}.json"
logger.debug(
f"writing conversation to {filename}\n{json.dumps(params.context.messages, indent=4)}"
)
try:
with open(filename, "w") as file:
messages = params.context.get_messages_for_persistent_storage()
# remove the last message, which is the instruction we just gave to save the conversation
messages.pop()
json.dump(messages, file, indent=2)
await params.result_callback({"success": True})
except Exception as e:
await params.result_callback({"success": False, "error": str(e)})
async def load_conversation(params: FunctionCallParams):
async def _reset():
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
params.context.set_messages(json.load(file))
await params.llm.reset_conversation()
await params.llm._create_response()
except Exception as e:
await params.result_callback({"success": False, "error": str(e)})
asyncio.create_task(_reset())
tools = [
{
"type": "function",
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
{
"type": "function",
"name": "save_conversation",
"description": "Save the current conversation. Use this function to persist the current conversation to external storage.",
"parameters": {
"type": "object",
"properties": {},
"required": [],
},
},
{
"type": "function",
"name": "get_saved_conversation_filenames",
"description": "Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
"parameters": {
"type": "object",
"properties": {},
"required": [],
},
},
{
"type": "function",
"name": "load_conversation",
"description": "Load a conversation history. Use this function to load a conversation history into the current session.",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The filename of the conversation history to load.",
}
},
"required": ["filename"],
},
},
]
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
session_properties = SessionProperties(
input_audio_transcription=InputAudioTranscription(),
# Set openai TurnDetection parameters. Not setting this at all will turn
# it on by default
turn_detection=TurnDetection(silence_duration_ms=1000),
# Or set to False to disable openai turn detection and use transport VAD
# turn_detection=False,
# tools=tools,
instructions="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
-
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
Remember, your responses should be short. Just one or two sentences, usually.""",
)
llm = OpenAIRealtimeBetaLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
session_properties=session_properties,
)
# you can either register a single function for all function calls, or specific functions
# llm.register_function(None, fetch_weather_from_api)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("save_conversation", save_conversation)
llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
llm.register_function("load_conversation", load_conversation)
context = OpenAILLMContext([], tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(),
llm, # LLM
transport.output(), # Transport bot output
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,155 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService, GeminiModalities
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
SYSTEM_INSTRUCTION = f"""
"You are Gemini Chatbot, a friendly, helpful robot.
Your goal is to demonstrate your capabilities in a succinct way.
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points.
Respond to what the user said in a creative and helpful way. Keep your responses brief. One or two sentences at most.
"""
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# KNOWN ISSUE: If using GeminiLiveVertexLLMService, you cannot specify a
# modality other than AUDIO (at least not if using the service's default
# model, which is a native audio model:
# https://cloud.google.com/vertex-ai/generative-ai/docs/live-api/tools#native-audio).
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
settings=GeminiLiveLLMService.Settings(
system_instruction=SYSTEM_INSTRUCTION,
modalities=GeminiModalities.TEXT,
),
tools=[{"google_search": {}}, {"code_execution": {}}],
)
# Optionally, you can set the response modalities via a function
# llm.set_model_modalities(
# GeminiMultimodalModalities.TEXT
# )
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
messages = [
{
"role": "developer",
"content": 'Start by saying "Hello, I\'m Gemini".',
},
]
# Set up conversation context and management
# The context_aggregator will automatically collect conversation context
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
# Set stop_secs to something roughly similar to the internal setting
# of the Multimodal Live api, just to align events. This doesn't
# really matter because we can only use the Multimodal Live API's
# phrase endpointing, for now.
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5))
),
)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,143 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(
turn_analyzer=FalSmartTurnAnalyzer(
api_key=os.getenv("FAL_SMART_TURN_API_KEY"),
aiohttp_session=aiohttp.ClientSession(),
)
)
]
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,250 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import io
import json
import os
import re
import shutil
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from mcp import StdioServerParameters
from PIL import Image
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
Frame,
FunctionCallResultFrame,
LLMRunFrame,
URLImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.anthropic.llm import AnthropicLLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.mcp_service import MCPClient
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
class UrlToImageProcessor(FrameProcessor):
def __init__(self, aiohttp_session: aiohttp.ClientSession, **kwargs):
super().__init__(**kwargs)
self._aiohttp_session = aiohttp_session
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, FunctionCallResultFrame):
await self.push_frame(frame, direction)
image_url = self.extract_url(frame.result)
if image_url:
await self.run_image_process(image_url)
# sometimes we get multiple image urls- process 1 at a time
await asyncio.sleep(1)
else:
await self.push_frame(frame, direction)
def extract_url(self, text: str):
try:
data = json.loads(text)
if "artObject" in data:
return data["artObject"]["webImage"]["url"]
if "artworks" in data and len(data["artworks"]):
return data["artworks"][0]["webImage"]["url"]
except (json.JSONDecodeError, KeyError, TypeError):
pass
return None
async def run_image_process(self, image_url: str):
try:
logger.debug(f"handling image from url: '{image_url}'")
async with self._aiohttp_session.get(image_url) as response:
image_stream = io.BytesIO(await response.content.read())
image = Image.open(image_stream)
image = image.convert("RGB")
frame = URLImageRawFrame(
url=image_url, image=image.tobytes(), size=image.size, format="RGB"
)
await self.push_frame(frame)
except Exception as e:
error_msg = f"Error handling image url {image_url}: {str(e)}"
logger.error(error_msg)
# full list of tools available from rijksmuseum MCP:
# - get_artwork_details
# - get_artwork_image
# - get_user_sets
# - get_user_set_details
# - open_image_in_browser
# - get_artist_timeline
mcp_tools_filter = ["get_artwork_details", "get_artwork_image", "open_image_in_browser"]
def open_image_output_filter(output: str):
pattern = r"Successfully opened image in browser: "
text_to_print = re.sub(pattern, "", output)
print(f"🖼️ link to high resolution artwork: {text_to_print}")
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Create an HTTP session for API calls
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
system_prompt = f"""
You are a helpful LLM in a voice call.
Your goal is to demonstrate your capabilities in a succinct way.
You have access to tools to search the Rijksmuseum collection.
Offer, for example, to show a floral still life, use the `search_artwork` tool.
The tool may respond with a JSON object with an `artworks` array. Choose the art from that array.
Once the tool has responded, tell the user the title and use the `open_image_in_browser` tool.
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points.
Respond to what the user said in a creative and helpful way.
Don't overexplain what you are doing.
Just respond with short sentences when you are carrying out tool calls.
"""
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
settings=AnthropicLLMService.Settings(
system_instruction=system_prompt,
),
)
try:
mcp = MCPClient(
server_params=StdioServerParameters(
command=shutil.which("npx"),
# https://github.com/r-huijts/rijksmuseum-mcp
args=["-y", "mcp-server-rijksmuseum"],
env={"RIJKSMUSEUM_API_KEY": os.getenv("RIJKSMUSEUM_API_KEY")},
),
# Optional
tools_filter=mcp_tools_filter, # Optional
tools_output_filters={"open_image_in_browser": open_image_output_filter},
)
except Exception as e:
logger.error(f"error setting up mcp")
logger.exception("error trace:")
mcp_image = UrlToImageProcessor(aiohttp_session=session)
tools = {}
try:
tools = await mcp.register_tools(llm)
except Exception as e:
logger.error(f"error registering tools")
logger.exception("error trace:")
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User spoken responses
llm, # LLM
tts, # TTS
mcp_image, # URL image -> output
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses and tool context
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
if not os.getenv("RIJKSMUSEUM_API_KEY"):
logger.error(
f"Please set RIJKSMUSEUM_API_KEY environment variable for this example. See https://github.com/r-huijts/rijksmuseum-mcp and https://www.rijksmuseum.nl/en/register?redirectUrl=https://www.https://www.rijksmuseum.nl/en/rijksstudio/my/profile"
)
import sys
sys.exit(1)
from pipecat.runner.run import main
main()

View File

@@ -1,162 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from mcp.client.session_group import StreamableHttpParameters
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService
from pipecat.services.mcp_service import MCPClient
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
system_prompt = f"""
You are a helpful LLM in a voice call.
Your goal is to answer questions about the user's GitHub repositories and account.
You have access to a number of tools provided by Github. Use any and all tools to help users.
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points.
Don't overexplain what you are doing.
Just respond with short sentences when you are carrying out tool calls.
"""
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_prompt,
)
try:
# Github MCP docs: https://github.com/github/github-mcp-server
# Enable Github Copilot on your GitHub account. Free tier is ok. (https://github.com/settings/copilot)
# Generate a personal access token. It must be a Fine-grained token, classic tokens are not supported. (https://github.com/settings/personal-access-tokens)
# Set permissions you want to use (eg. "all repositories", "profile: read/write", etc)
mcp = MCPClient(
server_params=StreamableHttpParameters(
url="https://api.githubcopilot.com/mcp/",
headers={"Authorization": f"Bearer {os.getenv('GITHUB_PERSONAL_ACCESS_TOKEN')}"},
)
)
except Exception as e:
logger.error(f"error setting up mcp")
logger.exception("error trace:")
tools = {}
try:
tools = await mcp.register_tools(llm)
except Exception as e:
logger.error(f"error registering tools")
logger.exception("error trace:")
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User spoken responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses and tool context
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
if not os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN"):
logger.error(
f"Please set GITHUB_PERSONAL_ACCESS_TOKEN environment variable for this example."
)
import sys
sys.exit(1)
from pipecat.runner.run import main
main()

View File

@@ -1,163 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from mcp.client.session_group import StreamableHttpParameters
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
from pipecat.services.mcp_service import MCPClient
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
try:
# Github MCP docs: https://github.com/github/github-mcp-server
# Enable Github Copilot on your GitHub account. Free tier is ok. (https://github.com/settings/copilot)
# Generate a personal access token. It must be a Fine-grained token, classic tokens are not supported. (https://github.com/settings/personal-access-tokens)
# Set permissions you want to use (eg. "all repositories", "profile: read/write", etc)
mcp = MCPClient(
server_params=StreamableHttpParameters(
url="https://api.githubcopilot.com/mcp/",
headers={"Authorization": f"Bearer {os.getenv('GITHUB_PERSONAL_ACCESS_TOKEN')}"},
)
)
except Exception as e:
logger.error(f"error setting up mcp")
logger.exception("error trace:")
tools = {}
try:
tools = await mcp.get_tools_schema()
except Exception as e:
logger.error(f"error registering tools")
logger.exception("error trace:")
system = f"""
You are a helpful LLM in a voice call.
Your goal is to answer questions about the user's GitHub repositories and account.
You have access to a number of tools provided by Github. Use any and all tools to help users.
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points.
Don't overexplain what you are doing.
Just respond with short sentences when you are carrying out tool calls.
"""
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system,
tools=tools,
)
await mcp.register_tools_schema(tools, llm)
context = LLMContext([{"role": "developer", "content": "Please introduce yourself."}])
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
user_aggregator, # User spoken responses
llm, # LLM
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses and tool context
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
if not os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN"):
logger.error(
f"Please set GITHUB_PERSONAL_ACCESS_TOKEN environment variable for this example."
)
import sys
sys.exit(1)
from pipecat.runner.run import main
main()

Some files were not shown because too many files have changed in this diff Show More