Compare commits

...

196 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
9697abe559 Merge pull request #4381 from pipecat-ai/changelog-1.1.0
Release 1.1.0 - Changelog Update
2026-04-27 14:02:20 -07:00
aconchillo
cb0335c82a Update changelog for version 1.1.0 2026-04-27 13:59:17 -07:00
Aleix Conchillo Flaqué
f560614af9 Merge pull request #4379 from pipecat-ai/aleix/bump-daily-python-0.28
chore(daily): bump daily-python to ~=0.28.0
2026-04-27 13:46:00 -07:00
Aleix Conchillo Flaqué
d7a196a3f4 docs(changelog): add entry for daily-python 0.28.0 bump 2026-04-27 13:35:14 -07:00
Aleix Conchillo Flaqué
644e106c03 chore(daily): bump daily-python to ~=0.28.0 2026-04-27 13:35:14 -07:00
Mark Backman
70f83b4a75 Merge pull request #4360 from pipecat-ai/mb/soniox-tts
Add Soniox real-time TTS service
2026-04-27 16:06:24 -04:00
Mark Backman
35ed37c539 chore: add changelog fragment for PR #4360 2026-04-27 16:04:02 -04:00
Mark Backman
58a038ddb2 Add Soniox real-time TTS service
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
2026-04-27 16:04:02 -04:00
Aleix Conchillo Flaqué
de3c1d6e8b Merge pull request #4370 from pipecat-ai/aleix/daily-screen-video-destination
feat(daily): support screenVideo destination and configurable camera send settings
2026-04-27 11:36:22 -07:00
Aleix Conchillo Flaqué
0a9878998f docs(changelog): add entries for camera_out_send_settings and video_out_bitrate deprecation 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
8459c01af8 feat(daily): add camera_out_send_settings and deprecate video_out_bitrate
Replaces the hardcoded camera publishing send settings in
DailyTransport with a new DailyParams.camera_out_send_settings dict that
applications can pass through verbatim to the Daily client. This makes
the encoding/codec/bitrate configuration user-controllable instead of
being driven solely by the generic TransportParams fields.

As a consequence, TransportParams.video_out_bitrate is deprecated for
the Daily transport (now configured via camera_out_send_settings) and
its default is changed to None.
2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
baaabf7d73 docs(changelog): add entry for screenVideo destination support 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
4735b74776 feat(daily): support screenVideo destination for video output
Adds a dedicated screen video track alongside the existing camera track
so applications can publish to Daily's built-in "screenVideo" destination
via video_out_destinations. The track is created at join time and wired
into the client settings (inputs and publishing) when "screenVideo" is
configured; write_video_frame routes frames to the appropriate track
based on the frame's transport_destination.
2026-04-27 11:28:59 -07:00
kompfner
0109aea04c Merge pull request #4377 from pipecat-ai/pk/add-example-for-tool-resources
Add example demonstrating usage of `tool_resources`
2026-04-27 13:02:39 -04:00
kompfner
ce1311f6ba Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle
Fail missing tool calls cleanly
2026-04-27 11:54:43 -04:00
Paul Kompfner
2520243d9d style: apply ruff format 2026-04-27 11:48:27 -04:00
borislav
8869e25142 fix: compare bound method by equality, not identity
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.

Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
2026-04-27 17:34:31 +02:00
borislav
822392b0d4 fix: re-resolve registry item at execution time
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
2026-04-27 17:22:30 +02:00
Paul Kompfner
124863175a Add example demonstrating usage of tool_resources 2026-04-27 11:20:53 -04:00
kompfner
17a5e78fb4 Merge pull request #4376 from pipecat-ai/pk/add-openai-responses-to-readme
Add OpenAI responses to readme
2026-04-27 11:02:39 -04:00
kompfner
bc29bdb95e Merge pull request #4371 from Stoic-Angel/feat-global-context
Add a global context for tool calls: tool_resources
2026-04-27 10:55:03 -04:00
Paul Kompfner
005fe33b25 Update docs URLs in README to reflect new docs site structure and avoid redirects 2026-04-27 10:22:49 -04:00
Paul Kompfner
24154474c9 Add OpenAI Responses to the README's list of LLM services 2026-04-27 10:19:13 -04:00
kompfner
86effc4d10 Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation
feat(nova-sonic): add proactive session continuation for conversation…
2026-04-27 09:36:48 -04:00
Mark Backman
58e50882d8 Merge pull request #4374 from pipecat-ai/mb/fix-daily-runner-room-props
Expire runner-created Daily rooms after 4h
2026-04-27 09:07:31 -04:00
Mark Backman
ef183d0c96 Add changelog for #4374 2026-04-27 09:00:17 -04:00
Mark Backman
f078df7805 runner: expire Daily rooms after 4h to mirror Pipecat Cloud session limit
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.

Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
2026-04-27 09:00:17 -04:00
Mark Backman
815cd44c2a Merge pull request #4372 from pipecat-ai/mb/relax-frames-proto-5x
Relax protobuf pin to support both 5.x and 6.x runtimes
2026-04-27 08:58:23 -04:00
Garegin Harutyunyan
e5941926be Krisp tt demo tool (#4335)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* TT demo tool

* Some improvements for demo scripts, audio recordin, etc.

* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.

* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.

* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.

* Add audio resampling functionality and update demo scripts for improved audio processing

- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.

* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic

- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.

* Refactor audio processing scripts for improved readability and consistency

- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.

* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity

- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.

* more format fixes.

* removed demo scripts.

* reverted wrongly removed file.

* Corrected the IP integration logic.

* style fix.

* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy

- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.

* FIxed formatting

---------

Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
2026-04-27 08:14:00 -04:00
Mark Backman
6266c026a6 Merge pull request #4362 from ai-coustics/ai-coustics/aic-sdk-py-v2.2.0-update
Update aic-sdk to v2.2.0
2026-04-25 06:51:41 -04:00
Gökmen Görgen
e25dccfc6b update aic-sdk to ~=2.2.0 and rename AICOUSTICS_LICENSE_KEY to AIC_LICENSE_KEY. 2026-04-25 10:13:06 +02:00
Gökmen Görgen
3bbfc42854 remove adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 10:05:47 +02:00
Gökmen Görgen
3b2127f912 rename environment variables and references from AICOUSTICS to AIC. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
ea12b10742 rename mcp-aic-adaptive.py to mcp-aicoustics-adaptive.py. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
a2fbed86cf add adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
f75f361629 bump aic-sdk to 2.2.0 and update AICFilter with model_id and enhancement_level changes. 2026-04-25 09:51:23 +02:00
Mark Backman
4c153e5d3c Add changelog for #4372 2026-04-24 21:20:46 -04:00
Mark Backman
4088992d97 Relax protobuf pin to support both 5.x and 6.x runtimes
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.

Changes:

- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
  Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
  and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
  install `pipecat[dev,nvidia]` without resolver conflict. Comment in
  `frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
  compensates for nvidia-riva-client's missing `protobuf` install
  requirement (upstream packaging gap, see
  https://github.com/nvidia-riva/python-clients/issues/172). When that
  issue is resolved, the explicit protobuf entry in the `nvidia` extra
  can be removed.

Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
2026-04-24 21:15:32 -04:00
Osman Ipek
f1b16a672a feat(nova-sonic): add proactive session continuation for conversations >8min
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.

Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
  session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
  buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s

Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
  (independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
  to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
  optional stream/prompt_name params, eliminating duplicate SC methods

Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
2026-04-24 14:55:55 -07:00
Aayush Jain
65b15a8528 add changelog 2026-04-25 02:23:25 +05:30
Aayush Jain
108e32eb72 Add a global context for tool calls - tool_resources, as a parameter to PipelineTask and FrameProcessorSetup 2026-04-25 02:12:40 +05:30
Filipi da Silva Fuchter
38a02271c5 Merge pull request #4368 from pipecat-ai/filipi/stt_service
Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.
2026-04-24 14:31:36 -03:00
filipi87
2ce203aeb8 Renaming the method to _maybe_reconnect_on_user_stopped_speaking. 2026-04-24 13:08:32 -03:00
filipi87
b30df95f13 Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService. 2026-04-24 13:00:38 -03:00
kompfner
6be8deee2a Merge pull request #4361 from pipecat-ai/pk/pyright-fixes
Some pyright fixes
2026-04-24 11:58:28 -04:00
Paul Kompfner
c113cacd59 refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.

Introduce two module-private cast helpers in open_ai_adapter.py:

- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)

Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.

No behavior change, no pyright error delta.
2026-04-24 10:10:03 -04:00
Paul Kompfner
d0495eeef6 fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.

Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
2026-04-23 21:08:47 -04:00
Paul Kompfner
c3eb69165c fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.

Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):

- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
  temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
  max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.

Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.

Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
2026-04-23 18:32:46 -04:00
Paul Kompfner
0302f6d05c chore(pyright): drop newly-clean files from ignore list
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).

- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
2026-04-23 18:18:00 -04:00
Paul Kompfner
b9ff333654 fix(types): admit None on settings fields that accept it as a default
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.

Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.

Fields touched across 11 files:

- services/settings.py: TTSSettings.voice (base class; covers
  asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
  neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
  style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
  max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
  split_sentences, enable_diarization, speaker_*, max_speakers,
  prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.

Clears 94 pyright errors (1035 -> 941).
2026-04-23 18:18:00 -04:00
Paul Kompfner
92610944af chore(pyright): drop newly-clean files from ignore list
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).

- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
2026-04-23 17:44:17 -04:00
Paul Kompfner
6a337f1bc6 fix(types): assert_given at store-mode settings read sites
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).

Two idioms used:

- Inline wrap for single uses:
    func(assert_given(self._settings.enable_prompt_caching), ...)

- Extract-and-reuse when the same value is used multiple times:
    thinking = assert_given(self._settings.thinking)
    if thinking:
        params["thinking"] = thinking.model_dump(...)

43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter
ef7fa07bf7 Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU
2026-04-23 17:49:18 -03:00
filipi87
ce1506792e Linking to the docs instead of full explanation. 2026-04-23 17:46:54 -03:00
Paul Kompfner
70f3d32734 feat(types): add assert_given for narrowing store-mode settings reads
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.

assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.

    resolved_model = assert_given(self._settings.model)  # str | None
2026-04-23 16:40:07 -04:00
Paul Kompfner
356618b448 fix(types): use is_given at call sites pyright flagged
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.

- adapters/services/anthropic_adapter.py: narrow converted.system for
  _resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
  is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
  is_given (aliased as openai_is_given alongside the existing
  settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
2026-04-23 16:15:07 -04:00
Paul Kompfner
1624d7a474 feat(types): add is_given TypeGuard helpers for NotGiven sentinels
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.

Each helper is co-located with the NotGiven flavor it guards:

- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
  LLMContext's NotGiven. Treat LLMContext's re-exported types
  (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
  LLMContext's own — independent definitions that happen to coincide
  with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
  NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
  NotGiven.
2026-04-23 15:33:43 -04:00
Paul Kompfner
092b1dcb0f fix(types): widen TLLMInvocationParams bound to Mapping[str, Any]
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
2026-04-23 14:35:59 -04:00
Mark Backman
b90ea9bf6a Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file
More pyright fixes
2026-04-23 14:14:36 -04:00
kompfner
05c97804d5 Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename
chore: rebind Gemini Live reconnect changelog fragment to PR #4355
2026-04-23 14:10:36 -04:00
Paul Kompfner
7a8357a569 chore: rebind Gemini Live reconnect changelog fragment to PR #4355
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
2026-04-23 12:00:56 -04:00
filipi87
44756de15a Adding changelog for the SmallWebRTC fix. 2026-04-23 12:19:56 -03:00
filipi87
94304ec74e Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU. 2026-04-23 12:18:33 -03:00
kompfner
a3fe34f4a2 Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect
Re-seed Gemini Live context on reconnect without session resumption
2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy
21f6c2afa5 Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269)
* Update NVIDIA STT services for Nemotron Speech defaults and config parity

* Add changelog entry for PR #4269

* initialize boosted LM settings defaults in streaming STT

* Align NVIDIA STT language handling with other STT services

* add finalised flag to Nvidia stt final transcripts, remove processing latency logs

* Changing interim transcription logging to tracing.

---------

Co-authored-by: sathwika <geereddysath@nvidia.com>
Co-authored-by: filipi87 <filipi87@gmail.com>
2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter
4d14251f4a Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces
feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up
2026-04-23 08:49:26 -03:00
Paul Kompfner
1421c4ba22 fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.

On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
2026-04-22 15:58:54 -04:00
filipi87
6b1d8d9fa5 Fixing merge conflicts. 2026-04-22 15:22:32 -03:00
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
filipi87
bba7ca80e3 Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting. 2026-04-22 15:20:37 -03:00
filipi87
79250f1fe0 Making includes_inter_frame_spaces optional for word-timestamp. 2026-04-22 14:20:30 -03:00
Mark Backman
4f6e76e6fd Add changelog entries for #4352 2026-04-22 12:23:33 -04:00
Mark Backman
b0962861c8 Acknowledge Tkinter's GC-reference idiom with a scoped type ignore
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).

Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
2026-04-22 12:19:16 -04:00
Mark Backman
ec7c35fe98 Move Mistral message fixups into MistralLLMAdapter
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.

Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.

No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
2026-04-22 12:17:46 -04:00
Mark Backman
10b86b4bbe Coerce inspect.getdoc() None to empty string before parsing
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.

Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
2026-04-22 12:01:00 -04:00
Mark Backman
8ec56092c0 Remove duplicate ResponseCreated type 2026-04-22 11:58:15 -04:00
Mark Backman
0c3c5e5c7d Widen ToolsSchema.standard_tools to Sequence for covariance
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.

Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.

Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.

This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
2026-04-22 11:54:20 -04:00
Mark Backman
b64ed3f9e2 Narrow settings.model at service boundaries, not via truthiness
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.

- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
  narrowing with an empty-string fallback. Equivalent runtime behavior,
  honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
  before handing it to `_validate_model(str)`. A non-string model at
  this point is a configuration bug; raise `ValueError` so the error
  is clear and survives `python -O` (unlike an assert).
2026-04-22 11:52:20 -04:00
Mark Backman
5872006d6b Encode lazy-init invariants at the right site, not at read sites
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:

- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
  session timeout as an argument. The task-creation site already
  guards on truthiness, so the call can pass the non-None value
  directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
  of asserting. Asserts are stripped under `python -O`; a real raise
  keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
  initialized stream. Callers use the return value and never touch
  the Optional `_soxr_stream` attribute, so narrowing stays inside
  the init method where the invariant is established.
2026-04-22 11:45:18 -04:00
Mark Backman
457eb7aa92 Mark abstract image/vision generators as real async generators
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.

Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
2026-04-22 11:19:23 -04:00
Mark Backman
14cd476b20 Drop pyright ignores for services fixed by run_stt/run_tts widening
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
2026-04-22 11:09:27 -04:00
Mark Backman
3b0affe5b4 Guard run_stt WebSocket sends with try/except
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.

Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
2026-04-22 11:03:41 -04:00
Mark Backman
08fe9157cc Widen run_stt/run_tts return type to AsyncGenerator[Frame | None, None]
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.

Pure annotation change; no runtime behavior difference.
2026-04-22 11:01:50 -04:00
Mark Backman
3f3d3c9203 Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy
Split user-turn stop timeout into independent speech and STT timers
2026-04-22 10:23:03 -04:00
Mark Backman
6b6896a543 Merge pull request #4350 from pipecat-ai/mb/pyright-precise-ignore-list
Expand pyright coverage to full src/pipecat with per-file ignores
2026-04-22 09:56:59 -04:00
Filipi da Silva Fuchter
7858813871 Merge pull request #4270 from sathwikareddy02/nvidia-llm-update
Enhance NVIDIA LLM reasoning tokens handling and allow keyless local …
2026-04-22 10:47:54 -03:00
Mark Backman
7bba74ebd6 Expand pyright coverage to full src/pipecat with per-file ignores
Previously, six modules (adapters, audio, processors, serializers,
services, transports) were ignored wholesale. Many files in those
modules already pass type checking, but we had no way to protect them
from regressions or make the remaining work visible.

Switch the include list to src/pipecat so any new module is checked by
default, and replace directory-level ignores with the 140 specific
files that still fail. This puts 189 previously-untyped files under
type checking immediately and turns the remaining work into a concrete,
shrinking TODO list.
2026-04-22 09:45:31 -04:00
Mark Backman
f425e946eb Merge pull request #4349 from pipecat-ai/mb/serializer-pyright
Fix type errors in serializers and add to pyright checked set
2026-04-22 09:43:31 -04:00
Filipi da Silva Fuchter
75bd1b5b9b Merge pull request #4323 from dakshdua/daksh/allow-noninitial-whitespace-chunks
fix: when aggregating by tokens, allow inter-token whitespace once non-whitespace has been sent
2026-04-22 10:27:08 -03:00
filipi87
d953c201bd Adding changelog entry to the fix. 2026-04-22 10:24:21 -03:00
Mark Backman
263cad41f0 Add changelog for #4349 2026-04-21 18:14:15 -04:00
Mark Backman
df9642eb5a Fix type errors in serializers and add to pyright checked set
Moves src/pipecat/serializers into pyright's include list. Narrows
self._params to each subclass's InputParams in exotel, vonage, plivo,
twilio, genesys, and telnyx. In protobuf.py, renames the reassigned
frame local to avoid clobbering its Frame type and silences two dynamic
attribute accesses on the generated frames_pb2 module.

Also aligns telnyx and plivo hangup validation with twilio: if
auto_hang_up=True (the default) but required credentials are missing,
__init__ now raises ValueError instead of silently logging a warning
at call-end time. Previously a misconfigured serializer would construct
fine and fail to hang up the call later, leaving a phantom billable
session.
2026-04-21 18:12:54 -04:00
Mark Backman
dcbe86d0fc Unify fallback timeout into the user-speech timer
Collapse the separate fallback timer into the existing user_speech_timeout
timer, restarted when a transcript arrives without a VAD stop. stt_timeout
has no meaning on the fallback path, so the stt wait is marked done
immediately. This drops the _fallback_timeout_task / _fallback_expired
bookkeeping and the branched trigger condition.
2026-04-21 17:33:12 -04:00
Mark Backman
7fc79511dd Merge pull request #4348 from pipecat-ai/mb/pyright-scripts-docs
Fix type errors in scripts and add to pyright checked set
2026-04-21 16:56:49 -04:00
Mark Backman
4d9dc64af8 Install all extras in format workflow for pyright
CI was running `uv sync --group dev` without extras. Adds daily and
tracing to extras.
2026-04-21 16:53:57 -04:00
Mark Backman
21f5cfe21a Fix type errors in utils and add to pyright checked set 2026-04-21 16:47:12 -04:00
Mark Backman
308044808d Rename to _user_speech_wait_done 2026-04-21 16:39:30 -04:00
Mark Backman
c244a950eb Add src/pipecat/tests to include list, alphabetize list 2026-04-21 16:24:53 -04:00
Mark Backman
847bd8af4b Remove src/pipecat/sync which doesn't exist 2026-04-21 16:21:46 -04:00
Mark Backman
10e58d6e42 Fix type errors in scripts and add to pyright checked set 2026-04-21 16:17:49 -04:00
Mark Backman
609a0a14e7 Merge pull request #4341 from pipecat-ai/mb/xai-tts
Add XAITTSService for xAI streaming WebSocket TTS
2026-04-21 15:52:37 -04:00
Mark Backman
84891de04d Add voice/xai-http.py to release evals 2026-04-21 15:49:59 -04:00
Mark Backman
9a49517609 Add changelog entry for #4341 2026-04-21 15:48:27 -04:00
Mark Backman
d8f5c0be71 Add XAITTSService for xAI streaming WebSocket TTS
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.

Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.

Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.

Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
2026-04-21 15:48:26 -04:00
Mark Backman
93393ea91c Merge pull request #4338 from pipecat-ai/mb/fix-examples-types
Include examples in type checking
2026-04-21 15:47:10 -04:00
Mark Backman
58a17c7b1b Include examples in type checking
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:

- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
  return type is `str` rather than `str | None`, and misconfiguration
  fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
  before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
  methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
  before use.
2026-04-21 15:43:31 -04:00
Mark Backman
103ced1eaa Merge pull request #4347 from pipecat-ai/mb/deepgram-stt-keepalive-unbound 2026-04-21 15:15:55 -04:00
Mark Backman
ac9bea27aa Merge pull request #4340 from pipecat-ai/mb/xai-stt
Add xAI streaming STT service
2026-04-21 14:52:38 -04:00
Mark Backman
648094da26 Add changelog for #4347 2026-04-21 14:51:30 -04:00
Mark Backman
29d604f608 Fix UnboundLocalError in Deepgram STT connection handler
If the WebSocket handshake is cancelled or fails before `keepalive_task`
is assigned (e.g. an STTUpdateSettingsFrame triggers a reconnect during
initial connect), the `finally` block tried to cancel an unbound local.

Initialize `keepalive_task = None` before the try and guard the cancel.
2026-04-21 14:48:55 -04:00
Mark Backman
b838bd906b Add changelog for #4340 2026-04-21 13:45:34 -04:00
Mark Backman
c091232f2f Add xAI streaming STT service
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.

- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
  `language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
  the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
  example using the new service.
2026-04-21 13:45:34 -04:00
Mark Backman
8e247f395b Merge pull request #4344 from pipecat-ai/mb/11labs-normalized-alignment 2026-04-21 13:41:04 -04:00
Mark Backman
b0e3b69b35 Merge pull request #4342 from pipecat-ai/mb/docs-workflow-label 2026-04-21 13:40:38 -04:00
kompfner
9213b22852 Merge pull request #4346 from pipecat-ai/pk/use-ExternalUserTurnStrategies-in-deepgram-flux-example
Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example
2026-04-21 13:20:27 -04:00
Paul Kompfner
81571beb1b Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example 2026-04-21 10:51:59 -04:00
Mark Backman
a07bee2318 Add changelog for #4344 2026-04-21 09:12:15 -04:00
Mark Backman
a0f79b4700 Use ElevenLabs normalized_alignment so word timestamps match spoken audio 2026-04-21 09:09:19 -04:00
Mark Backman
2c3f051a1f Merge pull request #4325 from radhikagpt1208/fix/sentry-metrics-drop-metricsframe
Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
2026-04-21 07:57:42 -04:00
Mark Backman
c1b3a9f4b5 Add pipecat label to update-docs CI workflow 2026-04-20 20:40:54 -04:00
Mark Backman
9ded7bab1b Merge pull request #4334 from dhruvladia-sarvam/feat/sarvam-stt-vad-parameters-exposed
Sarvam - VAD parameters configurable on saaras:v3
2026-04-20 16:04:23 -04:00
dhruvladia-sarvam
34fb303c44 changelog descriptions 2026-04-21 00:29:38 +05:30
dhruvladia-sarvam
2aec2467cb Deprecated InputParams fix and default model change to saaras:v3 2026-04-21 00:19:49 +05:30
Mark Backman
9d8eefd2a2 Add changelog for #4337 2026-04-20 12:02:20 -04:00
Mark Backman
b59c4775da Split user-turn stop timeout into independent speech and STT timers
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:

- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
  transcript since STT has signaled it has nothing more to send.

The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
2026-04-20 11:55:09 -04:00
Harshita Jain
03bd667f95 Fix Smallest AI TTS WebSocket endpoint URL and remove unsupported flush (#4320)
* Fix Smallest AI TTS WebSocket endpoint URL to match API documentation

Update base URL from waves-api.smallest.ai to api.smallest.ai and
fix path prefix from /api/v1/ to /waves/v1/ per the v4.0.0 docs.

* Update keepalive using silent space message instead of unsupported flush
2026-04-20 11:15:25 -04:00
Mark Backman
e8c3f73968 Merge pull request #4336 from pipecat-ai/mb/pyright-ignore-modules
Silence pyright diagnostics for unchecked modules in IDE
2026-04-20 09:15:02 -04:00
sathwika
91e5b1ad9a Handle NVIDIA LLM reasoning content in stream wrapper 2026-04-20 14:17:39 +05:30
dhruvladia-sarvam
f2a19cb1a3 Initial commit for vad parameters on saaras:v3 2026-04-20 13:52:48 +05:30
sathwika
74becffe55 add changelog 2026-04-20 11:47:20 +05:30
sathwika
995f897b80 Enhance NVIDIA LLM reasoning tokens handling and allow keyless local NIM endpoints 2026-04-20 11:47:16 +05:30
Mark Backman
74d11dc0aa Silence pyright diagnostics for unchecked modules in IDE
Pylance analyzes open files even when they're outside the `include`
set, producing noise in the editor. Adding these paths to `ignore`
suppresses diagnostics without affecting import resolution.
2026-04-19 09:19:15 -04:00
Ian Lee
b435ddfa44 feat(tts): add includes_inter_frame_spaces flag to word-timestamp API
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".

Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.

`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.

Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.

Made-with: Cursor
2026-04-18 12:03:32 -07:00
Mark Backman
6d3dfd8f64 Merge pull request #4329 from pipecat-ai/mb/resolve-krisp-warning
Silence krisp_audio import logs on auto-import
2026-04-17 18:23:01 -04:00
Mark Backman
ce9c214eec Silence krisp_audio import logs on auto-import
The two logger.error lines in krisp_instance.py fired at module-load time
whenever anything transitively imported it (e.g. pipecat.turns.user_start
pulling in krisp_viva_ip_user_turn_start_strategy), producing noisy output
for users who never asked for Krisp. Drop the log calls and raise a more
informative ImportError that names the affected classes so direct
importers still get clear guidance.
2026-04-17 18:18:33 -04:00
Mark Backman
8c8b76e9d2 Merge pull request #4326 from pipecat-ai/mb/flux-multilingual 2026-04-17 15:59:11 -04:00
denxxs
7b3141ba19 chore: update changelog fragment to PR #4328 2026-04-18 01:15:27 +05:30
denxxs
928ade993b fix: re-seed Gemini Live context on reconnect without session resumption 2026-04-18 01:14:05 +05:30
Mark Backman
42a6fc703c Address review feedback
- Fall back to Language.EN in _primary_detected_language when model is
  flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
  now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
  frames still carry Language.EN.
2026-04-17 15:38:14 -04:00
Mark Backman
c5c18335fd Merge pull request #4324 from pipecat-ai/mb/pyright-initial
Add pyright type checking: step 1
2026-04-17 14:04:35 -04:00
Mark Backman
3159503c7f Merge pull request #4327 from pipecat-ai/filipi/pyright_service_switcher
Fixing typecheck for service switcher.
2026-04-17 13:59:40 -04:00
filipi87
0340e25e9f Fixing typecheck for service switcher. 2026-04-17 12:44:57 -03:00
Mark Backman
af861b7975 Add changelog for #4326 2026-04-17 10:31:37 -04:00
Mark Backman
6bb4e8295f Add multilingual support for Deepgram Flux STT
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
2026-04-17 10:30:45 -04:00
Mark Backman
f5f92dea63 Add changelog entries and restore multi-line WhatsApp error log
Add changelog entries for the pyright introduction and the
LiveKitRunnerArguments.token signature tightening. Restore the
indented multi-line format for the WhatsApp missing-env error,
now listing only the vars that are actually missing.
2026-04-17 09:39:55 -04:00
Mark Backman
cb1463f9f1 Fix type errors in runner and add to pyright checked set
Make required parameters non-optional: LiveKitRunnerArguments.token,
_create_telephony_transport args. Use os.environ[] instead of
os.getenv() for required WhatsApp env vars. Guard spec/loader None
in module loading. Tighten sip_caller_phone guard in daily.py.
2026-04-17 09:39:55 -04:00
Garegin Harutyunyan
4c19f5584c VIVA SDK TT v3 support (#4252)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* Typo fix in voice-krisp-viva example to use KrispVivaFilter class

* style fix.

* test run error fixes.

* some test related changes.

* Fixed tests

* Stule fixes.
2026-04-17 07:53:41 -04:00
Radhika Gupta
80fecab4de Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
SentryMetrics.stop_ttfb_metrics and stop_processing_metrics called the
base FrameProcessorMetrics implementation but discarded its return
value (implicit `return None`). FrameProcessorMetrics.stop_ttfb_metrics
/ stop_processing_metrics build and return a MetricsFrame, which
FrameProcessor.stop_ttfb_metrics / stop_processing_metrics then pushes
downstream so observers (e.g. UserBotLatencyObserver,
MetricsLogObserver) can see TTFB / processing metrics.

Because SentryMetrics returned None, the FrameProcessor never pushed
the MetricsFrame, so any pipeline using metrics=SentryMetrics() on STT
/ LLM / TTS services silently lost all downstream TTFB and processing
MetricsFrames. The metrics were still calculated and logged
internally, and Sentry transactions still finished correctly, but
observers never saw them.

Forward the MetricsFrame returned by the base class so FrameProcessor
can push it into the pipeline.
2026-04-17 14:48:36 +05:30
Mark Backman
ab91047300 Fix type errors in pipeline and add to pyright checked set
Use Sequence[FrameProcessor] instead of list[FrameProcessor] in Pipeline,
ServiceSwitcher, and ServiceSwitcherStrategy parameters to accept subtype
lists. Add cast() in LLMSwitcher for narrowed return types. Guard against
None in task_observer._send_to_proxy and replace hasattr with truthiness
check in task._cleanup.
2026-04-16 21:47:11 -04:00
Mark Backman
3127cc6161 Fix type errors in turns and add to pyright checked set
Widen base strategy process_frame return types to ProcessFrameResult |
None to match actual behavior (None treated as CONTINUE). Give
UserTurnCompletionLLMServiceMixin a FrameProcessor base class so pyright
can see create_task, cancel_task, process_frame, and push_frame.
2026-04-16 21:33:43 -04:00
Mark Backman
36319ecbf0 Replace system role message
In UserTurnCompletionMixin, use a developer role message for
LLM messages following an incomplete turn
2026-04-16 21:26:08 -04:00
Mark Backman
c6a1837844 Fix type errors in extensions and add to pyright checked set
Tighten LLMMessagesAppendFrame and LLMMessagesUpdateFrame message fields
from list[dict] to list[LLMContextMessage] to match actual usage. Add
type annotations on inline message lists in IVR navigator and voicemail
detector.
2026-04-16 21:22:46 -04:00
Daksh Dua
31127abd9a Allow inter-token whitespace once non-whitespace has been sent
In token-streaming mode, _push_tts_frames previously stripped only
leading newlines and dropped any pure-whitespace frame. That silently
discarded meaningful inter-token whitespace (e.g. a standalone "\n"
token between "hello" and "world"), losing prosody cues and any
downstream sentence-boundary semantics.

Track whether a non-whitespace character has been sent in the current
context. While the flag is false, strip all leading whitespace; once
true, let whitespace tokens flow through. Reset the flag on
LLMFullResponseEndFrame/EndFrame and on interruption, and save/restore
it around TTSSpeakFrame since each utterance is its own context.

Sentence-aggregation mode preserves the existing behavior.
2026-04-16 15:51:35 -07:00
Mark Backman
aa355e3d32 Fix type errors in observers and add to pyright checked set
Group three co-assigned fields (_start_frame_id, _start_frame_arrival_ns,
_start_wall_clock) into a single _StartFrameInfo dataclass. This makes
the "always set together" invariant structural rather than implicit, and
fixes the incorrect str | None annotation on _start_frame_id (Frame.id
is int).
2026-04-16 18:25:10 -04:00
Mark Backman
9bd51cd88c Add incremental pyright type checking with CI enforcement
Add pyrightconfig.json with basic type checking for zero-error modules
(clocks, metrics, transcriptions, frames) and enforce via CI. The
include list will expand as modules are fixed.
2026-04-16 18:04:42 -04:00
Aleix Conchillo Flaqué
fc1c3b48dc Merge pull request #4322 from pipecat-ai/aleix/readme-subagents
Add Pipecat Subagents to the ecosystem section in README
2026-04-16 10:38:56 -07:00
Aleix Conchillo Flaqué
4278a37ebc Merge pull request #4321 from pipecat-ai/aleix/fix-redundant-type-checks
Remove redundant duplicate type checks in direct_function.py
2026-04-16 10:38:45 -07:00
Mark Backman
7e045257e8 Merge pull request #4314 from pipecat-ai/mb/prudent-system-instruction-logging
Log system instruction once at composition time, not on every LLM call
2026-04-16 13:18:33 -04:00
dyi1
b8a1f45d4c Improve HeyGen LiveAvatar plugin reliability and performance (#4312)
* Improve HeyGen LiveAvatar plugin reliability and performance

- Add WebSocket ready gate: wait for session.state_updated connected
  event before sending commands (prevents silently dropped messages)
- Add keep-alive mechanism: send session.keep_alive every 2.5 min to
  prevent 5-minute inactivity timeout
- Optimize audio chunking: 600ms first chunk for faster initial
  response, 1s subsequent chunks for efficient streaming
- Fix audio buffer flush: send remaining buffered audio on utterance
  end instead of discarding it
- Fix WS state cleanup: properly reset connected/ready state when
  WebSocket drops unexpectedly
- Add livekit_config passthrough in LiveAvatar session token creation
- Replace stray print() with logger.debug()

* Fix HeyGenOutputTransport.start() signature and use 400ms first chunk

- Update transport.py to match new client.start() signature (no
  audio_chunk_size param)
- Change first chunk size from 600ms to 400ms per feedback

* Fix transport audio resampling and client.start() error propagation

- Add audio resampling in HeyGenOutputTransport.write_audio_frame() to
  ensure audio is always 24kHz before sending to HeyGen (was sending
  at pipeline sample rate, causing garbled audio)
- Raise exception on WS ready timeout instead of silently returning,
  preventing transport from appearing ready when WS connection failed

* Fix session readiness gate to work with LITE mode

LITE mode does not send session.state_updated WS events. Instead,
use a dual-signal _session_ready event that fires on either:
- WS session.state_updated connected (FULL mode)
- LiveKit participant connected (LITE mode)

Also reorder start() to connect both WS and LiveKit before waiting,
since the WS events may depend on LiveKit being connected.

Verified with live sandbox session - all tests pass.

* Simplify session readiness to use only WS ready gate

Remove _session_ready dual-signal and use only _ws_ready, which fires
on the session.state_updated connected WS event. Increase timeout to
30s. LiveKit is connected before waiting so the WS event can arrive.

* Reduce WS ready gate timeout back to 10s

* Remove WS ready gate (session.state_updated not reliably received)

The session.state_updated connected event is not reliably received
via the websockets library. Remove the gate for now and assume the
session is ready after WS + LiveKit connect. Keep-alive, chunking,
buffer flush, state cleanup, and other improvements remain.
2026-04-16 12:58:14 -04:00
Aleix Conchillo Flaqué
8ec85f981d Add Pipecat Subagents to the ecosystem section in README 2026-04-16 09:57:23 -07:00
Aleix Conchillo Flaqué
2f52905d32 Remove redundant duplicate type checks in direct_function.py
After the typing modernization, `dict or dict` and `list or list`
were left behind where `Dict`/`List` had been replaced by `dict`/`list`.
2026-04-16 09:51:21 -07:00
Aleix Conchillo Flaqué
f86cf98c6d Merge pull request #4319 from pipecat-ai/aleix/modernize-typing
Modernize Python typing across the codebase
2026-04-16 09:43:17 -07:00
Aleix Conchillo Flaqué
84fcba772d Replace percent format with f-string in daily/utils.py 2026-04-16 09:30:19 -07:00
Aleix Conchillo Flaqué
b3bb6fdaa5 Modernize Python typing across the codebase
Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311):

- Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type`
  with their built-in equivalents (`list`, `dict`, `tuple`, etc.)
- Replace `typing.Optional[X]` with `X | None`
- Replace `typing.Union[X, Y]` with `X | Y`
- Move `Mapping`, `Sequence`, `Callable`, `Awaitable`,
  `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`,
  `AsyncGenerator` imports from `typing` to `collections.abc`
- Remove now-unused `typing` imports
- Add `from __future__ import annotations` to 5 files that use
  forward-reference strings in `X | "Y"` annotations
2026-04-16 09:28:23 -07:00
Aleix Conchillo Flaqué
12b8af3d89 pyproject: use UP ruff linting option 2026-04-16 09:26:12 -07:00
Aleix Conchillo Flaqué
1c4ffb7845 Merge pull request #4313 from pipecat-ai/ac/daily-send-dtmf
Add send_dtmf() to DailyTransport
2026-04-16 08:57:48 -07:00
Aleix Conchillo Flaqué
8d4feede23 Split #4313 changelog into one entry per file 2026-04-16 08:55:03 -07:00
Aleix Conchillo Flaqué
b11a3bc43f Add method field to Daily DTMF output frames
Lets callers specify Daily's DTMF delivery method (e.g. "rfc2833"
or "info") alongside `session_id` and `digit_duration_ms`. Forwarded
to Daily's `send_dtmf` as `method`.
2026-04-16 08:55:03 -07:00
Mark Backman
8dce66933f Merge pull request #4315 from pipecat-ai/mb/update-tavus-transport-on-connected
Update Tavus transport example
2026-04-16 09:20:52 -04:00
Mark Backman
7291026695 Update Tavus transport example
Show how to use on_connected event handler to obtain
Daily room URL
2026-04-15 23:04:31 -04:00
Mark Backman
686e250db1 Add changelog for #4314 2026-04-15 21:03:13 -04:00
Mark Backman
e8d6f611cd Log system_instruction once at composition time 2026-04-15 21:02:20 -04:00
Aleix Conchillo Flaqué
f094ce80fb Add to_string helper on output DTMF frames
Mirrors the existing `from_string` classmethod and lets callers
turn a frame's `buttons` list back into a dial string like `"123#"`.
`__str__` and the Daily transport's native DTMF path reuse it.
2026-04-15 15:14:47 -07:00
Aleix Conchillo Flaqué
9fbe1bf2a3 Document button as a convenience shortcut, not a deprecation
The single-key `button` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` is kept as a first-class ergonomic shortcut
for the common single-keypress case, equivalent to
`buttons=[button]`. `buttons` takes precedence when both are set.
2026-04-15 15:09:01 -07:00
Aleix Conchillo Flaqué
d8b0e78bc8 Represent DTMF sequences as list[KeypadEntry] via buttons field
Replaces the string-based `tones` field with a type-safe
`buttons: list[KeypadEntry]` on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame`, matching the existing singular `button`
field on `InputDTMFFrame`. A `from_string` classmethod builds the
list from a dial string like `"123#"` (invalid characters raise
ValueError from the `KeypadEntry` constructor).

The base output audio fallback now iterates `frame.buttons`
directly, LiveKit sends `frame.buttons[0].value`, and the Daily
transport joins the button values into the single string Daily's
`send_dtmf` expects.
2026-04-15 15:05:45 -07:00
Aleix Conchillo Flaqué
675b7df408 Add tones to OutputDTMFFrame and simplify DTMF frame hierarchy
Introduces a new `tones` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` for sending multi-digit DTMF sequences and
deprecates the existing single-key `button` field. When only `button`
is set, it is used as a single-character `tones` string for backward
compatibility.

`DTMFFrame` is kept as an empty marker class so both input and output
DTMF frames can still be identified via isinstance. `InputDTMFFrame`
keeps its required `button` field (single keypress semantics).

The Daily-specific `DailyOutputDTMFFrame` and
`DailyOutputDTMFUrgentFrame` frames no longer need to override
`button` and simply add `session_id` and `digit_duration_ms`, which
are forwarded to Daily's `send_dtmf` as `sessionId` and
`digitDurationMs`.

The base output audio fallback now iterates `tones` and generates a
tone per character; LiveKit's native DTMF path sends `tones[0]` since
its API is single-tone.
2026-04-15 14:48:02 -07:00
Aleix Conchillo Flaqué
30f39d7395 Add DailyOutputDTMFFrame and DailyOutputDTMFUrgentFrame
Introduces Daily-specific DTMF output frames that carry explicit
`tones`, `session_id` and `digit_duration_ms` fields, forwarded to
Daily's `send_dtmf` as `tones`, `sessionId` and `digitDurationMs`.
The inherited `button` and `transport_destination` fields are
ignored for these frames in the Daily transport.
2026-04-15 14:20:08 -07:00
Aleix Conchillo Flaqué
fe2ef9c712 Add changelog for #4313 2026-04-15 10:43:28 -07:00
Aleix Conchillo Flaqué
173cf39aee Add send_dtmf() to DailyTransport
Exposes the Daily call client's DTMF sending capability so
applications can send tones during a call (e.g. IVR navigation).
2026-04-15 10:43:28 -07:00
Filipi da Silva Fuchter
ac43a70d36 Merge pull request #4311 from pipecat-ai/filipi/reconnect_websocket
New approach to reconnect STT services after updating settings.
2026-04-15 14:39:24 -03:00
filipi87
8e4fd10e0f Removing CancelledError handling from DeepgramSTTService. 2026-04-15 14:36:17 -03:00
filipi87
aeab417cd1 Changelogs for the STT service reconnect improvements. 2026-04-15 13:23:25 -03:00
filipi87
d263ad3c34 Refactoring DeepgramSTT to use request to reconnect. 2026-04-15 13:21:12 -03:00
filipi87
f3c454dc54 Refactoring CartesiaSTT to use request to reconnect. 2026-04-15 13:19:36 -03:00
filipi87
fc63790657 New approach to reconnect STT services after updating settings. 2026-04-15 11:01:58 -03:00
Mark Backman
9ffcccdd84 Merge pull request #4253 from pipecat-ai/mb/mistral-stt
Add Mistral Voxtral Realtime STT service
2026-04-15 09:00:27 -04:00
Mark Backman
503782c8b2 Merge pull request #4304 from pipecat-ai/mb/tavus-deps
Add missing daily-python dependency for tavus extra
2026-04-14 18:14:19 -04:00
Mark Backman
b834a893fe Add changelog for #4304 2026-04-14 17:52:29 -04:00
Mark Backman
ba023248d9 Add missing daily-python dependency for tavus extra 2026-04-14 17:48:37 -04:00
borislav
14cf783647 chore: add changelog for #4301 2026-04-14 22:41:09 +02:00
borislav
86e726107f fix: fail missing tool calls cleanly 2026-04-14 22:40:45 +02:00
Mark Backman
215b2dc7f3 Add voice-mistral to evals 2026-04-07 15:37:07 -04:00
Mark Backman
874e2878be Update README with Mistral services 2026-04-07 15:36:22 -04:00
Mark Backman
9131fa5c12 Add changelog for PR #4253 2026-04-07 15:32:38 -04:00
Mark Backman
68a3070ad4 Add Mistral Voxtral Realtime STT service 2026-04-07 15:26:56 -04:00
Mark Backman
a7bf9f538c Clean up comments in MistralTTSService 2026-04-07 12:56:10 -04:00
596 changed files with 12349 additions and 5861 deletions

View File

@@ -32,7 +32,7 @@ jobs:
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
run: uv sync --group dev --extra daily --extra tracing
- name: Ruff formatter
id: ruff-format
@@ -41,3 +41,7 @@ jobs:
- name: Ruff linter (all rules)
id: ruff-check
run: uv run ruff check
- name: Type check (pyright)
id: pyright
run: uv run pyright

View File

@@ -114,6 +114,7 @@ jobs:
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
--repo pipecat-ai/docs \
--label auto-docs \
--label pipecat \
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
--body "$(cat <<'BODY'
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).

View File

@@ -7,6 +7,344 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [1.1.0] - 2026-04-27
### Added
- Added `MistralSTTService` for real-time speech-to-text using Mistral's
Voxtral Realtime API (`voxtral-mini-transcribe-realtime-2602`). Supports
streaming transcription with interim results, automatic language detection,
and VAD-driven utterance lifecycle.
(PR [#4253](https://github.com/pipecat-ai/pipecat/pull/4253))
- Added `buttons` field to `OutputDTMFFrame` and `OutputDTMFUrgentFrame` for
sending multi-key DTMF sequences as a `list[KeypadEntry]`. Use
`OutputDTMFFrame.from_string("123#")` (or the equivalent on
`OutputDTMFUrgentFrame`) to build one from a dial string, and `to_string()`
to convert back.
(PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))
- Added `DailyTransport.send_dtmf()` to expose the Daily call client's DTMF
sending capability, enabling applications to send tones during a call (e.g.
IVR navigation).
(PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))
- Added `DailyOutputDTMFFrame` and `DailyOutputDTMFUrgentFrame` frames. In
addition to the inherited `buttons`, they accept `session_id`,
`digit_duration_ms` and `method`, which are forwarded to Daily's `send_dtmf`
as `sessionId`, `digitDurationMs` and `method`.
(PR [#4313](https://github.com/pipecat-ai/pipecat/pull/4313))
- Added incremental `pyright` type checking. A `pyrightconfig.json` at the repo
root uses `typeCheckingMode: "basic"` with an explicit `include` list of
modules that pass cleanly (`clocks`, `metrics`, `transcriptions`, `frames`,
`observers`, `extensions`, `turns`, `pipeline`, `runner`). Remaining modules
will be added in subsequent PRs. CI enforces the checked set via `uv run
pyright` in the format workflow.
(PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))
- Added multilingual support to `DeepgramFluxSTTService` via a new
`language_hints: list[Language]` setting. Works with Deepgram's new
`flux-general-multi` model to bias transcription across English, Spanish,
French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.
Omit the hints to use auto-detection, or pass a subset to bias toward
expected languages. Hints can be updated mid-stream via
`STTUpdateSettingsFrame` (sent as a Deepgram `Configure` control message, no
reconnect) to support detect-then-lock flows.
(PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))
- Added fine-grained server-side VAD tuning options to
`SarvamSTTService.Settings` for the `saaras:v3` model, including speech
thresholds, frame-count controls, pre-speech padding, interruption
sensitivity, and initial-frame skipping.
(PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))
- Added `XAISTTService` for real-time speech-to-text using xAI's voice STT
WebSocket API (`wss://api.x.ai/v1/stt`). Streams raw audio (PCM, µ-law, or
A-law) and emits interim and final transcription frames driven by the
server's `is_final` / `speech_final` flags. Settings expose
`interim_results`, `endpointing`, `language`, `multichannel`, `channels`, and
`diarize`. Requires the `xai` optional extra (`pip install
"pipecat-ai[xai]"`).
(PR [#4340](https://github.com/pipecat-ai/pipecat/pull/4340))
- Added `XAITTSService` for streaming text-to-speech using xAI's WebSocket TTS
endpoint (`wss://api.x.ai/v1/tts`). Streams `text.delta` chunks up and base64
`audio.delta` chunks down on the same connection so audio begins flowing
before the full utterance finishes synthesizing; complements the batch-HTTP
`XAIHttpTTSService`. Defaults to raw PCM output so `TTSAudioRawFrame` needs
no decoding. The `xai` optional extra now pulls in
`pipecat-ai[websockets-base]`.
(PR [#4341](https://github.com/pipecat-ai/pipecat/pull/4341))
- Added `SonioxTTSService`, a real-time WebSocket TTS service that streams text
in and audio out over a persistent connection. Install with `pip install
"pipecat-ai[soniox]"`.
(PR [#4360](https://github.com/pipecat-ai/pipecat/pull/4360))
- Added support for Daily's built-in `screenVideo` destination in
`DailyTransport`. When `"screenVideo"` is included in
`video_out_destinations` transport parameter, a dedicated screen video track
is created at join time and frames with `transport_destination="screenVideo"`
are routed to it.
```python
params = DailyParams(
video_out_enabled=True,
video_out_is_live=True,
video_out_width=1280,
video_out_height=720,
video_out_destinations=["screenVideo"]
)
...
frame = OutputImageRawFrame(...)
frame.transport_destination = "screenVideo"
```
(PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))
- Added `camera_out_send_settings` to `DailyParams`. This dict is passed
verbatim to the Daily client's camera publishing settings, allowing
applications to fully control encoding, codec, bitrate, and framerate.
```python
params = DailyParams(
camera_out_send_settings={
"maxQuality": "high",
"encodings": {
"high": {"maxBitrate": 2_000_000, "maxFramerate": 30}
},
},
)
```
(PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))
- Added `tool_resources` to `PipelineTask` and `FunctionCallParams`. Pass an
application-defined object (DB handles, clients, state, etc.) to
`PipelineTask(..., tool_resources=...)` and access it from any tool handler
via `params.tool_resources`. Passed by reference; the caller retains their
handle and can read mutations after the task finishes. Resolves #4256.
(PR [#4371](https://github.com/pipecat-ai/pipecat/pull/4371))
### Changed
- Updated NVIDIA STT services to align with Nemotron Speech defaults and
configuration: `api_key` is now optional for local deployments, additional
recognition settings are available (including alternatives, word offsets, and
diarization), and streaming/segmented docs now reflect Nemotron Speech APIs.
- NVIDIA streaming STT now sets `TranscriptionFrame.finalized=True` when the
provider marks a result as final, and preserves `language` on both
`TranscriptionFrame` and `InterimTranscriptionFrame`.
(PR [#4269](https://github.com/pipecat-ai/pipecat/pull/4269))
- Updated `NvidiaLLMService` to emit model reasoning as `LLMThought*Frame`s
(from both `reasoning_content` and `<think>...</think>` output), avoid mixing
reasoning text into normal assistant content, and allow keyless local NIM
endpoints while warning when the cloud endpoint is used without an API key.
(PR [#4270](https://github.com/pipecat-ai/pipecat/pull/4270))
- STT services now reconnect safely when settings change: reconnection is
deferred until the current user turn ends (i.e., until
`UserStoppedSpeakingFrame` is received) rather than interrupting an active
speech session. Audio frames received while the reconnect is in progress are
buffered and replayed once the new connection is ready. `CartesiaSTTService`
and `DeepgramSTTService` both use this new behavior.
(PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))
- Reduced debug log noise for LLM services. The system instruction is now
logged once when composed (e.g. when turn completion is enabled) instead of
on every LLM call. Per-call logs now show only the conversation messages,
consistent across Google, Anthropic, AWS, and OpenAI services.
(PR [#4314](https://github.com/pipecat-ai/pipecat/pull/4314))
- `LiveKitRunnerArguments.token` is now a required `str` (previously `str |
None` with a default of `None`). LiveKit requires a token to join a room, so
the type now reflects reality. This only affects custom runners that
construct `LiveKitRunnerArguments` directly; code consuming the argument from
the standard runner is unaffected.
(PR [#4324](https://github.com/pipecat-ai/pipecat/pull/4324))
- `TranscriptionFrame.language` and `InterimTranscriptionFrame.language`
emitted by `DeepgramFluxSTTService` now reflect the language Deepgram
detected for each turn (read from the `languages` field on Flux's `TurnInfo`
event). On `flux-general-multi` this gives per-turn accuracy for downstream
consumers (e.g. TTS voice selection). `flux-general-en` continues to emit
`Language.EN`.
(PR [#4326](https://github.com/pipecat-ai/pipecat/pull/4326))
- Added `includes_inter_frame_spaces` parameter to
`TTSService.add_word_timestamps` and `_add_word_timestamps` (default `None`).
When `True`, downstream consumers will not inject additional spaces between
tokens; `None` leaves each frame's own default unchanged.
- `InworldTTSService` now passes `includes_inter_frame_spaces=True` when
reporting word timestamps, since Inworld tokens already include inter-word
spacing.
(PR [#4330](https://github.com/pipecat-ai/pipecat/pull/4330))
- `SarvamSTTService` now uses `saaras:v3` as its default model instead of
`saarika:v2.5`. Applications that relied on the previous default should set
`settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly.
(PR [#4334](https://github.com/pipecat-ai/pipecat/pull/4334))
- `SpeechTimeoutUserTurnStopStrategy` now waits only `user_speech_timeout` when
a transcript arrives without a VAD stop event, rather than
`max(ttfs_p99_latency, user_speech_timeout)`. If you had `ttfs_p99_latency >
user_speech_timeout`, turn detection in that path is slightly faster than
before.
(PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))
- If you use an STT service that emits finalized transcripts (Speechmatics,
Soniox, Deepgram Flux, AssemblyAI) with `SpeechTimeoutUserTurnStopStrategy`,
user turns now end as soon as `user_speech_timeout` elapses after VAD stop.
Previously the strategy also waited for the STT P99 latency
(`ttfs_p99_latency`) even when the transcript was already marked final.
`user_speech_timeout` is still honored as a floor — STT finalization never
shortens it.
(PR [#4337](https://github.com/pipecat-ai/pipecat/pull/4337))
- ⚠️ `PlivoFrameSerializer` and `TelnyxFrameSerializer` now raise `ValueError`
at construction when `auto_hang_up=True` (the default) but required
credentials are missing, matching `TwilioFrameSerializer`. Previously they
constructed successfully and the hangup failed silently at call-end, leaving
phantom billable sessions on the provider. If you relied on the old silent
behavior, pass `auto_hang_up=False` explicitly or provide the credentials.
The specific fields checked are `call_id`/`auth_id`/`auth_token` for Plivo
and `call_control_id`/`api_key` for Telnyx.
(PR [#4349](https://github.com/pipecat-ai/pipecat/pull/4349))
- `ToolsSchema(standard_tools=...)` now accepts any `Sequence[FunctionSchema |
DirectFunction]` rather than requiring an exact `list` of the union. Callers
can pass a narrower `list[FunctionSchema]` (or any other `Sequence`) without
the type checker complaining about list invariance.
(PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
- Updated `aic-sdk` dependency to `~=2.2.0`. The `AIC_LICENSE_KEY` environment
variable replaces the previous `AICOUSTICS_LICENSE_KEY`.
(PR [#4362](https://github.com/pipecat-ai/pipecat/pull/4362))
- Loosened the `protobuf` dependency to `>=5.29.6,<7`, so projects pinned to
protobuf 5.x can install `pipecat-ai` again. The previous `>=6.31.1,<7` pin
(introduced in 1.0.8 alongside the `nvidia-riva-client 2.25.1` upgrade)
silently blocked any environment whose dependency graph already constrained
protobuf to the 5.x line. The bundled `frames_pb2.py` is now compiled with
protoc 5.x so it imports cleanly on both 5.x and 6.x runtimes.
Installing the `nvidia` extra still pulls protobuf 6.x: `nvidia-riva-client
2.25.1` ships gencode that requires a 6.x runtime, so `pipecat-ai[nvidia]`
now declares `protobuf>=6.31.1,<7` explicitly to cover an upstream packaging
gap (https://github.com/nvidia-riva/python-clients/issues/172).
(PR [#4372](https://github.com/pipecat-ai/pipecat/pull/4372))
- Daily rooms created by the development runner (`pipecat.runner.run`) now
expire after 4 hours with `eject_at_room_exp=True`, mirroring Pipecat Cloud's
max session limit. Previously, runner-created rooms inherited a 2-hour
expiration on the default code paths and had no expiration at all when
callers posted partial `dailyRoomProperties` (e.g. `{"start_video_off":
true}`) to `/start`, causing rooms to accumulate indefinitely. Explicit `exp`
and `eject_at_room_exp` values in `dailyRoomProperties` are still respected.
(PR [#4374](https://github.com/pipecat-ai/pipecat/pull/4374))
- Updated `daily-python` dependency to `~=0.28.0`.
(PR [#4379](https://github.com/pipecat-ai/pipecat/pull/4379))
### Deprecated
- Deprecated `TransportParams.video_out_bitrate` for the Daily transport. Use
`DailyParams.camera_out_send_settings` instead to configure camera publishing
encodings (bitrate, framerate, codec, etc.).
(PR [#4370](https://github.com/pipecat-ai/pipecat/pull/4370))
### Fixed
- Fixed missing tool handlers so unregistered tool calls fail with a normal
final tool result instead of leaving tool-call state hanging.
(PR [#4301](https://github.com/pipecat-ai/pipecat/pull/4301))
- Fixed `pipecat-ai[tavus]` not installing the required `daily-python`
dependency. Installing the `tavus` extra now correctly pulls in
`pipecat-ai[daily]`.
(PR [#4304](https://github.com/pipecat-ai/pipecat/pull/4304))
- Fixed audio loss and potential errors when STT settings were updated
mid-speech. Previously, `CartesiaSTTService` and `DeepgramSTTService` would
immediately disconnect and reconnect when settings changed, dropping any
in-flight audio. Reconnection is now deferred until the user stops speaking,
and audio arriving during the reconnect window is buffered and replayed.
(PR [#4311](https://github.com/pipecat-ai/pipecat/pull/4311))
- Fixed `SmallestTTSService` WebSocket endpoint URL to match Smallest AI v4.0.0
API (`wss://waves-api.smallest.ai` → `wss://api.smallest.ai`) and restored
keepalive using a silent space message instead of the unsupported flush
command.
(PR [#4320](https://github.com/pipecat-ai/pipecat/pull/4320))
- Fixed whitespace handling in TTS token streaming mode. Inter-token whitespace
(e.g., spaces between words) is now preserved for correct prosody, while
leading whitespace before the first non-whitespace token is still stripped to
avoid issues with TTS models that are sensitive to leading spaces.
(PR [#4323](https://github.com/pipecat-ai/pipecat/pull/4323))
- Fixed `SentryMetrics` silently dropping `MetricsFrame`s from
`stop_ttfb_metrics` and `stop_processing_metrics`. `SentryMetrics` called the
base `FrameProcessorMetrics` implementation but discarded its return value,
so `FrameProcessor` never pushed the `MetricsFrame` downstream. This
prevented observers (e.g. `UserBotLatencyObserver`, `MetricsLogObserver`)
from seeing TTFB and processing metrics for any service using
`metrics=SentryMetrics()`. The metrics were still calculated and Sentry
transactions still completed — only the downstream frame push was affected.
(PR [#4325](https://github.com/pipecat-ai/pipecat/pull/4325))
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` emitting word
timestamps and `TTSTextFrame` content that matched the input text instead of
the spoken audio when a pronunciation dictionary
(`pronunciation_dictionary_locators`) or text normalization rewrote the
input. Both services now consume ElevenLabs' normalized alignment, so
downstream consumers (captions, transcripts, context aggregation) reflect
what the listener actually hears.
(PR [#4344](https://github.com/pipecat-ai/pipecat/pull/4344))
- Fixed a crash in `DeepgramSTTService` when an `STTUpdateSettingsFrame`
arrived before the WebSocket handshake completed (for example, when pushing
an update upstream on `StartFrame`). The settings-triggered reconnect
cancelled the in-flight connection task before its keepalive task was
created, causing an `UnboundLocalError: cannot access local variable
'keepalive_task'` in the handler's `finally` block.
(PR [#4347](https://github.com/pipecat-ai/pipecat/pull/4347))
- Fixed direct-function registration crashing for functions without a
docstring. `DirectFunctionWrapper` passed `inspect.getdoc()`'s result to
`docstring_parser.parse()`, which raises when the docstring is `None`.
Functions now register cleanly whether or not they have a docstring; an empty
docstring produces empty description and parameter metadata as expected.
(PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
- Fixed `AssemblyAISTTService`, `CartesiaSTTService`, `GradiumSTTService`, and
`SonioxSTTService` crashing the pipeline on transient WebSocket send
failures. Each `run_stt` sent audio directly without catching errors, so a
single network hiccup mid-stream raised an uncaught exception through
`process_frame`. The guards now log a warning and let the connection-state
check on the next call handle recovery, matching the pattern used by
Deepgram, xAI, Azure, and other push-based STTs.
(PR [#4352](https://github.com/pipecat-ai/pipecat/pull/4352))
- Fixed Gemini Live losing conversation history in the (rare) case of a
WebSocket reconnect before any session resumption handle is received. When
the session reconnects (e.g. on system instruction change), conversation
history is now re-seeded into the new session before it is marked ready for
input.
(PR [#4355](https://github.com/pipecat-ai/pipecat/pull/4355))
- Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte
MTU (IPv6, Tailscale overlays, many consumer VPNs). aiortc's default SCTP
chunk size of 1200 bytes produces ~1305-byte UDP datagrams after headers,
which the kernel rejects with EMSGSIZE; aiortc has no path-MTU discovery so
it retransmits forever at the same oversized size. The chunk size is now
clamped to 1100 bytes (~1205-byte datagrams, ~75 bytes of slack). Override
with `PIPECAT_SCTP_MAX_CHUNK_SIZE` if your path MTU requires a different
value.
(PR [#4358](https://github.com/pipecat-ai/pipecat/pull/4358))
## [1.0.0] - 2026-04-14
Migration guide: https://docs.pipecat.ai/pipecat/migration/migration-1.0

View File

@@ -28,6 +28,10 @@
## 🌐 Pipecat Ecosystem
### 🧩 Multi-agent systems
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
@@ -67,7 +71,7 @@ and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
@@ -85,22 +89,22 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA Riva](https://docs.pipecat.ai/api-reference/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/api-reference/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)
## ⚡ Getting started

View File

@@ -1,5 +1,5 @@
# AI-COUSTICS
AICOUSTICS_LICENSE_KEY=...
AIC_LICENSE_KEY=...
# Anthropic
ANTHROPIC_API_KEY=...
@@ -214,4 +214,10 @@ WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
# xAI / Grok
XAI_API_KEY=...
XAI_API_KEY=...
# PIPECAT_SCTP_MAX_CHUNK_SIZE controls the maximum SCTP DATA-chunk payload
# size (bytes) used by aiortc's data channel. The default is 1100.
# All the details here:
# https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc#pipecat_sctp_max_chunk_size
#PIPECAT_SCTP_MAX_CHUNK_SIZE=1100

View File

@@ -71,17 +71,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -108,17 +108,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True)
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"], audio_passthrough=True)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -102,17 +102,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -89,10 +89,10 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -109,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Primary LLM for conversation (could be any provider)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),
@@ -117,7 +117,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Dedicated cheap/fast LLM for summarization only
summarization_llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
),

View File

@@ -77,17 +77,17 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You have access to tools to get the current weather - use them when relevant.",
),

View File

@@ -72,10 +72,10 @@ async def summarize_conversation(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -91,7 +91,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -77,17 +77,17 @@ async def get_current_weather(params: FunctionCallParams):
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You have access to tools to get the current weather - use them when relevant.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -58,24 +58,24 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
openai_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
groq_llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
api_key=os.environ["GROQ_API_KEY"],
settings=GroqLLMService.Settings(
system_instruction="You are a very helpful assistant. Your goal is to demonstrate your capabilities in detail in a creative and helpful way.",
),

View File

@@ -63,10 +63,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -74,7 +74,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Main LLM — drives the conversation. Its RTVI events reach the client.
main_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
@@ -83,7 +83,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Evaluator LLM — silently grades the user's message in the background.
# Its RTVI events will be suppressed so the client is unaware of this branch.
evaluator_llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
name="EvaluatorLLM",
settings=OpenAILLMService.Settings(
system_instruction="You are a silent quality evaluator. When given a user message, respond with a single JSON object: {'score': <1-5>, 'reason': '<brief reason>'}. Do not respond conversationally.",

View File

@@ -91,17 +91,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -56,10 +56,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramTTSService.Settings(
voice="aura-asteria-en",
),
@@ -68,7 +68,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = OpenAILLMService(
# To use OpenAI
# api_key=os.getenv("OPENAI_API_KEY"),
# api_key=os.environ["OPENAI_API_KEY"],
# Or, to use a local vLLM (or similar) api server
settings=OpenAILLMService.Settings(
model="meta-llama/Meta-Llama-3-8B-Instruct",

View File

@@ -55,17 +55,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="d4db5fb9-f44b-4bd1-85fa-192e0f0d75f9", # Spanish-speaking Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a live translation assistant. Your sole purpose is to translate English text into Spanish. When you receive English text from the user, immediately translate it into natural, fluent Spanish. Do not add explanations, commentary, or extra information—only provide the Spanish translation of the text you receive.",
),

View File

@@ -126,14 +126,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm_text_aggregator.on_pattern_match("voice", on_voice_tag)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Process LLM text through the pattern aggregator before TTS
llm_text_processor = LLMTextProcessor(text_aggregator=llm_text_aggregator)
# Initialize TTS with narrator voice as default
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice=VOICE_IDS["narrator"],
),
@@ -190,7 +190,7 @@ Remember: Use narrator voice for EVERYTHING except the actual quoted dialogue.""
# Initialize LLM
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -94,19 +94,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
required=["location", "format"],
)
stt_cartesia = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
stt_deepgram = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt_cartesia = CartesiaSTTService(api_key=os.environ["CARTESIA_API_KEY"])
stt_deepgram = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Uses ServiceSwitcherStrategyManual by default
stt_switcher = ServiceSwitcher(services=[stt_cartesia, stt_deepgram])
tts_cartesia = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts_deepgram = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramTTSService.Settings(
voice="aura-2-helena-en",
),
@@ -117,11 +117,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
system_prompt = "You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way."
llm_openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(system_instruction=system_prompt),
)
llm_google = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(system_instruction=system_prompt),
)
# Uses ServiceSwitcherStrategyManual by default

View File

@@ -42,14 +42,14 @@ class SwitchLanguage(ParallelPipeline):
self._current_language = "English"
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
spanish_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="d4db5fb9-f44b-4bd1-85fa-192e0f0d75f9", # Spanish-speaking Lady
),
@@ -101,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramSTTService.Settings(
language="multi",
),
@@ -110,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SwitchLanguage()
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You can speak the following languages: 'English' and 'Spanish'.",
),

View File

@@ -42,21 +42,21 @@ class SwitchVoices(ParallelPipeline):
self._current_voice = "News Lady"
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
),
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
),
@@ -114,12 +114,12 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = SwitchVoices()
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative and helpful way. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
),

View File

@@ -60,13 +60,13 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Cartesia offers a `<spell></spell>` tags that we can use to ask the user
# to confirm the emails.
# (see https://docs.cartesia.ai/build-with-sonic/formatting-text-for-sonic/spelling-out-input-text)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -84,7 +84,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# )
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You need to gather a valid email or emails from the user. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. If the user provides one or more email addresses confirm them with the user. Enclose all emails with <spell> tags, for example <spell>a@a.com</spell>.",
),

View File

@@ -52,22 +52,22 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
classifier_llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
classifier_llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"])
voicemail = VoicemailDetector(llm=classifier_llm)

View File

@@ -57,21 +57,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
settings=DeepgramSTTService.Settings(
keyterm=["pipecat"],
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -107,17 +107,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
enable_async_tool_cancellation=True,
settings=AnthropicLLMService.Settings(
system_instruction=(

View File

@@ -66,17 +66,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
enable_async_tool_cancellation=True,
settings=AnthropicLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -86,10 +86,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -97,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Anthropic for vision analysis
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
settings=AnthropicLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -65,17 +65,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
settings=AnthropicLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -86,10 +86,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -60,18 +60,18 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
api_key=os.environ["AZURE_CHATGPT_API_KEY"],
endpoint=os.environ["AZURE_CHATGPT_ENDPOINT"],
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = CerebrasLLMService(
api_key=os.getenv("CEREBRAS_API_KEY"),
api_key=os.environ["CEREBRAS_API_KEY"],
settings=CerebrasLLMService.Settings(
system_instruction="""You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = DeepSeekLLMService(
api_key=os.getenv("DEEPSEEK_API_KEY"),
api_key=os.environ["DEEPSEEK_API_KEY"],
settings=DeepSeekLLMService.Settings(
model="deepseek-chat",
system_instruction="""You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.

View File

@@ -76,17 +76,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = FireworksLLMService(
api_key=os.getenv("FIREWORKS_API_KEY"),
api_key=os.environ["FIREWORKS_API_KEY"],
settings=FireworksLLMService.Settings(
model="accounts/fireworks/models/gpt-oss-20b",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -107,17 +107,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
enable_async_tool_cancellation=True,
settings=GoogleLLMService.Settings(
system_instruction=(

View File

@@ -98,10 +98,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -127,7 +127,7 @@ indicate you should use the get_image tool are:
"""
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
enable_async_tool_cancellation=True,
settings=GoogleLLMService.Settings(
system_instruction=system_prompt,

View File

@@ -60,19 +60,19 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
api_key=os.environ["ELEVENLABS_API_KEY"],
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
voice=os.getenv("ELEVENLABS_VOICE_ID", "Xb7hH8MSUJpSbSDYk0k2"),
),
)
llm = GoogleVertexLLMService(
credentials=os.getenv("GOOGLE_VERTEX_TEST_CREDENTIALS"),
project_id=os.getenv("GOOGLE_CLOUD_PROJECT_ID"),
location=os.getenv("GOOGLE_CLOUD_LOCATION"),
credentials=os.environ["GOOGLE_VERTEX_TEST_CREDENTIALS"],
project_id=os.environ["GOOGLE_CLOUD_PROJECT_ID"],
location=os.environ["GOOGLE_CLOUD_LOCATION"],
settings=GoogleVertexLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
@@ -103,14 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "developer",
"content": "Start a conversation with 'Hey there' to get the current weather.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -141,6 +134,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{
"role": "developer",
"content": "Please introduce yourself to the user.",
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -86,10 +86,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -97,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Google Gemini model for vision analysis
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -96,10 +96,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -125,7 +125,7 @@ indicate you should use the get_image tool are:
"""
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -62,10 +62,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = XAIHttpTTSService(
api_key=os.getenv("XAI_API_KEY"),
api_key=os.environ["XAI_API_KEY"],
aiohttp_session=session,
settings=XAIHttpTTSService.Settings(
voice="eve",
@@ -73,7 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = GrokLLMService(
api_key=os.getenv("XAI_API_KEY"),
api_key=os.environ["XAI_API_KEY"],
settings=GrokLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
stt = GroqSTTService(api_key=os.environ["GROQ_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"),
api_key=os.environ["GROQ_API_KEY"],
settings=GroqLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = MistralLLMService(
api_key=os.getenv("MISTRAL_API_KEY"),
api_key=os.environ["MISTRAL_API_KEY"],
settings=MistralLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -117,17 +117,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = NebiusLLMService(
api_key=os.getenv("NEBIUS_API_KEY"),
api_key=os.environ["NEBIUS_API_KEY"],
settings=NebiusLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = NovitaLLMService(
api_key=os.getenv("NOVITA_API_KEY"),
api_key=os.environ["NOVITA_API_KEY"],
settings=NovitaLLMService.Settings(
model="openai/gpt-oss-120b",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = NvidiaLLMService(
api_key=os.getenv("NVIDIA_API_KEY"),
api_key=os.environ["NVIDIA_API_KEY"],
settings=NvidiaLLMService.Settings(
model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
# Recommended when turning thinking off

View File

@@ -64,10 +64,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -107,17 +107,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
enable_async_tool_cancellation=True,
settings=OpenAILLMService.Settings(
system_instruction=(

View File

@@ -70,7 +70,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
@@ -78,7 +78,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAITTSService.Settings(
voice="ballad",
),
@@ -86,7 +86,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
enable_async_tool_cancellation=True,
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -107,17 +107,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
enable_async_tool_cancellation=True,
settings=OpenAIResponsesLLMService.Settings(
system_instruction=(

View File

@@ -66,17 +66,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
enable_async_tool_cancellation=True,
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesHttpLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesHttpLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -87,17 +87,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesHttpLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesHttpLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -87,17 +87,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -87,17 +87,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are able to describe images from the user camera.",
),

View File

@@ -64,7 +64,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
@@ -72,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAITTSService.Settings(
voice="ballad",
),
@@ -80,7 +80,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,11 +60,11 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
api_key=os.environ["AZURE_SPEECH_API_KEY"],
region=os.environ["AZURE_SPEECH_REGION"],
settings=AzureTTSService.Settings(
voice="en-US-JennyNeural",
language="en-US",
@@ -74,7 +74,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = OpenRouterLLMService(
api_key=os.getenv("OPENROUTER_API_KEY"),
api_key=os.environ["OPENROUTER_API_KEY"],
settings=OpenRouterLLMService.Settings(
model="openai/gpt-4o-2024-11-20",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -58,17 +58,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = PerplexityLLMService(
api_key=os.getenv("PERPLEXITY_API_KEY"),
api_key=os.environ["PERPLEXITY_API_KEY"],
settings=PerplexityLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = QwenLLMService(
api_key=os.getenv("QWEN_API_KEY"),
api_key=os.environ["QWEN_API_KEY"],
model="qwen2.5-72b-instruct",
settings=QwenLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -61,18 +61,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = SambaNovaLLMService(
api_key=os.getenv("SAMBANOVA_API_KEY"),
api_key=os.environ["SAMBANOVA_API_KEY"],
settings=SambaNovaLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -64,21 +64,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = SarvamSTTService(
api_key=os.getenv("SARVAM_API_KEY"),
api_key=os.environ["SARVAM_API_KEY"],
settings=SarvamSTTService.Settings(
model="saaras:v3",
),
)
tts = SarvamTTSService(
api_key=os.getenv("SARVAM_API_KEY"),
api_key=os.environ["SARVAM_API_KEY"],
settings=SarvamTTSService.Settings(
model="bulbul:v3",
voice="shubh",
),
)
llm = SarvamLLMService(
api_key=os.getenv("SARVAM_API_KEY"),
api_key=os.environ["SARVAM_API_KEY"],
settings=SarvamLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,17 +60,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
api_key=os.environ["TOGETHER_API_KEY"],
settings=TogetherLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -0,0 +1,260 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example demonstrating ``PipelineTask(tool_resources=...)``.
``tool_resources`` is an application-defined bag of anything you want every
tool handler in a session to share by reference: database handles, HTTP
clients, feature flags, per-user state, observability clients, in-memory
caches — whatever fits your app. Pipecat passes it through untouched as
``FunctionCallParams.tool_resources``.
This example uses a small ``ToolCallLogger`` as a stand-in for that "shared
thing". A real app might just as easily pass a Postgres pool, a Redis
client, a Stripe SDK instance, or any combination thereof. The mechanics
shown here — construct once, hand to the task, read it from each handler,
inspect it after the session — are the same regardless of what you put in.
We bundle resources in a typed ``SessionResources`` dataclass and cast back
to it at the top of each handler. Pipecat doesn't care what type you pass
(a plain dict works too), but a typed container gives you autocomplete and
refactor safety instead of dict-by-string-key lookups.
"""
import json
import os
from collections.abc import Mapping
from dataclasses import dataclass
from datetime import UTC, datetime, timezone
from typing import Any, cast
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
class ToolCallLogger:
"""Stand-in shared resource — swap for whatever your app actually needs."""
def __init__(self):
"""Initialize the logger with an empty list of recorded calls."""
self._calls: list[dict[str, Any]] = []
def log_tool_call(self, function_name: str, arguments: Mapping[str, Any]) -> None:
"""Record a tool call invocation.
Args:
function_name: The name of the tool being invoked.
arguments: The arguments passed to the tool.
"""
entry = {
"timestamp": datetime.now(UTC).isoformat(),
"function_name": function_name,
"arguments": dict(arguments),
}
self._calls.append(entry)
logger.info(f"[ToolCallLogger] {function_name} called with {dict(arguments)}")
def dump(self) -> str:
"""Return all recorded tool calls as a JSON string."""
return json.dumps(self._calls, indent=2)
@dataclass
class SessionResources:
"""Typed container for everything the tool handlers in this session share.
Add fields here as the app grows (e.g. ``db: AsyncConnection``,
``http: httpx.AsyncClient``). Handlers ``cast()`` ``params.tool_resources``
to this type to get autocomplete and refactor safety.
"""
tool_call_logger: ToolCallLogger
async def fetch_weather_from_api(params: FunctionCallParams):
resources = cast(SessionResources, params.tool_resources)
resources.tool_call_logger.log_tool_call(params.function_name, params.arguments)
await params.result_callback({"conditions": "nice", "temperature": "75"})
async def fetch_restaurant_recommendation(params: FunctionCallParams):
resources = cast(SessionResources, params.tool_resources)
resources.tool_call_logger.log_tool_call(params.function_name, params.arguments)
await params.result_callback({"name": "The Golden Dragon"})
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
@llm.event_handler("on_connection_error")
async def on_connection_error(service, error):
logger.error(f"LLM connection error: {error}")
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
# Avoid appending this filler message to the LLM context — it would
# alter the conversation history and prevent
# OpenAIResponsesLLMService's previous_response_id optimization from
# matching, forcing a full context resend.
await tts.queue_frame(TTSSpeakFrame("Let me check on that.", append_to_context=False))
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
# Keep a local handle so we can read collected state after the session
# ends; Pipecat never copies or clears the object.
tool_call_logger = ToolCallLogger()
resources = SessionResources(tool_call_logger=tool_call_logger)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
tool_resources=resources,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
# The session has ended; read whatever state the handlers built up.
logger.info(f"Tool calls logged during session:\n{tool_call_logger.dump()}")
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -36,7 +36,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -28,7 +28,7 @@ async def main():
transport = LocalAudioTransport(LocalAudioTransportParams(audio_out_enabled=True))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -38,14 +38,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -42,7 +42,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
imagegen = GoogleImageGenService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
)
task = PipelineTask(

View File

@@ -108,10 +108,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session for API calls
async with aiohttp.ClientSession() as session:
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"])
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),

View File

@@ -96,17 +96,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -51,17 +51,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -40,17 +40,17 @@ async def main():
)
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -63,17 +63,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -53,10 +53,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -77,7 +77,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
settings=AnthropicLLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -49,10 +49,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -71,7 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
settings=AnthropicLLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -23,8 +23,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
from pipecat.services.mcp_service import MCPClient
from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -54,15 +52,6 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
system = f"""
You are a helpful LLM in a voice call.
Your goal is to answer questions about the user's GitHub repositories and account.
@@ -85,7 +74,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tools = await mcp.get_tools_schema()
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
system_instruction=system,
tools=tools,
)

View File

@@ -54,10 +54,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -73,7 +73,7 @@ Just respond with short sentences when you are carrying out tool calls.
"""
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction=system_prompt,
),

View File

@@ -100,17 +100,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),

View File

@@ -60,12 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
api_key=os.environ["DEEPGRAM_API_KEY"],
metrics=SentryMetrics(),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -73,7 +73,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
metrics=SentryMetrics(),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",

View File

@@ -84,7 +84,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
logger.debug(
f"loaded conversation from {filename}\n{json.dumps(params.context.get_messages(), indent=4)}"
@@ -170,17 +170,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
api_key=os.environ["ANTHROPIC_API_KEY"],
settings=AnthropicLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -87,7 +87,9 @@ async def save_conversation(params: FunctionCallParams):
# the simplest thing to do is to pop messages until the last one is an assistant
# response
while messages and not (
messages[-1].get("role") == "assistant" and "content" in messages[-1]
isinstance(messages[-1], dict)
and messages[-1].get("role") == "assistant"
and "content" in messages[-1]
):
messages.pop()
if messages: # we never expect this to be empty
@@ -105,7 +107,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
messages = json.load(file)
# HACK: if using the older Nova Sonic (pre-2) model, you need a special way of
# triggering the first assistant response. The call to trigger_assistant_response(),
@@ -125,6 +127,7 @@ async def load_conversation(params: FunctionCallParams):
}
)
params.context.set_messages(messages)
assert isinstance(params.llm, AWSNovaSonicLLMService)
await params.llm.reset_conversation()
# await params.llm.trigger_assistant_response()
except Exception as e:
@@ -219,9 +222,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = AWSNovaSonicLLMService(
secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
region=os.getenv("AWS_REGION"), # as of 2025-05-06, us-east-1 is the only supported region
secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
region=os.environ["AWS_REGION"], # as of 2025-05-06, us-east-1 is the only supported region
settings=AWSNovaSonicLLMService.Settings(
voice="tiffany", # matthew, tiffany, amy
system_instruction=system_instruction,

View File

@@ -110,7 +110,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
await params.result_callback(
{
@@ -243,17 +243,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
system_instruction=system_instruction,
)

View File

@@ -94,7 +94,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
await params.llm.reset_conversation()
# Manually create a response since we've reset the conversation
@@ -192,7 +192,7 @@ Remember, your responses should be short - just one or two sentences usually."""
)
llm = GrokRealtimeLLMService(
api_key=os.getenv("XAI_API_KEY"),
api_key=os.environ["XAI_API_KEY"],
session_properties=session_properties,
)

View File

@@ -91,8 +91,9 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
assert isinstance(params.llm, OpenAIRealtimeLLMService)
await params.llm.reset_conversation()
# NOTE: we manually create a response here rather than relying
# on the function callback to trigger one since we've reset the
@@ -171,10 +172,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAIRealtimeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeLLMService.Settings(
system_instruction="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.

View File

@@ -85,7 +85,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
logger.debug(
f"loaded conversation from {filename}\n{json.dumps(params.context.get_messages(), indent=4)}"
@@ -171,17 +171,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesHttpLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesHttpLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -85,7 +85,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
logger.debug(
f"loaded conversation from {filename}\n{json.dumps(params.context.get_messages(), indent=4)}"
@@ -171,17 +171,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIResponsesLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -85,7 +85,7 @@ async def load_conversation(params: FunctionCallParams):
filename = params.arguments["filename"]
logger.debug(f"loading conversation from {filename}")
try:
with open(filename, "r") as file:
with open(filename) as file:
params.context.set_messages(json.load(file))
logger.debug(
f"loaded conversation from {filename}\n{json.dumps(params.context.get_messages(), indent=4)}"
@@ -171,17 +171,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
system_instruction=system_instruction,
)

View File

@@ -95,10 +95,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -106,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Initialize the Gemini Multimodal Live model
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -52,7 +52,7 @@ import os
import time
from dotenv import load_dotenv
from google import genai
from google import genai # pyright: ignore[reportAttributeAccessIssue]
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
@@ -87,7 +87,7 @@ def get_rag_content():
"""Get the RAG content from the file."""
script_dir = os.path.dirname(os.path.abspath(__file__))
rag_content_path = os.path.join(script_dir, "assets", "rag-content.txt")
with open(rag_content_path, "r") as f:
with open(rag_content_path) as f:
return f.read()
@@ -179,10 +179,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="f9836c6e-a0bd-460e-9d3c-f7299fa60f94", # Southern Lady
),
@@ -197,7 +197,7 @@ Your response will be turned into speech so use only simple words and punctuatio
"""
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GoogleLLMService.Settings(
model=VOICE_MODEL,
system_instruction=system_prompt,

View File

@@ -133,11 +133,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
# Initialize text-to-speech service
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
api_key=os.environ["ELEVENLABS_API_KEY"],
settings=ElevenLabsTTSService.Settings(
voice="pNInz6obpgDQGcFmaJgB",
),
@@ -196,7 +196,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Initialize LLM service
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="""You are a personal assistant. You can remember things about the person you are talking to.
Some Guidelines:

View File

@@ -30,6 +30,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.nova_sonic.llm import AWSNovaSonicLLMService
from pipecat.services.aws.nova_sonic.session_continuation import SessionContinuationParams
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -116,8 +117,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create the AWS Nova Sonic LLM service
llm = AWSNovaSonicLLMService(
secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
# as of 2025-12-09, these are the supported regions:
# - Nova 2 Sonic (the default model):
# - us-east-1
@@ -126,12 +127,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# - Nova Sonic (the older model):
# - us-east-1
# - ap-northeast-1
region=os.getenv("AWS_REGION"),
region=os.environ["AWS_REGION"],
session_token=os.getenv("AWS_SESSION_TOKEN"),
settings=AWSNovaSonicLLMService.Settings(
voice="tiffany",
system_instruction=system_instruction,
),
# Session continuation is enabled by default, allowing seamless
# conversations longer than the AWS ~8-minute session limit.
# The service rotates sessions in the background with no
# user-perceptible interruption. You can tune the threshold or
# disable it with: session_continuation=SessionContinuationParams(enabled=False)
session_continuation=SessionContinuationParams(
# When to start preparing the next session (default: 360 = 6 min).
# Lower this (e.g. 20) to see a handoff happen quickly during testing.
transition_threshold_seconds=360,
),
# you could choose to pass tools here rather than via context
# tools=tools
)

View File

@@ -112,8 +112,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = AzureRealtimeLLMService(
api_key=os.getenv("AZURE_REALTIME_API_KEY"),
base_url=os.getenv("AZURE_REALTIME_BASE_URL"),
api_key=os.environ["AZURE_REALTIME_API_KEY"],
base_url=os.environ["AZURE_REALTIME_BASE_URL"],
settings=AzureRealtimeLLMService.Settings(
system_instruction="""You are a helpful and friendly AI.

View File

@@ -104,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Initialize Gemini service with File API support
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
voice="Charon", # Aoede, Charon, Fenrir, Kore, Puck

View File

@@ -114,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -67,7 +67,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Initialize the Gemini Multimodal Live model
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
voice="Puck", # Aoede, Charon, Fenrir, Kore, Puck
system_instruction=system_instruction,

View File

@@ -133,7 +133,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
),

View File

@@ -75,7 +75,8 @@ class GroundingMetadataProcessor(FrameProcessor):
if isinstance(frame, LLMSearchResponseFrame):
self._grounding_count += 1
logger.info(f"\n\n🔍 GROUNDING METADATA RECEIVED #{self._grounding_count}\n")
logger.info(f"📝 Search Result Text: {frame.search_result[:200]}...")
if frame.search_result:
logger.info(f"📝 Search Result Text: {frame.search_result[:200]}...")
if frame.rendered_content:
logger.info(f"🔗 Rendered Content: {frame.rendered_content}")
@@ -101,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=SYSTEM_INSTRUCTION,
voice="Charon", # Aoede, Charon, Fenrir, Kore, Puck
@@ -111,16 +112,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create a processor to capture grounding metadata
grounding_processor = GroundingMetadataProcessor()
messages = [
{
"role": "user",
"content": "Please introduce yourself and let me know that you can help with current information by searching the web. Ask me what current information I'd like to know about.",
},
]
# Set up conversation context and management
context = LLMContext(messages)
context = LLMContext()
# Server-side VAD is enabled by default; no local VAD is added.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
@@ -144,6 +137,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{
"role": "developer",
"content": "Please introduce yourself and let me know that you can help with current information by searching the web. Ask me what current information I'd like to know about.",
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,7 +54,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = GeminiLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
voice="Aoede", # Puck, Charon, Kore, Fenrir, Aoede
vad=GeminiVADParams(disabled=True),

View File

@@ -110,8 +110,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GeminiLiveVertexLLMService(
credentials=os.getenv("GOOGLE_VERTEX_TEST_CREDENTIALS"),
project_id=os.getenv("GOOGLE_CLOUD_PROJECT_ID"),
location=os.getenv("GOOGLE_CLOUD_LOCATION"),
project_id=os.environ["GOOGLE_CLOUD_PROJECT_ID"],
location=os.environ["GOOGLE_CLOUD_LOCATION"],
settings=GeminiLiveVertexLLMService.Settings(
system_instruction=system_instruction,
voice="Puck", # Aoede, Charon, Fenrir, Kore, Puck

Some files were not shown because too many files have changed in this diff Show More