Compare commits

...

473 Commits

Author SHA1 Message Date
James Hush
97c7820372 Fix copyright 2026-03-05 11:17:02 +08:00
James Hush
7d957292e0 Rename to 57, extract content check into helper method 2026-03-05 11:14:36 +08:00
James Hush
218ab01070 Use ParallelPipeline for content filter in example 07
Run the content filter concurrently with LLM text generation
using ParallelPipeline, with a ContentFilterGate that blocks
output until the filter approves or rejects the content.
2026-03-05 11:08:56 +08:00
James Hush
9dbd923cfc Make separate file 2026-03-05 10:41:53 +08:00
Aleix Conchillo Flaqué
fd545cabab update uv.lock 2026-03-04 17:40:24 -08:00
Aleix Conchillo Flaqué
1aadb8bd73 Merge pull request #3918 from pipecat-ai/aleix/system-instruction-openai-anthropic
Wire up system_instruction in OpenAI, Anthropic, and AWS Bedrock
2026-03-04 17:40:00 -08:00
Aleix Conchillo Flaqué
3c60b0c8af Add changelog for #3918 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
0004a116d8 examples(foundational): use system_instruction in all examples 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
01f0caf252 wire up system_instruction in OpenAI, Anthropic and AWS Bedrock 2026-03-04 17:37:32 -08:00
Vanessa Pyne
b42dfa4734 Merge pull request #3916 from pipecat-ai/vp-add-cloud-audio-only
daily-transport: add cloud-audio-only recording option
2026-03-04 16:58:39 -06:00
vipyne
aa31ced32f add changelog for 3916 2026-03-04 16:58:28 -06:00
vipyne
9ca900cc4a daily-transport: add cloud-audio-only recording option 2026-03-04 16:58:28 -06:00
Mark Backman
4b9fe5afe3 Merge pull request #3915 from pipecat-ai/mb/function_call_timeout_secs-error-msg
Add per-tool function call timeout_secs
2026-03-04 15:01:34 -05:00
Mark Backman
f76b8d2982 Merge pull request #3917 from pipecat-ai/mb/sagemaker-init.py
Add missing __init__.py to sagemaker module
2026-03-04 12:25:44 -05:00
Mark Backman
27ae6a0349 Add missing __init__.py to sagemaker module 2026-03-04 11:50:37 -05:00
Mark Backman
97e4e7c647 Add changelog for #3915 2026-03-04 09:42:01 -05:00
Mark Backman
df35ceca2c Add per-tool timeout_secs to register_function and register_direct_function
The default function call timeout (10s) causes silent failures for
long-running tools. This adds an optional timeout_secs parameter to
register_function() and register_direct_function() so individual tools
can override the global function_call_timeout_secs. The warning message
now mentions both the per-tool and global timeout options.
2026-03-04 09:37:56 -05:00
Mark Backman
e5ae5e6f2d Merge pull request #3914 from pipecat-ai/mb/optional-summarization-thresholds
Make max_context_tokens and max_unsummarized_messages independently optional
2026-03-04 08:57:16 -05:00
Mark Backman
6789aee9e8 Add changelog for #3914 2026-03-03 20:09:26 -05:00
Mark Backman
b358657a79 Make max_context_tokens and max_unsummarized_messages independently optional
Allow either threshold to be set to None to cleanly disable that trigger,
instead of requiring users to set a very large number as a workaround.
At least one of the two must remain set (validated at construction time).
2026-03-03 20:08:22 -05:00
Mark Backman
9186f65952 Merge pull request #3908 from pipecat-ai/mb/uv-lock-2026-03-03
uv.lock update
2026-03-03 13:28:27 -05:00
Mark Backman
bdeeacec51 uv.lock update 2026-03-03 10:37:35 -05:00
Mark Backman
8f04f894d5 Merge pull request #3907 from pipecat-ai/mb/update-docs-skill-new-services 2026-03-03 09:48:01 -05:00
Mark Backman
ca0ec16373 Merge pull request #3889 from ai-coustics/goedev/aic-voice-focus-and-memoryview-fix
AIC Voice Focus version update & concurrency safety issue on audio buffer.
2026-03-03 09:28:13 -05:00
Filipi da Silva Fuchter
150c8b92e5 Merge pull request #3848 from pipecat-ai/filipi/deepgram
Upgrading Deepgram to version 6
2026-03-03 09:07:10 -05:00
filipi87
0fdf7dc16a Fixing sagemaker merge conflicts. 2026-03-03 11:03:51 -03:00
filipi87
fc905a7ef5 Merge branch 'main' into filipi/deepgram
# Conflicts:
#	src/pipecat/services/deepgram/stt_sagemaker.py
2026-03-03 10:54:30 -03:00
Mark Backman
aca92745cd Update update-docs skill to register new services in docs.json and supported-services.mdx
When the skill creates a new service documentation page, it now also adds
the page to docs.json navigation and the supported-services.mdx table.
2026-03-03 08:40:12 -05:00
Aleix Conchillo Flaqué
5940731dd0 Merge pull request #3906 from pipecat-ai/changelog-0.0.104
Release 0.0.104 - Changelog Update
2026-03-02 21:24:05 -08:00
aconchillo
62260454a2 Update changelog for version 0.0.104 2026-03-02 21:23:39 -08:00
Aleix Conchillo Flaqué
d1ad7a9580 Merge pull request #3905 from pipecat-ai/aleix/tavus-fix-callback-on-joined
transport(tavus): fix on_joined callback
2026-03-02 21:11:12 -08:00
Aleix Conchillo Flaqué
252f17e1ca transport(tavus): fix on_joined callback 2026-03-02 21:06:49 -08:00
Mark Backman
c79a739c85 Merge pull request #3856 from zkleb-aai/assemblyai-u3-rt-pro
Assemblyai u3 rt pro
2026-03-02 20:28:28 -05:00
Mark Backman
038f6a77d1 Linting 2026-03-02 20:24:30 -05:00
Aleix Conchillo Flaqué
5952ea711c update uv.lock 2026-03-02 16:42:58 -08:00
Mark Backman
aad1211a57 Merge pull request #3885 from pipecat-ai/mb/latency-breakdown
Add latency breakdown to UserBotLatencyObserver
2026-03-02 19:27:35 -05:00
Mark Backman
7dbb130666 Add chronological_events utility function to display UserBotLatencyObserver report 2026-03-02 19:23:42 -05:00
zack
c6c2c5ba05 Fix end_of_turn_confidence_threshold: set to 1.0 (not 0.0) for universal-streaming
- u3-rt-pro: Does not set parameter (not used)
- universal-streaming models: Set to 1.0 to maintain fast response
- This ensures fast response time matches previous implementation
2026-03-02 18:25:25 -05:00
Aleix Conchillo Flaqué
141b0ee014 Merge pull request #3902 from pipecat-ai/aleix/deepgram-sagemaker-move
Move Deepgram SageMaker modules to sagemaker/ subpackage
2026-03-02 15:25:17 -08:00
Aleix Conchillo Flaqué
303616599f Add changelog for #3902 2026-03-02 15:20:52 -08:00
Aleix Conchillo Flaqué
088eb9b01c examples: update to new sagemaker packages 2026-03-02 15:20:52 -08:00
zack
32773b42d6 Improve terminology: rename file and replace 'STT mode' with 'AssemblyAI turn detection'
- Rename 07o-interruptible-assemblyai-stt.py -> 07o-interruptible-assemblyai-turn-detection.py
- Replace 'STT mode' with 'AssemblyAI turn detection mode' throughout codebase
- Replace 'Mode 1'/'Mode 2' with descriptive 'Pipecat turn detection'/'AssemblyAI turn detection'
- Update changelog to use 'built-in turn detection' terminology
- Addresses PR feedback about confusing terminology
2026-03-02 18:08:46 -05:00
Mark Backman
c039e08741 Merge pull request #3900 from pipecat-ai/filipi/lemonslice
Adding the LemonSlice transport integration
2026-03-02 17:56:18 -05:00
zack
b449515410 Address PR review feedback: remove debug logs, fix hasattr logic, add VADAnalyzer 2026-03-02 17:54:31 -05:00
Mark Backman
aae9136df9 Review feedback 2026-03-02 17:52:39 -05:00
Aleix Conchillo Flaqué
fdeddd7c95 Add deprecation shims for moved stt_sagemaker/tts_sagemaker modules
Re-export from the new pipecat.services.deepgram.sagemaker.{stt,tts}
paths so existing imports keep working with a deprecation warning.
2026-03-02 14:47:17 -08:00
Aleix Conchillo Flaqué
11783520c0 services(deepgram): move stt|tts_sagemaker to sagemaker/stt|tts.py 2026-03-02 14:43:34 -08:00
filipi87
49c73bb0a3 Merge branch 'main' into filipi/lemonslice
# Conflicts:
#	README.md
#	uv.lock
2026-03-02 19:24:52 -03:00
filipi87
f07e55a4ed Wrap LemonSlice session creation params in LemonSliceNewSessionRequest 2026-03-02 19:15:18 -03:00
filipi87
daf14f5065 Renaming LemonSlice utils file to api. 2026-03-02 19:08:17 -03:00
filipi87
ebb794995b Changing the log levels. 2026-03-02 19:06:13 -03:00
zkleb-aai
5c2ca0ce64 Update changelog/3856.changed.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:54 -05:00
zkleb-aai
6729f4366a Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:42 -05:00
zkleb-aai
7648b62e6e Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:17 -05:00
filipi87
7afd7068b5 Retrieving the elevenlabs voice ID from environment variable 2026-03-02 19:02:51 -03:00
filipi87
07fdd610ca Using a default voice in case it is not provided. 2026-03-02 19:02:33 -03:00
Mark Backman
a4796a2373 Merge pull request #3898 from pipecat-ai/mb/revert-processing-metrics-deprecation
Revert processing metrics deprecation
2026-03-02 16:39:02 -05:00
Aleix Conchillo Flaqué
44466cfa07 Merge pull request #3896 from pipecat-ai/aleix/broadcast-interruption
Add broadcast_interruption() to FrameProcessor
2026-03-02 13:36:39 -08:00
Aleix Conchillo Flaqué
741ff14d3a Rename changelog files to use PR #3896 and mark breaking change 2026-03-02 13:26:45 -08:00
Aleix Conchillo Flaqué
4a61d5bfad Add broadcast_interruption() to FrameProcessor
Replace the round-trip push_interruption_task_frame_and_wait() mechanism
with broadcast_interruption(), which pushes an InterruptionFrame both
upstream and downstream directly from the calling processor.

This eliminates race conditions (transcription arriving before the
InterruptionFrame comes back), swallowed-event timeouts (frame blocked
before reaching the sink), and the complexity of _wait_for_interruption
flag / queue bypass / frame.complete() obligations.

- Add broadcast_interruption() to FrameProcessor
- Deprecate push_interruption_task_frame_and_wait() (delegates to new method)
- Remove event field and complete() from InterruptionFrame/InterruptionTaskFrame
- Remove _wait_for_interruption flag and all special-case logic
- Remove frame.complete() calls in stt_mute_filter and llm_response_universal
- Update all 17 call sites to use broadcast_interruption()
- Update tests
2026-03-02 13:26:45 -08:00
Mark Backman
d0ecb3c7a8 Revert "Deprecate processing metrics (ProcessingMetricsData)" (#3852)
This reverts commit 127b52bad5.
2026-03-02 16:26:29 -05:00
Mark Backman
8f66272de7 Update changelog 2026-03-02 16:16:38 -05:00
Mark Backman
ff5b985009 Convert observer data models to Pydantic BaseModel with timestamps
Enables .model_dump() serialization for Pipecat Cloud collection.
All metrics now include start_time (Unix timestamp) for timeline
plotting alongside duration_secs.
2026-03-02 16:11:43 -05:00
Mark Backman
a738a4d82b Add function call latency tracking to LatencyBreakdown 2026-03-02 16:11:43 -05:00
Mark Backman
ddba1b84a9 Add first-bot-speech latency to UserBotLatencyObserver
Measure time from ClientConnectedFrame to first BotStartedSpeakingFrame,
emitting a one-time on_first_bot_speech_latency event with breakdown.
2026-03-02 16:11:43 -05:00
Mark Backman
18155b6a63 Add latency breakdown to UserBotLatencyObserver
Add per-service latency breakdown metrics alongside existing user-to-bot
latency measurement. When enable_metrics=True, the observer now emits an
on_latency_breakdown event with TTFB, text aggregation, and user turn
duration metrics collected between VADUserStoppedSpeakingFrame and
BotStartedSpeakingFrame.

- Add LatencyBreakdown dataclass with ttfb, text_aggregation,
  user_turn_secs fields
- Accumulate MetricsFrame data during user→bot cycles
- Reset accumulators on InterruptionFrame to discard stale metrics
- Measure user_turn_secs from actual user silence (VAD timestamp -
  stop_secs) to turn release (UserStoppedSpeakingFrame)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using bounded deque + set pattern
- Update example 29 with latency breakdown display
2026-03-02 16:11:43 -05:00
Mark Backman
ac69b3441e Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
in containment, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-03-02 16:11:43 -05:00
Mark Backman
98bd530574 Add changelog for #3881 2026-03-02 16:11:42 -05:00
Mark Backman
b1e55fd6c2 Merge pull request #3881 from pipecat-ai/mb/startup-observer
Add StartupTimingObserver
2026-03-02 16:07:28 -05:00
Mark Backman
dbdb54ce0f Add on_connected event handler to DailyTransport for cross-transport consistency 2026-03-02 15:44:37 -05:00
Mark Backman
c1743dcffd Rename Tavus event, on_connected 2026-03-02 15:22:44 -05:00
Mark Backman
389d0c3fb6 Use on_pipeline_started from PipelineTask for startup report
Replace the PipelineSink detection in StartupTimingObserver with an
on_pipeline_started() callback from PipelineTask via TaskObserver.
This fixes premature report emission when using ParallelPipeline,
which has its own inner PipelineSinks per branch.
2026-03-02 14:33:55 -05:00
Mark Backman
a88eae7849 Merge pull request #3895 from pipecat-ai/aleix/update-nvidia-example-model
Update Nvidia example to use llama-3.3-70b-instruct
2026-03-02 14:27:53 -05:00
Mark Backman
0cfd953a90 Use _ArrivalInfo dataclass instead of tuple for arrival tracking 2026-03-02 14:15:41 -05:00
Mark Backman
bbbfdfd321 Replace per-processor start_time with start_offset_secs
Use start_offset_secs (offset from StartFrame) on ProcessorStartupTiming
instead of a wall-clock timestamp. Reports keep a single start_time
anchor for dashboard visualization. Remove _mono_to_wall conversion.
2026-03-02 14:07:34 -05:00
Aleix Conchillo Flaqué
193f93c2ce Update Nvidia example to use llama-3.3-70b-instruct model 2026-03-02 10:16:27 -08:00
Mark Backman
75669b12a2 Convert observer data models to Pydantic BaseModel with timestamps
Switch ProcessorStartupTiming, StartupTimingReport, and
TransportTimingReport from dataclasses to Pydantic BaseModel. Add
start_time (Unix timestamp) fields and wall clock conversion for
monotonic observer timestamps.
2026-03-02 13:10:09 -05:00
Mark Backman
68e8732e72 Add BotConnectedFrame and on_transport_timing_report event
Add BotConnectedFrame (SystemFrame) pushed by SFU transports (Daily,
LiveKit, HeyGen, Tavus) when the bot joins the room. Replace the
on_transport_readiness_measured event with on_transport_timing_report
which includes both bot_connected_secs and client_connected_secs.
2026-03-02 13:10:09 -05:00
Mark Backman
de87894778 Update changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
0836066898 Add ClientConnectedFrame and transport readiness timing
Introduce ClientConnectedFrame (SystemFrame) pushed by all transports
when a client connects. StartupTimingObserver uses this to measure
transport readiness — the time from StartFrame to first client
connection — via a new on_transport_readiness_measured event.
2026-03-02 13:10:09 -05:00
Mark Backman
58aa8e1ba5 Add changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
670e5000d2 Merge pull request #3893 from pipecat-ai/mb/fix-azure-error-propagation
Propagate Azure TTS/STT cancellation errors to the pipeline
2026-03-02 13:04:54 -05:00
Mark Backman
e6b9c5c4dc Propagate Azure TTS/STT cancellation errors to the pipeline
Azure TTS _handle_canceled was putting None (the normal completion
signal) into the audio queue for all cancellation reasons, so run_tts
treated errors identically to success—silently producing no audio.
Now error cancellations put an Exception marker in the queue, which
run_tts converts to an ErrorFrame.

Azure STT had no canceled event handler at all, so auth failures,
network errors, and rate-limit cancellations were invisible. Added
_on_handle_canceled which pushes an ErrorFrame upstream via push_error.

Fixes pipecat-ai/pipecat#3892
2026-03-02 12:36:08 -05:00
Mark Backman
c54232bdb4 Add StartupTimingObserver for measuring processor start() times
Tracks how long each processor start method takes during pipeline
startup by measuring StartFrame arrive/leave deltas. Emits a timing
report via the on_startup_timing_report event and auto-logs a summary.
Internal pipeline processors are excluded from reports by default.
2026-03-02 10:48:50 -05:00
Mark Backman
5a6a93e277 Merge pull request #3886 from dhruvladia-sarvam/add/user-agent
fix(sarvam): standardize STT/TTS User-Agent headers
2026-03-02 10:21:23 -05:00
dhruvladia-sarvam
f386722ef9 removing unnecessary logs 2026-03-02 20:38:39 +05:30
Mark Backman
7c07e090a4 Merge pull request #3891 from pipecat-ai/mb/fix-update-docs-oidc
Fix update-docs workflow OIDC failure with pull_request_target
2026-03-02 09:29:35 -05:00
filipi87
8b09f7bbb4 Upgrading Deepgram to version 6. 2026-03-02 11:22:33 -03:00
Mark Backman
07ba255073 Fix update-docs workflow OIDC failure with pull_request_target
The switch from pull_request to pull_request_target (for fork PR
secret access) broke claude-code-action default OIDC-based GitHub
App authentication. Pass github_token explicitly to bypass OIDC.
2026-03-02 09:20:24 -05:00
Mark Backman
eb7a4b7aee Merge pull request #3874 from pipecat-ai/mb/pr-3873
Changelog for PR 3873, docstrings change
2026-03-02 08:36:05 -05:00
Rupesh
ad74d19c6b Remove resampling warning log for consistency with rest of codebase 2026-03-02 13:24:00 +00:00
Rupesh
5e8d722bf2 Use soxr for high-quality audio resampling instead of numpy linear interpolation 2026-03-02 13:24:00 +00:00
Rupesh
a7f6db8436 Add changelog fragment for #3857 2026-03-02 13:24:00 +00:00
Rupesh
442ea6a97e Fix Smart Turn v3 producing incorrect predictions at non-16kHz sample rates
The Whisper-based ONNX model expects 16 kHz audio, but the
_predict_endpoint method had five hardcoded references to 16000 without
checking the actual pipeline sample rate. When running at 8 kHz (e.g.
Twilio telephony), audio was fed to the feature extractor at the wrong
rate, causing the model to perceive speech at 2x speed with shifted
formant frequencies and produce incorrect end-of-turn predictions.

Add automatic resampling via numpy interpolation before feature
extraction and replace all hardcoded sample rate values with a
_MODEL_SAMPLE_RATE constant. Also fix the WAV debug logger to write
files with the correct sample rate header.

Fixes #3844
2026-03-02 13:24:00 +00:00
Mark Backman
018ead8551 Changelog for PR 3873, docstrings change 2026-03-02 08:08:43 -05:00
Mark Backman
5e99aeedf5 Merge pull request #3888 from pipecat-ai/mb/fix-filter-incomplete-turns
Re-inject turn completion instructions after LLM context reset
2026-03-02 08:03:08 -05:00
Mark Backman
c579749d8a Merge pull request #3875 from pipecat-ai/mb/foundational-ex-updates
Miscellaneous foundational example updates
2026-03-02 08:02:51 -05:00
Mark Backman
094de42f0c Merge pull request #3879 from pipecat-ai/mb/fix-tracing-settings
Fix tracing to use ServiceSettings API instead of dict access
2026-03-02 08:01:45 -05:00
dhruvladia-sarvam
1242f1c10e changelog entry 2026-03-02 18:09:51 +05:30
dhruvladia-sarvam
55a641e258 fix(sarvam): standardize STT/TTS User-Agent headers 2026-03-02 18:09:51 +05:30
Gökmen Görgen
7575ea7e07 add changelog entries for Voice Focus 2.0 support and buffer resize fix in AICFilter. 2026-03-02 12:47:59 +01:00
Gökmen Görgen
d6aab6b52e simplify parameter setup in AICFilter.
ParameterFixedError is deprecated.
2026-03-02 12:24:52 +01:00
Gökmen Görgen
8ff3e21654 use new version of vf model. 2026-03-02 11:22:51 +01:00
Gökmen Görgen
ea59695551 don't use memoryview for concurrency safety.
Snapshot the blocks into immutable bytes and trim the buffer BEFORE any await, so no memoryview is
held across async yield points. Without this, a concurrent filter() or stop() call could try to
extend() or clear() the bytearray while a memoryview still exports it, raising "Existing exports
of data: object cannot be re-sized".
2026-03-02 10:55:25 +01:00
Gökmen Görgen
16c676a921 add a test for reproducing the user feedback first. 2026-03-02 10:34:50 +01:00
Mark Backman
91c46ffbf4 Re-inject turn completion instructions after LLM context reset
When filter_incomplete_user_turns is enabled and an LLMMessagesUpdateFrame
replaces the context via set_messages(), the turn completion instructions
system message was lost. This caused the LLM to stop emitting turn
completion markers. Re-inject the instructions after set_messages() to
fix this.
2026-03-01 16:37:07 -05:00
Mark Backman
024c62946f Merge pull request #3878 from pipecat-ai/mb/fix-update-docs-workflow-secrets
fix: use pull_request_target for docs workflow to access secrets from fork PRs
2026-03-01 14:53:53 -05:00
Mark Backman
9b969736f6 Merge pull request #3764 from kedar389/add-support-for-private-endpoint-azure-stt
feat: Add support for private endpoint in Azure STT
2026-03-01 14:50:34 -05:00
Radek Sedlák
6fc718947d Merge branch 'pipecat-ai:main' into add-support-for-private-endpoint-azure-stt 2026-03-01 17:55:45 +01:00
zack
cb7e612738 Remove test files and testing documentation from PR 2026-03-01 11:51:51 -05:00
zack
36b9c05730 Fix changelog entries to use proper markdown bullet format 2026-03-01 11:45:24 -05:00
zack
6968d83ccb Add changelog entries for PR #3856 2026-03-01 11:44:51 -05:00
zack
42f91a9056 Apply ruff formatting fixes 2026-03-01 11:44:37 -05:00
zack
5de495cc98 Use logger.warning instead of warnings.warn for deprecation message
- Makes deprecation warning visible in logs without needing Python warning flags
- Users will see the warning during normal operation
2026-03-01 11:39:00 -05:00
zack
d1cbc81108 Fix 07o example to use new min_turn_silence parameter name in docs and comments 2026-03-01 11:36:46 -05:00
zack
66fca7e382 Add backward compatibility for min_end_of_turn_silence_when_confident parameter
- Keep old parameter name for backward compatibility
- Add deprecation warning when old parameter is used
- Automatically migrate old parameter value to new min_turn_silence parameter
- Exclude deprecated parameter from WebSocket URL to avoid sending it to API
- New parameter takes precedence if both are set
2026-03-01 11:33:22 -05:00
zack
07ae4b8d38 Update AssemblyAI examples to use u3-rt-pro and improve 55d example
- Update 13d-assemblyai-transcription.py to explicitly use u3-rt-pro model
- Update 55d-update-settings-assemblyai-stt.py to demonstrate keyterms updates instead of language updates
- Add helpful logging to show before/after keyterms boosting effect
- Use difficult names (Xiomara, Saoirse, Krzystof) to demonstrate boosting effectiveness
2026-03-01 11:27:31 -05:00
zack
21a409e447 Update prompt warning and rename min_end_of_turn_silence_when_confident to min_turn_silence
- Add "beta feature" note to custom prompt warning
- Rename min_end_of_turn_silence_when_confident parameter to min_turn_silence across all AssemblyAI code
- Update documentation, examples, and test files to use new parameter name
2026-03-01 11:17:39 -05:00
Aleix Conchillo Flaqué
903dc6c1a9 Merge pull request #3883 from pipecat-ai/aleix/queue-frame-direction
Add direction parameter to PipelineTask.queue_frame() and queue_frames()
2026-03-01 06:04:28 -08:00
Mark Backman
dee94b3cb8 Merge pull request #3795 from omChauhanDev/fix/realtime-cancel-not-active
fix(realtime): handle response_cancel_not_active as non-fatal
2026-03-01 07:29:59 -05:00
Om Chauhan
ece4343839 changed log level to debug 2026-03-01 12:25:42 +05:30
Aleix Conchillo Flaqué
94a59de4e1 Add changelog for #3883 2026-02-28 17:28:44 -08:00
Aleix Conchillo Flaqué
f37fd39cdb Add optional direction parameter to PipelineTask.queue_frame() and queue_frames()
Allow pushing frames upstream through the pipeline by passing
FrameDirection.UPSTREAM. Downstream frames use the existing push queue,
while upstream frames are pushed directly from the pipeline sink.
2026-02-28 17:28:44 -08:00
Mark Backman
9d4955054c Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
in containment, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-02-27 22:41:40 -05:00
Mark Backman
6464230627 fix: use pull_request_target for docs workflow to access secrets from fork PRs
The update-docs workflow intermittently failed with "Input required and
not supplied: token" because pull_request events from fork PRs don't
have access to repository secrets. Switching to pull_request_target
runs the workflow in the base repo's context, ensuring secrets are
always available. This is safe since the workflow only runs on
already-merged PRs.
2026-02-27 22:22:35 -05:00
Mark Backman
950a8628dc Miscellaneous foundational example updates 2026-02-27 19:49:45 -05:00
Mark Backman
17205c1647 Merge pull request #3871 from rupesh-svg/fix/rtvi-processor-double-insert
Fix PipelineTask double-inserting RTVIProcessor with custom RTVIObserver
2026-02-27 19:34:46 -05:00
Mark Backman
2a776d0c1e Merge pull request #3873 from rimelabs/matt/rime/add_speedAlpha_param_to_arcana
[RimeTTS] Add `speedAlpha` parameter support to the `arcana` model
2026-02-27 19:27:56 -05:00
zack
d7ce1eedd9 Add foundational examples for AssemblyAI u3-rt-pro
- 07o-interruptible-assemblyai.py: Basic example using Pipecat VAD mode
- 07o-interruptible-assemblyai-stt.py: Advanced example using STT-controlled
  turn detection with comprehensive documentation on u3-rt-pro features
  (turn detection tuning, prompt-based enhancement, speaker diarization)
2026-02-27 17:58:18 -05:00
zack
ef00f27d53 Fix incorrect await on synchronous request_finalize() method
The request_finalize() method in STTService is synchronous (sets a flag),
but was being called with await in the VAD turn endpoint handling code.
This caused "object NoneType can't be used in 'await' expression" errors.

Also includes automatic formatting improvements from ruff.
2026-02-27 17:58:05 -05:00
Rupesh
56f2564ed1 Use local variable instead of instance variable for RTVI prepend decision
Replace _rtvi_external instance variable with a local prepend_rtvi flag
since it is only used during __init__ to decide whether to prepend the
RTVIProcessor to the pipeline.
2026-02-27 14:45:37 -08:00
macaki
000d38e253 [Rime] Both mist and arcana now support the speedAlpha parameter. 2026-02-27 15:17:23 -07:00
Filipi da Silva Fuchter
36edef489e Merge pull request #3863 from pipecat-ai/filipi/manual_summarization
Manual context summarization
2026-02-27 16:46:37 -05:00
filipi87
d077a810ae Fixing context summarization tests 2026-02-27 18:42:50 -03:00
filipi87
0839e3813f Refactoring the examples to use the new context summarization classes. 2026-02-27 18:42:39 -03:00
filipi87
69414e8a5a Added example 54b-context-summarization-manual-openai.py demonstrating on-demand summarization triggered via a function call tool. 2026-02-27 18:42:23 -03:00
filipi87
dfd0a515f3 Changelog entries for the context summarization improvements. 2026-02-27 18:42:13 -03:00
filipi87
ed7f0a2c08 Adding support for on-demand summarization 2026-02-27 18:41:55 -03:00
filipi87
08d93ce9b6 Renamed LLMAssistantAggregatorParams fields for clarity. 2026-02-27 18:41:17 -03:00
filipi87
f11d4b6944 Refactored LLMContextSummarizationConfig into two focused classes, LLMContextSummaryConfig and LLMAutoContextSummarizationConfig. 2026-02-27 18:40:41 -03:00
filipi87
51a3310e78 Added LLMSummarizeContextFrame: push this frame anywhere in the pipeline to trigger on-demand context summarization (e.g. from a function call tool). 2026-02-27 18:39:57 -03:00
Rupesh
6f33aff0c6 Fix PipelineTask double-inserting RTVIProcessor when custom RTVIObserver is provided
When the user places an RTVIProcessor inside their pipeline and provides
a custom RTVIObserver subclass in observers, PipelineTask correctly
detects both and logs "skipping default ones." However it then
unconditionally prepends self._rtvi to the pipeline, causing the
processor to appear twice in the frame chain.

Track whether the RTVIProcessor was found externally (inside the user
pipeline) vs created internally. Only prepend it when created internally.

Fixes #3867
2026-02-27 13:29:01 -08:00
zack
45532a9478 Remove info logs and unused import per PR feedback
- Remove unused Mapping import
- Remove info logs at initialization (connection params)
- Remove info logs in _handle_transcription (transcript details, text sent to LLM)
- Remove info logs in _build_ws_url (WebSocket URL and params)
- Keep debug logs (less verbose, appropriate for development)
2026-02-27 16:15:49 -05:00
Mark Backman
4eb993c980 Merge pull request #3868 from wollerman/wollerman/numba-version-pin-update
fix: Update numba version pin from == to >=0.61.2
2026-02-27 16:04:20 -05:00
Mark Backman
83e29eb478 Merge pull request #3855 from pipecat-ai/mb/context-summarization-improvements
Improve context summarization with dedicated LLM, timeout, and observability
2026-02-27 15:24:38 -05:00
zack
6ba9f780b0 Remove unnecessary SpeechStarted fallback in STT mode
u3-rt-pro guarantees SpeechStarted is always sent before transcripts,
so the fallback UserStartedSpeakingFrame broadcast is never needed.

This ensures clean pairing of UserStarted/StoppedSpeakingFrame:
- Start: Always from _handle_speech_started
- Stop: Always from _handle_transcription on final turn
2026-02-27 15:00:38 -05:00
zack
aa7e9a17d5 Fix finalization pattern: Use request/confirm in Pipecat mode, finalized flag in STT mode
- Add request_finalize() before sending ForceEndpoint in Pipecat mode
- Keep confirm_finalize() when receiving formatted finals in Pipecat mode
- Remove confirm_finalize() from STT mode (use finalized=True instead)

This follows Pipecat's two-step finalization pattern where request_finalize()
is called when sending a finalize request to the STT service, and
confirm_finalize() is called when receiving confirmation back.
2026-02-27 14:55:22 -05:00
Matt
acff172bf2 create changelog entry 2026-02-27 14:52:37 -05:00
Mark Backman
9747e8da4a Merge pull request #3866 from pipecat-ai/mb/fix-docs-workflow-version
Fix docs workflow to add auto-docs label
2026-02-27 13:09:36 -05:00
Mark Backman
8fc63352d9 Merge pull request #3865 from pipecat-ai/mb/elevenlabs-realtime-stt-finalized
Set finalized flag on ElevenLabs Realtime STT for manual commit strategy
2026-02-27 13:09:17 -05:00
Matt
6ebfea4746 update numba version pin to >= 2026-02-27 12:44:31 -05:00
Mark Backman
f74af9b9c7 Always apply a timeout to summarization LLM calls
Even when summarization_timeout is explicitly set to None, use a
DEFAULT_SUMMARIZATION_TIMEOUT (120s) fallback so the LLM call can
never hang indefinitely. Applied in both LLMService and the dedicated
LLM path in LLMContextSummarizer.
2026-02-27 12:09:00 -05:00
Mark Backman
82c249608f Move dedicated LLM summarization into LLMContextSummarizer
The dedicated LLM logic lived in LLMAssistantAggregator, creating two
code paths and requiring the aggregator to call a private LLMService
method. Move it into the summarizer which already owns the config and
summarization lifecycle, keeping the aggregator handler as a single-line
upstream push.
2026-02-27 12:09:00 -05:00
Mark Backman
98e737b4e9 Add tests for context summarization improvements
Cover summary message role, template, on_summary_applied event,
summarization timeout, and dedicated LLM routing/error handling.
2026-02-27 12:08:43 -05:00
Mark Backman
ec9ddb3199 Add changelog entries for context summarization improvements (#3855) 2026-02-27 12:07:34 -05:00
Mark Backman
712305c5b1 Add example 54c showing custom context summarization 2026-02-27 12:07:34 -05:00
Mark Backman
be8ea818c8 Add on_summary_applied event for observability
Emits a SummaryAppliedEvent after context summarization completes,
  providing message counts so applications can track compression
  metrics.
2026-02-27 12:07:34 -05:00
Mark Backman
50710e9c3f Add summarization timeout to prevent hung LLM calls
Adds a configurable summarization_timeout (default 120s) that cancels
  summary generation if the LLM hangs. On timeout, an error result is
  returned so _summarization_in_progress resets and future
  summarizations are unblocked.
2026-02-27 12:07:34 -05:00
Mark Backman
a489bfaf00 Add optional dedicated LLM for context summarization
Adds an  field to LLMContextSummarizationConfig that allows
  routing summarization to a separate LLM service (e.g., Gemini Flash)
  instead of the pipeline's primary model. This avoids paying for
  expensive inference when compressing context in long-running sessions.
2026-02-27 12:07:34 -05:00
Mark Backman
945a523eed Add configurable summary_message_template to LLMContextSummarizationConfig
Allows applications to customize how the summary is wrapped when
  injected into context (e.g., XML tags, custom delimiters) so system
  prompts can distinguish summaries from live conversation.
2026-02-27 12:07:34 -05:00
Mark Backman
790c434a08 Update summary message role: use user instead of assistant
The context summary is information provided to the assistant, not
  something the assistant said.
2026-02-27 12:07:34 -05:00
Filipi da Silva Fuchter
db40a354be Merge pull request #3794 from omChauhanDev/fix/context-summarization-llm-specific-message
skipping provider-specific messages during summarization
2026-02-27 10:57:34 -05:00
filipi87
aa6d3b38b3 Add explanatory comments for LLMSpecificMessage guards in context summarization, amd fixed the missing guard in LLMContextSummarizer._apply_summary when searching for the first system message. 2026-02-27 12:53:25 -03:00
Mark Backman
41d6470e4a Fix docs workflow: add auto-docs label, remove version info 2026-02-27 10:39:37 -05:00
Mark Backman
601822e3e5 Add changelog for PR #3865 2026-02-27 10:25:48 -05:00
Mark Backman
3a32d91c66 Set finalized flag on ElevenLabs Realtime STT transcriptions for manual commit strategy 2026-02-27 10:21:10 -05:00
Filipi da Silva Fuchter
35b3803ebc Merge pull request #3845 from pipecat-ai/filipi/fix_tts_speak_frame
Add TTSSpeakFrame.push_assistant_aggregation to force context flush after TTS.
2026-02-27 09:59:33 -05:00
filipi87
3b427a47b6 Fixing Piper test. 2026-02-27 11:57:11 -03:00
filipi87
d701c3427c Changelog entry for the TTSSpeakFrame fix. 2026-02-27 11:57:03 -03:00
filipi87
1f45e80f9d Updated the 52-live-translation.py example to demonstrate the fix 2026-02-27 11:56:52 -03:00
filipi87
bc6f8e51de Fixed TTSSpeakFrame not automatically committing spoken text to the conversation context when used outside of an LLM response (e.g., for bot greeting messages or injected speech) 2026-02-27 11:56:44 -03:00
filipi87
deba2515f9 Added a new LLMAssistantPushAggregationFrame control frame that signals LLMAssistantAggregator to immediately flush its text buffer to the conversation context 2026-02-27 11:56:36 -03:00
Mark Backman
127b52bad5 Merge pull request #3852 from pipecat-ai/mb/deprecate-processing-metrics
Deprecate processing metrics (ProcessingMetricsData)
2026-02-27 09:50:29 -05:00
Mark Backman
0697f72dae Merge pull request #3864 from pipecat-ai/mb/auto-docs-update
Add automated docs update workflow
2026-02-27 09:36:27 -05:00
Mark Backman
c259a6a73b Deprecate processing metrics (ProcessingMetricsData)
Add deprecation warnings to start_processing_metrics() and
stop_processing_metrics() on FrameProcessorMetrics and FrameProcessor.
Mark ProcessingMetricsData as deprecated in docstring. All existing
behavior is preserved — the warnings inform users that these will be
removed in a future version.
2026-02-27 09:22:29 -05:00
Mark Backman
3e04f5d05f Add GitHub Actions workflow to auto-update docs on PR merge
Runs Claude Code Action after PRs merge to main when source files
in services/transports/serializers/processors/audio/turns/observers/pipeline
are changed. Creates a docs PR on pipecat-ai/docs with targeted edits
following the existing update-docs skill instructions.
2026-02-27 09:18:15 -05:00
zack
cd07937c5d Fix missing imports: Add UserStartedSpeakingFrame and UserStoppedSpeakingFrame 2026-02-26 22:18:02 -05:00
zack
72934bd8ae Add u3-rt-pro support and improvements to AssemblyAI STT service
- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
2026-02-26 22:04:21 -05:00
Mark Backman
2a6a993869 Merge pull request #3850 from rupesh-svg/fix/genesys-remove-audio-chunk-logging
Remove verbose audio chunk logging from GenesysAudioHookSerializer
2026-02-26 21:52:54 -05:00
Rupesh
bbaa79fef0 Add changelog for PR #3850 2026-02-26 14:00:34 -08:00
Rupesh
fff9db0d8f Remove verbose audio chunk logging from GenesysAudioHookSerializer
Fixes #3777
2026-02-26 13:51:05 -08:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception—where a service directly still needs to directly call `self._sync_model_name_to_metrics()`—is when the model name need to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
0fe4c732b7 Merge pull request #3837 from alts/alts/append-trailing-space
Add `append_trailing_space` to all Rime websocket services
2026-02-25 13:35:07 -05:00
Stephen Altamirano
ceead60ef2 Add append_trailing_space to all Rime websocket services
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
2026-02-25 10:02:38 -08:00
Mark Backman
e028194dbe Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0 2026-02-25 12:23:13 -05:00
Mark Backman
81f4672535 Add Performance as a changelog fragment option 2026-02-25 09:47:42 -05:00
Mark Backman
9273b158ea Merge pull request #3825 from pipecat-ai/mb/llm-user-aggregator-interim-transcription
Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
2026-02-25 09:06:34 -05:00
Mark Backman
353a28842c Merge pull request #3807 from pipecat-ai/mb/update-openai-realtime-1.5
Update OpenAI Realtime default model to gpt-realtime-1.5
2026-02-25 09:06:19 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
Filipi da Silva Fuchter
2f60074da3 Merge pull request #3814 from pipecat-ai/filipi/fix_close_context
Fixed an issue where the TTS providers did not close the context after the audio context finished processing all audio.
2026-02-25 08:21:04 -05:00
filipi87
751b1b8100 Adding the changelog entries for the tts fixes. 2026-02-25 10:18:25 -03:00
filipi87
d899f0af11 Refactored all AudioContextTTSService based providers to override the new callbacks instead of _handle_interruption(), making provider-specific cleanup cleaner and more explicit 2026-02-25 10:18:16 -03:00
filipi87
c09ae6ba6d Added two new lifecycle callbacks to AudioContextTTSService: on_audio_context_interrupted() and on_audio_context_completed() 2026-02-25 10:17:54 -03:00
Mark Backman
a187a4b3b2 Merge pull request #3830 from pipecat-ai/aleix/restore-dev-skills 2026-02-25 06:33:16 -05:00
Aleix Conchillo Flaqué
68e19a730b Restore dev skills and add marketplace for maintainer workflows
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
2026-02-24 23:47:06 -08:00
Mark Backman
67cb7d575f Merge pull request #3828 from pipecat-ai/mb/skip-empty-audio-filter-frames
Skip empty audio frames after filter buffering
2026-02-24 23:27:22 -05:00
Mark Backman
a84930dc3e Skip empty audio frames after filter buffering
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.

Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.

Fixes #3517
2026-02-24 23:21:52 -05:00
kompfner
54fd73c460 Merge pull request #3821 from pipecat-ai/pk/fix-missing-field-warning-rime-tts
Fix missing field warning in `RimeTTSService`
2026-02-24 21:58:17 -05:00
kompfner
c954e1c898 Merge pull request #3820 from pipecat-ai/pk/fix-breakage-when-sending-generic-settings-update
Fix breakage when using a generic settings update (e.g. a `TTSSetting…
2026-02-24 21:58:05 -05:00
kompfner
db76cd052a Merge pull request #3819 from pipecat-ai/pk/make-update-settings-frames-uninterruptible
Make `ServiceUpdateSettingsFrame` uninterruptible—settings updates ar…
2026-02-24 21:57:41 -05:00
Mark Backman
167e68672b Add changelog for InterimTranscriptionFrame/TranslationFrame fix 2026-02-24 20:52:16 -05:00
Mark Backman
69d916ca51 Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
These frames were falling through to the else branch and being pushed
downstream, unlike TranscriptionFrame which is explicitly consumed.
This aligns with how the assistant aggregator already filters them.
2026-02-24 20:51:41 -05:00
Mark Backman
9890b93d08 Merge pull request #3822 from pipecat-ai/fix/stt-ttfb-timeout-timestamp
Fix STT TTFB timeout measuring to use transcript arrival time
2026-02-24 19:45:15 -05:00
Mark Backman
f928206b3a Add changelog for STT TTFB timeout fix 2026-02-24 19:02:40 -05:00
Mark Backman
f421ad9cf6 Fix STT TTFB timeout measuring to timeout expiry instead of transcript time
PR #3776 replaced manual timestamp tracking with stop_ttfb_metrics() in
the timeout handler, but without an end_time it uses time.time() at
timeout expiry—producing TTFB = timeout + stop_secs (~2.2s) instead of
the actual transcript latency.

Restore _last_transcript_time tracking so the timeout handler measures
to when the transcript arrived, and skip reporting if none arrived.
2026-02-24 18:57:38 -05:00
Paul Kompfner
d918a20b75 Fix missing field warning in RimeTTSService 2026-02-24 18:14:16 -05:00
Paul Kompfner
d91c230b85 Fix breakage when using a generic settings update (e.g. a TTSSettings) instead of a more specific one (e.g. a RimeTTSSettings). Both should work, assuming you're only changing fields present in the generic settings. 2026-02-24 18:05:27 -05:00
Paul Kompfner
b6f21ab15d Make ServiceUpdateSettingsFrame uninterruptible—settings updates are generally independent of specific utterances.
Before this change, settings updates were often not applied. For example, a `TTSUpdateSettingsFrame` queued while the bot was speaking would only have an effect at the end of the bot's reply, and any interruption before the end of the reply would "cancel" the update.
2026-02-24 17:47:53 -05:00
kompfner
6f0061ab96 Merge pull request #3812 from pipecat-ai/pk/service-settings-storage-v-delta-mode
Make clearer the distinction between "storage-mode" and "delta-mode" …
2026-02-24 15:37:49 -05:00
Aleix Conchillo Flaqué
761397d1f9 Merge pull request #3816 from pipecat-ai/aleix/use-skills-repo
Move skills to pipecat-ai/skills repo, add README instructions
2026-02-24 12:11:20 -08:00
Aleix Conchillo Flaqué
d9bb4d07c6 Merge pull request #3815 from pipecat-ai/fix/sentry-metrics-signatures
Fix SentryMetrics method signatures to match base class
2026-02-24 12:11:07 -08:00
Aleix Conchillo Flaqué
ee46cbce4c Move skills to pipecat-ai/skills repo, add README instructions
Remove bundled Claude Code skills (changelog, cleanup, code-review,
docstring, pr-description, pr-submit) that now live in
https://github.com/pipecat-ai/skills. Add a section to the README
with installation instructions. The update-docs skill remains as
it is specific to this repository.
2026-02-24 11:41:19 -08:00
Aleix Conchillo Flaqué
b4b9976b9c Fix SentryMetrics method signatures to match base class
Update start_ttfb_metrics, stop_ttfb_metrics, start_processing_metrics,
and stop_processing_metrics to accept start_time/end_time keyword
arguments matching the updated FrameProcessorMetrics signatures.

Closes #3808
2026-02-24 11:26:34 -08:00
Paul Kompfner
b78a293ffb Flatten input_params into individual fields on SonioxSTTSettings and GladiaSTTSettings
This makes each service-specific field individually visible to the delta/update mechanism (`apply_update`, `given_fields`) and removes the need for complex sync logic between `input_params` and top-level fields like `model`.

- Soniox: replace `input_params: SonioxInputParams` with 8 individual fields, simplify `_update_settings` by removing model sync logic, remove unused `is_given` import
- Gladia: replace `input_params: GladiaInputParams` with 11 individual fields, resolve deprecated `language` → `language_config` at init time rather than at `_prepare_settings` time
2026-02-24 14:01:43 -05:00
Paul Kompfner
0a89d24f70 Update some more services to ensure that there are no un-initialized fields in self._settings 2026-02-24 14:01:43 -05:00
Paul Kompfner
8c9ccf8f82 Bump various deprecation messages from mentioning version 0.0.103 to 0.0.104 2026-02-24 14:01:43 -05:00
Paul Kompfner
bcc2b4def4 Make clearer the distinction between "storage-mode" and "delta-mode" usage of *Settings objects
- Storage mode: for use in `self._settings`. All fields should be specified, i.e. should not be `NOT_GIVEN`.
- Delta mode: for use in `*UpdateSettingsFrame`.

In service of this, this commit:
- Adds a runtime check that all fields are specified in storage mode
- Updates all services to specify all fields in stored settings
- Updates all services to no longer check for `is_given` in stored settings (not necessary anymore)
- Updates relevant docstrings
- Renames `update` to `delta` in `*UpdateSettingsFrame`
- Updates community integrations guide
2026-02-24 14:01:28 -05:00
Filipi da Silva Fuchter
57d25c564c Merge pull request #3786 from pipecat-ai/filipi/refactor_word_tts_service
Refactoring the services using the WordTTSService
2026-02-24 13:53:58 -05:00
filipi87
6cda2ff941 Changelog entry for word timestamp refactor and deprecation notes. 2026-02-24 15:49:02 -03:00
filipi87
323477bfa4 Refactoring the services using the WordTTSService. 2026-02-24 15:48:46 -03:00
Mark Backman
fa692ec989 Merge pull request #3813 from pipecat-ai/mb/fix-stt-ttfb
Fix STT TTFB metrics for Soniox and AWS Transcribe
2026-02-24 13:12:32 -05:00
Mark Backman
23ad181515 Fix Soniox processing metrics to measure token-to-transcript time
Move start_processing_metrics from run_stt (called per audio chunk,
producing noisy 0ms logs) to _receive_messages when the first final
token arrives for a new utterance. The existing stop_processing_metrics
in send_endpoint_transcript completes the pair, giving a meaningful
measurement of time from first recognition to finalized transcript.
2026-02-24 13:09:29 -05:00
Mark Backman
6f7664846c Add can_generate_metrics to Soniox and AWS Transcribe STT services
Commit 859cd7c9 refactored STT TTFB measurement to use the base class
start_ttfb_metrics/stop_ttfb_metrics, which are gated behind
can_generate_metrics(). Soniox and AWS Transcribe never overrode this
method (default returns False), so TTFB was silently never reported.
2026-02-24 12:59:44 -05:00
Mark Backman
081aaa50dc Merge pull request #3811 from pipecat-ai/mb/nltk-upgrade
Bump nltk minimum version from 3.9.1 to 3.9.3
2026-02-24 10:28:32 -05:00
Mark Backman
aff8ab8a40 Update OpenAI Realtime default model to gpt-realtime-1.5 2026-02-24 09:07:31 -05:00
Mark Backman
0f7e6e14ab Bump nltk minimum version from 3.9.1 to 3.9.3
Resolves a security vulnerability flagged by Dependabot (#164).
2026-02-24 08:56:00 -05:00
Mark Backman
65f563ad34 Add debug logging to KrispVivaTurn analyze_end_of_turn and update example
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.
2026-02-23 21:35:35 -05:00
Mark Backman
9c2ac661a3 Merge pull request #3805 from pipecat-ai/mb/dataclass-basemodel
Add dataclass vs Pydantic BaseModel convention to CLAUDE.md
2026-02-23 19:32:31 -05:00
kompfner
cdd65b6c0a Merge pull request #3714 from pipecat-ai/pk/service-settings-refactor
Broad refactor of service settings and how they’re updated at runtime
2026-02-23 17:15:15 -05:00
Paul Kompfner
71fc078c24 Refine ServiceSettings docstring: clarify NOT_GIVEN semantics and fix frame reference
Use wildcard `*UpdateSettingsFrame` to cover all frame types. Clarify that NOT_GIVEN only appears in update deltas, not in the service's current settings state.
2026-02-23 16:55:11 -05:00
Paul Kompfner
7556427862 Revise changelog entries for service settings refactor
Split the single "changed" entry into separate "added", "changed", and "deprecated" entries for clarity. Add a note about the subtle behavior change in the deprecated set_model/set_voice/set_language methods.
2026-02-23 16:52:11 -05:00
Paul Kompfner
bcf11ecbd4 Looks like the Deepgram Sagemaker TTS services aren't able yet to successfully disconnect/reconnect to apply runtime settings updates. For now, marking them as not yet supporting runtime settings updates. 2026-02-23 16:02:00 -05:00
Paul Kompfner
ff174dd1c2 Fix STT/TTS Deepgram Sagemaker 55-series examples (examples updating settings at runtime) 2026-02-23 16:02:00 -05:00
Paul Kompfner
e804060e17 Update COMMUNITY_INTEGRATIONS.md _update_settings examples
Simplify the reconnect example to show a common pattern (reconnect on any change) and improve the _warn_unhandled_updated_settings example to show selective handling of specific fields.
2026-02-23 15:45:00 -05:00
Paul Kompfner
30db5fea7c Clarify that ServiceSettings and subclasses represent runtime-updatable settings
Update docstrings for ServiceSettings, LLMSettings, TTSSettings, and STTSettings to make clear these capture only the subset of service configuration that can be changed while the pipeline is running via UpdateSettingsFrame, not all constructor parameters.
2026-02-23 15:38:57 -05:00
Mark Backman
c527e1f30f Add dataclass vs Pydantic BaseModel rule to CLAUDE.md
Document the existing convention: use @dataclass for frames and
internal pipeline data, use Pydantic BaseModel for configuration,
parameters, metrics, and external API data.
2026-02-23 14:26:16 -05:00
Paul Kompfner
029f3dbefb Updating 55o ElevenLabsTTSService example to also exercise switching voices, which requires reconnect 2026-02-23 12:08:13 -05:00
kompfner
03cb0054f9 Merge branch 'main' into pk/service-settings-refactor 2026-02-23 11:46:03 -05:00
Mark Backman
6a7e9358c6 Merge pull request #3803 from pipecat-ai/mb/inline-smart-turn-v3-deps
Inline local-smart-turn-v3 deps for Poetry compatibility
2026-02-23 09:29:52 -05:00
Mark Backman
6a3718d33d Inline local-smart-turn-v3 deps for Poetry compatibility
Replace self-referential `pipecat-ai[local-smart-turn-v3]` extra in core
dependencies with the actual packages (`transformers`, `onnxruntime`).
Self-referential extras are not supported by Poetry and cause dependency
resolution failures. Since these are required by the default turn stop
strategy (LocalSmartTurnAnalyzerV3), they belong in core dependencies.

- Remove `local-smart-turn-v3` optional extra from pyproject.toml
- Remove try/except ModuleNotFoundError guard (now always installed)
- Remove `--extra local-smart-turn-v3` from CI workflows
2026-02-23 09:00:36 -05:00
Om Chauhan
b390dc369c added changelog 2026-02-21 18:33:29 +05:30
Om Chauhan
a18aa738e0 fix(realtime): handle response_cancel_not_active as non-fatal 2026-02-21 18:26:31 +05:30
Om Chauhan
9476b5d184 added changelog 2026-02-21 17:35:08 +05:30
Om Chauhan
f49658de15 skipping provider-specific messages during summarization 2026-02-21 17:19:50 +05:30
Aleix Conchillo Flaqué
b67af19d47 Merge pull request #3793 from pipecat-ai/changelog-0.0.103
Release 0.0.103 - Changelog Update
2026-02-20 16:42:46 -08:00
aconchillo
6d9c07b945 Update changelog for version 0.0.103 2026-02-20 16:39:36 -08:00
Aleix Conchillo Flaqué
18429f80f1 github(changelog): allow performance type 2026-02-20 16:32:40 -08:00
Aleix Conchillo Flaqué
0a54dc9721 Merge pull request #3792 from pipecat-ai/aleix/update-anthropic-default-model
Update default Anthropic model to claude-sonnet-4-6
2026-02-20 16:28:08 -08:00
Aleix Conchillo Flaqué
521f669051 Add changelog entries for PR #3792 2026-02-20 16:18:21 -08:00
Aleix Conchillo Flaqué
abb20f34ba Update default Anthropic model to claude-sonnet-4-6
Update the default model in AnthropicLLMService and remove the
now-unnecessary explicit model from the function calling example.
2026-02-20 16:17:51 -08:00
Joshua Primas
d38b1d97d4 Added changelog 2026-02-20 16:13:44 -08:00
Joshua Primas
0b4568843b Improved logging + error handling + pipecat bot name usage 2026-02-20 15:59:52 -08:00
Joshua Primas
35aba4128c Adding the LemonSlice transport integration 2026-02-20 15:24:48 -08:00
Aleix Conchillo Flaqué
b1e72ad4b7 Merge pull request #3789 from pipecat-ai/aleix/fix-missing-await-and-interruption-hang
Fix missing await and interruption timeout hang
2026-02-20 14:59:11 -08:00
Aleix Conchillo Flaqué
f610fb95f9 Add changelog entries for PR #3789 2026-02-20 14:56:46 -08:00
Aleix Conchillo Flaqué
827032fefb Unblock push_interruption_task_frame_and_wait after timeout
When the InterruptionFrame does not complete within the timeout the
caller was stuck in an infinite loop logging warnings. Now the event
is set after the first timeout so the processor can continue.

Also adds a keyword timeout parameter so callers can customize the
wait duration.
2026-02-20 14:56:42 -08:00
Aleix Conchillo Flaqué
af4ef95dc6 Fix missing await on add_audio_frames_message in Google audio examples
The method is async but was being called without await, silently
discarding the coroutine.
2026-02-20 14:24:22 -08:00
Aleix Conchillo Flaqué
0370bb15e4 update uv.lock 2026-02-20 13:47:04 -08:00
Aleix Conchillo Flaqué
2b3595485f Merge pull request #3788 from dhruvladia-sarvam/v3-fix-final
initial
2026-02-20 13:46:18 -08:00
Paul Kompfner
af4226adbf Add changelog entries for service settings refactor PR #3714 2026-02-20 15:26:17 -05:00
Paul Kompfner
29e2a861dc Update AIService.set_model_name to AIService._sync_model_name_to_metrics to:
- indicate clearly that it's not meant for public use
- make it clear the `self._settings` is the single source of truth for model information
- set the stage for an upcoming change where `AIService` subclasses won't have to ever worry about explicitly calling an `AIService` method to sync model name to metrics

Across all services, switch from accessing `self._model_name` or `self.model_name` in favor of `self._settings.model`.
2026-02-20 15:02:50 -05:00
Filipi da Silva Fuchter
63c664becb Merge pull request #3787 from pipecat-ai/filipi/refresh_active_audio_context
Fix race condition where context times out after sending second transcript
2026-02-20 14:50:38 -05:00
dhruvladia-sarvam
fecf462139 initial 2026-02-21 01:02:37 +05:30
Daksh Dua
023063759a Changelog entry for TTS race condition fix. 2026-02-20 16:00:34 -03:00
Daksh Dua
c49eda98e7 Fix race condition where context times out after sending second transcript 2026-02-20 15:37:14 -03:00
Filipi da Silva Fuchter
5d07326e36 Merge pull request #3732 from pipecat-ai/filipi/tts_updates
Refactored audio context management in TTS services
2026-02-20 13:02:42 -05:00
filipi87
fa659311b6 Changelog entry 2026-02-20 14:57:59 -03:00
filipi87
125c423356 Refactored audio context management in TTS services to improve encapsulation and reduce code duplication 2026-02-20 14:57:44 -03:00
Filipi da Silva Fuchter
c9615c8db6 Merge pull request #3779 from pipecat-ai/filipi/filter_observer
Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer.
2026-02-20 12:42:02 -05:00
Aleix Conchillo Flaqué
28c542f6ed Merge pull request #3785 from pipecat-ai/mb/deepgram-sagemaker-tts
Add DeepgramSageMakerTTSService
2026-02-20 09:01:32 -08:00
Paul Kompfner
f5b86d9cdc Actually, revert the change making it so that STTService takes model and language args at init time. It'll be up to the subclasses to append those to _settings (or better yet, provide their own service-specific _settings). This avoids rocking the boat too too much. 2026-02-20 11:26:28 -05:00
Aleix Conchillo Flaqué
5708c81b93 Merge pull request #3782 from pipecat-ai/aleix/fix-mutable-default-args-aggregator-pair
Fix mutable default arguments in LLMContextAggregatorPair
2026-02-20 08:02:18 -08:00
Paul Kompfner
f4e9825c03 Remove self._voice_id from TTS Service implementations in favor of self._settings.voice 2026-02-20 10:52:57 -05:00
Mark Backman
82ce3ea8de Update 07c example to use DeepgramSageMakerTTSService 2026-02-20 08:10:41 -07:00
Mark Backman
62ada92188 Add changelog for PR #3785 2026-02-20 08:09:57 -07:00
Mark Backman
273692421f Add DeepgramSageMakerTTSService for Deepgram TTS on AWS SageMaker
Adds a TTS service that connects to Deepgram models deployed on AWS
SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close) over the BiDi
client, with interruption handling and per-turn TTFB metrics.

Updates the example and env.example with separate STT/TTS endpoint names.
2026-02-20 08:08:00 -07:00
Paul Kompfner
5d8a5bf750 Add initialization of self._settings to service superclasses (STTService, TTSService, LLMService), using "generic" settings for those services (STTSettings, TTSSettings, LLMSettings) 2026-02-20 09:31:22 -05:00
Mark Backman
0a3e212f93 Merge pull request #3784 from pipecat-ai/mb/stt-sagemaker-finalize
Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService
2026-02-20 09:26:23 -05:00
Mark Backman
43d686c622 Add changelog entry for PR #3784 2026-02-20 07:17:36 -07:00
Mark Backman
4d136e1e28 Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService 2026-02-20 07:15:38 -07:00
Aleix Conchillo Flaqué
2024285c75 Add changelog entries for PR #3782 2026-02-19 20:52:31 -08:00
Aleix Conchillo Flaqué
bc830c16f1 Fix mutable default arguments in LLMContextAggregatorPair
Replace mutable default parameter values with None and instantiate
inside the method body to avoid shared state across calls.
2026-02-19 20:52:00 -08:00
Paul Kompfner
fb27642190 Add self._settings to 6 remaining services
- AWSNovaSonicLLMService: new `AWSNovaSonicLLMSettings` with `voice_id` and `endpointing_sensitivity`; remove `self._params` entirely, storing audio I/O config as plain instance variables
- NeuphonicHttpTTSService: reuse `NeuphonicTTSSettings`; use inherited `language` field instead of bespoke `lang_code`
- NvidiaTTSService: new `NvidiaTTSSettings` with `quality`
- PiperTTSService / PiperHttpTTSService: new `PiperTTSSettings` / `PiperHttpTTSSettings` (no extra fields)
- SpeechmaticsTTSService: new `SpeechmaticsTTSSettings` with `max_retries`

Also remove redundant `lang_code` from `NeuphonicTTSSettings` (both WS and HTTP services now use the inherited `TTSSettings.language` field, with automatic enum conversion via the base class).

HTTP services (Neuphonic HTTP, Piper HTTP, Speechmatics) don't override `_update_settings` since the base class applies changes to `self._settings` and subsequent requests read from it automatically.
2026-02-19 18:35:59 -05:00
Paul Kompfner
463ea3725b Update Deepgram Flux with the new service settings pattern 2026-02-19 17:12:24 -05:00
Paul Kompfner
6c609031ee Add more 55-series examples
Also:
- remove unnecessary pass-through `_update_settings` implementation in `FalSTTService`
- warn that `AsyncAITTSService` doesn't currently support runtime settings updates
- update how `GradiumTTSService._update_settings` checks for voice changes
- remove a couple of unnecessary args (because they specified defaults) in other examples
2026-02-19 16:46:14 -05:00
filipi87
18630c9478 Adding changelog entry for RTVI observer ignored_sources feature. 2026-02-19 18:41:05 -03:00
filipi87
3a8d3cc841 Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer. 2026-02-19 18:36:12 -03:00
Filipi da Silva Fuchter
2963c7589d Merge pull request #3774 from pipecat-ai/mb/broadcast-frames-rtvi-observer
Fix RTVIObserver missing upstream-only frames
2026-02-19 15:32:48 -05:00
filipi87
63caa403cb Improving RTVI doc description. 2026-02-19 17:31:25 -03:00
Paul Kompfner
ebb42a3c6d Fix forward reference crash in Google and Anthropic LLM ThinkingConfig
ThinkingConfig was defined as an inner class on the service but referenced in the Settings dataclass declared before the service class, causing a crash at import time. Move ThinkingConfig to a standalone class defined before Settings, and keep a class attribute alias for backward compatibility.
2026-02-19 15:06:48 -05:00
Paul Kompfner
cc54ff4708 Add more 55-series examples 2026-02-19 14:55:21 -05:00
Aleix Conchillo Flaqué
846cf0794d Merge pull request #3615 from omChauhanDev/fix/daily-transport-message-queue
fix(daily): queue outbound messages until transport joins
2026-02-19 11:55:11 -08:00
Aleix Conchillo Flaqué
498349c17e Merge pull request #3776 from pipecat-ai/aleix/stt-ttfb-metrics-refactor
Refactor STT TTFB metrics to use base class start/stop pattern
2026-02-19 11:46:46 -08:00
Aleix Conchillo Flaqué
474b27305f Merge pull request #3748 from pipecat-ai/mb/user-idle-configurable
Make UserIdleController always-on with dynamic timeout updates
2026-02-19 11:44:51 -08:00
Aleix Conchillo Flaqué
20509e8f96 Merge pull request #3744 from pipecat-ai/mb/user-idle-timeout-frame
Redesign UserIdleController to use BotStoppedSpeakingFrame
2026-02-19 11:34:42 -08:00
filipi87
5b2fa69bdc Renaming from broadcasted_sibling_id to broadcast_sibling_id 2026-02-19 16:24:07 -03:00
Aleix Conchillo Flaqué
4f8cacc769 Merge pull request #3747 from pipecat-ai/mb/update-comment-mute-strategy
Update comment in _maybe_mute_frame
2026-02-19 11:19:44 -08:00
Aleix Conchillo Flaqué
0145fb4ea0 Merge pull request #3763 from lukepayyapilli/fix/asyncgen-cleanup-uvloop-crash
Fix async generator cleanup to prevent uvloop crash on Python 3.12+
2026-02-19 11:14:00 -08:00
Aleix Conchillo Flaqué
8e52df7f03 Add changelog entries for PR #3776 2026-02-19 10:52:45 -08:00
Aleix Conchillo Flaqué
8ee99e37ff Merge pull request #3768 from tanmayc25/fix/tavus-sample-rate
fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport
2026-02-19 10:52:34 -08:00
Aleix Conchillo Flaqué
bae4211369 Update dependency lock file 2026-02-19 10:52:28 -08:00
Aleix Conchillo Flaqué
859cd7c920 Refactor STT TTFB metrics to use base class start/stop pattern
Eliminate custom _emit_stt_ttfb_metric and manual timestamp tracking in
STTService by reusing FrameProcessor's start_ttfb_metrics/stop_ttfb_metrics
with new start_time/end_time parameters. This keeps the chronological
start→stop ordering and removes _speech_end_time and _last_transcription_time
state from STTService.
2026-02-19 10:52:24 -08:00
filipi87
d608c400f9 Preventing the duplicated BotStartedSpeakingFrame and BotStoppedSpeakingFrame. 2026-02-19 15:49:22 -03:00
Aleix Conchillo Flaqué
94e93bed83 Merge pull request #3719 from pipecat-ai/aleix/sip-transfer-refer-frames
Add SIP transfer and SIP REFER frames to Daily transport
2026-02-19 10:09:13 -08:00
filipi87
b1cee140b9 Refactoring to use broadcasted_sibling_id instead of broadcasted field. 2026-02-19 15:06:50 -03:00
Aleix Conchillo Flaqué
352361bdd2 Update changelog skill to avoid line wrapping 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
baa61468a1 Add changelog entries for PR #3719 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
7501ba2e45 Undeprecate DailyUpdateRemoteParticipantsFrame
Remove the deprecation warning and __post_init__ override. Also fix the
default value for remote_participants to use field(default_factory=dict)
instead of None.
2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
200716e8fe Add SIP transfer and SIP REFER frames to Daily transport
Add write_transport_frame() hook to BaseOutputTransport so subclasses
can handle custom frame types that flow through the audio queue. Add
DailySIPTransferFrame and DailySIPReferFrame as DataFrame subclasses
that queue with audio, ensuring SIP operations execute only after the
bot finishes its current utterance. Override write_transport_frame in
DailyOutputTransport to dispatch these frames to the existing
sip_call_transfer() and sip_refer() client methods.

Also switch DailyOutputTransport.send_message error handling from
logger.error to push_error for consistency.
2026-02-19 09:20:33 -08:00
Paul Kompfner
421696e1c2 Replace Any with specific types and add | _NotGiven to all *Settings field annotations across 49 service files
Every `*Settings` dataclass field whose default is `NOT_GIVEN` now carries `_NotGiven` in its type union so the type system accurately reflects the three-state semantics (real value, `None` where applicable, or not-yet-specified). Fields previously typed as bare `Any`, `str`, `float`, `bool`, `list`, `dict`, or `Optional[X]` are now narrowed to the specific type from the corresponding `InputParams` Pydantic model.
2026-02-19 11:28:29 -05:00
Mark Backman
50ef4909e3 Add changelog entries for PR #3774 2026-02-19 07:44:52 -07:00
Mark Backman
63df4642b5 Fix RTVIObserver missing upstream-only frames by adding broadcasted flag
RTVIObserver previously filtered out all upstream frames to avoid
duplicate messages from broadcasted frames. This caused upstream-only
frames to be silently ignored. Instead, add a `broadcasted` field to
the Frame base class that is set by broadcast_frame() and
broadcast_frame_instance(), and only skip upstream copies of
broadcasted frames.
2026-02-19 07:43:20 -07:00
Filipi da Silva Fuchter
43869a499d Merge pull request #3773 from pipecat-ai/mb/fix-ci-apt-get-update
Fix CI: add apt-get update before installing system packages
2026-02-19 09:28:25 -05:00
Mark Backman
d2bf3952ec Merge pull request #3772 from simliai/main
Update SimliClient to latest
2026-02-19 09:13:14 -05:00
Mark Backman
92c380ee77 Add apt-get update before installing system packages in CI
The CI was failing because the runner's package index was stale,
causing a 404 when fetching libasound2-dev (a dependency of
portaudio19-dev). Running apt-get update first refreshes the index.
2026-02-19 07:01:07 -07:00
antonyesk601
a55ba40921 fix: remove misimport 2026-02-19 10:41:17 +00:00
antonyesk601
fb1bfd03dd update SimliClient to latest 2026-02-19 10:35:50 +00:00
Paul Kompfner
a7edd8e441 Fix 55zp example 2026-02-18 17:15:22 -05:00
Paul Kompfner
2a07138abf Fix Grok Realtime dynamic session properties updating, and update corresponding 55zo example 2026-02-18 17:12:36 -05:00
Filipi da Silva Fuchter
a0a7b3101d Merge pull request #3765 from ianbbqzy/ian/inworld-default-async
[inworld] default timestamp transport strategy to ASYNC
2026-02-18 16:59:01 -05:00
Filipi da Silva Fuchter
39dc4ba99c Updated changelog/3765.changed.md 2026-02-18 16:58:27 -05:00
Paul Kompfner
ad942f6e4c Update 55zn example (UIltravox dynamic settings updates) to exercise changing modality, which is a setting that supports dynamic updates 2026-02-18 16:33:05 -05:00
Paul Kompfner
97d34ef9e1 Update OpenAI Realtime to warn when you try to update settings that can't be updated dynamically.
Update corresponding example to demonstrate updating output modality.
2026-02-18 16:16:06 -05:00
Paul Kompfner
c054780477 Fix 55zh example 2026-02-18 15:59:34 -05:00
Paul Kompfner
88a2dbdb82 Update 55zf example to update a setting that is supported by the default Camb TTS model 2026-02-18 15:48:50 -05:00
Paul Kompfner
d386a0efda Update Sarvam TTS to apply all changes to settings, not just voic 2026-02-18 15:31:08 -05:00
Paul Kompfner
b718a23c17 Tweak 55zd example 2026-02-18 15:25:50 -05:00
Paul Kompfner
e38f7d9451 Fix 55zc example 2026-02-18 15:23:23 -05:00
Paul Kompfner
b00d454842 Fix Inworld TTS settings updating 2026-02-18 15:19:57 -05:00
Paul Kompfner
0fa51811ea Fix 55z example 2026-02-18 15:11:04 -05:00
Paul Kompfner
323ee00b83 Fix 55w example 2026-02-18 14:51:48 -05:00
Paul Kompfner
0c73b77327 Update Lmnt TTS to support updating settings dynamically 2026-02-18 14:47:38 -05:00
Paul Kompfner
416e1cf877 Update Rime TTS services to store voice in the standard settings.voice field, as opposed to the nonstandard speaker field 2026-02-18 14:46:47 -05:00
Paul Kompfner
b4c5cb258b Tweak 55r example to make the settings update more pronounced 2026-02-18 14:15:14 -05:00
Paul Kompfner
728a97ade3 Update Deepgram TTS to support updating settings dynamically 2026-02-18 14:11:51 -05:00
Paul Kompfner
28677ec829 Tweak 55p example to make the settings update more pronounced 2026-02-18 13:49:32 -05:00
Paul Kompfner
17886d14e8 Fix ElevenLabsTTSService settings update code 2026-02-18 13:47:02 -05:00
Paul Kompfner
caf5dacbe8 Update 55j example to avoid console warning 2026-02-18 12:37:50 -05:00
Paul Kompfner
b8b531b66a In Cartesia TTS service, we don't need to override _update_settings. Parent class handling is enough, as new settings are picked up on the next run_tts (no need to reconnect). 2026-02-18 12:37:34 -05:00
Paul Kompfner
a14690e3a0 Fix the 55i example 2026-02-18 11:55:14 -05:00
Paul Kompfner
d913d954db Fix SpeechmaticsSTTService settings update code, and augment test file to better exercise it 2026-02-18 11:34:52 -05:00
Paul Kompfner
e98bb1df66 Simplify 55* examples: inline the settings update directly in the on_client_connected handler instead of wrapping it in a separate async task 2026-02-18 11:06:33 -05:00
Paul Kompfner
a7ada79fd9 Fix ElevenLabsRealtimeSTTService:
- Move `CommitStrategy` up in the file so it could be used by `ElevenLabsRealtimeSTTSettings`
- Fix a bug where `run_tts` would erroneously try to reconnect if a reconnection was already in flight (like a reconnection triggered by `_update_settings`)
2026-02-18 10:50:53 -05:00
Filipi da Silva Fuchter
a5b5a8e5cf Merge pull request #3759 from pipecat-ai/mb/gradium-context-update
Switch Gradium TTS to AudioContextWordTTSService for multiplexing
2026-02-18 10:16:57 -05:00
filipi87
1daea78b91 Fix GradiumTTSService to reuse context IDs across multiple run_tts calls and prevent the parent class from pushing text frames. 2026-02-18 12:12:49 -03:00
Paul Kompfner
7910f20e14 Update comment in Azure TTS explaining how we could support dynamic settings updates in the future 2026-02-18 10:07:33 -05:00
Paul Kompfner
d7d94a29f0 Add foundational examples (55) for runtime settings updates via *UpdateSettingsFrame
42 examples covering STT (13), TTS (21), LLM (4), and realtime (4) services. Each demonstrates updating service settings 10 seconds after client connects, verifying the typed settings machinery end-to-end for every provider.
2026-02-18 09:46:23 -05:00
Tanmay Chaudhari
6066eec853 Add changelog for PR #3768 2026-02-18 14:31:16 +05:30
Tanmay Chaudhari
cd379671aa fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport 2026-02-18 14:18:16 +05:30
Ian Lee
8006223911 [inworld] default timestamp transport strategy to ASYNC 2026-02-17 15:13:20 -08:00
Paul Kompfner
ce51df677c Add backward-compat _aliases and from_mapping overrides to TTS settings
The migration from plain-dict `self._settings` to typed dataclasses renamed keys and flattened nested dicts. The deprecated dict-based `TTSUpdateSettingsFrame(settings={...})` code path calls `from_mapping`, which silently dropped old keys into `extra`.

- Add `_aliases` so renamed flat keys (e.g. `sample_rate` → `fish_sample_rate`, camelCase Inworld keys) resolve correctly.
- Override `from_mapping` to destructure nested dicts (`output_format`, `prosody`, `audioConfig`, `voice_setting`, `audio_setting`) into their flat field equivalents.
- Fix AsyncAI constructor bug passing `output_format={...}` dict instead of individual `output_container`/`output_encoding`/`output_sample_rate` fields.
2026-02-17 17:07:14 -05:00
Paul Kompfner
68ebd3d063 Migrate HumeTTSService to standard TTSSettings pattern and remove dead TTSService.update_setting
HumeTTSService now stores its params (description, speed, trailing_silence) in a proper `HumeTTSSettings` dataclass instead of a separate `_params` Pydantic model, making it work with `TTSUpdateSettingsFrame(update=...)`. The old `update_setting(key, value)` method is kept but deprecated.

Also removes the unused no-op `TTSService.update_setting` base method, which was never called by the `TTSUpdateSettingsFrame` pipeline.
2026-02-17 15:44:41 -05:00
Radek Sedlák
5ea2d47d39 feat: Add support for private endpoint in Azure STT 2026-02-17 21:42:00 +01:00
Paul Kompfner
94a651cee2 Remove dead ServiceSettings.to_dict method 2026-02-17 15:15:18 -05:00
Paul Kompfner
1cad4210ce Deprecate dict-based *UpdateSettingsFrame(settings={...}) code path in STT, TTS, and LLM services.
The dataclass-based API (`*UpdateSettingsFrame(update=*Settings(...))`) is the preferred path since 0.0.103. The dict path still works but now emits a `DeprecationWarning`.
2026-02-17 15:09:39 -05:00
Paul Kompfner
1cec8d119d Expand language field docstrings to clarify storage invariant.
The union type reflects the input side; after construction and `_update_settings`, the stored value is always a service-specific string.
2026-02-17 14:57:38 -05:00
Paul Kompfner
7dc16b1d92 Type language fields and centralize conversion in STT services.
Change `TTSSettings.language` and `STTSettings.language` from `Any` to `Language | str | _NotGiven`. Add `language_to_service_language` base method and centralized `isinstance`-guarded conversion in `STTService._update_settings` (mirroring TTS). Update the TTS guard from `is not None` to `isinstance(…, Language)` so raw strings pass through unchanged.

Remove now-redundant per-service language conversion from `_update_settings` overrides (ElevenLabs, Azure, Fal, Whisper). Add `language_to_service_language` to Azure STT so the centralized conversion picks it up. Fix AWS and NVIDIA STT `__init__` to convert language at construction time, then simplify their runtime accessors to read `_settings.language` directly.
2026-02-17 14:49:26 -05:00
Luke Payyapilli
247f0bbcd3 Fix async generator cleanup to prevent uvloop crash on Python 3.12+ 2026-02-17 13:10:31 -05:00
Paul Kompfner
d2372c127a Add specific type annotations to ServiceSettings fields, replacing Any with str, float, int unions as appropriate. 2026-02-17 11:56:37 -05:00
Paul Kompfner
3b1ba57452 Change apply_update / _update_settings return type from set[str] to dict[str, Any]. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns ("field" in changed, if changed, iteration) work unchanged; set-difference sites use changed.keys() - {...}. 2026-02-17 11:49:15 -05:00
Paul Kompfner
02c2778b8d Document _warn_unhandled_updated_settings pattern in COMMUNITY_INTEGRATIONS.md. 2026-02-17 11:08:26 -05:00
Paul Kompfner
fa6a6dabee Fix DeepgramSageMakerSTTService._update_settings live_options sync to match DeepgramSTTService pattern.
Add missing reverse sync (live_options → top-level model/language) and `set_model_name()` call.
2026-02-17 11:02:13 -05:00
Paul Kompfner
3a77b4c1d8 In services that don't handle runtime settings updates—or don't handle them for *all* available settings—log a warning about which fields specifically aren't handled. Revert new apply-settings-updates logic across various services, to reduce PR testing scope. This logic can be added service by service gradually as future work.
Note that for services that previously handled applying updates (through methods like `set_model` and `set_language`), we're keeping the update-applying logic (some or most of which is already well-tested) and expanding it to cover all relevant settings fields. Services under this bucket are:
- Deepgram STT
- Deepgram Sagemaker STT
- Elevenlabs STT
- Google STT
- Gradium STT
- OpenAI STT
- Speechmatics STT
2026-02-17 10:58:29 -05:00
Mark Backman
3537420d91 Merge pull request #3761 from speechmatics/fix/sdk-version 2026-02-17 08:02:00 -05:00
Sam Sykes
65fb88e61e chore: update version specifier for speechmatics-voice
Change the version specifier from `>=0.2.8` to
`~=0.2.8` for the `speechmatics-voice` package.
This ensures compatibility with future patch
versions while preventing potential breaking
changes from minor updates.
2026-02-17 09:58:17 +00:00
Sam Sykes
b345f48ac1 fix: update dependency specifier for speechmatics-voice
Change the version specifier from >=0.2.8 to ~=0.2.8 for the
speechmatics-voice package to ensure compatibility with future
patch versions.
2026-02-17 09:55:43 +00:00
Mark Backman
f181e12d8f Add changelog for PR #3759 2026-02-16 11:35:45 -07:00
Mark Backman
36de6003d0 Switch Gradium TTS to AudioContextWordTTSService for multiplexing
Use client_req_id-based multiplexing instead of disconnecting and
reconnecting the websocket on every interruption. This follows the
same pattern used by Cartesia, ElevenLabs, and other services via
AudioContextWordTTSService.

Key changes:
- Base class: InterruptibleWordTTSService -> AudioContextWordTTSService
- Add close_ws_on_eos: False to setup message to keep connection alive
- Add client_req_id to text, end_of_stream messages for demultiplexing
- Route audio via append_to_audio_context() instead of push_frame()
- Silently drop messages for cancelled/unknown contexts on interruption
- Add _handle_interruption() that resets context without reconnecting
- Remove no-op push_frame() override
2026-02-16 11:34:16 -07:00
Mark Backman
dba4de77bf Merge pull request #3684 from ai-coustics/goedev/aic-model-caching
AIC model caching
2026-02-16 10:43:14 -05:00
Mark Backman
507765625f Make UserIdleController always-on with dynamic timeout updates
Always create UserIdleController (timeout=0 means disabled), removing
all Optional guards. Add UserIdleTimeoutUpdateFrame to allow changing
the idle timeout at runtime.
2026-02-14 09:54:30 -05:00
Mark Backman
8f5e5e8e7c Update comment in _maybe_mute_frame 2026-02-14 09:41:42 -05:00
Mark Backman
c682a44bb6 Merge pull request #3738 from lukepayyapilli/fix/mute-events-before-start-frame
Fix LLMUserAggregator broadcasting mute events before StartFrame
2026-02-14 09:40:40 -05:00
Mark Backman
cb7023681f Add changelog for PR #3744 2026-02-14 08:57:46 -05:00
Mark Backman
012ef41ff4 Redesign UserIdleController to use BotStoppedSpeakingFrame
Replace the continuous heartbeat-based timer (UserSpeakingFrame/BotSpeakingFrame
+ asyncio.Event loop) with a simple one-shot timer that starts when
BotStoppedSpeakingFrame is received and cancels on UserStartedSpeakingFrame or
BotStartedSpeakingFrame. This eliminates false idle triggers caused by gaps
between the user finishing speaking and the bot starting to speak (LLM/TTS
latency).

Guard the timer start with two conditions to prevent false triggers:
- User turn in progress: during interruptions, BotStoppedSpeaking arrives
  while the user is still speaking mid-turn.
- Function calls in progress: FunctionCallsStarted arrives before
  BotStoppedSpeaking because the bot speaks concurrently with the function
  call starting, so the timer must wait for the result and subsequent bot
  response.
2026-02-14 08:55:56 -05:00
Paul Kompfner
66b7b4a5d4 Update COMMUNITY_INTEGRATIONS.md for the new dataclass-based service settings pattern. 2026-02-13 16:04:49 -05:00
Filipi da Silva Fuchter
f6bb5fa124 Merge pull request #3741 from pipecat-ai/filipi/update_prebuilt
Using the latest version of pipecat-ai-small-webrtc-prebuilt.
2026-02-13 15:31:48 -05:00
Paul Kompfner
b08548af9d Remove typed-settings migration scaffolding and rename _update_settings_from_typed to _update_settings.
Now that all services use typed `ServiceSettings` objects, this removes the interim scaffolding that supported both dict-based and typed settings paths in parallel. Specifically: removes old dict-based `_update_settings(settings: Mapping)` methods from base classes, removes `isinstance(self._settings, ServiceSettings)` guards, simplifies `process_frame` branching, and renames `_update_settings_from_typed` to `_update_settings` across all ~30 service implementations. Also renames the no-arg `_update_settings()` helper on realtime services to `_send_session_update()` to avoid collision, adds `from_mapping` overrides on `GoogleLLMSettings` and `AnthropicLLMSettings` for ThinkingConfig dict-to-object conversion, and replaces a broken no-arg `_update_settings()` call in Gemini Live with a TODO.
2026-02-13 15:12:26 -05:00
Paul Kompfner
ab92a0e1d7 Remove/deprecate service-specific set_model and set_voice overrides.
- NvidiaSTTService.set_model: convert to proper DeprecationWarning (model can't change at runtime for Riva streaming STT)
- NvidiaTTSService.set_model: same treatment for Riva TTS
- NvidiaSegmentedSTTService.set_model: remove — base class now routes through _update_settings_from_typed which re-creates the recognition config
- GeminiTTSService.set_voice: remove — move AVAILABLE_VOICES validation into _update_settings_from_typed so it fires on both legacy and new paths
2026-02-13 15:12:26 -05:00
Paul Kompfner
e37f2f99c4 Deprecate set_model, set_voice, and set_language in favor of *UpdateSettingsFrame. 2026-02-13 15:12:26 -05:00
Paul Kompfner
e43351f5f8 Add class-level _settings type annotations to all service classes for better editor support.
Standardize all STT, TTS, and LLM service classes to declare `_settings` with the narrowed Settings type as a class-level annotation. This gives editors and type checkers the specific type when hovering or autocompleting on `self._settings` in each service and its subclasses. Inline `self._settings: Type = ...` assignments are replaced with plain `self._settings = ...`.
2026-02-13 15:12:26 -05:00
Paul Kompfner
444cbb6499 Add turn-completion fields to LLMSettings and handle them in the typed-service-settings path.
`filter_incomplete_user_turns` and `user_turn_completion_config` were only handled in the legacy dict-based `_update_settings` code path. This adds them to `LLMSettings` and introduces `LLMService._update_settings_from_typed` so the typed path handles them too.
2026-02-13 15:12:26 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `SpeechmaticsSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00
filipi87
2489c76bc6 Using the latest version of pipecat-ai-small-webrtc-prebuilt. 2026-02-13 16:43:25 -03:00
Mark Backman
73cb96bf66 Merge pull request #3739 from pipecat-ai/mb/docs-skill
Add /update-docs Claude Code skill
2026-02-13 13:26:06 -05:00
Mark Backman
79ec61d1d8 Merge pull request #3642 from pipecat-ai/cb/rime-arcana-v3
Update RimeTTSService for arcana and mistv2 model support
2026-02-13 13:25:27 -05:00
Mark Backman
ca440594fe Merge pull request #3720 from pipecat-ai/mb/fix-grok-realtime
Fix Grok Realtime voice type validation for server responses
2026-02-13 13:24:53 -05:00
Mark Backman
6c25dd4aa2 Merge pull request #3736 from pipecat-ai/mb/improve-events-docstrings
Improve events docstrings
2026-02-13 13:24:15 -05:00
Mark Backman
09bb6bb03b Merge pull request #3735 from pipecat-ai/mb/fix-llm-tracing-error-handilng
Fix double execution of service functions on tracing errors
2026-02-13 13:23:55 -05:00
Mark Backman
746fdfbfef Merge pull request #3728 from pipecat-ai/mb/upgrade-pillow
Bump Pillow upper bound from <12 to <13
2026-02-13 13:23:41 -05:00
Mark Backman
f7af9f1efd Broaden /update-docs scope to detect missing doc sections 2026-02-13 13:14:45 -05:00
Mark Backman
a5f95acaf5 Add changelog for PR #3735 2026-02-13 13:08:03 -05:00
Mark Backman
e50b138ab2 Fix double execution of service functions when tracing errors occur
The outer try/except in each service decorator caught both tracing
setup errors and application errors from the wrapped function. If the
function itself raised (e.g. LLM rate limit, TTS timeout), the
exception was caught and the function was called a second time.

Fix by tracking whether the original function was called via a
fn_called flag. If the function was already called, re-raise the
exception instead of falling back to untraced re-execution.
2026-02-13 13:08:03 -05:00
Mark Backman
3640c7a2dd Merge pull request #3733 from pipecat-ai/mb/fix-traceable-init
Deprecate unused class_decorators tracing module and fix stale comments
2026-02-13 13:04:34 -05:00
Mark Backman
2454bedf29 Add /update-docs skill for keeping docs in sync with source changes
Adds a Claude Code skill that analyzes the current branch diff against
main, maps changed source files to their doc pages, and makes targeted
updates to Configuration, InputParams, Usage, Notes, and Event Handlers
sections.
2026-02-13 12:52:23 -05:00
Luke Payyapilli
3adb2f50a6 Fix LLMUserAggregator broadcasting mute events before StartFrame 2026-02-13 11:59:56 -05:00
Mark Backman
01b7a93e08 Deprecate unused Traceable/class_decorators module and fix stale comments
The class_decorators.py module (Traceable, @traceable, @traced) is not
used anywhere in the codebase. Mark it deprecated and fix the misleading
comment in service_decorators.py that referenced it as if it were active.
2026-02-13 11:25:40 -05:00
Mark Backman
347eaf582d Merge pull request #3721 from pipecat-ai/fix/pipeline-scoped-tracing-context
Replace singleton context providers with pipeline-scoped TracingContext
2026-02-13 11:24:37 -05:00
Mark Backman
25ca296477 Move tracing fields to AIService and extract _get_turn_context helper
Consolidate _tracing_enabled and _tracing_context from LLMService,
STTService, and TTSService into the shared AIService base class.
Extract _get_turn_context() helper in service_decorators.py to
encapsulate the repeated pattern across all traced decorators.
2026-02-13 11:21:24 -05:00
Mark Backman
3fce88555f Improve events docstrings 2026-02-13 09:39:44 -05:00
Mark Backman
9e6f27c9f1 Merge pull request #3625 from ianbbqzy/ian/inworld-async-timestamp
[inworld] Allow Async delivery of timestamps info
2026-02-12 21:20:22 -05:00
Ian Lee
94f01af545 [inworld] Allow Async delivery of timestamps info
* speed up first audio chunk latency
2026-02-12 17:48:58 -08:00
Filipi da Silva Fuchter
432870cc36 Merge pull request #3729 from pipecat-ai/filipi/elevenlabs_issue
TTS services fixes.
2026-02-12 16:31:46 -05:00
Filipi da Silva Fuchter
e065907745 Merge pull request #3718 from pipecat-ai/filipi/bot_started_speaking
Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking.
2026-02-12 16:31:14 -05:00
Mark Backman
b7a5ca3d1e Merge pull request #3730 from pipecat-ai/mb/stt-keepalive
Move STT keepalive from WebsocketSTTService to STTService base class
2026-02-12 15:37:23 -05:00
filipi87
9569625f03 Changelog entries for the TTS fixes. 2026-02-12 16:11:02 -03:00
Mark Backman
18afe37bd1 Add changelog entries for PR #3642 2026-02-12 14:09:24 -05:00
Mark Backman
2b9777b812 Update RimeTTSService InputParams for arcana and mistv2 model support
Add model-specific params (arcana: repetition_penalty, temperature, top_p;
mistv2: no_text_normalization, save_oovs, segment) with dynamic query param
building via _build_settings(). Model/voice/param changes now trigger
WebSocket reconnection since all settings are URL query params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 14:01:41 -05:00
filipi87
8866ab1585 Fixing RimeTTSService to reuse the same context when needed. 2026-02-12 15:53:38 -03:00
filipi87
f0995164d9 Fixing PlayHTTTSService to reuse the same context when needed. 2026-02-12 15:50:18 -03:00
filipi87
136732afae Fixing InworldTTSService to reuse the same context when needed. 2026-02-12 15:46:59 -03:00
filipi87
3410eb82b3 Fixing CartesiaTTSService to reuse the same context when needed. 2026-02-12 15:26:49 -03:00
Chad Bailey
794811fbdb Updated WSS endpoint for Rime Arcana v3 support 2026-02-12 13:24:29 -05:00
filipi87
abea22ec57 Fixing AsyncAITTSService to reuse the same context when needed. 2026-02-12 15:17:47 -03:00
Mark Backman
08beb0264a Add changelog entries for PR #3730 2026-02-12 13:14:11 -05:00
Mark Backman
2e15b4842c Move STT keepalive mechanism from WebsocketSTTService to STTService base class
This allows non-websocket STT services (like SarvamSTTService, which uses
the Sarvam Python SDK for connection management) to reuse the same keepalive
pattern. Subclasses override _send_keepalive() and _is_keepalive_ready() for
their specific protocol.
2026-02-12 11:09:39 -05:00
filipi87
6d95a2425c Fixing ElevenLabs TTS word timestamp interleaving across sentences. 2026-02-12 12:54:47 -03:00
Mark Backman
4667a3d66d Add changelog for #3728 2026-02-12 09:42:23 -05:00
Mark Backman
0bf2477d2c Bump Pillow upper bound from <12 to <13 2026-02-12 09:41:18 -05:00
Mark Backman
71a752c971 Add tests for TracingContext and TurnTraceObserver
Cover pipeline-scoped tracing context lifecycle, span hierarchy,
conversation/turn context management, and concurrent pipeline isolation.
2026-02-11 23:27:35 -05:00
Mark Backman
358f237507 Replace singleton context providers with pipeline-scoped TracingContext
ConversationContextProvider and TurnContextProvider were singletons that
stored tracing context as class-level state. When two PipelineTask instances
ran concurrently, they would overwrite each other's context, causing service
spans to attach to the wrong pipeline's turn span.

Replace both singletons with a single TracingContext object owned by each
PipelineTask, threaded to services via StartFrame.
2026-02-11 21:58:10 -05:00
Mark Backman
d99a256715 Merge pull request #3706 from ianbbqzy/ian/inworld-user-agent
[Inworld] add User-Agent and X-Request-Id for better traceability
2026-02-11 19:38:26 -05:00
Ian Lee
dcbcab1542 [Inworld] add User-Agent and X-Request-Id for better traceability 2026-02-11 15:47:20 -08:00
Mark Backman
a966947220 Add changelog for #3720 2026-02-11 18:04:58 -05:00
Mark Backman
16b060d9e9 Fix Grok Realtime voice type validation for server responses
The Grok API now returns prefixed voice names (e.g. "human_Ara") in
session.updated events, causing Pydantic validation errors. Widen the
voice field type from GrokVoice to GrokVoice | str to accept both
user-facing names and server-returned values.
2026-02-11 18:04:20 -05:00
filipi87
ed7fde324e Adding changelog entry for the RTVIObserver fix. 2026-02-11 16:23:42 -03:00
filipi87
beb4e86b5f Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking. 2026-02-11 16:17:28 -03:00
Aleix Conchillo Flaqué
e75ccd9c2f Merge pull request #3717 from pipecat-ai/aleix/update-claude-md-pr-instructions
Add /pr-submit skill and clean up CLAUDE.md
2026-02-11 10:40:20 -08:00
Aleix Conchillo Flaqué
a80919ceff Move PR submission instructions from CLAUDE.md to /pr-submit skill
Extract the procedural PR workflow into an actionable skill that can be
invoked with /pr-submit. CLAUDE.md is better suited for project context
and conventions, not step-by-step procedures.
2026-02-11 09:57:42 -08:00
Aleix Conchillo Flaqué
1fe4538982 Update PR submission instructions in CLAUDE.md
Expand the Pull Requests section with detailed step-by-step instructions
including branch naming, commit guidance, changelog generation, and PR
description updates.
2026-02-11 09:51:10 -08:00
Filipi da Silva Fuchter
9a48d93bd2 Merge pull request #3713 from pipecat-ai/filipi/smallwebrtc_8khz
Fixing smallwebrtc transport input audio resampling logic.
2026-02-11 11:58:32 -05:00
filipi87
0c3e59ed61 Adding changelog entry for the SmallWebRTCTransport fix. 2026-02-11 13:07:52 -03:00
filipi87
ec2b38dc29 Fixing smallwebrtc transport input audio resampling logic. 2026-02-11 13:01:25 -03:00
Gökmen Görgen
2036757b84 add unit tests for AICModelManager and AICFilter error handling, model loading, and processor behavior 2026-02-11 15:22:37 +01:00
Mark Backman
0574167fbd Merge pull request #3709 from pipecat-ai/mb/fix-quickstart-pcc-deploy
Fix quickstart pcc-deploy.toml
2026-02-10 22:19:37 -05:00
Mark Backman
972ad93e18 Fix quickstart pcc-deploy.toml 2026-02-10 22:17:09 -05:00
Mark Backman
ac53594967 Merge pull request #3708 from pipecat-ai/mb/fix-quickstart-pyproject
Fix quickstart pyproject.toml
2026-02-10 22:09:49 -05:00
Mark Backman
b063d9d43b Fix quickstart pyproject.toml 2026-02-10 22:06:38 -05:00
Mark Backman
48e93beadf Merge pull request #3705 from pipecat-ai/mb/quickstart-0.0.102
Update quickstart for 0.0.102
2026-02-10 21:57:33 -05:00
Mark Backman
883b24f577 Update quickstart for 0.0.102 2026-02-10 18:14:04 -05:00
Gökmen Görgen
ed3ec045aa add changelog file. 2026-02-09 12:04:09 +01:00
Gökmen Görgen
67d39a97f7 AIC model caching. 2026-02-09 11:51:28 +01:00
Om Chauhan
a4e187e138 replace background task with flush-on-join 2026-02-09 06:04:08 +05:30
Om Chauhan
9f380170d7 added changelog 2026-02-09 05:37:43 +05:30
Om Chauhan
12f27f9cda fix(daily): queue outbound messages until transport joins 2026-02-09 05:37:43 +05:30
410 changed files with 30590 additions and 7585 deletions

View File

@@ -0,0 +1,27 @@
{
"name": "pipecat-dev-skills",
"owner": {
"name": "Pipecat"
},
"metadata": {
"description": "Development workflow skills for contributing to the Pipecat project",
"version": "1.0.0"
},
"plugins": [
{
"name": "pipecat-dev",
"description": "Development workflow skills for contributing to the Pipecat project",
"version": "1.0.0",
"source": "./",
"skills": [
"./.claude/skills/changelog",
"./.claude/skills/cleanup",
"./.claude/skills/code-review",
"./.claude/skills/docstring",
"./.claude/skills/pr-description",
"./.claude/skills/pr-submit",
"./.claude/skills/update-docs"
]
}
]
}

View File

@@ -26,7 +26,7 @@ Create changelog files for the important commits in this PR. The PR number is pr
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change. No line wrapping.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.

View File

@@ -1,6 +1,6 @@
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecats architecture, coding standards, and example patterns**.
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
---
@@ -28,9 +28,9 @@ This skill analyzes all changes introduced in your branch and performs the follo
Invoke the skill using any of the following commands:
- Clean up my branch code
- Refactor the changes in my branch
- Review and improve my branch code
- "Clean up my branch code"
- "Refactor the changes in my branch"
- "Review and improve my branch code"
- `/cleanup`
---

View File

@@ -3,21 +3,20 @@ name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
1. Determine what to document based on the argument:
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
**If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
- Use that file directly
3. Once the file is identified, read the module to understand its structure:
**If a class name is provided** (e.g. `VADAnalyzer`):
- Search for `class ClassName` in `src/pipecat/`
- If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
2. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component

View File

@@ -0,0 +1,28 @@
---
name: pr-submit
description: Create and submit a GitHub PR from the current branch
---
Submit the current changes as a GitHub pull request.
## Instructions
1. Check the current state of the repository:
- Run `git status` to see staged, unstaged, and untracked changes
- Run `git diff` to see current changes
- Run `git log --oneline -10` to see recent commits
2. If there are uncommitted changes relevant to the PR:
- Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
- Create a new branch based on the current branch
- Commit the changes using multiple commits if the changes are unrelated
3. Push the branch and create the PR:
- Push with `-u` flag to set upstream tracking
- Create the PR using `gh pr create`
4. After the PR is created:
- Run `/changelog <pr_number>` to generate changelog files, then commit and push them
- Run `/pr-description <pr_number>` to update the PR description
5. Return the PR URL to the user.

View File

@@ -0,0 +1,306 @@
---
name: update-docs
description: Update documentation pages to match source code changes on the current branch
---
Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
## Arguments
```
/update-docs [DOCS_PATH]
```
- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
Examples:
- `/update-docs /Users/me/src/docs`
- `/update-docs`
## Instructions
### Step 1: Resolve docs path
If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
Verify the path exists and contains `server/services/` subdirectory.
### Step 2: Create docs branch
Get the current pipecat branch name:
```bash
git rev-parse --abbrev-ref HEAD
```
In the docs repo, create a new branch off main with a matching name:
```bash
cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
```
For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
All doc edits in subsequent steps are made on this branch.
### Step 3: Detect changed source files
Run:
```bash
git diff main..HEAD --name-only
```
Filter to files that could affect documentation:
- `src/pipecat/services/**/*.py` (service implementations)
- `src/pipecat/transports/**/*.py` (transport implementations)
- `src/pipecat/serializers/**/*.py` (serializer implementations)
- `src/pipecat/processors/**/*.py` (processor implementations)
- `src/pipecat/audio/**/*.py` (audio utilities)
- `src/pipecat/turns/**/*.py` (turn management)
- `src/pipecat/observers/**/*.py` (observers)
- `src/pipecat/pipeline/**/*.py` (pipeline core)
Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
### Step 4: Map source files to doc pages
For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
### Step 5: Analyze each source-doc pair
For each mapped pair:
1. **Read the full source file** to understand current state
2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
3. **Read the current doc page** in full
Identify what changed by comparing source to docs:
- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
- **Class names / imports**: Check if Usage examples reference correct names
- **Behavioral changes**: Check if Notes section needs updating
### Step 6: Make targeted edits
For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
#### Rules
- **Never remove content** unless the corresponding source code was removed
- **Never rewrite sections** that are already accurate
- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
- **Keep descriptions concise** — match the tone and length of surrounding content
- **Preserve CardGroup, links, and examples** unless they reference removed functionality
- **Don't touch frontmatter** unless the class was renamed
#### Section-specific guidance
**Configuration** (constructor params):
- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
- Add new params in logical order (required first, then optional)
- Remove params that no longer exist in source
- Update types/defaults that changed
**InputParams** (runtime settings):
- Use markdown table format: `| Parameter | Type | Default | Description |`
- Match the field names and types from the `InputParams(BaseModel)` class
- Include the default values from the source
**Usage** (code examples):
- Update import paths, class names, and parameter names
- Only modify examples if they would break or be misleading with the new API
- Don't rewrite working examples just to add new optional params
**Notes**:
- Add notes for new behavioral gotchas or breaking changes
- Remove notes about limitations that were fixed
- Keep existing notes that are still accurate
**Event Handlers**:
- Update the event table and example code
- Add new events, remove deleted ones
- Update handler signatures if they changed
**Overview / Key Features / Prerequisites**:
- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
- Most PRs will NOT need changes to these sections
### Step 7: Update guides
Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
```bash
grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
```
For each guide that references changed code:
1. Read the full guide
2. Update class names, parameter names, import paths, and code examples that are now incorrect
3. **Don't rewrite prose** — only fix the specific references that changed
4. Leave guides alone if they reference the service generally but don't use any changed APIs
Guide directories:
- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
### Step 8: Identify doc gaps
After processing all mapped pairs, check for two kinds of gaps:
**Missing pages**: Source files that had no doc page mapping (neither tier 1, 2, nor 3) and are not marked as "(skip)". For each, tell the user:
- The source file path
- The main class(es) it defines
- Whether a new doc page should be created
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
If the user wants a new page, do all three of the following:
#### 8a: Create the doc page
Create the new `.mdx` file using this template structure:
```
---
title: "Service Name"
description: "Brief description"
---
## Overview
[Description from class docstring or source analysis]
<CardGroup cols={2}>
[Cards for API reference and examples if available]
</CardGroup>
## Installation
```bash
pip install "pipecat-ai[package-name]"
```
## Prerequisites
[Environment variables and account setup]
## Configuration
[ParamField entries for constructor params]
## InputParams
[Table of InputParams fields, if the service has them]
## Usage
### Basic Setup
```python
[Minimal working example]
```
## Notes
[Important caveats]
## Event Handlers
[Event table and example code]
```
#### 8b: Add to docs.json
Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
Find the matching group in the navigation structure:
- **STT** → `"group": "Speech-to-Text"` under Services
- **TTS** → `"group": "Text-to-Speech"` under Services
- **LLM** → `"group": "LLM"` under Services
- **S2S** → `"group": "Speech-to-Speech"` under Services
- **Transport** → `"group": "Transport"` under Services
- **Serializer** → `"group": "Serializers"` under Services
- **Image generation** → `"group": "Image Generation"` under Services
- **Video** → `"group": "Video"` under Services
- **Memory** → `"group": "Memory"` under Services
- **Vision** → `"group": "Vision"` under Services
- **Analytics** → `"group": "Analytics & Monitoring"` under Services
Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
```json
{
"group": "Speech-to-Text",
"pages": [
"server/services/stt/assemblyai",
"server/services/stt/aws",
...
"server/services/stt/foo",
...
]
}
```
#### 8c: Add to supported-services.mdx
Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
Use this format:
```
| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
```
To determine the correct values:
- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
- If no pip dependencies are required, use `No dependencies required` instead.
Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
### Step 9: Output summary
After all edits are complete, print a summary:
```
## Documentation Updates
### Updated reference pages
- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
### Updated guides
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
### New service pages
- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
### Unmapped source files
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
### Skipped files
- `src/pipecat/services/ai_service.py` — internal base class
```
## Guidelines
- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
## Checklist
Before finishing, verify:
- [ ] All changed source files were checked against the mapping table
- [ ] Each doc page edit matches the actual source code change (not guessed)
- [ ] No content was removed unless the corresponding source was removed
- [ ] New parameters have accurate types and defaults from source
- [ ] Formatting matches the existing page style
- [ ] Guides referencing changed APIs were checked and updated
- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
- [ ] Unmapped files were reported to the user

View File

@@ -0,0 +1,79 @@
# Source-to-Doc Mapping
Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
## Name mismatches
These source paths don't follow the standard `services/{provider}/{type}.py``server/services/{type}/{provider}.mdx` pattern.
| Source path | Doc page |
|---|---|
| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
| `services/google/google.py` | (shared base — check which services use it) |
| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
## Skip list
These files should never trigger doc updates.
| Pattern | Reason |
|---|---|
| `services/ai_service.py` | Internal base class |
| `services/stt_service.py` | Internal base class |
| `services/tts_service.py` | Internal base class |
| `services/llm_service.py` | Internal base class |
| `services/websocket_service.py` | Internal base class |
| `services/openai_realtime_beta/**` | Deprecated |
| `services/openai_realtime/**` | Deprecated |
| `services/gemini_multimodal_live/**` | Deprecated |
| `services/aws/agent_core.py` | Internal |
| `services/aws/sagemaker/**` | No doc page |
| `transports/base_input.py` | Internal base class |
| `transports/base_output.py` | Internal base class |
| `transports/websocket/client.py` | No doc page |
| `serializers/base_serializer.py` | Internal base class |
| `serializers/protobuf.py` | Internal |
| `processors/audio/**` | Internal |
| `pipeline/pipeline.py` | Core architecture, not a service doc |
## Pattern matching
For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
| Source pattern | Doc pattern |
|---|---|
| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
| `observers/**` | `server/utilities/observers/` (match by class name) |
| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
If the doc file doesn't exist at the resolved path, the file is **unmapped**.
## Search fallback
For files that don't match any table or pattern above:
1. Extract the main class name(s) from the source file
2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
3. If found in a doc page, use that as the mapping

View File

@@ -29,6 +29,7 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
@@ -36,11 +37,13 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra local-smart-turn-v3 \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Run tests with coverage

View File

@@ -86,7 +86,7 @@ jobs:
fi
# Validate fragment types
VALID_TYPES="added changed deprecated removed fixed security other"
VALID_TYPES="added changed deprecated removed fixed performance security other"
INVALID_FRAGMENTS=""
for file in changelog/*.md; do

View File

@@ -33,6 +33,7 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
@@ -40,11 +41,13 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra local-smart-turn-v3 \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Test with pytest

147
.github/workflows/update-docs.yml vendored Normal file
View File

@@ -0,0 +1,147 @@
name: Update Documentation on PR Merge
on:
pull_request_target:
types: [closed]
branches: [main]
paths:
- "src/pipecat/services/**"
- "src/pipecat/transports/**"
- "src/pipecat/serializers/**"
- "src/pipecat/processors/**"
- "src/pipecat/audio/**"
- "src/pipecat/turns/**"
- "src/pipecat/observers/**"
- "src/pipecat/pipeline/**"
workflow_dispatch:
inputs:
pr_number:
description: "PR number to generate docs for"
required: true
type: string
jobs:
update-docs:
if: >-
github.event_name == 'workflow_dispatch' ||
github.event.pull_request.merged == true
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: read
pull-requests: read
id-token: write
steps:
- name: Checkout pipecat
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Checkout docs
uses: actions/checkout@v4
with:
repository: pipecat-ai/docs
token: ${{ secrets.DOCS_SYNC_TOKEN }}
path: _docs
- name: Resolve PR number
id: pr
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
else
echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
fi
- name: Update documentation
uses: anthropics/claude-code-action@v1
env:
DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
github_token: ${{ secrets.GITHUB_TOKEN }}
prompt: |
You are updating documentation for the pipecat-ai/docs repository based on
changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
## Setup
1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
3. The docs repository is checked out at `./_docs/`
## Get the diff
Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
Filter to source files matching the directories listed in SKILL.md Step 3.
If no relevant source files were changed, exit with "No documentation changes needed."
## Follow the skill instructions
Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
### Docs path
Use `./_docs/` — it's already checked out. Do not ask for a path.
### Branch management
- Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
- Work inside `./_docs/` for all doc edits and git operations
- Check if the branch already exists on the remote:
```bash
cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
```
- If it exists: check it out (supports workflow re-runs)
- If not: create it from main
### Git config
Before committing in `_docs`, set:
```bash
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
```
### No interactive questions
Do not ask questions. If you encounter gaps (unmapped files, missing sections,
ambiguous changes), note them in the PR body under "## Gaps identified".
### Creating the docs PR
After committing all changes in `_docs`, push and create a PR:
```bash
cd _docs
git push -u origin docs/pr-${{ steps.pr.outputs.number }}
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
--repo pipecat-ai/docs \
--label auto-docs \
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
--body "$(cat <<'BODY'
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
## Changes
<summarize each doc page updated and what changed>
## Gaps identified
<any unmapped files, missing doc pages, or missing sections — or "None">
BODY
)"
```
### Re-run handling
If `gh pr create` fails because a PR from that branch already exists,
push the updated commits and use `gh pr edit` to update the body instead.
### No-op
If after analyzing the diff you determine no documentation changes are needed
(e.g., only skip-listed files changed, or changes don't affect public API docs),
exit cleanly without creating a branch or PR. Output "No documentation changes needed."
## Important rules
- Only modify files inside `./_docs/` — never modify pipecat source code
- Follow the conservative editing rules from SKILL.md Step 6
- Read each doc page fully before editing (SKILL.md Guidelines)
- Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
claude_args: |
--model claude-sonnet-4-5-20250929
--max-turns 30
--allowedTools "Read,Write,Edit,Glob,Grep,Bash"

View File

@@ -7,6 +7,598 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.104] - 2026-03-02
### Added
- Added `TextAggregationMetricsData` metric measuring the time from the first
LLM token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
- Added support for using strongly-typed objects instead of dicts for updating
service settings at runtime.
Instead of, say:
```python
await task.queue_frame(
STTUpdateSettingsFrame(settings={"language": Language.ES})
)
```
you'd do:
```python
await task.queue_frame(
STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
)
```
Each service now vends strongly-typed classes like `DeepgramSTTSettings`
representing the service's runtime-updatable settings.
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
- Added support for specifying private endpoints for Azure Speech-to-Text,
enabling use in private networks behind firewalls.
(PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
LemonSlice Avatars to any Daily room.
(PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))
- Added `output_medium` parameter to `AgentInputParams` and
`OneShotInputParams` in Ultravox service to control initial output medium
(text or voice) at call creation time.
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
- Added `TurnMetricsData` as a generic metrics class for turn detection, with
e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
with `e2e_processing_time_ms` tracking the interval from VAD
speech-to-silence transition to turn completion.
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
callbacks to `AudioContextTTSService`. Subclasses can override these to
perform provider-specific cleanup instead of overriding
`_handle_interruption()`.
(PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
providing message counts before and after context summarization.
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
- Added `summary_message_template` to `LLMContextSummarizationConfig` for
customizing how summaries are formatted when injected into context (e.g.,
wrapping in XML tags).
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
120s) to prevent hung LLM calls from permanently blocking future
summarizations.
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
summarization to a dedicated LLM service (e.g., a cheaper/faster model)
instead of the pipeline's primary model.
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
- Add AssemblyAI u3-rt-pro model support with built-in turn detection mode
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
from anywhere in the pipeline (e.g. a function call tool). Accepts an
optional `config: LLMContextSummaryConfig` to override summary generation
settings per request.
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
- Added `LLMContextSummaryConfig` (summary generation params:
`target_context_tokens`, `min_messages_after_summary`,
`summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
`summary_config`). These replace the monolithic
`LLMContextSummarizationConfig`.
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
- Added support for the `speed_alpha` parameter to the `arcana` model in
`RimeTTSService`.
(PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
(Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
Tavus) when a client connects. Enables observers to track transport readiness
timing.
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
- Added `StartupTimingObserver` for measuring how long each processor's
`start()` method takes during pipeline startup. Also measures transport
readiness — the time from `StartFrame` to first client connection — via the
`on_transport_timing_report` event.
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
event to `StartupTimingObserver` with bot and client connection timing.
(PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
`PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
end of the pipeline.
(PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))
- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
per-service TTFB, text aggregation, user turn duration, and function call
latency metrics for each user-to-bot response cycle.
(PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
measuring the time from client connection to first bot speech. An
`on_latency_breakdown` is also emitted for this first speech event.
(PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
`InterruptionFrame` both upstream and downstream directly from the calling
processor, avoiding the round-trip through the pipeline task that
`push_interruption_task_frame_and_wait()` required.
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
### Changed
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
text now flows through text aggregators regardless of mode, enabling pattern
detection and tag handling in TOKEN mode.
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
subclasses) instead of plain dicts. Each service's `_settings` now holds
these strongly-typed objects. For service maintainers, see changes in
COMMUNITY_INTEGRATIONS.md.
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
- Word timestamp support has been moved from `WordTTSService` into `TTSService`
via a new `supports_word_timestamps` parameter. Services that previously
extended `WordTTSService`, `AudioContextWordTTSService`, or
`WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
parent `__init__` instead.
(PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
instead of `UserStoppedSpeakingFrame` timing.
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
realtime services: added `InterruptionFrame` handling with metrics cleanup,
processing metrics at response boundaries, and improved agent transcript
handling for both voice and text output modalities.
(PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
(PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
`KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
`KRISP_VIVA_API_KEY` environment variable.
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
vulnerability.
(PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
speaking, you don't want a user interruption to prevent a service setting
change from going into effect. Note that you usually don't use
`ServiceSettingsUpdateFrame` directly, you use one of its subclasses:
- `LLMUpdateSettingsFrame`
- `TTSUpdateSettingsFrame`
- `STTUpdateSettingsFrame`
(PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))
- Updated context summarization to use `user` role instead of `assistant` for
summary messages.
(PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
- Rename `AssemblyAISTTService` parameter
`min_end_of_turn_silence_when_confident` parameter to `min_turn_silence` (old
name still supported with deprecation warning)
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
`enable_context_summarization` → `enable_auto_context_summarization` and
`context_summarization_config` → `auto_context_summarization_config` (now
accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
`DeprecationWarning` for one release cycle.
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
`True` when using `CommitStrategy.MANUAL`.
(PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))
- Updated numba version pin from == to >=0.61.2
(PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))
- Updated tracing code to use `ServiceSettings` dataclass API
(`given_fields()`, attribute access) instead of dict-style access
(`.items()`, `in`, subscript).
(PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))
- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
Removed `event` field from `InterruptionTaskFrame`. These are no longer
needed since `broadcast_interruption()` does not require a round-trip
completion signal.
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
- Moved `pipecat.services.deepgram.stt_sagemaker` and
`pipecat.services.deepgram.tts_sagemaker` to
`pipecat.services.deepgram.sagemaker.stt` and
`pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
but emit a `DeprecationWarning`.
(PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))
### Deprecated
- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
`text_aggregation_mode=TextAggregationMode.TOKEN` instead.
(PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
in favor of runtime updates via `TTSUpdateSettingsFrame`,
`STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas
previously only `set_language()` caused the service to actually react to the
update (e.g. by reconnecting to a remote service so it an pick up the
change), now all these methods do. This change was made as part of a refactor
making them all work the same way under the hood.
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of
passing typed settings delta objects with
`*UpdateSettingsFrame(delta={...})`.
(PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
`AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
non-word counterparts with `supports_word_timestamps=True` instead:
- `WordTTSService` → `TTSService(supports_word_timestamps=True)`
- `WebsocketWordTTSService` →
`WebsocketTTSService(supports_word_timestamps=True)`
- `AudioContextWordTTSService` →
`AudioContextTTSService(supports_word_timestamps=True)`
- `InterruptibleWordTTSService` →
`InterruptibleTTSService(supports_word_timestamps=True)`
(PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
`BaseSmartTurn` now emits `TurnMetricsData` directly.
(PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
- Deprecated `LLMContextSummarizationConfig`. Use
`LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
instead. The old class emits a `DeprecationWarning`.
(PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
`broadcast_interruption()` instead. The old method now delegates to
`broadcast_interruption()` and logs a deprecation warning.
(PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
### Removed
- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
`transformers` and `onnxruntime` packages are now always installed as core
dependencies since they are required by the default turn stop strategy,
`TurnAnalyzerUserTurnStopStrategy` which uses `LocalSmartTurnAnalyzerV3`.
(PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
shut down and is no longer available.
(PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))
### Fixed
- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
provider-specific messages during context summarization.
(PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))
- Treated `response_cancel_not_active` as a non-fatal error in realtime
services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
`OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
cancelling an inactive response.
(PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))
- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
(`transformers`, `onnxruntime`) into core dependencies instead of using a
self-referential extra.
(PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
- Fixed `SentryMetrics` method signatures to match updated
`FrameProcessorMetrics` base class, resolving `TypeError` when using
`start_time`/`end_time` keyword arguments.
(PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))
- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
`AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
(PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))
- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
contexts after normal speech completion, only on interruption.
(PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
transcript arrival time.
(PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))
- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
unintentionally pushed downstream in `LLMUserAggregator`. They are now
consumed like `TranscriptionFrame`.
(PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))
- Fixed misleading "Empty audio frame received for STT service" warnings when
using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
that buffer audio internally.
(PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))
- Fixed issues with `RimeNonJsonTTSService` where trailing punctuation is
sometimes vocalized
(PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))
- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
when used outside of an LLM response (e.g., bot greetings or injected
speech).
(PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))
- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
that flooded production logs.
(PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))
- Add beta feature warning when using custom prompts with AssemblyAI
(PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
resampling to 16kHz before Whisper feature extraction.
(PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))
- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
when the user provides both an `RTVIProcessor` in the pipeline and a custom
`RTVIObserver` subclass in observers.
(PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))
- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
turn completion system message is now re-injected after context replacement.
(PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))
- Fixed Azure TTS and STT services silently swallowing cancellation errors
(invalid API key, network failures, rate limiting) instead of propagating
them as `ErrorFrame`s to the pipeline.
(PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))
### Performance
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
`AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
every interruption by using `client_req_id`-based multiplexing.
(PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))
### Other
- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
Pipecat SDK identity in websocket requests.
(PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))
## [0.0.103] - 2026-02-20
### Added
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This
allows timestamps info to trail audio chunks arrival, resulting in much
better first audio chunk latency
(PR [#3625](https://github.com/pipecat-ai/pipecat/pull/3625))
- Added model-specific `InputParams` to `RimeTTSService`: arcana params
(`repetition_penalty`, `temperature`, `top_p`) and mistv2 params
(`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param
changes now trigger WebSocket reconnection.
(PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))
- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing
transport subclasses to handle custom frame types that flow through the audio
queue.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily
transport. These frames queue SIP transfer and SIP REFER operations with
audio, so the operation executes only after the bot finishes its current
utterance.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Added keepalive support to `SarvamSTTService` to prevent idle connection
timeouts (e.g. when used behind a `ServiceSwitcher`).
(PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection
at runtime by updating the timeout dynamically.
(PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))
- Added `broadcast_sibling_id` field to the base `Frame` class. This field is
automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to
the ID of the paired frame pushed in the opposite direction, allowing
receivers to identify broadcast pairs.
(PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))
- Added `ignored_sources` parameter to `RTVIObserverParams` and
`add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to
suppress RTVI messages from specific pipeline processors (e.g. a silent
evaluation LLM).
(PR [#3779](https://github.com/pipecat-ai/pipecat/pull/3779))
- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed
on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling,
and per-turn TTFB metrics.
(PR [#3785](https://github.com/pipecat-ai/pipecat/pull/3785))
### Changed
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the
`wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from
mistv2-specific values to `None` — only explicitly-set params are sent as
query params.
(PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager`
in `aic_filter.py`.
- Multiple filters using the same model path or `(model_id,
model_download_dir)` share one loaded model, with reference counting and
concurrent load deduplication.
- Model file I/O runs off the event loop so the filter does not block.
(PR [#3684](https://github.com/pipecat-ai/pipecat/pull/3684))
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for
better traceability.
(PR [#3706](https://github.com/pipecat-ai/pipecat/pull/3706))
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now
queued with audio like other transport frames.
(PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))
- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow
12.x.
(PR [#3728](https://github.com/pipecat-ai/pipecat/pull/3728))
- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService`
base class, allowing any STT service (not just websocket-based ones) to use
idle-connection keepalive via the `keepalive_timeout` and
`keepalive_interval` parameters.
(PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))
- Improved audio context management in `AudioContextTTSService` by moving
context ID tracking to the base class and adding
`reuse_context_id_within_turn` parameter to control concurrent TTS request
handling.
- Added helper methods: `has_active_audio_context()`,
`get_active_audio_context_id()`, `remove_active_audio_context()`,
`reset_active_audio_context()`
- Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS
implementations by removing duplicate context management code
(PR [#3732](https://github.com/pipecat-ai/pipecat/pull/3732))
- `UserIdleController` is now always created with a default timeout of 0
(disabled). The `user_idle_timeout` parameter changed from `Optional[float] =
None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and
`UserIdleController`.
(PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))
- Change the version specifier from `>=0.2.8` to `~=0.2.8` for the
`speechmatics-voice` package to ensure compatibility with future patch
versions.
(PR [#3761](https://github.com/pipecat-ai/pipecat/pull/3761))
- Updated `InworldTTSService` and `InworldHttpTTSService` to use `ASYNC`
timestamp transport strategy by default
(PR [#3765](https://github.com/pipecat-ai/pipecat/pull/3765))
- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`,
`stop_ttfb_metrics()`, `start_processing_metrics()`, and
`stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`,
allowing custom timestamps for metrics measurement. `STTService` now uses
these instead of custom TTFB tracking.
(PR [#3776](https://github.com/pipecat-ai/pipecat/pull/3776))
- Updated default Anthropic model from `claude-sonnet-4-5-20250929` to
`claude-sonnet-4-6`.
(PR [#3792](https://github.com/pipecat-ai/pipecat/pull/3792))
### Deprecated
- Deprecated unused `Traceable`, `@traceable`, `@traced`, and
`AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module
will be removed in a future release.
(PR [#3733](https://github.com/pipecat-ai/pipecat/pull/3733))
### Fixed
- Fixed race condition where `RTVIObserver` could send messages before
`DailyTransport` join completed. Outbound messages are now queued & delivered
after the transport is ready.
(PR [#3615](https://github.com/pipecat-ai/pipecat/pull/3615))
- Fixed async generator cleanup in OpenAI LLM streaming to prevent
`AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
(PR [#3698](https://github.com/pipecat-ai/pipecat/pull/3698))
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all
sample rates, including 8kHz audio.
(PR [#3713](https://github.com/pipecat-ai/pipecat/pull/3713))
- Fixed a race condition in `RTVIObserver` where bot output messages could be
sent before the bot-started-speaking event.
(PR [#3718](https://github.com/pipecat-ai/pipecat/pull/3718))
- Fixed Grok Realtime `session.updated` event parsing failure caused by the API
returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).
(PR [#3720](https://github.com/pipecat-ai/pipecat/pull/3720))
- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`,
`RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and
`PlayHTTTSService`. Services now properly reuse the same context ID across
multiple `run_tts()` invocations within a single LLM turn, preventing context
tracking issues and incorrect lifecycle signaling.
(PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))
- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when
processing multiple sentences within a single LLM turn.
(PR [#3729](https://github.com/pipecat-ai/pipecat/pull/3729))
- Fixed tracing service decorators executing the wrapped function twice when
the function itself raised an exception (e.g., LLM rate limit, TTS timeout).
(PR [#3735](https://github.com/pipecat-ai/pipecat/pull/3735))
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame`
reaches downstream processors.
(PR [#3737](https://github.com/pipecat-ai/pipecat/pull/3737))
- Fixed `UserIdleController` false idle triggers caused by gaps between user
and bot activity frames. The idle timer now starts only after
`BotStoppedSpeakingFrame` and is suppressed during active user turns and
function calls.
(PR [#3744](https://github.com/pipecat-ai/pipecat/pull/3744))
- Fixed incorrect `sample_rate` assignment in
`TavusInputTransport._on_participant_audio_data` (was using
`audio.audio_frames` instead of `audio.sample_rate`).
(PR [#3768](https://github.com/pipecat-ai/pipecat/pull/3768))
- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all
upstream frames were filtered out to avoid duplicate messages from
broadcasted frames. Now only upstream copies of broadcasted frames are
skipped.
(PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))
- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that
could cause shared state across instances.
(PR [#3782](https://github.com/pipecat-ai/pipecat/pull/3782))
- Fixed `DeepgramSageMakerSTTService` to properly track finalize lifecycle
using `request_finalize()` / `confirm_finalize()` and use `is_final` (instead
of `is_final and speech_final`) for final transcription detection, matching
`DeepgramSTTService` behavior.
(PR [#3784](https://github.com/pipecat-ai/pipecat/pull/3784))
- Fixed a race condition in `AudioContextTTSService` where the audio context
could time out between consecutive TTS requests within the same turn, causing
audio to be discarded.
(PR [#3787](https://github.com/pipecat-ai/pipecat/pull/3787))
- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the
`InterruptionFrame` does not reach the pipeline sink within the timeout.
Added a `timeout` keyword argument to customize the wait duration.
(PR [#3789](https://github.com/pipecat-ai/pipecat/pull/3789))
## [0.0.102] - 2026-02-10
### Added

View File

@@ -25,7 +25,7 @@ uv run pytest tests/test_name.py
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
@@ -74,7 +74,7 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
@@ -90,23 +90,26 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
@@ -152,7 +155,3 @@ When adding a new service:
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
## Pull Requests
After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.

View File

@@ -25,7 +25,6 @@ Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
- Usage instructions with Pipecat Pipeline
@@ -110,7 +109,6 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
@@ -235,22 +233,79 @@ def can_generate_metrics(self) -> bool:
### Dynamic Settings Updates
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
STT, LLM, and TTS services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
Each service declares a settings dataclass that extends the appropriate base (`STTSettings`, `TTSSettings`, `LLMSettings`). Fields default to `NOT_GIVEN` so that update objects can represent sparse deltas:
```python
async def set_language(self, language: Language):
"""Set the recognition language and reconnect.
from dataclasses import dataclass, field
Args:
language: The language to use for speech recognition.
from pipecat.services.settings import STTSettings, NOT_GIVEN
@dataclass
class MySTTSettings(STTSettings):
"""Settings for my STT service.
Parameters:
region: Cloud region for the service.
"""
logger.info(f"Switching STT language to: [{language}]")
self._settings["language"] = language
await self._disconnect()
await self._connect()
region: str = field(default_factory=lambda: NOT_GIVEN)
```
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
The service stores its current settings in `self._settings` and declares the type with a class-level annotation for editor support:
```python
class MySTTService(STTService):
_settings: MySTTSettings
def __init__(self, *, model: str, language: str, region: str, **kwargs):
# An initial value should be provided for every settings field.
# This will be validated at service start.
# (If you track sample_rate, it can be a placeholder value like 0; see
# "Sample Rate Handling").
super().__init__(
settings=MySTTSettings(model=model, language=language, region=region), **kwargs
)
```
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the recognizer if needed."""
changed = await super()._update_settings(update)
if not changed:
return changed
await self._disconnect()
await self._connect()
return changed
```
The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
Note that, in this example, the service requires a reconnect to apply the new language. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
changed = await super()._update_settings(update)
if not changed:
return changed
if "language" in changed:
await self._update_language()
else:
# TODO: this should be temporary - handle changes to other settings soon!
self._warn_unhandled_updated_settings(changed.keys() - {"language"})
return changed
```
### Sample Rate Handling
@@ -260,7 +315,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
async def start(self, frame: StartFrame):
"""Start the service."""
await super().start(frame)
self._settings["output_format"]["sample_rate"] = self.sample_rate
self._settings.output_sample_rate = self.sample_rate
await self._connect()
```

View File

@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
```
2. Choose the appropriate type:
- `added.md` - New features
- `changed.md` - Changes in existing functionality
- `deprecated.md` - Soon-to-be removed features
- `removed.md` - Removed features
- `fixed.md` - Bug fixes
- `performance.md` - Performance improvements
- `security.md` - Security fixes
- `other.md` - Other changes (documentation, dependencies, etc.)
@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```

View File

@@ -55,6 +55,16 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
### 🤖 Claude Code Skills
Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
```
claude plugin marketplace add pipecat-ai/skills
```
and install any of the available plugins.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +81,19 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -163,6 +173,15 @@ You can get started with Pipecat running on your local machine, then move your a
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
### Claude Code Skills
Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
```
claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills
```
### Running tests
To run all tests, from the root directory:

View File

@@ -0,0 +1 @@
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`.

1
changelog/3848.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 35 second interval before Deepgram's 10-second inactivity timeout.

View File

@@ -0,0 +1,3 @@
- Support for Voice Focus 2.0 models.
- Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
- Cleaned unused `ParameterFixedError` exception handling in `AICFilter` parameter setup.

1
changelog/3889.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter` caused by holding a `memoryview` on the mutable audio buffer across async yield points.

View File

@@ -0,0 +1 @@
- `max_context_tokens` and `max_unsummarized_messages` in `LLMAutoContextSummarizationConfig` (and deprecated `LLMContextSummarizationConfig`) can now be set to `None` independently to disable that summarization threshold. At least one must remain set.

1
changelog/3915.added.md Normal file
View File

@@ -0,0 +1 @@
- Added optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default.

1
changelog/3916.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `cloud-audio-only` recording option to Daily transport's `enable_recording` property.

15
changelog/3918.added.md Normal file
View File

@@ -0,0 +1,15 @@
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.
```python
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant.",
)
context = LLMContext()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context.add_message({"role": "user", "content": "Please introduce yourself."})
await task.queue_frames([LLMRunFrame()])
```

1
changelog/3918.other.md Normal file
View File

@@ -0,0 +1 @@
- Updated foundational examples to use `system_instruction` on LLM services instead of adding system messages to `LLMContext`.

View File

@@ -42,7 +42,7 @@ This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
@@ -74,7 +74,6 @@ start _build/html/index.html
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── requirements-playht.txt # PlayHT-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
```

View File

@@ -47,7 +47,8 @@ DAILY_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
SAGEMAKER_ENDPOINT_NAME=...
SAGEMAKER_STT_ENDPOINT_NAME=...
SAGEMAKER_TTS_ENDPOINT_NAME=...
# DeepSeek
DEEPSEEK_API_KEY=...
@@ -103,9 +104,14 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_API_KEY=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LemonSlice
LEMONSLICE_API_KEY=...
LEMONSLICE_AGENT_ID=...
# LiveKit
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
@@ -145,10 +151,6 @@ KOALA_ACCESS_KEY=...
# Piper
PIPER_BASE_URL=...
# PlayHT
PLAYHT_USER_ID=...
PLAYHT_API_KEY=...
# Plivo
PLIVO_AUTH_ID=...
PLIVO_AUTH_TOKEN=...

View File

@@ -42,14 +42,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}
]
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are an LLM in a WebRTC session, and this is a 'hello world' demo.",
)
task = PipelineTask(
Pipeline([llm, tts, transport.output()]),
@@ -59,7 +55,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
context = LLMContext()
context.add_message({"role": "system", "content": "Say hello to the world."})
await task.queue_frames([LLMContextFrame(context), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

View File

@@ -70,16 +70,12 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +105,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -53,16 +53,13 @@ async def main():
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -91,7 +88,9 @@ async def main():
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_participant_left")

View File

@@ -55,24 +55,17 @@ async def main():
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

View File

@@ -86,18 +86,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
ml = MetricsLogger()
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,7 +125,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -103,16 +103,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

View File

@@ -59,16 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -24,6 +24,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -56,18 +57,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
# Alternatively, you can use TextAggregationMode.TOKEN to stream tokens instead of
# sentencesfor faster response times.
# text_aggregation_mode=TextAggregationMode.TOKEN,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,6 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -60,19 +61,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
@@ -100,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -63,14 +63,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
messages = []
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
@@ -103,7 +101,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -23,8 +23,8 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -58,26 +58,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram model
stt = DeepgramSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_ENDPOINT_NAME"),
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
# Initialize Deepgram SageMaker TTS Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram TTS model
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
voice="aura-2-andromeda-en",
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -61,16 +61,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -101,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -57,16 +57,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -67,16 +67,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,16 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -66,16 +66,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -66,16 +66,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,7 +11,6 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -61,16 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -66,16 +66,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -65,16 +65,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -63,16 +63,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
base_url="http://localhost:8000",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,19 +55,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
api_key=os.getenv("GROQ_API_KEY"),
model="meta-llama/llama-4-maverick-17b-128e-instruct",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -62,16 +62,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aws_region="us-west-2",
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "user", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -83,17 +83,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower
# model="gemini-3-pro-image-preview", # A more powerful model, but slower,
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -124,7 +118,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -71,13 +71,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
)
# System message that instructs the AI on how to speak
messages = [
{
"role": "system",
"content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
system_instruction="""You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
- [sigh] - Insert a sigh sound
@@ -95,10 +89,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
- "The answer is... [long pause] ...42!"
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
},
]
)
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,7 +122,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation
messages.append(
context.add_message(
{
"role": "system",
"content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",

View File

@@ -72,16 +72,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -72,16 +72,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -0,0 +1,175 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""AssemblyAI u3-rt-pro with Built-in Turn Detection
This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
with AssemblyAI's built-in turn detection for more natural conversation flow.
Key features:
1. AssemblyAI Turn Detection
- Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
- AssemblyAI's model determines when user starts/stops speaking
- Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
- More natural turn detection based on speech patterns and pauses
2. Advanced Turn Detection Tuning
- `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
Lower values = faster responses. Default: 100ms
- `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
Prevents long pauses. Default: 1000ms
3. Prompt-Based Transcription Enhancement
- Use `prompt` parameter to improve accuracy for specific names/terms
- Particularly useful for proper nouns, technical terms, domain vocabulary
- Example: "Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth."
4. Speaker Diarization (Optional)
- Enable with `speaker_labels=True`
- Automatically identifies different speakers in multi-party conversations
- TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")
5. Language Detection (Optional, multilingual model only)
- Enable with `language_detection=True`
- Automatically detects spoken language
- Available with universal-streaming-multilingual model
For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
"""
logger.info(f"Starting bot")
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
vad_force_turn_endpoint=False, # Use AssemblyAI's built-in turn detection
connection_params=AssemblyAIConnectionParams(
speech_model="u3-rt-pro",
# Optional: Tune turn detection timing (defaults shown below)
# min_turn_silence=100, # Default
# max_turn_silence=1000, # Default
# Optional: Boost accuracy for specific names/terms
# prompt="Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth.",
# Optional: Enable speaker diarization
# speaker_labels=True,
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -62,16 +62,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -31,6 +31,8 @@ from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.metrics.metrics import TurnMetricsData
from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -41,32 +43,37 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
krisp_viva_filter = KrispVivaFilter()
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
}
@@ -76,18 +83,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
@@ -117,13 +122,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[MetricsLogObserver(include_metrics={TurnMetricsData})],
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,16 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -65,16 +65,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +101,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,19 +56,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = RimeTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
voice_id="luna",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,19 +55,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = NvidiaSTTService(api_key=os.getenv("NVIDIA_API_KEY"))
llm = NvidiaLLMService(
api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
api_key=os.getenv("NVIDIA_API_KEY"),
model="meta/llama-3.3-70b-instruct",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +93,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -96,7 +96,7 @@ class UserAudioCollector(FrameProcessor):
self._user_speaking = True
elif isinstance(frame, UserStoppedSpeakingFrame):
self._user_speaking = False
self._context.add_audio_frames_message(audio_frames=self._audio_frames)
await self._context.add_audio_frames_message(audio_frames=self._audio_frames)
await self._user_context_aggregator.push_frame(LLMRunFrame())
elif isinstance(frame, InputAudioRawFrame):

View File

@@ -60,16 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
model="4ce7e917cedd4bc2bb2e6ff3a46acaa1", # Barack Obama
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -64,16 +64,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,16 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb", # Emily
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -62,16 +62,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -47,16 +47,12 @@ async def main():
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -82,7 +78,7 @@ async def main():
),
)
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner()

View File

@@ -66,16 +66,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -68,16 +68,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
params=SarvamHttpTTSService.InputParams(language=Language.EN),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +104,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -62,16 +62,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
model="bulbul:v2",
voice_id="manisha",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -101,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
# Optionally, you can wait for 30 seconds and then change the voice.

View File

@@ -64,16 +64,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -64,16 +64,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
streaming=True,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -111,7 +107,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info("Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -61,16 +61,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
temperature=1.1,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info("Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -64,16 +64,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -60,16 +60,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -40,7 +40,7 @@ def _create_aic_filter() -> AICFilter:
return AICFilter(
license_key=license_key,
model_id="quail-vf-l-16khz",
model_id="quail-vf-2.0-l-16khz",
)
@@ -80,16 +80,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=aic_vad_analyzer),
@@ -128,7 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
await audiobuffer.start_recording()
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@audiobuffer.event_handler("on_audio_data")

View File

@@ -62,16 +62,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="f898a92e-685f-43fa-985b-a46920f0650b",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -113,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"💡 Word timestamps are enabled! Watch the console for TTSTextFrame logs showing each word with its PTS."
)
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -66,16 +66,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
url="wss://us.api.gradium.ai/api/speech/tts",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,18 +59,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
model="mars-flash",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful voice assistant powered by Camb AI text-to-speech. ",
)
messages = [
{
"role": "system",
"content": "You are a helpful voice assistant powered by Camb AI text-to-speech. "
"Keep your responses concise and conversational since they will be spoken aloud. "
"Avoid special characters, emojis, or bullet points.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -101,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info("Client connected")
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -64,16 +64,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
api_key=os.getenv("HATHORA_API_KEY"),
model=None,
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = PiperTTSService(voice_id="en_US-ryan-high")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = KokoroTTSService(voice_id="af_heart")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -62,16 +62,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id=os.getenv("RESEMBLE_VOICE_UUID"),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -98,16 +98,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -141,7 +137,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected: {client}")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,7 +10,7 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -59,18 +59,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
},
]
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
)
hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +98,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frame(TTSSpeakFrame("Hi! If you want to talk to me, just say 'Hey Robot'"))
context.add_message(
{
"role": "system",
"content": "Please introduce yourself. Tell the user they should say 'Hey Robot' before talking to you.",
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):

View File

@@ -104,21 +104,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -114,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
size=image.size,
text=question,
)
messages.append(message)
context.add_message(message)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -114,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
size=image.size,
text=question,
)
messages.append(message)
context.add_message(message)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -63,16 +63,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Here we can't because AWS Bedrock doesn't support it for Claude 3.7,
# which we need for image input.
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -121,7 +115,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
size=image.size,
text=question,
)
messages.append(message)
context.add_message(message)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -56,16 +56,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -114,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
size=image.size,
text=question,
)
messages.append(message)
context.add_message(message)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -16,6 +16,7 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -49,6 +50,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
connection_params=AssemblyAIConnectionParams(
speech_model="u3-rt-pro",
),
)
tl = TranscriptionLogger()

View File

@@ -70,7 +70,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
@@ -110,14 +113,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

View File

@@ -72,10 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-7-sonnet-latest",
)
llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
llm.register_function("get_weather", get_weather)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)

View File

@@ -70,6 +70,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
@@ -96,14 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

View File

@@ -94,7 +94,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
# Anthropic for vision analysis
llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
)
llm.register_function("fetch_user_image", fetch_user_image)
@llm.event_handler("on_function_calls_started")
@@ -118,14 +121,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[fetch_image_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -162,7 +158,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
client_id = get_transport_client_id(transport, client)
# Kick off the conversation.
messages.append(
context.add_message(
{
"role": "system",
"content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",

View File

@@ -101,6 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Here we can't because AWS Bedrock doesn't support it for Claude 3.7,
# which we need for image input.
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
)
llm.register_function("fetch_user_image", fetch_user_image)
@@ -125,14 +126,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[fetch_image_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -169,7 +163,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
client_id = get_transport_client_id(transport, client)
# Kick off the conversation.
messages.append(
context.add_message(
{
"role": "system",
"content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",

View File

@@ -94,7 +94,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
# Google Gemini model for vision analysis
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
)
llm.register_function("fetch_user_image", fetch_user_image)
@llm.event_handler("on_function_calls_started")
@@ -118,14 +121,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[fetch_image_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -162,7 +158,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
client_id = get_transport_client_id(transport, client)
# Kick off the conversation.
messages.append(
context.add_message(
{
"role": "system",
"content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",

View File

@@ -124,7 +124,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
)
llm.register_function("fetch_user_image", fetch_user_image)
@llm.event_handler("on_function_calls_started")
@@ -148,14 +151,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[fetch_image_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -200,7 +196,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
client_id = get_transport_client_id(transport, client)
# Kick off the conversation.
messages.append(
context.add_message(
{
"role": "system",
"content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",

View File

@@ -94,7 +94,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
)
llm.register_function("fetch_user_image", fetch_user_image)
@llm.event_handler("on_function_calls_started")
@@ -118,14 +121,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
tools = ToolsSchema(standard_tools=[fetch_image_function])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are able to describe images from the user camera.",
},
]
context = LLMContext(messages, tools)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -161,7 +157,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
client_id = get_transport_client_id(transport, client)
# Kick off the conversation.
messages.append(
context.add_message(
{
"role": "system",
"content": f"Please introduce yourself to the user. Use '{client_id}' as the user ID during function calls.",

Some files were not shown because too many files have changed in this diff Show More