Compare commits

...

1517 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
ac5810bc0d Add changelog for #4129 2026-03-25 19:36:13 -07:00
Aleix Conchillo Flaqué
84f1e49712 Enable direct mode for filter processors
Enable direct mode on FrameFilter, IdentityFilter, and NullFilter so
they can bypass the async queue and process frames synchronously for
lower latency. Also pass **kwargs through to parent in FrameFilter.
2026-03-25 19:36:13 -07:00
Filipi da Silva Fuchter
2441c4f801 Merge pull request #4135 from pipecat-ai/filipi/audio_buffer
Fixed audio crackling and popping artifacts in AudioBufferProcessor
2026-03-25 15:40:17 -04:00
Mark Backman
a7a55dd30e Merge pull request #4136 from pipecat-ai/mb/bump-package-version-nvidia
Upgrade protobuf to 6.x for nvidia-riva-client compatibility
2026-03-25 15:27:48 -04:00
Mark Backman
de6a7223ba Suppress verbose gRPC C-core logging in nvidia services
Set GRPC_VERBOSITY=ERROR by default so users do not see noisy fork
handler and abseil warnings from the gRPC C library. Users can still
override by setting GRPC_VERBOSITY themselves.
2026-03-25 15:23:54 -04:00
Mark Backman
165932e1cc Add changelog for #4136 2026-03-25 15:23:54 -04:00
Mark Backman
1f0d9ad01a Upgrade protobuf to 6.x for nvidia-riva-client 2.25.1 compatibility
nvidia-riva-client 2.25.1 ships with gencode compiled against protobuf
6.31.1, which requires a runtime >= 6.31.1. Update protobuf from 5.29.6
to >=6.31.1,<7 and grpcio-tools from 1.67.1 to 1.78.0 to match.
Regenerate frames_pb2.py with the new compiler.
2026-03-25 15:23:53 -04:00
filipi87
413dbaf974 Automated tests to validate the silence injection guards. 2026-03-25 16:05:58 -03:00
filipi87
da3f184316 Automated tests to validate the silence injection guards. 2026-03-25 15:38:21 -03:00
filipi87
e5a2723632 Fixed audio crackling and popping artifacts in AudioBufferProcessor. 2026-03-25 15:29:50 -03:00
Mark Backman
4ee4002d5d Merge pull request #4137 from pipecat-ai/mb/language-string-log-level-debug
Downgrade unrecognized language string log from warning to debug
2026-03-25 12:26:46 -04:00
Mark Backman
ff5d055b3c Merge pull request #4031 from niczy/xai-tts-service
Add xAI TTS service
2026-03-25 10:57:08 -04:00
Mark Backman
adc003d6c7 Code review cleanup 2026-03-25 10:53:07 -04:00
Nicholas Zhao
bbd14de9c5 Address PR review: rename to XAIHttpTTSService, add language map, clean up API
- Rename XAITTSService → XAIHttpTTSService and XAITTSSettings → XAIHttpTTSSettings
- Add language_to_xai_language() with explicit LANGUAGE_MAP using resolve_language()
- Remove deprecated InputParams, params, voice, language init params
- Remove XAI_DEFAULT_SAMPLE_RATE and XAI_PCM_CODEC constants; add encoding param
- Set sample_rate=None default (picked up from PipelineParams or user)
- Use Language.EN enum instead of string "en" for default language
- Add changelog/4031.added.md
- Add 07e-interruptible-xai.py foundational example
- Update 14g-function-calling-grok.py to use XAIHttpTTSService
- Register 07e in run-release-evals.py
2026-03-25 10:46:54 -04:00
Nicholas Zhao
02b97035f8 Add xAI TTS service 2026-03-25 10:45:15 -04:00
Mark Backman
f470ff193e Update language tests to expect debug instead of warning 2026-03-25 10:26:10 -04:00
Mark Backman
7bc8b89a54 Add changelog for #4137 2026-03-25 10:21:44 -04:00
Mark Backman
a8eff6fbbf Downgrade unrecognized language string log from warning to debug
Service-specific language strings like Deepgram's "multi" are valid
pass-through values, not issues worth warning about.
2026-03-25 10:20:36 -04:00
kompfner
86e086c6b5 Merge pull request #4130 from pipecat-ai/pk/realtime-services-init-v-context-system-instructions-cleanup
Prefer init-provided system instructions in realtime services
2026-03-25 09:13:52 -04:00
Paul Kompfner
4bdfe1cf31 Add changelog for realtime system instruction preference change 2026-03-24 17:34:50 -04:00
Paul Kompfner
bb33045389 Add system instruction conflict resolution tests for realtime adapters
Test that OpenAI Realtime, Grok Realtime, and Nova Sonic adapters
prefer init-provided system_instruction over context-provided, warn
on conflicts, and don't warn for developer messages.
2026-03-24 17:30:35 -04:00
Paul Kompfner
ac2b1ecd47 Prefer init-provided system instruction in Grok Realtime
Add system_instruction parameter to the Grok Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Update the Grok Realtime example to use settings.system_instruction
instead of session_properties.instructions.
2026-03-24 17:29:19 -04:00
Paul Kompfner
e7dd84b552 Prefer init-provided system instruction in OpenAI Realtime
Add system_instruction parameter to the OpenAI Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.
2026-03-24 17:21:53 -04:00
Paul Kompfner
39329aaddb Prefer init-provided system instruction in Nova Sonic
Add system_instruction parameter to the Nova Sonic adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Remove the service-side fallback logic, as the adapter now handles
resolution.
2026-03-24 17:18:44 -04:00
Paul Kompfner
56a56a4174 Prefer init-provided system instruction in Gemini Live
Pass self._system_instruction_from_init to the adapter's
get_llm_invocation_params(), which calls _resolve_system_instruction()
to prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Also fix the reconnect check to only reconnect when the resolved
system instruction actually differs from what the initial connection
used, avoiding unnecessary reconnects.
2026-03-24 17:06:56 -04:00
kompfner
b80328e038 Merge pull request #4125 from pipecat-ai/pk/gemini-live-endframe-deferral-issue
Gemini Live: fix EndFrame-deferral hang
2026-03-24 17:02:46 -04:00
kompfner
3a80be760b Merge pull request #4089 from pipecat-ai/pk/system-and-developer-message-handling-update
Centralize system message handling in adapters; add developer message support
2026-03-24 16:24:11 -04:00
Paul Kompfner
e0c49927cf Remove hard-coded model overrides from Together and Groq examples
Prefer service defaults — the hard-coded models we were using are no
longer available on these providers.
2026-03-24 16:05:15 -04:00
Paul Kompfner
45926a7135 Update Together.ai default model to openai/gpt-oss-20b
The previous default (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) is
no longer available as a serverless Together.ai model and now requires a
custom deployment. The new default is openai/gpt-oss-20b, one of
Together's recommended models for small & fast use-cases.
2026-03-24 16:05:15 -04:00
Paul Kompfner
8c678c1c98 Set supports_developer_role = False for more OpenAI-compatible services
DeepSeek, Mistral, OLLama, Qwen, SambaNova, and Together don't support
the "developer" message role.
2026-03-24 16:05:15 -04:00
Paul Kompfner
4c121332cf Convert developer messages to user for Cerebras (and lay groundwork for other incompatible services)
OpenAI-compatible services that don't support the "developer" message
role can now set supports_developer_role = False on the service class.
BaseOpenAILLMService passes this as convert_developer_to_user to the
adapter, which converts developer messages to user messages before
sending them to the API. Applied to Cerebras and Perplexity.

Also removes the now-redundant developer→user conversion step from
PerplexityLLMAdapter (handled by the parent adapter via the flag).
2026-03-24 16:05:15 -04:00
Paul Kompfner
74686f9190 Add changelog for Gemini Live system_instruction fix 2026-03-24 16:05:15 -04:00
Paul Kompfner
19bcc8620c Fix Gemini Live not honoring settings.system_instruction
_system_instruction_from_init was being set from the deprecated
`system_instruction` constructor parameter instead of
`self._settings.system_instruction`, so system instructions provided
via settings were silently ignored.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0530722c58 Convert developer messages to user in Perplexity adapter
Perplexity doesn't support the "developer" role. Developer messages are
now converted to "user" before other transformations are applied.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0d1b834770 Add developer message support to realtime adapters
OpenAI Realtime, Grok Realtime, and AWS Nova Sonic adapters now convert
"developer" role messages to "user" (consistent with all other non-OpenAI
adapters). Previously these messages were silently dropped. Adds starter
unit tests for all three realtime adapters.
2026-03-24 16:05:15 -04:00
Paul Kompfner
7a0f7b58d1 Remove bit of unintentionally-left-in debugging logic 2026-03-24 16:05:15 -04:00
Paul Kompfner
5806a3f0fa Use "developer" role for remaining developer-intent messages in examples 2026-03-24 16:05:04 -04:00
Paul Kompfner
27fabfc1b3 Improve warning message wording and formatting 2026-03-24 16:02:42 -04:00
Paul Kompfner
d779a5b4ea Use "developer" role for programmatic conversation-kickoff messages
These messages are developer instructions to the assistant (e.g. "Please
introduce yourself to the user"), not simulated user input. The
"developer" role is semantically correct for this purpose.
2026-03-24 16:02:42 -04:00
Paul Kompfner
2bb36b5b66 Update changelog for developer message simplification 2026-03-24 16:02:42 -04:00
Paul Kompfner
e0bc9c73c6 Add Anthropic interruptible example (07e) and register in release evals 2026-03-24 16:02:42 -04:00
Paul Kompfner
2135557689 Simplify: don't promote developer messages to system instruction
Developer messages are now always converted to "user" in non-OpenAI
adapters, never promoted to the system instruction. This removes an
inconsistency where adding an unrelated message to context would change
whether a developer message got promoted.

Simplifications:
- Rename _extract_initial_system_or_developer → _extract_initial_system
- Return Optional[str] instead of Tuple (role is always "system")
- Drop initial_context_message_role from _resolve_system_instruction
- Drop system_role fields from all ConvertedMessages dataclasses
2026-03-24 16:02:42 -04:00
Paul Kompfner
a0393b9af6 Fix: warn on system_instruction conflict even with single system message
When the only message in context was a system message,
_extract_initial_system_or_developer would convert it to "user" (to
prevent empty history) without warning about the conflict with
system_instruction. Now warns inline before converting, with a message
explaining both the conflict and the user-role conversion.
2026-03-24 16:02:42 -04:00
Paul Kompfner
64ba013b68 Move OpenAI Responses adapter tests into test_get_llm_invocation_params.py
Consolidates all adapter get_llm_invocation_params tests in one file.
Adds new tests for developer message handling in the Responses adapter.
2026-03-24 16:02:42 -04:00
Paul Kompfner
7377d88cf5 Move system_instruction tests into test_get_llm_invocation_params.py 2026-03-24 16:02:42 -04:00
Paul Kompfner
3bbec0a2c8 Broaden docstring: all non-OpenAI providers need non-empty messages 2026-03-24 16:02:42 -04:00
Paul Kompfner
e29a63e1ae Improve _extract_initial_system_or_developer docstring clarity 2026-03-24 16:02:42 -04:00
Paul Kompfner
45178972d7 Fix stale docstring in PerplexityLLMAdapter 2026-03-24 16:02:42 -04:00
Paul Kompfner
bb7199d143 Add changelog entries for #4089 2026-03-24 16:02:42 -04:00
Paul Kompfner
d4dea30407 Centralize system message handling in adapters; add developer message support
Two goals:

1. Centralize system_instruction vs context system message resolution into
   the LLM adapters. This eliminates duplication between in-pipeline and
   out-of-band (run_inference) code paths across ~16 locations in service
   llm.py files.

2. Add support for "developer" role messages in conversation context, which
   is facilitated by the above centralization.

Shared helpers on BaseLLMAdapter:
- _extract_initial_system_or_developer: extracts/converts messages[0]
  based on role and whether system_instruction is provided
- _resolve_system_instruction: warns on conflicts between system_instruction
  and context system messages, returns the effective instruction

Developer message handling (new):
- Non-OpenAI adapters: an initial "developer" message is promoted to the
  system instruction when no system_instruction is provided; otherwise it
  is converted to "user". Subsequent "developer" messages are always
  converted to "user". No conflict warning is emitted for developer
  messages (unlike "system" messages).
- OpenAI adapter: "developer" messages pass through in conversation
  history without triggering conflict warnings.
- OpenAI Responses adapter: "developer" messages are kept as "developer"
  role (same as "system", which is also converted to "developer" for the
  Responses API).

Other behavior changes:
- Gemini: "initial" system message detection now checks messages[0] only
  (previously searched anywhere in the list)
- Bedrock: a lone system message is now converted to "user" instead of
  being extracted to an empty message list (matches existing Anthropic
  behavior)
2026-03-24 16:02:42 -04:00
Mark Backman
b49bf1c83f Merge pull request #4127 from pipecat-ai/mb/tts-text-frame-ordering
Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
2026-03-24 15:39:06 -04:00
Mark Backman
1b0f7ecb0e Merge pull request #4126 from pipecat-ai/mb/fix-tts-flush-phantom-contexts
Fix TTS flush creating phantom contexts on ElevenLabs
2026-03-24 15:33:58 -04:00
Mark Backman
8e57dd67a2 Add changelog for #4127 2026-03-24 15:10:48 -04:00
Mark Backman
5d71de8aad Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
Route LLMFullResponseEndFrame through the serialization queue instead
of pushing it directly downstream when push_text_frames is enabled.
This ensures the frame is emitted only after the audio context is
fully drained, preserving correct ordering relative to TTSTextFrames.

Previously, the final sentence TTSTextFrame would arrive at the
LLMAssistantAggregator after LLMFullResponseEndFrame, causing it to
be dropped from the conversation context (especially with RTVI text
input where no subsequent interruption would flush the orphaned text).
2026-03-24 15:09:42 -04:00
Paul Kompfner
dc56cb2ccc Gemini Live: reset _bot_is_responding when releasing deferred EndFrame
Without this, the released EndFrame re-enters process_frame, sees
_bot_is_responding is still True, defers again, and loops indefinitely.
2026-03-24 15:01:07 -04:00
Paul Kompfner
063955b7eb Gemini Live: clean up EndFrame deferral state on disconnect
Cancel the deferral timeout task and clear the pending EndFrame during
disconnect, which could otherwise be left dangling after a
CancelFrame-triggered shutdown.
2026-03-24 14:30:14 -04:00
Mark Backman
e05bd54743 Add changelog for #4126 2026-03-24 13:43:07 -04:00
Mark Backman
35f52f70ab Fix TTS flush creating phantom contexts on providers like ElevenLabs
When an interruption arrives before any LLM text reaches run_tts, the
turn context ID exists but was never registered via create_audio_context.
Calling flush_audio for this unregistered context sends a message to the
provider (e.g. ElevenLabs) with a context_id it has never seen, which
implicitly creates a server-side context that is never closed. After
enough rapid interruptions these phantom contexts accumulate and exceed
the providers limit (ElevenLabs: 5 simultaneous contexts, 1008 policy
violation).

Guard the flush call with audio_context_available so it only fires when
the context was actually opened.

Fixes #4114
2026-03-24 13:42:01 -04:00
Paul Kompfner
d05eb02b98 Add changelog for #4125 2026-03-24 12:54:50 -04:00
Paul Kompfner
4abd4d031d Gemini Live: add safety timeout to EndFrame deferral to prevent indefinite pipeline hang
When an EndFrame arrives while the bot is mid-response, it is deferred
until turn_complete is received. If turn_complete never arrives, the
EndFrame gets stuck forever and the pipeline hangs indefinitely.

Add a 30-second timeout: if turn_complete hasn't arrived by then, the
deferred EndFrame is released anyway with a warning log. The timeout
is cancelled if turn_complete arrives normally.
2026-03-24 12:50:07 -04:00
Paul Kompfner
7e42998e9e Gemini Live: fix potential EndFrame-deferral hang by handling turn_complete without usage_metadata
We observed a case where a deferred EndFrame was never released in
Gemini Live, causing the pipeline to hang indefinitely. The EndFrame
deferral mechanism waits for _handle_msg_turn_complete to set
_bot_is_responding back to False, but turn_complete messages were only
processed if they also contained usage_metadata. If Gemini ever sent
turn_complete without usage_metadata, the message would be silently
dropped and the deferred EndFrame would never be released.

Now turn_complete is always handled regardless of usage_metadata
presence, with usage_metadata processing only when available.

Note: we have not actually observed a turn_complete without
usage_metadata in practice, so this is a theoretical fix for the
EndFrame-deferral hang. The actual root cause of the observed hang
may lie elsewhere.
2026-03-24 12:32:14 -04:00
Filipi da Silva Fuchter
28eb4544d3 Merge pull request #4122 from pipecat-ai/filipi/inworld_follow_up
Invoking on_turn_context_created when we receive a TTSSpeakFrame.
2026-03-24 12:28:00 -04:00
Filipi da Silva Fuchter
b45dcb1ae0 Merge pull request #4028 from inworld-ai/ian/close-on-turn-complete
fix(inworld): close context at end of turn instead of relying on idle timeout
2026-03-24 12:07:51 -04:00
Mark Backman
6eb988b729 Merge pull request #4092 from harshitajain165/harshita/smallest-tts-only
Add Smallest AI TTS service integration
2026-03-24 11:54:34 -04:00
Mark Backman
f68b3222b3 Fix SmallestTTSService to use InterruptibleTTSService audio context system
- Route audio through audio contexts (append_to_audio_context) instead of
  pushing frames directly, enabling proper turn management and interruptions
- Add push_stop_frames and push_start_frame so the base class handles
  TTSStartedFrame/TTSStoppedFrame lifecycle
- Remove manual context_id tracking (self._context_id) in favor of
  get_active_audio_context_id()
- Don't call remove_audio_context on "complete" — Smallest sends one
  per request, not per turn; let the base class timeout handle cleanup
- Guard v2-only params (consistency, similarity, enhancement) so they
  aren't sent to lightning-v3.1
- Remove request_id from request payload (not a documented request field)
- Add flush_audio override to send flush to WebSocket
2026-03-24 11:46:28 -04:00
filipi87
3274235ea1 Adding missing changelog entry. 2026-03-24 12:42:56 -03:00
filipi87
05b9c514fb Invoking TTSSpeakFrame when we receive a TTSSpeakFrame. 2026-03-24 12:39:28 -03:00
Filipi da Silva Fuchter
03c0d7c345 Merge pull request #4013 from inworld-ai/ian/prewarm-context-inworld-v2
[inworld] Pre-open WebSocket TTS context on LLM response start
2026-03-24 11:37:28 -04:00
Filipi da Silva Fuchter
0783edb185 Merge pull request #4120 from pipecat-ai/filipi/krisp-viva-vad-support
Added cleanup() method to VADAnalyzer base class
2026-03-24 11:26:53 -04:00
Mark Backman
51d28b4a9f Code review fixes 2026-03-24 11:21:04 -04:00
kompfner
cf083b8411 Merge pull request #4078 from pipecat-ai/cb/gemini-updates
Updates for Gemini Live
2026-03-24 11:18:00 -04:00
Harshita Jain
099814d74a Add Smallest AI TTS service integration
Adds SmallestTTSService, a WebSocket-based TTS service using Smallest AI's
Lightning v3.1 model. Follows current Pipecat service conventions:

- SmallestTTSSettings dataclass with runtime-updatable settings (voice,
  language, speed, etc.)
- Reconnects on model change; keepalive every 30s to prevent idle timeout
- TTS settings default to None so the API applies its own defaults
- Model enum: SmallestTTSModel.LIGHTNING_V3_1

Includes a foundational example (07zl-interruptible-smallest.py) using
Deepgram STT + Smallest TTS + OpenAI LLM.

STT integration will follow in a separate PR once the hallucination/finalize
behaviour is resolved.

Made-with: Cursor
2026-03-24 11:11:10 -04:00
Mark Backman
dd45843c42 Merge pull request #4117 from m-ods/feat/assemblyai-domain-param
feat(assemblyai): add domain parameter for Medical Mode
2026-03-24 11:02:01 -04:00
Mark Backman
fe15d8654b Add changelog for #4117 2026-03-24 10:57:55 -04:00
Paul Kompfner
68a440ae2e Move inference_on_context_initialization comment to constructor level 2026-03-24 10:49:45 -04:00
Paul Kompfner
8109ab6135 Further tweaks and improvements to Gemini 3 support in Gemini Live
Gets Gemini 3 support to the point where it works with:
- The "legacy" pattern from the previous (removed) 26- example
- inference_on_context_initialization=True (the default)
- inference_on_context_initialization=False
2026-03-24 10:45:41 -04:00
Filipi da Silva Fuchter
f311a0b6e4 Merge pull request #4084 from pipecat-ai/filipi/refactor_stop_frame
Refactoring the way we automatically push TTSStoppedFrame.
2026-03-24 10:06:02 -04:00
filipi87
9df8985d60 Refactoring the way we automatically push TTSStoppedFrame. 2026-03-24 11:00:06 -03:00
filipi87
b3a25e0ebe Adding changelog entry for cleanup method. 2026-03-24 10:53:07 -03:00
filipi87
02cfb129d3 Invoke cleanup method on VAD analyzer. 2026-03-24 10:49:14 -03:00
filipi87
311afef7da Fixing Krisp Viva example. 2026-03-24 10:48:22 -03:00
Filipi da Silva Fuchter
5ed183d215 Merge pull request #4022 from krispai/krisp-viva-vad-support
Draft Implementation for Krisp VIVA VAD.
2026-03-24 09:44:32 -04:00
Mark Backman
5c3d3aea2b Merge pull request #4115 from pipecat-ai/mb/user-turn-stop-warnings
Warn when VAD stop_secs misconfiguration may degrade turn detection
2026-03-24 09:32:20 -04:00
Mark Backman
0651569a4e Merge pull request #4119 from Alex-wuhu/novita-integration
feat: add Novita AI as LLM provider
2026-03-24 09:30:05 -04:00
Mark Backman
bf04ea2043 Add changelog for #4119 2026-03-24 09:23:40 -04:00
Mark Backman
aa0b49d69f Code review fixes 2026-03-24 09:22:08 -04:00
Alex-wuhu
8c6f4a8d7b Add Novita AI LLM service provider 2026-03-24 09:20:50 -04:00
Mark Backman
bbaa5971c4 Merge pull request #3981 from dhruvladia-sarvam/feat/sarvam-llm-integration
Sarvam LLM Integration
2026-03-24 09:00:01 -04:00
Mark Backman
cdd8c3e5bb Fix examples 2026-03-24 08:53:56 -04:00
Mark Backman
1c8a8f51d4 Code review fixes 2026-03-24 08:46:03 -04:00
dhruvladia-sarvam
349b8645f3 Merge branch 'main' into feat/sarvam-llm-integration 2026-03-24 16:34:12 +05:30
dhruvladia-sarvam
696196e30c alignment with pr 4081 2026-03-24 16:29:58 +05:30
Garegin Harutyunyan
dacffccd3a fixed runtime issue. 2026-03-24 12:56:19 +04:00
Martin Schweiger
f21b262969 feat(assemblyai): add domain parameter for Medical Mode
Add `domain` field to AssemblyAISTTSettings to support AssemblyAI's
streaming API `domain` query parameter, enabling specialized recognition
modes like Medical Mode (`medical-v1`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 13:09:42 +08:00
Aleix Conchillo Flaqué
7414b30308 Merge pull request #4116 from pipecat-ai/changelog-0.0.107
Release 0.0.107 - Changelog Update
2026-03-23 20:13:49 -07:00
aconchillo
3268cb93d5 Update changelog for version 0.0.107 2026-03-23 20:11:31 -07:00
Aleix Conchillo Flaque
9211379720 update uv.lock 2026-03-23 20:06:28 -07:00
Mark Backman
42cab7eea0 Add changelog entry for #4115 2026-03-23 18:01:04 -04:00
Mark Backman
483b643b07 Warn when VAD stop_secs misconfiguration may degrade turn detection
Add warnings in SpeechTimeoutUserTurnStopStrategy and
TurnAnalyzerUserTurnStopStrategy when stop_secs differs from the
recommended default (0.2s) or when stop_secs >= STT p99 latency,
which collapses the STT wait timeout to 0s. Document the stop_secs=0.2
assumption in stt_latency.py.
2026-03-23 17:57:51 -04:00
Filipi da Silva Fuchter
12dc429761 Merge pull request #4104 from pipecat-ai/filipi/audio_issue
Allow defining whether to insert silence in the output transport.
2026-03-23 17:17:37 -04:00
filipi87
066b206b3d Renaming insert_silence to auto_silence 2026-03-23 18:12:26 -03:00
filipi87
ddd1b71b56 Renaming audio_out_insert_silence to audio_out_auto_silence 2026-03-23 17:57:42 -03:00
filipi87
8612c9f50a Updating to use daily-python 0.27.0 2026-03-23 17:52:41 -03:00
Mark Backman
d314e2831a Simplify 26 name, update evals 2026-03-23 15:46:13 -04:00
Mark Backman
fd0bfe141f Merge pull request #4109 from pipecat-ai/pk/tiny-fix 2026-03-23 15:17:19 -04:00
filipi87
3042929989 Fixing changelog description. 2026-03-23 15:57:25 -03:00
dhruvladia-sarvam
0f6cc231cf removing error wraps and model validation check 2026-03-24 00:06:15 +05:30
Chad Bailey
844555c520 removed old Gemini Live example 2026-03-23 18:31:36 +00:00
dhruvladia-sarvam
3428a4c6ad fixes post PR 4081 2026-03-23 23:45:27 +05:30
Mark Backman
f283cc5bc6 Merge pull request #4091 from pipecat-ai/mb/gradium-multiplexing-setup
feat: send per-context setup in Gradium TTS multiplexing
2026-03-23 12:00:53 -04:00
Mark Backman
70552d7697 Add changelog entry for #4091 2026-03-23 11:58:14 -04:00
Mark Backman
84c2a24c9f feat: send per-context setup messages in Gradium TTS multiplexing
Send a setup message with client_req_id before the first text message
for each context, matching Gradium multiplexing protocol. This allows
Gradium to associate each session with its setup configuration when
using close_ws_on_eos=False.
2026-03-23 11:58:14 -04:00
Garegin Harutyunyan
f8c7414ea7 format fix. 2026-03-23 18:58:19 +04:00
Garegin Harutyunyan
f1f51de962 Merge branch 'main' into krisp-viva-vad-support 2026-03-23 18:35:58 +04:00
Paul Kompfner
e93b0ace06 Remove an unnecessary check in SyncParallelPipeline 2026-03-23 10:00:32 -04:00
Garegin Harutyunyan
c32240e14b Fixed review comments. 2026-03-23 17:44:48 +04:00
filipi87
e6602f9244 Disabling auto_silence for tavus video service. 2026-03-22 18:28:57 -03:00
filipi87
9a30b18f21 Configuring Daily CustomAudioSource to automatically inject silence or not. 2026-03-22 17:29:01 -03:00
filipi87
936a39f4a1 Updating tavus examples to not send silence. 2026-03-22 14:41:23 -03:00
filipi87
3b1cb30926 Adding changelog entry. 2026-03-22 13:26:00 -03:00
filipi87
ce36487143 Allow defining whether to insert silence in the output transport. 2026-03-22 13:09:09 -03:00
Mark Backman
ec3bd8c5b1 Merge pull request #4097 from pipecat-ai/mb/update-minimax-docs-link
Update MiniMaxHttpTTSService platform docs link
2026-03-21 07:08:40 -04:00
Mark Backman
622ebd5d74 Update MiniMaxHttpTTSService platform docs link 2026-03-21 07:02:06 -04:00
Mark Backman
a9a1941a45 Merge pull request #4093 from poislagarde/fix/genesys-pong-parameters 2026-03-20 19:52:58 -04:00
Pablo Ois Lagarde
53e0136366 chore: rename changelog fragment to PR #4093
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:46:35 -03:00
Pablo Ois Lagarde
bc0e7130b8 fix: always include parameters field in Genesys AudioHook messages
The AudioHook protocol requires every message to carry a `parameters`
object. `_create_message` conditionally included it only when parameters
were truthy, so pong responses and closed responses without
outputVariables were sent without the field.

Clients that validate message structure (including the Genesys reference
implementation) rejected these messages, which broke server sequence
tracking and prevented outputVariables from reaching the Architect flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:37:53 -03:00
Mark Backman
d8af4447ff Merge pull request #3449 from kingster/telemetry-fix-system-message
fix: Record correct system_instruction in LLM spans for LLM services
2026-03-20 13:42:47 -04:00
Mark Backman
c89e366739 refactor: align tracing attributes with OpenTelemetry GenAI conventions
- gen_ai.system -> gen_ai.provider.name (deprecated)
- system / system_instructions -> gen_ai.system_instructions
- gen_ai.usage.cache_read_input_tokens -> gen_ai.usage.cache_read.input_tokens
- gen_ai.usage.cache_creation_input_tokens -> gen_ai.usage.cache_creation.input_tokens
2026-03-20 13:36:20 -04:00
Ian Lee
e9f3086ea3 add on_turn_context_created hook instead 2026-03-20 10:33:25 -07:00
Mark Backman
b5c362d6e6 refactor: rename tracing span attribute "system" to "system_instructions"
Align with the OpenTelemetry GenAI semantic convention
gen_ai.system_instructions for system prompts. The old "system"
attribute name was unrelated to gen_ai.system (which is for
provider name).
2026-03-20 13:20:03 -04:00
Mark Backman
e5aaa4c4eb fix: read system_instruction from _settings instead of removed attribute
Replace adapter-based extraction in traced_llm with direct reads from
_settings.system_instruction (priority) and context messages (fallback).
The old approach had three bugs: signature mismatch with Anthropic
adapter, key name inconsistency, and unnecessary overhead from full
message/tools conversion.

Also deduplicate the system instruction in spans -- it was appearing as
both "system" and "param.system_instruction".
2026-03-20 13:12:40 -04:00
Varun Singh
a12ad27348 enable_dialout should not depend on sip_caller_phone being set (#4087)
* enable_dialout should not depend on SIP being set

* we still need room_prefix to have pipecat-sip, s/sip/telephony in room prefix
2026-03-20 10:01:31 -07:00
Mark Backman
44504efdc7 Merge pull request #4090 from pipecat-ai/mb/fix-tts-audio-context-routing
fix: route TTS audio through audio context queue in Fish, LMNT, Neuph…
2026-03-20 11:06:53 -04:00
Mark Backman
da8070e98e Add changelog entry for #4090 2026-03-20 10:46:36 -04:00
Mark Backman
b98ad7fb64 fix: route TTS audio through audio context queue in Fish, LMNT, Neuphonic, Rime NonJson
These services were pushing audio frames directly via push_frame() in their
WebSocket receive loops, bypassing the base TTSService audio context
serialization queue. This causes incorrect frame ordering and broken
interruption handling.

Changes per service:
- Fish Audio: use append_to_audio_context(), replace _handle_interruption
  with on_audio_context_interrupted()
- LMNT: use append_to_audio_context(), remove redundant push_frame override
- Neuphonic: use append_to_audio_context(), remove redundant push_frame and
  process_frame overrides (base class handles pause/resume)
- Rime NonJson: use append_to_audio_context(), remove redundant push_frame
  override
2026-03-20 10:41:43 -04:00
Mark Backman
10ddf45015 Merge pull request #4088 from pipecat-ai/mb/add-community-integrations-README
Add community integrations to README
2026-03-20 10:34:09 -04:00
Filipi da Silva Fuchter
e41cb2cd0c Merge pull request #4083 from pipecat-ai/filipi/deepgram_sagemaker_tts_improvements
Improvements to DeepgramSageMakerTTSService
2026-03-20 10:30:48 -04:00
Filipi da Silva Fuchter
a69abcc67a Merge pull request #4082 from pipecat-ai/filipi/sarvam_tts_improvements
Improvements to SarvamTTSService.
2026-03-20 10:28:02 -04:00
Mark Backman
a11c48d5b0 Add community integrations to README 2026-03-20 10:09:58 -04:00
Kinshuk Bairagi
7caec9018b Merge branch 'pipecat-ai:main' into telemetry-fix-system-message 2026-03-20 18:36:31 +05:30
kompfner
08052d8880 Merge pull request #4085 from pipecat-ai/pk/remove-broken-05a-example
Remove 05a example, which was broken and isn't currently a priority t…
2026-03-19 15:59:39 -04:00
Paul Kompfner
4c456ada04 Remove 05a example, which was broken and isn't currently a priority to fix 2026-03-19 15:52:48 -04:00
kompfner
488dc1d07e Merge pull request #4074 from pipecat-ai/pk/openai-responses-llm-service
feat: add OpenAI Responses API LLM service
2026-03-19 15:44:26 -04:00
Paul Kompfner
dafbb2eb66 fix: typo "conversatione" → "conversation" in 20- examples 2026-03-19 15:38:38 -04:00
Paul Kompfner
ea1534f9f8 docs: note input_audio coming soon, no conversion needed
The LLMContext format already matches the expected Responses API
shape for input_audio, so no adapter conversion will be needed
once OpenAI enables support.
2026-03-19 15:36:23 -04:00
kompfner
f6e7599e49 Merge pull request #4029 from pipecat-ai/pk/sync-parallel-pipeline-fixes
`SyncParallelPipeline` and related fixes
2026-03-19 14:41:16 -04:00
Paul Kompfner
6424c36666 refactor: remove model init param from OpenAIResponsesLLMService
Model is only configurable via settings, matching the canonical API.
2026-03-19 14:38:01 -04:00
Paul Kompfner
05e344b9ec docs: port _closing comments from BaseOpenAILLMService 2026-03-19 14:30:34 -04:00
Paul Kompfner
4ec7be8850 feat: include cached_tokens and reasoning_tokens in usage metrics 2026-03-19 14:23:39 -04:00
Paul Kompfner
0533ea7b7f refactor: use direct attribute access for typed stream events
Replace getattr() calls with direct attribute access and isinstance()
checks on the strongly-typed OpenAI SDK event models.
2026-03-19 14:19:10 -04:00
Paul Kompfner
a3431d3b01 fix: prefer _full_model_name over _settings.model in tracing
The API-provided full model name is more specific than the
user-provided model name (e.g. includes version/snapshot details).
Reorder the lookup in _get_model_name and add a comment where the
Responses service sets the field.
2026-03-19 13:58:35 -04:00
Paul Kompfner
348df9d4ce fix: remove redundant instructions override in run_inference
The override would re-add `instructions` after the adapter had
intentionally converted it to a developer message for empty contexts.
Added a regression test.
2026-03-19 13:34:41 -04:00
Filipi da Silva Fuchter
a9256ebc35 Merge pull request #4075 from pipecat-ai/filipi/tts_frame_order
Fixing TTS frame order
2026-03-19 13:30:28 -04:00
filipi87
a0f311158d Changelog entry for the DeepgramSageMakerTTSService improvements. 2026-03-19 11:46:49 -03:00
filipi87
d3ca034c4f Routing the audio through the audio context queue. 2026-03-19 11:40:43 -03:00
filipi87
39425a675a Improvements to DeepgramSageMakerTTSService. 2026-03-19 11:32:56 -03:00
filipi87
c4d1b89049 Adding changelog entry for the Sarvam fixes. 2026-03-19 11:17:39 -03:00
filipi87
fd8c6c88bb Improvements to SarvamTTSService. 2026-03-19 11:13:17 -03:00
Paul Kompfner
57fd29f0c4 Remove changelog fragment that no longer applies after a rebase 2026-03-19 09:57:26 -04:00
Paul Kompfner
06f7da44f1 Clarify SyncParallelPipeline docstrings
Rewrite docstrings to more clearly explain what SyncParallelPipeline
does: hold all output until every parallel branch finishes, so frames
produced in response to a single input are released together.
2026-03-19 09:43:51 -04:00
Paul Kompfner
d702ebd6a2 Add frame_order parameter to SyncParallelPipeline
Adds a FrameOrder enum with ARRIVAL (default, existing behavior) and
PIPELINE (pushes frames in pipeline definition order). This lets callers
guarantee output ordering between parallel pipelines — e.g. ensuring
image frames precede audio frames — without needing a separate reordering
processor downstream.

Updates the 05-sync-speech-and-image example to use FrameOrder.PIPELINE,
removing the ImageBeforeAudioReorderer class entirely.
2026-03-19 09:43:51 -04:00
Paul Kompfner
26fc238eb7 Add changelog entry for Whisker debugger fix 2026-03-19 09:43:51 -04:00
Paul Kompfner
61ff53f2b9 Add changelog entries for PR #4029 2026-03-19 09:43:51 -04:00
Paul Kompfner
5e7639812a Add ImageBeforeAudioReorderer to sync-speech-and-image example
Add a processor after SyncParallelPipeline that ensures each image frame
precedes its corresponding TTS audio frames. SyncParallelPipeline batches
them together but doesn't guarantee branch ordering. The reorderer detects
when TTS frames arrive before their image (via context_id tracking) and
holds them until the image arrives.

Also rename ImageAudioSync to MarkImageForPlaybackSync for clarity.
2026-03-19 09:43:51 -04:00
Paul Kompfner
ba779f920f Revert a couple of logs that were changed from trace to debug just for debugging 2026-03-19 09:43:51 -04:00
Paul Kompfner
c3d6e965d8 Use TextAggregationMode.TOKEN in the 05-sync-speech-and-image
example since the SentenceAggregator already provides complete sentences.
2026-03-19 09:43:37 -04:00
Paul Kompfner
0f1ff16af1 Add sync_with_audio support for OutputImageRawFrame
Add a `sync_with_audio` field to `OutputImageRawFrame` that routes image
frames through the audio queue in the output transport, ensuring images
are only displayed after all preceding audio has been sent. This enables
proper audio/image synchronization in pipelines like the calendar month
narration example.

Update the 05-sync-speech-and-image example to use an `ImageAudioSync`
processor that sets this flag on image frames.
2026-03-19 09:41:21 -04:00
Paul Kompfner
1ede8460a2 Fix SyncParallelPipeline race condition with concurrent SystemFrame processing
The FrameProcessor two-queue architecture processes SystemFrames and
non-SystemFrames on separate concurrent async tasks. Both paths called
SyncParallelPipeline.process_frame(), which used the same per-pipeline
sink queues. A SystemFrame's wait_for_sync could steal frames from a
concurrent non-SystemFrame's wait_for_sync, corrupting synchronization
and stalling the pipeline.

This was triggered by the auto-embedded RTVI processor (added in
v0.0.101) which floods OutputTransportMessageUrgentFrame SystemFrames
through the pipeline during LLM responses.

Fix: SystemFrames (except EndFrame) now take a fast path — passed
through internal pipelines and pushed downstream directly without
touching the sink queues or drain logic. EndFrame retains the full
drain behavior as a lifecycle frame.
2026-03-19 09:41:21 -04:00
Paul Kompfner
463db59bb5 Minor comment typo fix 2026-03-19 09:41:21 -04:00
Paul Kompfner
0be4084683 Fix bug resulting in SyncParallelPipeline breaking the Whisker debugger 2026-03-19 09:41:21 -04:00
filipi87
8f6dfc4777 Mentioning the frame order fix in the changelog. 2026-03-19 10:26:58 -03:00
filipi87
6841c0719b Always appending TTSTextFrame to the audio context. 2026-03-19 10:12:01 -03:00
filipi87
2836b1ea7e Fixing the frame ordering of the AggregatedTextFrame. 2026-03-19 10:07:25 -03:00
filipi87
5fd98e1391 Fixing TTS frame order. 2026-03-19 09:43:40 -03:00
Mark Backman
ef419cd87a Merge pull request #4073 from joachimchauvet/fix/livekit-mixer-invalidstate-log-spam
Suppress InvalidState log spam from audio mixer during interruptions in LiveKit transport
2026-03-19 08:39:42 -04:00
Aleix Conchillo Flaqué
8750c26cdc Merge pull request #4080 from pipecat-ai/changelog-0.0.106
Release 0.0.106 - Changelog Update
2026-03-18 23:39:22 -07:00
aconchillo
3e0c536fe7 Update changelog for version 0.0.106 2026-03-18 23:36:18 -07:00
Aleix Conchillo Flaqué
7ee5fa9e20 Merge pull request #4079 from pipecat-ai/aleix/fix-tavus-dtmf-callback
Add missing on_dtmf_event callback to Tavus transport
2026-03-18 21:47:28 -07:00
Aleix Conchillo Flaqué
7dfcaf8096 Add missing on_dtmf_event callback to Tavus transport
The on_dtmf_event callback was added to DailyCallbacks in #4047 but
the Tavus transport was not updated, causing a missing argument error.
2026-03-18 21:46:06 -07:00
Chad Bailey
05157129e2 added changelog 2026-03-18 23:31:18 +00:00
Chad Bailey
4a0411cbc4 disabled single responses for gemini 3 live models 2026-03-18 23:23:45 +00:00
Chad Bailey
6cd39b8b42 updates 2026-03-18 23:04:22 +00:00
Chad Bailey
38d7882f0f updated context seeding to allow gemini 3.1 to greet the user 2026-03-18 21:28:17 +00:00
Filipi da Silva Fuchter
4aea7784c9 Fixed the ordering of _maybe_pause_frame_processing call in TTSService (#4071)
* Fixing the invocation of pause_frame_processing at the correct time when receiving LLMFullResponseEndFrame and EndFrame.
2026-03-18 16:55:59 -04:00
Mark Backman
bad10177d4 Add WakePhraseUserTurnStartStrategy (#4064)
- Add WakePhraseUserTurnStartStrategy for gating interaction behind wake                                                                            
  phrase detection, with timeout and single_activation modes                                                                                        
- Add default_user_turn_start_strategies() and                                                                                                      
  default_user_turn_stop_strategies() helper functions                                                                                              
- Deprecate WakeCheckFilter in favor of the new strategy
- Extend ProcessFrameResult to stop strategies for short-circuit evaluation
- Fix MinWordsUserTurnStartStrategy including filtered text in output
2026-03-18 16:47:17 -04:00
Mark Backman
c4be513044 Improvements for Nova Sonic LLM and TTS output frames (#4042)
* Fix empty user transcription causing spurious interruption in Nova Sonic

Skip _report_user_transcription_ended() when _user_text_buffer is empty,
which happens when the initial prompt is text-only. Previously, an empty
TranscriptionFrame was pushed upstream, triggering a chain reaction:
on_user_turn_stopped → UserStartedSpeakingFrame → interruption →
premature BotStoppedSpeaking → multiple response start/stop cycles.

* Improve TextFrame and assistant end of turn logic

Now, SPECULATIVE text results are used to push the LLMTextFrame,
AggregatedTextFrame, and TTSTextFrame. Additionally, the TTSTextFrames
are push at the end of the corresponding audio segment. 

* Remove BotStoppedSpeakingFrame fallback from Nova Sonic

Now that assistant response end is detected directly from Nova Sonic
contentEnd events (END_TURN and INTERRUPTED), the BotStoppedSpeakingFrame
handler is no longer needed. Inline the cleanup logic in reset_conversation.
2026-03-18 16:04:12 -04:00
Mark Backman
4b704e6d3a GradiumSTTService improvements (#4066)
* Remove duplicate reconnection logic from Gradium STT

The _receive_messages method had its own while-True reconnect loop,
duplicating the reconnection handling already provided by
WebsocketService._receive_task_handler (exponential backoff, max
retries, error reporting). Flatten to just the inner message loop
and let the base class handle reconnection.

* Align Gradium STT VAD handling with base class patterns

Replace the process_frame override with a _handle_vad_user_stopped_speaking
override, which is the proper hook provided by STTService. Move
start_processing_metrics() into run_stt (matching Gladia's pattern).
Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

* Add transcript aggregation delay after flushed to capture trailing tokens

Gradium flushed response can arrive before all text tokens have been
delivered. Instead of finalizing immediately on flushed, start a short
timer (100ms) that allows trailing tokens to accumulate before pushing
the final TranscriptionFrame.

* Add changelog for PR #4066

* Change default encoding to pcm_16000

* Decouple encoding from sample_rate in Gradium STT

The encoding parameter now takes just the base type (pcm, wav, opus)
and the sample rate is derived from the pipeline audio_in_sample_rate,
assembled dynamically via input_format_from_encoding(). This fixes the
mismatch where SAMPLE_RATE=24000 was passed to the base class while
encoding defaulted to pcm_16000.
2026-03-18 15:57:34 -04:00
Paul Kompfner
b1a8588209 feat: add 12- and 14d- image/video examples for OpenAI Responses 2026-03-18 15:39:06 -04:00
Paul Kompfner
5de794e1da feat: add service_tier support to OpenAIResponsesLLMService 2026-03-18 15:29:04 -04:00
Paul Kompfner
891966346c feat: add 55zi update-settings example for OpenAI Responses 2026-03-18 15:17:16 -04:00
Paul Kompfner
2001ab4577 feat: add 20a persistent context example for OpenAI Responses 2026-03-18 15:14:28 -04:00
Paul Kompfner
0449df828c chore: update previous_response_id comment 2026-03-18 15:07:10 -04:00
Paul Kompfner
951bb0c1a7 feat: set store=False and add run_inference tests
Set store=False in Responses API calls since we send full conversation
history as input items and don't use previous_response_id.

Add 5 run_inference tests for OpenAIResponsesLLMService using real
LLMContext and adapter (only HTTP client mocked).
2026-03-18 14:47:12 -04:00
Paul Kompfner
21b1812c71 chore: add note about previous_response_id and empty input handling 2026-03-18 14:26:51 -04:00
Paul Kompfner
c4f21ef76b test: add run_inference tests for OpenAIResponsesLLMService
Uses real LLMContext and adapter (only HTTP client is mocked) to test
basic inference, client exception propagation, system_instruction
override, empty context fallback, and max_tokens override.
2026-03-18 14:17:21 -04:00
Paul Kompfner
a7167ad121 test: add run_inference tests for OpenAIResponsesLLMService
Tests cover basic inference, client exception propagation,
system_instruction override, and max_tokens override.
2026-03-18 14:09:17 -04:00
Paul Kompfner
eaccb96454 docs: add changelog for OpenAI Responses API service 2026-03-18 11:46:49 -04:00
Paul Kompfner
45186cc4ce feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-18 11:45:23 -04:00
joachimchauvet
0378fb0d91 fix(livekit): suppress InvalidState log spam from audio mixer during interruptions 2026-03-18 16:04:42 +02:00
Mark Backman
53388e0426 Merge pull request #4063 from pipecat-ai/mb/wake-word-start-strategy 2026-03-17 21:05:10 -04:00
Mark Backman
edf16c5533 fix: pass list-type Deepgram settings as lists instead of stringifying
List-valued settings like keyterm, keywords, search, redact, and replace
were being converted to strings before being passed to the SDK connect()
method. The SDK expects lists so its encode_query can produce repeated
query params (keyterm=a&keyterm=b).
2026-03-17 18:24:20 -04:00
Mark Backman
d4f69dd333 Merge pull request #4046 from pipecat-ai/mb/fix-4045
Fix SonioxSTTService crash when language_hints contains plain strings…
2026-03-17 16:41:11 -04:00
Mark Backman
a32f558b07 Merge pull request #4026 from pipecat-ai/mb/fix-deepgram-base-url
Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
2026-03-17 16:39:24 -04:00
Mark Backman
4e99cb39b0 Merge pull request #4056 from pipecat-ai/mb/fix-filter-turns-deprecation
Fix deprecation warning when using filter_incomplete_user_turns
2026-03-17 16:23:43 -04:00
Mark Backman
10b3bff525 Merge pull request #4058 from pipecat-ai/mb/improve-stt-tts-language-code-robustness
fix: resolve raw language strings through Language enum for proper service conversion
2026-03-17 16:20:12 -04:00
Mark Backman
95ee096622 Merge pull request #4057 from pipecat-ai/mb/fix-4053
Fix stale state in user turn stop strategies between turns
2026-03-17 16:19:31 -04:00
Mark Backman
6799995b0a Merge pull request #4062 from pipecat-ai/mb/update-pyasn1-0.6.3
Update uv.lock with pyasn1 v0.6.3
2026-03-17 16:19:13 -04:00
Mark Backman
05abc95b5f Update uv.lock with pyasn1 v0.6.3 2026-03-17 16:10:35 -04:00
Mark Backman
18e654b3f0 docs: add changelog for #4058 2026-03-17 12:01:50 -04:00
Mark Backman
790a23d2e5 fix: resolve raw language strings through Language enum for proper service conversion
Raw strings like "de-DE" passed as the language parameter to TTS/STT services
were bypassing the Language enum resolution logic, causing silent failures
(e.g. ElevenLabs expects "de" not "de-DE"). Now raw strings are first converted
to Language enums so they go through the same resolve_language() path, with a
warning logged for unrecognized strings.
2026-03-17 12:00:28 -04:00
Mark Backman
d70df1d8b0 Add changelog for #4057 2026-03-17 11:35:38 -04:00
Mark Backman
5000b040dd Fix stale state in user turn stop strategies between turns
Reset stop strategies at turn start (not just turn stop) so that late
transcriptions arriving between turns do not leave stale _text that
causes premature stops on the next turn. Also cancel pending timeout
tasks in reset() for both SpeechTimeout and TurnAnalyzer strategies.
2026-03-17 11:31:08 -04:00
Mark Backman
248419a7c4 Merge pull request #4050 from pipecat-ai/copilot/update-enable-dialout-to-false
Fix PSTN runner defaulting enable_dialout to True
2026-03-17 11:07:23 -04:00
Mark Backman
024e2ebd4e Fix deprecation warning when using filter_incomplete_user_turns 2026-03-17 10:51:01 -04:00
Mark Backman
091f88e42e feat: add enable_dialout parameter to configure() for dial-out rooms
Expose enable_dialout as a configure() parameter (default False) so
dial-out examples can opt in without needing to build DailyRoomProperties
manually.
2026-03-17 09:03:50 -04:00
Mark Backman
e11b486312 fix: clean up configure() type hints, deduplicate token expiry, and improve comment
Narrow misleading Optional type hints on parameters that never accept
None, extract the duplicated token_exp_duration * 60 * 60 calculation,
remove unnecessary forward-reference quotes on DailyMeetingTokenProperties,
and clarify why enable_dialout is explicitly set to False.
2026-03-17 08:54:07 -04:00
Mark Backman
f54b3c6884 Merge pull request #4048 from julienvantyghem/daily-audio-only-docstring
update enable_recording param  documentation
2026-03-17 08:21:50 -04:00
copilot-swe-agent[bot]
7e60320a74 fix: set enable_dialout to False in PSTN runner to prevent room creation failures
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-03-17 04:04:11 +00:00
copilot-swe-agent[bot]
89cb0f089e Initial plan 2026-03-17 04:01:00 +00:00
Julien Vantyghem
e5b4403ed4 update docstring following https://github.com/pipecat-ai/pipecat/pull/3916 2026-03-16 19:54:04 -06:00
Mark Backman
a0595adbdc Merge pull request #4012 from pipecat-ai/mb/deprecate-old-local-smart-turn 2026-03-16 21:09:26 -04:00
Mark Backman
dc1632bbac Merge pull request #4023 from pipecat-ai/mb/update-small-webrtc-prebuilt-2.4.0 2026-03-16 21:09:08 -04:00
Mark Backman
53f49ac094 Merge pull request #4024 from pipecat-ai/mb/fix-lang-enum-stt-tts 2026-03-16 21:08:48 -04:00
Mark Backman
bf02d61418 Merge pull request #4025 from pipecat-ai/mb/fix-example-system-instruction 2026-03-16 21:07:01 -04:00
Mark Backman
154a8d1987 Merge pull request #4035 from pipecat-ai/mb/bump-pyjwt-version 2026-03-16 21:06:31 -04:00
Mark Backman
fa5b757408 Merge pull request #4044 from pipecat-ai/mb/pyopenssl-upgrade 2026-03-16 21:06:09 -04:00
Aleix Conchillo Flaqué
c765bc98d3 Merge pull request #4047 from pipecat-ai/aleix/daily-python-0.25.0-dtmf-events
Update daily-python to 0.25.0 and add DTMF input events
2026-03-16 18:05:10 -07:00
Aleix Conchillo Flaqué
59486d5abf Add changelog entries for PR #4047 2026-03-16 17:58:12 -07:00
Aleix Conchillo Flaqué
5cb6aecc9f Add DTMF input event support to Daily transport
Handle Daily's on_dtmf_event callback, convert it to an
InputDTMFFrame pushed into the input transport. Also add __str__
methods to InputDTMFFrame and OutputDTMFFrame for better logging.
2026-03-16 17:57:39 -07:00
Aleix Conchillo Flaqué
5c685c35d7 pyproject: update daily-python to 0.25.0 2026-03-16 17:41:44 -07:00
Aleix Conchillo Flaqué
1a1d5e6a84 Merge pull request #4006 from pipecat-ai/aleix/task-frame-flush-ordering
handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
2026-03-16 17:35:11 -07:00
Mark Backman
abb8bae6f7 Add changelog for #4046 2026-03-16 19:51:37 -04:00
Mark Backman
2801439e48 Fix OpenAI STT crash when language is a plain string instead of Language enum 2026-03-16 19:48:49 -04:00
Mark Backman
3b8d040e41 Fix SonioxSTTService crash when language_hints contains plain strings (#4045)
Refactor language_to_soniox_language to use resolve_language + LANGUAGE_MAP
pattern consistent with other services. Fix resolve_language fallback to use
str(language) instead of language.value so plain strings don't crash.
2026-03-16 19:45:03 -04:00
Mark Backman
538b9fa2d9 Bump pyopenssl in uv.lock to 26.0.0 2026-03-16 17:58:44 -04:00
dhruvladia-sarvam
8a4f6b486e wrapper fixes 2026-03-17 02:47:47 +05:30
dhruvladia-sarvam
8745f20330 fix llm wrapper redundancy and restore run_inference parity 2026-03-15 22:24:06 +05:30
Mark Backman
b437cbe126 Merge pull request #4037 from omChauhanDev/fix/llm-switcher-timeout-secs
forward timeout_secs in LLMSwitcher register methods
2026-03-15 10:08:11 -04:00
Om Chauhan
ed0f5ab09b added changelog 2026-03-15 19:15:18 +05:30
Om Chauhan
a6ad8a355b forward timeout_secs in LLMSwitcher register methods 2026-03-15 19:10:32 +05:30
Mark Backman
e8415b7451 Add changelog for #4035 2026-03-15 08:56:54 -04:00
Mark Backman
24c3d23229 Bump PyJWT minimum version to 2.12.0 for CVE-2026-32597
Addresses Dependabot alert #165 (GHSA-752w-5fwx-jx9f) where PyJWT
<= 2.11.0 accepts unknown `crit` header extensions.
2026-03-15 08:53:06 -04:00
Ian Lee
3e5be23bd8 fix(inworld): close context at end of turn instead of relying on idle timeout
The Inworld WS TTS plugin previously relied on the base TTS service's 3-second AUDIO_CONTEXT_TIMEOUT to detect when audio was done, then sent close_context in on_audio_context_completed. This added unnecessary latency before TTSStoppedFrame was emitted.

The original implementation likely borrowed this idea from the 11labs' impelementation. But it's likely better to mirror the Cartesia plugin pattern where on_audio_context_completed is a no-op because the server signals completion directly.

Now close_context is sent in on_turn_context_completed (right after flush_context), so the server responds with contextClosed immediately after the last audio byte. The existing receive handler already calls remove_audio_context on contextClosed, which exits the audio context handler cleanly.
2026-03-13 12:52:07 -07:00
Mark Backman
2f7c441c1c Add changelog for #4026 2026-03-13 13:55:27 -04:00
Mark Backman
79b7a0f969 Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
The base_url parameter previously forced wss:// and https:// schemes,
breaking air-gapped or private deployments that need ws:// or http://.
Extract URL derivation into _derive_deepgram_urls() helper that respects
the developers scheme choice while deriving the paired WebSocket and
HTTP URLs the Deepgram SDK requires.

Closes #4019
2026-03-13 13:53:06 -04:00
Mark Backman
978a1a2083 Update the system_instruction wording in the foundational examples to not mention WebRTC call 2026-03-13 12:22:10 -04:00
Mark Backman
0ec5f5e5ac Add missing language deprecations for XTTSService, LmntTTSService 2026-03-13 11:33:59 -04:00
Garegin Harutyunyan
33f042b500 format fix. 2026-03-13 19:32:39 +04:00
Garegin Harutyunyan
0722784f3a tests for VAD. 2026-03-13 19:30:03 +04:00
Mark Backman
1ea23ad362 Add changelog for #4024 2026-03-13 10:58:51 -04:00
Mark Backman
9f2f73b6b4 Remove redundant per-service language conversion from subclasses
Now that the base TTSService and STTService handle Language enum
conversion at init time, subclasses no longer need to convert in their
own __init__ methods. Remove conversion calls from hardcoded defaults,
params paths, and deprecated direct arg paths across 22 service files.

Services just pass raw Language enums and let the base class convert
via language_to_service_language() polymorphic dispatch.
2026-03-13 10:57:04 -04:00
Mark Backman
8467058e48 Fix Language enum conversion at init time in base TTS/STT services
When a Language enum (e.g. Language.ES) is passed via
settings=Service.Settings(language=Language.ES), it gets stored as-is
without conversion to the service-specific code. The base
_update_settings() handles this for runtime updates, but at init time
apply_update() copies the raw enum. This causes API errors because
services send the unconverted enum value.

Add language conversion in TTSService.__init__ and STTService.__init__
after super().__init__(), using the subclass language_to_service_language()
via normal method resolution.
2026-03-13 10:56:33 -04:00
Garegin Harutyunyan
cbc1c275b3 num_frames_required() implementation. 2026-03-13 18:28:22 +04:00
Mark Backman
7365ebfdf9 Add changelog for #4023 2026-03-13 10:22:58 -04:00
Garegin Harutyunyan
14ca70f13e Fixed format issue. 2026-03-13 18:22:56 +04:00
Mark Backman
1064482ade Update pipecat-ai-small-webrtc-prebuilt to 2.4.0 2026-03-13 10:20:51 -04:00
Garegin Harutyunyan
f7568a91b1 Draft Implementation for Krisp VIVA VAD. 2026-03-13 18:12:21 +04:00
Ian Lee
dfe5fec8f9 [inworld] prewarm context on llm response start 2026-03-12 15:34:57 -07:00
Mark Backman
ed0b8dadb5 Add changelog for #4012 2026-03-12 17:22:13 -04:00
Mark Backman
de38ca626d Deprecate LocalSmartTurnAnalyzerV2 and LocalCoreMLSmartTurnAnalyzer
Both analyzers are superseded by LocalSmartTurnAnalyzerV3. Added
deprecation warnings and docstring notices following the existing
pattern from LocalSmartTurnAnalyzer.
2026-03-12 17:19:32 -04:00
kompfner
30d95e3b84 Merge pull request #4009 from pipecat-ai/pk/perplexity-message-ordering-strictness
Add PerplexityLLMAdapter for message ordering strictness
2026-03-12 16:51:11 -04:00
Paul Kompfner
99f28120b7 Remove trailing system→user conversion for cross-call stability
Perplexity appears to have statefulness within a conversation, so
converting a system message to "user" in one call and then back to
"system" in the next (after more messages are appended) causes API
errors. Remove the trailing system→user conversion entirely — if the
context only has system messages, the API call will fail but the
mistake will be caught right away.
2026-03-12 16:07:39 -04:00
Paul Kompfner
e69f5a76e1 Add test for trailing assistant+system ordering, improve docstring
Add test exercising the step 3 ordering where stripping a trailing
assistant exposes a system message that then gets converted to user.
Move the reasoning about when a trailing system message can occur
into the docstring.
2026-03-12 15:24:17 -04:00
Paul Kompfner
7f98cc9921 Remove initial system message merging, handle trailing system messages
Perplexity allows multiple initial system messages, so don't merge them.
Instead, skip system-system pairs during the consecutive same-role merge
step. Broaden the trailing message fix to convert any trailing system
message to user (not just a lone system message), so contexts with only
system messages don't fail.
2026-03-12 15:14:56 -04:00
Mark Backman
43a2d55c61 Merge pull request #4010 from pipecat-ai/mb/quickstart-cloud-build
Update quickstart to use cloud builds
2026-03-12 15:07:06 -04:00
Paul Kompfner
e4bf6281c6 Add changelog for #4009 2026-03-12 14:56:37 -04:00
Paul Kompfner
0373f85b85 Add PerplexityLLMAdapter to enforce Perplexity's message ordering constraints
Perplexity's API is stricter than OpenAI about conversation history:
- Requires strict alternation between user/tool and assistant messages
- Disallows system messages except as the initial message
- Requires the last message to be user or tool

The new adapter transforms messages before sending to satisfy all three
constraints: merging consecutive initial system messages, converting
non-initial system to user, merging consecutive same-role messages, and
removing trailing assistant messages.

Also adds dual-system-instruction warnings to Cerebras, Fireworks,
Mistral, Perplexity, and SambaNova services (matching the existing
BaseOpenAILLMService pattern), and updates the warning text in
BaseOpenAILLMService to be more descriptive.
2026-03-12 14:56:30 -04:00
Mark Backman
38a4d4ff23 Update quickstart to use cloud builds 2026-03-12 14:46:49 -04:00
Aleix Conchillo Flaqué
f6f08d19a8 Add changelog for #4006 2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
2eccd28cf0 handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
EndTaskFrame and StopTaskFrame are now ControlFrames instead of
SystemFrames, so they flow through the pipeline and queue behind
pending work. This prevents races where EndFrame could overtake
in-flight frames (e.g. function call responses).

CancelTaskFrame and InterruptionTaskFrame remain SystemFrames
(via new TaskSystemFrame base): since they need immediate propagation.

The sink now catches EndTaskFrame, StopTaskFrame and CancelTaskFrame
downstream and re-queues it upstream to the task, ensuring the full
pipeline drains before shutdown begins.
2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
374bfd4068 Merge pull request #4007 from pipecat-ai/aleix/fix-parallel-pipeline-flush-and-tts-stop-order
Fix ParallelPipeline flush ordering and TTS stop sequence
2026-03-12 10:21:31 -07:00
Aleix Conchillo Flaqué
a461b2b9e6 Add changelog entries for PR #4007 2026-03-12 10:16:29 -07:00
Aleix Conchillo Flaqué
1a66bdef8e Fix TTS stop ordering to drain audio contexts before canceling
Wait for _audio_context_task to finish draining the contexts queue
before canceling _stop_frame_task, ensuring all pending audio
contexts are processed during shutdown.
2026-03-12 10:16:29 -07:00
Aleix Conchillo Flaqué
73a56f5d81 Fix ParallelPipeline flush ordering and buffered frame handling
Flush buffered frames before pushing the synchronization frame so
downstream processors see the buffered frames first.  Switch to a
while-loop with pop(0) so frames added to the buffer during flush
are also drained.
2026-03-12 10:16:29 -07:00
kompfner
383300979d Merge pull request #4004 from pipecat-ai/pk/service-settings-update-frame-can-target-specific-service
Add optional `service` field to `ServiceUpdateSettingsFrame` for targ…
2026-03-12 11:48:41 -04:00
Paul Kompfner
27b686db8c Don't bother honoring the new LLMUpdateSettingsFrame.service field in the deprecated OpenAIRealtimeBetaLLMService 2026-03-12 11:04:49 -04:00
Mark Backman
3ffa72170b Merge pull request #3457 from ahoshaiyan/fix/reduce-tool-result-context-size
Reduce Tool Result Context Size by Using UTF-8 for JSON Serialization
2026-03-12 10:41:33 -04:00
Mark Backman
1fe1f0f439 Apply ensure_ascii=False to remaining LLM services and fix changelog format 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
765fbeec63 Add changelog 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
84538b0ca8 Reduce Call Tool Result Context Size by Allowing UTF-8 in JSON Serialization 2026-03-12 10:35:19 -04:00
Mark Backman
1c676c2073 Merge pull request #4005 from pipecat-ai/add-sip-provider-room-geo-to-configure
Add sip_provider and room_geo params to configure()
2026-03-12 09:28:28 -04:00
Mark Backman
bf66ae7e46 Add changelog for #4005 2026-03-12 09:22:31 -04:00
Varun Singh
7a7d600985 Add sip_provider and room_geo parameters to configure()
Add convenience parameters to configure() so callers don't need to
manually construct DailyRoomProperties/DailyRoomSipParams for common
SIP provider and geo configuration.
2026-03-11 21:50:10 -07:00
Paul Kompfner
36b57252b4 Add changelog for PR #4004 2026-03-11 21:47:51 -04:00
Paul Kompfner
65e4e365dc Add optional service field to ServiceUpdateSettingsFrame for targeting a specific service instance
When `service` is set and doesn't match, the service forwards the frame instead of consuming it. This allows targeting a specific service when multiple services of the same type exist in the pipeline.
2026-03-11 21:41:43 -04:00
kompfner
36f9a6d809 Merge pull request #4003 from pipecat-ai/pk/fix-deprecated-vad-analyzer-usage
Fix deprecated vad_analyzer usage in examples
2026-03-11 20:55:39 -04:00
Mark Backman
904331bba1 Merge pull request #4001 from pipecat-ai/mb/simli-settings
Migrate SimliVideoService to AIService with Settings pattern
2026-03-11 17:45:59 -04:00
Mark Backman
11b14b7857 Add changelog for PR #4001 2026-03-11 17:40:53 -04:00
Mark Backman
c0a3cdd35c Merge pull request #4002 from pipecat-ai/mb/update-quickstart-0.0.105
Update quickstart example for 0.0.105
2026-03-11 17:39:07 -04:00
Paul Kompfner
69e7677f4f Remove changelog for #4003 2026-03-11 17:33:20 -04:00
Paul Kompfner
9a0568e6fe Add changelog for #4003 2026-03-11 17:32:39 -04:00
Paul Kompfner
ccc2549c0c Broaden the vad_analyzer deprecation warning in BaseInputTransport to account for use-cases where there is no LLMUserAggregator at play 2026-03-11 17:28:26 -04:00
Paul Kompfner
e456a6bb23 Move away from remaining deprecated TransportParams.vad_analyzer usage in example files. Skip updates to deprecated services. 2026-03-11 17:17:40 -04:00
Mark Backman
2d9dc2fa1c Update quickstart example for 0.0.105 2026-03-11 17:12:59 -04:00
Mark Backman
59dc30a84d Merge pull request #3997 from pipecat-ai/mb/sarvam-package-0.1.26
Update sarvamai dependency from 0.1.26a2 to 0.1.26
2026-03-11 16:59:32 -04:00
Mark Backman
a54aa2d1f8 Migrate SimliVideoService to AIService with Settings pattern
Align Simli with HeyGen/Tavus by extending AIService instead of
FrameProcessor and using a ServiceSettings dataclass. InputParams is
preserved but deprecated; its fields are promoted to direct init params.
Lifecycle handling moves to start()/stop()/cancel() methods.
2026-03-11 16:56:41 -04:00
Mark Backman
3ceff3d5fd Merge pull request #4000 from pipecat-ai/mb/fix-openai-default-model
Fix: Restore default model to gpt-4.1 for OpenAI, Azure
2026-03-11 16:29:51 -04:00
kompfner
52057d628e Merge pull request #3999 from pipecat-ai/pk/camb-voice-int
Override CambTTSSettings.voice type from str to int to match Camb.ai'…
2026-03-11 16:18:59 -04:00
Mark Backman
4a45145cba Restored the default model to gpt-4.1 for OpenAI and Azure LLM services
The default model for OpenAILLMService and AzureLLMService was still set
to gpt-4o. Restored it to gpt-4.1. Also, removed hardcoded gpt-4o/gpt-4o-mini
model references from examples so they pick up the new default.
2026-03-11 16:18:47 -04:00
Paul Kompfner
080ed22ff5 Override CambTTSSettings.voice type from str to int to match Camb.ai's integer voice IDs 2026-03-11 15:44:05 -04:00
Mark Backman
71e6158861 Add changelog for PR #3997 2026-03-11 14:18:47 -04:00
Mark Backman
a9e124b84f Update sarvamai dependency from 0.1.26a2 to 0.1.26
Bump the Sarvam AI SDK to the stable release version.
2026-03-11 14:17:40 -04:00
kompfner
65561a1d83 Merge pull request #3996 from pipecat-ai/pk/prefer-nested-settings-alias
Prefer nested settings alias
2026-03-11 13:41:29 -04:00
Paul Kompfner
e5b60ba095 Make deprecated-init-param warnings recommend the preferred Service.Settings(...) pattern
Move the warning helper into AIService as _warn_init_param_moved_to_settings.
It now uses type(self).__name__ to produce messages like
"Use settings=AnthropicLLMService.Settings(model=...)" instead of the raw
settings class name "AnthropicLLMSettings(model=...)". Callers no longer need
to pass the settings class explicitly.
2026-03-11 13:04:15 -04:00
Paul Kompfner
eb9212f152 Update COMMUNITY_INTEGRATIONS.md code sample to prefer Settings alias over raw settings class name 2026-03-11 12:37:43 -04:00
Paul Kompfner
51a8a28a99 Prefer Service.ThinkingConfig over raw ThinkingConfig class names in Anthropic and Google services and examples 2026-03-11 12:34:10 -04:00
Paul Kompfner
6b168d6bbb Prefer Service.Settings over raw settings class names across all services
Replace direct references to settings class names (e.g. `FooSettings`) with the nested `Settings` alias form throughout all 87 service files:
- Type annotations: `Settings`
- Runtime code: `self.Settings`
- Docstrings: `ServiceClass.Settings`
- Cross-file inheritance: `ParentService.Settings`

This makes the `Settings` alias the canonical way to reference a service's settings, keeping only the class definition and alias assignment as the remaining hits for each raw settings class name.
2026-03-11 12:15:00 -04:00
kompfner
cbb4835e7b Merge pull request #3991 from pipecat-ai/pk/fix-out-of-date-docstrings
Fix out of date docstrings
2026-03-11 10:54:40 -04:00
Paul Kompfner
3cbd27d202 Add changelog for PR #3991 2026-03-11 10:44:15 -04:00
Paul Kompfner
42262d10bb Move OpenAIRealtimeSTTService's noise_reduction into its Settings object, as it might be useful to update it at runtime, and fix outdated OpenAIRealtimeSTTService docstring example 2026-03-11 10:44:15 -04:00
Paul Kompfner
df82df8e39 Fix outdated Google + Gemini TTS service docstring examples 2026-03-11 10:14:18 -04:00
Paul Kompfner
0ebcb55582 Fix outdated DeepgramSageMakerTTSService docstring example 2026-03-11 10:11:26 -04:00
Paul Kompfner
264ce681f7 Fix outdated DeepgramSageMakerSTTService docstring example 2026-03-11 10:10:15 -04:00
Paul Kompfner
916936d3ee Fix outdated Sarvam TTS docstring examples 2026-03-11 10:07:07 -04:00
Paul Kompfner
087abc9bb9 Fix outdated CambTTSService docstring example 2026-03-11 10:03:21 -04:00
Aleix Conchillo Flaqué
7e88b13421 Merge pull request #3983 from pipecat-ai/changelog-0.0.105
Release 0.0.105 - Changelog Update
2026-03-10 17:59:02 -07:00
aconchillo
610dc25fb1 Update changelog for version 0.0.105 2026-03-10 17:58:32 -07:00
Aleix Conchillo Flaqué
327bcfa8d2 Merge pull request #3982 from pipecat-ai/aleix/fix-examples
Fix Groq, Google, and Nvidia examples
2026-03-10 17:37:26 -07:00
Aleix Conchillo Flaqué
4c19337d89 Fix examples: Groq model, Google settings class, Nvidia system instruction 2026-03-10 15:29:52 -07:00
dhruvladia-sarvam
dc0386937a Initial 2026-03-11 02:27:57 +05:30
Aleix Conchillo Flaqué
a4310d4335 Merge pull request #3980 from pipecat-ai/aleix/move-google-vertex-openai
Move Google Vertex and OpenAI LLM modules to subpackages
2026-03-10 13:37:02 -07:00
Aleix Conchillo Flaqué
23218aaed7 Add changelog for #3980 2026-03-10 13:04:16 -07:00
Aleix Conchillo Flaqué
7be2c43e1d Update imports to use new google.gemini_live.vertex path 2026-03-10 13:00:31 -07:00
Aleix Conchillo Flaqué
ea09586db6 Add deprecation stub for google/gemini_live/llm_vertex.py 2026-03-10 13:00:02 -07:00
Aleix Conchillo Flaqué
d086b9f138 Move google/gemini_live/llm_vertex.py to google/gemini_live/vertex/llm.py 2026-03-10 12:59:36 -07:00
Aleix Conchillo Flaqué
b23652caa6 Update imports to use new google.vertex and google.openai paths 2026-03-10 12:58:04 -07:00
Aleix Conchillo Flaqué
4fa3890cec Add deprecation stub for google/llm_openai.py 2026-03-10 12:55:16 -07:00
Aleix Conchillo Flaqué
8ea006739c Move google/llm_openai.py to google/openai/llm.py 2026-03-10 12:54:37 -07:00
Aleix Conchillo Flaqué
b159d02b0c Add deprecation stub for google/llm_vertex.py 2026-03-10 12:54:05 -07:00
Aleix Conchillo Flaqué
0df421de9c Move google/llm_vertex.py to google/vertex/llm.py 2026-03-10 12:53:13 -07:00
Aleix Conchillo Flaqué
ed5b061716 Merge pull request #3979 from pipecat-ai/aleix/daily-optional-transcription-settings
Clean up start_transcription to use its settings parameter
2026-03-10 12:51:31 -07:00
kollaikal-rupesh
80bd935c19 Add ServiceSwitcherStrategyFailover for automatic failover on service errors (#3870)
* Add ServiceSwitcherStrategyFailover for automatic error-based service switching

Introduce a strategy hierarchy: ServiceSwitcherStrategy (base) →
ServiceSwitcherStrategyManual (handles ManuallySwitchServiceFrame) →
ServiceSwitcherStrategyFailover (adds error-based failover). ServiceSwitcher
now defaults to ServiceSwitcherStrategyManual with strategy_type optional.
Non-fatal ErrorFrames are forwarded to the strategy via handle_error().

* Move metadata request into _set_active_if_available

Requesting metadata is part of making a service active, so it belongs
alongside setting _active_service and firing on_service_switched. This
removes the duplicate queue_frame calls from ServiceSwitcher push_frame
and process_frame.
2026-03-10 15:37:30 -04:00
Mark Backman
43a9e9a1b5 Merge pull request #3899 from pipecat-ai/mb/tracing-service-settings-comment
Add defensive comment for given_fields() usage in tracing
2026-03-10 15:33:57 -04:00
Aleix Conchillo Flaqué
11a0c11050 Fix start_transcription ignoring its settings argument
DailyTransportClient.start_transcription() accepted a settings
parameter but always used self._params.transcription_settings
instead, silently discarding any custom settings passed by callers.
2026-03-10 12:08:53 -07:00
Aleix Conchillo Flaqué
4a2d57511d Make DailyParams.transcription_settings optional
Change transcription_settings to Optional[DailyTranscriptionSettings]
defaulting to None. The default settings are now applied at the call
site when transcription is started, and start_transcription receives
the serialized settings dict directly.
2026-03-10 11:55:38 -07:00
Aleix Conchillo Flaqué
743e2ac277 Merge pull request #3831 from pipecat-ai/aleix/custom-video-tracks
Replace VirtualCameraDevice with CustomVideoTrack + custom video track support
2026-03-10 11:44:29 -07:00
Aleix Conchillo Flaqué
86597cc9ec Add changelog entries for PR #3831 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
14dd028b8f Add custom video track example with per-track params 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
18e99123af Replace VirtualCameraDevice with CustomVideoTrack and add custom video track support
Use CustomVideoSource/CustomVideoTrack for the default camera output instead of
VirtualCameraDevice, mirroring how audio already uses CustomAudioSource/CustomAudioTrack.
Add support for custom video destinations (register_video_destination, add/remove
custom video tracks, routing in write_video_frame) so multiple video tracks can be
published simultaneously.
2026-03-10 11:32:16 -07:00
kompfner
6c4a46dc79 Merge pull request #3978 from pipecat-ai/pk/fix-inaccurate-comment
Fix an out-of-date comment for accuracy. In the OpenAI LLM service, w…
2026-03-10 14:29:39 -04:00
Mark Backman
9b26faff05 Merge pull request #3961 from ai-coustics/goekmengoergen/sys-663-re-enable-enhancement-level-feature-on-pipecat
Add enhancement_level support to `AICFilter`.
2026-03-10 14:24:15 -04:00
Paul Kompfner
3790640322 Fix an out-of-date comment for accuracy. In the OpenAI LLM service, we *don't* replace any context system messages with system instructions from the constructor. 2026-03-10 13:59:01 -04:00
Aleix Conchillo Flaqué
c25d5af8c8 Merge pull request #3970 from pipecat-ai/aleix/update-daily-python
Update daily-python to 0.24.0
2026-03-10 10:49:27 -07:00
Aleix Conchillo Flaqué
6e52623959 Merge pull request #3976 from pipecat-ai/aleix/fix-google-system-instruction-priority
Fix Google LLM system instruction priority
2026-03-10 10:48:28 -07:00
Mark Backman
912f1be31c Add system_instruction parameter to run_inference (#3968)
* Add system_instruction parameter to run_inference

Allow callers to provide a custom system instruction directly when calling
run_inference, without having to construct provider-specific context objects.

For OpenAI, the instruction is prepended as a system message (preserving
existing messages). For Anthropic, Google, and AWS Bedrock, it overrides the
single system field with a warning when an existing system instruction is
present in the context.

* Use system_instruction parameter in _generate_summary

Pass the summarization prompt via run_inference's system_instruction
parameter instead of embedding it as a system message in the context.

* Add changelog for #3968
2026-03-10 12:57:23 -04:00
Mark Backman
0817a57f4c Merge pull request #3974 from pipecat-ai/mb/azure-stt-region-optional
Make Azure STT region optional when private_endpoint is used
2026-03-10 12:31:39 -04:00
Aleix Conchillo Flaqué
db27aaa790 Add changelog for #3976 2026-03-10 09:26:26 -07:00
Aleix Conchillo Flaqué
153705f05b Fix Google LLM system instruction priority
Constructor/settings system_instruction now takes priority over the
context system message. Previously the context value would overwrite
the constructor value on every call. Warn when both are set.
2026-03-10 09:25:42 -07:00
Mark Backman
54c767cce3 Merge pull request #3960 from sysradium/fix-realtime-calls
Treat conversation_already_has_active_response as non-fatal in Realtime API
2026-03-10 11:40:34 -04:00
Mark Backman
2ce9179662 Merge pull request #3958 from pipecat-ai/mb/deepgram-tts-audio-context
Route Deepgram WebSocket TTS audio through audio context queue
2026-03-10 11:37:39 -04:00
Mark Backman
50cc01a578 Guard against None context ID in append_to_audio_context
After interruption, both _playing_context_id and _turn_context_id are
None. If a subclass calls append_to_audio_context(None, frame), the
recovery path matches (None == None) and creates a bogus audio context
that blocks the handler from ever processing the real context.

Early-return when context_id is falsy to prevent this.
2026-03-10 11:34:03 -04:00
Mark Backman
d5c0789ab5 Add changelog for #3958 2026-03-10 11:34:03 -04:00
Mark Backman
92b5185165 Route Deepgram WebSocket TTS audio through audio context queue
The Deepgram TTS service was bypassing pipecats audio context management
system, pushing audio frames directly via push_frame() instead of routing
them through append_to_audio_context(). This caused stale audio to leak
into the pipeline after interruptions and missed ordered playback
guarantees.

- Route audio frames through append_to_audio_context() with context
  availability checks to discard stale post-interruption frames
- Handle Flushed responses by appending TTSStoppedFrame and removing
  the audio context to signal completion
- Replace _handle_interruption override with on_audio_context_interrupted
  hook (the recommended pattern used by ElevenLabs and Cartesia)
- Remove redundant process_frame override that caused double-flush
  (base class already flushes via on_turn_context_completed)
- Remove redundant start_tts_usage_metrics call (base class handles
  aggregated usage metrics)
2026-03-10 11:34:03 -04:00
sysradium
ba0ebd5525 Treat conversation_already_has_active_response as non-fatal in Realtime API 2026-03-10 15:57:55 +01:00
Gökmen Görgen
3a6f848a5b update test description. 2026-03-10 14:54:49 +01:00
kompfner
c660152a84 Merge pull request #3966 from pipecat-ai/pk/add-some-more-missing-55-examples
Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS
2026-03-10 09:45:58 -04:00
Gökmen Görgen
a96702acfc fix test. 2026-03-10 14:41:18 +01:00
Gökmen Görgen
780559dc32 address feedback. 2026-03-10 14:23:00 +01:00
Gökmen Görgen
8e1c8a38e4 don't change enhancement level if bypass toggled. 2026-03-10 14:18:45 +01:00
Gökmen Görgen
483f6689ed address feedback, use one logging. 2026-03-10 13:52:13 +01:00
Gökmen Görgen
bc11bf9673 remove _is_filter_enabled from AICFilter and refactor related logic and tests. 2026-03-10 13:48:32 +01:00
Gökmen Görgen
82b300298a add changelog. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
0c87fcc48c re-add bypass parameter support to AICFilter and update related unit tests. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
df64f3f943 add enhancement_level support to AICFilter.
# Conflicts:
#	src/pipecat/audio/filters/aic_filter.py
2026-03-10 13:36:15 +01:00
Mark Backman
db22bf0f75 Merge pull request #3973 from yuki901/fish-audio-s2-pro
Update Fish Audio default model from s1 to s2-pro
2026-03-10 07:57:27 -04:00
Mark Backman
edc65fc45e Add changelog for #3974 2026-03-10 07:48:02 -04:00
Mark Backman
233867fdfb Make region optional and validate Azure STT config
Make `region` optional so users can provide only `private_endpoint`.
Raise ValueError if neither is provided, and warn if both are given
(private_endpoint takes priority).
2026-03-10 07:47:05 -04:00
yukiobata1
ceb53e044b Add changelog for #3973 2026-03-10 19:29:47 +09:00
yukiobata1
c7ef23dd22 Update Fish Audio default model from s1 to s2-pro 2026-03-10 18:22:20 +09:00
Kinshuk Bairagi
9cc2644719 Improve system message extraction in traced_llm
Enhanced the logic for extracting the system message in the traced_llm decorator to support LLMContext via adapter and handle exceptions gracefully. This improves compatibility with different context types and ensures better tracing information.
2026-03-10 11:23:29 +05:30
Aleix Conchillo Flaqué
d79e35d84f Add changelog for #3970 2026-03-09 20:47:42 -07:00
Aleix Conchillo Flaqué
00eb190424 Update daily-python to 0.24.0 2026-03-09 20:47:13 -07:00
Mark Backman
0dc95692ba Merge pull request #3967 from pipecat-ai/mb/fix-azure-stt-private-endpoint 2026-03-09 21:57:10 -04:00
Mark Backman
07b901c2a5 Add changelog for #3967 2026-03-09 15:05:20 -04:00
Mark Backman
f533dc3203 Fix Azure STT SpeechConfig failing when private_endpoint is provided
SpeechConfig does not accept both `region` and `endpoint` simultaneously —
they are mutually exclusive. The previous code always passed both, which
raises ValueError when a user supplies a private_endpoint URL. Now we
conditionally pass either `endpoint` or `region`, never both.
2026-03-09 15:05:20 -04:00
Paul Kompfner
20c3f553b2 Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS 2026-03-09 14:36:15 -04:00
kompfner
02791cd503 Merge pull request #3965 from pipecat-ai/pk/fix-integration-tests
Fix broken `test_unified_function_calling_anthropic` due to use of an…
2026-03-09 13:42:35 -04:00
kompfner
f2debd9b1d Merge pull request #3963 from pipecat-ai/pk/improve-claude-changelog-skill
Improve changelog skill: prioritize user-facing language and update e…
2026-03-09 13:00:32 -04:00
kompfner
c0c49d0ddc Merge pull request #3964 from pipecat-ai/pk/add-some-missing-55-examples
Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, …
2026-03-09 12:59:36 -04:00
Mark Backman
3d1f866e73 Merge pull request #3951 from pipecat-ai/mb/remove-unused-imports-2026-03-07
Remove unused imports, 2026-03-07
2026-03-09 12:49:08 -04:00
Mark Backman
786279f143 Remove unused imports, 2026-03-07 2026-03-09 12:44:47 -04:00
Paul Kompfner
9423d22051 Fix broken test_unified_function_calling_anthropic due to use of an unsupported/deprecated model.
Update the tests in test_integration_unified_function_calling.py to not specify particular models but instead just use service defaults (the tests shouldn't be model-dependent anyway)
2026-03-09 12:07:56 -04:00
Paul Kompfner
f1bb065823 Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, Whisper STT, and Whisper MLX STT
Also fix 13e-whisper-mlx.py to pass MLXModel.LARGE_V3_TURBO.value instead of the enum directly.
2026-03-09 11:54:25 -04:00
Filipi da Silva Fuchter
0c5e936aa5 Merge pull request #3936 from pipecat-ai/filipi/fix_push_aggregation
Fixed TTS context not being appended to the assistant message history
2026-03-09 11:14:38 -04:00
filipi87
f0c5925a79 Fixing Piper test. 2026-03-09 12:07:45 -03:00
Paul Kompfner
7f9169269c Improve changelog skill: prioritize user-facing language and update example changelog 2026-03-09 10:45:33 -04:00
Mark Backman
c16e534f73 Merge pull request #3952 from pipecat-ai/mb/settings-alias
Add Settings class attribute alias to all service classes
2026-03-09 10:45:10 -04:00
filipi87
8ec160f71e Making the changelog more user friendly. 2026-03-09 11:37:11 -03:00
Filipi da Silva Fuchter
1e615cd095 Merge pull request #3962 from pipecat-ai/filipi/smallwebrtc_queue
Queuing the messages received before the data channel is ready
2026-03-09 10:29:05 -04:00
filipi87
ba87d1609c Only marking self._is_yielding_frames_synchronously if receiving TTSAudioRawFrame 2026-03-09 11:24:36 -03:00
Mark Backman
f7dc13c0de Update COMMUNITY_INTEGRATIONS.md for Settings alias class 2026-03-09 10:24:24 -04:00
filipi87
c5ce667387 Retrieving the context_id from the TTSStartedFrame 2026-03-09 11:10:42 -03:00
filipi87
097f9c0896 Fixed to push LLMAssistantPushAggregationFrame when the base TTSService class is responsible for pushing the TTSStoppedFrame. 2026-03-09 11:04:09 -03:00
Filipi da Silva Fuchter
16336a3ea4 Merge pull request #3937 from pipecat-ai/filipi/fix_orphan_function_call
Fix context summarization leaving orphaned tool responses in kept context.
2026-03-09 09:19:17 -04:00
Mark Backman
9eaa99c8e2 Merge pull request #3957 from pipecat-ai/mb/user-turn-completion-system-instruction
Move turn completion instructions to system_instruction
2026-03-09 09:17:06 -04:00
filipi87
4557ef8c42 Renaming method to _get_earliest_function_call_not_resolved_in_range 2026-03-09 10:16:02 -03:00
filipi87
aa693bb5ee Adding changelog entry for the SmallWebRTCConnection fix. 2026-03-09 10:11:40 -03:00
filipi87
74a06a6968 Adding extra comment. 2026-03-09 10:06:38 -03:00
filipi87
322e317a00 Adding guardrails in case the data channel is never established. 2026-03-09 10:04:33 -03:00
filipi87
25165d6e2b Queuing the messages received before the data channel is ready to send them. 2026-03-09 09:47:45 -03:00
Mark Backman
8a02e6fbc5 Merge pull request #3959 from ajmeraharsh/fix/livekit-call-state-updated-args
fix(livekit): remove redundant self arg in on_call_state_updated
2026-03-09 08:38:12 -04:00
Mark Backman
d85ba75dda Merge pull request #3953 from pipecat-ai/mb/deepgram-flux-on-the-fly
Add on-the-fly Configure support for Deepgram Flux STT
2026-03-09 08:36:00 -04:00
ajmeraharsh
ae6f159b18 chore: add changelog entry for #3959 2026-03-09 09:15:03 +04:00
Aleix Conchillo Flaqué
30d0cccef0 Merge pull request #3947 from pipecat-ai/aleix/summary-applied-event
Expose on_summary_applied event on LLMAssistantAggregator
2026-03-08 19:05:50 -07:00
Aleix Conchillo Flaqué
3b947b7844 Add changelog for #3947 2026-03-08 19:02:51 -07:00
Aleix Conchillo Flaqué
1f8cc3d216 Expose on_summary_applied event on LLMAssistantAggregator
Forward the on_summary_applied event from the internal summarizer to
the aggregator so users can listen for it without accessing private
members. Update summarization examples to use the new public event.
2026-03-08 19:02:51 -07:00
ajmeraharsh
57c4d72bf0 fix(livekit): remove redundant self arg in on_call_state_updated event
_on_call_state_updated passes (self, state) to _call_event_handler,
but _run_handler already prepends self when invoking the handler.
This causes handlers to receive 3 positional arguments instead of 2,
making the on_call_state_updated event unusable.

This aligns with how _on_first_participant_joined correctly passes
only the data arg without self.
2026-03-09 02:51:35 +04:00
Mark Backman
64155e8f06 Add changelog for #3957 2026-03-08 10:44:45 -04:00
Mark Backman
efda57de5c Move turn completion instructions to system_instruction
Turn completion instructions were being injected as a system message in
the LLM context, which caused warning spam when system_instruction was
also set, did not persist across full context updates, and broke LLMs
that do not support consecutive system messages.

Instead, compose the turn completion instructions into the LLM service
system_instruction field. This is managed via _base_system_instruction
which stores the original value for restoration when turn completion is
disabled.
2026-03-08 10:41:40 -04:00
Mark Backman
764c3c4f32 Merge pull request #3938 from koriyoshi2041/fix/replace-bare-except-handlers
fix: replace bare except handlers with specific exception types
2026-03-08 09:04:49 -04:00
Mark Backman
a32e0be120 Merge pull request #3956 from radhikagpt1208/fix/turn-completion-mixin-state-reset
Fix turn completion mixin not resetting state when no `InterruptionFrame` is emitted
2026-03-08 08:54:34 -04:00
radhikagpt1208
b14c8e0e94 Fix turn completion mixin not resetting state after each LLM response 2026-03-08 08:46:45 -04:00
kigland
57f0b6d75b fix: address review feedback on exception handling
- mcp_service.py: remove unnecessary try/except around debug log,
  use len(available_tools.tools) to match actual iteration target
- bedrock_adapter.py, aws/llm.py: add AttributeError to except tuple
  to handle None content (previously caught by bare except)
2026-03-08 12:28:03 +08:00
Mark Backman
edd568b002 Merge pull request #3954 from pipecat-ai/mb/revert-quickstart-changes
Revert changes to quickstart
2026-03-07 15:49:30 -05:00
Mark Backman
807759b874 Revert changes to quickstart 2026-03-07 15:44:26 -05:00
Mark Backman
cd28c82de3 Update examples to use the class Settings alias 2026-03-07 09:15:24 -05:00
Mark Backman
4ebdacdea2 Add changelog for #3953 2026-03-07 08:48:11 -05:00
Mark Backman
c5da3cf2bd Add on-the-fly Configure support for Deepgram Flux STT
Wire up the existing settings update infrastructure to send a Configure
WebSocket message when keyterm, eot_threshold, eager_eot_threshold, or
eot_timeout_ms change mid-stream, avoiding a full reconnect.
2026-03-07 08:37:27 -05:00
Mark Backman
26631a9c31 Add Settings class attribute alias to all service classes
Add a `Settings` class-level alias on every STT, LLM, TTS, image,
vision, and video service class pointing to its settings dataclass.
This lets developers discover the right settings class via the service
class itself (e.g. `GoogleSTTService.Settings(...)`) without needing
to know or import the separate settings class name.
2026-03-07 08:17:40 -05:00
Mark Backman
fdf9fb6f02 Merge pull request #3946 from pipecat-ai/mb/tts-settings-review
Review TTS settings
2026-03-07 07:48:26 -05:00
Mark Backman
1cfaea2007 Address code review feedback 2026-03-07 07:42:42 -05:00
kompfner
dc97ffc909 Merge pull request #3943 from pipecat-ai/pk/llm-settings-updates
Minor findings from auditing LLM settings
2026-03-06 22:39:22 -05:00
Paul Kompfner
622d9279cb Use exact service class names in LLMSettings docstrings 2026-03-06 22:35:55 -05:00
Paul Kompfner
6088e6eb52 Make budget_tokens optional in AnthropicThinkingConfig
budget_tokens is required when type is "enabled" and rejected when type is "disabled" (this is validated by the server)
2026-03-06 22:32:14 -05:00
Paul Kompfner
256c8f87b4 Add missing step 3 comment to LLM service init methods
Adds the explicit "no params object" step 3 comment to all
LLM services that skip from step 2 to step 4 in their
settings initialization sequence, matching the pattern
established in services that do have a params object.
2026-03-06 22:32:14 -05:00
Mark Backman
f630d79900 Merge pull request #3944 from pipecat-ai/mb/gemini-live-settings-examples-fixes 2026-03-06 22:14:32 -05:00
Mark Backman
ec93cd1d51 Fix settings update handling in additional STT services 2026-03-06 21:52:45 -05:00
Mark Backman
9c42d27f4d Support runtime language updates in Azure STT
Extract recognizer setup/teardown into _connect/_disconnect so
_update_settings can reconnect when language changes at runtime.
2026-03-06 21:07:03 -05:00
Mark Backman
536f1e178a Fix race condition in Deepgram STT disconnect causing error flood
Clear self._connection before sending close stream so run_stt stops
sending audio immediately during the WebSocket close handshake.
2026-03-06 21:01:31 -05:00
Mark Backman
750b87dc24 Fix AWS examples, update to sonnet 4.6 2026-03-06 20:53:22 -05:00
Mark Backman
671e9a6846 TTS service and example updates 2026-03-06 20:53:22 -05:00
Mark Backman
2c85d2056c Examples fixes for Gemini Live 2026-03-06 18:42:22 -05:00
Mark Backman
a97a086dbd Fix GeminiLiveLLMService init referencing undefined _params variable
Replace references to undefined `_params` with `self._settings` for
language and VAD config. Add missing `system_instruction` to default
settings to satisfy validate_complete(). Remove redundant line that
read language from the deprecated `params` arg.
2026-03-06 18:41:54 -05:00
Mark Backman
4ed3480e4b Update TTSSettings docstrings with the corresponding class name(s) 2026-03-06 16:40:38 -05:00
Mark Backman
d59c0ea6c1 Merge pull request #3941 from pipecat-ai/mb/stt-settings-updates
STT services: settings and examples fixes
2026-03-06 15:21:30 -05:00
Mark Backman
7d41049b35 Review feedback, clarify corresponding class in STTSettings docstrings 2026-03-06 15:17:02 -05:00
Mark Backman
6431ad8e2a Fix service settings init ordering and example bugs
- Speechmatics: move config build after super().__init__ and settings
  delta so turn_detection_mode (e.g. ADAPTIVE) takes effect
- Google STT: fix example passing bare Language enum instead of list
- Google TTS: add missing explicit defaults for all custom settings fields
- Soniox: fix accidental tuple wrapping of STT service in example
- Speechmatics examples: fix system->user role in kick-off messages
- Deepgram Flux: move tag from settings to __init__ (billing metadata)
- ElevenLabs STT: default tag_audio_events to None (use API default)
- Fal STT: simplify language default handling
- Google TTS: rename GoogleStreamTTSSettings to GoogleTTSSettings
2026-03-06 15:17:01 -05:00
Mark Backman
c3794956ef Add deprecation version, fix foundational example double system message 2026-03-06 15:16:58 -05:00
Mark Backman
940da9eeeb Add vad_threshold to AssemblyAISTTSettings
Wire vad_threshold through Settings, default_settings, the deprecated
connection_params path, and _build_ws_url query params.
2026-03-06 15:16:10 -05:00
Mark Backman
696e431e96 Broaden Service Settings docs to cover all AI service types
Use "AI service" language instead of listing specific types, add
ServiceSettings as a fallback for direct AIService subclasses, and
clarify delta mode description with a concrete frame example.
2026-03-06 15:16:10 -05:00
kompfner
1a1c5668de Merge pull request #3942 from pipecat-ai/pk/aws-nova-sonic-audio-config
Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated au…
2026-03-06 14:58:22 -05:00
Filipi da Silva Fuchter
4b9fc8a30c Merge pull request #3804 from pipecat-ai/filipi/concurrent_audio_contexts
Allowing concurrent audio contexts
2026-03-06 14:49:57 -05:00
Paul Kompfner
9b7a86bb12 Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated audio configuration
The audio fields (sample rates, sample sizes, channel counts) on the deprecated `Params` class had no non-deprecated equivalent. This adds an `AudioConfig` class and `audio_config` init arg so users can specify audio configuration without relying on the deprecated `params` parameter.
2026-03-06 14:39:53 -05:00
filipi87
3000037dec Changelog entries for the TTS improvements and fixes. 2026-03-06 16:16:25 -03:00
filipi87
07abd3d60f Fixed BotStoppedSpeakingFrame emission: now emitted as soon as TTSStoppedFrame is received, with a fallback silence-based timeout increased to reduce false positives 2026-03-06 16:16:11 -03:00
filipi87
88ff7c451b Refactored all 25+ TTS service implementations to use the new push_start_frame=True pattern 2026-03-06 16:15:59 -03:00
filipi87
24430d8d45 Fixing Piper test. 2026-03-06 16:15:26 -03:00
filipi87
921e9e1fc9 Refactoring TTS services to allow concurrent audio contexts. 2026-03-06 16:15:10 -03:00
filipi87
c243850cf1 Removing observer from the inworld example. 2026-03-06 16:14:23 -03:00
kompfner
817f88e90b Merge pull request #3940 from pipecat-ai/pk/grok-realtime-settings-pattern
Adopt the `settings` pattern for Grok Realtime session properties
2026-03-06 14:09:25 -05:00
Aleix Conchillo Flaqué
e65ceb4edc Merge pull request #3931 from pipecat-ai/aleix/examples-always-use-user-role
Update foundational examples to use system_instruction
2026-03-06 10:41:33 -08:00
Aleix Conchillo Flaqué
593b75bc8b Update foundational examples to use "user" role
Use system_instruction on LLM service constructors instead of adding
system messages to LLMContext. Messages added to context now use
"user" role.
2026-03-06 09:53:33 -08:00
Paul Kompfner
f4c039048c Adopt the settings pattern for Grok Realtime session properties
Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`.

`system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)
2026-03-06 12:53:26 -05:00
kompfner
d84a250b62 Merge pull request #3939 from pipecat-ai/pk/openai-realtime-settings-pattern
Adopt the `settings` pattern for OpenAI Realtime session properties
2026-03-06 12:39:08 -05:00
Paul Kompfner
2b8a6d9ca4 In OpenAI/Azure Realtime examples, migrate to settings=OpenAIRealtimeLLMSettings(...) pattern
Move `session_properties` and `system_instruction` into the `settings` arg, matching the canonical pattern used across the codebase.
2026-03-06 12:00:41 -05:00
mattie ruth backman
18494658c3 rename models_vX to models.py and models_deprecated.py 2026-03-06 11:49:59 -05:00
mattie ruth backman
da0975a4e0 Fix forward reference 2026-03-06 11:49:59 -05:00
mattie ruth backman
49fba5209c copilot feedback 2026-03-06 11:49:59 -05:00
mattie ruth backman
158424aa28 Convert RTVI framework into a structured package
Replace the monolithic rtvi.py with a proper package split by concern
protocol version:
  - models_v0.py: deprecated pre-1.0 Pydantic models
  - models_v1.py: current RTVI protocol v1 message models
  - frames.py: RTVI pipeline frame dataclasses
  - observer.py: RTVIObserver and RTVIObserverParams
  - processor.py: RTVIProcessor (now lean, imports from submodules)
  - __init__.py: re-exports full public API for backward compatability
2026-03-06 11:49:59 -05:00
Paul Kompfner
bd4229ea9d Adopt the settings pattern for OpenAI Realtime session properties
Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`.

`model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.
2026-03-06 11:46:21 -05:00
kigland
848f35f5df fix: replace bare except handlers with specific exception types 2026-03-06 23:05:02 +08:00
kompfner
ac80b787bf Merge pull request #3877 from pipecat-ai/pk/service-init-cleanup
Add `settings` as canonical init arg for all AIService descendants, d…
2026-03-06 10:01:50 -05:00
Paul Kompfner
5b270fec8e In AWS Nova Sonic examples, migrate to newer pattern of passing in settings with voice and system_instruction, in favor of passing in voice_id as a direct init arg and the system instruction as the first message in the context 2026-03-06 09:57:57 -05:00
Paul Kompfner
a1641f3762 Add system_instruction to realtime service settings
Add `system_instruction=None` to `default_settings` for OpenAIRealtimeLLMService, GrokRealtimeLLMService, UltravoxRealtimeLLMService, AWSNovaSonicLLMService (Azure inherits from OpenAI), and OpenAIRealtimeBetaLLMService (Azure Beta inherits from OpenAI Beta).

Deprecate `system_instruction` init arg in AWSNovaSonicLLMService in favor of `settings=AWSNovaSonicLLMSettings(system_instruction=...)`. Use `self._settings.system_instruction` directly instead of storing a separate `self._system_instruction`.

Deprecation of `params` and `session_properties` in favor of `settings` for realtime services will be tackled in future work.
2026-03-06 09:57:34 -05:00
Paul Kompfner
78deaa735d Move system_instruction into LLMSettings
Add `system_instruction` field to `LLMSettings` so it is runtime-updatable via settings.
For Google (GoogleLLMService, GoogleVertexLLMService), deprecate the init-time arg since it was already shipped. For Anthropic, AWS Bedrock, and OpenAI, remove the init-time arg entirely since it was never shipped.

Still need to handle realtime services (OpenAI Realtime, Grok Realtime, Gemini Live).
2026-03-06 09:57:08 -05:00
filipi87
524b87f087 Adding changelog entry for the summarization fix. 2026-03-06 11:45:20 -03:00
filipi87
4ef3b52c72 Fix context summarization leaving orphaned tool responses in kept context. 2026-03-06 11:40:27 -03:00
Mark Backman
ee2895a783 Update COMMUNITY_INTEGRATIONS.md with full Service Settings guidance
Broaden the "Dynamic Settings Updates" section into "Service Settings"
covering the complete settings pattern: defining a Settings subclass,
wiring it into __init__ with defaults + apply_update, and distinguishing
init-only config from runtime-updatable fields.
2026-03-06 08:44:15 -05:00
Mark Backman
ab37185208 Update run_eval_pipeline with the latest settings, system_instruction patterns 2026-03-06 08:32:59 -05:00
Mark Backman
8a203dd98f Update more examples, misc services 2026-03-06 08:30:00 -05:00
Mark Backman
62554a2390 Update examples 2026-03-06 08:30:00 -05:00
Mark Backman
14c3a88f02 Fix tests 2026-03-06 08:29:14 -05:00
Mark Backman
939d753c2b Update LLMs 2026-03-06 08:29:14 -05:00
Mark Backman
a4375274b2 Add Settings subclasses to all services and auto-discovered init tests
- Add dedicated Settings subclasses to 20 LLM services that were
  borrowing parent Settings classes (e.g. AzureLLMSettings,
  GroqLLMSettings) so users don't need cross-module imports
- Fix field defaults to NOT_GIVEN in BaseWhisperSTTSettings,
  OpenAIRealtimeSTTSettings, and NvidiaSegmentedSTTSettings for
  delta-mode safety
- Fix incomplete default_settings in AWS, Cartesia, ElevenLabs,
  Fish, and Whisper services so validate_complete() passes
- Add auto-discovered tests that verify all Settings classes default
  to NOT_GIVEN (delta safety) and all services initialize with
  complete settings (store completeness)
2026-03-06 08:29:14 -05:00
Mark Backman
034e81ff18 Update STT service settings 2026-03-06 08:29:14 -05:00
Mark Backman
3cb792a801 Update TTS service settings 2026-03-06 08:29:14 -05:00
Mark Backman
1274bb2c55 Update deprecation version to 0.0.105 2026-03-06 08:29:14 -05:00
Mark Backman
f31bfcf4ec Clean up CartesiaTTSSettings: separate init-only vs runtime-updatable fields
Move output_container, output_encoding, output_sample_rate out of
CartesiaTTSSettings into plain instance attributes since they cannot
change at runtime without breaking the audio pipeline. Remove deprecated
speed/emotion fields and their dead references in _build_msg() and
run_tts(). Remove the from_mapping override that only existed to
destructure those now-removed output format fields.
2026-03-06 08:29:14 -05:00
Mark Backman
07f1d0cd96 Change _warn_deprecated_param to accept type references instead of strings
Update all ~192 call sites across 84 service files to pass class references
(e.g. `CartesiaTTSSettings`) instead of string names (`"CartesiaTTSSettings"`)
to `_warn_deprecated_param()`. This enables better IDE refactoring support.

Also fix `from_mapping` return type annotations in 5 settings subclasses to
use `typing.Self` instead of forward reference strings.
2026-03-06 08:29:14 -05:00
Mark Backman
bc2843e30a Fix deprecation version 2026-03-06 08:29:14 -05:00
Paul Kompfner
5dc312ce0c Add settings as canonical init arg for all AIService descendants, deprecate redundant model/voice/params args
ServiceSettings types were introduced for runtime updates via ServiceUpdateSettingsFrame, but there was tension between init-time and runtime APIs: overlapping-but-different InputParams vs ServiceSettings classes, and runtime-updatable fields like `model` and `voice` scattered as direct init args rather than living in a settings object. This unifies them so developers use the same settings type at both init and runtime, improving ergonomics and consistency.

Every concrete AIService subclass (LLM, TTS, STT, ImageGen, Vision, Video) now accepts a `settings` parameter for runtime-updatable config. Old init args (`model`, `voice_id`, `params`/`InputParams`) still work but emit DeprecationWarnings pointing to the new API. When both are provided, `settings` takes precedence. Leaf classes emit warnings; base classes do not, avoiding double warnings in inheritance chains.
2026-03-06 08:29:14 -05:00
Aleix Conchillo Flaqué
3199168d3e scripts(evals): use context.add_message() 2026-03-05 19:14:06 -08:00
Aleix Conchillo Flaqué
ea8f5f2e22 Merge pull request #3933 from pipecat-ai/aleix/misc-fixes
Fix Daily transport log level and eval script import
2026-03-05 18:48:14 -08:00
Aleix Conchillo Flaqué
1221e2dd76 Fix Daily transport log level and eval script import
Change participant_updated log from debug to trace (too noisy).
Fix deepgram LiveOptions import in eval script.
2026-03-05 16:37:02 -08:00
Aleix Conchillo Flaqué
5b598265c4 update uv.lock 2026-03-05 16:28:55 -08:00
Mark Backman
79131dd6c6 Merge pull request #3930 from dakshdua/main
Add `push_empty_transcripts` param to `BaseWhisperSTTService` to push received empty transcripts downstream
2026-03-05 19:25:15 -05:00
Aleix Conchillo Flaqué
5b808872d1 Merge pull request #3932 from pipecat-ai/aleix/system-instruction-conflict-warning
Warn when both system_instruction and context system message are set
2026-03-05 16:24:06 -08:00
Aleix Conchillo Flaqué
fda4cb6732 Add changelog for #3932 2026-03-05 16:16:41 -08:00
Daksh Dua
789ce2fd5e Add param to push empty transcripts 2026-03-05 16:16:24 -08:00
Aleix Conchillo Flaqué
f4b8245241 Warn when both system_instruction and context system message are set
system_instruction from the constructor always takes precedence. A
warning is now logged when the context also contains a system message
so users can spot the conflict.
2026-03-05 16:16:17 -08:00
Mark Backman
ca27e12c84 Merge pull request #3926 from pipecat-ai/mb/update-deps-2026-03-05
Update dependency version ranges for flexibility
2026-03-05 18:09:04 -05:00
Mark Backman
671ef5b6cc Merge pull request #3928 from zkleb-aai/simplify-assemblyai-examples
Update AssemblyAI turn detection example to use keyterms_prompt
2026-03-05 16:11:08 -05:00
zack
380726cfd3 Update AssemblyAI turn detection example to use keyterms_prompt
Change the commented example from prompt string format to keyterms_prompt
list format for better clarity and consistency with API best practices.
2026-03-05 15:47:54 -05:00
Mark Backman
f4dfeb0f8b Merge pull request #3927 from zkleb-aai/add-assemblyai-vad-threshold
feat(assemblyai): add vad_threshold parameter for U3 Pro
2026-03-05 15:36:23 -05:00
zack
11024ccc2c Add changelog entries for vad_threshold and parameter cleanup 2026-03-05 15:32:09 -05:00
zack
acfb07f859 feat(assemblyai): add vad_threshold parameter for U3 Pro
Add vad_threshold parameter to AssemblyAIConnectionParams to support
voice activity detection threshold configuration for the u3-rt-pro model.

This parameter allows users to align AssemblyAI's VAD threshold with
their external VAD systems (e.g., Silero VAD) to avoid the "dead zone"
where AssemblyAI transcribes speech that the external VAD hasn't
detected yet, which can delay interruption handling.

- Range: 0.0 to 1.0 (lower = more sensitive)
- Default: 0.3 (API default when not sent)
- Only applicable to u3-rt-pro model
- Automatically included in WebSocket query parameters

Recommended usage: Set vad_threshold to match your VAD's activation
threshold (e.g., both at 0.3) for optimal performance.
2026-03-05 15:27:13 -05:00
Mark Backman
06e49d597b Update dev dependencies 2026-03-05 15:23:07 -05:00
Mark Backman
60e9e26164 revert onnxruntime to onnxruntime~=1.23.2 to maintain Python 3.10 support 2026-03-05 15:13:28 -05:00
Mark Backman
3f97c91983 Update optional dependency version ranges and remove SDK dependencies
Widen version ranges for stable packages (anthropic, azure, deepgram,
groq, livekit, nvidia-riva-client, fastapi, ormsgpack, opentelemetry,
faster-whisper) and add upper bounds to previously uncapped packages
(hume, pyjwt, livekit-api, camb).

Replace CartesiaHttpTTSService's internal use of the Cartesia SDK with
direct aiohttp calls, accepting an optional aiohttp_session parameter.

Replace fal-client SDK calls in FalSTTService and FalImageGenService
with direct HTTP to bypass the SDK's aggressive retry/backoff logic
that caused significant latency regressions.
2026-03-05 15:06:54 -05:00
Mark Backman
05fa727c22 Update core dependency version ranges for flexibility
Widen version ranges for stable packages (aiofiles, docstring_parser,
onnxruntime) while adding upper bounds to previously uncapped packages
(transformers, numba, wait_for2). Bump soxr to 1.0.0 and pyloudnorm
to 0.2.0. Move silero extra to empty since onnxruntime is now a core dep.
2026-03-05 13:13:55 -05:00
Aleix Conchillo Flaqué
06be260e54 Merge pull request #3919 from pipecat-ai/aleix/daily-transport-event-logging
Add logging to Daily transport event handlers
2026-03-05 08:35:28 -08:00
Mark Backman
691d1d309e Merge pull request #3920 from pipecat-ai/mb/remove-hathora 2026-03-05 07:00:52 -05:00
Mark Backman
eeb8ed8588 Remove Hathora service integration
Hathora is shutting down on March 5, 2026. Remove the STT/TTS services,
examples, and related references.
2026-03-04 22:10:06 -05:00
Aleix Conchillo Flaqué
fd545cabab update uv.lock 2026-03-04 17:40:24 -08:00
Aleix Conchillo Flaqué
1aadb8bd73 Merge pull request #3918 from pipecat-ai/aleix/system-instruction-openai-anthropic
Wire up system_instruction in OpenAI, Anthropic, and AWS Bedrock
2026-03-04 17:40:00 -08:00
Aleix Conchillo Flaqué
3c60b0c8af Add changelog for #3918 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
0004a116d8 examples(foundational): use system_instruction in all examples 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
01f0caf252 wire up system_instruction in OpenAI, Anthropic and AWS Bedrock 2026-03-04 17:37:32 -08:00
Vanessa Pyne
b42dfa4734 Merge pull request #3916 from pipecat-ai/vp-add-cloud-audio-only
daily-transport: add cloud-audio-only recording option
2026-03-04 16:58:39 -06:00
vipyne
aa31ced32f add changelog for 3916 2026-03-04 16:58:28 -06:00
vipyne
9ca900cc4a daily-transport: add cloud-audio-only recording option 2026-03-04 16:58:28 -06:00
Aleix Conchillo Flaqué
96062972db Add logging to Daily transport event handlers
Add appropriate log levels to dial-in/dial-out, participant, transcription,
and recording event handlers. Move transcription error log from client
callback to transport handler to keep logging consistent at the transport
level.
2026-03-04 13:30:43 -08:00
Mark Backman
4b9fe5afe3 Merge pull request #3915 from pipecat-ai/mb/function_call_timeout_secs-error-msg
Add per-tool function call timeout_secs
2026-03-04 15:01:34 -05:00
Mark Backman
f76b8d2982 Merge pull request #3917 from pipecat-ai/mb/sagemaker-init.py
Add missing __init__.py to sagemaker module
2026-03-04 12:25:44 -05:00
Mark Backman
27ae6a0349 Add missing __init__.py to sagemaker module 2026-03-04 11:50:37 -05:00
Mark Backman
97e4e7c647 Add changelog for #3915 2026-03-04 09:42:01 -05:00
Mark Backman
df35ceca2c Add per-tool timeout_secs to register_function and register_direct_function
The default function call timeout (10s) causes silent failures for
long-running tools. This adds an optional timeout_secs parameter to
register_function() and register_direct_function() so individual tools
can override the global function_call_timeout_secs. The warning message
now mentions both the per-tool and global timeout options.
2026-03-04 09:37:56 -05:00
Mark Backman
e5ae5e6f2d Merge pull request #3914 from pipecat-ai/mb/optional-summarization-thresholds
Make max_context_tokens and max_unsummarized_messages independently optional
2026-03-04 08:57:16 -05:00
Mark Backman
6789aee9e8 Add changelog for #3914 2026-03-03 20:09:26 -05:00
Mark Backman
b358657a79 Make max_context_tokens and max_unsummarized_messages independently optional
Allow either threshold to be set to None to cleanly disable that trigger,
instead of requiring users to set a very large number as a workaround.
At least one of the two must remain set (validated at construction time).
2026-03-03 20:08:22 -05:00
Mark Backman
9186f65952 Merge pull request #3908 from pipecat-ai/mb/uv-lock-2026-03-03
uv.lock update
2026-03-03 13:28:27 -05:00
Mark Backman
bdeeacec51 uv.lock update 2026-03-03 10:37:35 -05:00
Mark Backman
8f04f894d5 Merge pull request #3907 from pipecat-ai/mb/update-docs-skill-new-services 2026-03-03 09:48:01 -05:00
Mark Backman
ca0ec16373 Merge pull request #3889 from ai-coustics/goedev/aic-voice-focus-and-memoryview-fix
AIC Voice Focus version update & concurrency safety issue on audio buffer.
2026-03-03 09:28:13 -05:00
Filipi da Silva Fuchter
150c8b92e5 Merge pull request #3848 from pipecat-ai/filipi/deepgram
Upgrading Deepgram to version 6
2026-03-03 09:07:10 -05:00
filipi87
0fdf7dc16a Fixing sagemaker merge conflicts. 2026-03-03 11:03:51 -03:00
filipi87
fc905a7ef5 Merge branch 'main' into filipi/deepgram
# Conflicts:
#	src/pipecat/services/deepgram/stt_sagemaker.py
2026-03-03 10:54:30 -03:00
Mark Backman
aca92745cd Update update-docs skill to register new services in docs.json and supported-services.mdx
When the skill creates a new service documentation page, it now also adds
the page to docs.json navigation and the supported-services.mdx table.
2026-03-03 08:40:12 -05:00
Aleix Conchillo Flaqué
5940731dd0 Merge pull request #3906 from pipecat-ai/changelog-0.0.104
Release 0.0.104 - Changelog Update
2026-03-02 21:24:05 -08:00
aconchillo
62260454a2 Update changelog for version 0.0.104 2026-03-02 21:23:39 -08:00
Aleix Conchillo Flaqué
d1ad7a9580 Merge pull request #3905 from pipecat-ai/aleix/tavus-fix-callback-on-joined
transport(tavus): fix on_joined callback
2026-03-02 21:11:12 -08:00
Aleix Conchillo Flaqué
252f17e1ca transport(tavus): fix on_joined callback 2026-03-02 21:06:49 -08:00
Mark Backman
c79a739c85 Merge pull request #3856 from zkleb-aai/assemblyai-u3-rt-pro
Assemblyai u3 rt pro
2026-03-02 20:28:28 -05:00
Mark Backman
038f6a77d1 Linting 2026-03-02 20:24:30 -05:00
Aleix Conchillo Flaqué
5952ea711c update uv.lock 2026-03-02 16:42:58 -08:00
Mark Backman
aad1211a57 Merge pull request #3885 from pipecat-ai/mb/latency-breakdown
Add latency breakdown to UserBotLatencyObserver
2026-03-02 19:27:35 -05:00
Mark Backman
7dbb130666 Add chronological_events utility function to display UserBotLatencyObserver report 2026-03-02 19:23:42 -05:00
zack
c6c2c5ba05 Fix end_of_turn_confidence_threshold: set to 1.0 (not 0.0) for universal-streaming
- u3-rt-pro: Does not set parameter (not used)
- universal-streaming models: Set to 1.0 to maintain fast response
- This ensures fast response time matches previous implementation
2026-03-02 18:25:25 -05:00
Aleix Conchillo Flaqué
141b0ee014 Merge pull request #3902 from pipecat-ai/aleix/deepgram-sagemaker-move
Move Deepgram SageMaker modules to sagemaker/ subpackage
2026-03-02 15:25:17 -08:00
Aleix Conchillo Flaqué
303616599f Add changelog for #3902 2026-03-02 15:20:52 -08:00
Aleix Conchillo Flaqué
088eb9b01c examples: update to new sagemaker packages 2026-03-02 15:20:52 -08:00
zack
32773b42d6 Improve terminology: rename file and replace 'STT mode' with 'AssemblyAI turn detection'
- Rename 07o-interruptible-assemblyai-stt.py -> 07o-interruptible-assemblyai-turn-detection.py
- Replace 'STT mode' with 'AssemblyAI turn detection mode' throughout codebase
- Replace 'Mode 1'/'Mode 2' with descriptive 'Pipecat turn detection'/'AssemblyAI turn detection'
- Update changelog to use 'built-in turn detection' terminology
- Addresses PR feedback about confusing terminology
2026-03-02 18:08:46 -05:00
Mark Backman
c039e08741 Merge pull request #3900 from pipecat-ai/filipi/lemonslice
Adding the LemonSlice transport integration
2026-03-02 17:56:18 -05:00
zack
b449515410 Address PR review feedback: remove debug logs, fix hasattr logic, add VADAnalyzer 2026-03-02 17:54:31 -05:00
Mark Backman
aae9136df9 Review feedback 2026-03-02 17:52:39 -05:00
Aleix Conchillo Flaqué
fdeddd7c95 Add deprecation shims for moved stt_sagemaker/tts_sagemaker modules
Re-export from the new pipecat.services.deepgram.sagemaker.{stt,tts}
paths so existing imports keep working with a deprecation warning.
2026-03-02 14:47:17 -08:00
Aleix Conchillo Flaqué
11783520c0 services(deepgram): move stt|tts_sagemaker to sagemaker/stt|tts.py 2026-03-02 14:43:34 -08:00
filipi87
49c73bb0a3 Merge branch 'main' into filipi/lemonslice
# Conflicts:
#	README.md
#	uv.lock
2026-03-02 19:24:52 -03:00
filipi87
f07e55a4ed Wrap LemonSlice session creation params in LemonSliceNewSessionRequest 2026-03-02 19:15:18 -03:00
filipi87
daf14f5065 Renaming LemonSlice utils file to api. 2026-03-02 19:08:17 -03:00
filipi87
ebb794995b Changing the log levels. 2026-03-02 19:06:13 -03:00
zkleb-aai
5c2ca0ce64 Update changelog/3856.changed.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:54 -05:00
zkleb-aai
6729f4366a Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:42 -05:00
zkleb-aai
7648b62e6e Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:17 -05:00
filipi87
7afd7068b5 Retrieving the elevenlabs voice ID from environment variable 2026-03-02 19:02:51 -03:00
filipi87
07fdd610ca Using a default voice in case it is not provided. 2026-03-02 19:02:33 -03:00
Mark Backman
a4796a2373 Merge pull request #3898 from pipecat-ai/mb/revert-processing-metrics-deprecation
Revert processing metrics deprecation
2026-03-02 16:39:02 -05:00
Aleix Conchillo Flaqué
44466cfa07 Merge pull request #3896 from pipecat-ai/aleix/broadcast-interruption
Add broadcast_interruption() to FrameProcessor
2026-03-02 13:36:39 -08:00
Mark Backman
68d7e98f95 Add defensive comment for given_fields() usage in tracing 2026-03-02 16:33:25 -05:00
Aleix Conchillo Flaqué
741ff14d3a Rename changelog files to use PR #3896 and mark breaking change 2026-03-02 13:26:45 -08:00
Aleix Conchillo Flaqué
4a61d5bfad Add broadcast_interruption() to FrameProcessor
Replace the round-trip push_interruption_task_frame_and_wait() mechanism
with broadcast_interruption(), which pushes an InterruptionFrame both
upstream and downstream directly from the calling processor.

This eliminates race conditions (transcription arriving before the
InterruptionFrame comes back), swallowed-event timeouts (frame blocked
before reaching the sink), and the complexity of _wait_for_interruption
flag / queue bypass / frame.complete() obligations.

- Add broadcast_interruption() to FrameProcessor
- Deprecate push_interruption_task_frame_and_wait() (delegates to new method)
- Remove event field and complete() from InterruptionFrame/InterruptionTaskFrame
- Remove _wait_for_interruption flag and all special-case logic
- Remove frame.complete() calls in stt_mute_filter and llm_response_universal
- Update all 17 call sites to use broadcast_interruption()
- Update tests
2026-03-02 13:26:45 -08:00
Mark Backman
d0ecb3c7a8 Revert "Deprecate processing metrics (ProcessingMetricsData)" (#3852)
This reverts commit 127b52bad5.
2026-03-02 16:26:29 -05:00
Mark Backman
8f66272de7 Update changelog 2026-03-02 16:16:38 -05:00
Mark Backman
ff5b985009 Convert observer data models to Pydantic BaseModel with timestamps
Enables .model_dump() serialization for Pipecat Cloud collection.
All metrics now include start_time (Unix timestamp) for timeline
plotting alongside duration_secs.
2026-03-02 16:11:43 -05:00
Mark Backman
a738a4d82b Add function call latency tracking to LatencyBreakdown 2026-03-02 16:11:43 -05:00
Mark Backman
ddba1b84a9 Add first-bot-speech latency to UserBotLatencyObserver
Measure time from ClientConnectedFrame to first BotStartedSpeakingFrame,
emitting a one-time on_first_bot_speech_latency event with breakdown.
2026-03-02 16:11:43 -05:00
Mark Backman
18155b6a63 Add latency breakdown to UserBotLatencyObserver
Add per-service latency breakdown metrics alongside existing user-to-bot
latency measurement. When enable_metrics=True, the observer now emits an
on_latency_breakdown event with TTFB, text aggregation, and user turn
duration metrics collected between VADUserStoppedSpeakingFrame and
BotStartedSpeakingFrame.

- Add LatencyBreakdown dataclass with ttfb, text_aggregation,
  user_turn_secs fields
- Accumulate MetricsFrame data during user→bot cycles
- Reset accumulators on InterruptionFrame to discard stale metrics
- Measure user_turn_secs from actual user silence (VAD timestamp -
  stop_secs) to turn release (UserStoppedSpeakingFrame)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using bounded deque + set pattern
- Update example 29 with latency breakdown display
2026-03-02 16:11:43 -05:00
Mark Backman
ac69b3441e Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
in containment, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-03-02 16:11:43 -05:00
Mark Backman
98bd530574 Add changelog for #3881 2026-03-02 16:11:42 -05:00
Mark Backman
b1e55fd6c2 Merge pull request #3881 from pipecat-ai/mb/startup-observer
Add StartupTimingObserver
2026-03-02 16:07:28 -05:00
Mark Backman
dbdb54ce0f Add on_connected event handler to DailyTransport for cross-transport consistency 2026-03-02 15:44:37 -05:00
Mark Backman
c1743dcffd Rename Tavus event, on_connected 2026-03-02 15:22:44 -05:00
Mark Backman
389d0c3fb6 Use on_pipeline_started from PipelineTask for startup report
Replace the PipelineSink detection in StartupTimingObserver with an
on_pipeline_started() callback from PipelineTask via TaskObserver.
This fixes premature report emission when using ParallelPipeline,
which has its own inner PipelineSinks per branch.
2026-03-02 14:33:55 -05:00
Mark Backman
a88eae7849 Merge pull request #3895 from pipecat-ai/aleix/update-nvidia-example-model
Update Nvidia example to use llama-3.3-70b-instruct
2026-03-02 14:27:53 -05:00
Mark Backman
0cfd953a90 Use _ArrivalInfo dataclass instead of tuple for arrival tracking 2026-03-02 14:15:41 -05:00
Mark Backman
bbbfdfd321 Replace per-processor start_time with start_offset_secs
Use start_offset_secs (offset from StartFrame) on ProcessorStartupTiming
instead of a wall-clock timestamp. Reports keep a single start_time
anchor for dashboard visualization. Remove _mono_to_wall conversion.
2026-03-02 14:07:34 -05:00
Aleix Conchillo Flaqué
193f93c2ce Update Nvidia example to use llama-3.3-70b-instruct model 2026-03-02 10:16:27 -08:00
Mark Backman
75669b12a2 Convert observer data models to Pydantic BaseModel with timestamps
Switch ProcessorStartupTiming, StartupTimingReport, and
TransportTimingReport from dataclasses to Pydantic BaseModel. Add
start_time (Unix timestamp) fields and wall clock conversion for
monotonic observer timestamps.
2026-03-02 13:10:09 -05:00
Mark Backman
68e8732e72 Add BotConnectedFrame and on_transport_timing_report event
Add BotConnectedFrame (SystemFrame) pushed by SFU transports (Daily,
LiveKit, HeyGen, Tavus) when the bot joins the room. Replace the
on_transport_readiness_measured event with on_transport_timing_report
which includes both bot_connected_secs and client_connected_secs.
2026-03-02 13:10:09 -05:00
Mark Backman
de87894778 Update changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
0836066898 Add ClientConnectedFrame and transport readiness timing
Introduce ClientConnectedFrame (SystemFrame) pushed by all transports
when a client connects. StartupTimingObserver uses this to measure
transport readiness — the time from StartFrame to first client
connection — via a new on_transport_readiness_measured event.
2026-03-02 13:10:09 -05:00
Mark Backman
58aa8e1ba5 Add changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
670e5000d2 Merge pull request #3893 from pipecat-ai/mb/fix-azure-error-propagation
Propagate Azure TTS/STT cancellation errors to the pipeline
2026-03-02 13:04:54 -05:00
Mark Backman
e6b9c5c4dc Propagate Azure TTS/STT cancellation errors to the pipeline
Azure TTS _handle_canceled was putting None (the normal completion
signal) into the audio queue for all cancellation reasons, so run_tts
treated errors identically to success—silently producing no audio.
Now error cancellations put an Exception marker in the queue, which
run_tts converts to an ErrorFrame.

Azure STT had no canceled event handler at all, so auth failures,
network errors, and rate-limit cancellations were invisible. Added
_on_handle_canceled which pushes an ErrorFrame upstream via push_error.

Fixes pipecat-ai/pipecat#3892
2026-03-02 12:36:08 -05:00
Mark Backman
c54232bdb4 Add StartupTimingObserver for measuring processor start() times
Tracks how long each processor start method takes during pipeline
startup by measuring StartFrame arrive/leave deltas. Emits a timing
report via the on_startup_timing_report event and auto-logs a summary.
Internal pipeline processors are excluded from reports by default.
2026-03-02 10:48:50 -05:00
Mark Backman
5a6a93e277 Merge pull request #3886 from dhruvladia-sarvam/add/user-agent
fix(sarvam): standardize STT/TTS User-Agent headers
2026-03-02 10:21:23 -05:00
dhruvladia-sarvam
f386722ef9 removing unnecessary logs 2026-03-02 20:38:39 +05:30
Mark Backman
7c07e090a4 Merge pull request #3891 from pipecat-ai/mb/fix-update-docs-oidc
Fix update-docs workflow OIDC failure with pull_request_target
2026-03-02 09:29:35 -05:00
filipi87
8b09f7bbb4 Upgrading Deepgram to version 6. 2026-03-02 11:22:33 -03:00
Mark Backman
07ba255073 Fix update-docs workflow OIDC failure with pull_request_target
The switch from pull_request to pull_request_target (for fork PR
secret access) broke claude-code-action default OIDC-based GitHub
App authentication. Pass github_token explicitly to bypass OIDC.
2026-03-02 09:20:24 -05:00
Mark Backman
eb7a4b7aee Merge pull request #3874 from pipecat-ai/mb/pr-3873
Changelog for PR 3873, docstrings change
2026-03-02 08:36:05 -05:00
Rupesh
ad74d19c6b Remove resampling warning log for consistency with rest of codebase 2026-03-02 13:24:00 +00:00
Rupesh
5e8d722bf2 Use soxr for high-quality audio resampling instead of numpy linear interpolation 2026-03-02 13:24:00 +00:00
Rupesh
a7f6db8436 Add changelog fragment for #3857 2026-03-02 13:24:00 +00:00
Rupesh
442ea6a97e Fix Smart Turn v3 producing incorrect predictions at non-16kHz sample rates
The Whisper-based ONNX model expects 16 kHz audio, but the
_predict_endpoint method had five hardcoded references to 16000 without
checking the actual pipeline sample rate. When running at 8 kHz (e.g.
Twilio telephony), audio was fed to the feature extractor at the wrong
rate, causing the model to perceive speech at 2x speed with shifted
formant frequencies and produce incorrect end-of-turn predictions.

Add automatic resampling via numpy interpolation before feature
extraction and replace all hardcoded sample rate values with a
_MODEL_SAMPLE_RATE constant. Also fix the WAV debug logger to write
files with the correct sample rate header.

Fixes #3844
2026-03-02 13:24:00 +00:00
Mark Backman
018ead8551 Changelog for PR 3873, docstrings change 2026-03-02 08:08:43 -05:00
Mark Backman
5e99aeedf5 Merge pull request #3888 from pipecat-ai/mb/fix-filter-incomplete-turns
Re-inject turn completion instructions after LLM context reset
2026-03-02 08:03:08 -05:00
Mark Backman
c579749d8a Merge pull request #3875 from pipecat-ai/mb/foundational-ex-updates
Miscellaneous foundational example updates
2026-03-02 08:02:51 -05:00
Mark Backman
094de42f0c Merge pull request #3879 from pipecat-ai/mb/fix-tracing-settings
Fix tracing to use ServiceSettings API instead of dict access
2026-03-02 08:01:45 -05:00
dhruvladia-sarvam
1242f1c10e changelog entry 2026-03-02 18:09:51 +05:30
dhruvladia-sarvam
55a641e258 fix(sarvam): standardize STT/TTS User-Agent headers 2026-03-02 18:09:51 +05:30
Gökmen Görgen
7575ea7e07 add changelog entries for Voice Focus 2.0 support and buffer resize fix in AICFilter. 2026-03-02 12:47:59 +01:00
Gökmen Görgen
d6aab6b52e simplify parameter setup in AICFilter.
ParameterFixedError is deprecated.
2026-03-02 12:24:52 +01:00
Gökmen Görgen
8ff3e21654 use new version of vf model. 2026-03-02 11:22:51 +01:00
Gökmen Görgen
ea59695551 don't use memoryview for concurrency safety.
Snapshot the blocks into immutable bytes and trim the buffer BEFORE any await, so no memoryview is
held across async yield points. Without this, a concurrent filter() or stop() call could try to
extend() or clear() the bytearray while a memoryview still exports it, raising "Existing exports
of data: object cannot be re-sized".
2026-03-02 10:55:25 +01:00
Gökmen Görgen
16c676a921 add a test for reproducing the user feedback first. 2026-03-02 10:34:50 +01:00
Mark Backman
91c46ffbf4 Re-inject turn completion instructions after LLM context reset
When filter_incomplete_user_turns is enabled and an LLMMessagesUpdateFrame
replaces the context via set_messages(), the turn completion instructions
system message was lost. This caused the LLM to stop emitting turn
completion markers. Re-inject the instructions after set_messages() to
fix this.
2026-03-01 16:37:07 -05:00
Mark Backman
024c62946f Merge pull request #3878 from pipecat-ai/mb/fix-update-docs-workflow-secrets
fix: use pull_request_target for docs workflow to access secrets from fork PRs
2026-03-01 14:53:53 -05:00
Mark Backman
9b969736f6 Merge pull request #3764 from kedar389/add-support-for-private-endpoint-azure-stt
feat: Add support for private endpoint in Azure STT
2026-03-01 14:50:34 -05:00
Radek Sedlák
6fc718947d Merge branch 'pipecat-ai:main' into add-support-for-private-endpoint-azure-stt 2026-03-01 17:55:45 +01:00
zack
cb7e612738 Remove test files and testing documentation from PR 2026-03-01 11:51:51 -05:00
zack
36b9c05730 Fix changelog entries to use proper markdown bullet format 2026-03-01 11:45:24 -05:00
zack
6968d83ccb Add changelog entries for PR #3856 2026-03-01 11:44:51 -05:00
zack
42f91a9056 Apply ruff formatting fixes 2026-03-01 11:44:37 -05:00
zack
5de495cc98 Use logger.warning instead of warnings.warn for deprecation message
- Makes deprecation warning visible in logs without needing Python warning flags
- Users will see the warning during normal operation
2026-03-01 11:39:00 -05:00
zack
d1cbc81108 Fix 07o example to use new min_turn_silence parameter name in docs and comments 2026-03-01 11:36:46 -05:00
zack
66fca7e382 Add backward compatibility for min_end_of_turn_silence_when_confident parameter
- Keep old parameter name for backward compatibility
- Add deprecation warning when old parameter is used
- Automatically migrate old parameter value to new min_turn_silence parameter
- Exclude deprecated parameter from WebSocket URL to avoid sending it to API
- New parameter takes precedence if both are set
2026-03-01 11:33:22 -05:00
zack
07ae4b8d38 Update AssemblyAI examples to use u3-rt-pro and improve 55d example
- Update 13d-assemblyai-transcription.py to explicitly use u3-rt-pro model
- Update 55d-update-settings-assemblyai-stt.py to demonstrate keyterms updates instead of language updates
- Add helpful logging to show before/after keyterms boosting effect
- Use difficult names (Xiomara, Saoirse, Krzystof) to demonstrate boosting effectiveness
2026-03-01 11:27:31 -05:00
zack
21a409e447 Update prompt warning and rename min_end_of_turn_silence_when_confident to min_turn_silence
- Add "beta feature" note to custom prompt warning
- Rename min_end_of_turn_silence_when_confident parameter to min_turn_silence across all AssemblyAI code
- Update documentation, examples, and test files to use new parameter name
2026-03-01 11:17:39 -05:00
Aleix Conchillo Flaqué
903dc6c1a9 Merge pull request #3883 from pipecat-ai/aleix/queue-frame-direction
Add direction parameter to PipelineTask.queue_frame() and queue_frames()
2026-03-01 06:04:28 -08:00
Mark Backman
dee94b3cb8 Merge pull request #3795 from omChauhanDev/fix/realtime-cancel-not-active
fix(realtime): handle response_cancel_not_active as non-fatal
2026-03-01 07:29:59 -05:00
Om Chauhan
ece4343839 changed log level to debug 2026-03-01 12:25:42 +05:30
Aleix Conchillo Flaqué
94a59de4e1 Add changelog for #3883 2026-02-28 17:28:44 -08:00
Aleix Conchillo Flaqué
f37fd39cdb Add optional direction parameter to PipelineTask.queue_frame() and queue_frames()
Allow pushing frames upstream through the pipeline by passing
FrameDirection.UPSTREAM. Downstream frames use the existing push queue,
while upstream frames are pushed directly from the pipeline sink.
2026-02-28 17:28:44 -08:00
Mark Backman
9d4955054c Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
in containment, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-02-27 22:41:40 -05:00
Mark Backman
6464230627 fix: use pull_request_target for docs workflow to access secrets from fork PRs
The update-docs workflow intermittently failed with "Input required and
not supplied: token" because pull_request events from fork PRs don't
have access to repository secrets. Switching to pull_request_target
runs the workflow in the base repo's context, ensuring secrets are
always available. This is safe since the workflow only runs on
already-merged PRs.
2026-02-27 22:22:35 -05:00
Mark Backman
950a8628dc Miscellaneous foundational example updates 2026-02-27 19:49:45 -05:00
Mark Backman
17205c1647 Merge pull request #3871 from rupesh-svg/fix/rtvi-processor-double-insert
Fix PipelineTask double-inserting RTVIProcessor with custom RTVIObserver
2026-02-27 19:34:46 -05:00
Mark Backman
2a776d0c1e Merge pull request #3873 from rimelabs/matt/rime/add_speedAlpha_param_to_arcana
[RimeTTS] Add `speedAlpha` parameter support to the `arcana` model
2026-02-27 19:27:56 -05:00
zack
d7ce1eedd9 Add foundational examples for AssemblyAI u3-rt-pro
- 07o-interruptible-assemblyai.py: Basic example using Pipecat VAD mode
- 07o-interruptible-assemblyai-stt.py: Advanced example using STT-controlled
  turn detection with comprehensive documentation on u3-rt-pro features
  (turn detection tuning, prompt-based enhancement, speaker diarization)
2026-02-27 17:58:18 -05:00
zack
ef00f27d53 Fix incorrect await on synchronous request_finalize() method
The request_finalize() method in STTService is synchronous (sets a flag),
but was being called with await in the VAD turn endpoint handling code.
This caused "object NoneType can't be used in 'await' expression" errors.

Also includes automatic formatting improvements from ruff.
2026-02-27 17:58:05 -05:00
Rupesh
56f2564ed1 Use local variable instead of instance variable for RTVI prepend decision
Replace _rtvi_external instance variable with a local prepend_rtvi flag
since it is only used during __init__ to decide whether to prepend the
RTVIProcessor to the pipeline.
2026-02-27 14:45:37 -08:00
macaki
000d38e253 [Rime] Both mist and arcana now support the speedAlpha parameter. 2026-02-27 15:17:23 -07:00
Filipi da Silva Fuchter
36edef489e Merge pull request #3863 from pipecat-ai/filipi/manual_summarization
Manual context summarization
2026-02-27 16:46:37 -05:00
filipi87
d077a810ae Fixing context summarization tests 2026-02-27 18:42:50 -03:00
filipi87
0839e3813f Refactoring the examples to use the new context summarization classes. 2026-02-27 18:42:39 -03:00
filipi87
69414e8a5a Added example 54b-context-summarization-manual-openai.py demonstrating on-demand summarization triggered via a function call tool. 2026-02-27 18:42:23 -03:00
filipi87
dfd0a515f3 Changelog entries for the context summarization improvements. 2026-02-27 18:42:13 -03:00
filipi87
ed7f0a2c08 Adding support for on-demand summarization 2026-02-27 18:41:55 -03:00
filipi87
08d93ce9b6 Renamed LLMAssistantAggregatorParams fields for clarity. 2026-02-27 18:41:17 -03:00
filipi87
f11d4b6944 Refactored LLMContextSummarizationConfig into two focused classes, LLMContextSummaryConfig and LLMAutoContextSummarizationConfig. 2026-02-27 18:40:41 -03:00
filipi87
51a3310e78 Added LLMSummarizeContextFrame: push this frame anywhere in the pipeline to trigger on-demand context summarization (e.g. from a function call tool). 2026-02-27 18:39:57 -03:00
Rupesh
6f33aff0c6 Fix PipelineTask double-inserting RTVIProcessor when custom RTVIObserver is provided
When the user places an RTVIProcessor inside their pipeline and provides
a custom RTVIObserver subclass in observers, PipelineTask correctly
detects both and logs "skipping default ones." However it then
unconditionally prepends self._rtvi to the pipeline, causing the
processor to appear twice in the frame chain.

Track whether the RTVIProcessor was found externally (inside the user
pipeline) vs created internally. Only prepend it when created internally.

Fixes #3867
2026-02-27 13:29:01 -08:00
zack
45532a9478 Remove info logs and unused import per PR feedback
- Remove unused Mapping import
- Remove info logs at initialization (connection params)
- Remove info logs in _handle_transcription (transcript details, text sent to LLM)
- Remove info logs in _build_ws_url (WebSocket URL and params)
- Keep debug logs (less verbose, appropriate for development)
2026-02-27 16:15:49 -05:00
Mark Backman
4eb993c980 Merge pull request #3868 from wollerman/wollerman/numba-version-pin-update
fix: Update numba version pin from == to >=0.61.2
2026-02-27 16:04:20 -05:00
Mark Backman
83e29eb478 Merge pull request #3855 from pipecat-ai/mb/context-summarization-improvements
Improve context summarization with dedicated LLM, timeout, and observability
2026-02-27 15:24:38 -05:00
zack
6ba9f780b0 Remove unnecessary SpeechStarted fallback in STT mode
u3-rt-pro guarantees SpeechStarted is always sent before transcripts,
so the fallback UserStartedSpeakingFrame broadcast is never needed.

This ensures clean pairing of UserStarted/StoppedSpeakingFrame:
- Start: Always from _handle_speech_started
- Stop: Always from _handle_transcription on final turn
2026-02-27 15:00:38 -05:00
zack
aa7e9a17d5 Fix finalization pattern: Use request/confirm in Pipecat mode, finalized flag in STT mode
- Add request_finalize() before sending ForceEndpoint in Pipecat mode
- Keep confirm_finalize() when receiving formatted finals in Pipecat mode
- Remove confirm_finalize() from STT mode (use finalized=True instead)

This follows Pipecat's two-step finalization pattern where request_finalize()
is called when sending a finalize request to the STT service, and
confirm_finalize() is called when receiving confirmation back.
2026-02-27 14:55:22 -05:00
Matt
acff172bf2 create changelog entry 2026-02-27 14:52:37 -05:00
Mark Backman
9747e8da4a Merge pull request #3866 from pipecat-ai/mb/fix-docs-workflow-version
Fix docs workflow to add auto-docs label
2026-02-27 13:09:36 -05:00
Mark Backman
8fc63352d9 Merge pull request #3865 from pipecat-ai/mb/elevenlabs-realtime-stt-finalized
Set finalized flag on ElevenLabs Realtime STT for manual commit strategy
2026-02-27 13:09:17 -05:00
Matt
6ebfea4746 update numba version pin to >= 2026-02-27 12:44:31 -05:00
Mark Backman
f74af9b9c7 Always apply a timeout to summarization LLM calls
Even when summarization_timeout is explicitly set to None, use a
DEFAULT_SUMMARIZATION_TIMEOUT (120s) fallback so the LLM call can
never hang indefinitely. Applied in both LLMService and the dedicated
LLM path in LLMContextSummarizer.
2026-02-27 12:09:00 -05:00
Mark Backman
82c249608f Move dedicated LLM summarization into LLMContextSummarizer
The dedicated LLM logic lived in LLMAssistantAggregator, creating two
code paths and requiring the aggregator to call a private LLMService
method. Move it into the summarizer which already owns the config and
summarization lifecycle, keeping the aggregator handler as a single-line
upstream push.
2026-02-27 12:09:00 -05:00
Mark Backman
98e737b4e9 Add tests for context summarization improvements
Cover summary message role, template, on_summary_applied event,
summarization timeout, and dedicated LLM routing/error handling.
2026-02-27 12:08:43 -05:00
Mark Backman
ec9ddb3199 Add changelog entries for context summarization improvements (#3855) 2026-02-27 12:07:34 -05:00
Mark Backman
712305c5b1 Add example 54c showing custom context summarization 2026-02-27 12:07:34 -05:00
Mark Backman
be8ea818c8 Add on_summary_applied event for observability
Emits a SummaryAppliedEvent after context summarization completes,
  providing message counts so applications can track compression
  metrics.
2026-02-27 12:07:34 -05:00
Mark Backman
50710e9c3f Add summarization timeout to prevent hung LLM calls
Adds a configurable summarization_timeout (default 120s) that cancels
  summary generation if the LLM hangs. On timeout, an error result is
  returned so _summarization_in_progress resets and future
  summarizations are unblocked.
2026-02-27 12:07:34 -05:00
Mark Backman
a489bfaf00 Add optional dedicated LLM for context summarization
Adds an  field to LLMContextSummarizationConfig that allows
  routing summarization to a separate LLM service (e.g., Gemini Flash)
  instead of the pipeline's primary model. This avoids paying for
  expensive inference when compressing context in long-running sessions.
2026-02-27 12:07:34 -05:00
Mark Backman
945a523eed Add configurable summary_message_template to LLMContextSummarizationConfig
Allows applications to customize how the summary is wrapped when
  injected into context (e.g., XML tags, custom delimiters) so system
  prompts can distinguish summaries from live conversation.
2026-02-27 12:07:34 -05:00
Mark Backman
790c434a08 Update summary message role: use user instead of assistant
The context summary is information provided to the assistant, not
  something the assistant said.
2026-02-27 12:07:34 -05:00
Filipi da Silva Fuchter
db40a354be Merge pull request #3794 from omChauhanDev/fix/context-summarization-llm-specific-message
skipping provider-specific messages during summarization
2026-02-27 10:57:34 -05:00
filipi87
aa6d3b38b3 Add explanatory comments for LLMSpecificMessage guards in context summarization, amd fixed the missing guard in LLMContextSummarizer._apply_summary when searching for the first system message. 2026-02-27 12:53:25 -03:00
Mark Backman
41d6470e4a Fix docs workflow: add auto-docs label, remove version info 2026-02-27 10:39:37 -05:00
Mark Backman
601822e3e5 Add changelog for PR #3865 2026-02-27 10:25:48 -05:00
Mark Backman
3a32d91c66 Set finalized flag on ElevenLabs Realtime STT transcriptions for manual commit strategy 2026-02-27 10:21:10 -05:00
Filipi da Silva Fuchter
35b3803ebc Merge pull request #3845 from pipecat-ai/filipi/fix_tts_speak_frame
Add TTSSpeakFrame.push_assistant_aggregation to force context flush after TTS.
2026-02-27 09:59:33 -05:00
filipi87
3b427a47b6 Fixing Piper test. 2026-02-27 11:57:11 -03:00
filipi87
d701c3427c Changelog entry for the TTSSpeakFrame fix. 2026-02-27 11:57:03 -03:00
filipi87
1f45e80f9d Updated the 52-live-translation.py example to demonstrate the fix 2026-02-27 11:56:52 -03:00
filipi87
bc6f8e51de Fixed TTSSpeakFrame not automatically committing spoken text to the conversation context when used outside of an LLM response (e.g., for bot greeting messages or injected speech) 2026-02-27 11:56:44 -03:00
filipi87
deba2515f9 Added a new LLMAssistantPushAggregationFrame control frame that signals LLMAssistantAggregator to immediately flush its text buffer to the conversation context 2026-02-27 11:56:36 -03:00
Mark Backman
127b52bad5 Merge pull request #3852 from pipecat-ai/mb/deprecate-processing-metrics
Deprecate processing metrics (ProcessingMetricsData)
2026-02-27 09:50:29 -05:00
Mark Backman
0697f72dae Merge pull request #3864 from pipecat-ai/mb/auto-docs-update
Add automated docs update workflow
2026-02-27 09:36:27 -05:00
Mark Backman
c259a6a73b Deprecate processing metrics (ProcessingMetricsData)
Add deprecation warnings to start_processing_metrics() and
stop_processing_metrics() on FrameProcessorMetrics and FrameProcessor.
Mark ProcessingMetricsData as deprecated in docstring. All existing
behavior is preserved — the warnings inform users that these will be
removed in a future version.
2026-02-27 09:22:29 -05:00
Mark Backman
3e04f5d05f Add GitHub Actions workflow to auto-update docs on PR merge
Runs Claude Code Action after PRs merge to main when source files
in services/transports/serializers/processors/audio/turns/observers/pipeline
are changed. Creates a docs PR on pipecat-ai/docs with targeted edits
following the existing update-docs skill instructions.
2026-02-27 09:18:15 -05:00
zack
cd07937c5d Fix missing imports: Add UserStartedSpeakingFrame and UserStoppedSpeakingFrame 2026-02-26 22:18:02 -05:00
zack
72934bd8ae Add u3-rt-pro support and improvements to AssemblyAI STT service
- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
2026-02-26 22:04:21 -05:00
Mark Backman
2a6a993869 Merge pull request #3850 from rupesh-svg/fix/genesys-remove-audio-chunk-logging
Remove verbose audio chunk logging from GenesysAudioHookSerializer
2026-02-26 21:52:54 -05:00
Rupesh
bbaa79fef0 Add changelog for PR #3850 2026-02-26 14:00:34 -08:00
Rupesh
fff9db0d8f Remove verbose audio chunk logging from GenesysAudioHookSerializer
Fixes #3777
2026-02-26 13:51:05 -08:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception—where a service directly still needs to directly call `self._sync_model_name_to_metrics()`—is when the model name need to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
0fe4c732b7 Merge pull request #3837 from alts/alts/append-trailing-space
Add `append_trailing_space` to all Rime websocket services
2026-02-25 13:35:07 -05:00
Stephen Altamirano
ceead60ef2 Add append_trailing_space to all Rime websocket services
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
2026-02-25 10:02:38 -08:00
Mark Backman
e028194dbe Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0 2026-02-25 12:23:13 -05:00
Mark Backman
81f4672535 Add Performance as a changelog fragment option 2026-02-25 09:47:42 -05:00
Mark Backman
9273b158ea Merge pull request #3825 from pipecat-ai/mb/llm-user-aggregator-interim-transcription
Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
2026-02-25 09:06:34 -05:00
Mark Backman
353a28842c Merge pull request #3807 from pipecat-ai/mb/update-openai-realtime-1.5
Update OpenAI Realtime default model to gpt-realtime-1.5
2026-02-25 09:06:19 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
Filipi da Silva Fuchter
2f60074da3 Merge pull request #3814 from pipecat-ai/filipi/fix_close_context
Fixed an issue where the TTS providers did not close the context after the audio context finished processing all audio.
2026-02-25 08:21:04 -05:00
filipi87
751b1b8100 Adding the changelog entries for the tts fixes. 2026-02-25 10:18:25 -03:00
filipi87
d899f0af11 Refactored all AudioContextTTSService based providers to override the new callbacks instead of _handle_interruption(), making provider-specific cleanup cleaner and more explicit 2026-02-25 10:18:16 -03:00
filipi87
c09ae6ba6d Added two new lifecycle callbacks to AudioContextTTSService: on_audio_context_interrupted() and on_audio_context_completed() 2026-02-25 10:17:54 -03:00
Mark Backman
a187a4b3b2 Merge pull request #3830 from pipecat-ai/aleix/restore-dev-skills 2026-02-25 06:33:16 -05:00
Aleix Conchillo Flaqué
68e19a730b Restore dev skills and add marketplace for maintainer workflows
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
2026-02-24 23:47:06 -08:00
Mark Backman
67cb7d575f Merge pull request #3828 from pipecat-ai/mb/skip-empty-audio-filter-frames
Skip empty audio frames after filter buffering
2026-02-24 23:27:22 -05:00
Mark Backman
a84930dc3e Skip empty audio frames after filter buffering
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.

Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.

Fixes #3517
2026-02-24 23:21:52 -05:00
kompfner
54fd73c460 Merge pull request #3821 from pipecat-ai/pk/fix-missing-field-warning-rime-tts
Fix missing field warning in `RimeTTSService`
2026-02-24 21:58:17 -05:00
kompfner
c954e1c898 Merge pull request #3820 from pipecat-ai/pk/fix-breakage-when-sending-generic-settings-update
Fix breakage when using a generic settings update (e.g. a `TTSSetting…
2026-02-24 21:58:05 -05:00
kompfner
db76cd052a Merge pull request #3819 from pipecat-ai/pk/make-update-settings-frames-uninterruptible
Make `ServiceUpdateSettingsFrame` uninterruptible—settings updates ar…
2026-02-24 21:57:41 -05:00
Mark Backman
167e68672b Add changelog for InterimTranscriptionFrame/TranslationFrame fix 2026-02-24 20:52:16 -05:00
Mark Backman
69d916ca51 Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
These frames were falling through to the else branch and being pushed
downstream, unlike TranscriptionFrame which is explicitly consumed.
This aligns with how the assistant aggregator already filters them.
2026-02-24 20:51:41 -05:00
Mark Backman
9890b93d08 Merge pull request #3822 from pipecat-ai/fix/stt-ttfb-timeout-timestamp
Fix STT TTFB timeout measuring to use transcript arrival time
2026-02-24 19:45:15 -05:00
Mark Backman
f928206b3a Add changelog for STT TTFB timeout fix 2026-02-24 19:02:40 -05:00
Mark Backman
f421ad9cf6 Fix STT TTFB timeout measuring to timeout expiry instead of transcript time
PR #3776 replaced manual timestamp tracking with stop_ttfb_metrics() in
the timeout handler, but without an end_time it uses time.time() at
timeout expiry—producing TTFB = timeout + stop_secs (~2.2s) instead of
the actual transcript latency.

Restore _last_transcript_time tracking so the timeout handler measures
to when the transcript arrived, and skip reporting if none arrived.
2026-02-24 18:57:38 -05:00
Paul Kompfner
d918a20b75 Fix missing field warning in RimeTTSService 2026-02-24 18:14:16 -05:00
Paul Kompfner
d91c230b85 Fix breakage when using a generic settings update (e.g. a TTSSettings) instead of a more specific one (e.g. a RimeTTSSettings). Both should work, assuming you're only changing fields present in the generic settings. 2026-02-24 18:05:27 -05:00
Paul Kompfner
b6f21ab15d Make ServiceUpdateSettingsFrame uninterruptible—settings updates are generally independent of specific utterances.
Before this change, settings updates were often not applied. For example, a `TTSUpdateSettingsFrame` queued while the bot was speaking would only have an effect at the end of the bot's reply, and any interruption before the end of the reply would "cancel" the update.
2026-02-24 17:47:53 -05:00
kompfner
6f0061ab96 Merge pull request #3812 from pipecat-ai/pk/service-settings-storage-v-delta-mode
Make clearer the distinction between "storage-mode" and "delta-mode" …
2026-02-24 15:37:49 -05:00
Aleix Conchillo Flaqué
761397d1f9 Merge pull request #3816 from pipecat-ai/aleix/use-skills-repo
Move skills to pipecat-ai/skills repo, add README instructions
2026-02-24 12:11:20 -08:00
Aleix Conchillo Flaqué
d9bb4d07c6 Merge pull request #3815 from pipecat-ai/fix/sentry-metrics-signatures
Fix SentryMetrics method signatures to match base class
2026-02-24 12:11:07 -08:00
Aleix Conchillo Flaqué
ee46cbce4c Move skills to pipecat-ai/skills repo, add README instructions
Remove bundled Claude Code skills (changelog, cleanup, code-review,
docstring, pr-description, pr-submit) that now live in
https://github.com/pipecat-ai/skills. Add a section to the README
with installation instructions. The update-docs skill remains as
it is specific to this repository.
2026-02-24 11:41:19 -08:00
Aleix Conchillo Flaqué
b4b9976b9c Fix SentryMetrics method signatures to match base class
Update start_ttfb_metrics, stop_ttfb_metrics, start_processing_metrics,
and stop_processing_metrics to accept start_time/end_time keyword
arguments matching the updated FrameProcessorMetrics signatures.

Closes #3808
2026-02-24 11:26:34 -08:00
Paul Kompfner
b78a293ffb Flatten input_params into individual fields on SonioxSTTSettings and GladiaSTTSettings
This makes each service-specific field individually visible to the delta/update mechanism (`apply_update`, `given_fields`) and removes the need for complex sync logic between `input_params` and top-level fields like `model`.

- Soniox: replace `input_params: SonioxInputParams` with 8 individual fields, simplify `_update_settings` by removing model sync logic, remove unused `is_given` import
- Gladia: replace `input_params: GladiaInputParams` with 11 individual fields, resolve deprecated `language` → `language_config` at init time rather than at `_prepare_settings` time
2026-02-24 14:01:43 -05:00
Paul Kompfner
0a89d24f70 Update some more services to ensure that there are no un-initialized fields in self._settings 2026-02-24 14:01:43 -05:00
Paul Kompfner
8c9ccf8f82 Bump various deprecation messages from mentioning version 0.0.103 to 0.0.104 2026-02-24 14:01:43 -05:00
Paul Kompfner
bcc2b4def4 Make clearer the distinction between "storage-mode" and "delta-mode" usage of *Settings objects
- Storage mode: for use in `self._settings`. All fields should be specified, i.e. should not be `NOT_GIVEN`.
- Delta mode: for use in `*UpdateSettingsFrame`.

In service of this, this commit:
- Adds a runtime check that all fields are specified in storage mode
- Updates all services to specify all fields in stored settings
- Updates all services to no longer check for `is_given` in stored settings (not necessary anymore)
- Updates relevant docstrings
- Renames `update` to `delta` in `*UpdateSettingsFrame`
- Updates community integrations guide
2026-02-24 14:01:28 -05:00
Filipi da Silva Fuchter
57d25c564c Merge pull request #3786 from pipecat-ai/filipi/refactor_word_tts_service
Refactoring the services using the WordTTSService
2026-02-24 13:53:58 -05:00
filipi87
6cda2ff941 Changelog entry for word timestamp refactor and deprecation notes. 2026-02-24 15:49:02 -03:00
filipi87
323477bfa4 Refactoring the services using the WordTTSService. 2026-02-24 15:48:46 -03:00
Mark Backman
fa692ec989 Merge pull request #3813 from pipecat-ai/mb/fix-stt-ttfb
Fix STT TTFB metrics for Soniox and AWS Transcribe
2026-02-24 13:12:32 -05:00
Mark Backman
23ad181515 Fix Soniox processing metrics to measure token-to-transcript time
Move start_processing_metrics from run_stt (called per audio chunk,
producing noisy 0ms logs) to _receive_messages when the first final
token arrives for a new utterance. The existing stop_processing_metrics
in send_endpoint_transcript completes the pair, giving a meaningful
measurement of time from first recognition to finalized transcript.
2026-02-24 13:09:29 -05:00
Mark Backman
6f7664846c Add can_generate_metrics to Soniox and AWS Transcribe STT services
Commit 859cd7c9 refactored STT TTFB measurement to use the base class
start_ttfb_metrics/stop_ttfb_metrics, which are gated behind
can_generate_metrics(). Soniox and AWS Transcribe never overrode this
method (default returns False), so TTFB was silently never reported.
2026-02-24 12:59:44 -05:00
Mark Backman
081aaa50dc Merge pull request #3811 from pipecat-ai/mb/nltk-upgrade
Bump nltk minimum version from 3.9.1 to 3.9.3
2026-02-24 10:28:32 -05:00
Mark Backman
aff8ab8a40 Update OpenAI Realtime default model to gpt-realtime-1.5 2026-02-24 09:07:31 -05:00
Mark Backman
0f7e6e14ab Bump nltk minimum version from 3.9.1 to 3.9.3
Resolves a security vulnerability flagged by Dependabot (#164).
2026-02-24 08:56:00 -05:00
Mark Backman
65f563ad34 Add debug logging to KrispVivaTurn analyze_end_of_turn and update example
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.
2026-02-23 21:35:35 -05:00
Mark Backman
9c2ac661a3 Merge pull request #3805 from pipecat-ai/mb/dataclass-basemodel
Add dataclass vs Pydantic BaseModel convention to CLAUDE.md
2026-02-23 19:32:31 -05:00
kompfner
cdd65b6c0a Merge pull request #3714 from pipecat-ai/pk/service-settings-refactor
Broad refactor of service settings and how they’re updated at runtime
2026-02-23 17:15:15 -05:00
Paul Kompfner
71fc078c24 Refine ServiceSettings docstring: clarify NOT_GIVEN semantics and fix frame reference
Use wildcard `*UpdateSettingsFrame` to cover all frame types. Clarify that NOT_GIVEN only appears in update deltas, not in the service's current settings state.
2026-02-23 16:55:11 -05:00
Paul Kompfner
7556427862 Revise changelog entries for service settings refactor
Split the single "changed" entry into separate "added", "changed", and "deprecated" entries for clarity. Add a note about the subtle behavior change in the deprecated set_model/set_voice/set_language methods.
2026-02-23 16:52:11 -05:00
Paul Kompfner
bcf11ecbd4 Looks like the Deepgram Sagemaker TTS services aren't able yet to successfully disconnect/reconnect to apply runtime settings updates. For now, marking them as not yet supporting runtime settings updates. 2026-02-23 16:02:00 -05:00
Paul Kompfner
ff174dd1c2 Fix STT/TTS Deepgram Sagemaker 55-series examples (examples updating settings at runtime) 2026-02-23 16:02:00 -05:00
Paul Kompfner
e804060e17 Update COMMUNITY_INTEGRATIONS.md _update_settings examples
Simplify the reconnect example to show a common pattern (reconnect on any change) and improve the _warn_unhandled_updated_settings example to show selective handling of specific fields.
2026-02-23 15:45:00 -05:00
Paul Kompfner
30db5fea7c Clarify that ServiceSettings and subclasses represent runtime-updatable settings
Update docstrings for ServiceSettings, LLMSettings, TTSSettings, and STTSettings to make clear these capture only the subset of service configuration that can be changed while the pipeline is running via UpdateSettingsFrame, not all constructor parameters.
2026-02-23 15:38:57 -05:00
Mark Backman
c527e1f30f Add dataclass vs Pydantic BaseModel rule to CLAUDE.md
Document the existing convention: use @dataclass for frames and
internal pipeline data, use Pydantic BaseModel for configuration,
parameters, metrics, and external API data.
2026-02-23 14:26:16 -05:00
Paul Kompfner
029f3dbefb Updating 55o ElevenLabsTTSService example to also exercise switching voices, which requires reconnect 2026-02-23 12:08:13 -05:00
kompfner
03cb0054f9 Merge branch 'main' into pk/service-settings-refactor 2026-02-23 11:46:03 -05:00
Mark Backman
6a7e9358c6 Merge pull request #3803 from pipecat-ai/mb/inline-smart-turn-v3-deps
Inline local-smart-turn-v3 deps for Poetry compatibility
2026-02-23 09:29:52 -05:00
Mark Backman
6a3718d33d Inline local-smart-turn-v3 deps for Poetry compatibility
Replace self-referential `pipecat-ai[local-smart-turn-v3]` extra in core
dependencies with the actual packages (`transformers`, `onnxruntime`).
Self-referential extras are not supported by Poetry and cause dependency
resolution failures. Since these are required by the default turn stop
strategy (LocalSmartTurnAnalyzerV3), they belong in core dependencies.

- Remove `local-smart-turn-v3` optional extra from pyproject.toml
- Remove try/except ModuleNotFoundError guard (now always installed)
- Remove `--extra local-smart-turn-v3` from CI workflows
2026-02-23 09:00:36 -05:00
Om Chauhan
b390dc369c added changelog 2026-02-21 18:33:29 +05:30
Om Chauhan
a18aa738e0 fix(realtime): handle response_cancel_not_active as non-fatal 2026-02-21 18:26:31 +05:30
Om Chauhan
9476b5d184 added changelog 2026-02-21 17:35:08 +05:30
Om Chauhan
f49658de15 skipping provider-specific messages during summarization 2026-02-21 17:19:50 +05:30
Aleix Conchillo Flaqué
b67af19d47 Merge pull request #3793 from pipecat-ai/changelog-0.0.103
Release 0.0.103 - Changelog Update
2026-02-20 16:42:46 -08:00
aconchillo
6d9c07b945 Update changelog for version 0.0.103 2026-02-20 16:39:36 -08:00
Aleix Conchillo Flaqué
18429f80f1 github(changelog): allow performance type 2026-02-20 16:32:40 -08:00
Aleix Conchillo Flaqué
0a54dc9721 Merge pull request #3792 from pipecat-ai/aleix/update-anthropic-default-model
Update default Anthropic model to claude-sonnet-4-6
2026-02-20 16:28:08 -08:00
Aleix Conchillo Flaqué
521f669051 Add changelog entries for PR #3792 2026-02-20 16:18:21 -08:00
Aleix Conchillo Flaqué
abb20f34ba Update default Anthropic model to claude-sonnet-4-6
Update the default model in AnthropicLLMService and remove the
now-unnecessary explicit model from the function calling example.
2026-02-20 16:17:51 -08:00
Joshua Primas
d38b1d97d4 Added changelog 2026-02-20 16:13:44 -08:00
Joshua Primas
0b4568843b Improved logging + error handling + pipecat bot name usage 2026-02-20 15:59:52 -08:00
Joshua Primas
35aba4128c Adding the LemonSlice transport integration 2026-02-20 15:24:48 -08:00
Aleix Conchillo Flaqué
b1e72ad4b7 Merge pull request #3789 from pipecat-ai/aleix/fix-missing-await-and-interruption-hang
Fix missing await and interruption timeout hang
2026-02-20 14:59:11 -08:00
Aleix Conchillo Flaqué
f610fb95f9 Add changelog entries for PR #3789 2026-02-20 14:56:46 -08:00
Aleix Conchillo Flaqué
827032fefb Unblock push_interruption_task_frame_and_wait after timeout
When the InterruptionFrame does not complete within the timeout the
caller was stuck in an infinite loop logging warnings. Now the event
is set after the first timeout so the processor can continue.

Also adds a keyword timeout parameter so callers can customize the
wait duration.
2026-02-20 14:56:42 -08:00
Aleix Conchillo Flaqué
af4ef95dc6 Fix missing await on add_audio_frames_message in Google audio examples
The method is async but was being called without await, silently
discarding the coroutine.
2026-02-20 14:24:22 -08:00
Aleix Conchillo Flaqué
0370bb15e4 update uv.lock 2026-02-20 13:47:04 -08:00
Aleix Conchillo Flaqué
2b3595485f Merge pull request #3788 from dhruvladia-sarvam/v3-fix-final
initial
2026-02-20 13:46:18 -08:00
Paul Kompfner
af4226adbf Add changelog entries for service settings refactor PR #3714 2026-02-20 15:26:17 -05:00
Paul Kompfner
29e2a861dc Update AIService.set_model_name to AIService._sync_model_name_to_metrics to:
- indicate clearly that it's not meant for public use
- make it clear the `self._settings` is the single source of truth for model information
- set the stage for an upcoming change where `AIService` subclasses won't have to ever worry about explicitly calling an `AIService` method to sync model name to metrics

Across all services, switch from accessing `self._model_name` or `self.model_name` in favor of `self._settings.model`.
2026-02-20 15:02:50 -05:00
Filipi da Silva Fuchter
63c664becb Merge pull request #3787 from pipecat-ai/filipi/refresh_active_audio_context
Fix race condition where context times out after sending second transcript
2026-02-20 14:50:38 -05:00
dhruvladia-sarvam
fecf462139 initial 2026-02-21 01:02:37 +05:30
Daksh Dua
023063759a Changelog entry for TTS race condition fix. 2026-02-20 16:00:34 -03:00
Daksh Dua
c49eda98e7 Fix race condition where context times out after sending second transcript 2026-02-20 15:37:14 -03:00
Filipi da Silva Fuchter
5d07326e36 Merge pull request #3732 from pipecat-ai/filipi/tts_updates
Refactored audio context management in TTS services
2026-02-20 13:02:42 -05:00
filipi87
fa659311b6 Changelog entry 2026-02-20 14:57:59 -03:00
filipi87
125c423356 Refactored audio context management in TTS services to improve encapsulation and reduce code duplication 2026-02-20 14:57:44 -03:00
Filipi da Silva Fuchter
c9615c8db6 Merge pull request #3779 from pipecat-ai/filipi/filter_observer
Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer.
2026-02-20 12:42:02 -05:00
Aleix Conchillo Flaqué
28c542f6ed Merge pull request #3785 from pipecat-ai/mb/deepgram-sagemaker-tts
Add DeepgramSageMakerTTSService
2026-02-20 09:01:32 -08:00
Paul Kompfner
f5b86d9cdc Actually, revert the change making it so that STTService takes model and language args at init time. It'll be up to the subclasses to append those to _settings (or better yet, provide their own service-specific _settings). This avoids rocking the boat too too much. 2026-02-20 11:26:28 -05:00
Aleix Conchillo Flaqué
5708c81b93 Merge pull request #3782 from pipecat-ai/aleix/fix-mutable-default-args-aggregator-pair
Fix mutable default arguments in LLMContextAggregatorPair
2026-02-20 08:02:18 -08:00
Paul Kompfner
f4e9825c03 Remove self._voice_id from TTS Service implementations in favor of self._settings.voice 2026-02-20 10:52:57 -05:00
Mark Backman
82ce3ea8de Update 07c example to use DeepgramSageMakerTTSService 2026-02-20 08:10:41 -07:00
Mark Backman
62ada92188 Add changelog for PR #3785 2026-02-20 08:09:57 -07:00
Mark Backman
273692421f Add DeepgramSageMakerTTSService for Deepgram TTS on AWS SageMaker
Adds a TTS service that connects to Deepgram models deployed on AWS
SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the
Deepgram TTS protocol (Speak, Flush, Clear, Close) over the BiDi
client, with interruption handling and per-turn TTFB metrics.

Updates the example and env.example with separate STT/TTS endpoint names.
2026-02-20 08:08:00 -07:00
Paul Kompfner
5d8a5bf750 Add initialization of self._settings to service superclasses (STTService, TTSService, LLMService), using "generic" settings for those services (STTSettings, TTSSettings, LLMSettings) 2026-02-20 09:31:22 -05:00
Mark Backman
0a3e212f93 Merge pull request #3784 from pipecat-ai/mb/stt-sagemaker-finalize
Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService
2026-02-20 09:26:23 -05:00
Mark Backman
43d686c622 Add changelog entry for PR #3784 2026-02-20 07:17:36 -07:00
Mark Backman
4d136e1e28 Align DeepgramSageMakerSTTService finalize pattern with DeepgramSTTService 2026-02-20 07:15:38 -07:00
Aleix Conchillo Flaqué
2024285c75 Add changelog entries for PR #3782 2026-02-19 20:52:31 -08:00
Aleix Conchillo Flaqué
bc830c16f1 Fix mutable default arguments in LLMContextAggregatorPair
Replace mutable default parameter values with None and instantiate
inside the method body to avoid shared state across calls.
2026-02-19 20:52:00 -08:00
Paul Kompfner
fb27642190 Add self._settings to 6 remaining services
- AWSNovaSonicLLMService: new `AWSNovaSonicLLMSettings` with `voice_id` and `endpointing_sensitivity`; remove `self._params` entirely, storing audio I/O config as plain instance variables
- NeuphonicHttpTTSService: reuse `NeuphonicTTSSettings`; use inherited `language` field instead of bespoke `lang_code`
- NvidiaTTSService: new `NvidiaTTSSettings` with `quality`
- PiperTTSService / PiperHttpTTSService: new `PiperTTSSettings` / `PiperHttpTTSSettings` (no extra fields)
- SpeechmaticsTTSService: new `SpeechmaticsTTSSettings` with `max_retries`

Also remove redundant `lang_code` from `NeuphonicTTSSettings` (both WS and HTTP services now use the inherited `TTSSettings.language` field, with automatic enum conversion via the base class).

HTTP services (Neuphonic HTTP, Piper HTTP, Speechmatics) don't override `_update_settings` since the base class applies changes to `self._settings` and subsequent requests read from it automatically.
2026-02-19 18:35:59 -05:00
Paul Kompfner
463ea3725b Update Deepgram Flux with the new service settings pattern 2026-02-19 17:12:24 -05:00
Paul Kompfner
6c609031ee Add more 55-series examples
Also:
- remove unnecessary pass-through `_update_settings` implementation in `FalSTTService`
- warn that `AsyncAITTSService` doesn't currently support runtime settings updates
- update how `GradiumTTSService._update_settings` checks for voice changes
- remove a couple of unnecessary args (because they specified defaults) in other examples
2026-02-19 16:46:14 -05:00
filipi87
18630c9478 Adding changelog entry for RTVI observer ignored_sources feature. 2026-02-19 18:41:05 -03:00
filipi87
3a8d3cc841 Allowing to define the list of frame processors whose frames should be silently ignored by the RTVI observer. 2026-02-19 18:36:12 -03:00
Filipi da Silva Fuchter
2963c7589d Merge pull request #3774 from pipecat-ai/mb/broadcast-frames-rtvi-observer
Fix RTVIObserver missing upstream-only frames
2026-02-19 15:32:48 -05:00
filipi87
63caa403cb Improving RTVI doc description. 2026-02-19 17:31:25 -03:00
Paul Kompfner
ebb42a3c6d Fix forward reference crash in Google and Anthropic LLM ThinkingConfig
ThinkingConfig was defined as an inner class on the service but referenced in the Settings dataclass declared before the service class, causing a crash at import time. Move ThinkingConfig to a standalone class defined before Settings, and keep a class attribute alias for backward compatibility.
2026-02-19 15:06:48 -05:00
Paul Kompfner
cc54ff4708 Add more 55-series examples 2026-02-19 14:55:21 -05:00
Aleix Conchillo Flaqué
846cf0794d Merge pull request #3615 from omChauhanDev/fix/daily-transport-message-queue
fix(daily): queue outbound messages until transport joins
2026-02-19 11:55:11 -08:00
Aleix Conchillo Flaqué
498349c17e Merge pull request #3776 from pipecat-ai/aleix/stt-ttfb-metrics-refactor
Refactor STT TTFB metrics to use base class start/stop pattern
2026-02-19 11:46:46 -08:00
Aleix Conchillo Flaqué
474b27305f Merge pull request #3748 from pipecat-ai/mb/user-idle-configurable
Make UserIdleController always-on with dynamic timeout updates
2026-02-19 11:44:51 -08:00
Aleix Conchillo Flaqué
20509e8f96 Merge pull request #3744 from pipecat-ai/mb/user-idle-timeout-frame
Redesign UserIdleController to use BotStoppedSpeakingFrame
2026-02-19 11:34:42 -08:00
filipi87
5b2fa69bdc Renaming from broadcasted_sibling_id to broadcast_sibling_id 2026-02-19 16:24:07 -03:00
Aleix Conchillo Flaqué
4f8cacc769 Merge pull request #3747 from pipecat-ai/mb/update-comment-mute-strategy
Update comment in _maybe_mute_frame
2026-02-19 11:19:44 -08:00
Aleix Conchillo Flaqué
0145fb4ea0 Merge pull request #3763 from lukepayyapilli/fix/asyncgen-cleanup-uvloop-crash
Fix async generator cleanup to prevent uvloop crash on Python 3.12+
2026-02-19 11:14:00 -08:00
Aleix Conchillo Flaqué
8e52df7f03 Add changelog entries for PR #3776 2026-02-19 10:52:45 -08:00
Aleix Conchillo Flaqué
8ee99e37ff Merge pull request #3768 from tanmayc25/fix/tavus-sample-rate
fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport
2026-02-19 10:52:34 -08:00
Aleix Conchillo Flaqué
bae4211369 Update dependency lock file 2026-02-19 10:52:28 -08:00
Aleix Conchillo Flaqué
859cd7c920 Refactor STT TTFB metrics to use base class start/stop pattern
Eliminate custom _emit_stt_ttfb_metric and manual timestamp tracking in
STTService by reusing FrameProcessor's start_ttfb_metrics/stop_ttfb_metrics
with new start_time/end_time parameters. This keeps the chronological
start→stop ordering and removes _speech_end_time and _last_transcription_time
state from STTService.
2026-02-19 10:52:24 -08:00
filipi87
d608c400f9 Preventing the duplicated BotStartedSpeakingFrame and BotStoppedSpeakingFrame. 2026-02-19 15:49:22 -03:00
Aleix Conchillo Flaqué
94e93bed83 Merge pull request #3719 from pipecat-ai/aleix/sip-transfer-refer-frames
Add SIP transfer and SIP REFER frames to Daily transport
2026-02-19 10:09:13 -08:00
filipi87
b1cee140b9 Refactoring to use broadcasted_sibling_id instead of broadcasted field. 2026-02-19 15:06:50 -03:00
Aleix Conchillo Flaqué
352361bdd2 Update changelog skill to avoid line wrapping 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
baa61468a1 Add changelog entries for PR #3719 2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
7501ba2e45 Undeprecate DailyUpdateRemoteParticipantsFrame
Remove the deprecation warning and __post_init__ override. Also fix the
default value for remote_participants to use field(default_factory=dict)
instead of None.
2026-02-19 09:20:33 -08:00
Aleix Conchillo Flaqué
200716e8fe Add SIP transfer and SIP REFER frames to Daily transport
Add write_transport_frame() hook to BaseOutputTransport so subclasses
can handle custom frame types that flow through the audio queue. Add
DailySIPTransferFrame and DailySIPReferFrame as DataFrame subclasses
that queue with audio, ensuring SIP operations execute only after the
bot finishes its current utterance. Override write_transport_frame in
DailyOutputTransport to dispatch these frames to the existing
sip_call_transfer() and sip_refer() client methods.

Also switch DailyOutputTransport.send_message error handling from
logger.error to push_error for consistency.
2026-02-19 09:20:33 -08:00
Paul Kompfner
421696e1c2 Replace Any with specific types and add | _NotGiven to all *Settings field annotations across 49 service files
Every `*Settings` dataclass field whose default is `NOT_GIVEN` now carries `_NotGiven` in its type union so the type system accurately reflects the three-state semantics (real value, `None` where applicable, or not-yet-specified). Fields previously typed as bare `Any`, `str`, `float`, `bool`, `list`, `dict`, or `Optional[X]` are now narrowed to the specific type from the corresponding `InputParams` Pydantic model.
2026-02-19 11:28:29 -05:00
Mark Backman
50ef4909e3 Add changelog entries for PR #3774 2026-02-19 07:44:52 -07:00
Mark Backman
63df4642b5 Fix RTVIObserver missing upstream-only frames by adding broadcasted flag
RTVIObserver previously filtered out all upstream frames to avoid
duplicate messages from broadcasted frames. This caused upstream-only
frames to be silently ignored. Instead, add a `broadcasted` field to
the Frame base class that is set by broadcast_frame() and
broadcast_frame_instance(), and only skip upstream copies of
broadcasted frames.
2026-02-19 07:43:20 -07:00
Filipi da Silva Fuchter
43869a499d Merge pull request #3773 from pipecat-ai/mb/fix-ci-apt-get-update
Fix CI: add apt-get update before installing system packages
2026-02-19 09:28:25 -05:00
Mark Backman
d2bf3952ec Merge pull request #3772 from simliai/main
Update SimliClient to latest
2026-02-19 09:13:14 -05:00
Mark Backman
92c380ee77 Add apt-get update before installing system packages in CI
The CI was failing because the runner's package index was stale,
causing a 404 when fetching libasound2-dev (a dependency of
portaudio19-dev). Running apt-get update first refreshes the index.
2026-02-19 07:01:07 -07:00
antonyesk601
a55ba40921 fix: remove misimport 2026-02-19 10:41:17 +00:00
antonyesk601
fb1bfd03dd update SimliClient to latest 2026-02-19 10:35:50 +00:00
Paul Kompfner
a7edd8e441 Fix 55zp example 2026-02-18 17:15:22 -05:00
Paul Kompfner
2a07138abf Fix Grok Realtime dynamic session properties updating, and update corresponding 55zo example 2026-02-18 17:12:36 -05:00
Filipi da Silva Fuchter
a0a7b3101d Merge pull request #3765 from ianbbqzy/ian/inworld-default-async
[inworld] default timestamp transport strategy to ASYNC
2026-02-18 16:59:01 -05:00
Filipi da Silva Fuchter
39dc4ba99c Updated changelog/3765.changed.md 2026-02-18 16:58:27 -05:00
Paul Kompfner
ad942f6e4c Update 55zn example (UIltravox dynamic settings updates) to exercise changing modality, which is a setting that supports dynamic updates 2026-02-18 16:33:05 -05:00
Paul Kompfner
97d34ef9e1 Update OpenAI Realtime to warn when you try to update settings that can't be updated dynamically.
Update corresponding example to demonstrate updating output modality.
2026-02-18 16:16:06 -05:00
Paul Kompfner
c054780477 Fix 55zh example 2026-02-18 15:59:34 -05:00
Paul Kompfner
88a2dbdb82 Update 55zf example to update a setting that is supported by the default Camb TTS model 2026-02-18 15:48:50 -05:00
Paul Kompfner
d386a0efda Update Sarvam TTS to apply all changes to settings, not just voic 2026-02-18 15:31:08 -05:00
Paul Kompfner
b718a23c17 Tweak 55zd example 2026-02-18 15:25:50 -05:00
Paul Kompfner
e38f7d9451 Fix 55zc example 2026-02-18 15:23:23 -05:00
Paul Kompfner
b00d454842 Fix Inworld TTS settings updating 2026-02-18 15:19:57 -05:00
Paul Kompfner
0fa51811ea Fix 55z example 2026-02-18 15:11:04 -05:00
Paul Kompfner
323ee00b83 Fix 55w example 2026-02-18 14:51:48 -05:00
Paul Kompfner
0c73b77327 Update Lmnt TTS to support updating settings dynamically 2026-02-18 14:47:38 -05:00
Paul Kompfner
416e1cf877 Update Rime TTS services to store voice in the standard settings.voice field, as opposed to the nonstandard speaker field 2026-02-18 14:46:47 -05:00
Paul Kompfner
b4c5cb258b Tweak 55r example to make the settings update more pronounced 2026-02-18 14:15:14 -05:00
Paul Kompfner
728a97ade3 Update Deepgram TTS to support updating settings dynamically 2026-02-18 14:11:51 -05:00
Paul Kompfner
28677ec829 Tweak 55p example to make the settings update more pronounced 2026-02-18 13:49:32 -05:00
Paul Kompfner
17886d14e8 Fix ElevenLabsTTSService settings update code 2026-02-18 13:47:02 -05:00
Paul Kompfner
caf5dacbe8 Update 55j example to avoid console warning 2026-02-18 12:37:50 -05:00
Paul Kompfner
b8b531b66a In Cartesia TTS service, we don't need to override _update_settings. Parent class handling is enough, as new settings are picked up on the next run_tts (no need to reconnect). 2026-02-18 12:37:34 -05:00
Paul Kompfner
a14690e3a0 Fix the 55i example 2026-02-18 11:55:14 -05:00
Paul Kompfner
d913d954db Fix SpeechmaticsSTTService settings update code, and augment test file to better exercise it 2026-02-18 11:34:52 -05:00
Paul Kompfner
e98bb1df66 Simplify 55* examples: inline the settings update directly in the on_client_connected handler instead of wrapping it in a separate async task 2026-02-18 11:06:33 -05:00
Paul Kompfner
a7ada79fd9 Fix ElevenLabsRealtimeSTTService:
- Move `CommitStrategy` up in the file so it could be used by `ElevenLabsRealtimeSTTSettings`
- Fix a bug where `run_tts` would erroneously try to reconnect if a reconnection was already in flight (like a reconnection triggered by `_update_settings`)
2026-02-18 10:50:53 -05:00
Filipi da Silva Fuchter
a5b5a8e5cf Merge pull request #3759 from pipecat-ai/mb/gradium-context-update
Switch Gradium TTS to AudioContextWordTTSService for multiplexing
2026-02-18 10:16:57 -05:00
filipi87
1daea78b91 Fix GradiumTTSService to reuse context IDs across multiple run_tts calls and prevent the parent class from pushing text frames. 2026-02-18 12:12:49 -03:00
Paul Kompfner
7910f20e14 Update comment in Azure TTS explaining how we could support dynamic settings updates in the future 2026-02-18 10:07:33 -05:00
Paul Kompfner
d7d94a29f0 Add foundational examples (55) for runtime settings updates via *UpdateSettingsFrame
42 examples covering STT (13), TTS (21), LLM (4), and realtime (4) services. Each demonstrates updating service settings 10 seconds after client connects, verifying the typed settings machinery end-to-end for every provider.
2026-02-18 09:46:23 -05:00
Tanmay Chaudhari
6066eec853 Add changelog for PR #3768 2026-02-18 14:31:16 +05:30
Tanmay Chaudhari
cd379671aa fix: use audio.sample_rate instead of audio.audio_frames in TavusInputTransport 2026-02-18 14:18:16 +05:30
Ian Lee
8006223911 [inworld] default timestamp transport strategy to ASYNC 2026-02-17 15:13:20 -08:00
Paul Kompfner
ce51df677c Add backward-compat _aliases and from_mapping overrides to TTS settings
The migration from plain-dict `self._settings` to typed dataclasses renamed keys and flattened nested dicts. The deprecated dict-based `TTSUpdateSettingsFrame(settings={...})` code path calls `from_mapping`, which silently dropped old keys into `extra`.

- Add `_aliases` so renamed flat keys (e.g. `sample_rate` → `fish_sample_rate`, camelCase Inworld keys) resolve correctly.
- Override `from_mapping` to destructure nested dicts (`output_format`, `prosody`, `audioConfig`, `voice_setting`, `audio_setting`) into their flat field equivalents.
- Fix AsyncAI constructor bug passing `output_format={...}` dict instead of individual `output_container`/`output_encoding`/`output_sample_rate` fields.
2026-02-17 17:07:14 -05:00
Paul Kompfner
68ebd3d063 Migrate HumeTTSService to standard TTSSettings pattern and remove dead TTSService.update_setting
HumeTTSService now stores its params (description, speed, trailing_silence) in a proper `HumeTTSSettings` dataclass instead of a separate `_params` Pydantic model, making it work with `TTSUpdateSettingsFrame(update=...)`. The old `update_setting(key, value)` method is kept but deprecated.

Also removes the unused no-op `TTSService.update_setting` base method, which was never called by the `TTSUpdateSettingsFrame` pipeline.
2026-02-17 15:44:41 -05:00
Radek Sedlák
5ea2d47d39 feat: Add support for private endpoint in Azure STT 2026-02-17 21:42:00 +01:00
Paul Kompfner
94a651cee2 Remove dead ServiceSettings.to_dict method 2026-02-17 15:15:18 -05:00
Paul Kompfner
1cad4210ce Deprecate dict-based *UpdateSettingsFrame(settings={...}) code path in STT, TTS, and LLM services.
The dataclass-based API (`*UpdateSettingsFrame(update=*Settings(...))`) is the preferred path since 0.0.103. The dict path still works but now emits a `DeprecationWarning`.
2026-02-17 15:09:39 -05:00
Paul Kompfner
1cec8d119d Expand language field docstrings to clarify storage invariant.
The union type reflects the input side; after construction and `_update_settings`, the stored value is always a service-specific string.
2026-02-17 14:57:38 -05:00
Paul Kompfner
7dc16b1d92 Type language fields and centralize conversion in STT services.
Change `TTSSettings.language` and `STTSettings.language` from `Any` to `Language | str | _NotGiven`. Add `language_to_service_language` base method and centralized `isinstance`-guarded conversion in `STTService._update_settings` (mirroring TTS). Update the TTS guard from `is not None` to `isinstance(…, Language)` so raw strings pass through unchanged.

Remove now-redundant per-service language conversion from `_update_settings` overrides (ElevenLabs, Azure, Fal, Whisper). Add `language_to_service_language` to Azure STT so the centralized conversion picks it up. Fix AWS and NVIDIA STT `__init__` to convert language at construction time, then simplify their runtime accessors to read `_settings.language` directly.
2026-02-17 14:49:26 -05:00
Luke Payyapilli
247f0bbcd3 Fix async generator cleanup to prevent uvloop crash on Python 3.12+ 2026-02-17 13:10:31 -05:00
Paul Kompfner
d2372c127a Add specific type annotations to ServiceSettings fields, replacing Any with str, float, int unions as appropriate. 2026-02-17 11:56:37 -05:00
Paul Kompfner
3b1ba57452 Change apply_update / _update_settings return type from set[str] to dict[str, Any]. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns ("field" in changed, if changed, iteration) work unchanged; set-difference sites use changed.keys() - {...}. 2026-02-17 11:49:15 -05:00
Paul Kompfner
02c2778b8d Document _warn_unhandled_updated_settings pattern in COMMUNITY_INTEGRATIONS.md. 2026-02-17 11:08:26 -05:00
Paul Kompfner
fa6a6dabee Fix DeepgramSageMakerSTTService._update_settings live_options sync to match DeepgramSTTService pattern.
Add missing reverse sync (live_options → top-level model/language) and `set_model_name()` call.
2026-02-17 11:02:13 -05:00
Paul Kompfner
3a77b4c1d8 In services that don't handle runtime settings updates—or don't handle them for *all* available settings—log a warning about which fields specifically aren't handled. Revert new apply-settings-updates logic across various services, to reduce PR testing scope. This logic can be added service by service gradually as future work.
Note that for services that previously handled applying updates (through methods like `set_model` and `set_language`), we're keeping the update-applying logic (some or most of which is already well-tested) and expanding it to cover all relevant settings fields. Services under this bucket are:
- Deepgram STT
- Deepgram Sagemaker STT
- Elevenlabs STT
- Google STT
- Gradium STT
- OpenAI STT
- Speechmatics STT
2026-02-17 10:58:29 -05:00
Mark Backman
3537420d91 Merge pull request #3761 from speechmatics/fix/sdk-version 2026-02-17 08:02:00 -05:00
Sam Sykes
65fb88e61e chore: update version specifier for speechmatics-voice
Change the version specifier from `>=0.2.8` to
`~=0.2.8` for the `speechmatics-voice` package.
This ensures compatibility with future patch
versions while preventing potential breaking
changes from minor updates.
2026-02-17 09:58:17 +00:00
Sam Sykes
b345f48ac1 fix: update dependency specifier for speechmatics-voice
Change the version specifier from >=0.2.8 to ~=0.2.8 for the
speechmatics-voice package to ensure compatibility with future
patch versions.
2026-02-17 09:55:43 +00:00
Mark Backman
f181e12d8f Add changelog for PR #3759 2026-02-16 11:35:45 -07:00
Mark Backman
36de6003d0 Switch Gradium TTS to AudioContextWordTTSService for multiplexing
Use client_req_id-based multiplexing instead of disconnecting and
reconnecting the websocket on every interruption. This follows the
same pattern used by Cartesia, ElevenLabs, and other services via
AudioContextWordTTSService.

Key changes:
- Base class: InterruptibleWordTTSService -> AudioContextWordTTSService
- Add close_ws_on_eos: False to setup message to keep connection alive
- Add client_req_id to text, end_of_stream messages for demultiplexing
- Route audio via append_to_audio_context() instead of push_frame()
- Silently drop messages for cancelled/unknown contexts on interruption
- Add _handle_interruption() that resets context without reconnecting
- Remove no-op push_frame() override
2026-02-16 11:34:16 -07:00
Mark Backman
dba4de77bf Merge pull request #3684 from ai-coustics/goedev/aic-model-caching
AIC model caching
2026-02-16 10:43:14 -05:00
Mark Backman
507765625f Make UserIdleController always-on with dynamic timeout updates
Always create UserIdleController (timeout=0 means disabled), removing
all Optional guards. Add UserIdleTimeoutUpdateFrame to allow changing
the idle timeout at runtime.
2026-02-14 09:54:30 -05:00
Mark Backman
8f5e5e8e7c Update comment in _maybe_mute_frame 2026-02-14 09:41:42 -05:00
Mark Backman
c682a44bb6 Merge pull request #3738 from lukepayyapilli/fix/mute-events-before-start-frame
Fix LLMUserAggregator broadcasting mute events before StartFrame
2026-02-14 09:40:40 -05:00
Mark Backman
cb7023681f Add changelog for PR #3744 2026-02-14 08:57:46 -05:00
Mark Backman
012ef41ff4 Redesign UserIdleController to use BotStoppedSpeakingFrame
Replace the continuous heartbeat-based timer (UserSpeakingFrame/BotSpeakingFrame
+ asyncio.Event loop) with a simple one-shot timer that starts when
BotStoppedSpeakingFrame is received and cancels on UserStartedSpeakingFrame or
BotStartedSpeakingFrame. This eliminates false idle triggers caused by gaps
between the user finishing speaking and the bot starting to speak (LLM/TTS
latency).

Guard the timer start with two conditions to prevent false triggers:
- User turn in progress: during interruptions, BotStoppedSpeaking arrives
  while the user is still speaking mid-turn.
- Function calls in progress: FunctionCallsStarted arrives before
  BotStoppedSpeaking because the bot speaks concurrently with the function
  call starting, so the timer must wait for the result and subsequent bot
  response.
2026-02-14 08:55:56 -05:00
Paul Kompfner
66b7b4a5d4 Update COMMUNITY_INTEGRATIONS.md for the new dataclass-based service settings pattern. 2026-02-13 16:04:49 -05:00
Filipi da Silva Fuchter
f6bb5fa124 Merge pull request #3741 from pipecat-ai/filipi/update_prebuilt
Using the latest version of pipecat-ai-small-webrtc-prebuilt.
2026-02-13 15:31:48 -05:00
Paul Kompfner
b08548af9d Remove typed-settings migration scaffolding and rename _update_settings_from_typed to _update_settings.
Now that all services use typed `ServiceSettings` objects, this removes the interim scaffolding that supported both dict-based and typed settings paths in parallel. Specifically: removes old dict-based `_update_settings(settings: Mapping)` methods from base classes, removes `isinstance(self._settings, ServiceSettings)` guards, simplifies `process_frame` branching, and renames `_update_settings_from_typed` to `_update_settings` across all ~30 service implementations. Also renames the no-arg `_update_settings()` helper on realtime services to `_send_session_update()` to avoid collision, adds `from_mapping` overrides on `GoogleLLMSettings` and `AnthropicLLMSettings` for ThinkingConfig dict-to-object conversion, and replaces a broken no-arg `_update_settings()` call in Gemini Live with a TODO.
2026-02-13 15:12:26 -05:00
Paul Kompfner
ab92a0e1d7 Remove/deprecate service-specific set_model and set_voice overrides.
- NvidiaSTTService.set_model: convert to proper DeprecationWarning (model can't change at runtime for Riva streaming STT)
- NvidiaTTSService.set_model: same treatment for Riva TTS
- NvidiaSegmentedSTTService.set_model: remove — base class now routes through _update_settings_from_typed which re-creates the recognition config
- GeminiTTSService.set_voice: remove — move AVAILABLE_VOICES validation into _update_settings_from_typed so it fires on both legacy and new paths
2026-02-13 15:12:26 -05:00
Paul Kompfner
e37f2f99c4 Deprecate set_model, set_voice, and set_language in favor of *UpdateSettingsFrame. 2026-02-13 15:12:26 -05:00
Paul Kompfner
e43351f5f8 Add class-level _settings type annotations to all service classes for better editor support.
Standardize all STT, TTS, and LLM service classes to declare `_settings` with the narrowed Settings type as a class-level annotation. This gives editors and type checkers the specific type when hovering or autocompleting on `self._settings` in each service and its subclasses. Inline `self._settings: Type = ...` assignments are replaced with plain `self._settings = ...`.
2026-02-13 15:12:26 -05:00
Paul Kompfner
444cbb6499 Add turn-completion fields to LLMSettings and handle them in the typed-service-settings path.
`filter_incomplete_user_turns` and `user_turn_completion_config` were only handled in the legacy dict-based `_update_settings` code path. This adds them to `LLMSettings` and introduces `LLMService._update_settings_from_typed` so the typed path handles them too.
2026-02-13 15:12:26 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `SpeechmaticsSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00
filipi87
2489c76bc6 Using the latest version of pipecat-ai-small-webrtc-prebuilt. 2026-02-13 16:43:25 -03:00
Mark Backman
73cb96bf66 Merge pull request #3739 from pipecat-ai/mb/docs-skill
Add /update-docs Claude Code skill
2026-02-13 13:26:06 -05:00
Mark Backman
79ec61d1d8 Merge pull request #3642 from pipecat-ai/cb/rime-arcana-v3
Update RimeTTSService for arcana and mistv2 model support
2026-02-13 13:25:27 -05:00
Mark Backman
ca440594fe Merge pull request #3720 from pipecat-ai/mb/fix-grok-realtime
Fix Grok Realtime voice type validation for server responses
2026-02-13 13:24:53 -05:00
Mark Backman
6c25dd4aa2 Merge pull request #3736 from pipecat-ai/mb/improve-events-docstrings
Improve events docstrings
2026-02-13 13:24:15 -05:00
Mark Backman
09bb6bb03b Merge pull request #3735 from pipecat-ai/mb/fix-llm-tracing-error-handilng
Fix double execution of service functions on tracing errors
2026-02-13 13:23:55 -05:00
Mark Backman
746fdfbfef Merge pull request #3728 from pipecat-ai/mb/upgrade-pillow
Bump Pillow upper bound from <12 to <13
2026-02-13 13:23:41 -05:00
Mark Backman
f7af9f1efd Broaden /update-docs scope to detect missing doc sections 2026-02-13 13:14:45 -05:00
Mark Backman
a5f95acaf5 Add changelog for PR #3735 2026-02-13 13:08:03 -05:00
Mark Backman
e50b138ab2 Fix double execution of service functions when tracing errors occur
The outer try/except in each service decorator caught both tracing
setup errors and application errors from the wrapped function. If the
function itself raised (e.g. LLM rate limit, TTS timeout), the
exception was caught and the function was called a second time.

Fix by tracking whether the original function was called via a
fn_called flag. If the function was already called, re-raise the
exception instead of falling back to untraced re-execution.
2026-02-13 13:08:03 -05:00
Mark Backman
3640c7a2dd Merge pull request #3733 from pipecat-ai/mb/fix-traceable-init
Deprecate unused class_decorators tracing module and fix stale comments
2026-02-13 13:04:34 -05:00
Mark Backman
2454bedf29 Add /update-docs skill for keeping docs in sync with source changes
Adds a Claude Code skill that analyzes the current branch diff against
main, maps changed source files to their doc pages, and makes targeted
updates to Configuration, InputParams, Usage, Notes, and Event Handlers
sections.
2026-02-13 12:52:23 -05:00
Luke Payyapilli
3adb2f50a6 Fix LLMUserAggregator broadcasting mute events before StartFrame 2026-02-13 11:59:56 -05:00
Mark Backman
01b7a93e08 Deprecate unused Traceable/class_decorators module and fix stale comments
The class_decorators.py module (Traceable, @traceable, @traced) is not
used anywhere in the codebase. Mark it deprecated and fix the misleading
comment in service_decorators.py that referenced it as if it were active.
2026-02-13 11:25:40 -05:00
Mark Backman
347eaf582d Merge pull request #3721 from pipecat-ai/fix/pipeline-scoped-tracing-context
Replace singleton context providers with pipeline-scoped TracingContext
2026-02-13 11:24:37 -05:00
Mark Backman
25ca296477 Move tracing fields to AIService and extract _get_turn_context helper
Consolidate _tracing_enabled and _tracing_context from LLMService,
STTService, and TTSService into the shared AIService base class.
Extract _get_turn_context() helper in service_decorators.py to
encapsulate the repeated pattern across all traced decorators.
2026-02-13 11:21:24 -05:00
Mark Backman
3fce88555f Improve events docstrings 2026-02-13 09:39:44 -05:00
Mark Backman
9e6f27c9f1 Merge pull request #3625 from ianbbqzy/ian/inworld-async-timestamp
[inworld] Allow Async delivery of timestamps info
2026-02-12 21:20:22 -05:00
Ian Lee
94f01af545 [inworld] Allow Async delivery of timestamps info
* speed up first audio chunk latency
2026-02-12 17:48:58 -08:00
Filipi da Silva Fuchter
432870cc36 Merge pull request #3729 from pipecat-ai/filipi/elevenlabs_issue
TTS services fixes.
2026-02-12 16:31:46 -05:00
Filipi da Silva Fuchter
e065907745 Merge pull request #3718 from pipecat-ai/filipi/bot_started_speaking
Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking.
2026-02-12 16:31:14 -05:00
Mark Backman
b7a5ca3d1e Merge pull request #3730 from pipecat-ai/mb/stt-keepalive
Move STT keepalive from WebsocketSTTService to STTService base class
2026-02-12 15:37:23 -05:00
filipi87
9569625f03 Changelog entries for the TTS fixes. 2026-02-12 16:11:02 -03:00
Mark Backman
18afe37bd1 Add changelog entries for PR #3642 2026-02-12 14:09:24 -05:00
Mark Backman
2b9777b812 Update RimeTTSService InputParams for arcana and mistv2 model support
Add model-specific params (arcana: repetition_penalty, temperature, top_p;
mistv2: no_text_normalization, save_oovs, segment) with dynamic query param
building via _build_settings(). Model/voice/param changes now trigger
WebSocket reconnection since all settings are URL query params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 14:01:41 -05:00
filipi87
8866ab1585 Fixing RimeTTSService to reuse the same context when needed. 2026-02-12 15:53:38 -03:00
filipi87
f0995164d9 Fixing PlayHTTTSService to reuse the same context when needed. 2026-02-12 15:50:18 -03:00
filipi87
136732afae Fixing InworldTTSService to reuse the same context when needed. 2026-02-12 15:46:59 -03:00
filipi87
3410eb82b3 Fixing CartesiaTTSService to reuse the same context when needed. 2026-02-12 15:26:49 -03:00
Chad Bailey
794811fbdb Updated WSS endpoint for Rime Arcana v3 support 2026-02-12 13:24:29 -05:00
filipi87
abea22ec57 Fixing AsyncAITTSService to reuse the same context when needed. 2026-02-12 15:17:47 -03:00
Mark Backman
08beb0264a Add changelog entries for PR #3730 2026-02-12 13:14:11 -05:00
Mark Backman
2e15b4842c Move STT keepalive mechanism from WebsocketSTTService to STTService base class
This allows non-websocket STT services (like SarvamSTTService, which uses
the Sarvam Python SDK for connection management) to reuse the same keepalive
pattern. Subclasses override _send_keepalive() and _is_keepalive_ready() for
their specific protocol.
2026-02-12 11:09:39 -05:00
filipi87
6d95a2425c Fixing ElevenLabs TTS word timestamp interleaving across sentences. 2026-02-12 12:54:47 -03:00
Mark Backman
4667a3d66d Add changelog for #3728 2026-02-12 09:42:23 -05:00
Mark Backman
0bf2477d2c Bump Pillow upper bound from <12 to <13 2026-02-12 09:41:18 -05:00
Mark Backman
71a752c971 Add tests for TracingContext and TurnTraceObserver
Cover pipeline-scoped tracing context lifecycle, span hierarchy,
conversation/turn context management, and concurrent pipeline isolation.
2026-02-11 23:27:35 -05:00
Mark Backman
358f237507 Replace singleton context providers with pipeline-scoped TracingContext
ConversationContextProvider and TurnContextProvider were singletons that
stored tracing context as class-level state. When two PipelineTask instances
ran concurrently, they would overwrite each other's context, causing service
spans to attach to the wrong pipeline's turn span.

Replace both singletons with a single TracingContext object owned by each
PipelineTask, threaded to services via StartFrame.
2026-02-11 21:58:10 -05:00
Mark Backman
d99a256715 Merge pull request #3706 from ianbbqzy/ian/inworld-user-agent
[Inworld] add User-Agent and X-Request-Id for better traceability
2026-02-11 19:38:26 -05:00
Ian Lee
dcbcab1542 [Inworld] add User-Agent and X-Request-Id for better traceability 2026-02-11 15:47:20 -08:00
Mark Backman
a966947220 Add changelog for #3720 2026-02-11 18:04:58 -05:00
Mark Backman
16b060d9e9 Fix Grok Realtime voice type validation for server responses
The Grok API now returns prefixed voice names (e.g. "human_Ara") in
session.updated events, causing Pydantic validation errors. Widen the
voice field type from GrokVoice to GrokVoice | str to accept both
user-facing names and server-returned values.
2026-02-11 18:04:20 -05:00
filipi87
ed7fde324e Adding changelog entry for the RTVIObserver fix. 2026-02-11 16:23:42 -03:00
filipi87
beb4e86b5f Fixing an issue in RTVI where we were sometimes receiving bot output messages before the bot started speaking. 2026-02-11 16:17:28 -03:00
Aleix Conchillo Flaqué
e75ccd9c2f Merge pull request #3717 from pipecat-ai/aleix/update-claude-md-pr-instructions
Add /pr-submit skill and clean up CLAUDE.md
2026-02-11 10:40:20 -08:00
Aleix Conchillo Flaqué
a80919ceff Move PR submission instructions from CLAUDE.md to /pr-submit skill
Extract the procedural PR workflow into an actionable skill that can be
invoked with /pr-submit. CLAUDE.md is better suited for project context
and conventions, not step-by-step procedures.
2026-02-11 09:57:42 -08:00
Aleix Conchillo Flaqué
1fe4538982 Update PR submission instructions in CLAUDE.md
Expand the Pull Requests section with detailed step-by-step instructions
including branch naming, commit guidance, changelog generation, and PR
description updates.
2026-02-11 09:51:10 -08:00
Filipi da Silva Fuchter
9a48d93bd2 Merge pull request #3713 from pipecat-ai/filipi/smallwebrtc_8khz
Fixing smallwebrtc transport input audio resampling logic.
2026-02-11 11:58:32 -05:00
filipi87
0c3e59ed61 Adding changelog entry for the SmallWebRTCTransport fix. 2026-02-11 13:07:52 -03:00
filipi87
ec2b38dc29 Fixing smallwebrtc transport input audio resampling logic. 2026-02-11 13:01:25 -03:00
Gökmen Görgen
2036757b84 add unit tests for AICModelManager and AICFilter error handling, model loading, and processor behavior 2026-02-11 15:22:37 +01:00
Mark Backman
0574167fbd Merge pull request #3709 from pipecat-ai/mb/fix-quickstart-pcc-deploy
Fix quickstart pcc-deploy.toml
2026-02-10 22:19:37 -05:00
Mark Backman
972ad93e18 Fix quickstart pcc-deploy.toml 2026-02-10 22:17:09 -05:00
Mark Backman
ac53594967 Merge pull request #3708 from pipecat-ai/mb/fix-quickstart-pyproject
Fix quickstart pyproject.toml
2026-02-10 22:09:49 -05:00
Mark Backman
b063d9d43b Fix quickstart pyproject.toml 2026-02-10 22:06:38 -05:00
Mark Backman
48e93beadf Merge pull request #3705 from pipecat-ai/mb/quickstart-0.0.102
Update quickstart for 0.0.102
2026-02-10 21:57:33 -05:00
Aleix Conchillo Flaqué
640940a41a Merge pull request #3704 from pipecat-ai/changelog-0.0.102
Release 0.0.102 - Changelog Update
2026-02-10 18:31:30 -08:00
aconchillo
f1e2001a4e Update changelog for version 0.0.102 2026-02-10 18:28:21 -08:00
Aleix Conchillo Flaqué
12dc6c0b9e Merge pull request #3707 from pipecat-ai/aleix/fix-openai-stream-close-compat
fix(openai): use compatible stream closing for non-OpenAI providers
2026-02-10 18:26:18 -08:00
Aleix Conchillo Flaqué
93f4402198 Update stream close test to match new _closing helper 2026-02-10 18:19:57 -08:00
Aleix Conchillo Flaqué
f3eb5b30a0 Add changelog for #3707 2026-02-10 18:01:29 -08:00
Aleix Conchillo Flaqué
18aad05a7c fix(openai): use compatible stream closing for non-OpenAI providers
OpenAI's AsyncStream uses close() while async generators (e.g. from
OpenPipe) use aclose(). Replace direct async-with on the stream with a
helper that handles both protocols.
2026-02-10 17:59:21 -08:00
Mark Backman
883b24f577 Update quickstart for 0.0.102 2026-02-10 18:14:04 -05:00
Mark Backman
17ab9c425f Merge pull request #3675 from pipecat-ai/mb/elevenlabs-realtime-send-silence
Add silence-based keepalive to WebsocketSTTService
2026-02-10 18:03:38 -05:00
Mark Backman
2f5e61ac55 Add silence-based keepalive to WebsocketSTTService
Adds opt-in keepalive_timeout and keepalive_interval params to
WebsocketSTTService. When enabled, a background task sends silent audio
(or a service-specific protocol message) when the connection has been
idle, preventing server-side timeout disconnects.

Subclasses override _send_keepalive(silence) to wrap the silence in
their wire format. The default sends raw PCM bytes.

Enables keepalive for ElevenLabs (10s), Gladia (20s), and Soniox (1s),
replacing their per-service custom keepalive tasks.
2026-02-10 17:58:47 -05:00
Aleix Conchillo Flaqué
1128c5b7fb Merge pull request #3702 from pipecat-ai/aleix/add-missing-local-smartturn-dependency
pyproject: add local smartturn as a default dependency
2026-02-10 14:34:43 -08:00
Aleix Conchillo Flaqué
a9a5edd8ca pyproject: add local smartturn as a default dependency 2026-02-10 14:32:32 -08:00
Filipi da Silva Fuchter
a98c884e31 Merge pull request #3621 from pipecat-ai/filipi/context_compressure
Context summarization feature implementation
2026-02-10 17:04:47 -05:00
filipi87
2475697955 Changelog entries for context summarization 2026-02-10 18:59:12 -03:00
filipi87
ba242d4875 Context summarization example with Google 2026-02-10 18:59:03 -03:00
filipi87
5deb80932b Context summarization example with OpenAI 2026-02-10 18:58:55 -03:00
filipi87
4a00e6829f Automated tests for the context summarizer. 2026-02-10 18:58:44 -03:00
filipi87
9d89afa7d4 Automated tests for the context summarization feature. 2026-02-10 18:58:33 -03:00
filipi87
92b6ecd945 New Claude skill to help refactor and cleanup the code. 2026-02-10 18:58:22 -03:00
filipi87
314d074c61 Context summarization feature implementation. 2026-02-10 18:58:12 -03:00
Filipi da Silva Fuchter
9c627e7292 Merge pull request #3653 from pipecat-ai/filipi/heygen_lite
HeyGen improvements.
2026-02-10 12:12:22 -05:00
Filipi da Silva Fuchter
ad179b0852 Merge pull request #3584 from pipecat-ai/filipi/speak_frame
TTS services improvements.
2026-02-10 12:11:47 -05:00
filipi87
5128089d42 Add changelog entries for PR #3653. 2026-02-10 14:02:32 -03:00
filipi87
87a79df048 Updating the heygen examples to use sandbox by default. 2026-02-10 14:02:20 -03:00
filipi87
24f90715e3 Use LITE as the default mode, and add support for video_settings and is_sandbox in LiveAvatarNewSessionRequest. 2026-02-10 14:02:09 -03:00
filipi87
e00b98343e Changelog entries for TTS context tracking 2026-02-10 11:37:21 -03:00
filipi87
ad1bec4583 Updated openai example to use on_tts_request and append_to_text. 2026-02-10 11:28:35 -03:00
filipi87
a47d7f98ee Refactored all 30+ TTS service implementations to support context tracking 2026-02-10 11:28:08 -03:00
filipi87
19cd242261 Added TTS context tracking system to trace audio generation through the pipeline. 2026-02-10 11:27:58 -03:00
filipi87
9bb712a47b Simplified universal context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:30 -03:00
filipi87
1dccbe7c0b Simplified context aggregators, _handle_text() to only check frame.append_to_context instead of also checking self._started 2026-02-10 11:27:13 -03:00
Mark Backman
2dd3e2f1e7 Merge pull request #3697 from pipecat-ai/mb/soniox-rt-4
Update SonioxSTTService default model to stt-rt-v4
2026-02-10 09:24:39 -05:00
filipi87
f206aaa28d - Added context_id field to all TTS-related frames (TTSAudioRawFrame, TTSStartedFrame, TTSStoppedFrame, AggregatedTextFrame, TTSTextFrame)
- Added append_to_context parameter to TTSSpeakFrame for conditional LLM context addition
2026-02-10 11:22:26 -03:00
Mark Backman
60e42f5690 Merge pull request #3701 from pipecat-ai/mb/changelog-3700 2026-02-10 09:19:42 -05:00
Mark Backman
88e981c013 Set vad_force_turn_endpoint to False in SonioxSTTService 2026-02-10 09:16:03 -05:00
Mark Backman
7bd8dfe898 Add changelog for PR 3700 2026-02-10 08:20:03 -05:00
Mark Backman
83039a1a35 Merge pull request #3700 from ashotbagh/chore/async-migration
chore: update Async API URL and default model
2026-02-10 08:17:04 -05:00
Ashot
28e8b61eb4 chore: update Async API URL and default model 2026-02-10 15:23:51 +04:00
Mark Backman
d47d95e1f0 Update SonioxSTTService default model to stt-rt-v4 2026-02-09 23:48:08 -05:00
Mark Backman
79b9d929c5 Merge pull request #3682 from eoinoreilly30/patch-1
Add new voice options 'marin' and 'cedar'
2026-02-09 23:47:39 -05:00
Eoin
dfc0856d54 Added changelog entry 2026-02-10 12:31:26 +09:00
Eoin
f3c1cd4cd6 Lint 2026-02-10 12:31:26 +09:00
Eoin
18d91d6df3 Add new voice options 'marin' and 'cedar' 2026-02-10 12:31:26 +09:00
Mark Backman
688f502488 Merge pull request #3644 from pipecat-ai/mb/update-assembly-ai-default-config
AssemblyAISTTService: Disable turn detection when setting vad_force_t…
2026-02-09 22:27:44 -05:00
Mark Backman
7684a94c33 AssemblyAISTTService: Disable turn detection when setting vad_force_turn_endpoint to True 2026-02-09 22:20:35 -05:00
Aleix Conchillo Flaqué
e27f4bccfb Merge pull request #3695 from pipecat-ai/aleix/more-claude-updates
CLAUDE.md: add pipeline task and pipeline runner
2026-02-09 18:14:30 -08:00
Mark Backman
fa8b0aeda8 Merge pull request #3690 from pipecat-ai/mb/add-claude-settings
Add shared Claude Code settings
2026-02-09 19:22:28 -05:00
Aleix Conchillo Flaqué
946f0f4e77 CLAUDE.md: add pipeline task and pipeline runner 2026-02-09 16:19:11 -08:00
Mark Backman
b9cf3f3225 Merge pull request #3694 from pipecat-ai/mb/claude-updates
Add observers, error handling, task management, and testing to CLAUDE.md
2026-02-09 19:05:49 -05:00
Aleix Conchillo Flaqué
d32c4b2f5f Merge pull request #3693 from pipecat-ai/aleix/update-examples-remove-default-turn-analyzer
remove the now default turn analyzer from examples
2026-02-09 16:04:19 -08:00
Mark Backman
77a5d16a10 Merge pull request #3692 from pipecat-ai/mb/request-metadata-updates
Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
2026-02-09 18:19:29 -05:00
Mark Backman
ca224834b2 Add observers, error handling, task management, and testing to CLAUDE.md 2026-02-09 18:12:24 -05:00
Aleix Conchillo Flaqué
3867bc6302 LLMUserAggregator: update turn analyzer warning 2026-02-09 14:33:38 -08:00
Aleix Conchillo Flaqué
83a8379401 examples: remove the now default turn analyzer user turn stop strategy 2026-02-09 14:33:38 -08:00
mattie ruth backman
f2688deb0d Update args field in RTVILLMFunctionCallInProgressMessageData to match API of existing RTVILLMFunctionCallResultData. 2026-02-09 17:17:01 -05:00
Mark Backman
981253c703 Rename RequestMetadataFrame to ServiceSwitcherRequestMetadataFrame with service targeting
Add a `service` field so the frame targets a specific service, allowing
ServiceSwitcher.push_frame to consume it only when the targeted service
matches the active service. STTService and test mocks now push the frame
downstream after handling instead of silently consuming it.
2026-02-09 16:48:34 -05:00
Mark Backman
aa6c9797ca Merge pull request #3671 from pipecat-ai/mb/sarvam-cleanup
Clean up on Sarvam STT and TTS classes
2026-02-09 15:58:34 -05:00
Mark Backman
6305e04569 Clean up on Sarvam STT and TTS classes 2026-02-09 15:53:05 -05:00
Mark Backman
3ff9b7b5ad Merge pull request #3687 from pipecat-ai/mb/rtvi-mute-events
Emit RTVI events for user mute/unmute
2026-02-09 15:18:28 -05:00
Mark Backman
cc797ba3cf Add shared Claude Code settings to disable commit attribution 2026-02-09 15:15:31 -05:00
Aleix Conchillo Flaqué
91c8122c17 Merge pull request #3689 from pipecat-ai/aleix/default-smart-turn-stop-strategy
Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
2026-02-09 12:07:16 -08:00
Aleix Conchillo Flaqué
944ac92593 Fix test_langchain to use explicit stop strategy
The default stop strategy changed to TurnAnalyzerUserTurnStopStrategy,
which requires actual audio analysis. Use SpeechTimeoutUserTurnStopStrategy
explicitly since this test is not testing turn detection.
2026-02-09 12:00:41 -08:00
Aleix Conchillo Flaqué
ca0d2e68c3 Add changelog for #3689 2026-02-09 11:58:09 -08:00
Aleix Conchillo Flaqué
631463e573 Use TurnAnalyzerUserTurnStopStrategy as default stop strategy
Change the default user turn stop strategy from
TranscriptionUserTurnStopStrategy to TurnAnalyzerUserTurnStopStrategy
with LocalSmartTurnAnalyzerV3. Also reduce AUDIO_INPUT_TIMEOUT_SECS
from 1.0 to 0.5 and remove its debug log.
2026-02-09 11:58:09 -08:00
Mark Backman
6a553367a2 Merge pull request #3676 from pipecat-ai/mb/code-review-skill
Add Claude code-review skill
2026-02-09 14:48:20 -05:00
Mark Backman
00ec6c77ea Emit RTVI events for user mute/unmute state changes
Add UserMuteStartedFrame/UserMuteStoppedFrame and corresponding RTVI
messages so clients can observe when mute strategies activate/deactivate.
2026-02-09 14:44:32 -05:00
Mark Backman
ee6520db30 Merge pull request #3637 from pipecat-ai/mb/improve-user-stop-turn
Improve user turn stop timing by triggering timeout from VAD stop, push STT metadata to user aggregator
2026-02-09 14:43:22 -05:00
Aleix Conchillo Flaqué
2a572aedba Simplify ServiceSwitcher with closure-based filters
- Make ServiceSwitcherStrategy inherit from BaseObject with properties
  for services and active_service, and move initial service selection
  into the base class
- Add on_service_switched event to ServiceSwitcherStrategy
- handle_frame now returns the switched-to service (or None), allowing
  ServiceSwitcher to swallow ManuallySwitchServiceFrame on switch and
  request metadata from the new active service
- Override push_frame to suppress RequestMetadataFrame and
  ServiceMetadataFrame from inactive services
- Remove ServiceSwitcherFilter and ServiceSwitcherFilterFrame in favor
  of plain FunctionFilter instances with closures that check the
  strategy's active service directly
- FunctionFilter: add FilterType alias
- FunctionFilter: when direction is None, frames in both directions
  are filtered instead of just one
- Add docstrings to ServiceSwitcher and its components
2026-02-09 14:12:33 -05:00
Mark Backman
5e66702cf5 Improved the accuracy of the UserBotLatencyObserver and UserBotLatencyLogObserver 2026-02-09 14:12:33 -05:00
Mark Backman
34b068d657 Improve user turn stop timing by triggering timeout from VAD stop
Refactor TranscriptionUserTurnStopStrategy and TurnAnalyzerUserTurnStopStrategy
to use VADUserStoppedSpeakingFrame as the ground truth for when speech ended,
rather than triggering timeouts from transcription frames.
2026-02-09 14:12:33 -05:00
Mark Backman
05e2a013b3 Merge pull request #3672 from pipecat-ai/mb/rtvi-duplicate-events
Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
2026-02-09 12:58:28 -05:00
Mark Backman
5f64dae0cf Filter RTVIObserver to downstream frames only and broadcast FunctionCallCancelFrame
RTVIObserver now skips upstream frames to prevent duplicate RTVI messages
when frames are broadcast in both directions. Also changed
FunctionCallCancelFrame to use broadcast_frame for consistency with
other function call frames.
2026-02-09 12:39:25 -05:00
Mark Backman
1bf8b54502 Merge pull request #3683 from dhruvladia-sarvam/sarvam-v3-update 2026-02-09 06:49:59 -05:00
Gökmen Görgen
ed3ec045aa add changelog file. 2026-02-09 12:04:09 +01:00
Gökmen Görgen
67d39a97f7 AIC model caching. 2026-02-09 11:51:28 +01:00
dhruvladia-sarvam
947ff03c9f v3 addition 2026-02-09 13:04:45 +05:30
Om Chauhan
a4e187e138 replace background task with flush-on-join 2026-02-09 06:04:08 +05:30
Om Chauhan
9f380170d7 added changelog 2026-02-09 05:37:43 +05:30
Om Chauhan
12f27f9cda fix(daily): queue outbound messages until transport joins 2026-02-09 05:37:43 +05:30
Mark Backman
104d06551a Merge pull request #3679 from pipecat-ai/mb/remove-to-be-updated
Remove SequentialMergePipeline
2026-02-08 15:28:38 -05:00
Mark Backman
90ad2a4e81 Remove SequentialMergePipeline 2026-02-08 14:44:48 -05:00
Mark Backman
3494a94cac Add Claude code-review skill 2026-02-08 11:06:48 -05:00
Mark Backman
570f2d7fc0 Merge pull request #3667 from ianbbqzy/ian/fix-auto-mode-space
[inworld] aggregate_sentence mode needs trailing space
2026-02-07 18:22:32 -05:00
Ian Lee
f3d99adf8f [inworld] aggregate_sentence mode needs trailing space 2026-02-07 15:18:24 -08:00
Mark Backman
d34f416281 Merge pull request #3598 from dhruvladia-sarvam/sarvam-v3-update
ASR and TTS v3 update
2026-02-07 10:51:35 -05:00
Mark Backman
5a1deb7cb4 Merge pull request #3659 from pipecat-ai/mb/change-vad-defaults
Set VADParams stop_secs to 0.2 by default
2026-02-06 23:51:50 -05:00
Mark Backman
a5fc2b1650 Set VADParams stop_secs to 0.2 by default 2026-02-06 23:49:08 -05:00
Aleix Conchillo Flaqué
5cb8d91431 added changelog file for #3616 2026-02-06 16:45:23 -08:00
Aleix Conchillo Flaqué
ce690848c0 Merge pull request #3616 from omChauhanDev/fix/function-call-timeout-task-cleanup
fix: ensure function call timeout task is always cancelled
2026-02-06 16:40:56 -08:00
Aleix Conchillo Flaqué
30f51edfcd Merge pull request #3668 from pipecat-ai/aleix/parallel-pipeline-buffering
Buffer internal frames during ParallelPipeline lifecycle sync
2026-02-06 15:25:32 -08:00
Aleix Conchillo Flaqué
cd03d449cb Update changelog skill with skip rules and allowed types 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
57df03aade Update CLAUDE.md with PR workflow instructions 2026-02-06 15:23:14 -08:00
Aleix Conchillo Flaqué
4945cfbd8f Buffer internal frames during ParallelPipeline lifecycle synchronization
Processors inside parallel sub-pipelines can push frames during
StartFrame/EndFrame/CancelFrame processing. Previously these frames
could escape the ParallelPipeline before all branches finished
processing the lifecycle frame. Now they are buffered and flushed
after synchronization completes.
2026-02-06 15:15:46 -08:00
Mark Backman
8d37d3bae7 Merge pull request #3666 from pipecat-ai/mb/deepgram-stt-smart-format
DeepgramSTTService: disable smart_format by default
2026-02-06 14:04:37 -05:00
Mark Backman
d7b1624d3c Merge pull request #3663 from lukepayyapilli/fix/stream-close-sambanova-google
fix: close stream on cancellation for SambaNova and Google OpenAI services
2026-02-06 14:02:31 -05:00
Mark Backman
7f65204c3b DeepgramSTTService: disable smart_format by default 2026-02-06 13:45:10 -05:00
Aleix Conchillo Flaqué
97eff414c3 Merge pull request #3660 from pipecat-ai/aleix/interruption-frame-completion-event
Attach asyncio.Event to InterruptionFrame for completion signaling
2026-02-06 10:14:26 -08:00
Aleix Conchillo Flaqué
5b67e76de7 Add changelog for PR #3660 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
b9e79bd06a CLAUDE.md: explain about InterruptionFrame.complete() 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
d5105a78e6 STTMuteFilter should call frame.complete() when InterruptionFrame is blocked 2026-02-06 10:11:00 -08:00
Aleix Conchillo Flaqué
a352b2d7a0 Add tests for InterruptionFrame completion event
Add tests for the event-based interruption completion: complete() sets
the event, complete() is safe without an event, the event fires at
the pipeline sink, and a warning is logged when the frame is blocked.

Also remove the unconditional await after the timeout so the function
returns instead of hanging when complete() is never called.
2026-02-06 09:57:24 -08:00
Aleix Conchillo Flaqué
2345090b10 Attach asyncio.Event to InterruptionFrame for completion signaling
Move the interruption wait event from per-processor instance state to
the frame itself. The event is created in
push_interruption_task_frame_and_wait(), threaded through
InterruptionTaskFrame → InterruptionFrame, and set when the frame
reaches the pipeline sink. This scopes the event to each interruption
flow rather than sharing mutable state on the processor.

Also adds a 2s timeout warning to help diagnose cases where
InterruptionFrame.complete() is never called.
2026-02-06 09:57:24 -08:00
Mark Backman
af562bf9a8 Merge pull request #3664 from pipecat-ai/mb/elevenlabs-scribe-v2
Update ElevenLabsSTTService to scribe_v2
2026-02-06 12:31:44 -05:00
Mark Backman
d4993f0dcf Update ElevenLabsSTTService to scribe_v2 2026-02-06 11:37:23 -05:00
Luke Payyapilli
1790a84bfd add changelog 2026-02-06 10:05:02 -05:00
Luke Payyapilli
29c53b99a4 fix: close stream on cancellation for SambaNova and Google OpenAI services 2026-02-06 10:02:40 -05:00
Mark Backman
aa5a855eab Merge pull request #3656 from pipecat-ai/mb/openai-realtime-stt
Add OpenAIRealtimeSTTService
2026-02-06 09:15:58 -05:00
Mark Backman
e66d6f8ffe Merge pull request #3658 from pipecat-ai/mb/bump-protobuf-5.29.6
Upgrade protobuf to >=5.29.6
2026-02-05 19:09:30 -05:00
Mark Backman
b8ac2ba713 Merge pull request #3593 from ianbbqzy/ian/inworld-auto-mode
Add auto_mode support for inworld plugin
2026-02-05 18:16:38 -05:00
Ian Lee
6eea40858e fix lint and changelog 2026-02-05 15:10:36 -08:00
Mark Backman
90700d10aa Upgrade protobuf to >=5.29.6 2026-02-05 18:08:52 -05:00
Mark Backman
fa85f7bbc7 Merge pull request #3640 from lukepayyapilli/fix/openai-stream-close
fix: close stream on cancellation to prevent socket leaks
2026-02-05 18:00:06 -05:00
Mark Backman
669f013970 Merge pull request #3657 from pipecat-ai/filipi/changing_no_audio_log_to_debug
Changing the ‘no audio received’ log from warning to debug.
2026-02-05 17:35:24 -05:00
filipi87
76f63e54e2 Changing the ‘no audio received’ log from warning to debug. 2026-02-05 18:07:14 -03:00
Filipi da Silva Fuchter
cce5a13444 Merge pull request #3650 from pipecat-ai/filipi/twilio_issues
Ignoring RTVI messages inside the Serializers by default.
2026-02-05 15:52:59 -05:00
Mark Backman
d11e1cd631 Update 13k to use ElevenLabsRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
8b9da632d1 Add OpenAIRealtimeSTTService 2026-02-05 15:48:00 -05:00
Mark Backman
b36f7892a4 Merge pull request #3654 from pipecat-ai/aleix/more-claude-update
CLAUDE.md: add RTVI and serializers
2026-02-05 15:23:35 -05:00
Mark Backman
9b43cde128 Merge pull request #3355 from itsderek23/user-bot-latency
Add `user_bot_latency_seconds` to OpenTelemetry turn spans
2026-02-05 15:23:15 -05:00
filipi87
6af4d872a8 Refactoring the serializers to ignore the RTVI messages by default. 2026-02-05 16:52:53 -03:00
Ian Lee
22398e1410 add changelog back 2026-02-05 11:39:39 -08:00
Ian Lee
d10467e043 update timestamps reset handling 2026-02-05 11:39:39 -08:00
Ian Lee
cbe131636d add changelog 2026-02-05 11:39:39 -08:00
Ian Lee
fef9e3ea32 Add auto_mode support for inworld plugin 2026-02-05 11:39:39 -08:00
Mark Backman
56d8ef2bf4 Deprecate UserBotLatencyLogObserver, update 29 example 2026-02-05 14:29:45 -05:00
Derek Haynes
8791559351 Add changelog entry for PR #3355 2026-02-05 14:29:45 -05:00
Derek Haynes
f6c919354f Add test for user bot latency 2026-02-05 14:29:45 -05:00
Derek Haynes
93138466d6 Feat: Add user-bot latency to OTel turn spans
This adds user-to-bot response latency tracking to OpenTelemetry spans:

- Created UserBotLatencyObserver as a reusable component for tracking
user-to-bot response latency
- Records the value as an attribute on turn spans (turn.user_bot_latency_seconds)
- Updated TurnTraceObserver to use UserBotLatencyObserver, following the same pattern as TurnTrackingObserver
- Updated PipelineTask to automatically create and wire UserBotLatencyObserver
when tracing is enabled (same as TurnTrackingObserver)
2026-02-05 14:29:42 -05:00
Mark Backman
5a5a98b497 Merge pull request #3649 from itsderek23/fix/tracing-orphan-spans
Fix orphan otel spans during flow initialization and transitions
2026-02-05 14:23:52 -05:00
Aleix Conchillo Flaqué
2b4f507d37 CLAUDE.md: add RTVI and serializers 2026-02-05 11:06:00 -08:00
Mark Backman
d6f3a90662 Merge pull request #3652 from pipecat-ai/mb/upgrade-small-webrtc-prebuilt-2.1.0
Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0
2026-02-05 13:48:54 -05:00
Derek Haynes
8fb0e37965 Update changelog for #3649 2026-02-05 11:35:22 -07:00
Derek Haynes
0d45b48f7b Fix import placement 2026-02-05 11:26:58 -07:00
Mark Backman
6af4520b1f Merge pull request #3635 from pipecat-ai/filipi/fix_websocket
Fixed an error in the WebSocket transport that occurred when an InputTransportMessageFrame was received and broadcast.
2026-02-05 12:22:59 -05:00
filipi87
ba469e5645 Add changelog entry
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:19:51 -05:00
Mark Backman
bd12b60b5c Merge pull request #3614 from okue/fix/websocket-broadcast-frame-misuse
fix: pass frame class instead of instance to broadcast_frame in websocket transports
2026-02-05 12:19:03 -05:00
Mark Backman
54db37ea47 Upgrade pipecat-ai-small-webrtc-prebuilt to 2.1.0 2026-02-05 12:09:51 -05:00
filipi87
752e16f553 Ignoring RTVI messages inside TwilioSerializer by default. 2026-02-05 10:51:03 -03:00
Derek Haynes
7c7408a048 Fix orphan spans in tracing during flow initialization and transitions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 06:06:13 -07:00
Mark Backman
8f42343927 Merge pull request #3630 from pipecat-ai/mb/add-function-call-messages-rtvi
Add native RTVI function call lifecycle messages
2026-02-04 16:20:42 -05:00
Mark Backman
46da6cd91b Update changelogs 2026-02-04 11:19:30 -05:00
Mark Backman
ecb02d9049 Bump RTVI_PROTOCOL_VERSION to 1.2.0 2026-02-04 11:17:38 -05:00
Mark Backman
cc68e00125 Deprecate llm-function-call message 2026-02-04 11:17:23 -05:00
Mark Backman
e0e3b5250b Add RTVIObserverParams to control what information is included in function call events 2026-02-04 11:05:05 -05:00
Luke Payyapilli
55a3b10e70 fix(openai): close stream on cancellation to prevent socket leaks 2026-02-04 09:59:10 -05:00
dhruvladia-sarvam
e6b06414b3 change default speaker for bulbul:v3-beta to shubh 2026-02-04 16:46:35 +05:30
Aleix Conchillo Flaqué
6bcfb40d12 Merge pull request #3636 from pipecat-ai/aleix/initial-claude-md
initial CLAUDE.md
2026-02-03 19:31:16 -08:00
Aleix Conchillo Flaqué
65b1a8ce36 initial CLAUDE.md 2026-02-03 18:04:54 -08:00
Mark Backman
2db3d94d06 Merge pull request #3628 from pipecat-ai/mb/broadcast-speech-control-params-frame
Fix: Broadcast SpeechControlParamsFrame from VADController
2026-02-03 18:44:15 -05:00
Mark Backman
2a26b9f7a3 Fix: Broadcast SpeechControlParamsFrame from VADController 2026-02-03 18:40:39 -05:00
Aleix Conchillo Flaqué
4f77c532fb Merge pull request #3623 from pipecat-ai/aleix/pipeline-task-rtvi-always-set-bot-ready
PipelineTask: also call set_bot_ready() for external RTVI processors
2026-02-03 14:21:03 -08:00
Aleix Conchillo Flaqué
c3a4da4a29 PipelineTask: also call set_bot_ready() for external RTVI processors 2026-02-03 14:16:08 -08:00
Mark Backman
84ca0b6d58 Merge pull request #3629 from pipecat-ai/fix/telephony-websocket-stopasynciteration
Fix StopAsyncIteration in parse_telephony_websocket
2026-02-03 12:10:07 -05:00
Mark Backman
c1857d255d Avoid nesting try/excepts 2026-02-03 12:00:04 -05:00
Mark Backman
d50ec33079 Merge pull request #3542 from lukepayyapilli/fix/terminal-frames-uninterruptible
fix: make EndFrame and StopFrame uninterruptible to prevent pipeline freeze
2026-02-03 10:08:17 -05:00
Mark Backman
40c84faff5 Remove handle_function_call_start 2026-02-03 10:00:59 -05:00
Mark Backman
84cd9346f9 Add native RTVI function call lifecycle messages 2026-02-03 10:00:59 -05:00
Luke Payyapilli
5d5b19e1d2 Add changelog entry 2026-02-03 09:12:59 -05:00
Luke Payyapilli
8d3e10f054 Make EndFrame and StopFrame uninterruptible to prevent pipeline freeze 2026-02-03 09:12:59 -05:00
dhruvladia-sarvam
1665ce181a refactor(sarvam): centralize model configuration with dataclasses 2026-02-03 14:33:41 +05:30
James Hush
803a20cc00 Fix formatting: remove extra blank line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:46:44 +08:00
James Hush
90bead06ab Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-03 16:42:13 +08:00
James Hush
b427d534ae Add tests for parse_telephony_websocket StopAsyncIteration handling
Tests cover:
- No messages received (raises ValueError)
- One message received (logs warning, continues)
- Two messages received (normal operation)
- All telephony providers (Twilio, Telnyx, Plivo, Exotel)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:33:36 +08:00
James Hush
b030f1178d Add changelog and improve docstring for parse_telephony_websocket
- Added changelog entry for bug fix
- Enhanced docstring with Args and Raises sections

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:26:09 +08:00
James Hush
a627597bca Fix StopAsyncIteration in parse_telephony_websocket
Handle WebSocket disconnections gracefully when telephony providers send
fewer messages than expected. Adds explicit StopAsyncIteration handling
for both first and second message retrieval.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 16:25:07 +08:00
Aleix Conchillo Flaqué
4c10ddb7bb upgrade uv.lock 2026-02-02 16:25:06 -08:00
Mark Backman
a4e499dc80 Merge pull request #3617 from pipecat-ai/fix/cjk-sentence-splitting
Fix sentence splitting for CJK and other non-Latin languages
2026-02-02 18:16:51 -05:00
Mark Backman
ca49acfaa6 Merge pull request #3619 from pipecat-ai/mb/resemble-readme
Resemble cleanup
2026-02-02 09:20:11 -05:00
Mark Backman
86147f15f3 Renumber the Resemble foundational example 2026-02-02 09:07:05 -05:00
Mark Backman
5cda72d138 Add Resemble TTS to README 2026-02-02 09:05:03 -05:00
Mark Backman
54e62a8177 Merge pull request #3134 from pipecat-ai/mb/resemble-tts-draft
Add ResembleAITTSService
2026-02-02 08:59:27 -05:00
Mark Backman
a592b7fdf0 Update per PR 1789, align with ErrorFrame norms 2026-02-02 08:55:29 -05:00
Mark Backman
ba2b7c05d6 Add ResembleAITTSService 2026-02-02 08:55:27 -05:00
James Hush
774041e9a1 Add changelog for PR #3617
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:47:22 +08:00
James Hush
763002f2bc Fix sentence splitting for CJK and other non-Latin languages in TTS pipeline
NLTK's sent_tokenize() only supports ~15 European languages and defaults to
English. For Japanese, Chinese, Korean, Hindi, Arabic, and other non-Latin
languages, NLTK fails to recognize sentence boundaries like 。?! causing
text to accumulate until flush instead of being emitted sentence-by-sentence.

Add a fallback in match_endofsentence() that scans for unambiguous non-Latin
sentence-ending punctuation when NLTK fails to split the text. Latin
punctuation (. ! ? ; …) is excluded from the fallback since NLTK handles
those correctly and they can be ambiguous (abbreviations, decimals, etc.).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:27:49 +08:00
Om Chauhan
50dedf350d fix: ensure function call timeout task is always cancelled 2026-02-02 08:38:54 +05:30
okue
d3ecbb11c1 fix: pass frame class instead of instance to broadcast_frame in websocket transports
broadcast_frame() expects a frame class and kwargs, but the three
websocket input transports (fastapi, client, server) were incorrectly
passing a frame instance. This would cause a TypeError at runtime when
an InputTransportMessageFrame was received.
2026-02-01 20:38:34 +09:00
Aleix Conchillo Flaqué
f453227ba3 Merge pull request #3612 from pipecat-ai/aleix/use-kokoro-onnx
KokoroTTSService: use kokoro-onnx instead of kokoro
2026-01-31 21:03:55 -08:00
Aleix Conchillo Flaqué
52cc64019a Merge pull request #3611 from pipecat-ai/aleix/aicoustics-example-update
examples: update 07zd to use vad_analyzer in LLMUserAggregator
2026-01-31 21:02:50 -08:00
Aleix Conchillo Flaqué
95689cc81c KokoroTTSService: use kokoro-onnx instead of kokoro 2026-01-31 17:20:27 -08:00
Aleix Conchillo Flaqué
675c7c43e3 examples: update 07zd to use vad_analyzer in LLMUserAggregator 2026-01-31 15:31:15 -08:00
Aleix Conchillo Flaqué
bfd19e867c Merge pull request #3610 from pipecat-ai/aleix/dont-add-rtvi-observer-if-already-there
PipelineTask: don't add RTVIObserver if already there
2026-01-31 14:57:52 -08:00
Aleix Conchillo Flaqué
acc9923c0a PipelineTask: don't add RTVIObserver if already there 2026-01-31 14:54:29 -08:00
Mark Backman
bdc9e7e2e4 Merge pull request #3608 from pipecat-ai/mb/quickstart-0.0.101
Update quickstart for 0.0.101
2026-01-31 10:39:17 -05:00
Mark Backman
a587e1b99a Update quickstart for 0.0.101 2026-01-31 09:52:24 -05:00
Aleix Conchillo Flaqué
7853e5ca93 Merge pull request #3606 from pipecat-ai/changelog-0.0.101
Release 0.0.101 - Changelog Update
2026-01-30 22:58:22 -08:00
aconchillo
614b8e1a62 Update changelog for version 0.0.101 2026-01-30 22:54:31 -08:00
Aleix Conchillo Flaqué
ef51c2a5c6 changelog: fix 3582 changed file 2026-01-30 22:48:26 -08:00
Aleix Conchillo Flaqué
f42dc0d38e Merge pull request #3605 from pipecat-ai/aleix/gemini-live-schedule-transcription-timeout-handler
GeminiLiveLLMService: let the transcription timeout handler be scheduled
2026-01-30 22:44:05 -08:00
Aleix Conchillo Flaqué
d87f3543c7 GeminiLiveLLMService: let the transcription timeout handler be scheduled 2026-01-30 22:41:10 -08:00
Aleix Conchillo Flaqué
fee633cb92 scripts(evals): disable kokoro for now 2026-01-30 21:23:42 -08:00
Aleix Conchillo Flaqué
607af91153 Merge pull request #3604 from pipecat-ai/mb/fix-ivr-navigator-aggregation
Fix IVRNavigator to push AggregatedTextFrame when switching to conver…
2026-01-30 21:22:20 -08:00
Mark Backman
e779233918 Fix IVRNavigator to push AggregatedTextFrame when switching to conversation mode 2026-01-30 21:07:49 -05:00
Aleix Conchillo Flaqué
604d5d0b14 examples: update 07zi and 07zj to use vad_analyzer form LLMUserAggregator 2026-01-30 16:14:02 -08:00
Mark Backman
342ae7af41 Merge pull request #3601 from pipecat-ai/mb/add-22-release-evals
Add 22 foundational to release evals
2026-01-30 15:31:54 -05:00
Mark Backman
c92ec1552e Add 22 foundational to release evals 2026-01-30 15:12:52 -05:00
Aleix Conchillo Flaqué
93160f1455 scripts(evals): remove vad_analyzer from transport 2026-01-30 12:08:12 -08:00
Aleix Conchillo Flaqué
e3158e1131 Merge pull request #3600 from pipecat-ai/aleix/llm-server-timeout-task-never-waited
LLMService: make sure function call timeout handler is started
2026-01-30 12:01:18 -08:00
Mark Backman
63a23246d5 Add UserTurnCompletionLLMServiceMixin (#3518)
* Added UserTurnCompletionLLMServiceMixin class

* Added 22-filter-incomplete-turns.py foundational example

* Removed old 22 natural conversation foundational examples

* Added test_user_turn_completion_mixin.py
2026-01-30 14:57:15 -05:00
Aleix Conchillo Flaqué
569ea9849a Merge pull request #3599 from pipecat-ai/aleix/release-evals-disable-rtvi
scripts(evals): disable RTVI
2026-01-30 11:44:46 -08:00
Aleix Conchillo Flaqué
a98ca9b65b LLMService: make sure function call timeout handler is started 2026-01-30 11:38:26 -08:00
Aleix Conchillo Flaqué
c9310789dc scripts(evals): use new vad_analyzer from LLMUSerAggregator 2026-01-30 10:57:17 -08:00
Aleix Conchillo Flaqué
b93e12d701 scripts(evals): disable RTVI 2026-01-30 10:52:38 -08:00
Aleix Conchillo Flaqué
3f77da627d Merge pull request #3583 from pipecat-ai/aleix/move-vad-analyzer-to-llm-user-aggregator
VAD analyzer is now passed to LLMUserAggregator
2026-01-30 10:46:10 -08:00
Aleix Conchillo Flaqué
35d265770d LLMUserAggregator: don't process certain self-queued frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
9632efec8c VADProcessor: broadcast frames 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
27dbfa1eda NvidiaTTSService: return AsyncIterator instead of AsyncIterable 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
183c0aa4ef LLMUserAggregator: queue frames internally so strategies and controllers can process them 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
a69a037ffa changelog: add updates for #3583 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c46e7f5da0 TurnAnalyzerUserTurnStopStrategy: only update vad params if frame contains vad 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
307aeaeda0 examples: update with LLMUserAggregatorParams vad_analyzer and VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
305ab44132 tests: add unittest.main() call 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b486f35c70 audio: add new VADProcessor 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
c92080b0d2 LLMUserAggregator: add vad_analyzer and use VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
ddfedaf478 audio(vad): add new VADController 2026-01-30 10:07:34 -08:00
Aleix Conchillo Flaqué
b1ad4d5ab0 BaseInputTransport: deprecate vad_analyzer 2026-01-30 10:07:33 -08:00
Aleix Conchillo Flaqué
0857aa87be Merge pull request #3595 from pipecat-ai/aleix/add-kokoro-tts-support
services(tss): add new KokoroTTSService
2026-01-30 09:49:05 -08:00
Aleix Conchillo Flaqué
fd3c5f69b7 upgrade uv.lock 2026-01-30 09:41:33 -08:00
Aleix Conchillo Flaqué
72ab329513 services(tss): add new KokoroTTSService 2026-01-30 09:39:01 -08:00
Filipi da Silva Fuchter
7999d08b7e Merge pull request #3052 from Navigate-AI/fork/main
Include pts in video and audio frames in SmallWebRTCClient
2026-01-30 09:03:29 -05:00
dhruvladia-sarvam
57821cf709 fix 2026-01-30 16:07:52 +05:30
dhruvladia-sarvam
18045582a9 ASR and TTS v3 update 2026-01-30 15:53:06 +05:30
Mark Backman
7be2b8cc34 Merge pull request #3587 from pipecat-ai/mb/gradium-improvements
GradiumSTTService now flushes pending transcripts on VAD stopped dete…
2026-01-29 18:11:25 -05:00
Aleix Conchillo Flaqué
671cc8eb74 Merge pull request #3590 from pipecat-ai/aleix/custom-cli-runner-args
runner: allow custom CLI arguments
2026-01-29 13:53:27 -08:00
Aleix Conchillo Flaqué
b4dce656f0 Merge pull request #3594 from pipecat-ai/aleix/user-turn-controller-reset-timeout-on-interims
UserTurnController: reset user turn timeout with interim transcriptions
2026-01-29 13:12:44 -08:00
Aleix Conchillo Flaqué
253a1d1114 UserTurnController: reset user turn timeout with interim transcriptions 2026-01-29 13:10:10 -08:00
Aleix Conchillo Flaqué
ca613bcb79 Merge pull request #3592 from pipecat-ai/aleix/broadcast-frame-no-deepcopy
don't deep copy fields when broadcasting frames
2026-01-29 11:50:20 -08:00
Aleix Conchillo Flaqué
0423acd8a0 STTService: just clear buffer before running run_stt() 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
7eabaaa0ef FrameProcessors: do not deepcopy fields when broadcasting frames 2026-01-29 11:47:57 -08:00
Aleix Conchillo Flaqué
bbb8b53d03 runner: allow custom CLI arguments 2026-01-29 10:15:53 -08:00
Aleix Conchillo Flaqué
f3b72e9263 Merge pull request #3585 from pipecat-ai/aleix/improve-piper-tts-support
improve Piper TTS support
2026-01-29 08:36:13 -08:00
Mark Backman
31c7fbc5ba Add delay_in_frames and language support 2026-01-29 10:59:04 -05:00
Mark Backman
6ab12626d6 GradiumSTTService now flushes pending transcripts on VAD stopped detection 2026-01-29 10:26:17 -05:00
Mark Backman
b77a50de73 Merge pull request #3529 from lukepayyapilli/fix/llm-timeout-without-retry
feat: handle exceptions for BaseOpenAILLMService
2026-01-29 09:12:54 -05:00
Luke Payyapilli
433c1b9b92 add catch-all exception handler per review feedback 2026-01-29 09:07:06 -05:00
Aleix Conchillo Flaqué
bd00587092 changelog: add files for 3585 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
5a85e27cc5 PiperHttpTTSService: allow passing a voice id 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
11daa43b1b TTSService: resample _stream_audio_frames_from_iterator() input audio if needed 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
875614ff7a tts: add support for local PiperTTSService 2026-01-29 00:16:39 -08:00
Aleix Conchillo Flaqué
eb1bf1e446 tts: rename PiperTTSService to PiperHttpTTSService 2026-01-28 23:27:32 -08:00
mattie ruth backman
7456a0a55f Fix the /start and /offer/api proxy endpoints for smallWebRTC to match pipecat cloud behavior WRT requestData 2026-01-28 15:25:13 -05:00
Filipi da Silva Fuchter
27277ed3d9 Merge pull request #3571 from pipecat-ai/filipi/funcion_call_improvements
Function call improvements
2026-01-28 14:03:40 -05:00
filipi87
5543bc56f3 Add changelog files for PR #3571
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 15:43:59 -03:00
filipi87
c8496dfb8e Updated the examples which use UserImageRequestFrame to defer the function call result. 2026-01-28 15:39:21 -03:00
filipi87
d3f4cbb620 Providing a way to defer the function call results. 2026-01-28 15:39:06 -03:00
filipi87
c9f922c479 Removed an overridden method that was identical to the parent implementation. 2026-01-28 15:38:40 -03:00
Aleix Conchillo Flaqué
49bd3da26b Merge pull request #3582 from pipecat-ai/aleix/daily-sample-room-url
rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL
2026-01-28 10:38:14 -08:00
Aleix Conchillo Flaqué
f3ef488925 rename DAILY_SAMPLE_ROOM_URL to DAILY_ROOM_URL 2026-01-28 10:05:27 -08:00
Aleix Conchillo Flaqué
4f08098917 Merge pull request #3580 from Pulkit0729/fix/livekit
fix: adding missing livekit transport configs
2026-01-28 10:04:34 -08:00
Pulkit
a7cd5b0322 fix: adding missing livekit transport configs 2026-01-28 23:15:03 +05:30
Aleix Conchillo Flaqué
55dadc9118 tests(genesys): fix formatting 2026-01-28 09:15:42 -08:00
Aleix Conchillo Flaqué
01bbf61e0d Merge pull request #3500 from ssillerom/feature/genesys_serializer
Feature/genesys serializer
2026-01-28 09:09:11 -08:00
ssillerom
10fb77c0e2 added changelog file 2026-01-28 18:07:33 +01:00
ssillerom
2612fae527 ruff linting 2026-01-28 18:02:51 +01:00
ssillerom
c5be67f293 fix: create disconnect message passing output vars 2026-01-28 17:56:21 +01:00
kompfner
312caaba86 Merge pull request #3429 from lukepayyapilli/fix/gemini-live-interrupted-signal
feat: handle server_content.interrupted for faster interruptions
2026-01-28 10:25:36 -05:00
Luke Payyapilli
ff0eb6d286 fix: emit ErrorFrame on LLM completion timeout 2026-01-28 09:44:32 -05:00
ssillerom
ef6bbace98 fixes: super init inhereted class to set event hanlders in the construct 2026-01-28 15:40:24 +01:00
Filipi da Silva Fuchter
06ec21387f Merge pull request #3581 from pipecat-ai/filipi/open_ai_audio_duration
Fixed race condition in OpenAIRealtimeLLMService
2026-01-28 07:42:35 -05:00
filipi87
bdae177125 Adding changelog entry for the OpenAiRealtimeLLMService fix. 2026-01-28 08:39:11 -03:00
filipi87
468e159f9b Fixed race condition in OpenAIRealtimeLLMService that could cause an error when truncating the conversation. 2026-01-28 08:36:31 -03:00
ssillerom
a4acafd3be feature: added event handlers in constructor and call func in each _handle_* func 2026-01-28 10:54:26 +01:00
ssillerom
105824a372 Merge main into feature/genesys_serializer
Incorporates latest changes from main branch including:
- AIC filter and VAD updates
- STT service improvements
- Base serializer changes
- Various bug fixes
2026-01-28 10:48:56 +01:00
ssillerom
55e0d4ecc4 ruff fixes done 2026-01-28 08:59:28 +01:00
ssillerom
9102e81cb8 added tests to the PR 2026-01-27 23:39:43 +01:00
ssillerom
d7d8e93a3d feature: added custom params in closed message to genesys, simplified create_* functions, simplified constructor method and simplified opened message 2026-01-27 23:36:47 +01:00
Mark Backman
bf9b166464 Merge pull request #3575 from pipecat-ai/mb/fix-turn-stopped-event-end-cancel-frame
Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame…
2026-01-27 14:55:34 -05:00
Mark Backman
e80e0eab29 Emit on_assistant_turn_stopped and on_user_turn_stopped from EndFrame or CancelFrame 2026-01-27 14:50:10 -05:00
Mark Backman
61242e6575 Merge pull request #3574 from pipecat-ai/mb/fix-websocket-close-message-handling
Fix WebsocketService infinite loop on graceful server disconnect
2026-01-27 13:53:26 -05:00
Aleix Conchillo Flaqué
8841387121 Merge pull request #3560 from pipecat-ai/aleix/serializer-base-objects
FrameSerializer: subclass from BaseObject so we can add events
2026-01-27 09:58:44 -08:00
Aleix Conchillo Flaqué
ee695ae9fe FrameSerializer: subclass from BaseObject so we can add events 2026-01-27 09:53:46 -08:00
Mark Backman
52012b0fb2 Fix WebsocketService infinite loop on graceful server disconnect 2026-01-27 12:41:28 -05:00
Mark Backman
f7a1c6b719 Merge pull request #3408 from ai-coustics/aic-v2
Add ai-coustics AIC SDK v2 support with model downloading
2026-01-27 10:38:26 -05:00
Gökmen Görgen
6aa77ccc13 group aic related changes in changelog. 2026-01-27 16:22:54 +01:00
Gökmen Görgen
45b7ec4e2c re-enable 07zd-interruptible-aicoustics.py in release evals. 2026-01-27 16:18:56 +01:00
Mark Backman
1c434c6ad5 Merge pull request #3562 from speechmatics/fix/smx-ttfs-finals
Support TTFS for Speechmatics STT
2026-01-27 08:35:34 -05:00
Mark Backman
4591affba9 Merge pull request #3568 from pipecat-ai/mb/changelog-3536 2026-01-27 07:14:41 -05:00
Sam Sykes
91346f5f37 Add support for self.request_finalize() for Pipecat-based VAD. 2026-01-27 10:44:35 +00:00
Filipi da Silva Fuchter
6a66ebe332 Merge pull request #3541 from pipecat-ai/filipi/audio_buffer
Refactoring AudioBufferProcessor to fix audio track synchronization.
2026-01-27 05:32:41 -05:00
Filipi da Silva Fuchter
c1d4180042 Merge pull request #3567 from pipecat-ai/filipi/openai_realtime_audio_duration
Fixed race condition in OpenAIRealtimeBetaLLMService
2026-01-27 05:30:33 -05:00
Gökmen Görgen
81a53c699c handle AIC processor init errors gracefully and ensure _aic_ready reflects readiness 2026-01-27 11:28:05 +01:00
Sam Sykes
60168f7f69 remove comment 2026-01-26 23:16:43 +00:00
Sam Sykes
23d7608e5f changelog update 2026-01-26 23:15:30 +00:00
Sam Sykes
99242c0a93 linting updates 2026-01-26 23:14:40 +00:00
Sam Sykes
3a71865cf4 removed old metrics 2026-01-26 23:11:25 +00:00
Mark Backman
ecf2e69f3f Merge pull request #3536 from surapuramakhil/main
LLMAssistantAggregator: preserve non-ASCII characters in JSON output
2026-01-26 16:42:05 -05:00
Mark Backman
febd52274d Add changelog fragment for PR 3536 2026-01-26 16:42:00 -05:00
Mark Backman
1542d922e7 Merge pull request #3546 from pipecat-ai/pk/changelog-fragment-for-pr-3406
Added a changelog fragment for PR 3406
2026-01-26 16:31:57 -05:00
Paul Kompfner
15d5d1159e Added a changelog fragment for PR 3406 2026-01-26 16:27:33 -05:00
Mark Backman
884630a6bd Merge pull request #3559 from pipecat-ai/aleix/transport-broadcast-fixes
transports: fix broadcast_frame_class reference
2026-01-26 16:25:31 -05:00
Mark Backman
1cf137c6a8 Merge pull request #3565 from pipecat-ai/markbackman-patch-1 2026-01-26 15:49:35 -05:00
filipi87
98fcfd7c91 Adding changelog entry for the OpenAiRealtimeBetaLLMService fix. 2026-01-26 17:19:08 -03:00
filipi87
2f23f2e39c Fixed race condition in OpenAIRealtimeBetaLLMService that could cause an error when truncating the conversation. 2026-01-26 17:08:27 -03:00
Mark Backman
9c6b11cecf Update README links to use absolute URLs 2026-01-26 13:03:39 -05:00
Sam Sykes
fc1444c9d6 Updated changelog 2026-01-26 16:25:37 +00:00
Sam Sykes
ea94939add update dependency 2026-01-26 16:24:56 +00:00
Sam Sykes
0c69ae6371 Changelog entry. 2026-01-26 16:07:59 +00:00
Sam Sykes
8b88280bb1 Default to using EXTERNAL mode. 2026-01-26 15:52:42 +00:00
Sam Sykes
960d0faea5 support is_eou for final segment in utterance 2026-01-26 15:48:04 +00:00
Luke Payyapilli
b9390ccb1b Address review: remove UserStartedSpeakingFrame, add explanatory comment 2026-01-26 10:08:17 -05:00
Mark Backman
061a0dc43d Merge pull request #3498 from pipecat-ai/mb/azure-tts-8khz-workaround
AzureTTSService 8khz workaround
2026-01-26 09:48:22 -05:00
Mark Backman
328bbe069f Merge pull request #3554 from pipecat-ai/mb/simplify-stt-ttfb
Simplify STT finalize handling
2026-01-26 08:00:04 -05:00
Mark Backman
dc32ecc872 Merge pull request #3555 from pipecat-ai/mb/speechmatics-stt-ttfb
Align Speechmatics STT TTFB metrics with STT classes
2026-01-26 07:59:34 -05:00
Gökmen Görgen
ca2eb1904f Merge remote-tracking branch 'origin/aic-v2' into aic-v2 2026-01-26 10:16:23 +01:00
Gökmen Görgen
4bce58f270 update changelog and remove outdated dependency notes 2026-01-26 10:15:15 +01:00
Gökmen Görgen
7572d63f8f Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:40 +01:00
Gökmen Görgen
3c463c9416 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:33 +01:00
Gökmen Görgen
bd618d64e3 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 10:06:16 +01:00
Gökmen Görgen
a824660df7 add unit tests for AICVADAnalyzer and AICFilter. 2026-01-26 09:56:36 +01:00
Gökmen Görgen
58b9019852 bump aic-sdk to 2.0.1 in optional dependencies. 2026-01-26 09:14:16 +01:00
Gökmen Görgen
afcdef8c81 docstring clarification. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bd92104fb3 clarify voice confidence method behavior in AIC VAD. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
34e9f224a8 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dca7f3b5b0 add changelog. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
70a85cd192 use path for keeping the consistency between the parameters. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
91e86658b7 force developer to set a license key, it's required. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0a8588669c address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e99400148 two dots are rust specific thinks, I'm not sure if it's familiar for Python developers. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
648f20db6d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
09b5b6b12d Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
0e6a423955 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
dc8972cd94 log optimal number of frames for given sample rate in AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
e4e2231958 Update src/pipecat/audio/vad/aic_vad.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
18b3ee743b replace os with pathlib.Path in AICFilter for path handling consistency. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
65b8e0e89c rename enabled to bypass in AICFilter for clarity. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
b77f8b065f remove voice gain. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
5fd43faec3 add min speech duration. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
abebcf37bd address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
ca4e3c79f9 Update pyproject.toml
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
e8d1bec03b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Andres O. Vela <andresovela@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
f0cc54589e remove enhancement level parameter from AICFilter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
22b9aac2ff use quail model in the example. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
7f86f4ac27 fix class name. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
dcab79753b even the parameters are fixed, keep aic ready for processing. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
bdded9b026 set SDK ID for telemetry in AIC filter. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
1e1e275fea address feedback. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
effb6aa8f4 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a4a9bae79e drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
c943ef9261 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
f05809520b Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new ACIFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
ec17dc6626 aic-sdk-py v2.
# Conflicts:
#	uv.lock

# Conflicts:
#	examples/foundational/07zd-interruptible-aicoustics.py
#	pyproject.toml
#	src/pipecat/audio/filters/aic_filter.py
#	src/pipecat/audio/vad/aic_vad.py
2026-01-26 08:44:17 +01:00
Gökmen Görgen
4e85e81d9b Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a1cc88a233 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Tobias <76444201+Fl1tzi@users.noreply.github.com>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
61a230ec53 Update src/pipecat/audio/filters/aic_filter.py
Co-authored-by: Stephan Eckes <stephan@steck.tech>
2026-01-26 08:44:17 +01:00
Gökmen Görgen
a13380b574 clean up unused imports in audio utils. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
2a927189d9 reorganize imports. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a90c15362c drop v1 support from aic. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
d3bdd2d246 use new model id. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
465ae4f706 keep uv.lock as it is. 2026-01-26 08:44:17 +01:00
Gökmen Görgen
a0d801b658 Remove outdated AIC Filter and VAD v2 files, migrate to consolidated implementations.
Added the new ACIFilter to the same module.
2026-01-26 08:44:17 +01:00
Gökmen Görgen
35919a84e3 aic-sdk-py v2.
# Conflicts:
#	uv.lock
2026-01-26 08:44:17 +01:00
Aleix Conchillo Flaqué
f94a60f381 transports: fix broadcast_frame_class reference 2026-01-25 15:42:09 -08:00
ssillerom
a446bca72d changes: added OutputTransportUrgentFrame to on closed, removed callback 2026-01-25 21:12:28 +01:00
Sergio Sillero
8ae834366b Merge branch 'pipecat-ai:main' into feature/genesys_serializer 2026-01-25 21:04:27 +01:00
Mark Backman
a4acc12f91 Align Speechmatics STT TTFB metrics with STT classes 2026-01-24 18:26:34 -05:00
Mark Backman
e93112e76e Simplify STT finalize handling 2026-01-24 15:28:27 -05:00
Mark Backman
680bcaac66 Merge pull request #3550 from pipecat-ai/mb/update-smart-turn-data-env-var
Update env var to PIPECAT_SMART_TURN_LOG_DATA
2026-01-24 13:52:36 -05:00
Mark Backman
d2ac9006a2 Update env var to PIPECAT_SMART_TURN_LOG_DATA 2026-01-24 12:50:42 -05:00
Mark Backman
bcb019e8ab Add TTFB metrics for STT services (#3495) 2026-01-23 18:47:34 -05:00
kompfner
4ea546785f Merge pull request #3406 from omChauhanDev/fix/openrouter-gemini-messages
fix(openrouter): handle multiple system messages for Gemini models
2026-01-23 14:53:59 -05:00
filipi87
f128cdd19a Adding a changelog entry to the AudioBufferProcessor fix. 2026-01-23 16:16:01 -03:00
filipi87
7921bce4af Refactoring AudioBufferProcessor to fix audio track synchronization. 2026-01-23 16:15:48 -03:00
Luke Payyapilli
cadced3f79 feat: handle server_content.interrupted for faster barge-in response 2026-01-23 10:41:04 -05:00
Aleix Conchillo Flaqué
8951442b8e Merge pull request #3534 from pipecat-ai/aleix/claude-skills-pr-description
claude: add pr-description skill
2026-01-22 17:34:46 -08:00
Aleix Conchillo Flaqué
7e6e3031e7 claude: add pr-description skill 2026-01-22 13:41:50 -08:00
Akhil
3b3c7aa8cc LLMAssistantAggregator: preserve non-ASCII characters in JSON output
Add ensure_ascii=False to json.dumps() calls for tool call arguments
and function call results to prevent unnecessary unicode escaping.
2026-01-22 15:37:44 -06:00
Aleix Conchillo Flaqué
308829f92b Merge pull request #3533 from pipecat-ai/aleix/claude-skills-docstring
claude: add docstring skill
2026-01-22 12:58:38 -08:00
Aleix Conchillo Flaqué
82a799e63e claude: add docstring skill 2026-01-22 12:53:38 -08:00
Cale Shapera
6b5bcae86f change default Inworld TTS model to inworld-tts-1.5-max (#3531) 2026-01-22 14:21:15 -05:00
Mark Backman
836073849c Merge pull request #3527 from weakcamel/patch-1
Update README.md - fix Google Imagen URL
2026-01-22 10:46:10 -05:00
Waldek Maleska
b13b65d6e2 Update README.md - fix Google Imagen URL 2026-01-22 15:17:41 +00:00
Mark Backman
3d545b718d Merge pull request #3344 from omChauhanDev/fix/stt-dynamic-language-update
fix: treat language as first-class STT setting
2026-01-22 09:21:56 -05:00
marcus-daily
f2fa5d9733 Updating changelog 2026-01-22 14:17:59 +00:00
marcus-daily
76b774072c Formatting fixes 2026-01-22 14:17:59 +00:00
marcus-daily
b6341ffaa5 Save Smart Turn input data if SMART_TURN_LOG_DATA is set 2026-01-22 14:17:59 +00:00
Mark Backman
29fae67c9e Merge pull request #3523 from omChauhanDev/add-location-support-google-tts
feat(google): add location parameter to TTS services
2026-01-22 09:12:16 -05:00
Mark Backman
718ea1c15e Merge pull request #3526 from pipecat-ai/mb/remove-logs
Remove application logs
2026-01-22 08:48:07 -05:00
Mark Backman
8e09d94614 Remove application logs 2026-01-22 08:28:52 -05:00
Aleix Conchillo Flaqué
de73e28563 Merge pull request #3510 from omChauhanDev/feat/add-reached-filter-methods
feat(task): add additive filter methods for frame monitoring
2026-01-21 21:05:33 -08:00
Aleix Conchillo Flaqué
55250b4f7e Merge pull request #3521 from pipecat-ai/aleix/claude-changelog-skill
claude: initial /changelog skill
2026-01-21 20:50:47 -08:00
Om Chauhan
281145a991 added changelog 2026-01-22 09:55:57 +05:30
Om Chauhan
7bd32e2fe5 feat(google): add location parameter to TTS services 2026-01-22 09:49:19 +05:30
James Hush
8f05d95f50 feat: add video_out_codec parameter for DailyTransport (#3520)
* feat: add video_out_codec parameter for DailyTransport

Add video_out_codec parameter to TransportParams allowing configuration
of the preferred video codec (VP8, H264, H265) for video output.

When set, this passes the preferredCodec option to Daily's
VideoPublishingSettings during the join operation.

* chore: move video_out_codec parameter to changelog folder (#3522)

* Initial plan

* Move video_out_codec parameter to changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

* Revert all CHANGELOG.md changes, keep only changelog/3520.added.md

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-01-22 11:31:07 +08:00
Om Chauhan
87c12f3098 changed frame filter storage type from tuples to sets 2026-01-22 08:43:46 +05:30
Om Chauhan
9c0bf89247 added changelog 2026-01-22 08:43:46 +05:30
Om Chauhan
6e44a2ab49 feat(task): add additive filter methods for frame monitoring 2026-01-22 08:43:46 +05:30
Aleix Conchillo Flaqué
7aa7b86aed claude: initial /changelog skill 2026-01-21 18:43:04 -08:00
Aleix Conchillo Flaqué
5ad9faeb4c Merge pull request #3519 from pipecat-ai/aleix/embedded-rtvi-processor
automatically add RTVI to the pipeline
2026-01-21 18:17:26 -08:00
Aleix Conchillo Flaqué
9e8f8b45c6 added changelog files for #3519 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
0ee11ad333 tests: disable RTVI in tests by default 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
124a3c35af RTVIObserver: don't handle some frames direction 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
054e504868 examples(foundational): remove RTVI (automatically added by PipelineTask) 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
e85a00cc0e PipelineTask: automatically add RTVI processor and RTVI observer
If `enable_rtvi` is enabled (enabled by default) and RTVI processor will be
added automatically to the pipeline. Also, and RTVI observer will be
registered.
2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
cc61cdbba3 RTVIProcessor: add create_rtvi_observer() 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
62f4708d43 transports: broadcast InputTransportMessageFrame frames 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
ba0ddb1832 FrameProcessor: copy kwargs when broadcasting frame 2026-01-21 18:14:17 -08:00
Aleix Conchillo Flaqué
eacd2a4b71 FrameProcessor: add broadcast_frame_instance() 2026-01-21 18:14:17 -08:00
Mark Backman
7ed110650d Merge pull request #3516 from okue/minorpatch1
refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
2026-01-21 10:33:59 -05:00
okue
4a724379fc refactor(user_mute): remove unnecessary _bot_speaking assignment in _handle_bot_stopped_speaking
The _bot_speaking flag does not need to be set in this method,
so the redundant assignment has been removed.
2026-01-21 23:59:15 +09:00
Aleix Conchillo Flaqué
768d3958dd Merge pull request #3512 from pipecat-ai/changelog-0.0.100
Release 0.0.100 - Changelog Update
2026-01-20 19:32:56 -08:00
aconchillo
5f9ff8bd58 Update changelog for version 0.0.100 2026-01-20 19:21:19 -08:00
Aleix Conchillo Flaqué
59ed422052 Merge pull request #3511 from pipecat-ai/aleix/camb-tts-client-on-start
CambTTSService: initialize client during StartFrame
2026-01-20 19:17:45 -08:00
Aleix Conchillo Flaqué
7e0ca113af CambTTSService: initialize client during StartFrame 2026-01-20 19:07:12 -08:00
Aleix Conchillo Flaqué
13c52e0e6d Merge pull request #3509 from pipecat-ai/aleix/nvidia-stt-tts-improvements
NVIDIA STT/TTS performance improvements
2026-01-20 16:39:12 -08:00
Aleix Conchillo Flaqué
a787fd9cd8 NVIDIATTSService: process incoming audio frame right away
Process audio as soon as we receive it from the generator. Previously, we were
reading from the generator and adding elements into a queue until there was no
more data, then we would process the queue.
2026-01-20 15:41:05 -08:00
Aleix Conchillo Flaqué
14495c425a NVIDIASTTService: no need for additional queue and task 2026-01-20 13:50:17 -08:00
Aleix Conchillo Flaqué
461bd0a2e0 update changelog for #3494 and #3499 2026-01-20 13:26:40 -08:00
Aleix Conchillo Flaqué
bd45ce2b4e Merge pull request #3499 from lukepayyapilli/fix/livekit-video-queue-memory-leak
fix(livekit): prevent memory leak when video_in_enabled is False
2026-01-20 13:21:21 -08:00
Aleix Conchillo Flaqué
a266644b06 Merge pull request #3494 from omChauhanDev/fix/uninterruptible-frame-handling
fix: preserve UninterruptibleFrames in __reset_process_queue
2026-01-20 13:19:40 -08:00
Mark Backman
03faadd7f9 Merge pull request #3508 from pipecat-ai/ss/log-daily-ids
Log Daily participant and meeting session IDs upon successful join in…
2026-01-20 15:43:48 -05:00
Aleix Conchillo Flaqué
bf43032652 Merge pull request #3504 from pipecat-ai/aleix/nvidia-stt-tts-error-handling
NVIDIA STT/TTS error handling
2026-01-20 09:41:08 -08:00
Sunah Suh
fa6f924b31 Log Daily participant and meeting session IDs upon successful join in Daily Transport 2026-01-20 11:31:17 -06:00
Aleix Conchillo Flaqué
a010a020fd add changelog fo 3504 2026-01-20 09:03:30 -08:00
Aleix Conchillo Flaqué
655006aff5 NvidiaSegmentedSTTService: simplify exception handling 2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
671dc8cd9b NvidiaSTTService: initialize client on StartFrame
Initialize client on StartFrame so errrors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
9a718ded1e NvidiaTTSService: initialize client on StartFrame
Initialize client on StartFrame so errrors are reported within the pipeline.
2026-01-20 08:58:14 -08:00
Aleix Conchillo Flaqué
024809b39a Merge pull request #3503 from pipecat-ai/aleix/ai-service-start-end-cancel
AIService: handle StartFrame/EndFrame/CancelFrame exceptions
2026-01-20 08:56:39 -08:00
Aleix Conchillo Flaqué
6cf0d53d00 AIService: handle StartFrame/EndFrame/CancelFrame exceptions
If AIService subclasses implement start()/stop()/cancel() and exception are not
handled, execution will not continue and therefore the originator frames will
not be pushed. This would cause the pipeline to not be started (i.e. StartFrame
would not be pushed downstream) or stopped properly.
2026-01-20 08:54:22 -08:00
kompfner
778dacc9a8 Merge pull request #3486 from pipecat-ai/pk/fix-nova-sonic-reset-conversation
Fix `AWSNovaSonicLLMService.reset_conversation()`
2026-01-20 10:07:38 -05:00
Paul Kompfner
06b3ecd2d6 In AWS Nova Sonic service, send the "interactive" user message (which triggers the bot response) only after sending the audio input start event, per the AWS team's recommendation 2026-01-20 09:56:25 -05:00
Paul Kompfner
b4d143e39b Add CHANGELOG for fixing AWSNovaSonicLLMService.reset_conversation() 2026-01-20 09:56:25 -05:00
Paul Kompfner
c89083e72e Improve 20e example to ask the bot to give a recap when loading a previous conversation from disk 2026-01-20 09:56:25 -05:00
Luke Payyapilli
1ac811ab32 chore: revert unrelated uv.lock changes 2026-01-20 09:19:43 -05:00
Luke Payyapilli
f6359d460e chore: install livekit as optional extra in CI instead of dev dep 2026-01-20 09:16:16 -05:00
Aleix Conchillo Flaqué
f03a7175c7 Merge pull request #3501 from pipecat-ai/aleix/improve-eval-numerical-word-prompt
scripts(eval): give examples to numerical word answers
2026-01-19 20:22:06 -08:00
Aleix Conchillo Flaqué
aed44c863a scripts(eval): give examples to numerical word answers
Some models need extra help.
2026-01-19 14:37:00 -08:00
ssillerom
fa5da3b0be change comments 2026-01-19 20:49:23 +01:00
ssillerom
7e82a0cf49 feature: Genesys AudioHook WebSocket protocol serializer for Pipecat 2026-01-19 20:45:22 +01:00
Mark Backman
cddd6d5b0a Merge pull request #3492 from pipecat-ai/mb/remove-unused-imports
Remove unused imports
2026-01-19 14:07:16 -05:00
Mark Backman
11cf891ac8 Manual updates for unused imports 2026-01-19 14:03:22 -05:00
Luke Payyapilli
c89ae717fe style: fix ruff formatting 2026-01-19 11:13:41 -05:00
Luke Payyapilli
562bdd3084 test: add livekit to dev deps and improve test clarity 2026-01-19 11:11:54 -05:00
Mark Backman
cc4c3650e1 Merge pull request #3491 from pipecat-ai/mb/update-release-evals
Add Camb TTS to release evals
2026-01-19 11:04:05 -05:00
Luke Payyapilli
dfc1f09b77 fix(livekit): prevent memory leak when video_in_enabled is False 2026-01-19 11:00:23 -05:00
Mark Backman
0b1a4792b8 Bump to latest azure-cognitiveservices-speech version, 1.47.0 2026-01-19 09:52:28 -05:00
Mark Backman
14bd3b1b32 Set Azure TTS default prosody rate to None 2026-01-19 09:19:57 -05:00
Mark Backman
f733e77496 AzureTTS: work around word ordering issue at 8khz sample rate 2026-01-19 09:13:41 -05:00
Filipi da Silva Fuchter
5fc46cc450 Merge pull request #3493 from omChauhanDev/fix/globally-unique-pc-id
fix: make SmallWebRTCConnection pc_id globally unique
2026-01-19 09:04:48 -05:00
Om Chauhan
4a9eb82f92 fix: preserve UninterruptibleFrames in __reset_process_queue 2026-01-18 20:39:13 +05:30
Om Chauhan
990d8386e4 fix: make SmallWebRTCConnection pc_id globally unique 2026-01-18 19:41:51 +05:30
Mark Backman
ce7d823770 Remove unused imports 2026-01-18 08:22:22 -05:00
Mark Backman
0b93c3f900 Add Camb TTS to release evals 2026-01-17 16:27:16 -05:00
Mark Backman
829c5f4604 Merge pull request #3169 from Incanta/hathora
Add Hathora STT and TTS services
2026-01-17 16:25:12 -05:00
Mike Seese
dc8ea615d9 add hathora to run-release-evals.py 2026-01-17 10:33:58 -08:00
Mike Seese
a3d206050d move hathora example as requested 2026-01-17 10:31:08 -08:00
Mike Seese
f48a567873 run the linter 2026-01-17 10:30:47 -08:00
Mark Backman
e69ccd8ea7 Merge pull request #3490 from pipecat-ai/mb/on-user-mute-events
Add on_user_mute_started and on_user_mute_stopped events
2026-01-17 11:05:15 -05:00
Mark Backman
11924bb980 Add on_user_mute_started and on_user_mute_stopped events 2026-01-17 11:01:46 -05:00
Mark Backman
af89154e96 Merge pull request #3489 from pipecat-ai/mb/fix-azure-tts-punctuation-spacing
fix: AzureTTSService punctuation spacing
2026-01-17 11:00:30 -05:00
Mark Backman
1485ea0831 Merge pull request #3488 from pipecat-ai/mb/on-user-turn-idle
Update on_user_idle to on_user_turn_idle
2026-01-17 11:00:16 -05:00
Mark Backman
e22bc777d8 Fix spacing for CJK languages 2026-01-17 09:04:50 -05:00
Mark Backman
043403fe23 fix: AzureTTSService punctuation spacing 2026-01-17 08:18:31 -05:00
Mark Backman
1e1160906e Update on_user_idle to on_user_turn_idle 2026-01-17 07:04:27 -05:00
Aleix Conchillo Flaqué
f7d3e63063 Merge pull request #3474 from pipecat-ai/fix/optional-member-access-function-call-cancel
Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
2026-01-16 22:06:45 -08:00
Paul Kompfner
6fa797c8e4 Fix AWS Nova Sonic reset_conversation(), which would previously error out.
Issues:
- After disconnecting, we were prematurely sending audio messages using the new prompt and content names, before the new prompt and content were created
- We weren't properly sending system instruction and conversation history messages to Nova Sonic with `"interactive": false`
2026-01-16 22:31:54 -05:00
Mark Backman
473d39791b Merge pull request #3482 from pipecat-ai/mb/user-idle-in-user-aggregator
Add UserIdleController, deprecate UserIdleProcessor
2026-01-16 18:47:10 -05:00
Aleix Conchillo Flaqué
2114abb8c6 add changelog file for 3484 2026-01-16 15:46:29 -08:00
Aleix Conchillo Flaqué
4fb4c26f55 Merge pull request #3484 from amichyrpi/main
Remove async_mode parameter from Mem0 storage
2026-01-16 15:44:52 -08:00
Mark Backman
2e8e574ea5 Add UserIdleController, deprecate UserIdleProcessor 2026-01-16 18:44:19 -05:00
Aleix Conchillo Flaqué
84c7e97be2 Merge pull request #3483 from pipecat-ai/aleix/throttle-user-speaking-frame
throttle user speaking frame
2026-01-16 15:29:37 -08:00
Amory Hen
a6e7c99d55 Remove async_mode parameter from Mem0 storage 2026-01-17 00:26:38 +01:00
Aleix Conchillo Flaqué
ac3fa7f91f BaseOuputTransport: minor cleanup 2026-01-16 15:15:49 -08:00
Aleix Conchillo Flaqué
6eadad53b2 BaseInputTransport: throttle UserSpeakingFrame 2026-01-16 15:15:49 -08:00
kompfner
b11150f31f Merge pull request #3480 from pipecat-ai/pk/fix-grok-realtime-smallwebrtc
Fix an issue where Grok Realtime would error out when running with Sm…
2026-01-16 15:46:27 -05:00
Paul Kompfner
836cf60611 Fix an issue where Grok Realtime would error out when running with SmallWebRTC transport.
The underlying issue was related to the fact that we were sending audio to Grok before we had configured the Grok session with our default input sample rate (16000), so Grok was interpreting those initial audio chunks as having its default sample rate (24000). We didn't see this issue when using the Daily transport simply because in our test environments Daily took a smidge longer than a reflexive (localhost) pure WebRTC connection, so we would only send audio to Grok *after* we had configured the Grok session with the desired sample rate.
2026-01-16 15:41:33 -05:00
James Hush
1c13ad95a5 Fix Pylance reportOptionalMemberAccess in _handle_function_call_cancel
Extract dictionary value to local variable and check for None before
accessing cancel_on_interruption attribute, since the dictionary values
are typed as Optional[FunctionCallInProgressFrame].
2026-01-16 15:04:26 -05:00
Mark Backman
1e8516e91d Merge pull request #3476 from pipecat-ai/mb/project-urls
Update project.urls for PyPI
2026-01-16 14:57:39 -05:00
Mark Backman
32c775311d Merge pull request #3471 from pipecat-ai/mb/fix-pydantic-2.12-docs
Revert pydantic 2.12 extra type annotation
2026-01-16 14:57:24 -05:00
Mark Backman
28d0bb98de Merge pull request #3472 from pipecat-ai/mb/whisker-dev
Add whisker_setup.py setup file to .gitignore
2026-01-16 14:55:48 -05:00
Aleix Conchillo Flaqué
a9a9f3aeaa Merge pull request #3462 from pipecat-ai/aleix/fix-min-words-transcription-aggregation
MinWordsUserTurnStartStrategy: don't aggregate transcriptions
2026-01-16 11:18:23 -08:00
Aleix Conchillo Flaqué
c2a0735975 MinWordsUserTurnStartStrategy: don't aggregate transcriptions
If we aggregate transcriptions we will get incorrect interruptions. For example,
if we have a strategy with min_words=3 and we say "One" and pause, then "Two"
and pause and then "Three", this would trigger the start of the turn when it
shouldn't. We should only look at the incoming transcription text and don't
aggregate it with the previous.
2026-01-16 11:16:06 -08:00
Aleix Conchillo Flaqué
41cb53f6c2 Merge pull request #3479 from pipecat-ai/aleix/turns-mute-to-user-mute
turns: move mute to user_mute
2026-01-16 11:11:50 -08:00
Aleix Conchillo Flaqué
58552af8fd examples(foundational): remote STTMuteFilter example 2026-01-16 11:07:20 -08:00
Aleix Conchillo Flaqué
c7ab87b0cc turns: move mute to user_mute 2026-01-16 11:07:20 -08:00
Mark Backman
11ecc5fdee Update project.urls for PyPI 2026-01-16 12:48:13 -05:00
kompfner
19fb3eed9f Merge pull request #3466 from pipecat-ai/pk/fix-aws-nova-sonic-rtvi-bot-output
Fix realtime (speech-to-speech) services' RTVI event compatibility
2026-01-16 09:56:13 -05:00
Mark Backman
b292b32374 Merge pull request #3461 from glennpow/glenn/websocket-headers
Allow WebsocketClientTransport to send custom headers
2026-01-15 20:26:36 -05:00
Mark Backman
63d1393bb0 Add whisker_setup.py to .gitignore 2026-01-15 20:21:25 -05:00
Glenn Powell
37914cb062 Removed import and added changelog entry. 2026-01-15 16:47:15 -08:00
Mark Backman
ec40696854 Revert pydantic 2.12 extra type annotation 2026-01-15 19:16:15 -05:00
Mike Seese
2249f3d673 add requested changes from code review 2026-01-15 15:27:56 -08:00
Mike Seese
d2df324f29 fix some bugs after testing changes 2026-01-15 15:27:56 -08:00
Mike Seese
67fdb0b659 use parent _settings dict instead of self._params pattern 2026-01-15 15:27:56 -08:00
Mike Seese
e77bdf66f9 add can_generate_metrics functions 2026-01-15 15:27:56 -08:00
Mike Seese
1b3b67779c switch hathora services to use InputParams pattern 2026-01-15 15:27:55 -08:00
Mike Seese
6c7e386391 remove traced_stt from run_stt 2026-01-15 15:27:55 -08:00
Mike Seese
ba25b279d6 fix issues with PR suggestions 2026-01-15 15:27:55 -08:00
Mike Seese
e7c83c19b6 port turn_start_strategies to the newer user_turn_strategies 2026-01-15 15:27:55 -08:00
Mike Seese
7be7fb49a3 remove turn_analyzer args from transport params 2026-01-15 15:27:54 -08:00
Mike Seese
bcccb4cbb3 put fallback sample_rate value in function arg 2026-01-15 15:27:54 -08:00
Mike Seese
e9f1d951d3 Apply suggestions from code review
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-01-15 15:27:54 -08:00
Mike Seese
e5632a9339 transition Hathora service to use the unified API and apply PR feedback
add Hathora to root files

Hathora run linter

added hathora changelog
2026-01-15 15:27:53 -08:00
Mike Seese
1510fb4fc0 add Hathora STT and TTS services 2026-01-15 15:26:52 -08:00
Mark Backman
64a1ad2649 Merge pull request #3470 from pipecat-ai/mb/fix-docs-0.0.99
Docs fixes after 0.0.99
2026-01-15 17:34:44 -05:00
Mark Backman
4458ca1d24 Mock FastAPI 2026-01-15 17:29:47 -05:00
Mark Backman
21aaa48e62 Fix pydantic issues impacting autodoc 2026-01-15 17:29:47 -05:00
Mark Backman
e75c241030 Merge pull request #3468 from pipecat-ai/mb/camb-cleanuo
Clean up CambTTSService
2026-01-15 17:16:28 -05:00
Mark Backman
60216048a8 Docs fixes after 0.0.99 2026-01-15 16:40:42 -05:00
Mark Backman
f3c2e29fb4 Clean up CambTTSService 2026-01-15 15:59:17 -05:00
Paul Kompfner
ce99924be4 Add CHANGELOG entry describing fix for the missing "bot-llm-text" RTVI event when using realtime (speech-to-speech) services 2026-01-15 15:55:39 -05:00
Paul Kompfner
5de80a60d4 Fix "bot-llm-text" not firing when using Grok Realtime 2026-01-15 15:30:00 -05:00
Paul Kompfner
5753762350 Fix "bot-llm-text" not firing when using OpenAI Realtime 2026-01-15 15:16:08 -05:00
Paul Kompfner
885b318b04 Fix "bot-llm-text" not firing when using Gemini Live 2026-01-15 15:03:45 -05:00
Paul Kompfner
7a22d58cf4 Fix "bot-llm-text" not firing when using AWS Nova Sonic 2026-01-15 14:56:50 -05:00
Mark Backman
c8e4b462c9 Merge pull request #3460 from pipecat-ai/mb/reorder-07-examples
Renumber the 07 foundational examples
2026-01-15 14:44:21 -05:00
Mark Backman
30a3f42255 Merge pull request #3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2026-01-15 14:43:12 -05:00
Neil Ruaro
26ddb2de2f minimal uv.lock update for camb-sdk 2026-01-16 03:18:01 +08:00
Neil Ruaro
f60eeaa212 reverted uv.lock, updated readthedocs.yaml, copyright year updates 2026-01-16 02:50:18 +08:00
Neil Ruaro
8cf72b36cb manually add camb-sdk to uv.lock, exclude camb from docs build 2026-01-16 02:26:38 +08:00
Neil Ruaro
38c3bcef96 exclude camb from docs build 2026-01-16 02:20:26 +08:00
Neil Ruaro
80604ba7b6 remove _update_settings method 2026-01-16 02:00:48 +08:00
Neil Ruaro
256c70c631 use UserTurnStrategies 2026-01-16 01:32:08 +08:00
Glenn Powell
0e3532c529 Allow WebsocketClientTransport to send custom headers 2026-01-15 09:31:48 -08:00
Neil Ruaro
9942fcfeb2 updated per PR reviews 2026-01-16 01:20:17 +08:00
Neil Ruaro
003c24ca6e Make model parameter explicit in docstring example 2026-01-16 01:18:37 +08:00
Neil Ruaro
ed120d014d Add model-specific sample rates, transport example, and fix audio buffer alignment 2026-01-16 01:18:37 +08:00
Neil Ruaro
e76a3d04f0 Update Camb TTS to 48kHz sample rate 2026-01-16 01:18:37 +08:00
Neil Ruaro
641d17007f Clean up Camb TTS service and tests 2026-01-16 01:18:37 +08:00
Neil Ruaro
9293b5f24a Migrate Camb TTS service from raw HTTP to official SDK
- Replace aiohttp with camb SDK (AsyncCambAI client)
- Add support for passing existing SDK client instance
- Simplify API: no longer requires aiohttp_session parameter
- Update example to use simplified initialization
- Rewrite tests to mock SDK client instead of HTTP servers
2026-01-16 01:18:37 +08:00
Neil Ruaro
c1f3cbd1d4 Yield TTSAudioRawFrame directly instead of calling private method 2026-01-16 01:18:37 +08:00
Neil Ruaro
78fa2ab65e Update default voice ID, fix MARS naming, and clean up example 2026-01-16 01:18:37 +08:00
Neil Ruaro
56da2caeed Update Camb.ai TTS inference options 2026-01-16 01:18:37 +08:00
Neil Ruaro
a541d65255 Update MARS model names to mars-flash, mars-pro, mars-instruct
Rename model identifiers from mars-8-* to the new naming convention:
- mars-8-flash -> mars-flash (default)
- mars-8 -> removed
- mars-8-instruct -> mars-instruct
- Added mars-pro
2026-01-16 01:18:37 +08:00
Neil Ruaro
a3d7e9eafe Address PR feedback: add --voice-id arg, remove test script
- Add --voice-id CLI argument to example (default: 2681)
- Remove test_camb_quick.py from examples/ (tests belong in tests/)
- Update docstring with new usage
2026-01-16 01:18:36 +08:00
Neil Ruaro
54933bea2a Rename changelog to PR number 2026-01-16 01:18:36 +08:00
Neil Ruaro
fcab9899cc Add changelog entry for Camb.ai TTS integration 2026-01-16 01:18:36 +08:00
Neil Ruaro
be098e85db Remove non-working Daily/WebRTC example
The Daily transport example had authentication issues. Keeping the
local audio example (07zb-interruptible-camb-local.py) which works.
2026-01-16 01:18:36 +08:00
Neil Ruaro
ed0ff46a87 added local test 2026-01-16 01:18:36 +08:00
Neil Ruaro
7ae0d651d6 added cambai tts integration 2026-01-16 01:18:36 +08:00
Mark Backman
efd4432cfb Renumber the 07 foundational examples 2026-01-15 10:26:17 -05:00
kompfner
24082b84f2 Merge pull request #3453 from pipecat-ai/pk/consistency-pass-on-user-started-stopped-speaking-frames
Do a consistency pass on how we're sending `UserStartedSpeakingFrame`…
2026-01-15 09:24:14 -05:00
Aleix Conchillo Flaqué
dcd5840341 Merge pull request #3455 from pipecat-ai/aleix/reset-user-turn-start-strategies
UserTurnController: reset user turn start strategies when turn triggered
2026-01-14 19:28:32 -08:00
Aleix Conchillo Flaqué
9e705ce768 UserTurnController: reset user turn start strategies when turn triggered 2026-01-14 18:20:29 -08:00
Mark Backman
965466cc09 Merge pull request #3454 from pipecat-ai/mb/external-turn-strategies-timeout
fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrat…
2026-01-14 20:15:31 -05:00
Mark Backman
f3993f1775 fix to make on_user_turn_stop_timeout work with ExternalUserTurnStrategies 2026-01-14 20:10:56 -05:00
Paul Kompfner
e107902b14 Do a consistency pass on how we're sending UserStartedSpeakingFrames and UserStoppedSpeakingFrames. The codebase is now consistent in broadcasting both types of frames up and downstream. 2026-01-14 18:47:15 -05:00
kompfner
e7b5ff49f4 Merge pull request #3447 from pipecat-ai/pk/add-pr-3420-to-changelog
Add PR 3420 to CHANGELOG (it was missing)
2026-01-14 15:33:44 -05:00
Paul Kompfner
e33172c44e Add PR 3420 to CHANGELOG (it was missing) 2026-01-14 15:33:07 -05:00
Mark Backman
3d858e8aa6 Merge pull request #3444 from pipecat-ai/mb/update-quickstart-0.0.99
Update quickstart example for 0.0.99
2026-01-14 10:29:55 -05:00
Mark Backman
eab059c49a Merge pull request #3446 from pipecat-ai/mb/add-3392-changelog
Add PR 3392 to changelog, linting cleanup
2026-01-14 10:28:57 -05:00
Mark Backman
4aaff04fb3 Add PR 3392 to changelog, linting cleanup 2026-01-14 09:43:17 -05:00
Mark Backman
cb364f3cab Update quickstart example for 0.0.99 2026-01-14 08:59:20 -05:00
Mark Backman
a9bfb090c3 Merge pull request #3287 from ashotbagh/feature/asyncai-multicontext-wss
Fix TTFB metric and add multi-context WebSocket support for Async TTS
2026-01-14 07:52:52 -05:00
Ashot
c4ae4025f3 Adjustments of Async TTS for multicontext websocket support 2026-01-14 16:33:30 +04:00
Ashot
15067c678d adapt Async TTS to updated AudioContextTTSService 2026-01-14 15:45:27 +04:00
Ashot
5ae592f38e Improve Async TTS interruption handling by using AudioContextTTSService class and add changelog fragments 2026-01-14 15:45:27 +04:00
Ashot
9cdbc56be3 Fix TTFB metric and add multi-context WebSocket support for Async TTS 2026-01-14 15:45:27 +04:00
Om Chauhan
38506f51f7 fix(openrouter): handle multiple system messages for Gemini models 2026-01-11 21:19:47 +05:30
Om Chauhan
1ceb01665f fix: treat language as first-class STT setting 2026-01-04 11:04:30 +05:30
Martin Liu
8dfc59be13 Include pts in incoming video and audio frames 2025-11-12 18:36:56 -05:00
717 changed files with 72765 additions and 20191 deletions

View File

@@ -0,0 +1,27 @@
{
"name": "pipecat-dev-skills",
"owner": {
"name": "Pipecat"
},
"metadata": {
"description": "Development workflow skills for contributing to the Pipecat project",
"version": "1.0.0"
},
"plugins": [
{
"name": "pipecat-dev",
"description": "Development workflow skills for contributing to the Pipecat project",
"version": "1.0.0",
"source": "./",
"skills": [
"./.claude/skills/changelog",
"./.claude/skills/cleanup",
"./.claude/skills/code-review",
"./.claude/skills/docstring",
"./.claude/skills/pr-description",
"./.claude/skills/pr-submit",
"./.claude/skills/update-docs"
]
}
]
}

5
.claude/settings.json Normal file
View File

@@ -0,0 +1,5 @@
{
"attribution": {
"commit": ""
}
}

View File

@@ -0,0 +1,61 @@
---
name: changelog
description: Create changelog files for important commits in a PR
---
Create changelog files for the important commits in this PR. The PR number is provided as an argument.
## Instructions
1. Skip changelog for: documentation-only, internal refactoring, test-only, CI changes.
2. First, check what commits are on the current branch compared to main:
```
git log main..HEAD --oneline
```
3. For each significant change, create a changelog file in the `changelog/` folder using the format:
Allowed types: `added`, `changed`, `deprecated`, `removed`, `fixed`, `security`, `performance`, `other`
- `{PR_NUMBER}.added.md` - for new features
- `{PR_NUMBER}.added.2.md`, `{PR_NUMBER}.added.3.md` - for additional entries of the same type
- `{PR_NUMBER}.changed.md` - for changes to existing functionality
- `{PR_NUMBER}.fixed.md` - for bug fixes
- `{PR_NUMBER}.deprecated.md` - for deprecations
- `{PR_NUMBER}.removed.md` - for removed features
- `{PR_NUMBER}.security.md` - for security fixes
- `{PR_NUMBER}.performance.md` - for performance improvements
- `{PR_NUMBER}.other.md` - for other changes
4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change. No line wrapping.
5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.
6. Use ⚠️ emoji prefix for breaking changes.
7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
**Good** (user-facing first, implementation detail as context):
```
- Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
```
**Bad** (implementation detail only, no user-facing framing):
```
- Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
```
Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
## Example
For PR #3519 with a new feature and a bug fix:
`changelog/3519.added.md`:
```
- Added `SomeNewFeature` for doing something useful.
```
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
```

View File

@@ -0,0 +1,307 @@
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
---
## Skill Overview
This skill analyzes all changes introduced in your branch and performs the following actions:
1. **Analyze Branch Changes**
- Review uncommitted changes and outgoing commits
2. **Refactor for Readability**
- Improve clarity, naming, structure, and modern Python usage
3. **Enhance Performance**
- Identify safe, conservative optimization opportunities
4. **Add Documentation**
- Apply Pipecat-style, Google-format docstrings
5. **Ensure Pattern Consistency**
- Match existing Pipecat services, pipelines, and examples
6. **Validate Examples**
- Ensure examples follow foundational patterns (e.g. `07-interruptible.py`)
---
## Usage
Invoke the skill using any of the following commands:
- "Clean up my branch code"
- "Refactor the changes in my branch"
- "Review and improve my branch code"
- `/cleanup`
---
## What This Skill Does
### 1. Analyze Branch Changes
The skill retrieves all uncommitted changes and outgoing commits to understand:
- New files added
- Modified files
- Code additions and deletions
- Overall scope and intent of changes
---
### 2. Code Refactoring
#### Readability Improvements
- Replace tuples with named classes or dataclasses
- Improve variable, method, and class naming
- Extract complex logic into well-named helper methods
- Add missing type hints
- Simplify nested or complex conditionals
- Replace deprecated methods and features
- Normalize formatting to match Pipecat style
#### Performance Enhancements
- Identify inefficient loops or repeated work
- Suggest appropriate data structures
- Optimize async workflows and I/O
- Remove redundant operations
> Performance changes are conservative and non-breaking.
---
### 3. Documentation
Documentation follows **Google-style docstrings**, consistent with Pipecat conventions.
#### Class Documentation
```python
class ExampleService:
"""Brief one-line description.
Detailed explanation of the class purpose, responsibilities,
and important behaviors.
Supported features:
- Feature 1
- Feature 2
- Feature 3
"""
```
#### Method Documentation
```python
def process_data(self, data: str, options: Optional[dict] = None) -> bool:
"""Process incoming data with optional configuration.
Args:
data: The input data to process.
options: Optional configuration dictionary.
Returns:
True if processing succeeded, False otherwise.
Raises:
ValueError: If data is empty or invalid.
"""
```
#### Pydantic Model Parameters
```python
class InputParams(BaseModel):
"""Configuration parameters for the service.
Parameters:
timeout: Request timeout in seconds.
retry_count: Number of retry attempts.
enable_logging: Whether to enable debug logging.
"""
timeout: Optional[float] = None
retry_count: int = 3
enable_logging: bool = False
```
---
### 4. Pattern Consistency Checks
#### Service Classes
- Correct inheritance (`TTSService`, `STTService`, `LLMService`)
- Consistent constructor signatures
- Frame emission patterns
- Metrics support:
- `can_generate_metrics()`
- TTFB metrics
- Usage metrics
- Alignment with similar existing services
#### Examples
Validated against `examples/foundational/07-interruptible.py`:
- Proper `create_transport()` usage
- Correct pipeline structure
- Task setup and observers
- Event handler registration
- Runner and bot entrypoint consistency
---
### 5. Specific Implementation Patterns
#### Service Implementation
```python
class ExampleTTSService(TTSService):
def __init__(self, *, api_key: Optional[str] = None, **kwargs):
super().__init__(**kwargs)
self._api_key = api_key or os.getenv("SERVICE_API_KEY")
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
try:
await self.start_ttfb_metrics()
yield TTSStartedFrame()
# ... processing ...
yield TTSAudioRawFrame(...)
finally:
await self.stop_ttfb_metrics()
```
---
#### Example Structure Pattern
```python
transport_params = {
"daily": lambda: DailyParams(...),
"twilio": lambda: FastAPIWebsocketParams(...),
"webrtc": lambda: TransportParams(...),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(...)
tts = SomeTTSService(...)
llm = OpenAILLMService(...)
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(...)
pipeline = Pipeline([...])
task = PipelineTask(pipeline, params=..., observers=[...])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
```
---
## Execution Flow
1. Fetch uncommitted and outgoing changes
2. Categorize files (services, examples, tests, utilities)
3. Analyze each file:
- Readability
- Performance
- Documentation
- Pattern consistency
4. Generate actionable recommendations
5. Apply Pipecat standards
---
## Examples
### Before: Tuple Usage
```python
def get_audio_info(self) -> Tuple[int, int]:
return (48000, 1)
```
### After: Named Class
```python
class AudioInfo:
"""Audio configuration information.
Parameters:
sample_rate: Sample rate in Hz.
num_channels: Number of audio channels.
"""
sample_rate: int
num_channels: int
def get_audio_info(self) -> AudioInfo:
return AudioInfo(sample_rate=48000, num_channels=1)
```
---
### Before: Missing Documentation
```python
class NewTTSService(TTSService):
def __init__(self, api_key: str, voice: str):
self._api_key = api_key
self._voice = voice
```
### After: Fully Documented
```python
class NewTTSService(TTSService):
"""Text-to-speech service using NewProvider API.
Streams PCM audio and emits TTSAudioRawFrame frames compatible
with Pipecat transports.
Supported features:
- Text-to-speech synthesis
- Streaming PCM audio
- Voice customization
- TTFB metrics
"""
def __init__(self, *, api_key: str, voice: str, **kwargs):
"""Initialize the NewTTSService.
Args:
api_key: API key for authentication.
voice: Voice identifier to use.
**kwargs: Additional arguments passed to the parent service.
"""
super().__init__(**kwargs)
self._api_key = api_key
self.set_voice(voice)
```
---
## Notes
- Non-breaking improvements only
- Backward compatibility preserved
- Conservative performance changes
- Google-style docstrings
- Pattern checks follow recent Pipecat code

View File

@@ -0,0 +1,107 @@
---
name: code-review
description: Automated code review for pull requests using multiple specialized agents
disable-model-invocation: true
allowed-tools: Bash(gh issue view:*), Bash(gh search:*), Bash(gh issue list:*), Bash(gh pr comment:*), Bash(gh pr diff:*), Bash(gh pr view:*), Bash(gh pr list:*)
---
Provide a code review for the given pull request.
**Agent assumptions (applies to all agents and subagents):**
- All tools are functional and will work without error. Do not test tools or make exploratory calls. Make sure this is clear to every subagent that is launched.
- Only call a tool if it is required to complete the task. Every tool call should have a clear purpose.
To do this, follow these steps precisely:
1. Launch a haiku agent to check if any of the following are true:
- The pull request is closed
- The pull request is a draft
- The pull request does not need code review (e.g. automated PR, trivial change that is obviously correct)
- Claude has already commented on this PR (check `gh pr view <PR> --comments` for comments left by claude)
If any condition is true, stop and do not proceed.
Note: Still review Claude generated PR's.
2. Launch a haiku agent to return a list of file paths (not their contents) for all relevant CLAUDE.md files including:
- The root CLAUDE.md file, if it exists
- Any CLAUDE.md files in directories containing files modified by the pull request
3. Launch a sonnet agent to view the pull request and return a summary of the changes
4. Launch 4 agents in parallel to independently review the changes. Each agent should return the list of issues, where each issue includes a description and the reason it was flagged (e.g. "CLAUDE.md adherence", "bug"). The agents should do the following:
Agents 1 + 2: CLAUDE.md compliance sonnet agents
Audit changes for CLAUDE.md compliance in parallel. Note: When evaluating CLAUDE.md compliance for a file, you should only consider CLAUDE.md files that share a file path with the file or parents.
Agent 3: Opus bug agent (parallel subagent with agent 4)
Scan for obvious bugs. Focus only on the diff itself without reading extra context. Flag only significant bugs; ignore nitpicks and likely false positives. Do not flag issues that you cannot validate without looking at context outside of the git diff.
Agent 4: Opus bug agent (parallel subagent with agent 3)
Look for problems that exist in the introduced code. This could be security issues, incorrect logic, etc. Only look for issues that fall within the changed code.
**CRITICAL: We only want HIGH SIGNAL issues.** Flag issues where:
- The code will fail to compile or parse (syntax errors, type errors, missing imports, unresolved references)
- The code will definitely produce wrong results regardless of inputs (clear logic errors)
- Clear, unambiguous CLAUDE.md violations where you can quote the exact rule being broken
Do NOT flag:
- Code style or quality concerns
- Potential issues that depend on specific inputs or state
- Subjective suggestions or improvements
If you are not certain an issue is real, do not flag it. False positives erode trust and waste reviewer time.
In addition to the above, each subagent should be told the PR title and description. This will help provide context regarding the author's intent.
5. For each issue found in the previous step by agents 3 and 4, launch parallel subagents to validate the issue. These subagents should get the PR title and description along with a description of the issue. The agent's job is to review the issue to validate that the stated issue is truly an issue with high confidence. For example, if an issue such as "variable is not defined" was flagged, the subagent's job would be to validate that is actually true in the code. Another example would be CLAUDE.md issues. The agent should validate that the CLAUDE.md rule that was violated is scoped for this file and is actually violated. Use Opus subagents for bugs and logic issues, and sonnet agents for CLAUDE.md violations.
6. Filter out any issues that were not validated in step 5. This step will give us our list of high signal issues for our review.
7. If issues were found, skip to step 8 to post comments.
If NO issues were found, post a summary comment using `gh pr comment` (if `--comment` argument is provided):
"No issues found. Checked for bugs and CLAUDE.md compliance."
8. Create a list of all comments that you plan on leaving. This is only for you to make sure you are comfortable with the comments. Do not post this list anywhere.
9. Post inline comments for each issue using `gh pr review` with inline comments. For each comment:
- Provide a brief description of the issue
- For small, self-contained fixes, include a committable suggestion block
- For larger fixes (6+ lines, structural changes, or changes spanning multiple locations), describe the issue and suggested fix without a suggestion block
- Never post a committable suggestion UNLESS committing the suggestion fixes the issue entirely. If follow up steps are required, do not leave a committable suggestion.
**IMPORTANT: Only post ONE comment per unique issue. Do not post duplicate comments.**
Use this list when evaluating issues in Steps 4 and 5 (these are false positives, do NOT flag):
- Pre-existing issues
- Something that appears to be a bug but is actually correct
- Pedantic nitpicks that a senior engineer would not flag
- Issues that a linter will catch (do not run the linter to verify)
- General code quality concerns (e.g., lack of test coverage, general security issues) unless explicitly required in CLAUDE.md
- Issues mentioned in CLAUDE.md but explicitly silenced in the code (e.g., via a lint ignore comment)
Notes:
- Use gh CLI to interact with GitHub (e.g., fetch pull requests, create comments). Do not use web fetch.
- Create a todo list before starting.
- You must cite and link each issue in inline comments (e.g., if referring to a CLAUDE.md, include a link to it).
- If no issues are found, post a comment with the following format:
---
## Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
---
- When linking to code in inline comments, follow the following format precisely, otherwise the Markdown preview won't render correctly: `https://github.com/OWNER/REPO/blob/FULL_SHA/path/to/file.py#L10-L15`
- Requires full git sha
- You must provide the full sha. Commands like `https://github.com/owner/repo/blob/$(git rev-parse HEAD)/foo/bar` will not work, since your comment will be directly rendered in Markdown.
- Repo name must match the repo you're code reviewing
- # sign after the file name
- Line range format is L[start]-L[end]
- Provide at least 1 line of context before and after, centered on the line you are commenting about (eg. if you are commenting about lines 5-6, you should link to `L4-7`)

View File

@@ -0,0 +1,256 @@
---
name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.
## Instructions
1. Determine what to document based on the argument:
**If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
- Use that file directly
**If a class name is provided** (e.g. `VADAnalyzer`):
- Search for `class ClassName` in `src/pipecat/`
- If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
2. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component
4. Apply documentation in this order:
- Module docstring (at top, after imports)
- Class docstrings
- `__init__` methods (always document constructor parameters)
- Public methods (not starting with `_`)
- Dataclass/config classes with field descriptions
5. Skip documentation for:
- Private methods (starting with `_`)
- Simple dunder methods (`__str__`, `__repr__`, `__post_init__`)
- Very simple pass-through properties
- **Already documented code** - If a class, method, or function already has a complete docstring that follows the project style, do not modify it. A docstring is complete if it has:
- A one-line summary
- Args section (if it has parameters)
- Returns section (if it returns something meaningful)
- Only add or improve documentation where it is missing or incomplete
## Module Docstring Format
```python
"""[One-line description of module purpose].
[Optional: Longer explanation of functionality, key classes, or use cases.]
"""
```
Example:
```python
"""Neuphonic text-to-speech service implementations.
This module provides WebSocket and HTTP-based integrations with Neuphonic's
text-to-speech API for real-time audio synthesis.
"""
```
## Class Docstring Format
```python
class ClassName:
"""One-line summary describing what the class does.
[Longer description explaining purpose, behavior, and key features.
Use action-oriented language.]
[Optional: Event handlers, usage notes, or important caveats.]
"""
```
Example:
```python
class FrameProcessor(BaseObject):
"""Base class for all frame processors in the pipeline.
Frame processors are the building blocks of Pipecat pipelines, they can be
linked to form complex processing pipelines. They receive frames, process
them, and pass them to the next or previous processor in the chain.
Event handlers available:
- on_before_process_frame: Called before a frame is processed
- on_after_process_frame: Called after a frame is processed
Example::
@processor.event_handler("on_before_process_frame")
async def on_before_process_frame(processor, frame):
...
@processor.event_handler("on_after_process_frame")
async def on_after_process_frame(processor, frame):
...
"""
```
Note: When listing event handlers, do NOT use backticks. Include an `Example::` section (with double colon for Sphinx) showing the decorator pattern and function signature for each event.
## Constructor (`__init__`) Format
```python
def __init__(self, *, param1: Type, param2: Type = default, **kwargs):
"""Initialize the [ClassName].
Args:
param1: Description of param1 and its purpose.
param2: Description of param2. Defaults to [default].
**kwargs: Additional arguments passed to parent class.
"""
```
Example:
```python
def __init__(
self,
*,
api_key: str,
voice_id: Optional[str] = None,
sample_rate: Optional[int] = 22050,
**kwargs,
):
"""Initialize the Neuphonic TTS service.
Args:
api_key: Neuphonic API key for authentication.
voice_id: ID of the voice to use for synthesis.
sample_rate: Audio sample rate in Hz. Defaults to 22050.
**kwargs: Additional arguments passed to parent InterruptibleTTSService.
"""
```
## Method Docstring Format
```python
async def method_name(self, param1: Type) -> ReturnType:
"""One-line summary of what method does.
[Longer description if behavior isn't obvious.]
Args:
param1: Description of param1.
Returns:
Description of return value.
Raises:
ExceptionType: When this exception is raised.
"""
```
Example:
```python
async def put(self, item: Tuple[Frame, FrameDirection, FrameCallback]):
"""Put an item into the priority queue.
System frames (`SystemFrame`) have higher priority than any other
frames. If a non-frame item is provided it will have the highest priority.
Args:
item: The item to enqueue.
"""
```
## Dataclass/Config Format
```python
@dataclass
class ConfigName:
"""One-line description of configuration.
[Explanation of when/how to use this config.]
Parameters:
field1: Description of field1.
field2: Description of field2. Defaults to [default].
"""
field1: Type
field2: Type = default_value
```
Example:
```python
@dataclass
class FrameProcessorSetup:
"""Configuration parameters for frame processor initialization.
Parameters:
clock: The clock instance for timing operations.
task_manager: The task manager for handling async operations.
observer: Optional observer for monitoring frame processing events.
"""
clock: BaseClock
task_manager: BaseTaskManager
observer: Optional[BaseObserver] = None
```
## Enum Documentation Format
```python
class EnumName(Enum):
"""One-line description of the enum purpose.
[Longer description of how the enum is used.]
Parameters:
VALUE1: Description of VALUE1.
VALUE2: Description of VALUE2.
"""
VALUE1 = 1
VALUE2 = 2
```
## Writing Style Guidelines
- **Concise and professional** - No casual language or filler words
- **Action-oriented** - Start with verbs: "Processes...", "Manages...", "Converts..."
- **Purpose before implementation** - Explain WHY before HOW
- **Clear parameter descriptions** - Include type hints, defaults, and purpose
- **No redundant type info** - Type hints are in the signature, don't repeat in description
- **Use backticks for code references** - Wrap class names, method names, event names, parameter names, and code snippets in backticks
Good: "Neuphonic API key for authentication."
Bad: "str: The API key (string) that is used for authenticating with Neuphonic."
Good: "Triggers `on_speech_started` when the `VADAnalyzer` detects speech."
Bad: "Triggers on_speech_started when the VADAnalyzer detects speech."
## Deprecation Notice Format
When documenting deprecated code:
```python
"""[Description].
.. deprecated:: X.X.X
`ClassName` is deprecated and will be removed in a future version.
Use `NewClassName` instead.
"""
```
## Checklist
Before finishing, verify:
- [ ] Module has a docstring at the top (after copyright header and imports)
- [ ] All public classes have docstrings
- [ ] All `__init__` methods document their parameters
- [ ] All public methods have docstrings with Args/Returns/Raises as needed
- [ ] Dataclasses use "Parameters:" section for field descriptions
- [ ] Enums document each value in "Parameters:" section
- [ ] Writing is concise and action-oriented
- [ ] No documentation added to private methods (starting with `_`)
- [ ] Existing complete docstrings were left unchanged

View File

@@ -0,0 +1,128 @@
---
name: pr-description
description: Update a GitHub PR description with a summary of changes
---
Update a GitHub pull request description based on the changes in the PR.
## Arguments
```
/pr-description <PR_NUMBER> [--fixes <ISSUE_NUMBERS>]
```
- `PR_NUMBER` (required): The pull request number to update
- `--fixes` (optional): Comma-separated issue numbers that this PR fixes (e.g., `--fixes 123,456`)
Examples:
- `/pr-description 3534`
- `/pr-description 3534 --fixes 123`
- `/pr-description 3534 --fixes 123,456,789`
## Instructions
1. First, gather information about the PR:
- Use GitHub plugin to get PR details (title, current description, base branch)
- Use local git to get commits: `git log main..HEAD --oneline`
- Use local git to get the diff: `git diff main..HEAD`
- Parse any `--fixes` argument for issue numbers
2. Check the existing PR description:
- If it already has a complete, accurate description that reflects the changes, do nothing
- If it's missing sections, incomplete, or outdated compared to the actual changes, proceed to update
- If it only has the template placeholder text, generate a full description
3. Analyze the changes:
- Understand the purpose of each commit
- Identify any breaking changes (API changes, removed features, behavior changes)
- Look for new features, bug fixes, refactoring, or documentation changes
- Collect issue numbers from:
- The `--fixes` argument (if provided)
- Commit messages (patterns like "Fixes #123", "Closes #456", "Resolves #789")
4. Generate or update the PR description with these sections:
## PR Description Format
### Summary (always include)
Brief bullet points describing what changed and why. Focus on the *purpose* and *impact*, not implementation details.
```markdown
## Summary
- Added X to enable Y
- Fixed bug where Z would happen
- Refactored W for better maintainability
```
### Breaking Changes (include only if applicable)
Document any changes that affect existing users or APIs.
```markdown
## Breaking Changes
- `ClassName.method()` now requires a `param` argument
- Removed deprecated `old_function()` - use `new_function()` instead
```
### Testing (include when non-obvious)
How to verify the changes work. Skip for trivial changes.
```markdown
## Testing
- Run `uv run pytest tests/test_feature.py` to verify the fix
- Example usage: `uv run examples/new_feature.py`
```
### Fixes (include if issues are provided or found in commits)
List issues this PR fixes. GitHub will automatically close these issues when the PR is merged.
```markdown
## Fixes
- Fixes #123
- Fixes #456
```
Note: Use "Fixes #X" format (not "Closes" or "Resolves") for consistency. Each issue should be on its own line with "Fixes" to ensure GitHub auto-closes them.
## Guidelines
- **Be concise** - Reviewers should understand the PR in 30 seconds
- **Focus on why** - The diff shows *what* changed, explain *why*
- **Skip empty sections** - Only include sections that have content
- **Use bullet points** - Easier to scan than paragraphs
- **Don't duplicate the diff** - Avoid listing every file or line changed
## Example Output
```markdown
## Summary
- Added `/docstring` skill for documenting Python modules with Google-style docstrings
- Skill finds classes by name and handles conflicts when multiple matches exist
- Skips already-documented code to avoid unnecessary changes
## Testing
/docstring ClassName
## Fixes
- Fixes #123
```
## Checklist
Before updating the PR:
- [ ] Verified existing description needs updating (not already complete)
- [ ] Summary accurately reflects the changes
- [ ] Breaking changes are clearly documented (if any)
- [ ] No unnecessary sections included
- [ ] Description is concise and scannable

View File

@@ -0,0 +1,28 @@
---
name: pr-submit
description: Create and submit a GitHub PR from the current branch
---
Submit the current changes as a GitHub pull request.
## Instructions
1. Check the current state of the repository:
- Run `git status` to see staged, unstaged, and untracked changes
- Run `git diff` to see current changes
- Run `git log --oneline -10` to see recent commits
2. If there are uncommitted changes relevant to the PR:
- Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
- Create a new branch based on the current branch
- Commit the changes using multiple commits if the changes are unrelated
3. Push the branch and create the PR:
- Push with `-u` flag to set upstream tracking
- Create the PR using `gh pr create`
4. After the PR is created:
- Run `/changelog <pr_number>` to generate changelog files, then commit and push them
- Run `/pr-description <pr_number>` to update the PR description
5. Return the PR URL to the user.

View File

@@ -0,0 +1,306 @@
---
name: update-docs
description: Update documentation pages to match source code changes on the current branch
---
Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
## Arguments
```
/update-docs [DOCS_PATH]
```
- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
Examples:
- `/update-docs /Users/me/src/docs`
- `/update-docs`
## Instructions
### Step 1: Resolve docs path
If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
Verify the path exists and contains `server/services/` subdirectory.
### Step 2: Create docs branch
Get the current pipecat branch name:
```bash
git rev-parse --abbrev-ref HEAD
```
In the docs repo, create a new branch off main with a matching name:
```bash
cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
```
For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
All doc edits in subsequent steps are made on this branch.
### Step 3: Detect changed source files
Run:
```bash
git diff main..HEAD --name-only
```
Filter to files that could affect documentation:
- `src/pipecat/services/**/*.py` (service implementations)
- `src/pipecat/transports/**/*.py` (transport implementations)
- `src/pipecat/serializers/**/*.py` (serializer implementations)
- `src/pipecat/processors/**/*.py` (processor implementations)
- `src/pipecat/audio/**/*.py` (audio utilities)
- `src/pipecat/turns/**/*.py` (turn management)
- `src/pipecat/observers/**/*.py` (observers)
- `src/pipecat/pipeline/**/*.py` (pipeline core)
Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
### Step 4: Map source files to doc pages
For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
### Step 5: Analyze each source-doc pair
For each mapped pair:
1. **Read the full source file** to understand current state
2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
3. **Read the current doc page** in full
Identify what changed by comparing source to docs:
- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
- **Class names / imports**: Check if Usage examples reference correct names
- **Behavioral changes**: Check if Notes section needs updating
### Step 6: Make targeted edits
For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
#### Rules
- **Never remove content** unless the corresponding source code was removed
- **Never rewrite sections** that are already accurate
- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
- **Keep descriptions concise** — match the tone and length of surrounding content
- **Preserve CardGroup, links, and examples** unless they reference removed functionality
- **Don't touch frontmatter** unless the class was renamed
#### Section-specific guidance
**Configuration** (constructor params):
- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
- Add new params in logical order (required first, then optional)
- Remove params that no longer exist in source
- Update types/defaults that changed
**InputParams** (runtime settings):
- Use markdown table format: `| Parameter | Type | Default | Description |`
- Match the field names and types from the `InputParams(BaseModel)` class
- Include the default values from the source
**Usage** (code examples):
- Update import paths, class names, and parameter names
- Only modify examples if they would break or be misleading with the new API
- Don't rewrite working examples just to add new optional params
**Notes**:
- Add notes for new behavioral gotchas or breaking changes
- Remove notes about limitations that were fixed
- Keep existing notes that are still accurate
**Event Handlers**:
- Update the event table and example code
- Add new events, remove deleted ones
- Update handler signatures if they changed
**Overview / Key Features / Prerequisites**:
- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
- Most PRs will NOT need changes to these sections
### Step 7: Update guides
Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
```bash
grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
```
For each guide that references changed code:
1. Read the full guide
2. Update class names, parameter names, import paths, and code examples that are now incorrect
3. **Don't rewrite prose** — only fix the specific references that changed
4. Leave guides alone if they reference the service generally but don't use any changed APIs
Guide directories:
- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
### Step 8: Identify doc gaps
After processing all mapped pairs, check for two kinds of gaps:
**Missing pages**: Source files that had no doc page mapping (neither tier 1, 2, nor 3) and are not marked as "(skip)". For each, tell the user:
- The source file path
- The main class(es) it defines
- Whether a new doc page should be created
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
If the user wants a new page, do all three of the following:
#### 8a: Create the doc page
Create the new `.mdx` file using this template structure:
```
---
title: "Service Name"
description: "Brief description"
---
## Overview
[Description from class docstring or source analysis]
<CardGroup cols={2}>
[Cards for API reference and examples if available]
</CardGroup>
## Installation
```bash
pip install "pipecat-ai[package-name]"
```
## Prerequisites
[Environment variables and account setup]
## Configuration
[ParamField entries for constructor params]
## InputParams
[Table of InputParams fields, if the service has them]
## Usage
### Basic Setup
```python
[Minimal working example]
```
## Notes
[Important caveats]
## Event Handlers
[Event table and example code]
```
#### 8b: Add to docs.json
Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
Find the matching group in the navigation structure:
- **STT** → `"group": "Speech-to-Text"` under Services
- **TTS** → `"group": "Text-to-Speech"` under Services
- **LLM** → `"group": "LLM"` under Services
- **S2S** → `"group": "Speech-to-Speech"` under Services
- **Transport** → `"group": "Transport"` under Services
- **Serializer** → `"group": "Serializers"` under Services
- **Image generation** → `"group": "Image Generation"` under Services
- **Video** → `"group": "Video"` under Services
- **Memory** → `"group": "Memory"` under Services
- **Vision** → `"group": "Vision"` under Services
- **Analytics** → `"group": "Analytics & Monitoring"` under Services
Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
```json
{
"group": "Speech-to-Text",
"pages": [
"server/services/stt/assemblyai",
"server/services/stt/aws",
...
"server/services/stt/foo",
...
]
}
```
#### 8c: Add to supported-services.mdx
Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
Use this format:
```
| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
```
To determine the correct values:
- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
- If no pip dependencies are required, use `No dependencies required` instead.
Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
### Step 9: Output summary
After all edits are complete, print a summary:
```
## Documentation Updates
### Updated reference pages
- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
### Updated guides
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
### New service pages
- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
### Unmapped source files
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
### Skipped files
- `src/pipecat/services/ai_service.py` — internal base class
```
## Guidelines
- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
## Checklist
Before finishing, verify:
- [ ] All changed source files were checked against the mapping table
- [ ] Each doc page edit matches the actual source code change (not guessed)
- [ ] No content was removed unless the corresponding source was removed
- [ ] New parameters have accurate types and defaults from source
- [ ] Formatting matches the existing page style
- [ ] Guides referencing changed APIs were checked and updated
- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
- [ ] Unmapped files were reported to the user

View File

@@ -0,0 +1,79 @@
# Source-to-Doc Mapping
Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
## Name mismatches
These source paths don't follow the standard `services/{provider}/{type}.py``server/services/{type}/{provider}.mdx` pattern.
| Source path | Doc page |
|---|---|
| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
| `services/google/google.py` | (shared base — check which services use it) |
| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
## Skip list
These files should never trigger doc updates.
| Pattern | Reason |
|---|---|
| `services/ai_service.py` | Internal base class |
| `services/stt_service.py` | Internal base class |
| `services/tts_service.py` | Internal base class |
| `services/llm_service.py` | Internal base class |
| `services/websocket_service.py` | Internal base class |
| `services/openai_realtime_beta/**` | Deprecated |
| `services/openai_realtime/**` | Deprecated |
| `services/gemini_multimodal_live/**` | Deprecated |
| `services/aws/agent_core.py` | Internal |
| `services/aws/sagemaker/**` | No doc page |
| `transports/base_input.py` | Internal base class |
| `transports/base_output.py` | Internal base class |
| `transports/websocket/client.py` | No doc page |
| `serializers/base_serializer.py` | Internal base class |
| `serializers/protobuf.py` | Internal |
| `processors/audio/**` | Internal |
| `pipeline/pipeline.py` | Core architecture, not a service doc |
## Pattern matching
For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
| Source pattern | Doc pattern |
|---|---|
| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
| `observers/**` | `server/utilities/observers/` (match by class name) |
| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
If the doc file doesn't exist at the resolved path, the file is **unmapped**.
## Search fallback
For files that don't match any table or pattern above:
1. Extract the main class name(s) from the source file
2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
3. If found in a doc page, use that as the mapping

View File

@@ -29,11 +29,22 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Run tests with coverage
run: |

View File

@@ -86,7 +86,7 @@ jobs:
fi
# Validate fragment types
VALID_TYPES="added changed deprecated removed fixed security other"
VALID_TYPES="added changed deprecated removed fixed performance security other"
INVALID_FRAGMENTS=""
for file in changelog/*.md; do

View File

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
python-version: ['3.10.19', '3.11.14', '3.12.12', '3.13.12']
name: Python ${{ matrix.python-version }}
steps:
@@ -40,20 +40,10 @@ jobs:
uv python install ${{ matrix.python-version }}
uv python pin ${{ matrix.python-version }}
- name: Test uv sync with all extras (Python < 3.13)
if: "!startsWith(matrix.python-version, '3.13.')"
- name: Test uv sync with all extras
run: |
uv sync --group dev --all-extras --no-extra krisp
- name: Test uv sync without PyTorch extras (Python 3.13+)
if: startsWith(matrix.python-version, '3.13.')
run: |
uv sync --group dev --all-extras \
--no-extra krisp \
--no-extra local-smart-turn \
--no-extra moondream \
--no-extra mlx-whisper
- name: Verify installation
run: |
uv run python --version

View File

@@ -33,11 +33,22 @@ jobs:
- name: Install system packages
run: |
sudo apt-get update
sudo apt-get install -y portaudio19-dev
- name: Install dependencies
run: |
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain --extra websocket
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket
- name: Test with pytest
run: |

147
.github/workflows/update-docs.yml vendored Normal file
View File

@@ -0,0 +1,147 @@
name: Update Documentation on PR Merge
on:
pull_request_target:
types: [closed]
branches: [main]
paths:
- "src/pipecat/services/**"
- "src/pipecat/transports/**"
- "src/pipecat/serializers/**"
- "src/pipecat/processors/**"
- "src/pipecat/audio/**"
- "src/pipecat/turns/**"
- "src/pipecat/observers/**"
- "src/pipecat/pipeline/**"
workflow_dispatch:
inputs:
pr_number:
description: "PR number to generate docs for"
required: true
type: string
jobs:
update-docs:
if: >-
github.event_name == 'workflow_dispatch' ||
github.event.pull_request.merged == true
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: read
pull-requests: read
id-token: write
steps:
- name: Checkout pipecat
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Checkout docs
uses: actions/checkout@v4
with:
repository: pipecat-ai/docs
token: ${{ secrets.DOCS_SYNC_TOKEN }}
path: _docs
- name: Resolve PR number
id: pr
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
else
echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
fi
- name: Update documentation
uses: anthropics/claude-code-action@v1
env:
DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
github_token: ${{ secrets.GITHUB_TOKEN }}
prompt: |
You are updating documentation for the pipecat-ai/docs repository based on
changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
## Setup
1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
3. The docs repository is checked out at `./_docs/`
## Get the diff
Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
Filter to source files matching the directories listed in SKILL.md Step 3.
If no relevant source files were changed, exit with "No documentation changes needed."
## Follow the skill instructions
Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
### Docs path
Use `./_docs/` — it's already checked out. Do not ask for a path.
### Branch management
- Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
- Work inside `./_docs/` for all doc edits and git operations
- Check if the branch already exists on the remote:
```bash
cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
```
- If it exists: check it out (supports workflow re-runs)
- If not: create it from main
### Git config
Before committing in `_docs`, set:
```bash
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
```
### No interactive questions
Do not ask questions. If you encounter gaps (unmapped files, missing sections,
ambiguous changes), note them in the PR body under "## Gaps identified".
### Creating the docs PR
After committing all changes in `_docs`, push and create a PR:
```bash
cd _docs
git push -u origin docs/pr-${{ steps.pr.outputs.number }}
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
--repo pipecat-ai/docs \
--label auto-docs \
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
--body "$(cat <<'BODY'
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
## Changes
<summarize each doc page updated and what changed>
## Gaps identified
<any unmapped files, missing doc pages, or missing sections — or "None">
BODY
)"
```
### Re-run handling
If `gh pr create` fails because a PR from that branch already exists,
push the updated commits and use `gh pr edit` to update the body instead.
### No-op
If after analyzing the diff you determine no documentation changes are needed
(e.g., only skip-listed files changed, or changes don't affect public API docs),
exit cleanly without creating a branch or PR. Output "No documentation changes needed."
## Important rules
- Only modify files inside `./_docs/` — never modify pipecat source code
- Follow the conservative editing rules from SKILL.md Step 6
- Read each doc page fully before editing (SKILL.md Guidelines)
- Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
claude_args: |
--model claude-sonnet-4-5-20250929
--max-turns 30
--allowedTools "Read,Write,Edit,Glob,Grep,Bash"

16
.gitignore vendored
View File

@@ -4,7 +4,14 @@ __pycache__/
*~
venv
.venv
/.idea
.idea
.gradle
.next
next-env.d.ts
local.properties
*.log
*.lock
smart_turn_audio_log
#*#
# Distribution / Packaging
@@ -27,7 +34,7 @@ share/python-wheels/
*.egg
MANIFEST
.DS_Store
.env
.env*
fly.toml
# Examples
@@ -51,4 +58,7 @@ docs/api/_build/
docs/api/api
# uv
.python-version
.python-version
# Pipecat
whisker_setup.py

File diff suppressed because it is too large Load Diff

157
CLAUDE.md Normal file
View File

@@ -0,0 +1,157 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
### Key Directories
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
class MyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def __init__(self, param1: str, **kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

View File

@@ -25,7 +25,6 @@ Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
- Usage instructions with Pipecat Pipeline
@@ -110,7 +109,6 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
@@ -233,24 +231,137 @@ def can_generate_metrics(self) -> bool:
return True
```
### Dynamic Settings Updates
### Service Settings
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
#### Defining your Settings class
Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:
```python
async def set_language(self, language: Language):
"""Set the recognition language and reconnect.
from dataclasses import dataclass, field
Args:
language: The language to use for speech recognition.
from pipecat.services.settings import TTSSettings, NOT_GIVEN
@dataclass
class MyTTSSettings(TTSSettings):
"""Settings for MyTTS service.
Parameters:
speaking_rate: Speed multiplier (0.52.0).
"""
logger.info(f"Switching STT language to: [{language}]")
self._settings["language"] = language
await self._disconnect()
await self._connect()
speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
```
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
**What goes in Settings vs. `__init__` params:**
| Belongs in Settings | Stays as `__init__` params |
| -------------------------------------------------------- | ----------------------------------------- |
| Model name, voice, language | API keys, auth tokens |
| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides |
| Anything users may want to change mid-session | Audio encoding, sample format |
| | Connection parameters (timeouts, retries) |
The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
#### Wiring settings into `__init__`
Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
```python
from typing import Optional
class MyTTSService(TTSService):
Settings = MyTTSSettings
_settings: Settings
def __init__(
self,
*,
api_key: str,
settings: Optional[Settings] = None,
**kwargs,
):
# 1. Defaults — every field has a real value (store mode).
default_settings = self.Settings(
model="my-model-v1",
voice="default-voice",
language="en",
speaking_rate=1.0,
)
# 2. Merge caller overrides (only given fields win).
if settings is not None:
default_settings.apply_update(settings)
# 3. Pass the fully-populated settings to the base class.
super().__init__(settings=default_settings, **kwargs)
# 4. Init-only config stored separately.
self._api_key = api_key
```
This pattern lets callers override only what they care about:
```python
# Uses all defaults
svc = MyTTSService(api_key="sk-xxx")
# Overrides just the voice — access Settings through the service class
svc = MyTTSService(
api_key="sk-xxx",
settings=MyTTSService.Settings(voice="custom-voice"),
)
```
#### Reacting to runtime changes
AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the connection if needed."""
changed = await super()._update_settings(update)
if not changed:
return changed
await self._disconnect()
await self._connect()
return changed
```
The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
Note that, in this example, the service requires a reconnect to apply the new language. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
changed = await super()._update_settings(update)
if not changed:
return changed
if "language" in changed:
await self._update_language()
else:
# TODO: this should be temporary - handle changes to other settings soon!
self._warn_unhandled_updated_settings(changed.keys() - {"language"})
return changed
```
### Sample Rate Handling
@@ -260,7 +371,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
async def start(self, frame: StartFrame):
"""Start the service."""
await super().start(frame)
self._settings["output_format"]["sample_rate"] = self.sample_rate
self._settings.output_sample_rate = self.sample_rate
await self._connect()
```

View File

@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
```
2. Choose the appropriate type:
- `added.md` - New features
- `changed.md` - Changes in existing functionality
- `deprecated.md` - Soon-to-be removed features
- `removed.md` - Removed features
- `fixed.md` - Bug fixes
- `performance.md` - Performance improvements
- `security.md` - Security fixes
- `other.md` - Other changes (documentation, dependencies, etc.)
@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md
```markdown
- Updated service configuration:
- Changed default timeout to 30 seconds
- Added retry logic for failed connections
```

View File

@@ -55,6 +55,20 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
### 🤖 Claude Code Skills
Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
```
claude plugin marketplace add pipecat-ai/skills
```
and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +85,20 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -163,6 +178,15 @@ You can get started with Pipecat running on your local machine, then move your a
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
### Claude Code Skills
Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
```
claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills
```
### Running tests
To run all tests, from the root directory:

1
changelog/3978.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`, `sarvam-105b` and `sarvam-105b-32k`

1
changelog/4013.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created.

View File

@@ -0,0 +1 @@
- Added context prewarming path for `InworldTTSService` to improve first audio latency

View File

@@ -0,0 +1 @@
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp VIVA SDK (requires `krisp_audio`).

View File

@@ -0,0 +1 @@
- Modeified `InworldTTSService` to close context at end of turn instead of relying on idle timeout

1
changelog/4031.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.

View File

@@ -0,0 +1 @@
- Added Gemini 3 support to the Gemini Live service.

View File

@@ -0,0 +1 @@
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has changed from `2.0` to `3.0` seconds.

1
changelog/4089.added.md Normal file
View File

@@ -0,0 +1 @@
- Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use `system_instruction` to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion).

View File

@@ -0,0 +1 @@
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction.

View File

@@ -0,0 +1 @@
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring `settings.system_instruction`. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored.

1
changelog/4089.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior.

1
changelog/4092.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings.

1
changelog/4115.added.md Normal file
View File

@@ -0,0 +1 @@
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD settings.

1
changelog/4117.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `domain` parameter to `AssemblyAISTTSettings` for specialized recognition modes such as Medical Mode (`domain="medical-v1"`).

1
changelog/4119.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `NovitaLLMService` for using Novita AI's LLM models via their OpenAI-compatible API.

1
changelog/4120.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer resources are properly released when no longer needed. Custom `VADAnalyzer` subclasses can override `cleanup()` to free any held resources.

1
changelog/4125.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was deferred while waiting for the bot to finish responding and `turn_complete` never arrived. As a possible root-cause fix, `turn_complete` messages are now handled even if they lack `usage_metadata`. As a fallback, the deferred `EndFrame` now has a 30-second safety timeout.

1
changelog/4126.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit.

1
changelog/4127.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`, causing the `LLMAssistantAggregator` to finalize the context before the final sentence arrived.

View File

@@ -0,0 +1 @@
- `FrameFilter`, `IdentityFilter`, and `NullFilter` now use direct mode for lower-latency synchronous frame processing, bypassing the async queue.

View File

@@ -0,0 +1 @@
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer `system_instruction` from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set.

1
changelog/4135.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed audio crackling and popping in recordings when both user and bot are speaking. `AudioBufferProcessor` no longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output.

View File

@@ -0,0 +1 @@
- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.

View File

@@ -0,0 +1 @@
- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).

View File

@@ -0,0 +1 @@
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly.

View File

@@ -42,7 +42,7 @@ This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
@@ -74,7 +74,6 @@ start _build/html/index.html
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── requirements-playht.txt # PlayHT-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
```

View File

@@ -91,6 +91,25 @@ autodoc_mock_imports = [
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
# Pydantic v2 compatibility issues in third-party SDKs
"hume",
"hume.tts",
"hume.tts.types",
"cartesia",
"camb",
"sarvamai",
"openpipe",
"openai.types.beta.realtime",
"langchain_core",
"langchain_core.messages",
# FastAPI - Pydantic v2 compatibility issues during Sphinx autodoc
"fastapi",
"fastapi.applications",
"fastapi.routing",
"fastapi.params",
"fastapi.middleware",
"fastapi.responses",
"uvicorn",
]
# HTML output settings

View File

@@ -31,6 +31,9 @@ AZURE_DALLE_API_KEY=...
AZURE_DALLE_ENDPOINT=https://...
AZURE_DALLE_MODEL=...
# Camb.ai
CAMB_API_KEY=...
# Cartesia
CARTESIA_API_KEY=...
CARTESIA_VOICE_ID=...
@@ -40,11 +43,12 @@ CEREBRAS_API_KEY=...
# Daily
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
DAILY_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
SAGEMAKER_ENDPOINT_NAME=...
SAGEMAKER_STT_ENDPOINT_NAME=...
SAGEMAKER_TTS_ENDPOINT_NAME=...
# DeepSeek
DEEPSEEK_API_KEY=...
@@ -97,9 +101,14 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_API_KEY=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LemonSlice
LEMONSLICE_API_KEY=...
LEMONSLICE_AGENT_ID=...
# LiveKit
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
@@ -118,6 +127,9 @@ MISTRAL_API_KEY=...
# Neuphonic
NEUPHONIC_API_KEY=...
# Novita
NOVITA_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
@@ -139,10 +151,6 @@ KOALA_ACCESS_KEY=...
# Piper
PIPER_BASE_URL=...
# PlayHT
PLAYHT_USER_ID=...
PLAYHT_API_KEY=...
# Plivo
PLIVO_AUTH_ID=...
PLIVO_AUTH_TOKEN=...
@@ -150,6 +158,10 @@ PLIVO_AUTH_TOKEN=...
# Qwen
QWEN_API_KEY=...
# Resemble AI
RESEMBLE_API_KEY=
RESEMBLE_VOICE_UUID=
# Rime
RIME_API_KEY=...
RIME_VOICE_ID=...
@@ -167,6 +179,9 @@ SENTRY_DSN=...
SIMLI_API_KEY=...
SIMLI_FACE_ID=...
# Smallest
SMALLEST_API_KEY=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=...
FAL_SMART_TURN_API_KEY=...

View File

@@ -16,7 +16,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.piper.tts import PiperTTSService
from pipecat.services.piper.tts import PiperHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -24,9 +24,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -39,8 +38,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"),
aiohttp_session=session,
sample_rate=24000,
)
task = PipelineTask(

View File

@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -40,8 +39,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
settings=RimeHttpTTSService.Settings(
voice="rex",
),
)
task = PipelineTask(

View File

@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -38,7 +37,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
task = PipelineTask(

View File

@@ -29,7 +29,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
pipeline = Pipeline([tts, transport.output()])

View File

@@ -37,7 +37,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
runner = PipelineRunner()

View File

@@ -23,9 +23,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),

View File

@@ -25,9 +25,8 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_out_enabled=True),
@@ -40,17 +39,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}
]
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
task = PipelineTask(
Pipeline([llm, tts, transport.output()]),
@@ -60,7 +59,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
context = LLMContext()
context.add_message({"role": "developer", "content": "Say hello to the world."})
await task.queue_frames([LLMContextFrame(context), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

View File

@@ -23,9 +23,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,
@@ -46,7 +45,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -37,7 +37,9 @@ async def main():
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -22,9 +22,8 @@ from pipecat.transports.daily.transport import DailyParams
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
video_out_enabled=True,

View File

@@ -17,9 +17,7 @@ from fastapi.responses import RedirectResponse
from loguru import logger
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -35,8 +33,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.smallwebrtc.connection import IceServer, SmallWebRTCConnection
from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -64,7 +60,6 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
@@ -72,26 +67,22 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -118,7 +109,9 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -12,9 +12,7 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -28,8 +26,6 @@ from pipecat.runner.daily import configure
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -49,34 +45,27 @@ async def main():
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -102,7 +91,9 @@ async def main():
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_participant_left")

View File

@@ -12,9 +12,7 @@ import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
InterruptionFrame,
TranscriptionFrame,
@@ -35,8 +33,6 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.livekit.transport import LiveKitParams, LiveKitTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -54,37 +50,29 @@ async def main():
params=LiveKitParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -16,11 +16,12 @@ from pipecat.frames.frames import (
Frame,
LLMContextFrame,
LLMFullResponseStartFrame,
OutputImageRawFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.sync_parallel_pipeline import FrameOrder, SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
@@ -30,6 +31,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -44,6 +46,18 @@ class MonthFrame(DataFrame):
return f"{self.name}(month: {self.month})"
class MarkImageForPlaybackSync(FrameProcessor):
"""Marks output image frames to be synchronized with audio playback."""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, OutputImageRawFrame):
frame.sync_with_audio = True
await self.push_frame(frame, direction)
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
@@ -65,9 +79,8 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_out_enabled=True,
@@ -99,11 +112,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
# No need to aggregate by sentences (the default), as we already know we're getting full sentences
# (Otherwise the service will unnecessarily wait for follow-up input to confirm the sentence is complete,
# which, sadly, actually breaks the synchronization mechanism)
text_aggregation_mode=TextAggregationMode.TOKEN,
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -116,17 +137,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# that, each pipeline runs concurrently and `SyncParallelPipeline` will
# wait for the input frame to be processed.
#
# We use `FrameOrder.PIPELINE` so that each synchronized batch of output
# frames is pushed in the order the pipelines are listed: image first,
# then audio. This ensures the transport receives the image before the
# audio frames it should accompany.
#
# Note that `SyncParallelPipeline` requires the last processor in each
# of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# `FalImageGenService` and `CartesiaHttpTTSService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
SyncParallelPipeline( # Run pipelines in parallel aggregating the result
[
imagegen, # Generate image
MarkImageForPlaybackSync(), # Mark image as needing sync w/audio during playback
],
[month_prepender, tts], # Create "Month: sentence" and output audio
[imagegen], # Generate image
frame_order=FrameOrder.PIPELINE,
),
transport.output(), # Transport output
]
@@ -149,7 +179,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]:
messages = [
{
"role": "system",
"role": "user",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]

View File

@@ -1,198 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMContextFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
URLImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
self.audio.extend(frame.audio)
self.frame = OutputAudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels
)
await self.push_frame(frame, direction)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
await self.push_frame(frame, direction)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
sentence_aggregator = SentenceAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
# With `SyncParallelPipeline` we synchronize audio and images by
# pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
# I3 A3). To do that, each pipeline runs concurrently and
# `SyncParallelPipeline` will wait for the input frame to be
# processed.
#
# Note that `SyncParallelPipeline` requires the last processor in
# each of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
description, # Store sentence
SyncParallelPipeline(
[tts, audio_grabber], # Generate and store audio for the given sentence
[imagegen, image_grabber], # Generate and storeimage for the given sentence
),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(LLMContextFrame(LLMContext(messages)))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TkTransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify a few months as we create tasks all at once and we
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
]
# We create one task per month. This will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -9,9 +9,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
from pipecat.metrics.metrics import (
LLMUsageMetricsData,
@@ -36,8 +34,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -62,24 +58,20 @@ class MetricsLogger(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -91,28 +83,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
ml = MetricsLogger()
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -141,7 +129,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -36,8 +34,6 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -77,9 +73,8 @@ class ImageSyncAggregator(FrameProcessor):
await self.push_frame(frame, direction)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -87,7 +82,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -95,7 +89,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -107,26 +100,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
image_sync_aggregator = ImageSyncAggregator(

View File

@@ -6,12 +6,11 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -29,30 +28,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -60,68 +53,68 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
async with aiohttp.ClientSession() as session:
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
aiohttp_session=session,
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
),
)
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await runner.run(task)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -0,0 +1,125 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -9,9 +9,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -29,29 +27,23 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -63,26 +55,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -110,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -21,7 +21,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -33,9 +32,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -94,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
# focus_speakers=["S1"],
@@ -105,32 +103,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -161,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -24,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -32,29 +29,23 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -84,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
@@ -92,40 +83,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -153,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -15,9 +15,7 @@ from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMMessagesUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -35,8 +33,6 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
@@ -50,24 +46,20 @@ def get_session_history(session_id: str) -> BaseChatMessageHistory:
return message_store[session_id]
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -79,15 +71,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
"You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
@@ -105,11 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(

View File

@@ -10,6 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -32,9 +33,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -56,24 +56,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
settings=DeepgramFluxSTTService.Settings(
min_confidence=0.3,
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
@@ -101,7 +109,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,9 +11,7 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,30 +29,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -67,29 +59,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-2-andromeda-en",
settings=DeepgramHttpTTSService.Settings(
voice="aura-2-andromeda-en",
),
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -117,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -24,36 +22,30 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -66,33 +58,35 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram model
stt = DeepgramSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_ENDPOINT_NAME"),
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
# Initialize Deepgram SageMaker TTS Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram TTS model
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -120,7 +114,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -7,7 +7,6 @@
import os
from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
@@ -33,9 +32,8 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -57,21 +55,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
settings=DeepgramSTTService.Settings(
vad_events=True,
utterance_end_ms="1000",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -102,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,30 +28,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -63,25 +55,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -109,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,9 +11,7 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,30 +29,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -71,29 +63,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
aiohttp_session=session,
settings=ElevenLabsHttpTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -121,7 +107,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,30 +28,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -65,26 +57,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -112,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -0,0 +1,128 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.grok.llm import GrokLLMService
from pipecat.services.xai.tts import XAIHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = XAIHttpTTSService(
api_key=os.getenv("GROK_API_KEY"),
aiohttp_session=session,
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
llm = GrokLLMService(
api_key=os.getenv("GROK_API_KEY"),
settings=GrokLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,29 +28,23 @@ from pipecat.services.azure.tts import AzureHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -73,24 +65,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -118,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,29 +28,23 @@ from pipecat.services.azure.tts import AzureTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -73,24 +65,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -118,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -0,0 +1,133 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
),
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_out_sample_rate=24000,
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -25,34 +23,29 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -60,31 +53,33 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAISTTService(
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -113,7 +108,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,9 +11,7 @@ import time
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +29,23 @@ from pipecat.services.openpipe.llm import OpenPipeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -65,7 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
timestamp = int(time.time())
@@ -73,23 +67,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
settings=OpenPipeLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -117,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,9 +11,7 @@ import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +29,23 @@ from pipecat.services.xtts.tts import XTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -67,29 +59,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = XTTSService(
aiohttp_session=session,
voice_id="Claribel Dervla",
settings=XTTSService.Settings(
voice="Claribel Dervla",
),
base_url="http://localhost:8000",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[
TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())
]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -117,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -11,7 +11,6 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -24,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
@@ -35,24 +34,20 @@ from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -63,7 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
settings=GladiaSTTService.Settings(
language_config=LanguageConfig(
languages=[Language.EN],
),
@@ -73,22 +68,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
@@ -116,7 +114,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -25,36 +23,30 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -65,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
settings=GladiaSTTService.Settings(
language_config=LanguageConfig(
languages=[Language.EN],
)
@@ -74,26 +66,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -121,7 +109,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,10 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +28,23 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -63,25 +54,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
tts = LmntTTSService(
api_key=os.getenv("LMNT_API_KEY"),
settings=LmntTTSService.Settings(
voice="morgan",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -109,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,29 +28,23 @@ from pipecat.services.groq.tts import GroqTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -63,26 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqLLMService.Settings(
model="llama-3.1-8b-instant",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -110,7 +95,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -8,7 +8,6 @@
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMMessagesAppendFrame
from pipecat.pipeline.pipeline import Pipeline
@@ -27,8 +26,6 @@ from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
# Strands agent setup
try:
@@ -41,24 +38,20 @@ except ImportError:
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
@@ -102,13 +95,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
)
# Create Strands agent processor
try:
agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
agent = build_agent(model_id="us.anthropic.claude-sonnet-4-6", max_tokens=8000)
llm = StrandsAgentsProcessor(agent=agent)
logger.info("Successfully created Strands agent for NAB customer service coaching")
except Exception as e:
@@ -122,11 +118,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -159,8 +151,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
LLMMessagesAppendFrame(
messages=[
{
"role": "user",
"content": f"Greet the user and introduce yourself.",
"role": "developer",
"content": f"Greet the user and introduce yourself. Don't use emojis.",
}
],
run_llm=True,

View File

@@ -8,9 +8,7 @@
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -28,29 +26,23 @@ from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -62,31 +54,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
)
llm = AWSBedrockLLMService(
aws_region="us-west-2",
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
settings=AWSBedrockLLMService.Settings(
model="us.anthropic.claude-sonnet-4-6",
temperature=0.8,
# system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -114,7 +101,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "user", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -25,9 +25,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -45,15 +43,11 @@ from pipecat.services.google.tts import GoogleTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
@@ -61,7 +55,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
@@ -69,7 +62,6 @@ transport_params = {
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -78,37 +70,33 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -136,7 +124,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +29,23 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -62,15 +54,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot with Gemini TTS")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
tts = GeminiTTSService(
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
model="gemini-2.5-flash-tts",
voice_id="Charon",
params=GeminiTTSService.InputParams(
settings=GeminiTTSService.Settings(
model="gemini-2.5-flash-tts",
voice="Charon",
language=Language.EN_US,
prompt="You are a helpful AI assistant. Speak in a natural, conversational tone.",
),
@@ -79,13 +73,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
)
# System message that instructs the AI on how to speak
messages = [
{
"role": "system",
"content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
settings=GoogleLLMService.Settings(
system_instruction="""You are a helpful assistant in a voice conversation.
IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
- [sigh] - Insert a sigh sound
@@ -102,18 +91,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
- "[whispering] Let me tell you a secret."
- "The answer is... [long pause] ...42!"
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
},
]
Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise. Respond to what the user said in a creative and helpful way.""",
),
)
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -141,9 +126,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation
messages.append(
context.add_message(
{
"role": "system",
"role": "developer",
"content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
}
)

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +29,23 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -62,41 +54,37 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleHttpTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleHttpTTSService.InputParams(language=Language.EN_US),
settings=GoogleHttpTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -124,7 +112,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -31,29 +29,23 @@ from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -62,41 +54,37 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -124,7 +112,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -0,0 +1,180 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"""AssemblyAI u3-rt-pro with Built-in Turn Detection
This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
with AssemblyAI's built-in turn detection for more natural conversation flow.
Key features:
1. AssemblyAI Turn Detection
- Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
- AssemblyAI's model determines when user starts/stops speaking
- Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
- More natural turn detection based on speech patterns and pauses
2. Advanced Turn Detection Tuning
- `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
Lower values = faster responses. Default: 100ms
- `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
Prevents long pauses. Default: 1000ms
3. Prompt-Based Transcription Enhancement
- Use `prompt` parameter to improve accuracy for specific names/terms
- Particularly useful for proper nouns, technical terms, domain vocabulary
- Example: "Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth."
4. Speaker Diarization (Optional)
- Enable with `speaker_labels=True`
- Automatically identifies different speakers in multi-party conversations
- TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")
5. Language Detection (Optional, multilingual model only)
- Enable with `language_detection=True`
- Automatically detects spoken language
- Available with universal-streaming-multilingual model
For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
"""
logger.info(f"Starting bot")
stt = AssemblyAISTTService(
api_key=os.getenv("ASSEMBLYAI_API_KEY"),
vad_force_turn_endpoint=False, # Use AssemblyAI's built-in turn detection
settings=AssemblyAISTTService.Settings(
model="u3-rt-pro",
# Optional: Tune turn detection timing (defaults shown below)
# min_turn_silence=100, # Default
# max_turn_silence=1000, # Default
# Optional: Boost accuracy for specific names/terms
# keyterms_prompt=["Xiomara", "Saoirse", "Krzystof", "API", "OAuth"],
# Optional: Enable speaker diarization
# speaker_labels=True,
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -10,9 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -30,30 +28,24 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
),
}
@@ -67,26 +59,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
),
),
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
@@ -114,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -28,10 +28,11 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.krisp_viva_turn import KrispTurnParams, KrispVivaTurn
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.metrics.metrics import TurnMetricsData
from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -42,8 +43,8 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,27 +54,26 @@ from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
krisp_viva_filter = KrispVivaFilter()
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
}
@@ -83,24 +83,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=KrispVivaTurn())]
),
vad_analyzer=SileroVADAnalyzer(), # or KrispVivaVadAnalyzer
),
)
@@ -123,13 +127,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[MetricsLogObserver(include_metrics={TurnMetricsData})],
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

Some files were not shown because too many files have changed in this diff Show More