Remove the deprecated text_aggregator parameter from TTSService,
CartesiaTTSService, and RimeTTSService, and the deprecated text_filter
parameter from TTSService. Users should use LLMTextProcessor before
the TTS service instead. Update the voice-switching example to use
LLMTextProcessor with PatternPairAggregator.
Add SmallestSTTService using the Pulse WebSocket API for real-time
transcription. Includes SmallestSTTSettings dataclass, 32-language
support with resolve_language fallback, VAD-driven finalize signal,
and SMALLEST_TTFS_P99 latency constant.
Also adds X-Source and X-Pipecat-Version headers to Smallest STT
and TTS WebSocket connections.
Example files like openai.py shadow installed packages when Python adds the
script directory to sys.path. Prepend the parent folder name to each example
file (e.g. openai.py -> function-calling-openai.py). Also split
thinking-and-mcp/ into separate mcp/ and thinking/ directories.
Replace the nested services/speech/ and services/function-calling/ with
top-level voice/ and function-calling/ directories. Update eval script
paths and README to match.
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.
Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.
Update eval script paths and README to match the new structure.
Add detailed trace-level logging to _apply_previous_response_optimization
showing why the optimization was applied or fell back to full context,
including the relevant data for debugging.
Use append_to_context=False for the filler TTSSpeakFrame in the
function-calling example to avoid altering the conversation history
and breaking the previous_response_id prefix match.
Introduce a WebSocket variant of the OpenAI Responses API service that
maintains a persistent connection to wss://api.openai.com/v1/responses
for lower-latency inference. The WebSocket variant automatically uses
previous_response_id to send only incremental context when possible,
falling back to full context on reconnection or cache miss.
The WebSocket variant becomes the new default OpenAIResponsesLLMService,
and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both
share a private base class with common settings, parameter building,
and run_inference (always HTTP) logic.
OpenPipe was acquired by CoreWeave in September 2025. The Python package
hasn't been updated since June 2025 and the repo since 2024. The openpipe
package caps openai<=1.97.1, creating dependency conflicts with other
extras. Remove the dead integration to clean up the codebase.
- Add Nebius LLM service wrapping OpenAI-compatible Token Factory API
- Set supports_developer_role = False (Nebius rejects developer role)
- Default to openai/gpt-oss-120b model (supports function calling)
- Add Nebius function-calling example and env.example entry
- Fix Sarvam developer role support
- Update examples to use developer role for intro messages
Gemini 3.1 Flash Live won't reliably report ending its turn until
after it says something following a tool call. Restructure the system
instruction so the model says goodbye *after* calling
end_conversation, and add a comment explaining the deferred EndFrame
behavior that makes this work.
All recent Gemini Live models (including the default
gemini-2.5-flash-native-audio-preview-12-2025, and going at least as
far back as gemini-2.5-flash-native-audio-preview-09-2025) only
support AUDIO as a response modality. We considered using
`modalities=TEXT` as a Pipecat-level signal to suppress audio output
frames (so developers could pair Gemini Live with an external TTS),
but the output transcription from the API arrives too late relative
to the audio to be useful for driving an external TTS service.
For now, just log a warning when a TEXT modality is configured
(at init or via set_model_modalities) and proceed as normal. The 26d
text-modality example is removed since it no longer represents a
viable configuration.
Expose a public method for retrieving all stored memories outside the
pipeline, avoiding the need for callers to reimplement client branching,
OR filter construction, and asyncio.to_thread wrapping. Simplify the
example get_initial_greeting() to use it.
When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.
Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.
Add DeepgramFluxSageMakerSTTService that combines SageMaker's HTTP/2
transport with Flux's JSON turn detection protocol (StartOfTurn,
EndOfTurn, EagerEndOfTurn, TurnResumed). Includes mid-stream Configure
support, silence watchdog, and an example bot.
Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai),
so move Grok source files into the xai module. Leave deprecation shims in
the old grok/ paths for backward compatibility.
- Rename XAITTSService → XAIHttpTTSService and XAITTSSettings → XAIHttpTTSSettings
- Add language_to_xai_language() with explicit LANGUAGE_MAP using resolve_language()
- Remove deprecated InputParams, params, voice, language init params
- Remove XAI_DEFAULT_SAMPLE_RATE and XAI_PCM_CODEC constants; add encoding param
- Set sample_rate=None default (picked up from PipelineParams or user)
- Use Language.EN enum instead of string "en" for default language
- Add changelog/4031.added.md
- Add 07e-interruptible-xai.py foundational example
- Update 14g-function-calling-grok.py to use XAIHttpTTSService
- Register 07e in run-release-evals.py
Add system_instruction parameter to the Grok Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.
Update the Grok Realtime example to use settings.system_instruction
instead of session_properties.instructions.
These messages are developer instructions to the assistant (e.g. "Please
introduce yourself to the user"), not simulated user input. The
"developer" role is semantically correct for this purpose.