Commit Graph

168 Commits

Author SHA1 Message Date
filipi87
6ef7f6446a Saving the audio inside the Tavus video so we can test. 2026-05-21 09:01:12 -03:00
filipi87
7c61c36825 Recording the audios that we are receiving. 2026-05-20 19:03:01 -03:00
filipi87
1338da6831 Don't inject silence in the proxy. 2026-05-20 18:26:11 -03:00
filipi87
6a238e0d62 Refactoring how we are handling the silence. 2026-05-20 17:41:11 -03:00
filipi87
e7bad7a007 Buffering the audio before sending back. 2026-05-20 17:23:15 -03:00
filipi87
b360fbf7fc Handling interruption. 2026-05-20 16:26:47 -03:00
filipi87
f568b1d8df Fixing ruff format. 2026-05-20 12:48:25 -03:00
filipi87
fd7af7ba9f Changing the silence threshold to 10. 2026-05-20 12:45:31 -03:00
filipi87
b7d272a5be Skipping webrtc injected silence. 2026-05-20 12:18:43 -03:00
filipi87
996aa461ac Sending audio faster than realtime. 2026-05-20 12:03:24 -03:00
Aleix Conchillo Flaqué
f5158d51e7 Add filter-incomplete + function-calling turn-management example
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
2026-05-15 14:54:51 -07:00
Paul Kompfner
1a4a6f4edf refactor(gemini-live): bring tool-result handling in line with the canonical realtime pattern
Lays groundwork for cancel_on_interruption=False support on Gemini Live by
restructuring _process_completed_function_calls to match the shape used by
AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass
forward iteration over raw context messages that detects async-tool
messages via async_tool_messages.parse_message and routes them: started
messages are skipped silently, intermediate messages are logged as errors and
surfaced via push_error, and final messages are delivered via the formal
FunctionResponse channel.

Replaces the prior two-pass structure that went through the adapter for
sync results — the service now uses a lightweight self._tool_call_id_to_name
map (populated when the model issues tool calls) for the name lookup the
adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict
static method for the dict-coercion logic that wraps non-dict tool returns
as {value: <result>} for Gemini's FunctionResponse.response field; the
adapter's existing inline copy in _from_standard_message uses it too.

Example consolidation:

- Folds realtime-gemini-live-function-calling.py into the base
  realtime-gemini-live.py example so the base exercises function calling
  out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py).
- Renames realtime-gemini-live-vertex-function-calling.py to
  realtime-gemini-live-vertex.py, mirroring the consolidation.
- Adds realtime-gemini-live-async-tool.py.
- Updates scripts/evals/run-release-evals.py for the renames.

This commit alone doesn't make cancel_on_interruption=False fully work on
Gemini Live — additional investigation is pending. This is foundational
work to be built on.
2026-05-08 16:42:54 -04:00
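The dict-coercion step described in commit 1a4a6f4edf can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not the actual GeminiLLMAdapter.to_function_response_dict. It relies only on what the message states (non-dict tool returns are wrapped as {value: <result>}) plus one assumption: dict results pass through unchanged.

```python
from typing import Any


def to_function_response_dict(result: Any) -> dict[str, Any]:
    """Coerce a tool result into a dict (hypothetical sketch)."""
    # Assumption: results that are already dicts are passed through as-is.
    if isinstance(result, dict):
        return result
    # Per the commit message, non-dict results are wrapped as {value: <result>}.
    return {"value": result}
```

Under these assumptions, a string result such as "sunny" would be delivered as a dict with a single "value" key, while a dict result is left untouched.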
Aleix Conchillo Flaqué
ea3585146c chore(scripts): add release-changelog.py
Adds a script to unfill (single-line) entry paragraphs in CHANGELOG.md
while keeping `(PR [...])` on its own continuation line.
2026-04-27 15:07:53 -07:00
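A rough sketch of the unfilling behavior described in commit ea3585146c, assuming each changelog entry arrives as a list of wrapped lines. The function name, the two-space indentation of the continuation line, and the input layout are all hypothetical; the real release-changelog.py may work differently.

```python
def unfill_entry(lines: list[str]) -> list[str]:
    """Join the wrapped lines of one changelog entry into a single line.

    Any line starting with "(PR [" is kept on its own continuation line.
    """
    text_parts: list[str] = []
    pr_lines: list[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines inside the entry
        if stripped.startswith("(PR ["):
            # Assumption: continuation lines are indented by two spaces.
            pr_lines.append("  " + stripped)
        else:
            text_parts.append(stripped)
    return [" ".join(text_parts)] + pr_lines
```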
Mark Backman
10e58d6e42 Fix type errors in scripts and add to pyright checked set 2026-04-21 16:17:49 -04:00
Mark Backman
84891de04d Add voice/xai-http.py to release evals 2026-04-21 15:49:59 -04:00
filipi87
0340e25e9f Fixing typecheck for service switcher. 2026-04-17 12:44:57 -03:00
Aleix Conchillo Flaqué
b3bb6fdaa5 Modernize Python typing across the codebase
Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311):

- Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type`
  with their built-in equivalents (`list`, `dict`, `tuple`, etc.)
- Replace `typing.Optional[X]` with `X | None`
- Replace `typing.Union[X, Y]` with `X | Y`
- Move `Mapping`, `Sequence`, `Callable`, `Awaitable`,
  `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`,
  `AsyncGenerator` imports from `typing` to `collections.abc`
- Remove now-unused `typing` imports
- Add `from __future__ import annotations` to 5 files that use
  forward-reference strings in `X | "Y"` annotations
2026-04-16 09:28:23 -07:00
Mark Backman
9ffcccdd84 Merge pull request #4253 from pipecat-ai/mb/mistral-stt
Add Mistral Voxtral Realtime STT service
2026-04-15 09:00:27 -04:00
Aleix Conchillo Flaqué
153814ecc2 scripts/evals: create recording subdirectories when saving audio
Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
2026-04-10 13:19:20 -07:00
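The fix described in commit 153814ecc2 can be sketched with pathlib. The function name, the .wav suffix, and the argument layout are hypothetical; only the idea of creating the parent directory before writing comes from the commit message.

```python
from pathlib import Path


def save_recording(base_dir: Path, example_name: str, audio: bytes) -> Path:
    """Write audio bytes under base_dir, creating parent directories first."""
    # example_name may include a subdirectory, e.g. "foundational/01".
    path = base_dir / f"{example_name}.wav"
    # Create any missing parent directories; do nothing if they already exist.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return path
```

Without the mkdir call, writing to a path whose parent directory does not exist raises FileNotFoundError.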
Mark Backman
215b2dc7f3 Add voice-mistral to evals 2026-04-07 15:37:07 -04:00
kompfner
a3c7f6c2af Merge pull request #4215 from pipecat-ai/pk/remove-openaillmcontext
Remove deprecated `OpenAILLMContext` as well as everything (code path…
2026-04-01 14:03:35 -04:00
Mark Backman
3ca656cae5 Update simli name to match others 2026-03-31 22:54:21 -04:00
Mark Backman
6a84d02156 Update evals
- Removed evals for removed services
- Added eval for function-calling-deepseek.py
2026-03-31 22:13:52 -04:00
Mark Backman
080da8b94c Update eval script paths to match renamed example files 2026-03-31 22:09:42 -04:00
Paul Kompfner
394599d031 Remove deprecated OpenAILLMContext as well as everything (code paths or whole types) dependent on it (all of which were also deprecated) 2026-03-31 18:15:25 -04:00
Mark Backman
47b41a0ff7 Rename services/ to voice/ and function-calling/, flatten to top level
Replace the nested services/speech/ and services/function-calling/ with
top-level voice/ and function-calling/ directories. Update eval script
paths and README to match.
2026-03-31 15:20:03 -04:00
Mark Backman
f14638a1fd Revert "Flatten services/ nesting: promote speech and function-calling to top level"
This reverts commit e1939ecd44.
2026-03-31 14:59:23 -04:00
Mark Backman
e1939ecd44 Flatten services/ nesting: promote speech and function-calling to top level
Move services/speech/* directly into services/ and services/function-calling/*
into top-level function-calling/. Update eval script paths and README.
2026-03-31 14:55:22 -04:00
Mark Backman
e719cbbe6d Reorganize examples into topic-based subfolders
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.

Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.

Update eval script paths and README to match the new structure.
2026-03-31 13:12:24 -04:00
Mark Backman
f2ce7ececc Move foundational examples to examples/ 2026-03-31 13:12:24 -04:00
Paul Kompfner
b5683556d4 Remove duplicate entries in run-release-evals.py, which appeared after a rebase 2026-03-30 10:03:43 -04:00
Paul Kompfner
f2a8a9e753 Add WebSocket-based OpenAI Responses LLM service with previous_response_id optimization
Introduce a WebSocket variant of the OpenAI Responses API service that
maintains a persistent connection to wss://api.openai.com/v1/responses
for lower-latency inference. The WebSocket variant automatically uses
previous_response_id to send only incremental context when possible,
falling back to full context on reconnection or cache miss.

The WebSocket variant becomes the new default OpenAIResponsesLLMService,
and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both
share a private base class with common settings, parameter building,
and run_inference (always HTTP) logic.
2026-03-30 09:58:56 -04:00
Mark Backman
2177e28ee1 Remove OpenPipe integration
OpenPipe was acquired by CoreWeave in September 2025. The Python package
hasn't been updated since June 2025 and the repo since 2024. The openpipe
package caps openai<=1.97.1, creating dependency conflicts with other
extras. Remove the dead integration to clean up the codebase.
2026-03-29 10:12:35 -04:00
Mark Backman
63254fe337 Add NebiusLLMService with developer role and tool support fixes
- Add Nebius LLM service wrapping OpenAI-compatible Token Factory API
- Set supports_developer_role = False (Nebius rejects developer role)
- Default to openai/gpt-oss-120b model (supports function calling)
- Add Nebius function-calling example and env.example entry
- Fix Sarvam developer role support
- Update examples to use developer role for intro messages
2026-03-29 08:50:11 -04:00
Mark Backman
d8b0ed18fd Fix example numbering, add LemonSlice to evals 2026-03-27 10:11:37 -04:00
Mark Backman
21a729ae5d Merge pull request #4146 from pipecat-ai/mb/gemini-live-local-vad 2026-03-26 17:48:21 -04:00
Mark Backman
fe0633ecd1 Add 14s to release evals 2026-03-26 12:27:27 -04:00
Mark Backman
503e5e9106 Fix Gemini Live local VAD by sending correct activity events to server
When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.

Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.
2026-03-25 18:00:13 -04:00
Mark Backman
adc003d6c7 Code review cleanup 2026-03-25 10:53:07 -04:00
Paul Kompfner
e0bc9c73c6 Add Anthropic interruptible example (07e) and register in release evals 2026-03-24 16:02:42 -04:00
Mark Backman
6eb988b729 Merge pull request #4092 from harshitajain165/harshita/smallest-tts-only
Add Smallest AI TTS service integration
2026-03-24 11:54:34 -04:00
Mark Backman
51d28b4a9f Code review fixes 2026-03-24 11:21:04 -04:00
kompfner
cf083b8411 Merge pull request #4078 from pipecat-ai/cb/gemini-updates
Updates for Gemini Live
2026-03-24 11:18:00 -04:00
Mark Backman
aa0b49d69f Code review fixes 2026-03-24 09:22:08 -04:00
dhruvladia-sarvam
349b8645f3 Merge branch 'main' into feat/sarvam-llm-integration 2026-03-24 16:34:12 +05:30
dhruvladia-sarvam
696196e30c alignment with pr 4081 2026-03-24 16:29:58 +05:30
Mark Backman
d314e2831a Simplify 26 name, update evals 2026-03-23 15:46:13 -04:00
Paul Kompfner
b1a8588209 feat: add 12- and 14d- image/video examples for OpenAI Responses 2026-03-18 15:39:06 -04:00
Paul Kompfner
45186cc4ce feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-18 11:45:23 -04:00
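Three of the conversions named in commit 45186cc4ce (system→developer, tool_calls→function_call, tool→function_call_output) can be sketched as follows. This is not the adapter in open_ai_responses_adapter.py; to_responses_input is a hypothetical name, the input is assumed to be chat-completions-style message dicts, and multimodal content conversion and tools schema flattening are omitted.

```python
from typing import Any


def to_responses_input(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert chat-style messages to Responses API input items (sketch)."""
    items: list[dict[str, Any]] = []
    for msg in messages:
        role = msg.get("role")
        if role == "tool":
            # tool -> function_call_output
            items.append(
                {
                    "type": "function_call_output",
                    "call_id": msg["tool_call_id"],
                    "output": msg["content"],
                }
            )
            continue
        if role == "system":
            # system -> developer
            role = "developer"
        if msg.get("content") is not None:
            items.append({"role": role, "content": msg["content"]})
        for call in msg.get("tool_calls") or []:
            # tool_calls -> one function_call item per call
            items.append(
                {
                    "type": "function_call",
                    "call_id": call["id"],
                    "name": call["function"]["name"],
                    "arguments": call["function"]["arguments"],
                }
            )
    return items
```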
Mark Backman
786279f143 Remove unused imports, 2026-03-07 2026-03-09 12:44:47 -04:00