Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
Replace the nested services/speech/ and services/function-calling/ with
top-level voice/ and function-calling/ directories. Update eval script
paths and README to match.
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.
Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.
Update eval script paths and README to match the new structure.
Introduce a WebSocket variant of the OpenAI Responses API service that
maintains a persistent connection to wss://api.openai.com/v1/responses
for lower-latency inference. The WebSocket variant automatically uses
previous_response_id to send only incremental context when possible,
falling back to full context on reconnection or cache miss.
The WebSocket variant becomes the new default OpenAIResponsesLLMService,
and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both
share a private base class with common settings, parameter building,
and run_inference (always HTTP) logic.
OpenPipe was acquired by CoreWeave in September 2025. The Python package
hasn't been updated since June 2025 and the repo since 2024. The openpipe
package caps openai<=1.97.1, creating dependency conflicts with other
extras. Remove the dead integration to clean up the codebase.
- Add Nebius LLM service wrapping OpenAI-compatible Token Factory API
- Set supports_developer_role = False (Nebius rejects developer role)
- Default to openai/gpt-oss-120b model (supports function calling)
- Add Nebius function-calling example and env.example entry
- Fix Sarvam developer role support
- Update examples to use developer role for intro messages
When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.
Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).
- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.