Commit Graph

99 Commits

Author SHA1 Message Date
Xin Wang
3604db21eb Remove obsolete audio example files from the project 2026-03-06 14:43:11 +08:00
Xin Wang
da38157638 Add ASR interim results support in Assistant model and API
- Introduced `asr_interim_enabled` field in the Assistant model to control interim ASR results.
- Updated AssistantBase and AssistantUpdate schemas to include the new field.
- Modified the database schema to add the `asr_interim_enabled` column.
- Enhanced runtime metadata to reflect interim ASR settings.
- Updated API endpoints and tests to validate the new functionality.
- Adjusted documentation to include details about interim ASR results configuration.
2026-03-06 12:58:54 +08:00
Xin Wang
e11c3abb9e Implement DashScope ASR provider and enhance ASR service architecture
- Added DashScope ASR service implementation for real-time streaming.
- Updated ASR provider logic to support DashScope alongside existing providers.
- Enhanced runtime metadata resolution to include DashScope as a valid ASR provider.
- Modified configuration files and documentation to reflect the addition of DashScope.
- Introduced tests to validate DashScope integration and ASR service behavior.
- Refactored ASR service factory to accommodate new provider options and modes.
2026-03-06 11:44:39 +08:00
Xin Wang
7e0b777923 Refactor project structure and enhance backend integration
- Expanded package inclusion in `pyproject.toml` to support new modules.
- Introduced new `adapters` and `protocol` packages for better organization.
- Added backend adapter implementations for control plane integration.
- Updated main application imports to reflect new package structure.
- Removed deprecated core components and adjusted documentation accordingly.
- Enhanced architecture documentation to clarify the new runtime and integration layers.
2026-03-06 09:51:56 +08:00
Xin Wang
4e2450e800 Refactor backend integration and service architecture
- Removed the backend client compatibility wrapper and associated methods to streamline backend integration.
- Updated session management to utilize control plane gateways and runtime configuration providers.
- Adjusted TTS service implementations to remove the EdgeTTS service and simplify service dependencies.
- Enhanced documentation to reflect changes in backend integration and service architecture.
- Updated configuration files to remove deprecated TTS provider options and clarify available settings.
2026-03-06 09:00:43 +08:00
Xin Wang
6b589a1b7c Enhance session management and logging configuration
- Updated .env.example to clarify audio frame size validation and default codec settings.
- Refactored logging setup in main.py to support JSON serialization based on log format configuration.
- Improved session.py to dynamically compute audio frame bytes and include protocol version in session events.
- Added tests to validate session start events and audio frame handling based on chunk size settings.
2026-03-05 21:44:23 +08:00
Xin Wang
1cecbaa172 Update .gitignore and add audio example file
- Removed duplicate entry for Thumbs.db in .gitignore to streamline ignored files.
- Added a new audio example file: three_utterances_simple.wav to the audio_examples directory.
2026-03-05 21:28:17 +08:00
Xin Wang
935f2fbd1f Refactor assistant configuration management and update documentation
- Removed legacy agent profile settings from the .env.example and README, streamlining the configuration process.
- Introduced a new local YAML configuration adapter for assistant settings, allowing for easier management of assistant profiles.
- Updated backend integration documentation to clarify the behavior of assistant config sourcing based on backend URL settings.
- Adjusted various service implementations to directly utilize API keys from the new configuration structure.
- Enhanced test coverage for the new local YAML adapter and its integration with backend services.
2026-03-05 21:24:15 +08:00
Xin Wang
aaef370d70 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-04 10:01:41 +08:00
Xin Wang
7d4af18815 Add output.audio.played message handling and update documentation
- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
2026-03-04 10:01:34 +08:00
Xin Wang
530d95eea4 Enhance Docker configuration and update dependencies for Realtime Agent Studio
- Updated Dockerfile for the API to include build tools for C++11 required for native extensions.
- Revised requirements.txt to upgrade several dependencies, including FastAPI and SQLAlchemy.
- Expanded docker-compose.yml to add MinIO service for S3-compatible storage and improved health checks for backend and engine services.
- Enhanced README.md in the Docker directory to provide detailed service descriptions and quick start instructions.
- Updated mkdocs.yml to reflect new navigation structure and added deployment overview documentation.
- Introduced new Dockerfiles for the engine and web services, including development configurations for hot reloading.
2026-03-04 10:01:00 +08:00
Xin Wang
3aa9e0f432 Enhance DuplexPipeline to support follow-up context for manual opener tool calls
- Introduced logic to trigger a follow-up turn when the manual opener greeting is empty.
- Updated `_execute_manual_opener_tool_calls` to return structured tool call and result data.
- Added `_build_manual_opener_follow_up_context` method to construct context for follow-up turns.
- Modified `_handle_turn` to accept system context for improved conversation management.
- Enhanced tests to validate the new follow-up behavior and ensure proper context handling.
2026-03-02 14:27:44 +08:00
Xin Wang
00b88c5afa Add manual opener tool calls to Assistant model and API
- Introduced `manual_opener_tool_calls` field in the Assistant model to support custom tool calls.
- Updated AssistantBase and AssistantUpdate schemas to include the new field.
- Implemented normalization and migration logic for handling manual opener tool calls in the API.
- Enhanced runtime metadata to include manual opener tool calls in responses.
- Updated tests to validate the new functionality and ensure proper handling of tool calls.
- Refactored tool ID normalization to support legacy tool names for backward compatibility.
2026-03-02 12:34:42 +08:00
Xin Wang
b5cdb76e52 Implement initial generated opener logic in DuplexPipeline to utilize tool-capable assistant turns when tools are available. Update tests to verify the correct behavior of the generated opener under various conditions, ensuring proper handling of user input and task management. 2026-03-02 02:47:30 +08:00
Xin Wang
4d553de34d Refactor assistant greeting logic to conditionally use system prompt for generated openers. Update related tests to verify new behavior and ensure correct metadata handling in API responses. Enhance UI to reflect changes in opener management based on generated opener settings. 2026-03-02 02:38:45 +08:00
Xin Wang
1561056a3d Add voice_choice_prompt and text_choice_prompt tools to API and UI. Implement state management and parameter definitions for user selection prompts, enhancing user interaction and experience. 2026-03-02 00:49:31 +08:00
Xin Wang
3643431565 Enhance WebSocket session configuration by introducing an optional config.resolved event, which provides a public snapshot of the session's configuration. Update the API reference documentation to clarify the conditions under which this event is emitted and the details it includes. Modify session management to respect the new setting for emitting configuration details, ensuring sensitive information remains secure. Update tests to validate the new behavior and ensure compliance with the updated configuration schema. 2026-03-01 23:08:44 +08:00
Xin Wang
6a46ec69f4 Enhance WebSocket session management by requiring assistant_id as a query parameter for connection. Update API reference documentation to reflect changes in message flow and metadata validation rules, including the introduction of whitelists for allowed metadata fields and restrictions on sensitive keys. Refactor client examples to align with the new session initiation process. 2026-03-01 14:10:38 +08:00
Xin Wang
b4fa664d73 Refactor WebSocket authentication handling by removing auth requirements from the hello message. Update related documentation and schemas to reflect the changes in authentication strategy, simplifying the connection process. 2026-02-28 17:33:40 +08:00
Xin Wang
aae41d4512 Clear stale ASR capture on end of utterance in DuplexPipeline. Add test to verify behavior when conversation state changes, ensuring proper handling of ASR capture variables. 2026-02-28 12:32:35 +08:00
Xin Wang
531cf6080a Update DuplexPipeline tool wait timeout to 60 seconds and modify DebugDrawer to improve tool call ID handling. Ensure better integration and functionality across components. 2026-02-27 17:38:36 +08:00
Xin Wang
229243e832 Add wait_for_response functionality to ToolResource and related components. Update API models, schemas, and routers to support new parameter. Enhance UI components to manage wait_for_response state, ensuring proper integration across the application. 2026-02-27 16:54:39 +08:00
Xin Wang
95c6e93a9c Add text_msg_prompt tool to DuplexPipeline and Assistants. Update DebugDrawer to handle text message prompts, including parameter validation and state management for displaying messages. Ensure integration with existing tools and maintain functionality across components. 2026-02-27 16:47:49 +08:00
Xin Wang
cdd8275e35 Add voice_message_prompt tool to API and UI components. Update DuplexPipeline, Assistants, and DebugDrawer to support new tool functionality, including parameter validation and speech synthesis integration. Ensure existing tools are preserved during seeding process in the database. 2026-02-27 16:04:49 +08:00
Xin Wang
b035e023c4 Implement runtime tool ID and display name mapping in DuplexPipeline. Enhance Assistants and ToolLibrary components to utilize new mappings for improved tool identification and display. Update DebugDrawer to reflect changes in tool display names during interactions. 2026-02-27 15:50:43 +08:00
Xin Wang
5f768edf68 Add parameter schema and defaults to ToolResource model and schemas. Implement runtime tool resolution in assistants and tools routers, ensuring proper handling of tool parameters. Update tests to validate new functionality and ensure correct integration of parameter handling in the API. 2026-02-27 14:44:28 +08:00
Xin Wang
d942c85eff Add new tools to DuplexPipeline: calculator, code_interpreter, turn_on_camera, turn_off_camera, increase_volume, and decrease_volume. Implement fallback schema for unknown string tools and assign default client executors for specific tools. Update tests to validate new functionality and ensure correct tool handling in the pipeline. 2026-02-27 13:59:37 +08:00
Xin Wang
6178cc05bb Add system-level dynamic variables support in session management. Implement methods to generate and apply built-in variables for current session time, UTC time, and timezone. Update documentation to reflect new variables and enhance tests for dynamic variable handling in the UI components. 2026-02-27 12:08:18 +08:00
Xin Wang
71cbfa2b48 Enhance DuplexPipeline and AssistantsPage for improved interruption handling. Introduce _OPENER_PRE_ROLL_MS constant for a head start on mic capture, and adjust interruption sensitivity settings from 500ms to 180ms across relevant components to optimize responsiveness during assistant interactions. 2026-02-27 11:51:15 +08:00
Xin Wang
3272a7a68a Add dynamic variables support in session management and UI components. Implement validation rules for dynamic variables in metadata, including key format and value constraints. Enhance session start handling to manage dynamic variable errors. Update documentation and tests to reflect new functionality. 2026-02-27 11:21:37 +08:00
Xin Wang
f1b60bef22 Update ASR delta throttle timing in DuplexPipeline from 300ms to 500ms to improve processing efficiency and responsiveness. 2026-02-27 10:23:06 +08:00
Xin Wang
403b4b93c7 Add ASR capture timeout handling in DuplexPipeline and enhance EOU detection logic. Introduce _ASR_CAPTURE_MAX_MS constant and manage capture state timing to ensure timely end of utterance processing, even during silence. Update EouDetector to allow silence-only EOU when VAD state is lost. 2026-02-27 09:59:54 +08:00
Xin Wang
0b308f9bce Remove deprecated agent configuration files: default.yaml, example.yaml, and tools.yaml, streamlining the agent behavior setup and eliminating unused parameters. 2026-02-27 09:39:23 +08:00
Xin Wang
e14eac347f Update default.yaml configuration for speech agent parameters, adjusting min_speech_duration_ms from 100 to 120 ms and eou_threshold_ms from 800 to 1300 ms. Modify audio model parameters: set start_min_speech_ms to 100 ms, pre_speech_ms to 360 ms, and final_tail_ms to 180 ms for improved audio processing. 2026-02-27 09:00:38 +08:00
Xin Wang
fb95e2abe2 Add opener audio functionality to Assistant model and related schemas, enabling audio generation and playback features. Update API routes and frontend components to support opener audio management, including status retrieval and generation controls. 2026-02-26 14:31:50 +08:00
Xin Wang
da83c8ec8a Implement initial greeting emission in DuplexPipeline after session activation, ensuring proper event ordering for frontend notifications. 2026-02-26 14:07:46 +08:00
Xin Wang
cfc8db3fe7 Implement API URL resolution in OpenAICompatibleASRService to ensure correct endpoint handling for transcription requests. 2026-02-26 12:04:59 +08:00
Xin Wang
14b4b3d966 Implement API URL resolution for OpenAICompatibleTTSService to handle both base and full speech endpoint formats. 2026-02-26 11:07:54 +08:00
Xin Wang
8bc21c7874 Add detailed logging for session runtime configuration and service resolution 2026-02-26 11:02:15 +08:00
Xin Wang
f77f7c7531 Voice library support dashscope 2026-02-26 03:54:52 +08:00
Xin Wang
b193f91432 Set DashScope TTS default mode to commit 2026-02-26 03:10:07 +08:00
Xin Wang
562341a72c add dashscope tts 2026-02-26 03:02:48 +08:00
Xin Wang
72ed7d0512 Unify db api 2026-02-26 01:58:39 +08:00
Xin Wang
56f8aa2191 Fix talking voice error 2026-02-12 19:39:26 +08:00
Xin Wang
ff3a03b1ad Use openai compatible as vendor 2026-02-12 18:44:55 +08:00
Xin Wang
98207936ae Update .env.example 2026-02-12 17:44:38 +08:00
Xin Wang
35bd83767e Cleanup engine 2026-02-12 17:42:21 +08:00
Xin Wang
838c19bf9c Add env example 2026-02-12 17:02:41 +08:00
Xin Wang
aabf2ce8b9 Fix asr begin error 2026-02-12 16:52:42 +08:00
Xin Wang
543528239e Tune engine vad config 2026-02-12 16:29:55 +08:00