Compare commits


72 Commits

Author SHA1 Message Date
Xin Wang
d87d3616e9 Add debug transcript components 2026-03-13 07:11:48 +08:00
Xin Wang
def6a11338 Update debug drawer records style 2026-03-13 07:09:42 +08:00
Xin Wang
5eec8f2b30 feat: Implement Dify LLM provider and update related configurations and tests 2026-03-11 16:35:59 +08:00
Xin Wang
3b9ee80c8f feat: Add FastGPT interactive voice toggle to DebugDrawer and state management 2026-03-11 13:59:34 +08:00
Xin Wang
9b9fbf432f Fix fastgpt client tool 3-round bugs 2026-03-11 11:33:27 +08:00
Xin Wang
f3612a710d Add fastgpt as separate assistant mode 2026-03-11 08:37:34 +08:00
Xin Wang
13684d498b feat/fix(frontend): update shadcn components, fix debug drawer layout and font sizes 2026-03-10 16:21:58 +08:00
Xin Wang
47293ac46d feat: Add core UI components, Assistants page, Dashscope and Volcengine agent configurations, and a WAV client example. 2026-03-10 03:31:39 +08:00
Xin Wang
373be4eb97 feat: Add DashScope and Volcengine agent configurations, a WAV client for duplex testing, and an Assistants UI page. 2026-03-10 03:13:47 +08:00
Xin Wang
e4ccec6cc1 feat: Introduce DashScope agent configuration, a WAV client for duplex testing, and new UI components for assistants. 2026-03-10 02:25:52 +08:00
Xin Wang
312fe0cf31 Merge branch 'engine-v3' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant into engine-v3 2026-03-09 16:58:17 +08:00
Xin Wang
57264ad831 Merge branch 'engine-v3' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant into engine-v3 2026-03-09 16:57:19 +08:00
Xin Wang
bfe165daae Add DashScope ASR model support and enhance related components
- Introduced DashScope as a new ASR model in the database initialization.
- Updated ASRModel schema to include vendor information.
- Enhanced ASR router to support DashScope-specific functionality, including connection testing and preview capabilities.
- Modified frontend components to accommodate DashScope as a selectable vendor with appropriate default settings.
- Added tests to validate DashScope ASR model creation, updates, and connectivity.
- Updated backend API to handle DashScope-specific base URLs and vendor normalization.
2026-03-09 07:37:00 +08:00
Xin Wang
e07e5128fc Update mkdocs configuration to streamline navigation structure
- Removed redundant entries from the quick start section for clarity.
- Maintained the inclusion of essential topics to ensure comprehensive guidance for users.
2026-03-09 06:54:05 +08:00
Xin Wang
a2fba260fd Merge branch 'engine-v3' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant into engine-v3 2026-03-09 05:41:13 +08:00
Xin Wang
b300b469dc Update documentation for Realtime Agent Studio with enhanced content and structure
- Revised site name and description for clarity and detail.
- Updated navigation structure to better reflect the organization of content.
- Improved changelog entries for better readability and consistency.
- Migrated assistant configuration and prompt guidelines to new documentation paths.
- Enhanced core concepts section to clarify the roles and capabilities of assistants and engines.
- Streamlined workflow documentation to provide clearer guidance on configuration and usage.
2026-03-09 05:38:43 +08:00
Xin Wang
e41d34fe23 Add DashScope agent configuration files for VAD, LLM, TTS, and ASR services
- Introduced new YAML configuration files for DashScope, detailing agent behavior settings for VAD, LLM, TTS, and ASR.
- Configured parameters including model paths, API keys, and service URLs for real-time processing.
- Ensured compatibility with existing agent-side behavior management while providing specific settings for DashScope integration.
2026-03-08 23:28:08 +08:00
Xin Wang
aeeeee20d1 Add Volcengine support for TTS and ASR services
- Introduced Volcengine as a new provider for both TTS and ASR services.
- Updated configuration files to include Volcengine-specific parameters such as app_id, resource_id, and uid.
- Enhanced the ASR service to support streaming mode with Volcengine's API.
- Modified existing tests to validate the integration of Volcengine services.
- Updated documentation to reflect the addition of Volcengine as a supported provider for TTS and ASR.
- Refactored service factory to accommodate Volcengine alongside existing providers.
2026-03-08 23:09:50 +08:00
Xin Wang
3604db21eb Remove obsolete audio example files from the project 2026-03-06 14:43:11 +08:00
Xin Wang
65ae2287d5 Update documentation for assistant configuration and interaction models
- Corrected phrasing in the introduction of RAS as an open-source alternative.
- Added new documentation sections for voice AI and voice agents.
- Enhanced the flowchart for assistant components to include detailed configurations.
- Updated terminology for engine types to clarify distinctions between Pipeline and Realtime engines.
- Introduced a new section on user end-of-utterance (EoU) detection to explain detection mechanisms and configurations.
2026-03-06 14:38:59 +08:00
Xin Wang
da38157638 Add ASR interim results support in Assistant model and API
- Introduced `asr_interim_enabled` field in the Assistant model to control interim ASR results.
- Updated AssistantBase and AssistantUpdate schemas to include the new field.
- Modified the database schema to add the `asr_interim_enabled` column.
- Enhanced runtime metadata to reflect interim ASR settings.
- Updated API endpoints and tests to validate the new functionality.
- Adjusted documentation to include details about interim ASR results configuration.
2026-03-06 12:58:54 +08:00
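The `asr_interim_enabled` field added in this commit can be sketched roughly as below. This is a minimal illustration, not the project's actual models: the real classes are SQLAlchemy/Pydantic models, and everything except the field name is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssistantBase:
    # Hypothetical minimal model; only the flag name comes from the commit.
    name: str
    # When True, the engine streams partial (interim) ASR hypotheses
    # to the client in addition to final transcripts.
    asr_interim_enabled: bool = False

@dataclass
class AssistantUpdate:
    # Optional so a partial update can leave the stored value untouched.
    asr_interim_enabled: Optional[bool] = None

print(AssistantBase(name="support-bot").asr_interim_enabled)  # False
```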
Xin Wang
e11c3abb9e Implement DashScope ASR provider and enhance ASR service architecture
- Added DashScope ASR service implementation for real-time streaming.
- Updated ASR provider logic to support DashScope alongside existing providers.
- Enhanced runtime metadata resolution to include DashScope as a valid ASR provider.
- Modified configuration files and documentation to reflect the addition of DashScope.
- Introduced tests to validate DashScope integration and ASR service behavior.
- Refactored ASR service factory to accommodate new provider options and modes.
2026-03-06 11:44:39 +08:00
Xin Wang
7e0b777923 Refactor project structure and enhance backend integration
- Expanded package inclusion in `pyproject.toml` to support new modules.
- Introduced new `adapters` and `protocol` packages for better organization.
- Added backend adapter implementations for control plane integration.
- Updated main application imports to reflect new package structure.
- Removed deprecated core components and adjusted documentation accordingly.
- Enhanced architecture documentation to clarify the new runtime and integration layers.
2026-03-06 09:51:56 +08:00
Xin Wang
4e2450e800 Refactor backend integration and service architecture
- Removed the backend client compatibility wrapper and associated methods to streamline backend integration.
- Updated session management to utilize control plane gateways and runtime configuration providers.
- Adjusted TTS service implementations to remove the EdgeTTS service and simplify service dependencies.
- Enhanced documentation to reflect changes in backend integration and service architecture.
- Updated configuration files to remove deprecated TTS provider options and clarify available settings.
2026-03-06 09:00:43 +08:00
Xin Wang
6b589a1b7c Enhance session management and logging configuration
- Updated .env.example to clarify audio frame size validation and default codec settings.
- Refactored logging setup in main.py to support JSON serialization based on log format configuration.
- Improved session.py to dynamically compute audio frame bytes and include protocol version in session events.
- Added tests to validate session start events and audio frame handling based on chunk size settings.
2026-03-05 21:44:23 +08:00
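The dynamic audio-frame-bytes computation mentioned above follows from PCM arithmetic: bytes per frame = samples per chunk × channels × sample width. A hedged sketch (the function name and parameters are assumptions, not the actual `session.py` code):

```python
def audio_frame_bytes(sample_rate_hz: int, channels: int,
                      sample_width_bytes: int, chunk_ms: int) -> int:
    """Bytes per PCM audio frame: rate * duration * channels * width."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    return samples_per_chunk * channels * sample_width_bytes

# 16 kHz mono 16-bit PCM in 20 ms chunks -> 640 bytes per frame
print(audio_frame_bytes(16000, 1, 2, 20))  # 640
```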
Xin Wang
1cecbaa172 Update .gitignore and add audio example file
- Removed duplicate entry for Thumbs.db in .gitignore to streamline ignored files.
- Added a new audio example file: three_utterances_simple.wav to the audio_examples directory.
2026-03-05 21:28:17 +08:00
Xin Wang
935f2fbd1f Refactor assistant configuration management and update documentation
- Removed legacy agent profile settings from the .env.example and README, streamlining the configuration process.
- Introduced a new local YAML configuration adapter for assistant settings, allowing for easier management of assistant profiles.
- Updated backend integration documentation to clarify the behavior of assistant config sourcing based on backend URL settings.
- Adjusted various service implementations to directly utilize API keys from the new configuration structure.
- Enhanced test coverage for the new local YAML adapter and its integration with backend services.
2026-03-05 21:24:15 +08:00
Xin Wang
d0a6419990 Remove duplicate entry for Vocode Core from the roadmap documentation, streamlining the list of reference projects. 2026-03-05 13:22:21 +08:00
Xin Wang
b8760c24be Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-05 13:20:40 +08:00
Xin Wang
14abbe6f10 Update roadmap documentation with additional reference projects
- Added new sections for open-source and commercial projects to enhance resource visibility.
- Included links to various relevant projects, expanding the list of resources available for users.
2026-03-05 13:17:37 +08:00
Xin Wang
efdcbe5550 Update roadmap documentation with additional reference projects
- Added new sections for open-source and commercial projects to enhance resource visibility.
- Included links to various relevant projects, expanding the list of resources available for users.
2026-03-05 13:14:22 +08:00
Xin Wang
3b6a2f75ee Add changelog README and update roadmap with reference projects
- Created a new README file for the changelog to outline version history.
- Updated the roadmap documentation to replace the contribution section with a list of reference projects, enhancing resource visibility.
2026-03-05 12:53:18 +08:00
Xin Wang
ac9b0047ee Add Mermaid diagram support and update architecture documentation
- Included a new JavaScript file for Mermaid configuration to ensure consistent diagram sizing across documentation.
- Enhanced architecture documentation to reflect the updated pipeline engine structure, including VAD, ASR, TD, LLM, and TTS components.
- Updated various sections to clarify the integration of external services and tools within the architecture.
- Improved styling for Mermaid diagrams to enhance visual consistency and usability.
2026-03-05 11:01:56 +08:00
Xin Wang
4748f3b5f1 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-04 11:21:47 +08:00
Xin Wang
947af3a525 Refactor mkdocs.yml and add new documentation for workflow configuration and voice customization
- Restructured the navigation in mkdocs.yml to improve organization, introducing subcategories for assistant creation and component libraries.
- Added new documentation for workflow configuration options, detailing setup and best practices.
- Introduced new sections for voice recognition and generation, outlining configuration items and recommendations for optimal performance.
2026-03-04 11:21:33 +08:00
Xin Wang
d572e1a7f0 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-04 11:08:27 +08:00
Xin Wang
d03b3b0e0c Refactor mkdocs.yml for improved navigation structure
- Adjusted indentation in mkdocs.yml to enhance readability and maintain consistency in the navigation hierarchy.
- Ensured that the "功能定制" (feature customization) and "数据分析" (data analysis) sections are clearly organized under their respective categories.
2026-03-04 10:57:18 +08:00
Xin Wang
526024d603 Enhance assistant configuration documentation with details on persistence and runtime overrides
- Added a new section explaining the two layers of assistant configuration: database persistence and session-level overrides.
- Included a table listing fields that are stored in the database and those that can be overridden during a session.
- Provided code examples demonstrating the merging of baseline configuration with session overrides for clarity.
2026-03-04 10:57:02 +08:00
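The two-layer configuration described above (database baseline plus session-level overrides) amounts to a merge where unset override keys fall back to the stored values. A minimal sketch, assuming a shallow merge and illustrative field names:

```python
def merge_session_config(baseline: dict, overrides: dict) -> dict:
    """Shallow-merge session overrides onto the persisted baseline.

    Keys absent from `overrides` (or set to None) keep the stored value.
    """
    merged = dict(baseline)
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged

baseline = {"voice": "aria", "speed": 1.0, "asr_language": "zh"}
overrides = {"speed": 1.2, "asr_language": None}
print(merge_session_config(baseline, overrides))
# {'voice': 'aria', 'speed': 1.2, 'asr_language': 'zh'}
```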
Xin Wang
b4c6277d2a Add telephone integration to roadmap documentation
- Included a new item in the roadmap for telephone integration, specifying automatic call handling and batch calling capabilities.
- Updated the existing SDK support section to reflect the addition of this feature.
2026-03-04 10:42:41 +08:00
Xin Wang
a8fa66e9cc Update documentation to reflect changes in WebSocket API message formatting and knowledge base
- Updated the WebSocket API reference to improve clarity by removing unnecessary headings and emphasizing message types.
- Revised the index.md to specify 'chroma' as the knowledge base, enhancing the overview of the platform's architecture.
2026-03-04 10:32:56 +08:00
Xin Wang
aaef370d70 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-04 10:01:41 +08:00
Xin Wang
7d4af18815 Add output.audio.played message handling and update documentation
- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
2026-03-04 10:01:34 +08:00
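A client acknowledgment of the kind this commit introduces might look like the sketch below. Only the `output.audio.played` type string comes from the commit; the other field names are assumptions for illustration:

```python
import json

def build_audio_played_ack(item_id: str, session_id: str) -> str:
    """Hypothetical client ack telling the pipeline playback finished."""
    return json.dumps({
        "type": "output.audio.played",  # documented message type
        "session_id": session_id,       # assumed field
        "item_id": item_id,             # assumed: which utterance finished
    })

msg = json.loads(build_audio_played_ack("utt-42", "sess-1"))
print(msg["type"])  # output.audio.played
```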
Xin Wang
530d95eea4 Enhance Docker configuration and update dependencies for Realtime Agent Studio
- Updated Dockerfile for the API to include build tools for C++11 required for native extensions.
- Revised requirements.txt to upgrade several dependencies, including FastAPI and SQLAlchemy.
- Expanded docker-compose.yml to add MinIO service for S3-compatible storage and improved health checks for backend and engine services.
- Enhanced README.md in the Docker directory to provide detailed service descriptions and quick start instructions.
- Updated mkdocs.yml to reflect new navigation structure and added deployment overview documentation.
- Introduced new Dockerfiles for the engine and web services, including development configurations for hot reloading.
2026-03-04 10:01:00 +08:00
Xin Wang
4c05131536 Update documentation and configuration for Realtime Agent Studio
- Revised mkdocs.yml to reflect the new site name and description, enhancing clarity for users.
- Added a changelog.md to document important changes and updates for the project.
- Introduced a roadmap.md to outline development plans and progress for future releases.
- Expanded index.md with a comprehensive overview of the platform, including core features and installation instructions.
- Enhanced concepts documentation with detailed explanations of assistants, engines, and their configurations.
- Updated configuration documentation to provide clear guidance on environment setup and service configurations.
- Added extra JavaScript for improved user experience in the documentation site.
2026-03-02 23:35:22 +08:00
Xin Wang
80fff09b76 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-02 22:51:03 +08:00
Xin Wang
eecde9f0fb Integrate React Query for data management and enhance Debug Preferences
- Added React Query for managing API calls related to assistants and voices.
- Introduced `useAssistantsQuery` and `useVoicesQuery` hooks for fetching data.
- Implemented mutations for creating, updating, and deleting voices using React Query.
- Integrated a global `QueryClient` for managing query states and configurations.
- Refactored components to utilize the new query hooks, improving data handling and performance.
- Added a Zustand store for managing debug preferences, including WebSocket URL and audio settings.
2026-03-02 22:50:57 +08:00
Xin Wang
7fbf52078f Update documentation to reflect changes in quickstart navigation and API reference
- Replaced the "通过控制台" (via console) and "通过 API" (via API) entries in the quickstart section with "资源库配置" (resource library configuration) for improved clarity.
- Updated the API reference link in index.md to direct users to the main quickstart page instead of the outdated API usage example.
2026-03-02 17:33:32 +08:00
Xin Wang
a003134477 Update documentation to enhance clarity and resource configuration for RAS
- Revised the introduction in index.md to emphasize the need for resource configuration before creating an AI assistant.
- Added a new section detailing the configuration process for ASR, LLM, and TTS resources.
- Updated the quickstart guide to reflect the new resource management steps and included troubleshooting tips for common issues.
- Removed the outdated API guide as it has been integrated into the new resource configuration workflow.
2026-03-02 17:30:48 +08:00
Xin Wang
85315ba6ca Update index.md to clarify RAS's core focus on large voice models
- Revised the description of the Realtime Agent Studio (RAS) to emphasize its foundation on large voice models, enhancing clarity on the platform's capabilities.
2026-03-02 17:01:55 +08:00
Xin Wang
9734b38808 Add task list support and update roadmap in documentation
- Added pymdownx.tasklist extension to mkdocs.yml for enhanced task management.
- Revised the roadmap section in index.md to include additional completed and in-progress tasks, improving project tracking and visibility.
2026-03-02 17:01:24 +08:00
Xin Wang
0a7a3253a6 Add emoji support and enhance documentation in RAS
- Added pymdownx.emoji extension to mkdocs.yml for emoji rendering.
- Updated index.md to include a new dashboard image and revised descriptions for clarity.
- Expanded the features section with detailed descriptions of tools and testing capabilities.
- Introduced a roadmap section outlining completed, in-progress, and to-do features for better project visibility.
2026-03-02 16:50:17 +08:00
Xin Wang
a82100fc79 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-02 15:12:04 +08:00
Xin Wang
d0897aca92 Update documentation to reflect rebranding from AI Video Assistant to Realtime Agent Studio (RAS)
- Changed site name and description in mkdocs.yml.
- Revised content in index.md to provide a comprehensive overview of RAS features and capabilities.
- Updated API reference and error documentation to replace AI Video Assistant with RAS.
- Modified deployment and getting started guides to align with the new branding.
- Enhanced quickstart instructions to specify RAS service requirements.
2026-03-02 15:11:33 +08:00
Xin Wang
70b4043f9b Enhance DebugDrawer to support voice prompts in text prompt dialogs
- Added `promptType` and `voiceText` properties to `DebugTextPromptDialogState`.
- Updated state management for text prompt dialogs to handle voice prompts.
- Modified dialog activation logic to play voice prompts when applicable.
- Adjusted UI to reflect the type of prompt being displayed (text or voice).
- Ensured proper handling of prompt closure messages based on prompt type.
2026-03-02 15:10:03 +08:00
Xin Wang
3aa9e0f432 Enhance DuplexPipeline to support follow-up context for manual opener tool calls
- Introduced logic to trigger a follow-up turn when the manual opener greeting is empty.
- Updated `_execute_manual_opener_tool_calls` to return structured tool call and result data.
- Added `_build_manual_opener_follow_up_context` method to construct context for follow-up turns.
- Modified `_handle_turn` to accept system context for improved conversation management.
- Enhanced tests to validate the new follow-up behavior and ensure proper context handling.
2026-03-02 14:27:44 +08:00
Xin Wang
fb017f9952 Refactor selectedToolSchemas logic in DebugDrawer to simplify tool ID normalization. Removed redundant inclusion of DEBUG_CLIENT_TOOLS, enhancing code clarity and performance. 2026-03-02 12:40:00 +08:00
Xin Wang
00b88c5afa Add manual opener tool calls to Assistant model and API
- Introduced `manual_opener_tool_calls` field in the Assistant model to support custom tool calls.
- Updated AssistantBase and AssistantUpdate schemas to include the new field.
- Implemented normalization and migration logic for handling manual opener tool calls in the API.
- Enhanced runtime metadata to include manual opener tool calls in responses.
- Updated tests to validate the new functionality and ensure proper handling of tool calls.
- Refactored tool ID normalization to support legacy tool names for backward compatibility.
2026-03-02 12:34:42 +08:00
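The legacy tool-name normalization mentioned in the last bullet can be pictured as an alias table consulted before lookup. The mapping entries below are illustrative assumptions, not the project's actual table:

```python
# Hypothetical alias table mapping legacy tool names to current IDs.
LEGACY_TOOL_ALIASES = {
    "voice_prompt": "voice_choice_prompt",
    "text_prompt": "text_choice_prompt",
}

def normalize_tool_id(tool_id: str) -> str:
    """Lowercase, trim, and translate legacy names to current tool IDs."""
    key = tool_id.strip().lower()
    return LEGACY_TOOL_ALIASES.get(key, key)

print(normalize_tool_id("Voice_Prompt"))  # voice_choice_prompt
```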
Xin Wang
b5cdb76e52 Implement initial generated opener logic in DuplexPipeline to utilize tool-capable assistant turns when tools are available. Update tests to verify the correct behavior of the generated opener under various conditions, ensuring proper handling of user input and task management. 2026-03-02 02:47:30 +08:00
Xin Wang
4d553de34d Refactor assistant greeting logic to conditionally use system prompt for generated openers. Update related tests to verify new behavior and ensure correct metadata handling in API responses. Enhance UI to reflect changes in opener management based on generated opener settings. 2026-03-02 02:38:45 +08:00
Xin Wang
31b3969b96 Enhance ToolLibrary by adding sourceKey to ToolParameterDraft and updating related functions for improved schema management. Introduce normalization functions for object schemas and defaults, and refactor buildToolParameterConfig to utilize these enhancements. Update state management in ToolLibraryPage to accommodate new schema handling and defaults integration. 2026-03-02 02:18:28 +08:00
Xin Wang
3f22e2b875 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-03-02 01:56:47 +08:00
Xin Wang
531688aa6b Enhance API documentation by adding new endpoints for ASR preview, assistant configuration retrieval, and knowledge base management. Update existing assistant and tool definitions for improved clarity and functionality. Remove outdated sections from history records documentation, ensuring a streamlined reference for users. 2026-03-02 01:56:38 +08:00
Xin Wang
3626297211 Implement schema editor functionality in ToolLibrary, allowing users to manage tool parameters with JSON schema validation. Add a drawer for schema editing, enhance state management for schema-related errors, and integrate schema defaults into tool parameter configuration. Update UI to include a button for opening the schema drawer. 2026-03-02 01:54:54 +08:00
Xin Wang
1561056a3d Add voice_choice_prompt and text_choice_prompt tools to API and UI. Implement state management and parameter definitions for user selection prompts, enhancing user interaction and experience. 2026-03-02 00:49:31 +08:00
Xin Wang
3a5d27d6c3 Implement runtime configuration debugging in DebugDrawer by adding a new function to format session metadata and WebSocket configuration. Update the display logic to enhance clarity and user experience, including renaming UI elements for better context. 2026-03-01 23:14:08 +08:00
Xin Wang
3643431565 Enhance WebSocket session configuration by introducing an optional config.resolved event, which provides a public snapshot of the session's configuration. Update the API reference documentation to clarify the conditions under which this event is emitted and the details it includes. Modify session management to respect the new setting for emitting configuration details, ensuring sensitive information remains secure. Update tests to validate the new behavior and ensure compliance with the updated configuration schema. 2026-03-01 23:08:44 +08:00
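Keeping sensitive information out of the optional `config.resolved` snapshot reduces to filtering known secret keys before emitting the event. A hedged sketch; the key names are assumptions based on the commit summary:

```python
# Assumed set of keys to redact from the public snapshot.
SENSITIVE_KEYS = {"api_key", "app_id", "api_url"}

def public_config_snapshot(config: dict) -> dict:
    """Return a copy of the session config safe to emit to clients."""
    return {k: v for k, v in config.items() if k not in SENSITIVE_KEYS}

resolved = {"asr_provider": "dashscope", "api_key": "sk-secret", "speed": 1.0}
print(public_config_snapshot(resolved))
# {'asr_provider': 'dashscope', 'speed': 1.0}
```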
Xin Wang
2418df80e5 Revamp documentation structure in mkdocs.yml by reorganizing navigation for improved accessibility. Remove outdated content from previous sections and introduce new topics including detailed guides on assistant management, configuration options, and tool integrations. Enhance API reference documentation with comprehensive error codes and WebSocket protocol details. Add new sections for automated testing, data analysis, and knowledge base management, ensuring a cohesive and user-friendly documentation experience. 2026-03-01 22:38:50 +08:00
Xin Wang
6a46ec69f4 Enhance WebSocket session management by requiring assistant_id as a query parameter for connection. Update API reference documentation to reflect changes in message flow and metadata validation rules, including the introduction of whitelists for allowed metadata fields and restrictions on sensitive keys. Refactor client examples to align with the new session initiation process. 2026-03-01 14:10:38 +08:00
Xin Wang
b4fa664d73 Refactor WebSocket authentication handling by removing auth requirements from the hello message. Update related documentation and schemas to reflect the changes in authentication strategy, simplifying the connection process. 2026-02-28 17:33:40 +08:00
Xin Wang
0821d73e7c Add API reference documentation for WebSocket communication. Update mkdocs.yml to include new API reference section. 2026-02-28 14:37:58 +08:00
Xin Wang
a7da109983 Merge branch 'master' of https://gitea.xiaowang.eu.org/wx44wx/AI-VideoAssistant 2026-02-28 12:33:23 +08:00
Xin Wang
c4c473105e Add start-dev.ps1 script to automate the launch of development services in the pycall conda environment. The script initiates the API, Web, and Engine services in separate PowerShell windows, enhancing the development workflow. 2026-02-28 11:26:52 +08:00
240 changed files with 26852 additions and 7229 deletions

.gitignore (5 changed lines)

@@ -1,6 +1,3 @@
# OS artifacts
.DS_Store
Thumbs.db
# Workspace runtime data
data/
Thumbs.db


@@ -1,15 +1,17 @@
FROM python:3.12-slim

# Install build tools for C++11 (needed for native extensions, e.g. chromadb)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY . .

# Create data directory
RUN mkdir -p /app/data

EXPOSE 8100


@@ -117,6 +117,7 @@ class Assistant(Base):
    call_count: Mapped[int] = mapped_column(Integer, default=0)
    first_turn_mode: Mapped[str] = mapped_column(String(32), default="bot_first")
    opener: Mapped[str] = mapped_column(Text, default="")
    manual_opener_tool_calls: Mapped[list] = mapped_column(JSON, default=list)
    generated_opener_enabled: Mapped[bool] = mapped_column(default=False)
    prompt: Mapped[str] = mapped_column(Text, default="")
    knowledge_base_id: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
@@ -126,11 +127,13 @@ class Assistant(Base):
    speed: Mapped[float] = mapped_column(Float, default=1.0)
    hotwords: Mapped[list] = mapped_column(JSON, default=list)
    tools: Mapped[list] = mapped_column(JSON, default=list)
    asr_interim_enabled: Mapped[bool] = mapped_column(default=False)
    bot_cannot_be_interrupted: Mapped[bool] = mapped_column(default=False)
    interruption_sensitivity: Mapped[int] = mapped_column(Integer, default=500)
    config_mode: Mapped[str] = mapped_column(String(32), default="platform")
    api_url: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
    api_key: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
    app_id: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
    # Model associations
    llm_model_id: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    asr_model_id: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)


@@ -1,6 +1,14 @@
import asyncio
import base64
import io
import json
import os
import sys
import threading
import time
import wave
from array import array
from typing import Any, Dict, List, Optional, Tuple

import httpx
from fastapi import APIRouter, Depends, File, Form, HTTPException, UploadFile
@@ -17,6 +25,32 @@ from ..schemas import (
router = APIRouter(prefix="/asr", tags=["ASR Models"])
OPENAI_COMPATIBLE_DEFAULT_ASR_MODEL = "FunAudioLLM/SenseVoiceSmall"
DASHSCOPE_DEFAULT_ASR_MODEL = "qwen3-asr-flash-realtime"
DASHSCOPE_DEFAULT_BASE_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
try:
    import dashscope
    from dashscope.audio.qwen_omni import MultiModality, OmniRealtimeCallback, OmniRealtimeConversation

    try:
        from dashscope.audio.qwen_omni import TranscriptionParams
    except ImportError:
        from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams

    DASHSCOPE_SDK_AVAILABLE = True
    DASHSCOPE_IMPORT_ERROR = ""
except Exception as exc:
    dashscope = None  # type: ignore[assignment]
    MultiModality = None  # type: ignore[assignment]
    OmniRealtimeConversation = None  # type: ignore[assignment]
    TranscriptionParams = None  # type: ignore[assignment]
    DASHSCOPE_SDK_AVAILABLE = False
    DASHSCOPE_IMPORT_ERROR = f"{type(exc).__name__}: {exc}"

    class OmniRealtimeCallback:  # type: ignore[no-redef]
        """Fallback callback base when DashScope SDK is unavailable."""

        pass
def _is_openai_compatible_vendor(vendor: str) -> bool:
@@ -29,12 +63,377 @@ def _is_openai_compatible_vendor(vendor: str) -> bool:
    }


def _is_dashscope_vendor(vendor: str) -> bool:
    return (vendor or "").strip().lower() == "dashscope"


def _default_asr_model(vendor: str) -> str:
    if _is_openai_compatible_vendor(vendor):
        return OPENAI_COMPATIBLE_DEFAULT_ASR_MODEL
    if _is_dashscope_vendor(vendor):
        return DASHSCOPE_DEFAULT_ASR_MODEL
    return "whisper-1"


def _dashscope_language(language: Optional[str]) -> Optional[str]:
    normalized = (language or "").strip().lower()
    if not normalized or normalized in {"multi-lingual", "multilingual", "multi_lingual", "auto"}:
        return None
    if normalized.startswith("zh"):
        return "zh"
    if normalized.startswith("en"):
        return "en"
    return normalized
class _DashScopePreviewCallback(OmniRealtimeCallback):
"""Collect DashScope ASR websocket events for preview/test flows."""
def __init__(self) -> None:
super().__init__()
self._open_event = threading.Event()
self._session_ready_event = threading.Event()
self._done_event = threading.Event()
self._lock = threading.Lock()
self._final_text = ""
self._last_interim_text = ""
self._error_message: Optional[str] = None
def on_open(self) -> None:
self._open_event.set()
def on_close(self, code: int, reason: str) -> None:
if self._done_event.is_set():
return
self._error_message = f"DashScope websocket closed unexpectedly: {code} {reason}"
self._done_event.set()
self._session_ready_event.set()
def on_error(self, message: Any) -> None:
self._error_message = str(message)
self._done_event.set()
self._session_ready_event.set()
def on_event(self, response: Any) -> None:
payload = _coerce_dashscope_event(response)
event_type = str(payload.get("type") or "").strip()
if not event_type:
return
if event_type in {"session.created", "session.updated"}:
self._session_ready_event.set()
return
if event_type == "error" or event_type.endswith(".failed"):
self._error_message = _format_dashscope_error_event(payload)
self._done_event.set()
self._session_ready_event.set()
return
if event_type == "conversation.item.input_audio_transcription.text":
interim_text = _extract_dashscope_text(payload, keys=("stash", "text", "transcript"))
if interim_text:
with self._lock:
self._last_interim_text = interim_text
return
if event_type == "conversation.item.input_audio_transcription.completed":
final_text = _extract_dashscope_text(payload, keys=("transcript", "text", "stash"))
with self._lock:
if final_text:
self._final_text = final_text
self._done_event.set()
return
if event_type in {"response.done", "session.finished"}:
self._done_event.set()
def wait_for_open(self, timeout: float = 10.0) -> None:
if not self._open_event.wait(timeout):
raise TimeoutError("DashScope websocket open timeout")
def wait_for_session_ready(self, timeout: float = 6.0) -> bool:
return self._session_ready_event.wait(timeout)
def wait_for_done(self, timeout: float = 20.0) -> None:
if not self._done_event.wait(timeout):
raise TimeoutError("DashScope transcription timeout")
def raise_if_error(self) -> None:
if self._error_message:
raise RuntimeError(self._error_message)
def read_text(self) -> str:
with self._lock:
return self._final_text or self._last_interim_text
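The preview callback above serializes the SDK's worker-thread milestones back to the caller through three `threading.Event` flags (open, session ready, done). A minimal standalone sketch of that handshake, with a stand-in worker instead of the real DashScope SDK thread:

```python
import threading

open_evt = threading.Event()
ready_evt = threading.Event()
done_evt = threading.Event()

def sdk_worker() -> None:
    # Stand-in for the SDK callback thread: fire the same three
    # milestones the preview callback waits on.
    open_evt.set()    # on_open
    ready_evt.set()   # session.created / session.updated
    done_evt.set()    # transcription completed (or error)

thread = threading.Thread(target=sdk_worker)
thread.start()
if not open_evt.wait(timeout=5.0):
    raise TimeoutError("websocket open timeout")
ready_evt.wait(timeout=5.0)
done_evt.wait(timeout=5.0)
thread.join()
print("handshake complete")
```

Note the error paths above set all pending events before recording the message, so a waiting caller is released instead of hitting its timeout.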
def _coerce_dashscope_event(response: Any) -> Dict[str, Any]:
if isinstance(response, dict):
return response
if isinstance(response, str):
try:
parsed = json.loads(response)
if isinstance(parsed, dict):
return parsed
except json.JSONDecodeError:
pass
return {"type": "raw", "message": str(response)}
def _format_dashscope_error_event(payload: Dict[str, Any]) -> str:
error = payload.get("error")
if isinstance(error, dict):
code = str(error.get("code") or "").strip()
message = str(error.get("message") or "").strip()
if code and message:
return f"{code}: {message}"
return message or str(error)
return str(error or "DashScope realtime ASR error")
def _extract_dashscope_text(payload: Dict[str, Any], *, keys: Tuple[str, ...]) -> str:
for key in keys:
value = payload.get(key)
if isinstance(value, str) and value.strip():
return value.strip()
if isinstance(value, dict):
nested = _extract_dashscope_text(value, keys=keys)
if nested:
return nested
for value in payload.values():
if isinstance(value, dict):
nested = _extract_dashscope_text(value, keys=keys)
if nested:
return nested
return ""
def _create_dashscope_realtime_client(
*,
model: str,
callback: _DashScopePreviewCallback,
url: str,
api_key: str,
) -> Any:
if OmniRealtimeConversation is None:
raise RuntimeError("DashScope SDK unavailable")
init_kwargs = {
"model": model,
"callback": callback,
"url": url,
}
try:
return OmniRealtimeConversation(api_key=api_key, **init_kwargs) # type: ignore[misc]
except TypeError as exc:
if "api_key" not in str(exc):
raise
return OmniRealtimeConversation(**init_kwargs) # type: ignore[misc]
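The `TypeError` retry above guards against SDK builds whose constructor does not accept `api_key`. The same probe-and-retry pattern in isolation — `LegacyClient` here is a hypothetical stand-in, not the DashScope class:

```python
from typing import Any

class LegacyClient:
    # Hypothetical older SDK build: constructor has no api_key parameter.
    def __init__(self, *, model: str, url: str) -> None:
        self.model = model
        self.url = url

def create_client(cls: type, *, api_key: str, **kwargs: Any) -> Any:
    try:
        return cls(api_key=api_key, **kwargs)
    except TypeError as exc:
        # Only swallow the error when it is about the api_key kwarg;
        # any other signature mismatch should still surface.
        if "api_key" not in str(exc):
            raise
        return cls(**kwargs)

client = create_client(LegacyClient, api_key="sk-demo", model="qwen", url="wss://example")
print(type(client).__name__)  # LegacyClient
```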
def _close_dashscope_client(client: Any) -> None:
finish_fn = getattr(client, "finish", None)
if callable(finish_fn):
try:
finish_fn()
except Exception:
pass
close_fn = getattr(client, "close", None)
if callable(close_fn):
try:
close_fn()
except Exception:
pass
def _configure_dashscope_session(
*,
client: Any,
callback: _DashScopePreviewCallback,
sample_rate: int,
language: Optional[str],
) -> None:
update_fn = getattr(client, "update_session", None)
if not callable(update_fn):
raise RuntimeError("DashScope ASR SDK missing update_session method")
text_modality: Any = "text"
if MultiModality is not None and hasattr(MultiModality, "TEXT"):
text_modality = MultiModality.TEXT
transcription_params: Optional[Any] = None
language_hint = _dashscope_language(language)
if TranscriptionParams is not None:
try:
params_kwargs: Dict[str, Any] = {
"sample_rate": sample_rate,
"input_audio_format": "pcm",
}
if language_hint:
params_kwargs["language"] = language_hint
transcription_params = TranscriptionParams(**params_kwargs)
except Exception:
transcription_params = None
update_attempts = [
{
"output_modalities": [text_modality],
"enable_turn_detection": False,
"enable_input_audio_transcription": True,
"transcription_params": transcription_params,
},
{
"output_modalities": [text_modality],
"enable_turn_detection": False,
"enable_input_audio_transcription": True,
},
{
"output_modalities": [text_modality],
},
]
last_error: Optional[Exception] = None
for params in update_attempts:
if params.get("transcription_params") is None:
params = {key: value for key, value in params.items() if key != "transcription_params"}
try:
update_fn(**params)
callback.wait_for_session_ready()
callback.raise_if_error()
return
except TypeError as exc:
last_error = exc
continue
except Exception as exc:
last_error = exc
continue
raise RuntimeError(f"DashScope ASR session.update failed: {last_error}")
def _load_wav_pcm16_mono(audio_bytes: bytes) -> Tuple[bytes, int]:
try:
with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
channel_count = wav_file.getnchannels()
sample_width = wav_file.getsampwidth()
sample_rate = wav_file.getframerate()
compression = wav_file.getcomptype()
pcm_frames = wav_file.readframes(wav_file.getnframes())
except wave.Error as exc:
raise RuntimeError("DashScope preview currently supports WAV audio. Record in browser or upload a .wav file.") from exc
if compression != "NONE":
raise RuntimeError("DashScope preview requires uncompressed PCM WAV audio.")
if sample_width != 2:
raise RuntimeError("DashScope preview requires 16-bit PCM WAV audio.")
if not pcm_frames:
raise RuntimeError("Uploaded WAV file is empty")
if channel_count <= 1:
return pcm_frames, sample_rate
samples = array("h")
samples.frombytes(pcm_frames)
if sys.byteorder == "big":
samples.byteswap()
mono_samples = array(
"h",
(
int(sum(samples[index:index + channel_count]) / channel_count)
for index in range(0, len(samples), channel_count)
),
)
if sys.byteorder == "big":
mono_samples.byteswap()
return mono_samples.tobytes(), sample_rate
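The downmix above averages each frame's channels into one 16-bit sample. An endian-safe, end-to-end sketch using an in-memory stereo WAV:

```python
import io
import struct
import wave

# Build a tiny 2-channel, 16-bit, 16 kHz WAV in memory: two frames,
# each an (L, R) pair of samples.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<4h", 100, 300, -200, -400))

with wave.open(io.BytesIO(buf.getvalue()), "rb") as w:
    channels = w.getnchannels()
    frames = w.readframes(w.getnframes())

# Average each frame's channels into a mono sample, as the endpoint does.
samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
mono = [
    int(sum(samples[i:i + channels]) / channels)
    for i in range(0, len(samples), channels)
]
print(mono)  # [200, -300]
```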
def _probe_dashscope_asr_connection(*, api_key: str, base_url: str, model: str, language: Optional[str]) -> None:
if not DASHSCOPE_SDK_AVAILABLE:
hint = f"`{sys.executable} -m pip install dashscope>=1.25.11`"
detail = f"; import error: {DASHSCOPE_IMPORT_ERROR}" if DASHSCOPE_IMPORT_ERROR else ""
raise RuntimeError(f"dashscope package not installed; install with {hint}{detail}")
callback = _DashScopePreviewCallback()
if dashscope is not None:
dashscope.api_key = api_key
client = _create_dashscope_realtime_client(
model=model,
callback=callback,
url=base_url,
api_key=api_key,
)
try:
client.connect()
callback.wait_for_open()
_configure_dashscope_session(
client=client,
callback=callback,
sample_rate=16000,
language=language,
)
finally:
_close_dashscope_client(client)
def _transcribe_dashscope_preview(
*,
audio_bytes: bytes,
api_key: str,
base_url: str,
model: str,
language: Optional[str],
) -> Dict[str, Any]:
if not DASHSCOPE_SDK_AVAILABLE:
hint = f"`{sys.executable} -m pip install dashscope>=1.25.11`"
detail = f"; import error: {DASHSCOPE_IMPORT_ERROR}" if DASHSCOPE_IMPORT_ERROR else ""
raise RuntimeError(f"dashscope package not installed; install with {hint}{detail}")
pcm_audio, sample_rate = _load_wav_pcm16_mono(audio_bytes)
callback = _DashScopePreviewCallback()
if dashscope is not None:
dashscope.api_key = api_key
client = _create_dashscope_realtime_client(
model=model,
callback=callback,
url=base_url,
api_key=api_key,
)
try:
client.connect()
callback.wait_for_open()
_configure_dashscope_session(
client=client,
callback=callback,
sample_rate=sample_rate,
language=language,
)
append_fn = getattr(client, "append_audio", None)
if not callable(append_fn):
raise RuntimeError("DashScope ASR SDK missing append_audio method")
commit_fn = getattr(client, "commit", None)
if not callable(commit_fn):
raise RuntimeError("DashScope ASR SDK missing commit method")
append_fn(base64.b64encode(pcm_audio).decode("ascii"))
commit_fn()
callback.wait_for_done()
callback.raise_if_error()
return {
"transcript": callback.read_text(),
"language": _dashscope_language(language) or "Multi-lingual",
"confidence": None,
}
finally:
_close_dashscope_client(client)
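The realtime API takes audio as base64 text over the websocket, which is why the preview base64-encodes the raw PCM before `append_audio`. The encode/decode round trip in isolation:

```python
import base64

pcm_chunk = bytes(range(16))  # stand-in for little-endian 16-bit PCM bytes
encoded = base64.b64encode(pcm_chunk).decode("ascii")

print(encoded)  # AAECAwQFBgcICQoLDA0ODw==
# Decoding recovers the exact PCM bytes that were appended.
assert base64.b64decode(encoded) == pcm_chunk
```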
# ============ ASR Models CRUD ============
@router.get("")
def list_asr_models(
@@ -132,6 +531,27 @@ def test_asr_model(
start_time = time.time()
try:
if _is_dashscope_vendor(model.vendor):
effective_api_key = (model.api_key or "").strip() or os.getenv("DASHSCOPE_API_KEY", "").strip() or os.getenv("ASR_API_KEY", "").strip()
if not effective_api_key:
return ASRTestResponse(success=False, error=f"API key is required for ASR model: {model.name}")
base_url = (model.base_url or "").strip() or DASHSCOPE_DEFAULT_BASE_URL
selected_model = (model.model_name or "").strip() or _default_asr_model(model.vendor)
_probe_dashscope_asr_connection(
api_key=effective_api_key,
base_url=base_url,
model=selected_model,
language=model.language,
)
latency_ms = int((time.time() - start_time) * 1000)
return ASRTestResponse(
success=True,
language=model.language,
latency_ms=latency_ms,
message="DashScope realtime ASR connected",
)
# Run the connectivity test first, to avoid depending on real audio input
headers = {"Authorization": f"Bearer {model.api_key}"}
with httpx.Client(timeout=60.0) as client:
@@ -246,7 +666,7 @@ async def preview_asr_model(
api_key: Optional[str] = Form(None),
db: Session = Depends(get_db),
):
"""预览 ASR上传音频并调用 OpenAI-compatible /audio/transcriptions"""
"""预览 ASR根据供应商调用 OpenAI-compatible 或 DashScope 实时识别"""
model = db.query(ASRModel).filter(ASRModel.id == id).first()
if not model:
raise HTTPException(status_code=404, detail="ASR Model not found")
@@ -264,18 +684,50 @@ async def preview_asr_model(
raise HTTPException(status_code=400, detail="Uploaded audio file is empty")
effective_api_key = (api_key or "").strip() or (model.api_key or "").strip()
if not effective_api_key and _is_openai_compatible_vendor(model.vendor):
effective_api_key = os.getenv("SILICONFLOW_API_KEY", "").strip()
if not effective_api_key:
if _is_openai_compatible_vendor(model.vendor):
effective_api_key = os.getenv("SILICONFLOW_API_KEY", "").strip()
elif _is_dashscope_vendor(model.vendor):
effective_api_key = os.getenv("DASHSCOPE_API_KEY", "").strip() or os.getenv("ASR_API_KEY", "").strip()
if not effective_api_key:
raise HTTPException(status_code=400, detail=f"API key is required for ASR model: {model.name}")
base_url = (model.base_url or "").strip().rstrip("/")
if _is_dashscope_vendor(model.vendor) and not base_url:
base_url = DASHSCOPE_DEFAULT_BASE_URL
if not base_url:
raise HTTPException(status_code=400, detail=f"Base URL is required for ASR model: {model.name}")
selected_model = (model.model_name or "").strip() or _default_asr_model(model.vendor)
data = {"model": selected_model}
effective_language = (language or "").strip() or None
start_time = time.time()
if _is_dashscope_vendor(model.vendor):
try:
payload = await asyncio.to_thread(
_transcribe_dashscope_preview,
audio_bytes=audio_bytes,
api_key=effective_api_key,
base_url=base_url,
model=selected_model,
language=effective_language or model.language,
)
except Exception as exc:
raise HTTPException(status_code=502, detail=f"DashScope ASR request failed: {exc}") from exc
transcript = str(payload.get("transcript") or "")
response_language = str(payload.get("language") or effective_language or model.language)
latency_ms = int((time.time() - start_time) * 1000)
return ASRTestResponse(
success=bool(transcript),
transcript=transcript,
language=response_language,
confidence=None,
latency_ms=latency_ms,
message=None if transcript else "No transcript in response",
)
data = {"model": selected_model}
if effective_language:
data["language"] = effective_language
if model.hotwords:
@@ -284,7 +736,6 @@ async def preview_asr_model(
headers = {"Authorization": f"Bearer {effective_api_key}"}
files = {"file": (filename, audio_bytes, content_type)}
start_time = time.time()
try:
with httpx.Client(timeout=90.0) as client:
response = client.post(


@@ -7,6 +7,7 @@ from pathlib import Path
import httpx
from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import FileResponse
from sqlalchemy import inspect, text
from sqlalchemy.orm import Session
from typing import Any, Dict, List, Optional
import uuid
@@ -27,6 +28,7 @@ from .tools import (
TOOL_CATEGORY_MAP,
TOOL_PARAMETER_DEFAULTS,
TOOL_WAIT_FOR_RESPONSE_DEFAULTS,
normalize_tool_id,
_ensure_tool_resource_schema,
)
@@ -111,9 +113,97 @@ def _compose_runtime_system_prompt(base_prompt: Optional[str]) -> str:
return f"{raw}\n\n{tool_policy}" if raw else tool_policy
def _ensure_assistant_schema(db: Session) -> None:
"""Apply lightweight SQLite migrations for newly added assistants columns."""
bind = db.get_bind()
inspector = inspect(bind)
try:
columns = {col["name"] for col in inspector.get_columns("assistants")}
except Exception:
return
altered = False
if "manual_opener_tool_calls" not in columns:
db.execute(text("ALTER TABLE assistants ADD COLUMN manual_opener_tool_calls JSON"))
altered = True
if "asr_interim_enabled" not in columns:
db.execute(text("ALTER TABLE assistants ADD COLUMN asr_interim_enabled BOOLEAN DEFAULT 0"))
altered = True
if "app_id" not in columns:
db.execute(text("ALTER TABLE assistants ADD COLUMN app_id VARCHAR(255)"))
altered = True
if altered:
db.commit()
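The schema guard above is an additive SQLite migration: inspect the live column set, `ALTER TABLE ... ADD COLUMN` for anything missing, then commit once. The same pattern with the stdlib `sqlite3` driver (table and column names mirror the code above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assistants (id TEXT PRIMARY KEY)")

# Inspect the live column set, then add whatever is missing.
existing = {row[1] for row in conn.execute("PRAGMA table_info(assistants)")}
altered = False
for name, ddl in [
    ("asr_interim_enabled", "BOOLEAN DEFAULT 0"),
    ("app_id", "VARCHAR(255)"),
]:
    if name not in existing:
        conn.execute(f"ALTER TABLE assistants ADD COLUMN {name} {ddl}")
        altered = True
if altered:
    conn.commit()

columns = [row[1] for row in conn.execute("PRAGMA table_info(assistants)")]
print(columns)  # ['id', 'asr_interim_enabled', 'app_id']
```

Re-running the same block against an already-migrated table is a no-op, which is what lets the router call the guard on every request.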
def _normalize_manual_opener_tool_calls(raw: Any, warnings: Optional[List[str]] = None) -> List[Dict[str, Any]]:
normalized: List[Dict[str, Any]] = []
if not isinstance(raw, list):
return normalized
for idx, item in enumerate(raw):
if not isinstance(item, dict):
if warnings is not None:
warnings.append(f"Ignored invalid manual opener tool call at index {idx}: not an object")
continue
tool_name = normalize_tool_id(str(
item.get("toolName")
or item.get("tool_name")
or item.get("name")
or ""
).strip())
if not tool_name:
if warnings is not None:
warnings.append(f"Ignored invalid manual opener tool call at index {idx}: missing toolName")
continue
args_raw = item.get("arguments")
args: Dict[str, Any] = {}
if isinstance(args_raw, dict):
args = dict(args_raw)
elif isinstance(args_raw, str):
text_value = args_raw.strip()
if text_value:
try:
parsed = json.loads(text_value)
if isinstance(parsed, dict):
args = parsed
else:
if warnings is not None:
warnings.append(
f"Ignored non-object arguments for manual opener tool call '{tool_name}' at index {idx}"
)
except Exception:
if warnings is not None:
warnings.append(f"Ignored invalid JSON arguments for manual opener tool call '{tool_name}' at index {idx}")
elif args_raw is not None and warnings is not None:
warnings.append(f"Ignored unsupported arguments type for manual opener tool call '{tool_name}' at index {idx}")
normalized.append({"toolName": tool_name, "arguments": args})
# Keep opener sequence intentionally short to avoid long pre-dialog delays.
return normalized[:8]
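A simplified standalone mirror of the normalization above (alias mapping via `normalize_tool_id` and warning collection omitted): dict arguments pass through, JSON-string arguments are parsed, nameless entries are dropped, and the list is capped at 8:

```python
import json
from typing import Any, Dict, List

def normalize_calls(raw: Any) -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    if not isinstance(raw, list):
        return out
    for item in raw:
        if not isinstance(item, dict):
            continue
        name = str(item.get("toolName") or item.get("name") or "").strip()
        if not name:
            continue  # entries without a tool name are dropped
        args = item.get("arguments")
        if isinstance(args, str):
            try:
                parsed = json.loads(args)
                args = parsed if isinstance(parsed, dict) else {}
            except json.JSONDecodeError:
                args = {}
        elif not isinstance(args, dict):
            args = {}
        out.append({"toolName": name, "arguments": dict(args)})
    return out[:8]  # keep the opener sequence short

print(normalize_calls([{"name": "voice_msg_prompt", "arguments": '{"msg": "hi"}'}]))
# [{'toolName': 'voice_msg_prompt', 'arguments': {'msg': 'hi'}}]
```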
def _normalize_assistant_tool_ids(raw: Any) -> List[str]:
if not isinstance(raw, list):
return []
normalized: List[str] = []
seen: set[str] = set()
for item in raw:
tool_id = normalize_tool_id(item)
if not tool_id or tool_id in seen:
continue
seen.add(tool_id)
normalized.append(tool_id)
return normalized
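The helper above is an order-preserving dedupe over tool IDs. A standalone sketch of the same idea (the real helper additionally maps legacy aliases through `normalize_tool_id` first):

```python
from typing import Any, List

def dedupe_tool_ids(raw: Any) -> List[str]:
    # Keep first occurrence, drop blanks and repeats, preserve order.
    if not isinstance(raw, list):
        return []
    seen: set = set()
    out: List[str] = []
    for item in raw:
        tool_id = str(item or "").strip()
        if not tool_id or tool_id in seen:
            continue
        seen.add(tool_id)
        out.append(tool_id)
    return out

print(dedupe_tool_ids(["calculator", "calculator", " ", "text_msg_prompt"]))
# ['calculator', 'text_msg_prompt']
```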
def _resolve_runtime_tools(db: Session, selected_tool_ids: List[str], warnings: List[str]) -> List[Dict[str, Any]]:
_ensure_tool_resource_schema(db)
ids = [str(tool_id).strip() for tool_id in selected_tool_ids if str(tool_id).strip()]
ids = _normalize_assistant_tool_ids(selected_tool_ids)
if not ids:
return []
@@ -182,11 +272,18 @@ def _resolve_runtime_tools(db: Session, selected_tool_ids: List[str], warnings:
def _resolve_runtime_metadata(db: Session, assistant: Assistant) -> tuple[Dict[str, Any], List[str]]:
warnings: List[str] = []
generated_opener_enabled = bool(assistant.generated_opener_enabled)
manual_opener_tool_calls = _normalize_manual_opener_tool_calls(
assistant.manual_opener_tool_calls,
warnings=warnings,
)
metadata: Dict[str, Any] = {
"systemPrompt": _compose_runtime_system_prompt(assistant.prompt),
"firstTurnMode": assistant.first_turn_mode or "bot_first",
"greeting": assistant.opener or "",
"generatedOpenerEnabled": bool(assistant.generated_opener_enabled),
# Generated opener should rely on systemPrompt instead of fixed opener text.
"greeting": "" if generated_opener_enabled else (assistant.opener or ""),
"generatedOpenerEnabled": generated_opener_enabled,
"manualOpenerToolCalls": manual_opener_tool_calls,
"output": {"mode": "audio" if assistant.voice_output_enabled else "text"},
"bargeIn": {
"enabled": not bool(assistant.bot_cannot_be_interrupted),
@@ -203,10 +300,10 @@ def _resolve_runtime_metadata(db: Session, assistant: Assistant) -> tuple[Dict[s
config_mode = str(assistant.config_mode or "platform").strip().lower()
if config_mode in {"dify", "fastgpt"}:
if config_mode == "dify":
metadata["services"]["llm"] = {
"provider": "openai",
"model": "",
"provider": "dify",
"model": "dify",
"apiKey": assistant.api_key,
"baseUrl": assistant.api_url,
}
@@ -214,6 +311,19 @@ def _resolve_runtime_metadata(db: Session, assistant: Assistant) -> tuple[Dict[s
warnings.append(f"External LLM API URL is empty for mode: {assistant.config_mode}")
if not (assistant.api_key or "").strip():
warnings.append(f"External LLM API key is empty for mode: {assistant.config_mode}")
elif config_mode == "fastgpt":
metadata["services"]["llm"] = {
"provider": "fastgpt",
"model": "fastgpt",
"apiKey": assistant.api_key,
"baseUrl": assistant.api_url,
}
if (assistant.app_id or "").strip():
metadata["services"]["llm"]["appId"] = assistant.app_id
if not (assistant.api_url or "").strip():
warnings.append(f"FastGPT API URL is empty for mode: {assistant.config_mode}")
if not (assistant.api_key or "").strip():
warnings.append(f"FastGPT API key is empty for mode: {assistant.config_mode}")
elif assistant.llm_model_id:
llm = db.query(LLMModel).filter(LLMModel.id == assistant.llm_model_id).first()
if llm:
@@ -226,18 +336,27 @@ def _resolve_runtime_metadata(db: Session, assistant: Assistant) -> tuple[Dict[s
else:
warnings.append(f"LLM model not found: {assistant.llm_model_id}")
asr_runtime: Dict[str, Any] = {
"enableInterim": bool(assistant.asr_interim_enabled),
}
if assistant.asr_model_id:
asr = db.query(ASRModel).filter(ASRModel.id == assistant.asr_model_id).first()
if asr:
asr_provider = "openai_compatible" if _is_openai_compatible_vendor(asr.vendor) else "buffered"
metadata["services"]["asr"] = {
if _is_dashscope_vendor(asr.vendor):
asr_provider = "dashscope"
elif _is_openai_compatible_vendor(asr.vendor):
asr_provider = "openai_compatible"
else:
asr_provider = "buffered"
asr_runtime.update({
"provider": asr_provider,
"model": asr.model_name or asr.name,
"apiKey": asr.api_key if asr_provider == "openai_compatible" else None,
"baseUrl": asr.base_url if asr_provider == "openai_compatible" else None,
}
"apiKey": asr.api_key if asr_provider in {"openai_compatible", "dashscope"} else None,
"baseUrl": asr.base_url if asr_provider in {"openai_compatible", "dashscope"} else None,
})
else:
warnings.append(f"ASR model not found: {assistant.asr_model_id}")
metadata["services"]["asr"] = asr_runtime
if not assistant.voice_output_enabled:
metadata["services"]["tts"] = {"enabled": False}
@@ -327,6 +446,7 @@ def assistant_to_dict(assistant: Assistant) -> dict:
"callCount": assistant.call_count,
"firstTurnMode": assistant.first_turn_mode or "bot_first",
"opener": assistant.opener or "",
"manualOpenerToolCalls": _normalize_manual_opener_tool_calls(assistant.manual_opener_tool_calls),
"generatedOpenerEnabled": bool(assistant.generated_opener_enabled),
"openerAudioEnabled": bool(opener_audio.enabled) if opener_audio else False,
"openerAudioReady": opener_audio_ready,
@@ -339,12 +459,14 @@ def assistant_to_dict(assistant: Assistant) -> dict:
"voice": assistant.voice,
"speed": assistant.speed,
"hotwords": assistant.hotwords or [],
"tools": assistant.tools or [],
"tools": _normalize_assistant_tool_ids(assistant.tools),
"asrInterimEnabled": bool(assistant.asr_interim_enabled),
"botCannotBeInterrupted": bool(assistant.bot_cannot_be_interrupted),
"interruptionSensitivity": assistant.interruption_sensitivity,
"configMode": assistant.config_mode,
"apiUrl": assistant.api_url,
"apiKey": assistant.api_key,
"appId": assistant.app_id,
"llmModelId": assistant.llm_model_id,
"asrModelId": assistant.asr_model_id,
"embeddingModelId": assistant.embedding_model_id,
@@ -358,13 +480,16 @@ def _apply_assistant_update(assistant: Assistant, update_data: dict) -> None:
field_map = {
"knowledgeBaseId": "knowledge_base_id",
"firstTurnMode": "first_turn_mode",
"manualOpenerToolCalls": "manual_opener_tool_calls",
"interruptionSensitivity": "interruption_sensitivity",
"asrInterimEnabled": "asr_interim_enabled",
"botCannotBeInterrupted": "bot_cannot_be_interrupted",
"configMode": "config_mode",
"voiceOutputEnabled": "voice_output_enabled",
"generatedOpenerEnabled": "generated_opener_enabled",
"apiUrl": "api_url",
"apiKey": "api_key",
"appId": "app_id",
"llmModelId": "llm_model_id",
"asrModelId": "asr_model_id",
"embeddingModelId": "embedding_model_id",
@@ -490,6 +615,7 @@ def list_assistants(
db: Session = Depends(get_db)
):
"""获取助手列表"""
_ensure_assistant_schema(db)
query = db.query(Assistant)
total = query.count()
assistants = query.order_by(Assistant.created_at.desc()) \
@@ -505,6 +631,7 @@ def list_assistants(
@router.get("/{id}", response_model=AssistantOut)
def get_assistant(id: str, db: Session = Depends(get_db)):
"""获取单个助手详情"""
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -514,6 +641,7 @@ def get_assistant(id: str, db: Session = Depends(get_db)):
@router.get("/{id}/config", response_model=AssistantEngineConfigResponse)
def get_assistant_config(id: str, db: Session = Depends(get_db)):
"""Canonical engine config endpoint consumed by engine backend adapter."""
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -523,6 +651,7 @@ def get_assistant_config(id: str, db: Session = Depends(get_db)):
@router.get("/{id}/runtime-config", response_model=AssistantEngineConfigResponse)
def get_assistant_runtime_config(id: str, db: Session = Depends(get_db)):
"""Legacy alias for resolved engine runtime config."""
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -532,12 +661,14 @@ def get_assistant_runtime_config(id: str, db: Session = Depends(get_db)):
@router.post("", response_model=AssistantOut)
def create_assistant(data: AssistantCreate, db: Session = Depends(get_db)):
"""创建新助手"""
_ensure_assistant_schema(db)
assistant = Assistant(
id=str(uuid.uuid4())[:8],
user_id=1,  # default user; authentication to be added later
name=data.name,
first_turn_mode=data.firstTurnMode,
opener=data.opener,
manual_opener_tool_calls=_normalize_manual_opener_tool_calls(data.manualOpenerToolCalls),
generated_opener_enabled=data.generatedOpenerEnabled,
prompt=data.prompt,
knowledge_base_id=data.knowledgeBaseId,
@@ -546,12 +677,14 @@ def create_assistant(data: AssistantCreate, db: Session = Depends(get_db)):
voice=data.voice,
speed=data.speed,
hotwords=data.hotwords,
tools=data.tools,
tools=_normalize_assistant_tool_ids(data.tools),
asr_interim_enabled=data.asrInterimEnabled,
bot_cannot_be_interrupted=data.botCannotBeInterrupted,
interruption_sensitivity=data.interruptionSensitivity,
config_mode=data.configMode,
api_url=data.apiUrl,
api_key=data.apiKey,
app_id=data.appId,
llm_model_id=data.llmModelId,
asr_model_id=data.asrModelId,
embedding_model_id=data.embeddingModelId,
@@ -570,6 +703,7 @@ def create_assistant(data: AssistantCreate, db: Session = Depends(get_db)):
@router.get("/{id}/opener-audio", response_model=AssistantOpenerAudioOut)
def get_assistant_opener_audio(id: str, db: Session = Depends(get_db)):
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -578,6 +712,7 @@ def get_assistant_opener_audio(id: str, db: Session = Depends(get_db)):
@router.get("/{id}/opener-audio/pcm")
def get_assistant_opener_audio_pcm(id: str, db: Session = Depends(get_db)):
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -600,6 +735,7 @@ def generate_assistant_opener_audio(
data: AssistantOpenerAudioGenerateRequest,
db: Session = Depends(get_db),
):
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
@@ -689,12 +825,17 @@ def generate_assistant_opener_audio(
@router.put("/{id}")
def update_assistant(id: str, data: AssistantUpdate, db: Session = Depends(get_db)):
"""更新助手"""
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")
update_data = data.model_dump(exclude_unset=True)
opener_audio_enabled = update_data.pop("openerAudioEnabled", None)
if "manualOpenerToolCalls" in update_data:
update_data["manualOpenerToolCalls"] = _normalize_manual_opener_tool_calls(update_data.get("manualOpenerToolCalls"))
if "tools" in update_data:
update_data["tools"] = _normalize_assistant_tool_ids(update_data.get("tools"))
_apply_assistant_update(assistant, update_data)
if opener_audio_enabled is not None:
record = _ensure_assistant_opener_audio(db, assistant)
@@ -710,6 +851,7 @@ def update_assistant(id: str, data: AssistantUpdate, db: Session = Depends(get_d
@router.delete("/{id}")
def delete_assistant(id: str, db: Session = Depends(get_db)):
"""删除助手"""
_ensure_assistant_schema(db)
assistant = db.query(Assistant).filter(Assistant.id == id).first()
if not assistant:
raise HTTPException(status_code=404, detail="Assistant not found")


@@ -14,6 +14,19 @@ from ..schemas import ToolResourceCreate, ToolResourceOut, ToolResourceUpdate
router = APIRouter(prefix="/tools", tags=["Tools & Autotest"])
TOOL_ID_ALIASES: Dict[str, str] = {
# legacy -> canonical
"voice_message_prompt": "voice_msg_prompt",
}
def normalize_tool_id(tool_id: Optional[str]) -> str:
raw = str(tool_id or "").strip()
if not raw:
return ""
return TOOL_ID_ALIASES.get(raw, raw)
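With the alias table, legacy IDs keep resolving anywhere a tool ID enters the system. A quick usage check of the mapping as defined above:

```python
from typing import Dict, Optional

TOOL_ID_ALIASES: Dict[str, str] = {
    # legacy -> canonical
    "voice_message_prompt": "voice_msg_prompt",
}

def normalize_tool_id(tool_id: Optional[str]) -> str:
    raw = str(tool_id or "").strip()
    if not raw:
        return ""
    return TOOL_ID_ALIASES.get(raw, raw)

print(normalize_tool_id(" voice_message_prompt "))  # voice_msg_prompt
print(normalize_tool_id("calculator"))              # calculator
print(repr(normalize_tool_id(None)))                # ''
```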
# ============ Available Tools ============
TOOL_REGISTRY = {
"calculator": {
@@ -87,7 +100,7 @@ TOOL_REGISTRY = {
"required": []
}
},
"voice_message_prompt": {
"voice_msg_prompt": {
"name": "语音消息提示",
"description": "播报一条语音提示消息",
"parameters": {
@@ -109,6 +122,67 @@ TOOL_REGISTRY = {
"required": ["msg"]
}
},
"voice_choice_prompt": {
"name": "语音选项提示",
"description": "播报问题并展示可选项,等待用户选择后回传结果",
"parameters": {
"type": "object",
"properties": {
"question": {"type": "string", "description": "向用户展示的问题文本"},
"options": {
"type": "array",
"description": "可选项(字符串或含 id/label/value 的对象)",
"minItems": 2,
"items": {
"anyOf": [
{"type": "string"},
{
"type": "object",
"properties": {
"id": {"type": "string"},
"label": {"type": "string"},
"value": {"type": "string"}
},
"required": ["label"]
}
]
}
},
"voice_text": {"type": "string", "description": "可选,单独指定播报文本;为空则播报 question"}
},
"required": ["question", "options"]
}
},
"text_choice_prompt": {
"name": "文本选项提示",
"description": "显示文本选项弹窗并等待用户选择后回传结果",
"parameters": {
"type": "object",
"properties": {
"question": {"type": "string", "description": "向用户展示的问题文本"},
"options": {
"type": "array",
"description": "可选项(字符串或含 id/label/value 的对象)",
"minItems": 2,
"items": {
"anyOf": [
{"type": "string"},
{
"type": "object",
"properties": {
"id": {"type": "string"},
"label": {"type": "string"},
"value": {"type": "string"}
},
"required": ["label"]
}
]
}
}
},
"required": ["question", "options"]
}
},
}
TOOL_CATEGORY_MAP = {
@@ -119,8 +193,11 @@ TOOL_CATEGORY_MAP = {
"turn_off_camera": "system",
"increase_volume": "system",
"decrease_volume": "system",
"voice_message_prompt": "system",
"voice_msg_prompt": "system",
"voice_message_prompt": "system", # backward compatibility
"text_msg_prompt": "system",
"voice_choice_prompt": "system",
"text_choice_prompt": "system",
}
TOOL_ICON_MAP = {
@@ -131,8 +208,11 @@ TOOL_ICON_MAP = {
"turn_off_camera": "CameraOff",
"increase_volume": "Volume2",
"decrease_volume": "Volume2",
"voice_message_prompt": "Volume2",
"voice_msg_prompt": "Volume2",
"voice_message_prompt": "Volume2", # backward compatibility
"text_msg_prompt": "Terminal",
"voice_choice_prompt": "Volume2",
"text_choice_prompt": "Terminal",
}
TOOL_HTTP_DEFAULTS = {
@@ -145,6 +225,8 @@ TOOL_PARAMETER_DEFAULTS = {
TOOL_WAIT_FOR_RESPONSE_DEFAULTS = {
"text_msg_prompt": True,
"voice_choice_prompt": True,
"text_choice_prompt": True,
}
@@ -217,9 +299,49 @@ def _validate_query_http_config(*, category: str, tool_id: Optional[str], http_u
raise HTTPException(status_code=400, detail="http_url is required for query tools (except calculator/code_interpreter)")
def _migrate_legacy_system_tool_ids(db: Session) -> None:
"""Rename legacy built-in system tool IDs to their canonical IDs."""
changed = False
for legacy_id, canonical_id in TOOL_ID_ALIASES.items():
if legacy_id == canonical_id:
continue
legacy_item = (
db.query(ToolResource)
.filter(ToolResource.id == legacy_id)
.first()
)
if not legacy_item or not bool(legacy_item.is_system):
continue
canonical_item = (
db.query(ToolResource)
.filter(ToolResource.id == canonical_id)
.first()
)
if canonical_item:
db.delete(legacy_item)
changed = True
continue
legacy_item.id = canonical_id
legacy_item.updated_at = datetime.utcnow()
changed = True
if changed:
db.commit()
def _seed_default_tools_if_empty(db: Session) -> None:
"""Ensure built-in tools exist in tool_resources without overriding custom edits."""
_ensure_tool_resource_schema(db)
_migrate_legacy_system_tool_ids(db)
existing_system_count = (
db.query(ToolResource.id)
.filter(ToolResource.is_system.is_(True))
.count()
)
if existing_system_count > 0:
return
existing_ids = {
str(item[0])
for item in db.query(ToolResource.id).all()
@@ -268,9 +390,10 @@ def list_available_tools():
@router.get("/list/{tool_id}")
def get_tool_detail(tool_id: str):
"""获取工具详情"""
if tool_id not in TOOL_REGISTRY:
canonical_tool_id = normalize_tool_id(tool_id)
if canonical_tool_id not in TOOL_REGISTRY:
raise HTTPException(status_code=404, detail="Tool not found")
return TOOL_REGISTRY[tool_id]
return TOOL_REGISTRY[canonical_tool_id]
# ============ Tool Resource CRUD ============
@@ -302,6 +425,10 @@ def get_tool_resource(id: str, db: Session = Depends(get_db)):
"""获取单个工具资源详情。"""
_seed_default_tools_if_empty(db)
item = db.query(ToolResource).filter(ToolResource.id == id).first()
if not item:
canonical_id = normalize_tool_id(id)
if canonical_id and canonical_id != id:
item = db.query(ToolResource).filter(ToolResource.id == canonical_id).first()
if not item:
raise HTTPException(status_code=404, detail="Tool resource not found")
return item
@@ -311,7 +438,7 @@ def get_tool_resource(id: str, db: Session = Depends(get_db)):
def create_tool_resource(data: ToolResourceCreate, db: Session = Depends(get_db)):
"""创建自定义工具资源。"""
_seed_default_tools_if_empty(db)
candidate_id = (data.id or "").strip()
candidate_id = normalize_tool_id((data.id or "").strip())
if candidate_id and db.query(ToolResource).filter(ToolResource.id == candidate_id).first():
raise HTTPException(status_code=400, detail="Tool ID already exists")
@@ -346,7 +473,10 @@ def create_tool_resource(data: ToolResourceCreate, db: Session = Depends(get_db)
def update_tool_resource(id: str, data: ToolResourceUpdate, db: Session = Depends(get_db)):
"""更新工具资源。"""
_seed_default_tools_if_empty(db)
canonical_id = normalize_tool_id(id)
item = db.query(ToolResource).filter(ToolResource.id == id).first()
if not item and canonical_id and canonical_id != id:
item = db.query(ToolResource).filter(ToolResource.id == canonical_id).first()
if not item:
raise HTTPException(status_code=404, detail="Tool resource not found")
@@ -354,14 +484,14 @@ def update_tool_resource(id: str, data: ToolResourceUpdate, db: Session = Depend
new_category = update_data.get("category", item.category)
new_http_url = update_data.get("http_url", item.http_url)
_validate_query_http_config(category=new_category, tool_id=id, http_url=new_http_url)
_validate_query_http_config(category=new_category, tool_id=item.id, http_url=new_http_url)
if "http_method" in update_data:
update_data["http_method"] = _normalize_http_method(update_data.get("http_method"))
if "http_timeout_ms" in update_data and update_data.get("http_timeout_ms") is not None:
update_data["http_timeout_ms"] = max(1000, int(update_data["http_timeout_ms"]))
if "parameter_schema" in update_data:
update_data["parameter_schema"] = _normalize_parameter_schema(update_data.get("parameter_schema"), tool_id=id)
update_data["parameter_schema"] = _normalize_parameter_schema(update_data.get("parameter_schema"), tool_id=item.id)
if "parameter_defaults" in update_data:
update_data["parameter_defaults"] = _normalize_parameter_defaults(update_data.get("parameter_defaults"))
if new_category != "system":
@@ -380,7 +510,10 @@ def update_tool_resource(id: str, data: ToolResourceUpdate, db: Session = Depend
def delete_tool_resource(id: str, db: Session = Depends(get_db)):
"""删除工具资源。"""
_seed_default_tools_if_empty(db)
canonical_id = normalize_tool_id(id)
item = db.query(ToolResource).filter(ToolResource.id == id).first()
if not item and canonical_id and canonical_id != id:
item = db.query(ToolResource).filter(ToolResource.id == canonical_id).first()
if not item:
raise HTTPException(status_code=404, detail="Tool resource not found")
db.delete(item)

View File

@@ -191,6 +191,7 @@ class ASRModelCreate(ASRModelBase):
class ASRModelUpdate(BaseModel):
name: Optional[str] = None
vendor: Optional[str] = None
language: Optional[str] = None
base_url: Optional[str] = None
api_key: Optional[str] = None
@@ -280,6 +281,7 @@ class AssistantBase(BaseModel):
name: str
firstTurnMode: str = "bot_first"
opener: str = ""
manualOpenerToolCalls: List[Dict[str, Any]] = []
generatedOpenerEnabled: bool = False
openerAudioEnabled: bool = False
prompt: str = ""
@@ -290,11 +292,13 @@ class AssistantBase(BaseModel):
speed: float = 1.0
hotwords: List[str] = []
tools: List[str] = []
asrInterimEnabled: bool = False
botCannotBeInterrupted: bool = False
interruptionSensitivity: int = 500
configMode: str = "platform"
apiUrl: Optional[str] = None
apiKey: Optional[str] = None
appId: Optional[str] = None
# 模型关联
llmModelId: Optional[str] = None
asrModelId: Optional[str] = None
@@ -310,6 +314,7 @@ class AssistantUpdate(BaseModel):
name: Optional[str] = None
firstTurnMode: Optional[str] = None
opener: Optional[str] = None
manualOpenerToolCalls: Optional[List[Dict[str, Any]]] = None
generatedOpenerEnabled: Optional[bool] = None
openerAudioEnabled: Optional[bool] = None
prompt: Optional[str] = None
@@ -320,11 +325,13 @@ class AssistantUpdate(BaseModel):
speed: Optional[float] = None
hotwords: Optional[List[str]] = None
tools: Optional[List[str]] = None
asrInterimEnabled: Optional[bool] = None
botCannotBeInterrupted: Optional[bool] = None
interruptionSensitivity: Optional[int] = None
configMode: Optional[str] = None
apiUrl: Optional[str] = None
apiKey: Optional[str] = None
appId: Optional[str] = None
llmModelId: Optional[str] = None
asrModelId: Optional[str] = None
embeddingModelId: Optional[str] = None
@@ -350,6 +357,7 @@ class AssistantRuntimeMetadata(BaseModel):
firstTurnMode: str = "bot_first"
greeting: str = ""
generatedOpenerEnabled: bool = False
manualOpenerToolCalls: List[Dict[str, Any]] = Field(default_factory=list)
output: Dict[str, Any] = Field(default_factory=dict)
bargeIn: Dict[str, Any] = Field(default_factory=dict)
services: Dict[str, Dict[str, Any]] = Field(default_factory=dict)

View File

@@ -279,6 +279,36 @@ POST /api/v1/asr/{id}/transcribe
---
### 8. 预览 ASR (上传音频文件)
```http
POST /api/v1/asr/{id}/preview
```
上传音频文件进行识别预览。
**Request (multipart/form-data):**
| 参数 | 类型 | 必填 | 说明 |
|------|------|------|------|
| file | file | 是 | 音频文件 (audio/*) |
| language | string | 否 | 指定语言,覆盖模型配置 |
| api_key | string | 否 | 覆盖模型配置的 API Key |
**Response:**
```json
{
"success": true,
"transcript": "您好,请问有什么可以帮助您?",
"language": "zh",
"confidence": 0.95,
"latency_ms": 1500
}
```
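调用预览接口前需要准备一段 WAV 音频。下面是一个用标准库生成 1 秒 16 kHz 单声道静音 WAV 的最小示意(仅用于接口联调;上传部分以 httpx 为例写在注释中,其中的本地 URL 为假设值):

```python
import io
import wave

def make_wav_bytes(sample_rate: int = 16000, seconds: int = 1) -> bytes:
    """生成指定时长的 16-bit 单声道静音 WAV 字节流。"""
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(1)          # 单声道
            wav_file.setsampwidth(2)          # 16-bit PCM
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)
        return buffer.getvalue()

audio = make_wav_bytes()
# 上传示例(需要 httpx;URL 为假设的本地部署地址):
# httpx.post("http://localhost:8000/api/v1/asr/{id}/preview",
#            files={"file": ("sample.wav", audio, "audio/wav")})
```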
---
## Schema 定义
```python

View File

@@ -20,24 +20,31 @@ interface Assistant {
id: string; // 助手唯一标识 (8位UUID)
user_id: number; // 所属用户ID
name: string; // 助手名称
  callCount: number;               // 调用次数
  firstTurnMode: string;           // 首轮模式: "bot_first" | "user_first"
  opener: string;                  // 开场白
  generatedOpenerEnabled: boolean; // 是否启用生成式开场白
  openerAudioEnabled: boolean;     // 是否启用预生成开场音频
  openerAudioReady: boolean;       // 开场音频是否已生成
  openerAudioDurationMs: number;   // 开场音频时长(ms)
  prompt: string;                  // 系统提示词/人格设定
  knowledgeBaseId?: string;        // 关联知识库ID
  language: string;                // 语言: "zh" | "en"
  voiceOutputEnabled: boolean;     // 是否启用语音输出
  voice?: string;                  // 声音ID
  speed: number;                   // 语速 (0.5-2.0)
  hotwords: string[];              // 热词列表
  tools: string[];                 // 启用的工具ID列表
  botCannotBeInterrupted: boolean; // 是否禁止打断
  interruptionSensitivity: number; // 打断灵敏度 (ms)
  configMode: string;              // 配置模式: "platform" | "dify" | "fastgpt" | "none"
  apiUrl?: string;                 // 外部API URL
  apiKey?: string;                 // 外部API Key
  // 模型关联
  llmModelId?: string;             // LLM模型ID
  asrModelId?: string;             // ASR模型ID
  embeddingModelId?: string;       // Embedding模型ID
  rerankModelId?: string;          // Rerank模型ID
created_at: string;
updated_at: string;
}
@@ -219,22 +226,109 @@ DELETE. 删除助手
---
### 6. 获取助手引擎配置
```http
GET /api/v1/assistants/{id}/config
```
获取助手的运行时引擎配置,包含 LLM、ASR、TTS、知识库等服务的完整配置信息。
**Response:**
```json
{
"assistantId": "abc12345",
"configVersionId": "asst_abc12345_20240115103000",
"assistant": {
"systemPrompt": "你是一个专业的客服人员...",
"firstTurnMode": "bot_first",
"greeting": "您好,请问有什么可以帮助您?",
"generatedOpenerEnabled": false,
"output": {"mode": "audio"},
"bargeIn": {"enabled": true, "minDurationMs": 500},
"services": {
"llm": {"provider": "openai", "model": "gpt-4o", "apiKey": "...", "baseUrl": "..."},
"asr": {"provider": "openai_compatible", "model": "paraformer-realtime-v2", "apiKey": "..."},
"tts": {"enabled": true, "provider": "dashscope", "model": "qwen3-tts-flash-realtime", "voice": "Cherry", "speed": 1.0}
},
"tools": [...],
"knowledgeBaseId": "kb_001",
"openerAudio": {"enabled": true, "ready": true, "pcmUrl": "/api/assistants/abc12345/opener-audio/pcm"}
},
"sessionStartMetadata": {...},
"sources": {
"llmModelId": "llm_001",
"asrModelId": "asr_001",
"voiceId": "voice_001",
"knowledgeBaseId": "kb_001"
},
"warnings": []
}
```
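客户端拿到引擎配置后,通常按 `services` 字段把各子服务配置分发给 LLM/ASR/TTS 模块。下面是一个最小读取示意(仅保留部分字段,完整结构以上方响应示例为准):

```python
# 模拟 /config 接口返回体中与分发相关的片段
config = {
    "assistant": {
        "services": {
            "llm": {"provider": "openai", "model": "gpt-4o"},
            "tts": {"enabled": True, "voice": "Cherry"},
        }
    }
}

services = config["assistant"]["services"]
llm_model = services["llm"]["model"]              # 取 LLM 模型名
tts_enabled = services["tts"].get("enabled", False)  # TTS 未配置时按关闭处理
```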
---
### 7. 获取助手开场音频状态
```http
GET /api/v1/assistants/{id}/opener-audio
```
**Response:**
```json
{
"enabled": true,
"ready": true,
"encoding": "pcm_s16le",
"sampleRateHz": 16000,
"channels": 1,
"durationMs": 2500,
"textHash": "abc123...",
"ttsFingerprint": "def456...",
"updatedAt": "2024-01-15T10:30:00Z"
}
```
---
### 8. 下载开场音频 PCM 文件
```http
GET /api/v1/assistants/{id}/opener-audio/pcm
```
返回 PCM 音频文件 (application/octet-stream)。
---
### 9. 生成开场音频
```http
POST /api/v1/assistants/{id}/opener-audio/generate
```
**Request Body:**
```json
{
"text": "您好,请问有什么可以帮助您?"
}
```
**Response:**
```json
{
"enabled": true,
"ready": true,
"encoding": "pcm_s16le",
"sampleRateHz": 16000,
"channels": 1,
"durationMs": 2500,
"textHash": "abc123...",
"ttsFingerprint": "def456..."
}
```
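响应中的 `textHash` 可用于客户端判断开场白文本是否变化、是否需要重新生成音频(服务端具体哈希算法未在本文档中约定,下面以 SHA-256 作示意):

```python
import hashlib

def text_hash(text: str) -> str:
    """对开场白文本取哈希,用于比对是否需要重新合成音频(示意实现)。"""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

cached = text_hash("您好,请问有什么可以帮助您?")
# 文本未变时哈希一致,可跳过重新调用 /opener-audio/generate
need_regenerate = text_hash("您好,请问有什么可以帮助您?") != cached
```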

View File

@@ -289,86 +289,7 @@ GET /api/v1/history/{call_id}/audio/{turn_index}
---
### 8. 搜索通话记录
```http
GET /api/v1/history/search
```
**Query Parameters:**
| 参数 | 类型 | 必填 | 说明 |
|------|------|------|------|
| q | string | 是 | 搜索关键词 |
| page | int | 否 | 页码 |
| limit | int | 否 | 每页数量 |
**Response:**
```json
{
"total": 5,
"page": 1,
"limit": 20,
"list": [
{
"id": "call_001",
"started_at": "2024-01-15T14:30:00Z",
"matched_content": "用户咨询产品A的售后服务"
}
]
}
```
---
### 9. 获取统计信息
```http
GET /api/v1/history/stats
```
**Query Parameters:**
| 参数 | 类型 | 必填 | 说明 |
|------|------|------|------|
| start_date | string | 否 | 开始日期 |
| end_date | string | 否 | 结束日期 |
| assistant_id | string | 否 | 助手ID |
**Response:**
```json
{
"total_calls": 150,
"connected_calls": 135,
"missed_calls": 15,
"failed_calls": 0,
"avg_duration_seconds": 180,
"total_cost": 7.50,
"by_status": {
"connected": 135,
"missed": 15,
"failed": 0
},
"by_source": {
"debug": 100,
"external": 50
},
"daily_trend": [
{
"date": "2024-01-15",
"calls": 20,
"connected": 18,
"avg_duration": 175
}
]
}
```
---
## Schema 定义
```python
# ============ Call Record ============
@@ -440,17 +361,6 @@ class TranscriptOut(TranscriptCreate):
class Config:
from_attributes = True
class HistoryStats(BaseModel):
total_calls: int
connected_calls: int
missed_calls: int
failed_calls: int
avg_duration_seconds: float
total_cost: float
by_status: dict
by_source: dict
daily_trend: List[dict]
```
---

View File

@@ -9,7 +9,9 @@
| 小助手 | [assistant.md](./assistant.md) | AI 助手管理 |
| LLM 模型 | [llm.md](./llm.md) | LLM 模型配置与管理 |
| ASR 模型 | [asr.md](./asr.md) | 语音识别模型配置 |
| 声音资源 | [voice-resources.md](./voice-resources.md) | TTS 语音配置 |
| 工具与测试 | [tools.md](./tools.md) | 工具列表与自动测试 |
| 知识库 | [knowledge.md](./knowledge.md) | 知识库与文档管理 |
| 历史记录 | [history-records.md](./history-records.md) | 通话记录和转写 |
---

420
api/docs/knowledge.md Normal file
View File

@@ -0,0 +1,420 @@
# 知识库 (Knowledge Base) API
知识库 API 用于管理知识库和文档的创建、索引和搜索。
## 基础信息
| 项目 | 值 |
|------|-----|
| Base URL | `/api/v1/knowledge` |
| 认证方式 | Bearer Token (预留) |
---
## 数据模型
### KnowledgeBase
```typescript
interface KnowledgeBase {
id: string; // 知识库唯一标识 (8位UUID)
user_id: number; // 所属用户ID
name: string; // 知识库名称
description: string; // 知识库描述
embeddingModel: string; // Embedding 模型名称
chunkSize: number; // 文档分块大小
chunkOverlap: number; // 分块重叠大小
docCount: number; // 文档数量
chunkCount: number; // 切分后的文本块数量
status: string; // 状态: "active" | "inactive"
createdAt: string; // 创建时间
updatedAt: string; // 更新时间
documents: KnowledgeDocument[]; // 关联的文档列表
}
```
### KnowledgeDocument
```typescript
interface KnowledgeDocument {
id: string; // 文档唯一标识
kb_id: string; // 所属知识库ID
name: string; // 文档名称
size: string; // 文件大小
fileType: string; // 文件类型
storageUrl: string; // 存储地址
status: string; // 状态: "pending" | "processing" | "completed" | "failed"
chunkCount: number; // 切分后的文本块数量
errorMessage: string; // 错误信息
uploadDate: string; // 上传时间
createdAt: string; // 创建时间
processedAt: string; // 处理完成时间
}
```
---
## API 端点
### 1. 获取知识库列表
```http
GET /api/v1/knowledge/bases
```
**Query Parameters:**
| 参数 | 类型 | 必填 | 默认值 | 说明 |
|------|------|------|--------|------|
| user_id | int | 否 | 1 | 用户ID |
| page | int | 否 | 1 | 页码 |
| limit | int | 否 | 50 | 每页数量 |
**Response:**
```json
{
"total": 2,
"page": 1,
"limit": 50,
"list": [
{
"id": "kb_001",
"user_id": 1,
"name": "产品知识库",
"description": "产品文档和FAQ",
"embeddingModel": "text-embedding-3-small",
"chunkSize": 500,
"chunkOverlap": 50,
"docCount": 10,
"chunkCount": 150,
"status": "active",
"createdAt": "2024-01-15T10:30:00",
"updatedAt": "2024-01-15T10:30:00",
"documents": [...]
}
]
}
```
---
### 2. 获取单个知识库详情
```http
GET /api/v1/knowledge/bases/{kb_id}
```
**Response:**
```json
{
"id": "kb_001",
"user_id": 1,
"name": "产品知识库",
"description": "产品文档和FAQ",
"embeddingModel": "text-embedding-3-small",
"chunkSize": 500,
"chunkOverlap": 50,
"docCount": 10,
"chunkCount": 150,
"status": "active",
"createdAt": "2024-01-15T10:30:00",
"updatedAt": "2024-01-15T10:30:00",
"documents": [
{
"id": "doc_001",
"kb_id": "kb_001",
"name": "产品手册.pdf",
"size": "1.2 MB",
"fileType": "application/pdf",
"storageUrl": "",
"status": "completed",
"chunkCount": 45,
"errorMessage": null,
"uploadDate": "2024-01-15T10:30:00",
"createdAt": "2024-01-15T10:30:00",
"processedAt": "2024-01-15T10:30:05"
}
]
}
```
---
### 3. 创建知识库
```http
POST /api/v1/knowledge/bases
```
**Request Body:**
```json
{
"name": "产品知识库",
"description": "产品文档和FAQ",
"embeddingModel": "text-embedding-3-small",
"chunkSize": 500,
"chunkOverlap": 50
}
```
**Fields 说明:**
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| name | string | 是 | 知识库名称 |
| description | string | 否 | 知识库描述 |
| embeddingModel | string | 否 | Embedding 模型名称,默认 "text-embedding-3-small" |
| chunkSize | int | 否 | 文档分块大小,默认 500 |
| chunkOverlap | int | 否 | 分块重叠大小,默认 50 |
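`chunkSize` 与 `chunkOverlap` 决定文档的切分窗口与相邻块的重叠长度。下面是一个滑动窗口切分的最小示意(实际切分策略以服务端实现为准,这里仅演示两个参数的语义):

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """按固定窗口切分文本,相邻块之间保留 chunk_overlap 个字符的重叠。"""
    step = max(1, chunk_size - chunk_overlap)  # 防止重叠大于等于窗口导致死循环
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
# 窗口 4、步长 2:["abcd", "cdef", "efgh", "ghij", "ij"]
```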
---
### 4. 更新知识库
```http
PUT /api/v1/knowledge/bases/{kb_id}
```
**Request Body:** (部分更新)
```json
{
"name": "更新后的知识库名称",
"description": "新的描述",
"chunkSize": 800
}
```
**注意:** 如果知识库中已有索引的文档,则不能修改 embeddingModel。如需修改请先删除所有文档。
---
### 5. 删除知识库
```http
DELETE /api/v1/knowledge/bases/{kb_id}
```
**Response:**
```json
{
"message": "Deleted successfully"
}
```
**注意:** 删除知识库会同时删除向量数据库中的相关数据。
---
### 6. 上传文档
```http
POST /api/v1/knowledge/bases/{kb_id}/documents
```
支持两种上传方式:
**方式一:文件上传 (multipart/form-data)**
| 参数 | 类型 | 必填 | 说明 |
|------|------|------|------|
| file | file | 是 | 要上传的文档文件 |
支持的文件类型:`.txt`, `.md`, `.csv`, `.json`, `.pdf`, `.docx`
**方式二:仅创建文档记录 (application/json)**
```json
{
"name": "document.pdf",
"size": "1.2 MB",
"fileType": "application/pdf",
"storageUrl": "https://storage.example.com/doc.pdf"
}
```
**Response (文件上传):**
```json
{
"id": "doc_001",
"name": "产品手册.pdf",
"size": "1.2 MB",
"fileType": "application/pdf",
"storageUrl": "",
"status": "completed",
"chunkCount": 45,
"message": "Document uploaded and indexed"
}
```
---
### 7. 索引文档内容
```http
POST /api/v1/knowledge/bases/{kb_id}/documents/{doc_id}/index
```
直接向向量数据库索引文本内容,无需上传文件。
**Request Body:**
```json
{
"content": "要索引的文本内容..."
}
```
**Response:**
```json
{
"message": "Document indexed",
"chunkCount": 10
}
```
---
### 8. 删除文档
```http
DELETE /api/v1/knowledge/bases/{kb_id}/documents/{doc_id}
```
**Response:**
```json
{
"message": "Deleted successfully"
}
```
---
### 9. 搜索知识库
```http
POST /api/v1/knowledge/search
```
**Request Body:**
```json
{
"kb_id": "kb_001",
"query": "产品退货政策",
"nResults": 5
}
```
**Fields 说明:**
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| kb_id | string | 是 | 知识库ID |
| query | string | 是 | 搜索查询文本 |
| nResults | int | 否 | 返回结果数量,默认 5 |
**Response:**
```json
{
"results": [
{
"id": "doc_001",
"text": "我们的退货政策是...",
"score": 0.85,
"metadata": {
"document_name": "退货政策.pdf",
"chunk_index": 3
}
}
]
}
```
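返回的 `score` 为相似度分数,客户端通常需要按阈值过滤低相关结果再拼入提示词。一个过滤与排序的示意:

```python
def filter_results(results: list[dict], min_score: float = 0.7) -> list[dict]:
    """过滤掉相似度低于阈值的检索结果,并按分数降序排列(阈值需按实际效果调优)。"""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

hits = filter_results([
    {"id": "doc_001", "score": 0.85},
    {"id": "doc_002", "score": 0.42},
])
# 仅保留 score >= 0.7 的结果
```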
---
### 10. 获取知识库统计
```http
GET /api/v1/knowledge/bases/{kb_id}/stats
```
**Response:**
```json
{
"kb_id": "kb_001",
"docCount": 10,
"chunkCount": 150
}
```
---
## 支持的文件类型
| 文件类型 | 扩展名 | 说明 |
|----------|--------|------|
| 纯文本 | .txt | 纯文本文件 |
| Markdown | .md | Markdown 格式文档 |
| CSV | .csv | CSV 表格数据 |
| JSON | .json | JSON 格式数据 |
| PDF | .pdf | PDF 文档 (需要 pypdf) |
| Word | .docx | Word 文档 (需要 python-docx) |
**注意:** 不支持旧的 .doc 格式,请转换为 .docx 或其他格式。
---
## Schema 定义
```python
from pydantic import BaseModel
from typing import Optional, List
class KnowledgeBaseCreate(BaseModel):
name: str
description: Optional[str] = None
embeddingModel: Optional[str] = "text-embedding-3-small"
chunkSize: Optional[int] = 500
chunkOverlap: Optional[int] = 50
class KnowledgeBaseUpdate(BaseModel):
name: Optional[str] = None
description: Optional[str] = None
embeddingModel: Optional[str] = None
chunkSize: Optional[int] = None
chunkOverlap: Optional[int] = None
class KnowledgeSearchQuery(BaseModel):
kb_id: str
query: str
nResults: Optional[int] = 5
class DocumentIndexRequest(BaseModel):
content: str
```
---
## 单元测试
项目包含完整的单元测试,位于 `api/tests/test_knowledge.py`
### 运行测试
```bash
# 运行知识库相关测试
pytest api/tests/test_knowledge.py -v
# 运行所有测试
pytest api/tests/ -v
```

View File

@@ -258,6 +258,68 @@ POST /api/v1/llm/{id}/chat
---
### 8. 预览模型输出
```http
POST /api/v1/llm/{id}/preview
```
预览模型输出,支持 text(chat) 与 embedding 两类模型。
**Request Body:**
```json
{
"message": "请介绍一下你自己",
"system_prompt": "你是一个专业的AI助手",
"max_tokens": 512,
"temperature": 0.7
}
```
**Response (text model):**
```json
{
"success": true,
"reply": "您好!我是一个...",
"usage": {
"prompt_tokens": 20,
"completion_tokens": 50,
"total_tokens": 70
},
"latency_ms": 1500,
"error": null
}
```
**Response (embedding model):**
```json
{
"success": true,
"reply": "Embedding generated successfully. dims=1536. head=[0.012345, -0.023456, ...]",
"usage": {
"prompt_tokens": 10,
"total_tokens": 10
},
"latency_ms": 800,
"error": null
}
```
**Fields 说明:**
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| message | string | 是 | 用户消息/嵌入文本 |
| system_prompt | string | 否 | 系统提示词 (仅 text 模型) |
| max_tokens | int | 否 | 最大生成 token 数 (默认 512) |
| temperature | float | 否 | 温度参数 |
| api_key | string | 否 | 覆盖模型配置的 API Key |
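响应中的 `latency_ms` 是服务端统计的耗时;客户端联调时也可以自行测量端到端延迟作对照。一个通用的计时包装示意:

```python
import time

def timed_call(fn, *args, **kwargs):
    """执行调用并返回 (结果, 耗时毫秒),用于与接口返回的 latency_ms 对照。"""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = int((time.perf_counter() - start) * 1000)
    return result, latency_ms

result, latency_ms = timed_call(lambda: "ok")  # 实际使用时传入发起 HTTP 请求的函数
```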
---
## Schema 定义
```python

View File

@@ -15,14 +15,23 @@
系统内置以下工具:
| 工具ID | 名称 | 类别 | 说明 |
|--------|------|------|------|
| calculator | 计算器 | query | 执行数学计算 |
| code_interpreter | 代码执行 | query | 安全地执行Python代码 |
| current_time | 当前时间 | query | 获取当前本地时间 |
| turn_on_camera | 打开摄像头 | system | 执行打开摄像头命令 |
| turn_off_camera | 关闭摄像头 | system | 执行关闭摄像头命令 |
| increase_volume | 调高音量 | system | 提升设备音量 |
| decrease_volume | 调低音量 | system | 降低设备音量 |
| voice_msg_prompt | 语音消息提示 | system | 播报一条语音提示消息 |
| text_msg_prompt | 文本消息提示 | system | 显示一条文本弹窗提示 |
| voice_choice_prompt | 语音选项提示 | system | 播报问题并展示可选项,等待用户选择 |
| text_choice_prompt | 文本选项提示 | system | 显示文本选项弹窗并等待用户选择 |
**类别说明:**
- `query`: 查询类工具,需要配置 HTTP URL
- `system`: 系统类工具,直接在客户端执行
---
@@ -169,6 +178,132 @@ GET /api/v1/tools/health
---
### 4. 获取工具资源列表
```http
GET /api/v1/tools/resources
```
**Query Parameters:**
| 参数 | 类型 | 必填 | 默认值 | 说明 |
|------|------|------|--------|------|
| category | string | 否 | - | 过滤类别: "query" \| "system" |
| enabled | boolean | 否 | - | 过滤启用状态 |
| include_system | boolean | 否 | true | 是否包含系统工具 |
| page | int | 否 | 1 | 页码 |
| limit | int | 否 | 100 | 每页数量 |
**Response:**
```json
{
"total": 15,
"page": 1,
"limit": 100,
"list": [
{
"id": "calculator",
"user_id": 1,
"name": "计算器",
"description": "执行数学计算",
"category": "query",
"icon": "Terminal",
"http_method": "GET",
"http_url": null,
"http_timeout_ms": 10000,
"parameter_schema": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "数学表达式"}
},
"required": ["expression"]
},
"parameter_defaults": {},
"wait_for_response": false,
"enabled": true,
"is_system": true,
"created_at": "2024-01-15T10:30:00Z"
}
]
}
```
---
### 5. 获取工具资源详情
```http
GET /api/v1/tools/resources/{id}
```
---
### 6. 创建工具资源
```http
POST /api/v1/tools/resources
```
**Request Body:**
```json
{
"name": "订单查询",
"description": "查询用户订单信息",
"category": "query",
"icon": "Search",
"http_method": "POST",
"http_url": "https://api.example.com/orders",
"http_headers": {"Authorization": "Bearer {api_key}"},
"http_timeout_ms": 10000,
"parameter_schema": {
"type": "object",
"properties": {
"order_id": {"type": "string", "description": "订单ID"}
},
"required": ["order_id"]
},
"enabled": true
}
```
**Fields 说明:**
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| id | string | 否 | 工具ID,缺省时自动生成 |
| name | string | 是 | 工具名称 |
| description | string | 否 | 工具描述 |
| category | string | 是 | 类别: "query" \| "system" |
| icon | string | 否 | 图标名称 |
| http_method | string | 否 | HTTP 方法,默认 GET |
| http_url | string | 否* | HTTP 请求地址 (query 类必填) |
| http_headers | object | 否 | HTTP 请求头 |
| http_timeout_ms | int | 否 | 超时时间(毫秒),默认 10000 |
| parameter_schema | object | 否 | 参数 JSON Schema |
| parameter_defaults | object | 否 | 默认参数值 |
| wait_for_response | boolean | 否 | 是否等待响应 (仅 system 类) |
| enabled | boolean | 否 | 是否启用,默认 true |
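`parameter_defaults` 会在工具实际调用时与模型生成的参数合并。合并顺序以服务端实现为准,下面以"显式参数覆盖默认值"作示意:

```python
def build_tool_args(defaults: dict, llm_args: dict) -> dict:
    """合并默认参数与模型生成的参数,显式参数优先(示意实现)。"""
    merged = dict(defaults)   # 先铺默认值
    merged.update(llm_args)   # 再用模型生成的参数覆盖
    return merged

args = build_tool_args({"currency": "CNY"}, {"order_id": "A1001"})
# {"currency": "CNY", "order_id": "A1001"}
```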
---
### 7. 更新工具资源
```http
PUT /api/v1/tools/resources/{id}
```
---
### 8. 删除工具资源
```http
DELETE /api/v1/tools/resources/{id}
```
---
## 自动测试 (Autotest)
### 4. 运行完整自动测试

View File

@@ -182,12 +182,14 @@ POST /api/v1/voices
| 字段 | 类型 | 必填 | 说明 |
|------|------|------|------|
| name | string | 是 | 声音名称 |
| vendor | string | 是 | 供应商: "Ali" \| "Volcano" \| "Minimax" \| "OpenAI Compatible" \| "DashScope" |
| gender | string | 是 | 性别: "Male" \| "Female" |
| language | string | 是 | 语言: "zh" \| "en" |
| description | string | 否 | 描述信息 |
| model | string | | 厂商语音模型标识 (可选,部分供应商有默认值) |
| voice_key | string | | 厂商 voice_key (可选,部分供应商有默认值) |
| api_key | string | 否 | 供应商 API Key (可选,也可通过环境变量配置) |
| base_url | string | 否 | API Base URL (可选,部分供应商有默认值) |
| speed | number | 否 | 默认语速 (0.5-2.0),默认 1.0 |
| gain | number | 否 | 音量增益 (-10~10 dB),默认 0 |
| pitch | number | 否 | 音调调整,默认 0 |
@@ -244,11 +246,14 @@ POST /api/v1/voices/{id}/preview
```json
{
"success": true,
"audio_url": "data:audio/wav;base64,UklGRi...",
"duration_ms": 2500,
"error": null
}
```
**注意:** `audio_url` 返回 Base64 编码的音频数据 (data URI 格式),可直接在浏览器中播放或解码保存为音频文件。
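用标准库即可把 data URI 解码为原始音频字节再落盘。示意:

```python
import base64

def decode_data_uri(data_uri: str) -> bytes:
    """解析 data:audio/wav;base64,... 形式的 data URI,返回原始音频字节。"""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or "base64" not in header:
        raise ValueError("Unsupported data URI")
    return base64.b64decode(payload)

sample = "data:audio/wav;base64," + base64.b64encode(b"RIFFtest").decode()
audio = decode_data_uri(sample)
# audio == b"RIFFtest",可直接写入 .wav 文件
```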
---
### 7. 获取供应商声音列表

View File

@@ -34,6 +34,7 @@ SEED_LLM_IDS = {
SEED_ASR_IDS = {
"sensevoice_small": short_id("asr"),
"telespeech_asr": short_id("asr"),
"dashscope_realtime": short_id("asr"),
}
SEED_ASSISTANT_IDS = {
@@ -408,6 +409,20 @@ def init_default_asr_models():
enable_normalization=True,
enabled=True,
),
ASRModel(
id=SEED_ASR_IDS["dashscope_realtime"],
user_id=1,
name="DashScope Realtime ASR",
vendor="DashScope",
language="Multi-lingual",
base_url=DASHSCOPE_REALTIME_URL,
api_key="YOUR_API_KEY",
model_name="qwen3-asr-flash-realtime",
hotwords=[],
enable_punctuation=True,
enable_normalization=True,
enabled=True,
),
]
seed_if_empty(db, ASRModel, asr_models, "✅ 默认ASR模型已初始化")

View File

@@ -1,12 +1,12 @@
aiosqlite==0.22.1
fastapi==0.135.1
uvicorn==0.41.0
python-multipart==0.0.22
python-dotenv==1.2.2
pydantic==2.11.7
sqlalchemy==2.0.48
minio==7.2.20
httpx==0.28.1
chromadb==1.5.2
openai==2.24.0
dashscope==1.25.13

View File

@@ -1,8 +1,21 @@
"""Tests for ASR Model API endpoints"""
import io
import wave
import pytest
from unittest.mock import patch, MagicMock
def _make_wav_bytes(sample_rate: int = 16000) -> bytes:
with io.BytesIO() as buffer:
with wave.open(buffer, "wb") as wav_file:
wav_file.setnchannels(1)
wav_file.setsampwidth(2)
wav_file.setframerate(sample_rate)
wav_file.writeframes(b"\x00\x00" * sample_rate)
return buffer.getvalue()
class TestASRModelAPI:
"""Test cases for ASR Model endpoints"""
@@ -75,6 +88,24 @@ class TestASRModelAPI:
assert data["language"] == "en"
assert data["enable_punctuation"] == False
def test_update_asr_model_vendor(self, client, sample_asr_model_data):
"""Test updating ASR vendor metadata."""
create_response = client.post("/api/asr", json=sample_asr_model_data)
model_id = create_response.json()["id"]
response = client.put(
f"/api/asr/{model_id}",
json={
"vendor": "DashScope",
"model_name": "qwen3-asr-flash-realtime",
"base_url": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
},
)
assert response.status_code == 200
data = response.json()
assert data["vendor"] == "DashScope"
assert data["model_name"] == "qwen3-asr-flash-realtime"
def test_delete_asr_model(self, client, sample_asr_model_data):
"""Test deleting an ASR model"""
# Create first
@@ -234,6 +265,28 @@ class TestASRModelAPI:
response = client.post(f"/api/asr/{model_id}/test")
assert response.status_code == 200
def test_test_asr_model_dashscope(self, client, sample_asr_model_data, monkeypatch):
"""Test DashScope ASR connectivity probe."""
from app.routers import asr as asr_router
sample_asr_model_data["vendor"] = "DashScope"
sample_asr_model_data["base_url"] = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
sample_asr_model_data["model_name"] = "qwen3-asr-flash-realtime"
create_response = client.post("/api/asr", json=sample_asr_model_data)
model_id = create_response.json()["id"]
def fake_probe(**kwargs):
assert kwargs["api_key"] == sample_asr_model_data["api_key"]
assert kwargs["model"] == "qwen3-asr-flash-realtime"
monkeypatch.setattr(asr_router, "_probe_dashscope_asr_connection", fake_probe)
response = client.post(f"/api/asr/{model_id}/test")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["message"] == "DashScope realtime ASR connected"
@patch('httpx.Client')
def test_test_asr_model_failure(self, mock_client_class, client, sample_asr_model_data):
"""Test testing an ASR model with failed connection"""
@@ -274,7 +327,7 @@ class TestASRModelAPI:
def test_different_asr_vendors(self, client):
"""Test creating ASR models with different vendors"""
vendors = ["SiliconFlow", "OpenAI", "Azure", "DashScope"]
for vendor in vendors:
data = {
"id": f"asr-vendor-{vendor.lower()}",
@@ -345,3 +398,33 @@ class TestASRModelAPI:
)
assert response.status_code == 400
assert "Only audio files are supported" in response.text
def test_preview_asr_model_dashscope(self, client, sample_asr_model_data, monkeypatch):
"""Test ASR preview endpoint with DashScope realtime helper."""
from app.routers import asr as asr_router
sample_asr_model_data["vendor"] = "DashScope"
sample_asr_model_data["base_url"] = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
sample_asr_model_data["model_name"] = "qwen3-asr-flash-realtime"
create_response = client.post("/api/asr", json=sample_asr_model_data)
model_id = create_response.json()["id"]
def fake_preview(**kwargs):
assert kwargs["base_url"] == sample_asr_model_data["base_url"]
assert kwargs["model"] == sample_asr_model_data["model_name"]
return {
"transcript": "你好,这是实时识别",
"language": "zh",
"confidence": None,
}
monkeypatch.setattr(asr_router, "_transcribe_dashscope_preview", fake_preview)
response = client.post(
f"/api/asr/{model_id}/preview",
files={"file": ("sample.wav", _make_wav_bytes(), "audio/wav")},
)
assert response.status_code == 200
payload = response.json()
assert payload["success"] is True
assert payload["transcript"] == "你好,这是实时识别"

View File

@@ -21,12 +21,15 @@ class TestAssistantAPI:
data = response.json()
assert data["name"] == sample_assistant_data["name"]
assert data["opener"] == sample_assistant_data["opener"]
assert data["manualOpenerToolCalls"] == []
assert data["prompt"] == sample_assistant_data["prompt"]
assert data["language"] == sample_assistant_data["language"]
assert data["voiceOutputEnabled"] is True
assert data["firstTurnMode"] == "bot_first"
assert data["generatedOpenerEnabled"] is False
assert data["asrInterimEnabled"] is False
assert data["botCannotBeInterrupted"] is False
assert data["appId"] is None
assert "id" in data
assert data["callCount"] == 0
@@ -36,6 +39,7 @@ class TestAssistantAPI:
response = client.post("/api/assistants", json=data)
assert response.status_code == 200
assert response.json()["name"] == "Minimal Assistant"
assert response.json()["asrInterimEnabled"] is False
def test_get_assistant_by_id(self, client, sample_assistant_data):
"""Test getting a specific assistant by ID"""
@@ -67,6 +71,10 @@ class TestAssistantAPI:
"prompt": "You are an updated assistant.",
"speed": 1.5,
"voiceOutputEnabled": False,
"asrInterimEnabled": True,
"manualOpenerToolCalls": [
{"toolName": "text_msg_prompt", "arguments": {"msg": "请选择服务类型"}}
],
}
response = client.put(f"/api/assistants/{assistant_id}", json=update_data)
assert response.status_code == 200
@@ -75,6 +83,10 @@ class TestAssistantAPI:
assert data["prompt"] == "You are an updated assistant."
assert data["speed"] == 1.5
assert data["voiceOutputEnabled"] is False
assert data["asrInterimEnabled"] is True
assert data["manualOpenerToolCalls"] == [
{"toolName": "text_msg_prompt", "arguments": {"msg": "请选择服务类型"}}
]
def test_delete_assistant(self, client, sample_assistant_data):
"""Test deleting an assistant"""
@@ -205,6 +217,8 @@ class TestAssistantAPI:
"voice": voice_id,
"prompt": "runtime prompt",
"opener": "runtime opener",
"manualOpenerToolCalls": [{"toolName": "text_msg_prompt", "arguments": {"msg": "欢迎"}}],
"asrInterimEnabled": True,
"speed": 1.1,
})
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
@@ -217,11 +231,14 @@ class TestAssistantAPI:
assert payload["assistantId"] == assistant_id
metadata = payload["sessionStartMetadata"]
assert metadata["systemPrompt"].startswith("runtime prompt")
assert "Tool usage policy:" in metadata["systemPrompt"]
assert metadata["greeting"] == "runtime opener"
assert metadata["manualOpenerToolCalls"] == [{"toolName": "text_msg_prompt", "arguments": {"msg": "欢迎"}}]
assert metadata["services"]["llm"]["model"] == sample_llm_model_data["model_name"]
assert metadata["services"]["asr"]["model"] == sample_asr_model_data["model_name"]
assert metadata["services"]["asr"]["baseUrl"] == sample_asr_model_data["base_url"]
assert metadata["services"]["asr"]["enableInterim"] is True
expected_tts_voice = f"{sample_voice_data['model']}:{sample_voice_data['voice_key']}"
assert metadata["services"]["tts"]["voice"] == expected_tts_voice
assert metadata["services"]["tts"]["baseUrl"] == sample_voice_data["base_url"]
@@ -239,8 +256,10 @@ class TestAssistantAPI:
assert payload["assistantId"] == assistant_id
assert payload["assistant"]["assistantId"] == assistant_id
assert payload["assistant"]["configVersionId"].startswith(f"asst_{assistant_id}_")
assert payload["assistant"]["systemPrompt"].startswith(sample_assistant_data["prompt"])
assert "Tool usage policy:" in payload["assistant"]["systemPrompt"]
assert payload["sessionStartMetadata"]["systemPrompt"].startswith(sample_assistant_data["prompt"])
assert "Tool usage policy:" in payload["sessionStartMetadata"]["systemPrompt"]
assert payload["sessionStartMetadata"]["history"]["assistantId"] == assistant_id
def test_runtime_config_resolves_selected_tools_into_runtime_definitions(self, client, sample_assistant_data):
@@ -263,6 +282,30 @@ class TestAssistantAPI:
assert by_name["calculator"]["function"]["parameters"]["type"] == "object"
assert "expression" in by_name["calculator"]["function"]["parameters"]["properties"]
def test_runtime_config_normalizes_legacy_voice_message_prompt_tool_id(self, client, sample_assistant_data):
sample_assistant_data["tools"] = ["voice_message_prompt"]
sample_assistant_data["manualOpenerToolCalls"] = [
{"toolName": "voice_message_prompt", "arguments": {"msg": "您好"}}
]
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
assert assistant_resp.status_code == 200
assistant_payload = assistant_resp.json()
assistant_id = assistant_payload["id"]
assert assistant_payload["tools"] == ["voice_msg_prompt"]
assert assistant_payload["manualOpenerToolCalls"] == [
{"toolName": "voice_msg_prompt", "arguments": {"msg": "您好"}}
]
runtime_resp = client.get(f"/api/assistants/{assistant_id}/runtime-config")
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
tools = metadata["tools"]
by_name = {item["function"]["name"]: item for item in tools}
assert "voice_msg_prompt" in by_name
assert metadata["manualOpenerToolCalls"] == [
{"toolName": "voice_msg_prompt", "arguments": {"msg": "您好"}}
]
def test_runtime_config_text_mode_when_voice_output_disabled(self, client, sample_assistant_data):
sample_assistant_data["voiceOutputEnabled"] = False
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
@@ -273,6 +316,7 @@ class TestAssistantAPI:
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
assert metadata["output"]["mode"] == "text"
assert metadata["services"]["asr"]["enableInterim"] is False
assert metadata["services"]["tts"]["enabled"] is False
def test_runtime_config_dashscope_voice_provider(self, client, sample_assistant_data):
@@ -307,6 +351,48 @@ class TestAssistantAPI:
assert tts["apiKey"] == "dashscope-key"
assert tts["baseUrl"] == "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
def test_runtime_config_dashscope_asr_provider(self, client, sample_assistant_data):
"""DashScope ASR models should map to dashscope asr provider in runtime metadata."""
asr_resp = client.post("/api/asr", json={
"name": "DashScope Realtime ASR",
"vendor": "DashScope",
"language": "zh",
"base_url": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
"api_key": "dashscope-asr-key",
"model_name": "qwen3-asr-flash-realtime",
"hotwords": [],
"enable_punctuation": True,
"enable_normalization": True,
"enabled": True,
})
assert asr_resp.status_code == 200
asr_payload = asr_resp.json()
sample_assistant_data.update({
"asrModelId": asr_payload["id"],
})
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
assert assistant_resp.status_code == 200
assistant_id = assistant_resp.json()["id"]
runtime_resp = client.get(f"/api/assistants/{assistant_id}/runtime-config")
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
asr = metadata["services"]["asr"]
assert asr["provider"] == "dashscope"
assert asr["baseUrl"] == "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
assert asr["enableInterim"] is False
def test_runtime_config_defaults_asr_interim_disabled_without_asr_model(self, client, sample_assistant_data):
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
assert assistant_resp.status_code == 200
assistant_id = assistant_resp.json()["id"]
runtime_resp = client.get(f"/api/assistants/{assistant_id}/runtime-config")
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
assert metadata["services"]["asr"]["enableInterim"] is False
def test_assistant_interrupt_and_generated_opener_flags(self, client, sample_assistant_data):
sample_assistant_data.update({
"firstTurnMode": "user_first",
@@ -331,5 +417,40 @@ class TestAssistantAPI:
metadata = runtime_resp.json()["sessionStartMetadata"]
assert metadata["firstTurnMode"] == "user_first"
assert metadata["generatedOpenerEnabled"] is True
assert metadata["greeting"] == ""
assert metadata["bargeIn"]["enabled"] is False
assert metadata["bargeIn"]["minDurationMs"] == 900
def test_fastgpt_app_id_persists_and_flows_to_runtime(self, client, sample_assistant_data):
sample_assistant_data.update({
"configMode": "fastgpt",
"apiUrl": "https://cloud.fastgpt.cn/api",
"apiKey": "fastgpt-key",
"appId": "app-fastgpt-123",
})
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
assert assistant_resp.status_code == 200
assistant_id = assistant_resp.json()["id"]
assert assistant_resp.json()["appId"] == "app-fastgpt-123"
runtime_resp = client.get(f"/api/assistants/{assistant_id}/runtime-config")
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
assert metadata["services"]["llm"]["provider"] == "fastgpt"
assert metadata["services"]["llm"]["appId"] == "app-fastgpt-123"
def test_dify_runtime_config_uses_dify_provider(self, client, sample_assistant_data):
sample_assistant_data.update({
"configMode": "dify",
"apiUrl": "https://api.dify.ai/v1",
"apiKey": "dify-key",
})
assistant_resp = client.post("/api/assistants", json=sample_assistant_data)
assert assistant_resp.status_code == 200
assistant_id = assistant_resp.json()["id"]
runtime_resp = client.get(f"/api/assistants/{assistant_id}/runtime-config")
assert runtime_resp.status_code == 200
metadata = runtime_resp.json()["sessionStartMetadata"]
assert metadata["services"]["llm"]["provider"] == "dify"
assert metadata["services"]["llm"]["model"] == "dify"

View File

@@ -21,6 +21,7 @@ class TestToolsAPI:
assert "turn_off_camera" in tools
assert "increase_volume" in tools
assert "decrease_volume" in tools
assert "voice_msg_prompt" in tools
assert "calculator" in tools
def test_get_tool_detail(self, client):
@@ -36,6 +37,14 @@ class TestToolsAPI:
response = client.get("/api/tools/list/non-existent-tool")
assert response.status_code == 404
def test_get_tool_detail_legacy_alias(self, client):
"""Legacy tool id should resolve to canonical tool detail."""
response = client.get("/api/tools/list/voice_message_prompt")
assert response.status_code == 200
data = response.json()
assert data["name"] == "语音消息提示"
assert "msg" in data["parameters"]["properties"]
def test_health_check(self, client):
"""Test health check endpoint"""
response = client.get("/api/tools/health")
@@ -281,6 +290,7 @@ class TestToolResourceCRUD:
assert payload["total"] >= 1
ids = [item["id"] for item in payload["list"]]
assert "calculator" in ids
assert "voice_msg_prompt" in ids
calculator = next((item for item in payload["list"] if item["id"] == "calculator"), None)
assert calculator is not None
assert calculator["parameter_schema"]["type"] == "object"

changelog/README.md Normal file
View File

@@ -0,0 +1 @@
# Changelog

View File

@@ -1 +1,78 @@
# Docker Deployment
This folder contains Docker Compose configuration to run the entire AI VideoAssistant stack.
## Services
| Service | Port | Description |
|---------|------|-------------|
| minio | 9000, 9001 | S3-compatible object storage |
| backend | 8100 | FastAPI backend API |
| engine | 8001 | Conversation engine (WebSocket) |
| frontend | 6000 | React web application |
## Prerequisites
1. Docker and Docker Compose installed
2. The `engine/data/vad/silero_vad.onnx` VAD model file must exist
3. Agent configuration in `engine/config/agents/default.yaml`
## Quick Start
```bash
cd docker
docker compose up -d
```
## Access Points
- **Frontend**: http://localhost:6000
- **Backend API**: http://localhost:8100
- **Engine WebSocket**: ws://localhost:8001/ws
- **MinIO Console**: http://localhost:9001 (admin / password123)
## Configuration
### Engine Environment Variables
The engine service uses environment variables for configuration. Key variables:
- `BACKEND_URL`: Backend API URL (default: `http://backend:8100`)
- `LOG_LEVEL`: Logging level (default: `INFO`)
- `CORS_ORIGINS`: Allowed CORS origins
Agent-specific settings (LLM, TTS, ASR) are configured via YAML files in `engine/config/agents/`.
### Volumes
- `minio_data`: MinIO storage data
- `backend_data`: Backend SQLite database
- `engine_logs`: Engine log files
## Development Mode
To mount source code for hot-reload during development:
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d
```
## Logs
```bash
# View all logs
docker compose logs -f
# View specific service logs
docker compose logs -f engine
docker compose logs -f backend
```
## Stopping
```bash
docker compose down
# Remove volumes as well
docker compose down -v
```

View File

@@ -1,11 +1,35 @@
version: '3.8'
# Project name used as prefix for containers, volumes, and networks
name: ras
# Docker registry mirror for China users (change to empty or "docker.io" if you have direct access)
x-registry-mirror: &registry-mirror docker.1ms.run
services:
# Backend API
# MinIO (S3 compatible storage)
minio:
image: ${REGISTRY_MIRROR:-docker.1ms.run}/minio/minio
ports:
- "9000:9000"
- "9001:9001"
volumes:
- minio_data:/data
environment:
MINIO_ROOT_USER: admin
MINIO_ROOT_PASSWORD: password123
command: server /data --console-address ":9001"
healthcheck:
test: ["CMD", "mc", "ready", "local"]
interval: 5s
timeout: 5s
retries: 5
# Backend API
backend:
build:
context: ../api
dockerfile: Dockerfile
args:
REGISTRY_MIRROR: ${REGISTRY_MIRROR:-docker.1ms.run}
ports:
- "8100:8100"
environment:
@@ -15,12 +39,18 @@ services:
- MINIO_SECRET_KEY=password123
- MINIO_BUCKET=ai-audio
volumes:
- ../api:/app
- ../api/data:/app/data
- backend_data:/app/data
depends_on:
- minio
minio:
condition: service_started
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8100/health"]
interval: 10s
timeout: 5s
retries: 5
start_period: 10s
# Conversation engine (py-active-call)
# Conversation Engine
engine:
build:
context: ../engine
@@ -28,31 +58,64 @@ services:
ports:
- "8001:8001"
environment:
- HOST=0.0.0.0
- PORT=8001
- BACKEND_MODE=http
- BACKEND_URL=http://backend:8100
- LOG_LEVEL=INFO
- CORS_ORIGINS=["http://localhost:6000","http://localhost:3000"]
volumes:
- ../engine/config:/app/config:ro
- ../engine/data:/app/data:ro
- engine_logs:/app/logs
depends_on:
- backend
backend:
condition: service_started
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
interval: 10s
timeout: 5s
retries: 5
start_period: 15s
# Frontend (Vite + React)
# Frontend (Vite + React) production: built static files served on 6000
frontend:
build:
context: ../web
dockerfile: Dockerfile
args:
- VITE_API_BASE_URL=http://localhost:8100/api
REGISTRY_MIRROR: ${REGISTRY_MIRROR:-docker.1ms.run}
VITE_API_BASE_URL: ${VITE_API_BASE_URL:-http://localhost:8100/api}
VITE_ENGINE_WS_URL: ${VITE_ENGINE_WS_URL:-ws://localhost:8001/ws}
ports:
- "6000:6000"
depends_on:
- backend
- engine
# MinIO (S3-compatible storage)
minio:
image: minio/minio
# Frontend dev hot reload on port 3000 (run with: docker compose --profile dev up)
frontend-dev:
profiles:
- dev
build:
context: ../web
dockerfile: Dockerfile.dev
args:
REGISTRY_MIRROR: ${REGISTRY_MIRROR:-docker.1ms.run}
ports:
- "9000:9000"
- "9001:9001"
volumes:
- ./storage/minio/data:/data
- "3000:3000"
environment:
MINIO_ROOT_USER: admin
MINIO_ROOT_PASSWORD: password123
command: server /data --console-address ":9001"
- VITE_API_BASE_URL=${VITE_API_BASE_URL:-http://localhost:8100/api}
- VITE_ENGINE_WS_URL=${VITE_ENGINE_WS_URL:-ws://localhost:8001/ws}
volumes:
- ../web:/app
- frontend_dev_node_modules:/app/node_modules
depends_on:
- backend
- engine
volumes:
minio_data:
backend_data:
engine_logs:
frontend_dev_node_modules:

View File

@@ -1,7 +1,18 @@
# Documentation
Deploying MkDocs
pip install mkdocs
mkdocs serve
**Install dependencies (MkDocs 1.x is recommended, to avoid incompatibility with the Material theme):**
Visit http://localhost:8000 to view the docs site.
```bash
cd docs
pip install -r requirements.txt
```
Or install manually: `pip install "mkdocs>=1.6,<2" mkdocs-material`
**Local preview:**
```bash
mkdocs serve
```
Open the address shown in the terminal (e.g. http://127.0.0.1:8000) to view the docs.

View File

@@ -0,0 +1,166 @@
# Effect Evaluation
Effect evaluation helps you systematically measure and improve an assistant's conversation quality.
## Evaluation Dimensions
### Core Metrics
| Metric | Description | Formula |
|------|------|---------|
| **Resolution rate** | Share of conversations where the user's issue was solved | resolved / total conversations |
| **Accuracy** | Share of replies whose content is correct | correct replies / total replies |
| **Satisfaction** | Share of conversations rated as satisfying | satisfied ratings / total ratings |
| **Escalation rate** | Share of conversations needing human takeover | escalations / total conversations |
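The ratios above can be sketched as a small helper (a minimal sketch; the function and field names are illustrative, not part of the product):

```python
def evaluation_metrics(total, resolved, correct_replies, total_replies,
                       satisfied, rated, escalated):
    """Compute the core evaluation ratios described in the table above."""
    return {
        "resolution_rate": resolved / total,          # resolved / total conversations
        "accuracy": correct_replies / total_replies,  # correct replies / total replies
        "satisfaction": satisfied / rated,            # satisfied ratings / total ratings
        "escalation_rate": escalated / total,         # escalations / total conversations
    }

m = evaluation_metrics(total=1234, resolved=969, correct_replies=1052,
                       total_replies=1234, satisfied=84, rated=100, escalated=152)
print(round(m["resolution_rate"], 3))  # 0.785
```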
### Performance Metrics
| Metric | Description | Target |
|------|------|--------|
| **Time to first response** | Time from user input to the first reply | < 2s |
| **Average turns** | Average number of turns needed to resolve an issue | < 5 turns |
| **Average duration** | Average length of a single conversation | scenario-dependent |
## Configuring Evaluation Criteria
Set evaluation criteria in the assistant configuration:
### Resolution Criteria
Define what counts as "issue resolved":
```
Criterion: solved_inquiry
Description: the user's question received a satisfactory answer
Success conditions:
- The user explicitly confirms the issue is resolved
- The user says thanks and ends the conversation
- The user obtained the information they needed
Failure conditions:
- The user asks for a human agent
- The user repeats the same question multiple times
- The user expresses dissatisfaction
```
### Quality Criteria
Define the dimensions along which reply quality is assessed:
```
Evaluation dimensions:
1. Accuracy - is the information correct
2. Completeness - were all of the user's questions answered
3. Relevance - is the reply on topic
4. Conciseness - is redundant information avoided
5. Tone - was a friendly, professional attitude maintained
```
## Data Collection
### Automatic Collection
The system automatically collects:
- Conversation content and timestamps
- Tool call records
- Errors and exceptions
- Human takeover events
### User Feedback
Configure user feedback collection:
1. Show a satisfaction survey when the conversation ends
2. Collect a user rating (1-5)
3. Optional free-text feedback
### Data Extraction
Configure the information to extract from conversations:
```
Extraction items:
1. user_intent
Description: the user's main intent
Type: string
2. issue_category
Description: issue classification
Type: enum [product issue, order issue, technical issue, other]
3. resolution_status
Description: resolution status
Type: enum [resolved, unresolved, escalated]
```
## Evaluation Reports
### Viewing Reports
On the **Analytics** > **Effect Evaluation** page you can view:
1. **Overview** - trend charts for the core metrics
2. **By category** - evaluation results per issue type
3. **By time period** - performance across time ranges
4. **Detail records** - evaluation results for individual conversations
### Report Example
```
Evaluation report - January 2025
Total conversations: 1,234
Resolution rate: 78.5%
Accuracy: 85.2%
Average satisfaction: 4.2/5
Escalation rate: 12.3%
Issue category distribution:
- Product issues: 45%
- Order issues: 30%
- Technical issues: 15%
- Other: 10%
Improvement suggestions:
1. Order-issue resolution rate is low (65%); consider adding order-related knowledge base content
2. Technical-issue escalation rate is high (25%); consider adding technical support tools
```
## Continuous Improvement
### Improvement Loop
1. **Collect data** - keep gathering conversation and evaluation data
2. **Analyze problems** - find what low-scoring conversations have in common
3. **Plan changes** - design improvements targeting those problems
4. **Apply changes** - update prompts, knowledge bases, or tools
5. **Verify impact** - watch how the metrics change afterwards
### Common Fixes
| Problem | Fix |
|------|---------|
| Inaccurate replies | Tune the prompt, extend the knowledge base |
| Questions misunderstood | Add examples, tune ASR hotwords |
| Replies too long | Constrain length in the prompt |
| Missing domain knowledge | Upload relevant documents to the knowledge base |
| Tool calls failing | Check tool configuration and API status |
### A/B Testing
Compare the effect of different configurations:
1. Create a variant of the assistant
2. Split traffic by ratio
3. Collect evaluation data for both versions
4. Compare the metrics
5. Keep the better-performing version
## Next Steps
- [Automated testing](autotest.md) - batch-test assistants
- [History](history.md) - inspect conversation details
- [Prompt guide](../concepts/assistants/prompts.md) - improve prompts

View File

@@ -0,0 +1,88 @@
# Error Codes
This document lists all error codes of the Realtime Agent Studio (RAS) API and what they mean.
## Protocol Errors
| Code | Description | Resolution |
|---|---|---|
| `protocol.invalid_json` | Malformed JSON | Check that the JSON you send is valid |
| `protocol.invalid_message` | Malformed message | Check the message structure against the protocol |
| `protocol.order` | Out-of-order message | Make sure `session.start` is sent first |
| `protocol.assistant_id_required` | Missing `assistant_id` query parameter | Add the `assistant_id` parameter to the connection URL |
| `protocol.invalid_override` | Invalid metadata override field | Check that the override fields are on the whitelist |
## Assistant Errors
| Code | Description | Resolution |
|---|---|---|
| `assistant.not_found` | Assistant does not exist | Check that `assistant_id` is correct |
| `assistant.config_unavailable` | Assistant configuration unavailable | Confirm the assistant is configured and published |
## Audio Errors
| Code | Description | Resolution |
|---|---|---|
| `audio.invalid_pcm` | Invalid PCM data | Check that the audio format is `pcm_s16le` |
| `audio.frame_size_mismatch` | Audio frame size mismatch | Make sure frame length is a multiple of 640 bytes |
## Server Errors
| Code | Description | Resolution |
|---|---|---|
| `server.internal` | Internal server error | Check the server logs |
## Error Response Format
All errors are delivered via the `error` event:
```json
{
"type": "error",
"timestamp": 1730000000000,
"sessionId": "sess_xxx",
"data": {
"code": "protocol.invalid_json",
"message": "Invalid JSON format",
"details": {}
}
}
```
## HTTP API Errors
The REST API uses standard HTTP status codes:
| Status | Description |
|--------|------|
| 200 | Success |
| 201 | Created |
| 400 | Invalid request parameters |
| 401 | Unauthorized (missing or invalid credentials) |
| 403 | Forbidden (insufficient permissions) |
| 404 | Resource not found |
| 422 | Unprocessable entity |
| 500 | Internal server error |
### HTTP Error Response Example
```json
{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid request parameters",
"details": {
"field": "name",
"reason": "required"
}
}
}
```
## Error Handling Best Practices
1. **Always check error responses** - never assume a request succeeded
2. **Implement retries** - use exponential backoff for transient errors (e.g. network issues)
3. **Log errors** - keep error details for troubleshooting
4. **Show friendly messages** - translate technical errors into hints users can understand
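Point 2 (exponential backoff) can be sketched as follows; the base, cap, and retry counts are illustrative defaults, not values prescribed by the API:

```python
import random

def backoff_delays(retries=5, base=0.5, cap=30.0, jitter=False):
    """Exponential backoff schedule: base * 2^attempt seconds, capped at `cap`."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = random.uniform(0, delay)  # "full jitter" to avoid thundering herds
        delays.append(delay)
    return delays

print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Sleep for each delay between attempts, and stop retrying on non-retryable errors (e.g. `protocol.*` codes).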

View File

@@ -0,0 +1,235 @@
# API Reference
This section provides the complete API documentation for Realtime Agent Studio (RAS).
## API Overview
Realtime Agent Studio (RAS) provides two kinds of APIs:
| API type | Purpose | Protocol |
|---------|------|------|
| **REST API** | Manage assistants, models, knowledge bases, and other resources | HTTP |
| **WebSocket API** | Real-time voice conversation | WebSocket |
## REST API
### Base URL
```
http://localhost:8000/api/v1
```
### Authentication
The REST API uses Bearer token authentication:
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
http://localhost:8000/api/v1/assistants
```
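The same request from Python, using only the standard library (a sketch; the base URL and key are placeholders for your deployment):

```python
import json
import urllib.request

def auth_headers(api_key: str) -> dict:
    """Bearer token header required by the REST API."""
    return {"Authorization": f"Bearer {api_key}"}

def list_assistants(api_key: str, base_url: str = "http://localhost:8000/api/v1"):
    """GET /assistants and decode the JSON response."""
    req = urllib.request.Request(f"{base_url}/assistants", headers=auth_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```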
### Common Response Format
**Success response**
```json
{
"success": true,
"data": { ... }
}
```
**List response**
```json
{
"success": true,
"data": {
"items": [...],
"total": 100,
"page": 1,
"page_size": 20
}
}
```
**Error response**
```json
{
"success": false,
"error": {
"code": "ERROR_CODE",
"message": "错误描述"
}
}
```
### Main Endpoints
#### Assistant Management
| Method | Path | Description |
|------|------|------|
| GET | /assistants | List assistants |
| POST | /assistants | Create an assistant |
| GET | /assistants/{id} | Get assistant details |
| PUT | /assistants/{id} | Update an assistant |
| DELETE | /assistants/{id} | Delete an assistant |
| GET | /assistants/{id}/config | Get engine configuration |
| GET | /assistants/{id}/opener-audio | Get opener audio status |
| POST | /assistants/{id}/opener-audio/generate | Generate opener audio |
#### Model Management
| Method | Path | Description |
|------|------|------|
| GET | /llm | List LLM models |
| POST | /llm | Add an LLM model |
| PUT | /llm/{id} | Update an LLM model |
| DELETE | /llm/{id} | Delete an LLM model |
| POST | /llm/{id}/test | Test LLM connectivity |
| POST | /llm/{id}/preview | Preview model output |
| GET | /asr | List ASR models |
| POST | /asr | Add an ASR model |
| PUT | /asr/{id} | Update an ASR model |
| DELETE | /asr/{id} | Delete an ASR model |
| POST | /asr/{id}/test | Test ASR connectivity |
| POST | /asr/{id}/preview | Upload audio for a recognition preview |
| GET | /voices | List voices |
| POST | /voices | Add a voice configuration |
| PUT | /voices/{id} | Update a voice configuration |
| DELETE | /voices/{id} | Delete a voice configuration |
| POST | /voices/{id}/preview | Preview a voice |
#### Knowledge Base Management
| Method | Path | Description |
|------|------|------|
| GET | /knowledge/bases | List knowledge bases |
| POST | /knowledge/bases | Create a knowledge base |
| PUT | /knowledge/bases/{id} | Update a knowledge base |
| DELETE | /knowledge/bases/{id} | Delete a knowledge base |
| POST | /knowledge/bases/{id}/documents | Upload a document |
| POST | /knowledge/bases/{id}/documents/{doc_id}/index | Index document content |
| DELETE | /knowledge/bases/{id}/documents/{doc_id} | Delete a document |
| POST | /knowledge/search | Search knowledge bases |
| GET | /knowledge/bases/{id}/stats | Get statistics |
#### Tool Management
| Method | Path | Description |
|------|------|------|
| GET | /tools/list | List built-in tools |
| GET | /tools/resources | List tool resources |
| POST | /tools/resources | Create a tool resource |
| PUT | /tools/resources/{id} | Update a tool resource |
| DELETE | /tools/resources/{id} | Delete a tool resource |
| GET | /tools/health | Health check |
| POST | /tools/autotest | Run automated tests |
| POST | /tools/test-message | Send a test message |
#### History
| Method | Path | Description |
|------|------|------|
| GET | /history | List conversation history |
| GET | /history/{id} | Get conversation details |
| POST | /history | Create a call record |
| PUT | /history/{id} | Update a call record |
| DELETE | /history/{id} | Delete a call record |
| POST | /history/{id}/transcripts | Append transcript segments |
| GET | /history/{id}/audio/{turn_index} | Get an audio file |
## WebSocket API
### Connection URL
```
ws://localhost:8000/ws?assistant_id=<assistant_id>
```
### Protocol Overview
The WebSocket API uses bidirectional messaging:
- **Text frames**: JSON control messages
- **Binary frames**: PCM audio data
### Detailed Documentation
- [WebSocket protocol](websocket.md) - complete message formats and flows
- [Error codes](errors.md) - error code list and handling
## SDK
> The SDK package and class names below keep the current package identifiers; the product name used throughout the docs is Realtime Agent Studio (RAS).
### JavaScript SDK
```bash
npm install @ai-video-assistant/sdk
```
```javascript
import { AIVideoAssistant } from '@ai-video-assistant/sdk';
const assistant = new AIVideoAssistant({
apiUrl: 'http://localhost:8080',
wsUrl: 'ws://localhost:8000'
});
// Create an assistant
const result = await assistant.create({
name: '客服助手',
prompt: '你是一个友好的客服助手'
});
// Start a conversation
const conversation = await assistant.connect(result.id);
conversation.on('response', (text) => {
console.log('助手回复:', text);
});
```
### Python SDK
```bash
pip install ai-video-assistant
```
```python
from ai_video_assistant import AIVideoAssistant
client = AIVideoAssistant(
api_url="http://localhost:8080",
ws_url="ws://localhost:8000"
)
# Create an assistant
assistant = client.assistants.create(
name="客服助手",
prompt="你是一个友好的客服助手"
)
# Start a conversation
async with client.connect(assistant.id) as conv:
response = await conv.send_text("你好")
print(f"助手回复: {response}")
```
## Rate Limits
| Endpoint type | Limit |
|---------|------|
| REST API | 100 requests/minute |
| WebSocket | 10 concurrent connections/user |
Exceeding a limit returns `429 Too Many Requests`.
## Next Steps
- [WebSocket protocol](websocket.md) - the real-time conversation protocol in depth
- [Error codes](errors.md) - error handling reference
- [Quickstart](../quickstart/index.md) - create an assistant quickly

View File

@@ -0,0 +1,880 @@
# WebSocket Protocol
The WebSocket endpoint provides bidirectional real-time voice conversation, with streaming audio input/output and text message interaction.
## Connection URL
```
ws://<host>/ws?assistant_id=<assistant_id>
```
- `assistant_id` is a required query parameter, used to load that assistant's runtime configuration from the database.
## Transport Rules
- **Text frames**: JSON control messages
- **Binary frames**: PCM audio data (`pcm_s16le`, 16 kHz, mono)
- Frame length must be a multiple of 640 bytes (20 ms of audio = 640 bytes)
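The 640-byte figure follows from 16000 samples/s x 2 bytes/sample x 0.02 s. A client-side helper for slicing a capture buffer into valid frames might look like this (an illustrative sketch, not part of the protocol):

```python
FRAME_BYTES = 640  # 20 ms of pcm_s16le mono at 16 kHz: 16000 * 2 * 0.02

def pcm_frames(buf: bytes, frame_bytes: int = FRAME_BYTES):
    """Yield complete 20 ms frames; hold back any trailing partial frame for the next send."""
    usable = len(buf) - len(buf) % frame_bytes
    for off in range(0, usable, frame_bytes):
        yield buf[off:off + frame_bytes]

frames = list(pcm_frames(b"\x00" * 1500))  # 1500 bytes -> 2 full frames, 220-byte remainder
```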
---
## Message Flow
```
Client -> session.start
Server <- session.started
Server <- (optional) config.resolved
Client -> (binary pcm frames...)
Server <- input.speech_started / transcript.delta / transcript.final
Server <- assistant.response.delta / assistant.response.final
Server <- output.audio.start
Server <- (binary pcm frames...)
Server <- output.audio.end
Client -> output.audio.played (optional)
Client -> session.stop
Server <- session.stopped
```
---
## Client -> Server Messages
`session.start`
The first message the client sends after connecting; it starts the conversation session.
```json
{
"type": "session.start",
"audio": {
"encoding": "pcm_s16le",
"sample_rate_hz": 16000,
"channels": 1
},
"metadata": {
"channel": "web",
"source": "web_debug",
"history": {
"userId": 1
},
"overrides": {
"systemPrompt": "你是简洁助手",
"greeting": "你好,我能帮你什么?",
"output": {
"mode": "audio"
}
},
"dynamicVariables": {
"customer_name": "Alice",
"plan_tier": "Pro"
}
}
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"session.start"` |
| `audio` | object | No | Audio format description |
| `audio.encoding` | string | No | Always `"pcm_s16le"` |
| `audio.sample_rate_hz` | number | No | Always `16000` |
| `audio.channels` | number | No | Always `1` |
| `metadata` | object | No | Runtime configuration |
**Supported `metadata` fields**
- `channel` - channel identifier
- `source` - source identifier
- `history.userId` - user ID for history records
- `overrides` - overridable fields (safe whitelist only)
- `dynamicVariables` - dynamic variables (support `{{variable}}` placeholders)
**`metadata.overrides` whitelisted fields**
- `systemPrompt`
- `greeting`
- `firstTurnMode`
- `generatedOpenerEnabled`
- `output`
- `bargeIn`
- `knowledgeBaseId`
- `knowledge`
- `tools`
- `openerAudio`
**Restrictions**
- `metadata.workflow` is ignored (no workflow events are triggered)
- Submitting `metadata.services` is forbidden
- Submitting `assistantId` / `appId` / `app_id` / `configVersionId` / `config_version_id` is forbidden
- Submitting fields with secret semantics (e.g. `apiKey` / `token` / `secret` / `password` / `authorization`) is forbidden
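A client-side pre-flight check against these restrictions can be sketched as follows (illustrative only; the server enforces the actual whitelist):

```python
FORBIDDEN_KEYS = {"services", "assistantId", "appId", "app_id",
                  "configVersionId", "config_version_id"}
SECRET_HINTS = ("apikey", "token", "secret", "password", "authorization")

def check_metadata(metadata: dict) -> list:
    """Return the top-level metadata keys the server would reject."""
    rejected = []
    for key in metadata:
        if key in FORBIDDEN_KEYS or any(h in key.lower() for h in SECRET_HINTS):
            rejected.append(key)
    return rejected

print(check_metadata({"channel": "web", "apiKey": "x", "services": {}}))
# → ['apiKey', 'services']
```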
---
`input.text`
Send text input, skipping ASR and directly triggering an LLM reply.
```json
{
"type": "input.text",
"text": "你能做什么?"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"input.text"` |
| `text` | string | Yes | The user's text |
---
`response.cancel`
Request to interrupt the current answer.
```json
{
"type": "response.cancel",
"graceful": false
}
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | Yes | - | Always `"response.cancel"` |
| `graceful` | boolean | No | `false` | `false` interrupts immediately |
---
`output.audio.played`
Client acknowledgement that audio has finished playing locally (including the local jitter buffer / playback queue).
```json
{
"type": "output.audio.played",
"tts_id": "tts_001",
"response_id": "resp_001",
"turn_id": "turn_001",
"played_at_ms": 1730000018450,
"played_ms": 2520
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"output.audio.played"` |
| `tts_id` | string | Yes | ID of the TTS segment that finished playing |
| `response_id` | string | No | Owning response ID (recommended to echo back) |
| `turn_id` | string | No | Owning turn ID (recommended to echo back) |
| `played_at_ms` | number | No | Client-local playback completion timestamp (ms) |
| `played_ms` | number | No | Playback duration (ms) |
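`played_ms` can be derived from the number of PCM bytes the client actually rendered (a sketch, assuming the session's 16 kHz mono `pcm_s16le` format):

```python
def played_ms(pcm_bytes: int, sample_rate_hz: int = 16000, bytes_per_sample: int = 2) -> int:
    """Playback duration in milliseconds of a pcm_s16le mono buffer."""
    return pcm_bytes * 1000 // (sample_rate_hz * bytes_per_sample)

print(played_ms(80_640))  # 2520, matching the played_ms in the example above
```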
---
`tool_call.results`
Return the results of tools executed on the client.
```json
{
"type": "tool_call.results",
"results": [
{
"tool_call_id": "call_abc123",
"name": "weather",
"output": { "temp_c": 21, "condition": "sunny" },
"status": { "code": 200, "message": "ok" }
}
]
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"tool_call.results"` |
| `results` | array | No | List of tool results |
| `results[].tool_call_id` | string | Yes | Tool call ID |
| `results[].name` | string | Yes | Tool name |
| `results[].output` | any | No | Tool output |
| `results[].status` | object | Yes | Execution status |
| `results[].status.code` | number | Yes | HTTP status code (200-299 means success) |
| `results[].status.message` | string | Yes | Status description |
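Assembling this message after running a client-side tool might look like the following sketch (field layout taken from the table above; values are illustrative):

```python
def tool_result_message(tool_call_id: str, name: str, output,
                        code: int = 200, message: str = "ok") -> dict:
    """Build a `tool_call.results` payload for one executed tool."""
    return {
        "type": "tool_call.results",
        "results": [{
            "tool_call_id": tool_call_id,
            "name": name,
            "output": output,
            "status": {"code": code, "message": message},  # 200-299 means success
        }],
    }
```

The client would serialize this dict to JSON and send it as a text frame.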
---
`session.stop`
End the conversation session.
```json
{
"type": "session.stop",
"reason": "client_disconnect"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"session.stop"` |
| `reason` | string | No | Reason for ending |
---
`Binary Audio`
After `session.started` the client may continuously send binary PCM audio.
- **Format**: `pcm_s16le`
- **Sample rate**: 16000 Hz
- **Channels**: 1 (mono)
- **Frame length**: 20 ms = 640 bytes
---
## Server -> Client Events
### Event Envelope
All JSON events share a common envelope:
```json
{
"type": "event.name",
"timestamp": 1730000000000,
"sessionId": "sess_xxx",
"seq": 42,
"source": "asr",
"trackId": "audio_in",
"data": {}
}
```
| Field | Type | Description |
|---|---|---|
| `type` | string | Event type |
| `timestamp` | number | Event timestamp (Unix ms) |
| `sessionId` | string | Session ID |
| `seq` | number | Monotonic sequence number (for replay/recovery) |
| `source` | string | Event source: `asr` / `llm` / `tts` / `tool` / `system` / `client` / `server` |
| `trackId` | string | Event track: `audio_in` / `audio_out` / `control` |
| `data` | object | Business payload (optional) |
**Track IDs**
| trackId | Description | Related events |
|---------|------|---------|
| `audio_in` | ASR/VAD input-side events | `input.*`, `transcript.*` |
| `audio_out` | Assistant output-side events | `assistant.*`, `output.audio.*`, `response.interrupted`, `metrics.ttfb` |
| `control` | Session control events | `session.*`, `error`, `heartbeat`, `(optional) config.resolved` |
---
### Session Control Events
#### `session.started`
Session started successfully; after receiving this event the client may start sending audio.
```json
{
"type": "session.started",
"timestamp": 1730000000000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 1,
"trackId": "control",
"tracks": {
"audio_in": "audio_in",
"audio_out": "audio_out",
"control": "control"
},
"audio": {
"encoding": "pcm_s16le",
"sample_rate_hz": 16000,
"channels": 1
}
}
```
| Field | Type | Description |
|---|---|---|
| `sessionId` | string | Unique session identifier |
| `trackId` | string | Always `"control"` |
| `tracks` | object | Available tracks |
| `tracks.audio_in` | string | Input track ID |
| `tracks.audio_out` | string | Output track ID |
| `tracks.control` | string | Control track ID |
| `audio` | object | Audio format configuration |
| `audio.encoding` | string | Encoding |
| `audio.sample_rate_hz` | number | Sample rate |
| `audio.channels` | number | Channel count |
---
#### `config.resolved`
A **public configuration snapshot** returned by the server.
Not sent by default (recommended to keep disabled in public SaaS deployments); only sent when `WS_EMIT_CONFIG_RESOLVED=true`.
```json
{
"type": "config.resolved",
"timestamp": 1730000000001,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 2,
"trackId": "control",
"config": {
"channel": "web_debug",
"output": {
"mode": "audio"
},
"tools": {
"enabled": true,
"count": 2
},
"tracks": {
"audio_in": "audio_in",
"audio_out": "audio_out",
"control": "control"
}
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"control"` |
| `config` | object | SaaS-safe public configuration snapshot |
| `config.channel` | string | Echoes `session.start.metadata.channel` (if provided) |
| `config.output` | object | Output configuration |
| `config.output.mode` | string | Output mode: `"audio"` / `"text"` |
| `config.tools.enabled` | boolean | Whether tools are enabled |
| `config.tools.count` | number | Number of available tools (the tool list itself is not exposed) |
| `config.tracks` | object | Available tracks |
**The following internal fields are never returned**
- `assistantId` / `appId` / `configVersionId`
- `services` (provider/model/baseUrl, etc.)
- The raw system prompt and other internal orchestration details
---
#### `heartbeat`
Keepalive event, sent every 50 seconds by default.
```json
{
"type": "heartbeat",
"timestamp": 1730000050000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 10
}
```
| Field | Type | Description |
|---|---|---|
| `timestamp` | number | Heartbeat timestamp |
---
#### `session.stopped`
Session end confirmation.
```json
{
"type": "session.stopped",
"timestamp": 1730000100000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 50,
"reason": "client_requested"
}
```
| Field | Type | Description |
|---|---|---|
| `reason` | string | Reason: `"client_requested"` / `"timeout"` / `"error"` |
---
### ASR Events
#### `input.speech_started`
Speech onset detected (VAD).
```json
{
"type": "input.speech_started",
"timestamp": 1730000010000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 5,
"source": "asr",
"trackId": "audio_in",
"probability": 0.95
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_in"` |
| `probability` | number | Speech detection confidence (0-1) |
---
#### `input.speech_stopped`
Speech end detected (VAD).
```json
{
"type": "input.speech_stopped",
"timestamp": 1730000012000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 8,
"source": "asr",
"trackId": "audio_in",
"probability": 0.92
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_in"` |
| `probability` | number | Silence detection confidence (0-1) |
---
#### `transcript.delta`
Incremental ASR text (live transcription).
```json
{
"type": "transcript.delta",
"timestamp": 1730000011000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 6,
"source": "asr",
"trackId": "audio_in",
"text": "你好",
"data": {
"text": "你好",
"turn_id": "turn_001",
"utterance_id": "utt_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_in"` |
| `text` | string | Incremental recognized text |
| `data.text` | string | Incremental recognized text (same as `text`) |
| `data.turn_id` | string | Current turn ID |
| `data.utterance_id` | string | Current utterance ID |
**Throttling**: the server coalesces delta events every 300 ms by default.
---
#### `transcript.final`
Final ASR text (utterance ended).
```json
{
"type": "transcript.final",
"timestamp": 1730000012500,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 9,
"source": "asr",
"trackId": "audio_in",
"text": "你好,请问今天天气怎么样",
"data": {
"text": "你好,请问今天天气怎么样",
"turn_id": "turn_001",
"utterance_id": "utt_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_in"` |
| `text` | string | Final recognized text |
| `data.text` | string | Final recognized text (same as `text`) |
| `data.turn_id` | string | Current turn ID |
| `data.utterance_id` | string | Current utterance ID |
---
### LLM/TTS Output Events
#### `assistant.response.delta`
Incremental assistant text (streaming generation).
```json
{
"type": "assistant.response.delta",
"timestamp": 1730000013000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 12,
"source": "llm",
"trackId": "audio_out",
"text": "今天天气",
"data": {
"text": "今天天气",
"turn_id": "turn_001",
"response_id": "resp_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Always `"llm"` |
| `text` | string | Incremental text |
| `data.text` | string | Incremental text (same as `text`) |
| `data.turn_id` | string | Current turn ID |
| `data.response_id` | string | Current response ID |
**Throttling**: the server coalesces delta events every 80 ms by default.
---
#### `assistant.response.final`
Complete assistant text (response finished).
```json
{
"type": "assistant.response.final",
"timestamp": 1730000015000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 18,
"source": "llm",
"trackId": "audio_out",
"text": "今天天气晴朗气温25度适合外出。",
"data": {
"text": "今天天气晴朗气温25度适合外出。",
"turn_id": "turn_001",
"response_id": "resp_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Always `"llm"` |
| `text` | string | Full response text |
| `data.text` | string | Full response text (same as `text`) |
| `data.turn_id` | string | Current turn ID |
| `data.response_id` | string | Current response ID |
---
#### `assistant.tool_call`
Tool call notification: the LLM requests a tool invocation.
```json
{
"type": "assistant.tool_call",
"timestamp": 1730000014000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 14,
"source": "llm",
"trackId": "audio_out",
"tool_call_id": "call_abc123",
"tool_name": "weather",
"arguments": {
"city": "北京"
},
"executor": "server",
"timeout_ms": 30000,
"data": {
"tool_call": {
"id": "call_abc123",
"name": "weather",
"arguments": "{\"city\":\"北京\"}"
},
"turn_id": "turn_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Always `"llm"` |
| `tool_call_id` | string | Unique tool call ID |
| `tool_name` | string | Tool name |
| `arguments` | object | Tool arguments (parsed JSON) |
| `executor` | string | Executor: `"server"` runs server-side / `"client"` runs client-side |
| `timeout_ms` | number | Timeout (ms) |
| `data.tool_call` | object | Raw tool call info |
| `data.tool_call.id` | string | Tool call ID |
| `data.tool_call.name` | string | Tool name |
| `data.tool_call.arguments` | string | Tool arguments (JSON string) |
| `data.turn_id` | string | Current turn ID |
**Note**: when `executor = "client"`, the client must execute the tool and send back `tool_call.results`.
---
#### `assistant.tool_result`
Tool execution result notification.
```json
{
"type": "assistant.tool_result",
"timestamp": 1730000014500,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 15,
"source": "server",
"trackId": "audio_out",
"tool_call_id": "call_abc123",
"tool_name": "weather",
"tool_display_name": "天气查询",
"ok": true,
"error": null,
"result": {
"tool_call_id": "call_abc123",
"name": "weather",
"output": {
"temperature": 25,
"condition": "晴",
"humidity": 40
},
"status": {
"code": 200,
"message": "ok"
}
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Executor: `"server"` / `"client"` |
| `tool_call_id` | string | Tool call ID |
| `tool_name` | string | Tool name |
| `tool_display_name` | string | Tool display name |
| `ok` | boolean | Whether execution succeeded (status code 200-299 is true) |
| `error` | object \| null | Error info (present when `ok=false`) |
| `error.code` | number | Error status code |
| `error.message` | string | Error description |
| `error.retryable` | boolean | Whether it is retryable |
| `result` | object | Raw execution result |
| `result.output` | any | Data returned by the tool |
| `result.status` | object | Execution status |
| `result.status.code` | number | HTTP status code |
| `result.status.message` | string | Status description |
---
#### `output.audio.start`
TTS audio playback start marker.
```json
{
"type": "output.audio.start",
"timestamp": 1730000015500,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 19,
"source": "tts",
"trackId": "audio_out",
"data": {
"tts_id": "tts_001",
"turn_id": "turn_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Always `"tts"` |
| `data.tts_id` | string | TTS segment ID |
| `data.turn_id` | string | Current turn ID |
**Note**: after this event the server sends binary PCM audio frames.
---
#### `output.audio.end`
TTS audio playback end marker.
```json
{
"type": "output.audio.end",
"timestamp": 1730000018000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 25,
"source": "tts",
"trackId": "audio_out",
"data": {
"tts_id": "tts_001",
"turn_id": "turn_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `source` | string | Always `"tts"` |
| `data.tts_id` | string | TTS segment ID |
| `data.turn_id` | string | Current turn ID |
**Note**: `output.audio.end` means the server has finished sending, not that the client's speaker has finished playing. For a true "playback finished" signal, the client should send `output.audio.played`.
---
#### `response.interrupted`
The answer was interrupted (the user barged in).
```json
{
"type": "response.interrupted",
"timestamp": 1730000016000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 20,
"source": "system",
"trackId": "audio_out",
"data": {
"turn_id": "turn_001",
"response_id": "resp_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `data.turn_id` | string | ID of the interrupted turn |
| `data.response_id` | string | ID of the interrupted response |
---
#### `metrics.ttfb`
First-audio-byte latency metric (Time To First Byte).
```json
{
"type": "metrics.ttfb",
"timestamp": 1730000015600,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 21,
"source": "system",
"trackId": "audio_out",
"latencyMs": 1520,
"data": {
"latencyMs": 1520,
"turn_id": "turn_001"
}
}
```
| Field | Type | Description |
|---|---|---|
| `trackId` | string | Always `"audio_out"` |
| `latencyMs` | number | First-audio-byte latency (ms) |
| `data.latencyMs` | number | First-audio-byte latency (same as `latencyMs`) |
| `data.turn_id` | string | Current turn ID |
**Note**: measured from the end of user input to the first audio packet being sent.
---
### Error Events
#### `error`
Unified error event.
```json
{
"type": "error",
"timestamp": 1730000020000,
"sessionId": "ea34e1ca-b417-4a57-b03e-f752cb82e97d",
"seq": 30,
"sender": "server",
"code": "llm.timeout",
"message": "LLM request timeout",
"stage": "llm",
"retryable": true,
"trackId": "audio_out",
"data": {
"error": {
"stage": "llm",
"code": "llm.timeout",
"message": "LLM request timeout",
"retryable": true
}
}
}
```
| 字段 | 类型 | 说明 |
|---|---|---|
| `sender` | string | 错误来源:`"server"` / `"client"` |
| `code` | string | 错误码 |
| `message` | string | 错误描述 |
| `stage` | string | 错误阶段:`"protocol"` / `"asr"` / `"llm"` / `"tts"` / `"tool"` / `"audio"` |
| `retryable` | boolean | 是否可重试 |
| `trackId` | string | 错误关联的轨道 |
| `data.error` | object | 结构化错误信息 |
| `data.error.stage` | string | 错误阶段 |
| `data.error.code` | string | 错误码 |
| `data.error.message` | string | 错误描述 |
| `data.error.retryable` | boolean | 是否可重试 |
**trackId 约定**
- `audio_in`ASR/音频输入相关错误
- `audio_out`LLM/TTS/工具相关错误
- `control`:协议/会话控制相关错误
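上述约定可以写成一个简单的映射。下面是一段示意 Python 草图(`audio` 阶段归入 `audio_in` 为本页推断,请以实际实现为准):

```python
# 错误阶段 -> trackId 的约定映射(示意)
STAGE_TRACK = {
    "asr": "audio_in",
    "audio": "audio_in",      # 音频输入相关(推断)
    "llm": "audio_out",
    "tts": "audio_out",
    "tool": "audio_out",
    "protocol": "control",
}

def track_for_stage(stage):
    """按约定把错误阶段映射到 trackId;未知阶段回退到 control。"""
    return STAGE_TRACK.get(stage, "control")
```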
---
## 关联 ID 说明
事件中的关联 ID 用于追踪对话流程:
| ID 类型 | 说明 | 生命周期 |
|---------|------|---------|
| `turn_id` | 对话轮次 ID | 一次用户-助手交互 |
| `utterance_id` | 语句 ID | 一次 ASR 最终识别结果 |
| `response_id` | 回复 ID | 一次助手回复生成 |
| `tool_call_id` | 工具调用 ID | 一次工具调用 |
| `tts_id` | TTS 播放段 ID | 一段语音合成播放 |
---
## 心跳与超时
- **心跳间隔**:默认 50 秒(`heartbeat_interval_sec`)
- **空闲超时**:默认 60 秒(`inactivity_timeout_sec`)
- 客户端应持续发送音频或轻量消息,避免被判定为闲置
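客户端的保活逻辑可以简化为“距离上次活动超过心跳间隔就补发一条轻量消息”。下面是一段示意 Python 草图(`ping` 消息类型为假设,本协议未在此处定义具体保活消息):

```python
import json

def heartbeat_if_idle(last_activity_ts, now, send, heartbeat_interval_sec=50):
    """若距上次活动超过心跳间隔,发送一条轻量消息避免被判闲置(示意)。
    send: 发送文本帧的回调;返回新的"上次活动"时间戳(秒)。"""
    if now - last_activity_ts >= heartbeat_interval_sec:
        # "ping" 仅为示意;实际可发送任何协议允许的轻量消息
        send(json.dumps({"type": "ping", "timestamp": int(now * 1000)}))
        return now
    return last_activity_ts
```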
## 事件节流
为保持客户端渲染和服务端负载稳定,v1 协议对部分事件进行节流:
| 事件 | 默认节流间隔 | 说明 |
|------|-------------|------|
| `transcript.delta` | 300ms | ASR 增量文本 |
| `assistant.response.delta` | 80ms | LLM 增量文本 |
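服务端的节流语义是“同类事件两次下发间隔不小于配置值”。下面用一段示意 Python 草图说明(默认间隔取自上表,类名为假设):

```python
class Throttler:
    """按事件类型节流:同类事件两次下发的间隔不小于配置的毫秒数(示意)。"""

    INTERVALS_MS = {"transcript.delta": 300, "assistant.response.delta": 80}

    def __init__(self):
        self._last = {}  # 事件类型 -> 上次下发的毫秒时间戳

    def allow(self, event_type, now_ms):
        """返回是否允许下发该事件;未配置间隔的事件类型不节流。"""
        interval = self.INTERVALS_MS.get(event_type, 0)
        last = self._last.get(event_type)
        if last is not None and now_ms - last < interval:
            return False
        self._last[event_type] = now_ms
        return True
```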
## 错误处理
详细错误码请参考 [错误码](errors.md)。
---
# 配置选项(旧入口)
本页保留旧链接,用于承接历史导航或外部引用。助手配置的正式文档已经迁移到:
- [配置选项](../concepts/assistants/configuration.md) - 助手配置界面与运行时配置层说明
- [助手概念](../concepts/assistants.md) - 先理解助手对象、会话与动态变量
如果你是从创建路径进入,也可以直接回到 [快速开始](../quickstart/index.md)。
---
# 助手管理(旧入口)
本页保留旧链接,用于承接历史导航或外部引用。助手相关内容已经拆分到更明确的文档中:
- [助手概念](../concepts/assistants.md) - 了解助手是什么、由哪些部分组成,以及会话如何运行
- [配置选项](../concepts/assistants/configuration.md) - 查看控制台和运行时配置项的分工
- [提示词指南](../concepts/assistants/prompts.md) - 编写高质量系统提示词
- [测试调试](../concepts/assistants/testing.md) - 验证助手行为并排查问题
如果你是第一次上手,建议直接从 [快速开始](../quickstart/index.md) 进入。
---
# 提示词指南(旧入口)
本页保留旧链接,用于承接历史导航或外部引用。提示词的正式文档已经迁移到:
- [提示词指南](../concepts/assistants/prompts.md) - 设计角色、任务、限制与风格
- [助手概念](../concepts/assistants.md) - 理解提示词在助手体系中的位置
如果你想先完成最小可用配置,请从 [快速开始](../quickstart/index.md) 继续。
---
# 测试调试(旧入口)
本页保留旧链接,用于承接历史导航或外部引用。测试与调试的正式文档已经迁移到:
- [测试调试](../concepts/assistants/testing.md) - 验证助手行为、事件流和常见问题定位
- [故障排查](../resources/troubleshooting.md) - 进入更细的链路排查步骤
如果你还没创建助手,请先完成 [快速开始](../quickstart/index.md)。
---
# 工作流配置(旧入口)
本页保留旧链接,用于承接早期草稿和历史引用。工作流的正式文档已收敛到:
- [工作流](../customization/workflows.md) - 了解工作流的定位、节点结构、设计建议和当前边界
如果你正在配置助手中的流程能力,请优先阅读上述页面,再结合 [工具](../customization/tools.md) 与 [助手概念](../concepts/assistants.md) 一起使用。
---
# 更新日志
本文档记录 Realtime Agent Studio 的所有重要变更。
格式基于 [Keep a Changelog](https://keepachangelog.com/zh-CN/1.0.0/),版本号遵循 [语义化版本](https://semver.org/lang/zh-CN/)。
---
## [未发布]
### 开发中
- 工作流可视化编辑器
- 知识库 RAG 集成
- JavaScript/Python SDK
- Step Audio 多模态模型支持
---
## [0.1.0] - 2025-01-15
### 新增
#### 实时交互引擎
- **管线式全双工引擎** - ASR → LLM → TTS 流水线架构
- **智能打断** - 支持 VAD 和 EOU 检测
- **OpenAI 兼容接口** - 支持 OpenAI Compatible 的 ASR/TTS 服务
- **DashScope TTS** - 阿里云语音合成服务适配
#### 助手配置
- **系统提示词** - 支持角色定义和动态变量 `{{variable}}`
- **模型管理** - LLM/ASR/TTS 模型统一管理界面
- **工具调用** - Webhook 工具和客户端工具配置
#### 交互测试
- **实时调试控制台** - 内置 WebSocket 调试工具
#### 开放接口
- **WebSocket 协议** - `/ws` 端点,支持二进制音频流
- **RESTful API** - 完整的助手/模型/会话 CRUD 接口
#### 历史监控
- **会话回放** - 音频 + 转写 + LLM 响应完整记录
- **会话筛选** - 按时间、助手、状态多维度检索
#### 部署
- **Docker 支持** - 提供 docker-compose 一键部署
### 技术栈
- 前端:React 18、TypeScript、Tailwind CSS、Zustand
- 后端:FastAPI(Python 3.10+)
- 数据库:SQLite(开发)/ PostgreSQL(生产)
---
## 版本规划
| 版本 | 计划发布 | 主要特性 |
|------|---------|---------|
| 0.2.0 | 2025 Q1 | 工作流编辑器、知识库集成 |
| 0.3.0 | 2025 Q2 | SDK 发布、多模态模型 |
| 1.0.0 | 2025 H2 | 生产就绪、企业特性 |
---
## 贡献者
感谢所有为 RAS 做出贡献的开发者!
---
[未发布]: https://github.com/your-org/AI-VideoAssistant/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/your-org/AI-VideoAssistant/releases/tag/v0.1.0
---
# 助手概念详解
助手(Assistant)是 Realtime Agent Studio(RAS)中最核心的配置单元,也是控制台和 API 对外暴露能力的基本对象。
---
## 什么是助手
一个助手代表一个可接入、可测试、可发布的实时 AI 入口。它回答三个问题:
- **它是谁**:角色、语气、目标、限制、开场方式、静默时的行动(比如静默时主动询问的 Ask-on-Idle)
- **它能做什么**:语言模型能力、语音模型能力(ASR、TTS)、用户打断灵敏度(Barge-in)、语句端点设置(End-of-Utterance)、知识库、记忆、工具(Webhook、客户端工具、系统工具、MCP)、输出模式
- **它在一次会话中如何运行**:通过 `assistant_id` 载入配置,并在运行时接收动态变量和对话中的上下文更新
如果把引擎理解为“运行时”,那么助手就是“运行时要执行的那份定义”。
## 助手由哪些部分组成
| 层次 | 负责什么 | 典型内容 |
|------|----------|----------|
| **身份层** | 定义助手角色和交互风格 | 系统提示词、限制、开场白、静默处理 |
| **模型层** | 决定理解与生成能力 | LLM、ASR、TTS、引擎类型、用户打断、语句端点 |
| **能力层** | 扩展知识和执行能力 | 知识库、工具、记忆 |
| **会话层** | 决定运行时上下文如何注入 | `assistant_id`、动态变量 |
## 身份层
助手首先是一个“被约束的角色”,而不是一段孤立的模型调用。
### 系统提示词
系统提示词定义助手的角色、任务、边界和风格,是所有能力组合的基础。
| 要素 | 作用 | 示例 |
|------|------|------|
| **角色** | 告诉模型“自己是谁” | 客服助手、销售顾问、培训教练 |
| **任务** | 指定要完成的结果 | 解答咨询、收集信息、调用工具处理业务 |
| **限制** | 明确哪些事不能做 | 不承诺超权限优惠、不输出未经验证的结论 |
| **风格** | 约束回答节奏和措辞 | 简洁、口语化、每次 2-3 句 |
### 开场白
一个助手还要定义会话应该如何开始,以及用户静默时如何处理,包括:
- **首轮模式**:由助手先说还是用户先说
- **开场白**:使用固定开场白或 AI 生成的开场白
### 静默处理
用户静默时,是否主动询问用户是否在线。
## 模型层
模型决定助手的基础理解、推理和表达能力,但不是助手定义的全部。
- **LLM** 决定对话推理与文本生成能力
- **ASR** 决定语音输入如何被实时转写
- **TTS** 决定文本回复如何转成可播放语音
- **引擎类型** 决定运行链路是分段可控还是端到端低延迟
- **VAD** 声音活动模型,判断用户是否在说话
- **EOU** 语句端点模型,判断用户是否已说完一段语句、等待回复
- **Barge-in** 决定在用户发声或手动请求打断时,是否中止助手当前的回复
## 能力层
### 知识库
知识库用于补充私有领域知识,让助手回答超出基础模型常识之外的问题。
```mermaid
flowchart LR
Question[用户问题] --> Retrieval[检索]
Retrieval --> KB[(知识库)]
KB --> Context[相关片段]
Context --> LLM[LLM]
LLM --> Answer[回答]
```
知识库适合承载政策、产品资料、流程说明、FAQ 和内部文档,而不是把所有业务知识堆进系统提示词。
### 工具
工具让助手从“会说”变成“能做事”。
```mermaid
flowchart LR
User[用户] --> Assistant[助手]
Assistant --> Tool[工具 / 外部系统]
Tool --> Assistant
Assistant --> User
```
适合用工具处理的任务包括:订单查询、预约、外部搜索、写入业务系统、调用客户端能力等。
## 会话层
### `assistant_id` 的作用
在接入层面,客户端通过 `assistant_id` 指定要加载哪一个助手。引擎据此读取默认配置,并把同一份助手定义应用到当前会话。
### 会话生命周期
```mermaid
stateDiagram-v2
[*] --> Connecting: WebSocket 连接
Connecting --> Started: session.started
Started --> Active: config.resolved / 开始对话
Active --> Active: 多轮交互
Active --> Stopped: session.stop 或连接关闭
Stopped --> [*]
```
一次会话通常会沉淀以下信息:
- 用户与助手消息时间线
- 音频流、转写结果和模型输出
- 工具调用记录与中间事件
- 自定义 metadata、渠道和业务上下文
### 动态变量与会话级覆盖
助手的默认配置不需要为每个用户都重新复制一份。RAS 提供两种常见的运行时注入方式:
- **动态变量**:在提示词中使用 `{{variable}}` 占位,并在会话开始时传入具体值
- **会话级覆盖**:仅对当前会话覆盖部分运行时参数,不回写助手基线配置
```json
{
"type": "session.start",
"metadata": {
"dynamicVariables": {
"company_name": "ABC 公司",
"customer_name": "张三",
"tier": "VIP"
}
}
}
```
这种设计让你既能复用标准助手,又能在每次接入时注入渠道、用户、订单或上下文信息。
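接入端构造这条 `session.start` 消息的过程可以写成一个小工具函数。下面是一段示意 Python 草图(函数名与 `channel` 参数为假设,JSON 结构与上文示例一致):

```python
import json

def build_session_start(dynamic_variables, channel=None):
    """构造带 dynamicVariables 的 session.start 消息文本(示意实现)。"""
    metadata = {"dynamicVariables": dict(dynamic_variables)}
    if channel is not None:
        metadata["channel"] = channel
    return json.dumps(
        {"type": "session.start", "metadata": metadata},
        ensure_ascii=False,
    )
```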
## 相关文档
- [配置选项](assistants/configuration.md) - 查看助手在控制台和运行时有哪些配置层
- [提示词指南](assistants/prompts.md) - 设计角色、任务、限制和语气
- [测试调试](assistants/testing.md) - 验证助手质量并定位问题
---
# 配置选项
助手配置界面包含多个标签页,每个标签页负责不同方面的配置。
## 全局设置
全局设置定义助手的核心对话能力。
| 配置项 | 说明 | 建议值 |
|-------|------|--------|
| 助手名称 | 用于标识和管理 | 简洁明确 |
| 系统提示词 | 定义角色、任务和约束 | 详见[提示词指南](prompts.md) |
| 开场白 | 对话开始时的问候语 | 简短友好 |
| 温度参数 | 控制回复随机性 | 0.7(通用)/ 0.3(严谨) |
| 上下文长度 | 保留的历史消息数 | 10-20 |
### 高级选项
- **首轮模式** - 设置首次对话的触发方式
- **打断检测** - 用户打断时的处理策略
- **超时设置** - 无响应时的处理
## 语音配置
配置语音识别和语音合成参数。
### TTS 语音合成
| 配置 | 说明 |
|------|------|
| TTS 引擎 | 选择语音合成服务(阿里/火山/Minimax) |
| 音色 | 选择语音风格和性别 |
| 语速 | 语音播放速度(0.5-2.0) |
| 音量 | 语音输出音量(0-100) |
| 音调 | 语音音调高低(0.5-2.0) |
### ASR 语音识别
| 配置 | 说明 |
|------|------|
| ASR 引擎 | 选择语音识别服务 |
| 语言 | 识别语言(中文/英文/多语言) |
| 热词 | 提高特定词汇识别准确率 |
## 工具绑定
配置助手可调用的外部工具。
### 可用工具类型
| 工具 | 说明 |
|------|------|
| 搜索工具 | 网络搜索获取信息 |
| 天气查询 | 查询天气预报 |
| 计算器 | 数学计算 |
| 知识库检索 | RAG 知识检索 |
| 自定义工具 | HTTP 回调外部 API |
### 配置步骤
1. 在工具列表中勾选需要的工具
2. 配置工具参数(如有)
3. 测试工具调用是否正常
## 知识关联
关联 RAG 知识库,让助手能够回答专业领域问题。
### 配置参数
| 参数 | 说明 | 建议值 |
|------|------|--------|
| 知识库 | 选择要关联的知识库 | - |
| 相似度阈值 | 低于此分数不返回 | 0.7 |
| 返回数量 | 单次检索返回条数 | 3 |
| 检索策略 | 混合/向量/关键词 | 混合 |
### 多知识库
支持关联多个知识库,系统会自动合并检索结果。
## 外部链接
配置第三方服务集成和 Webhook 回调。
### Webhook 配置
| 字段 | 说明 |
|------|------|
| 回调 URL | 接收事件的 HTTP 端点 |
| 事件类型 | 订阅的事件(对话开始/结束/工具调用等) |
| 认证方式 | API Key / Bearer Token / 无 |
### 支持的事件
- `conversation.started` - 对话开始
- `conversation.ended` - 对话结束
- `tool.called` - 工具被调用
- `human.transfer` - 转人工
## 配置持久化与运行时覆盖
助手配置分为两层:
1. **数据库持久化配置(基线配置)**:通过助手管理 API 保存,后续会话默认读取这一层。
2. **会话级覆盖配置runtime overrides**:仅对当前 WebSocket 会话生效,不会写回数据库。
### 哪些配置会存到数据库
以下字段会持久化在 `assistants` / `assistant_opener_audio` 等表中(通过创建/更新助手写入):
| 类别 | 典型字段 |
|------|---------|
| 对话行为 | `name`、`prompt`、`opener`、`firstTurnMode`、`generatedOpenerEnabled` |
| 输出与打断 | `voiceOutputEnabled`、`voice`、`speed`、`botCannotBeInterrupted`、`interruptionSensitivity` |
| 工具与知识库 | `tools`、`knowledgeBaseId` |
| 模型与外部模式 | `configMode`、`apiUrl`、`apiKey`、`llmModelId`、`asrModelId`、`embeddingModelId`、`rerankModelId` |
| 开场音频 | `openerAudioEnabled` 及音频文件状态(`ready`、`durationMs` 等) |
> 引擎在连接时通过 `assistant_id` 从后端读取该助手的 `sessionStartMetadata` 作为默认运行配置。
### 哪些配置可以在会话中覆盖
客户端可在 `session.start.metadata.overrides` 中覆盖以下白名单字段(仅当前会话有效):
- `systemPrompt`
- `greeting`
- `firstTurnMode`
- `generatedOpenerEnabled`
- `output`
- `bargeIn`
- `knowledgeBaseId`
- `knowledge`
- `tools`
- `openerAudio`
以下字段不能由客户端覆盖:
- `services`(模型 provider / apiKey / baseUrl 等)
- `assistantId` / `appId` / `configVersionId`(及下划线变体)
- 包含密钥语义的字段(如 `apiKey`、`token`、`secret`、`password`、`authorization`)
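服务端对 overrides 的清洗逻辑可以概括为“白名单保留 + 密钥语义丢弃”。下面是一段示意 Python 草图(仅演示规则,非真实服务端实现):

```python
# 可被会话覆盖的白名单字段(与上文列表一致)
OVERRIDE_WHITELIST = {
    "systemPrompt", "greeting", "firstTurnMode", "generatedOpenerEnabled",
    "output", "bargeIn", "knowledgeBaseId", "knowledge", "tools", "openerAudio",
}
# 带密钥语义的键名片段,出现即丢弃
SECRET_HINTS = ("apikey", "token", "secret", "password", "authorization")

def sanitize_overrides(overrides):
    """只保留白名单字段,并丢弃任何带密钥语义的键(示意实现)。"""
    clean = {}
    for key, value in overrides.items():
        lowered = key.lower().replace("_", "")
        if any(hint in lowered for hint in SECRET_HINTS):
            continue
        if key in OVERRIDE_WHITELIST:
            clean[key] = value
    return clean
```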
### 覆盖示例(代码)
下面示例展示「数据库基线配置 + 会话 overrides」的最终效果。
```json
// 1) 数据库存储的基线配置(示意)
// GET /api/v1/assistants/asst_demo/config -> sessionStartMetadata
{
"systemPrompt": "你是电商客服助手,回答要简洁。",
"greeting": "你好,我是你的客服助手。",
"firstTurnMode": "bot_first",
"output": { "mode": "audio" },
"knowledgeBaseId": "kb_orders",
"tools": [
{ "type": "function", "function": { "name": "query_order" } }
]
}
```
```json
// 2) 客户端发起会话时的覆盖
{
"type": "session.start",
"metadata": {
"channel": "web",
"history": { "userId": 1001 },
"overrides": {
"greeting": "你好,我来帮你查订单进度。",
"output": { "mode": "text" },
"knowledgeBaseId": "kb_vip_orders",
"tools": [
{ "type": "function", "function": { "name": "query_vip_order" } }
]
}
}
}
```
```json
// 3) 引擎合并后的有效配置(示意)
{
"assistantId": "asst_demo",
"systemPrompt": "你是电商客服助手,回答要简洁。",
"greeting": "你好,我来帮你查订单进度。",
"firstTurnMode": "bot_first",
"output": { "mode": "text" },
"knowledgeBaseId": "kb_vip_orders",
"tools": [
{ "type": "function", "function": { "name": "query_vip_order" } }
],
"channel": "web",
"history": { "userId": 1001 }
}
```
合并规则可简化为:
```python
effective = {**db_session_start_metadata, **metadata.overrides}
```
当 `WS_EMIT_CONFIG_RESOLVED=true` 时,服务端会返回 `config.resolved`(公开、安全裁剪后的快照),用于前端调试当前生效配置。
## 配置导入导出
### 导出配置
1. 在助手详情页点击 **更多**
2. 选择 **导出配置**
3. 下载 JSON 格式的配置文件
### 导入配置
1. 点击 **新建助手**
2. 选择 **从配置导入**
3. 上传配置文件
---
# 提示词指南
系统提示词(System Prompt)是定义助手行为的核心配置。本指南介绍如何编写高质量的提示词。
## 提示词结构
一个完整的系统提示词通常包含以下部分:
```
[角色定义]
[任务描述]
[行为约束]
[输出格式]
[示例(可选)]
```
## 编写原则
### 1. 明确角色
告诉助手它是谁:
```
你是一个专业的技术支持工程师,专门负责解答产品使用问题。
```
### 2. 定义任务
明确助手需要完成什么:
```
你的主要任务是:
1. 解答用户关于产品功能的问题
2. 提供使用指导和最佳实践
3. 帮助用户排查常见故障
```
### 3. 设置约束
限制不希望出现的行为:
```
请注意:
- 不要讨论与产品无关的话题
- 不要编造不存在的功能
- 如果不确定答案,请建议用户联系人工客服
```
### 4. 指定风格
定义回复的语气和风格:
```
回复风格要求:
- 使用友好、专业的语气
- 回答简洁明了,避免冗长
- 适当使用列表和步骤说明
```
## 提示词模板
### 客服助手
```
你是 [公司名称] 的智能客服助手。
## 你的职责
- 解答用户关于产品和服务的问题
- 处理常见的投诉和建议
- 引导用户完成操作流程
## 回复要求
- 保持友好和耐心
- 回答简洁,一般不超过 3 句话
- 如果问题复杂,建议转接人工客服
## 禁止行为
- 不要讨论竞争对手
- 不要承诺无法兑现的事项
- 不要透露内部信息
```
### 技术支持
```
你是一个技术支持工程师,专门帮助用户解决技术问题。
## 工作流程
1. 首先了解用户遇到的具体问题
2. 询问必要的环境信息(系统版本、错误信息等)
3. 提供分步骤的解决方案
4. 确认问题是否解决
## 回复格式
- 使用编号列表说明操作步骤
- 提供代码示例时使用代码块
- 复杂问题可以分多次回复
```
### 销售顾问
```
你是一个产品销售顾问,帮助用户了解产品并做出购买决策。
## 沟通策略
- 先了解用户需求,再推荐合适的产品
- 突出产品优势,但不贬低竞品
- 提供真实的价格和优惠信息
## 目标
- 帮助用户找到最适合的方案
- 解答购买相关的疑问
- 促进成交但不过度推销
```
## 动态变量
提示词支持动态变量,使用 `{{变量名}}` 语法:
```
你好 {{customer_name}},欢迎来到 {{company_name}}。
你当前的会员等级是 {{membership_tier}}。
```
`session.start` 时通过 `dynamicVariables` 传入:
```json
{
"type": "session.start",
"metadata": {
"dynamicVariables": {
"customer_name": "张三",
"company_name": "AI 公司",
"membership_tier": "黄金会员"
}
}
}
```
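变量替换本身是一次简单的模板渲染。下面是一段示意 Python 草图(保留未提供的占位符是本例的设计选择,便于调试时发现缺失变量):

```python
import re

def render_prompt(template, variables):
    """用 dynamicVariables 替换提示词中的 {{变量名}} 占位符(示意实现)。
    未提供的变量保留原样,便于在调试时发现缺失。"""
    def repl(match):
        name = match.group(1).strip()
        if name in variables:
            return str(variables[name])
        return match.group(0)  # 缺失变量原样保留
    return re.sub(r"\{\{([^{}]+)\}\}", repl, template)
```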
## 常见问题
### 回复太长
在提示词中明确限制:
```
回复长度要求:
- 一般问题1-2 句话
- 复杂问题:不超过 5 句话
- 避免重复和冗余内容
```
### 答非所问
增加任务边界说明:
```
重要提示:
- 只回答与 [产品/服务] 相关的问题
- 对于无关问题,礼貌地拒绝并引导回正题
```
### 编造信息
强调诚实原则:
```
信息准确性要求:
- 只提供你确定的信息
- 不确定时说"我不太确定,建议您..."
- 绝对不要编造数据或功能
```
## 最佳实践
1. **迭代优化** - 根据实际对话效果持续调整
2. **测试覆盖** - 用各种场景测试提示词效果
3. **版本管理** - 保存历史版本,便于回退
4. **定期复盘** - 分析对话记录,发现改进点
## 下一步
- [测试调试](testing.md) - 验证提示词效果
- [知识库配置](../../customization/knowledge-base.md) - 补充专业知识
---
# 测试调试
本指南介绍如何测试和调试 AI 助手,确保其行为符合预期。
## 测试面板
在助手详情页,点击 **测试** 按钮打开测试面板。
### 功能介绍
| 功能 | 说明 |
|------|------|
| 文本对话 | 直接输入文字进行测试 |
| 语音测试 | 使用麦克风进行语音对话 |
| 查看日志 | 实时查看系统日志 |
| 事件追踪 | 查看 WebSocket 事件流 |
## 测试用例设计
### 基础功能测试
| 测试项 | 输入 | 预期结果 |
|--------|------|---------|
| 问候响应 | "你好" | 友好的问候回复 |
| 功能介绍 | "你能做什么?" | 准确描述能力范围 |
| 开场白 | 连接后自动 | 播放配置的开场白 |
### 业务场景测试
根据助手定位设计测试用例:
```
场景:产品咨询助手
测试用例 1常见问题
- 输入:"产品有哪些功能?"
- 预期:准确列出主要功能
测试用例 2价格询问
- 输入:"多少钱?"
- 预期:提供价格信息或引导方式
测试用例 3超出范围
- 输入:"帮我写一首诗"
- 预期:礼貌拒绝并引导回业务话题
```
### 边界测试
| 测试项 | 输入 | 预期结果 |
|--------|------|---------|
| 空输入 | "" | 提示用户输入内容 |
| 超长输入 | 1000+ 字符 | 正常处理或提示过长 |
| 特殊字符 | "<script>alert(1)</script>" | 安全处理,不执行 |
| 敏感内容 | 不当言论 | 拒绝回复并提示 |
## 日志分析
### 查看日志
在测试面板的 **日志** 标签页,可以看到:
- ASR 识别结果
- LLM 推理过程
- TTS 合成状态
- 工具调用记录
### 常见日志
```
[ASR] transcript.final: "你好,请问有什么可以帮你"
[LLM] request: messages=[...]
[LLM] response: "您好!我是..."
[TTS] synthesizing: "您好!我是..."
[TTS] audio.start
[TTS] audio.end
```
## 事件追踪
**事件** 标签页查看完整的 WebSocket 事件流:
```json
{"type": "session.started", "timestamp": 1704067200000}
{"type": "input.speech_started", "timestamp": 1704067201000}
{"type": "transcript.delta", "data": {"text": "你"}}
{"type": "transcript.delta", "data": {"text": "好"}}
{"type": "transcript.final", "data": {"text": "你好"}}
{"type": "assistant.response.delta", "data": {"text": "您"}}
{"type": "assistant.response.final", "data": {"text": "您好!..."}}
{"type": "output.audio.start"}
{"type": "output.audio.end"}
```
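调试时常需要从这类事件流还原“用户到底说了什么”。下面是一段示意 Python 草图(按行解析 JSON 事件,累积 `transcript.delta` 并以 `transcript.final` 收尾,函数名为假设):

```python
import json

def replay_transcript(event_lines):
    """从事件流(每行一个 JSON 事件)还原用户的最终转写列表(示意)。
    中间的 delta 仅用于实时渲染,以 transcript.final 为准。"""
    partial, finals = [], []
    for line in event_lines:
        event = json.loads(line)
        if event["type"] == "transcript.delta":
            partial.append(event["data"]["text"])
        elif event["type"] == "transcript.final":
            finals.append(event["data"]["text"])
            partial.clear()  # final 到达后丢弃中间累积
    return finals
```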
## 性能指标
关注以下性能指标:
| 指标 | 说明 | 建议值 |
|------|------|--------|
| TTFB | 首字节时间 | < 500ms |
| 识别延迟 | ASR 处理时间 | < 1s |
| 回复延迟 | LLM 推理时间 | < 2s |
| 合成延迟 | TTS 处理时间 | < 500ms |
## 常见问题排查
### 助手不响应
1. **检查连接状态**
- 确认 WebSocket 连接成功
- 查看是否收到 `session.started` 事件
2. **检查模型配置**
- 确认 LLM 模型 API Key 有效
- 测试模型连接是否正常
3. **查看错误日志**
- 打开浏览器开发者工具
- 检查 Console 和 Network 标签
### 回复质量差
1. **优化提示词**
- 增加更明确的指令
- 添加示例和约束
2. **调整温度参数**
- 降低 temperature 提高一致性
- 适当值通常在 0.3-0.7
3. **补充知识库**
- 上传相关文档
- 提高检索相关性
### 语音问题
1. **ASR 识别不准**
- 检查麦克风权限
- 尝试更换 ASR 引擎
- 添加热词提高识别率
2. **TTS 不播放**
- 检查浏览器自动播放限制
- 确认 TTS 配置正确
## 自动化测试
使用自动化测试功能进行批量测试:
1. 进入 **自动化测试** 页面
2. 创建测试任务
3. 配置测试用例
4. 运行测试并查看报告
详见 [自动化测试](../../analysis/autotest.md)。
## 下一步
- [自动化测试](../../analysis/autotest.md) - 批量测试
- [历史记录](../../analysis/history.md) - 查看对话记录
- [效果评估](../../analysis/evaluation.md) - 评估对话质量
---
# 引擎架构
RAS 提供两类实时运行时:**Pipeline 引擎** 和 **Realtime 引擎**。本页只回答一个问题:你的助手应该跑在哪种引擎上。
---
## 先记住这条判断标准
- 如果你优先考虑 **可控性、可替换性、成本管理、工具 / 知识 / 流程编排**,优先选 **Pipeline 引擎**
- 如果你优先考虑 **超低延迟、更自然的端到端语音体验**,优先选 **Realtime 引擎**
## 两类引擎的区别
| 维度 | Pipeline 引擎 | Realtime 引擎 |
|------|---------------|---------------|
| **交互路径** | VAD → ASR → TD → LLM → TTS | 端到端实时模型 |
| **可控性** | 高,每个环节可替换 | 中,更多依赖模型供应商 |
| **延迟** | 中等,通常由多环节累加 | 低,链路更短 |
| **能力编排** | 更适合接入工具、知识库、工作流 | 也可接工具,但流程可控性较弱 |
| **成本结构** | 可按环节优化 | 往往更依赖单一供应商定价 |
| **适合场景** | 企业客服、流程型助手、电话场景、知识问答 | 高拟真语音助手、多模态入口、高自然度体验 |
## Pipeline 引擎是什么
Pipeline 引擎把实时语音拆成多个明确环节:
```mermaid
flowchart LR
VAD[VAD] --> ASR[ASR]
ASR --> TD[回合检测]
TD --> LLM[LLM]
LLM --> TTS[TTS]
```
这样做的好处是:
- 你可以分别选择 ASR、LLM、TTS 的供应商
- 你可以单独优化某一个环节,而不是整体替换
- 工具、知识库和工作流更容易插入到链路中
代价是:
- 延迟会累加
- 系统集成更复杂
- 你需要同时管理多类外部依赖
## Realtime 引擎是什么
Realtime 引擎直接连接端到端实时模型,让模型同时处理输入、理解、生成与打断。
```mermaid
flowchart LR
Input[音频 / 视频 / 文本输入] --> RT[Realtime Model]
RT --> Output[音频 / 文本输出]
RT --> Tools[工具]
```
这样做的好处是:
- 链路更短,延迟更低
- 全双工与打断通常更自然
- 接入路径更简单,适合强调体验的入口
代价是:
- 更依赖特定模型供应商
- 对 ASR / TTS / 回合检测的独立控制更弱
- 成本和能力边界受实时模型限制更大
## 怎么选
### 适合选择 Pipeline 的情况
- 你要接入特定 ASR 或 TTS 供应商
- 你需要知识库、工具、工作流形成稳定业务流程
- 你更在意可解释性、观测和分段优化
- 你需要把成本按环节精细控制
### 适合选择 Realtime 的情况
- 你把“自然对话感”放在首位
- 你需要更低的首响和更顺滑的打断体验
- 你可以接受对某个模型供应商的依赖
- 你的场景更接近语音助手、陪练、虚拟角色或多模态入口
## 简化决策表
| 场景 | 推荐引擎 | 原因 |
|------|----------|------|
| 企业客服 / 电话机器人 | Pipeline | 可控、可审计、易接工具与业务系统 |
| 知识问答 / 业务流程助手 | Pipeline | 更适合接知识库与工作流 |
| 高拟真语音助手 | Realtime | 更自然、更低延迟 |
| 多模态入口 | Realtime | 端到端处理音频 / 视频 / 文本 |
| 预算敏感场景 | Pipeline | 更容易逐环节优化成本 |
## 智能打断的差异
两类引擎都支持打断,但边界不同:
- **Pipeline**:由 VAD / 回合检测与 TTS 停止逻辑协同实现,行为更可控
- **Realtime**:更多由实时模型内部完成,体验更自然,但可解释性更低
## 继续阅读
- [Pipeline 引擎](pipeline-engine.md) - 查看分段链路、延迟构成与配置示例
- [Realtime 引擎](realtime-engine.md) - 查看端到端实时模型的交互路径
- [系统架构](../overview/architecture.md) - 从服务边界理解引擎在整体系统中的位置
---
# 核心概念
本章节只解释 Realtime Agent Studio 的关键心智模型,不重复环境部署或助手构建的操作细节。
---
## 先建立这三个概念
### 1. 助手是“对外提供能力的配置单元”
助手决定了一个实时 AI 入口对外表现成什么角色:它使用什么提示词、哪些模型、能访问哪些知识和工具、会话如何开始以及运行时如何被覆盖。
- [助手概念](assistants.md) — 统一理解助手、会话、动态变量与能力边界
- [配置选项](assistants/configuration.md) — 了解界面层和运行时配置项如何分工
- [提示词指南](assistants/prompts.md) — 学会定义助手的角色、任务、风格与约束
- [测试调试](assistants/testing.md) — 理解如何验证助手行为和定位问题
### 2. 引擎是“承载实时交互的运行时”
RAS 同时提供 Pipeline 引擎与 Realtime 引擎。它们都能驱动实时助手,但在延迟、可控性、成本和可替换性上各有取舍。
- [引擎概览](engines.md) — 两类引擎的能力边界与选择建议
- [Pipeline 引擎](pipeline-engine.md) — VAD/ASR/TD/LLM/TTS 串联的可组合链路
- [Realtime 引擎](realtime-engine.md) — 面向端到端实时模型的低延迟交互路径
### 3. 工作流是“把复杂业务拆成步骤和分支的方法”
当单一提示词不足以稳定处理多步骤、多条件、多工具的业务流程时,应使用工作流来显式编排节点、路由和回退策略。
- [工作流](../customization/workflows.md) — 了解何时需要工作流、它由哪些部分组成、如何设计可维护的流程
---
## 本章节不负责什么
以下内容属于“如何搭建和使用”,不在本章节展开说明:
- 助手搭建、模型/知识库/工具/工作流配置:从 [助手概览](assistants.md) 进入构建链路
- 部署与环境变量:见 [环境与部署](../getting-started/index.md)
- 第一个助手的最短操作路径:见 [快速开始](../quickstart/index.md)
- 事件格式与接入协议:见 [API 参考](../api-reference/index.md)
## 建议阅读顺序
1. 先读 [助手概念](assistants.md),明确你要配置的对象到底是什么
2. 再读 [引擎概览](engines.md),决定应该选择 Pipeline 还是 Realtime
3. 如果场景涉及多步骤流程,再读 [工作流](../customization/workflows.md)
4. 最后回到 [快速开始](../quickstart/index.md) 或 [助手概览](assistants.md) 开始具体配置
---
# Pipeline 引擎
Pipeline 引擎把实时对话拆成多个清晰环节,适合需要高可控性、可替换外部能力和复杂业务编排的场景。
---
## 运行链路
```mermaid
flowchart LR
subgraph Input["输入处理"]
Audio[用户音频] --> VAD[声音活动检测 VAD]
VAD --> ASR[语音识别 ASR]
ASR --> TD[回合检测 TD]
end
subgraph Reasoning["语义处理"]
TD --> LLM[大语言模型 LLM]
LLM --> Tools[工具]
LLM --> Text[回复文本]
end
subgraph Output["输出生成"]
Text --> TTS[语音合成 TTS]
TTS --> AudioOut[助手音频]
end
```
Pipeline 的关键价值不在于“环节多”,而在于每个环节都可以被单独选择、单独优化、单独观测。
## 它适合什么场景
- 需要接特定 ASR / TTS 供应商
- 需要稳定接入知识库、工具和工作流
- 需要把问题定位到具体环节,而不是只看到整体失败
- 需要按延迟、成本、质量对不同环节分别优化
## 数据流
```mermaid
sequenceDiagram
participant U as 用户
participant E as 引擎
participant ASR as ASR 服务
participant LLM as LLM 服务
participant TTS as TTS 服务
U->>E: 音频帧 (PCM)
E->>E: VAD / 回合检测
E->>ASR: 发送可识别音频
ASR-->>E: transcript.delta / transcript.final
E->>LLM: 发送对话历史与当前输入
LLM-->>E: assistant.response.delta
E->>TTS: 文本片段
TTS-->>E: 音频片段
E-->>U: 音频流与事件
```
## 延迟来自哪里
| 环节 | 典型影响 | 常见优化点 |
|------|----------|------------|
| **VAD / EoU** | 用户说完后多久触发回复 | 调整静音阈值和最短语音门限 |
| **ASR** | 语音转写速度和准确率 | 选择合适模型、热词和语言设置 |
| **LLM** | 首个 token 返回速度 | 选择低延迟模型、优化上下文 |
| **TTS** | 文字到音频的生成速度 | 选择流式 TTS缩短单次回复 |
Pipeline 的总延迟通常不是单点问题,而是链路总和。因此更适合做“逐环节调优”。
## EoU(用户说完)为什么重要
Pipeline 必须决定“什么时候把当前轮输入正式交给 LLM”。这个判断通常由 **EoU** 完成。
- 阈值小:响应更快,但更容易把用户停顿误判为说完
- 阈值大:更稳,但首次响应会更慢
你可以把它理解为 Pipeline 中最直接影响“对话节奏感”的参数之一。
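这个取舍可以用一个最简的静音阈值判定来说明。下面是一段示意 Python 草图(阈值数值为演示假设,真实 EoU 通常结合模型判定,而非纯静音计时):

```python
def is_end_of_utterance(silence_ms, speech_ms,
                        silence_threshold_ms=600, min_speech_ms=200):
    """基于静音时长的最简 EoU 判定(示意):
    - silence_threshold_ms 调小:响应更快,但更容易把停顿误判为说完
    - silence_threshold_ms 调大:更稳,但首次响应更慢
    min_speech_ms 过滤过短的噪声片段。"""
    return speech_ms >= min_speech_ms and silence_ms >= silence_threshold_ms
```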
## 工具、知识库和工作流如何插入
Pipeline 特别适合把业务能力插入到对话中:
- **知识库**:在 LLM 生成前补充领域事实
- **工具**:在需要外部信息或动作时调用系统能力
- **工作流**:在多步骤、多分支流程中决定接下来走哪个节点
这也是它在企业客服、流程助手和知识问答场景中更常见的原因。
## 智能打断
在 Pipeline 中,打断通常由 VAD 检测和 TTS 停止逻辑协同完成:
```mermaid
sequenceDiagram
participant U as 用户
participant E as 引擎
participant TTS as TTS
Note over E,TTS: 正在播放回复
E->>U: 音频流...
U->>E: 用户开始说话
E->>E: 判定是否触发打断
E->>TTS: 停止合成 / 播放
E-->>U: output.audio.interrupted
```
相比端到端实时模型,这种方式更容易解释“为什么打断”以及“在哪个环节发生了问题”。
## 配置示例
```json
{
"engine": "pipeline",
"asr": {
"provider": "openai-compatible",
"model": "FunAudioLLM/SenseVoiceSmall",
"language": "zh"
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"temperature": 0.7
},
"tts": {
"provider": "openai-compatible",
"model": "FunAudioLLM/CosyVoice2-0.5B",
"voice": "anna"
}
}
```
## 相关文档
- [引擎架构](engines.md) - 回到选择指南
- [Realtime 引擎](realtime-engine.md) - 对比端到端实时模型路径
- [工具](../customization/tools.md) - 设计可被 LLM 安全调用的工具
- [知识库](../customization/knowledge-base.md) - 在对话中补充领域知识
---
# Realtime 引擎
Realtime 引擎直接连接端到端实时模型,适合把低延迟和自然语音体验放在第一位的场景。
---
## 运行链路
```mermaid
flowchart LR
Input[音频 / 视频 / 文本输入] --> RT[Realtime Model]
RT --> Output[音频 / 文本输出]
RT --> Tools[工具]
```
与 Pipeline 不同Realtime 引擎不会把 ASR、回合检测、LLM、TTS 作为独立阶段暴露出来,而是更多依赖实时模型整体处理。
## 常见后端
| 后端 | 特点 |
|------|------|
| **OpenAI Realtime** | 语音交互自然,延迟低 |
| **Gemini Live** | 多模态能力强 |
| **Doubao 实时交互** | 更适合国内环境与中文场景 |
## 它适合什么场景
- 语音助手、陪练、虚拟角色等高自然度体验场景
- 对首响和连续打断体验要求高的入口
- 希望减少链路拼装复杂度,直接接入端到端模型的团队
## 数据流
```mermaid
sequenceDiagram
participant U as 用户
participant E as 引擎
participant RT as Realtime Model
U->>E: 音频 / 视频 / 文本输入
E->>RT: 转发实时流
RT-->>E: 流式文本 / 音频输出
E-->>U: 播放或渲染结果
```
## Realtime 的优势
- **延迟更低**:链路更短,用户感知更自然
- **全双工更顺滑**:用户插话时,模型更容易在内部处理打断
- **多模态更直接**:适合音频、视频、文本混合输入输出场景
## Realtime 的取舍
- 更依赖实时模型供应商的能力边界
- 不容易对 ASR / TTS / 回合检测做独立替换
- 成本和可观测性往往不如 Pipeline 那样可逐环节拆分
## 智能打断
Realtime 模型通常原生支持全双工和打断:
```mermaid
sequenceDiagram
participant U as 用户
participant E as 引擎
participant RT as Realtime Model
Note over RT: 模型正在输出
RT-->>E: 音频流...
E-->>U: 播放
U->>E: 用户开始说话
E->>RT: 转发新输入
Note over RT: 模型内部处理中断并切换回复
RT-->>E: 新的响应
E-->>U: 播放新响应
```
这种方式更自然,但你通常只能看到模型的整体行为,而不是每个中间阶段的细节。
## 配置示例
```json
{
"engine": "multimodal",
"model": {
"provider": "openai",
"model": "gpt-4o-realtime-preview",
"voice": "alloy"
}
}
```
## 相关文档
- [引擎架构](engines.md) - 回到两类引擎的选择指南
- [Pipeline 引擎](pipeline-engine.md) - 查看分段可控的运行路径
- [WebSocket 协议](../api-reference/websocket.md) - 了解客户端如何与引擎建立会话
---
# 语音识别
语音识别(ASR)负责把用户音频实时转写成文本,供引擎继续理解和处理。
## 关键配置项
| 配置项 | 说明 |
|--------|------|
| **ASR 引擎** | 选择语音识别服务提供商或自建服务 |
| **模型** | 实际使用的识别模型名称 |
| **语言** | 中文、英文或多语言 |
| **热词** | 提高业务词汇、品牌词、专有名词识别率 |
| **标点与规范化** | 自动补全标点、规范数字和日期等 |
## 模式
- `offline`:引擎本地缓冲音频后触发识别(适用于 OpenAI-compatible / SiliconFlow)
- `streaming`:音频分片实时发送到服务端,服务端持续返回转写事件(适用于 DashScope Realtime ASR、Volcengine BigASR)
## 配置项
| 配置项 | 说明 |
|---|---|
| ASR 引擎 | 选择语音识别服务提供商 |
| 模型 | 识别模型名称 |
| `enable_interim` | 是否开启离线 ASR 中间结果(默认 `false`,仅离线模式生效) |
| `app_id` / `resource_id` | Volcengine 等厂商的应用标识与资源标识 |
| `request_params` | 厂商原生请求参数透传,例如 `end_window_size`、`force_to_speech_time`、`context` |
| 语言 | 中文/英文/多语言 |
| 热词 | 提升特定词汇识别准确率 |
| 标点与规范化 | 是否自动补全标点、文本规范化 |
## 选择建议
- 客服、外呼等业务场景建议维护热词表,并按业务线持续更新
- 多语言入口建议显式指定语言,避免模型自动判断带来的波动
- 对延迟敏感的场景优先选择流式识别模型
- 对准确率敏感的场景,先评估专有名词、数字、地址等样本的识别表现
## 运行建议
- 使用与接入端一致的采样率和编码方式,减少额外转换
- 在测试阶段准备固定样本,便于对比不同模型或参数的变化
- 把“识别准确率”和“识别延迟”一起看,不要只看其中一项
## 支持的提供商
当前支持的提供商:`openai_compatible`、`siliconflow`、`dashscope`、`volcengine`、`buffered`(回退)。
## 相关文档
- [声音资源](voices.md) - 完整语音输入输出链路中的 TTS 侧配置
- [快速开始](../quickstart/index.md) - 以任务路径接入第一个 ASR 资源
---
# 知识库
知识库负责承载助手需要引用的私有事实、业务资料和长文档内容,是 RAG(检索增强生成)能力的正式说明页。
## 什么时候应该用知识库
当问题答案主要来自“稳定文档”而不是实时外部动作时,优先使用知识库:
- 产品说明、政策条款、操作流程、培训材料
- 内部手册、FAQ、规范文档
- 需要被多位助手复用的领域知识
如果任务本质上是“查状态、写数据、执行动作”,那通常更适合 [工具](tools.md),而不是知识库。
## 工作原理
```mermaid
flowchart LR
subgraph Indexing["索引阶段"]
Doc[文档] --> Chunk[分块]
Chunk --> Embed[向量化]
Embed --> Store[(向量数据库)]
end
subgraph Query["查询阶段"]
Q[用户问题] --> Search[相似度检索]
Store --> Search
Search --> Context[相关片段]
Context --> LLM[LLM 生成回答]
end
```
核心原则很简单:把长文档转成可检索的片段,在用户提问时只把最相关的内容送给模型。
## 适合放进知识库的内容
| 适合 | 不适合 |
|------|--------|
| 稳定规则、标准答案、产品文档 | 高频变化的实时状态 |
| 领域术语、说明手册、培训材料 | 需要外部系统写入或变更的动作 |
| 需要跨助手复用的内容 | 只在单次会话里临时生成的数据 |
## 内容准备建议
- 优先上传结构清晰、主题明确的文档
- 对超长文档按主题拆分,减少一次索引的噪声
- 标题、章节名和表格说明对召回质量很重要,不要全部删掉格式信息
- 与其堆很多相近文档,不如先清理重复、过期和相互冲突的内容
## 常见配置项
| 配置项 | 作用 | 常见做法 |
|--------|------|----------|
| **相似度阈值** | 过滤弱相关结果 | 从保守值起步,再按误召回调 |
| **返回数量** | 控制一次送给模型的候选片段数 | 先少后多,避免上下文污染 |
| **分块大小** | 决定每个文档片段的长度 | 按文档类型和问题粒度调整 |
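阈值与返回数量的配合可以用一段示意 Python 草图说明(函数名与数据结构为假设;默认值取自上表建议):

```python
def select_chunks(scored_chunks, threshold=0.7, top_k=3):
    """按相似度阈值过滤候选片段,再取得分最高的 top_k 条(示意实现)。
    scored_chunks: (score, text) 元组列表。"""
    kept = [c for c in scored_chunks if c[0] >= threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return [text for _, text in kept[:top_k]]
```

阈值先挡掉弱相关结果,`top_k` 再限制送入模型的上下文量,避免上下文污染。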
## 创建与维护
### 最小流程
1. 新建知识库
2. 上传文档
3. 完成索引
4. 用典型问题测试召回结果
5. 绑定到目标助手
### 日常维护
- 删除过期或互相矛盾的文档
- 当业务口径变化时,优先更新知识库而不是只改提示词
- 为关键问题准备固定测试问句,观察召回是否稳定
## 与助手的关系
知识库不是独立产品入口,而是助手的能力层:
- 助手决定是否、何时、以什么风格使用知识
- 知识库决定能够提供哪些事实片段
- 工作流和工具可以与知识库并用,但承担不同职责
## 相关文档
- [助手概念](../concepts/assistants.md) - 知识库在助手能力层中的位置
- [LLM 模型](models.md) - 为知识库准备嵌入或重排模型
- [工具](tools.md) - 当任务需要执行动作时,优先考虑工具而不是知识库
---
# LLM 模型
本页是资源库中 LLM 模型的正式说明页,聚焦文本生成、嵌入和重排模型的接入与选择。
## 这页负责什么
当你需要为助手配置“理解与生成能力”时,请从这里开始决定:
- 使用哪个供应商或模型家族
- 该模型负责文本生成、嵌入还是重排
- 接口地址、认证信息和默认参数如何设置
语音识别和语音合成分别由 [语音识别](asr.md) 与 [声音资源](voices.md) 说明,不在本页重复。
## 模型类型
| 类型 | 用途 | 常见场景 |
|------|------|----------|
| **文本模型** | 生成回复、总结、分类、规划 | 助手主对话、工具调用决策 |
| **嵌入模型** | 向量化文档或查询 | 知识库检索 |
| **重排模型** | 对检索结果再次排序 | 提升知识召回质量 |
## 配置清单
| 配置项 | 说明 | 建议 |
|--------|------|------|
| **供应商** | OpenAI 兼容、托管平台或自建服务 | 用统一命名规范区分环境 |
| **模型名称** | 控制台中的显示名称 | 体现厂商、用途和环境 |
| **模型标识** | 请求中实际使用的 model 名称 | 保持与供应商文档一致 |
| **Base URL** | 接口地址 | 为不同环境分别配置 |
| **API Key / Token** | 鉴权凭证 | 与显示名称配套管理 |
| **默认参数** | Temperature、Max Tokens、上下文长度等 | 按业务场景收敛默认值 |
## 选择建议
- **先按用途选模型,再按成本和延迟筛选供应商**
- **文本模型不要承担知识库检索职责**:检索应交给嵌入与重排模型
- **为不同环境建立清晰命名**:如 `prod-gpt4o-mini`、`staging-qwen-text`
- **默认参数要保守**:让助手默认稳定,再在单个场景内按需调优
## 常见组合
| 目标 | 推荐组合 |
|------|----------|
| **通用对话助手** | 1 个文本模型 |
| **知识问答助手** | 文本模型 + 嵌入模型 |
| **高质量知识召回** | 文本模型 + 嵌入模型 + 重排模型 |
## 下一步
- [语音识别](asr.md) - 为语音输入选择 ASR
- [声音资源](voices.md) - 为语音输出准备 TTS 资源
- [知识库](knowledge-base.md) - 把嵌入 / 重排模型接入 RAG 链路
---
# 工具
工具让助手从“会回答”扩展成“能执行动作”。本页是工具能力的正式说明页。
## 什么时候应该用工具
当用户请求需要依赖外部系统、实时数据或执行某个动作时,应该使用工具,而不是只靠提示词或知识库。
典型场景包括:
- 查询订单、库存、物流、天气等实时信息
- 创建预约、提交表单、写入业务系统
- 获取客户端环境能力,如定位、相机、权限确认
如果问题本质上是“查阅稳定资料”,优先用 [知识库](knowledge-base.md);如果问题是“执行动作或读写实时状态”,优先用工具。
## 工具类型
| 类型 | 说明 | 常见场景 |
|------|------|----------|
| **Webhook 工具** | 调用外部 HTTP API | 订单查询、CRM 写入、预约服务 |
| **客户端工具** | 由接入端在本地执行 | 获取定位、打开相机、请求用户授权 |
| **内建工具** | 平台或运行时直接提供 | 搜索、计算、知识检索等 |
## 工具调用的基本过程
```mermaid
sequenceDiagram
participant User as 用户
participant Assistant as 助手 / 模型
participant Tool as 工具
User->>Assistant: 发起请求
Assistant->>Assistant: 判断是否需要工具
Assistant->>Tool: 发起工具调用
Tool-->>Assistant: 返回结构化结果
Assistant->>User: 组织最终回复
```
关键点不是“模型会不会调用工具”,而是“工具的定义是否足够清晰,能让模型在正确时机调用”。
## 如何定义一个好工具
| 要素 | 为什么重要 |
|------|------------|
| **清晰名称** | 让模型知道它是做什么的,而不是猜用途 |
| **明确描述** | 告诉模型何时调用、何时不要调用 |
| **完整参数定义** | 降低缺参、错参和歧义调用 |
| **稳定返回结构** | 让模型更容易根据结果组织回复 |
| **明确错误语义** | 让失败时也能安全退回用户对话 |
## Webhook 工具示例
```json
{
"name": "query_order",
"description": "根据订单号查询当前订单状态,仅用于用户已提供订单号的场景。",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "订单编号"
}
},
"required": ["order_id"]
}
}
```
## 客户端工具的作用
某些动作必须在接入端执行,例如:
- 获取当前位置
- 请求麦克风或相机权限
- 打开特定页面或原生能力
这类工具通常通过事件流和客户端配合完成,而不是由后端直接执行。
## 工具设计建议
- **一工具一职责**:不要把多个业务动作塞进同一个工具
- **名称与描述写给模型看**:必须明确何时用、何时不用
- **先设计错误返回**:失败时模型应该知道如何解释给用户
- **减少高权限工具暴露面**:不是每个助手、每个工作流节点都需要全部工具
- **把业务规则放回系统**:工具负责执行,提示词负责决策边界
## 与知识库、工作流的分工
- **知识库**:提供稳定事实
- **工具**:执行动作或读取实时状态
- **工作流**:决定何时进入某个步骤、调用哪个工具、失败如何回退
当一个助手开始涉及多步骤、多系统调用时,工具通常应与 [工作流](workflows.md) 一起设计,而不是孤立配置。
## 安全与治理
- 校验输入,不直接信任模型生成的参数
- 为工具设置最小权限和清晰的可见范围
- 记录调用日志,便于审计和回放
- 对外部接口增加超时、重试和速率限制策略
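“校验输入,不直接信任模型生成的参数”可以落到一个很小的校验函数上。下面是一段示意 Python 草图(只检查必填项与基本类型,完整实现建议使用 JSON Schema 校验库):

```python
def validate_tool_args(tool_schema, args):
    """按工具参数定义校验模型生成的参数,返回错误列表(示意实现)。"""
    TYPES = {"string": str, "number": (int, float),
             "boolean": bool, "object": dict}
    props = tool_schema["parameters"]["properties"]
    required = tool_schema["parameters"].get("required", [])
    errors = []
    for name in required:                     # 必填项检查
        if name not in args:
            errors.append(f"missing required: {name}")
    for name, value in args.items():          # 未声明参数与类型检查
        if name not in props:
            errors.append(f"unexpected: {name}")
        else:
            expected = TYPES.get(props[name]["type"])
            if expected and not isinstance(value, expected):
                errors.append(f"bad type: {name}")
    return errors
```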
## 相关文档
- [知识库](knowledge-base.md) - 当问题更适合“查资料”时使用知识库
- [工作流](workflows.md) - 当工具调用需要流程控制和分支逻辑时接入工作流
- [助手概念](../concepts/assistants.md) - 理解工具在助手能力层中的位置
---
# TTS 参数
TTS 参数决定助手语音输出的节奏、音量和听感。本页只讨论参数层面的调优建议。
## 常用参数
| 参数 | 说明 | 常见范围 |
|------|------|----------|
| **语速** | 说话速度 | `0.5 - 2.0` |
| **音量 / 增益** | 输出音量强弱 | 供应商自定义 |
| **音调** | 声线高低 | 供应商自定义 |
| **模型** | 合成模型名称 | 依供应商而定 |
| **声音 ID** | 发音人或音色标识 | 依供应商而定 |
## 调优建议
- 对话助手通常建议把语速控制在 `0.9 - 1.2`
- 需要打断能力的场景,优先选择低延迟流式 TTS并避免过长的单次回复
- 如果业务强调可信度或专业感,先保证清晰度和稳定性,再追求个性化音色
- 不要只试听一句问候语,至少用三类文案对比:短答复、长答复、数字或专有名词较多的答复
## 相关文档
- [声音资源](voices.md) - 先选择适合的供应商、模型和音色
- [语音识别](asr.md) - 结合输入侧延迟一起评估整条语音链路
---
# 声音资源
本页是资源库中 TTS 声音与发音人资源的正式说明页,聚焦“选择哪种声音给助手输出”。
## 这页负责什么
当你已经决定启用语音输出后,需要在这里完成:
- 选择供应商、模型和声音资源
- 为不同业务或语言准备不同音色
- 通过预览和测试确定默认发音人
更细的速度、音量、音调等参数建议见 [TTS 参数](tts.md)。
## 选择声音时要考虑什么
| 维度 | 说明 |
|------|------|
| **语言与口音** | 是否覆盖目标用户语言与地区口音 |
| **风格** | 专业、亲切、活泼、沉稳等输出气质 |
| **延迟** | 是否适合实时对话,而不仅是离线合成 |
| **稳定性** | 长文本、多轮会话中的音色一致性 |
| **成本** | 单次调用成本和高并发可用性 |
## 推荐做法
1. 先为每类业务角色确定一条主音色
2. 再按语言或渠道补充少量备选音色
3. 通过固定测试文案试听,统一比较自然度、节奏和可懂度
4. 上线后尽量保持默认音色稳定,避免频繁切换影响用户体验
## 常见资源组织方式
| 组织方式 | 适用场景 |
|----------|----------|
| **按语言区分** | 中英文或多语种助手 |
| **按业务角色区分** | 客服、销售、培训、提醒类助手 |
| **按环境区分** | 开发、预发、生产使用不同供应商或凭证 |
## 下一步
- [TTS 参数](tts.md) - 调整语速、增益、音调等输出参数
- [快速开始](../quickstart/index.md) - 把声音资源绑定到第一个助手
---
# 工作流
工作流用于把复杂业务拆成明确的步骤、分支和回退策略,是 RAS 中承载流程逻辑的正式能力页。
## 什么时候需要工作流
当一个助手同时满足以下任一情况时,通常应考虑工作流,而不是继续堆叠单一提示词:
- 需要多轮收集信息,例如订单号、手机号、预约时间等
- 需要按意图或条件走不同分支
- 需要串联多个工具或业务系统
- 需要在异常或信息不足时统一回退到澄清、兜底或人工节点
## 工作流与助手的关系
助手负责对外表现、全局策略和渠道接入;工作流负责把某个业务流程拆成可维护的节点。
```mermaid
flowchart LR
Assistant[助手] --> Workflow[工作流]
Workflow --> Nodes[节点与分支]
Nodes --> Tools[工具 / 知识库 / 人工]
```
这意味着:
- 助手定义角色、提示词基线、模型和输出方式
- 工作流定义“这类问题该按什么顺序被处理”
- 工具和知识库作为节点可调用的能力,被有选择地暴露给流程
## 关键组成
| 组成 | 作用 | 设计建议 |
|------|------|----------|
| **工作流名称** | 区分业务流程 | 用业务语义命名,避免过于技术化 |
| **入口节点** | 用户进入后的第一步 | 保持单入口,便于理解和测试 |
| **全局提示词** | 对所有节点生效的共性约束 | 保持简短,避免与节点提示词冲突 |
| **节点提示词** | 当前节点的任务说明 | 单一职责,明确输入 / 输出 |
| **节点工具白名单** | 控制当前节点可调用的工具集合 | 遵循最小权限原则 |
| **超时与回退** | 异常、超时、缺信息时的处理方式 | 优先回到澄清、兜底或人工节点 |
| **上下文透传** | 在节点之间共享状态 | 只传递后续节点真正需要的信息 |
## 常见节点类型
| 节点类型 | 适合做什么 |
|----------|------------|
| **路由节点** | 判断用户意图并进入不同分支 |
| **信息收集节点** | 收集订单号、联系方式、时间等关键信息 |
| **处理节点** | 调用工具、执行查询、计算或写入系统 |
| **回复节点** | 组织最终答复并控制输出风格 |
| **人工节点** | 转接人工、排队或发起通知 |
| **结束节点** | 输出结束语并关闭流程 |
## 推荐编排步骤
1. 先写清楚流程目标:这条工作流要解决哪一类业务问题
2. 画出最小节点图:入口、关键分支、结束和兜底
3. 为每个节点定义唯一职责和输入 / 输出
4. 再绑定知识库、工具和回退策略
5. 在测试面板或流程调试工具中验证每条主路径和异常路径
## 配置示例
```yaml
workflow:
name: "订单咨询流程"
entry: "intent_router"
global_prompt: "优先给出可执行步骤,必要时先澄清信息。"
nodes:
- id: "intent_router"
type: "router"
prompt: "识别用户意图:查订单、退款、投诉"
next:
- when: "intent == query_order"
to: "collect_order_id"
- when: "intent == refund"
to: "refund_policy"
- id: "collect_order_id"
type: "collect"
prompt: "请用户提供订单号"
tools: ["query_order"]
fallback: "human_handoff"
- id: "human_handoff"
type: "end"
prompt: "转人工处理"
```
## 设计建议
- **让每个节点只做一件事**:避免单节点同时负责路由、收集信息和最终回复
- **工具按节点授权**:不要把所有工具暴露给整条流程中的每个节点
- **把失败路径设计出来**:超时、无结果、参数缺失都应该有明确回退
- **优先传状态,不传长文本**:节点之间共享必要结构化信息,比传递大段自然语言更稳
- **为流程保留可观测性**:每条主路径都应能在调试时解释“为什么走到这里”
## 当前边界
- 文档不会完整覆盖所有表达式或节点字段的最终 Schema
- 不同执行引擎下,可用节点字段和运行行为可能存在差异
- 可视化编排与底层字段映射可能不会一一对应
## 相关文档
- [助手概念](../concepts/assistants.md) - 工作流在助手体系中的位置
- [工具](tools.md) - 设计可被流程安全调用的工具
- [知识库](knowledge-base.md) - 让流程中的节点使用 RAG 能力
---
# 部署指南
## 方式一Docker 部署(推荐)
### 1. 构建镜像
```bash
docker build -t ai-video-assistant-web ./web
```
### 2. 运行容器
```bash
docker run -d \
--name ai-assistant-web \
-p 3000:80 \
ai-video-assistant-web
```
### 3. 使用 Docker Compose
```yaml
version: '3.8'
services:
web:
build: ./web
ports:
- "3000:80"
environment:
- VITE_API_URL=http://api:8080
```
运行:
```bash
docker-compose up -d
```
## 方式二Nginx 部署
### 1. 构建前端
```bash
cd web
npm run build
```
### 2. 配置 Nginx
```nginx
server {
listen 80;
server_name your-domain.com;
root /var/www/ai-assistant/dist;
index index.html;
location / {
try_files $uri $uri/ /index.html;
}
location /api {
proxy_pass http://localhost:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
### 3. 启动 Nginx
```bash
sudo nginx -t
sudo systemctl reload nginx
```
## 环境变量配置
| 变量 | 说明 | 默认值 |
|------|------|--------|
| VITE_API_URL | 后端 API 地址 | http://localhost:8080 |
| VITE_GEMINI_API_KEY | Gemini API Key | - |
## 验证部署
1. 访问 http://your-domain.com
2. 检查页面是否正常加载
3. 验证各功能模块是否可用
## 故障排查
| 问题 | 解决方案 |
|------|---------|
| 页面空白 | 检查浏览器控制台错误 |
| API 请求失败 | 确认 VITE_API_URL 配置正确 |
| 静态资源 404 | 检查 nginx try_files 配置 |


@@ -0,0 +1,161 @@
# Docker 部署
Docker 是推荐的部署方式,可以快速启动服务并确保环境一致性。
## 前提条件
- Docker 20.10+
- Docker Compose 2.0+(可选)
## 构建镜像
### Web 前端
```bash
docker build -t ai-video-assistant-web ./web
```
### API 服务
```bash
docker build -t ai-video-assistant-api ./api
```
### Engine 服务
```bash
docker build -t ai-video-assistant-engine ./engine
```
## 运行容器
### 单独运行
```bash
# Web 前端
docker run -d \
--name ai-assistant-web \
-p 3000:80 \
ai-video-assistant-web
# API 服务
docker run -d \
--name ai-assistant-api \
-p 8080:8080 \
ai-video-assistant-api
# Engine 服务
docker run -d \
--name ai-assistant-engine \
-p 8000:8000 \
ai-video-assistant-engine
```
## Docker Compose
推荐使用 Docker Compose 管理多个服务:
```yaml
version: '3.8'
services:
web:
build: ./web
ports:
- "3000:80"
environment:
- VITE_API_URL=http://api:8080
depends_on:
- api
api:
build: ./api
ports:
- "8080:8080"
environment:
- DATABASE_URL=postgresql://postgres:password@db:5432/ai_assistant
depends_on:
- db
engine:
build: ./engine
ports:
- "8000:8000"
environment:
- BACKEND_URL=http://api:8080
db:
image: postgres:15
environment:
- POSTGRES_DB=ai_assistant
- POSTGRES_PASSWORD=password
volumes:
- postgres_data:/var/lib/postgresql/data
volumes:
postgres_data:
```
### 启动服务
```bash
# 启动所有服务
docker-compose up -d
# 查看日志
docker-compose logs -f
# 停止服务
docker-compose down
```
## 镜像优化
### 多阶段构建
Web 前端 Dockerfile 示例:
```dockerfile
# 构建阶段
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# 运行阶段
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```
## 健康检查
```yaml
services:
api:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
```
## 常见问题
### 容器启动失败
```bash
# 查看容器日志
docker logs ai-assistant-web
# 进入容器调试
docker exec -it ai-assistant-web sh
```
### 端口冲突
修改 `docker-compose.yml` 中的端口映射,例如 `3001:80`


@@ -0,0 +1,41 @@
# 部署概览
本章节介绍如何使用 Docker 部署 Realtime Agent Studio (RAS)。
## 部署方式
| 方式 | 适用场景 | 复杂度 |
|------|---------|--------|
| [Docker 部署](docker.md) | 快速启动、容器化运行 | 简单 |
## 快速开始
### Docker 一键部署
```bash
docker build -t ai-video-assistant-web ./web
docker run -d -p 3000:80 --name ai-assistant-web ai-video-assistant-web
```
### 验证部署
1. 访问 http://localhost:3000
2. 检查页面是否正常加载
3. 验证各功能模块是否可用
## 环境变量配置
| 变量 | 说明 | 默认值 |
|------|------|--------|
| VITE_API_URL | 后端 API 地址 | http://localhost:8080 |
| VITE_GEMINI_API_KEY | Gemini API Key | - |
## 故障排查
| 问题 | 解决方案 |
|------|---------|
| 页面空白 | 检查浏览器控制台错误 |
| API 请求失败 | 确认 VITE_API_URL 配置正确 |
| 静态资源 404 | 检查 nginx try_files 配置 |
更多问题请参考 [故障排查](../resources/troubleshooting.md)。


@@ -1,65 +0,0 @@
# 助手管理
助手是 AI Video Assistant 的核心模块,用于创建和配置智能对话机器人。
## 创建助手
![助手管理](../images/assistants.png)
### 基本配置
1. 进入 **助手管理** 页面
2. 点击 **新建助手** 按钮
3. 填写基本信息:
| 配置项 | 说明 |
|-------|------|
| 助手名称 | 唯一标识,用于区分不同助手 |
| 提示词 | 定义助手的角色和行为 |
| 温度参数 | 控制回复的随机性(0-1) |
### 配置标签页
#### 全局设置
- 设置助手的核心对话能力
- 配置上下文长度
- 设置对话开场白
#### 语音配置
| 配置 | 说明 |
|------|------|
| TTS 引擎 | 选择语音合成服务(阿里/火山/Minimax) |
| 音色 | 选择语音风格和性别 |
| 语速 | 语音播放速度 |
| 音量 | 语音输出音量 |
#### 工具绑定
- 配置助手可调用的外部工具
- 启用/禁用特定功能模块
#### 知识关联
- 关联 RAG 知识库
- 配置检索参数(相似度阈值、返回数量)
#### 外部链接
- 配置第三方服务集成
- 设置 Webhook 回调
## 调试助手
在助手详情页可进行实时调试:
- 文本对话测试
- 语音输入测试
- 工具调用验证
## 发布助手
配置完成后:
1. 点击 **保存**
2. 点击 **发布**
3. 获取 API 调用地址


@@ -1,53 +0,0 @@
# 知识库
知识库基于 RAG(检索增强生成)技术,让 AI 能够回答私有领域问题。
## 概述
![知识库](../images/knowledge.png)
## 创建知识库
### 步骤
1. 进入 **知识库** 页面
2. 点击 **新建知识库**
3. 填写知识库名称
4. 上传文档
### 支持格式
| 格式 | 说明 |
|------|------|
| Markdown | 最佳选择,格式清晰 |
| PDF | 自动提取文本 |
| TXT | 纯文本支持 |
| Word | 需转换为其他格式 |
### 文档上传
- 拖拽上传或点击选择
- 单文件大小限制 10MB
- 建议单文档不超过 50000 字
## 配置检索参数
| 参数 | 说明 | 默认值 |
|------|------|--------|
| 相似度阈值 | 低于此分数的结果不返回 | 0.7 |
| 返回数量 | 单次检索返回的结果数 | 3 |
| 分块大小 | 文档分块的最大长度 | 500 |
## 管理知识库
- **查看文档** - 浏览已上传的文件
- **删除文档** - 移除不需要的内容
- **更新文档** - 重新上传覆盖
- **测试检索** - 验证知识库效果
## 关联助手
在助手配置的 **知识** 标签页中:
1. 选择要关联的知识库
2. 设置检索策略
3. 保存配置


@@ -1,44 +0,0 @@
# 模型配置
## LLM 模型库
![LLM模型库](../images/llms.png)
### 支持的模型
| 供应商 | 模型 | 特点 |
|--------|------|------|
| **OpenAI** | GPT-4 / GPT-3.5 | 通用能力强 |
| **DeepSeek** | DeepSeek Chat | 高性价比 |
| **SiliconFlow** | 多种开源模型 | 本地部署友好 |
| **Google** | Gemini Pro | 多模态支持 |
### 配置步骤
1. 进入 **LLM 库** 页面
2. 点击 **添加模型**
3. 选择供应商
4. 填写 API Key 和 Endpoint
5. 设置默认参数
### 参数说明
| 参数 | 说明 | 建议值 |
|------|------|--------|
| Temperature | 随机性 | 0.7 |
| Max Tokens | 最大输出长度 | 2048 |
| Top P | 核采样 | 0.9 |
## ASR 语音识别
### 支持引擎
- **Whisper** - OpenAI 通用语音识别
- **SenseVoice** - 高精度中文语音识别
### 配置方法
1. 进入 **ASR 库** 页面
2. 选择识别引擎
3. 配置音频参数(采样率、编码)
4. 测试识别效果


@@ -1,58 +0,0 @@
# 语音合成
语音合成(TTS)模块提供自然流畅的语音输出能力。
## 概述
![语音合成](../images/voices.png)
## 支持的引擎
| 供应商 | 特点 | 适用场景 |
|--------|------|---------|
| **阿里云** | 多音色、高自然度 | 通用场景 |
| **火山引擎** | 低延迟、实时性好 | 实时对话 |
| **Minimax** | 高性价比 | 批量合成 |
## 配置方法
### 添加语音配置
1. 进入 **语音库** 页面
2. 点击 **添加语音**
3. 选择供应商
4. 填写 API 凭证
5. 保存配置
### 测试语音
- 在线预览发音效果
- 调整语速和音量
- 切换不同音色
## 音色选择
### 中文音色
| 音色 | 风格 |
|------|------|
| 晓晓 | 标准女声 |
| 晓北 | 知性女声 |
| 逍遥 | 青年男声 |
| 丫丫 | 活泼童声 |
### 英文音色
| 音色 | 风格 |
|------|------|
| Joanna | 专业女声 |
| Matthew | 沉稳男声 |
| Amy | 亲切女声 |
## 参数调优
| 参数 | 范围 | 说明 |
|------|------|------|
| 语速 | 0.5-2.0 | 1.0 为正常速度 |
| 音量 | 0-100 | 输出音量百分比 |
| 音调 | 0.5-2.0 | 语音音调高低 |


@@ -1,53 +0,0 @@
# 工作流管理
工作流提供可视化的对话流程编排能力,支持复杂的业务场景。
## 概述
![工作流](../images/workflows.png)
## 节点类型
| 节点 | 图标 | 功能说明 |
|------|------|---------|
| **对话节点** | 💬 | AI 自动回复,可设置回复策略 |
| **工具节点** | 🔧 | 调用外部 API 或自定义工具 |
| **人工节点** | 👤 | 转接人工客服 |
| **结束节点** | 🏁 | 结束对话流程 |
## 创建工作流
### 步骤
1. 进入 **工作流** 页面
2. 点击 **新建工作流**
3. 从左侧拖拽节点到画布
4. 连接节点建立流程
5. 配置各节点参数
6. 保存并发布
### 节点配置
#### 对话节点配置
- 回复模板
- 条件分支
- 知识库检索
#### 工具节点配置
- 选择工具类型
- 配置输入参数
- 设置输出处理
#### 人工节点配置
- 转接规则
- 排队策略
- 通知设置
## 流程测试
- 支持单步调试
- 可查看执行日志
- 实时验证流程逻辑


@@ -1,59 +0,0 @@
# 快速开始
## 环境准备
### 前置条件
| 软件 | 版本要求 |
|------|---------|
| Node.js | 18.0 或更高 |
| npm/yarn/pnpm | 最新版本 |
| 现代浏览器 | Chrome 90+ / Firefox 90+ / Edge 90+ |
### 检查环境
```bash
node --version
npm --version
```
## 安装步骤
### 1. 克隆项目
```bash
git clone https://github.com/your-repo/AI-VideoAssistant.git
cd AI-VideoAssistant
```
### 2. 安装依赖
```bash
cd web
npm install
```
### 3. 配置环境变量
创建 `.env` 文件:
```env
VITE_API_URL=http://localhost:8080
VITE_GEMINI_API_KEY=your_api_key_here
```
### 4. 启动开发服务器
```bash
npm run dev
```
访问 http://localhost:3000
## 构建生产版本
```bash
npm run build
```
构建产物在 `dist` 目录。


@@ -0,0 +1,279 @@
# 配置说明
本页面介绍 Realtime Agent Studio 各组件的配置方法。
---
## 配置概览
RAS 采用分层配置,各组件独立配置:
```mermaid
flowchart TB
subgraph Config["配置层级"]
ENV[环境变量]
File[配置文件]
DB[数据库配置]
end
subgraph Services["服务组件"]
Web[Web 前端]
API[API 服务]
Engine[Engine 服务]
end
ENV --> Web
ENV --> API
ENV --> Engine
File --> API
File --> Engine
DB --> API
```
---
## Web 前端配置
### 环境变量
在 `web/` 目录创建 `.env` 文件:
```env
# API 服务地址(必填)
VITE_API_URL=http://localhost:8080
# Engine WebSocket 地址(可选,默认同 API 服务器)
VITE_WS_URL=ws://localhost:8000
# Google Gemini API Key(可选,用于前端直连)
VITE_GEMINI_API_KEY=your_api_key
```
### 变量说明
| 变量 | 必填 | 说明 | 默认值 |
|------|:----:|------|--------|
| `VITE_API_URL` | ✅ | 后端 API 服务地址 | - |
| `VITE_WS_URL` | ❌ | WebSocket 服务地址 | 从 API URL 推断 |
| `VITE_GEMINI_API_KEY` | ❌ | Gemini API 密钥 | - |
### 开发环境配置
```env
# .env.development
VITE_API_URL=http://localhost:8080
VITE_WS_URL=ws://localhost:8000
```
---
## API 服务配置
### 环境变量
```env
# 数据库配置
DATABASE_URL=sqlite:///./data/app.db
# 或 PostgreSQL
# DATABASE_URL=postgresql://user:pass@localhost:5432/ras
# Redis 配置(可选)
REDIS_URL=redis://localhost:6379/0
# 安全配置
SECRET_KEY=your-secret-key-at-least-32-chars
CORS_ORIGINS=http://localhost:3000,https://your-domain.com
# 日志级别
LOG_LEVEL=INFO
# 文件存储路径
UPLOAD_DIR=./uploads
```
### 配置文件
API 服务支持 YAML 配置文件 `api/config/settings.yaml`:
```yaml
# 服务配置
server:
host: "0.0.0.0"
port: 8080
workers: 4
# 数据库配置
database:
url: "sqlite:///./data/app.db"
pool_size: 5
max_overflow: 10
# Redis 配置
redis:
url: "redis://localhost:6379/0"
# 安全配置
security:
secret_key: "your-secret-key"
token_expire_minutes: 1440
# 日志配置
logging:
level: "INFO"
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
```
---
## Engine 服务配置
### 环境变量
```env
# 后端 API 地址
BACKEND_URL=http://localhost:8080
# WebSocket 服务配置
WS_HOST=0.0.0.0
WS_PORT=8000
# 音频配置
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
# 日志级别
LOG_LEVEL=INFO
```
### 引擎配置
Engine 配置文件 `engine/config/engine.yaml`:
```yaml
# WebSocket 服务
websocket:
host: "0.0.0.0"
port: 8000
ping_interval: 30
ping_timeout: 10
# 音频处理
audio:
sample_rate: 16000
channels: 1
  chunk_size: 640  # 20ms at 16kHz(16-bit 单声道:320 样本 × 2 字节 = 640 字节)
# VAD 配置
vad:
enabled: true
threshold: 0.5
min_speech_duration: 0.25
min_silence_duration: 0.5
# 引擎默认配置
defaults:
engine_type: "pipeline" # pipeline 或 multimodal
max_response_tokens: 512
temperature: 0.7
```
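`chunk_size` 的取值可以由采样参数算出来:16kHz、16-bit、单声道下,20ms 音频恰好是 640 字节。下面的小函数只是演示这笔账,`frame_bytes` 为示意命名,并非引擎内的真实函数:

```python
def frame_bytes(sample_rate_hz: int, frame_ms: int,
                sample_width_bytes: int = 2, channels: int = 1) -> int:
    """一帧音频的字节数 = 采样率 × 帧时长 × 每样本字节数 × 声道数。"""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * sample_width_bytes * channels

print(frame_bytes(16000, 20))  # 640,对应上面配置中的 chunk_size
```

如果调整 `sample_rate` 或声道数,`chunk_size` 需要按同一公式同步修改,否则帧边界会错位。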
---
## Docker 配置
### docker-compose.yml 环境变量
```yaml
version: '3.8'
services:
web:
environment:
- VITE_API_URL=http://api:8080
api:
environment:
- DATABASE_URL=postgresql://postgres:password@db:5432/ras
- REDIS_URL=redis://redis:6379/0
- SECRET_KEY=${SECRET_KEY}
engine:
environment:
- BACKEND_URL=http://api:8080
- LOG_LEVEL=INFO
```
### 使用 .env 文件
在项目根目录创建 `.env`:
```env
# Docker Compose 会自动加载
SECRET_KEY=your-secret-key-at-least-32-chars
POSTGRES_PASSWORD=secure-db-password
```
---
## 配置优先级
配置按以下优先级加载(高优先级覆盖低优先级):
```
1. 命令行参数(最高)
2. 环境变量
3. .env 文件
4. 配置文件 (yaml)
5. 代码默认值(最低)
```
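这个优先级可以理解为“逐层覆盖”的字典合并:低优先级先写入,高优先级后写入并覆盖同名键。以下为示意代码,并非各服务的实际加载逻辑:

```python
# 配置按“代码默认值 < YAML < .env < 环境变量 < 命令行”逐层覆盖的示意实现。
def merge_settings(*layers: dict) -> dict:
    """后面的层覆盖前面的层,传参顺序即优先级从低到高。"""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

defaults = {"LOG_LEVEL": "INFO", "WS_PORT": 8000}   # 代码默认值
yaml_file = {"WS_PORT": 9000}                        # 配置文件
env_vars = {"LOG_LEVEL": "DEBUG"}                    # 环境变量

settings = merge_settings(defaults, yaml_file, env_vars)
print(settings)  # {'LOG_LEVEL': 'DEBUG', 'WS_PORT': 9000}
```

排查“配置不生效”时,按这个覆盖顺序从高到低检查:某个键很可能被更高优先级的层覆盖了。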
---
## 敏感配置管理
!!! danger "安全提醒"
不要将敏感信息提交到代码仓库!
### 推荐做法
1. **使用 .env 文件**,并将其加入 `.gitignore`
2. **使用环境变量**,通过 CI/CD 注入
3. **使用密钥管理服务**,如 AWS Secrets Manager、HashiCorp Vault
### .gitignore 配置
```gitignore
# 环境配置文件
.env
.env.local
.env.*.local
# 敏感数据目录
/secrets/
*.pem
*.key
```
---
## 配置验证
启动服务前验证配置是否正确:
```bash
# 验证 API 服务配置
cd api
python -c "from app.config import settings; print(settings)"
# 验证 Engine 配置
cd engine
python -c "from config import settings; print(settings)"
```
---
## 下一步
- [环境与部署](index.md) - 开始安装服务
- [Docker 部署](../deployment/docker.md) - 容器化部署


@@ -0,0 +1,115 @@
# 环境与部署
本页属于“快速开始”中的环境与部署路径,只负责把服务跑起来、说明配置入口和部署方式。首次创建助手请转到 [创建第一个助手](../quickstart/index.md)。
---
## 先理解部署对象
Realtime Agent Studio(RAS)通常由三个核心服务组成:
```mermaid
flowchart LR
subgraph Services["服务组件"]
Web[Web 前端<br/>React + TypeScript]
API[API 服务<br/>FastAPI]
Engine[Engine 服务<br/>WebSocket]
end
subgraph Storage["数据存储"]
DB[(SQLite/PostgreSQL)]
end
Web -->|REST| API
Web -->|WebSocket| Engine
API <--> DB
Engine <--> API
```
| 组件 | 默认端口 | 负责什么 |
|------|----------|----------|
| **Web 前端** | 3000 | 管理控制台与调试界面 |
| **API 服务** | 8080 | 资源管理、配置持久化、历史数据 |
| **Engine 服务** | 8000 | 实时会话、事件流和音频流 |
## 选择你的安装方式
### 方式一Docker Compose
适合希望尽快跑通一套完整环境的团队。
```bash
# 仓库目录示例沿用当前代码仓库 slug
# 你本地实际目录名可以不同
git clone https://github.com/your-org/AI-VideoAssistant.git
cd AI-VideoAssistant
docker-compose up -d
```
### 方式二:本地开发
适合需要分别调试前端、API 和 Engine 的开发者。
#### 启动 API 服务
```bash
cd api
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8080 --reload
```
#### 启动 Engine 服务
```bash
cd engine
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
```
#### 启动 Web 前端
```bash
cd web
npm install
npm run dev
```
## 基础验证
完成安装后,至少确认以下入口可访问:
| 服务 | 地址 | 用途 |
|------|------|------|
| Web | `http://localhost:3000` | 打开控制台 |
| API | `http://localhost:8080/docs` | 查看管理接口 |
| Engine | `http://localhost:8000/health` | 检查实时引擎健康状态 |
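上表的三个入口也可以用一小段脚本批量探测。以下为仅使用标准库的示意代码,端口取默认值,`is_up` 为本文虚构的函数名:

```python
import urllib.request
import urllib.error

def is_up(url: str, timeout: float = 3.0) -> bool:
    """HTTP 状态码为 2xx/3xx 时返回 True;连接失败或超时返回 False。"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

for name, url in [
    ("Web", "http://localhost:3000"),
    ("API", "http://localhost:8080/docs"),
    ("Engine", "http://localhost:8000/health"),
]:
    print(f"{name}: {'OK' if is_up(url) else 'DOWN'}")
```

任何一项显示 DOWN 时,先确认对应服务进程是否启动、端口是否被修改,再进入故障排查。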
如果你需要更完整的环境变量、配置文件和部署说明,请继续阅读本章节其他页面:
- [环境要求](requirements.md)
- [配置说明](configuration.md)
- [部署概览](../deployment/index.md)
- [Docker 部署](../deployment/docker.md)
## 目录结构(阅读导向)
```text
repo/
├── web/ # 管理控制台
├── api/ # 控制面与管理接口
├── engine/ # 实时交互引擎
├── docker/ # 部署编排与镜像配置
└── docs/ # 当前文档站点
```
## 遇到问题时去哪里
- 需要“快速判断往哪看”:先看 [常见问题](../resources/faq.md)
- 需要“按步骤排查”:直接看 [故障排查](../resources/troubleshooting.md)
- 已经跑通环境,准备创建助手:回到 [快速开始](../quickstart/index.md)


@@ -0,0 +1,150 @@
# 环境要求
本页面列出运行 Realtime Agent Studio 所需的软件和硬件要求。
---
## 软件依赖
### 必需软件
| 软件 | 版本要求 | 说明 | 安装命令 |
|------|---------|------|---------|
| **Node.js** | 18.0+ | 前端构建运行 | `nvm install 18` |
| **Python** | 3.10+ | 后端服务 | `pyenv install 3.10` |
| **Docker** | 20.10+ | 容器化部署(可选) | [安装指南](https://docs.docker.com/get-docker/) |
### 可选软件
| 软件 | 版本要求 | 用途 |
|------|---------|------|
| **Docker Compose** | 2.0+ | 多服务编排 |
| **PostgreSQL** | 14+ | 生产数据库 |
| **Redis** | 6.0+ | 缓存与会话 |
| **Nginx** | 1.20+ | 反向代理 |
---
## 版本检查
运行以下命令验证环境:
=== "Node.js"
```bash
node --version
# v18.0.0 或更高
npm --version
# 8.0.0 或更高
```
=== "Python"
```bash
python --version
# Python 3.10.0 或更高
pip --version
# pip 22.0 或更高
```
=== "Docker"
```bash
docker --version
# Docker version 20.10.0 或更高
docker compose version
# Docker Compose version v2.0.0 或更高
```
---
## 浏览器支持
控制台需要现代浏览器支持 WebSocket 和 Web Audio API
| 浏览器 | 最低版本 | 推荐版本 |
|--------|---------|---------|
| Chrome | 90+ | 最新版 |
| Firefox | 90+ | 最新版 |
| Edge | 90+ | 最新版 |
| Safari | 14+ | 最新版 |
!!! warning "IE 不支持"
Internet Explorer 不受支持,请使用现代浏览器。
---
## 硬件要求
### 开发环境
| 资源 | 最低配置 | 推荐配置 |
|------|---------|---------|
| **CPU** | 2 核心 | 4 核心+ |
| **内存** | 4GB | 8GB+ |
| **磁盘** | 10GB | 20GB+ SSD |
| **网络** | 10Mbps | 100Mbps |
---
## 网络要求
### 出站访问
以下外部服务需要网络可达(根据使用的模型供应商):
| 服务 | 域名 | 端口 | 用途 |
|------|------|------|------|
| **OpenAI** | api.openai.com | 443 | LLM / TTS |
| **Azure OpenAI** | *.openai.azure.com | 443 | LLM / ASR / TTS |
| **阿里云** | *.aliyuncs.com | 443 | DashScope TTS |
| **SiliconFlow** | api.siliconflow.cn | 443 | ASR / TTS |
| **DeepSeek** | api.deepseek.com | 443 | LLM |
### 端口规划
| 服务 | 默认端口 | 可配置 |
|------|---------|--------|
| Web 前端 | 3000 | ✅ |
| API 服务 | 8080 | ✅ |
| Engine 服务 | 8000 | ✅ |
| PostgreSQL | 5432 | ✅ |
| Redis | 6379 | ✅ |
---
## 操作系统
### 支持的系统
| 操作系统 | 版本 | 支持状态 |
|---------|------|---------|
| **Ubuntu** | 20.04 LTS, 22.04 LTS | ✅ 完全支持 |
| **Debian** | 11, 12 | ✅ 完全支持 |
| **CentOS** | 8+ | ✅ 完全支持 |
| **macOS** | 12+ (Monterey) | ✅ 开发支持 |
| **Windows** | 10/11 + WSL2 | ✅ 开发支持 |
### Windows 注意事项
推荐使用 WSL2 进行开发:
```powershell
# 安装 WSL2
wsl --install
# 安装 Ubuntu
wsl --install -d Ubuntu
```
---
## 下一步
- [配置说明](configuration.md) - 环境变量配置
- [环境与部署](index.md) - 开始安装
- [Docker 部署](../deployment/docker.md) - 容器化部署


@@ -1,200 +1,186 @@
# AI Video Assistant 使用说明
<p align="center">
<img src="images/logo.png" alt="Realtime Agent Studio" width="400">
</p>
## 产品概述
<p align="center">
<strong>通过管理控制台与 API 构建、部署和运营实时多模态助手</strong>
</p>
AI Video Assistant 是一款基于大语言模型的智能对话与工作流管理平台,支持多模型集成、语音合成、自动化测试等功能,帮助企业快速构建智能客服系统。
<p align="center">
<img src="https://img.shields.io/badge/version-0.1.0-blue" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
<img src="https://img.shields.io/badge/python-3.10+-blue" alt="Python">
<img src="https://img.shields.io/badge/node-18+-green" alt="Node">
</p>
![仪表盘](images/dashboard.png)
<p align="center">
<a href="overview/index.md">产品概览</a> ·
<a href="quickstart/index.md">快速开始</a> ·
<a href="concepts/assistants.md">构建助手</a> ·
<a href="concepts/index.md">核心概念</a> ·
<a href="api-reference/index.md">API 参考</a>
</p>
## 核心功能
---
| 功能模块 | 描述 |
|---------|------|
| **仪表盘** | 实时数据统计与可视化分析 |
| **助手管理** | 创建、配置、测试 AI 助手 |
| **工作流** | 可视化流程编排 |
| **模型库** | LLM/ASR/语音模型配置 |
| **知识库** | RAG 文档知识管理 |
| **历史记录** | 对话日志查询与分析 |
| **自动化测试** | 批量测试与质量评估 |
Realtime Agent Studio (RAS) 是一个通过管理控制台与 API 构建、部署和运营实时多模态助手的开源平台。
## 快速开始
## 适合谁
### 环境要求
- 需要把实时语音或视频助手接入产品、设备或内部系统的开发团队
- 需要通过控制台快速配置提示词、模型、知识库、工具和工作流的运营团队
- 需要私有化部署、模型可替换、链路可观测的企业场景
- Node.js 18+
- 现代浏览器(Chrome/Firefox/Edge)
## 核心能力
### 启动服务
<div class="grid cards" markdown>
```bash
cd web
npm install
npm run dev
- :material-robot-outline: **助手构建**
---
用统一的助手对象管理提示词、模型、知识库、工具、开场白和会话策略。
- :material-pulse: **双引擎运行时**
---
同时支持 Pipeline 引擎与 Realtime 引擎,可按延迟、成本和可控性选择运行方式。
- :material-source-branch: **能力扩展**
---
通过资源库、知识库、工具与工作流扩展助手能力,而不是把全部逻辑塞进单一提示词。
- :material-api: **开放集成**
---
使用 REST API 管理资源,使用 WebSocket API 接入实时对话,面向 Web、移动端和第三方系统。
- :material-shield-lock-outline: **私有化部署**
---
支持 Docker 部署、自有模型服务和企业内网运行,便于满足合规与成本要求。
- :material-chart-line: **可观测与评估**
---
提供会话历史、实时指标、自动化测试和效果评估,帮助持续改进助手质量。
</div>
## 系统架构
平台架构层级:
```mermaid
flowchart TB
subgraph Access["Access Layer"]
API["API"]
SDK["SDK"]
Browser["Browser UI"]
Embed["Web Embed"]
end
subgraph Runtime["Realtime Interaction Engine"]
direction LR
subgraph Duplex["Duplex Interaction Engine"]
direction LR
subgraph Pipeline["Pipeline Engine"]
direction LR
VAD["VAD"]
ASR["ASR"]
TD["Turn Detection"]
LLM["LLM"]
TTS["TTS"]
end
subgraph Multi["Realtime Engine"]
MM["Realtime Model"]
end
end
subgraph Capability["Agent Capabilities"]
subgraph Tools["Tool System"]
Webhook["Webhook"]
ClientTool["Client Tools"]
Builtin["Builtin Tools"]
end
subgraph KB["Knowledge System"]
Docs["Documents"]
Vector[("Vector Index")]
Retrieval["Retrieval"]
end
end
end
subgraph Platform["Platform Services"]
direction TB
Backend["Backend Service"]
Frontend["Frontend Console"]
DB[("Database")]
end
Access --> Runtime
Runtime <--> Backend
Backend <--> DB
Backend <--> Frontend
LLM --> Tools
MM --> Tools
LLM <--> KB
MM <--> KB
```
访问 `http://localhost:3000`
## 从这里开始
## 详细使用指南
<div class="grid cards" markdown>
### 1. 仪表盘
- :material-compass-outline: **[了解产品](overview/index.md)**
![仪表盘](images/dashboard.png)
---
仪表盘展示系统核心指标:
- **总对话数** - 累计对话请求数量
- **回答率** - 成功回答的对话占比
- **平均时长** - 单次对话平均持续时间
- **人工转接率** - 需要人工介入的对话比例
先看产品定位、核心模块、适用场景,以及 RAS 与其他方案的差异。
### 2. 助手管理
- :material-cog-outline: **[环境与部署](getting-started/index.md)**
![助手管理](images/assistants.png)
---
#### 创建助手
先把服务跑起来,了解环境要求、配置入口和部署方式。
1. 点击 **创建助手**
2. 配置助手基本信息(名称、提示词)
3. 选择对话语言与音色
4. 绑定知识库和工具
- :material-rocket-launch-outline: **[创建第一个助手](quickstart/index.md)**
#### 配置选项
---
| 标签页 | 配置项 |
|-------|--------|
| 全局 | 名称、提示词、温度参数 |
| 语音 | TTS 引擎、音色、语言 |
| 工具 | 可用工具列表 |
| 知识 | RAG 知识库关联 |
| 链接 | 外部服务配置 |
按最短路径准备资源、创建助手、测试效果并拿到接入所需信息。
### 3. 工作流
- :material-tune: **[构建助手](concepts/assistants.md)**
![工作流管理](images/workflows.png)
---
#### 工作流节点类型
按完整链路配置助手、提示词、模型、知识库、工具与工作流。
| 节点 | 功能 |
|------|------|
| 对话节点 | AI 自动回复 |
| 工具节点 | 调用外部工具 |
| 人工节点 | 转接人工客服 |
| 结束节点 | 结束对话流程 |
- :material-connection: **[接入应用](api-reference/index.md)**
### 4. 模型配置
---
![模型库](images/llms.png)
查看 REST 与 WebSocket 接口,把助手嵌入到你的 Web、移动端或服务端系统。
#### 支持的 LLM 模型
- :material-lifebuoy: **[排查问题](resources/troubleshooting.md)**
- **OpenAI** - GPT-4/GPT-3.5
- **DeepSeek** - DeepSeek Chat
- **SiliconFlow** - 多种开源模型
- **Google Gemini** - Gemini Pro
---
#### ASR 语音识别
当连接、对话质量或部署链路出现问题时,从这里进入可执行的排查步骤。
- **Whisper** - OpenAI 语音识别
- **SenseVoice** - 高精度中文识别
</div>
### 5. 知识库
![知识库](images/knowledge.png)
#### 创建知识库
1. 进入 **知识库** 页面
2. 点击 **新建知识库**
3. 上传文档(支持 Markdown/PDF/TXT)
4. 配置检索参数
### 6. 历史记录
![历史记录](images/history.png)
查询条件:
- 按时间范围筛选
- 按助手名称搜索
- 查看对话详情与统计
### 7. 自动化测试
![自动化测试](images/autotest.png)
#### 测试类型
| 类型 | 说明 |
|------|------|
| 固定测试 | 预设问答对测试 |
| 智能测试 | AI 生成测试用例 |
#### 评估指标
- 回复准确率
- 回答完整度
- 响应时间
### 8. 语音合成
![语音合成](images/voices.png)
#### 支持的 TTS 引擎
- **阿里云** - 多音色可选
- **火山引擎** - 高自然度
- **Minimax** - 低延迟
### 9. 个人中心
![个人中心](images/profile.png)
管理账户信息与系统设置。
## 部署指南
### Docker 部署(推荐)
```bash
# 构建镜像
docker build -t ai-video-assistant .
# 运行容器
docker run -d -p 3000:3000 --name ai-assistant ai-video-assistant
```
### Nginx 反向代理
```nginx
server {
listen 80;
server_name your-domain.com;
location / {
proxy_pass http://localhost:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
## 常见问题
### Q: 如何配置 API Key?
进入 **LLM 库** 或 **语音库** 页面,点击对应模型的配置按钮填写 API Key。
### Q: 助手无法回复?
1. 检查模型配置是否正确
2. 确认知识库已正确关联
3. 查看系统日志排查错误
### Q: 语音识别不准确?
- 确认 ASR 模型选择正确
- 检查音频采样率(推荐 16kHz)
- 确认语言设置匹配
## 技术支持
如有问题,请提交 Issue 或联系技术支持团队。


@@ -0,0 +1,26 @@
// Realtime Agent Studio - Custom JavaScript
document.addEventListener("DOMContentLoaded", function () {
// Add external link icons
document.querySelectorAll('a[href^="http"]').forEach(function (link) {
if (link.hostname !== window.location.hostname) {
link.setAttribute("target", "_blank");
link.setAttribute("rel", "noopener noreferrer");
}
});
// Smooth scroll for anchor links
document.querySelectorAll('a[href^="#"]').forEach(function (anchor) {
anchor.addEventListener("click", function (e) {
const targetId = this.getAttribute("href").slice(1);
const targetElement = document.getElementById(targetId);
if (targetElement) {
e.preventDefault();
targetElement.scrollIntoView({
behavior: "smooth",
block: "start",
});
}
});
});
});


@@ -0,0 +1,18 @@
/**
* Global Mermaid config for consistent diagram sizing across all docs.
* Exposed as window.mermaid so Material for MkDocs uses this instance.
*/
import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs";
mermaid.initialize({
startOnLoad: false,
securityLevel: "loose",
theme: "base",
useMaxWidth: false,
themeVariables: {
fontSize: "14px",
fontFamily: "Inter, sans-serif",
},
});
window.mermaid = mermaid;


@@ -0,0 +1,312 @@
# 系统架构
本文档只解释 Realtime Agent Studio (RAS) 的服务边界、数据流、部署形态和关键技术选型,不重复产品定位或上手流程。
---
## 整体架构
RAS 采用前后端分离的微服务架构,主要由三个核心服务组成:
```mermaid
flowchart TB
subgraph Client["客户端"]
Browser[Web 浏览器]
Mobile[移动应用]
ThirdParty[第三方系统]
end
subgraph Frontend["前端服务"]
WebApp[React 管理控制台]
end
subgraph Backend["后端服务"]
API[API 服务<br/>FastAPI]
Engine[实时交互引擎<br/>WebSocket]
end
subgraph Storage["数据存储"]
DB[(SQLite/PostgreSQL)]
FileStore[文件存储]
end
subgraph External["外部服务"]
OpenAI[OpenAI]
SiliconFlow[SiliconFlow]
DashScope[DashScope]
LocalModel[本地模型]
end
subgraph Tools["工具"]
Webhook[Webhook]
ClientTool[客户端工具]
Builtin[内建工具]
end
Browser --> WebApp
Mobile -->|WebSocket| Engine
ThirdParty -->|REST API| API
WebApp -->|REST API| API
WebApp -->|WebSocket| Engine
API <--> DB
API <--> FileStore
Engine <--> API
Engine --> External
Engine --> Tools
```
---
## 核心组件
### 1. Web 前端 (React)
管理控制台,提供可视化的配置、测试和监控界面。
| 功能模块 | 说明 |
|---------|------|
| 助手管理 | 创建、配置、测试智能助手 |
| 资源库 | LLM / ASR / TTS 等模型管理 |
| 知识库 | RAG 文档上传与管理 |
| 历史记录 | 会话日志查询与回放 |
| 仪表盘 | 实时数据统计 |
| 调试控制台 | WebSocket 实时测试 |
### 2. API 服务 (FastAPI)
REST API 后端,处理资源管理、持久化配置和历史数据等控制面能力。
```mermaid
flowchart LR
subgraph API["API 服务"]
Router[路由层]
Service[业务逻辑层]
Model[数据模型层]
end
Client[客户端] --> Router
Router --> Service
Service --> Model
Model --> DB[(数据库)]
```
**主要职责:**
- 助手 CRUD 操作
- 模型资源管理
- 知识库管理
- 会话记录存储
- 认证与授权
### 3. 实时交互引擎 (Engine)
处理实时音视频对话、事件流转、模型调用与工具执行。
```mermaid
flowchart TB
subgraph Engine["实时交互引擎"]
WS[WebSocket Handler]
SM[会话管理器]
subgraph Pipeline["管线式引擎"]
VAD[声音活动检测 VAD]
ASR[语音识别 ASR]
TD[回合检测 TD]
LLM[大语言模型 LLM]
TTS[语音合成 TTS]
end
subgraph Realtime["实时引擎连接"]
RTOpenAI[OpenAI Realtime]
RTGemini[Gemini Live]
RTDoubao[Doubao 实时交互]
end
subgraph Tools["工具"]
Webhook[Webhook]
ClientTool[客户端工具]
Builtin[内建工具]
end
end
Client[客户端] -->|音频流| WS
WS --> SM
SM --> Pipeline
SM --> Realtime
Pipeline --> LLM
LLM --> Tools
Realtime --> Tools
Pipeline -->|文本/音频| WS
Realtime -->|文本/音频| WS
```
### 外部服务与工具
| 类别 | 说明 | 可选项 |
|------|------|--------|
| **外部模型服务** | Pipeline 引擎各环节依赖的云端或本地服务 | OpenAI、SiliconFlow、DashScope、本地模型 |
| **实时模型连接** | Realtime 引擎可直接连接的后端 | OpenAI Realtime、Gemini Live、Doubao 实时交互 |
| **工具系统** | 由助手或引擎调用的外部执行能力 | Webhook、客户端工具、内建工具 |
---
## 引擎架构
### 管线式全双工引擎
管线式引擎由 **VAD → ASR → TD → LLM → TTS** 组成。每个环节可替换,适合需要精细控制、工具扩展和较高可解释性的场景。
```mermaid
sequenceDiagram
participant C as 客户端
participant E as 引擎
participant VAD as VAD
participant ASR as 语音识别
participant TD as 回合检测
participant LLM as 大语言模型
participant TTS as 语音合成
participant Tools as 工具
C->>E: 音频流 (PCM)
E->>VAD: 检测语音活动
VAD-->>E: 有效语音段
E->>ASR: 语音转写
ASR-->>E: 转写文本
E->>TD: 判断回合边界
TD-->>E: 可送入 LLM 的输入
E->>LLM: 生成回复
LLM->>Tools: 可选:调用工具
Tools-->>LLM: 工具结果
LLM-->>E: 回复文本 (流式)
E->>TTS: 文本转语音
TTS-->>E: 音频流
E->>C: 播放音频
```
**特点:**
- 各环节可单独替换和优化
- 便于接入知识库、工具、工作流等能力
- 延迟通常高于端到端实时模型,但可控性更强
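“各环节可单独替换”这一点,可以把链路抽象成一串顺序执行的阶段函数来说明。以下为示意代码,各阶段均为占位实现,真实引擎中它们分别对接 VAD/ASR/TD/LLM/TTS 服务:

```python
from typing import Callable

def run_pipeline(audio: bytes, stages: list[Callable]) -> object:
    """按顺序把上一环节的输出作为下一环节的输入。"""
    data: object = audio
    for stage in stages:
        data = stage(data)
    return data

# 占位阶段:换掉任何一个函数即可替换对应环节,管线结构不变。
vad = lambda pcm: pcm                        # 过滤出有效语音段
asr = lambda pcm: "查一下我的订单"            # 语音转写
td  = lambda text: text                      # 判定回合边界
llm = lambda text: f"收到:{text}"           # 生成回复文本
tts = lambda text: text.encode("utf-8")      # 合成音频(这里仅以编码示意)

audio_out = run_pipeline(b"\x00\x01", [vad, asr, td, llm, tts])
print(audio_out.decode("utf-8"))  # 收到:查一下我的订单
```

这种结构也解释了“可解释性强”:每个环节的输入输出都可以单独记录和比对。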
### Realtime 引擎
Realtime 引擎直接连接端到端实时模型,适合追求更低延迟和更自然多模态交互的场景。
```mermaid
sequenceDiagram
participant C as 客户端
participant E as 引擎
participant RT as Realtime Model
C->>E: 音频/视频/文本输入
E->>RT: 实时流输入
RT-->>E: 流式文本/音频输出
E->>C: 播放或渲染结果
```
**特点:**
- 交互链路更短,延迟更低
- 更依赖具体模型供应商的能力边界
- 适合强调自然对话和多模态体验的入口
---
## 数据流
### WebSocket 会话流程
```mermaid
sequenceDiagram
participant C as 客户端
participant E as 引擎
participant API as API 服务
participant DB as 数据库
C->>E: 连接 ws://.../ws?assistant_id=xxx
E->>API: 获取助手配置
API->>DB: 查询助手
DB-->>API: 助手数据
API-->>E: 配置信息
C->>E: session.start
E-->>C: session.started
E-->>C: config.resolved
loop 对话循环
C->>E: 音频帧 (binary)
E-->>C: input.speech_started
E-->>C: transcript.delta
E-->>C: transcript.final
E-->>C: assistant.response.delta
E-->>C: output.audio.start
E-->>C: 音频帧 (binary)
E-->>C: output.audio.end
end
C->>E: session.stop
E->>API: 保存会话记录
API->>DB: 存储
E-->>C: session.stopped
```
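客户端对上面事件流的最小处理方式,可以用一个按事件类型分发的函数示意。事件名取自上图,处理逻辑为假设实现,不含二进制音频帧:

```python
def consume(events: list[dict]) -> dict:
    """按顺序消费事件,拼出用户转写与助手回复(仅示意)。"""
    state = {"transcript": "", "reply": "", "active": False}
    for ev in events:
        t = ev["type"]
        if t == "session.started":
            state["active"] = True
        elif t == "transcript.delta":
            state["transcript"] += ev["text"]       # 增量转写
        elif t == "transcript.final":
            state["transcript"] = ev["text"]        # 最终转写覆盖增量结果
        elif t == "assistant.response.delta":
            state["reply"] += ev["text"]            # 流式回复逐段拼接
        elif t == "session.stopped":
            state["active"] = False
    return state

state = consume([
    {"type": "session.started"},
    {"type": "transcript.delta", "text": "你好"},
    {"type": "transcript.final", "text": "你好,帮我查订单"},
    {"type": "assistant.response.delta", "text": "好的,"},
    {"type": "assistant.response.delta", "text": "请提供订单号。"},
    {"type": "session.stopped"},
])
print(state["reply"])  # 好的,请提供订单号。
```

注意 `transcript.final` 覆盖此前的增量结果,这也是排查“转写闪烁或重复”时首先要确认的行为。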
### 智能打断流程
```mermaid
sequenceDiagram
participant C as 客户端
participant E as 引擎
participant TTS as TTS 服务
Note over E: 正在播放 TTS 音频
E->>C: 音频帧...
C->>E: 用户说话 (VAD 检测)
E->>E: 触发打断
E->>TTS: 停止合成
E-->>C: output.audio.interrupted
Note over E: 处理新的用户输入
E-->>C: input.speech_started
```
---
## 部署形态
### 开发环境
```mermaid
flowchart LR
subgraph Local["本地开发"]
Web[npm run dev<br/>:3000]
API[uvicorn<br/>:8080]
Engine[python main.py<br/>:8000]
DB[(SQLite)]
end
Web --> API
Web --> Engine
API --> DB
Engine --> API
```
## 技术选型
| 组件 | 技术 | 说明 |
|------|------|------|
| **前端框架** | React 18 | 管理控制台与调试界面 |
| **状态管理** | Zustand | 前端轻量状态管理 |
| **UI 样式** | Tailwind CSS | 快速构建控制台界面 |
| **后端框架** | FastAPI | 管理接口与配置持久化 |
| **WebSocket** | websockets | 实时事件与音频流通信 |
| **数据库** | SQLite / PostgreSQL | 配置与历史数据存储 |
---
## 相关文档
- [产品概览](index.md) - 产品定位、核心模块与适用场景
- [引擎架构](../concepts/engines.md) - Pipeline 与 Realtime 的选择指南
- [WebSocket 协议](../api-reference/websocket.md) - 实时对话事件和消息格式

View File

@@ -0,0 +1,84 @@
# 产品概览
Realtime Agent Studio (RAS) 是一个通过管理控制台与 API 构建、部署和运营实时多模态助手的开源平台。
---
## 产品定位
RAS 面向需要构建实时语音或视频助手的团队,目标不是替代你的业务系统,而是提供一套可组合的助手基础设施:
- **控制台**:让团队快速配置助手、资源库、知识库、工具、工作流与评估策略
- **API 与实时运行时**:让应用、设备和第三方系统稳定接入实时对话能力
- **运维与分析能力**:让团队能观察会话效果、排查问题并持续迭代助手质量
如果你把实时助手看作一条完整的产品链路RAS 负责其中的“构建、接入、运行、观测”四个阶段。
## 核心模块
| 模块 | 负责什么 | 适合谁使用 |
|------|----------|------------|
| **助手** | 定义角色、行为、模型、知识、工具和会话策略 | 产品、运营、算法、开发 |
| **引擎** | 承载实时语音/多模态对话,输出事件流和音频流 | 开发、基础设施 |
| **资源库** | 管理 LLM、ASR、TTS 等外部能力接入 | 平台管理员、开发 |
| **知识库 / 工具 / 工作流** | 让助手获得领域知识、外部执行能力和复杂流程控制 | 业务设计者、开发 |
| **分析与评估** | 记录会话、监控指标、做自动化回归和效果评估 | 运营、QA、开发 |
## 为什么是“控制台 + API”
RAS 采用“控制台配置 + API 接入”的组合方式,而不是把所有内容都固化在代码里:
- **控制台负责提效**:让非后端角色也能参与提示词、工具、知识、流程的配置与调优
- **API 负责集成**:让产品团队继续用自己的前端、服务端或设备侧应用承载最终体验
- **同一套助手配置可复用**:控制台保存的助手定义可以被不同渠道重复接入和评估
## 典型使用方式
<div class="grid cards" markdown>
- :material-headset: **客户服务与运营自动化**
---
在客服、外呼、预约、售后等场景中接入实时语音助手,并保留人工接管与工具调用能力。
- :material-school-outline: **培训、陪练与问答**
---
用知识库、提示词和流程编排构建可持续优化的教学、培训或辅导助手。
- :material-domain: **企业内部助手**
---
通过私有部署、内部知识库和业务系统工具,把助手接入内部流程或设备终端。
- :material-devices: **多端集成**
---
通过 WebSocket API 将同一个助手接入 Web、移动端、坐席工作台或自有硬件设备。
</div>
## 与其他方案的差异
本页是站内唯一保留“产品对比”视角的地方,用于帮助你快速判断 RAS 的定位边界。
| 特性 | RAS | Vapi | Retell | ElevenLabs Agents |
|------|-----|------|--------|-------------------|
| **开源** | :white_check_mark: | :x: | :x: | :x: |
| **私有部署** | :white_check_mark: | :x: | :x: | :x: |
| **Pipeline 引擎** | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: |
| **Realtime / 多模态引擎** | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: |
| **自定义 ASR / TTS** | :white_check_mark: | 有限 | 有限 | :x: |
| **知识库与工具扩展** | :white_check_mark: | :white_check_mark: | :white_check_mark: | 有限 |
| **工作流编排** | 开发中 | :white_check_mark: | :x: | :x: |
| **数据与链路可观测** | :white_check_mark: | 有限 | 有限 | 有限 |
## 继续阅读
- [系统架构](architecture.md) - 从服务边界、数据流和部署形态理解系统如何组成
- [核心概念](../concepts/index.md) - 先建立助手、引擎与工作流的心智模型
- [快速开始](../quickstart/index.md) - 以最短路径创建第一个助手


@@ -0,0 +1,44 @@
# 资源准备清单
本页保留原“资源库配置详解”链接,但在本轮文档收敛后,它只承担快速开始阶段的资源核对职责。
## 你至少要准备什么
在创建第一个助手前,至少确认以下三类资源都已经可用:
| 资源 | 为什么需要 | 正式说明页 |
|------|------------|------------|
| **LLM 模型** | 负责理解与生成回复 | [LLM 模型](../customization/models.md) |
| **ASR 资源** | 负责把语音输入转写为文本 | [语音识别](../customization/asr.md) |
| **TTS 声音资源** | 负责把文本回复合成为语音 | [声音资源](../customization/voices.md) |
## 上手前自检
### LLM
- 已配置供应商、模型名称、Base URL 和凭证
- 已明确该模型用于文本生成、嵌入还是重排
- 已准备保守的默认参数,而不是先追求极端效果
### ASR
- 已确认目标语言与模型匹配
- 已准备必要热词或专有名词词表
- 已能用固定样本测试识别准确率和延迟
### TTS
- 已选择主音色,并完成至少一次试听
- 已确认该声音适合实时对话,而不是仅适合离线播报
- 已为默认语速、音量等参数设定初始值
## 不在本页展开的内容
字段说明、供应商差异、参数建议和最佳实践已经分别收敛到正式能力页:
- [LLM 模型](../customization/models.md)
- [语音识别](../customization/asr.md)
- [声音资源](../customization/voices.md)
- [TTS 参数](../customization/tts.md)
准备完成后,请回到 [快速开始](index.md) 继续创建助手。


@@ -0,0 +1,98 @@
# 快速开始
本页负责“创建第一个助手”的最短路径。环境要求、配置文件和部署方式统一放在 [环境与部署](../getting-started/index.md)。
## 目标
完成本页后,你应该已经:
1. 准备好 1 个 LLM、1 个 ASR、1 个 TTS 资源
2. 创建并保存 1 个助手
3. 完成至少 1 轮测试对话
4. 拿到接入应用所需的 `assistant_id` 和 WebSocket 地址
## 前提条件
- 已部署 Realtime Agent Studio(RAS)服务
- 已准备可用的 LLM / ASR / TTS 凭证
- 已能访问控制台与 WebSocket 服务
## 第一步:准备资源
创建助手之前,先准备三类资源:
- **LLM 模型**:决定助手如何理解和生成回复。详见 [LLM 模型](../customization/models.md)
- **ASR 资源**:决定语音输入如何转写。详见 [语音识别](../customization/asr.md)
- **TTS 声音资源**:决定回复如何被合成为语音。详见 [声音资源](../customization/voices.md)
如果你想先检查“资源是否准备齐”,可以看 [资源准备清单](dashboard.md)。
## 第二步:创建助手
1. 进入控制台中的 **助手** 页面
2. 新建一个助手,并填写最小必要信息:
- **助手名称**:让团队知道它服务于什么场景
- **系统提示词**:先定义角色、任务和限制
- **首轮模式**:决定由助手先说还是等待用户开口
3. 绑定默认模型:
- 文本生成使用一个 LLM
- 语音输入使用一个 ASR
- 语音输出使用一个 TTS 声音资源
如果你想把助手设计得更稳,继续阅读:
- [助手概念](../concepts/assistants.md)
- [配置选项](../concepts/assistants/configuration.md)
- [提示词指南](../concepts/assistants/prompts.md)
## 第三步:补充能力
最小助手可以只依赖提示词和模型;更复杂的场景通常还需要以下能力:
- **知识库**:让助手回答私有领域问题。见 [知识库](../customization/knowledge-base.md)
- **工具**:让助手执行查单、预约、查询等外部操作。见 [工具](../customization/tools.md)
- **工作流**:让助手处理多步骤、多分支流程。见 [工作流](../customization/workflows.md)
## 第四步:测试并发布
1. 打开助手测试面板,先验证文本对话,再验证语音输入输出
2. 观察事件流、转写、工具调用和最终回复是否符合预期
3. 保存当前配置,并确认该助手已可用于外部接入
更系统的验证方式见 [测试调试](../concepts/assistants/testing.md)。
## 第五步:接入应用
最小接入方式是使用 WebSocket API 建立实时会话:
```javascript
const ws = new WebSocket('ws://your-server/ws?assistant_id=YOUR_ASSISTANT_ID');
ws.onopen = () => {
ws.send(JSON.stringify({
type: 'session.start',
audio: { encoding: 'pcm_s16le', sample_rate_hz: 16000, channels: 1 }
}));
};
```
你通常只需要两项信息:
- `assistant_id`:指定接入哪个助手
- WebSocket 地址:由引擎服务提供实时对话入口
完整协议见 [WebSocket 协议](../api-reference/websocket.md)。
## 常见卡点
- 资源配置不生效:回到 [资源准备清单](dashboard.md) 检查三类资源是否都已准备好
- 助手不回复:先看 [测试调试](../concepts/assistants/testing.md),再进入 [故障排查](../resources/troubleshooting.md)
- 回复质量不稳定:优先检查 [提示词指南](../concepts/assistants/prompts.md) 与 [知识库](../customization/knowledge-base.md)
## 下一步
- [环境与部署](../getting-started/index.md) - 补全环境、配置和部署细节
- [构建助手](../concepts/assistants.md) - 深入配置助手、模型、知识库、工具与工作流
- [API 参考](../api-reference/index.md) - 查看管理接口与实时协议


@@ -0,0 +1,59 @@
# 常见问题
本页只提供简短回答和跳转建议;如果你需要逐步排查,请直接进入 [故障排查](troubleshooting.md)。
## Q: 我应该先看哪一部分文档?
- 想了解产品是什么:看 [产品概览](../overview/index.md)
- 想先把服务跑起来:看 [环境与部署](../getting-started/index.md)
- 想最快创建第一个助手:看 [快速开始](../quickstart/index.md)
- 想系统完成助手配置:从 [助手概览](../concepts/assistants.md) 开始
## Q: 如何配置模型或 API Key?
进入对应资源页完成配置:
- LLM见 [LLM 模型](../customization/models.md)
- ASR见 [语音识别](../customization/asr.md)
- TTS见 [声音资源](../customization/voices.md)
## Q: 助手为什么不回复?
通常先检查三件事:
- 助手是否已绑定可用的模型资源
- 提示词、知识库或工具是否配置完整
- WebSocket 会话是否已经正常建立
下一步:
- 助手行为验证:看 [测试调试](../concepts/assistants/testing.md)
- 逐步排查:看 [故障排查](troubleshooting.md)
## Q: 回复为什么不准确或不稳定?
优先检查:
- 提示词是否明确了角色、任务和限制
- 是否应该补充知识库,而不是继续堆叠提示词
- 是否需要把复杂业务改成工作流,而不是单轮问答
相关文档:
- [提示词指南](../concepts/assistants/prompts.md)
- [知识库](../customization/knowledge-base.md)
- [工作流](../customization/workflows.md)
## Q: 语音识别或语音播放效果不好怎么办?
- 输入侧问题先看 [语音识别](../customization/asr.md)
- 输出侧问题先看 [声音资源](../customization/voices.md) 和 [TTS 参数](../customization/tts.md)
- 需要逐步定位链路问题时,再看 [故障排查](troubleshooting.md)
## Q: 页面空白、接口报错或连接不上怎么办?
这是典型的环境或链路问题:
- 先确认 [环境与部署](../getting-started/index.md) 中的三个服务都已启动
- 再进入 [故障排查](troubleshooting.md) 按连接、API、页面加载或性能问题分类处理


@@ -0,0 +1,292 @@
# 故障排查
本文档汇总常见问题的排查步骤和解决方案。
## 连接问题
### WebSocket 连接失败
**症状**:无法建立 WebSocket 连接,控制台显示连接错误。
**排查步骤**
1. **检查服务状态**
```bash
# 检查 Engine 服务是否运行
curl http://localhost:8000/health
```
2. **验证连接地址**
- 确认 host 和 port 正确
- 确认 assistant_id 参数存在
3. **检查网络**
- 确认防火墙未阻止 WebSocket
- 检查 Nginx 代理配置(如有)
4. **查看服务日志**
```bash
docker logs ai-assistant-engine
```
**常见原因**
- Engine 服务未启动
- assistant_id 无效
- 防火墙阻止 WebSocket 端口
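排查时可以在客户端打印关闭码,粗略区分故障方向。下面是一个示意脚本:1000 与 1006 的含义来自 WebSocket 标准;把 4000 以上的自定义码视为应用层拒绝是此处的假设,实际语义以服务端实现为准:

```javascript
// 根据 WebSocket 关闭码粗略判断故障方向(仅作排查提示)。
function diagnoseClose(code) {
  if (code === 1000) return 'normal';            // 正常关闭
  if (code === 1006) return 'network-or-server'; // 异常中断:服务未启动 / 代理或防火墙拦截
  if (code >= 4000) return 'app-rejected';       // 应用层自定义码,常见于 assistant_id 无效
  return 'unknown';
}

// 浏览器中的用法示意:
// ws.onclose = (e) => console.log('closed:', e.code, e.reason, diagnoseClose(e.code));
```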
---
### API 请求失败
**症状**:REST API 返回错误或超时。
**排查步骤**
1. **检查 API 服务**
```bash
curl http://localhost:8080/health
```
2. **验证请求格式**
- Content-Type 是否为 application/json
- 请求体是否为有效 JSON
3. **检查认证**
- Authorization header 是否正确
- API Key 是否有效
4. **查看响应详情**
```bash
curl -v http://localhost:8080/api/v1/assistants
```
---
## 助手问题
### 助手不回复
**症状**:发送消息后没有收到助手回复。
**排查步骤**
1. **检查会话状态**
- 确认收到 `session.started` 事件
- 确认没有 `error` 事件
2. **检查 LLM 配置**
- API Key 是否有效
- 模型配置是否正确
- 测试模型连接
3. **查看日志**
- 检查 LLM 调用是否成功
- 查看是否有超时错误
**常见原因**
- LLM API Key 无效或过期
- 模型服务不可用
- 请求超时
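也可以在客户端记录事件流,用一个简单的辅助函数判断会话卡在哪一步。事件名取自上文的 `session.started` 与 `error`,事件对象结构为示意:

```javascript
// 根据已收到的事件列表判断会话状态(示意)。
function sessionState(events) {
  const types = events.map((e) => e.type);
  if (types.includes('error')) return 'error';                   // 收到 error:优先看错误详情
  if (!types.includes('session.started')) return 'not-started';  // 会话未建立:检查连接与参数
  return 'started';                                              // 会话已建立:继续排查 LLM 配置
}

// 用法示意:在 ws.onmessage 中把解析后的 JSON 事件 push 进数组,再调用 sessionState(events)。
```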
---
### 回复质量差
**症状**:助手回复不准确、不相关或格式混乱。
**排查步骤**
1. **检查提示词**
- 是否有明确的角色定义
- 是否有清晰的任务描述
- 是否有必要的约束
2. **调整参数**
- 降低 temperature 提高一致性
- 调整 max_tokens 控制长度
3. **检查知识库**
- 确认知识库已关联
- 测试检索结果是否相关
4. **查看对话历史**
- 分析问题出现的模式
- 收集典型的失败案例
---
## 语音问题
### 语音识别不准确
**症状**:ASR 识别结果与实际说话内容不符。
**排查步骤**
1. **检查音频质量**
- 麦克风是否正常工作
- 环境是否嘈杂
- 采样率是否正确(16kHz)
2. **验证 ASR 配置**
- 语言设置是否正确
- 是否配置了热词
3. **测试不同引擎**
- 尝试切换 ASR 服务提供商
- 对比识别效果
**改进建议**
- 添加业务相关的热词
- 使用降噪麦克风
- 选择针对中文优化的 ASR 引擎
---
### 语音无法播放
**症状**:TTS 合成成功但没有声音输出。
**排查步骤**
1. **检查浏览器设置**
- 是否允许自动播放音频
- 音量是否静音
2. **验证音频数据**
- 确认收到 `output.audio.start` 事件
- 确认收到二进制音频帧
- 确认收到 `output.audio.end` 事件
3. **检查音频解码**
- PCM 格式是否正确解析
- AudioContext 是否正确初始化
4. **测试 TTS 服务**
- 单独测试 TTS 配置
- 检查 TTS API 状态
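若确认二进制音频帧已到达,可以先单独验证 PCM 解码这一步。下面是把 s16le 帧归一化为 Float32 的示意;播放部分(AudioContext)以注释给出,需在用户手势后恢复播放以绕过自动播放限制:

```javascript
// 将 s16le PCM 帧解码为 Web Audio 可用的 Float32 数据。
function pcm16ToFloat32(buffer) {
  const int16 = new Int16Array(buffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // [-32768, 32767] 归一化到 [-1, 1)
  }
  return float32;
}

// 浏览器播放示意(需在点击等用户手势后执行):
// const ctx = new AudioContext({ sampleRate: 16000 });
// await ctx.resume(); // 一直处于 suspended 状态多半是自动播放策略未放行
// const buf = ctx.createBuffer(1, frame.byteLength / 2, 16000);
// buf.copyToChannel(pcm16ToFloat32(frame), 0);
```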
---
## 部署问题
### Docker 容器启动失败
**症状**:容器无法启动或立即退出。
**排查步骤**
1. **查看容器日志**
```bash
docker logs <container_name>
```
2. **检查资源限制**
```bash
docker stats
```
3. **验证配置文件**
- 环境变量是否正确
- 配置文件路径是否存在
4. **检查端口冲突**
```bash
netstat -an | grep <port>
```
---
### 页面加载空白
**症状**:浏览器打开页面但内容为空。
**排查步骤**
1. **检查浏览器控制台**
- 打开 F12 开发者工具
- 查看 Console 错误信息
2. **验证静态资源**
- 检查 Network 标签页
- 确认 JS/CSS 文件加载成功
3. **检查 API 连接**
- 确认 VITE_API_URL 配置正确
- 测试 API 是否可访问
4. **清除缓存**
```bash
# 强制刷新
Ctrl + Shift + R
```
---
## 性能问题
### 响应延迟高
**症状**:从发送消息到收到回复时间过长。
**排查步骤**
1. **定位延迟环节**
- ASR 处理时间
- LLM 推理时间
- TTS 合成时间
2. **查看性能指标**
- 检查 `metrics.ttfb` 事件
- 分析各环节耗时
3. **优化配置**
- 使用更快的模型
- 减少 max_tokens
- 启用流式输出
4. **检查网络**
- 测试到各 API 的延迟
- 考虑使用更近的服务区域
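定位延迟环节时,可以给关键事件打时间戳再做差。下面是一个拆解示意,时间戳字段名为假设,实际应以 `metrics.ttfb` 事件携带的数据为准:

```javascript
// 按时间戳拆解端到端延迟(单位:ms;字段名为示意)。
function latencyBreakdown(t) {
  return {
    asrMs: t.asrDone - t.speechEnd,        // 说话结束 → ASR 出最终结果
    llmMs: t.llmFirstToken - t.asrDone,    // ASR 结果 → LLM 首 token
    ttsMs: t.firstAudio - t.llmFirstToken, // LLM 首 token → 首个音频帧
    totalMs: t.firstAudio - t.speechEnd,   // 端到端首响
  };
}
```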
---
## 日志查看
### 服务端日志
```bash
# Docker 容器日志
docker logs -f ai-assistant-engine
# 查看最近 100 行
docker logs --tail 100 ai-assistant-engine
```
### 客户端日志
在浏览器开发者工具中:
1. **Console** - 查看 JavaScript 错误和日志
2. **Network** - 查看网络请求和响应
3. **WebSocket** - 查看 WS 消息(在 Network 标签页)
### 启用详细日志
设置环境变量启用调试日志:
```bash
# Engine 服务
LOG_LEVEL=debug
# API 服务
DEBUG=true
```
## 获取帮助
如果以上方法无法解决问题:
1. 收集相关日志和错误信息
2. 描述复现步骤
3. 提交 Issue 或联系技术支持

docs/content/roadmap.md Normal file

@@ -0,0 +1,110 @@
# 开发路线图
本页面展示 Realtime Agent Studio 的开发计划和进度。
---
## 已完成 :white_check_mark:
### 实时交互引擎
- [x] **管线式全双工引擎** - ASR / LLM / TTS 流水线架构
- [x] **智能打断处理** - VAD + EOU 检测
- [x] **OpenAI 兼容接口** - ASR / TTS 标准接口适配
- [x] **DashScope TTS** - 阿里云语音合成适配
### 助手配置管理
- [x] **系统提示词编辑** - Prompt 配置,动态变量注入
- [x] **模型选择** - LLM / ASR / TTS 模型管理界面
- [x] **工具调用配置** - Webhook 工具 + 客户端工具
### 调试与观察
- [x] **实时调试控制台** - WebSocket 调试连接示例
- [x] **完整会话回放** - 音频 + 转写 + LLM 响应
- [x] **会话检索筛选** - 按时间 / 助手 / 状态筛选
### 开放接口
- [x] **WebSocket 协议** - `/ws` 端点完整实现
- [x] **RESTful 接口** - 完整的 CRUD API
---
## 开发中 :construction:
### 助手与能力编排
- [ ] **私有化 ASR / TTS 适配** - 本地模型接入
- [ ] **工作流编辑** - 可视化流程编排
- [ ] **知识库关联** - RAG 文档管理
### 实时交互引擎
- [ ] **原生多模态模型** - Step Audio 接入(GPT-4o Realtime / Gemini Live 国内环境受限)
- [ ] **WebRTC 协议** - `/webrtc` 端点
### 开放接口
- [ ] **SDK 支持** - JavaScript / Python SDK
- [ ] **电话接入** - 电话呼入自动接听 / 自动呼出接口和批量呼出
### 效果评估
- [ ] **自动化测试工具** - 固定测试 + 智能测试
---
## 计划中 :spiral_notepad:
### 开放接口
- [ ] **Webhook 回调** - 会话事件通知机制
### 数据与评估
- [ ] **实时仪表盘增强** - 完善统计看板功能
- [ ] **评估闭环** - 测试、评分、回归与变更追踪
### 企业能力
- [ ] **多租户支持** - 团队 / 组织管理
- [ ] **权限管理** - RBAC 角色权限控制
- [ ] **审计日志** - 操作记录追踪
### 生态集成
- [ ] **更多模型供应商** - 讯飞、百度、腾讯等
- [ ] **CRM 集成** - Salesforce、HubSpot 等
- [ ] **呼叫中心集成** - SIP / PSTN 网关
---
## 版本规划
| 版本 | 目标 | 状态 |
|------|------|------|
| **v0.1.0** | 核心功能 MVP(管线式引擎) | :white_check_mark: 已发布 |
| **v0.2.0** | 工作流编辑器,知识库集成 | :construction: 开发中 |
| **v0.3.0** | SDK 发布,多模态模型支持 | :spiral_notepad: 计划中 |
| **v1.0.0** | 生产就绪,企业特性 | :spiral_notepad: 计划中 |
---
## 生态参考
### 开源项目
- [Livekit Agent](https://github.com/livekit/agents)
- [Pipecat](https://github.com/pipecat-ai/pipecat)
- [Vision Agents](https://github.com/GetStream/Vision-Agents)
- [active-call](https://github.com/miuda-ai/active-call)
- [TEN](https://github.com/TEN-framework/ten-framework)
- [airi](https://github.com/moeru-ai/airi)
- [Vocode Core](https://github.com/vocodedev/vocode-core)
- [awesome-voice-agents](https://github.com/yzfly/awesome-voice-agents)
### 文档与研究参考
- [Voice AI & Voice Agents](https://voiceaiandvoiceagents.com/)


@@ -0,0 +1,160 @@
/* Realtime Agent Studio - Custom Styles */
:root {
--md-primary-fg-color: #4f46e5;
--md-primary-fg-color--light: #6366f1;
--md-primary-fg-color--dark: #4338ca;
--md-accent-fg-color: #6366f1;
}
/* Hero Section - Center aligned content */
.md-typeset p[align="center"] {
text-align: center;
}
.md-typeset p[align="center"] img {
display: inline-block;
margin: 0 4px;
vertical-align: middle;
}
.md-typeset p[align="center"] a {
margin: 0 8px;
}
[data-md-color-scheme="slate"] {
--md-primary-fg-color: #818cf8;
--md-primary-fg-color--light: #a5b4fc;
--md-primary-fg-color--dark: #6366f1;
--md-accent-fg-color: #818cf8;
}
/* Hero Section Styling */
.md-content h1 {
font-weight: 700;
letter-spacing: -0.02em;
}
/* Badge Styling */
.md-content img[src*="badge"] {
margin: 0 4px;
vertical-align: middle;
}
/* Grid Cards Enhancement */
.md-typeset .grid.cards > ul > li {
border: 1px solid var(--md-default-fg-color--lightest);
border-radius: 8px;
transition: all 0.2s ease;
}
.md-typeset .grid.cards > ul > li:hover {
border-color: var(--md-primary-fg-color);
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
transform: translateY(-2px);
}
/* Code Block Enhancement */
.md-typeset pre > code {
border-radius: 8px;
}
.md-typeset .highlight {
border-radius: 8px;
overflow: hidden;
}
/* Table Enhancement */
.md-typeset table:not([class]) {
border-radius: 8px;
overflow: hidden;
border: 1px solid var(--md-default-fg-color--lightest);
}
.md-typeset table:not([class]) th {
background-color: var(--md-default-fg-color--lightest);
font-weight: 600;
}
/* Admonition Enhancement */
.md-typeset .admonition,
.md-typeset details {
border-radius: 8px;
border: none;
}
/* Mermaid Diagram Styling - consistent element size across diagrams */
.mermaid {
margin: 1.5rem 0;
overflow-x: auto;
}
.mermaid svg {
min-width: min-content;
}
/* Navigation Enhancement */
.md-nav__link {
font-weight: 500;
}
.md-nav__item--active > .md-nav__link {
font-weight: 600;
}
/* Footer Styling */
.md-footer {
margin-top: 3rem;
}
/* Center align for hero badges */
.md-content > .md-typeset > div[align="center"] img {
margin: 0.25rem;
}
/* Task list styling */
.md-typeset .task-list-item input[type="checkbox"] {
margin-right: 0.5rem;
}
/* Improve readability */
.md-typeset {
font-size: 0.85rem;
line-height: 1.75;
}
.md-typeset h2 {
margin-top: 2.5rem;
padding-bottom: 0.5rem;
border-bottom: 1px solid var(--md-default-fg-color--lightest);
}
.md-typeset h3 {
margin-top: 1.5rem;
}
/* Responsive improvements */
@media screen and (max-width: 76.1875em) {
.md-typeset .grid.cards > ul > li {
padding: 1rem;
}
}
/* Animation for interactive elements */
.md-typeset a:not(.md-button) {
transition: color 0.15s ease;
}
.md-typeset a:not(.md-button):hover {
color: var(--md-accent-fg-color);
}
/* Version selector styling */
.md-version {
font-size: 0.75rem;
}
/* Search highlight */
.md-search-result mark {
background-color: var(--md-accent-fg-color--transparent);
color: inherit;
}


@@ -1,21 +1,157 @@
site_name: "AI Video Assistant"
site_description: "AI 视频助手 - 智能对话与工作流管理平台"
copyright: "2025"
site_author: "AI Video Assistant Team"
site_name: "Realtime Agent Studio"
site_description: "Realtime Agent Studio(RAS)是一个通过管理控制台与 API 构建、部署和运营实时多模态助手的开源平台"
site_url: "https://your-org.github.io/AI-VideoAssistant"
copyright: "Copyright &copy; 2025 RAS Team"
site_author: "RAS Team"
docs_dir: "content"
site_dir: "site"
nav:
- 首页: "index.md"
- 快速开始: "getting-started.md"
- 功能介绍:
- 仪表盘: "features/dashboard.md"
- 助手管理: "features/assistants.md"
- 工作流: "features/workflows.md"
- 模型配置: "features/models.md"
- 知识库: "features/knowledge.md"
- 历史记录: "features/history.md"
- 自动化测试: "features/autotest.md"
- 语音合成: "features/voices.md"
- 部署指南: "deployment.md"
- 首页: index.md
- 快速开始:
- 环境与部署: getting-started/index.md
- 创建第一个助手: quickstart/index.md
- 构建助手:
- 助手概览: concepts/assistants.md
- 基础配置: concepts/assistants/configuration.md
- 提示词: concepts/assistants/prompts.md
- LLM 模型: customization/models.md
- 语音识别: customization/asr.md
- 声音资源: customization/voices.md
- TTS 参数: customization/tts.md
- 知识库: customization/knowledge-base.md
- 工具: customization/tools.md
- 工作流: customization/workflows.md
- 测试与调试: concepts/assistants/testing.md
- 核心概念:
- 产品概览: overview/index.md
- 概念总览: concepts/index.md
- 引擎架构: concepts/engines.md
- Pipeline 引擎: concepts/pipeline-engine.md
- Realtime 引擎: concepts/realtime-engine.md
- 系统架构: overview/architecture.md
- 集成:
- API 参考: api-reference/index.md
- WebSocket 协议: api-reference/websocket.md
- 错误码: api-reference/errors.md
- 运维:
- 仪表盘: analysis/dashboard.md
- 历史记录: analysis/history.md
- 效果评估: analysis/evaluation.md
- 自动化测试: analysis/autotest.md
- 常见问题: resources/faq.md
- 故障排查: resources/troubleshooting.md
- 更新日志: changelog.md
- 路线图: roadmap.md
theme:
name: material
language: zh
custom_dir: overrides
icon:
logo: material/robot-outline
font:
text: Inter
code: JetBrains Mono
palette:
- media: "(prefers-color-scheme: light)"
scheme: default
primary: indigo
accent: indigo
toggle:
icon: material/brightness-7
name: 切换到深色模式
- media: "(prefers-color-scheme: dark)"
scheme: slate
primary: indigo
accent: indigo
toggle:
icon: material/brightness-4
name: 切换到浅色模式
features:
- navigation.instant
- navigation.instant.prefetch
- navigation.tracking
- navigation.tabs
- navigation.tabs.sticky
- navigation.sections
- navigation.expand
- navigation.path
- navigation.top
- navigation.footer
- toc.follow
- search.suggest
- search.highlight
- search.share
- content.code.copy
- content.code.annotate
- content.tabs.link
markdown_extensions:
- abbr
- admonition
- attr_list
- def_list
- footnotes
- md_in_html
- tables
- toc:
permalink: true
toc_depth: 3
- pymdownx.arithmatex:
generic: true
- pymdownx.betterem:
smart_enable: all
- pymdownx.caret
- pymdownx.details
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- pymdownx.highlight:
anchor_linenums: true
line_spans: __span
pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.keys
- pymdownx.magiclink:
repo_url_shorthand: true
user: your-org
repo: AI-VideoAssistant
- pymdownx.mark
- pymdownx.smartsymbols
- pymdownx.snippets
- pymdownx.superfences:
custom_fences:
- name: mermaid
class: mermaid
format: !!python/name:pymdownx.superfences.fence_code_format
- pymdownx.tabbed:
alternate_style: true
- pymdownx.tasklist:
custom_checkbox: true
- pymdownx.tilde
plugins:
- search:
lang: zh
separator: '[\s\-\.]+'
- minify:
minify_html: true
extra:
social:
- icon: fontawesome/brands/github
link: https://github.com/your-org/AI-VideoAssistant
name: GitHub
generator: false
analytics:
provider: google
property: G-XXXXXXXXXX
extra_css:
- stylesheets/extra.css
extra_javascript:
- javascripts/mermaid.mjs
- javascripts/extra.js

docs/overrides/main.html Normal file

@@ -0,0 +1,9 @@
{% extends "base.html" %}
{% block extrahead %}
<meta name="author" content="RAS Team">
<meta name="keywords" content="AI, Voice Agent, Realtime, LLM, ASR, TTS, WebSocket">
<meta property="og:title" content="{{ page.title }} - Realtime Agent Studio">
<meta property="og:description" content="构建实时交互音视频智能体的开源工作平台">
<meta property="og:type" content="website">
{% endblock %}


@@ -0,0 +1,118 @@
<!--
Copyright (c) 2016-2025 Martin Donath <martin.donath@squidfunk.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
-->
<!-- Determine classes -->
{% set class = "md-header" %}
{% if "navigation.tabs.sticky" in features %}
{% set class = class ~ " md-header--shadow md-header--lifted" %}
{% elif "navigation.tabs" not in features %}
{% set class = class ~ " md-header--shadow" %}
{% endif %}
<!-- Header -->
<header class="{{ class }}" data-md-component="header">
<nav
class="md-header__inner md-grid"
aria-label="{{ lang.t('header') }}"
>
<!-- Link to home -->
<a
href="{{ config.extra.homepage | d(nav.homepage.url, true) | url }}"
title="{{ config.site_name | e }}"
class="md-header__button md-logo"
aria-label="{{ config.site_name }}"
data-md-component="logo"
>
{% include "partials/logo.html" %}
</a>
<!-- Button to open drawer -->
<label class="md-header__button md-icon" for="__drawer">
{% set icon = config.theme.icon.menu or "material/menu" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</label>
<!-- Header title (data-md-component removed so title stays site_name on scroll) -->
<div class="md-header__title">
<div class="md-header__ellipsis">
<div class="md-header__topic">
<span class="md-ellipsis">
{{ config.site_name }}
</span>
</div>
<div class="md-header__topic" data-md-component="header-topic">
<span class="md-ellipsis">
{{ config.site_name }}
</span>
</div>
</div>
</div>
<!-- Color palette toggle -->
{% if config.theme.palette %}
{% if not config.theme.palette is mapping %}
{% include "partials/palette.html" %}
{% endif %}
{% endif %}
<!-- User preference: color palette -->
{% if not config.theme.palette is mapping %}
{% include "partials/javascripts/palette.html" %}
{% endif %}
<!-- Site language selector -->
{% if config.extra.alternate %}
{% include "partials/alternate.html" %}
{% endif %}
<!-- Button to open search modal -->
{% if "material/search" in config.plugins %}
{% set search = config.plugins["material/search"] | attr("config") %}
<!-- Check if search is actually enabled - see https://t.ly/DT_0V -->
{% if search.enabled %}
<label class="md-header__button md-icon" for="__search">
{% set icon = config.theme.icon.search or "material/magnify" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</label>
<!-- Search interface -->
{% include "partials/search.html" %}
{% endif %}
{% endif %}
<!-- Repository information -->
{% if config.repo_url %}
<div class="md-header__source">
{% include "partials/source.html" %}
</div>
{% endif %}
</nav>
<!-- Navigation tabs (sticky) -->
{% if "navigation.tabs.sticky" in features %}
{% if "navigation.tabs" in features %}
{% include "partials/tabs.html" %}
{% endif %}
{% endif %}
</header>

docs/requirements.txt Normal file

@@ -0,0 +1,4 @@
# Pin MkDocs to 1.x; Material for MkDocs is not yet compatible with MkDocs 2.0
# https://squidfunk.github.io/mkdocs-material/blog/2026/02/18/mkdocs-2.0/
mkdocs>=1.6,<2
mkdocs-material

engine/.dockerignore Normal file

@@ -0,0 +1,51 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
.venv/
venv/
ENV/
.eggs/
*.egg-info/
*.egg
# IDE
.idea/
.vscode/
*.swp
*.swo
# Logs
logs/
*.log
# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
# Environment
.env
.env.local
*.env
# Git
.git/
.gitignore
# Docker
Dockerfile
.dockerignore
# Development files
*.md
tests/
examples/
scripts/
docs/
# Running artifacts
running/


@@ -26,37 +26,28 @@ HISTORY_FINALIZE_DRAIN_TIMEOUT_SEC=1.5
SAMPLE_RATE=16000
# 20ms is recommended for VAD stability and latency.
# 100ms works but usually worsens start-of-speech accuracy.
# WS binary audio frame size validation is derived from SAMPLE_RATE + CHUNK_SIZE_MS.
# Client frame payloads must be a multiple of: SAMPLE_RATE * 2 * (CHUNK_SIZE_MS / 1000).
CHUNK_SIZE_MS=20
# Public default output codec exposed in config.resolved (overridable by runtime metadata).
DEFAULT_CODEC=pcm
MAX_AUDIO_BUFFER_SECONDS=30
# Agent profile selection (optional fallback when CLI args are not used)
# Prefer CLI:
# python -m app.main --agent-config config/agents/default.yaml
# python -m app.main --agent-profile default
# AGENT_CONFIG_PATH=config/agents/default.yaml
# AGENT_PROFILE=default
AGENT_CONFIG_DIR=config/agents
# Optional: provider credentials referenced from YAML, e.g. ${LLM_API_KEY}
# LLM_API_KEY=your_llm_api_key_here
# LLM_API_URL=https://api.openai.com/v1
# TTS_API_KEY=your_tts_api_key_here
# TTS_API_URL=https://api.example.com/v1/audio/speech
# ASR_API_KEY=your_asr_api_key_here
# ASR_API_URL=https://api.example.com/v1/audio/transcriptions
# Local assistant/agent YAML directory. In local mode the runtime resolves:
# ASSISTANT_LOCAL_CONFIG_DIR/<assistant_id>.yaml
ASSISTANT_LOCAL_CONFIG_DIR=config/agents
# Logging
LOG_LEVEL=INFO
# json is better for production/observability; text is easier locally.
# Controls both console and file log serialization/format.
LOG_FORMAT=json
# WebSocket behavior
INACTIVITY_TIMEOUT_SEC=60
HEARTBEAT_INTERVAL_SEC=50
# Public protocol label emitted in session.started/config.resolved payloads.
WS_PROTOCOL_VERSION=v1
# WS_API_KEY=replace_with_shared_secret
WS_REQUIRE_AUTH=false
# CORS / ICE (JSON strings)
CORS_ORIGINS=["http://localhost:3000","http://localhost:8080"]

engine/Dockerfile Normal file

@@ -0,0 +1,33 @@
FROM python:3.12-slim
WORKDIR /app
# Build this image from the project parent directory so both
# engine-v3/engine and fastgpt-python-sdk are available in the context.
# Example:
# docker build -f engine-v3/engine/Dockerfile -t engine-v3 .
# Install system dependencies for audio processing
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libportaudio2 \
libportaudiocpp0 \
portaudio19-dev \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY engine-v3/engine/requirements.txt /tmp/requirements.txt
COPY fastgpt-python-sdk /deps/fastgpt-python-sdk
RUN pip install --no-cache-dir -r /tmp/requirements.txt \
&& pip install --no-cache-dir /deps/fastgpt-python-sdk
# Copy application code
COPY engine-v3/engine /app
# Create necessary directories
RUN mkdir -p /app/logs /app/data/vad
EXPOSE 8001
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001"]


@@ -1,6 +1,6 @@
# py-active-call-cc
# Realtime Agent Studio Engine
Python Active-Call: real-time audio streaming with WebSocket and WebRTC.
This repo contains a Python 3.11+ codebase for building low-latency realtime human-agent interaction pipelines (capture, stream, and process audio) using WebSockets or WebRTC.
This repo contains a Python 3.11+ codebase for building low-latency voice
pipelines (capture, stream, and process audio) using WebRTC and WebSockets.
@@ -14,35 +14,11 @@ It is currently in an early, experimental stage.
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
使用 agent profile(推荐):
```
python -m app.main --agent-profile default
```
使用指定 YAML:
```
python -m app.main --agent-config config/agents/default.yaml
```
Agent 配置路径优先级:
1. `--agent-config`
2. `--agent-profile`(映射到 `config/agents/<profile>.yaml`)
3. `AGENT_CONFIG_PATH`
4. `AGENT_PROFILE`
5. `config/agents/default.yaml`(若存在)
说明:
- Agent 相关配置是严格模式YAML 缺少必须项会直接报错,不会回退到 `.env` 或代码默认值
- 如果要引用环境变量,请在 YAML 显式写 `${ENV_VAR}`
- `siliconflow` 独立 section 已移除;请在 `agent.llm / agent.tts / agent.asr` 内通过 `provider`、`api_key`、`api_url`、`model` 配置
- `agent.tts.provider` 现支持 `dashscope`(Realtime 协议,非 OpenAI-compatible),默认 URL 为 `wss://dashscope.aliyuncs.com/api-ws/v1/realtime`,默认模型为 `qwen3-tts-flash-realtime`
- `agent.tts.dashscope_mode`(兼容旧写法 `agent.tts.mode`)支持 `commit | server_commit`,且仅在 `provider=dashscope` 时生效:
- `commit`:Engine 先按句切分,再逐句提交给 DashScope。
- `server_commit`:Engine 不再逐句切分,由 DashScope 对整段文本自行切分。
- 现在支持在 Agent YAML 中配置 `agent.tools`(列表),用于声明运行时可调用工具。
- 工具配置示例见 `config/agents/tools.yaml`
- 启动阶段不再通过参数加载 Agent YAML
- 会话阶段统一按 `assistant_id` 拉取运行时配置:
- `BACKEND_URL`:从 backend API 获取
- `BACKEND_URL` 为空(或 `BACKEND_MODE=disabled`):从 `ASSISTANT_LOCAL_CONFIG_DIR/<assistant_id>.yaml` 获取
## Backend Integration
@@ -50,6 +26,7 @@ Engine runtime now supports adapter-based backend integration:
- `BACKEND_MODE=auto|http|disabled`
- `BACKEND_URL` + `BACKEND_TIMEOUT_SEC`
- `ASSISTANT_LOCAL_CONFIG_DIR` (default `engine/config/agents`)
- `HISTORY_ENABLED=true|false`
Behavior:
@@ -58,6 +35,16 @@ Behavior:
- `http`: force HTTP backend; falls back to engine-only mode when URL is missing.
- `disabled`: force engine-only mode (no backend calls).
Assistant config source behavior:
- If `BACKEND_URL` is configured and backend mode is enabled, assistant config is loaded from backend API.
- If `BACKEND_URL` is empty (or backend mode is disabled), assistant config is loaded from local YAML.
Local assistant YAML example:
- File path: `engine/config/agents/<assistant_id>.yaml`
- Runtime still requires WebSocket query param `assistant_id`; it must match the local file name.
History write path is now asynchronous and buffered per session:
- `HISTORY_QUEUE_MAX_SIZE`
@@ -84,3 +71,6 @@ python mic_client.py
`/ws` uses a strict `v1` JSON control protocol with binary PCM audio frames.
See `docs/ws_v1_schema.md`.
# Reference
* [active-call](https://github.com/restsend/active-call)


@@ -0,0 +1 @@
"""Adapters package."""


@@ -0,0 +1 @@
"""Control-plane adapters package."""


@@ -0,0 +1,683 @@
"""Backend adapter implementations for engine integration ports."""
from __future__ import annotations
import re
from pathlib import Path
from typing import Any, Dict, List, Optional
import aiohttp
from loguru import logger
from app.config import settings
try:
import yaml
except ImportError: # pragma: no cover - validated when local YAML source is enabled
yaml = None
_ASSISTANT_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$")
def _assistant_error(code: str, assistant_id: str) -> Dict[str, Any]:
return {"__error_code": code, "assistantId": str(assistant_id or "")}
class NullBackendAdapter:
"""No-op adapter for engine-only runtime without backend dependencies."""
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
_ = assistant_id
return None
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
_ = (user_id, assistant_id, source)
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
_ = (call_id, turn_index, speaker, content, start_ms, end_ms, confidence, duration_ms)
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
_ = (call_id, status, duration_seconds)
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
_ = (kb_id, query, n_results)
return []
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
_ = tool_id
return None
class HistoryDisabledBackendAdapter:
"""Adapter wrapper that disables history writes while keeping reads available."""
def __init__(self, delegate: HttpBackendAdapter | NullBackendAdapter):
self._delegate = delegate
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
return await self._delegate.fetch_assistant_config(assistant_id)
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
_ = (user_id, assistant_id, source)
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
_ = (call_id, turn_index, speaker, content, start_ms, end_ms, confidence, duration_ms)
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
_ = (call_id, status, duration_seconds)
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
return await self._delegate.search_knowledge_context(
kb_id=kb_id,
query=query,
n_results=n_results,
)
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
return await self._delegate.fetch_tool_resource(tool_id)
class LocalYamlAssistantConfigAdapter(NullBackendAdapter):
"""Load assistant runtime config from local YAML files."""
def __init__(self, config_dir: str):
self._config_dir = self._resolve_base_dir(config_dir)
@staticmethod
def _resolve_base_dir(config_dir: str) -> Path:
raw = Path(str(config_dir or "").strip() or "engine/config/agents")
if raw.is_absolute():
return raw.resolve()
cwd_candidate = (Path.cwd() / raw).resolve()
if cwd_candidate.exists():
return cwd_candidate
engine_dir = Path(__file__).resolve().parent.parent
engine_candidate = (engine_dir / raw).resolve()
if engine_candidate.exists():
return engine_candidate
parts = raw.parts
if parts and parts[0] == "engine" and len(parts) > 1:
trimmed_candidate = (engine_dir / Path(*parts[1:])).resolve()
if trimmed_candidate.exists():
return trimmed_candidate
return cwd_candidate
def _resolve_config_file(self, assistant_id: str) -> Optional[Path]:
normalized = str(assistant_id or "").strip()
if not _ASSISTANT_ID_PATTERN.match(normalized):
return None
yaml_path = self._config_dir / f"{normalized}.yaml"
yml_path = self._config_dir / f"{normalized}.yml"
if yaml_path.exists():
return yaml_path
if yml_path.exists():
return yml_path
return None
@staticmethod
def _as_str(value: Any) -> Optional[str]:
if value is None:
return None
text = str(value).strip()
return text or None
@classmethod
def _translate_agent_schema(cls, assistant_id: str, payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
"""Translate legacy `agent:` YAML schema into runtime assistant metadata."""
agent = payload.get("agent")
if not isinstance(agent, dict):
return None
runtime: Dict[str, Any] = {
"assistantId": str(assistant_id),
"services": {},
}
llm = agent.get("llm")
if isinstance(llm, dict):
llm_runtime: Dict[str, Any] = {}
if cls._as_str(llm.get("provider")):
llm_runtime["provider"] = cls._as_str(llm.get("provider"))
if cls._as_str(llm.get("model")):
llm_runtime["model"] = cls._as_str(llm.get("model"))
if cls._as_str(llm.get("api_key")):
llm_runtime["apiKey"] = cls._as_str(llm.get("api_key"))
if cls._as_str(llm.get("api_url")):
llm_runtime["baseUrl"] = cls._as_str(llm.get("api_url"))
if cls._as_str(llm.get("app_id")):
llm_runtime["appId"] = cls._as_str(llm.get("app_id"))
if llm_runtime:
runtime["services"]["llm"] = llm_runtime
tts = agent.get("tts")
if isinstance(tts, dict):
tts_runtime: Dict[str, Any] = {}
if cls._as_str(tts.get("provider")):
tts_runtime["provider"] = cls._as_str(tts.get("provider"))
if cls._as_str(tts.get("model")):
tts_runtime["model"] = cls._as_str(tts.get("model"))
if cls._as_str(tts.get("api_key")):
tts_runtime["apiKey"] = cls._as_str(tts.get("api_key"))
if cls._as_str(tts.get("api_url")):
tts_runtime["baseUrl"] = cls._as_str(tts.get("api_url"))
if cls._as_str(tts.get("voice")):
tts_runtime["voice"] = cls._as_str(tts.get("voice"))
if cls._as_str(tts.get("app_id")):
tts_runtime["appId"] = cls._as_str(tts.get("app_id"))
if cls._as_str(tts.get("resource_id")):
tts_runtime["resourceId"] = cls._as_str(tts.get("resource_id"))
if cls._as_str(tts.get("cluster")):
tts_runtime["cluster"] = cls._as_str(tts.get("cluster"))
if cls._as_str(tts.get("uid")):
tts_runtime["uid"] = cls._as_str(tts.get("uid"))
if tts.get("speed") is not None:
tts_runtime["speed"] = tts.get("speed")
dashscope_mode = cls._as_str(tts.get("dashscope_mode")) or cls._as_str(tts.get("mode"))
if dashscope_mode:
tts_runtime["mode"] = dashscope_mode
if tts_runtime:
runtime["services"]["tts"] = tts_runtime
asr = agent.get("asr")
if isinstance(asr, dict):
asr_runtime: Dict[str, Any] = {}
if cls._as_str(asr.get("provider")):
asr_runtime["provider"] = cls._as_str(asr.get("provider"))
if cls._as_str(asr.get("model")):
asr_runtime["model"] = cls._as_str(asr.get("model"))
if cls._as_str(asr.get("api_key")):
asr_runtime["apiKey"] = cls._as_str(asr.get("api_key"))
if cls._as_str(asr.get("api_url")):
asr_runtime["baseUrl"] = cls._as_str(asr.get("api_url"))
if cls._as_str(asr.get("app_id")):
asr_runtime["appId"] = cls._as_str(asr.get("app_id"))
if cls._as_str(asr.get("resource_id")):
asr_runtime["resourceId"] = cls._as_str(asr.get("resource_id"))
if cls._as_str(asr.get("cluster")):
asr_runtime["cluster"] = cls._as_str(asr.get("cluster"))
if cls._as_str(asr.get("uid")):
asr_runtime["uid"] = cls._as_str(asr.get("uid"))
if isinstance(asr.get("request_params"), dict):
asr_runtime["requestParams"] = dict(asr.get("request_params") or {})
if asr.get("enable_interim") is not None:
asr_runtime["enableInterim"] = asr.get("enable_interim")
if asr.get("interim_interval_ms") is not None:
asr_runtime["interimIntervalMs"] = asr.get("interim_interval_ms")
if asr.get("min_audio_ms") is not None:
asr_runtime["minAudioMs"] = asr.get("min_audio_ms")
if asr_runtime:
runtime["services"]["asr"] = asr_runtime
duplex = agent.get("duplex")
if isinstance(duplex, dict):
if cls._as_str(duplex.get("system_prompt")):
runtime["systemPrompt"] = cls._as_str(duplex.get("system_prompt"))
if duplex.get("greeting") is not None:
runtime["greeting"] = duplex.get("greeting")
barge_in = agent.get("barge_in")
if isinstance(barge_in, dict):
runtime["bargeIn"] = {}
if barge_in.get("min_duration_ms") is not None:
runtime["bargeIn"]["minDurationMs"] = barge_in.get("min_duration_ms")
if barge_in.get("silence_tolerance_ms") is not None:
runtime["bargeIn"]["silenceToleranceMs"] = barge_in.get("silence_tolerance_ms")
if not runtime["bargeIn"]:
runtime.pop("bargeIn", None)
if isinstance(agent.get("tools"), list):
runtime["tools"] = agent.get("tools")
if not runtime.get("services"):
runtime.pop("services", None)
return runtime
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
config_file = self._resolve_config_file(assistant_id)
if config_file is None:
return _assistant_error("assistant.not_found", assistant_id)
if yaml is None:
logger.warning(
"Local assistant config requested but PyYAML is unavailable (assistant_id={})",
assistant_id,
)
return _assistant_error("assistant.config_unavailable", assistant_id)
try:
with config_file.open("r", encoding="utf-8") as handle:
payload = yaml.safe_load(handle) or {}
except Exception as exc:
logger.warning(
"Failed to read local assistant config {} (assistant_id={}): {}",
config_file,
assistant_id,
exc,
)
return _assistant_error("assistant.config_unavailable", assistant_id)
if not isinstance(payload, dict):
logger.warning(
"Local assistant config is not an object (assistant_id={}, file={})",
assistant_id,
config_file,
)
return _assistant_error("assistant.config_unavailable", assistant_id)
translated = self._translate_agent_schema(assistant_id, payload)
if translated is not None:
payload = translated
# Accept either backend-like payload shape or a direct assistant metadata object.
if isinstance(payload.get("assistant"), dict) or isinstance(payload.get("sessionStartMetadata"), dict):
normalized_payload = dict(payload)
else:
normalized_payload = {"assistant": dict(payload)}
assistant_obj = normalized_payload.get("assistant")
if isinstance(assistant_obj, dict):
resolved_assistant_id = assistant_obj.get("assistantId") or assistant_obj.get("id") or assistant_id
assistant_obj["assistantId"] = str(resolved_assistant_id)
else:
normalized_payload["assistant"] = {"assistantId": str(assistant_id)}
normalized_payload.setdefault("assistantId", str(assistant_id))
normalized_payload.setdefault("configVersionId", f"local:{config_file.name}")
return normalized_payload
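The tail of `fetch_assistant_config` accepts either a backend-shaped payload (with `assistant` / `sessionStartMetadata`) or a bare metadata object, and stamps in identifiers. A self-contained sketch of that normalization (`normalize` is an illustrative name, not part of the module):

```python
# Sketch of the local-config payload normalization: wrap bare metadata
# under "assistant", resolve the assistant id, and default top-level
# identifiers the same way the adapter does.
def normalize(payload, assistant_id, config_name):
    if isinstance(payload.get("assistant"), dict) or isinstance(payload.get("sessionStartMetadata"), dict):
        normalized = dict(payload)
    else:
        normalized = {"assistant": dict(payload)}
    assistant_obj = normalized.get("assistant")
    if isinstance(assistant_obj, dict):
        resolved = assistant_obj.get("assistantId") or assistant_obj.get("id") or assistant_id
        assistant_obj["assistantId"] = str(resolved)
    else:
        normalized["assistant"] = {"assistantId": str(assistant_id)}
    normalized.setdefault("assistantId", str(assistant_id))
    normalized.setdefault("configVersionId", f"local:{config_name}")
    return normalized

assert normalize({"name": "demo"}, "a1", "default.yaml") == {
    "assistant": {"name": "demo", "assistantId": "a1"},
    "assistantId": "a1",
    "configVersionId": "local:default.yaml",
}
```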
class AssistantConfigSourceAdapter:
"""Route assistant config reads by backend availability without changing other APIs."""
def __init__(
self,
*,
delegate: HttpBackendAdapter | NullBackendAdapter | HistoryDisabledBackendAdapter,
local_delegate: LocalYamlAssistantConfigAdapter,
use_backend_assistant_config: bool,
):
self._delegate = delegate
self._local_delegate = local_delegate
self._use_backend_assistant_config = bool(use_backend_assistant_config)
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
if self._use_backend_assistant_config:
return await self._delegate.fetch_assistant_config(assistant_id)
return await self._local_delegate.fetch_assistant_config(assistant_id)
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
return await self._delegate.create_call_record(
user_id=user_id,
assistant_id=assistant_id,
source=source,
)
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
return await self._delegate.add_transcript(
call_id=call_id,
turn_index=turn_index,
speaker=speaker,
content=content,
start_ms=start_ms,
end_ms=end_ms,
confidence=confidence,
duration_ms=duration_ms,
)
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
return await self._delegate.finalize_call_record(
call_id=call_id,
status=status,
duration_seconds=duration_seconds,
)
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
return await self._delegate.search_knowledge_context(
kb_id=kb_id,
query=query,
n_results=n_results,
)
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
return await self._delegate.fetch_tool_resource(tool_id)
class HttpBackendAdapter:
"""HTTP implementation of backend integration ports."""
def __init__(self, backend_url: str, timeout_sec: int = 10):
base_url = str(backend_url or "").strip().rstrip("/")
if not base_url:
raise ValueError("backend_url is required for HttpBackendAdapter")
self._base_url = base_url
self._timeout_sec = timeout_sec
def _timeout(self) -> aiohttp.ClientTimeout:
return aiohttp.ClientTimeout(total=self._timeout_sec)
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
"""Fetch assistant config payload from backend API.
Expected response shape:
{
"assistant": {...},
"voice": {...} | null
}
"""
url = f"{self._base_url}/api/assistants/{assistant_id}/config"
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.get(url) as resp:
if resp.status == 404:
logger.warning(f"Assistant config not found: {assistant_id}")
return {"__error_code": "assistant.not_found", "assistantId": assistant_id}
resp.raise_for_status()
payload = await resp.json()
if not isinstance(payload, dict):
logger.warning("Assistant config payload is not a dict; ignoring")
return {"__error_code": "assistant.config_unavailable", "assistantId": assistant_id}
return payload
except Exception as exc:
logger.warning(f"Failed to fetch assistant config ({assistant_id}): {exc}")
return {"__error_code": "assistant.config_unavailable", "assistantId": assistant_id}
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
"""Create a call record via backend history API and return call_id."""
url = f"{self._base_url}/api/history"
payload: Dict[str, Any] = {
"user_id": user_id,
"assistant_id": assistant_id,
"source": source,
"status": "connected",
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
resp.raise_for_status()
data = await resp.json()
call_id = str((data or {}).get("id") or "")
return call_id or None
except Exception as exc:
logger.warning(f"Failed to create history call record: {exc}")
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
"""Append a transcript segment to backend history."""
if not call_id:
return False
url = f"{self._base_url}/api/history/{call_id}/transcripts"
payload: Dict[str, Any] = {
"turn_index": turn_index,
"speaker": speaker,
"content": content,
"confidence": confidence,
"start_ms": start_ms,
"end_ms": end_ms,
"duration_ms": duration_ms,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
resp.raise_for_status()
return True
except Exception as exc:
logger.warning(f"Failed to append history transcript (call_id={call_id}, turn={turn_index}): {exc}")
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
"""Finalize a call record with status and duration."""
if not call_id:
return False
url = f"{self._base_url}/api/history/{call_id}"
payload: Dict[str, Any] = {
"status": status,
"duration_seconds": duration_seconds,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.put(url, json=payload) as resp:
resp.raise_for_status()
return True
except Exception as exc:
logger.warning(f"Failed to finalize history call record ({call_id}): {exc}")
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
"""Search backend knowledge base and return retrieval results."""
if not kb_id or not query.strip():
return []
try:
safe_n_results = max(1, int(n_results))
except (TypeError, ValueError):
safe_n_results = 5
url = f"{self._base_url}/api/knowledge/search"
payload: Dict[str, Any] = {
"kb_id": kb_id,
"query": query,
"nResults": safe_n_results,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
if resp.status == 404:
logger.warning(f"Knowledge base not found for retrieval: {kb_id}")
return []
resp.raise_for_status()
data = await resp.json()
if not isinstance(data, dict):
return []
results = data.get("results", [])
if not isinstance(results, list):
return []
return [r for r in results if isinstance(r, dict)]
except Exception as exc:
logger.warning(f"Knowledge search failed (kb_id={kb_id}): {exc}")
return []
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
"""Fetch tool resource configuration from backend API."""
if not tool_id:
return None
url = f"{self._base_url}/api/tools/resources/{tool_id}"
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.get(url) as resp:
if resp.status == 404:
return None
resp.raise_for_status()
data = await resp.json()
return data if isinstance(data, dict) else None
except Exception as exc:
logger.warning(f"Failed to fetch tool resource ({tool_id}): {exc}")
return None
def build_backend_adapter(
*,
backend_url: Optional[str],
backend_mode: str = "auto",
history_enabled: bool = True,
timeout_sec: int = 10,
assistant_local_config_dir: str = "engine/config/agents",
) -> AssistantConfigSourceAdapter:
"""Create backend adapter implementation based on runtime settings."""
mode = str(backend_mode or "auto").strip().lower()
has_url = bool(str(backend_url or "").strip())
base_adapter: HttpBackendAdapter | NullBackendAdapter
using_http_backend = False
if mode in {"disabled", "off", "none", "null", "engine_only", "engine-only"}:
base_adapter = NullBackendAdapter()
elif mode == "http":
if has_url:
base_adapter = HttpBackendAdapter(backend_url=str(backend_url), timeout_sec=timeout_sec)
using_http_backend = True
else:
logger.warning("BACKEND_MODE=http but BACKEND_URL is empty; falling back to NullBackendAdapter")
base_adapter = NullBackendAdapter()
else:
if has_url:
base_adapter = HttpBackendAdapter(backend_url=str(backend_url), timeout_sec=timeout_sec)
using_http_backend = True
else:
base_adapter = NullBackendAdapter()
runtime_adapter: HttpBackendAdapter | NullBackendAdapter | HistoryDisabledBackendAdapter
if not history_enabled:
runtime_adapter = HistoryDisabledBackendAdapter(base_adapter)
else:
runtime_adapter = base_adapter
return AssistantConfigSourceAdapter(
delegate=runtime_adapter,
local_delegate=LocalYamlAssistantConfigAdapter(assistant_local_config_dir),
use_backend_assistant_config=using_http_backend,
)
def build_backend_adapter_from_settings() -> AssistantConfigSourceAdapter:
"""Create backend adapter using current app settings."""
return build_backend_adapter(
backend_url=settings.backend_url,
backend_mode=settings.backend_mode,
history_enabled=settings.history_enabled,
timeout_sec=settings.backend_timeout_sec,
assistant_local_config_dir=settings.assistant_local_config_dir,
)
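The fallback rules in `build_backend_adapter` reduce to a small pure function. This sketch uses plain strings in place of the real adapter classes (`select_adapter` is an illustrative name, not part of the module):

```python
# Minimal sketch of the mode-selection rules: explicit disable wins,
# "http" requires a URL (warning + fallback otherwise), and "auto" or
# any unrecognized mode uses HTTP only when a URL is configured.
def select_adapter(backend_mode, backend_url):
    mode = str(backend_mode or "auto").strip().lower()
    has_url = bool(str(backend_url or "").strip())
    if mode in {"disabled", "off", "none", "null", "engine_only", "engine-only"}:
        return "null"
    if mode == "http":
        # BACKEND_MODE=http with an empty BACKEND_URL falls back to null.
        return "http" if has_url else "null"
    return "http" if has_url else "null"

assert select_adapter("engine-only", "http://localhost:8000") == "null"
assert select_adapter("http", "") == "null"
assert select_adapter("auto", "http://localhost:8000") == "http"
```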

View File

@@ -1,357 +0,0 @@
"""Backend adapter implementations for engine integration ports."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
import aiohttp
from loguru import logger
from app.config import settings
class NullBackendAdapter:
"""No-op adapter for engine-only runtime without backend dependencies."""
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
_ = assistant_id
return None
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
_ = (user_id, assistant_id, source)
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
_ = (call_id, turn_index, speaker, content, start_ms, end_ms, confidence, duration_ms)
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
_ = (call_id, status, duration_seconds)
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
_ = (kb_id, query, n_results)
return []
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
_ = tool_id
return None
class HistoryDisabledBackendAdapter:
"""Adapter wrapper that disables history writes while keeping reads available."""
def __init__(self, delegate: HttpBackendAdapter | NullBackendAdapter):
self._delegate = delegate
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
return await self._delegate.fetch_assistant_config(assistant_id)
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
_ = (user_id, assistant_id, source)
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
_ = (call_id, turn_index, speaker, content, start_ms, end_ms, confidence, duration_ms)
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
_ = (call_id, status, duration_seconds)
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
return await self._delegate.search_knowledge_context(
kb_id=kb_id,
query=query,
n_results=n_results,
)
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
return await self._delegate.fetch_tool_resource(tool_id)
class HttpBackendAdapter:
"""HTTP implementation of backend integration ports."""
def __init__(self, backend_url: str, timeout_sec: int = 10):
base_url = str(backend_url or "").strip().rstrip("/")
if not base_url:
raise ValueError("backend_url is required for HttpBackendAdapter")
self._base_url = base_url
self._timeout_sec = timeout_sec
def _timeout(self) -> aiohttp.ClientTimeout:
return aiohttp.ClientTimeout(total=self._timeout_sec)
async def fetch_assistant_config(self, assistant_id: str) -> Optional[Dict[str, Any]]:
"""Fetch assistant config payload from backend API.
Expected response shape:
{
"assistant": {...},
"voice": {...} | null
}
"""
url = f"{self._base_url}/api/assistants/{assistant_id}/config"
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.get(url) as resp:
if resp.status == 404:
logger.warning(f"Assistant config not found: {assistant_id}")
return None
resp.raise_for_status()
payload = await resp.json()
if not isinstance(payload, dict):
logger.warning("Assistant config payload is not a dict; ignoring")
return None
return payload
except Exception as exc:
logger.warning(f"Failed to fetch assistant config ({assistant_id}): {exc}")
return None
async def create_call_record(
self,
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
"""Create a call record via backend history API and return call_id."""
url = f"{self._base_url}/api/history"
payload: Dict[str, Any] = {
"user_id": user_id,
"assistant_id": assistant_id,
"source": source,
"status": "connected",
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
resp.raise_for_status()
data = await resp.json()
call_id = str((data or {}).get("id") or "")
return call_id or None
except Exception as exc:
logger.warning(f"Failed to create history call record: {exc}")
return None
async def add_transcript(
self,
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
"""Append a transcript segment to backend history."""
if not call_id:
return False
url = f"{self._base_url}/api/history/{call_id}/transcripts"
payload: Dict[str, Any] = {
"turn_index": turn_index,
"speaker": speaker,
"content": content,
"confidence": confidence,
"start_ms": start_ms,
"end_ms": end_ms,
"duration_ms": duration_ms,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
resp.raise_for_status()
return True
except Exception as exc:
logger.warning(f"Failed to append history transcript (call_id={call_id}, turn={turn_index}): {exc}")
return False
async def finalize_call_record(
self,
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
"""Finalize a call record with status and duration."""
if not call_id:
return False
url = f"{self._base_url}/api/history/{call_id}"
payload: Dict[str, Any] = {
"status": status,
"duration_seconds": duration_seconds,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.put(url, json=payload) as resp:
resp.raise_for_status()
return True
except Exception as exc:
logger.warning(f"Failed to finalize history call record ({call_id}): {exc}")
return False
async def search_knowledge_context(
self,
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
"""Search backend knowledge base and return retrieval results."""
if not kb_id or not query.strip():
return []
try:
safe_n_results = max(1, int(n_results))
except (TypeError, ValueError):
safe_n_results = 5
url = f"{self._base_url}/api/knowledge/search"
payload: Dict[str, Any] = {
"kb_id": kb_id,
"query": query,
"nResults": safe_n_results,
}
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.post(url, json=payload) as resp:
if resp.status == 404:
logger.warning(f"Knowledge base not found for retrieval: {kb_id}")
return []
resp.raise_for_status()
data = await resp.json()
if not isinstance(data, dict):
return []
results = data.get("results", [])
if not isinstance(results, list):
return []
return [r for r in results if isinstance(r, dict)]
except Exception as exc:
logger.warning(f"Knowledge search failed (kb_id={kb_id}): {exc}")
return []
async def fetch_tool_resource(self, tool_id: str) -> Optional[Dict[str, Any]]:
"""Fetch tool resource configuration from backend API."""
if not tool_id:
return None
url = f"{self._base_url}/api/tools/resources/{tool_id}"
try:
async with aiohttp.ClientSession(timeout=self._timeout()) as session:
async with session.get(url) as resp:
if resp.status == 404:
return None
resp.raise_for_status()
data = await resp.json()
return data if isinstance(data, dict) else None
except Exception as exc:
logger.warning(f"Failed to fetch tool resource ({tool_id}): {exc}")
return None
def build_backend_adapter(
*,
backend_url: Optional[str],
backend_mode: str = "auto",
history_enabled: bool = True,
timeout_sec: int = 10,
) -> HttpBackendAdapter | NullBackendAdapter | HistoryDisabledBackendAdapter:
"""Create backend adapter implementation based on runtime settings."""
mode = str(backend_mode or "auto").strip().lower()
has_url = bool(str(backend_url or "").strip())
base_adapter: HttpBackendAdapter | NullBackendAdapter
if mode in {"disabled", "off", "none", "null", "engine_only", "engine-only"}:
base_adapter = NullBackendAdapter()
elif mode == "http":
if has_url:
base_adapter = HttpBackendAdapter(backend_url=str(backend_url), timeout_sec=timeout_sec)
else:
logger.warning("BACKEND_MODE=http but BACKEND_URL is empty; falling back to NullBackendAdapter")
base_adapter = NullBackendAdapter()
else:
if has_url:
base_adapter = HttpBackendAdapter(backend_url=str(backend_url), timeout_sec=timeout_sec)
else:
base_adapter = NullBackendAdapter()
if not history_enabled:
return HistoryDisabledBackendAdapter(base_adapter)
return base_adapter
def build_backend_adapter_from_settings() -> HttpBackendAdapter | NullBackendAdapter | HistoryDisabledBackendAdapter:
"""Create backend adapter using current app settings."""
return build_backend_adapter(
backend_url=settings.backend_url,
backend_mode=settings.backend_mode,
history_enabled=settings.history_enabled,
timeout_sec=settings.backend_timeout_sec,
)

View File

@@ -1,87 +0,0 @@
"""Compatibility wrappers around backend adapter implementations."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from app.backend_adapters import build_backend_adapter_from_settings
def _adapter():
return build_backend_adapter_from_settings()
async def fetch_assistant_config(assistant_id: str) -> Optional[Dict[str, Any]]:
"""Fetch assistant config payload from backend adapter."""
return await _adapter().fetch_assistant_config(assistant_id)
async def create_history_call_record(
*,
user_id: int,
assistant_id: Optional[str],
source: str = "debug",
) -> Optional[str]:
"""Create a call record via backend history API and return call_id."""
return await _adapter().create_call_record(
user_id=user_id,
assistant_id=assistant_id,
source=source,
)
async def add_history_transcript(
*,
call_id: str,
turn_index: int,
speaker: str,
content: str,
start_ms: int,
end_ms: int,
confidence: Optional[float] = None,
duration_ms: Optional[int] = None,
) -> bool:
"""Append a transcript segment to backend history."""
return await _adapter().add_transcript(
call_id=call_id,
turn_index=turn_index,
speaker=speaker,
content=content,
start_ms=start_ms,
end_ms=end_ms,
confidence=confidence,
duration_ms=duration_ms,
)
async def finalize_history_call_record(
*,
call_id: str,
status: str,
duration_seconds: int,
) -> bool:
"""Finalize a call record with status and duration."""
return await _adapter().finalize_call_record(
call_id=call_id,
status=status,
duration_seconds=duration_seconds,
)
async def search_knowledge_context(
*,
kb_id: str,
query: str,
n_results: int = 5,
) -> List[Dict[str, Any]]:
"""Search backend knowledge base and return retrieval results."""
return await _adapter().search_knowledge_context(
kb_id=kb_id,
query=query,
n_results=n_results,
)
async def fetch_tool_resource(tool_id: str) -> Optional[Dict[str, Any]]:
"""Fetch tool resource configuration from backend API."""
return await _adapter().fetch_tool_resource(tool_id)

View File

@@ -1,371 +1,31 @@
"""Configuration management using Pydantic settings and agent YAML profiles."""
"""Configuration management using Pydantic settings."""
import json
import os
import re
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, List, Optional
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict
try:
import yaml
except ImportError: # pragma: no cover - validated when agent YAML is used
yaml = None
try:
from dotenv import load_dotenv
except ImportError: # pragma: no cover - optional dependency in some runtimes
load_dotenv = None
def _prime_process_env_from_dotenv() -> None:
"""Load .env into process env early."""
if load_dotenv is None:
return
cwd_env = Path.cwd() / ".env"
engine_env = Path(__file__).resolve().parent.parent / ".env"
load_dotenv(dotenv_path=cwd_env, override=False)
if engine_env != cwd_env:
load_dotenv(dotenv_path=engine_env, override=False)
_ENV_REF_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")
_DEFAULT_AGENT_CONFIG_DIR = "config/agents"
_DEFAULT_AGENT_CONFIG_FILE = "default.yaml"
_AGENT_SECTION_KEY_MAP: Dict[str, Dict[str, str]] = {
"vad": {
"type": "vad_type",
"model_path": "vad_model_path",
"threshold": "vad_threshold",
"min_speech_duration_ms": "vad_min_speech_duration_ms",
"eou_threshold_ms": "vad_eou_threshold_ms",
},
"llm": {
"provider": "llm_provider",
"model": "llm_model",
"temperature": "llm_temperature",
"api_key": "llm_api_key",
"api_url": "llm_api_url",
},
"tts": {
"provider": "tts_provider",
"api_key": "tts_api_key",
"api_url": "tts_api_url",
"model": "tts_model",
"voice": "tts_voice",
"dashscope_mode": "tts_mode",
"mode": "tts_mode",
"speed": "tts_speed",
},
"asr": {
"provider": "asr_provider",
"api_key": "asr_api_key",
"api_url": "asr_api_url",
"model": "asr_model",
"interim_interval_ms": "asr_interim_interval_ms",
"min_audio_ms": "asr_min_audio_ms",
"start_min_speech_ms": "asr_start_min_speech_ms",
"pre_speech_ms": "asr_pre_speech_ms",
"final_tail_ms": "asr_final_tail_ms",
},
"duplex": {
"enabled": "duplex_enabled",
"greeting": "duplex_greeting",
"system_prompt": "duplex_system_prompt",
"opener_audio_file": "duplex_opener_audio_file",
},
"barge_in": {
"min_duration_ms": "barge_in_min_duration_ms",
"silence_tolerance_ms": "barge_in_silence_tolerance_ms",
},
}
_AGENT_SETTING_KEYS = {
"vad_type",
"vad_model_path",
"vad_threshold",
"vad_min_speech_duration_ms",
"vad_eou_threshold_ms",
"llm_provider",
"llm_api_key",
"llm_api_url",
"llm_model",
"llm_temperature",
"tts_provider",
"tts_api_key",
"tts_api_url",
"tts_model",
"tts_voice",
"tts_mode",
"tts_speed",
"asr_provider",
"asr_api_key",
"asr_api_url",
"asr_model",
"asr_interim_interval_ms",
"asr_min_audio_ms",
"asr_start_min_speech_ms",
"asr_pre_speech_ms",
"asr_final_tail_ms",
"duplex_enabled",
"duplex_greeting",
"duplex_system_prompt",
"duplex_opener_audio_file",
"barge_in_min_duration_ms",
"barge_in_silence_tolerance_ms",
"tools",
}
_BASE_REQUIRED_AGENT_SETTING_KEYS = {
"vad_type",
"vad_model_path",
"vad_threshold",
"vad_min_speech_duration_ms",
"vad_eou_threshold_ms",
"llm_provider",
"llm_model",
"llm_temperature",
"tts_provider",
"tts_voice",
"tts_speed",
"asr_provider",
"asr_interim_interval_ms",
"asr_min_audio_ms",
"asr_start_min_speech_ms",
"asr_pre_speech_ms",
"asr_final_tail_ms",
"duplex_enabled",
"duplex_system_prompt",
"barge_in_min_duration_ms",
"barge_in_silence_tolerance_ms",
}
_OPENAI_COMPATIBLE_LLM_PROVIDERS = {"openai_compatible", "openai-compatible", "siliconflow"}
_OPENAI_COMPATIBLE_TTS_PROVIDERS = {"openai_compatible", "openai-compatible", "siliconflow"}
_DASHSCOPE_TTS_PROVIDERS = {"dashscope"}
_OPENAI_COMPATIBLE_ASR_PROVIDERS = {"openai_compatible", "openai-compatible", "siliconflow"}
def _normalized_provider(overrides: Dict[str, Any], key: str, default: str) -> str:
return str(overrides.get(key) or default).strip().lower()
def _is_blank(value: Any) -> bool:
return value is None or (isinstance(value, str) and not value.strip())
@dataclass(frozen=True)
class AgentConfigSelection:
"""Resolved agent config location and how it was selected."""
path: Optional[Path]
source: str
def _parse_cli_agent_args(argv: List[str]) -> Tuple[Optional[str], Optional[str]]:
"""Parse only agent-related CLI flags from argv."""
config_path: Optional[str] = None
profile: Optional[str] = None
i = 0
while i < len(argv):
arg = argv[i]
if arg.startswith("--agent-config="):
config_path = arg.split("=", 1)[1].strip() or None
elif arg == "--agent-config" and i + 1 < len(argv):
config_path = argv[i + 1].strip() or None
i += 1
elif arg.startswith("--agent-profile="):
profile = arg.split("=", 1)[1].strip() or None
elif arg == "--agent-profile" and i + 1 < len(argv):
profile = argv[i + 1].strip() or None
i += 1
i += 1
return config_path, profile
def _agent_config_dir() -> Path:
base_dir = Path(os.getenv("AGENT_CONFIG_DIR", _DEFAULT_AGENT_CONFIG_DIR))
if not base_dir.is_absolute():
base_dir = Path.cwd() / base_dir
return base_dir.resolve()
def _resolve_agent_selection(
agent_config_path: Optional[str] = None,
agent_profile: Optional[str] = None,
argv: Optional[List[str]] = None,
) -> AgentConfigSelection:
cli_path, cli_profile = _parse_cli_agent_args(list(argv if argv is not None else sys.argv[1:]))
path_value = agent_config_path or cli_path or os.getenv("AGENT_CONFIG_PATH")
profile_value = agent_profile or cli_profile or os.getenv("AGENT_PROFILE")
source = "none"
candidate: Optional[Path] = None
if path_value:
source = "cli_path" if (agent_config_path or cli_path) else "env_path"
candidate = Path(path_value)
elif profile_value:
source = "cli_profile" if (agent_profile or cli_profile) else "env_profile"
candidate = _agent_config_dir() / f"{profile_value}.yaml"
else:
fallback = _agent_config_dir() / _DEFAULT_AGENT_CONFIG_FILE
if fallback.exists():
source = "default"
candidate = fallback
if candidate is None:
raise ValueError(
"Agent YAML config is required. Provide --agent-config/--agent-profile "
"or create config/agents/default.yaml."
)
if not candidate.is_absolute():
candidate = (Path.cwd() / candidate).resolve()
else:
candidate = candidate.resolve()
if not candidate.exists():
raise ValueError(f"Agent config file not found ({source}): {candidate}")
if not candidate.is_file():
raise ValueError(f"Agent config path is not a file: {candidate}")
return AgentConfigSelection(path=candidate, source=source)
def _resolve_env_refs(value: Any) -> Any:
"""Resolve ${ENV_VAR} / ${ENV_VAR:default} placeholders recursively."""
if isinstance(value, dict):
return {k: _resolve_env_refs(v) for k, v in value.items()}
if isinstance(value, list):
return [_resolve_env_refs(item) for item in value]
if not isinstance(value, str) or "${" not in value:
return value
def _replace(match: re.Match[str]) -> str:
env_key = match.group(1)
default_value = match.group(2)
env_value = os.getenv(env_key)
if env_value is None:
if default_value is None:
raise ValueError(f"Missing environment variable referenced in agent YAML: {env_key}")
return default_value
return env_value
return _ENV_REF_PATTERN.sub(_replace, value)
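`_resolve_env_refs` substitutes `${VAR}` and `${VAR:default}` placeholders, raising when a variable is unset and no default is given. A self-contained sketch of the string-level substitution (`resolve_refs` is an illustrative standalone version; the demo variable names are hypothetical):

```python
import os
import re

# Same placeholder grammar as the agent YAML loader: ${VAR} or ${VAR:default}.
_ENV_REF_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")

def resolve_refs(value: str) -> str:
    def _replace(match: re.Match) -> str:
        env_value = os.getenv(match.group(1))
        if env_value is None:
            if match.group(2) is None:
                raise ValueError(f"Missing environment variable: {match.group(1)}")
            return match.group(2)
        return env_value
    return _ENV_REF_PATTERN.sub(_replace, value)

os.environ["DEMO_API_KEY"] = "sk-demo"
os.environ.pop("DEMO_UNSET", None)
assert resolve_refs("key=${DEMO_API_KEY}") == "key=sk-demo"
assert resolve_refs("${DEMO_UNSET:fallback}") == "fallback"
```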
def _normalize_agent_overrides(raw: Dict[str, Any]) -> Dict[str, Any]:
"""Normalize YAML into flat Settings fields."""
normalized: Dict[str, Any] = {}
for key, value in raw.items():
if key == "siliconflow":
raise ValueError(
"Section 'siliconflow' is no longer supported. "
"Move provider-specific fields into agent.llm / agent.asr / agent.tts."
)
if key == "tools":
if not isinstance(value, list):
raise ValueError("Agent config key 'tools' must be a list")
normalized["tools"] = value
continue
section_map = _AGENT_SECTION_KEY_MAP.get(key)
if section_map is None:
normalized[key] = value
continue
if not isinstance(value, dict):
raise ValueError(f"Agent config section '{key}' must be a mapping")
for nested_key, nested_value in value.items():
mapped_key = section_map.get(nested_key)
if mapped_key is None:
raise ValueError(f"Unknown key in '{key}' section: '{nested_key}'")
normalized[mapped_key] = nested_value
unknown_keys = sorted(set(normalized) - _AGENT_SETTING_KEYS)
if unknown_keys:
raise ValueError(
"Unknown agent config keys in YAML: "
+ ", ".join(unknown_keys)
)
return normalized
def _missing_required_keys(overrides: Dict[str, Any]) -> List[str]:
missing = set(_BASE_REQUIRED_AGENT_SETTING_KEYS - set(overrides))
string_required = {
"vad_type",
"vad_model_path",
"llm_provider",
"llm_model",
"tts_provider",
"tts_voice",
"asr_provider",
"duplex_system_prompt",
}
for key in string_required:
if key in overrides and _is_blank(overrides.get(key)):
missing.add(key)
llm_provider = _normalized_provider(overrides, "llm_provider", "openai")
if llm_provider in _OPENAI_COMPATIBLE_LLM_PROVIDERS or llm_provider == "openai":
if "llm_api_key" not in overrides or _is_blank(overrides.get("llm_api_key")):
missing.add("llm_api_key")
tts_provider = _normalized_provider(overrides, "tts_provider", "openai_compatible")
if tts_provider in _OPENAI_COMPATIBLE_TTS_PROVIDERS:
if "tts_api_key" not in overrides or _is_blank(overrides.get("tts_api_key")):
missing.add("tts_api_key")
if "tts_api_url" not in overrides or _is_blank(overrides.get("tts_api_url")):
missing.add("tts_api_url")
if "tts_model" not in overrides or _is_blank(overrides.get("tts_model")):
missing.add("tts_model")
elif tts_provider in _DASHSCOPE_TTS_PROVIDERS:
if "tts_api_key" not in overrides or _is_blank(overrides.get("tts_api_key")):
missing.add("tts_api_key")
asr_provider = _normalized_provider(overrides, "asr_provider", "openai_compatible")
if asr_provider in _OPENAI_COMPATIBLE_ASR_PROVIDERS:
if "asr_api_key" not in overrides or _is_blank(overrides.get("asr_api_key")):
missing.add("asr_api_key")
if "asr_api_url" not in overrides or _is_blank(overrides.get("asr_api_url")):
missing.add("asr_api_url")
if "asr_model" not in overrides or _is_blank(overrides.get("asr_model")):
missing.add("asr_model")
return sorted(missing)
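The provider-conditional checks above make different credentials mandatory per TTS provider: openai-compatible providers need key, URL, and model, while DashScope needs only an API key. A minimal sketch of that branch (`missing_tts_keys` is an illustrative name, not part of the module):

```python
# Sketch of the TTS branch in _missing_required_keys: required keys
# depend on the normalized provider name.
def _is_blank(value):
    return value is None or (isinstance(value, str) and not value.strip())

def missing_tts_keys(overrides):
    provider = str(overrides.get("tts_provider") or "openai_compatible").strip().lower()
    if provider in {"openai_compatible", "openai-compatible", "siliconflow"}:
        required = ("tts_api_key", "tts_api_url", "tts_model")
    elif provider == "dashscope":
        required = ("tts_api_key",)
    else:
        required = ()
    return [key for key in required if _is_blank(overrides.get(key))]

assert missing_tts_keys({"tts_provider": "dashscope"}) == ["tts_api_key"]
assert missing_tts_keys({"tts_provider": "siliconflow", "tts_api_key": "sk"}) == ["tts_api_url", "tts_model"]
```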
def _load_agent_overrides(selection: AgentConfigSelection) -> Dict[str, Any]:
if yaml is None:
raise RuntimeError(
"PyYAML is required for agent YAML configuration. Install with: pip install pyyaml"
)
with selection.path.open("r", encoding="utf-8") as file:
raw = yaml.safe_load(file) or {}
if not isinstance(raw, dict):
raise ValueError(f"Agent config must be a YAML mapping: {selection.path}")
if "agent" in raw:
agent_value = raw["agent"]
if not isinstance(agent_value, dict):
raise ValueError("The 'agent' key in YAML must be a mapping")
raw = agent_value
resolved = _resolve_env_refs(raw)
overrides = _normalize_agent_overrides(resolved)
missing_required = _missing_required_keys(overrides)
if missing_required:
raise ValueError(
f"Missing required agent settings in YAML ({selection.path}): "
+ ", ".join(missing_required)
)
overrides["agent_config_path"] = str(selection.path)
overrides["agent_config_source"] = selection.source
return overrides
def load_settings(
    agent_config_path: Optional[str] = None,
    agent_profile: Optional[str] = None,
    argv: Optional[List[str]] = None,
) -> "Settings":
    """Load settings from .env and optional agent YAML."""
    selection = _resolve_agent_selection(
        agent_config_path=agent_config_path,
        agent_profile=agent_profile,
        argv=argv,
    )
    agent_overrides = _load_agent_overrides(selection)
    return Settings(**agent_overrides)
_prime_process_env_from_dotenv()
class Settings(BaseSettings):
@@ -402,9 +62,8 @@ class Settings(BaseSettings):
    # LLM Configuration
    llm_provider: str = Field(
        default="openai",
-       description="LLM provider (openai, openai_compatible, siliconflow)"
+       description="LLM provider (openai, openai_compatible, siliconflow, fastgpt)"
    )
    llm_api_key: Optional[str] = Field(default=None, description="LLM provider API key")
    llm_api_url: Optional[str] = Field(default=None, description="LLM provider API base URL")
    llm_model: str = Field(default="gpt-4o-mini", description="LLM model name")
    llm_temperature: float = Field(default=0.7, description="LLM temperature for response generation")
@@ -412,12 +71,15 @@ class Settings(BaseSettings):
    # TTS Configuration
    tts_provider: str = Field(
        default="openai_compatible",
-       description="TTS provider (edge, openai_compatible, siliconflow, dashscope)"
+       description="TTS provider (openai_compatible, siliconflow, dashscope, volcengine)"
    )
    tts_api_key: Optional[str] = Field(default=None, description="TTS provider API key")
    tts_api_url: Optional[str] = Field(default=None, description="TTS provider API URL")
    tts_model: Optional[str] = Field(default=None, description="TTS model name")
    tts_voice: str = Field(default="anna", description="TTS voice name")
+   tts_app_id: Optional[str] = Field(default=None, description="Provider-specific TTS app ID")
+   tts_resource_id: Optional[str] = Field(default=None, description="Provider-specific TTS resource ID")
+   tts_cluster: Optional[str] = Field(default=None, description="Provider-specific TTS cluster")
+   tts_uid: Optional[str] = Field(default=None, description="Provider-specific TTS user ID")
    tts_mode: str = Field(
        default="commit",
        description="DashScope-only TTS mode (commit, server_commit). Ignored for non-dashscope providers."
@@ -427,11 +89,19 @@ class Settings(BaseSettings):
    # ASR Configuration
    asr_provider: str = Field(
        default="openai_compatible",
-       description="ASR provider (openai_compatible, buffered, siliconflow)"
+       description="ASR provider (openai_compatible, buffered, siliconflow, dashscope, volcengine)"
    )
    asr_api_key: Optional[str] = Field(default=None, description="ASR provider API key")
    asr_api_url: Optional[str] = Field(default=None, description="ASR provider API URL")
    asr_model: Optional[str] = Field(default=None, description="ASR model name")
+   asr_app_id: Optional[str] = Field(default=None, description="Provider-specific ASR app ID")
+   asr_resource_id: Optional[str] = Field(default=None, description="Provider-specific ASR resource ID")
+   asr_cluster: Optional[str] = Field(default=None, description="Provider-specific ASR cluster")
+   asr_uid: Optional[str] = Field(default=None, description="Provider-specific ASR user ID")
+   asr_request_params_json: Optional[str] = Field(
+       default=None,
+       description="Provider-specific ASR request params as JSON string"
+   )
    asr_enable_interim: bool = Field(default=False, description="Enable interim transcripts for offline ASR")
    asr_interim_interval_ms: int = Field(default=500, description="Interval for interim ASR results in ms")
    asr_min_audio_ms: int = Field(default=300, description="Minimum audio duration before first ASR result")
    asr_start_min_speech_ms: int = Field(
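Since `asr_request_params_json` arrives as a JSON string, it has to be decoded before being handed to the provider. A minimal decoding helper (illustrative, not the engine's actual code) might be:

```python
import json
from typing import Optional

# Illustrative helper: decode an asr_request_params_json-style setting
# into a dict of provider request params, rejecting non-object JSON.
def parse_request_params(raw: Optional[str]) -> dict:
    if raw is None or not raw.strip():
        # Unset or blank means "no provider-specific overrides".
        return {}
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("asr_request_params_json must encode a JSON object")
    return params
```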
@@ -493,8 +163,10 @@ class Settings(BaseSettings):
    inactivity_timeout_sec: int = Field(default=60, description="Close connection after no message from client (seconds)")
    heartbeat_interval_sec: int = Field(default=50, description="Send heartBeat event to client every N seconds")
    ws_protocol_version: str = Field(default="v1", description="Public WS protocol version")
    ws_api_key: Optional[str] = Field(default=None, description="Optional API key required for WS hello auth")
    ws_require_auth: bool = Field(default=False, description="Require auth in hello message even when ws_api_key is not set")
+   ws_emit_config_resolved: bool = Field(
+       default=False,
+       description="Emit config.resolved after session.started (debug/internal use; disabled for public SaaS by default)",
+   )
    # Backend bridge configuration (for call/transcript persistence)
    backend_mode: str = Field(
@@ -503,6 +175,10 @@ class Settings(BaseSettings):
    )
    backend_url: Optional[str] = Field(default=None, description="Backend API base URL (e.g. http://localhost:8787)")
    backend_timeout_sec: int = Field(default=10, description="Backend API request timeout in seconds")
+   assistant_local_config_dir: str = Field(
+       default="engine/config/agents",
+       description="Directory containing local assistant runtime YAML files"
+   )
    history_enabled: bool = Field(default=True, description="Enable history write bridge")
    history_default_user_id: int = Field(default=1, description="Fallback user_id for history records")
    history_queue_max_size: int = Field(default=256, description="Max buffered transcript writes per session")
@@ -513,10 +189,6 @@ class Settings(BaseSettings):
        description="Max wait before finalizing history when queue is still draining"
    )

-   # Agent YAML metadata
-   agent_config_path: Optional[str] = Field(default=None, description="Resolved agent YAML path")
-   agent_config_source: str = Field(default="none", description="How the agent YAML was selected")

    @property
    def chunk_size_bytes(self) -> int:
        """Calculate chunk size in bytes based on sample rate and duration."""
@@ -541,7 +213,7 @@ class Settings(BaseSettings):
# Global settings instance
-settings = load_settings()
+settings = Settings()


def get_settings() -> Settings:

View File

@@ -20,16 +20,28 @@ except ImportError:
    logger.warning("aiortc not available - WebRTC endpoint will be disabled")

from app.config import settings
-from app.backend_adapters import build_backend_adapter_from_settings
-from core.transports import SocketTransport, WebRtcTransport, BaseTransport
-from core.session import Session
+from adapters.control_plane.backend import build_backend_adapter_from_settings
+from runtime.transports import SocketTransport, WebRtcTransport, BaseTransport
+from runtime.session.manager import Session
from processors.tracks import Resampled16kTrack
-from core.events import get_event_bus, reset_event_bus
+from runtime.events import get_event_bus, reset_event_bus

# Check interval for heartbeat/timeout (seconds)
_HEARTBEAT_CHECK_INTERVAL_SEC = 5


def _inactivity_deadline(
    *,
    last_received_at: float,
    inactivity_timeout_sec: int,
    pending_client_tool_deadline: Optional[float] = None,
) -> float:
    deadline = float(last_received_at) + float(inactivity_timeout_sec)
    if pending_client_tool_deadline is not None:
        deadline = max(deadline, float(pending_client_tool_deadline))
    return deadline


async def heartbeat_and_timeout_task(
    transport: BaseTransport,
    session: Session,
@@ -48,8 +60,22 @@ async def heartbeat_and_timeout_task(
        if transport.is_closed:
            break
        now = time.monotonic()
-       if now - last_received_at[0] > inactivity_timeout_sec:
-           logger.info(f"Session {session_id}: {inactivity_timeout_sec}s no message, closing")
+       pending_client_tool_deadline = session.pipeline.pending_client_tool_deadline()
+       idle_deadline = _inactivity_deadline(
+           last_received_at=last_received_at[0],
+           inactivity_timeout_sec=inactivity_timeout_sec,
+           pending_client_tool_deadline=pending_client_tool_deadline,
+       )
+       if now > idle_deadline:
+           if pending_client_tool_deadline is not None and pending_client_tool_deadline >= (
+               last_received_at[0] + inactivity_timeout_sec
+           ):
+               logger.info(
+                   "Session {}: no message before pending client tool deadline, closing",
+                   session_id,
+               )
+           else:
+               logger.info(f"Session {session_id}: {inactivity_timeout_sec}s no message, closing")
            await session.cleanup()
            break
        if now - last_heartbeat_at[0] >= heartbeat_interval_sec:
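The effect of `_inactivity_deadline` is that a pending client tool call can only extend, never shorten, the idle deadline. Restated as a self-contained function:

```python
from typing import Optional

# Self-contained restatement of _inactivity_deadline from above: the
# pending client tool deadline extends, but never shortens, the
# ordinary idle deadline.
def inactivity_deadline(last_received_at: float,
                        inactivity_timeout_sec: int,
                        pending_client_tool_deadline: Optional[float] = None) -> float:
    deadline = float(last_received_at) + float(inactivity_timeout_sec)
    if pending_client_tool_deadline is not None:
        deadline = max(deadline, float(pending_client_tool_deadline))
    return deadline
```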
@@ -76,22 +102,39 @@ app.add_middleware(
# Active sessions storage
active_sessions: Dict[str, Session] = {}

-backend_gateway = build_backend_adapter_from_settings()
+control_plane_gateway = build_backend_adapter_from_settings()
# Configure logging
logger.remove()
-logger.add(
-    "./logs/active_call_{time}.log",
-    rotation="1 day",
-    retention="7 days",
-    level=settings.log_level,
-    format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}"
-)
-logger.add(
-    lambda msg: print(msg, end=""),
-    level=settings.log_level,
-    format="{time:HH:mm:ss} | {level: <8} | {message}"
-)
+_log_format = str(settings.log_format or "text").strip().lower()
+if _log_format == "json":
+    logger.add(
+        "./logs/active_call_{time}.log",
+        rotation="1 day",
+        retention="7 days",
+        level=settings.log_level,
+        serialize=True,
+        format="{message}",
+    )
+    logger.add(
+        lambda msg: print(msg, end=""),
+        level=settings.log_level,
+        serialize=True,
+        format="{message}",
+    )
+else:
+    logger.add(
+        "./logs/active_call_{time}.log",
+        rotation="1 day",
+        retention="7 days",
+        level=settings.log_level,
+        format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
+    )
+    logger.add(
+        lambda msg: print(msg, end=""),
+        level=settings.log_level,
+        format="{time:HH:mm:ss} | {level: <8} | {message}",
+    )
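The engine uses loguru's `serialize=True` for the JSON branch. The same text-vs-json switch can be sketched with stdlib logging (this only mirrors the idea, not loguru's API):

```python
import json
import logging

# Stdlib-logging sketch of the text-vs-json log format switch above.
def make_formatter(log_format: str) -> logging.Formatter:
    if (log_format or "text").strip().lower() == "json":
        class JsonFormatter(logging.Formatter):
            def format(self, record: logging.LogRecord) -> str:
                # One JSON object per log line, machine-parseable.
                return json.dumps({"level": record.levelname,
                                   "message": record.getMessage()})
        return JsonFormatter()
    # Human-readable text format, akin to the text branch above.
    return logging.Formatter("%(asctime)s | %(levelname)-8s | %(message)s")
```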
@app.get("/health")
@@ -163,13 +206,19 @@ async def websocket_endpoint(websocket: WebSocket):
    """
    await websocket.accept()
    session_id = str(uuid.uuid4())
+   assistant_id = str(websocket.query_params.get("assistant_id") or "").strip() or None

    # Create transport and session
    transport = SocketTransport(websocket)
-   session = Session(session_id, transport, backend_gateway=backend_gateway)
+   session = Session(
+       session_id,
+       transport,
+       control_plane_gateway=control_plane_gateway,
+       assistant_id=assistant_id,
+   )
    active_sessions[session_id] = session

-   logger.info(f"WebSocket connection established: {session_id}")
+   logger.info(f"WebSocket connection established: {session_id} assistant_id={assistant_id or '-'}")

    last_received_at: List[float] = [time.monotonic()]
    last_heartbeat_at: List[float] = [0.0]
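The query-parameter handling above normalizes empty or whitespace-only `assistant_id` values to `None`. Isolated for clarity:

```python
from typing import Optional

# Isolated form of the assistant_id normalization used in both the
# WebSocket and WebRTC endpoints above: blank values become None.
def normalize_assistant_id(raw: object) -> Optional[str]:
    value = str(raw or "").strip()
    return value or None
```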
@@ -239,16 +288,22 @@ async def webrtc_endpoint(websocket: WebSocket):
        return

    await websocket.accept()
    session_id = str(uuid.uuid4())
+   assistant_id = str(websocket.query_params.get("assistant_id") or "").strip() or None

    # Create WebRTC peer connection
    pc = RTCPeerConnection()

    # Create transport and session
    transport = WebRtcTransport(websocket, pc)
-   session = Session(session_id, transport, backend_gateway=backend_gateway)
+   session = Session(
+       session_id,
+       transport,
+       control_plane_gateway=control_plane_gateway,
+       assistant_id=assistant_id,
+   )
    active_sessions[session_id] = session

-   logger.info(f"WebRTC connection established: {session_id}")
+   logger.info(f"WebRTC connection established: {session_id} assistant_id={assistant_id or '-'}")

    last_received_at: List[float] = [time.monotonic()]
    last_heartbeat_at: List[float] = [0.0]
@@ -359,12 +414,10 @@ async def startup_event():
    logger.info(f"Server: {settings.host}:{settings.port}")
    logger.info(f"Sample rate: {settings.sample_rate} Hz")
    logger.info(f"VAD model: {settings.vad_model_path}")
-   if settings.agent_config_path:
-       logger.info(
-           f"Agent config loaded ({settings.agent_config_source}): {settings.agent_config_path}"
-       )
-   else:
-       logger.info("Agent config: none (using .env/default agent values)")
+   logger.info(
+       "Assistant runtime config source: backend when BACKEND_URL is set, "
+       "otherwise local YAML by assistant_id from ASSISTANT_LOCAL_CONFIG_DIR"
+   )
@app.on_event("shutdown")

View File

@@ -0,0 +1,47 @@
# Agent behavior configuration for DashScope realtime ASR/TTS.
# This file only controls agent-side behavior (VAD/LLM/TTS/ASR providers).
# Infra/server/network settings should stay in .env.
agent:
  vad:
    type: silero
    model_path: data/vad/silero_vad.onnx
    threshold: 0.5
    min_speech_duration_ms: 100
    eou_threshold_ms: 800
  llm:
    # provider: openai | openai_compatible | siliconflow
    provider: openai_compatible
    model: deepseek-v3
    temperature: 0.7
    api_key: your_llm_api_key
    api_url: https://api.qnaigc.com/v1
  tts:
    provider: dashscope
    api_key: your_tts_api_key
    api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    model: qwen3-tts-flash-realtime
    voice: Cherry
    dashscope_mode: commit
    speed: 1.0
  asr:
    provider: dashscope
    api_key: your_asr_api_key
    api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    model: qwen3-asr-flash-realtime
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160
    pre_speech_ms: 240
    final_tail_ms: 120
  duplex:
    enabled: true
  system_prompt: 你是一个人工智能助手你用简答语句回答避免使用标点符号和emoji。
  barge_in:
    min_duration_ms: 200
    silence_tolerance_ms: 60

View File

@@ -0,0 +1,47 @@
# Agent behavior configuration for DashScope realtime ASR/TTS.
# This file only controls agent-side behavior (VAD/LLM/TTS/ASR providers).
# Infra/server/network settings should stay in .env.
agent:
  vad:
    type: silero
    model_path: data/vad/silero_vad.onnx
    threshold: 0.5
    min_speech_duration_ms: 100
    eou_threshold_ms: 800
  llm:
    # provider: openai | openai_compatible | siliconflow
    provider: openai_compatible
    model: deepseek-v3
    temperature: 0.7
    api_key: your_llm_api_key
    api_url: https://api.qnaigc.com/v1
  tts:
    provider: dashscope
    api_key: your_tts_api_key
    api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    model: qwen3-tts-flash-realtime
    voice: Cherry
    dashscope_mode: commit
    speed: 1.0
  asr:
    provider: dashscope
    api_key: your_asr_api_key
    api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    model: qwen3-asr-flash-realtime
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160
    pre_speech_ms: 240
    final_tail_ms: 120
  duplex:
    enabled: true
  system_prompt: 你是一个人工智能助手你用简答语句回答避免使用标点符号和emoji。
  barge_in:
    min_duration_ms: 200
    silence_tolerance_ms: 60

View File

@@ -11,7 +11,7 @@ agent:
    eou_threshold_ms: 800
  llm:
-   # provider: openai | openai_compatible | siliconflow
+   # provider: openai | openai_compatible | siliconflow | fastgpt
    provider: openai_compatible
    model: deepseek-v3
    temperature: 0.7
@@ -21,12 +21,17 @@ agent:
    api_url: https://api.qnaigc.com/v1
  tts:
-   # provider: edge | openai_compatible | siliconflow | dashscope
+   # provider: openai_compatible | siliconflow | dashscope | volcengine
    # dashscope defaults (if omitted):
    #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    #   model: qwen3-tts-flash-realtime
    #   dashscope_mode: commit (engine splits) | server_commit (dashscope splits)
    #   note: dashscope_mode/mode is ONLY used when provider=dashscope.
+   # volcengine defaults (if omitted):
+   #   api_url: https://openspeech.bytedance.com/api/v3/tts/unidirectional
+   #   resource_id: seed-tts-2.0
+   #   app_id: your volcengine app key
+   #   api_key: your volcengine access key
    provider: openai_compatible
    api_key: your_tts_api_key
    api_url: https://api.siliconflow.cn/v1/audio/speech
@@ -35,11 +40,26 @@ agent:
    speed: 1.0
  asr:
-   # provider: buffered | openai_compatible | siliconflow
+   # provider: buffered | openai_compatible | siliconflow | dashscope | volcengine
+   # dashscope defaults (if omitted):
+   #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
+   #   model: qwen3-asr-flash-realtime
+   #   note: dashscope uses streaming ASR mode (chunk-by-chunk).
+   # volcengine defaults (if omitted):
+   #   api_url: wss://openspeech.bytedance.com/api/v3/sauc/bigmodel
+   #   model: bigmodel
+   #   resource_id: volc.bigasr.sauc.duration
+   #   app_id: your volcengine app key
+   #   api_key: your volcengine access key
+   #   request_params:
+   #     end_window_size: 800
+   #     force_to_speech_time: 1000
+   #   note: volcengine uses streaming ASR mode (chunk-by-chunk).
    provider: openai_compatible
    api_key: your_asr_api_key
    api_url: https://api.siliconflow.cn/v1/audio/transcriptions
    model: FunAudioLLM/SenseVoiceSmall
    enable_interim: false
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160
@@ -53,3 +73,4 @@ agent:
  barge_in:
    min_duration_ms: 200
    silence_tolerance_ms: 60
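The provider lists in the comments above can be enforced with a small check. The function below is illustrative (the engine's real validation lives elsewhere), using the provider sets documented in this config:

```python
# Provider sets taken from the comments in the agent YAML above.
VALID_TTS_PROVIDERS = {"openai_compatible", "siliconflow", "dashscope", "volcengine"}
VALID_ASR_PROVIDERS = {"buffered", "openai_compatible", "siliconflow", "dashscope", "volcengine"}

def check_providers(agent_cfg: dict) -> None:
    # Reject configs whose tts/asr providers are not in the documented sets.
    tts = agent_cfg.get("tts", {}).get("provider")
    asr = agent_cfg.get("asr", {}).get("provider")
    if tts not in VALID_TTS_PROVIDERS:
        raise ValueError(f"unsupported tts provider: {tts}")
    if asr not in VALID_ASR_PROVIDERS:
        raise ValueError(f"unsupported asr provider: {asr}")
```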

View File

@@ -18,12 +18,17 @@ agent:
    api_url: https://api.qnaigc.com/v1
  tts:
-   # provider: edge | openai_compatible | siliconflow | dashscope
+   # provider: openai_compatible | siliconflow | dashscope | volcengine
    # dashscope defaults (if omitted):
    #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    #   model: qwen3-tts-flash-realtime
    #   dashscope_mode: commit (engine splits) | server_commit (dashscope splits)
    #   note: dashscope_mode/mode is ONLY used when provider=dashscope.
+   # volcengine defaults (if omitted):
+   #   api_url: https://openspeech.bytedance.com/api/v3/tts/unidirectional
+   #   resource_id: seed-tts-2.0
+   #   app_id: your volcengine app key
+   #   api_key: your volcengine access key
    provider: openai_compatible
    api_key: your_tts_api_key
    api_url: https://api.siliconflow.cn/v1/audio/speech
@@ -32,11 +37,26 @@ agent:
    speed: 1.0
  asr:
-   # provider: buffered | openai_compatible | siliconflow
+   # provider: buffered | openai_compatible | siliconflow | dashscope | volcengine
+   # dashscope defaults (if omitted):
+   #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
+   #   model: qwen3-asr-flash-realtime
+   #   note: dashscope uses streaming ASR mode (chunk-by-chunk).
+   # volcengine defaults (if omitted):
+   #   api_url: wss://openspeech.bytedance.com/api/v3/sauc/bigmodel
+   #   model: bigmodel
+   #   resource_id: volc.bigasr.sauc.duration
+   #   app_id: your volcengine app key
+   #   api_key: your volcengine access key
+   #   request_params:
+   #     end_window_size: 800
+   #     force_to_speech_time: 1000
+   #   note: volcengine uses streaming ASR mode (chunk-by-chunk).
    provider: openai_compatible
    api_key: your_asr_api_key
    api_url: https://api.siliconflow.cn/v1/audio/transcriptions
    model: FunAudioLLM/SenseVoiceSmall
    enable_interim: false
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160

View File

@@ -0,0 +1,68 @@
# Agent behavior configuration (safe to edit per profile)
# This file only controls agent-side behavior (VAD/LLM/TTS/ASR providers).
# Infra/server/network settings should stay in .env.
agent:
  vad:
    type: silero
    model_path: data/vad/silero_vad.onnx
    threshold: 0.5
    min_speech_duration_ms: 100
    eou_threshold_ms: 800
  llm:
    # provider: openai | openai_compatible | siliconflow
    provider: openai_compatible
    model: deepseek-v3
    temperature: 0.7
    # Required: no fallback. You can still reference env explicitly.
    api_key: your_llm_api_key
    # Optional for OpenAI-compatible endpoints:
    api_url: https://api.qnaigc.com/v1
  tts:
    # provider: edge | openai_compatible | siliconflow | dashscope
    # dashscope defaults (if omitted):
    #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    #   model: qwen3-tts-flash-realtime
    #   dashscope_mode: commit (engine splits) | server_commit (dashscope splits)
    #   note: dashscope_mode/mode is ONLY used when provider=dashscope.
    # volcengine defaults (if omitted):
    provider: volcengine
    api_url: https://openspeech.bytedance.com/api/v3/tts/unidirectional
    resource_id: seed-tts-2.0
    app_id: your_tts_app_id
    api_key: your_tts_api_key
    speed: 1.1
    voice: zh_female_vv_uranus_bigtts
  asr:
    provider: volcengine
    api_url: wss://openspeech.bytedance.com/api/v3/sauc/bigmodel
    app_id: your_asr_app_id
    api_key: your_asr_api_key
    resource_id: volc.bigasr.sauc.duration
    uid: caller-1
    model: bigmodel
    request_params:
      end_window_size: 800
      force_to_speech_time: 1000
      enable_punc: true
      enable_itn: false
      enable_ddc: false
      show_utterance: true
      result_type: single
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160
    pre_speech_ms: 240
    final_tail_ms: 120
  duplex:
    enabled: true
  system_prompt: 你是一个人工智能助手你用简答语句回答避免使用标点符号和emoji。
  barge_in:
    min_duration_ms: 200
    silence_tolerance_ms: 60
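A plausible way the engine could apply `request_params` overrides on top of provider defaults (the default values below are taken from the comments in this config; the merge helper itself is an assumption, not the engine's actual code):

```python
# Illustrative defaults, mirroring the request_params documented above.
DEFAULT_REQUEST_PARAMS = {"end_window_size": 800, "enable_punc": True}

def merged_request_params(overrides: dict) -> dict:
    # YAML-supplied request_params win over engine defaults.
    params = dict(DEFAULT_REQUEST_PARAMS)
    params.update(overrides or {})
    return params
```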

View File

@@ -0,0 +1,67 @@
# Agent behavior configuration (safe to edit per profile)
# This file only controls agent-side behavior (VAD/LLM/TTS/ASR providers).
# Infra/server/network settings should stay in .env.
agent:
  vad:
    type: silero
    model_path: data/vad/silero_vad.onnx
    threshold: 0.5
    min_speech_duration_ms: 100
    eou_threshold_ms: 800
  llm:
    # provider: openai | openai_compatible | siliconflow
    provider: openai_compatible
    model: deepseek-v3
    temperature: 0.7
    # Required: no fallback. You can still reference env explicitly.
    api_key: your_llm_api_key
    # Optional for OpenAI-compatible endpoints:
    api_url: https://api.qnaigc.com/v1
  tts:
    # provider: edge | openai_compatible | siliconflow | dashscope
    # dashscope defaults (if omitted):
    #   api_url: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    #   model: qwen3-tts-flash-realtime
    #   dashscope_mode: commit (engine splits) | server_commit (dashscope splits)
    #   note: dashscope_mode/mode is ONLY used when provider=dashscope.
    # volcengine defaults (if omitted):
    provider: volcengine
    api_url: https://openspeech.bytedance.com/api/v3/tts/unidirectional
    resource_id: seed-tts-2.0
    app_id: your_tts_app_id
    api_key: your_tts_api_key
    speed: 1.1
    voice: zh_female_vv_uranus_bigtts
  asr:
    provider: volcengine
    api_url: wss://openspeech.bytedance.com/api/v3/sauc/bigmodel
    app_id: your_asr_app_id
    api_key: your_asr_api_key
    resource_id: volc.bigasr.sauc.duration
    uid: caller-1
    model: bigmodel
    request_params:
      end_window_size: 800
      force_to_speech_time: 1000
      enable_punc: true
      enable_itn: false
      enable_ddc: false
      show_utterance: true
      result_type: single
    interim_interval_ms: 500
    min_audio_ms: 300
    start_min_speech_ms: 160
    pre_speech_ms: 240
    final_tail_ms: 120
  duplex:
    enabled: true
  system_prompt: 你是一个人工智能助手你用简答语句回答避免使用标点符号和emoji。
  barge_in:
    min_duration_ms: 200
    silence_tolerance_ms: 60

View File

@@ -1,20 +0,0 @@
"""Core Components Package"""

from core.events import EventBus, get_event_bus
from core.transports import BaseTransport, SocketTransport, WebRtcTransport
from core.session import Session
from core.conversation import ConversationManager, ConversationState, ConversationTurn
from core.duplex_pipeline import DuplexPipeline

__all__ = [
    "EventBus",
    "get_event_bus",
    "BaseTransport",
    "SocketTransport",
    "WebRtcTransport",
    "Session",
    "ConversationManager",
    "ConversationState",
    "ConversationTurn",
    "DuplexPipeline",
]

View File

@@ -1,17 +0,0 @@
"""Port interfaces for engine-side integration boundaries."""

from core.ports.backend import (
    AssistantConfigProvider,
    BackendGateway,
    HistoryWriter,
    KnowledgeSearcher,
    ToolResourceResolver,
)

__all__ = [
    "AssistantConfigProvider",
    "BackendGateway",
    "HistoryWriter",
    "KnowledgeSearcher",
    "ToolResourceResolver",
]

Some files were not shown because too many files have changed in this diff.