- Introduced DashScope as a new ASR model in the database initialization.
- Updated ASRModel schema to include vendor information.
- Enhanced ASR router to support DashScope-specific functionality, including connection testing and preview capabilities.
- Modified frontend components to accommodate DashScope as a selectable vendor with appropriate default settings.
- Added tests to validate DashScope ASR model creation, updates, and connectivity.
- Updated backend API to handle DashScope-specific base URLs and vendor normalization.
- Removed redundant entries from the quick start section for clarity.
- Maintained the inclusion of essential topics to ensure comprehensive guidance for users.
- Revised site name and description for clarity and detail.
- Updated navigation structure to better reflect the organization of content.
- Improved changelog entries for better readability and consistency.
- Migrated assistant configuration and prompt guidelines to new documentation paths.
- Enhanced core concepts section to clarify the roles and capabilities of assistants and engines.
- Streamlined workflow documentation to provide clearer guidance on configuration and usage.
- Introduced new YAML configuration files for DashScope, detailing agent behavior settings for VAD, LLM, TTS, and ASR.
- Configured parameters including model paths, API keys, and service URLs for real-time processing.
- Ensured compatibility with existing agent-side behavior management while providing specific settings for DashScope integration.
- Introduced Volcengine as a new provider for both TTS and ASR services.
- Updated configuration files to include Volcengine-specific parameters such as app_id, resource_id, and uid.
- Enhanced the ASR service to support streaming mode with Volcengine's API.
- Modified existing tests to validate the integration of Volcengine services.
- Updated documentation to reflect the addition of Volcengine as a supported provider for TTS and ASR.
- Refactored service factory to accommodate Volcengine alongside existing providers.
- Corrected phrasing in the introduction of RAS as an open-source alternative.
- Added new documentation sections for voice AI and voice agents.
- Enhanced the flowchart for assistant components to include detailed configurations.
- Updated terminology for engine types to clarify distinctions between Pipeline and Realtime engines.
- Introduced a new section on user utterance endpoints (EoU) to explain detection mechanisms and configurations.
- Introduced `asr_interim_enabled` field in the Assistant model to control interim ASR results.
- Updated AssistantBase and AssistantUpdate schemas to include the new field.
- Modified the database schema to add the `asr_interim_enabled` column.
- Enhanced runtime metadata to reflect interim ASR settings.
- Updated API endpoints and tests to validate the new functionality.
- Adjusted documentation to include details about interim ASR results configuration.
- Added DashScope ASR service implementation for real-time streaming.
- Updated ASR provider logic to support DashScope alongside existing providers.
- Enhanced runtime metadata resolution to include DashScope as a valid ASR provider.
- Modified configuration files and documentation to reflect the addition of DashScope.
- Introduced tests to validate DashScope integration and ASR service behavior.
- Refactored ASR service factory to accommodate new provider options and modes.
- Expanded package inclusion in `pyproject.toml` to support new modules.
- Introduced new `adapters` and `protocol` packages for better organization.
- Added backend adapter implementations for control plane integration.
- Updated main application imports to reflect new package structure.
- Removed deprecated core components and adjusted documentation accordingly.
- Enhanced architecture documentation to clarify the new runtime and integration layers.
- Removed the backend client compatibility wrapper and associated methods to streamline backend integration.
- Updated session management to utilize control plane gateways and runtime configuration providers.
- Adjusted TTS service implementations to remove the EdgeTTS service and simplify service dependencies.
- Enhanced documentation to reflect changes in backend integration and service architecture.
- Updated configuration files to remove deprecated TTS provider options and clarify available settings.
- Updated .env.example to clarify audio frame size validation and default codec settings.
- Refactored logging setup in main.py to support JSON serialization based on log format configuration.
- Improved session.py to dynamically compute audio frame bytes and include protocol version in session events.
- Added tests to validate session start events and audio frame handling based on chunk size settings.
- Removed duplicate entry for Thumbs.db in .gitignore to streamline ignored files.
- Added a new audio example file: three_utterances_simple.wav to the audio_examples directory.
- Removed legacy agent profile settings from the .env.example and README, streamlining the configuration process.
- Introduced a new local YAML configuration adapter for assistant settings, allowing for easier management of assistant profiles.
- Updated backend integration documentation to clarify the behavior of assistant config sourcing based on backend URL settings.
- Adjusted various service implementations to directly utilize API keys from the new configuration structure.
- Enhanced test coverage for the new local YAML adapter and its integration with backend services.
- Added new sections for open-source and commercial projects to enhance resource visibility.
- Included links to various relevant projects, expanding the list of resources available for users.
- Added new sections for open-source and commercial projects to enhance resource visibility.
- Included links to various relevant projects, expanding the list of resources available for users.
- Created a new README file for the changelog to outline version history.
- Updated the roadmap documentation to replace the contribution section with a list of reference projects, enhancing resource visibility.
- Included a new JavaScript file for Mermaid configuration to ensure consistent diagram sizing across documentation.
- Enhanced architecture documentation to reflect the updated pipeline engine structure, including VAD, ASR, TD, LLM, and TTS components.
- Updated various sections to clarify the integration of external services and tools within the architecture.
- Improved styling for Mermaid diagrams to enhance visual consistency and usability.
- Restructured the navigation in mkdocs.yml to improve organization, introducing subcategories for assistant creation and component libraries.
- Added new documentation for workflow configuration options, detailing setup and best practices.
- Introduced new sections for voice recognition and generation, outlining configuration items and recommendations for optimal performance.
- Adjusted indentation in mkdocs.yml to enhance readability and maintain consistency in the navigation hierarchy.
- Ensured that sections for "功能定制" and "数据分析" are clearly organized under their respective categories.
- Added a new section explaining the two layers of assistant configuration: database persistence and session-level overrides.
- Included a table listing fields that are stored in the database and those that can be overridden during a session.
- Provided code examples demonstrating the merging of baseline configuration with session overrides for clarity.
- Included a new item in the roadmap for telephone integration, specifying automatic call handling and batch calling capabilities.
- Updated the existing SDK support section to reflect the addition of this feature.
- Updated the WebSocket API reference to improve clarity by removing unnecessary headings and emphasizing message types.
- Revised the index.md to specify 'chroma' as the knowledge base, enhancing the overview of the platform's architecture.
- Introduced `output.audio.played` message type for client acknowledgment of audio playback completion.
- Updated `DuplexPipeline` to track client playback state and handle playback completion events.
- Enhanced session handling to route `output.audio.played` messages to the pipeline.
- Revised API documentation to include details about the new message type and its fields.
- Updated schema documentation to reflect the addition of `output.audio.played` in the message flow.
- Updated Dockerfile for the API to include build tools for C++11 required for native extensions.
- Revised requirements.txt to upgrade several dependencies, including FastAPI and SQLAlchemy.
- Expanded docker-compose.yml to add MinIO service for S3-compatible storage and improved health checks for backend and engine services.
- Enhanced README.md in the Docker directory to provide detailed service descriptions and quick start instructions.
- Updated mkdocs.yml to reflect new navigation structure and added deployment overview documentation.
- Introduced new Dockerfiles for the engine and web services, including development configurations for hot reloading.
- Revised mkdocs.yml to reflect the new site name and description, enhancing clarity for users.
- Added a changelog.md to document important changes and updates for the project.
- Introduced a roadmap.md to outline development plans and progress for future releases.
- Expanded index.md with a comprehensive overview of the platform, including core features and installation instructions.
- Enhanced concepts documentation with detailed explanations of assistants, engines, and their configurations.
- Updated configuration documentation to provide clear guidance on environment setup and service configurations.
- Added extra JavaScript for improved user experience in the documentation site.
- Added React Query for managing API calls related to assistants and voices.
- Introduced `useAssistantsQuery` and `useVoicesQuery` hooks for fetching data.
- Implemented mutations for creating, updating, and deleting voices using React Query.
- Integrated a global `QueryClient` for managing query states and configurations.
- Refactored components to utilize the new query hooks, improving data handling and performance.
- Added a Zustand store for managing debug preferences, including WebSocket URL and audio settings.
- Replaced the "通过控制台" and "通过 API" entries in the quickstart section with "资源库配置" for improved clarity.
- Updated the API reference link in index.md to direct users to the main quickstart page instead of the outdated API usage example.
- Revised the introduction in index.md to emphasize the need for resource configuration before creating an AI assistant.
- Added a new section detailing the configuration process for ASR, LLM, and TTS resources.
- Updated the quickstart guide to reflect the new resource management steps and included troubleshooting tips for common issues.
- Removed the outdated API guide as it has been integrated into the new resource configuration workflow.
- Revised the description of the Realtime Agent Studio (RAS) to emphasize its foundation on large voice models, enhancing clarity on the platform's capabilities.
- Added pymdownx.tasklist extension to mkdocs.yml for enhanced task management.
- Revised the roadmap section in index.md to include additional completed and in-progress tasks, improving project tracking and visibility.
- Added pymdownx.emoji extension to mkdocs.yml for emoji rendering.
- Updated index.md to include a new dashboard image and revised descriptions for clarity.
- Expanded the features section with detailed descriptions of tools and testing capabilities.
- Introduced a roadmap section outlining completed, in-progress, and to-do features for better project visibility.
- Changed site name and description in mkdocs.yml.
- Revised content in index.md to provide a comprehensive overview of RAS features and capabilities.
- Updated API reference and error documentation to replace AI Video Assistant with RAS.
- Modified deployment and getting started guides to align with the new branding.
- Enhanced quickstart instructions to specify RAS service requirements.
- Added `promptType` and `voiceText` properties to `DebugTextPromptDialogState`.
- Updated state management for text prompt dialogs to handle voice prompts.
- Modified dialog activation logic to play voice prompts when applicable.
- Adjusted UI to reflect the type of prompt being displayed (text or voice).
- Ensured proper handling of prompt closure messages based on prompt type.