Files
AI-VideoAssistant/docs/content/customization/asr.md
Xin Wang e11c3abb9e Implement DashScope ASR provider and enhance ASR service architecture
- Added DashScope ASR service implementation for real-time streaming.
- Updated ASR provider logic to support DashScope alongside existing providers.
- Enhanced runtime metadata resolution to include DashScope as a valid ASR provider.
- Modified configuration files and documentation to reflect the addition of DashScope.
- Introduced tests to validate DashScope integration and ASR service behavior.
- Refactored ASR service factory to accommodate new provider options and modes.
2026-03-06 11:44:39 +08:00

30 lines
966 B
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 语音识别
语音识别ASR负责将用户音频实时转写为文本供对话引擎理解。
## 模式
- `offline`:引擎本地缓冲音频后触发识别(适用于 OpenAI-compatible / SiliconFlow
- `streaming`:音频分片实时发送到服务端,服务端持续返回转写事件(适用于 DashScope Realtime ASR
## 配置项
| 配置项 | 说明 |
|---|---|
| ASR 引擎 | 选择语音识别服务提供商 |
| 模型 | 识别模型名称 |
| 语言 | 中文/英文/多语言 |
| 热词 | 提升特定词汇识别准确率 |
| 标点与规范化 | 是否自动补全标点、文本规范化 |
## 建议
- 客服场景建议开启热词并维护业务词表
- 多语言场景建议按会话入口显式指定语言
- 对延迟敏感场景优先选择流式识别模型
- 当前支持提供商:`openai_compatible``siliconflow``dashscope``buffered`(回退)
## 相关文档
- [语音配置总览](voices.md)