
# Speech Recognition (ASR Model) API
The speech recognition API manages the configuration and invocation of speech recognition (ASR) models.
## Basic Information
| Item | Value |
|------|-------|
| Base URL | `/api/v1/asr` |
| Authentication | Bearer Token (reserved for future use) |
---
## Data Models
### ASRModel
```typescript
interface ASRModel {
  id: string;                    // Unique model identifier (8-character UUID)
  user_id: number;               // ID of the owning user
  name: string;                  // Display name of the model
  vendor: string;                // Vendor: "OpenAI" | "SiliconFlow" | "Paraformer" | etc.
  language: string;              // Recognition language: "zh" | "en" | "Multi-lingual"
  base_url: string;              // API base URL
  api_key: string;               // API key
  model_name?: string;           // Model name, e.g. "whisper-1" | "paraformer-v2"
  hotwords?: string[];           // Hotword list
  enable_punctuation: boolean;   // Whether punctuation is enabled
  enable_normalization: boolean; // Whether text normalization is enabled
  enabled: boolean;              // Whether the model is enabled
  created_at: string;
}
```
---
## API Endpoints
### 1. List ASR Models
```http
GET /api/v1/asr
```
**Query Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| language | string | No | - | Filter by language: "zh" \| "en" \| "Multi-lingual" |
| enabled | boolean | No | - | Filter by enabled status |
| page | int | No | 1 | Page number |
| limit | int | No | 50 | Items per page |
**Response:**
```json
{
  "total": 3,
  "page": 1,
  "limit": 50,
  "list": [
    {
      "id": "abc12345",
      "user_id": 1,
      "name": "Whisper Multilingual Recognition",
      "vendor": "OpenAI",
      "language": "Multi-lingual",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-***",
      "model_name": "whisper-1",
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "def67890",
      "user_id": 1,
      "name": "SenseVoice Chinese Recognition",
      "vendor": "SiliconFlow",
      "language": "zh",
      "base_url": "https://api.siliconflow.cn/v1",
      "api_key": "sf-***",
      "model_name": "paraformer-v2",
      "hotwords": ["小助手", "帮我"],
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```
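The filters and pagination parameters above map directly onto a query string. The following is a minimal sketch of how a client might assemble the list URL and derive the page count from a response; the helper names (`build_list_url`, `total_pages`), the host `http://localhost:8000`, and the lowercase `true`/`false` encoding of booleans are assumptions for illustration, not part of the documented API.

```python
from typing import Optional
from urllib.parse import urlencode

API_PREFIX = "/api/v1/asr"  # Base URL documented for this API


def build_list_url(
    host: str,
    language: Optional[str] = None,
    enabled: Optional[bool] = None,
    page: int = 1,
    limit: int = 50,
) -> str:
    """Return the GET URL for listing ASR models, omitting unset filters."""
    params = {"page": page, "limit": limit}
    if language is not None:
        params["language"] = language
    if enabled is not None:
        # Assumption: the server accepts lowercase "true"/"false" for booleans.
        params["enabled"] = "true" if enabled else "false"
    return f"{host.rstrip('/')}{API_PREFIX}?{urlencode(params)}"


def total_pages(total: int, limit: int) -> int:
    """Number of pages implied by the `total` and `limit` response fields."""
    return (total + limit - 1) // limit if limit > 0 else 0


# Hypothetical host; replace with the actual deployment address.
print(build_list_url("http://localhost:8000", language="zh", enabled=True))
```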
---
### 2. Get a Single ASR Model
```http
GET /api/v1/asr/{id}
```
**Path Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| id | string | Model ID |
**Response:**
```json
{
  "id": "abc12345",
  "user_id": 1,
  "name": "Whisper Multilingual Recognition",
  "vendor": "OpenAI",
  "language": "Multi-lingual",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-***",
  "model_name": "whisper-1",
  "hotwords": [],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}
```
---
### 3. Create an ASR Model
```http
POST /api/v1/asr
```
**Request Body:**
```json
{
  "name": "SenseVoice Chinese Recognition",
  "vendor": "SiliconFlow",
  "language": "zh",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sk-your-api-key",
  "model_name": "paraformer-v2",
  "hotwords": ["小助手", "帮我"],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true
}
```
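Before sending a create request, a client may want to check that the five required fields (`name`, `vendor`, `language`, `base_url`, `api_key`) are present and see which documented defaults apply to the optional flags. The sketch below illustrates that check; `missing_required` and `with_defaults` are hypothetical helper names, and the server remains the authority on validation.

```python
import json
from typing import Any, Dict, List

# Required fields and default values as documented for the create endpoint.
REQUIRED_FIELDS = ("name", "vendor", "language", "base_url", "api_key")
DEFAULTS = {"enable_punctuation": True, "enable_normalization": True, "enabled": True}


def missing_required(payload: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not payload.get(field)]


def with_defaults(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the payload on the documented defaults for the optional flags."""
    return {**DEFAULTS, **payload}


payload = {
    "name": "SenseVoice Chinese Recognition",
    "vendor": "SiliconFlow",
    "language": "zh",
    "base_url": "https://api.siliconflow.cn/v1",
    "api_key": "sk-your-api-key",
    "hotwords": ["小助手", "帮我"],
}
if not missing_required(payload):
    # ensure_ascii=False keeps the Chinese hotwords readable in the body.
    print(json.dumps(with_defaults(payload), ensure_ascii=False, indent=2))
```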
**Field Descriptions:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Display name of the model |
| vendor | string | Yes | Vendor: "OpenAI" / "SiliconFlow" / "Paraformer" |
| language | string | Yes | Language: "zh" / "en" / "Multi-lingual" |
| base_url | string | Yes | API base URL |
| api_key | string | Yes | API key |
| model_name | string | No | Model name |
| hotwords | string[] | No | Hotword list; improves recognition accuracy |
| enable_punctuation | boolean | No | Whether to output punctuation; defaults to true |
| enable_normalization | boolean | No | Whether to normalize text; defaults to true |
| enabled | boolean | No | Whether the model is enabled; defaults to true |
| id | string | No | Model ID to assign; auto-generated by default |
---
### 4. Update an ASR Model
```http
PUT /api/v1/asr/{id}
```
**Request Body:** (partial update)
```json
{
  "name": "Whisper-1 Optimized",
  "language": "zh",
  "enable_punctuation": true,
  "hotwords": ["新词1", "新词2"]
}
```
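Because the update is partial, only the fields being changed need to appear in the body. A minimal sketch of building such a body follows; `build_update_body` is a hypothetical helper, the updatable field set is taken from the `ASRModelUpdate` schema in this document, and treating `None` as "leave unchanged" is an assumption about how a client might use it.

```python
import json
from typing import Any

# Fields accepted by ASRModelUpdate in the schema section of this document.
UPDATABLE_FIELDS = {
    "name", "language", "base_url", "api_key", "model_name",
    "hotwords", "enable_punctuation", "enable_normalization", "enabled",
}


def build_update_body(**changes: Any) -> str:
    """Serialize only the fields that should change.

    Assumption: a value of None means "leave this field unchanged".
    """
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"Fields not updatable: {sorted(unknown)}")
    body = {key: value for key, value in changes.items() if value is not None}
    return json.dumps(body, ensure_ascii=False)


print(build_update_body(name="Whisper-1 Optimized", language="zh", enabled=None))
```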
---
### 5. Delete an ASR Model
```http
DELETE /api/v1/asr/{id}
```
**Response:**
```json
{
  "message": "Deleted successfully"
}
```
---
### 6. Test an ASR Model
```http
POST /api/v1/asr/{id}/test
```
**Request Body:**
```json
{
  "audio_url": "https://example.com/test-audio.wav"
}
```
Alternatively, send Base64-encoded audio data:
```json
{
  "audio_data": "UklGRi..."
}
```
**Response (success):**
```json
{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95,
  "latency_ms": 500
}
```
**Response (failure):**
```json
{
  "success": false,
  "error": "HTTP Error: 401 - Unauthorized"
}
```
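The `audio_data` variant expects the raw audio bytes encoded as Base64, and the response distinguishes success from failure through the `success` flag. The sketch below shows one way a client could prepare the body and summarize either response shape; `build_test_body` and `summarize_test_result` are hypothetical names, and the byte string stands in for a real audio file.

```python
import base64
from typing import Any, Dict


def build_test_body(audio_bytes: bytes) -> Dict[str, str]:
    """Wrap raw audio bytes as the Base64 `audio_data` request body."""
    return {"audio_data": base64.b64encode(audio_bytes).decode("ascii")}


def summarize_test_result(result: Dict[str, Any]) -> str:
    """Turn a test response (success or failure shape) into one line of text."""
    if result.get("success"):
        return f"OK ({result.get('latency_ms')} ms): {result.get('transcript')}"
    return f"FAILED: {result.get('error', 'unknown error')}"


# Placeholder bytes; in practice these would be read from a .wav file.
body = build_test_body(b"RIFF$\x00\x00\x00WAVE")
print(body["audio_data"])
print(summarize_test_result({"success": False, "error": "HTTP Error: 401 - Unauthorized"}))
```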
---
### 7. Transcribe Audio
```http
POST /api/v1/asr/{id}/transcribe
```
**Query Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| audio_url | string | No* | URL of the audio file |
| audio_data | string | No* | Base64-encoded audio data |
| hotwords | string[] | No | Hotword list |

*At least one of `audio_url` and `audio_data` must be provided.
**Response:**
```json
{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95
}
```
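Since the transcribe inputs are documented as query parameters, a client has to enforce the "at least one audio source" rule and encode the hotword list itself. The following is a sketch only: `build_transcribe_url` and the host `http://localhost:8000` are hypothetical, and sending `hotwords` as repeated keys is an assumption about how the server parses list parameters.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_transcribe_url(
    host: str,
    model_id: str,
    audio_url: Optional[str] = None,
    audio_data: Optional[str] = None,
    hotwords: Optional[List[str]] = None,
) -> str:
    """Build the POST URL for /transcribe, enforcing the audio-source rule."""
    if not audio_url and not audio_data:
        raise ValueError("Provide at least one of audio_url or audio_data")
    params = []
    if audio_url:
        params.append(("audio_url", audio_url))
    if audio_data:
        params.append(("audio_data", audio_data))
    # Assumption: list parameters are sent as repeated keys (hotwords=a&hotwords=b).
    params.extend(("hotwords", word) for word in hotwords or [])
    return f"{host.rstrip('/')}/api/v1/asr/{model_id}/transcribe?{urlencode(params)}"


url = build_transcribe_url(
    "http://localhost:8000",  # hypothetical host
    "abc12345",
    audio_url="https://example.com/test-audio.wav",
    hotwords=["小助手", "帮我"],
)
print(parse_qs(urlsplit(url).query))  # decoded parameters, one list per key
```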
---
## Schema Definitions
```python
from enum import Enum
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime


class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


class ASRModelBase(BaseModel):
    name: str
    vendor: str
    language: str  # "zh" | "en" | "Multi-lingual"
    base_url: str
    api_key: str
    model_name: Optional[str] = None
    hotwords: List[str] = []
    enable_punctuation: bool = True
    enable_normalization: bool = True
    enabled: bool = True


class ASRModelCreate(ASRModelBase):
    id: Optional[str] = None


class ASRModelUpdate(BaseModel):
    name: Optional[str] = None
    language: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    model_name: Optional[str] = None
    hotwords: Optional[List[str]] = None
    enable_punctuation: Optional[bool] = None
    enable_normalization: Optional[bool] = None
    enabled: Optional[bool] = None


class ASRModelOut(ASRModelBase):
    id: str
    user_id: int
    created_at: datetime

    class Config:
        from_attributes = True


class ASRTestRequest(BaseModel):
    audio_url: Optional[str] = None
    audio_data: Optional[str] = None  # base64 encoded


class ASRTestResponse(BaseModel):
    success: bool
    transcript: Optional[str] = None
    language: Optional[str] = None
    confidence: Optional[float] = None
    latency_ms: Optional[int] = None
    error: Optional[str] = None
```
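The schema defines `ASRLanguage` but types the `language` field as a plain `str`, so an unsupported value would not be rejected by the type alone. A caller that wants stricter checking could validate against the enum first, as in this standard-library sketch; `parse_language` is a hypothetical helper and not part of the project's code.

```python
from enum import Enum


class ASRLanguage(str, Enum):
    """Same members as the ASRLanguage enum in the schema above."""
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


def parse_language(value: str) -> ASRLanguage:
    """Convert a raw string to ASRLanguage, raising on unsupported values."""
    try:
        return ASRLanguage(value)
    except ValueError:
        allowed = ", ".join(member.value for member in ASRLanguage)
        raise ValueError(
            f"Unsupported language {value!r}; expected one of: {allowed}"
        ) from None


print(parse_language("Multi-lingual").name)
```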
---
## Vendor Configuration Examples
### OpenAI Whisper
```json
{
  "vendor": "OpenAI",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "model_name": "whisper-1",
  "language": "Multi-lingual",
  "enable_punctuation": true,
  "enable_normalization": true
}
```
### SiliconFlow Paraformer
```json
{
  "vendor": "SiliconFlow",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sf-xxx",
  "model_name": "paraformer-v2",
  "language": "zh",
  "hotwords": ["产品名称", "公司名"],
  "enable_punctuation": true,
  "enable_normalization": true
}
```
---
## Unit Tests
The project includes a complete set of unit tests, located in `api/tests/test_asr.py`.
### Test Case Overview
| Test Method | Description |
|-------------|-------------|
| test_get_asr_models_empty | List models from an empty database |
| test_create_asr_model | Create a model |
| test_create_asr_model_minimal | Create a model with minimal data |
| test_get_asr_model_by_id | Get a single model |
| test_get_asr_model_not_found | Get a model that does not exist |
| test_update_asr_model | Update a model |
| test_delete_asr_model | Delete a model |
| test_list_asr_models_with_pagination | Pagination |
| test_filter_asr_models_by_language | Filter by language |
| test_filter_asr_models_by_enabled | Filter by enabled status |
| test_create_asr_model_with_hotwords | Hotword configuration |
| test_test_asr_model_siliconflow | SiliconFlow vendor test |
| test_test_asr_model_openai | OpenAI vendor test |
| test_different_asr_languages | Multiple languages |
| test_different_asr_vendors | Multiple vendors |
### Running the Tests
```bash
# Run the ASR tests
pytest api/tests/test_asr.py -v
# Run all tests
pytest api/tests/ -v
```