# Speech Recognition (ASR Model) API

The speech recognition API manages the configuration and invocation of speech recognition (ASR) models.

## Basic Information

| Item | Value |
|------|-------|
| Base URL | `/api/v1/asr` |
| Authentication | Bearer Token (reserved for future use) |

---

## Data Model

### ASRModel

```typescript
interface ASRModel {
  id: string;                    // Unique model identifier (8-character UUID)
  user_id: number;               // ID of the owning user
  name: string;                  // Display name of the model
  vendor: string;                // Vendor: "OpenAI Compatible" | "Paraformer" | etc.
  language: string;              // Recognition language: "zh" | "en" | "Multi-lingual"
  base_url: string;              // API base URL
  api_key: string;               // API key
  model_name?: string;           // Model name, e.g. "whisper-1" | "paraformer-v2"
  hotwords?: string[];           // Hotword list
  enable_punctuation: boolean;   // Whether punctuation is enabled
  enable_normalization: boolean; // Whether text normalization is enabled
  enabled: boolean;              // Whether the model is enabled
  created_at: string;
}
```

---

## API Endpoints

### 1. List ASR Models

```http
GET /api/v1/asr
```

**Query Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| language | string | No | - | Filter by language: "zh" \| "en" \| "Multi-lingual" |
| enabled | boolean | No | - | Filter by enabled status |
| page | int | No | 1 | Page number |
| limit | int | No | 50 | Items per page |

**Response:**

```json
{
  "total": 2,
  "page": 1,
  "limit": 50,
  "list": [
    {
      "id": "abc12345",
      "user_id": 1,
      "name": "Whisper Multilingual ASR",
      "vendor": "OpenAI Compatible",
      "language": "Multi-lingual",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-***",
      "model_name": "whisper-1",
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "def67890",
      "user_id": 1,
      "name": "SenseVoice Chinese ASR",
      "vendor": "OpenAI Compatible",
      "language": "zh",
      "base_url": "https://api.siliconflow.cn/v1",
      "api_key": "sf-***",
      "model_name": "paraformer-v2",
      "hotwords": ["小助手", "帮我"],
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```

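
The query parameters above can be assembled on the client side as in the following minimal sketch. The host in `API_ROOT` and the helper name `build_list_url` are hypothetical, and sending booleans as lowercase `true`/`false` is an assumption about how the server parses them; only the path and the parameter names come from this document.

```python
from urllib.parse import urlencode

# Hypothetical host; this document only specifies the path /api/v1/asr.
API_ROOT = "http://localhost:8000"


def build_list_url(language=None, enabled=None, page=1, limit=50):
    """Build the URL for GET /api/v1/asr, leaving out filters that are not set."""
    params = {"page": page, "limit": limit}
    if language is not None:
        params["language"] = language
    if enabled is not None:
        # Assumption: the server accepts lowercase "true" / "false".
        params["enabled"] = "true" if enabled else "false"
    return f"{API_ROOT}/api/v1/asr?{urlencode(params)}"


# List enabled Chinese models using the documented paging defaults.
print(build_list_url(language="zh", enabled=True))
```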
---

### 2. Get a Single ASR Model

```http
GET /api/v1/asr/{id}
```

**Path Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| id | string | Model ID |

**Response:**

```json
{
  "id": "abc12345",
  "user_id": 1,
  "name": "Whisper Multilingual ASR",
  "vendor": "OpenAI Compatible",
  "language": "Multi-lingual",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-***",
  "model_name": "whisper-1",
  "hotwords": [],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}
```

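
The sample responses show `api_key` as `sk-***` and `sf-***`, which suggests the key is masked before it is returned. This document does not describe how that is done; the sketch below is one possible approach, assuming the first three characters are kept. The function name `mask_api_key` and the `visible` parameter are hypothetical.

```python
def mask_api_key(api_key: str, visible: int = 3) -> str:
    """Keep the first `visible` characters and replace the rest with "***"."""
    if len(api_key) <= visible:
        # Too short to reveal a prefix safely, so hide the whole key.
        return "***"
    return api_key[:visible] + "***"


print(mask_api_key("sk-your-api-key"))
```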
---

### 3. Create an ASR Model

```http
POST /api/v1/asr
```

**Request Body:**

```json
{
  "name": "SenseVoice Chinese ASR",
  "vendor": "OpenAI Compatible",
  "language": "zh",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sk-your-api-key",
  "model_name": "paraformer-v2",
  "hotwords": ["小助手", "帮我"],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Display name of the model |
| vendor | string | Yes | Vendor: "OpenAI Compatible" / "Paraformer" |
| language | string | Yes | Language: "zh" / "en" / "Multi-lingual" |
| base_url | string | Yes | API base URL |
| api_key | string | Yes | API key |
| model_name | string | No | Model name |
| hotwords | string[] | No | Hotword list, used to improve recognition accuracy |
| enable_punctuation | boolean | No | Whether to output punctuation; defaults to true |
| enable_normalization | boolean | No | Whether to normalize text; defaults to true |
| enabled | boolean | No | Whether the model is enabled; defaults to true |
| id | string | No | Explicit model ID; generated automatically if omitted |

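
As an illustration of the required and optional fields in the table above, the following sketch checks a create request on the client before it is sent. The helper `build_create_payload` is hypothetical; the required field names and the default values are taken from the table, and the server is expected to apply the same defaults on its own.

```python
import json

# Required fields and defaults as listed in the field table.
REQUIRED_FIELDS = ("name", "vendor", "language", "base_url", "api_key")
DEFAULTS = {
    "enable_punctuation": True,
    "enable_normalization": True,
    "enabled": True,
}


def build_create_payload(**fields):
    """Return a body for POST /api/v1/asr, or raise ValueError if incomplete."""
    missing = [key for key in REQUIRED_FIELDS if not fields.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Explicit values override the documented defaults.
    return {**DEFAULTS, **fields}


payload = build_create_payload(
    name="SenseVoice Chinese ASR",
    vendor="OpenAI Compatible",
    language="zh",
    base_url="https://api.siliconflow.cn/v1",
    api_key="sk-your-api-key",
    hotwords=["小助手", "帮我"],
)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```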
---

### 4. Update an ASR Model

```http
PUT /api/v1/asr/{id}
```

**Request Body:** (partial update; only the fields provided are changed)

```json
{
  "name": "Whisper-1 Optimized",
  "language": "zh",
  "enable_punctuation": true,
  "hotwords": ["新词1", "新词2"]
}
```

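
Partial-update semantics can be sketched as a merge in which omitted fields keep their stored values. This is an illustration rather than the server's actual code: `apply_partial_update` is a hypothetical helper, and treating `None` as "not provided" is an assumption that mirrors the all-optional `ASRModelUpdate` schema. One consequence of that choice is that a field cannot be reset to null through this sketch.

```python
def apply_partial_update(current: dict, changes: dict) -> dict:
    """Return a copy of `current` with only the supplied fields replaced."""
    updated = dict(current)  # copy, so the stored record is not mutated
    for key, value in changes.items():
        if value is not None:  # None means "field not provided" in this sketch
            updated[key] = value
    return updated


current = {
    "name": "Whisper Multilingual ASR",
    "language": "Multi-lingual",
    "enabled": True,
    "hotwords": [],
}
changes = {
    "name": "Whisper-1 Optimized",
    "language": "zh",
    "hotwords": ["新词1", "新词2"],
    "enabled": None,
}
updated = apply_partial_update(current, changes)
print(updated)
```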
---

### 5. Delete an ASR Model

```http
DELETE /api/v1/asr/{id}
```

**Response:**

```json
{
  "message": "Deleted successfully"
}
```

---

### 6. Test an ASR Model

```http
POST /api/v1/asr/{id}/test
```

**Request Body:**

```json
{
  "audio_url": "https://example.com/test-audio.wav"
}
```

Alternatively, send Base64-encoded audio data:

```json
{
  "audio_data": "UklGRi..."
}
```

**Response (success):**

```json
{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95,
  "latency_ms": 500
}
```

**Response (failure):**

```json
{
  "success": false,
  "error": "HTTP Error: 401 - Unauthorized"
}
```

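
The sketch below shows how a client might prepare the `audio_data` variant of the request and read either response shape. The helper names `encode_audio` and `summarize_test_result` are hypothetical; the field names (`audio_data`, `success`, `transcript`, `latency_ms`, `error`) come from the examples above.

```python
import base64


def encode_audio(audio_bytes: bytes) -> dict:
    """Wrap raw audio bytes as a request body with Base64-encoded `audio_data`."""
    return {"audio_data": base64.b64encode(audio_bytes).decode("ascii")}


def summarize_test_result(result: dict) -> str:
    """Turn a test response into a one-line summary for logs or a UI."""
    if result.get("success"):
        return f"ok: {result.get('transcript')} ({result.get('latency_ms')} ms)"
    return f"failed: {result.get('error')}"


# A WAV file starts with the ASCII bytes "RIFF", hence the "UklGR" prefix
# seen in the Base64 example above.
body = encode_audio(b"RIFF")
print(body)
print(summarize_test_result({"success": False, "error": "HTTP Error: 401 - Unauthorized"}))
```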
---

### 7. Transcribe Audio

```http
POST /api/v1/asr/{id}/transcribe
```

**Query Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| audio_url | string | No\* | URL of the audio file |
| audio_data | string | No\* | Base64-encoded audio data |
| hotwords | string[] | No | Hotword list |

\*At least one of `audio_url` or `audio_data` must be provided.

**Response:**

```json
{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95
}
```

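
The "at least one of `audio_url` or `audio_data`" rule can be checked before the request is sent. The following is a minimal sketch of such a check; `build_transcribe_params` is a hypothetical helper, and this document does not say how the server responds when both values are missing.

```python
def build_transcribe_params(audio_url=None, audio_data=None, hotwords=None):
    """Collect transcribe parameters, enforcing the audio_url/audio_data rule."""
    if not audio_url and not audio_data:
        raise ValueError("either audio_url or audio_data is required")
    params = {}
    if audio_url:
        params["audio_url"] = audio_url
    if audio_data:
        params["audio_data"] = audio_data
    if hotwords:
        # Hotwords are optional and only included when given.
        params["hotwords"] = list(hotwords)
    return params


params = build_transcribe_params(
    audio_url="https://example.com/test-audio.wav",
    hotwords=["小助手"],
)
print(params)
```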
---

## Schema Definitions

```python
from enum import Enum
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime


class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


class ASRModelBase(BaseModel):
    name: str
    vendor: str
    language: str  # "zh" | "en" | "Multi-lingual"
    base_url: str
    api_key: str
    model_name: Optional[str] = None
    hotwords: List[str] = []
    enable_punctuation: bool = True
    enable_normalization: bool = True
    enabled: bool = True


class ASRModelCreate(ASRModelBase):
    id: Optional[str] = None


class ASRModelUpdate(BaseModel):
    name: Optional[str] = None
    language: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    model_name: Optional[str] = None
    hotwords: Optional[List[str]] = None
    enable_punctuation: Optional[bool] = None
    enable_normalization: Optional[bool] = None
    enabled: Optional[bool] = None


class ASRModelOut(ASRModelBase):
    id: str
    user_id: int
    created_at: datetime

    class Config:
        from_attributes = True


class ASRTestRequest(BaseModel):
    audio_url: Optional[str] = None
    audio_data: Optional[str] = None  # Base64-encoded


class ASRTestResponse(BaseModel):
    success: bool
    transcript: Optional[str] = None
    language: Optional[str] = None
    confidence: Optional[float] = None
    latency_ms: Optional[int] = None
    error: Optional[str] = None
```

---

## Vendor Configuration Examples

### OpenAI Whisper

```json
{
  "vendor": "OpenAI Compatible",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "model_name": "whisper-1",
  "language": "Multi-lingual",
  "enable_punctuation": true,
  "enable_normalization": true
}
```

### OpenAI Compatible Paraformer

```json
{
  "vendor": "OpenAI Compatible",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sf-xxx",
  "model_name": "paraformer-v2",
  "language": "zh",
  "hotwords": ["产品名称", "公司名"],
  "enable_punctuation": true,
  "enable_normalization": true
}
```

---

## Unit Tests

The unit tests for this API are located in `api/tests/test_asr.py`.

### Test Case Overview

| Test method | Description |
|-------------|-------------|
| test_get_asr_models_empty | Listing models with an empty database |
| test_create_asr_model | Creating a model |
| test_create_asr_model_minimal | Creating a model with minimal data |
| test_get_asr_model_by_id | Getting a single model |
| test_get_asr_model_not_found | Getting a nonexistent model |
| test_update_asr_model | Updating a model |
| test_delete_asr_model | Deleting a model |
| test_list_asr_models_with_pagination | Pagination |
| test_filter_asr_models_by_language | Filtering by language |
| test_filter_asr_models_by_enabled | Filtering by enabled status |
| test_create_asr_model_with_hotwords | Hotword configuration |
| test_test_asr_model_siliconflow | Testing an OpenAI Compatible vendor |
| test_test_asr_model_openai | Testing the OpenAI vendor |
| test_different_asr_languages | Multiple languages |
| test_different_asr_vendors | Multiple vendors |

### Running the Tests

```bash
# Run the ASR tests
pytest api/tests/test_asr.py -v

# Run all tests
pytest api/tests/ -v
```