AI-VideoAssistant/api/docs/asr.md
2026-02-08 15:52:16 +08:00

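Speech Recognition (ASR Model) API

The speech recognition API manages the configuration of speech recognition (ASR) models and the calls made to them.

Basic Information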

| Item | Description |
| --- | --- |
| Base URL | /api/v1/asr |
| Authentication | Bearer Token (reserved) |
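
As a minimal sketch of how a client might assemble request URLs and headers from the values above: the host `http://localhost:8000` and the helper names `asr_url` and `auth_headers` are assumptions for illustration. Because Bearer authentication is only marked as reserved, the token is treated as optional here.

```python
from typing import Dict, Optional

API_PREFIX = "/api/v1/asr"  # Base URL taken from the table above


def asr_url(host: str, *parts: str) -> str:
    """Join a host (hypothetical) with the ASR base path and extra path segments."""
    segments = [API_PREFIX.strip("/")] + [p.strip("/") for p in parts if p]
    return host.rstrip("/") + "/" + "/".join(segments)


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build JSON headers; add a Bearer token only when one is supplied."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

For example, `asr_url("http://localhost:8000", "abc12345", "test")` would yield the URL for the test endpoint of model `abc12345`.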

Data Models

ASRModel

interface ASRModel {
  id: string;                    // Unique model identifier (8-character short UUID)
  user_id: number;               // ID of the owning user
  name: string;                  // Display name of the model
  vendor: string;                // Vendor: "OpenAI" | "SiliconFlow" | "Paraformer" | etc.
  language: string;              // Recognition language: "zh" | "en" | "Multi-lingual"
  base_url: string;              // API base URL
  api_key: string;               // API key
  model_name?: string;           // Model name, e.g. "whisper-1" | "paraformer-v2"
  hotwords?: string[];           // Hotword list
  enable_punctuation: boolean;   // Whether punctuation is enabled
  enable_normalization: boolean; // Whether text normalization is enabled
  enabled: boolean;              // Whether the model is enabled
  created_at: string;            // Creation timestamp
}

API Endpoints

1. List ASR models

GET /api/v1/asr

Query Parameters:

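| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| language | string | No | - | Filter by language: "zh" / "en" / "Multi-lingual" |
| enabled | boolean | No | - | Filter by enabled status |
| page | int | No | 1 | Page number |
| limit | int | No | 50 | Items per page |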

Response:

{
  "total": 3,
  "page": 1,
  "limit": 50,
  "list": [
    {
      "id": "abc12345",
      "user_id": 1,
      "name": "Whisper multilingual recognition",
      "vendor": "OpenAI",
      "language": "Multi-lingual",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-***",
      "model_name": "whisper-1",
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "def67890",
      "user_id": 1,
      "name": "SenseVoice 中文识别",
      "vendor": "SiliconFlow",
      "language": "zh",
      "base_url": "https://api.siliconflow.cn/v1",
      "api_key": "sf-***",
      "model_name": "paraformer-v2",
      "hotwords": ["小助手", "帮我"],
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
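
The query parameters and the `total`/`limit` fields of the response can be combined on the client side. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the server accepts lowercase `true`/`false` for the `enabled` filter, which the source does not state.

```python
import math
from typing import Optional
from urllib.parse import urlencode


def list_query(language: Optional[str] = None, enabled: Optional[bool] = None,
               page: int = 1, limit: int = 50) -> str:
    """Build the query string for GET /api/v1/asr, omitting unset filters."""
    params = {"page": page, "limit": limit}
    if language is not None:
        params["language"] = language
    if enabled is not None:
        # Assumption: the server accepts lowercase "true"/"false".
        params["enabled"] = "true" if enabled else "false"
    return urlencode(params)


def page_count(total: int, limit: int) -> int:
    """Number of pages needed to list `total` items at `limit` per page."""
    return math.ceil(total / limit) if limit > 0 else 0
```

With the sample response above (`total` of 3, `limit` of 50), `page_count(3, 50)` gives a single page.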

2. Get a single ASR model

GET /api/v1/asr/{id}

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Model ID |

Response:

{
  "id": "abc12345",
  "user_id": 1,
  "name": "Whisper multilingual recognition",
  "vendor": "OpenAI",
  "language": "Multi-lingual",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-***",
  "model_name": "whisper-1",
  "hotwords": [],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}
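
The example responses show `api_key` in a masked form such as `sk-***`. The source does not describe how the server produces this value, so the following is only a sketch under the assumption that everything after the first hyphen is hidden. The function name `mask_api_key` is hypothetical.

```python
def mask_api_key(api_key: str) -> str:
    """Hide everything after the vendor prefix, e.g. "sk-abc123" -> "sk-***"."""
    prefix, sep, _secret = api_key.partition("-")
    if not sep:
        # No hyphen found: assume there is no safe prefix to reveal.
        return "***"
    return f"{prefix}-***"
```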

3. Create an ASR model

POST /api/v1/asr

Request Body:

{
  "name": "SenseVoice Chinese recognition",
  "vendor": "SiliconFlow",
  "language": "zh",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sk-your-api-key",
  "model_name": "paraformer-v2",
  "hotwords": ["小助手", "帮我"],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true
}

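Field descriptions:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name of the model |
| vendor | string | Yes | Vendor: "OpenAI" / "SiliconFlow" / "Paraformer" |
| language | string | Yes | Language: "zh" / "en" / "Multi-lingual" |
| base_url | string | Yes | API base URL |
| api_key | string | Yes | API key |
| model_name | string | No | Model name |
| hotwords | string[] | No | Hotword list, used to improve recognition accuracy |
| enable_punctuation | boolean | No | Whether to output punctuation; defaults to true |
| enable_normalization | boolean | No | Whether to normalize text; defaults to true |
| enabled | boolean | No | Whether the model is enabled; defaults to true |
| id | string | No | Custom model ID; auto-generated when omitted |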
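
A client could check a create request before sending it. The sketch below treats the fields that have no default in `ASRModelBase` as required and fills in the documented defaults. The helper `build_create_payload` is hypothetical, and the server's own validation may differ.

```python
from typing import Any, Dict

# Fields without a default in ASRModelBase (see the schema definitions).
REQUIRED_FIELDS = ("name", "vendor", "language", "base_url", "api_key")

# Defaults documented in the field table.
DEFAULTS = {
    "enable_punctuation": True,
    "enable_normalization": True,
    "enabled": True,
}


def build_create_payload(**fields: Any) -> Dict[str, Any]:
    """Check required fields and fill documented defaults for POST /api/v1/asr."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Caller-supplied values override the defaults.
    return {**DEFAULTS, **fields}
```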

4. Update an ASR model

PUT /api/v1/asr/{id}

Request Body (partial update):

{
  "name": "Whisper-1 optimized",
  "language": "zh",
  "enable_punctuation": true,
  "hotwords": ["新词1", "新词2"]
}
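
Partial-update semantics can be illustrated as follows. This is a sketch, not the server's actual merge logic: it assumes that a field left out of the request, or set to `None` as in `ASRModelUpdate`, leaves the stored value unchanged. The function name `apply_partial_update` is hypothetical.

```python
from typing import Any, Dict


def apply_partial_update(current: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `current` with only the non-None fields of `patch` applied."""
    updated = dict(current)
    for key, value in patch.items():
        if value is not None:  # Assumption: None means "not provided"
            updated[key] = value
    return updated
```

Under this assumption a field cannot be reset to `None` through an update. Clearing `hotwords` would instead mean sending an empty list.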

5. Delete an ASR model

DELETE /api/v1/asr/{id}

Response:

{
  "message": "Deleted successfully"
}

6. Test an ASR model

POST /api/v1/asr/{id}/test

Request Body:

{
  "audio_url": "https://example.com/test-audio.wav"
}

Alternatively, send Base64-encoded audio data:

{
  "audio_data": "UklGRi..."
}
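
The `audio_data` value is the Base64 encoding of the raw audio bytes. A WAV file begins with the ASCII bytes `RIFF`, which is why the sample value starts with `UklGR`. A minimal sketch of producing the request body (the helper name `audio_data_body` is hypothetical):

```python
import base64
from typing import Dict


def audio_data_body(audio_bytes: bytes) -> Dict[str, str]:
    """Wrap raw audio bytes as the Base64 `audio_data` request body."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {"audio_data": encoded}
```

In practice the bytes would typically come from reading a local audio file in binary mode.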

Response (success):

{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95,
  "latency_ms": 500
}

Response (failure):

{
  "success": false,
  "error": "HTTP Error: 401 - Unauthorized"
}
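
Both response shapes share the `success` flag, so a caller can branch on it. The helper below, with the hypothetical name `summarize_test_result`, is a sketch of one way to turn either response into a single log line. It relies only on the fields shown in the two examples above.

```python
from typing import Any, Dict


def summarize_test_result(result: Dict[str, Any]) -> str:
    """Turn an ASR test response into a one-line, human-readable summary."""
    if result.get("success"):
        parts = [f"OK: {result.get('transcript', '')!r}"]
        confidence = result.get("confidence")
        latency = result.get("latency_ms")
        if confidence is not None:
            parts.append(f"confidence={confidence:.2f}")
        if latency is not None:
            parts.append(f"latency={latency}ms")
        return " ".join(parts)
    return f"FAILED: {result.get('error', 'unknown error')}"
```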

7. Transcribe audio

POST /api/v1/asr/{id}/transcribe

Query Parameters:

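| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audio_url | string | No* | URL of the audio file |
| audio_data | string | No* | Base64-encoded audio data |
| hotwords | string[] | No | Hotword list |

*Provide at least one of audio_url or audio_data.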
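
The "at least one" rule can be enforced on the client before the request is sent. The sketch below uses a hypothetical helper, `transcribe_params`. It only rejects the case where neither audio source is given, because the source does not say how the server behaves when both are supplied.

```python
from typing import Any, Dict, List, Optional


def transcribe_params(audio_url: Optional[str] = None,
                      audio_data: Optional[str] = None,
                      hotwords: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build transcribe parameters, requiring audio_url or audio_data."""
    if not audio_url and not audio_data:
        raise ValueError("provide audio_url or audio_data")
    params: Dict[str, Any] = {}
    if audio_url:
        params["audio_url"] = audio_url
    if audio_data:
        params["audio_data"] = audio_data
    if hotwords:
        params["hotwords"] = list(hotwords)
    return params
```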

Response:

{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95
}

Schema Definitions

from enum import Enum
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime

class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"

class ASRModelBase(BaseModel):
    name: str
    vendor: str
    language: str  # "zh" | "en" | "Multi-lingual"
    base_url: str
    api_key: str
    model_name: Optional[str] = None
    hotwords: List[str] = []
    enable_punctuation: bool = True
    enable_normalization: bool = True
    enabled: bool = True

class ASRModelCreate(ASRModelBase):
    id: Optional[str] = None

class ASRModelUpdate(BaseModel):
    name: Optional[str] = None
    language: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    model_name: Optional[str] = None
    hotwords: Optional[List[str]] = None
    enable_punctuation: Optional[bool] = None
    enable_normalization: Optional[bool] = None
    enabled: Optional[bool] = None

class ASRModelOut(ASRModelBase):
    id: str
    user_id: int
    created_at: datetime

    class Config:
        from_attributes = True

class ASRTestRequest(BaseModel):
    audio_url: Optional[str] = None
    audio_data: Optional[str] = None  # base64 encoded

class ASRTestResponse(BaseModel):
    success: bool
    transcript: Optional[str] = None
    language: Optional[str] = None
    confidence: Optional[float] = None
    latency_ms: Optional[int] = None
    error: Optional[str] = None
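
The schema defines `ASRLanguage` but types the `language` field as a plain `str`, so any string is accepted as written. If stricter checking is wanted, the enum can be used to validate a value. The following is a sketch using only the standard library. The helper `parse_language` is hypothetical and not part of the project.

```python
from enum import Enum


class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


def parse_language(value: str) -> ASRLanguage:
    """Convert a raw string into ASRLanguage, raising ValueError if unsupported."""
    try:
        return ASRLanguage(value)
    except ValueError:
        allowed = ", ".join(lang.value for lang in ASRLanguage)
        raise ValueError(
            f"unsupported language {value!r}; expected one of: {allowed}"
        ) from None
```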

Vendor Configuration Examples

OpenAI Whisper

{
  "vendor": "OpenAI",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "model_name": "whisper-1",
  "language": "Multi-lingual",
  "enable_punctuation": true,
  "enable_normalization": true
}

SiliconFlow Paraformer

{
  "vendor": "SiliconFlow",
  "base_url": "https://api.siliconflow.cn/v1",
  "api_key": "sf-xxx",
  "model_name": "paraformer-v2",
  "language": "zh",
  "hotwords": ["产品名称", "公司名"],
  "enable_punctuation": true,
  "enable_normalization": true
}

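Unit Tests

The project includes a full set of unit tests, located in api/tests/test_asr.py.

Test Case Overview

| Test method | Description |
| --- | --- |
| test_get_asr_models_empty | List models from an empty database |
| test_create_asr_model | Create a model |
| test_create_asr_model_minimal | Create a model with the minimal required data |
| test_get_asr_model_by_id | Get a single model |
| test_get_asr_model_not_found | Get a nonexistent model |
| test_update_asr_model | Update a model |
| test_delete_asr_model | Delete a model |
| test_list_asr_models_with_pagination | Pagination |
| test_filter_asr_models_by_language | Filter by language |
| test_filter_asr_models_by_enabled | Filter by enabled status |
| test_create_asr_model_with_hotwords | Hotword configuration |
| test_test_asr_model_siliconflow | Test endpoint with the SiliconFlow vendor |
| test_test_asr_model_openai | Test endpoint with the OpenAI vendor |
| test_different_asr_languages | Multiple languages |
| test_different_asr_vendors | Multiple vendors |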

Running the Tests

# Run the ASR tests
pytest api/tests/test_asr.py -v

# Run all tests
pytest api/tests/ -v