# Speech Recognition (ASR) API

The Speech Recognition (ASR) API manages the configuration and invocation of speech recognition models.

## Basic Information

| Item | Value |
|------|-------|
| Base URL | `/api/v1/asr` |
| Authentication | Bearer Token (reserved for future use) |

---

## Data Models

### ASRConfig

```typescript
interface ASRConfig {
  id: string;                    // Configuration ID
  user_id: number;               // ID of the owning user
  name: string;                  // Configuration name
  vendor: string;                // Vendor
  language: string;              // Recognition language
  base_url: string;              // API base URL
  api_key: string;               // API key
  model_name?: string;           // Model name
  hotwords?: string[];           // Hotwords used to boost recognition
  enable_punctuation: boolean;   // Whether punctuation is enabled
  enable_normalization: boolean; // Whether text normalization is enabled
  enabled: boolean;              // Whether this configuration is enabled
  created_at: string;            // Creation time, e.g. "2024-01-15T10:30:00Z"
}
```

---

## API Endpoints

### 1. List ASR Configurations

```http
GET /api/v1/asr
```

**Query Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| language | string | No | Filter by language |
| vendor | string | No | Filter by vendor |
| enabled | boolean | No | Filter by enabled status |

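As a minimal sketch of how a client might assemble the list request, the helper below adds only the filters that were actually supplied. The function name `build_list_url` is hypothetical, and sending booleans as lowercase `true`/`false` is an assumption, since the wire format for `enabled` is not specified here.

```python
from urllib.parse import urlencode

BASE_PATH = "/api/v1/asr"  # Base URL of the ASR API


def build_list_url(language=None, vendor=None, enabled=None):
    """Return the list endpoint path with only the supplied filters."""
    params = {}
    if language is not None:
        params["language"] = language
    if vendor is not None:
        params["vendor"] = vendor
    if enabled is not None:
        # Assumption: booleans are serialized as lowercase "true"/"false".
        params["enabled"] = "true" if enabled else "false"
    query = urlencode(params)  # percent-encodes values, spaces become "+"
    return f"{BASE_PATH}?{query}" if query else BASE_PATH


print(build_list_url(language="zh", enabled=True))
```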
**Response:**

```json
{
  "total": 3,
  "list": [
    {
      "id": "asr_001",
      "user_id": 1,
      "name": "Whisper 多语种识别",
      "vendor": "OpenAI Compatible",
      "language": "Multi-lingual",
      "base_url": "https://api.openai.com/v1",
      "api_key": "sk-***",
      "model_name": "whisper-1",
      "enable_punctuation": true,
      "enable_normalization": true,
      "enabled": true,
      "created_at": "2024-01-15T10:30:00Z"
    },
    {
      "id": "asr_002",
      "user_id": 1,
      "name": "SenseVoice 中文识别",
      "vendor": "OpenAI Compatible",
      "language": "zh",
      "base_url": "https://api.speech.ai/v1",
      "api_key": "sk-***",
      "model_name": "sensevoice-small",
      "hotwords": ["小助手", "帮我"],
      "enabled": true
    }
  ]
}
```

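In the list response, `api_key` appears as `sk-***`, which suggests the key is masked before it is returned; how the masking is done is not stated. The sketch below shows one possible approach. Both `mask_api_key` and `to_list_item` are hypothetical helpers, and keeping a three-character prefix is an assumption chosen to reproduce the `sk-***` shape.

```python
def mask_api_key(api_key: str, visible_prefix: int = 3) -> str:
    """Keep a short prefix and replace the remainder with asterisks."""
    if len(api_key) <= visible_prefix:
        # Too short to reveal any part of it safely.
        return "***"
    return api_key[:visible_prefix] + "***"


def to_list_item(config: dict) -> dict:
    """Return a copy of a config with its api_key masked for list output."""
    item = dict(config)  # shallow copy so the stored config is not modified
    item["api_key"] = mask_api_key(item.get("api_key", ""))
    return item


print(to_list_item({"id": "asr_001", "api_key": "sk-your-api-key"}))
```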
---

### 2. Get ASR Configuration Details

```http
GET /api/v1/asr/{id}
```

**Response:**

```json
{
  "id": "asr_001",
  "user_id": 1,
  "name": "Whisper 多语种识别",
  "vendor": "OpenAI Compatible",
  "language": "Multi-lingual",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "model_name": "whisper-1",
  "hotwords": [],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true,
  "created_at": "2024-01-15T10:30:00Z"
}
```

---

### 3. Create ASR Configuration

```http
POST /api/v1/asr
```

**Request Body:**

```json
{
  "name": "SenseVoice 中文识别",
  "vendor": "OpenAI Compatible",
  "language": "zh",
  "base_url": "https://api.speech.ai/v1",
  "api_key": "sk-your-api-key",
  "model_name": "sensevoice-small",
  "hotwords": ["小助手", "帮我"],
  "enable_punctuation": true,
  "enable_normalization": true,
  "enabled": true
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Configuration name |
| vendor | string | Yes | Vendor: "OpenAI Compatible" / "Azure" / "阿里云" (Alibaba Cloud) / "讯飞" (iFLYTEK) |
| language | string | Yes | Language: "zh" / "en" / "Multi-lingual" |
| base_url | string | Yes | API base URL |
| api_key | string | Yes | API key |
| model_name | string | No | Model name |
| hotwords | string[] | No | Hotword list; improves recognition accuracy |
| enable_punctuation | boolean | No | Whether to output punctuation; defaults to true |
| enable_normalization | boolean | No | Whether to apply text normalization; defaults to true |
| enabled | boolean | No | Whether the configuration is enabled; defaults to true |

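The required/optional split and the defaults in the field table can be sketched as a small client-side check before sending the request. `prepare_create_payload` is a hypothetical helper, and treating empty strings as missing is an assumption; the server's actual validation may differ.

```python
REQUIRED_FIELDS = ("name", "vendor", "language", "base_url", "api_key")

# Defaults taken from the field table: all three flags default to true.
DEFAULTS = {
    "enable_punctuation": True,
    "enable_normalization": True,
    "enabled": True,
}


def prepare_create_payload(data: dict) -> dict:
    """Check required fields and fill in the documented defaults."""
    # Assumption: an empty string counts as missing.
    missing = [field for field in REQUIRED_FIELDS if not data.get(field)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Caller-supplied values win over the defaults.
    return {**DEFAULTS, **data}


payload = prepare_create_payload({
    "name": "SenseVoice 中文识别",
    "vendor": "OpenAI Compatible",
    "language": "zh",
    "base_url": "https://api.speech.ai/v1",
    "api_key": "sk-your-api-key",
})
print(payload["enabled"])
```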
---

### 4. Update ASR Configuration

```http
PUT /api/v1/asr/{id}
```

**Request Body:** (partial update)

```json
{
  "name": "Whisper-1 优化版",
  "language": "zh",
  "enable_punctuation": true
}
```

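Partial-update semantics can be sketched as a merge in which only the supplied fields are overwritten. The set of updatable fields below mirrors the `ASRModelUpdate` schema in this document; `apply_partial_update` is a hypothetical helper, and silently ignoring other keys (such as `id`) is an assumption rather than documented server behavior.

```python
# Fields accepted by the update schema (ASRModelUpdate) in this document.
UPDATABLE_FIELDS = {
    "name", "language", "base_url", "api_key", "model_name",
    "hotwords", "enable_punctuation", "enable_normalization", "enabled",
}


def apply_partial_update(current: dict, changes: dict) -> dict:
    """Return a new config with only the supplied, updatable fields replaced."""
    updated = dict(current)  # leave the stored config untouched
    for key, value in changes.items():
        if key in UPDATABLE_FIELDS:
            updated[key] = value
        # Assumption: keys outside UPDATABLE_FIELDS are ignored, not rejected.
    return updated


current = {"id": "asr_001", "name": "Whisper 多语种识别", "language": "Multi-lingual"}
print(apply_partial_update(current, {"name": "Whisper-1 优化版", "language": "zh"}))
```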
---

### 5. Delete ASR Configuration

```http
DELETE /api/v1/asr/{id}
```

---

### 6. Test ASR Recognition

```http
POST /api/v1/asr/{id}/test
```

**Request Body:** (provide either `audio_url` or `audio_data`, not both)

```json
{
  "audio_url": "https://example.com/test-audio.wav",
  "audio_data": "base64_encoded_audio"
}
```

**Response:**

```json
{
  "success": true,
  "transcript": "您好,请问有什么可以帮助您?",
  "language": "zh",
  "confidence": 0.95,
  "duration_ms": 3000,
  "latency_ms": 450
}
```

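Because the test request takes either `audio_url` or `audio_data`, a client can enforce that choice before sending. The sketch below is one way to do it; `build_test_request` is a hypothetical helper, and using standard (not URL-safe) base64 over the raw file bytes is an assumption.

```python
import base64
from typing import Optional


def build_test_request(audio_url: Optional[str] = None,
                       audio_bytes: Optional[bytes] = None) -> dict:
    """Build the request body with exactly one audio source."""
    # Reject both "neither given" and "both given".
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")
    if audio_url is not None:
        return {"audio_url": audio_url}
    # Assumption: audio_data carries standard base64 of the raw bytes.
    return {"audio_data": base64.b64encode(audio_bytes).decode("ascii")}


print(build_test_request(audio_url="https://example.com/test-audio.wav"))
```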
---

### 7. Real-Time Speech Recognition (Streaming)

```http
WS /api/v1/asr/{id}/stream
```

**Connection Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| audio_format | string | Audio format: "pcm" / "mp3" / "wav" |
| sample_rate | int | Sample rate in Hz: 16000 / 44100 |
| channels | int | Number of channels: 1 (mono) / 2 (stereo) |

**Message Format:**

Client sends (audio data):
```json
{
  "type": "audio",
  "data": "base64_encoded_audio_chunk"
}
```

Server returns (recognition results):
```json
{
  "type": "transcript",
  "text": "您好",
  "is_final": false
}
```

```json
{
  "type": "transcript",
  "text": "您好,请问有什么可以帮助您?",
  "is_final": true
}
```

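The message format above can be exercised without a live connection. The sketch below frames audio into `audio` messages and keeps only transcripts marked `is_final`. The helper names are hypothetical, and the 3200-byte chunk size is an assumption (100 ms of 16 kHz, 16-bit, mono PCM); the document does not specify a chunk size or how multiple final segments should be joined. A real client would send these frames over a WebSocket connection.

```python
import base64
import json


def audio_messages(pcm: bytes, chunk_size: int = 3200):
    """Yield JSON text frames of type "audio", one per chunk."""
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        yield json.dumps({
            "type": "audio",
            "data": base64.b64encode(chunk).decode("ascii"),
        })


def collect_final_text(frames) -> str:
    """Concatenate the text of transcript frames marked is_final."""
    finals = []
    for frame in frames:
        message = json.loads(frame)
        # Interim results (is_final == false) are skipped here.
        if message.get("type") == "transcript" and message.get("is_final"):
            finals.append(message["text"])
    # Assumption: final segments are joined without a separator.
    return "".join(finals)


frames = list(audio_messages(b"\x00" * 7000))
print(len(frames))
```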
---

## Recommended Schema Definitions

```python
from datetime import datetime
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, ConfigDict


# ============ ASR Model ============
class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


class ASRVendor(str, Enum):
    OPENAI_COMPATIBLE = "OpenAI Compatible"
    AZURE = "Azure"
    ALI = "阿里云"
    IFLYTEK = "讯飞"


class ASRModelBase(BaseModel):
    name: str
    vendor: str
    language: str  # "zh" | "en" | "Multi-lingual"
    base_url: str
    api_key: str
    model_name: Optional[str] = None
    hotwords: List[str] = []
    enable_punctuation: bool = True
    enable_normalization: bool = True
    enabled: bool = True


class ASRModelCreate(ASRModelBase):
    pass


class ASRModelUpdate(BaseModel):
    # Every field is optional so that a request can carry a partial update.
    name: Optional[str] = None
    language: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None
    model_name: Optional[str] = None
    hotwords: Optional[List[str]] = None
    enable_punctuation: Optional[bool] = None
    enable_normalization: Optional[bool] = None
    enabled: Optional[bool] = None


class ASRModelOut(ASRModelBase):
    id: str
    user_id: int
    created_at: datetime

    # Pydantic v2 configuration: allow building the model from object attributes.
    model_config = ConfigDict(from_attributes=True)


class ASRTestRequest(BaseModel):
    audio_url: Optional[str] = None
    audio_data: Optional[str] = None  # base64 encoded


class ASRTestResponse(BaseModel):
    success: bool
    transcript: Optional[str] = None
    language: Optional[str] = None
    confidence: Optional[float] = None
    duration_ms: Optional[int] = None
    latency_ms: Optional[int] = None
    error: Optional[str] = None
```

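The schema defines `ASRLanguage` and `ASRVendor`, but `ASRModelBase` types `vendor` and `language` as plain `str`, so unknown values would pass validation. One option is to annotate those fields with the enums. The standard-library sketch below shows the value lookup that approach relies on; `parse_vendor` is a hypothetical helper, not part of the schema.

```python
from enum import Enum


class ASRLanguage(str, Enum):
    ZH = "zh"
    EN = "en"
    MULTILINGUAL = "Multi-lingual"


class ASRVendor(str, Enum):
    OPENAI_COMPATIBLE = "OpenAI Compatible"
    AZURE = "Azure"
    ALI = "阿里云"
    IFLYTEK = "讯飞"


def parse_vendor(value: str) -> ASRVendor:
    """Map a request string to ASRVendor, or fail with the allowed values."""
    try:
        return ASRVendor(value)  # lookup by value, not by member name
    except ValueError:
        allowed = ", ".join(member.value for member in ASRVendor)
        raise ValueError(
            f"unsupported vendor {value!r}; expected one of: {allowed}"
        ) from None


print(parse_vendor("讯飞").name)
```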
---

## Vendor Configuration Examples

### OpenAI Whisper

```json
{
  "vendor": "OpenAI Compatible",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "model_name": "whisper-1",
  "language": "Multi-lingual",
  "enable_punctuation": true,
  "enable_normalization": true
}
```

### Alibaba Cloud Intelligent Speech

```json
{
  "vendor": "阿里云",
  "base_url": "https://filetrans.cn-shanghai.aliyuncs.com/v1",
  "api_key": "your-access-key-id:your-access-key-secret",
  "model_name": "nls.cn-shanghai",
  "language": "zh",
  "hotwords": ["产品名称", "公司名"]
}
```

### iFLYTEK Speech

```json
{
  "vendor": "讯飞",
  "base_url": "https://iat-api.xfyun.cn/v2/iat",
  "api_key": "your-appid:your-api-key",
  "model_name": "iat",
  "language": "zh",
  "enable_punctuation": true
}
```

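The Alibaba Cloud and iFLYTEK examples both pack two credentials into `api_key`, separated by a colon. How the server unpacks that string is not specified; the sketch below assumes a split on the first colon so that a secret containing colons stays intact. `split_credentials` is a hypothetical helper.

```python
from typing import Tuple


def split_credentials(api_key: str) -> Tuple[str, str]:
    """Split "<id>:<secret>" into its two parts at the first colon."""
    identifier, separator, secret = api_key.partition(":")
    if not separator or not identifier or not secret:
        raise ValueError("expected api_key in the form '<id>:<secret>'")
    return identifier, secret


app_id, key = split_credentials("your-appid:your-api-key")
print(app_id, key)
```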