# Knowledge Base API

The Knowledge Base API manages knowledge bases and their documents: creating them, indexing their content, and searching it.

## Basic Information

| Item | Value |
|------|-------|
| Base URL | `/api/v1/knowledge` |
| Authentication | Bearer Token (reserved for future use) |

---

## Data Models

### KnowledgeBase

```typescript
interface KnowledgeBase {
  id: string;                     // Unique knowledge base ID (8-character, UUID-based)
  user_id: number;                // ID of the owning user
  name: string;                   // Knowledge base name
  description: string;            // Knowledge base description
  embeddingModel: string;         // Embedding model name
  chunkSize: number;              // Document chunk size
  chunkOverlap: number;           // Overlap between adjacent chunks
  docCount: number;               // Number of documents
  chunkCount: number;             // Number of text chunks after splitting
  status: string;                 // Status: "active" | "inactive"
  createdAt: string;              // Creation time
  updatedAt: string;              // Last update time
  documents: KnowledgeDocument[]; // Associated documents
}
```

### KnowledgeDocument

```typescript
interface KnowledgeDocument {
  id: string;                  // Unique document ID
  kb_id: string;               // ID of the owning knowledge base
  name: string;                // Document name
  size: string;                // File size
  fileType: string;            // File type
  storageUrl: string;          // Storage location
  status: string;              // Status: "pending" | "processing" | "completed" | "failed"
  chunkCount: number;          // Number of text chunks after splitting
  errorMessage: string | null; // Error message (null when there is no error)
  uploadDate: string;          // Upload time
  createdAt: string;           // Creation time
  processedAt: string;         // Time processing finished
}
```

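
As an illustration of how a client might consume these models, the following minimal sketch counts the documents of a `KnowledgeBase` payload by `status`. The helper name `summarize_documents` and the sample payload are hypothetical; only the field names and status values come from the models above.

```python
from collections import Counter
from typing import Any

# Status values listed in the KnowledgeDocument model above.
DOCUMENT_STATUSES = ("pending", "processing", "completed", "failed")


def summarize_documents(kb: dict[str, Any]) -> dict[str, int]:
    """Count the documents of a KnowledgeBase payload by status.

    Hypothetical helper: statuses outside DOCUMENT_STATUSES are ignored.
    """
    counts = Counter(doc["status"] for doc in kb.get("documents", []))
    # Report every known status, including those with zero documents.
    return {status: counts.get(status, 0) for status in DOCUMENT_STATUSES}


# Sample payload trimmed to the fields this helper reads.
kb = {
    "id": "kb_001",
    "documents": [
        {"id": "doc_001", "status": "completed"},
        {"id": "doc_002", "status": "failed"},
        {"id": "doc_003", "status": "completed"},
    ],
}
print(summarize_documents(kb))
```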
---

## API Endpoints

### 1. List Knowledge Bases

```http
GET /api/v1/knowledge/bases
```

**Query Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| user_id | int | No | 1 | User ID |
| page | int | No | 1 | Page number |
| limit | int | No | 50 | Items per page |

**Response:**

```json
{
  "total": 2,
  "page": 1,
  "limit": 50,
  "list": [
    {
      "id": "kb_001",
      "user_id": 1,
      "name": "Product Knowledge Base",
      "description": "Product documentation and FAQ",
      "embeddingModel": "text-embedding-3-small",
      "chunkSize": 500,
      "chunkOverlap": 50,
      "docCount": 10,
      "chunkCount": 150,
      "status": "active",
      "createdAt": "2024-01-15T10:30:00",
      "updatedAt": "2024-01-15T10:30:00",
      "documents": [...]
    }
  ]
}
```

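
The query parameters above can be assembled with the standard library. This is a minimal sketch: the helper names are hypothetical, and `page_count` assumes that `total` in the response counts all matching knowledge bases across pages, which this document does not state explicitly.

```python
import math
from urllib.parse import urlencode

BASE_URL = "/api/v1/knowledge"


def list_bases_url(user_id: int = 1, page: int = 1, limit: int = 50) -> str:
    """Build the list URL, using the documented defaults for each parameter."""
    query = urlencode({"user_id": user_id, "page": page, "limit": limit})
    return f"{BASE_URL}/bases?{query}"


def page_count(total: int, limit: int) -> int:
    """Number of pages needed for `total` items (assumes `total` spans all pages)."""
    return math.ceil(total / limit)


print(list_bases_url(page=2))
print(page_count(total=2, limit=50))
```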
---

### 2. Get Knowledge Base Details

```http
GET /api/v1/knowledge/bases/{kb_id}
```

**Response:**

```json
{
  "id": "kb_001",
  "user_id": 1,
  "name": "Product Knowledge Base",
  "description": "Product documentation and FAQ",
  "embeddingModel": "text-embedding-3-small",
  "chunkSize": 500,
  "chunkOverlap": 50,
  "docCount": 10,
  "chunkCount": 150,
  "status": "active",
  "createdAt": "2024-01-15T10:30:00",
  "updatedAt": "2024-01-15T10:30:00",
  "documents": [
    {
      "id": "doc_001",
      "kb_id": "kb_001",
      "name": "product-manual.pdf",
      "size": "1.2 MB",
      "fileType": "application/pdf",
      "storageUrl": "",
      "status": "completed",
      "chunkCount": 45,
      "errorMessage": null,
      "uploadDate": "2024-01-15T10:30:00",
      "createdAt": "2024-01-15T10:30:00",
      "processedAt": "2024-01-15T10:30:05"
    }
  ]
}
```

---

### 3. Create a Knowledge Base

```http
POST /api/v1/knowledge/bases
```

**Request Body:**

```json
{
  "name": "Product Knowledge Base",
  "description": "Product documentation and FAQ",
  "embeddingModel": "text-embedding-3-small",
  "chunkSize": 500,
  "chunkOverlap": 50
}
```

**Fields:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Knowledge base name |
| description | string | No | Knowledge base description |
| embeddingModel | string | No | Embedding model name; defaults to "text-embedding-3-small" |
| chunkSize | int | No | Document chunk size; defaults to 500 |
| chunkOverlap | int | No | Chunk overlap size; defaults to 50 |

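
A create request could be prepared as in the sketch below, which builds the request without sending it. The host in `API_ROOT` and the helper `build_create_request` are assumptions for illustration (this document only specifies the path prefix). Omitted optional fields are left out of the body so the server-side defaults from the table apply; passing the result to `urllib.request.urlopen` would send it.

```python
import json
from urllib.request import Request

# Hypothetical host; this document only specifies the path prefix.
API_ROOT = "http://localhost:8000/api/v1/knowledge"

# Optional fields from the table above.
OPTIONAL_FIELDS = {"description", "embeddingModel", "chunkSize", "chunkOverlap"}


def build_create_request(name: str, **optional_fields) -> Request:
    """Build (but do not send) a POST request that creates a knowledge base."""
    unknown = set(optional_fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Only the fields the caller supplied are sent.
    body = {"name": name, **optional_fields}
    return Request(
        f"{API_ROOT}/bases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request("Product Knowledge Base", chunkSize=800)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```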
---

### 4. Update a Knowledge Base

```http
PUT /api/v1/knowledge/bases/{kb_id}
```

**Request Body:** (partial update; include only the fields to change)

```json
{
  "name": "Updated knowledge base name",
  "description": "New description",
  "chunkSize": 800
}
```

**Note:** `embeddingModel` cannot be changed while the knowledge base contains indexed documents. To change it, delete all documents first.

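
A client can mirror the partial-update behavior and the `embeddingModel` restriction before calling the endpoint. The sketch below is hypothetical: `build_update_payload` is not part of the API, and it treats `docCount > 0` as a stand-in for "contains indexed documents", which is an assumption. The server remains the authority on whether the change is allowed.

```python
from typing import Any, Optional


def build_update_payload(
    current_doc_count: int,
    name: Optional[str] = None,
    description: Optional[str] = None,
    embeddingModel: Optional[str] = None,
    chunkSize: Optional[int] = None,
    chunkOverlap: Optional[int] = None,
) -> dict[str, Any]:
    """Return a body containing only the fields the caller wants to change."""
    # Assumption: a non-zero docCount means indexed documents exist.
    if embeddingModel is not None and current_doc_count > 0:
        raise ValueError("delete all documents before changing embeddingModel")
    fields = {
        "name": name,
        "description": description,
        "embeddingModel": embeddingModel,
        "chunkSize": chunkSize,
        "chunkOverlap": chunkOverlap,
    }
    # Drop unset fields so the request stays a partial update.
    return {key: value for key, value in fields.items() if value is not None}


print(build_update_payload(10, name="Updated knowledge base name", chunkSize=800))
```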
---

### 5. Delete a Knowledge Base

```http
DELETE /api/v1/knowledge/bases/{kb_id}
```

**Response:**

```json
{
  "message": "Deleted successfully"
}
```

**Note:** Deleting a knowledge base also deletes its data from the vector database.

---

### 6. Upload a Document

```http
POST /api/v1/knowledge/bases/{kb_id}/documents
```

Two upload modes are supported:

**Option 1: File upload (multipart/form-data)**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| file | file | Yes | The document file to upload |

Supported file types: `.txt`, `.md`, `.csv`, `.json`, `.pdf`, `.docx`

**Option 2: Create a document record only (application/json)**

```json
{
  "name": "document.pdf",
  "size": "1.2 MB",
  "fileType": "application/pdf",
  "storageUrl": "https://storage.example.com/doc.pdf"
}
```

**Response (file upload):**

```json
{
  "id": "doc_001",
  "name": "product-manual.pdf",
  "size": "1.2 MB",
  "fileType": "application/pdf",
  "storageUrl": "",
  "status": "completed",
  "chunkCount": 45,
  "message": "Document uploaded and indexed"
}
```

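
Checking the extension on the client avoids uploading a file the server will not accept. In this minimal sketch the helper `is_supported` is hypothetical, and matching extensions case-insensitively is an assumption; the extension list itself is the one given above.

```python
from pathlib import PurePosixPath

# Extensions listed under "Supported file types" above.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".pdf", ".docx"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list.

    Assumption: extensions are compared case-insensitively.
    """
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for candidate in ("manual.pdf", "legacy.doc", "NOTES.MD"):
    print(candidate, is_supported(candidate))
```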
---

### 7. Index Document Content

```http
POST /api/v1/knowledge/bases/{kb_id}/documents/{doc_id}/index
```

Indexes text content directly into the vector database, without uploading a file.

**Request Body:**

```json
{
  "content": "Text content to index..."
}
```

**Response:**

```json
{
  "message": "Document indexed",
  "chunkCount": 10
}
```

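
The `chunkCount` in the response depends on the knowledge base's `chunkSize` and `chunkOverlap`. This document does not describe the server's splitting strategy, so the following sketch assumes a simple character-based sliding window purely to illustrate how the two settings interact; the real splitter may count tokens or respect sentence boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: assumes character-based splitting, which may differ
    from what the server actually does.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = split_into_chunks("a" * 1000)
print([len(chunk) for chunk in chunks])  # prints [500, 500, 100]
```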
---

### 8. Delete a Document

```http
DELETE /api/v1/knowledge/bases/{kb_id}/documents/{doc_id}
```

**Response:**

```json
{
  "message": "Deleted successfully"
}
```

---

### 9. Search a Knowledge Base

```http
POST /api/v1/knowledge/search
```

**Request Body:**

```json
{
  "kb_id": "kb_001",
  "query": "product return policy",
  "nResults": 5
}
```

**Fields:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| kb_id | string | Yes | Knowledge base ID |
| query | string | Yes | Search query text |
| nResults | int | No | Number of results to return; defaults to 5 |

**Response:**

```json
{
  "results": [
    {
      "id": "doc_001",
      "text": "Our return policy is...",
      "score": 0.85,
      "metadata": {
        "document_name": "return-policy.pdf",
        "chunk_index": 3
      }
    }
  ]
}
```

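
A caller will typically build the search body and then post-process the results. The sketch below uses hypothetical helpers and assumes that a higher `score` means a closer match, which this document does not state; the field names follow the request and response shown above.

```python
import json
from typing import Any


def build_search_body(kb_id: str, query: str, n_results: int = 5) -> bytes:
    """Serialize a search request; nResults defaults to the documented 5."""
    payload = {"kb_id": kb_id, "query": query, "nResults": n_results}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def top_hits(response: dict[str, Any], min_score: float = 0.0) -> list[tuple[str, float, str]]:
    """Return (document_name, score, text) tuples, best match first.

    Assumption: higher scores indicate more relevant chunks.
    """
    hits = [r for r in response.get("results", []) if r["score"] >= min_score]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [(r["metadata"]["document_name"], r["score"], r["text"]) for r in hits]


sample = {
    "results": [
        {"id": "doc_002", "text": "Shipping takes...", "score": 0.41,
         "metadata": {"document_name": "shipping.pdf", "chunk_index": 0}},
        {"id": "doc_001", "text": "Our return policy is...", "score": 0.85,
         "metadata": {"document_name": "return-policy.pdf", "chunk_index": 3}},
    ]
}
print(top_hits(sample, min_score=0.5))
```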
---

### 10. Get Knowledge Base Statistics

```http
GET /api/v1/knowledge/bases/{kb_id}/stats
```

**Response:**

```json
{
  "kb_id": "kb_001",
  "docCount": 10,
  "chunkCount": 150
}
```

---

## Supported File Types

| File Type | Extension | Description |
|-----------|-----------|-------------|
| Plain text | .txt | Plain text files |
| Markdown | .md | Markdown documents |
| CSV | .csv | CSV tabular data |
| JSON | .json | JSON data |
| PDF | .pdf | PDF documents (requires pypdf) |
| Word | .docx | Word documents (requires python-docx) |

**Note:** The legacy `.doc` format is not supported. Convert such files to `.docx` or another supported format.

---

## Schema Definitions

```python
from pydantic import BaseModel
from typing import Optional, List


class KnowledgeBaseCreate(BaseModel):
    name: str
    description: Optional[str] = None
    embeddingModel: Optional[str] = "text-embedding-3-small"
    chunkSize: Optional[int] = 500
    chunkOverlap: Optional[int] = 50


class KnowledgeBaseUpdate(BaseModel):
    name: Optional[str] = None
    description: Optional[str] = None
    embeddingModel: Optional[str] = None
    chunkSize: Optional[int] = None
    chunkOverlap: Optional[int] = None


class KnowledgeSearchQuery(BaseModel):
    kb_id: str
    query: str
    nResults: Optional[int] = 5


class DocumentIndexRequest(BaseModel):
    content: str
```

---

## Unit Tests

The project includes a full unit test suite for this API in `api/tests/test_knowledge.py`.

### Running the Tests

```bash
# Run the knowledge base tests
pytest api/tests/test_knowledge.py -v

# Run all tests
pytest api/tests/ -v
```