# WebSocket Chat Server Documentation
## Overview
The WebSocket Chat Server is an intelligent conversation system that provides real-time AI chat capabilities with advanced turn detection, session management, and client-aware interactions. The server supports multiple concurrent clients with individual session tracking and automatic turn completion detection.
## Features
- 🔄 **Real-time WebSocket communication**
- 🧠 **AI-powered responses** via FastGPT API
- 🎯 **Intelligent turn detection** using ONNX models
- 📱 **Multi-client support** with session isolation
- ⏱️ **Automatic timeout handling** with buffering
- 🔗 **Session persistence** across reconnections
- 🎨 **Professional logging** with client tracking
- 🌐 **Welcome message system** for new sessions
## Server Configuration
### Environment Variables
Create a `.env` file in the project root:
```env
# Turn Detection Settings
MAX_INCOMPLETE_SENTENCES=3
MAX_RESPONSE_TIMEOUT=5

# FastGPT API Configuration
CHAT_MODEL_API_URL=http://101.89.151.141:3000/
CHAT_MODEL_API_KEY=your_fastgpt_api_key_here
CHAT_MODEL_APP_ID=your_fastgpt_app_id_here
```
### Default Values
| Variable | Default | Description |
|----------|---------|-------------|
| `MAX_INCOMPLETE_SENTENCES` | 3 | Maximum buffered sentences before forcing completion |
| `MAX_RESPONSE_TIMEOUT` | 5 | Seconds of silence before processing buffered input |
| `CHAT_MODEL_API_URL` | None | FastGPT API endpoint URL |
| `CHAT_MODEL_API_KEY` | None | FastGPT API authentication key |
| `CHAT_MODEL_APP_ID` | None | FastGPT application ID |
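Reading these variables with the documented defaults can be sketched as follows. This is an illustrative helper, not the server's actual loader; the key names in the returned dict are made up for the example:

```python
import os

def load_config() -> dict:
    """Read the environment variables above, falling back to the
    documented defaults where one exists. The three FastGPT settings
    have no default and come back as None when unset."""
    return {
        "max_incomplete_sentences": int(os.getenv("MAX_INCOMPLETE_SENTENCES", "3")),
        "max_response_timeout": float(os.getenv("MAX_RESPONSE_TIMEOUT", "5")),
        "chat_model_api_url": os.getenv("CHAT_MODEL_API_URL"),
        "chat_model_api_key": os.getenv("CHAT_MODEL_API_KEY"),
        "chat_model_app_id": os.getenv("CHAT_MODEL_APP_ID"),
    }
```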
## WebSocket Connection
### Connection URL Format
```
ws://localhost:9000?clientId=YOUR_CLIENT_ID
```
### Connection Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `clientId` | Yes | Unique identifier for the client session |
### Connection Example
```javascript
const ws = new WebSocket('ws://localhost:9000?clientId=user123');
```
## Message Protocol
### Message Format
All messages use JSON format with the following structure:
```json
{
  "type": "MESSAGE_TYPE",
  "payload": {
    // Message-specific data
  }
}
```
### Client to Server Messages
#### USER_INPUT
Sends user text input to the server.
```json
{
  "type": "USER_INPUT",
  "payload": {
    "text": "Hello, how are you?",
    "client_id": "user123" // Optional, will use URL clientId if not provided
  }
}
```
**Fields:**
- `text` (string, required): The user's input text
- `client_id` (string, optional): Client identifier (overrides URL parameter)
### Server to Client Messages
#### AI_RESPONSE
AI-generated response to user input.
```json
{
  "type": "AI_RESPONSE",
  "payload": {
    "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
    "client_id": "user123",
    "estimated_tts_duration": 3.2
  }
}
```
**Fields:**
- `text` (string): AI response content
- `client_id` (string): Client identifier
- `estimated_tts_duration` (float): Estimated text-to-speech duration in seconds
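The duration estimate lets a client know roughly when audio playback will end. The server's actual formula is not documented here; a plausible character-rate heuristic, purely for illustration (the rate of 15 characters per second is an assumption):

```python
def estimate_tts_duration(text: str, chars_per_second: float = 15.0) -> float:
    """Rough TTS playback estimate in seconds, rounded to one decimal.
    Hypothetical heuristic -- the real server may use actual TTS metadata."""
    return round(len(text) / chars_per_second, 1)
```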
#### ERROR
Error notification from server.
```json
{
  "type": "ERROR",
  "payload": {
    "message": "Error description",
    "client_id": "user123"
  }
}
```
**Fields:**
- `message` (string): Error description
- `client_id` (string): Client identifier
## Session Management
### Session Lifecycle
1. **Connection**: Client connects with unique `clientId`
2. **Session Creation**: New session created or existing session reused
3. **Welcome Message**: New sessions receive welcome message automatically
4. **Interaction**: Real-time message exchange
5. **Disconnection**: Session data preserved for reconnection
### Session Data Structure
```python
class SessionData:
    client_id: str                                 # Unique client identifier
    incomplete_sentences: List[str]                # Buffered user input
    conversation_history: List[ChatMessage]        # Full conversation history
    last_input_time: float                         # Timestamp of last user input
    timeout_task: Optional[Task]                   # Current timeout task
    ai_response_playback_ends_at: Optional[float]  # AI response end time
```
### Session Persistence
- Sessions persist across WebSocket disconnections
- Reconnection with same `clientId` resumes existing session
- Conversation history maintained throughout session lifetime
- Timeout tasks properly managed during reconnections
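The reuse-on-reconnect behaviour boils down to a lookup-or-create against the in-memory store. A minimal sketch, assuming a plain dict keyed by `clientId` (the names and the trimmed-down dataclass are illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class SessionData:
    client_id: str
    incomplete_sentences: list = field(default_factory=list)
    conversation_history: list = field(default_factory=list)
    last_input_time: float = 0.0

# In-memory session store, as described above (Redis would replace this
# in production).
SESSIONS: dict = {}

def get_or_create_session(client_id: str) -> SessionData:
    """Reuse the existing session on reconnect, otherwise create one."""
    session = SESSIONS.get(client_id)
    if session is None:
        session = SessionData(client_id=client_id, last_input_time=time.time())
        SESSIONS[client_id] = session
    return session
```

Because the same object is returned on reconnection, conversation history survives the WebSocket dropping.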
## Turn Detection System
### How It Works
The server uses an ONNX-based turn detection model to determine when a user's utterance is complete:
1. **Input Buffering**: User input buffered during AI response playback
2. **Turn Analysis**: Model analyzes current + buffered input for completion
3. **Decision Making**: Determines if utterance is complete or needs more input
4. **Timeout Handling**: Processes buffered input after silence period
### Turn Detection Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| Model | `livekit/turn-detector` | Pre-trained turn detection model |
| Threshold | `0.0009` | Probability threshold for completion |
| Max History | 6 turns | Maximum conversation history for analysis |
| Max Tokens | 128 | Maximum tokens for model input |
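The decision itself is a threshold test on the model's end-of-utterance probability, applied to the current input joined with anything already buffered. A sketch of that interpretation step only (the actual ONNX inference is omitted; function names are illustrative):

```python
# Threshold from the table above.
END_OF_TURN_THRESHOLD = 0.0009

def is_turn_complete(eou_probability: float,
                     threshold: float = END_OF_TURN_THRESHOLD) -> bool:
    """Interpret the model's end-of-utterance probability:
    at or above the threshold means the turn is complete."""
    return eou_probability >= threshold

def text_for_analysis(buffered: list, current: str) -> str:
    """The model sees buffered fragments plus the current input."""
    return " ".join(buffered + [current])
```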
### Turn Detection Flow
```
User Input
   │
   ├─ AI speaking? ── yes ──> add to buffer, schedule timeout
   │
   └─ no ──> run turn detection model
               │
               ├─ probability ≥ threshold ──> complete: send to AI
               └─ probability < threshold ──> incomplete: schedule timeout
```
## Timeout and Buffering System
### Buffering During AI Response
When the AI is generating or "speaking" a response:
- User input is buffered in `incomplete_sentences`
- New timeout task scheduled for each buffered input
- Timeout waits for AI playback to complete before processing
### Silence Timeout
After AI response completes:
- Server waits `MAX_RESPONSE_TIMEOUT` seconds (default: 5s)
- If no new input received, processes buffered input
- Forces completion if `MAX_INCOMPLETE_SENTENCES` reached
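The rules above reduce to a small decision function for each incoming input. This is a simplified sketch of the control flow, not the server's code; the action names are made up:

```python
def next_action(ai_speaking: bool, turn_complete: bool, buffer_size: int,
                max_incomplete: int = 3) -> str:
    """Decide what to do with new user input, per the buffering rules:
    - while the AI is speaking, always buffer;
    - otherwise process immediately if the turn is complete, or if this
      input would push the buffer to MAX_INCOMPLETE_SENTENCES;
    - otherwise buffer and wait out the silence timeout."""
    if ai_speaking:
        return "buffer_and_schedule_timeout"
    if turn_complete or buffer_size + 1 >= max_incomplete:
        return "process_now"
    return "buffer_and_schedule_timeout"
```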
### Timeout Configuration
```python
# Inside the timeout task (simplified):
# 1) wait for any remaining AI playback to finish
remaining_playtime = session.ai_response_playback_ends_at - current_time
if remaining_playtime > 0:
    await asyncio.sleep(remaining_playtime)

# 2) then wait for user silence before processing the buffer
await asyncio.sleep(MAX_RESPONSE_TIMEOUT)  # 5 seconds by default
```
## Error Handling
### Connection Errors
- **Missing clientId**: Connection rejected with code 1008
- **Invalid JSON**: Error message sent to client
- **Unknown message type**: Error message sent to client
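The missing-`clientId` check amounts to parsing the connection path's query string; close code 1008 is the standard WebSocket "policy violation" code. A stdlib-only sketch of the parsing side (names are illustrative):

```python
from urllib.parse import urlparse, parse_qs

# WebSocket close code used when clientId is absent ("policy violation").
POLICY_VIOLATION = 1008

def extract_client_id(path: str):
    """Pull clientId from a connection path like '/?clientId=user123'.
    Returns None when the parameter is missing, in which case the
    handler would close the socket with code 1008."""
    query = parse_qs(urlparse(path).query)
    values = query.get("clientId")
    return values[0] if values else None
```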
### API Errors
- **FastGPT API failures**: Error logged, user message reverted
- **Network errors**: Comprehensive error logging with context
- **Model errors**: Graceful degradation with error reporting
### Timeout Errors
- **Task cancellation**: Normal during new input arrival
- **Exception handling**: Errors logged, timeout task cleared
## Logging System
### Log Levels
The server uses a comprehensive logging system with colored output:
- ℹ️ **INFO** (green): General information
- 🐛 **DEBUG** (cyan): Detailed debugging information
- ⚠️ **WARNING** (yellow): Warning messages
- ❌ **ERROR** (red): Error messages
- ⏱️ **TIMEOUT** (blue): Timeout-related events
- 💬 **USER_INPUT** (purple): User input processing
- 🤖 **AI_RESPONSE** (blue): AI response generation
- 🔗 **SESSION** (bold): Session management events
### Log Format
```
2024-01-15 14:30:25.123 [LEVEL] 🎯 (client_id): Message | key=value | key2=value2
```
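A formatter producing this layout can be sketched in a few lines; this is illustrative, not the server's logger, and ANSI colour codes plus most of the icon map are omitted:

```python
from datetime import datetime

# Abbreviated icon map; the server defines one icon per level.
LEVEL_ICONS = {"SESSION": "🔗", "USER_INPUT": "💬", "AI_RESPONSE": "🤖"}

def format_log(level: str, client_id: str, message: str, **fields) -> str:
    """Render one line in the documented format:
    timestamp [LEVEL] icon (client_id): Message | key=value | ..."""
    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    icon = LEVEL_ICONS.get(level, "")
    extras = " | ".join(f"{k}={v}" for k, v in fields.items())
    line = f"{ts} [{level}] {icon} ({client_id}): {message}"
    return f"{line} | {extras}" if extras else line
```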
### Example Logs
```
2024-01-15 14:30:25.123 [SESSION] 🔗 (user123): NEW SESSION: Creating session | total_sessions_before=0
2024-01-15 14:30:25.456 [USER_INPUT] 💬 (user123): AI speaking. Buffering: 'Hello' | current_buffer_size=1
2024-01-15 14:30:26.789 [AI_RESPONSE] 🤖 (user123): Response sent: 'Hello! How can I help you?' | tts_duration=2.5s | playback_ends_at=1642248629.289
```
## Client Implementation Examples
### JavaScript/Web Client
```javascript
class ChatClient {
  constructor(clientId, serverUrl = 'ws://localhost:9000') {
    this.clientId = clientId;
    this.serverUrl = `${serverUrl}?clientId=${clientId}`;
    this.ws = null;
    this.messageHandlers = new Map();
  }

  connect() {
    this.ws = new WebSocket(this.serverUrl);
    this.ws.onopen = () => {
      console.log('Connected to chat server');
    };
    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      this.handleMessage(message);
    };
    this.ws.onclose = () => {
      console.log('Disconnected from chat server');
    };
  }

  sendMessage(text) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'USER_INPUT',
        payload: {
          text: text,
          client_id: this.clientId
        }
      };
      this.ws.send(JSON.stringify(message));
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'AI_RESPONSE':
        console.log('AI:', message.payload.text);
        break;
      case 'ERROR':
        console.error('Error:', message.payload.message);
        break;
      default:
        console.log('Unknown message type:', message.type);
    }
  }
}

// Usage
const client = new ChatClient('user123');
client.connect();
client.sendMessage('Hello, how are you?');
```
### Python Client
```python
import asyncio
import json

import websockets


class ChatClient:
    def __init__(self, client_id, server_url="ws://localhost:9000"):
        self.client_id = client_id
        self.server_url = f"{server_url}?clientId={client_id}"
        self.websocket = None

    async def connect(self):
        self.websocket = await websockets.connect(self.server_url)
        print(f"Connected to chat server as {self.client_id}")

    async def send_message(self, text):
        if self.websocket:
            message = {
                "type": "USER_INPUT",
                "payload": {
                    "text": text,
                    "client_id": self.client_id
                }
            }
            await self.websocket.send(json.dumps(message))

    async def listen(self):
        async for message in self.websocket:
            data = json.loads(message)
            await self.handle_message(data)

    async def handle_message(self, message):
        if message["type"] == "AI_RESPONSE":
            print(f"AI: {message['payload']['text']}")
        elif message["type"] == "ERROR":
            print(f"Error: {message['payload']['message']}")

    async def run(self):
        await self.connect()
        await self.listen()


# Usage
async def main():
    client = ChatClient("user123")
    await client.run()

asyncio.run(main())
```
## Performance Considerations
### Scalability
- **Session Management**: Sessions stored in memory (consider Redis for production)
- **Concurrent Connections**: Limited by system resources and WebSocket library
- **Model Loading**: ONNX model loaded once per server instance
### Optimization
- **Connection Pooling**: aiohttp sessions reused for API calls
- **Async Processing**: All I/O operations are asynchronous
- **Memory Management**: Session data is intentionally retained after disconnection to support reconnection, so stale sessions require manual or scheduled cleanup
### Monitoring
- **Performance Logging**: Duration tracking for all operations
- **Error Tracking**: Comprehensive error logging with context
- **Session Metrics**: Active session count and client activity
## Deployment
### Prerequisites
```bash
pip install -r requirements.txt
```
### Running the Server
```bash
python main_uninterruptable2.py
```
### Production Considerations
1. **Environment Variables**: Set all required environment variables
2. **SSL/TLS**: Use WSS for secure WebSocket connections
3. **Load Balancing**: Consider multiple server instances
4. **Session Storage**: Use Redis or database for session persistence
5. **Monitoring**: Implement health checks and metrics collection
6. **Logging**: Configure log rotation and external logging service
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 9000
CMD ["python", "main_uninterruptable2.py"]
```
## Troubleshooting
### Common Issues
1. **Connection Refused**: Check if server is running on correct port
2. **Missing clientId**: Ensure clientId parameter is provided in URL
3. **API Errors**: Verify FastGPT API credentials and network connectivity
4. **Model Loading**: Check if ONNX model files are accessible
### Debug Mode
Enable debug logging by modifying the logging level in the code or environment variables.
### Health Check
The server doesn't provide a built-in health check endpoint. Consider implementing one for production monitoring.
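One option is a small stdlib-only HTTP endpoint running alongside the WebSocket server. A sketch under those assumptions; the port, path, and the `ACTIVE_SESSIONS` reference are illustrative:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical handle on the server's in-memory session store.
ACTIVE_SESSIONS: dict = {}

def health_payload() -> bytes:
    """JSON body returned by the health endpoint."""
    return json.dumps(
        {"status": "ok", "active_sessions": len(ACTIVE_SESSIONS)}
    ).encode()

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = health_payload()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep health probes out of the chat logs

def start_health_server(port: int = 9001) -> HTTPServer:
    """Serve /healthz on a daemon thread next to the WebSocket server."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```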
## API Reference
### WebSocket Events
| Event | Direction | Description |
|-------|-----------|-------------|
| `open` | Client | Connection established |
| `message` | Bidirectional | Message exchange |
| `close` | Client | Connection closed |
| `error` | Client | Connection error |
### Message Types Summary
| Type | Direction | Description |
|------|-----------|-------------|
| `USER_INPUT` | Client → Server | Send user message |
| `AI_RESPONSE` | Server → Client | Receive AI response |
| `ERROR` | Server → Client | Error notification |
## License
This WebSocket server is part of the turn detection request project. Please refer to the main project license for usage terms.