# WebSocket Chat Server Documentation

## Overview

The WebSocket Chat Server is an intelligent conversation system that provides real-time AI chat with advanced turn detection, session management, and client-aware interactions. It supports multiple concurrent clients, each with its own session tracking and automatic turn-completion detection.

## Features

- 🔄 **Real-time WebSocket communication**
- 🧠 **AI-powered responses** via the FastGPT API
- 🎯 **Intelligent turn detection** using ONNX models
- 📱 **Multi-client support** with session isolation
- ⏱️ **Automatic timeout handling** with input buffering
- 🔗 **Session persistence** across reconnections
- 🎨 **Professional logging** with client tracking
- 🌐 **Welcome message system** for new sessions

## Server Configuration

### Environment Variables

Create a `.env` file in the project root:

```env
# Turn Detection Settings
MAX_INCOMPLETE_SENTENCES=3
MAX_RESPONSE_TIMEOUT=5

# FastGPT API Configuration
CHAT_MODEL_API_URL=http://101.89.151.141:3000/
CHAT_MODEL_API_KEY=your_fastgpt_api_key_here
CHAT_MODEL_APP_ID=your_fastgpt_app_id_here
```

### Default Values

| Variable | Default | Description |
|----------|---------|-------------|
| `MAX_INCOMPLETE_SENTENCES` | 3 | Maximum buffered sentences before forcing completion |
| `MAX_RESPONSE_TIMEOUT` | 5 | Seconds of silence before processing buffered input |
| `CHAT_MODEL_API_URL` | None | FastGPT API endpoint URL |
| `CHAT_MODEL_API_KEY` | None | FastGPT API authentication key |
| `CHAT_MODEL_APP_ID` | None | FastGPT application ID |
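
How the server reads this configuration is not shown in this document; the following is a minimal sketch, assuming `python-dotenv` plus `os.getenv` with the defaults from the table above.

```python
# Minimal sketch of loading the settings (assumed, not the server's exact code):
# python-dotenv populates the environment, os.getenv applies the documented defaults.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

MAX_INCOMPLETE_SENTENCES = int(os.getenv("MAX_INCOMPLETE_SENTENCES", "3"))
MAX_RESPONSE_TIMEOUT = float(os.getenv("MAX_RESPONSE_TIMEOUT", "5"))

CHAT_MODEL_API_URL = os.getenv("CHAT_MODEL_API_URL")  # required, no default
CHAT_MODEL_API_KEY = os.getenv("CHAT_MODEL_API_KEY")  # required, no default
CHAT_MODEL_APP_ID = os.getenv("CHAT_MODEL_APP_ID")    # required, no default
```
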
## WebSocket Connection

### Connection URL Format

```
ws://localhost:9000?clientId=YOUR_CLIENT_ID
```

### Connection Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `clientId` | Yes | Unique identifier for the client session |

### Connection Example

```javascript
const ws = new WebSocket('ws://localhost:9000?clientId=user123');
```
## Message Protocol

### Message Format

All messages use JSON with the following structure:

```json
{
  "type": "MESSAGE_TYPE",
  "payload": {
    // Message-specific data
  }
}
```

### Client to Server Messages

#### USER_INPUT

Sends user text input to the server.

```json
{
  "type": "USER_INPUT",
  "payload": {
    "text": "Hello, how are you?",
    "client_id": "user123"  // Optional, will use URL clientId if not provided
  }
}
```

**Fields:**

- `text` (string, required): The user's input text
- `client_id` (string, optional): Client identifier (overrides the URL parameter)
### Server to Client Messages

#### AI_RESPONSE

AI-generated response to user input.

```json
{
  "type": "AI_RESPONSE",
  "payload": {
    "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
    "client_id": "user123",
    "estimated_tts_duration": 3.2
  }
}
```

**Fields:**

- `text` (string): AI response content
- `client_id` (string): Client identifier
- `estimated_tts_duration` (float): Estimated text-to-speech duration in seconds

#### ERROR

Error notification from the server.

```json
{
  "type": "ERROR",
  "payload": {
    "message": "Error description",
    "client_id": "user123"
  }
}
```

**Fields:**

- `message` (string): Error description
- `client_id` (string): Client identifier
## Session Management

### Session Lifecycle

1. **Connection**: Client connects with a unique `clientId`
2. **Session Creation**: A new session is created, or an existing one is reused
3. **Welcome Message**: New sessions automatically receive a welcome message
4. **Interaction**: Real-time message exchange
5. **Disconnection**: Session data is preserved for reconnection

### Session Data Structure

```python
class SessionData:
    client_id: str                                  # Unique client identifier
    incomplete_sentences: List[str]                 # Buffered user input
    conversation_history: List[ChatMessage]         # Full conversation history
    last_input_time: float                          # Timestamp of last user input
    timeout_task: Optional[Task]                    # Current timeout task
    ai_response_playback_ends_at: Optional[float]   # When AI response playback ends
```

### Session Persistence

- Sessions persist across WebSocket disconnections
- Reconnecting with the same `clientId` resumes the existing session (see the sketch below)
- Conversation history is maintained throughout the session lifetime
- Timeout tasks are properly managed during reconnections
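
The exact session store is not documented here; the following is a minimal sketch of the get-or-create behavior described above, assuming an in-memory dict keyed by `client_id` and a dataclass mirroring the `SessionData` fields.

```python
# Minimal sketch of session reuse across reconnections; these names are
# assumptions modelled on the SessionData fields above, not the server's code.
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SessionData:
    client_id: str
    incomplete_sentences: list[str] = field(default_factory=list)
    conversation_history: list[dict] = field(default_factory=list)
    last_input_time: float = field(default_factory=time.time)
    timeout_task: Optional[asyncio.Task] = None
    ai_response_playback_ends_at: Optional[float] = None

sessions: dict[str, SessionData] = {}  # in-memory store keyed by client_id

def get_or_create_session(client_id: str) -> SessionData:
    if client_id not in sessions:  # first connection → new session + welcome message
        sessions[client_id] = SessionData(client_id=client_id)
    return sessions[client_id]     # reconnection: same object, history intact
```
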
## Turn Detection System

### How It Works

The server uses an ONNX-based turn detection model to decide when a user's utterance is complete:

1. **Input Buffering**: User input is buffered while the AI response is playing back
2. **Turn Analysis**: The model analyzes the current input together with the buffered input
3. **Decision Making**: The utterance is judged either complete or in need of more input
4. **Timeout Handling**: Buffered input is processed after a period of silence

### Turn Detection Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| Model | `livekit/turn-detector` | Pre-trained turn detection model |
| Threshold | `0.0009` | Probability threshold for completion |
| Max History | 6 turns | Maximum conversation history used for analysis |
| Max Tokens | 128 | Maximum tokens for model input |
### Turn Detection Flow

```
User Input → Buffer Check → Turn Detection → Complete? → Process/Continue
     ↓             ↓               ↓             ↓              ↓
AI Speaking?  Add to Buffer   Model Predict     Yes        Send to AI
     ↓             ↓               ↓             ↓              ↓
    No       Schedule Timeout  < Threshold      No        Schedule Timeout
```
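
In code, the decision step reduces to a threshold comparison against the model's end-of-utterance probability. The sketch below is hypothetical: the ONNX inference is injected as `predict_eou_probability` because the server's actual model wrapper is not documented, while the threshold and history limit come from the parameters table above.

```python
# Hypothetical sketch of the completion decision. The ONNX inference is passed
# in as a callable; only the threshold and history limit are taken from the docs.
from typing import Callable

UNLIKELY_THRESHOLD = 0.0009   # probability threshold from the table above
MAX_HISTORY_TURNS = 6         # most recent turns given to the model

def is_turn_complete(
    conversation: list[dict],
    current_text: str,
    predict_eou_probability: Callable[[list[dict]], float],  # livekit/turn-detector inference
) -> bool:
    """Return True when the model judges the user's utterance complete."""
    context = conversation[-MAX_HISTORY_TURNS:] + [{"role": "user", "content": current_text}]
    return predict_eou_probability(context) >= UNLIKELY_THRESHOLD
```
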
## Timeout and Buffering System

### Buffering During AI Response

While the AI is generating or "speaking" a response:

- User input is buffered in `incomplete_sentences`
- A new timeout task is scheduled for each buffered input
- The timeout waits for AI playback to complete before processing

### Silence Timeout

After the AI response completes:

- The server waits `MAX_RESPONSE_TIMEOUT` seconds (default: 5 s)
- If no new input arrives, the buffered input is processed
- Completion is forced once `MAX_INCOMPLETE_SENTENCES` is reached (sketched below, after the configuration snippet)

### Timeout Configuration

```python
# Wait for AI playback to finish
remaining_playtime = session.ai_response_playback_ends_at - time.time()
if remaining_playtime > 0:
    await asyncio.sleep(remaining_playtime)

# Wait for user silence
await asyncio.sleep(MAX_RESPONSE_TIMEOUT)  # 5 seconds by default
```
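
Putting the playback wait, the silence window, and the forced-completion limit together, the per-session timeout task could be arranged roughly as follows. This is a hedged sketch: `silence_timeout` and `process_buffered_input` are assumed names, and the session fields are the ones listed under Session Management.

```python
# One plausible arrangement of the timeout task (assumed names, not the
# server's actual identifiers). process_buffered_input stands in for the
# handler that forwards buffered text to FastGPT.
import asyncio
import time

MAX_RESPONSE_TIMEOUT = 5       # seconds of silence, from the environment settings
MAX_INCOMPLETE_SENTENCES = 3   # forced-completion limit

async def silence_timeout(session, process_buffered_input) -> None:
    try:
        # 1. Let any in-flight AI playback finish first.
        if session.ai_response_playback_ends_at:
            remaining = session.ai_response_playback_ends_at - time.time()
            if remaining > 0:
                await asyncio.sleep(remaining)

        # 2. Wait out the silence window, unless the buffer is already full.
        if len(session.incomplete_sentences) < MAX_INCOMPLETE_SENTENCES:
            await asyncio.sleep(MAX_RESPONSE_TIMEOUT)

        # 3. Hand the buffered input to the AI pipeline.
        await process_buffered_input(session)
    except asyncio.CancelledError:
        # Expected when new user input arrives and the task is rescheduled.
        raise
```
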
## Error Handling

### Connection Errors

- **Missing clientId**: Connection rejected with close code 1008 (see the sketch below)
- **Invalid JSON**: Error message sent to client
- **Unknown message type**: Error message sent to client
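
A minimal sketch of the missing-`clientId` rejection, assuming the `websockets` library's legacy handler signature; close code 1008 is the standard "policy violation" code.

```python
# Minimal sketch of rejecting a connection without clientId (assumed handler
# shape, using the websockets library's legacy async handler signature).
from urllib.parse import parse_qs, urlparse

async def handler(websocket, path):
    params = parse_qs(urlparse(path).query)
    client_id = params.get("clientId", [None])[0]
    if not client_id:
        # 1008 = policy violation: the required clientId parameter is missing.
        await websocket.close(code=1008, reason="clientId query parameter required")
        return
    # ... continue with normal session setup and the message loop ...
```
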
### API Errors

### API Errors

- **FastGPT API failures**: Error logged, user message reverted
- **Network errors**: Comprehensive error logging with context
- **Model errors**: Graceful degradation with error reporting

### Timeout Errors

- **Task cancellation**: Normal during new input arrival
- **Exception handling**: Errors logged, timeout task cleared
## Logging System

### Log Levels

The server uses a comprehensive logging system with colored output:

- ℹ️ **INFO** (green): General information
- 🐛 **DEBUG** (cyan): Detailed debugging information
- ⚠️ **WARNING** (yellow): Warning messages
- ❌ **ERROR** (red): Error messages
- ⏱️ **TIMEOUT** (blue): Timeout-related events
- 💬 **USER_INPUT** (purple): User input processing
- 🤖 **AI_RESPONSE** (blue): AI response generation
- 🔗 **SESSION** (bold): Session management events

### Log Format

```
2024-01-15 14:30:25.123 [LEVEL] 🎯 (client_id): Message | key=value | key2=value2
```
### Example Logs

```
2024-01-15 14:30:25.123 [SESSION] 🔗 (user123): NEW SESSION: Creating session | total_sessions_before=0
2024-01-15 14:30:25.456 [USER_INPUT] 💬 (user123): AI speaking. Buffering: 'Hello' | current_buffer_size=1
2024-01-15 14:30:26.789 [AI_RESPONSE] 🤖 (user123): Response sent: 'Hello! How can I help you?' | tts_duration=2.5s | playback_ends_at=1642248629.289
```
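
The logger implementation itself is not documented; a hypothetical helper that renders lines in the format shown above could look like this.

```python
# Hypothetical formatter reproducing the log line format shown above; the
# server's real logger is not documented here.
from datetime import datetime

def format_log(level: str, emoji: str, client_id: str, message: str, **fields) -> str:
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # millisecond precision
    line = f"{timestamp} [{level}] {emoji} ({client_id}): {message}"
    if fields:
        line += " | " + " | ".join(f"{k}={v}" for k, v in fields.items())
    return line

# format_log("SESSION", "🔗", "user123", "NEW SESSION: Creating session",
#            total_sessions_before=0)
```
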
## Client Implementation Examples

### JavaScript/Web Client

```javascript
class ChatClient {
  constructor(clientId, serverUrl = 'ws://localhost:9000') {
    this.clientId = clientId;
    this.serverUrl = `${serverUrl}?clientId=${clientId}`;
    this.ws = null;
    this.messageHandlers = new Map();
  }

  connect() {
    this.ws = new WebSocket(this.serverUrl);

    this.ws.onopen = () => {
      console.log('Connected to chat server');
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      this.handleMessage(message);
    };

    this.ws.onclose = (event) => {
      console.log('Disconnected from chat server');
    };
  }

  sendMessage(text) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'USER_INPUT',
        payload: {
          text: text,
          client_id: this.clientId
        }
      };
      this.ws.send(JSON.stringify(message));
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'AI_RESPONSE':
        console.log('AI:', message.payload.text);
        break;
      case 'ERROR':
        console.error('Error:', message.payload.message);
        break;
      default:
        console.log('Unknown message type:', message.type);
    }
  }
}

// Usage (wait for the connection to open before sending)
const client = new ChatClient('user123');
client.connect();
client.ws.addEventListener('open', () => {
  client.sendMessage('Hello, how are you?');
});
```
### Python Client

```python
import asyncio
import json

import websockets

class ChatClient:
    def __init__(self, client_id, server_url="ws://localhost:9000"):
        self.client_id = client_id
        self.server_url = f"{server_url}?clientId={client_id}"
        self.websocket = None

    async def connect(self):
        self.websocket = await websockets.connect(self.server_url)
        print(f"Connected to chat server as {self.client_id}")

    async def send_message(self, text):
        if self.websocket:
            message = {
                "type": "USER_INPUT",
                "payload": {
                    "text": text,
                    "client_id": self.client_id
                }
            }
            await self.websocket.send(json.dumps(message))

    async def listen(self):
        async for message in self.websocket:
            data = json.loads(message)
            await self.handle_message(data)

    async def handle_message(self, message):
        if message["type"] == "AI_RESPONSE":
            print(f"AI: {message['payload']['text']}")
        elif message["type"] == "ERROR":
            print(f"Error: {message['payload']['message']}")

    async def run(self):
        await self.connect()
        await self.listen()

# Usage
async def main():
    client = ChatClient("user123")
    await client.run()

asyncio.run(main())
```
## Performance Considerations

### Scalability

- **Session Management**: Sessions are stored in memory; consider Redis for production (see the sketch below)
- **Concurrent Connections**: Limited by system resources and the WebSocket library
- **Model Loading**: The ONNX model is loaded once per server instance
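
A hedged sketch of what moving session state to Redis could look like, using `redis.asyncio` from redis-py; the key names and JSON serialization here are assumptions, not the project's actual design.

```python
# Hedged sketch: persist conversation history in Redis instead of the
# in-memory dict. Requires redis-py >= 4.2 (redis.asyncio); key names and
# serialization are assumptions.
import json

import redis.asyncio as redis

r = redis.from_url("redis://localhost:6379/0", decode_responses=True)

async def save_history(client_id: str, conversation_history: list[dict]) -> None:
    await r.set(f"session:{client_id}:history", json.dumps(conversation_history))

async def load_history(client_id: str) -> list[dict]:
    raw = await r.get(f"session:{client_id}:history")
    return json.loads(raw) if raw else []
```
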
### Optimization

- **Connection Pooling**: aiohttp sessions are reused for API calls
- **Async Processing**: All I/O operations are asynchronous
- **Memory Management**: Session data is kept in memory after disconnection, so stale sessions require manual cleanup

### Monitoring

- **Performance Logging**: Duration tracking for all operations
- **Error Tracking**: Comprehensive error logging with context
- **Session Metrics**: Active session count and client activity
## Deployment

### Prerequisites

```bash
pip install -r requirements.txt
```

### Running the Server

```bash
python main_uninterruptable2.py
```
### Production Considerations

1. **Environment Variables**: Set all required environment variables
2. **SSL/TLS**: Use WSS for secure WebSocket connections
3. **Load Balancing**: Consider multiple server instances
4. **Session Storage**: Use Redis or a database for session persistence
5. **Monitoring**: Implement health checks and metrics collection
6. **Logging**: Configure log rotation and an external logging service
### Docker Deployment

```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 9000

CMD ["python", "main_uninterruptable2.py"]
```
## Troubleshooting

### Common Issues

1. **Connection Refused**: Check that the server is running on the correct port
2. **Missing clientId**: Ensure the `clientId` parameter is provided in the URL
3. **API Errors**: Verify FastGPT API credentials and network connectivity
4. **Model Loading**: Check that the ONNX model files are accessible
### Debug Mode

Enable debug logging by adjusting the logging level in the code or via environment variables.

### Health Check

The server does not provide a built-in health check endpoint; consider implementing one for production monitoring (see the sketch below).
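
One way to do that, sketched here under the assumption that an extra HTTP listener is acceptable (aiohttp is already used for the FastGPT calls); the `/health` path and port 8080 are arbitrary choices, not part of the project.

```python
# Hedged sketch of a standalone HTTP health endpoint served next to the
# WebSocket server; /health and port 8080 are assumptions.
from aiohttp import web

async def health(request: web.Request) -> web.Response:
    return web.json_response({"status": "ok"})

async def start_health_server() -> None:
    app = web.Application()
    app.router.add_get("/health", health)
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "0.0.0.0", 8080).start()
```
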
## API Reference

### WebSocket Events

| Event | Direction | Description |
|-------|-----------|-------------|
| `open` | Client | Connection established |
| `message` | Bidirectional | Message exchange |
| `close` | Client | Connection closed |
| `error` | Client | Connection error |

### Message Types Summary

| Type | Direction | Description |
|------|-----------|-------------|
| `USER_INPUT` | Client → Server | Send user message |
| `AI_RESPONSE` | Server → Client | Receive AI response |
| `ERROR` | Server → Client | Error notification |
## License

This WebSocket server is part of the turn detection request project. Please refer to the main project license for usage terms.