WebSocket Chat Server Documentation
Overview
The WebSocket Chat Server is an intelligent conversation system that provides real-time AI chat capabilities with advanced turn detection, session management, and client-aware interactions. The server supports multiple concurrent clients with individual session tracking and automatic turn completion detection.
Features
- 🔄 Real-time WebSocket communication
- 🧠 AI-powered responses via FastGPT API
- 🎯 Intelligent turn detection using ONNX models
- 📱 Multi-client support with session isolation
- ⏱️ Automatic timeout handling with buffering
- 🔗 Session persistence across reconnections
- 🎨 Professional logging with client tracking
- 🌐 Welcome message system for new sessions
Server Configuration
Environment Variables
Create a .env file in the project root:
# Turn Detection Settings
MAX_INCOMPLETE_SENTENCES=3
MAX_RESPONSE_TIMEOUT=5
# FastGPT API Configuration
CHAT_MODEL_API_URL=http://101.89.151.141:3000/
CHAT_MODEL_API_KEY=your_fastgpt_api_key_here
CHAT_MODEL_APP_ID=your_fastgpt_app_id_here
Default Values
| Variable | Default | Description |
|---|---|---|
| MAX_INCOMPLETE_SENTENCES | 3 | Maximum buffered sentences before forcing completion |
| MAX_RESPONSE_TIMEOUT | 5 | Seconds of silence before processing buffered input |
| CHAT_MODEL_API_URL | None | FastGPT API endpoint URL |
| CHAT_MODEL_API_KEY | None | FastGPT API authentication key |
| CHAT_MODEL_APP_ID | None | FastGPT application ID |
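Loading these settings at startup can be sketched with plain `os.environ` lookups, falling back to the documented defaults (a minimal sketch; the actual server may use a dotenv loader instead):

```python
import os

# Turn detection settings, falling back to the documented defaults
MAX_INCOMPLETE_SENTENCES = int(os.environ.get("MAX_INCOMPLETE_SENTENCES", "3"))
MAX_RESPONSE_TIMEOUT = float(os.environ.get("MAX_RESPONSE_TIMEOUT", "5"))

# FastGPT settings have no defaults; missing values stay None
CHAT_MODEL_API_URL = os.environ.get("CHAT_MODEL_API_URL")
CHAT_MODEL_API_KEY = os.environ.get("CHAT_MODEL_API_KEY")
CHAT_MODEL_APP_ID = os.environ.get("CHAT_MODEL_APP_ID")
```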
WebSocket Connection
Connection URL Format
ws://localhost:9000?clientId=YOUR_CLIENT_ID
Connection Parameters
| Parameter | Required | Description |
|---|---|---|
| clientId | Yes | Unique identifier for the client session |
Connection Example
const ws = new WebSocket('ws://localhost:9000?clientId=user123');
Message Protocol
Message Format
All messages use JSON format with the following structure:
{
  "type": "MESSAGE_TYPE",
  "payload": {
    // Message-specific data
  }
}
Client to Server Messages
USER_INPUT
Sends user text input to the server.
{
  "type": "USER_INPUT",
  "payload": {
    "text": "Hello, how are you?",
    "client_id": "user123"  // Optional, will use URL clientId if not provided
  }
}
Fields:
- text (string, required): The user's input text
- client_id (string, optional): Client identifier (overrides URL parameter)
Server to Client Messages
AI_RESPONSE
AI-generated response to user input.
{
  "type": "AI_RESPONSE",
  "payload": {
    "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
    "client_id": "user123",
    "estimated_tts_duration": 3.2
  }
}
Fields:
- text (string): AI response content
- client_id (string): Client identifier
- estimated_tts_duration (float): Estimated text-to-speech duration in seconds
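How the server computes estimated_tts_duration is not specified here; one plausible heuristic is a words-per-second estimate. The rate constant below is an assumption for illustration, not the server's actual value:

```python
WORDS_PER_SECOND = 2.5  # assumed average speaking rate, not taken from the server

def estimate_tts_duration(text: str) -> float:
    """Rough text-to-speech duration estimate in seconds."""
    return round(len(text.split()) / WORDS_PER_SECOND, 1)
```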
ERROR
Error notification from server.
{
  "type": "ERROR",
  "payload": {
    "message": "Error description",
    "client_id": "user123"
  }
}
Fields:
- message (string): Error description
- client_id (string): Client identifier
Session Management
Session Lifecycle
- Connection: Client connects with a unique clientId
- Session Creation: New session created or existing session reused
- Welcome Message: New sessions receive a welcome message automatically
- Interaction: Real-time message exchange
- Disconnection: Session data preserved for reconnection
Session Data Structure
class SessionData:
    client_id: str                                 # Unique client identifier
    incomplete_sentences: List[str]                # Buffered user input
    conversation_history: List[ChatMessage]        # Full conversation history
    last_input_time: float                         # Timestamp of last user input
    timeout_task: Optional[Task]                   # Current timeout task
    ai_response_playback_ends_at: Optional[float]  # AI response playback end time
Session Persistence
- Sessions persist across WebSocket disconnections
- Reconnection with the same clientId resumes the existing session
- Conversation history maintained throughout session lifetime
- Timeout tasks properly managed during reconnections
Turn Detection System
How It Works
The server uses an ONNX-based turn detection model to determine when a user's utterance is complete:
- Input Buffering: User input buffered during AI response playback
- Turn Analysis: Model analyzes current + buffered input for completion
- Decision Making: Determines if utterance is complete or needs more input
- Timeout Handling: Processes buffered input after silence period
Turn Detection Parameters
| Parameter | Value | Description |
|---|---|---|
| Model | livekit/turn-detector | Pre-trained turn detection model |
| Threshold | 0.0009 | Probability threshold for completion |
| Max History | 6 turns | Maximum conversation history for analysis |
| Max Tokens | 128 | Maximum tokens for model input |
Turn Detection Flow
User Input → Buffer Check → Turn Detection → Complete? → Process/Continue
↓ ↓ ↓ ↓ ↓
AI Speaking? Add to Buffer Model Predict Yes Send to AI
↓ ↓ ↓ ↓ ↓
No Schedule Timeout < Threshold No Schedule Timeout
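The decision step in the flow above can be sketched as a threshold check. `predict_eou` below is a stand-in for the ONNX model call, not the server's actual function:

```python
EOU_THRESHOLD = 0.0009   # completion threshold from the parameters table
MAX_HISTORY_TURNS = 6    # maximum conversation history for analysis

def is_turn_complete(history, buffered, current, predict_eou) -> bool:
    """Decide whether the user's utterance is complete.

    `predict_eou` stands in for the ONNX model: it takes the recent
    conversation turns plus the combined pending text and returns an
    end-of-utterance probability.
    """
    pending = " ".join(buffered + [current])
    recent = history[-MAX_HISTORY_TURNS:]
    return predict_eou(recent, pending) >= EOU_THRESHOLD
```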
Timeout and Buffering System
Buffering During AI Response
When the AI is generating or "speaking" a response:
- User input is buffered in incomplete_sentences
- A new timeout task is scheduled for each buffered input
- The timeout waits for AI playback to complete before processing
Silence Timeout
After AI response completes:
- Server waits MAX_RESPONSE_TIMEOUT seconds (default: 5)
- If no new input is received, processes buffered input
- Forces completion if MAX_INCOMPLETE_SENTENCES is reached
Timeout Configuration
# Wait for AI playback to finish (never sleep a negative duration)
remaining_playtime = session.ai_response_playback_ends_at - current_time
await asyncio.sleep(max(0, remaining_playtime))
# Wait for user silence before processing buffered input
await asyncio.sleep(MAX_RESPONSE_TIMEOUT)  # 5 seconds by default
Error Handling
Connection Errors
- Missing clientId: Connection rejected with code 1008
- Invalid JSON: Error message sent to client
- Unknown message type: Error message sent to client
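The missing-clientId rejection can be sketched as follows, assuming the `websockets` library's handler signature with a `path` argument (newer versions expose the path on the connection object instead):

```python
from urllib.parse import parse_qs, urlparse

def extract_client_id(path: str):
    """Pull clientId from the connection path, e.g. '/?clientId=user123'."""
    values = parse_qs(urlparse(path).query).get("clientId")
    return values[0] if values else None

async def handler(websocket, path):
    client_id = extract_client_id(path)
    if client_id is None:
        # 1008 = policy violation: reject connections without a clientId
        await websocket.close(code=1008, reason="Missing clientId")
        return
    # ... proceed with session setup and the message loop ...
```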
API Errors
- FastGPT API failures: Error logged, user message reverted
- Network errors: Comprehensive error logging with context
- Model errors: Graceful degradation with error reporting
Timeout Errors
- Task cancellation: Normal during new input arrival
- Exception handling: Errors logged, timeout task cleared
Logging System
Log Levels
The server uses a comprehensive logging system with colored output:
- ℹ️ INFO (green): General information
- 🐛 DEBUG (cyan): Detailed debugging information
- ⚠️ WARNING (yellow): Warning messages
- ❌ ERROR (red): Error messages
- ⏱️ TIMEOUT (blue): Timeout-related events
- 💬 USER_INPUT (purple): User input processing
- 🤖 AI_RESPONSE (blue): AI response generation
- 🔗 SESSION (bold): Session management events
Log Format
2024-01-15 14:30:25.123 [LEVEL] 🎯 (client_id): Message | key=value | key2=value2
Example Logs
2024-01-15 14:30:25.123 [SESSION] 🔗 (user123): NEW SESSION: Creating session | total_sessions_before=0
2024-01-15 14:30:25.456 [USER_INPUT] 💬 (user123): AI speaking. Buffering: 'Hello' | current_buffer_size=1
2024-01-15 14:30:26.789 [AI_RESPONSE] 🤖 (user123): Response sent: 'Hello! How can I help you?' | tts_duration=2.5s | playback_ends_at=1642248629.289
Client Implementation Examples
JavaScript/Web Client
class ChatClient {
  constructor(clientId, serverUrl = 'ws://localhost:9000') {
    this.clientId = clientId;
    this.serverUrl = `${serverUrl}?clientId=${clientId}`;
    this.ws = null;
    this.messageHandlers = new Map();
  }

  connect() {
    this.ws = new WebSocket(this.serverUrl);

    this.ws.onopen = () => {
      console.log('Connected to chat server');
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      this.handleMessage(message);
    };

    this.ws.onclose = () => {
      console.log('Disconnected from chat server');
    };
  }

  sendMessage(text) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'USER_INPUT',
        payload: {
          text: text,
          client_id: this.clientId
        }
      };
      this.ws.send(JSON.stringify(message));
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'AI_RESPONSE':
        console.log('AI:', message.payload.text);
        break;
      case 'ERROR':
        console.error('Error:', message.payload.message);
        break;
      default:
        console.log('Unknown message type:', message.type);
    }
  }
}

// Usage
const client = new ChatClient('user123');
client.connect();
client.sendMessage('Hello, how are you?');
Python Client
import asyncio
import websockets
import json
class ChatClient:
    def __init__(self, client_id, server_url="ws://localhost:9000"):
        self.client_id = client_id
        self.server_url = f"{server_url}?clientId={client_id}"
        self.websocket = None

    async def connect(self):
        self.websocket = await websockets.connect(self.server_url)
        print(f"Connected to chat server as {self.client_id}")

    async def send_message(self, text):
        if self.websocket:
            message = {
                "type": "USER_INPUT",
                "payload": {
                    "text": text,
                    "client_id": self.client_id
                }
            }
            await self.websocket.send(json.dumps(message))

    async def listen(self):
        async for message in self.websocket:
            data = json.loads(message)
            await self.handle_message(data)

    async def handle_message(self, message):
        if message["type"] == "AI_RESPONSE":
            print(f"AI: {message['payload']['text']}")
        elif message["type"] == "ERROR":
            print(f"Error: {message['payload']['message']}")

    async def run(self):
        await self.connect()
        await self.listen()

# Usage
async def main():
    client = ChatClient("user123")
    await client.run()

asyncio.run(main())
Performance Considerations
Scalability
- Session Management: Sessions stored in memory (consider Redis for production)
- Concurrent Connections: Limited by system resources and WebSocket library
- Model Loading: ONNX model loaded once per server instance
Optimization
- Connection Pooling: aiohttp sessions reused for API calls
- Async Processing: All I/O operations are asynchronous
- Memory Management: Sessions cleaned up on disconnection (manual cleanup needed)
Monitoring
- Performance Logging: Duration tracking for all operations
- Error Tracking: Comprehensive error logging with context
- Session Metrics: Active session count and client activity
Deployment
Prerequisites
pip install -r requirements.txt
Running the Server
python main_uninterruptable2.py
Production Considerations
- Environment Variables: Set all required environment variables
- SSL/TLS: Use WSS for secure WebSocket connections
- Load Balancing: Consider multiple server instances
- Session Storage: Use Redis or database for session persistence
- Monitoring: Implement health checks and metrics collection
- Logging: Configure log rotation and external logging service
Docker Deployment
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 9000
CMD ["python", "main_uninterruptable2.py"]
Troubleshooting
Common Issues
- Connection Refused: Check if server is running on correct port
- Missing clientId: Ensure clientId parameter is provided in URL
- API Errors: Verify FastGPT API credentials and network connectivity
- Model Loading: Check if ONNX model files are accessible
Debug Mode
Enable debug logging by modifying the logging level in the code or environment variables.
Health Check
The server doesn't provide a built-in health check endpoint. Consider implementing one for production monitoring.
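If you add one, a sidecar HTTP endpoint is a common pattern. A hypothetical sketch using aiohttp (already a dependency for the FastGPT API calls); the `/health` route, port, and payload fields are illustrative, not part of the server:

```python
import time

from aiohttp import web

START_TIME = time.time()
sessions: dict = {}  # stand-in for the server's in-memory session store

def health_payload() -> dict:
    """Status document returned by the hypothetical /health endpoint."""
    return {
        "status": "ok",
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "active_sessions": len(sessions),
    }

async def health(request: web.Request) -> web.Response:
    return web.json_response(health_payload())

app = web.Application()
app.router.add_get("/health", health)
# web.run_app(app, port=9001)  # serve alongside the WebSocket port
```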
API Reference
WebSocket Events
| Event | Direction | Description |
|---|---|---|
| open | Client | Connection established |
| message | Bidirectional | Message exchange |
| close | Client | Connection closed |
| error | Client | Connection error |
Message Types Summary
| Type | Direction | Description |
|---|---|---|
| USER_INPUT | Client → Server | Send user message |
| AI_RESPONSE | Server → Client | Receive AI response |
| ERROR | Server → Client | Error notification |
License
This WebSocket server is part of the turn detection request project. Please refer to the main project license for usage terms.