# WebSocket Chat Server Documentation
## Overview
The WebSocket Chat Server is an intelligent conversation system that provides real-time AI chat capabilities with advanced turn detection, session management, and client-aware interactions. The server supports multiple concurrent clients with individual session tracking and automatic turn completion detection.
## Features
- 🔄 **Real-time WebSocket communication**
- 🧠 **AI-powered responses** via FastGPT API
- 🎯 **Intelligent turn detection** using ONNX models
- 📱 **Multi-client support** with session isolation
- ⏱️ **Automatic timeout handling** with buffering
- 🔗 **Session persistence** across reconnections
- 🎨 **Professional logging** with client tracking
- 🌐 **Welcome message system** for new sessions
## Server Configuration
### Environment Variables
Create a `.env` file in the project root:
```env
# Turn Detection Settings
MAX_INCOMPLETE_SENTENCES=3
MAX_RESPONSE_TIMEOUT=5

# FastGPT API Configuration
CHAT_MODEL_API_URL=http://101.89.151.141:3000/
CHAT_MODEL_API_KEY=your_fastgpt_api_key_here
CHAT_MODEL_APP_ID=your_fastgpt_app_id_here
```
### Default Values
| Variable | Default | Description |
|----------|---------|-------------|
| `MAX_INCOMPLETE_SENTENCES` | 3 | Maximum buffered sentences before forcing completion |
| `MAX_RESPONSE_TIMEOUT` | 5 | Seconds of silence before processing buffered input |
| `CHAT_MODEL_API_URL` | None | FastGPT API endpoint URL |
| `CHAT_MODEL_API_KEY` | None | FastGPT API authentication key |
| `CHAT_MODEL_APP_ID` | None | FastGPT application ID |
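Reading these variables with the documented defaults can be sketched as follows. This is an illustrative helper, not the server's actual loader; the key names in the returned dict are made up for the example:

```python
import os

def load_config() -> dict:
    """Read the environment variables above, falling back to the
    documented defaults where one exists. The three FastGPT settings
    have no default and come back as None when unset."""
    return {
        "max_incomplete_sentences": int(os.getenv("MAX_INCOMPLETE_SENTENCES", "3")),
        "max_response_timeout": float(os.getenv("MAX_RESPONSE_TIMEOUT", "5")),
        "chat_model_api_url": os.getenv("CHAT_MODEL_API_URL"),
        "chat_model_api_key": os.getenv("CHAT_MODEL_API_KEY"),
        "chat_model_app_id": os.getenv("CHAT_MODEL_APP_ID"),
    }
```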
## WebSocket Connection
### Connection URL Format
```
ws://localhost:9000?clientId=YOUR_CLIENT_ID
```
### Connection Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `clientId` | Yes | Unique identifier for the client session |
### Connection Example
```javascript
const ws = new WebSocket('ws://localhost:9000?clientId=user123');
```
## Message Protocol
### Message Format
All messages use JSON format with the following structure:
```json
{
  "type": "MESSAGE_TYPE",
  "payload": {
    // Message-specific data
  }
}
```
### Client to Server Messages
#### USER_INPUT
Sends user text input to the server.
```json
{
  "type": "USER_INPUT",
  "payload": {
    "text": "Hello, how are you?",
    "client_id": "user123" // Optional, will use URL clientId if not provided
  }
}
```
**Fields:**
- `text` (string, required): The user's input text
- `client_id` (string, optional): Client identifier (overrides URL parameter)
### Server to Client Messages
#### AI_RESPONSE
AI-generated response to user input.
```json
{
  "type": "AI_RESPONSE",
  "payload": {
    "text": "Hello! I'm doing well, thank you for asking. How can I help you today?",
    "client_id": "user123",
    "estimated_tts_duration": 3.2
  }
}
```
**Fields:**
- `text` (string): AI response content
- `client_id` (string): Client identifier
- `estimated_tts_duration` (float): Estimated text-to-speech duration in seconds
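The duration estimate lets a client know roughly when audio playback will end. The server's actual formula is not documented here; a plausible character-rate heuristic, purely for illustration (the rate of 15 characters per second is an assumption):

```python
def estimate_tts_duration(text: str, chars_per_second: float = 15.0) -> float:
    """Rough TTS playback estimate in seconds, rounded to one decimal.
    Hypothetical heuristic -- the real server may use actual TTS metadata."""
    return round(len(text) / chars_per_second, 1)
```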
#### ERROR
Error notification from server.
```json
{
  "type": "ERROR",
  "payload": {
    "message": "Error description",
    "client_id": "user123"
  }
}
```
**Fields:**
- `message` (string): Error description
- `client_id` (string): Client identifier
## Session Management
### Session Lifecycle
1. **Connection**: Client connects with unique `clientId`
2. **Session Creation**: New session created or existing session reused
3. **Welcome Message**: New sessions receive welcome message automatically
4. **Interaction**: Real-time message exchange
5. **Disconnection**: Session data preserved for reconnection
### Session Data Structure
```python
class SessionData:
    client_id: str                                 # Unique client identifier
    incomplete_sentences: List[str]                # Buffered user input
    conversation_history: List[ChatMessage]        # Full conversation history
    last_input_time: float                         # Timestamp of last user input
    timeout_task: Optional[Task]                   # Current timeout task
    ai_response_playback_ends_at: Optional[float]  # AI response end time
```
### Session Persistence
- Sessions persist across WebSocket disconnections
- Reconnection with same `clientId` resumes existing session
- Conversation history maintained throughout session lifetime
- Timeout tasks properly managed during reconnections
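The reuse-on-reconnect behaviour boils down to a lookup-or-create against the in-memory store. A minimal sketch, assuming a plain dict keyed by `clientId` (the names and the trimmed-down dataclass are illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class SessionData:
    client_id: str
    incomplete_sentences: list = field(default_factory=list)
    conversation_history: list = field(default_factory=list)
    last_input_time: float = 0.0

# In-memory session store, as described above (Redis would replace this
# in production).
SESSIONS: dict = {}

def get_or_create_session(client_id: str) -> SessionData:
    """Reuse the existing session on reconnect, otherwise create one."""
    session = SESSIONS.get(client_id)
    if session is None:
        session = SessionData(client_id=client_id, last_input_time=time.time())
        SESSIONS[client_id] = session
    return session
```

Because the same object is returned on reconnection, conversation history survives the WebSocket dropping.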
## Turn Detection System
### How It Works
The server uses an ONNX-based turn detection model to determine when a user's utterance is complete:
1. **Input Buffering**: User input buffered during AI response playback
2. **Turn Analysis**: Model analyzes current + buffered input for completion
3. **Decision Making**: Determines if utterance is complete or needs more input
4. **Timeout Handling**: Processes buffered input after silence period
### Turn Detection Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| Model | `livekit/turn-detector` | Pre-trained turn detection model |
| Threshold | `0.0009` | Probability threshold for completion |
| Max History | 6 turns | Maximum conversation history for analysis |
| Max Tokens | 128 | Maximum tokens for model input |
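The decision itself is a threshold test on the model's end-of-utterance probability, applied to the current input joined with anything already buffered. A sketch of that interpretation step only (the actual ONNX inference is omitted; function names are illustrative):

```python
# Threshold from the table above.
END_OF_TURN_THRESHOLD = 0.0009

def is_turn_complete(eou_probability: float,
                     threshold: float = END_OF_TURN_THRESHOLD) -> bool:
    """Interpret the model's end-of-utterance probability:
    at or above the threshold means the turn is complete."""
    return eou_probability >= threshold

def text_for_analysis(buffered: list, current: str) -> str:
    """The model sees buffered fragments plus the current input."""
    return " ".join(buffered + [current])
```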
### Turn Detection Flow
```
User Input
   │
   ├─ AI speaking? ── yes ──> add to buffer, schedule timeout
   │
   └─ no ──> run turn detection model
               │
               ├─ probability ≥ threshold ──> complete: send to AI
               └─ probability < threshold ──> incomplete: schedule timeout
```
## Timeout and Buffering System
### Buffering During AI Response
When the AI is generating or "speaking" a response:
- User input is buffered in `incomplete_sentences`
- New timeout task scheduled for each buffered input
- Timeout waits for AI playback to complete before processing
### Silence Timeout
After AI response completes:
- Server waits `MAX_RESPONSE_TIMEOUT` seconds (default: 5s)
- If no new input received, processes buffered input
- Forces completion if `MAX_INCOMPLETE_SENTENCES` reached
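The rules above reduce to a small decision function for each incoming input. This is a simplified sketch of the control flow, not the server's code; the action names are made up:

```python
def next_action(ai_speaking: bool, turn_complete: bool, buffer_size: int,
                max_incomplete: int = 3) -> str:
    """Decide what to do with new user input, per the buffering rules:
    - while the AI is speaking, always buffer;
    - otherwise process immediately if the turn is complete, or if this
      input would push the buffer to MAX_INCOMPLETE_SENTENCES;
    - otherwise buffer and wait out the silence timeout."""
    if ai_speaking:
        return "buffer_and_schedule_timeout"
    if turn_complete or buffer_size + 1 >= max_incomplete:
        return "process_now"
    return "buffer_and_schedule_timeout"
```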
### Timeout Configuration
```python
# Inside the timeout task (simplified):
# 1) wait for any remaining AI playback to finish
remaining_playtime = session.ai_response_playback_ends_at - current_time
if remaining_playtime > 0:
    await asyncio.sleep(remaining_playtime)

# 2) then wait for user silence before processing the buffer
await asyncio.sleep(MAX_RESPONSE_TIMEOUT)  # 5 seconds by default
```
## Error Handling
### Connection Errors
- **Missing clientId**: Connection rejected with code 1008
- **Invalid JSON**: Error message sent to client
- **Unknown message type**: Error message sent to client
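The missing-`clientId` check amounts to parsing the connection path's query string; close code 1008 is the standard WebSocket "policy violation" code. A stdlib-only sketch of the parsing side (names are illustrative):

```python
from urllib.parse import urlparse, parse_qs

# WebSocket close code used when clientId is absent ("policy violation").
POLICY_VIOLATION = 1008

def extract_client_id(path: str):
    """Pull clientId from a connection path like '/?clientId=user123'.
    Returns None when the parameter is missing, in which case the
    handler would close the socket with code 1008."""
    query = parse_qs(urlparse(path).query)
    values = query.get("clientId")
    return values[0] if values else None
```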
### API Errors
- **FastGPT API failures**: Error logged, user message reverted
- **Network errors**: Comprehensive error logging with context
- **Model errors**: Graceful degradation with error reporting
### Timeout Errors
- **Task cancellation**: Normal during new input arrival
- **Exception handling**: Errors logged, timeout task cleared
## Logging System
### Log Levels
The server uses a comprehensive logging system with colored output:
- ℹ️ **INFO** (green): General information
- 🐛 **DEBUG** (cyan): Detailed debugging information
- ⚠️ **WARNING** (yellow): Warning messages
- ❌ **ERROR** (red): Error messages
- ⏱️ **TIMEOUT** (blue): Timeout-related events
- 💬 **USER_INPUT** (purple): User input processing
- 🤖 **AI_RESPONSE** (blue): AI response generation
- 🔗 **SESSION** (bold): Session management events
### Log Format
```
2024-01-15 14:30:25.123 [LEVEL] 🎯 (client_id): Message | key=value | key2=value2
```
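A formatter producing this layout can be sketched in a few lines; this is illustrative, not the server's logger, and ANSI colour codes plus most of the icon map are omitted:

```python
from datetime import datetime

# Abbreviated icon map; the server defines one icon per level.
LEVEL_ICONS = {"SESSION": "🔗", "USER_INPUT": "💬", "AI_RESPONSE": "🤖"}

def format_log(level: str, client_id: str, message: str, **fields) -> str:
    """Render one line in the documented format:
    timestamp [LEVEL] icon (client_id): Message | key=value | ..."""
    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    icon = LEVEL_ICONS.get(level, "")
    extras = " | ".join(f"{k}={v}" for k, v in fields.items())
    line = f"{ts} [{level}] {icon} ({client_id}): {message}"
    return f"{line} | {extras}" if extras else line
```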
### Example Logs
```
2024-01-15 14:30:25.123 [SESSION] 🔗 (user123): NEW SESSION: Creating session | total_sessions_before=0
2024-01-15 14:30:25.456 [USER_INPUT] 💬 (user123): AI speaking. Buffering: 'Hello' | current_buffer_size=1
2024-01-15 14:30:26.789 [AI_RESPONSE] 🤖 (user123): Response sent: 'Hello! How can I help you?' | tts_duration=2.5s | playback_ends_at=1642248629.289
```
## Client Implementation Examples
### JavaScript/Web Client
```javascript
class ChatClient {
  constructor(clientId, serverUrl = 'ws://localhost:9000') {
    this.clientId = clientId;
    this.serverUrl = `${serverUrl}?clientId=${clientId}`;
    this.ws = null;
    this.messageHandlers = new Map();
  }

  connect() {
    this.ws = new WebSocket(this.serverUrl);
    this.ws.onopen = () => {
      console.log('Connected to chat server');
    };
    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      this.handleMessage(message);
    };
    this.ws.onclose = () => {
      console.log('Disconnected from chat server');
    };
  }

  sendMessage(text) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'USER_INPUT',
        payload: {
          text: text,
          client_id: this.clientId
        }
      };
      this.ws.send(JSON.stringify(message));
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'AI_RESPONSE':
        console.log('AI:', message.payload.text);
        break;
      case 'ERROR':
        console.error('Error:', message.payload.message);
        break;
      default:
        console.log('Unknown message type:', message.type);
    }
  }
}

// Usage
const client = new ChatClient('user123');
client.connect();
client.sendMessage('Hello, how are you?');
```
### Python Client
```python
import asyncio
import json

import websockets


class ChatClient:
    def __init__(self, client_id, server_url="ws://localhost:9000"):
        self.client_id = client_id
        self.server_url = f"{server_url}?clientId={client_id}"
        self.websocket = None

    async def connect(self):
        self.websocket = await websockets.connect(self.server_url)
        print(f"Connected to chat server as {self.client_id}")

    async def send_message(self, text):
        if self.websocket:
            message = {
                "type": "USER_INPUT",
                "payload": {
                    "text": text,
                    "client_id": self.client_id
                }
            }
            await self.websocket.send(json.dumps(message))

    async def listen(self):
        async for message in self.websocket:
            data = json.loads(message)
            await self.handle_message(data)

    async def handle_message(self, message):
        if message["type"] == "AI_RESPONSE":
            print(f"AI: {message['payload']['text']}")
        elif message["type"] == "ERROR":
            print(f"Error: {message['payload']['message']}")

    async def run(self):
        await self.connect()
        await self.listen()


# Usage
async def main():
    client = ChatClient("user123")
    await client.run()

asyncio.run(main())
```
## Performance Considerations
### Scalability
- **Session Management**: Sessions stored in memory (consider Redis for production)
- **Concurrent Connections**: Limited by system resources and WebSocket library
- **Model Loading**: ONNX model loaded once per server instance
### Optimization
- **Connection Pooling**: aiohttp sessions reused for API calls
- **Async Processing**: All I/O operations are asynchronous
- **Memory Management**: Session data is intentionally retained after disconnection to support reconnection, so stale sessions require manual or scheduled cleanup
### Monitoring
- **Performance Logging**: Duration tracking for all operations
- **Error Tracking**: Comprehensive error logging with context
- **Session Metrics**: Active session count and client activity
## Deployment
### Prerequisites
```bash
pip install -r requirements.txt
```
### Running the Server
```bash
python main_uninterruptable2.py
```
### Production Considerations
1. **Environment Variables**: Set all required environment variables
2. **SSL/TLS**: Use WSS for secure WebSocket connections
3. **Load Balancing**: Consider multiple server instances
4. **Session Storage**: Use Redis or database for session persistence
5. **Monitoring**: Implement health checks and metrics collection
6. **Logging**: Configure log rotation and external logging service
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 9000
CMD ["python", "main_uninterruptable2.py"]
```
## Troubleshooting
### Common Issues
1. **Connection Refused**: Check if server is running on correct port
2. **Missing clientId**: Ensure clientId parameter is provided in URL
3. **API Errors**: Verify FastGPT API credentials and network connectivity
4. **Model Loading**: Check if ONNX model files are accessible
### Debug Mode
Enable debug logging by modifying the logging level in the code or environment variables.
### Health Check
The server doesn't provide a built-in health check endpoint. Consider implementing one for production monitoring.
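One option is a small stdlib-only HTTP endpoint running alongside the WebSocket server. A sketch under those assumptions; the port, path, and the `ACTIVE_SESSIONS` reference are illustrative:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical handle on the server's in-memory session store.
ACTIVE_SESSIONS: dict = {}

def health_payload() -> bytes:
    """JSON body returned by the health endpoint."""
    return json.dumps(
        {"status": "ok", "active_sessions": len(ACTIVE_SESSIONS)}
    ).encode()

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = health_payload()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep health probes out of the chat logs

def start_health_server(port: int = 9001) -> HTTPServer:
    """Serve /healthz on a daemon thread next to the WebSocket server."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```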
## API Reference
### WebSocket Events
| Event | Direction | Description |
|-------|-----------|-------------|
| `open` | Client | Connection established |
| `message` | Bidirectional | Message exchange |
| `close` | Client | Connection closed |
| `error` | Client | Connection error |
### Message Types Summary
| Type | Direction | Description |
|------|-----------|-------------|
| `USER_INPUT` | Client → Server | Send user message |
| `AI_RESPONSE` | Server → Client | Receive AI response |
| `ERROR` | Server → Client | Error notification |
## License
This WebSocket server is part of the turn detection request project. Please refer to the main project license for usage terms.