Init Projecto
.claude/CLAUDE.md
Normal file
@@ -0,0 +1,344 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Python implementation of `active-call` (originally Rust), a high-performance Voice AI Gateway that bridges telephony protocols (WebSocket, WebRTC) with AI pipelines (LLM, ASR, TTS). The system follows a **decoupled architecture** where the Media Gateway (this service) handles low-level audio/signaling, while Business Logic (AI Agent) controls it via WebSocket API.

**Technology Stack:**
- Python 3.11+ with `asyncio` for all I/O
- FastAPI + Uvicorn for WebSocket/WebRTC endpoints
- aiortc for WebRTC media transport (optional dependency)
- Silero VAD for voice activity detection (optional dependency)
- Pydantic for protocol validation
- Loguru for structured logging

## Common Development Commands

### Running the Server

```bash
# Start development server (with auto-reload)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Start with specific host/port (the uvicorn CLI does not read plain HOST/PORT variables)
uvicorn app.main:app --host 0.0.0.0 --port 8080

# Using Docker
docker-compose up --build
```

### Testing

```bash
# Run WebSocket test client (sine wave generation)
python scripts/test_websocket.py --url ws://localhost:8000/ws --sine

# Run WebSocket test client (with audio file)
python scripts/test_websocket.py --url ws://localhost:8000/ws --file test_audio.wav

# Run WebRTC test client
python scripts/test_webrtc.py --url ws://localhost:8000/webrtc

# Run unit tests
pytest tests/ -v --cov=app --cov=core

# Run specific test file
pytest tests/test_session.py -v

# Run with coverage report
pytest tests/ --cov=app --cov=core --cov-report=html
```

### Code Quality

```bash
# Format code
black app/ core/ models/ processors/ utils/ scripts/

# Lint code
ruff check app/ core/ models/ processors/ utils/ scripts/

# Type checking
mypy app/ core/
```

### Dependency Management

```bash
# Install all dependencies
pip install -r requirements.txt

# Install development dependencies
pip install -r requirements-dev.txt

# Update dependencies
pip install --upgrade -r requirements.txt
```

## Architecture Overview

### Decoupled Design Pattern

The system implements a **decoupled architecture** separating concerns:

1. **Media Gateway Layer** (`app/`, `core/`, `processors/`)
   - Handles low-level audio transport (WebSocket, WebRTC)
   - Manages session lifecycle and state
   - Processes audio through pipeline (VAD, resampling)
   - Emits events (speaking, silence, error) to control layer

2. **Business Logic Layer** (External AI Agent)
   - Connects via WebSocket
   - Receives real-time events (speech detection, ASR transcripts)
   - Sends commands (tts, play, interrupt, hangup)

### Key Architecture Components

**Transport Abstraction** ([`core/transports.py`](core/transports.py)):
- `BaseTransport` - Abstract interface with `send_event()` and `send_audio()`
- `SocketTransport` - WebSocket with mixed text/binary frames, uses `asyncio.Lock` to prevent frame interleaving
- `WebRtcTransport` - WebSocket signaling + aiortc RTCPeerConnection for media

**Session Management** ([`core/session.py`](core/session.py)):
- Each WebSocket/WebRTC connection creates a `Session` with unique UUID
- Routes incoming JSON commands to handlers via `parse_command()` from `models/commands.py`
- Routes binary audio data to `AudioPipeline`
- Manages session state: created → invited → accepted → ringing → hungup
- Cleanup on disconnect

**Audio Pipeline** ([`core/pipeline.py`](core/pipeline.py)):
- Processes audio through VAD (Voice Activity Detection)
- Emits events to global event bus when VAD state changes
- Supports interruption for barge-in scenarios

**Event Bus** ([`core/events.py`](core/events.py)):
- Global pub/sub system for inter-component communication
- Subscribe to specific event types (speaking, silence, error)
- Async notification to all subscribers

### Protocol Compatibility

The implementation **must maintain protocol compatibility** with the original Rust API. All commands and events are strictly defined in:

- [`models/commands.py`](models/commands.py) - Command models (invite, accept, reject, tts, play, interrupt, hangup, chat)
- [`models/events.py`](models/events.py) - Event models (answer, speaking, silence, trackStart, trackEnd, error)
- [`models/config.py`](models/config.py) - Configuration models (CallOption, VADOption, TTSOption, ASROption, etc.)

**Important:** Always use `parse_command()` from `models/commands.py` to parse incoming JSON - never manually parse command strings. This ensures type safety and validation.
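
To illustrate why a single parsing entry point helps, here is a minimal sketch of a registry-based parser. It uses standard-library dataclasses as a stand-in for the Pydantic models so that it runs without dependencies. The wire field `command`, the model fields `text` and `reason`, the two classes, and this `parse_command()` body are all assumptions for illustration; they are not the contents of `models/commands.py`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TTSCommand:
    text: str  # hypothetical field


@dataclass
class HangupCommand:
    reason: str = ""  # hypothetical field


# Hypothetical registry mapping the wire name to a model class.
COMMAND_TYPES = {"tts": TTSCommand, "hangup": HangupCommand}


def parse_command(raw: str):
    """Parse a JSON text frame into a typed command, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("command must be a JSON object")
    name = data.get("command")
    if not isinstance(name, str) or name not in COMMAND_TYPES:
        raise ValueError(f"unknown command: {name!r}")
    model = COMMAND_TYPES[name]
    # Keep only the fields the model declares; extra keys are ignored here.
    allowed = {f.name for f in fields(model)}
    payload = {k: v for k, v in data.items() if k in allowed}
    try:
        return model(**payload)
    except TypeError as exc:  # for example, a required field is missing
        raise ValueError(f"invalid {name} command: {exc}") from exc
```

Every malformed input surfaces as one exception type, so a caller can turn it into a single `error` event instead of handling ad hoc string checks in each handler.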

### WebSocket Protocol (`/ws` endpoint)

**Mixed Frame Handling:**
- **Text frames** → JSON commands (invite, tts, play, interrupt, hangup, etc.)
- **Binary frames** → Raw PCM audio (16kHz, 16-bit, mono)

**Flow:**
1. Client connects and sends `invite` command with codec configuration
2. Server responds with `answer` event
3. Client streams binary audio frames
4. Server processes audio and emits events (speaking, silence)
5. Client can send commands at any time (tts, play, interrupt, hangup)
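
The mixed-frame handling above can be sketched as a small dispatcher. It assumes messages shaped like the ASGI WebSocket dicts that Starlette's `WebSocket.receive()` returns (a `text` or a `bytes` key, or a `websocket.disconnect` type). `RecordingSession` is a hypothetical stand-in for the real `Session`, and the `{"command": "invite"}` payload is illustrative only.

```python
import asyncio


class RecordingSession:
    """Stand-in for the real Session; it only records what it receives."""

    def __init__(self):
        self.calls = []

    async def handle_text(self, text: str):
        self.calls.append(("text", text))

    async def handle_audio(self, pcm: bytes):
        self.calls.append(("audio", len(pcm)))


async def route_message(session, message: dict) -> bool:
    """Dispatch one message. Returns False once the peer has disconnected."""
    if message.get("type") == "websocket.disconnect":
        return False
    if message.get("text") is not None:
        await session.handle_text(message["text"])    # JSON command
    elif message.get("bytes") is not None:
        await session.handle_audio(message["bytes"])  # raw PCM audio
    return True


async def demo():
    session = RecordingSession()
    messages = [
        {"type": "websocket.receive", "text": '{"command": "invite"}'},
        {"type": "websocket.receive", "bytes": b"\x00" * 640},
        {"type": "websocket.disconnect", "code": 1000},
    ]
    for message in messages:
        if not await route_message(session, message):
            break
    return session.calls


calls = asyncio.run(demo())
```

In a real endpoint the loop would read from the socket rather than from a list, and session cleanup would run once `route_message()` returns `False`.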

### WebRTC Protocol (`/webrtc` endpoint)

**Signaling Flow:**
1. Client connects via WebSocket
2. Client sends SDP offer (JSON with `sdp` and `type` fields)
3. Server creates RTCPeerConnection and generates SDP answer
4. Server responds with `answer` event containing SDP
5. WebRTC media flows via UDP (managed by aiortc)
6. Commands can be sent via WebSocket text frames at any time

**Audio Track Handling:**
- When `pc.on("track")` fires, wrap received track with `Resampled16kTrack`
- Pull frames from track and convert to bytes
- Feed bytes to `session.handle_audio()`

### Session Lifecycle

```
1. Connection → WebSocket/WebRTC endpoint accepts
2. Session creation → New Session(uuid, transport)
3. Invite → Client sends invite command
4. Answer → Server sends answer event
5. Audio streaming → Client sends binary audio / WebRTC media
6. Commands → Client sends JSON commands (tts, play, interrupt)
7. Hangup → Client sends hangup command OR connection closes
8. Cleanup → Session cleanup, remove from active_sessions
```

### Optional Dependencies

The following dependencies are **optional** - the code gracefully degrades without them:

- **aiortc + av (PyAV)** - Required for WebRTC functionality. Without them:
  - `/webrtc` endpoint will reject connections
  - WebRTC transport cannot be used
  - WebSocket endpoint still works fine

- **onnxruntime** - Required for VAD functionality. Without it:
  - VAD always returns "Speech" with probability 1.0
  - speaking/silence events still emitted but not accurate
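
The degradation described above can be sketched with a guarded import and a fallback object. `PassthroughVAD`, `create_vad()`, and the `process()` signature are hypothetical names chosen for this illustration; the real classes live in `processors/vad.py` and may look different.

```python
try:
    import onnxruntime  # noqa: F401  (optional dependency)
    HAS_ONNXRUNTIME = True
except ImportError:
    HAS_ONNXRUNTIME = False


class PassthroughVAD:
    """Fallback used when no VAD model is available: every chunk counts as speech."""

    def process(self, pcm: bytes) -> tuple[str, float]:
        return ("Speech", 1.0)


def create_vad(model_available: bool = HAS_ONNXRUNTIME):
    """Return a VAD object, falling back when the model runtime is missing."""
    if not model_available:
        return PassthroughVAD()
    # The Silero-backed implementation would be constructed here.
    raise NotImplementedError("model-backed VAD is outside this sketch")


vad = create_vad(model_available=False)
label, probability = vad.process(b"\x00\x00" * 320)
```

Keeping the availability check in one module-level flag means the rest of the code asks `create_vad()` for an object and never imports `onnxruntime` directly.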

## Important Implementation Details

### Thread Safety in WebSocket Transport

The `SocketTransport` guards every send with an `asyncio.Lock` because FastAPI WebSocket's `send_text()` and `send_bytes()` are not safe to call concurrently from multiple tasks. Without the lock, text and binary frames sent in rapid succession by different coroutines can interleave, causing protocol violations. Note that `asyncio.Lock` serializes coroutines on one event loop; it does not protect against calls from other OS threads.

```python
# Excerpt from SocketTransport; assumes `import json` and `self.lock = asyncio.Lock()`
async def send_event(self, event: dict):
    async with self.lock:  # Serialize concurrent senders
        await self.ws.send_text(json.dumps(event))

async def send_audio(self, pcm_bytes: bytes):
    async with self.lock:
        await self.ws.send_bytes(pcm_bytes)
```
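
The effect of the lock can be seen in a small self-contained experiment. `ChunkedFakeSocket` is a made-up stand-in that writes each frame in two steps and yields to the event loop in between; it only models the hazard and says nothing about how a real WebSocket implementation buffers data.

```python
import asyncio


class ChunkedFakeSocket:
    """Fake socket that writes each frame in two steps, yielding in between."""

    def __init__(self):
        self.wire = []

    async def send(self, label: str):
        self.wire.append(f"{label}:start")
        await asyncio.sleep(0)  # let other tasks run in the middle of the frame
        self.wire.append(f"{label}:end")


async def run(use_lock: bool):
    sock = ChunkedFakeSocket()
    lock = asyncio.Lock()

    async def send(label: str):
        if use_lock:
            async with lock:
                await sock.send(label)
        else:
            await sock.send(label)

    # Two senders start at the same time, like an event and an audio chunk.
    await asyncio.gather(send("text"), send("binary"))
    return sock.wire


unlocked = asyncio.run(run(use_lock=False))
locked = asyncio.run(run(use_lock=True))
print(unlocked)
print(locked)
```

Without the lock the two frames overlap on the fake wire; with it, the second sender waits until the first frame is complete.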

### Event Bus Usage

Components subscribe to event types and are notified asynchronously:

```python
event_bus = get_event_bus()

# Subscribe to speaking events
event_bus.subscribe("speaking", my_callback)

# Publish events
await event_bus.publish("speaking", {"trackId": session_id, "probability": 0.9})
```
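
For readers who want to see what sits behind `subscribe()` and `publish()`, here is one possible minimal pub/sub bus. It is a sketch under the assumption that callbacks are coroutine functions taking a single payload argument; the actual `core/events.py` may differ in structure and error handling.

```python
import asyncio
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, callback):
        self._subscribers[event_type].append(callback)

    async def publish(self, event_type: str, payload: dict):
        callbacks = list(self._subscribers.get(event_type, ()))
        if not callbacks:
            return []
        # return_exceptions=True keeps one failing subscriber from hiding the others.
        return await asyncio.gather(
            *(callback(payload) for callback in callbacks),
            return_exceptions=True,
        )


_bus = None


def get_event_bus() -> EventBus:
    """Return the process-wide bus, creating it on first use."""
    global _bus
    if _bus is None:
        _bus = EventBus()
    return _bus


async def demo():
    seen = []

    async def on_speaking(payload):
        seen.append(payload["trackId"])

    bus = get_event_bus()
    bus.subscribe("speaking", on_speaking)
    await bus.publish("speaking", {"trackId": "abc", "probability": 0.9})
    await bus.publish("silence", {"trackId": "abc"})  # no subscribers, nothing happens
    return seen


seen = asyncio.run(demo())
```

Running subscribers through `asyncio.gather()` notifies them concurrently, which matches the "async notification to all subscribers" behaviour described earlier.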

### Error Handling Pattern

All errors are sent as `error` events to the client:

```python
await self.transport.send_event({
    "event": "error",
    "trackId": self.current_track_id,
    "timestamp": self._get_timestamp_ms(),
    "sender": "server",  # or "asr", "tts", "media", etc.
    "error": error_message
})
```
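
A small helper keeps the event shape consistent across call sites. The function name `make_error_event()` is hypothetical, and the timestamp is assumed to be milliseconds since the Unix epoch, which the original does not state explicitly.

```python
import time


def make_error_event(track_id: str, message: str, sender: str = "server") -> dict:
    """Build an error event with the same keys as the pattern above."""
    return {
        "event": "error",
        "trackId": track_id,
        "timestamp": int(time.time() * 1000),  # assumed: Unix epoch milliseconds
        "sender": sender,                      # "server", "asr", "tts", "media", ...
        "error": message,
    }


event = make_error_event("track-1", "TTS provider timed out", sender="tts")
```

The resulting dict can be passed straight to `transport.send_event()`.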

### Configuration Management

Configuration is loaded from:
1. Environment variables
2. `.env` file (gitignored)
3. Default values in `app/config.py`

**Never commit `.env`** - it may contain sensitive keys. Use `.env.example` as a template.

### Audio Format Specifications

**Input/Output Audio:**
- Sample rate: 16kHz
- Bit depth: 16-bit (PCM)
- Channels: Mono
- Chunk size: 640 bytes (20ms at 16kHz)

**Format:** Little-endian signed 16-bit integers (int16)
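
The 640-byte figure follows from 16,000 samples/s × 0.020 s = 320 samples, times 2 bytes per sample, times 1 channel. The sketch below checks that arithmetic and packs one such frame as little-endian int16; the helper names and the 440 Hz tone are illustrative choices and not taken from the test scripts.

```python
import math
import struct

SAMPLE_RATE = 16_000  # Hz
SAMPLE_WIDTH = 2      # bytes per int16 sample
CHANNELS = 1
FRAME_MS = 20


def frame_bytes(sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS,
                sample_width=SAMPLE_WIDTH, channels=CHANNELS) -> int:
    """Size in bytes of one PCM frame of the given duration."""
    samples = sample_rate * frame_ms // 1000
    return samples * sample_width * channels


def sine_frame(freq_hz: float = 440.0, amplitude: float = 0.3) -> bytes:
    """One 20 ms frame of a sine tone as little-endian int16 PCM."""
    n = SAMPLE_RATE * FRAME_MS // 1000
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)  # "<" = little-endian, "h" = int16


frame = sine_frame()
print(frame_bytes(), len(frame))
```

A frame built this way can be sent as one binary WebSocket message.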

## Key Files Reference

When working with this codebase, these files are the most critical:

- [`app/main.py`](app/main.py) - FastAPI endpoints, session lifecycle, event hooks
- [`core/transports.py`](core/transports.py) - Transport abstraction and WebSocket/WebRTC handling
- [`core/session.py`](core/session.py) - Command routing, session state management
- [`core/pipeline.py`](core/pipeline.py) - Audio processing, VAD integration, event emission
- [`models/commands.py`](models/commands.py) - Protocol command definitions and parsing
- [`models/events.py`](models/events.py) - Protocol event definitions
- [`processors/vad.py`](processors/vad.py) - Silero VAD implementation
- [`reference/active-call/docs/api.md`](reference/active-call/docs/api.md) - Complete API specification from original Rust implementation

### Testing Strategy

When implementing new features:

1. **Unit tests** - Test individual components (transports, session, pipeline)
2. **Integration tests** - Test endpoint behavior with test clients
3. **Protocol tests** - Verify commands/events match API specification
4. **Manual testing** - Use `scripts/test_websocket.py` and `scripts/test_webrtc.py`

### Reference Implementations

- **Original Rust implementation:** [`reference/active-call/`](reference/active-call/) - Complete feature set with SIP, ASR, TTS
- **Python reference:** [`reference/py-active-call/`](reference/py-active-call/) - Partial implementation with bot integration

Use these as references for:
- Protocol specification details
- Architecture patterns
- Testing approaches
- Edge case handling

### Common Patterns

**Creating a new command:**
1. Add model to `models/commands.py`
2. Add to `COMMAND_TYPES` dict
3. Add handler method in `core/session.py` (e.g., `_handle_mycommand`)
4. Route in `Session.handle_text()` under the command type

**Adding a new event:**
1. Add model to `models/events.py`
2. Add to `EVENT_TYPES` dict
3. Emit via `transport.send_event()` or `event_bus.publish()`

**Adding a new processor:**
1. Create in `processors/myprocessor.py`
2. Integrate into `core/pipeline.py` AudioPipeline
3. Emit events through event bus

### Session State Management

Sessions track state through these transitions:
- `created` - Initial state
- `invited` - Invite command received
- `accepted` - Accept command received
- `ringing` - Ringing command sent
- `hungup` - Hangup command or disconnect

The `state` attribute is updated in each handler and logged for debugging.
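
One way to make these transitions explicit is a small state tracker that rejects moves it does not know about. The state names come from the list above, but the `ALLOWED` table is an assumption made for this sketch: the source does not say which transitions the handlers actually permit, so treat it as a placeholder to adapt.

```python
from enum import Enum


class SessionState(str, Enum):
    CREATED = "created"
    INVITED = "invited"
    ACCEPTED = "accepted"
    RINGING = "ringing"
    HUNGUP = "hungup"


# Assumed transition table; the real handlers may allow a different set.
ALLOWED = {
    SessionState.CREATED: {SessionState.INVITED, SessionState.HUNGUP},
    SessionState.INVITED: {SessionState.ACCEPTED, SessionState.RINGING, SessionState.HUNGUP},
    SessionState.ACCEPTED: {SessionState.RINGING, SessionState.HUNGUP},
    SessionState.RINGING: {SessionState.ACCEPTED, SessionState.HUNGUP},
    SessionState.HUNGUP: set(),
}


class StateTracker:
    def __init__(self):
        self.state = SessionState.CREATED
        self.history = [self.state]

    def transition(self, new_state: SessionState) -> None:
        """Move to new_state, or raise ValueError if the move is not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


tracker = StateTracker()
tracker.transition(SessionState.INVITED)
tracker.transition(SessionState.ACCEPTED)
tracker.transition(SessionState.HUNGUP)
```

Keeping a `history` list alongside the current state gives the debugging log mentioned above something concrete to print.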

### Testing Endpoints Without Full Dependencies

The WebSocket endpoint (`/ws`) works without aiortc, av, or onnxruntime. Use this for testing core functionality:

```bash
# Install minimal dependencies
pip install fastapi uvicorn numpy pydantic python-dotenv loguru aiohttp

# Start server
uvicorn app.main:app

# Test with basic client
python scripts/test_websocket.py
```

The WebRTC endpoint requires aiortc+av (PyAV) which can be challenging to install on Windows. Consider Linux/macOS for full WebRTC development.

### Logging

Logs are written to:
- Console (stdout) - Real-time output
- `logs/active_call_YYYY-MM-DD.log` - Rotated daily, retained for 7 days

Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL

Set via `LOG_LEVEL` environment variable or in `.env`.

### Dependencies Note

On Windows with Python 3.11, `aiortc` and `av` (PyAV) may have installation issues due to:
- Missing C compilers
- Incompatible binary wheel versions
- FFmpeg/library dependencies

The code gracefully handles missing optional dependencies with try/except imports and runtime checks. Consider using Docker for consistent development environments.
.gitignore
vendored
Normal file
@@ -0,0 +1,151 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
Pipfile.lock

# poetry
poetry.lock

# pdm
.pdm.toml

# PEP 582
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# Project specific
assets/*.onnx
*.wav
*.mp3
*.pcm
recordings/
logs/