Init project

2026-01-28 10:19:04 +08:00
commit 14013608a9
2 changed files with 495 additions and 0 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -0,0 +1,344 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## Project Overview
 This is a Python implementation of `active-call` (originally Rust), a high-performance Voice AI Gateway that bridges telephony protocols (WebSocket, WebRTC) with AI pipelines (LLM, ASR, TTS). The system follows a **decoupled architecture** where the Media Gateway (this service) handles low-level audio/signaling, while Business Logic (AI Agent) controls it via WebSocket API.
 **Technology Stack:**
 - Python 3.11+ with `asyncio` for all I/O
 - FastAPI + Uvicorn for WebSocket/WebRTC endpoints
 - aiortc for WebRTC media transport (optional dependency)
 - Silero VAD for voice activity detection (optional dependency)
 - Pydantic for protocol validation
 - Loguru for structured logging
 ## Common Development Commands
 ### Running the Server
 ```bash
 # Start development server (with auto-reload)
 uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
 # Start with a specific host/port (the uvicorn CLI takes these as flags, not as HOST/PORT variables)
 uvicorn app.main:app --host 0.0.0.0 --port 8080
 # Using Docker
 docker-compose up --build
 ```
 ### Testing
 ```bash
 # Run WebSocket test client (sine wave generation)
 python scripts/test_websocket.py --url ws://localhost:8000/ws --sine
 # Run WebSocket test client (with audio file)
 python scripts/test_websocket.py --url ws://localhost:8000/ws --file test_audio.wav
 # Run WebRTC test client
 python scripts/test_webrtc.py --url ws://localhost:8000/webrtc
 # Run unit tests
 pytest tests/ -v --cov=app --cov=core
 # Run specific test file
 pytest tests/test_session.py -v
 # Run with coverage report
 pytest tests/ --cov=app --cov=core --cov-report=html
 ```
 ### Code Quality
 ```bash
 # Format code
 black app/ core/ models/ processors/ utils/ scripts/
 # Lint code
 ruff check app/ core/ models/ processors/ utils/ scripts/
 # Type checking
 mypy app/ core/
 ```
 ### Dependency Management
 ```bash
 # Install all dependencies
 pip install -r requirements.txt
 # Install development dependencies
 pip install -r requirements-dev.txt
 # Update dependencies
 pip install --upgrade -r requirements.txt
 ```
 ## Architecture Overview
 ### Decoupled Design Pattern
 The system implements a **decoupled architecture** separating concerns:
 1. **Media Gateway Layer** (`app/`, `core/`, `processors/`)
   - Handles low-level audio transport (WebSocket, WebRTC)
   - Manages session lifecycle and state
   - Processes audio through pipeline (VAD, resampling)
   - Emits events (speaking, silence, error) to control layer
 2. **Business Logic Layer** (External AI Agent)
   - Connects via WebSocket
   - Receives real-time events (speech detection, ASR transcripts)
   - Sends commands (tts, play, interrupt, hangup)
 ### Key Architecture Components
 **Transport Abstraction** ([`core/transports.py`](core/transports.py)):
 - `BaseTransport` - Abstract interface with `send_event()` and `send_audio()`
 - `SocketTransport` - WebSocket with mixed text/binary frames, uses `asyncio.Lock` to prevent frame interleaving
 - `WebRtcTransport` - WebSocket signaling + aiortc RTCPeerConnection for media
 **Session Management** ([`core/session.py`](core/session.py)):
 - Each WebSocket/WebRTC connection creates a `Session` with unique UUID
 - Routes incoming JSON commands to handlers via `parse_command()` from `models/commands.py`
 - Routes binary audio data to `AudioPipeline`
 - Manages session state: created → invited → accepted → ringing → hungup
 - Cleanup on disconnect
 **Audio Pipeline** ([`core/pipeline.py`](core/pipeline.py)):
 - Processes audio through VAD (Voice Activity Detection)
 - Emits events to global event bus when VAD state changes
 - Supports interruption for barge-in scenarios
 **Event Bus** ([`core/events.py`](core/events.py)):
 - Global pub/sub system for inter-component communication
 - Subscribe to specific event types (speaking, silence, error)
 - Async notification to all subscribers
 ### Protocol Compatibility
 The implementation **must maintain protocol compatibility** with the original Rust API. All commands and events are strictly defined in:
 - [`models/commands.py`](models/commands.py) - Command models (invite, accept, reject, tts, play, interrupt, hangup, chat)
 - [`models/events.py`](models/events.py) - Event models (answer, speaking, silence, trackStart, trackEnd, error)
 - [`models/config.py`](models/config.py) - Configuration models (CallOption, VADOption, TTSOption, ASROption, etc.)
 **Important:** Always use `parse_command()` from `models/commands.py` to parse incoming JSON - never manually parse command strings. This ensures type safety and validation.
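 A minimal sketch of the registry-based parsing this rule relies on, using stdlib dataclasses in place of the project's Pydantic models so it runs without third-party packages. The `command` discriminator key and the fields on both models are assumptions for illustration; `models/commands.py` remains the authority.
```python
import json
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical command models; the real ones are Pydantic models
# defined in models/commands.py with their own fields.
@dataclass
class InviteCommand:
    command: str
    option: dict = field(default_factory=dict)


@dataclass
class HangupCommand:
    command: str
    reason: Optional[str] = None


# Registry keyed by command name (mirrors the COMMAND_TYPES idea).
COMMAND_TYPES = {"invite": InviteCommand, "hangup": HangupCommand}


def parse_command(raw: str):
    """Decode a JSON text frame into a typed command or raise ValueError."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("command must be a JSON object")
    name = data.get("command")  # assumed discriminator key
    model = COMMAND_TYPES.get(name)
    if model is None:
        raise ValueError(f"unknown command: {name!r}")
    try:
        return model(**data)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"invalid {name} command: {exc}") from exc


hangup = parse_command('{"command": "hangup", "reason": "done"}')
invite = parse_command('{"command": "invite"}')
```
 Routing every frame through one function like this keeps validation failures in a single place, where they can be turned into `error` events.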
 ### WebSocket Protocol (`/ws` endpoint)
 **Mixed Frame Handling:**
 - **Text frames** → JSON commands (invite, tts, play, interrupt, hangup, etc.)
 - **Binary frames** → Raw PCM audio (16kHz, 16-bit, mono)
 **Flow:**
 1. Client connects and sends `invite` command with codec configuration
 2. Server responds with `answer` event
 3. Client streams binary audio frames
 4. Server processes audio and emits events (speaking, silence)
 5. Client can send commands at any time (tts, play, interrupt, hangup)
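 The mixed-frame routing above can be sketched as a small dispatcher. It works on the message dicts that Starlette's `WebSocket.receive()` returns (a `text` or `bytes` key, or a `websocket.disconnect` type). `RecordingSession` and `dispatch_frame` are hypothetical names used only for this sketch; the real `Session` does more than record its input.
```python
import asyncio
import json


class RecordingSession:
    """Stand-in for Session that only records what it was given."""

    def __init__(self):
        self.texts = []
        self.audio = bytearray()

    async def handle_text(self, text: str):
        self.texts.append(json.loads(text))

    async def handle_audio(self, pcm: bytes):
        self.audio.extend(pcm)


async def dispatch_frame(message: dict, session) -> bool:
    """Route one received message; return False once the peer disconnects."""
    if message.get("type") == "websocket.disconnect":
        return False
    if message.get("text") is not None:
        await session.handle_text(message["text"])    # JSON command
    elif message.get("bytes") is not None:
        await session.handle_audio(message["bytes"])  # raw PCM
    return True


async def demo():
    session = RecordingSession()
    frames = [
        {"type": "websocket.receive", "text": '{"command": "invite"}'},
        {"type": "websocket.receive", "bytes": b"\x00\x00" * 320},
        {"type": "websocket.disconnect", "code": 1000},
    ]
    for frame in frames:
        if not await dispatch_frame(frame, session):
            break
    return session


session = asyncio.run(demo())
```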
 ### WebRTC Protocol (`/webrtc` endpoint)
 **Signaling Flow:**
 1. Client connects via WebSocket
 2. Client sends SDP offer (JSON with `sdp` and `type` fields)
 3. Server creates RTCPeerConnection and generates SDP answer
 4. Server responds with `answer` event containing SDP
 5. WebRTC media flows via UDP (managed by aiortc)
 6. Commands can be sent via WebSocket text frames at any time
 **Audio Track Handling:**
 - When `pc.on("track")` fires, wrap received track with `Resampled16kTrack`
 - Pull frames from track and convert to bytes
 - Feed bytes to `session.handle_audio()`
 ### Session Lifecycle
 ```
 1. Connection → WebSocket/WebRTC endpoint accepts
 2. Session creation → New Session(uuid, transport)
 3. Invite → Client sends invite command
 4. Answer → Server sends answer event
 5. Audio streaming → Client sends binary audio / WebRTC media
 6. Commands → Client sends JSON commands (tts, play, interrupt)
 7. Hangup → Client sends hangup command OR connection closes
 8. Cleanup → Session cleanup, remove from active_sessions
 ```
 ### Optional Dependencies
 The following dependencies are **optional** - the code gracefully degrades without them:
 - **aiortc + av (PyAV)** - Required for WebRTC functionality. Without them:
   - `/webrtc` endpoint rejects connections
   - WebRTC transport cannot be used
   - WebSocket endpoint still works
 - **onnxruntime** - Required for VAD functionality. Without it:
   - VAD always returns "Speech" with probability 1.0
   - speaking/silence events are still emitted but are not accurate
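 One way such a fallback could be wired up is sketched below. `PassthroughVAD`, `create_vad`, and the `model_factory` hook are hypothetical names for this example, not the project's actual classes; the real Silero-backed implementation lives in `processors/vad.py`.
```python
try:
    import onnxruntime  # noqa: F401  (optional dependency)
    HAS_ONNXRUNTIME = True
except ImportError:
    HAS_ONNXRUNTIME = False


class PassthroughVAD:
    """Fallback that labels every chunk as speech with probability 1.0."""

    def process(self, pcm_bytes: bytes):
        return "Speech", 1.0


def create_vad(model_factory=None):
    """Build the real VAD only when onnxruntime is importable.

    model_factory is a hypothetical hook standing in for the constructor
    of the real VAD class, which is not shown in this document.
    """
    if HAS_ONNXRUNTIME and model_factory is not None:
        return model_factory()
    return PassthroughVAD()


label, probability = create_vad().process(b"\x00" * 640)
```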
 ## Important Implementation Details
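 ### Concurrency Safety in WebSocket Transport
 The `SocketTransport` guards every send with an `asyncio.Lock()` because FastAPI WebSocket's `send_text()` and `send_bytes()` are not safe to call from several tasks at once. Without the lock, rapidly sending text and binary frames can interleave, causing protocol violations. Note that `asyncio.Lock` serializes coroutines on a single event loop; it does not make the transport safe to use from multiple OS threads.
 ```python
 # Methods of SocketTransport (excerpt); assumes `import json` and
 # `self.lock = asyncio.Lock()` created in __init__.
 async def send_event(self, event: dict):
     async with self.lock:  # Serialize sends across concurrent tasks
         await self.ws.send_text(json.dumps(event))
 async def send_audio(self, pcm_bytes: bytes):
     async with self.lock:
         await self.ws.send_bytes(pcm_bytes)
 ```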
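 The effect of the lock can be seen with a stand-in socket whose sends take two steps. `FakeWebSocket` is a hypothetical test double, not part of FastAPI; it only makes the overlap between two concurrent sends observable.
```python
import asyncio


class FakeWebSocket:
    """Stand-in socket whose sends yield midway, so overlap shows in the log."""

    def __init__(self):
        self.log = []

    async def _send(self, label):
        self.log.append(f"{label}:start")
        await asyncio.sleep(0)  # give other tasks a chance to run mid-send
        self.log.append(f"{label}:end")

    async def send_text(self, data):
        await self._send("text")

    async def send_bytes(self, data):
        await self._send("bytes")


async def run(use_lock: bool):
    ws, lock = FakeWebSocket(), asyncio.Lock()

    async def send(method, payload):
        if use_lock:
            async with lock:  # one send at a time
                await method(payload)
        else:
            await method(payload)

    await asyncio.gather(send(ws.send_text, "{}"), send(ws.send_bytes, b"\x00"))
    return ws.log


unlocked = asyncio.run(run(use_lock=False))
locked = asyncio.run(run(use_lock=True))
```
 Without the lock the two sends overlap in the log; with it, each send finishes before the next one starts.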
 ### Event Bus Usage
 Components subscribe to event types and are notified asynchronously:
 ```python
 event_bus = get_event_bus()
 # Subscribe to speaking events
 event_bus.subscribe("speaking", my_callback)
 # Publish events
 await event_bus.publish("speaking", {"trackId": session_id, "probability": 0.9})
 ```
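 For orientation, a bus with this interface could look like the sketch below. It assumes every callback is an `async` function and keeps one process-wide instance; the actual `core/events.py` may be structured differently.
```python
import asyncio
from collections import defaultdict


class EventBus:
    """Minimal async pub/sub keyed by event type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    async def publish(self, event_type, payload):
        callbacks = list(self._subscribers[event_type])
        # return_exceptions=True collects subscriber errors in the result
        # list instead of raising the first one to the publisher.
        return await asyncio.gather(
            *(cb(payload) for cb in callbacks), return_exceptions=True
        )


_bus = EventBus()


def get_event_bus():
    """Return the single shared bus instance."""
    return _bus


async def demo():
    seen = []

    async def on_speaking(payload):
        seen.append(payload["trackId"])

    bus = get_event_bus()
    bus.subscribe("speaking", on_speaking)
    await bus.publish("speaking", {"trackId": "abc", "probability": 0.9})
    await bus.publish("silence", {"trackId": "abc"})  # no subscribers yet
    return seen


seen = asyncio.run(demo())
```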
 ### Error Handling Pattern
 All errors are sent as `error` events to the client:
 ```python
 await self.transport.send_event({
    "event": "error",
    "trackId": self.current_track_id,
    "timestamp": self._get_timestamp_ms(),
    "sender": "server",  # or "asr", "tts", "media", etc.
    "error": error_message
 })
 ```
 ### Configuration Management
 Configuration is loaded from:
 1. Environment variables
 2. `.env` file (gitignored)
 3. Default values in `app/config.py`
 **Never commit `.env`** - it may contain sensitive keys. Use `.env.example` as a template.
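 A sketch of how the three sources could be layered, assuming the list order above is also the precedence order (environment first, defaults last). The hand-written `parse_dotenv` stands in for python-dotenv so the example runs on the standard library alone, and the `HOST`/`PORT` keys and default values are hypothetical.
```python
import os
from collections import ChainMap

# Hypothetical defaults; the real ones live in app/config.py.
DEFAULTS = {"HOST": "0.0.0.0", "PORT": "8000", "LOG_LEVEL": "INFO"}


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(environ=None, dotenv_text=""):
    """Earlier mappings win: environment, then .env, then defaults."""
    environ = os.environ if environ is None else environ
    return ChainMap(dict(environ), parse_dotenv(dotenv_text), DEFAULTS)


settings = load_settings(
    environ={"PORT": "9000"},
    dotenv_text="PORT=8080\nLOG_LEVEL=DEBUG\n# local overrides\n",
)
```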
 ### Audio Format Specifications
 **Input/Output Audio:**
 - Sample rate: 16kHz
 - Bit depth: 16-bit (PCM)
 - Channels: Mono
 - Chunk size: 640 bytes (20ms at 16kHz)
 **Format:** Little-endian signed 16-bit integers (int16)
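 The 640-byte figure follows from the format: 16,000 samples/s × 0.020 s = 320 samples, and 320 samples × 2 bytes = 640 bytes. The sketch below checks that arithmetic and packs one chunk of test tone; `sine_chunk` is a hypothetical helper, not a function from `scripts/test_websocket.py`.
```python
import math
import struct

SAMPLE_RATE = 16_000   # Hz
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHANNELS = 1           # mono
CHUNK_MS = 20

samples_per_chunk = SAMPLE_RATE * CHUNK_MS // 1000
chunk_bytes = samples_per_chunk * BYTES_PER_SAMPLE * CHANNELS


def sine_chunk(freq_hz: float = 440.0, amplitude: float = 0.5) -> bytes:
    """Return one 20 ms chunk of a sine tone as little-endian int16 PCM."""
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(samples_per_chunk)
    ]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{samples_per_chunk}h", *samples)


chunk = sine_chunk()
```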
 ## Key Files Reference
 When working with this codebase, these files are the most critical:
 - [`app/main.py`](app/main.py) - FastAPI endpoints, session lifecycle, event hooks
 - [`core/transports.py`](core/transports.py) - Transport abstraction and WebSocket/WebRTC handling
 - [`core/session.py`](core/session.py) - Command routing, session state management
 - [`core/pipeline.py`](core/pipeline.py) - Audio processing, VAD integration, event emission
 - [`models/commands.py`](models/commands.py) - Protocol command definitions and parsing
 - [`models/events.py`](models/events.py) - Protocol event definitions
 - [`processors/vad.py`](processors/vad.py) - Silero VAD implementation
 - [`reference/active-call/docs/api.md`](reference/active-call/docs/api.md) - Complete API specification from original Rust implementation
 ### Testing Strategy
 When implementing new features:
 1. **Unit tests** - Test individual components (transports, session, pipeline)
 2. **Integration tests** - Test endpoint behavior with test clients
 3. **Protocol tests** - Verify commands/events match API specification
 4. **Manual testing** - Use `scripts/test_websocket.py` and `scripts/test_webrtc.py`
 ### Reference Implementations
 - **Original Rust implementation:** [`reference/active-call/`](reference/active-call/) - Complete feature set with SIP, ASR, TTS
 - **Python reference:** [`reference/py-active-call/`](reference/py-active-call/) - Partial implementation with bot integration
 Use these as references for:
 - Protocol specification details
 - Architecture patterns
 - Testing approaches
 - Edge case handling
 ### Common Patterns
 **Creating a new command:**
 1. Add model to `models/commands.py`
 2. Add to `COMMAND_TYPES` dict
 3. Add handler method in `core/session.py` (e.g., `_handle_mycommand`)
 4. Route in `Session.handle_text()` under the command type
 **Adding a new event:**
 1. Add model to `models/events.py`
 2. Add to `EVENT_TYPES` dict
 3. Emit via `transport.send_event()` or `event_bus.publish()`
 **Adding a new processor:**
 1. Create in `processors/myprocessor.py`
 2. Integrate into `core/pipeline.py` AudioPipeline
 3. Emit events through event bus
 ### Session State Management
 Sessions track state through these transitions:
 - `created` - Initial state
 - `invited` - Invite command received
 - `accepted` - Accept command received
 - `ringing` - Ringing command sent
 - `hungup` - Hangup command or disconnect
 The `state` attribute is updated in each handler and logged for debugging.
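 If handlers should reject out-of-order commands, the states can be backed by an explicit transition table. The table below is an assumption: it follows the documented order and additionally allows `hungup` from any live state, which may not match what `core/session.py` enforces.
```python
from enum import Enum


class SessionState(str, Enum):
    CREATED = "created"
    INVITED = "invited"
    ACCEPTED = "accepted"
    RINGING = "ringing"
    HUNGUP = "hungup"


# Assumed transitions: the documented order, plus hangup from any live state.
ALLOWED = {
    SessionState.CREATED: {SessionState.INVITED, SessionState.HUNGUP},
    SessionState.INVITED: {SessionState.ACCEPTED, SessionState.HUNGUP},
    SessionState.ACCEPTED: {SessionState.RINGING, SessionState.HUNGUP},
    SessionState.RINGING: {SessionState.HUNGUP},
    SessionState.HUNGUP: set(),  # terminal
}


def can_transition(current: SessionState, target: SessionState) -> bool:
    return target in ALLOWED[current]


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return target if the move is allowed, else raise ValueError."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = transition(SessionState.CREATED, SessionState.INVITED)
```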
 ### Testing Endpoints Without Full Dependencies
 The WebSocket endpoint (`/ws`) works without aiortc, av, or onnxruntime. Use this for testing core functionality:
 ```bash
 # Install minimal dependencies
 pip install fastapi uvicorn numpy pydantic python-dotenv loguru aiohttp
 # Start server
 uvicorn app.main:app
 # Test with basic client
 python scripts/test_websocket.py
 ```
 The WebRTC endpoint requires aiortc+av (PyAV) which can be challenging to install on Windows. Consider Linux/macOS for full WebRTC development.
 ### Logging
 Logs are written to:
 - Console (stdout) - Real-time output
 - `logs/active_call_YYYY-MM-DD.log` - Rotated daily, retained for 7 days
 Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
 Set via `LOG_LEVEL` environment variable or in `.env`.
 ### Dependencies Note
 On Windows with Python 3.11, `aiortc` and `av` (PyAV) may have installation issues due to:
 - Missing C compilers
 - Incompatible binary wheel versions
 - FFmpeg/library dependencies
 The code gracefully handles missing optional dependencies with try/except imports and runtime checks. Consider using Docker for consistent development environments.
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,151 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 # C extensions
 *.so
 # Distribution / packaging
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
 *.manifest
 *.spec
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
 .nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
 *.py,cover
 .hypothesis/
 .pytest_cache/
 cover/
 # Translations
 *.mo
 *.pot
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
 # Flask stuff:
 instance/
 .webassets-cache
 # Scrapy stuff:
 .scrapy
 # Sphinx documentation
 docs/_build/
 # PyBuilder
 .pybuilder/
 target/
 # Jupyter Notebook
 .ipynb_checkpoints
 # IPython
 profile_default/
 ipython_config.py
 # pyenv
 .python-version
 # pipenv
 Pipfile.lock
 # poetry
 poetry.lock
 # pdm
 .pdm.toml
 # PEP 582
 __pypackages__/
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
 # SageMath parsed files
 *.sage.py
 # Environments
 .env
 .venv
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
 # Spyder project settings
 .spyderproject
 .spyproject
 # Rope project settings
 .ropeproject
 # mkdocs documentation
 /site
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
 # Pyre type checker
 .pyre/
 # pytype static type analyzer
 .pytype/
 # Cython debug symbols
 cython_debug/
 # IDEs
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 # Project specific
 assets/*.onnx
 *.wav
 *.mp3
 *.pcm
 recordings/
 logs/