Remove test files and testing documentation from PR

zack
2026-03-01 11:51:51 -05:00
parent 36b9c05730
commit cb7e612738
5 changed files with 0 additions and 2154 deletions


@@ -1,273 +0,0 @@
# AssemblyAI u3-rt-pro Testing Checklist
## Test Environment Setup
- [ ] Install dependencies: `uv sync --group dev --all-extras`
- [ ] Set up `.env` file with API keys
- [ ] Verify LiveKit connection
- [ ] Run basic voice agent test
---
## Feature Testing Checklist
### ✅ Basic Configuration Tests
#### Test 1: Default u3-rt-pro Configuration
- [ ] **Setup:** Create service with default params
- [ ] **Expected:** No errors, uses u3-rt-pro model with 100ms min/max
- [ ] **Verify:** Check logs for connection confirmation
#### Test 2: Custom min_turn_silence
- [ ] **Setup:** Set `min_turn_silence=200`
- [ ] **Expected:** Both min and max set to 200ms
- [ ] **Verify:** Speak short phrases, observe turn detection timing
#### Test 3: User sets max_turn_silence (Warning Test)
- [ ] **Setup:** Set `max_turn_silence=500` in connection params
- [ ] **Expected:** Warning logged, value overridden to match min
- [ ] **Verify:** Check logs for warning message
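
The override that Test 3 expects can be pictured with a minimal sketch. `resolve_turn_silence` is a hypothetical helper written for this checklist, not the service's actual code. It assumes only what the tests above state: in Pipecat mode the maximum is forced to equal the minimum and a warning is produced. The STT-mode fallback (max defaults to min when unset) is an additional assumption.

```python
from typing import List, Optional, Tuple

DEFAULT_MIN_TURN_SILENCE_MS = 100  # default quoted in this checklist


def resolve_turn_silence(
    min_turn_silence: Optional[int],
    max_turn_silence: Optional[int],
    vad_force_turn_endpoint: bool = True,
) -> Tuple[int, int, List[str]]:
    """Hypothetical helper: return (min_ms, max_ms, warnings)."""
    warnings: List[str] = []
    min_ms = DEFAULT_MIN_TURN_SILENCE_MS if min_turn_silence is None else min_turn_silence
    if vad_force_turn_endpoint:
        # Pipecat mode: max is forced to match min, so a differing user value is overridden.
        if max_turn_silence is not None and max_turn_silence != min_ms:
            warnings.append(f"max_turn_silence={max_turn_silence} overridden to {min_ms}")
        return min_ms, min_ms, warnings
    # STT mode (assumption): keep the user's max, falling back to min when unset.
    max_ms = min_ms if max_turn_silence is None else max_turn_silence
    return min_ms, max_ms, warnings
```

Calling `resolve_turn_silence(None, 500)` in this sketch yields 100 ms for both values plus one warning, which is the behavior Test 3 asks the tester to confirm in the logs.
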
---
### ✅ Prompting Tests
#### Test 4: No Prompt (Default - Recommended)
- [ ] **Setup:** Don't set prompt parameter
- [ ] **Expected:** Uses default prompt, 88% accuracy, no warnings
- [ ] **Verify:** Transcription quality is good
#### Test 5: Custom Prompt (Warning Test)
- [ ] **Setup:** Set custom prompt in connection params
- [ ] **Expected:** Warning logged about testing without prompt first
- [ ] **Verify:** Check logs for prompt warning
#### Test 6: Prompt + Keyterms Conflict (Error Test)
- [ ] **Setup:** Set both `prompt` and `keyterms_prompt` at init
- [ ] **Expected:** ValueError raised with helpful error message
- [ ] **Verify:** Service fails to initialize with clear error
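
The conflict in Test 6 is a mutual-exclusion check. The sketch below shows one way such a check could look. `ParamsSketch` is a hypothetical stand-in, not the real connection params class, and treating an empty keyterms list as "no conflict" is an assumption based on Test 8.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParamsSketch:
    """Hypothetical stand-in for the connection params; not the real class."""

    prompt: Optional[str] = None
    keyterms_prompt: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Assumption: an empty list means "no boosting", so only a
        # non-empty keyterms list conflicts with a custom prompt.
        if self.prompt is not None and self.keyterms_prompt:
            raise ValueError(
                "prompt and keyterms_prompt cannot be used together; set only one"
            )
```
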
---
### ✅ Keyterms Prompting Tests
#### Test 7: Basic Keyterms at Init
- [ ] **Setup:** Set `keyterms_prompt=["Pipecat", "AssemblyAI", "Universal-3"]`
- [ ] **Expected:** Terms are boosted in recognition
- [ ] **Verify:** Say the boosted terms, check accuracy
#### Test 8: Empty Keyterms (No Boosting)
- [ ] **Setup:** Set `keyterms_prompt=[]`
- [ ] **Expected:** No boosting, default behavior
- [ ] **Verify:** Normal transcription
---
### ✅ Diarization Tests
#### Test 9: Diarization Disabled (Default)
- [ ] **Setup:** Don't set `speaker_labels` parameter
- [ ] **Expected:** No speaker info in transcripts
- [ ] **Verify:** TranscriptionFrame.user_id is default user_id
#### Test 10: Diarization Enabled (No Formatting)
- [ ] **Setup:** Set `speaker_labels=True`
- [ ] **Expected:** Speaker ID in user_id field, plain text
- [ ] **Verify:** Multiple speakers show different IDs (Speaker A, Speaker B)
#### Test 11: Diarization with XML Formatting
- [ ] **Setup:** Set `speaker_labels=True`, `speaker_format="<{speaker}>{text}</{speaker}>"`
- [ ] **Expected:** Text includes speaker tags: `<Speaker A>Hello</Speaker A>`
- [ ] **Verify:** Formatted text in transcript, speaker ID in user_id
#### Test 12: Diarization with Colon Prefix
- [ ] **Setup:** Set `speaker_labels=True`, `speaker_format="{speaker}: {text}"`
- [ ] **Expected:** Text includes prefix: `Speaker A: Hello`
- [ ] **Verify:** Formatted text, multiple speakers distinguishable
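
The two `speaker_format` strings in Tests 11 and 12 are ordinary Python format templates. As a rough illustration, assuming the service fills `{speaker}` and `{text}` with `str.format` (the helper name `format_transcript` is hypothetical):

```python
from typing import Optional


def format_transcript(text: str, speaker: str, speaker_format: Optional[str] = None) -> str:
    """Hypothetical helper: apply a speaker_format template, or return plain text."""
    if speaker_format is None:
        # No formatting requested: the transcript text is left unchanged.
        return text
    return speaker_format.format(speaker=speaker, text=text)


xml_style = format_transcript("Hello", "Speaker A", "<{speaker}>{text}</{speaker}>")
colon_style = format_transcript("Hello", "Speaker A", "{speaker}: {text}")
```
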
---
### ✅ Dynamic Updates Tests
#### Test 13: Dynamic Keyterms Update (Stage 1 → Stage 2)
- [ ] **Setup:** Start with empty keyterms, update mid-conversation
- [ ] **Expected:** New keyterms take effect immediately
- [ ] **Test Steps:**
  1. Start conversation with no keyterms
  2. Send update frame with `keyterms_prompt=["cardiology", "Dr. Smith"]`
  3. Say the new terms
- [ ] **Verify:** Improved recognition after update
#### Test 14: Clear Keyterms (Reset Context)
- [ ] **Setup:** Start with keyterms, clear them mid-stream
- [ ] **Expected:** Context biasing removed
- [ ] **Test Steps:**
  1. Start with `keyterms_prompt=["test", "words"]`
  2. Send update frame with `keyterms_prompt=[]`
- [ ] **Verify:** No more boosting after clear
#### Test 15: Dynamic Silence Parameters
- [ ] **Setup:** Update `max_turn_silence` mid-stream
- [ ] **Expected:** Turn detection timing changes
- [ ] **Test Steps:**
  1. Start with default (1200ms)
  2. Update to `max_turn_silence=5000` (for reading numbers)
  3. Pause longer between words
  4. Update back to `max_turn_silence=1200`
- [ ] **Verify:** Longer pauses tolerated when increased
#### Test 16: Dynamic Prompt Update
- [ ] **Setup:** Update prompt mid-stream
- [ ] **Expected:** New instructions take effect
- [ ] **Test Steps:**
  1. Start with default prompt
  2. Send update with custom prompt
- [ ] **Verify:** Behavior changes according to new prompt
#### Test 17: Multiple Parameters at Once
- [ ] **Setup:** Update `keyterms_prompt`, `max_turn_silence`, and `min_turn_silence` together
- [ ] **Expected:** All parameters updated in single WebSocket message
- [ ] **Verify:** Check logs for single UpdateConfiguration message
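
Test 17 expects several changed settings to travel in one message. A minimal sketch of that batching follows. The message type `UpdateConfiguration` comes from the checklist; the helper name `build_update_message` and the exact wire field names are assumptions made for illustration.

```python
import json
from typing import Any, Dict


def build_update_message(**changes: Any) -> str:
    """Hypothetical helper: serialize all changed settings into one message."""
    payload: Dict[str, Any] = {"type": "UpdateConfiguration"}
    # Drop unset values so only real changes are sent. An empty list is kept,
    # because clearing keyterms (Test 14) is itself a change.
    payload.update({name: value for name, value in changes.items() if value is not None})
    return json.dumps(payload)


message = build_update_message(
    keyterms_prompt=["cardiology", "Dr. Smith"],
    max_turn_silence=5000,
    min_turn_silence=None,  # unchanged, so it is omitted from the message
)
```
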
#### Test 18: Dynamic Update - Prompt + Keyterms Conflict (Error)
- [ ] **Setup:** Try to update both prompt and keyterms_prompt in same update
- [ ] **Expected:** ValueError raised
- [ ] **Verify:** Update fails with clear error message
---
### ✅ Turn Detection Mode Tests
#### Test 19: Pipecat Mode (vad_force_turn_endpoint=True) - Default
- [ ] **Setup:** Use default settings (Pipecat mode)
- [ ] **Expected:**
  - ForceEndpoint sent on VAD stop
  - Smart Turn Analyzer makes decisions
  - min=max=100ms for u3-rt-pro
- [ ] **Verify:** Fast finals, Smart Turn handles completeness
#### Test 20: STT Mode (vad_force_turn_endpoint=False) - u3-rt-pro only
- [ ] **Setup:** Set `vad_force_turn_endpoint=False` with u3-rt-pro
- [ ] **Expected:**
  - AssemblyAI controls turn endings
  - SpeechStarted message triggers interruptions
  - UserStarted/StoppedSpeakingFrame emitted
- [ ] **Verify:** Turn detection from AssemblyAI model
#### Test 21: STT Mode with universal-streaming (Error Test)
- [ ] **Setup:** Set `vad_force_turn_endpoint=False` with universal-streaming
- [ ] **Expected:** ValueError raised (requires u3-rt-pro)
- [ ] **Verify:** Service fails with clear error
---
### ✅ Language Detection Tests (If Multilingual Model)
#### Test 22: Language Detection Enabled
- [ ] **Setup:** Use `universal-streaming-multilingual` with `language_detection=True`
- [ ] **Expected:** Language codes in transcripts
- [ ] **Verify:** Speak different languages, check language_code field
#### Test 23: Language Confidence Threshold
- [ ] **Setup:** Enable language detection
- [ ] **Expected:** High confidence (≥0.7) → detected language, Low → fallback to English
- [ ] **Verify:** Check logs for confidence warnings
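
The threshold rule in Test 23 reduces to a single comparison. A small sketch, assuming the 0.7 cutoff and English fallback stated above (the function name and the `"en"` code are illustrative):

```python
def resolve_language(
    detected: str,
    confidence: float,
    threshold: float = 0.7,
    fallback: str = "en",
) -> str:
    """Hypothetical helper: trust the detected language only at or above the threshold."""
    return detected if confidence >= threshold else fallback
```
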
---
### ✅ Edge Cases & Error Handling
#### Test 24: WebSocket Disconnect During Update
- [ ] **Setup:** Simulate disconnect, try update
- [ ] **Expected:** Error logged, update queued for reconnection
- [ ] **Verify:** Graceful handling, no crash
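
One possible shape for "update queued for reconnection" in Test 24 is a small buffer that holds settings while the socket is down. `PendingUpdates` is a hypothetical class for illustration; how the real service stores pending updates is not shown in this checklist.

```python
from typing import Any, Dict, Optional


class PendingUpdates:
    """Hypothetical buffer for setting updates made while disconnected."""

    def __init__(self) -> None:
        self._pending: Dict[str, Any] = {}

    def submit(self, update: Dict[str, Any], connected: bool) -> Optional[Dict[str, Any]]:
        """Return the update to send now, or None if it was queued."""
        if connected:
            return update
        # Later values for the same setting replace earlier ones.
        self._pending.update(update)
        return None

    def flush(self) -> Dict[str, Any]:
        """Call after reconnecting: return queued settings and clear the buffer."""
        pending, self._pending = self._pending, {}
        return pending
```
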
#### Test 25: Invalid Parameter Types
- [ ] **Setup:** Send update with wrong type (e.g., keyterms_prompt as string)
- [ ] **Expected:** Warning logged, parameter skipped
- [ ] **Verify:** Service continues, invalid param ignored
#### Test 26: Unknown Parameter in Update
- [ ] **Setup:** Send update with unsupported parameter (e.g., `language`)
- [ ] **Expected:** Warning logged about parameter
- [ ] **Verify:** Other valid params still updated
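
Tests 25 and 26 both describe "warn and skip" handling. The sketch below shows that pattern under stated assumptions: the `UPDATABLE` table and `filter_update` are hypothetical, and the real set of updatable settings may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical table of updatable settings and their expected types.
UPDATABLE: Dict[str, type] = {
    "keyterms_prompt": list,
    "prompt": str,
    "min_turn_silence": int,
    "max_turn_silence": int,
}


def filter_update(update: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Split an update into accepted settings and warning messages."""
    accepted: Dict[str, Any] = {}
    warnings: List[str] = []
    for name, value in update.items():
        expected = UPDATABLE.get(name)
        if expected is None:
            warnings.append(f"unknown setting skipped: {name}")
        elif not isinstance(value, expected):
            warnings.append(f"{name} expects {expected.__name__}, got {type(value).__name__}")
        else:
            accepted[name] = value
    return accepted, warnings
```

With this shape, a bad `keyterms_prompt` string and an unsupported `language` key each produce a warning while a valid `max_turn_silence` in the same update still goes through.
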
---
### ✅ Integration Tests
#### Test 27: Full Voice Agent Flow (Multi-Stage)
- [ ] **Setup:** Complete voice agent with stage transitions
- [ ] **Test Steps:**
  1. Greeting stage (general keyterms)
  2. Name collection stage (name keyterms)
  3. Account number stage (number keyterms, longer silence)
  4. Medical info stage (medical keyterms)
  5. Closing stage (goodbye keyterms)
- [ ] **Verify:** Each stage has appropriate keyterms and timing
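
A per-stage settings table is one way to organize the flow in Test 27. The sketch below is illustrative only: the stage keys, keyterms, and timings are placeholder values (1200 ms and 5000 ms are borrowed from Test 15), and `settings_for` is a hypothetical helper.

```python
from typing import Dict, List, NamedTuple


class StageSettings(NamedTuple):
    keyterms_prompt: List[str]
    max_turn_silence: int  # milliseconds


# Illustrative values only; the keyterms and timings are placeholders.
STAGES: Dict[str, StageSettings] = {
    "greeting": StageSettings(["hello", "Pipecat"], 1200),
    "name": StageSettings(["Xiomara", "Saoirse"], 1200),
    "account_number": StageSettings(["account", "number"], 5000),
    "medical": StageSettings(["cardiology", "Dr. Smith"], 1200),
    "closing": StageSettings(["goodbye"], 1200),
}


def settings_for(stage: str) -> Dict[str, object]:
    """Return the settings to send when entering a stage."""
    return STAGES[stage]._asdict()
```
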
#### Test 28: Diarization + Dynamic Updates
- [ ] **Setup:** Enable diarization, update keyterms mid-stream
- [ ] **Expected:** Both features work together
- [ ] **Verify:** Speaker IDs persist, keyterms update correctly
#### Test 29: Interruption Handling
- [ ] **Setup:** Bot speaking, user interrupts
- [ ] **Expected:**
  - Pipecat mode: VAD + Smart Turn handles
  - STT mode: SpeechStarted triggers interrupt
- [ ] **Verify:** Bot stops, user speech processed
---
## Testing Results Template
```
| Test # | Feature | Status | Notes |
|--------|---------|--------|-------|
| 1 | Default Config | ✅ PASS | |
| 2 | Custom min_silence | ✅ PASS | |
| 3 | max_silence Warning | ✅ PASS | |
| ... | ... | ... | ... |
```
---
## Expected Outcomes Summary
### ✅ Should Work (No Errors)
- Default configuration
- Custom min_turn_silence
- Keyterms prompting
- Diarization with/without formatting
- Dynamic updates (one parameter or multiple)
- Pipecat mode turn detection
### ⚠️ Should Warn (Logs Warning, Continues)
- Custom prompt set at init
- max_turn_silence set (overridden)
- Invalid parameter types in updates
- Language update attempted
- Prompt used with universal-streaming
### ❌ Should Error (Raises Exception, Stops)
- prompt + keyterms_prompt at init
- prompt + keyterms_prompt in same update
- vad_force_turn_endpoint=False with universal-streaming
---
## Quick Test Commands
```bash
# Run basic test
python test_assemblyai_u3pro.py --test basic
# Run specific test
python test_assemblyai_u3pro.py --test diarization
# Run all tests
python test_assemblyai_u3pro.py --test all
# Interactive mode
python test_assemblyai_u3pro.py --interactive
```
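
For orientation, the flags used in these commands could be parsed roughly as follows. This is a sketch, not the contents of `test_assemblyai_u3pro.py`: the `TEST_NAMES` list only covers names that appear above, and making `--test` and `--interactive` mutually exclusive is an assumption.

```python
import argparse
from typing import List, Optional

# Only the names shown in the commands above; the real script may accept more.
TEST_NAMES = ["basic", "diarization", "all"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI: either pick one test by name or open the menu."""
    parser = argparse.ArgumentParser(description="AssemblyAI u3-rt-pro test runner (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--test", choices=TEST_NAMES, help="name of the test to run")
    group.add_argument("--interactive", action="store_true", help="choose from a menu")
    return parser.parse_args(argv)
```
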


@@ -1,310 +0,0 @@
# AssemblyAI u3-rt-pro Testing Setup Guide
## Quick Start
### 1. Setup Environment
```bash
# Copy API keys
cp .env.testing .env
# Install dependencies
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Make test script executable
chmod +x test_assemblyai_u3pro.py
```
### 2. Ensure Audio Devices
Make sure you have:
- **Microphone** enabled and working
- **Speakers/headphones** connected
- Audio permissions granted (macOS will prompt on first run)
### 3. Run Tests
```bash
# Run a specific test
python test_assemblyai_u3pro.py --test basic
# Interactive mode (choose from menu)
python test_assemblyai_u3pro.py --interactive
# Run all tests sequentially
python test_assemblyai_u3pro.py --test all
```
---
## Available Tests
### Basic Configuration Tests
```bash
# Test 1: Default configuration (min=max=100ms)
python test_assemblyai_u3pro.py --test basic
# Test 2: Custom min_turn_silence
python test_assemblyai_u3pro.py --test custom_min
# Test 3: max_turn_silence warning (should be overridden)
python test_assemblyai_u3pro.py --test max_warning
```
### Prompting Tests
```bash
# Test 5: Custom prompt warning
python test_assemblyai_u3pro.py --test prompt_warning
# Test 6: Prompt + keyterms conflict (should error)
python test_assemblyai_u3pro.py --test prompt_keyterms_conflict
# Test 7: Basic keyterms prompting
python test_assemblyai_u3pro.py --test keyterms
```
### Diarization Tests
```bash
# Test 10: Diarization without formatting
python test_assemblyai_u3pro.py --test diarization
# Test 11: Diarization with XML formatting
python test_assemblyai_u3pro.py --test diarization_xml
```
### Dynamic Updates Tests
```bash
# Test 13: Dynamic keyterms (multi-stage)
python test_assemblyai_u3pro.py --test dynamic_keyterms
# Test 15: Dynamic silence parameters
python test_assemblyai_u3pro.py --test dynamic_silence
# Test 17: Multiple parameters at once
python test_assemblyai_u3pro.py --test multi_param
```
---
## Test Execution Flow
### For Each Test:
1. **Start the test script**
   ```bash
   python test_assemblyai_u3pro.py --test <test_name>
   ```
2. **Wait for "started" message** indicating the bot is ready
3. **Speak into your microphone** to test - the bot will:
   - Transcribe your speech (you'll see `📝 TRANSCRIPTION:` logs)
   - Process through the LLM
   - Respond with voice through your speakers
4. **Observe logs** for:
   - ✅ Success indicators
   - ⚠️ Warning messages
   - ❌ Error messages
   - 📝 Transcription output
5. **Verify expected behavior** against checklist
6. **Stop test** with Ctrl+C
---
## Expected Test Outcomes
### Should Pass (✅)
- Basic configuration creates service
- Custom parameters are applied
- Keyterms boost recognition
- Diarization shows speaker IDs
- Dynamic updates work without errors
### Should Warn (⚠️)
Check logs for warnings:
- "We recommend testing at first with no prompt"
- "max_turn_silence is not used in Pipecat mode"
- "Unknown setting for AssemblyAI STT service"
### Should Error (❌)
Should raise `ValueError` (at init the service fails to start; in an update, the update is rejected):
- Both prompt and keyterms_prompt set at init
- Both prompt and keyterms_prompt in same update
- vad_force_turn_endpoint=False with universal-streaming
---
## Debugging Tips
### Check Logs
```bash
# Run with verbose logging
LOGURU_LEVEL=DEBUG python test_assemblyai_u3pro.py --test <test_name>
```
### Common Issues
**Issue: "WebSocket connection failed"**
- Check ASSEMBLYAI_API_KEY is correct
- Verify network connection
- Check firewall settings
**Issue: "No audio input/output"**
- Verify microphone permissions (System Preferences → Security & Privacy → Microphone)
- Check default audio devices in System Preferences → Sound
- Test microphone with another app first
- Make sure no other app is using the microphone
**Issue: "No transcriptions appearing"**
- Verify microphone permissions
- Check audio levels (speak louder or move closer to mic)
- Speak clearly and wait for VAD to detect
- Check if microphone is muted
**Issue: "Can't hear bot responses"**
- Check speaker/headphone volume
- Verify correct output device is selected
- Check terminal for TTS errors
**Issue: "Service fails to start"**
- Check all API keys in .env
- Run `uv sync` to ensure dependencies installed
- Check Python version (3.10+)
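
The last two checks can be automated before launching a test. A minimal sketch follows; the `preflight` helper is hypothetical, and the key list assumes the three services used by the test scripts (AssemblyAI, Cartesia, OpenAI).

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple

# Assumed key names, matching the environment variables read by the test scripts.
REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "CARTESIA_API_KEY", "OPENAI_API_KEY")


def preflight(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    problems: List[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # An unset or empty key counts as missing.
    problems.extend(f"missing {key}" for key in REQUIRED_KEYS if not env.get(key))
    return problems
```

Running `preflight()` with no arguments inspects the current interpreter and environment; passing `env` and `version` explicitly makes the check easy to try without touching real keys.
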
---
## Manual Testing Checklist
After running automated tests, manually verify:
### ✅ Audio Quality
- [ ] Transcriptions are accurate
- [ ] No distortion or dropouts
- [ ] Latency is acceptable
### ✅ Turn Detection
- [ ] Bot waits for user to finish speaking
- [ ] No premature cutoffs
- [ ] Handles natural pauses correctly
### ✅ Interruptions
- [ ] Can interrupt bot mid-sentence
- [ ] Interruption is smooth
- [ ] Bot stops speaking immediately
### ✅ Diarization (if enabled)
- [ ] Multiple speakers detected correctly
- [ ] Speaker IDs consistent
- [ ] Speaker formatting works
### ✅ Dynamic Updates
- [ ] Keyterms update without disconnection
- [ ] Turn detection timing changes work
- [ ] Updates logged correctly
---
## Test Results Recording
### Use this template:
```markdown
## Test Run: YYYY-MM-DD
| Test # | Test Name | Status | Notes |
|--------|-----------|--------|-------|
| 1 | basic | ✅ PASS | Transcriptions working |
| 2 | custom_min | ✅ PASS | Turn timing changed |
| 3 | max_warning | ✅ PASS | Warning logged |
| 5 | prompt_warning | ✅ PASS | Warning shown |
| 6 | prompt_keyterms_conflict | ✅ PASS | ValueError raised |
| 7 | keyterms | ✅ PASS | Terms boosted |
| 10 | diarization | ✅ PASS | Speaker IDs correct |
| 11 | diarization_xml | ✅ PASS | XML tags shown |
| 13 | dynamic_keyterms | ✅ PASS | Updates worked |
| 15 | dynamic_silence | ✅ PASS | Timing adjusted |
| 17 | multi_param | ✅ PASS | All params updated |
### Issues Found:
- None
### Notes:
- All tests passed successfully
- Latency is excellent (sub-300ms)
- Diarization accuracy is good
```
---
## Advanced Testing
### Custom Test Scenarios
Create custom tests by modifying `test_assemblyai_u3pro.py`:
```python
async def test_my_custom_scenario():
    """My custom test scenario."""
    logger.info("Testing my specific use case")
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        # Your custom params here
    )
    task, transport = await create_basic_voice_agent(connection_params)
    # Your test logic here
    runner = PipelineRunner()
    await runner.run(task)
```
### Stress Testing
Test with:
- Multiple simultaneous speakers
- Long conversations (30+ minutes)
- Rapid speech
- Heavy accents
- Background noise
- Poor network conditions
---
## Reporting Issues
When reporting issues, include:
1. **Test name and number**
2. **Full error message and stack trace**
3. **Relevant log output** (use LOGURU_LEVEL=DEBUG)
4. **Configuration used** (connection_params)
5. **Expected vs actual behavior**
6. **Steps to reproduce**
---
## Next Steps
After testing:
1. ✅ Mark completed tests in `TESTING_CHECKLIST.md`
2. 📝 Document any issues found
3. 🐛 Create GitHub issues for bugs
4. ✨ Suggest improvements
5. 📊 Share results with team
---
## Contact
Questions? Issues?
- Check `TESTING_CHECKLIST.md` for detailed test descriptions
- Review logs with `LOGURU_LEVEL=DEBUG`
- Reach out to the team with your findings
Happy testing! 🎯


@@ -1,240 +0,0 @@
#!/usr/bin/env python3
"""Custom AssemblyAI u3-rt-pro Test Script
Easy parameter tweaking for experimentation
Edit the CONFIGURATION section below to test different settings!
"""
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
load_dotenv(override=True)
# ============================================================================
# CONFIGURATION
# ============================================================================
# Log Level: "DEBUG" for detailed logs, "INFO" for normal operation
LOG_LEVEL = "INFO"
# ============================================================================
# BOT IMPLEMENTATION
# ============================================================================
async def main():
    """Run the custom test bot with your configured parameters."""
    # Setup logging
    logger.remove(0)
    logger.add(sys.stderr, level=LOG_LEVEL)
    logger.info("=" * 80)
    logger.info("AssemblyAI u3-rt-pro Custom Test")
    logger.info("=" * 80)
    logger.info("Starting bot... Speak after you hear the greeting!")
    logger.info("=" * 80)
    # Create local audio transport
    transport = LocalAudioTransport(
        LocalAudioTransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        )
    )
    # ========================================================================
    # EDIT PARAMETERS HERE
    # ========================================================================
    # Build connection params
    connection_params = AssemblyAIConnectionParams(
        # ====================================================================
        # Model Selection
        # ====================================================================
        speech_model="u3-rt-pro",
        # speech_model="universal-streaming-english",
        # speech_model="universal-streaming-multilingual",
        # ====================================================================
        # Turn Detection Timing
        # ====================================================================
        # Minimum silence when confident about end of turn (milliseconds)
        # Default: 100ms | Higher = more patient | Lower = faster responses
        # Only used in Pipecat mode (vad_force_turn_endpoint=True)
        min_turn_silence=100,
        # min_turn_silence=200,
        # min_turn_silence=300,
        # Maximum turn silence (milliseconds)
        # WARNING: In Pipecat mode (vad_force_turn_endpoint=True), this is
        # automatically set equal to min_turn_silence
        # to avoid double turn detection. Only used as-is in STT mode.
        max_turn_silence=500,
        # End of turn confidence threshold (0.0 to 1.0)
        # Higher = requires more confidence before ending turn
        # end_of_turn_confidence_threshold=0.8,
        # ====================================================================
        # Prompting & Boosting
        # ====================================================================
        # Custom Prompt (WARNING: test carefully, default is optimized!)
        # None = Use AssemblyAI's optimized default (recommended for 88% accuracy)
        prompt=None,
        # prompt="Transcribe speech with focus on technical terms.",
        # prompt="Context: Medical conversation. Transcribe accurately.",
        # Keyterms Prompting (boosts recognition for specific words)
        # NOTE: Cannot use both prompt and keyterms_prompt!
        keyterms_prompt=None,
        # keyterms_prompt=["Pipecat", "AssemblyAI", "OpenAI", "Cartesia"],
        # keyterms_prompt=["Python", "JavaScript", "TypeScript", "API"],
        # ====================================================================
        # Diarization (Speaker Identification)
        # ====================================================================
        # Enable speaker labels (identifies different speakers)
        speaker_labels=None,  # None or True
        # speaker_labels=True,
        # ====================================================================
        # Audio Configuration
        # ====================================================================
        # Audio sample rate (Hz)
        # sample_rate=16000,
        # sample_rate=8000,
        # Audio encoding format
        # encoding="pcm_s16le",  # Default: 16-bit PCM
        # encoding="pcm_mulaw",  # μ-law encoding (telephony)
        # ====================================================================
        # Other Options
        # ====================================================================
        # Format transcript turns (applies formatting rules)
        # format_turns=True,  # Default
        # format_turns=False,
        # Language detection (only for universal-streaming-multilingual)
        # language_detection=True,
    )
    # Log connection parameters for debugging
    logger.info("=" * 80)
    logger.info("CONNECTION PARAMETERS:")
    logger.info(f" speech_model: {connection_params.speech_model}")
    logger.info(f" min_turn_silence: {connection_params.min_turn_silence}")
    logger.info(f" max_turn_silence: {connection_params.max_turn_silence}")
    logger.info(f" sample_rate: {connection_params.sample_rate}")
    logger.info(f" encoding: {connection_params.encoding}")
    logger.info(f" prompt: {connection_params.prompt}")
    logger.info(f" keyterms_prompt: {connection_params.keyterms_prompt}")
    logger.info(f" speaker_labels: {connection_params.speaker_labels}")
    logger.info(f" format_turns: {connection_params.format_turns}")
    logger.info(
        f" end_of_turn_confidence_threshold: {connection_params.end_of_turn_confidence_threshold}"
    )
    logger.info(f" language_detection: {connection_params.language_detection}")
    logger.info("=" * 80)
    # AssemblyAI Speech-to-Text Service
    stt = AssemblyAISTTService(
        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
        connection_params=connection_params,
        # Turn Detection Mode
        # True = Pipecat mode (VAD + Smart Turn controls turns)
        # False = STT mode (u3-rt-pro model controls turns)
        vad_force_turn_endpoint=True,
        # Speaker Formatting (only used if speaker_labels=True)
        # None = Just log speaker IDs, don't modify transcript
        speaker_format=None,
        # speaker_format="<Speaker {speaker}>{text}</Speaker {speaker}>",
        # speaker_format="{speaker}: {text}",
        # speaker_format="[{speaker}] {text}",
        # Additional available parameters (uncomment to use):
        # should_interrupt=True,  # Only for STT mode
    )
    # ========================================================================
    # Text-to-Speech
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Conversational English
    )
    # LLM
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4",
    )
    # Conversation context
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful voice assistant testing the AssemblyAI u3-rt-pro model. "
                "Keep responses very brief (1-2 sentences). "
                "Start by introducing yourself briefly and asking the user to speak."
            ),
        },
    ]
    context = LLMContext(messages)
    # Configure aggregator based on mode
    # In STT mode, don't use VAD (model handles turn detection)
    # In Pipecat mode, use VAD + Smart Turn
    vad_force_turn_endpoint = True  # Must match the value in stt configuration above
    user_params = None
    if vad_force_turn_endpoint:
        user_params = LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer())
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=user_params,
    )
    # Pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )
    # Task
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
    )
    # Start the conversation
    await task.queue_frames([LLMRunFrame()])
    # Run
    runner = PipelineRunner()
    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())


@@ -1,749 +0,0 @@
#!/usr/bin/env python3
"""Interactive AssemblyAI u3-rt-pro Comprehensive Test Suite
Tests all features with detailed scenarios:
- Basic configuration variations
- Prompting and keyterms with difficult names
- Diarization
- Dynamic parameter updates (single and multiple)
- Mode comparisons
- STT mode timing experiments (testing silence parameters)
- Edge cases
Usage:
python test_assemblyai_interactive.py
"""
import asyncio
import os
import sys
from typing import Awaitable, Callable, Optional

from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, STTUpdateSettingsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService, AssemblyAISTTSettings
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="INFO")


async def run_bot(
    connection_params: AssemblyAIConnectionParams,
    test_name: str,
    vad_force_turn_endpoint: bool = True,
    speaker_format: Optional[str] = None,
    # Async callback that receives the PipelineTask and queues update frames
    test_dynamic_updates: Optional[Callable[[PipelineTask], Awaitable[None]]] = None,
):
    """Run the voice bot with specified configuration."""
    logger.info("=" * 80)
    logger.info(f"TEST: {test_name}")
    logger.info("=" * 80)
    logger.info("Starting bot... Speak into your microphone after you hear the greeting!")
    logger.info("=" * 80)
    # Create local audio transport
    transport = LocalAudioTransport(
        LocalAudioTransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        )
    )
    # AssemblyAI Speech-to-Text
    stt = AssemblyAISTTService(
        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
        connection_params=connection_params,
        vad_force_turn_endpoint=vad_force_turn_endpoint,
        speaker_format=speaker_format,
    )
    # Text-to-Speech
    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",
    )
    # LLM
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4",
    )
    # Conversation context
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful voice assistant testing the AssemblyAI u3-rt-pro model. "
                "Keep responses very brief (1-2 sentences). "
                "Start by introducing yourself briefly and asking the user to speak."
            ),
        },
    ]
    context = LLMContext(messages)
    # Configure aggregator based on mode
    user_params = None
    if vad_force_turn_endpoint:
        user_params = LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer())
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=user_params,
    )
    # Pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )
    # Task
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
    )
    # Handle dynamic updates if provided
    if test_dynamic_updates:
        # Keep a reference so the background task is not garbage-collected mid-run
        update_task = asyncio.create_task(test_dynamic_updates(task))
    # Start the conversation
    await task.queue_frames([LLMRunFrame()])
    # Run
    runner = PipelineRunner()
    await runner.run(task)
# ============================================================================
# Test Configurations
# ============================================================================
# === BASIC CONFIGURATION (1-3) ===
async def test_01_basic_100ms():
    """Test 1: Basic default configuration (100ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
    )
    await run_bot(connection_params, "Basic Default Configuration (100ms)")


async def test_02_custom_200ms():
    """Test 2: Custom min_turn_silence (200ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=200,
    )
    await run_bot(connection_params, "Custom Turn Silence (200ms)")


async def test_03_custom_500ms():
    """Test 3: Longer silence threshold (500ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=500,
    )
    await run_bot(connection_params, "Longer Turn Silence (500ms)")
# === PROMPTING & WARNINGS (4-7) ===
async def test_04_max_warning():
    """Test 4: max_turn_silence warning (should be overridden)."""
    logger.warning("⚠️ EXPECT WARNING: max_turn_silence will be overridden")
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        max_turn_silence=500,
    )
    await run_bot(connection_params, "max_turn_silence Override Warning")


async def test_05_prompt_warning():
    """Test 5: Custom prompt warning."""
    logger.warning("⚠️ EXPECT WARNING: Custom prompts should be tested carefully")
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        prompt="Transcribe speech accurately with proper punctuation.",
    )
    await run_bot(connection_params, "Custom Prompt Warning Test")


async def test_06_prompt_keyterms_conflict():
    """Test 6: Prompt + keyterms conflict (should error)."""
    logger.error("❌ EXPECT ERROR: Cannot use both prompt and keyterms_prompt")
    try:
        connection_params = AssemblyAIConnectionParams(
            speech_model="u3-rt-pro",
            prompt="Custom prompt",
            keyterms_prompt=["test"],
        )
        await run_bot(connection_params, "Prompt + Keyterms Conflict (ERROR)")
    except ValueError as e:
        logger.error(f"✅ EXPECTED ERROR: {e}")
        input("\nPress Enter to continue...")
        return
async def test_07_keyterms_difficult():
    """Test 7: Keyterms with difficult/unusual names."""
    # Use names that STT wouldn't normally get right
    keyterms = ["Xiomara", "Saoirse", "Krzystof", "Nguyen", "Pipecat", "AssemblyAI"]
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        keyterms_prompt=keyterms,
    )
    logger.info("🎯 Boosted terms: Xiomara, Saoirse, Krzystof, Nguyen, Pipecat, AssemblyAI")
    logger.info(" Try saying these difficult names to test boosting!")
    await run_bot(connection_params, "Keyterms with Difficult Names")


# === DIARIZATION (8-9) ===
async def test_08_diarization_basic():
    """Test 8: Basic diarization (speaker IDs logged)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        speaker_labels=True,
    )
    logger.info("🎤 Diarization enabled - speaker IDs will be logged")
    logger.info(" Try having multiple people speak!")
    await run_bot(connection_params, "Diarization - Basic")


async def test_09_diarization_xml():
    """Test 9: Diarization with XML formatting."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        speaker_labels=True,
    )
    logger.info("🎤 Diarization with XML tags")
    logger.info(" Transcripts will include <Speaker X>text</Speaker X>")
    await run_bot(
        connection_params,
        "Diarization - XML Formatting",
        speaker_format="<Speaker {speaker}>{text}</Speaker {speaker}>",
    )
# === DYNAMIC UPDATES - SINGLE PARAMETER (10-13) ===
async def test_10_dynamic_keyterms():
    """Test 10: Dynamic keyterms update with difficult names."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
    )

    async def dynamic_update(task):
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 1: No keyterms boosting")
        logger.info(" Try saying: Xiomara, Saoirse, Krzystof")
        logger.info(" (May not transcribe correctly)")
        logger.info("=" * 80)
        await asyncio.sleep(15)
        logger.info("\n" + "=" * 80)
        logger.info("🔄 UPDATING: Adding keyterms boost")
        logger.info("=" * 80)
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(
                        keyterms_prompt=["Xiomara", "Saoirse", "Krzystof", "Nguyen"]
                    )
                )
            )
        )
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 2: Keyterms NOW boosted")
        logger.info(" Say the same names again: Xiomara, Saoirse, Krzystof")
        logger.info(" (Should transcribe better now!)")
        logger.info("=" * 80)

    logger.info("🔄 This test has 2 phases:")
    logger.info(" Phase 1 (15s): No boosting - names may be wrong")
    logger.info(" Phase 2: Keyterms added - names should improve")
    await run_bot(
        connection_params,
        "Dynamic Keyterms Update (Before/After)",
        test_dynamic_updates=dynamic_update,
    )
async def test_11_dynamic_silence():
    """Test 11: Dynamic silence parameter update (dramatic change)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
    )

    async def dynamic_update(task):
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 1: Quick responses (100ms silence threshold)")
        logger.info(" Speak normally - bot responds quickly")
        logger.info("=" * 80)
        await asyncio.sleep(10)
        logger.info("\n" + "=" * 80)
        logger.info("🔄 UPDATING: Changing silence from 100ms → 3000ms (3 seconds!)")
        logger.info("=" * 80)
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(min_turn_silence=3000)
                )
            )
        )
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 2: Patient responses (3 second silence threshold)")
        logger.info(" Bot will wait 3 full seconds before responding")
        logger.info(" Try pausing mid-sentence - bot should NOT interrupt")
        logger.info("=" * 80)

    logger.info("🔄 Dramatic change: 100ms → 3000ms after 10 seconds")
    await run_bot(
        connection_params,
        "Dynamic Silence Update (100ms → 3s)",
        test_dynamic_updates=dynamic_update,
    )
async def test_12_dynamic_prompt():
    """Test 12: Dynamic prompt update with keyterms in prompt."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
    )

    async def dynamic_update(task):
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 1: Default prompt (no keyterms)")
        logger.info(" Try saying: Xiomara, Saoirse, Krzystof")
        logger.info(" (May not transcribe correctly)")
        logger.info("=" * 80)
        await asyncio.sleep(15)
        logger.info("\n" + "=" * 80)
        logger.info("🔄 UPDATING: Adding custom prompt with keyterms")
        logger.info("=" * 80)
        # The prompt body stays at column 0 so no indentation leaks into the string.
        custom_prompt = """Transcribe verbatim. Rules:
1) Always include punctuation in output.
2) Use period/question mark ONLY for complete sentences.
3) Use comma for mid-sentence pauses.
4) Use no punctuation for incomplete trailing speech.
5) Filler words (um, uh, so, like) indicate speaker will continue.
Pay special attention to these names and transcribe them exactly: Xiomara, Saoirse, Krzystof, Nguyen."""
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(prompt=custom_prompt)
                )
            )
        )
        logger.info("\n" + "=" * 80)
        logger.info("PHASE 2: Prompt with keyterms NOW active")
        logger.info(" Say the same names again: Xiomara, Saoirse, Krzystof")
        logger.info(" (Should transcribe better now!)")
        logger.info("=" * 80)

    logger.info("🔄 This test has 2 phases:")
    logger.info(" Phase 1 (15s): Default prompt - names may be wrong")
    logger.info(" Phase 2: Custom prompt with keyterms - names should improve")
    await run_bot(
        connection_params,
        "Dynamic Prompt Update (with keyterms)",
        test_dynamic_updates=dynamic_update,
    )
async def test_13_dynamic_clear_keyterms():
    """Test 13: Clear keyterms dynamically."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        keyterms_prompt=["Pipecat", "AssemblyAI"],
    )

    async def dynamic_update(task):
        await asyncio.sleep(10)
        logger.info("🔄 UPDATING: Clearing keyterms (empty array)")
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(keyterms_prompt=[])
                )
            )
        )

    logger.info("🎯 Initial: Pipecat, AssemblyAI boosted")
    logger.info("🔄 After 10s: Keyterms will be cleared")
    await run_bot(
        connection_params,
        "Dynamic Clear Keyterms",
        test_dynamic_updates=dynamic_update,
    )
# === DYNAMIC UPDATES - MULTIPLE PARAMETERS (14-15) ===


async def test_14_multi_param_update():
    """Test 14: Update multiple parameters at once."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
    )

    async def dynamic_update(task):
        await asyncio.sleep(10)
        logger.info("🔄 UPDATING MULTIPLE: keyterms + silence")
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(
                        keyterms_prompt=["Xiomara", "Pipecat"],
                        min_turn_silence=250,
                    )
                )
            )
        )

    logger.info("🔄 After 10s: Will update BOTH keyterms AND silence threshold")
    await run_bot(
        connection_params,
        "Multiple Parameter Update",
        test_dynamic_updates=dynamic_update,
    )
async def test_15_complex_sequence():
    """Test 15: Complex multi-stage update sequence."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
    )

    async def dynamic_update(task):
        logger.info("Stage 1: Initial (10s)")
        await asyncio.sleep(10)
        logger.info("🔄 Stage 2: Add keyterms")
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(keyterms_prompt=["Pipecat"])
                )
            )
        )
        await asyncio.sleep(10)
        logger.info("🔄 Stage 3: Change silence")
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(min_turn_silence=200)
                )
            )
        )
        await asyncio.sleep(10)
        logger.info("🔄 Stage 4: Update both")
        await task.queue_frame(
            STTUpdateSettingsFrame(
                delta=AssemblyAISTTSettings(
                    connection_params=AssemblyAIConnectionParams(
                        keyterms_prompt=["AssemblyAI", "OpenAI"],
                        min_turn_silence=150,
                    )
                )
            )
        )

    logger.info("🔄 Multi-stage: 4 configuration changes over 30 seconds")
    await run_bot(
        connection_params,
        "Complex Update Sequence (4 stages)",
        test_dynamic_updates=dynamic_update,
    )
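The dynamic-update tests send a `delta` that carries only the fields being changed. One plausible reading is that unset fields leave the current value untouched, while an explicit empty list clears keyterms as in test 13. The merge logic is not in this diff, so the sketch below is an assumption. `SettingsSketch` and `apply_delta` are hypothetical names, and it replays the stage sequence of test 15.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical settings holder; None means 'not specified'."""

    keyterms_prompt: Optional[List[str]] = None
    min_turn_silence: Optional[int] = None


def apply_delta(current: SettingsSketch, delta: SettingsSketch) -> SettingsSketch:
    """Return a copy of current with every non-None field of delta applied."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not None
    }
    return replace(current, **changes)


# Replay stages 2-4 of test 15 against an initially empty configuration.
state = SettingsSketch()
for delta in (
    SettingsSketch(keyterms_prompt=["Pipecat"]),
    SettingsSketch(min_turn_silence=200),
    SettingsSketch(keyterms_prompt=["AssemblyAI", "OpenAI"], min_turn_silence=150),
):
    state = apply_delta(state, delta)
```

Because the check is `is not None` rather than truthiness, a delta with `keyterms_prompt=[]` still counts as a change and clears the list.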
# === MODE COMPARISON (16-17) ===


async def test_16_pipecat_mode():
    """Test 16: Pipecat mode (VAD + Smart Turn controls turns)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
    )
    logger.info("🎯 Pipecat Mode: VAD + Smart Turn control turn detection")
    logger.info(" Your min_turn_silence is sent but ForceEndpoint overrides it")
    await run_bot(
        connection_params,
        "Pipecat Mode (VAD + Smart Turn)",
        vad_force_turn_endpoint=True,
    )


async def test_17_stt_mode():
    """Test 17: STT mode (model controls turns)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
    )
    logger.info("🎯 STT Mode: u3-rt-pro model controls turn detection")
    logger.info(" No ForceEndpoint - parameters are respected")
    await run_bot(
        connection_params,
        "STT Mode (Model Turn Detection)",
        vad_force_turn_endpoint=False,
    )
# === STT MODE TIMING EXPERIMENTS (18-20) ===


async def test_18_stt_long_max_short_min():
    """Test 18: STT mode - Long max_turn_silence + Short min (5000ms + 100ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,  # Short - quick confident turns
        max_turn_silence=5000,  # Long - allows pauses up to 5 seconds
    )
    logger.info("🎯 STT Mode: Testing max/min parameter interaction")
    logger.info(" min_turn_silence: 100ms (quick when confident)")
    logger.info(" max_turn_silence: 5000ms (allows up to 5 second pauses)")
    logger.info(" Try: Quick sentences (should respond fast) + Long pauses mid-thought")
    await run_bot(
        connection_params,
        "STT: Long Max (5s) + Short Min (100ms)",
        vad_force_turn_endpoint=False,
    )


async def test_19_stt_long_min():
    """Test 19: STT mode - Long min_turn_silence (3000ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=3000,  # 3 seconds
        max_turn_silence=5000,  # 5 seconds
    )
    logger.info("🎯 STT Mode: Testing long minimum silence requirement")
    logger.info(" min_turn_silence: 3000ms")
    logger.info(" max_turn_silence: 5000ms")
    logger.info(" Bot will wait 3 full seconds of silence before responding!")
    logger.info(" Try: Speaking with short pauses - bot should NOT interrupt")
    await run_bot(
        connection_params,
        "STT: Long Min (3s)",
        vad_force_turn_endpoint=False,
    )


async def test_20_stt_both_short():
    """Test 20: STT mode - Both short (max=300ms, min=100ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,  # 100ms
        max_turn_silence=300,  # 300ms
    )
    logger.info("🎯 STT Mode: Testing aggressive/quick response timing")
    logger.info(" min_turn_silence: 100ms")
    logger.info(" max_turn_silence: 300ms")
    logger.info(" Bot will respond VERY quickly to any pause!")
    logger.info(" Try: Speaking with natural pauses - expect quick responses")
    await run_bot(
        connection_params,
        "STT: Both Short (300ms/100ms)",
        vad_force_turn_endpoint=False,
    )
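The log messages in tests 18-20 suggest one way to read the two thresholds: `min_turn_silence` applies when the model is confident the turn is over, and `max_turn_silence` is an upper bound that ends the turn regardless. That interpretation is taken from the test descriptions, not from documented model behavior. `should_end_turn` and its `confident` flag are hypothetical.

```python
def should_end_turn(silence_ms: int, confident: bool, min_ms: int, max_ms: int) -> bool:
    """Decide whether a pause ends the turn under the assumed min/max semantics."""
    # Assumption: once silence reaches the maximum, the turn ends unconditionally.
    if silence_ms >= max_ms:
        return True
    # Assumption: below the maximum, the turn ends only if the model is
    # confident and the minimum silence has elapsed.
    return confident and silence_ms >= min_ms
```

With the test 18 values (`min_ms=100`, `max_ms=5000`), a confident 150 ms pause ends the turn, while an unconfident mid-thought pause is tolerated until it reaches 5 seconds.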
# === EDGE CASES (21-23) ===


async def test_21_very_long_silence():
    """Test 21: Very long silence threshold (STT mode only)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=10000,  # 10 seconds
    )
    logger.warning("⚠️ STT Mode with 10 second silence threshold")
    logger.info(" Bot will wait 10 seconds of silence before responding!")
    await run_bot(
        connection_params,
        "Very Long Silence (10s) - STT Mode",
        vad_force_turn_endpoint=False,
    )


async def test_22_very_short_silence():
    """Test 22: Very short silence threshold (50ms)."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=50,
    )
    logger.info("⚡ Very short silence threshold (50ms)")
    logger.info(" Bot will respond very quickly!")
    await run_bot(connection_params, "Very Short Silence (50ms)")


async def test_23_keyterms_plus_diarization():
    """Test 23: Keyterms + Diarization combined."""
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        keyterms_prompt=["Xiomara", "Saoirse", "Pipecat"],
        speaker_labels=True,
    )
    logger.info("🎯 Keyterms + 🎤 Diarization both enabled")
    logger.info(" Try multiple speakers saying difficult names!")
    await run_bot(
        connection_params,
        "Keyterms + Diarization Combined",
        speaker_format="[{speaker}] {text}",
    )
# ============================================================================
# Interactive Menu
# ============================================================================


def show_menu():
    """Display the comprehensive test menu."""
    print("\n" + "=" * 80)
    print("AssemblyAI u3-rt-pro Comprehensive Test Suite")
    print("=" * 80)
    print("\n📋 BASIC CONFIGURATION (1-3)")
    print(" 1. Basic Default (100ms)")
    print(" 2. Custom Silence (200ms)")
    print(" 3. Longer Silence (500ms)")
    print("\n⚠️ PROMPTING & WARNINGS (4-7)")
    print(" 4. max_turn_silence Warning")
    print(" 5. Custom Prompt Warning")
    print(" 6. Prompt + Keyterms Conflict (ERROR)")
    print(" 7. Keyterms with Difficult Names")
    print("\n🎤 DIARIZATION (8-9)")
    print(" 8. Diarization - Basic")
    print(" 9. Diarization - XML Formatting")
    print("\n🔄 DYNAMIC UPDATES - SINGLE (10-13)")
    print(" 10. Dynamic Keyterms (Before/After with difficult names)")
    print(" 11. Dynamic Silence (100ms → 3s DRAMATIC)")
    print(" 12. Dynamic Prompt with Keyterms (Before/After)")
    print(" 13. Dynamic Clear Keyterms")
    print("\n🔄 DYNAMIC UPDATES - MULTIPLE (14-15)")
    print(" 14. Multiple Parameters at Once")
    print(" 15. Complex Update Sequence (4 stages)")
    print("\n⚖️ MODE COMPARISON (16-17)")
    print(" 16. Pipecat Mode (VAD + Smart Turn)")
    print(" 17. STT Mode (Model Turn Detection)")
    print("\n⏱️ STT MODE TIMING EXPERIMENTS (18-20)")
    print(" 18. STT: Long Max (5s) + Short Min (100ms)")
    print(" 19. STT: Long Min (3s)")
    print(" 20. STT: Both Short (300ms/100ms)")
    print("\n🎯 EDGE CASES (21-23)")
    print(" 21. Very Long Silence (10s - STT Mode)")
    print(" 22. Very Short Silence (50ms)")
    print(" 23. Keyterms + Diarization Combined")
    print("\n 0. Exit")
    print("\n" + "=" * 80)
async def main():
    """Main interactive menu."""
    tests = {
        "1": test_01_basic_100ms,
        "2": test_02_custom_200ms,
        "3": test_03_custom_500ms,
        "4": test_04_max_warning,
        "5": test_05_prompt_warning,
        "6": test_06_prompt_keyterms_conflict,
        "7": test_07_keyterms_difficult,
        "8": test_08_diarization_basic,
        "9": test_09_diarization_xml,
        "10": test_10_dynamic_keyterms,
        "11": test_11_dynamic_silence,
        "12": test_12_dynamic_prompt,
        "13": test_13_dynamic_clear_keyterms,
        "14": test_14_multi_param_update,
        "15": test_15_complex_sequence,
        "16": test_16_pipecat_mode,
        "17": test_17_stt_mode,
        "18": test_18_stt_long_max_short_min,
        "19": test_19_stt_long_min,
        "20": test_20_stt_both_short,
        "21": test_21_very_long_silence,
        "22": test_22_very_short_silence,
        "23": test_23_keyterms_plus_diarization,
    }

    while True:
        show_menu()
        choice = input("Enter test number (or 0 to exit): ").strip()

        if choice == "0":
            print("\n👋 Goodbye!")
            break

        if choice in tests:
            try:
                await tests[choice]()
            except KeyboardInterrupt:
                print("\n\n⚠️ Test interrupted by user")
            except Exception as e:
                logger.error(f"Test failed with error: {e}")
                import traceback

                traceback.print_exc()

            input("\n\nPress Enter to return to menu...")
        else:
            print(f"\n❌ Invalid choice: {choice}")
            input("Press Enter to continue...")


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\n\n👋 Goodbye!")


@@ -1,582 +0,0 @@
#!/usr/bin/env python3
"""AssemblyAI u3-rt-pro Comprehensive Test Script

Tests all features:
- Basic configuration
- Prompting and keyterms
- Diarization
- Dynamic updates
- Turn detection modes

Usage:
    python test_assemblyai_u3pro.py --test <test_name>
    python test_assemblyai_u3pro.py --interactive
"""

import argparse
import asyncio
import os
import sys
from typing import List

from dotenv import load_dotenv
from loguru import logger

# Add src to path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "src"))

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import (
    EndFrame,
    Frame,
    LLMRunFrame,
    STTUpdateSettingsFrame,
    TranscriptionFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams

load_dotenv()
# Test configuration
class TestConfig:
    """Centralized test configuration."""

    ASSEMBLYAI_API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    CARTESIA_API_KEY = os.getenv("CARTESIA_API_KEY")

    @classmethod
    def validate(cls):
        """Validate all required API keys are set."""
        missing = []
        if not cls.ASSEMBLYAI_API_KEY:
            missing.append("ASSEMBLYAI_API_KEY")
        if not cls.OPENAI_API_KEY:
            missing.append("OPENAI_API_KEY")
        if not cls.CARTESIA_API_KEY:
            missing.append("CARTESIA_API_KEY")
        if missing:
            logger.error(f"Missing required environment variables: {', '.join(missing)}")
            return False
        return True
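The three near-identical `if not ...` checks in `validate` could be collapsed into one pass over a list of names. The sketch below shows that alternative. `missing_env_vars` is a hypothetical helper, not part of the removed script, and it takes the environment as a mapping so it can be exercised without touching `os.environ`.

```python
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names that are absent or empty in env, in the order given."""
    return [name for name in names if not env.get(name)]


REQUIRED = ["ASSEMBLYAI_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"]
fake_env = {"ASSEMBLYAI_API_KEY": "abc", "OPENAI_API_KEY": ""}
missing = missing_env_vars(REQUIRED, fake_env)
```

In real use the second argument would be `os.environ`. As in the original, an empty string is treated the same as a missing variable.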
class TranscriptionLogger(FrameProcessor):
    """Log transcriptions for test verification."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        # Let the base class handle system frames before doing our own logging.
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            logger.info(f"📝 TRANSCRIPTION: {frame.text}")
            logger.info(f" Speaker: {frame.user_id}")
            logger.info(f" Finalized: {frame.finalized}")
            if hasattr(frame, "result") and frame.result:
                if hasattr(frame.result, "speaker"):
                    logger.info(f" Diarization: {frame.result.speaker}")

        # Always forward the frame so downstream processors still receive it.
        await self.push_frame(frame, direction)
async def create_basic_voice_agent(
    connection_params: AssemblyAIConnectionParams,
    vad_force_turn_endpoint: bool = True,
    speaker_format: str = None,
) -> tuple[PipelineTask, LocalAudioTransport]:
    """Create a basic voice agent for testing.

    Args:
        connection_params: AssemblyAI connection parameters
        vad_force_turn_endpoint: Turn detection mode
        speaker_format: Optional speaker formatting string

    Returns:
        Tuple of (PipelineTask, LocalAudioTransport)
    """
    # Create local audio transport (uses your microphone and speakers)
    transport = LocalAudioTransport(
        params=LocalAudioTransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        )
    )

    # Create STT
    stt = AssemblyAISTTService(
        api_key=TestConfig.ASSEMBLYAI_API_KEY,
        connection_params=connection_params,
        vad_force_turn_endpoint=vad_force_turn_endpoint,
        speaker_format=speaker_format,
    )

    # Create TTS
    tts = CartesiaTTSService(
        api_key=TestConfig.CARTESIA_API_KEY,
        voice_id="a0e99841-438c-4a64-b679-ae501e7d6091",  # Conversational English
    )

    # Create LLM context and service
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful voice assistant. Keep responses brief and natural. "
                "If you see speaker tags like <Speaker A>text</Speaker A>, acknowledge "
                "that you understand multiple speakers are present."
            ),
        }
    ]
    context = LLMContext(messages)
    llm = OpenAILLMService(api_key=TestConfig.OPENAI_API_KEY, model="gpt-4")

    # Create aggregators with VAD
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    # Create transcription logger
    transcription_logger = TranscriptionLogger()

    # Create pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            transcription_logger,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    # Create task
    task = PipelineTask(pipeline)
    return task, transport
# ============================================================================
# Test Functions
# ============================================================================


async def test_basic_config():
    """Test 1: Basic default configuration."""
    logger.info("=" * 80)
    logger.info("TEST 1: Basic Default Configuration")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro")
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("✅ Service created successfully with default params")
    logger.info("Expected: min=max=100ms, u3-rt-pro model")
    logger.info("Speak into your microphone to test transcription")
    # Trigger initial bot greeting
    await task.queue_frames([LLMRunFrame()])
    runner = PipelineRunner()
    await runner.run(task)


async def test_custom_min_silence():
    """Test 2: Custom min_turn_silence."""
    logger.info("=" * 80)
    logger.info("TEST 2: Custom min_turn_silence")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro", min_turn_silence=200)
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("✅ Service created with min=200ms")
    logger.info("Expected: Both min and max set to 200ms")
    logger.info("Speak short phrases and observe turn detection timing")
    runner = PipelineRunner()
    await runner.run(task)


async def test_max_silence_warning():
    """Test 3: Setting max_turn_silence should trigger warning."""
    logger.info("=" * 80)
    logger.info("TEST 3: max_turn_silence Warning")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        min_turn_silence=100,
        max_turn_silence=500,  # Should trigger warning
    )
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("⚠️ Check logs above for warning about max_turn_silence being overridden")
    logger.info("Expected: Warning logged, max set to 100ms (same as min)")
    runner = PipelineRunner()
    await runner.run(task)
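Tests 1-3 describe the same expected rule: the effective maximum tracks the minimum, and a user-supplied `max_turn_silence` that differs is overridden with a warning. The service code that enforces this is not in the diff, so the sketch below only restates the tests' expectations. `normalize_turn_silence` and `DEFAULT_SILENCE_MS` are hypothetical, and the real service may apply the override only in some turn-detection modes.

```python
from typing import List, Optional, Tuple

DEFAULT_SILENCE_MS = 100  # Assumed default, taken from test 1's expected output.


def normalize_turn_silence(
    min_ms: Optional[int], max_ms: Optional[int]
) -> Tuple[int, int, List[str]]:
    """Return (min, max, warnings) with max forced to equal min."""
    warnings: List[str] = []
    effective_min = DEFAULT_SILENCE_MS if min_ms is None else min_ms
    if max_ms is not None and max_ms != effective_min:
        # Mirrors test 3: the user's value is overridden and a warning recorded.
        warnings.append(f"max_turn_silence={max_ms} overridden to {effective_min}")
    return effective_min, effective_min, warnings
```

Returning the warnings instead of logging them keeps the rule easy to assert on. A real implementation would more likely call `logger.warning` directly.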
async def test_custom_prompt_warning():
    """Test 5: Custom prompt should trigger warning."""
    logger.info("=" * 80)
    logger.info("TEST 5: Custom Prompt Warning")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        prompt="Transcribe verbatim. Always include punctuation.",
    )
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("⚠️ Check logs above for warning about testing without prompt first")
    logger.info("Expected: Warning logged, service continues with custom prompt")
    runner = PipelineRunner()
    await runner.run(task)


async def test_prompt_keyterms_conflict():
    """Test 6: Prompt + keyterms_prompt should raise error."""
    logger.info("=" * 80)
    logger.info("TEST 6: Prompt + Keyterms Conflict (Error)")
    logger.info("=" * 80)
    try:
        connection_params = AssemblyAIConnectionParams(
            speech_model="u3-rt-pro",
            prompt="Custom prompt",
            keyterms_prompt=["test", "words"],
        )
        task, transport = await create_basic_voice_agent(connection_params)
        logger.error("❌ TEST FAILED: Should have raised ValueError")
    except ValueError as e:
        logger.info("✅ TEST PASSED: ValueError raised as expected")
        logger.info(f" Error message: {e}")
async def test_keyterms_basic():
    """Test 7: Basic keyterms at initialization."""
    logger.info("=" * 80)
    logger.info("TEST 7: Basic Keyterms Prompting")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        keyterms_prompt=["Pipecat", "AssemblyAI", "Universal-3", "streaming"],
    )
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("✅ Service created with keyterms: Pipecat, AssemblyAI, Universal-3, streaming")
    logger.info("Expected: Boosted recognition for these terms")
    logger.info("Try saying: 'I'm testing Pipecat with AssemblyAI Universal-3 for streaming'")
    runner = PipelineRunner()
    await runner.run(task)


async def test_diarization_no_format():
    """Test 10: Diarization enabled without formatting."""
    logger.info("=" * 80)
    logger.info("TEST 10: Diarization Enabled (No Formatting)")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro", speaker_labels=True)
    task, transport = await create_basic_voice_agent(connection_params)
    logger.info("✅ Service created with speaker_labels=True")
    logger.info("Expected: Speaker IDs in user_id field, plain text in transcript")
    logger.info("Have multiple people speak to see different speaker labels")
    runner = PipelineRunner()
    await runner.run(task)


async def test_diarization_xml_format():
    """Test 11: Diarization with XML formatting."""
    logger.info("=" * 80)
    logger.info("TEST 11: Diarization with XML Formatting")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro", speaker_labels=True)
    task, transport = await create_basic_voice_agent(
        connection_params, speaker_format="<{speaker}>{text}</{speaker}>"
    )
    logger.info("✅ Service created with XML speaker formatting")
    logger.info("Expected: Text like '<Speaker A>Hello</Speaker A>'")
    logger.info("Have multiple people speak to see formatted speaker tags")
    runner = PipelineRunner()
    await runner.run(task)
async def test_dynamic_keyterms():
    """Test 13: Dynamic keyterms updates."""
    logger.info("=" * 80)
    logger.info("TEST 13: Dynamic Keyterms Updates")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro")
    task, transport = await create_basic_voice_agent(connection_params)

    async def update_keyterms_stages():
        """Simulate multi-stage conversation with keyterms updates."""
        await asyncio.sleep(5)  # Wait for connection

        # Stage 1: Greeting
        logger.info("🔄 STAGE 1: Greeting (general terms)")
        update1 = STTUpdateSettingsFrame(
            settings={"keyterms_prompt": ["hello", "hi", "good morning", "welcome"]}
        )
        await task.queue_frames([update1])
        await asyncio.sleep(10)

        # Stage 2: Name collection
        logger.info("🔄 STAGE 2: Name Collection")
        update2 = STTUpdateSettingsFrame(
            settings={
                "keyterms_prompt": [
                    "first name",
                    "last name",
                    "John",
                    "Jane",
                    "Smith",
                    "Johnson",
                ]
            }
        )
        await task.queue_frames([update2])
        await asyncio.sleep(10)

        # Stage 3: Medical info
        logger.info("🔄 STAGE 3: Medical Information")
        update3 = STTUpdateSettingsFrame(
            settings={
                "keyterms_prompt": [
                    "cardiology",
                    "echocardiogram",
                    "blood pressure",
                    "Dr. Smith",
                    "metoprolol",
                ]
            }
        )
        await task.queue_frames([update3])
        await asyncio.sleep(10)

        # Stage 4: Clear keyterms
        logger.info("🔄 STAGE 4: Clear Keyterms")
        update4 = STTUpdateSettingsFrame(settings={"keyterms_prompt": []})
        await task.queue_frames([update4])

    # Start update task
    asyncio.create_task(update_keyterms_stages())
    logger.info("✅ Service created, will update keyterms every 10 seconds")
    logger.info("Expected: Different keyterms at each stage")
    logger.info("Watch logs for 'STAGE X' messages and test relevant terms")
    runner = PipelineRunner()
    await runner.run(task)
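The staged-update tests start a background coroutine with `asyncio.create_task` and then hand control to the runner. The same pattern can be sketched with the standard library alone. Here an `asyncio.Queue` stands in for `task.queue_frames`, the delays are shortened, and `staged_updates` and `demo` are hypothetical names. The sketch also keeps a reference to the created task, which the asyncio documentation recommends so the task cannot be garbage-collected before it finishes.

```python
import asyncio
from typing import List


async def staged_updates(queue: asyncio.Queue, stages: List[List[str]], delay: float) -> None:
    """Push one keyterm list per stage, pausing between stages."""
    for stage in stages:
        await asyncio.sleep(delay)
        await queue.put(stage)


async def demo() -> List[List[str]]:
    queue: asyncio.Queue = asyncio.Queue()
    stages = [["hello"], ["first name", "last name"], []]
    # Keep a reference so the task is not garbage-collected mid-run.
    updater = asyncio.create_task(staged_updates(queue, stages, delay=0.01))
    # Stand-in for the pipeline runner: consume updates as they arrive.
    received = [await queue.get() for _ in stages]
    await updater
    return received
```

Running `asyncio.run(demo())` yields the three stage lists in order, ending with the empty list that clears the keyterms.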
async def test_dynamic_silence_params():
    """Test 15: Dynamic silence parameter updates."""
    logger.info("=" * 80)
    logger.info("TEST 15: Dynamic Silence Parameters")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro")
    task, transport = await create_basic_voice_agent(connection_params)

    async def update_silence_params():
        """Update silence parameters for different scenarios."""
        await asyncio.sleep(5)

        # Normal conversation
        logger.info("🔄 PHASE 1: Normal conversation (default timing)")
        await asyncio.sleep(10)

        # Reading credit card
        logger.info("🔄 PHASE 2: Reading numbers (longer silence tolerance)")
        update1 = STTUpdateSettingsFrame(
            settings={
                "max_turn_silence": 5000,
                "min_turn_silence": 300,
            }
        )
        await task.queue_frames([update1])
        await asyncio.sleep(15)

        # Back to normal
        logger.info("🔄 PHASE 3: Back to normal conversation")
        update2 = STTUpdateSettingsFrame(
            settings={
                "max_turn_silence": 1200,
                "min_turn_silence": 100,
            }
        )
        await task.queue_frames([update2])

    asyncio.create_task(update_silence_params())
    logger.info("✅ Service will update silence parameters during conversation")
    logger.info("Expected: Longer pauses tolerated in Phase 2")
    logger.info("Try pausing between words to test")
    runner = PipelineRunner()
    await runner.run(task)
async def test_multi_param_update():
    """Test 17: Update multiple parameters at once."""
    logger.info("=" * 80)
    logger.info("TEST 17: Multiple Parameter Update")
    logger.info("=" * 80)
    connection_params = AssemblyAIConnectionParams(speech_model="u3-rt-pro")
    task, transport = await create_basic_voice_agent(connection_params)

    async def multi_update():
        await asyncio.sleep(5)
        logger.info("🔄 Updating multiple parameters together")
        update = STTUpdateSettingsFrame(
            settings={
                "keyterms_prompt": ["account", "routing", "number"],
                "max_turn_silence": 3000,
                "min_turn_silence": 200,
            }
        )
        await task.queue_frames([update])
        logger.info("✅ Check logs for single UpdateConfiguration message")

    asyncio.create_task(multi_update())
    logger.info("Expected: All params updated in single WebSocket message")
    runner = PipelineRunner()
    await runner.run(task)
# ============================================================================
# Main Test Runner
# ============================================================================


def main():
    """Main test runner."""
    parser = argparse.ArgumentParser(description="Test AssemblyAI u3-rt-pro integration")
    parser.add_argument(
        "--test",
        type=str,
        default="basic",
        help="Test to run (basic, custom_min, max_warning, prompt_warning, "
        "prompt_keyterms_conflict, keyterms, diarization, diarization_xml, "
        "dynamic_keyterms, dynamic_silence, multi_param, all)",
    )
    parser.add_argument("--interactive", action="store_true", help="Run in interactive mode")
    args = parser.parse_args()

    # Validate environment
    if not TestConfig.validate():
        logger.error("Please set all required environment variables in .env")
        sys.exit(1)

    # Test mapping
    tests = {
        "basic": test_basic_config,
        "custom_min": test_custom_min_silence,
        "max_warning": test_max_silence_warning,
        "prompt_warning": test_custom_prompt_warning,
        "prompt_keyterms_conflict": test_prompt_keyterms_conflict,
        "keyterms": test_keyterms_basic,
        "diarization": test_diarization_no_format,
        "diarization_xml": test_diarization_xml_format,
        "dynamic_keyterms": test_dynamic_keyterms,
        "dynamic_silence": test_dynamic_silence_params,
        "multi_param": test_multi_param_update,
    }

    if args.interactive:
        logger.info("Interactive mode - select test to run:")
        for i, (name, _) in enumerate(tests.items(), 1):
            logger.info(f"{i}. {name}")
        logger.info(f"{len(tests) + 1}. Run all tests")
        choice = input("\nEnter test number: ")
        try:
            choice_num = int(choice)
            if choice_num == len(tests) + 1:
                args.test = "all"
            else:
                args.test = list(tests.keys())[choice_num - 1]
        except (ValueError, IndexError):
            logger.error("Invalid choice")
            sys.exit(1)

    # Run test(s)
    if args.test == "all":
        logger.info("Running all tests sequentially...")
        for test_name, test_func in tests.items():
            try:
                asyncio.run(test_func())
            except KeyboardInterrupt:
                logger.info(f"Test '{test_name}' interrupted")
                break
            except Exception as e:
                logger.error(f"Test '{test_name}' failed: {e}")
    else:
        if args.test not in tests:
            logger.error(f"Unknown test: {args.test}")
            logger.info(f"Available tests: {', '.join(tests.keys())}")
            sys.exit(1)
        try:
            asyncio.run(tests[args.test]())
        except KeyboardInterrupt:
            logger.info("Test interrupted")
        except Exception as e:
            logger.error(f"Test failed: {e}")
            raise


if __name__ == "__main__":
    main()