Pipecat Audio Transcription Example 🚀🎙️
Welcome to the Pipecat Audio Transcription Example!
This project showcases how to integrate the awesome pipecat library with a neat textual interface (powered by Textual) to select audio devices and perform real-time speech-to-text (STT) transcription using Whisper.
Note: Although the script allows you to select both input and output audio devices, this example only utilizes the audio input for transcription.
🎉 Features
- Interactive Audio Device Selection: Choose your preferred audio input device using a cool, textual UI.
- State-of-the-Art Transcription: Leverage Whisper's large model (running on CUDA) for high-quality, real-time STT.
- Live Transcription Logging: Watch your spoken words transform into text on your console in real time.
- Easy Setup: Everything you need is in the requirements.txt file.
🎥 Demo
Get a quick glimpse of the app in action!
(Don't worry – I'll be adding a GIF demo here soon!)
🔧 Installation
Install Dependencies:
pip install -r requirements.txt
🚀 Usage
Run the main script:
python bot.py
When the app launches, you'll see a textual interface that lets you select your audio input device. Once you've picked one, the app begins capturing audio and transcribing it with Whisper.
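To give a feel for what the device-selection step has to do, here is a minimal sketch of filtering a device list down to the entries that can actually capture audio. It assumes the devices arrive as PyAudio-style info dictionaries (with `index`, `name`, and `maxInputChannels` keys); this README does not say which audio backend bot.py queries, so treat that as an assumption. The function name `input_device_choices` and the sample data are hypothetical, made up for illustration.

```python
from typing import Dict, List, Tuple


def input_device_choices(devices: List[Dict]) -> List[Tuple[int, str]]:
    """Return (index, name) pairs for devices that can capture audio.

    Assumes PyAudio-style device info dicts; a device with zero input
    channels is output-only and is skipped.
    """
    return [
        (int(device["index"]), str(device["name"]))
        for device in devices
        if int(device.get("maxInputChannels", 0)) > 0
    ]


# Hypothetical sample data standing in for a real device query.
SAMPLE_DEVICES = [
    {"index": 0, "name": "Built-in Microphone", "maxInputChannels": 2, "maxOutputChannels": 0},
    {"index": 1, "name": "Built-in Speakers", "maxInputChannels": 0, "maxOutputChannels": 2},
    {"index": 2, "name": "USB Headset", "maxInputChannels": 1, "maxOutputChannels": 2},
]

if __name__ == "__main__":
    # Print the options a selection UI could offer.
    for index, name in input_device_choices(SAMPLE_DEVICES):
        print(f"[{index}] {name}")
```

A selection UI could feed these pairs straight into a list widget and pass the chosen index on to the audio transport.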
⚙️ How It Works
- LocalAudioTransport: Captures audio from your chosen input device.
- WhisperSTTService: Processes the audio stream using Whisper's large model for speech-to-text conversion.
- TranscriptionLogger: Logs the transcribed text to the console as soon as it's processed.
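The three stages above form a simple source, transform, sink chain. The sketch below mimics that flow with plain asyncio generators so it runs without pipecat, a microphone, or a GPU. Everything in it is a stand-in: `AudioChunk`, `Transcription`, `audio_source`, `stt_stage`, `log_stage`, and `run_demo` are hypothetical names, not pipecat's API, and the "transcriber" just decodes bytes instead of calling Whisper.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Iterable, List


@dataclass
class AudioChunk:
    """Stand-in for a frame of raw audio from the input device."""
    data: bytes


@dataclass
class Transcription:
    """Stand-in for a frame carrying transcribed text."""
    text: str


async def audio_source(chunks: Iterable[bytes]) -> AsyncIterator[AudioChunk]:
    # Plays the role of the audio transport: emits audio frames.
    for chunk in chunks:
        yield AudioChunk(chunk)


async def stt_stage(
    frames: AsyncIterator[AudioChunk], transcribe: Callable[[bytes], str]
) -> AsyncIterator[Transcription]:
    # Plays the role of the STT service: audio in, text out.
    async for frame in frames:
        text = transcribe(frame.data)
        if text:  # drop chunks that produced no text (e.g. silence)
            yield Transcription(text)


async def log_stage(frames: AsyncIterator[Transcription], sink: List[str]) -> None:
    # Plays the role of the logger: prints each transcription as it arrives.
    async for frame in frames:
        line = f"Transcription: {frame.text}"
        print(line)
        sink.append(line)


def run_demo(chunks: Iterable[bytes]) -> List[str]:
    """Wire the three stages together and return the logged lines."""
    logged: List[str] = []

    def fake_transcribe(data: bytes) -> str:
        # Assumption for illustration only: bytes are UTF-8 text, not audio.
        return data.decode("utf-8").strip()

    asyncio.run(log_stage(stt_stage(audio_source(chunks), fake_transcribe), logged))
    return logged


if __name__ == "__main__":
    run_demo([b"hello", b"   ", b"world"])
```

Each stage only knows about the frames it receives, which is why a stage can be swapped out (for example, a different STT backend) without touching the others.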
📦 Dependencies
The project relies on:
- pipecat – For building the audio processing pipeline.
- Textual – For the interactive terminal UI.
- Whisper – For state-of-the-art STT transcription.
Example improvements:
I plan to improve this example with local LLM calls and audio output.
