pipecat/examples/local-input-select-stt/README.md
Julien Le Bourg 77fb63372a fix: incorrectly changed the base type in my last pull request for L… (#1184)
* fix: incorrectly changed the base type in my last pull request for  LocalAudioTransport

* update examples to use the new LocalTransportParams

* add local device select example
2025-02-11 08:35:57 -08:00

Pipecat Audio Transcription Example 🚀🎙️

Welcome to the Pipecat Audio Transcription Example!

This project showcases how to integrate the awesome pipecat library with a neat textual interface (powered by Textual) to select audio devices and perform real-time speech-to-text (STT) transcription using Whisper.

Note: Although the script allows you to select both input and output audio devices, this example only utilizes the audio input for transcription.


🎉 Features

  • Interactive Audio Device Selection:
    Choose your preferred audio input device using a cool, textual UI.

  • State-of-the-Art Transcription:
    Leverage Whisper's large model (running on CUDA) for high-quality, real-time STT.

  • Live Transcription Logging:
    Watch your spoken words appear as text in your console as they are transcribed.

  • Easy Setup:
    All dependencies are listed in requirements.txt.


🎥 Demo

Get a quick glimpse of the app in action!
(Don't worry, I'll be adding a GIF demo here soon!)



🔧 Installation

Install Dependencies:

pip install -r requirements.txt

🚀 Usage

Run the main script:

python bot.py

When the app launches, you'll see a textual interface that lets you select your audio input device. Once you select a device, the app begins capturing audio and transcribing it with Whisper.
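
To make the selection step concrete, here is a minimal sketch of how a selector might narrow a device list down to input-capable devices before showing it. It does not use the example's actual selector code: the dict keys (`index`, `name`, `max_input_channels`), the helper names, and the sample records are all assumptions made up for illustration, and real audio backends may name these fields differently.

```python
from typing import Dict, List


def input_devices(devices: List[Dict]) -> List[Dict]:
    """Keep only devices that can capture audio (at least one input channel)."""
    return [d for d in devices if d.get("max_input_channels", 0) > 0]


def format_choice(device: Dict) -> str:
    """Build the label a selector UI might show for one device."""
    return f"[{device['index']}] {device['name']} ({device['max_input_channels']} in)"


# Made-up sample records, for illustration only (not real hardware output).
SAMPLE_DEVICES = [
    {"index": 0, "name": "Built-in Microphone", "max_input_channels": 2},
    {"index": 1, "name": "Built-in Speakers", "max_input_channels": 0},
    {"index": 2, "name": "USB Headset", "max_input_channels": 1},
]

if __name__ == "__main__":
    for device in input_devices(SAMPLE_DEVICES):
        print(format_choice(device))
```

The index of the chosen device is what would then be handed to the audio transport.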


⚙️ How It Works

  1. LocalAudioTransport:
    Captures audio from your chosen input device.

  2. WhisperSTTService:
    Processes the audio stream using Whisper's large model for speech-to-text conversion.

  3. TranscriptionLogger:
    Logs the transcribed text to the console as soon as it's processed.
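
The three stages above form a simple producer/consumer chain. The sketch below mimics that flow with plain asyncio queues so you can see the shape of the data path without any audio hardware. It is not pipecat's API: `AudioChunk`, `Transcription`, `capture`, `transcribe`, `log_transcriptions`, and `run_pipeline` are hypothetical stand-ins, and the fake "STT" step just decodes bytes instead of running Whisper.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for a frame of raw audio from the input device."""
    data: bytes


@dataclass
class Transcription:
    """Stand-in for a frame carrying transcribed text."""
    text: str


_DONE = object()  # sentinel that marks the end of the stream


async def capture(out_q: asyncio.Queue, chunks) -> None:
    """Stage 1 (transport stand-in): push audio chunks downstream."""
    for chunk in chunks:
        await out_q.put(AudioChunk(chunk))
    await out_q.put(_DONE)


async def transcribe(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    """Stage 2 (STT stand-in): a real service would run a speech model here."""
    while (item := await in_q.get()) is not _DONE:
        await out_q.put(Transcription(item.data.decode("utf-8")))
    await out_q.put(_DONE)


async def log_transcriptions(in_q: asyncio.Queue, logged: list) -> None:
    """Stage 3 (logger stand-in): print each transcription as it arrives."""
    while (item := await in_q.get()) is not _DONE:
        line = f"Transcription: {item.text}"
        print(line)
        logged.append(line)


async def run_pipeline(chunks) -> list:
    """Wire the three stages together and run them concurrently."""
    audio_q: asyncio.Queue = asyncio.Queue()
    text_q: asyncio.Queue = asyncio.Queue()
    logged: list = []
    await asyncio.gather(
        capture(audio_q, chunks),
        transcribe(audio_q, text_q),
        log_transcriptions(text_q, logged),
    )
    return logged


if __name__ == "__main__":
    asyncio.run(run_pipeline([b"hello", b"world"]))
```

Each stage only knows about the queue on either side of it, which is why the example can use audio input alone and leave the output device unused.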


📦 Dependencies

The project relies on:

  • pipecat: for building the audio processing pipeline.
  • Textual: for the interactive terminal UI.
  • Whisper: for state-of-the-art STT transcription.

Example improvements:

I plan to improve this example with local LLM calls and audio output.