Reworked readme to have more pipes and cats

committed by Aleix Conchillo Flaqué
parent 7856d20a38
commit 8fa9fdcd5a

153 README.md
@@ -1,73 +1,33 @@
[](https://pypi.org/project/pipecat-ai)
<div align="center">
<img alt="pipecat" width="300px" height="auto" src="image.png">
</div>

# Pipecat — an open source framework for voice (and multimodal) assistants
# Pipecat

[](https://pypi.org/project/dailyai)

`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.

Build things like this:

[](https://www.youtube.com/watch?v=lDevgsp9vn0)

[ [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) ]
## Getting started with voice agents

**`Pipecat` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a telephone number, image output, video input, use different LLMs, and more.

In 2023 a _lot_ of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):

- low-latency, reliable audio transport
- echo cancellation
- phrase endpointing (knowing when the bot should respond to human speech)
- interruptibility
- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models

As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.

Today, `pipecat` is:

1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
2. transport services that move audio, video, and events across the Internet
3. implementations of specific generative AI services

Currently implemented services:

- Speech-to-text
  - Deepgram
  - Whisper
- LLMs
  - Azure
  - Fireworks
  - OpenAI
- Image generation
  - Azure
  - Fal
  - OpenAI
- Text-to-speech
  - Azure
  - Deepgram
  - ElevenLabs
- Transport
  - Daily
  - Local
- Vision
  - Moondream

If you'd like to [implement a service](https://github.com/daily-co/pipecat/tree/main/src/pipecat/services), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.

## Getting started

Today, the easiest way to get started with `pipecat` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily, and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations.

```
```shell
# install the module
pip install pipecat
pip install pipecat-ai

# set up an .env file with API keys
cp dot-env.template .env
```

By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional
dependencies that you can install with:
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:

```
pip install "pipecat[option,...]"
```shell
pip install "pipecat-ai[option,...]"
```

Your project may or may not need these, so they're made available as optional requirements. Here is a list:
@@ -75,6 +35,89 @@ Your project may or may not need these, so they're made available as optional re
- **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **Transports**: `daily`, `local`, `websocket`
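A minimal sketch, not part of pipecat, of checking whether the modules behind some of these extras are importable before running an example, and printing the matching install command. The extra-to-module mapping is an assumption drawn from the package names and imports that appear in this commit (for example, the `fireworks` extra installs `openai`, and the Moondream service imports `transformers`); verify it against `pyproject.toml`.

```python
import importlib.util

# Assumed mapping from a pipecat-ai extra to a top-level module it installs.
# Illustrative only; pyproject.toml is the source of truth.
EXTRA_MODULES = {
    "anthropic": "anthropic",
    "daily": "daily",
    "fal": "fal_client",
    "fireworks": "openai",
    "moondream": "transformers",
    "openai": "openai",
}


def missing_extras(extras, module_map=EXTRA_MODULES):
    """Return the extras whose top-level module cannot be found."""
    missing = []
    for extra in extras:
        # For a top-level name, find_spec returns None when the module is absent.
        if importlib.util.find_spec(module_map[extra]) is None:
            missing.append(extra)
    return missing


def install_command(extras):
    """Build the pip command for a set of extras, e.g. pipecat-ai[daily,openai]."""
    joined = ",".join(sorted(extras))
    return f'pip install "pipecat-ai[{joined}]"'


if __name__ == "__main__":
    missing = missing_extras(["daily", "openai"])
    print(install_command(missing) if missing else "All requested extras are installed.")
```

Sorting the extras just keeps the generated command stable; pip itself does not care about their order.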
## A simple voice agent running locally

If you’re doing AI-related stuff, you probably have an OpenAI API key.

To generate voice output, one service that’s easy to get started with is ElevenLabs. If you don’t already have an ElevenLabs developer account, you can sign up for one [here].

So let’s run a really simple agent that’s just a GPT-4 prompt, wired up to voice input and speaker output.

You can change the prompt in the code. The current prompt is “Tell me something interesting about the Roman Empire.”

Run `cd examples/getting-started`, then try the following examples:

```shell
# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic.py | ./pipecat-pipes-gpt-4.py | ./local-speaker.py
```
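Because the three processes above are chained with Unix pipes, each stage reads its input on stdin and writes its output to stdout. The README doesn't describe what flows between them, so this sketch assumes, purely for illustration, newline-delimited JSON frames with a hypothetical `type` field; it is not pipecat's actual frame format. It shows the general shape of a middle stage: parse a frame, handle the kinds it understands, pass everything else through, and flush after each write so the next process isn't left waiting on a buffer.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO


def transform(frame: dict) -> dict:
    """Hypothetical stage logic: upper-case 'text' frames, pass everything else through."""
    if frame.get("type") == "text":
        return {**frame, "text": frame["text"].upper()}
    return frame


def run_stage(lines: Iterable[str]) -> Iterator[str]:
    """Read one JSON frame per line and yield one JSON frame per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between frames
        yield json.dumps(transform(json.loads(line)))


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Stream frames from stdin to stdout, one line per frame."""
    for out in run_stage(stdin):
        stdout.write(out + "\n")
        stdout.flush()  # let the downstream process see each frame immediately
```

To use it as a real pipe stage, add `if __name__ == "__main__": main()` at the bottom and run something like `echo '{"type": "text", "text": "hi"}' | python stage.py` (the file name is hypothetical).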
## WebSockets instead of pipes

To run your agent in the cloud, you can switch the Pipecat transport layer to use a WebSocket instead of Unix pipes.

```shell
# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.

export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python ./local-mic-and-speaker-wss.py wss://localhost:8088
```

## WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.])

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard. Then run the examples, this time connecting via WebRTC instead of a WebSocket.

```shell
# 1. Run the pipecat process. Provide your Daily API key and a Daily room
export DAILY_API_KEY=...
export OPENAI_API_KEY=...
export ELEVENLABS_API_KEY=...
python pipecat-daily-gpt-4.py --daily-room https://example.daily.co/pipecat

# 2. Visit the Daily room link in any web browser to talk to the pipecat process.
# You'll want to use a Daily SDK to embed the client-side code into your own
# app. But visiting the room URL in a browser is a quick way to start building
# agents because you can focus on just the agent code at first.
open -a "Google Chrome" https://example.daily.co/pipecat
```
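The room can also be created programmatically rather than in the Dashboard. A minimal standard-library sketch, assuming Daily's REST rooms endpoint (`POST https://api.daily.co/v1/rooms`, authenticated with `Authorization: Bearer <DAILY_API_KEY>`, as described in the rooms reference linked above). The room name and the one-hour `exp` expiry are illustrative choices, and the function names are hypothetical.

```python
import json
import os
import time
import urllib.request

DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_create_room_request(api_key: str, name: str, ttl_seconds: int = 3600) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a Daily room."""
    body = {
        "name": name,
        # Expire the room ttl_seconds from now (Unix timestamp) so dev rooms clean themselves up.
        "properties": {"exp": int(time.time()) + ttl_seconds},
    }
    return urllib.request.Request(
        DAILY_ROOMS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_room(name: str) -> str:
    """Send the request (network call) and return the new room's URL."""
    req = build_create_room_request(os.environ["DAILY_API_KEY"], name)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["url"]
```

The `url` field in the JSON response is the room URL you would pass as `--daily-room`. Splitting request construction from sending keeps the request easy to inspect without touching the network.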
## Deploy your agent to the cloud
Now that you’ve decoupled client and server, and have a Pipecat process that can run anywhere you can run Python, you can deploy this example agent to the cloud.

`TBC`

## Taking it further

### Add a telephone number
Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.

You’ll need to add a credit card to your Daily account to enable telephone numbers.

`TBC`


### Add image output

Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.

You’ll need to add a credit card to your Daily account to enable telephone numbers.

`TBC`

### Add video output


`TBC`


## Code examples

There are two directories of examples:
@@ -33,12 +33,12 @@ Website = "https://pipecat.ai"

[project.optional-dependencies]
anthropic = [ "anthropic~=0.25.7" ]
audio = [ "pyaudio~=0.2.0" ]
azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
daily = [ "daily-python~=0.7.4" ]
examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
fal = [ "fal-client~=0.4.0" ]
fireworks = [ "openai~=1.26.0" ]
local = [ "pyaudio~=0.2.0" ]
moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
openai = [ "openai~=1.26.0" ]
playht = [ "pyht~=0.0.28" ]
@@ -15,7 +15,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Anthropic, you need to `pip install pipecat[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
"In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")

@@ -21,7 +21,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Azure TTS, you need to `pip install pipecat[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
"In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
raise Exception(f"Missing module: {e}")

from pipecat.services.openai_api_llm_service import BaseOpenAILLMService

@@ -23,7 +23,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fal, you need to `pip install pipecat[fal]`. Also, set `FAL_KEY` environment variable.")
"In order to use Fal, you need to `pip install pipecat-ai[fal]`. Also, set `FAL_KEY` environment variable.")
raise Exception(f"Missing module: {e}")

@@ -13,7 +13,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Fireworks, you need to `pip install pipecat[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
"In order to use Fireworks, you need to `pip install pipecat-ai[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")

@@ -19,7 +19,7 @@ try:
from transformers import AutoModelForCausalLM, AutoTokenizer
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Moondream, you need to `pip install pipecat[moondream]`.")
logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
raise Exception(f"Missing module(s): {e}")

@@ -32,7 +32,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use OpenAI, you need to `pip install pipecat[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
"In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
raise Exception(f"Missing module: {e}")

@@ -19,7 +19,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use PlayHT, you need to `pip install pipecat[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
"In order to use PlayHT, you need to `pip install pipecat-ai[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
raise Exception(f"Missing module: {e}")

@@ -22,7 +22,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use Whisper, you need to `pip install pipecat[whisper]`.")
"In order to use Whisper, you need to `pip install pipecat-ai[whisper]`.")
raise Exception(f"Missing module: {e}")

@@ -18,7 +18,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")

@@ -22,7 +22,7 @@ try:
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
"In order to use local audio, you need to `pip install pipecat-ai[audio]`. On MacOS, you also need to `brew install portaudio`.")
raise Exception(f"Missing module: {e}")

try:

@@ -44,7 +44,7 @@ try:
from daily import (EventHandler, CallClient, Daily)
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use the Daily transport, you need to `pip install pipecat[daily]`.")
logger.error("In order to use the Daily transport, you need to `pip install pipecat-ai[daily]`.")
raise Exception(f"Missing module: {e}")

VAD_RESET_PERIOD_MS = 2000

@@ -22,7 +22,7 @@ try:

except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error("In order to use Silero VAD, you need to `pip install pipecat[silero]`.")
logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai[silero]`.")
raise Exception(f"Missing module(s): {e}")
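Every service module touched by this commit repeats the same guarded-import block: catch `ModuleNotFoundError`, log which extra to install, and re-raise. A sketch of how that pattern could be factored into one helper with `importlib.import_module`. `require_extra` and `install_hint` are hypothetical names, not pipecat functions, and the sketch uses the standard `logging` module in place of whatever `logger` the real modules import.

```python
import importlib
import logging
from types import ModuleType

logger = logging.getLogger(__name__)


def install_hint(service: str, extra: str, env_vars: tuple = ()) -> str:
    """Build an install message in the same style as the ones in this diff."""
    message = f"In order to use {service}, you need to `pip install pipecat-ai[{extra}]`."
    if env_vars:
        names = " and ".join(f"`{v}`" for v in env_vars)
        message += f" Also, set {names} environment variable(s)."
    return message


def require_extra(module_name: str, extra: str, service: str, env_vars: tuple = ()) -> ModuleType:
    """Import module_name, or log how to install the pipecat-ai extra and raise."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        logger.error(f"Exception: {e}")
        logger.error(install_hint(service, extra, env_vars))
        # Chain the original error so the missing module name stays in the traceback.
        raise Exception(f"Missing module: {e}") from e
```

For example, `fal_client = require_extra("fal_client", "fal", "Fal", ("FAL_KEY",))` would replace one of the try/except blocks above, keeping the install hint in a single place if the package name changes again.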