Reworked readme to have more pipes and cats

2024-05-12 18:37:07 +01:00
parent 7856d20a38
commit 8fa9fdcd5a
15 changed files with 111 additions and 68 deletions
--- a/README.md
+++ b/README.md
@@ -1,73 +1,33 @@
-[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai)
+<div align="center">
+ <img alt="pipecat" width="300px" height="auto" src="image.png">
+</div>

-# Pipecat — an open source framework for voice (and multimodal) assistants
+# Pipecat
+
+[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai)
+
+`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.

 Build things like this:

 [![AI-powered voice patient intake for healthcare](https://img.youtube.com/vi/lDevgsp9vn0/0.jpg)](https://www.youtube.com/watch?v=lDevgsp9vn0)

-[ [pipecat starter kits repository](https://github.com/daily-co/pipecat-examples) ]
+## Getting started with voice agents

-**`Pipecat` started as a toolkit for implementing generative AI voice bots.** Things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, and snarky social companions.
+You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a telephone number, image output, video input, use different LLMs, and more.

-In 2023 a _lot_ of us got excited about the possibility of having open-ended conversations with LLMs. It became clear pretty quickly that we were all solving the same [low-level problems](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/):
-
- low-latency, reliable audio transport
- echo cancellation
- phrase endpointing (knowing when the bot should respond to human speech)
- interruptibility
- writing clean code to stream data through "pipelines" of speech-to-text, LLM inference, and text-to-speech models
-
-As our applications expanded to include additional things like image generation, function calling, and vision models, we started to think about what a complete framework for these kinds of apps could look like.
-
-Today, `pipecat` is:
-
-1. a set of code building blocks for interacting with generative AI services and creating low-latency, interruptible data pipelines that use multiple services
-2. transport services that moves audio, video, and events across the Internet
-3. implementations of specific generative AI services
-
-Currently implemented services:
-
- Speech-to-text
-  - Deepgram
-  - Whisper
- LLMs
-  - Azure
-  - Fireworks
-  - OpenAI
- Image generation
-  - Azure
-  - Fal
-  - OpenAI
- Text-to-speech
-  - Azure
-  - Deepgram
-  - ElevenLabs
- Transport
-  - Daily
-  - Local
- Vision
-  - Moondream
-
-If you'd like to [implement a service](<(https://github.com/daily-co/pipecat/tree/main/src/pipecat/services)>), we welcome PRs! Our goal is to support lots of services in all of the above categories, plus new categories (like real-time video) as they emerge.
-
-## Getting started
-
-Today, the easiest way to get started with `pipecat` is to use [Daily](https://www.daily.co/) as your transport service. This toolkit started life as an internal SDK at Daily and millions of minutes of AI conversation have been served using it and its earlier prototype incarnations.
-
-```
+```shell
 # install the module
-pip install pipecat
+pip install pipecat-ai

 # set up an .env file with API keys
 cp dot-env.template .env
 ```

-By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional
-dependencies that you can install with:
+By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:

-```
-pip install "pipecat[option,...]"
+```shell
+pip install "pipecat-ai[option,...]"
 ```

 Your project may or may not need these, so they're made available as optional requirements. Here is a list:
@@ -75,6 +35,89 @@ Your project may or may not need these, so they're made available as optional re
 - **AI services**: `anthropic`, `azure`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
 - **Transports**: `daily`, `local`, `websocket`

+## A simple voice agent running locally
+
+If you’re doing AI-related stuff, you probably have an OpenAI API key.
+
+To generate voice output, one service that’s easy to get started with is ElevenLabs. If you don’t already have an ElevenLabs developer account, you can sign up for one [here].
+
+So let’s run a really simple agent that’s just a GPT-4 prompt, wired up to voice input and speaker output.
+
+You can change the prompt in the code. The current prompt is “Tell me something interesting about the Roman Empire.”
+
+`cd examples/getting-started` to run the following examples …
+
+```shell
+# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.
+
+export OPENAI_API_KEY=...
+export ELEVENLABS_API_KEY=...
+python ./local-mic.py | ./pipecat-pipes-gpt-4.py | ./local-speaker.py
+```
+
+## WebSockets instead of pipes
+
+To run your agent in the cloud, you can switch the Pipecat transport layer to use a WebSocket instead of Unix pipes.
+
+```shell
+# Talk to a local pipecat process with your voice. Specify GPT-4 as the LLM.
+
+export OPENAI_API_KEY=...
+export ELEVENLABS_API_KEY=...
+python ./local-mic-and-speaker-wss.py wss://localhost:8088
+```
+
+## WebRTC for production use
+
+WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.])
+
+One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
+
+Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard. Then run the examples, this time connecting via WebRTC instead of a WebSocket.
+
+```shell
+# 1. Run the pipecat process. Provide your Daily API key and a Daily room
+export DAILY_API_KEY=...
+export OPENAI_API_KEY=...
+export ELEVENLABS_API_KEY=...
+python pipecat-daily-gpt-4.py --daily-room https://example.daily.co/pipecat
+
+# 2. Visit the Daily room link in any web browser to talk to the pipecat process.
+#    You'll want to use a Daily SDK to embed the client-side code into your own
+#    app. But visiting the room URL in a browser is a quick way to start building
+#    agents because you can focus on just the agent code at first.
+open -a "Google Chrome" https://example.daily.co/pipecat
+```
+
+## Deploy your agent to the cloud
+Now that you’ve decoupled client and server, and have a Pipecat process that can run anywhere you can run Python, you can deploy this example agent to the cloud.
+
+`TBC`
+
+## Taking it further
+
+### Add a telephone number
+Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.
+
+You’ll need to add a credit card to your Daily account to enable telephone numbers.
+
+`TBC`
+
+
+### Add image output
+
+Daily supports telephone connections in addition to WebRTC streams. You can add a telephone number to your Daily room with the following REST API call. Once you’ve done that, you can call your agent on the phone.
+
+You’ll need to add a credit card to your Daily account to enable telephone numbers.
+
+`TBC`
+
+### Add video output
+
+
+`TBC`
+
+
 ## Code examples

 There are two directories of examples:
--- a/image.png
+++ b/image.png
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -33,12 +33,12 @@ Website = "https://pipecat.ai"

 [project.optional-dependencies]
 anthropic = [ "anthropic~=0.25.7" ]
-audio = [ "pyaudio~=0.2.0" ]
 azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
 daily = [ "daily-python~=0.7.4" ]
 examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
 fal = [ "fal-client~=0.4.0" ]
 fireworks = [ "openai~=1.26.0" ]
+local = [ "pyaudio~=0.2.0" ]
 moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
 openai = [ "openai~=1.26.0" ]
 playht = [ "pyht~=0.0.28" ]
--- a/src/pipecat/services/anthropic.py
+++ b/src/pipecat/services/anthropic.py
@@ -15,7 +15,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Anthropic, you need to `pip install pipecat[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
+        "In order to use Anthropic, you need to `pip install pipecat-ai[anthropic]`. Also, set `ANTHROPIC_API_KEY` environment variable.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/services/azure.py
+++ b/src/pipecat/services/azure.py
@@ -21,7 +21,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Azure TTS, you need to `pip install pipecat[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
+        "In order to use Azure TTS, you need to `pip install pipecat-ai[azure]`. Also, set `AZURE_SPEECH_API_KEY` and `AZURE_SPEECH_REGION` environment variables.")
    raise Exception(f"Missing module: {e}")

 from pipecat.services.openai_api_llm_service import BaseOpenAILLMService
--- a/src/pipecat/services/fal.py
+++ b/src/pipecat/services/fal.py
@@ -23,7 +23,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Fal, you need to `pip install pipecat[fal]`. Also, set `FAL_KEY` environment variable.")
+        "In order to use Fal, you need to `pip install pipecat-ai[fal]`. Also, set `FAL_KEY` environment variable.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/services/fireworks.py
+++ b/src/pipecat/services/fireworks.py
@@ -13,7 +13,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Fireworks, you need to `pip install pipecat[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
+        "In order to use Fireworks, you need to `pip install pipecat-ai[fireworks]`. Also, set the `FIREWORKS_API_KEY` environment variable.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/services/moondream.py
+++ b/src/pipecat/services/moondream.py
@@ -19,7 +19,7 @@ try:
    from transformers import AutoModelForCausalLM, AutoTokenizer
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
-    logger.error("In order to use Moondream, you need to `pip install pipecat[moondream]`.")
+    logger.error("In order to use Moondream, you need to `pip install pipecat-ai[moondream]`.")
    raise Exception(f"Missing module(s): {e}")


--- a/src/pipecat/services/openai.py
+++ b/src/pipecat/services/openai.py
@@ -32,7 +32,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use OpenAI, you need to `pip install pipecat[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
+        "In order to use OpenAI, you need to `pip install pipecat-ai[openai]`. Also, set `OPENAI_API_KEY` environment variable.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/services/playht.py
+++ b/src/pipecat/services/playht.py
@@ -19,7 +19,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use PlayHT, you need to `pip install pipecat[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
+        "In order to use PlayHT, you need to `pip install pipecat-ai[playht]`. Also, set `PLAY_HT_USER_ID` and `PLAY_HT_API_KEY` environment variables.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/services/whisper.py
+++ b/src/pipecat/services/whisper.py
@@ -22,7 +22,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use Whisper, you need to `pip install pipecat[whisper]`.")
+        "In order to use Whisper, you need to `pip install pipecat-ai[whisper]`.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/transports/local/audio.py
+++ b/src/pipecat/transports/local/audio.py
@@ -18,7 +18,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
+        "In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.")
    raise Exception(f"Missing module: {e}")


--- a/src/pipecat/transports/local/tk.py
+++ b/src/pipecat/transports/local/tk.py
@@ -22,7 +22,7 @@ try:
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
-        "In order to use local audio, you need to `pip install pipecat[audio]`. On MacOS, you also need to `brew install portaudio`.")
+        "In order to use local audio, you need to `pip install pipecat-ai[local]`. On MacOS, you also need to `brew install portaudio`.")
    raise Exception(f"Missing module: {e}")

 try:
--- a/src/pipecat/transports/services/daily.py
+++ b/src/pipecat/transports/services/daily.py
@@ -44,7 +44,7 @@ try:
    from daily import (EventHandler, CallClient, Daily)
 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
-    logger.error("In order to use the Daily transport, you need to `pip install pipecat[daily]`.")
+    logger.error("In order to use the Daily transport, you need to `pip install pipecat-ai[daily]`.")
    raise Exception(f"Missing module: {e}")

 VAD_RESET_PERIOD_MS = 2000
--- a/src/pipecat/vad/silero.py
+++ b/src/pipecat/vad/silero.py
@@ -22,7 +22,7 @@ try:

 except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
-    logger.error("In order to use Silero VAD, you need to `pip install pipecat[silero]`.")
+    logger.error("In order to use Silero VAD, you need to `pip install pipecat-ai[silero]`.")
    raise Exception(f"Missing module(s): {e}")
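
Review note: every file touched above repeats the same pattern (try the import, log an install hint, raise), and the hint string is hand-written each time, so the package or extra name can drift between files. The following is a minimal sketch, not part of this commit, of how the hint could be built in one place. `install_hint` and `require_module` are hypothetical names, and the standard `logging` module is used here so the sketch is self-contained; the project's own `logger` may be a different object.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# Distribution name used by the install hints in this commit.
PACKAGE_NAME = "pipecat-ai"


def install_hint(extra: str, package: str = PACKAGE_NAME) -> str:
    """Build the pip command for one optional extra (hypothetical helper)."""
    return f'pip install "{package}[{extra}]"'


def require_module(module: str, extra: str, service: str):
    """Import `module`; if it is missing, log an install hint and raise.

    Mirrors the try/except blocks in the diff above, but the hint text is
    generated from `extra` instead of being typed out per file.
    """
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as e:
        logger.error(f"Exception: {e}")
        logger.error(
            f"In order to use {service}, you need to `{install_hint(extra)}`.")
        raise Exception(f"Missing module: {e}") from e
```

A call site might then read `pyaudio = require_module("pyaudio", "local", "local audio")`, so renaming an extra in `pyproject.toml` only needs a matching change to the `extra` argument rather than to a free-form message.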