Compare commits
29 Commits
greedy-plu
...
jpt/fastbo
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5bd5d22270 | ||
|
|
6ee7932337 | ||
|
|
c407445dd1 | ||
|
|
447f37167e | ||
|
|
354c21500e | ||
|
|
5728e25b5a | ||
|
|
0b6a19802f | ||
|
|
c4a2d2197c | ||
|
|
269d06aa15 | ||
|
|
dfef1f2c54 | ||
|
|
b62beaba0b | ||
|
|
adf414e40f | ||
|
|
dc64e57f63 | ||
|
|
d3e410b2ac | ||
|
|
c544b2474b | ||
|
|
18243de358 | ||
|
|
6625895d1f | ||
|
|
f9ecce739e | ||
|
|
0075dd8386 | ||
|
|
eef1cde816 | ||
|
|
8d867c30c6 | ||
|
|
42c668b7ae | ||
|
|
b62227b4ae | ||
|
|
25ef0cb87b | ||
|
|
e195941aa5 | ||
|
|
e09eef1dd7 | ||
|
|
7c13663a4e | ||
|
|
5753869e5e | ||
|
|
ba878a19f4 |
22
CHANGELOG.md
22
CHANGELOG.md
@@ -5,14 +5,34 @@ All notable changes to **pipecat** will be documented in this file.
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
## [0.0.32] - 2024-06-22
|
||||
|
||||
### Added
|
||||
|
||||
- Allow specifying a `DeepgramSTTService` url which allows using on-prem
|
||||
Deepgram.
|
||||
|
||||
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
|
||||
can be integrated with FastAPI websockets.
|
||||
|
||||
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
|
||||
serialize and deserialize audio frames from Twilio.
|
||||
|
||||
- Added Daily transport event: `on_dialout_answered`. See
|
||||
https://reference-python.daily.co/api_reference.html#daily.EventHandler
|
||||
|
||||
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
|
||||
|
||||
### Performance
|
||||
|
||||
- Convert `BaseOutputTransport` and `BaseOutputTransport` to fully use asyncio
|
||||
and remove the use of threads.
|
||||
|
||||
### Other
|
||||
|
||||
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
|
||||
phone numbers with a Pipecat bot.
|
||||
|
||||
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
|
||||
`AzureSTTService` and `AzureTTSService`.
|
||||
|
||||
|
||||
@@ -32,14 +32,15 @@ Next, follow the steps in the README for each demo.
|
||||
|
||||
## Projects:
|
||||
|
||||
| Project | Description | Services |
|
||||
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
|
||||
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, OpenAI, Daily, Daily Prebuilt UI |
|
||||
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, Open AI, Fal, Daily, Custom UI |
|
||||
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
|
||||
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
|
||||
| Function-calling Chatbot (TBC) | A chatbot that can call functions in response to user input. | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
|
||||
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, OpenAI, ElevenLabs, Daily, Twilio |
|
||||
| Project | Description | Services |
|
||||
|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
|
||||
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
|
||||
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
|
||||
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
|
||||
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
|
||||
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
|
||||
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
|
||||
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
|
||||
|
||||
> [!IMPORTANT]
|
||||
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
|
||||
|
||||
165
examples/fast-chatbot/.gitignore
vendored
Normal file
165
examples/fast-chatbot/.gitignore
vendored
Normal file
@@ -0,0 +1,165 @@
|
||||
# Byte-compiled / optimized / DLL files
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
|
||||
# C extensions
|
||||
*.so
|
||||
|
||||
# Distribution / packaging
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
share/python-wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
MANIFEST
|
||||
|
||||
# PyInstaller
|
||||
# Usually these files are written by a python script from a template
|
||||
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
||||
*.manifest
|
||||
*.spec
|
||||
|
||||
# Installer logs
|
||||
pip-log.txt
|
||||
pip-delete-this-directory.txt
|
||||
|
||||
# Unit test / coverage reports
|
||||
htmlcov/
|
||||
.tox/
|
||||
.nox/
|
||||
.coverage
|
||||
.coverage.*
|
||||
.cache
|
||||
nosetests.xml
|
||||
coverage.xml
|
||||
*.cover
|
||||
*.py,cover
|
||||
.hypothesis/
|
||||
.pytest_cache/
|
||||
cover/
|
||||
|
||||
# Translations
|
||||
*.mo
|
||||
*.pot
|
||||
|
||||
# Django stuff:
|
||||
*.log
|
||||
local_settings.py
|
||||
db.sqlite3
|
||||
db.sqlite3-journal
|
||||
|
||||
# Flask stuff:
|
||||
instance/
|
||||
.webassets-cache
|
||||
|
||||
# Scrapy stuff:
|
||||
.scrapy
|
||||
|
||||
# Sphinx documentation
|
||||
docs/_build/
|
||||
|
||||
# PyBuilder
|
||||
.pybuilder/
|
||||
target/
|
||||
|
||||
# Jupyter Notebook
|
||||
.ipynb_checkpoints
|
||||
|
||||
# IPython
|
||||
profile_default/
|
||||
ipython_config.py
|
||||
|
||||
# pyenv
|
||||
# For a library or package, you might want to ignore these files since the code is
|
||||
# intended to run in multiple environments; otherwise, check them in:
|
||||
# .python-version
|
||||
|
||||
# pipenv
|
||||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
||||
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
||||
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
||||
# install all needed dependencies.
|
||||
#Pipfile.lock
|
||||
|
||||
# poetry
|
||||
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
||||
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
||||
# commonly ignored for libraries.
|
||||
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
||||
#poetry.lock
|
||||
|
||||
# pdm
|
||||
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
||||
#pdm.lock
|
||||
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
||||
# in version control.
|
||||
# https://pdm.fming.dev/#use-with-ide
|
||||
.pdm.toml
|
||||
|
||||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
||||
__pypackages__/
|
||||
|
||||
# Celery stuff
|
||||
celerybeat-schedule
|
||||
celerybeat.pid
|
||||
|
||||
# SageMath parsed files
|
||||
*.sage.py
|
||||
|
||||
# Environments
|
||||
.env
|
||||
.venv
|
||||
env/
|
||||
venv/
|
||||
ENV/
|
||||
env.bak/
|
||||
venv.bak/
|
||||
|
||||
# Spyder project settings
|
||||
.spyderproject
|
||||
.spyproject
|
||||
|
||||
# Rope project settings
|
||||
.ropeproject
|
||||
|
||||
# mkdocs documentation
|
||||
/site
|
||||
|
||||
# mypy
|
||||
.mypy_cache/
|
||||
.dmypy.json
|
||||
dmypy.json
|
||||
|
||||
# Pyre type checker
|
||||
.pyre/
|
||||
|
||||
# pytype static type analyzer
|
||||
.pytype/
|
||||
|
||||
# Cython debug symbols
|
||||
cython_debug/
|
||||
|
||||
# PyCharm
|
||||
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
||||
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
||||
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
||||
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
||||
#.idea/
|
||||
runpod.toml
|
||||
|
||||
# custom script to recursively upgrade items in requirements.py
|
||||
upgrade_requirements.py
|
||||
.DS_Store
|
||||
0
examples/fast-chatbot/README.md
Normal file
0
examples/fast-chatbot/README.md
Normal file
164
examples/fast-chatbot/bot.py
Normal file
164
examples/fast-chatbot/bot.py
Normal file
@@ -0,0 +1,164 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
from loguru import logger
|
||||
import argparse
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
from pydantic import BaseModel, ValidationError
|
||||
|
||||
from pipecat.vad.vad_analyzer import VADParams
|
||||
from pipecat.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||
from pipecat.services.openai import OpenAILLMService
|
||||
from pipecat.services.deepgram import DeepgramSTTService
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
|
||||
|
||||
from pipecat.processors.aggregators.llm_response import (
|
||||
LLMAssistantResponseAggregator, LLMUserResponseAggregator
|
||||
)
|
||||
|
||||
from helpers import (
|
||||
ClearableDeepgramTTSService,
|
||||
AudioVolumeTimer,
|
||||
TranscriptionTimingLogger
|
||||
)
|
||||
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level=os.getenv("LOG_LEVEL", "DEBUG"))
|
||||
|
||||
|
||||
class BotSettings(BaseModel):
|
||||
room_url: str
|
||||
room_token: str
|
||||
bot_name: str = "Pipecat"
|
||||
prompt: Optional[str] = "You are a helpful assistant."
|
||||
deepgram_api_key: Optional[str] = os.getenv("DEEPGRAM_API_KEY", None)
|
||||
deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE", "aura-asteria-en")
|
||||
deepgram_tts_base_url: Optional[str] = os.getenv(
|
||||
"DEEPGRAM_TTS_BASE_URL", "https://api.deepgram.com/v1/speak")
|
||||
deepgram_stt_base_url: Optional[str] = os.getenv(
|
||||
"DEEPGRAM_STT_BASE_URL", "https://api.deepgram.com/v1/speak")
|
||||
openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY", None),
|
||||
openai_model: Optional[str] = os.getenv("OPENAI_MODEL", None),
|
||||
openai_base_url: Optional[str] = os.getenv("OPENAI_BASE_URL", None)
|
||||
vad_stop_secs: Optional[float] = os.getenv("VAD_STOP_SECS", 0.200)
|
||||
|
||||
|
||||
async def main(settings: BotSettings):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransport(
|
||||
settings.room_url,
|
||||
settings.room_token,
|
||||
settings.bot_name,
|
||||
DailyParams(
|
||||
audio_out_enabled=True,
|
||||
transcription_enabled=False,
|
||||
vad_enabled=True,
|
||||
vad_analyzer=SileroVADAnalyzer(params=VADParams(
|
||||
stop_secs=settings.vad_stop_secs
|
||||
)),
|
||||
vad_audio_passthrough=True
|
||||
)
|
||||
)
|
||||
|
||||
stt = DeepgramSTTService(
|
||||
name="STT",
|
||||
api_key=settings.deepgram_api_key,
|
||||
url=settings.deepgram_stt_base_url
|
||||
)
|
||||
|
||||
tts = ClearableDeepgramTTSService(
|
||||
name="Voice",
|
||||
aiohttp_session=session,
|
||||
api_key=settings.deepgram_api_key,
|
||||
voice=settings.deepgram_voice,
|
||||
**({'base_url': url} if (url := settings.deepgram_tts_base_url) else {})
|
||||
)
|
||||
|
||||
llm = OpenAILLMService(
|
||||
name="LLM",
|
||||
api_key=settings.openai_api_key,
|
||||
model=settings.openai_model,
|
||||
base_url=settings.openai_base_url,
|
||||
)
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": settings.prompt,
|
||||
},
|
||||
]
|
||||
|
||||
avt = AudioVolumeTimer()
|
||||
tl = TranscriptionTimingLogger(avt)
|
||||
|
||||
tma_in = LLMUserResponseAggregator(messages)
|
||||
tma_out = LLMAssistantResponseAggregator(messages)
|
||||
|
||||
pipeline = Pipeline([
|
||||
transport.input(), # Transport user input
|
||||
avt, # Audio volume timer
|
||||
stt, # Speech-to-text
|
||||
tl, # Transcription timing logger
|
||||
tma_in, # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
tma_out, # Assistant spoken responses
|
||||
])
|
||||
|
||||
task = PipelineTask(
|
||||
pipeline,
|
||||
PipelineParams(
|
||||
allow_interruptions=True,
|
||||
enable_metrics=True,
|
||||
report_only_initial_ttfb=True
|
||||
))
|
||||
|
||||
# When the participant leaves, we exit the bot.
|
||||
@transport.event_handler("on_participant_left")
|
||||
async def on_participant_left(transport, participant, reason):
|
||||
await task.queue_frame(EndFrame())
|
||||
|
||||
# When the first participant joins, the bot should introduce itself.
|
||||
@transport.event_handler("on_first_participant_joined")
|
||||
async def on_first_participant_joined(transport, participant):
|
||||
# Provide some air whilst tracks subscribe
|
||||
time.sleep(2)
|
||||
messages.append(
|
||||
{
|
||||
"role": "system",
|
||||
"content": "Briefly introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'"})
|
||||
await task.queue_frames([LLMMessagesFrame(messages)])
|
||||
|
||||
runner = PipelineRunner()
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Pipecat Bot")
|
||||
parser.add_argument("-s", "--settings", type=str, required=True, help="Pipecat bot settings")
|
||||
|
||||
args, unknown = parser.parse_known_args()
|
||||
|
||||
try:
|
||||
settings = BotSettings.model_validate_json(args.settings)
|
||||
asyncio.run(main(settings))
|
||||
except ValidationError as e:
|
||||
print(e)
|
||||
164
examples/fast-chatbot/bot_runner.py
Normal file
164
examples/fast-chatbot/bot_runner.py
Normal file
@@ -0,0 +1,164 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
import os
|
||||
import argparse
|
||||
import subprocess
|
||||
|
||||
from pydantic import BaseModel, ValidationError
|
||||
from typing import Optional
|
||||
|
||||
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams
|
||||
|
||||
from fastapi import FastAPI, Request, HTTPException
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from fastapi.responses import JSONResponse
|
||||
|
||||
from bot import BotSettings
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
|
||||
# ------------ Configuration ------------ #
|
||||
|
||||
MAX_SESSION_TIME = 5 * 60 # 5 minutes
|
||||
REQUIRED_ENV_VARS = ['DAILY_API_URL', 'DAILY_API_KEY', 'DEEPGRAM_API_KEY']
|
||||
|
||||
daily_rest_helper = DailyRESTHelper(
|
||||
os.getenv("DAILY_API_KEY", ""),
|
||||
os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
|
||||
|
||||
|
||||
class RunnerSettings(BaseModel):
|
||||
prompt: Optional[
|
||||
str] = "You are a fast, low-latency chatbot. Your goal is to demonstrate voice-driven AI capabilities at human-like speeds. When introducing yourself briefly mention your goal is to showcase speed and conversational flow. The technology powering you is Daily for transport, Cerebrium for GPU hosting, Llama 3 (8-B version) LLM, and Deepgram for speech-to-text and text-to-speech. You are hosted on the east coast of the United States. Respond to what the user said in a creative and helpful way, but keep responses short and legible. Ensure responses contain only words. Check again that you have not included special characters other than '?' or '!'."
|
||||
deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE")
|
||||
openai_model: Optional[str] = os.getenv("OPENAI_MODEL", "gpt-4o")
|
||||
openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
|
||||
test: Optional[bool] = None
|
||||
|
||||
# ----------------- API ----------------- #
|
||||
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"]
|
||||
)
|
||||
|
||||
# ----------------- Main ----------------- #
|
||||
|
||||
|
||||
@app.post("/start_bot")
|
||||
async def start_bot(request: Request) -> JSONResponse:
|
||||
runner_settings = RunnerSettings()
|
||||
try:
|
||||
request_body = await request.body()
|
||||
if len(request_body) > 0:
|
||||
runner_settings = RunnerSettings.model_validate_json(request_body)
|
||||
except ValidationError as e:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Invalid request: {e}")
|
||||
except Exception as e:
|
||||
# If no data in request, pass
|
||||
pass
|
||||
|
||||
# Is this a webhook creation request?
|
||||
if runner_settings.test is not None:
|
||||
return JSONResponse({"test": True})
|
||||
|
||||
# Use specified room URL, or create a new one if not specified
|
||||
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
|
||||
|
||||
if not room_url:
|
||||
params = DailyRoomParams(
|
||||
properties=DailyRoomProperties()
|
||||
)
|
||||
try:
|
||||
room: DailyRoomObject = daily_rest_helper.create_room(params=params)
|
||||
except Exception as e:
|
||||
raise HTTPException(
|
||||
status_code=500,
|
||||
detail=f"Unable to provision room {e}")
|
||||
else:
|
||||
# Check passed room URL exists, we should assume that it already has a sip set up
|
||||
try:
|
||||
room: DailyRoomObject = daily_rest_helper.get_room_from_url(room_url)
|
||||
except Exception:
|
||||
raise HTTPException(
|
||||
status_code=500, detail=f"Room not found: {room_url}")
|
||||
|
||||
# Give the agent a token to join the session
|
||||
token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
|
||||
|
||||
if not room or not token:
|
||||
raise HTTPException(
|
||||
status_code=500, detail=f"Failed to get token for room: {room_url}")
|
||||
|
||||
# Spawn a new agent, and join the user session
|
||||
try:
|
||||
bot_settings = BotSettings(
|
||||
room_url=room.url,
|
||||
room_token=token,
|
||||
prompt=runner_settings.prompt,
|
||||
deepgram_voice=runner_settings.deepgram_voice,
|
||||
openai_model=runner_settings.openai_model,
|
||||
openai_api_key=runner_settings.openai_api_key,
|
||||
)
|
||||
bot_settings_str = bot_settings.model_dump_json(exclude_none=True)
|
||||
|
||||
subprocess.Popen(
|
||||
[f"python3 -m bot -s '{bot_settings_str}'"],
|
||||
shell=True,
|
||||
bufsize=1,
|
||||
cwd=os.path.dirname(os.path.abspath(__file__)))
|
||||
except Exception as e:
|
||||
raise HTTPException(
|
||||
status_code=500, detail=f"Failed to start subprocess: {e}")
|
||||
|
||||
# Grab a token for the user to join with
|
||||
user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
|
||||
|
||||
return JSONResponse({
|
||||
"room_url": room.url,
|
||||
"token": user_token,
|
||||
})
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Check environment variables
|
||||
for env_var in REQUIRED_ENV_VARS:
|
||||
if env_var not in os.environ:
|
||||
raise Exception(f"Missing environment variable: {env_var}.")
|
||||
|
||||
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
|
||||
parser.add_argument("--host", type=str,
|
||||
default=os.getenv("HOST", "0.0.0.0"), help="Host address")
|
||||
parser.add_argument("--port", type=int,
|
||||
default=os.getenv("PORT", 7860), help="Port number")
|
||||
parser.add_argument("--reload", action="store_true",
|
||||
default=True, help="Reload code on change")
|
||||
|
||||
config = parser.parse_args()
|
||||
|
||||
try:
|
||||
import uvicorn
|
||||
|
||||
uvicorn.run(
|
||||
"bot_runner:app",
|
||||
host=config.host,
|
||||
port=config.port,
|
||||
reload=config.reload
|
||||
)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("Pipecat runner shutting down...")
|
||||
12
examples/fast-chatbot/env.example
Normal file
12
examples/fast-chatbot/env.example
Normal file
@@ -0,0 +1,12 @@
|
||||
DAILY_SAMPLE_ROOM_URL= #optional: use the same room each time, or create a new one if unset
|
||||
DAILY_API_KEY=
|
||||
DAILY_API_URL=
|
||||
|
||||
DEEPGRAM_API_KEY=
|
||||
DEEPGRAM_VOICE=
|
||||
DEEPGRAM_STT_URL=
|
||||
DEEPGRAM_TTS_BASE_URL=
|
||||
|
||||
OPENAI_API_KEY=
|
||||
OPENAI_MODEL=
|
||||
OPENAI_BASE_URL=
|
||||
@@ -1,50 +1,28 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
from loguru import logger
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import math
|
||||
import struct
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import List
|
||||
|
||||
|
||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||
from pipecat.frames.frames import (
|
||||
Frame,
|
||||
TextFrame,
|
||||
LLMMessagesFrame,
|
||||
TranscriptionFrame,
|
||||
InterimTranscriptionFrame,
|
||||
AudioRawFrame,
|
||||
InterimTranscriptionFrame,
|
||||
TranscriptionFrame,
|
||||
TextFrame,
|
||||
StartInterruptionFrame,
|
||||
StopInterruptionFrame,
|
||||
LLMFullResponseStartFrame,
|
||||
TTSStoppedFrame
|
||||
TTSStoppedFrame,
|
||||
MetricsFrame
|
||||
)
|
||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||
from pipecat.processors.logger import FrameLogger
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
|
||||
from pipecat.vad.vad_analyzer import VADAnalyzer, VADState
|
||||
from pipecat.services.deepgram import DeepgramTTSService
|
||||
from pipecat.services.openai import OpenAILLMService, OpenAILLMContext, OpenAILLMContextFrame
|
||||
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyTransportMessageFrame
|
||||
from pipecat.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.vad.vad_analyzer import VADAnalyzer, VADParams, VADState
|
||||
|
||||
|
||||
from runner import configure
|
||||
|
||||
from loguru import logger
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level="DEBUG")
|
||||
from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
|
||||
|
||||
|
||||
class GreedyLLMAggregator(FrameProcessor):
|
||||
@@ -140,6 +118,9 @@ class VADGate(FrameProcessor):
|
||||
logger.debug(f"expected a text frame, but received {frame}")
|
||||
await self.push_frame(frame, direction)
|
||||
return
|
||||
else:
|
||||
if isinstance(frame, TextFrame):
|
||||
logger.error(f"XXXXXXXXXXXXXXXXXXX received a text frame, wasn't expecting it.")
|
||||
|
||||
if isinstance(frame, AudioRawFrame):
|
||||
# if our buffer is empty or has a "finished" sentence at the end,
|
||||
@@ -211,11 +192,8 @@ class VADGate(FrameProcessor):
|
||||
await asyncio.sleep(duration - 20 / 1000)
|
||||
if self.context:
|
||||
logger.debug(f"Appending assistant message to context: [{s.text_frame.text}]")
|
||||
if self.context.messages and self.context.messages[-1]["role"] == "assistant":
|
||||
self.context.messages[-1]["content"] += " " + s.text_frame.text
|
||||
else:
|
||||
self.context.messages.append(
|
||||
{"role": "assistant", "content": s.text_frame.text}
|
||||
self.context.messages.append(
|
||||
{"role": "assistant", "content": s.text_frame.text}
|
||||
)
|
||||
await self.push_frame(s.text_frame)
|
||||
|
||||
@@ -223,102 +201,67 @@ class VADGate(FrameProcessor):
|
||||
logger.debug(f"Exception {e}")
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransport(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
DailyParams(
|
||||
audio_out_enabled=True,
|
||||
transcription_enabled=True,
|
||||
vad_enabled=True,
|
||||
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5))
|
||||
)
|
||||
)
|
||||
class TranscriptionTimingLogger(FrameProcessor):
|
||||
def __init__(self, avt):
|
||||
super().__init__()
|
||||
self.name = "Transcription"
|
||||
self._avt = avt
|
||||
|
||||
tts = ClearableDeepgramTTSService(
|
||||
aiohttp_session=session,
|
||||
api_key=os.getenv("DEEPGRAM_API_KEY"),
|
||||
voice="aura-asteria-en",
|
||||
# base_url="http://0.0.0.0:8080/v1/speak"
|
||||
)
|
||||
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
||||
try:
|
||||
await super().process_frame(frame, direction)
|
||||
if isinstance(frame, TranscriptionFrame):
|
||||
elapsed = time.time() - self._avt.last_transition_ts
|
||||
logger.debug(f"Transcription TTF: {elapsed}")
|
||||
await self.push_frame(MetricsFrame(ttfb={self.name: elapsed}))
|
||||
|
||||
llm = OpenAILLMService(
|
||||
# To use OpenAI
|
||||
api_key=os.getenv("OPENAI_API_KEY"),
|
||||
model="gpt-4o"
|
||||
# Or, to use a local vLLM (or similar) api server
|
||||
# model="meta-llama/Meta-Llama-3-8B-Instruct",
|
||||
# model="neuralmagic/Meta-Llama-3-70B-Instruct-FP8",
|
||||
# base_url="http://0.0.0.0:8000/v1"
|
||||
)
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a helpful LLM communicating via audio. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||
},
|
||||
]
|
||||
|
||||
ctx = OpenAILLMContext()
|
||||
greedy = GreedyLLMAggregator(name="greedy", context=ctx)
|
||||
gate = VADGate(name="gate", vad_analyzer=transport.input().vad_analyzer(), context=ctx)
|
||||
|
||||
pipeline = Pipeline([
|
||||
transport.input(), # Transport user input
|
||||
greedy,
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
gate,
|
||||
transport.output(), # Transport bot output
|
||||
# FrameLogger()
|
||||
])
|
||||
|
||||
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
|
||||
|
||||
# When a participant joins, start transcription for that participant so the
|
||||
# bot can "hear" and respond to them.
|
||||
@ transport.event_handler("on_participant_joined")
|
||||
async def on_participant_joined(transport, participant):
|
||||
transport.capture_participant_transcription(participant["id"])
|
||||
|
||||
# When the first participant joins, the bot should introduce itself.
|
||||
@ transport.event_handler("on_first_participant_joined")
|
||||
async def on_first_participant_joined(transport, participant):
|
||||
messages.append(
|
||||
{"role": "system", "content": "Please introduce yourself to the user."})
|
||||
await task.queue_frames([LLMMessagesFrame(messages)])
|
||||
|
||||
# Handle "latency-ping" messages. The client will send app messages that look like
|
||||
# this:
|
||||
# { "latency-ping": { ts: <client-side timestamp> }}
|
||||
#
|
||||
# We want to send an immediate pong back to the client from this handler function.
|
||||
# Also, we will push a frame into the top of the pipeline and send it after the
|
||||
#
|
||||
@ transport.event_handler("on_app_message")
|
||||
async def on_app_message(transport, message, sender):
|
||||
try:
|
||||
if "latency-ping" in message:
|
||||
logger.debug(f"Received latency ping app message: {message}")
|
||||
ts = message["latency-ping"]["ts"]
|
||||
# Send immediately
|
||||
transport.output().send_message(DailyTransportMessageFrame(
|
||||
message={"latency-pong-msg-handler": {"ts": ts}},
|
||||
participant_id=sender))
|
||||
# And push to the pipeline for the Daily transport.output to send
|
||||
await tma_in.push_frame(
|
||||
DailyTransportMessageFrame(
|
||||
message={"latency-pong-pipeline-delivery": {"ts": ts}},
|
||||
participant_id=sender))
|
||||
except Exception as e:
|
||||
logger.debug(f"message handling error: {e} - {message}")
|
||||
|
||||
runner = PipelineRunner()
|
||||
await runner.run(task)
|
||||
await self.push_frame(frame, direction)
|
||||
except Exception as e:
|
||||
logger.debug(f"Exception {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
class AudioVolumeTimer(FrameProcessor):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.last_transition_ts = 0
|
||||
self._prev_volume = -80
|
||||
self._speech_volume_threshold = -50
|
||||
|
||||
async def process_frame(self, frame: Frame, direction: FrameDirection):
|
||||
await super().process_frame(frame, direction)
|
||||
|
||||
if isinstance(frame, AudioRawFrame):
|
||||
volume = self.calculate_volume(frame)
|
||||
# print(f"Audio volume: {volume:.2f} dB")
|
||||
if (volume >= self._speech_volume_threshold and
|
||||
self._prev_volume < self._speech_volume_threshold):
|
||||
# logger.debug("transition above speech volume threshold")
|
||||
self.last_transition_ts = time.time()
|
||||
elif (volume < self._speech_volume_threshold and
|
||||
self._prev_volume >= self._speech_volume_threshold):
|
||||
# logger.debug("transition below non-speech volume threshold")
|
||||
self.last_transition_ts = time.time()
|
||||
self._prev_volume = volume
|
||||
|
||||
await self.push_frame(frame, direction)
|
||||
|
||||
def calculate_volume(self, frame: AudioRawFrame) -> float:
|
||||
if frame.num_channels != 1:
|
||||
raise ValueError(f"Expected 1 channel, got {frame.num_channels}")
|
||||
|
||||
# Unpack audio data into 16-bit integers
|
||||
fmt = f"{len(frame.audio) // 2}h"
|
||||
audio_samples = struct.unpack(fmt, frame.audio)
|
||||
|
||||
# Calculate RMS
|
||||
sum_squares = sum(sample**2 for sample in audio_samples)
|
||||
rms = math.sqrt(sum_squares / len(audio_samples))
|
||||
|
||||
# Convert RMS to decibels (dB)
|
||||
# Reference: maximum value for 16-bit audio is 32767
|
||||
if rms > 0:
|
||||
db = 20 * math.log10(rms / 32767)
|
||||
else:
|
||||
db = -96 # Minimum value (almost silent)
|
||||
|
||||
return db
|
||||
6
examples/fast-chatbot/requirements.txt
Normal file
6
examples/fast-chatbot/requirements.txt
Normal file
@@ -0,0 +1,6 @@
|
||||
pipecat-ai[daily,openai,silero,deepgram]
|
||||
fastapi
|
||||
uvicorn
|
||||
requests
|
||||
python-dotenv
|
||||
loguru
|
||||
@@ -1,114 +0,0 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import os
|
||||
import sys
|
||||
|
||||
from pipecat.frames.frames import LLMMessagesFrame
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_response import (
|
||||
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
|
||||
from pipecat.services.cartesia import CartesiaTTSService
|
||||
from pipecat.services.anthropic import AnthropicLLMService
|
||||
from pipecat.transports.services.daily import DailyParams, DailyTransport
|
||||
from pipecat.vad.silero import SileroVADAnalyzer
|
||||
|
||||
from runner import configure
|
||||
|
||||
from loguru import logger
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level="DEBUG")
|
||||
|
||||
|
||||
async def main(room_url: str, token):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = DailyTransport(
|
||||
room_url,
|
||||
token,
|
||||
"Respond bot",
|
||||
DailyParams(
|
||||
audio_out_enabled=True,
|
||||
transcription_enabled=True,
|
||||
vad_enabled=True,
|
||||
vad_analyzer=SileroVADAnalyzer()
|
||||
)
|
||||
)
|
||||
|
||||
tts = CartesiaTTSService(
|
||||
api_key=os.getenv("CARTESIA_API_KEY"),
|
||||
voice_name=sys.argv[1] if len(sys.argv) > 1 else "British Lady"
|
||||
)
|
||||
|
||||
llm = AnthropicLLMService(
|
||||
api_key=os.getenv("ANTHROPIC_API_KEY"),
|
||||
model="claude-3-5-sonnet-20240620",
|
||||
temperature=1.0
|
||||
)
|
||||
|
||||
# todo: think more about how to handle system prompts in a more general way. OpenAI,
|
||||
# Google, and Anthropic all have slightly different approaches to providing a system
|
||||
# prompt.
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": (
|
||||
"You are participating in a friendly competition to invent creative "
|
||||
"new ice cream flavors. Say the craziest flavor you can think of "
|
||||
"then wait for your opponent to come up with a different crazy flavor. "
|
||||
"then respond with another flavor idea. Repeat forever. Say only the "
|
||||
"ice cream flavors and nothing else. End each ice cream flavor statement "
|
||||
"with an exclamation mark! Go ..."
|
||||
)
|
||||
},
|
||||
]
|
||||
|
||||
tma_in = LLMUserResponseAggregator(messages)
|
||||
tma_out = LLMAssistantResponseAggregator(messages)
|
||||
|
||||
pipeline = Pipeline([
|
||||
transport.input(), # Transport user input
|
||||
tma_in, # User responses
|
||||
llm, # LLM
|
||||
tts, # TTS
|
||||
transport.output(), # Transport bot output
|
||||
tma_out, # Assistant spoken responses
|
||||
])
|
||||
|
||||
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
|
||||
|
||||
# When a participant joins, start transcription for that participant so the
|
||||
# bot can "hear" and respond to them.
|
||||
@ transport.event_handler("on_participant_joined")
|
||||
async def on_participant_joined(transport, participant):
|
||||
transport.capture_participant_transcription(participant["id"])
|
||||
|
||||
# When the first participant joins, the bot should introduce itself.
|
||||
@ transport.event_handler("on_first_participant_joined")
|
||||
async def on_first_participant_joined(transport, participant):
|
||||
await task.queue_frames([LLMMessagesFrame(messages)])
|
||||
|
||||
runner = PipelineRunner()
|
||||
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
(url, token) = configure()
|
||||
asyncio.run(main(url, token))
|
||||
|
||||
# '{"action":"app-message","data":{"metrics":{"ttfb":[{"name":"AnthropicLLMService#0","time":0.5975627899169922}]},"type":"pipecat-metrics"},"fromId":"592d3489-90ba-401d-a760-c1a863d64a4a","callFrameId":"17189290998160.035120590426112264"}'
|
||||
# [Durian and Limburger Cheese Charcoal Activated Tar Twist!]
|
||||
# [Fermented Fish Sauce and Ghost Pepper Bubblegum Cotton Candy Nightmare!]
|
||||
# [Spoiled Yogurt and Ghost Pepper Gummy Bear Blizzard!]
|
||||
# [Matcha Green Tea and Sour Gummy Worm Fusion!]
|
||||
161
examples/twilio-chatbot/.gitignore
vendored
Normal file
161
examples/twilio-chatbot/.gitignore
vendored
Normal file
@@ -0,0 +1,161 @@
|
||||
# Byte-compiled / optimized / DLL files
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
|
||||
# C extensions
|
||||
*.so
|
||||
|
||||
# Distribution / packaging
|
||||
.Python
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
.eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
wheels/
|
||||
share/python-wheels/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
MANIFEST
|
||||
|
||||
# PyInstaller
|
||||
# Usually these files are written by a python script from a template
|
||||
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
||||
*.manifest
|
||||
*.spec
|
||||
|
||||
# Installer logs
|
||||
pip-log.txt
|
||||
pip-delete-this-directory.txt
|
||||
|
||||
# Unit test / coverage reports
|
||||
htmlcov/
|
||||
.tox/
|
||||
.nox/
|
||||
.coverage
|
||||
.coverage.*
|
||||
.cache
|
||||
nosetests.xml
|
||||
coverage.xml
|
||||
*.cover
|
||||
*.py,cover
|
||||
.hypothesis/
|
||||
.pytest_cache/
|
||||
cover/
|
||||
|
||||
# Translations
|
||||
*.mo
|
||||
*.pot
|
||||
|
||||
# Django stuff:
|
||||
*.log
|
||||
local_settings.py
|
||||
db.sqlite3
|
||||
db.sqlite3-journal
|
||||
|
||||
# Flask stuff:
|
||||
instance/
|
||||
.webassets-cache
|
||||
|
||||
# Scrapy stuff:
|
||||
.scrapy
|
||||
|
||||
# Sphinx documentation
|
||||
docs/_build/
|
||||
|
||||
# PyBuilder
|
||||
.pybuilder/
|
||||
target/
|
||||
|
||||
# Jupyter Notebook
|
||||
.ipynb_checkpoints
|
||||
|
||||
# IPython
|
||||
profile_default/
|
||||
ipython_config.py
|
||||
|
||||
# pyenv
|
||||
# For a library or package, you might want to ignore these files since the code is
|
||||
# intended to run in multiple environments; otherwise, check them in:
|
||||
# .python-version
|
||||
|
||||
# pipenv
|
||||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
||||
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
||||
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
||||
# install all needed dependencies.
|
||||
#Pipfile.lock
|
||||
|
||||
# poetry
|
||||
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
||||
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
||||
# commonly ignored for libraries.
|
||||
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
||||
#poetry.lock
|
||||
|
||||
# pdm
|
||||
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
||||
#pdm.lock
|
||||
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
||||
# in version control.
|
||||
# https://pdm.fming.dev/#use-with-ide
|
||||
.pdm.toml
|
||||
|
||||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
||||
__pypackages__/
|
||||
|
||||
# Celery stuff
|
||||
celerybeat-schedule
|
||||
celerybeat.pid
|
||||
|
||||
# SageMath parsed files
|
||||
*.sage.py
|
||||
|
||||
# Environments
|
||||
.env
|
||||
.venv
|
||||
env/
|
||||
venv/
|
||||
ENV/
|
||||
env.bak/
|
||||
venv.bak/
|
||||
|
||||
# Spyder project settings
|
||||
.spyderproject
|
||||
.spyproject
|
||||
|
||||
# Rope project settings
|
||||
.ropeproject
|
||||
|
||||
# mkdocs documentation
|
||||
/site
|
||||
|
||||
# mypy
|
||||
.mypy_cache/
|
||||
.dmypy.json
|
||||
dmypy.json
|
||||
|
||||
# Pyre type checker
|
||||
.pyre/
|
||||
|
||||
# pytype static type analyzer
|
||||
.pytype/
|
||||
|
||||
# Cython debug symbols
|
||||
cython_debug/
|
||||
|
||||
# PyCharm
|
||||
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
||||
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
||||
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
||||
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
||||
#.idea/
|
||||
runpod.toml
|
||||
20
examples/twilio-chatbot/Dockerfile
Normal file
20
examples/twilio-chatbot/Dockerfile
Normal file
@@ -0,0 +1,20 @@
|
||||
# Use an official Python runtime as a parent image
|
||||
FROM python:3.10-bullseye
|
||||
|
||||
# Set the working directory in the container
|
||||
WORKDIR /twilio-chatbot
|
||||
|
||||
# Copy the requirements file into the container
|
||||
COPY requirements.txt .
|
||||
|
||||
# Install any needed packages specified in requirements.txt
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
# Copy the current directory contents into the container
|
||||
COPY . .
|
||||
|
||||
# Expose the desired port
|
||||
EXPOSE 8765
|
||||
|
||||
# Run the application
|
||||
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8765"]
|
||||
82
examples/twilio-chatbot/README.md
Normal file
82
examples/twilio-chatbot/README.md
Normal file
@@ -0,0 +1,82 @@
|
||||
# Twilio Chatbot
|
||||
|
||||
This project is a FastAPI-based chatbot that integrates with Twilio to handle WebSocket connections and provide real-time communication. The project includes endpoints for starting a call and handling WebSocket connections.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Features](#features)
|
||||
- [Requirements](#requirements)
|
||||
- [Installation](#installation)
|
||||
- [Configure Twilio URLs](#configure-twilio-urls)
|
||||
- [Running the Application](#running-the-application)
|
||||
- [Usage](#usage)
|
||||
|
||||
## Features
|
||||
|
||||
- **FastAPI**: A modern, fast (high-performance), web framework for building APIs with Python 3.6+.
|
||||
- **WebSocket Support**: Real-time communication using WebSockets.
|
||||
- **CORS Middleware**: Allowing cross-origin requests for testing.
|
||||
- **Dockerized**: Easily deployable using Docker.
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.10
|
||||
- Docker (for containerized deployment)
|
||||
- ngrok (for tunneling)
|
||||
- Twilio Account
|
||||
|
||||
## Installation
|
||||
|
||||
1. **Set up a virtual environment** (optional but recommended):
|
||||
```sh
|
||||
python -m venv venv
|
||||
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
|
||||
```
|
||||
|
||||
2. **Install dependencies**:
|
||||
```sh
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
3. **Create .env**:
|
||||
create .env based on env.example
|
||||
|
||||
4. **Install ngrok**:
|
||||
Follow the instructions on the [ngrok website](https://ngrok.com/download) to download and install ngrok.
|
||||
|
||||
## Configure Twilio URLs
|
||||
|
||||
1. **Update the Twilio Webhook**:
|
||||
Copy the ngrok URL and update your Twilio phone number webhook URL to `http://<ngrok_url>/start_call`.
|
||||
|
||||
2. **Update the streams.xml**:
|
||||
Copy the ngrok URL and update templates/streams.xml with `wss://<ngrok_url>/ws`.
|
||||
|
||||
## Running the Application
|
||||
|
||||
### Using Python
|
||||
|
||||
1. **Run the FastAPI application**:
|
||||
```sh
|
||||
python server.py
|
||||
```
|
||||
|
||||
2. **Start ngrok**:
|
||||
In a new terminal, start ngrok to tunnel the local server:
|
||||
```sh
|
||||
ngrok http 8765
|
||||
```
|
||||
### Using Docker
|
||||
|
||||
1. **Build the Docker image**:
|
||||
```sh
|
||||
docker build -t twilio-chatbot .
|
||||
```
|
||||
|
||||
2. **Run the Docker container**:
|
||||
```sh
|
||||
docker run -it --rm -p 8765:8765 twilio-chatbot
|
||||
```
|
||||
## Usage
|
||||
|
||||
To start a call, simply make a call to your Twilio phone number. The webhook URL will direct the call to your FastAPI application, which will handle it accordingly.
|
||||
88
examples/twilio-chatbot/bot.py
Normal file
88
examples/twilio-chatbot/bot.py
Normal file
@@ -0,0 +1,88 @@
|
||||
import aiohttp
|
||||
import os
|
||||
import sys
|
||||
|
||||
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_response import (
|
||||
LLMAssistantResponseAggregator,
|
||||
LLMUserResponseAggregator
|
||||
)
|
||||
from pipecat.services.openai import OpenAILLMService
|
||||
from pipecat.services.deepgram import DeepgramSTTService
|
||||
from pipecat.services.elevenlabs import ElevenLabsTTSService
|
||||
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
|
||||
from pipecat.vad.silero import SileroVADAnalyzer
|
||||
|
||||
from loguru import logger
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level="DEBUG")
|
||||
|
||||
|
||||
async def run_bot(websocket_client):
|
||||
async with aiohttp.ClientSession() as session:
|
||||
transport = FastAPIWebsocketTransport(
|
||||
websocket=websocket_client,
|
||||
params=FastAPIWebsocketParams(
|
||||
audio_out_enabled=True,
|
||||
add_wav_header=False,
|
||||
vad_enabled=True,
|
||||
vad_analyzer=SileroVADAnalyzer(),
|
||||
vad_audio_passthrough=True
|
||||
)
|
||||
)
|
||||
|
||||
llm = OpenAILLMService(
|
||||
api_key=os.getenv("OPENAI_API_KEY"),
|
||||
model="gpt-4o")
|
||||
|
||||
stt = DeepgramSTTService(api_key=os.getenv('DEEPGRAM_API_KEY'))
|
||||
|
||||
tts = ElevenLabsTTSService(
|
||||
aiohttp_session=session,
|
||||
api_key=os.getenv("ELEVENLABS_API_KEY"),
|
||||
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
|
||||
)
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a helpful LLM in an audio call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
|
||||
},
|
||||
]
|
||||
|
||||
tma_in = LLMUserResponseAggregator(messages)
|
||||
tma_out = LLMAssistantResponseAggregator(messages)
|
||||
|
||||
pipeline = Pipeline([
|
||||
transport.input(), # Websocket input from client
|
||||
stt, # Speech-To-Text
|
||||
tma_in, # User responses
|
||||
llm, # LLM
|
||||
tts, # Text-To-Speech
|
||||
transport.output(), # Websocket output to client
|
||||
tma_out # LLM responses
|
||||
])
|
||||
|
||||
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
|
||||
|
||||
@transport.event_handler("on_client_connected")
|
||||
async def on_client_connected(transport, client):
|
||||
# Kick off the conversation.
|
||||
messages.append(
|
||||
{"role": "system", "content": "Please introduce yourself to the user."})
|
||||
await task.queue_frames([LLMMessagesFrame(messages)])
|
||||
|
||||
@transport.event_handler("on_client_disconnected")
|
||||
async def on_client_disconnected(transport, client):
|
||||
await task.queue_frames([EndFrame()])
|
||||
|
||||
runner = PipelineRunner(handle_sigint=False)
|
||||
|
||||
await runner.run(task)
|
||||
4
examples/twilio-chatbot/env.example
Normal file
4
examples/twilio-chatbot/env.example
Normal file
@@ -0,0 +1,4 @@
|
||||
OPENAI_API_KEY=
|
||||
DEEPGRAM_API_KEY=
|
||||
ELEVENLABS_API_KEY=
|
||||
ELEVENLABS_VOICE_ID=
|
||||
5
examples/twilio-chatbot/requirements.txt
Normal file
5
examples/twilio-chatbot/requirements.txt
Normal file
@@ -0,0 +1,5 @@
|
||||
pipecat-ai[daily,openai,silero,deepgram]
|
||||
fastapi
|
||||
uvicorn
|
||||
python-dotenv
|
||||
loguru
|
||||
34
examples/twilio-chatbot/server.py
Normal file
34
examples/twilio-chatbot/server.py
Normal file
@@ -0,0 +1,34 @@
|
||||
import uvicorn
|
||||
|
||||
from fastapi import FastAPI, WebSocket
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from starlette.responses import HTMLResponse
|
||||
|
||||
from bot import run_bot
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"], # Allow all origins for testing
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
@app.post('/start_call')
|
||||
async def start_call():
|
||||
print("POST TwiML")
|
||||
return HTMLResponse(content=open("templates/streams.xml").read(), media_type="application/xml")
|
||||
|
||||
|
||||
@app.websocket("/ws")
|
||||
async def websocket_endpoint(websocket: WebSocket):
|
||||
await websocket.accept()
|
||||
print("WebSocket connection accepted")
|
||||
await run_bot(websocket)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
uvicorn.run(app, host="0.0.0.0", port=8765)
|
||||
7
examples/twilio-chatbot/templates/streams.xml
Normal file
7
examples/twilio-chatbot/templates/streams.xml
Normal file
@@ -0,0 +1,7 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<Response>
|
||||
<Connect>
|
||||
<Stream url="wss://<your server url>/ws"></Stream>
|
||||
</Connect>
|
||||
<Pause length="40"/>
|
||||
</Response>
|
||||
@@ -26,6 +26,8 @@ anyio==4.4.0
|
||||
# anthropic
|
||||
# httpx
|
||||
# openai
|
||||
# starlette
|
||||
# watchfiles
|
||||
async-timeout==4.0.3
|
||||
# via
|
||||
# aiohttp
|
||||
@@ -54,12 +56,15 @@ cffi==1.16.0
|
||||
charset-normalizer==3.3.2
|
||||
# via requests
|
||||
click==8.1.7
|
||||
# via flask
|
||||
# via
|
||||
# flask
|
||||
# typer
|
||||
# uvicorn
|
||||
coloredlogs==15.0.1
|
||||
# via onnxruntime
|
||||
ctranslate2==4.3.1
|
||||
# via faster-whisper
|
||||
daily-python==0.9.1
|
||||
daily-python==0.10.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
dataclasses-json==0.6.7
|
||||
# via
|
||||
@@ -71,17 +76,25 @@ distro==1.9.0
|
||||
# via
|
||||
# anthropic
|
||||
# openai
|
||||
dnspython==2.6.1
|
||||
# via email-validator
|
||||
einops==0.8.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
email-validator==2.2.0
|
||||
# via fastapi
|
||||
exceptiongroup==1.2.1
|
||||
# via
|
||||
# anyio
|
||||
# pytest
|
||||
fal-client==0.4.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
fastapi==0.111.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
fastapi-cli==0.0.4
|
||||
# via fastapi
|
||||
faster-whisper==1.0.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
filelock==3.15.1
|
||||
filelock==3.15.3
|
||||
# via
|
||||
# huggingface-hub
|
||||
# pyht
|
||||
@@ -113,7 +126,7 @@ google-api-core[grpc]==2.19.0
|
||||
# google-ai-generativelanguage
|
||||
# google-api-python-client
|
||||
# google-generativeai
|
||||
google-api-python-client==2.133.0
|
||||
google-api-python-client==2.134.0
|
||||
# via google-generativeai
|
||||
google-auth==2.30.0
|
||||
# via
|
||||
@@ -140,24 +153,29 @@ grpcio==1.64.1
|
||||
grpcio-status==1.62.2
|
||||
# via google-api-core
|
||||
h11==0.14.0
|
||||
# via httpcore
|
||||
# via
|
||||
# httpcore
|
||||
# uvicorn
|
||||
httpcore==1.0.5
|
||||
# via httpx
|
||||
httplib2==0.22.0
|
||||
# via
|
||||
# google-api-python-client
|
||||
# google-auth-httplib2
|
||||
httptools==0.6.1
|
||||
# via uvicorn
|
||||
httpx==0.27.0
|
||||
# via
|
||||
# anthropic
|
||||
# cartesia
|
||||
# deepgram-sdk
|
||||
# fal-client
|
||||
# fastapi
|
||||
# openai
|
||||
# openpipe
|
||||
httpx-sse==0.4.0
|
||||
# via fal-client
|
||||
huggingface-hub==0.23.3
|
||||
huggingface-hub==0.23.4
|
||||
# via
|
||||
# faster-whisper
|
||||
# timm
|
||||
@@ -168,6 +186,7 @@ humanfriendly==10.0
|
||||
idna==3.7
|
||||
# via
|
||||
# anyio
|
||||
# email-validator
|
||||
# httpx
|
||||
# requests
|
||||
# yarl
|
||||
@@ -177,41 +196,46 @@ itsdangerous==2.2.0
|
||||
# via flask
|
||||
jinja2==3.1.4
|
||||
# via
|
||||
# fastapi
|
||||
# flask
|
||||
# torch
|
||||
jsonpatch==1.33
|
||||
# via langchain-core
|
||||
jsonpointer==3.0.0
|
||||
# via jsonpatch
|
||||
langchain==0.2.3
|
||||
langchain==0.2.5
|
||||
# via
|
||||
# langchain-community
|
||||
# pipecat-ai (pyproject.toml)
|
||||
langchain-community==0.2.4
|
||||
langchain-community==0.2.5
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
langchain-core==0.2.5
|
||||
langchain-core==0.2.9
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
# langchain-openai
|
||||
# langchain-text-splitters
|
||||
langchain-openai==0.1.8
|
||||
langchain-openai==0.1.9
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
langchain-text-splitters==0.2.1
|
||||
# via langchain
|
||||
langsmith==0.1.77
|
||||
langsmith==0.1.81
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
# langchain-core
|
||||
loguru==0.7.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
markdown-it-py==3.0.0
|
||||
# via rich
|
||||
markupsafe==2.1.5
|
||||
# via
|
||||
# jinja2
|
||||
# werkzeug
|
||||
marshmallow==3.21.3
|
||||
# via dataclasses-json
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
mpmath==1.3.0
|
||||
# via sympy
|
||||
multidict==6.0.5
|
||||
@@ -273,9 +297,11 @@ openai==1.26.0
|
||||
# pipecat-ai (pyproject.toml)
|
||||
openpipe==4.14.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
orjson==3.10.4
|
||||
# via langsmith
|
||||
packaging==23.2
|
||||
orjson==3.10.5
|
||||
# via
|
||||
# fastapi
|
||||
# langsmith
|
||||
packaging==24.1
|
||||
# via
|
||||
# huggingface-hub
|
||||
# langchain-core
|
||||
@@ -289,7 +315,7 @@ pillow==10.3.0
|
||||
# torchvision
|
||||
pluggy==1.5.0
|
||||
# via pytest
|
||||
proto-plus==1.23.0
|
||||
proto-plus==1.24.0
|
||||
# via
|
||||
# google-ai-generativelanguage
|
||||
# google-api-core
|
||||
@@ -317,6 +343,7 @@ pycparser==2.22
|
||||
pydantic==2.7.4
|
||||
# via
|
||||
# anthropic
|
||||
# fastapi
|
||||
# google-generativeai
|
||||
# langchain
|
||||
# langchain-core
|
||||
@@ -324,6 +351,8 @@ pydantic==2.7.4
|
||||
# openai
|
||||
pydantic-core==2.18.4
|
||||
# via pydantic
|
||||
pygments==2.18.0
|
||||
# via rich
|
||||
pyht==0.0.28
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
pyloudnorm==0.1.1
|
||||
@@ -337,7 +366,11 @@ pytest-asyncio==0.23.7
|
||||
python-dateutil==2.9.0.post0
|
||||
# via openpipe
|
||||
python-dotenv==1.0.1
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
# via
|
||||
# pipecat-ai (pyproject.toml)
|
||||
# uvicorn
|
||||
python-multipart==0.0.9
|
||||
# via fastapi
|
||||
pyyaml==6.0.1
|
||||
# via
|
||||
# ctranslate2
|
||||
@@ -347,6 +380,7 @@ pyyaml==6.0.1
|
||||
# langchain-core
|
||||
# timm
|
||||
# transformers
|
||||
# uvicorn
|
||||
regex==2024.5.15
|
||||
# via
|
||||
# tiktoken
|
||||
@@ -362,6 +396,8 @@ requests==2.32.3
|
||||
# pyht
|
||||
# tiktoken
|
||||
# transformers
|
||||
rich==13.7.1
|
||||
# via typer
|
||||
rsa==4.9
|
||||
# via google-auth
|
||||
safetensors==0.4.3
|
||||
@@ -370,6 +406,8 @@ safetensors==0.4.3
|
||||
# transformers
|
||||
scipy==1.13.1
|
||||
# via pyloudnorm
|
||||
shellingham==1.5.4
|
||||
# via typer
|
||||
six==1.16.0
|
||||
# via python-dateutil
|
||||
sniffio==1.3.1
|
||||
@@ -380,15 +418,17 @@ sniffio==1.3.1
|
||||
# openai
|
||||
sounddevice==0.4.7
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
sqlalchemy==2.0.30
|
||||
sqlalchemy==2.0.31
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
starlette==0.37.2
|
||||
# via fastapi
|
||||
sympy==1.12.1
|
||||
# via
|
||||
# onnxruntime
|
||||
# torch
|
||||
tenacity==8.3.0
|
||||
tenacity==8.4.1
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
@@ -424,11 +464,14 @@ transformers==4.40.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
triton==2.3.1
|
||||
# via torch
|
||||
typer==0.12.3
|
||||
# via fastapi-cli
|
||||
typing-extensions==4.12.2
|
||||
# via
|
||||
# anthropic
|
||||
# anyio
|
||||
# deepgram-sdk
|
||||
# fastapi
|
||||
# google-generativeai
|
||||
# huggingface-hub
|
||||
# openai
|
||||
@@ -437,20 +480,31 @@ typing-extensions==4.12.2
|
||||
# pydantic-core
|
||||
# sqlalchemy
|
||||
# torch
|
||||
# typer
|
||||
# typing-inspect
|
||||
# uvicorn
|
||||
typing-inspect==0.9.0
|
||||
# via dataclasses-json
|
||||
ujson==5.10.0
|
||||
# via fastapi
|
||||
uritemplate==4.1.1
|
||||
# via google-api-python-client
|
||||
urllib3==2.2.1
|
||||
urllib3==2.2.2
|
||||
# via requests
|
||||
uvicorn[standard]==0.30.1
|
||||
# via fastapi
|
||||
uvloop==0.19.0
|
||||
# via uvicorn
|
||||
verboselogs==1.7
|
||||
# via deepgram-sdk
|
||||
watchfiles==0.22.0
|
||||
# via uvicorn
|
||||
websockets==12.0
|
||||
# via
|
||||
# cartesia
|
||||
# deepgram-sdk
|
||||
# pipecat-ai (pyproject.toml)
|
||||
# uvicorn
|
||||
werkzeug==3.0.3
|
||||
# via flask
|
||||
yarl==1.9.4
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
#
|
||||
# This file is autogenerated by pip-compile with Python 3.10
|
||||
# This file is autogenerated by pip-compile with Python 3.12
|
||||
# by the following command:
|
||||
#
|
||||
# pip-compile --all-extras pyproject.toml
|
||||
@@ -26,10 +26,8 @@ anyio==4.4.0
|
||||
# anthropic
|
||||
# httpx
|
||||
# openai
|
||||
async-timeout==4.0.3
|
||||
# via
|
||||
# aiohttp
|
||||
# langchain
|
||||
# starlette
|
||||
# watchfiles
|
||||
attrs==23.2.0
|
||||
# via
|
||||
# aiohttp
|
||||
@@ -54,12 +52,15 @@ cffi==1.16.0
|
||||
charset-normalizer==3.3.2
|
||||
# via requests
|
||||
click==8.1.7
|
||||
# via flask
|
||||
# via
|
||||
# flask
|
||||
# typer
|
||||
# uvicorn
|
||||
coloredlogs==15.0.1
|
||||
# via onnxruntime
|
||||
ctranslate2==4.3.1
|
||||
# via faster-whisper
|
||||
daily-python==0.9.1
|
||||
daily-python==0.10.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
dataclasses-json==0.6.7
|
||||
# via
|
||||
@@ -71,17 +72,21 @@ distro==1.9.0
|
||||
# via
|
||||
# anthropic
|
||||
# openai
|
||||
dnspython==2.6.1
|
||||
# via email-validator
|
||||
einops==0.8.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
exceptiongroup==1.2.1
|
||||
# via
|
||||
# anyio
|
||||
# pytest
|
||||
email-validator==2.2.0
|
||||
# via fastapi
|
||||
fal-client==0.4.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
fastapi==0.111.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
fastapi-cli==0.0.4
|
||||
# via fastapi
|
||||
faster-whisper==1.0.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
filelock==3.15.1
|
||||
filelock==3.15.3
|
||||
# via
|
||||
# huggingface-hub
|
||||
# pyht
|
||||
@@ -112,7 +117,7 @@ google-api-core[grpc]==2.19.0
|
||||
# google-ai-generativelanguage
|
||||
# google-api-python-client
|
||||
# google-generativeai
|
||||
google-api-python-client==2.133.0
|
||||
google-api-python-client==2.134.0
|
||||
# via google-generativeai
|
||||
google-auth==2.30.0
|
||||
# via
|
||||
@@ -137,24 +142,29 @@ grpcio==1.64.1
|
||||
grpcio-status==1.62.2
|
||||
# via google-api-core
|
||||
h11==0.14.0
|
||||
# via httpcore
|
||||
# via
|
||||
# httpcore
|
||||
# uvicorn
|
||||
httpcore==1.0.5
|
||||
# via httpx
|
||||
httplib2==0.22.0
|
||||
# via
|
||||
# google-api-python-client
|
||||
# google-auth-httplib2
|
||||
httptools==0.6.1
|
||||
# via uvicorn
|
||||
httpx==0.27.0
|
||||
# via
|
||||
# anthropic
|
||||
# cartesia
|
||||
# deepgram-sdk
|
||||
# fal-client
|
||||
# fastapi
|
||||
# openai
|
||||
# openpipe
|
||||
httpx-sse==0.4.0
|
||||
# via fal-client
|
||||
huggingface-hub==0.23.3
|
||||
huggingface-hub==0.23.4
|
||||
# via
|
||||
# faster-whisper
|
||||
# timm
|
||||
@@ -165,6 +175,7 @@ humanfriendly==10.0
|
||||
idna==3.7
|
||||
# via
|
||||
# anyio
|
||||
# email-validator
|
||||
# httpx
|
||||
# requests
|
||||
# yarl
|
||||
@@ -174,41 +185,46 @@ itsdangerous==2.2.0
|
||||
# via flask
|
||||
jinja2==3.1.4
|
||||
# via
|
||||
# fastapi
|
||||
# flask
|
||||
# torch
|
||||
jsonpatch==1.33
|
||||
# via langchain-core
|
||||
jsonpointer==3.0.0
|
||||
# via jsonpatch
|
||||
langchain==0.2.3
|
||||
langchain==0.2.5
|
||||
# via
|
||||
# langchain-community
|
||||
# pipecat-ai (pyproject.toml)
|
||||
langchain-community==0.2.4
|
||||
langchain-community==0.2.5
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
langchain-core==0.2.5
|
||||
langchain-core==0.2.9
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
# langchain-openai
|
||||
# langchain-text-splitters
|
||||
langchain-openai==0.1.8
|
||||
langchain-openai==0.1.9
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
langchain-text-splitters==0.2.1
|
||||
# via langchain
|
||||
langsmith==0.1.77
|
||||
langsmith==0.1.81
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
# langchain-core
|
||||
loguru==0.7.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
markdown-it-py==3.0.0
|
||||
# via rich
|
||||
markupsafe==2.1.5
|
||||
# via
|
||||
# jinja2
|
||||
# werkzeug
|
||||
marshmallow==3.21.3
|
||||
# via dataclasses-json
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
mpmath==1.3.0
|
||||
# via sympy
|
||||
multidict==6.0.5
|
||||
@@ -239,9 +255,11 @@ openai==1.26.0
|
||||
# pipecat-ai (pyproject.toml)
|
||||
openpipe==4.14.0
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
orjson==3.10.4
|
||||
# via langsmith
|
||||
packaging==23.2
|
||||
orjson==3.10.5
|
||||
# via
|
||||
# fastapi
|
||||
# langsmith
|
||||
packaging==24.1
|
||||
# via
|
||||
# huggingface-hub
|
||||
# langchain-core
|
||||
@@ -255,7 +273,7 @@ pillow==10.3.0
|
||||
# torchvision
|
||||
pluggy==1.5.0
|
||||
# via pytest
|
||||
proto-plus==1.23.0
|
||||
proto-plus==1.24.0
|
||||
# via
|
||||
# google-ai-generativelanguage
|
||||
# google-api-core
|
||||
@@ -283,6 +301,7 @@ pycparser==2.22
|
||||
pydantic==2.7.4
|
||||
# via
|
||||
# anthropic
|
||||
# fastapi
|
||||
# google-generativeai
|
||||
# langchain
|
||||
# langchain-core
|
||||
@@ -290,6 +309,8 @@ pydantic==2.7.4
|
||||
# openai
|
||||
pydantic-core==2.18.4
|
||||
# via pydantic
|
||||
pygments==2.18.0
|
||||
# via rich
|
||||
pyht==0.0.28
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
pyloudnorm==0.1.1
|
||||
@@ -303,7 +324,11 @@ pytest-asyncio==0.23.7
|
||||
python-dateutil==2.9.0.post0
|
||||
# via openpipe
|
||||
python-dotenv==1.0.1
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
# via
|
||||
# pipecat-ai (pyproject.toml)
|
||||
# uvicorn
|
||||
python-multipart==0.0.9
|
||||
# via fastapi
|
||||
pyyaml==6.0.1
|
||||
# via
|
||||
# ctranslate2
|
||||
@@ -313,6 +338,7 @@ pyyaml==6.0.1
|
||||
# langchain-core
|
||||
# timm
|
||||
# transformers
|
||||
# uvicorn
|
||||
regex==2024.5.15
|
||||
# via
|
||||
# tiktoken
|
||||
@@ -328,6 +354,8 @@ requests==2.32.3
|
||||
# pyht
|
||||
# tiktoken
|
||||
# transformers
|
||||
rich==13.7.1
|
||||
# via typer
|
||||
rsa==4.9
|
||||
# via google-auth
|
||||
safetensors==0.4.3
|
||||
@@ -336,6 +364,8 @@ safetensors==0.4.3
|
||||
# transformers
|
||||
scipy==1.13.1
|
||||
# via pyloudnorm
|
||||
shellingham==1.5.4
|
||||
# via typer
|
||||
six==1.16.0
|
||||
# via python-dateutil
|
||||
sniffio==1.3.1
|
||||
@@ -346,15 +376,17 @@ sniffio==1.3.1
|
||||
# openai
|
||||
sounddevice==0.4.7
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
sqlalchemy==2.0.30
|
||||
sqlalchemy==2.0.31
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
starlette==0.37.2
|
||||
# via fastapi
|
||||
sympy==1.12.1
|
||||
# via
|
||||
# onnxruntime
|
||||
# torch
|
||||
tenacity==8.3.0
|
||||
tenacity==8.4.1
|
||||
# via
|
||||
# langchain
|
||||
# langchain-community
|
||||
@@ -368,8 +400,6 @@ tokenizers==0.19.1
|
||||
# anthropic
|
||||
# faster-whisper
|
||||
# transformers
|
||||
tomli==2.0.1
|
||||
# via pytest
|
||||
torch==2.3.1
|
||||
# via
|
||||
# pipecat-ai (pyproject.toml)
|
||||
@@ -388,11 +418,13 @@ tqdm==4.66.4
|
||||
# transformers
|
||||
transformers==4.40.2
|
||||
# via pipecat-ai (pyproject.toml)
|
||||
typer==0.12.3
|
||||
# via fastapi-cli
|
||||
typing-extensions==4.12.2
|
||||
# via
|
||||
# anthropic
|
||||
# anyio
|
||||
# deepgram-sdk
|
||||
# fastapi
|
||||
# google-generativeai
|
||||
# huggingface-hub
|
||||
# openai
|
||||
@@ -401,20 +433,30 @@ typing-extensions==4.12.2
|
||||
# pydantic-core
|
||||
# sqlalchemy
|
||||
# torch
|
||||
# typer
|
||||
# typing-inspect
|
||||
typing-inspect==0.9.0
|
||||
# via dataclasses-json
|
||||
ujson==5.10.0
|
||||
# via fastapi
|
||||
uritemplate==4.1.1
|
||||
# via google-api-python-client
|
||||
urllib3==2.2.1
|
||||
urllib3==2.2.2
|
||||
# via requests
|
||||
uvicorn[standard]==0.30.1
|
||||
# via fastapi
|
||||
uvloop==0.19.0
|
||||
# via uvicorn
|
||||
verboselogs==1.7
|
||||
# via deepgram-sdk
|
||||
watchfiles==0.22.0
|
||||
# via uvicorn
|
||||
websockets==12.0
|
||||
# via
|
||||
# cartesia
|
||||
# deepgram-sdk
|
||||
# pipecat-ai (pyproject.toml)
|
||||
# uvicorn
|
||||
werkzeug==3.0.3
|
||||
# via flask
|
||||
yarl==1.9.4
|
||||
|
||||
@@ -37,7 +37,7 @@ Website = "https://pipecat.ai"
|
||||
anthropic = [ "anthropic~=0.25.7" ]
|
||||
azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
|
||||
cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
|
||||
daily = [ "daily-python~=0.9.0" ]
|
||||
daily = [ "daily-python~=0.10.0" ]
|
||||
deepgram = [ "deepgram-sdk~=3.2.7" ]
|
||||
examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
|
||||
fal = [ "fal-client~=0.4.0" ]
|
||||
@@ -50,7 +50,7 @@ openai = [ "openai~=1.26.0" ]
|
||||
openpipe = [ "openpipe~=4.14.0" ]
|
||||
playht = [ "pyht~=0.0.28" ]
|
||||
silero = [ "torch~=2.3.0", "torchaudio~=2.3.0" ]
|
||||
websocket = [ "websockets~=12.0" ]
|
||||
websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
|
||||
whisper = [ "faster-whisper~=1.0.2" ]
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
|
||||
@@ -12,9 +12,9 @@ from pipecat.frames.frames import Frame
|
||||
class FrameSerializer(ABC):
|
||||
|
||||
@abstractmethod
|
||||
def serialize(self, frame: Frame) -> bytes:
|
||||
def serialize(self, frame: Frame) -> str | bytes | None:
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def deserialize(self, data: bytes) -> Frame | None:
|
||||
def deserialize(self, data: str | bytes) -> Frame | None:
|
||||
pass
|
||||
|
||||
@@ -26,7 +26,7 @@ class ProtobufFrameSerializer(FrameSerializer):
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
def serialize(self, frame: Frame) -> bytes:
|
||||
def serialize(self, frame: Frame) -> str | bytes | None:
|
||||
proto_frame = frame_protos.Frame()
|
||||
if type(frame) not in self.SERIALIZABLE_TYPES:
|
||||
raise ValueError(
|
||||
@@ -41,7 +41,7 @@ class ProtobufFrameSerializer(FrameSerializer):
|
||||
result = proto_frame.SerializeToString()
|
||||
return result
|
||||
|
||||
def deserialize(self, data: bytes) -> Frame | None:
|
||||
def deserialize(self, data: str | bytes) -> Frame | None:
|
||||
"""Returns a Frame object from a Frame protobuf. Used to convert frames
|
||||
passed over the wire as protobufs to Frame objects used in pipelines
|
||||
and frame processors.
|
||||
|
||||
55
src/pipecat/serializers/twilio.py
Normal file
55
src/pipecat/serializers/twilio.py
Normal file
@@ -0,0 +1,55 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
import base64
|
||||
import json
|
||||
|
||||
from pipecat.frames.frames import AudioRawFrame, Frame
|
||||
from pipecat.serializers.base_serializer import FrameSerializer
|
||||
from pipecat.utils.audio import ulaw_8000_to_pcm_16000, pcm_16000_to_ulaw_8000
|
||||
|
||||
|
||||
class TwilioFrameSerializer(FrameSerializer):
|
||||
SERIALIZABLE_TYPES = {
|
||||
AudioRawFrame: "audio",
|
||||
}
|
||||
|
||||
def __init__(self):
|
||||
self._sid = None
|
||||
|
||||
def serialize(self, frame: Frame) -> str | bytes | None:
|
||||
if not isinstance(frame, AudioRawFrame):
|
||||
return None
|
||||
|
||||
data = frame.audio
|
||||
|
||||
serialized_data = pcm_16000_to_ulaw_8000(data)
|
||||
payload = base64.b64encode(serialized_data).decode("utf-8")
|
||||
answer = {
|
||||
"event": "media",
|
||||
"streamSid": self._sid,
|
||||
"media": {
|
||||
"payload": payload
|
||||
}
|
||||
}
|
||||
|
||||
return json.dumps(answer)
|
||||
|
||||
def deserialize(self, data: str | bytes) -> Frame | None:
|
||||
message = json.loads(data)
|
||||
|
||||
if not self._sid:
|
||||
self._sid = message["streamSid"] if "streamSid" in message else None
|
||||
|
||||
if message["event"] != "media":
|
||||
return None
|
||||
else:
|
||||
payload_base64 = message["media"]["payload"]
|
||||
payload = base64.b64decode(payload_base64)
|
||||
|
||||
deserialized_data = ulaw_8000_to_pcm_16000(payload)
|
||||
audio_frame = AudioRawFrame(audio=deserialized_data, num_channels=1, sample_rate=16000)
|
||||
return audio_frame
|
||||
@@ -21,6 +21,7 @@ from pipecat.frames.frames import (
|
||||
TTSStoppedFrame,
|
||||
TextFrame,
|
||||
VisionImageRawFrame,
|
||||
LLMFullResponseEndFrame,
|
||||
)
|
||||
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
|
||||
from pipecat.utils.audio import calculate_audio_volume
|
||||
@@ -110,7 +111,9 @@ class TTSService(AIService):
|
||||
text = frame.text
|
||||
else:
|
||||
self._current_sentence += frame.text
|
||||
if self._current_sentence.strip().endswith((".", "?", "!")):
|
||||
if self._current_sentence.strip().endswith(
|
||||
(".", "?", "!")) and not self._current_sentence.strip().endswith(
|
||||
("Mr,", "Mrs.", "Ms.", "Dr.")):
|
||||
text = self._current_sentence.strip()
|
||||
self._current_sentence = ""
|
||||
|
||||
@@ -134,6 +137,11 @@ class TTSService(AIService):
|
||||
if self._current_sentence:
|
||||
await self._push_tts_frames(self._current_sentence)
|
||||
await self.push_frame(frame)
|
||||
elif isinstance(frame, LLMFullResponseEndFrame):
|
||||
if self._current_sentence:
|
||||
await self._push_tts_frames(self._current_sentence.strip())
|
||||
self._current_sentence = ""
|
||||
await self.push_frame(frame)
|
||||
else:
|
||||
await self.push_frame(frame, direction)
|
||||
|
||||
|
||||
@@ -40,17 +40,14 @@ class AnthropicLLMService(LLMService):
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
api_key: str,
|
||||
model: str = "claude-3-opus-20240229",
|
||||
max_tokens: int = 1024,
|
||||
temperature: float = 0.0
|
||||
):
|
||||
self,
|
||||
api_key: str,
|
||||
model: str = "claude-3-opus-20240229",
|
||||
max_tokens: int = 1024):
|
||||
super().__init__()
|
||||
self._client = AsyncAnthropic(api_key=api_key)
|
||||
self._model = model
|
||||
self._max_tokens = max_tokens
|
||||
self._temperature = temperature
|
||||
|
||||
def can_generate_metrics(self) -> bool:
|
||||
return True
|
||||
@@ -113,7 +110,6 @@ class AnthropicLLMService(LLMService):
|
||||
messages=messages,
|
||||
model=self._model,
|
||||
max_tokens=self._max_tokens,
|
||||
temperature=self._temperature,
|
||||
stream=True)
|
||||
|
||||
await self.stop_ttfb_metrics()
|
||||
|
||||
@@ -89,6 +89,7 @@ class DeepgramTTSService(TTSService):
|
||||
class DeepgramSTTService(AIService):
|
||||
def __init__(self,
|
||||
api_key: str,
|
||||
url: str = "",
|
||||
live_options: LiveOptions = LiveOptions(
|
||||
encoding="linear16",
|
||||
language="en-US",
|
||||
@@ -104,7 +105,7 @@ class DeepgramSTTService(AIService):
|
||||
self._live_options = live_options
|
||||
|
||||
self._client = DeepgramClient(
|
||||
api_key, config=DeepgramClientOptions(options={"keepalive": "true"}))
|
||||
api_key, config=DeepgramClientOptions(url=url, options={"keepalive": "true"}))
|
||||
self._connection = self._client.listen.asynclive.v("1")
|
||||
self._connection.on(LiveTranscriptionEvents.Transcript, self._on_message)
|
||||
|
||||
|
||||
160
src/pipecat/transports/network/fastapi_websocket.py
Normal file
160
src/pipecat/transports/network/fastapi_websocket.py
Normal file
@@ -0,0 +1,160 @@
|
||||
#
|
||||
# Copyright (c) 2024, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
|
||||
import asyncio
|
||||
import io
|
||||
import wave
|
||||
|
||||
from typing import Awaitable, Callable
|
||||
from pydantic.main import BaseModel
|
||||
|
||||
from pipecat.serializers.twilio import TwilioFrameSerializer
|
||||
from pipecat.frames.frames import AudioRawFrame, StartFrame
|
||||
from pipecat.processors.frame_processor import FrameProcessor
|
||||
from pipecat.serializers.base_serializer import FrameSerializer
|
||||
from pipecat.transports.base_input import BaseInputTransport
|
||||
from pipecat.transports.base_output import BaseOutputTransport
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
|
||||
from loguru import logger
|
||||
|
||||
try:
|
||||
from fastapi import WebSocket
|
||||
from starlette.websockets import WebSocketState
|
||||
except ModuleNotFoundError as e:
|
||||
logger.error(f"Exception: {e}")
|
||||
logger.error(
|
||||
"In order to use FastAPI websockets, you need to `pip install pipecat-ai[websocket]`.")
|
||||
raise Exception(f"Missing module: {e}")
|
||||
|
||||
|
||||
class FastAPIWebsocketParams(TransportParams):
|
||||
add_wav_header: bool = False
|
||||
audio_frame_size: int = 6400 # 200ms
|
||||
serializer: FrameSerializer = TwilioFrameSerializer()
|
||||
|
||||
|
||||
class FastAPIWebsocketCallbacks(BaseModel):
|
||||
on_client_connected: Callable[[WebSocket], Awaitable[None]]
|
||||
on_client_disconnected: Callable[[WebSocket], Awaitable[None]]
|
||||
|
||||
|
||||
class FastAPIWebsocketInputTransport(BaseInputTransport):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
websocket: WebSocket,
|
||||
params: FastAPIWebsocketParams,
|
||||
callbacks: FastAPIWebsocketCallbacks,
|
||||
**kwargs):
|
||||
super().__init__(params, **kwargs)
|
||||
|
||||
self._websocket = websocket
|
||||
self._params = params
|
||||
self._callbacks = callbacks
|
||||
|
||||
async def start(self, frame: StartFrame):
|
||||
await self._callbacks.on_client_connected(self._websocket)
|
||||
await super().start(frame)
|
||||
self._receive_task = self.get_event_loop().create_task(self._receive_messages())
|
||||
|
||||
async def stop(self):
|
||||
if self._websocket.client_state != WebSocketState.DISCONNECTED:
|
||||
await self._websocket.close()
|
||||
await super().stop()
|
||||
|
||||
async def _receive_messages(self):
|
||||
async for message in self._websocket.iter_text():
|
||||
frame = self._params.serializer.deserialize(message)
|
||||
|
||||
if not frame:
|
||||
continue
|
||||
|
||||
if isinstance(frame, AudioRawFrame):
|
||||
await self.push_audio_frame(frame)
|
||||
|
||||
await self._callbacks.on_client_disconnected(self._websocket)
|
||||
|
||||
|
||||
class FastAPIWebsocketOutputTransport(BaseOutputTransport):
|
||||
|
||||
def __init__(self, websocket: WebSocket, params: FastAPIWebsocketParams, **kwargs):
|
||||
super().__init__(params, **kwargs)
|
||||
|
||||
self._websocket = websocket
|
||||
self._params = params
|
||||
self._audio_buffer = bytes()
|
||||
|
||||
async def write_raw_audio_frames(self, frames: bytes):
|
||||
self._audio_buffer += frames
|
||||
while len(self._audio_buffer) >= self._params.audio_frame_size:
|
||||
frame = AudioRawFrame(
|
||||
audio=self._audio_buffer[:self._params.audio_frame_size],
|
||||
sample_rate=self._params.audio_out_sample_rate,
|
||||
num_channels=self._params.audio_out_channels
|
||||
)
|
||||
|
||||
if self._params.add_wav_header:
|
||||
content = io.BytesIO()
|
||||
ww = wave.open(content, "wb")
|
||||
ww.setsampwidth(2)
|
||||
ww.setnchannels(frame.num_channels)
|
||||
ww.setframerate(frame.sample_rate)
|
||||
ww.writeframes(frame.audio)
|
||||
ww.close()
|
||||
content.seek(0)
|
||||
wav_frame = AudioRawFrame(
|
||||
content.read(),
|
||||
sample_rate=frame.sample_rate,
|
||||
num_channels=frame.num_channels)
|
||||
frame = wav_frame
|
||||
|
||||
payload = self._params.serializer.serialize(frame)
|
||||
if payload:
|
||||
await self._websocket.send_text(payload)
|
||||
|
||||
self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]
|
||||
|
||||
|
||||
class FastAPIWebsocketTransport(BaseTransport):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
websocket: WebSocket,
|
||||
params: FastAPIWebsocketParams = FastAPIWebsocketParams(),
|
||||
input_name: str | None = None,
|
||||
output_name: str | None = None,
|
||||
loop: asyncio.AbstractEventLoop | None = None):
|
||||
super().__init__(input_name=input_name, output_name=output_name, loop=loop)
|
||||
self._params = params
|
||||
|
||||
self._callbacks = FastAPIWebsocketCallbacks(
|
||||
on_client_connected=self._on_client_connected,
|
||||
on_client_disconnected=self._on_client_disconnected
|
||||
)
|
||||
|
||||
self._input = FastAPIWebsocketInputTransport(
|
||||
websocket, self._params, self._callbacks, name=self._input_name)
|
||||
self._output = FastAPIWebsocketOutputTransport(
|
||||
websocket, self._params, name=self._output_name)
|
||||
|
||||
# Register supported handlers. The user will only be able to register
|
||||
# these handlers.
|
||||
self._register_event_handler("on_client_connected")
|
||||
self._register_event_handler("on_client_disconnected")
|
||||
|
||||
def input(self) -> FrameProcessor:
|
||||
return self._input
|
||||
|
||||
def output(self) -> FrameProcessor:
|
||||
return self._output
|
||||
|
||||
async def _on_client_connected(self, websocket):
|
||||
await self._call_event_handler("on_client_connected", websocket)
|
||||
|
||||
async def _on_client_disconnected(self, websocket):
|
||||
await self._call_event_handler("on_client_disconnected", websocket)
|
||||
@@ -4,11 +4,9 @@
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
|
||||
import asyncio
|
||||
import io
|
||||
import wave
|
||||
import websockets
|
||||
|
||||
from typing import Awaitable, Callable
|
||||
from pydantic.main import BaseModel
|
||||
@@ -23,6 +21,13 @@ from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
|
||||
from loguru import logger
|
||||
|
||||
try:
|
||||
import websockets
|
||||
except ModuleNotFoundError as e:
|
||||
logger.error(f"Exception: {e}")
|
||||
logger.error("In order to use websockets, you need to `pip install pipecat-ai[websocket]`.")
|
||||
raise Exception(f"Missing module: {e}")
|
||||
|
||||
|
||||
class WebsocketServerParams(TransportParams):
|
||||
add_wav_header: bool = False
|
||||
|
||||
@@ -113,6 +113,7 @@ class DailyCallbacks(BaseModel):
|
||||
on_app_message: Callable[[Any, str], Awaitable[None]]
|
||||
on_call_state_updated: Callable[[str], Awaitable[None]]
|
||||
on_dialin_ready: Callable[[str], Awaitable[None]]
|
||||
on_dialout_answered: Callable[[Any], Awaitable[None]]
|
||||
on_dialout_connected: Callable[[Any], Awaitable[None]]
|
||||
on_dialout_stopped: Callable[[Any], Awaitable[None]]
|
||||
on_dialout_error: Callable[[Any], Awaitable[None]]
|
||||
@@ -432,6 +433,9 @@ class DailyTransportClient(EventHandler):
|
||||
def on_dialin_ready(self, sip_endpoint: str):
|
||||
self._call_async_callback(self._callbacks.on_dialin_ready, sip_endpoint)
|
||||
|
||||
def on_dialout_answered(self, data: Any):
|
||||
self._call_async_callback(self._callbacks.on_dialout_answered, data)
|
||||
|
||||
def on_dialout_connected(self, data: Any):
|
||||
self._call_async_callback(self._callbacks.on_dialout_connected, data)
|
||||
|
||||
@@ -689,6 +693,7 @@ class DailyTransport(BaseTransport):
|
||||
on_app_message=self._on_app_message,
|
||||
on_call_state_updated=self._on_call_state_updated,
|
||||
on_dialin_ready=self._on_dialin_ready,
|
||||
on_dialout_answered=self._on_dialout_answered,
|
||||
on_dialout_connected=self._on_dialout_connected,
|
||||
on_dialout_stopped=self._on_dialout_stopped,
|
||||
on_dialout_error=self._on_dialout_error,
|
||||
@@ -711,6 +716,7 @@ class DailyTransport(BaseTransport):
|
||||
self._register_event_handler("on_app_message")
|
||||
self._register_event_handler("on_call_state_updated")
|
||||
self._register_event_handler("on_dialin_ready")
|
||||
self._register_event_handler("on_dialout_answered")
|
||||
self._register_event_handler("on_dialout_connected")
|
||||
self._register_event_handler("on_dialout_stopped")
|
||||
self._register_event_handler("on_dialout_error")
|
||||
@@ -838,6 +844,9 @@ class DailyTransport(BaseTransport):
|
||||
await self._handle_dialin_ready(sip_endpoint)
|
||||
await self._call_event_handler("on_dialin_ready", sip_endpoint)
|
||||
|
||||
async def _on_dialout_answered(self, data):
|
||||
await self._call_event_handler("on_dialout_answered", data)
|
||||
|
||||
async def _on_dialout_connected(self, data):
|
||||
await self._call_event_handler("on_dialout_connected", data)
|
||||
|
||||
|
||||
@@ -4,6 +4,7 @@
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
import audioop
|
||||
import numpy as np
|
||||
import pyloudnorm as pyln
|
||||
|
||||
@@ -31,3 +32,23 @@ def calculate_audio_volume(audio: bytes, sample_rate: int) -> float:
|
||||
|
||||
def exp_smoothing(value: float, prev_value: float, factor: float) -> float:
|
||||
return prev_value + factor * (value - prev_value)
|
||||
|
||||
|
||||
def ulaw_8000_to_pcm_16000(ulaw_8000_bytes):
|
||||
# Convert μ-law to PCM
|
||||
pcm_8000_bytes = audioop.ulaw2lin(ulaw_8000_bytes, 2)
|
||||
|
||||
# Resample from 8000 Hz to 16000 Hz
|
||||
pcm_16000_bytes = audioop.ratecv(pcm_8000_bytes, 2, 1, 8000, 16000, None)[0]
|
||||
|
||||
return pcm_16000_bytes
|
||||
|
||||
|
||||
def pcm_16000_to_ulaw_8000(pcm_16000_bytes):
|
||||
# Resample from 16000 Hz to 8000 Hz
|
||||
pcm_8000_bytes = audioop.ratecv(pcm_16000_bytes, 2, 1, 16000, 8000, None)[0]
|
||||
|
||||
# Convert PCM to μ-law
|
||||
ulaw_8000_bytes = audioop.lin2ulaw(pcm_8000_bytes, 2)
|
||||
|
||||
return ulaw_8000_bytes
|
||||
|
||||
@@ -1,86 +0,0 @@
|
||||
import asyncio
|
||||
import time
|
||||
|
||||
from vllm import LLM, SamplingParams
|
||||
from vllm.engine.arg_utils import AsyncEngineArgs
|
||||
from vllm.engine.async_llm_engine import AsyncLLMEngine
|
||||
from vllm.utils import random_uuid
|
||||
|
||||
sampling_params = SamplingParams(
|
||||
temperature=0.8,
|
||||
top_p=0.95,
|
||||
max_tokens=4096
|
||||
)
|
||||
|
||||
prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.<|eot_id|><|start_header_id|>system<|end_header_id|>\n\nPlease introduce yourself to the user.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
|
||||
|
||||
|
||||
async def main():
|
||||
print("🥶 cold starting inference")
|
||||
start = time.monotonic_ns()
|
||||
|
||||
engine_args = AsyncEngineArgs(
|
||||
model="meta-llama/Meta-Llama-3-8B-Instruct",
|
||||
enable_prefix_caching=True,
|
||||
gpu_memory_utilization=0.90,
|
||||
enforce_eager=False, # False means slower starts but faster inference
|
||||
disable_log_stats=True, # disable logging so we can stream tokens
|
||||
disable_log_requests=True,
|
||||
)
|
||||
|
||||
engine = AsyncLLMEngine.from_engine_args(engine_args)
|
||||
duration_s = (time.monotonic_ns() - start) / 1e9
|
||||
print(f"🏎️ engine started in {duration_s:.0f}s")
|
||||
|
||||
request_id = random_uuid()
|
||||
result_generator = engine.generate(
|
||||
prompt,
|
||||
sampling_params,
|
||||
request_id,
|
||||
)
|
||||
index, num_tokens = 0, 0
|
||||
start = time.monotonic_ns()
|
||||
async for output in result_generator:
|
||||
if (
|
||||
output.outputs[0].text
|
||||
and "\ufffd" == output.outputs[0].text[-1]
|
||||
):
|
||||
continue
|
||||
text_delta = output.outputs[0].text[index:]
|
||||
index = len(output.outputs[0].text)
|
||||
num_tokens = len(output.outputs[0].token_ids)
|
||||
|
||||
print(text_delta)
|
||||
duration_s = (time.monotonic_ns() - start) / 1e9
|
||||
|
||||
print(
|
||||
f"\n\tGenerated {num_tokens} tokens in {duration_s:.1f}s,"
|
||||
f" throughput = {num_tokens / duration_s:.0f} tokens/second.\n"
|
||||
)
|
||||
|
||||
return
|
||||
|
||||
|
||||
async def xmain():
|
||||
llm = LLM(
|
||||
model="meta-llama/Meta-Llama-3-8B-Instruct",
|
||||
enable_prefix_caching=True
|
||||
)
|
||||
|
||||
outputs = llm.generate(prompt, sampling_params)
|
||||
|
||||
for output in outputs:
|
||||
prompt = output.prompt
|
||||
generated_text = output.outputs[0].text
|
||||
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
||||
|
||||
outputs = llm.generate(prompt, sampling_params)
|
||||
|
||||
for output in outputs:
|
||||
prompt = output.prompt
|
||||
generated_text = output.outputs[0].text
|
||||
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
Reference in New Issue
Block a user