Compare commits

...

109 Commits

Author SHA1 Message Date
Filipi Fuchter
e91fb6bb29 Merge branch 'main' into fixing_sound_mixer_small_webrtc
# Conflicts:
#	CHANGELOG.md
2025-05-05 17:43:47 -03:00
Filipi Fuchter
d75815ba48 Fixed an audio mixer issue when used alongside SmallWebRTCTransport. 2025-05-05 17:39:18 -03:00
Mark Backman
b323a7bd88 Merge pull request #1742 from pipecat-ai/mb/pcc-krisp-filter
Update pipecat-cloud-example to use Krisp in PCC deployment only
2025-05-05 15:46:12 -04:00
Mark Backman
fa011d0018 Update pipecat-cloud-example to use Krisp in PCC deployment only 2025-05-05 15:09:29 -04:00
Aleix Conchillo Flaqué
e15fa8777a Merge pull request #1737 from CerebriumAI/kyle/fix-ultravox-spacing
[Fix] Ultravox frame spacing issue
2025-05-05 09:34:49 -07:00
Aleix Conchillo Flaqué
2143a6d927 Merge pull request #1732 from pipecat-ai/aleix/daily-remote-custom-tracks
DailyTransport: remove custom tracks before leaving
2025-05-05 08:44:11 -07:00
Aleix Conchillo Flaqué
044e2d3e73 DailyTransport: remove custom tracks before leaving 2025-05-05 08:35:35 -07:00
Kyle Gani
be112ec63f Merge branch 'kyle/fix-ultravox-performance' of github.com:CerebriumAI/pipecat into kyle/fix-ultravox-performance 2025-05-05 17:13:26 +02:00
Kyle Gani
d2f56c4e8f Fix: Spacing issue 2025-05-05 17:13:21 +02:00
Mark Backman
ddc6a9c695 Merge pull request #1670 from pipecat-ai/mb/daily-twilio-sip-example
Add standalone Daily + Twilio SIP example
2025-05-05 10:57:16 -04:00
Mark Backman
2bebdbc371 Merge pull request #1671 from pipecat-ai/khk/rime-arcana
support for rime arcana model
2025-05-05 10:54:50 -04:00
Mark Backman
8b9f1f0608 Add a changelog entry 2025-05-05 10:51:46 -04:00
Kwindla Hultman Kramer
b25f3b2ed2 support for rime arcana model 2025-05-05 10:50:46 -04:00
Mark Backman
a995cf81b6 Merge pull request #1724 from pipecat-ai/mb/demo-fixes
Demo fixes
2025-05-05 08:44:57 -04:00
Aleix Conchillo Flaqué
75d261639f Merge pull request #1726 from pipecat-ai/aleix/pipecat-0.0.66
update CHANGELOG for pipecat 0.0.66
2025-05-02 20:54:57 -07:00
Aleix Conchillo Flaqué
f720d795d0 update CHANGELOG for pipecat 0.0.66 2025-05-02 20:29:51 -07:00
Aleix Conchillo Flaqué
f6fe83e358 Merge pull request #1725 from pipecat-ai/aleix/update-daily-python-0.18.1
update to daily-python 0.18.1
2025-05-02 20:27:50 -07:00
Mark Backman
0513d0b6a8 Update README 2025-05-02 22:44:50 -04:00
Mark Backman
0679bb217d Remove Twilio from phone-chatbot directory 2025-05-02 22:18:50 -04:00
Mark Backman
38bd55e518 Update README 2025-05-02 22:18:50 -04:00
Mark Backman
65c7423280 Add other dial-in event handlers 2025-05-02 22:18:50 -04:00
Mark Backman
f24a85cc94 Add logic to only forward the first on_dialin_ready event 2025-05-02 22:18:50 -04:00
Mark Backman
53887b7c98 Display phone number in WebRTC call 2025-05-02 22:18:50 -04:00
Mark Backman
523c012c38 Use a Twilio asset to ring the phone throughout 2025-05-02 22:18:50 -04:00
Mark Backman
97c28989c1 Add standalone Daily + Twilio SIP example 2025-05-02 22:18:50 -04:00
Mark Backman
c19be6ebb2 Demo fixes 2025-05-02 20:58:10 -04:00
Aleix Conchillo Flaqué
54971a0735 update to daily-python 0.18.1 2025-05-02 17:47:44 -07:00
Mark Backman
4513e81e13 Merge pull request #1723 from pipecat-ai/mb/base-output-bot-speaking-log
Only display the destination in the bot started/stopped speaking log …
2025-05-02 17:32:47 -04:00
Mark Backman
872204b795 Only display the destination in the bot started/stopped speaking log when there is a desintation 2025-05-02 17:29:28 -04:00
Aleix Conchillo Flaqué
a94cbfe6f5 Merge pull request #1722 from pipecat-ai/aleix/base-output-transport-audio-task-fix
BaseOutputTransport: always initialize audio task
2025-05-02 14:26:30 -07:00
Aleix Conchillo Flaqué
7152faafb2 BaseOutputTransport: always initialize audio task
We also use the audio task to also send synchronized images with audio.
2025-05-02 14:23:15 -07:00
Mark Backman
e6aadaccd8 Merge pull request #1721 from pipecat-ai/mb/simli-silent-frames
Fix: SimliVideoService was continuously emitting audio, preventing Bo…
2025-05-02 16:44:39 -04:00
Mark Backman
3a73aa71b8 Merge pull request #1613 from pipecat-ai/mb/improve-storybot-readme
demo: Restructure storytelling-chatbot directory, update README steps…
2025-05-02 16:39:59 -04:00
Mark Backman
814e7509e1 demo: Restructure storytelling-chatbot directory, update README steps, link to vercel demo 2025-05-02 16:37:37 -04:00
Vanessa Pyne
e0cf5ec016 Merge pull request #1705 from pipecat-ai/vp-update-nvidia-models
Riva Service: add magpie-tts-multilingual model
2025-05-02 15:34:23 -05:00
vipyne
667bd32e6a Riva: remove deprecated lines in example 2025-05-02 15:33:10 -05:00
vipyne
b2ecd83706 update CHANGELOG with Riva details 2025-05-02 15:33:10 -05:00
vipyne
b2754117c8 Riva: refactor function_id and model_name 2025-05-02 15:33:10 -05:00
vipyne
6c428c303b update magpie voice 2025-05-02 15:33:10 -05:00
Mark Backman
e7d889a143 Update RivaSTTService to use by default 2025-05-02 15:33:10 -05:00
Mark Backman
da60e7069b Update pyproject.toml to use nvidia-riva-client 2.19.1 2025-05-02 15:33:10 -05:00
Mark Backman
c14406a3b9 Demos use the latest services 2025-05-02 15:33:10 -05:00
Mark Backman
725ab5ec21 Small fixes: No default api_key of None, ParakeetSTTService uses RivaSTTService.InputParams 2025-05-02 15:33:10 -05:00
Mark Backman
daf9d47e58 Update RivaSegmentedSTTService 2025-05-02 15:33:10 -05:00
vipyne
63a65627a2 Riva Service: add magpie-tts-multilingual model 2025-05-02 15:33:10 -05:00
Mark Backman
02c07755b0 Add Changelog entry for PR 1707 2025-05-02 15:33:10 -05:00
Matt Kim
15cbd18acc [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-05-02 15:33:10 -05:00
Kwindla Hultman Kramer
93c40b87dc small groq updates 2025-05-02 15:33:10 -05:00
Mark Backman
eeaa9f67a1 Fix: SimliVideoService was continuously emitting audio, preventing BotStoppedSpeakingFrame from being sent 2025-05-02 16:32:42 -04:00
Mark Backman
b60691c7b2 Merge pull request #1720 from pipecat-ai/mb/changelog-pr-1707
Add Changelog entry for PR 1707
2025-05-02 16:13:40 -04:00
Mark Backman
2bb1b0b343 Add Changelog entry for PR 1707 2025-05-02 16:09:50 -04:00
Mark Backman
047ef9f86c Merge pull request #1707 from rimelabs/matt/rime/url_param_serialization
[Rime] Add new params to RimeTTSService
2025-05-02 16:08:01 -04:00
Kwindla Hultman Kramer
9a2c603c91 Merge pull request #1711 from pipecat-ai/khk/groq-updates 2025-05-02 12:21:15 -07:00
Filipi da Silva Fuchter
94c4169407 Merge pull request #1717 from pipecat-ai/local_smart_turn_torch
Local smart turn torch
2025-05-02 15:53:30 -03:00
Filipi Fuchter
cb8a551db8 Mentioning the new LocalSmartTurnAnalyzer in the changelog. 2025-05-02 14:32:18 -03:00
Filipi Fuchter
779f09af70 Fixing lint. 2025-05-02 14:22:38 -03:00
Filipi Fuchter
19dc0f2bfb New example using the local smart turn 2025-05-02 14:21:42 -03:00
Filipi Fuchter
f0709e22ba Creating a local smart turn using torch. 2025-05-02 14:21:29 -03:00
Mark Backman
8250736f5e Merge pull request #1708 from pipecat-ai/mb/gemini-user-context
Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remo…
2025-05-02 13:10:27 -04:00
Mark Backman
83348a9f93 Merge pull request #1714 from pipecat-ai/mb/fix-gemini-text-modality
Restore TEXT modalities support to GeminiMultimodalLiveLLMService
2025-05-02 10:41:05 -04:00
Mark Backman
96d40903a9 Only send TTSStoppedFrame from Gemini when in AUDIO mode, only send one LLMFullResponseEndFrame 2025-05-02 10:18:53 -04:00
Aleix Conchillo Flaqué
2560811805 Merge pull request #1697 from pipecat-ai/aleix/daily-custom-audio-tracks
add support for multiple transport destinations
2025-05-02 06:34:09 -07:00
Mark Backman
2b8c44c008 Merge pull request #1710 from pipecat-ai/mb/openai-context-aggregation
fix: OpenAIRealtimeBetaLLMService writes two assistant messages to th…
2025-05-02 07:43:35 -04:00
Mark Backman
38e2d37674 Restore TEXT modalities support to GeminiMultimodalLiveLLMService 2025-05-02 07:36:12 -04:00
Vanessa Pyne
6278561f88 Merge pull request #1709 from pipecat-ai/vp-fix-fastpitch-params-update
Riva TTS: update FastPitch params
2025-05-01 21:23:10 -05:00
Aleix Conchillo Flaqué
750e79c1ce DailyParams: rename to camera/microphone_out_enabled 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
71eb2963c5 examples: added daily-custom-tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
f44e2c86ea BaseOutputTransport: compute sample_rate and audio_chunk_size in main class 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
afe1f0df8c DailyTransport: make sure we can write audio frames to destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
458fddfb48 update CHANGELOG with new Daily and Transport features 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
8d915c5ccb DailyParams: allow enabling/disabling camera/microphone tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
304153dd03 TTSService: set transport destination to all TTS frames 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
a6781b7352 rename destination to transport_destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
5ad0058303 update CHANGELOG with frame source/destination support 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
75c039de33 examples: add daily-multi-translation 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
74e3c3677e DailyTransport: fix audio/video renderers registration 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
dc20327f10 DailyTransport: register audio destination and use custom tracks 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
e738affd29 BaseOutputTransport: allow sending audio/video to multiple destinations 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
ef3d732607 DailyTransport: allow capturing multiple simultaneous audio/video sources 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
6d63cff1bf DailyTransport: custom audio tracks support 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
12f42605a1 pyproject: update daily-python to 0.18.0 2025-05-01 18:58:44 -07:00
Kwindla Hultman Kramer
fac3337927 small groq updates 2025-05-01 17:09:15 -07:00
Mark Backman
76d198151c Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remove direct context addition 2025-05-01 15:41:04 -04:00
Mark Backman
6a907058de fix: OpenAIRealtimeBetaLLMService writes two assistant messages to the context 2025-05-01 15:37:39 -04:00
vipyne
6e1f531f64 Riva TTS: update FastPitch params
91138c3f66 (diff-ece228577b1d233ce600a948243f90cece53e3a9b89554a0b27a48bc4d6e0fdfR45)
2025-05-01 11:14:41 -05:00
Matt Kim
4232cca5b6 [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-04-30 18:09:22 -07:00
Mark Backman
a6a4d3d71f Merge pull request #1706 from rimelabs/matt/rime/update_url
[Rime] - Update url for Websockets API
2025-04-30 19:14:04 -04:00
Mark Backman
c52de0f5de Merge pull request #1696 from pipecat-ai/mb/fix-gemini-live-context
Fix: GeminiMultimodalLiveLLMService was appending tokens to the context
2025-04-30 19:12:06 -04:00
Mark Backman
a1e1255f16 Strip newlines from generated user transcript 2025-04-30 18:27:46 -04:00
Mark Backman
c4f758725e Ignore TranscriptionFrames too 2025-04-30 18:22:43 -04:00
Aleix Conchillo Flaqué
7bc9a78ce6 udpate CHANGELOG with RTVIObserverParams 2025-04-30 15:13:14 -07:00
Aleix Conchillo Flaqué
f8be71b32c Merge pull request #1688 from pipecat-ai/aleix/add-rtvi-observer-params
RTVIObserver: add RTVIObserverParams to configure what to send
2025-04-30 15:11:18 -07:00
Aleix Conchillo Flaqué
957fa5546d RTVIObserver: add RTVIObserverParams to configure what to send 2025-04-30 15:09:02 -07:00
Aleix Conchillo Flaqué
039cb8fcae Merge pull request #1690 from pipecat-ai/aleix/rtvi-function-call-single-param
RTVIProcessor: use single FunctionCallParams
2025-04-30 15:04:05 -07:00
Mark Backman
8e05f2f1a1 Merge pull request #1702 from pipecat-ai/mb/stt-mute-transcription-frames
Add InterimTranscriptionFrame and TranscriptionFrame to STTMuteFilter…
2025-04-30 17:54:24 -04:00
Matt Kim
8467aa1ed3 [Rime] - Update url for Websockets API
Rime has migrated their Websockets api to the base url `user.rime.ai` along with all other tts endpoints. 

See the [docs](https://docs.rime.ai/api-reference/endpoint/websockets)

`users-ws.rime.ai` is deprecated and will not reflect upgrades to the rime ws api.
2025-04-30 14:20:13 -07:00
Mark Backman
9c5878af3d OpenAI Realtime and Gemini Live push LLMTextFrame again, overwrite the assitant context aggregator for LLMTextFrame 2025-04-30 17:18:20 -04:00
Mark Backman
ef29800fe9 Update the changelog 2025-04-30 16:28:17 -04:00
Mark Backman
7e09933070 OpenAI Realtime should push TTSTextFrame only 2025-04-30 16:28:17 -04:00
Mark Backman
82a9d7f992 Gemini Mulitmodal Live to push TTSTextFrame only 2025-04-30 16:28:17 -04:00
Mark Backman
facbebb15f Transcribe user audio in 26b 2025-04-30 16:28:16 -04:00
Mark Backman
2ba60fc41f Update TranscriptProcessor to handle GeminiMultimodalLiveLLMService changes 2025-04-30 16:28:16 -04:00
Mark Backman
685f951ae2 Fix: GeminiMultimodalLiveLLMService was appending tokens to the context 2025-04-30 16:28:16 -04:00
Mark Backman
27d4c927a8 Merge pull request #1701 from pipecat-ai/mb/gemini-extend-session
Add context_window_compression support to GeminiMultimodalLiveLLMService
2025-04-30 14:35:50 -04:00
Mark Backman
20a59e8c56 Add InterimTranscriptionFrame and TranscriptionFrame to STTMuteFilter frame processing 2025-04-30 10:50:56 -04:00
Mark Backman
d9a0a93667 Add context_window_compression support to GeminiMultimodalLiveLLMService 2025-04-30 09:55:34 -04:00
Mark Backman
154d5d1859 Merge pull request #1699 from pipecat-ai/mb/more-docs-mocks
Additional import mocks to fix docs failure
2025-04-30 08:36:57 -04:00
Mark Backman
a192217256 Additional import mocks to fix docs failure 2025-04-29 21:45:27 -04:00
Aleix Conchillo Flaqué
6821b1cdab RTVIProcessor: use single FunctionCallParams 2025-04-29 05:56:23 -07:00
127 changed files with 3660 additions and 965 deletions


@@ -9,6 +9,66 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Added support to `RimeHttpTTSService` for the `arcana` model.
### Fixed
- Refactored how the `start` method is handled in `SmallWebRTCOutputTransport` by
initializing it before the parent class. This fixes an audio mixer issue when used
alongside `SmallWebRTCTransport`, preventing unnecessary CPU usage and avoiding the
output being flooded with silent frames when no new audio is available.
- `DailyTransport` now removes custom audio tracks before leaving.
## [0.0.66] - 2025-05-02
### Added
- Added two new input parameters to `RimeTTSService`: `pause_between_brackets`
and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use
`LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport
implementation supports it (e.g. Daily's custom tracks). With multiple
destinations it is possible to send different audio or video tracks with a
single transport simultaneously. To do that, you need to set the new
`Frame.transport_destination` field with your desired transport destination
(e.g. custom track name), tell the transport you want a new destination with
`TransportParams.audio_out_destinations` or
`TransportParams.video_out_destinations` and the transport should take care of
the rest.
- Similar to the new `Frame.transport_destination`, there's a new
`Frame.transport_source` field which is set by the `BaseInputTransport` if the
incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This
parameter will be used to update the `Frame.transport_destination` field for
each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to
multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and
`DailyTransportParams.microphone_out_enabled` which allows you to
enable/disable the main output camera or microphone tracks. This is useful if
you only want to use custom tracks and not send the main tracks. Note that you
still need `audio_out_enabled=True` or `video_out_enabled=True`.
- Added `DailyTransport.capture_participant_audio()` which allows you to capture
an audio source (e.g. "microphone", "screenAudio" or a custom track name) from
a remote participant.
- Added `DailyTransport.update_publishing()` which allows you to update the call
video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams` which allows you to configure what RTVI messages
are sent to the clients.
- Added a `context_window_compression` InputParam to
`GeminiMultimodalLiveLLMService` which allows you to enable a sliding context
window for the session as well as set the token limit of the sliding window.
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`,
@@ -25,10 +85,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `MCPClient`, a way to connect to MCP servers and use their tools.
- Added `Mem0 OSS`: alongside the existing Mem0 cloud support, the open-source
(OSS) version is now also available.
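The multiple-destination support described above can be pictured with the simplified, self-contained model below. It is not Pipecat's `BaseOutputTransport`: the `AudioFrame` dataclass and the `route_frames` helper are hypothetical stand-ins that only illustrate how a frame's `transport_destination` (with `None` meaning the default track) selects an output, given a list of declared destinations similar to `audio_out_destinations`. Skipping frames aimed at undeclared destinations is a choice of this sketch, not documented Pipecat behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class AudioFrame:
    """Hypothetical stand-in for an output audio frame."""

    audio: bytes
    # None means "the transport's default (main) output track".
    transport_destination: Optional[str] = None


def route_frames(
    frames: Iterable[AudioFrame], destinations: List[str]
) -> Dict[Optional[str], List[bytes]]:
    """Group audio payloads by destination.

    Only the default destination (None) and destinations declared up front
    are accepted; in this sketch, frames aimed anywhere else are skipped.
    """
    allowed = {None, *destinations}
    routed: Dict[Optional[str], List[bytes]] = defaultdict(list)
    for frame in frames:
        if frame.transport_destination in allowed:
            routed[frame.transport_destination].append(frame.audio)
    return dict(routed)


frames = [
    AudioFrame(b"\x01\x00"),
    AudioFrame(b"\x02\x00", transport_destination="pipecat-mirror"),
    AudioFrame(b"\x03\x00", transport_destination="unknown-track"),
]
routed = route_frames(frames, destinations=["pipecat-mirror"])
print(routed)
```

Keying by `None` for the default track keeps single-destination pipelines working unchanged while letting extra tracks coexist in the same transport.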
### Changed
- `TransportParams.audio_out_mixer` now accepts either a single mixer or a
dictionary that provides a mixer per destination. For example:
```python
audio_out_mixer={
    "track-1": SoundfileMixer(...),
    "track-2": SoundfileMixer(...),
    "track-N": SoundfileMixer(...),
},
```
- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and
`TranscriptionFrame` which allows the `STTMuteFilter` to be used in
conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single parameter `FunctionCallParams` instead of
`(function_name, tool_call_id, args, llm, context, result_callback)` which is
now deprecated.
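The per-destination mixer configuration in the `Changed` entry above can be sketched as a lookup that accepts either one mixer or a mapping. `FakeMixer` and `mixer_for` are hypothetical names used only for illustration, and applying a single mixer to the default track only is an assumption of this sketch rather than confirmed Pipecat behavior.

```python
from collections.abc import Mapping
from typing import Optional, Union


class FakeMixer:
    """Hypothetical stand-in for a mixer such as `SoundfileMixer`."""

    def __init__(self, name: str):
        self.name = name


# Either no mixer, one mixer, or a mapping of destination -> mixer.
MixerConfig = Union[None, FakeMixer, Mapping]


def mixer_for(config: MixerConfig, destination: Optional[str]) -> Optional[FakeMixer]:
    """Pick the mixer for `destination` (None stands for the default track)."""
    if config is None:
        return None
    if isinstance(config, Mapping):
        # Per-destination configuration; destinations without an entry get no mixer.
        return config.get(destination)
    # Assumption of this sketch: a single mixer only applies to the default track.
    return config if destination is None else None


mixers = {"track-1": FakeMixer("one"), "track-2": FakeMixer("two")}
print(mixer_for(mixers, "track-1").name)
print(mixer_for(mixers, "track-N"))
```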
@@ -58,6 +134,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
case there's no need to push audio to the rest of the pipeline, but this is
not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such
as "canary-1b-asr", to be used in Pipecat.
### Deprecated
- Function calls with parameters
@@ -73,8 +152,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use
`TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses
the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses
the model "magpie-tts-multilingual" by default.
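The function-call change (a single `FunctionCallParams` instead of `(function_name, tool_call_id, args, llm, context, result_callback)`) is easiest to see side by side with a handler. The dataclass below is a local stand-in, not an import from Pipecat; its field names, in particular `arguments`, are assumptions to check against Pipecat's own `FunctionCallParams`, and `get_weather` is a hypothetical handler.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Mapping


@dataclass
class FunctionCallParams:
    """Local stand-in bundling what used to be six positional arguments."""

    function_name: str
    tool_call_id: str
    arguments: Mapping[str, Any]
    llm: Any
    context: Any
    result_callback: Callable[[Any], Awaitable[None]]


# New style: one parameter object instead of
# (function_name, tool_call_id, args, llm, context, result_callback).
async def get_weather(params: FunctionCallParams) -> None:
    location = params.arguments.get("location", "unknown")
    await params.result_callback({"location": location, "conditions": "sunny"})


async def main() -> List[Any]:
    results: List[Any] = []

    async def collect(result: Any) -> None:
        results.append(result)

    params = FunctionCallParams(
        function_name="get_weather",
        tool_call_id="call-1",
        arguments={"location": "Barcelona"},
        llm=None,
        context=None,
        result_callback=collect,
    )
    await get_weather(params)
    return results


results = asyncio.run(main())
print(results)
```

Bundling the arguments into one object means new fields can be added later without breaking every registered handler's signature.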
### Fixed
- Fixed an issue with `SimliVideoService` where the bot was continuously outputting
audio, which prevented the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant
messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context
contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling, where the service returns a 500
error. Previously, this would cause an unhandled exception. Now, a 500 error
is treated as an incomplete response.
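The Smart Turn fix above (a 500 is treated as an incomplete turn instead of raising) can be sketched as a small response interpreter. Everything here is hypothetical: the `EndOfTurnState` enum is a stand-in, the `prediction` response field is invented for illustration, and raising on other non-200 statuses is this sketch's choice, not documented Pipecat behavior.

```python
from enum import Enum
from typing import Any, Mapping


class EndOfTurnState(Enum):
    """Hypothetical stand-in for a turn analyzer's result."""

    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


def interpret_smart_turn_response(status: int, body: Mapping[str, Any]) -> EndOfTurnState:
    """Map an HTTP Smart Turn reply to a turn state without raising on a 500."""
    if status == 500:
        # Server error: keep listening instead of failing the pipeline.
        return EndOfTurnState.INCOMPLETE
    if status != 200:
        raise RuntimeError(f"Unexpected Smart Turn status: {status}")
    # Hypothetical response field: 1 means the user finished their turn.
    if body.get("prediction") == 1:
        return EndOfTurnState.COMPLETE
    return EndOfTurnState.INCOMPLETE


print(interpret_smart_turn_response(500, {}))
print(interpret_smart_turn_response(200, {"prediction": 1}))
```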
@@ -87,6 +181,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Other
- Added `examples/daily-custom-tracks` to show how to send and receive Daily
custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple
simultaneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed
`29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
@@ -196,8 +296,9 @@ https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia
- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
client did not create a video transceiver.
- Fixed an issue where LLM input parameters were not working and applied
correctly in `GoogleVertexLLMService`, causing unexpected behavior during
inference.
### Other


@@ -50,7 +50,6 @@ autodoc_mock_imports = [
"pyht.protos",
"pyht.protos.api_pb2",
"pipecat_ai_playht", # PlayHT wrapper
"vllm",
"aiortc",
"aiortc.mediastreams",
"cv2",
@@ -76,7 +75,6 @@ autodoc_mock_imports = [
"openpipe",
"simli",
"soundfile",
# Existing mocks
"pipecat_ai_krisp",
"pyaudio",
"_tkinter",
@@ -87,6 +85,66 @@ autodoc_mock_imports = [
"pydantic.Field",
"pydantic._internal._model_construction",
"pydantic._internal._fields",
# Moondream dependencies
"torch",
"transformers",
"intel_extension_for_pytorch",
# Ultravox dependencies
"huggingface_hub",
"vllm",
"vllm.engine.arg_utils",
"transformers.AutoTokenizer",
# Langchain dependencies
"langchain_core",
"langchain_core.messages",
"langchain_core.runnables",
"langchain_core.messages.AIMessageChunk",
"langchain_core.runnables.Runnable",
# LiveKit dependencies
"livekit",
"livekit.rtc",
"livekit_api",
"livekit_protocol",
"tenacity",
"tenacity.retry",
"tenacity.stop_after_attempt",
"tenacity.wait_exponential",
"rtc",
"rtc.Room",
"rtc.RoomOptions",
"rtc.AudioSource",
"rtc.LocalAudioTrack",
"rtc.TrackPublishOptions",
"rtc.TrackSource",
"rtc.AudioStream",
"rtc.AudioFrameEvent",
"rtc.AudioFrame",
"rtc.Track",
"rtc.TrackKind",
"rtc.RemoteParticipant",
"rtc.RemoteTrackPublication",
"rtc.DataPacket",
# Riva dependencies
"riva",
"riva.client",
"riva.client.Auth",
"riva.client.ASRService",
"riva.client.StreamingRecognitionConfig",
"riva.client.RecognitionConfig",
"riva.client.AudioEncoding",
"riva.client.proto.riva_tts_pb2",
"riva.client.SpeechSynthesisService",
# Local CoreML Smart Turn dependencies
"coremltools",
"coremltools.models",
"coremltools.models.MLModel",
"torch",
"torch.nn",
"torch.nn.functional",
"transformers",
"transformers.AutoFeatureExtractor",
# Also add specific classes that are imported
"AutoFeatureExtractor",
]
# HTML output settings
@@ -118,12 +176,25 @@ def verify_modules():
        },
    }

    # Skip importing modules that are in autodoc_mock_imports
    skipped_modules = set(autodoc_mock_imports)

    missing = []
    for category, modules in required_modules.items():
        if isinstance(modules, dict):
            # Handle nested structure
            for subcategory, submodules in modules.items():
                for module in submodules:
                    # Check if module is in autodoc_mock_imports
                    if (
                        f"pipecat.{category}.{subcategory}.{module}" in skipped_modules
                        or module in skipped_modules
                    ):
                        logger.info(
                            f"Skipping import of mocked module: pipecat.{category}.{subcategory}.{module}"
                        )
                        continue

                    try:
                        __import__(f"pipecat.{category}.{subcategory}.{module}")
                        logger.info(
@@ -137,6 +208,11 @@ def verify_modules():
        else:
            # Handle flat structure
            for module in modules:
                # Check if module is in autodoc_mock_imports
                if f"pipecat.{category}.{module}" in skipped_modules or module in skipped_modules:
                    logger.info(f"Skipping import of mocked module: pipecat.{category}.{module}")
                    continue

                try:
                    __import__(f"pipecat.{category}.{module}")
                    logger.info(f"Successfully imported pipecat.{category}.{module}")
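Both branches of `verify_modules()` apply the same test: a module is skipped when either its fully qualified name or its bare name appears in `autodoc_mock_imports`. The hypothetical helper below isolates that check so it can be exercised on its own; it is a sketch, not code from the repository.

```python
from typing import Iterable, Optional


def is_mocked(
    category: str,
    module: str,
    mocked: Iterable[str],
    subcategory: Optional[str] = None,
) -> bool:
    """Return True if the module would be skipped as mocked.

    Matches either pipecat.<category>[.<subcategory>].<module> or the bare
    module name, mirroring the two checks in verify_modules().
    """
    parts = ["pipecat", category] + ([subcategory] if subcategory else []) + [module]
    mocked_set = set(mocked)
    return ".".join(parts) in mocked_set or module in mocked_set


print(is_mocked("services", "riva", ["riva"]))
print(is_mocked("services", "openai", ["riva"]))
```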


@@ -26,20 +26,23 @@ pipecat-ai[grok]
pipecat-ai[groq]
# pipecat-ai[krisp] # Mocked
pipecat-ai[koala]
# pipecat-ai[langchain] # Mocked
# pipecat-ai[livekit] # Mocked
pipecat-ai[lmnt]
pipecat-ai[local]
# pipecat-ai[local-smart-turn] # Mocked
# pipecat-ai[mem0] # Mocked
# pipecat-ai[mlx-whisper] # Mocked
# pipecat-ai[moondream] # Mocked
pipecat-ai[nim]
# pipecat-ai[neuphonic] # Mocked
pipecat-ai[noisereduce]
pipecat-ai[openai]
# pipecat-ai[openpipe]
# pipecat-ai[playht] # Mocked due to grpcio conflict with riva
pipecat-ai[qwen]
pipecat-ai[remote-smart-turn]
# pipecat-ai[riva] # Mocked
pipecat-ai[silero]
pipecat-ai[simli]
pipecat-ai[soundfile]


@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)


@@ -0,0 +1,39 @@
# Daily Custom Tracks
This example shows how to send and receive Daily custom tracks. We will run a simple `daily-python` application to send an audio file with a custom track (named "pipecat") to a room. Then, the Pipecat bot will mirror that custom track into another custom track (named "pipecat-mirror") in the same room.
## Get started
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
## Run the bot
Start the bot by giving it a Daily room URL.
```bash
python bot.py -u ROOM_URL
```
The bot will wait for the first participant to join. Then, it will mirror a custom track named "pipecat" into a new custom track named "pipecat-mirror".
## Run the sender
Now, run the custom track sender. This is a simple `daily-python` application that opens an audio file and sends it as a custom track to the same Daily room.
```bash
python custom_track_sender.py -u ROOM_URL -i office-ambience-mono-16000.mp3
```
## Open client
Finally, open the client so you can hear both custom tracks.
```bash
open index.html
```
Once the client is open, paste the Daily room URL into the Meeting URL field and click Join. You should then be able to select which custom track you want to hear.


@@ -0,0 +1,87 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import sys

import aiohttp
from loguru import logger
from runner import configure

from pipecat.frames.frames import Frame, InputAudioRawFrame, OutputAudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


class CustomTrackMirrorProcessor(FrameProcessor):
    """Re-emits audio received from a custom track to another transport destination."""

    def __init__(self, transport_destination: str, **kwargs):
        super().__init__(**kwargs)
        self._transport_destination = transport_destination

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        # `transport_source` is only set for audio from a non-default source
        # (e.g. a custom track); turn that audio into output for our destination.
        if isinstance(frame, InputAudioRawFrame) and frame.transport_source:
            output_frame = OutputAudioRawFrame(
                audio=frame.audio,
                sample_rate=frame.sample_rate,
                num_channels=frame.num_channels,
            )
            output_frame.transport_destination = self._transport_destination
            await self.push_frame(output_frame)
        else:
            await self.push_frame(frame, direction)


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, _) = await configure(session)

        transport = DailyTransport(
            room_url,
            None,
            "Custom tracks mirror",
            DailyParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                microphone_out_enabled=False,  # Disable since we just use custom tracks
                audio_out_destinations=["pipecat-mirror"],
            ),
        )

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                CustomTrackMirrorProcessor("pipecat-mirror"),
                transport.output(),  # Transport bot output
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                audio_in_sample_rate=16000,
                audio_out_sample_rate=16000,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            # Capture the remote participant's "pipecat" custom audio track.
            await transport.capture_participant_audio(participant["id"], audio_source="pipecat")

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())


@@ -0,0 +1,74 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import argparse
import time

from daily import CallClient, CustomAudioSource, Daily
from pydub import AudioSegment

parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room to join")
parser.add_argument(
    "-i", "--input", type=str, required=True, help="Input audio file (needs 16000 sample rate)"
)
args, _ = parser.parse_known_args()

audio = AudioSegment.from_mp3(args.input)

raw_bytes = audio.raw_data
sample_rate = audio.frame_rate
channels = audio.channels

print(f"Length: {len(raw_bytes)} bytes")
print(f"Sample rate: {sample_rate}, Channels: {channels}")

# Initialize the Daily context & create call client
Daily.init()
client = CallClient()

# Join the room and indicate we have a custom track named "pipecat".
client.join(
    args.url,
    client_settings={
        "publishing": {
            "camera": False,
            "microphone": False,
            "customAudio": {"pipecat": True},
        },
    },
)

# Just sleep for a couple of seconds. To do this well we should really use
# completions.
time.sleep(2)

# Create the custom audio source. This is where we will write our audio.
audio_source = CustomAudioSource(sample_rate, channels)

# Create an audio track and assign it our audio source.
client.add_custom_audio_track("pipecat", audio_source)

# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)

try:
    # Write one second of audio at a time until the whole file has been sent
    # (assumes 16-bit samples, i.e. 2 bytes per sample).
    chunk_size = sample_rate * channels * 2
    while len(raw_bytes) > 0:
        chunk = raw_bytes[:chunk_size]
        raw_bytes = raw_bytes[chunk_size:]
        audio_source.write_frames(chunk)
except KeyboardInterrupt:
    pass

# Leave the room whether the file finished playing or the user interrupted.
client.leave()

# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)

client.release()
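The sender's pacing rests on simple PCM arithmetic: one second of interleaved audio is `sample_rate * channels * sample_width` bytes, and the script hard-codes a sample width of 2 (16-bit). The generator below is a hypothetical helper, not part of the example, that yields the same one-second chunks; reading `sample_width` from pydub's `AudioSegment.sample_width` instead of hard-coding 2 would be one way to relax that assumption. Slicing by offset also avoids re-copying the remaining buffer on every iteration, which `raw_bytes = raw_bytes[chunk_size:]` does.

```python
from typing import Iterator


def iter_one_second_chunks(
    raw: bytes, sample_rate: int, channels: int, sample_width: int = 2
) -> Iterator[bytes]:
    """Yield consecutive one-second slices of interleaved PCM audio.

    The last slice may be shorter if the audio does not end on a full second.
    """
    chunk_size = sample_rate * channels * sample_width
    for start in range(0, len(raw), chunk_size):
        yield raw[start : start + chunk_size]


# 2.5 seconds of 16 kHz mono, 16-bit silence.
silence = bytes(int(16000 * 1 * 2 * 2.5))
sizes = [len(chunk) for chunk in iter_one_second_chunks(silence, 16000, 1)]
print(sizes)  # [32000, 32000, 16000]
```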


@@ -0,0 +1,173 @@
<html>
<head>
<title>daily custom tracks</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: false,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
</body>
</html>

@@ -0,0 +1,2 @@
pydub
pipecat-ai[daily]

@@ -0,0 +1,55 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

@@ -0,0 +1,15 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

@@ -0,0 +1,39 @@
# Daily Multi Translation
This example shows how to stream several translations at once through a single Daily transport. Incoming English speech is transcribed once, then translated in parallel into Spanish, French, and German; each translation is synthesized and sent to its own Daily custom track (`spanish`, `french`, and `german`).
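Conceptually, each TTS service tags its output audio with a `transport_destination`, and the transport writes that audio to the custom track with the same name, declared in `audio_out_destinations` (each destination also gets its own entry in `audio_out_mixer`). The toy model below illustrates only that routing idea. `AudioFrame` and `route_frames` are hypothetical names, not Pipecat APIs, and rejecting unknown destinations is a choice made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AudioFrame:
    # Hypothetical stand-in for an output audio frame; real Pipecat frames carry more fields.
    audio: bytes
    transport_destination: Optional[str] = None


def route_frames(frames: List[AudioFrame], destinations: List[str]) -> Dict[Optional[str], List[bytes]]:
    """Group output audio by destination track.

    Frames without a destination are keyed as None (the default track).
    This sketch raises on destinations that were not declared up front.
    """
    tracks: Dict[Optional[str], List[bytes]] = defaultdict(list)
    for frame in frames:
        dest = frame.transport_destination
        if dest is not None and dest not in destinations:
            raise ValueError(f"unknown destination: {dest}")
        tracks[dest].append(frame.audio)
    return dict(tracks)


frames = [
    AudioFrame(b"hola", "spanish"),
    AudioFrame(b"bonjour", "french"),
    AudioFrame(b"hallo", "german"),
    AudioFrame(b"adios", "spanish"),
]
tracks = route_frames(frames, ["spanish", "french", "german"])
print(tracks["spanish"])  # [b'hola', b'adios']
```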
## Get started
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then visit `http://localhost:7860/` in your browser. The server creates a new Daily room, starts the translation bot in it, and redirects you to that room in Daily Prebuilt. Speak in English there, and make sure you are not muted.
## Open client
Next, you need to open the client that will listen to the translations.
```bash
open index.html
```
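Once the client is open, paste the URL of the Daily room created above into the Meeting URL field and click Join. The client shows one button per audio track (`english` for your original audio, plus one per translation); click a button to play or pause that track.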
## Build and test the Docker image
```bash
docker build -t daily-multi-translation .
docker run --env-file .env -p 7860:7860 daily-multi-translation
```

@@ -0,0 +1,165 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
BACKGROUND_SOUND_FILE = "office-ambience-mono-16000.mp3"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Multi translation bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_out_mixer={
"spanish": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"french": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"german": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
},
audio_out_destinations=["spanish", "french", "german"],
microphone_out_enabled=False, # Disable since we just use custom tracks
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts_spanish = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="cefcb124-080b-4655-b31f-932f3ee743de",
transport_destination="spanish",
)
tts_french = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="8832a0b5-47b2-4751-bb22-6a8e2149303d",
transport_destination="french",
)
tts_german = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="38aabb6a-f52b-4fb0-a3d1-988518f4dc06",
transport_destination="german",
)
messages_spanish = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into Spanish.",
},
]
messages_french = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into French.",
},
]
messages_german = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into German.",
},
]
llm_spanish = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_french = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_german = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
context_spanish = OpenAILLMContext(messages_spanish)
context_aggregator_spanish = llm_spanish.create_context_aggregator(context_spanish)
context_french = OpenAILLMContext(messages_french)
context_aggregator_french = llm_french.create_context_aggregator(context_french)
context_german = OpenAILLMContext(messages_german)
context_aggregator_german = llm_german.create_context_aggregator(context_german)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
ParallelPipeline(
# Spanish pipeline.
[
context_aggregator_spanish.user(),
llm_spanish,
tts_spanish,
context_aggregator_spanish.assistant(),
],
# French pipeline.
[
context_aggregator_french.user(),
llm_french,
tts_french,
context_aggregator_french.assistant(),
],
# German pipeline.
[
context_aggregator_german.user(),
llm_german,
tts_german,
context_aggregator_german.assistant(),
],
),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
observers=[TranscriptionLogObserver()],
)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

@@ -0,0 +1,5 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
DEEPGRAM_API_KEY=efb...
CARTESIA_API_KEY=aeb...

@@ -0,0 +1,202 @@
<html>
<head>
<title>daily multi translation</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous"
></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`video[data-participant-id="${participantId}"]`);
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildVideoPlayer(track, participantId) {
const videoContainer = document.getElementById("video-container");
const player = document.createElement("video");
player.dataset.participantId = participantId;
videoContainer.appendChild(player);
await startPlayer(player, track);
await player.play();
return player;
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: true,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "video") {
await buildVideoPlayer(e.track, e.participant.session_id);
} else if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const videoContainer = document.getElementById("video-container");
videoContainer.replaceChildren();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="video-container" class="ui segment"></div>
</div>
</div>
</body>
</html>

@@ -0,0 +1,5 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,deepgram,openai,silero,cartesia]

@@ -0,0 +1,55 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. Use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. Use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
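
The `configure` helper gives command-line flags precedence over environment variables, and because it uses `parse_known_args`, unrecognized flags are ignored instead of rejected. That matters here: `server.py` spawns the bot with `-u {room.url} -t {token}`, so the `-t` token is dropped and `configure` requests its own one-hour token using the API key. A minimal sketch of just the resolution logic, assuming a hypothetical `resolve_room` helper and leaving out the Daily REST call:

```python
import argparse
import os
from typing import Mapping, Optional, Sequence, Tuple

# Token lifetime requested by configure: one hour, in seconds.
EXPIRY_SECONDS = 60 * 60


def resolve_room(argv: Sequence[str], env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: CLI flags win, environment variables are the fallback."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("-u", "--url", type=str, required=False)
    parser.add_argument("-k", "--apikey", type=str, required=False)
    # parse_known_args returns unrecognized arguments (e.g. "-t <token>") instead of erroring.
    args, _unknown = parser.parse_known_args(list(argv))
    url = args.url or env.get("DAILY_SAMPLE_ROOM_URL")
    key = args.apikey or env.get("DAILY_API_KEY")
    if not url:
        raise ValueError("No Daily room specified")
    if not key:
        raise ValueError("No Daily API key specified")
    return url, key


print(resolve_room(["-u", "https://example.daily.co/room", "-t", "tok"], {"DAILY_API_KEY": "k"}))
```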

@@ -0,0 +1,139 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
daily_helpers = {}
load_dotenv(override=True)
def cleanup():
# Clean up function, just to be extra safe
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/")
async def start_agent(request: Request):
print(f"!!! Creating room")
room = await daily_helpers["rest"].create_room(DailyRoomParams())
print(f"!!! Room URL: {room.url}")
# Ensure the room property is present
if not room.url:
raise HTTPException(
status_code=500,
detail="Missing 'room' property in request data. Cannot start agent without a target room!",
)
# Check if there is already an existing process running in this room
num_bots_in_room = sum(
1 for proc in bot_procs.values() if proc[1] == room.url and proc[0].poll() is None
)
if num_bots_in_room >= MAX_BOTS_PER_ROOM:
raise HTTPException(status_code=500, detail=f"Max bot limit reached for room: {room.url}")
# Get the token for the room
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
try:
proc = subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room.url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return RedirectResponse(room.url)
@app.get("/status/{pid}")
def get_status(pid: int):
# Look up the subprocess
proc = bot_procs.get(pid)
# If the subprocess doesn't exist, return an error
if not proc:
raise HTTPException(status_code=404, detail=f"Bot with process id: {pid} not found")
# Check the status of the subprocess
if proc[0].poll() is None:
status = "running"
else:
status = "finished"
return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
import uvicorn
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Multi Translation FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)
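
The per-room guard in `start_agent` counts only the `bot_procs` entries whose process is still running (`poll()` returns `None`). Since `start_agent` creates a fresh room on every request, in practice this count should always be zero here; the guard only becomes meaningful if rooms are reused. A sketch of the same check, with a hypothetical `FakeProc` standing in for `subprocess.Popen`:

```python
from typing import Dict, Optional, Tuple

MAX_BOTS_PER_ROOM = 1


class FakeProc:
    """Hypothetical stand-in for subprocess.Popen: poll() is None while running."""

    def __init__(self, returncode: Optional[int] = None):
        self.returncode = returncode

    def poll(self) -> Optional[int]:
        return self.returncode


def live_bots_in_room(bot_procs: Dict[int, Tuple[FakeProc, str]], room_url: str) -> int:
    # Same test as server.py: entries are (proc, room_url) and only live processes count.
    return sum(1 for proc, url in bot_procs.values() if url == room_url and proc.poll() is None)


def can_spawn(bot_procs: Dict[int, Tuple[FakeProc, str]], room_url: str) -> bool:
    return live_bots_in_room(bot_procs, room_url) < MAX_BOTS_PER_ROOM


procs = {101: (FakeProc(None), "room-a"), 102: (FakeProc(0), "room-b")}
print(can_spawn(procs, "room-a"), can_spawn(procs, "room-b"))  # False True
```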

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import aiohttp
@@ -21,44 +22,23 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
if LOCAL_RUN:
import asyncio
import webbrowser
try:
from local_runner import configure
except ImportError:
logger.error("Could not import local_runner module. Local development mode may not work.")
# Load environment variables
load_dotenv(override=True)
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
async def main(room_url: str, token: str):
async def main(transport: DailyTransport):
"""Main pipeline setup and execution function.
Args:
room_url: The Daily room URL
token: The Daily room token
transport: The DailyTransport object for the bot
"""
logger.debug("Starting bot in room: {}", room_url)
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
logger.debug("Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
@@ -126,10 +106,25 @@ async def bot(args: DailySessionArguments):
body: The configuration object from the request body
session_id: The session ID for logging
"""
from pipecat.audio.filters.krisp_filter import KrispFilter
logger.info(f"Bot process initialized {args.room_url} {args.token}")
transport = DailyTransport(
args.room_url,
args.token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_in_filter=None if LOCAL_RUN else KrispFilter(),
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
try:
await main(args.room_url, args.token)
await main(transport)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
@@ -137,18 +132,27 @@ async def bot(args: DailySessionArguments):
# Local development functions
async def local_main():
async def local_daily():
"""Function for local development testing."""
from local_runner import configure
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
logger.warning("_")
logger.warning("_")
logger.warning(f"Talk to your voice agent here: {room_url}")
logger.warning("_")
logger.warning("_")
webbrowser.open(room_url)
await main(room_url, token)
transport = DailyTransport(
room_url,
token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
await main(transport)
except Exception as e:
logger.exception(f"Error in local development mode: {e}")
@@ -156,6 +160,6 @@ async def local_main():
# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
try:
asyncio.run(local_main())
asyncio.run(local_daily())
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")
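
In the Pipecat Cloud example, `LOCAL_RUN = os.getenv("LOCAL_RUN")` is used directly as a boolean, so any non-empty value, including `0` or `false`, turns on local mode and skips the Krisp filter. If stricter parsing is wanted, one option is a small helper like the hypothetical `env_flag` below; `choose_audio_filter` mirrors the `None if LOCAL_RUN else KrispFilter()` expression, with the filter factory passed in so nothing Krisp-related is constructed during local runs.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: only common truthy spellings count as True."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() in TRUTHY


def choose_audio_filter(local_run: bool, make_filter: Callable[[], T]) -> Optional[T]:
    # No noise filter locally; build the (cloud-only) filter lazily otherwise.
    return None if local_run else make_filter()


print(env_flag("LOCAL_RUN", {"LOCAL_RUN": "0"}))  # False
```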

@@ -1,2 +1,4 @@
CARTESIA_API_KEY=
OPENAI_API_KEY=
OPENAI_API_KEY=
# Local dev only
DAILY_API_KEY=

@@ -7,6 +7,7 @@
import os
import aiohttp
from fastapi import HTTPException
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams

@@ -1,6 +1,8 @@
agent_name = "my-first-agent"
image = "your-username/my-first-agent:0.1"
image_credentials = "your-dockerhub-creds"
secret_set = "my-first-agent-secrets"
enable_krisp = true
[scaling]
min_instances = 0

@@ -14,6 +14,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.groq.llm import GroqLLMService
from pipecat.services.groq.stt import GroqSTTService
@@ -39,7 +40,9 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
)
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
@@ -51,7 +54,9 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
context_aggregator = llm.create_context_aggregator(
context, user_params=LLMUserAggregatorParams(aggregation_timeout=0.05)
)
pipeline = Pipeline(
[

@@ -44,7 +44,8 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
voice_id="luna",
model="arcana",
aiohttp_session=session,
)

@@ -16,8 +16,12 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.nim.llm import NimLLMService
from pipecat.services.riva.stt import ParakeetSTTService
from pipecat.services.riva.tts import FastPitchTTSService
from pipecat.services.riva.stt import (
ParakeetSTTService,
RivaSegmentedSTTService,
RivaSTTService,
)
from pipecat.services.riva.tts import FastPitchTTSService, RivaTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
@@ -37,11 +41,11 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
),
)
stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))
stt = RivaSTTService(api_key=os.getenv("NVIDIA_API_KEY"))
llm = NimLLMService(api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct")
tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
tts = RivaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
messages = [
{

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
from dotenv import load_dotenv
@@ -39,7 +40,7 @@ class TranscriptionLogger(FrameProcessor):
print(f"Translation ({frame.language}): {frame.text}")
async def run_bot(webrtc_connection: SmallWebRTCConnection):
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
transport = SmallWebRTCTransport(

@@ -17,6 +17,7 @@ from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.groq.llm import GroqLLMService
@@ -53,7 +54,9 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = GroqLLMService(api_key=os.getenv("GROQ_API_KEY"), model="llama-3.3-70b-versatile")
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
@@ -83,7 +86,9 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
context_aggregator = llm.create_context_aggregator(
context, user_params=LLMUserAggregatorParams(aggregation_timeout=0.05)
)
pipeline = Pipeline(
[

@@ -12,10 +12,12 @@ from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import TranscriptionMessage
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.transcript_processor import TranscriptProcessor
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
@@ -69,12 +71,16 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
)
context_aggregator = llm.create_context_aggregator(context)
transcript = TranscriptProcessor()
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
transcript.user(),
llm,
transport.output(),
transcript.assistant(),
context_aggregator.assistant(),
]
)
@@ -103,6 +109,15 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
logger.info(f"Client closed connection")
await task.cancel()
# Register event handler for transcript updates
@transcript.event_handler("on_transcript_update")
async def on_transcript_update(processor, frame):
for msg in frame.messages:
if isinstance(msg, TranscriptionMessage):
timestamp = f"[{msg.timestamp}] " if msg.timestamp else ""
line = f"{timestamp}{msg.role}: {msg.content}"
logger.info(f"Transcript: {line}")
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)

@@ -89,6 +89,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
llm = GeminiMultimodalLiveLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
system_instruction=system_instruction,
transcribe_user_audio=True,
tools=tools,
)

@@ -36,6 +36,7 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_is_live=True,
video_out_width=512,
video_out_height=512,
vad_analyzer=SileroVADAnalyzer(),

@@ -45,6 +45,7 @@ Note:
"""
import argparse
import asyncio
import os
from dotenv import load_dotenv
@@ -102,8 +103,17 @@ async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespac
voice_name = match.content.strip().lower()
if voice_name in VOICE_IDS:
voice_id = VOICE_IDS[voice_name]
tts.set_voice(voice_id)
logger.info(f"Switched to {voice_name} voice")
# Create task to reset the TTS context after voice change
async def change_voice():
# First flush any existing audio to finish the current context
await tts.flush_audio()
# Then set the new voice
tts.set_voice(voice_id)
logger.info(f"Switched to {voice_name} voice")
# Schedule the voice change task
asyncio.create_task(change_voice())
else:
logger.warning(f"Unknown voice: {voice_name}")

@@ -0,0 +1,128 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.turn.smart_turn.base_smart_turn import SmartTurnParams
from pipecat.audio.turn.smart_turn.local_smart_turn import LocalSmartTurnAnalyzer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.network.small_webrtc import SmallWebRTCTransport
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection
load_dotenv(override=True)
async def run_bot(webrtc_connection: SmallWebRTCConnection, _: argparse.Namespace):
logger.info(f"Starting bot")
# To use this locally, set the environment variable LOCAL_SMART_TURN_MODEL_PATH
# to the path where the smart-turn repo is cloned.
#
# Example setup:
#
# # Git LFS (Large File Storage)
# brew install git-lfs
# # Hugging Face uses LFS to store large model files, including .mlpackage
# git lfs install
# # Clone the repo with the smart_turn_classifier.mlpackage
# git clone https://huggingface.co/pipecat-ai/smart-turn
#
# Then set the env variable:
# export LOCAL_SMART_TURN_MODEL_PATH=./smart-turn
# or add it to your .env file
smart_turn_model_path = os.getenv("LOCAL_SMART_TURN_MODEL_PATH")
transport = SmallWebRTCTransport(
webrtc_connection=webrtc_connection,
params=TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=LocalSmartTurnAnalyzer(
smart_turn_model_path=smart_turn_model_path, params=SmartTurnParams()
),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
@transport.event_handler("on_client_closed")
async def on_client_closed(transport, client):
logger.info(f"Client closed connection")
await task.cancel()
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)
if __name__ == "__main__":
from run import main
main()

@@ -0,0 +1,138 @@
# Daily + Twilio SIP Voice Bot
This project demonstrates how to create a voice bot that can receive phone calls via Twilio and use Daily's SIP capabilities to enable voice conversations.
## How It Works
1. Twilio receives an incoming call to your phone number
2. Twilio calls your webhook server (`/call` endpoint)
3. The server creates a Daily room with SIP capabilities
4. The server starts the bot process with the room details
5. The caller is put on hold with music
6. The bot joins the Daily room and signals readiness
7. Twilio forwards the call to Daily's SIP endpoint
8. The caller and bot are connected, and the bot handles the conversation
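Step 7 is driven by the bot: when `on_dialin_ready` fires, `bot.py` updates the live Twilio call with TwiML that dials the Daily SIP URI, building that markup with an f-string. Below is a minimal sketch of the same `<Response><Dial><Sip>` document built with Python's standard library. `build_forward_twiml` is a hypothetical helper, not part of Twilio's or Pipecat's API; the only difference from the f-string is that ElementTree escapes characters such as `&` if a URI ever contains them.

```python
import xml.etree.ElementTree as ET


def build_forward_twiml(sip_uri: str) -> str:
    """Return TwiML that dials the given SIP URI (hypothetical helper).

    Produces the same elements bot.py sends in on_dialin_ready, but lets
    ElementTree escape XML-special characters in the URI.
    """
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    sip = ET.SubElement(dial, "Sip")
    sip.text = sip_uri
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URI; the real one comes from the Daily room's SIP endpoint.
    print(build_forward_twiml("sip:placeholder@example.com"))
```

A string like this could be passed as the `twiml` argument to `twilio_client.calls(call_id).update(...)`, which is the call `bot.py` makes.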
## Prerequisites
- A Daily account with an API key
- A Twilio account with a phone number that supports voice
- An OpenAI API key for the bot's LLM
- A Cartesia API key for text-to-speech
## Setup
1. Create a virtual environment and install dependencies
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
2. Set up environment variables
Copy the example file and fill in your API keys:
```bash
cp .env.example .env
# Edit .env with your API keys
```
3. Configure your Twilio webhook
In the Twilio console:
- Go to your phone number's configuration
- Set the webhook for "A Call Comes In" to your server's URL followed by `/call`, using HTTP POST
- For local testing, you can use ngrok to expose your local server
```bash
ngrok http 8000
# Then use the provided URL (e.g., https://abc123.ngrok.io/call) in Twilio
```
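Before pointing Twilio at the server, you can exercise the `/call` endpoint yourself. Twilio sends its webhook as form-encoded data, and `server.py` reads only `CallSid` and `From` from it. The sketch below builds such a body with the standard library and can optionally POST it to a locally running server. The helper names and sample values are hypothetical placeholders. Note that a real POST creates a Daily room and starts `bot.py`, and the bot's forwarding step is then expected to fail because Twilio has no call with the fake `CallSid`.

```python
import urllib.parse
import urllib.request


def build_call_webhook_body(call_sid: str, caller: str) -> bytes:
    """Form-encode the two fields server.py reads from Twilio's webhook."""
    return urllib.parse.urlencode({"CallSid": call_sid, "From": caller}).encode()


def post_test_call(base_url: str, call_sid: str, caller: str) -> str:
    """POST a fake Twilio webhook to /call and return the TwiML reply."""
    request = urllib.request.Request(
        f"{base_url}/call",
        data=build_call_webhook_body(call_sid, caller),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode()


if __name__ == "__main__":
    # Placeholder values; running this creates a real Daily room and bot process.
    print(post_test_call("http://localhost:8000", "CA_test", "+15555550100"))
```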
## Running the Server
Start the webhook server:
```bash
python server.py
```
## Testing
Call your Twilio phone number. The system should answer the call, put you on hold briefly, then connect you with the bot.
## Customizing the Bot
You can customize the bot's behavior by modifying the system prompt in `bot.py`.
### Changing the Hold Music
To change the ringing sound or hold music that callers hear while waiting to be connected to the bot, update the URL in `server.py`:
```python
resp = VoiceResponse()
resp.play(
url="https://your-custom-audio-file-url.mp3",
loop=10,
)
```
> Read [Twilio's guide](https://www.twilio.com/en-us/blog/adding-mp3-to-voice-call-using-twilio) on how to play an MP3 in a voice call.
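`VoiceResponse().play(url=..., loop=10)` renders a `<Play>` verb with a `loop` attribute. To see what a given URL and loop count produce without installing the Twilio SDK, the sketch below builds equivalent markup with ElementTree. `build_hold_twiml` is a hypothetical name, and the SDK's exact output may differ in details such as the XML declaration.

```python
import xml.etree.ElementTree as ET


def build_hold_twiml(audio_url: str, loop: int = 10) -> str:
    """Return TwiML that plays an audio file `loop` times (hypothetical helper)."""
    if loop < 0:
        raise ValueError("loop must be >= 0")
    response = ET.Element("Response")
    # Twilio's <Play> reference describes loop=0 as repeating until the call
    # ends; confirm there before relying on it for long holds.
    play = ET.SubElement(response, "Play", loop=str(loop))
    play.text = audio_url
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_hold_twiml("https://your-custom-audio-file-url.mp3"))
```

If the bot occasionally takes longer to join than the audio lasts, a larger `loop` value keeps the caller from hearing silence before the call is forwarded.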
## Handling Multiple SIP Endpoints
The bot is configured to handle multiple `on_dialin_ready` events that might occur with multiple SIP endpoints. It ensures that each call is only forwarded once using a simple flag:
```python
# Flag to track if call has been forwarded
call_already_forwarded = False
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
nonlocal call_already_forwarded
# Skip if already forwarded
if call_already_forwarded:
logger.info("Call already forwarded, ignoring this event.")
return
# ... forwarding code ...
call_already_forwarded = True
```
Normally a call needs only a single SIP endpoint. If you plan to forward the call on to a different number, you need two SIP endpoints: one for the initial call and one for the forwarded call. IMPORTANT: make sure your `on_dialin_ready` handler forwards the call only on the first event it receives.
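In `bot.py` the boolean flag is enough because nothing is awaited between checking it and setting it (the Twilio client call is synchronous). If the forwarding step were ever made `async`, two `on_dialin_ready` events could both pass the check before either set the flag. Below is a minimal sketch of a once-only guard that serializes the check with an `asyncio.Lock`; `ForwardOnce` and the fake forwarding coroutine are hypothetical and not part of Pipecat.

```python
import asyncio
from typing import Awaitable, Callable


class ForwardOnce:
    """Run a forwarding coroutine at most once, even under concurrent events."""

    def __init__(self, forward: Callable[[], Awaitable[None]]) -> None:
        self._forward = forward
        self._lock = asyncio.Lock()
        self._done = False

    async def __call__(self) -> bool:
        """Return True if this call performed the forward, False if skipped."""
        async with self._lock:
            if self._done:
                return False
            # If forwarding raises, the flag stays False so a later event can retry.
            await self._forward()
            self._done = True
            return True


async def demo() -> int:
    forwards = 0

    async def fake_forward() -> None:
        nonlocal forwards
        await asyncio.sleep(0.01)  # Simulate a network round trip.
        forwards += 1

    guard = ForwardOnce(fake_forward)
    # Three endpoints report readiness at roughly the same time.
    await asyncio.gather(guard(), guard(), guard())
    return forwards


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The guard is created inside the running event loop, so the lock is bound to that loop on all supported Python versions.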
## Daily SIP Configuration
The bot configures Daily rooms with SIP capabilities using these settings:
```python
sip_params = DailyRoomSipParams(
display_name="phone-user", # Shown in the Daily UI; optionally, use the caller's number
video=False, # Audio-only call
sip_mode="dial-in", # For receiving calls (vs. dial-out)
num_endpoints=1, # Number of SIP endpoints to create
)
```
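Pipecat's `DailyRoomSipParams` and `DailyRoomProperties` end up in the room-creation request sent to Daily's REST API. As a rough picture of that request body, assuming the field names carry over unchanged into Daily's `sip` room property (check Daily's rooms API reference), the settings above and the extra properties set in `utils/daily_helpers.py` correspond to a dictionary like the one built here. `sip_room_properties` and its `forward_elsewhere` switch are hypothetical; the switch encodes the one-or-two-endpoint rule described in the previous section.

```python
import json


def sip_room_properties(caller_phone: str, forward_elsewhere: bool = False) -> dict:
    """Build a dict mirroring the SIP room settings used in this example.

    Hypothetical helper: field names follow DailyRoomSipParams and
    DailyRoomProperties as used in utils/daily_helpers.py.
    """
    return {
        "properties": {
            "sip": {
                "display_name": caller_phone,  # Shown in the Daily UI
                "video": False,  # Audio-only call
                "sip_mode": "dial-in",  # Receive calls rather than dial out
                # One endpoint normally; a second if the call is forwarded on.
                "num_endpoints": 2 if forward_elsewhere else 1,
            },
            "enable_dialout": True,
            "enable_chat": False,
            "start_video_off": True,
        }
    }


if __name__ == "__main__":
    print(json.dumps(sip_room_properties("+15555550100"), indent=2))
```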
## Troubleshooting
### Call is not being answered
- Check that your Twilio webhook is correctly configured
- Verify your Twilio account has sufficient funds
- Check the logs of both the server and bot processes
### Call connects but no bot is heard
- Ensure your Daily API key is correct and has SIP capabilities
- Check that the SIP endpoint is being correctly passed to the bot
- Verify that the Cartesia API key and voice ID are correct
### Bot starts but disconnects immediately
- Check the Daily and Twilio logs for any error messages
- Ensure your server has stable internet connectivity
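Several of these failures trace back to a missing or unedited credential. A small preflight check, run before `python server.py`, can catch that early. The variable names below come from this example's `.env.example`; the script itself is a hypothetical addition, and it only sees values from `.env` if they were exported or loaded first (for example with `load_dotenv()`, as `server.py` does).

```python
import os
import sys
from typing import List, Mapping

# Names taken from this example's .env.example. DAILY_API_URL is omitted
# because utils/daily_helpers.py falls back to https://api.daily.co/v1.
REQUIRED_VARS = (
    "DAILY_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return required names that are unset, empty, or still a placeholder.

    Placeholders are detected by the "your_" prefix used in .env.example.
    """
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
        sys.exit(1)
    print("All required environment variables are set.")
```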

@@ -0,0 +1,183 @@
"""Twilio + Daily voice bot implementation."""
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from twilio.rest import Client
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
# Load environment variables and configure logging
load_dotenv()
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Initialize Twilio client
twilio_client = Client(os.getenv("TWILIO_ACCOUNT_SID"), os.getenv("TWILIO_AUTH_TOKEN"))
async def run_bot(room_url: str, token: str, call_id: str, sip_uri: str) -> None:
"""Run the voice bot with the given parameters.
Args:
room_url: The Daily room URL
token: The Daily room token
call_id: The Twilio call ID
sip_uri: The Daily SIP URI for forwarding the call
"""
logger.info(f"Starting bot with room: {room_url}")
logger.info(f"SIP endpoint: {sip_uri}")
call_already_forwarded = False
# Setup the Daily transport
transport = DailyTransport(
room_url,
token,
"Phone Bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
# Setup TTS service
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Setup LLM service
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
# Initialize LLM context with system prompt
messages = [
{
"role": "system",
"content": (
"You are a friendly phone assistant. Your responses will be read aloud, "
"so keep them concise and conversational. Avoid special characters or "
"formatting. Begin by greeting the caller and asking how you can help them today."
),
},
]
# Setup the conversational context
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# Build the pipeline
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
# Create the pipeline task
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True # Enable barge-in so callers can interrupt the bot
),
)
# Handle participant joining
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.info(f"First participant joined: {participant['id']}")
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
# Handle participant leaving
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
logger.info(f"Participant left: {participant['id']}, reason: {reason}")
await task.cancel()
# Handle call ready to forward
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
nonlocal call_already_forwarded
# We only want to forward the call once
# The on_dialin_ready event will be triggered for each sip endpoint provisioned
if call_already_forwarded:
logger.warning("Call already forwarded, ignoring this event.")
return
logger.info(f"Forwarding call {call_id} to {sip_uri}")
try:
# Update the Twilio call with TwiML to forward to the Daily SIP endpoint
twilio_client.calls(call_id).update(
twiml=f"<Response><Dial><Sip>{sip_uri}</Sip></Dial></Response>"
)
logger.info("Call forwarded successfully")
call_already_forwarded = True
except Exception as e:
logger.error(f"Failed to forward call: {str(e)}")
raise
@transport.event_handler("on_dialin_connected")
async def on_dialin_connected(transport, data):
logger.debug(f"Dial-in connected: {data}")
@transport.event_handler("on_dialin_stopped")
async def on_dialin_stopped(transport, data):
logger.debug(f"Dial-in stopped: {data}")
@transport.event_handler("on_dialin_error")
async def on_dialin_error(transport, data):
logger.error(f"Dial-in error: {data}")
# If there is an error, the bot should leave the call.
# This may also be handled in on_participant_left with
# await task.cancel()
@transport.event_handler("on_dialin_warning")
async def on_dialin_warning(transport, data):
logger.warning(f"Dial-in warning: {data}")
# Run the pipeline
runner = PipelineRunner()
await runner.run(task)
async def main():
"""Parse command line arguments and run the bot."""
parser = argparse.ArgumentParser(description="Daily + Twilio Voice Bot")
parser.add_argument("-u", type=str, required=True, help="Daily room URL")
parser.add_argument("-t", type=str, required=True, help="Daily room token")
parser.add_argument("-i", type=str, required=True, help="Twilio call ID")
parser.add_argument("-s", type=str, required=True, help="Daily SIP URI")
args = parser.parse_args()
# Validate required arguments
if not all([args.u, args.t, args.i, args.s]):
logger.error("All arguments (-u, -t, -i, -s) are required")
parser.print_help()
sys.exit(1)
await run_bot(args.u, args.t, args.i, args.s)
if __name__ == "__main__":
asyncio.run(main())

@@ -0,0 +1,11 @@
# Daily credentials
DAILY_API_KEY=your_daily_api_key
DAILY_API_URL=https://api.daily.co/v1
# Twilio credentials
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
# Service keys
OPENAI_API_KEY=your_openai_api_key
CARTESIA_API_KEY=your_cartesia_api_key

@@ -0,0 +1,5 @@
pipecat-ai[daily,cartesia,openai,silero]
fastapi==0.115.6
uvicorn
python-dotenv
twilio
python-multipart

@@ -0,0 +1,116 @@
"""Webhook server to handle Twilio calls and start the voice bot."""
import os
import shlex
import subprocess
from contextlib import asynccontextmanager
import aiohttp
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
from utils.daily_helpers import create_sip_room
# Load environment variables
load_dotenv()
# Initialize FastAPI app with aiohttp session
@asynccontextmanager
async def lifespan(app: FastAPI):
# Create aiohttp session to be used for Daily API calls
app.state.session = aiohttp.ClientSession()
yield
# Close session when shutting down
await app.state.session.close()
app = FastAPI(lifespan=lifespan)
@app.post("/call", response_class=PlainTextResponse)
async def handle_call(request: Request):
"""Handle incoming Twilio call webhook."""
print("Received call webhook from Twilio")
try:
# Get form data from Twilio webhook
form_data = await request.form()
data = dict(form_data)
# Extract call ID (required to forward the call later)
call_sid = data.get("CallSid")
if not call_sid:
raise HTTPException(status_code=400, detail="Missing CallSid in request")
# Extract the caller's phone number
caller_phone = str(data.get("From", "unknown-caller"))
print(f"Processing call with ID: {call_sid} from {caller_phone}")
# Create a Daily room with SIP capabilities
try:
room_details = await create_sip_room(request.app.state.session, caller_phone)
except Exception as e:
print(f"Error creating Daily room: {e}")
raise HTTPException(status_code=500, detail=f"Failed to create Daily room: {str(e)}")
# Extract necessary details
room_url = room_details["room_url"]
token = room_details["token"]
sip_endpoint = room_details["sip_endpoint"]
# Make sure we have a SIP endpoint
if not sip_endpoint:
raise HTTPException(status_code=500, detail="No SIP endpoint provided by Daily")
# Start the bot process
bot_cmd = f"python bot.py -u {room_url} -t {token} -i {call_sid} -s {sip_endpoint}"
try:
# Use shlex to properly split the command for subprocess
cmd_parts = shlex.split(bot_cmd)
# Keep stdout/stderr attached so the bot's logs show up in this terminal.
# Start the bot as a background process without redirecting its output.
subprocess.Popen(
cmd_parts,
# To silence the bot's logs, uncomment:
# stdout=subprocess.DEVNULL,
# stderr=subprocess.DEVNULL
)
print(f"Started bot process with command: {bot_cmd}")
except Exception as e:
print(f"Error starting bot: {e}")
raise HTTPException(status_code=500, detail=f"Failed to start bot: {str(e)}")
# Generate TwiML response to put the caller on hold with music
# You can replace the URL with your own music file
# or use Twilio's built-in music on hold
# https://www.twilio.com/docs/voice/twiml/play#music-on-hold
resp = VoiceResponse()
resp.play(
url="https://therapeutic-crayon-2467.twil.io/assets/US_ringback_tone.mp3",
loop=10,
)
return str(resp)
except HTTPException:
raise
except Exception as e:
print(f"Unexpected error: {str(e)}")
raise HTTPException(status_code=500, detail=f"Server error: {str(e)}")
@app.get("/health")
async def health_check():
"""Simple health check endpoint."""
return {"status": "healthy"}
if __name__ == "__main__":
# Run the server
port = int(os.getenv("PORT", "8000"))
print(f"Starting server on port {port}")
uvicorn.run("server:app", host="0.0.0.0", port=port, reload=True)

@@ -0,0 +1,76 @@
"""Helper functions for interacting with the Daily API."""
import os
from typing import Dict, Optional
import aiohttp
from dotenv import load_dotenv
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomParams,
DailyRoomProperties,
DailyRoomSipParams,
)
load_dotenv()
# Initialize Daily API helper
async def get_daily_helper(session: Optional[aiohttp.ClientSession] = None) -> DailyRESTHelper:
"""Get a Daily REST helper with the configured API key."""
if session is None:
session = aiohttp.ClientSession()
return DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=session,
)
async def create_sip_room(
session: Optional[aiohttp.ClientSession] = None, caller_phone: str = "unknown-caller"
) -> Dict[str, str]:
"""Create a Daily room with SIP capabilities for phone calls.
Args:
session: Optional aiohttp session to use for API calls
caller_phone: The phone number of the caller to use in display name
Returns:
Dictionary with room URL, token, and SIP endpoint
"""
daily_helper = await get_daily_helper(session)
# Configure SIP parameters
sip_params = DailyRoomSipParams(
display_name=caller_phone,
video=False,
sip_mode="dial-in",
num_endpoints=1,
)
# Create room properties with SIP enabled
properties = DailyRoomProperties(
sip=sip_params,
enable_dialout=True, # Needed for outbound calls if you expand the bot
enable_chat=False, # No need for chat in a voice bot
start_video_off=True, # Voice only
)
# Create room parameters
params = DailyRoomParams(properties=properties)
# Create the room
try:
room = await daily_helper.create_room(params=params)
print(f"Created room: {room.url} with SIP endpoint: {room.config.sip_endpoint}")
# Get token for the bot to join
token = await daily_helper.get_token(room.url, 24 * 60 * 60) # 24 hours validity
return {"room_url": room.url, "token": token, "sip_endpoint": room.config.sip_endpoint}
except Exception as e:
print(f"Error creating room: {e}")
raise

@@ -235,10 +235,10 @@ For incoming calls from customers, Daily will send a webhook to your `/start` en
```json
{
"From": "+CALLERS_PHONE",
"To": "$PURCHASED_PHONE",
"callId": "callid-read-only-string",
"callDomain": "callDomain-read-only-string"
}
```
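A server receiving this webhook typically needs to pass the four dial-in fields along as `dialin_settings`, in the same shape as the `dialin_settings` entry of the `/start` config example that follows. Below is a minimal sketch of that packaging step with basic validation; `build_start_config` is a hypothetical helper, not the example's actual code.

```python
import json
from typing import Any, Dict

# Keys in the dial-in webhook body shown above.
DIALIN_KEYS = ("From", "To", "callId", "callDomain")


def build_start_config(webhook_body: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a dial-in webhook body as a /start request config.

    Hypothetical helper: copies the four dial-in fields into
    config.dialin_settings and rejects bodies that lack any of them.
    """
    missing = [key for key in DIALIN_KEYS if key not in webhook_body]
    if missing:
        raise ValueError(f"webhook body is missing: {', '.join(missing)}")
    return {"config": {"dialin_settings": {key: webhook_body[key] for key in DIALIN_KEYS}}}


if __name__ == "__main__":
    body = {
        "From": "+15555550100",
        "To": "+15555550199",
        "callId": "callid-read-only-string",
        "callDomain": "callDomain-read-only-string",
    }
    print(json.dumps(build_start_config(body), indent=2))
```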
@@ -266,63 +266,63 @@ When making requests to the `/start` endpoint, the config object can include:
```json
{
"config": {
"prompts": [
{
"name": "call_transfer_initial_prompt",
"text": "Your custom prompt here"
},
{
"name": "call_transfer_prompt",
"text": "Your custom prompt here"
},
{
"name": "call_transfer_finished_prompt",
"text": "Your custom prompt here"
},
{
"name": "voicemail_detection_prompt",
"text": "Your custom prompt here"
},
{
"name": "voicemail_prompt",
"text": "Your custom prompt here"
},
{
"name": "human_conversation_prompt",
"text": "Your custom prompt here"
}
],
"dialin_settings": {
"From": "+CALLERS_PHONE",
"To": "$PURCHASED_PHONE",
"callId": "callid-read-only-string",
"callDomain": "callDomain-read-only-string"
},
"dialout_settings": [
{
"phoneNumber": "+12345678910",
"callerId": "caller-id-uuid",
"sipUri": "sip:maria@example.com"
}
],
"call_transfer": {
"mode": "dialout",
"speakSummary": true,
"storeSummary": false,
"operatorNumber": "+12345678910",
"testInPrebuilt": false
},
"voicemail_detection": {
"testInPrebuilt": true
},
"simple_dialin": {
"testInPrebuilt": true
},
"simple_dialout": {
"testInPrebuilt": true
}
}
}
```
@@ -393,19 +393,19 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"dialin_settings": {
"from": "+12345678901",
"to": "+19876543210",
"call_id": "call-id-string",
"call_domain": "domain-string"
},
"call_transfer": {
"mode": "dialout",
"speakSummary": true,
"operatorNumber": "+12345678910"
}
}
}
```
@@ -413,14 +413,14 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"call_transfer": {
"mode": "dialout",
"speakSummary": true,
"operatorNumber": "+12345678910",
"testInPrebuilt": true
}
}
}
```
@@ -428,11 +428,11 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"voicemail_detection": {
"testInPrebuilt": true
}
}
}
```
@@ -440,16 +440,16 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"dialout_settings": [
{
"phoneNumber": "+12345678910"
}
],
"voicemail_detection": {
"testInPrebuilt": false
}
}
}
```
@@ -457,15 +457,15 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"dialin_settings": {
"from": "+12345678901",
"to": "+19876543210",
"call_id": "call-id-string",
"call_domain": "domain-string"
},
"simple_dialin": {}
}
}
```
@@ -473,11 +473,11 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"simple_dialin": {
"testInPrebuilt": true
}
}
}
```
@@ -485,14 +485,14 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"dialout_settings": [
{
"phoneNumber": "+12345678910"
}
],
"simple_dialout": {}
}
}
```
@@ -500,37 +500,14 @@ The following table shows which feature combinations are supported when making r
```json
{
"config": {
"simple_dialout": {
"testInPrebuilt": true
}
}
}
```
## Using Twilio (Alternative)
To use Twilio for call handling:
1. Start the bot runner:
```shell
python bot_runner.py --host localhost
```
2. Start ngrok:
```shell
ngrok http --domain yourdomain.ngrok.app 7860
```
3. In another terminal, run the Twilio bot:
```shell
python bot_twilio.py
```
Make requests to `/start_twilio_bot` for Twilio-specific functionality.
## Deployment
See Pipecat Cloud deployment docs for how to deploy this example: https://docs.pipecat.daily.co/agents/deploy

@@ -20,8 +20,7 @@ from bot_runner_helpers import (
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
@@ -125,32 +124,6 @@ async def start_bot(room_details: Dict[str, str], body: Dict[str, Any], example:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
async def start_twilio_bot(room_details: Dict[str, str], call_id: str) -> bool:
"""Start a Twilio bot process with the given configuration.
Args:
room_details: Room URL, token, and SIP endpoint
call_id: Twilio call ID (CallSid)
Returns:
Boolean indicating success
"""
room_url = room_details["room"]
token = room_details["token"]
sip_endpoint = room_details["sip_endpoint"]
# Format command for Twilio bot
bot_proc = f"python3 -m bot_twilio -u {room_url} -t {token} -i {call_id} -s {sip_endpoint}"
print(f"Starting Twilio bot. Room: {room_url}")
try:
command_parts = shlex.split(bot_proc)
subprocess.Popen(command_parts, bufsize=1, cwd=os.path.dirname(os.path.abspath(__file__)))
return True
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
# ----------------- API Setup ----------------- #
@@ -180,47 +153,6 @@ app.add_middleware(
# ----------------- API Endpoints ----------------- #
@app.post("/twilio_start_bot", response_class=PlainTextResponse)
async def twilio_start_bot(request: Request):
"""Handle incoming Twilio webhook calls and start a Twilio bot.
This endpoint is called directly by Twilio as a webhook when a call is received.
It puts the call on hold with music and starts a bot that will handle the call.
"""
print("POST /twilio_start_bot")
# Get form data from Twilio webhook
try:
form_data = await request.form()
data = dict(form_data)
except Exception as e:
raise HTTPException(status_code=400, detail=f"Failed to parse Twilio form data: {str(e)}")
# Get default room URL from environment
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
# Extract call ID from Twilio data
call_id = data.get("CallSid")
if not call_id:
raise HTTPException(status_code=400, detail="Missing 'CallSid' in request")
print(f"CallId: {call_id}")
# Create Daily room for the Twilio call
room_details = await create_daily_room(room_url, None) # No special config for Twilio rooms
# Start the Twilio bot
await start_twilio_bot(room_details, call_id)
# Put the call on hold until the bot is ready to handle it
# The bot will update the call with the SIP URI when it's ready
resp = VoiceResponse()
resp.play(
url="http://com.twilio.sounds.music.s3.amazonaws.com/MARKOVICHAMP-Borghestral.mp3", loop=10
)
return str(resp)
@app.post("/start")
async def handle_start_request(request: Request) -> JSONResponse:
"""Unified endpoint to handle bot configuration for different scenarios."""
@@ -228,21 +160,7 @@ async def handle_start_request(request: Request) -> JSONResponse:
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
try:
# Check if this is form data (from Twilio) or JSON
content_type = request.headers.get("content-type", "").lower()
if "application/x-www-form-urlencoded" in content_type:
# Handle form data from Twilio
form_data = await request.form()
data = dict(form_data)
# Check for CallSid which indicates this is a Twilio webhook
if "CallSid" in data:
# Redirect to Twilio handler for backward compatibility
return await twilio_start_bot(request)
else:
# Parse JSON request data
data = await request.json()
data = await request.json()
# Handle webhook test
if "test" in data:
@@ -298,14 +216,6 @@ async def handle_start_request(request: Request) -> JSONResponse:
return JSONResponse(response)
except json.JSONDecodeError:
# Check if this might be form data from Twilio
try:
content_type = request.headers.get("content-type", "").lower()
if "application/x-www-form-urlencoded" in content_type:
return await twilio_start_bot(request)
except Exception:
pass
raise HTTPException(status_code=400, detail="Invalid JSON in request body")
except Exception as e:
raise HTTPException(status_code=400, detail=f"Request processing error: {str(e)}")

@@ -1,122 +0,0 @@
#
# Copyright (c) 2024-2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from twilio.rest import Client
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
twilio_account_sid = os.getenv("TWILIO_ACCOUNT_SID")
twilio_auth_token = os.getenv("TWILIO_AUTH_TOKEN")
twilioclient = Client(twilio_account_sid, twilio_auth_token)
daily_api_key = os.getenv("DAILY_API_KEY", "")
async def main(room_url: str, token: str, callId: str, sipUri: str):
# dialin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_key=daily_api_key,
dialin_settings=None, # Not required for Twilio
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Hello! Who dares dial me at this hour?!'.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
# For Twilio, Telnyx, etc., you need to update the state of the call
# and forward it to the SIP URI.
print(f"Forwarding call: {callId} {sipUri}")
try:
# The TwiML is updated using Twilio's client library
call = twilioclient.calls(callId).update(
twiml=f"<Response><Dial><Sip>{sipUri}</Sip></Dial></Response>"
)
except Exception as e:
raise Exception(f"Failed to forward call: {str(e)}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-s", type=str, help="SIP URI")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.s))
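The `on_dialin_ready` handler builds its TwiML by interpolating `sipUri` directly into an XML string. A minimal sketch, assuming you want to guard against XML-special characters such as `&` ever appearing in the URI, is to escape the value first. `build_sip_forward_twiml` is a hypothetical helper, not part of Pipecat or Twilio:

```python
from xml.sax.saxutils import escape


def build_sip_forward_twiml(sip_uri: str) -> str:
    """Return TwiML that dials the given SIP URI (hypothetical helper)."""
    # escape() converts &, < and > to XML entities so the URI
    # cannot break the surrounding <Response> document.
    return f"<Response><Dial><Sip>{escape(sip_uri)}</Sip></Dial></Response>"


if __name__ == "__main__":
    # Hypothetical URI, used only to show the escaping
    print(build_sip_forward_twiml("sip:bot@example.com?x=1&y=2"))
```

The returned string could be passed as the `twiml=` argument to `twilioclient.calls(callId).update(...)`. Twilio's Python library also provides a TwiML builder (`twilio.twiml.voice_response.VoiceResponse`) that serializes and escapes the XML for you.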

@@ -5,8 +5,6 @@ DEEPGRAM_API_KEY=
OPENAI_API_KEY=
GOOGLE_API_KEY=
CARTESIA_API_KEY=
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
DIAL_IN_FROM_NUMBER=
DIAL_OUT_TO_NUMBER=
OPERATOR_NUMBER=
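The `.env` file above only lists variable names. A small sketch for failing fast when keys are missing, assuming (hypothetically) that the API keys and Twilio credentials below are the ones your bot actually needs; `missing_env_vars` is an illustrative helper, not part of the example:

```python
import os

# Names taken from the env.example hunk; which ones are required
# depends on the services your bot uses (an assumption here).
REQUIRED_VARS = [
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "CARTESIA_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not environ.get(name)]
```

Calling `missing_env_vars(REQUIRED_VARS)` after loading the `.env` file (for example with `python-dotenv`, which is already in the requirements) returns an empty list when everything is set; otherwise you can raise with the returned names before starting the pipeline.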

@@ -2,5 +2,4 @@ pipecat-ai[daily,cartesia,deepgram,openai,google,silero]
fastapi==0.115.6
uvicorn
python-dotenv
twilio
python-multipart

@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
return (url, token)

@@ -1,2 +0,0 @@
frontend/node_modules
frontend/out

@@ -1,4 +1,4 @@
[![Try](https://img.shields.io/badge/try_it-here-blue)](https://storytelling-chatbot.fly.dev)
[![Try](https://img.shields.io/badge/try_it-here-blue)](https://gemini-storybot.vercel.app/)
# Storytelling Chatbot
@@ -9,7 +9,6 @@ It periodically prompts the user for input for a 'choose your own adventure' sty
We use Gemini 2.0 for creating the story and image prompts, and we add visual elements to the story by generating images using Google's Imagen.
---
### It uses the following AI services:
@@ -20,7 +19,7 @@ Transcribes inbound participant voice media to text.
**Google Gemini 2.0 - LLM**
Our creative writer LLM. You can see the context used to prompt it [here](src/prompts.py)
Our creative writer LLM. You can see the context used to prompt it [here](server/prompts.py)
**ElevenLabs - Text-to-Speech**
@@ -34,47 +33,76 @@ Adds pictures to our story. Prompting is quite key for style consistency, so we
## Setup
**Install requirements**
### Client
```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
1. Navigate to the client directory:
**Create environment file and set variables:**
```shell
cd client
```
```shell
mv env.example .env
```
2. Install dependencies:
When deploying to production, to ensure only this app can spawn a new bot, set your `ENV` to `production`
```shell
npm install
```
**Build the frontend:**
3. Build the client:
This project uses a custom frontend, which needs to be built. Note: this is done automatically as part of the Docker deployment.
```shell
npm run build
```
```shell
cd frontend/
npm install
npm run build
```
### Server
The built UI files can be found in `frontend/out`
1. Navigate to the server directory:
## Running it locally
```shell
cd ../server
```
Start the API / bot manager:
2. Set up your virtual environment and install requirements:
`python src/bot_runner.py --host localhost`
```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
If you'd like to run a custom domain or port:
3. Create environment file and set variables:
`python src/bot_runner.py --host somehost --p someport`
```shell
mv env.example .env
```
➡️ Open the host URL in your browser `http://localhost:7860`
You'll need API keys for:
If you've run previous versions of the demo, make sure to set `ENV=dev`, and remove the `RUN_AS_VM` line from the .env file.
- DAILY_API_KEY
- ELEVENLABS_API_KEY
- ELEVENLABS_VOICE_ID
- GOOGLE_API_KEY
4. (Optional) Deployment:
When deploying to production, to ensure only this app can spawn new bot processes, set your `ENV` to `production`
## Run it locally
1. Navigate back to the demo's root directory:
```shell
cd ..
```
2. Run the application:
```shell
python server/bot_runner.py --host localhost
```
You can run with a custom domain or port using: `python server/bot_runner.py --host somehost --p someport`
3. ➡️ Open the host URL in your browser: http://localhost:7860
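The README spells the port flag as `--p`, while the Dockerfile passes `--port`. Assuming `bot_runner.py` parses its options with Python's `argparse` and defines a `--port` option (an assumption; its parser is not shown in this diff), both spellings work, because `argparse` accepts unambiguous prefixes of long options by default (`allow_abbrev=True`). A hypothetical stand-in parser:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for bot_runner.py's CLI; the real flags and defaults may differ
    parser = argparse.ArgumentParser(description="Bot runner (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=7860)
    return parser


if __name__ == "__main__":
    # "--p" is an unambiguous prefix of "--port", so argparse expands it
    args = build_parser().parse_args(["--host", "somehost", "--p", "8000"])
    print(args.host, args.port)  # somehost 8000
```

Spelling out `--port` in the README would avoid depending on abbreviation, which stops working if another option starting with `--p` is ever added.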
---

Image: 1.1 KiB (before) / 1.1 KiB (after)

Image: 1.3 MiB (before) / 1.3 MiB (after)

Image: 2.4 MiB (before) / 2.4 MiB (after)

@@ -1,11 +1,11 @@
{
"name": "frontend",
"name": "client",
"version": "0.1.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "frontend",
"name": "client",
"version": "0.1.0",
"dependencies": {
"@daily-co/daily-js": "^0.62.0",

@@ -1,5 +1,5 @@
{
"name": "frontend",
"name": "client",
"version": "0.1.0",
"private": true,
"scripts": {

Image: 3.7 KiB (before) / 3.7 KiB (after)

Image: 788 KiB (before) / 788 KiB (after)

@@ -0,0 +1,2 @@
client/node_modules
client/out

@@ -44,11 +44,11 @@ COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Copy everything else
COPY --chown=user ./src/ src/
COPY --chown=user ./server/ server/
# Copy frontend app and build
COPY --chown=user ./frontend/ frontend/
RUN cd frontend && npm install && npm run build
# Copy client app and build
COPY --chown=user ./client/ client/
RUN cd client && npm install && npm run build
# Start the FastAPI server
CMD python3 src/bot_runner.py --port ${FAST_API_PORT}
CMD python3 server/bot_runner.py --port ${FAST_API_PORT}

Image: 1.4 MiB (before) / 1.4 MiB (after)

Image: 1.5 MiB (before) / 1.5 MiB (after)

@@ -57,7 +57,7 @@ app.add_middleware(
)
# Mount the static directory
STATIC_DIR = "frontend/out"
STATIC_DIR = "client/out"
# ------------ Fast API Routes ------------ #
@@ -175,7 +175,7 @@ async def virtualize_bot(room_url: str, token: str):
image = data[0]["config"]["image"]
# Machine configuration
cmd = f"python src/bot.py -u {room_url} -t {token}"
cmd = f"python server/bot.py -u {room_url} -t {token}"
cmd = cmd.split()
worker_props = {
"config": {

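`virtualize_bot` builds the worker command with an f-string and then calls `cmd.split()`. Daily room URLs and tokens normally contain no whitespace, so this works in practice; a slightly more robust sketch, assuming you control how the argument list is built, constructs the argv list directly so each value stays a single argument. `build_bot_command` is a hypothetical helper:

```python
import shlex


def build_bot_command(room_url: str, token: str) -> list[str]:
    """Build the bot's argv as a list (hypothetical helper)."""
    # Each value stays one argument even if it contains spaces,
    # unlike formatting a string and calling str.split() on it.
    return ["python", "server/bot.py", "-u", room_url, "-t", token]


if __name__ == "__main__":
    cmd = build_bot_command("https://example.daily.co/room", "abc 123")
    # shlex.join produces a shell-safe string, handy for logging
    print(shlex.join(cmd))  # python server/bot.py -u https://example.daily.co/room 'abc 123'
```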
@@ -47,7 +47,7 @@ canonical = [ "aiofiles~=24.1.0" ]
cartesia = [ "cartesia~=1.4.0", "websockets~=13.1" ]
cerebras = []
deepseek = []
daily = [ "daily-python~=0.17.0" ]
daily = [ "daily-python~=0.18.1" ]
deepgram = [ "deepgram-sdk~=3.8.0" ]
elevenlabs = [ "websockets~=13.1" ]
fal = [ "fal-client~=0.5.9" ]
@@ -56,7 +56,7 @@ fish = [ "ormsgpack~=1.7.0", "websockets~=13.1" ]
gladia = [ "websockets~=13.1" ]
google = [ "google-cloud-speech~=2.31.1", "google-cloud-texttospeech~=2.25.1", "google-genai~=1.7.0", "google-generativeai~=0.8.4", "websockets~=13.1" ]
grok = []
groq = [ "groq~=0.20.0" ]
groq = [ "groq~=0.23.0" ]
gstreamer = [ "pygobject~=3.50.0" ]
krisp = [ "pipecat-ai-krisp~=0.3.0" ]
koala = [ "pvkoala~=2.0.3" ]
@@ -78,7 +78,7 @@ perplexity = []
playht = [ "pyht~=0.1.12", "websockets~=13.1" ]
qwen = []
rime = [ "websockets~=13.1" ]
riva = [ "nvidia-riva-client~=2.19.0" ]
riva = [ "nvidia-riva-client~=2.19.1" ]
sentry = [ "sentry-sdk~=2.23.1" ]
local-smart-turn = [ "coremltools>=8.0", "transformers", "torch==2.5.0", "torchaudio==2.5.0" ]
remote-smart-turn = []
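The dependency bumps above use the PEP 440 compatible-release operator `~=`: `daily-python~=0.18.1` means "at least 0.18.1 and still 0.18.x", and `websockets~=13.1` means "at least 13.1 and still 13.x". A simplified sketch of that rule for plain numeric versions only (real tools such as the `packaging` library also handle pre-releases, epochs, and other PEP 440 forms):

```python
def parse_version(v: str) -> tuple[int, ...]:
    # Simplified: handles plain numeric versions like "0.18.1" only
    return tuple(int(part) for part in v.split("."))


def satisfies_compatible_release(version: str, spec: str) -> bool:
    """Check `version` against `~=spec` (numeric versions only)."""
    v = parse_version(version)
    s = parse_version(spec)
    if len(s) < 2:
        raise ValueError("~= requires at least two release components")
    # ~=X.Y.Z is equivalent to >= X.Y.Z together with == X.Y.*
    prefix = s[:-1]
    return v >= s and v[: len(prefix)] == prefix


if __name__ == "__main__":
    print(satisfies_compatible_release("0.18.3", "0.18.1"))  # True
    print(satisfies_compatible_release("0.19.0", "0.18.1"))  # False
```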

Some files were not shown because too many files have changed in this diff.