Compare commits

...

398 Commits

Author SHA1 Message Date
James Hush
fdf0652141 Remove vad 2025-08-14 11:17:22 +08:00
James Hush
237c400f2d Remove vad 2025-08-14 11:16:54 +08:00
James Hush
b6afce2a92 intervention processor 2025-08-14 11:15:55 +08:00
Mark Backman
d7f31e0cbd Merge pull request #2387 from pipecat-ai/mb/retry-chat-completion
Retry chat completions for OpenAILLMService and its subclasses
2025-08-13 14:39:40 -07:00
Mark Backman
c662a2d820 Merge pull request #2437 from pipecat-ai/mb/19-english
Foundational 19: Respond in English
2025-08-13 11:57:24 -07:00
Mark Backman
89f0ff17c0 Merge pull request #2430 from pipecat-ai/aleix/pipecat-0.0.80
update CHANGELOG for 0.0.80
2025-08-13 09:41:43 -07:00
Mark Backman
b5465364fa Foundational 19: Respond in English 2025-08-13 12:37:13 -04:00
Aleix Conchillo Flaqué
c024eb7b8c update CHANGELOG for 0.0.80 2025-08-13 11:46:24 -04:00
Mark Backman
608570e89d Merge pull request #2433 from pipecat-ai/mb/openai-realtime-text-modality
fix: Add text support to OpenAIRealtimeBetaLLMService
2025-08-13 08:41:33 -07:00
Mark Backman
3ad61a8a04 Remove stray - in changelog 2025-08-13 11:39:59 -04:00
Mark Backman
4c4bae2db6 Remove unnessecary messages from 19 and 19b examples 2025-08-13 11:39:59 -04:00
Mark Backman
901b6b5913 Add foundational 19b 2025-08-13 11:37:38 -04:00
Mark Backman
71cd0f1c87 fix: Add text support to OpenAIRealtimeBetaLLMService 2025-08-13 11:37:36 -04:00
Filipi da Silva Fuchter
a2a419e6db Merge pull request #2435 from pipecat-ai/filipi/small_webrtc_end_pipeline
Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.
2025-08-13 11:58:33 -03:00
Filipi Fuchter
bbbbdc459a Fixed an issue where SmallWebRTCTransport ended before TTS finished. 2025-08-13 11:46:51 -03:00
Mark Backman
d203528dad Merge pull request #2333 from yohan-altrium/fix/2277-azure-tts-ssml-reserved-characters
Fixes 2277 - SSML reserved characters causes Azure TTS to fail
2025-08-13 06:27:30 -07:00
Yohan Liyanage
4bcca7956e Refactors the code based on PR comments and adds the relevant changelog entry. 2025-08-13 16:34:33 +05:30
Aleix Conchillo Flaqué
68a4cf4c68 Merge pull request #2427 from pipecat-ai/aleix/base-watchdog-priority-queue
WatchdogPriorityQueue: this is now a base class
2025-08-12 18:25:59 -07:00
Aleix Conchillo Flaqué
0508ddddfb WatchdogPriorityQueue: fix watchdog sentinel insertion
We now force each inserted item in the priority queue to be a tuple and the
actual value to be last in the tuple. All the previous values in the tuple also
need to be numeric.
2025-08-12 17:40:58 -07:00
Mark Backman
8714c9137f Code review fixes 2025-08-12 17:49:13 -04:00
Mark Backman
4c029fcfa7 Update OpenAILLMService subclasses to use the new build_chat_completion_params function 2025-08-12 17:48:51 -04:00
Mark Backman
5c86f8e687 Add timeout/retry logic and refactor parameter building in BaseOpenAILLMService
- Add timeout (default 5.0s) and retry_on_timeout parameters to constructor
- Implement timeout/retry logic in get_chat_completions using asyncio.wait_for
- Extract build_chat_completion_params() as public method for subclass customization
2025-08-12 17:48:51 -04:00
Mark Backman
54a4d8a9f8 Merge pull request #2422 from thsunkid/thu/fix-set-lang-in-base-whisper
Fix: assigns string code instead of Language enum to BaseWhisperSTTService._language
2025-08-12 11:57:46 -07:00
Mark Backman
38af514d95 Merge pull request #2407 from pipecat-ai/mb/add-gemini-tts
Add GeminiTTSService
2025-08-12 11:56:45 -07:00
Aleix Conchillo Flaqué
6aa80c0b8e Merge pull request #2424 from pipecat-ai/aleix/system-frame-queues-fix
FrameProcessor: fix race condition on FrameProcessorQueue
2025-08-12 11:56:00 -07:00
Mark Backman
e720573e60 Added 07n-interruptible-gemini 2025-08-12 14:54:49 -04:00
Mark Backman
541a43905b Add GeminiTTSService 2025-08-12 14:52:20 -04:00
Aleix Conchillo Flaqué
707df913cd FrameProcessor: fix race condition on FrameProcessorQueue
We need to increment the counters before the await otherwise we could go to a
different task that could add an item with the same counter.

Also, we need to handle non-frame items as well.
2025-08-12 11:48:22 -07:00
Aleix Conchillo Flaqué
3f3d757581 tests: added WatchdogQueue and WatchdogPriorityQueue unit tests 2025-08-12 11:48:22 -07:00
Aleix Conchillo Flaqué
7c781ce816 WatchdogPriorityQueue: make WatchdogPriorityCancelSentinel public 2025-08-12 11:34:31 -07:00
Aleix Conchillo Flaqué
f3efc9da00 WatchdogQueue: make WatchdogQueueCancelSentinel public 2025-08-12 11:34:31 -07:00
Mark Backman
827a70104d Merge pull request #2425 from pipecat-ai/mb/runner-add-exotel
Add Exotel support to the development runner
2025-08-12 10:36:54 -07:00
Mark Backman
a40327305c Add Exotel support to the development runner 2025-08-12 13:21:18 -04:00
Thu Nguyen
168af44429 Fix: assigns string code instead of Language enum to _language attr of BaseWhisperSTTService 2025-08-12 20:27:26 +07:00
Mark Backman
5f8433476c Merge pull request #2397 from gladiaio/PLA-37-GladiaSTTService-minor-tweaks
feat: add minor tweaks to GladiaSTTService
2025-08-12 04:59:40 -07:00
Fabrice Lamant
6a6fea74f5 fix: set default region to none 2025-08-12 13:31:51 +02:00
Mark Backman
91b557ecbf Merge pull request #2419 from pipecat-ai/mb/fix-lockfile-workflow 2025-08-12 03:39:54 -07:00
Mark Backman
be85291414 Merge pull request #2420 from pipecat-ai/mb/runner-handle-sigint-default 2025-08-12 03:39:29 -07:00
Fabrice Lamant
09f171b69d fix: only pass region if set 2025-08-12 12:05:38 +02:00
Aleix Conchillo Flaqué
929fd98958 Merge pull request #2416 from pipecat-ai/aleix/release-evals-vision
scripts(evals): add vision support
2025-08-11 20:08:08 -07:00
Aleix Conchillo Flaqué
1cfbfcaf11 scripts(evals): add vision support 2025-08-11 20:06:24 -07:00
Mark Backman
cd5a3c13bd Development runner: handle_sigint defaults to False 2025-08-11 22:06:56 -04:00
Mark Backman
9b871b0cc5 Update uv.lock, remove lockfile workflow, update CONTRIBUTING with dependency guidance 2025-08-11 21:39:25 -04:00
Mark Backman
0d499a8aa3 Merge pull request #2409 from pipecat-ai/mb/refactor-playht-http
Refactor PlayHTHttpTTSService to use aiohttp
2025-08-11 18:20:58 -07:00
Mark Backman
45292ab13d Merge pull request #2411 from pipecat-ai/mb/fix-websocket-service-retry
fix: WebsocketService retry logic incorrectly handling ConnectionClos…
2025-08-11 18:17:50 -07:00
Mark Backman
be6ea0dbf6 Code review feedback 2025-08-11 21:17:04 -04:00
Aleix Conchillo Flaqué
fb18ae174e Merge pull request #2417 from pipecat-ai/aleix/release-evals-15-series
scripts(evals): add multilinguag support and 15 series
2025-08-11 17:14:47 -07:00
Mark Backman
c4506523ab Refactor PlayHTHttpTTSService to use aiohttp 2025-08-11 19:58:25 -04:00
Aleix Conchillo Flaqué
b360cb31dc scripts(evals): add multilinguag support and 15 series 2025-08-11 15:21:14 -07:00
Aleix Conchillo Flaqué
07f104199c Merge pull request #2415 from pipecat-ai/aleix/moondream-2025-01-09
MoondreamService: update to revision 2025-01-09
2025-08-11 15:10:35 -07:00
Aleix Conchillo Flaqué
bc1949b4bf MoondreamService: update to revision 2025-01-09 2025-08-11 14:54:04 -07:00
Aleix Conchillo Flaqué
2035dd8b39 Merge pull request #2403 from pipecat-ai/aleix/system-frame-queue-priority-fix
FrameProcessor: fix system frame higher priorty and use a PriortyQueue
2025-08-11 13:57:57 -07:00
Aleix Conchillo Flaqué
24c8189327 Merge pull request #2405 from pipecat-ai/aleix/frame-processor-direct-mode
FrameProcessor: introduce direct mode
2025-08-11 13:57:34 -07:00
Mark Backman
998ac32627 Merge pull request #2413 from captaincaius/fix-stt-mute-filter-vad-frames-20250810
Add VADUserStartSpeakingFrame VADUserStopSpeakingFrame to STTMuteFilter (fix #2412)
2025-08-11 13:54:34 -07:00
Aleix Conchillo Flaqué
50645c1c4f README: recommend python 3.11-3.12
Python 3.11 has significant performance improvements compared to 3.10 which
makes Pipecat's asyncio heavy use  specially better.
2025-08-11 13:53:08 -07:00
Aleix Conchillo Flaqué
8ce29ee8f2 FrameProcessor: fix system frame higher priorty and use a PriortyQueue 2025-08-11 13:53:08 -07:00
Captain Caius
7b8aeef4cc update changelog 2025-08-11 12:45:54 -07:00
Aleix Conchillo Flaqué
6a24457f0e FrameProcessor: introduce direct mode
Direct mode avoids creating internal queues and tasks and processes frames right
away. This might be useful for some very simple processors.
2025-08-11 09:26:31 -07:00
Aleix Conchillo Flaqué
2c01c2b5b3 Merge pull request #2404 from pipecat-ai/aleix/examples-22-simplify-main-pipeline
examples(foundational): update 22 series with simple main pipelines
2025-08-11 09:14:39 -07:00
Aleix Conchillo Flaqué
1c2e114fa2 examples(foundational): update 22 series with simple main pipelines 2025-08-11 09:13:09 -07:00
Filipi da Silva Fuchter
0f137e36c2 Merge pull request #2399 from pipecat-ai/filipi/heygen_latency
Improving the latency of the `HeyGenVideoService`.
2025-08-11 09:13:10 -03:00
Filipi Fuchter
b7f12a96f1 Improving the latency of the HeyGenVideoService. 2025-08-11 09:11:17 -03:00
Filipi da Silva Fuchter
3331f71e17 Merge pull request #2398 from pipecat-ai/filipi/ttfb_metrics_video_services
Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.
2025-08-11 09:09:27 -03:00
Filipi Fuchter
55d200e2d1 Added TTFB metrics for HeyGenVideoService and TavusVideoService. 2025-08-11 09:07:21 -03:00
Captain Caius
3fae00e067 Add VADUserStartSpeakingFrame VADUserStopSpeakingFrame to STTMuteFilter 2025-08-10 19:35:04 -07:00
Mark Backman
78cdefd191 Merge pull request #2410 from smokyabdulrahman/issue-2373
Support endpoint_id for AzureSTTService
2025-08-10 16:43:29 -07:00
Mark Backman
42502a4f3b fix: WebsocketService retry logic incorrectly handling ConnectionClosedOK exception 2025-08-10 19:35:05 -04:00
Abdulrahman Alrahma
fc67cc3302 Support endpoint_id for AzureSTTService 2025-08-10 22:24:47 +01:00
Aleix Conchillo Flaqué
241ab19228 update uv.lock with numba dependency 2025-08-08 15:12:55 -07:00
Mark Backman
c08e8ec8fb Merge pull request #2391 from pipecat-ai/mb/readme-local-dev
Update README with local dev setup for contributors
2025-08-08 11:15:58 -07:00
Mark Backman
eb9bc9644e Merge pull request #2400 from pipecat-ai/mb/pin-numba-0.61.2
fix: pin numba to >=0.61.2
2025-08-08 11:15:22 -07:00
Mark Backman
3a306dae90 fix: pin numba to >=0.61.2 2025-08-08 10:52:47 -04:00
Fabrice Lamant
e503ea7466 feat: add minor tweaks to GladiaSTTService 2025-08-08 10:21:52 +02:00
Mark Backman
c42cc8254f Update README with local dev setup for contributors 2025-08-07 22:07:35 -04:00
Aleix Conchillo Flaqué
a8e21f7d5d Merge pull request #2395 from pipecat-ai/aleix/examples-15-inherit-parallel-pipeline
examples(foundational): move 15/15a logic into its own processor
2025-08-07 17:59:28 -07:00
Aleix Conchillo Flaqué
c6ef8de578 scripts(evals): fix 14v-function-calling-openai.py 2025-08-07 17:57:47 -07:00
Aleix Conchillo Flaqué
fc571fba42 examples(foundational): move 15/15a logic into its own processor 2025-08-07 17:57:47 -07:00
Mark Backman
0502ee2b5a Merge pull request #2394 from pipecat-ai/mb/uv-lock
Update uv.lock
2025-08-07 15:25:38 -07:00
Mark Backman
9ec047094b Update uv.lock 2025-08-07 18:24:47 -04:00
Mark Backman
d991c106c8 Merge pull request #2393 from pipecat-ai/mb/openai-dep
fix: pin openai package upper bound to <=1.99.1
2025-08-07 15:19:05 -07:00
Mark Backman
312fb23c89 fix: pin openai package upper bound to <=1.99.1 2025-08-07 18:00:25 -04:00
Aleix Conchillo Flaqué
4d7f21d44e Merge pull request #2392 from pipecat-ai/aleix/avoid-using-tts-say
deprecate TTSService.say() method
2025-08-07 13:55:49 -07:00
Aleix Conchillo Flaqué
ec25d0a7c9 examples(foundational): fix 20a-persistent-context-openai 2025-08-07 13:48:32 -07:00
Aleix Conchillo Flaqué
2b8218deaa examples(foundational): use TTSSpeakFrame instead of TTSService.say() 2025-08-07 13:48:32 -07:00
Aleix Conchillo Flaqué
11119430cd TTSService: deprecate say() method 2025-08-07 13:48:32 -07:00
kompfner
9ca79232c1 Merge pull request #2380 from pipecat-ai/pk/deprecate-llm-messages-frame
Deprecate `LLMMessagesFrame`, `LLMUserResponseAggregator`, and `LLMAssistantResponseAggregator`
2025-08-07 15:13:01 -04:00
Paul Kompfner
9ea06c33f7 Bump deprecation version of LLMMessagesFrame, LLMUserResponseAggregator, and LLMAssistantResponseAggregator (the deprecation slipped past the 0.0.78 release) 2025-08-07 14:56:50 -04:00
Paul Kompfner
30a1dd202e Move deprecation of LLMMessagesFrame, LLMUserResponseAggregator, and LLMAssistantResponseAggregator into the next release in the changelog 2025-08-07 14:55:11 -04:00
Paul Kompfner
809ab0b7b6 Improve printed deprecation warning 2025-08-07 14:45:35 -04:00
Paul Kompfner
2b5db9c562 Remove redundant deprecation warning in docstring 2025-08-07 14:45:35 -04:00
Paul Kompfner
b4a886b59f Remove redundant deprecation warning in docstring 2025-08-07 14:45:35 -04:00
Paul Kompfner
07eb00722b Fix langchain unit test 2025-08-07 14:45:35 -04:00
Paul Kompfner
96652b8fba Add new deprecations to changelog 2025-08-07 14:45:30 -04:00
Paul Kompfner
df1fcf0c68 Remove unused import 2025-08-07 14:43:37 -04:00
Paul Kompfner
711f740d9e Update UserResponseAggregator to avoid using the now-deprecated LLMUserResponseAggregator 2025-08-07 14:43:37 -04:00
Paul Kompfner
a0bda98c20 Update langchain to avoid using the now-deprecated LLMMessagesFrame, LLMUserResponseAggregator, and LLMAssistantResponseAggregator 2025-08-07 14:43:37 -04:00
Paul Kompfner
1c1bae35ab Mention deprecation in docstring for LLMMessagesFrame 2025-08-07 14:43:37 -04:00
Paul Kompfner
56c52c2cf2 Deprecate LLMUserResponseAggregator and LLMAssistantResponseAggregator, which depend on the now-deprecated LLMMessagesFrame. 2025-08-07 14:43:37 -04:00
Paul Kompfner
740aee1a1a Fix an issue in AnthropicLLMContext where we would never initialize turns_above_cache_threshold if we were upgrading from an OpenAILLMContext.
I noticed this when working on 22c-natural-conversation-mixed-llms.py
2025-08-07 14:43:37 -04:00
Paul Kompfner
f0391c3280 Progress on updating foundational examples to avoid using the newly-deprecated LLMMessagesFrame.
Skipping over 07b-interruptible-langchain.py for now, as it requires deeper changes involving `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator`.
2025-08-07 14:43:37 -04:00
Paul Kompfner
64e48e4660 Deprecate LLMMessagesFrame.
The same functionality can be achieved using either:
- `LLMMessagesUpdateFrame` with the desired messages, with `run_llm` set to `True`
- `OpenAILLMContextFrame` with a new context initialized with the desired messages
2025-08-07 14:43:37 -04:00
Paul Kompfner
b8147bdbbd Add missing Deepgram key to env.example 2025-08-07 14:43:37 -04:00
Aleix Conchillo Flaqué
315e45d41b Merge pull request #2389 from pipecat-ai/aleix/pipecat-0.0.78
update CHANGELOG for 0.0.78
2025-08-07 11:34:27 -07:00
Aleix Conchillo Flaqué
c057139c48 update CHANGELOG for 0.0.78 2025-08-07 11:14:54 -07:00
Mark Backman
c61e07132d Merge pull request #2390 from pipecat-ai/mb/optionally-ignore-emulated-speech
feat: Add option to ignore emulated user speech while the bot is spea…
2025-08-07 11:14:46 -07:00
Mark Backman
a5f5e418a8 feat: Add option to ignore emulated user speech while the bot is speaking 2025-08-07 14:08:11 -04:00
Mark Backman
31acfaa091 Merge pull request #2388 from pipecat-ai/14v-adding-openai-stt-tts-llm-functioncalling
14v adding OpenAI stt tts llm functioncalling
2025-08-07 10:22:35 -07:00
Mark Backman
69541c8835 Linting fix, plus update eval suite with 14v and others, tiny fix for 14m, too 2025-08-07 13:20:45 -04:00
Varun Singh
af94620839 Add OpenAI function calling example with Pipecat
Introduces a new example script demonstrating how to use OpenAI's function calling capabilities within a Pipecat pipeline. The example integrates OpenAI STT, TTS, and LLM services, registers a weather function, and sets up a pipeline for real-time audio interaction over WebRTC.
2025-08-07 13:20:45 -04:00
Filipi da Silva Fuchter
cec8a74293 Merge pull request #2386 from pipecat-ai/filipi/parallel_pipeline
Only push the StartFrame when all parallel pipelines have processed it
2025-08-07 14:20:30 -03:00
Filipi Fuchter
228a55ac1e Only push the StartFrame when all parallel pipelines have processed it. 2025-08-07 14:18:21 -03:00
Vanessa Pyne
ab9831daf0 Merge pull request #2382 from pipecat-ai/vp-trace-ignore-message
log: warning -> trace for elevenlabs tts unavailable context
2025-08-07 09:35:57 -05:00
Vanessa Pyne
e8c3f5dea6 Update src/pipecat/services/elevenlabs/tts.py
Co-authored-by: Mark Backman <mark@daily.co>
2025-08-07 09:23:33 -05:00
Mark Backman
4288b5e780 Merge pull request #2381 from pipecat-ai/aleix/runner-args-pipeline-idle-timeout
allow specifying PipelineTask idle timeout to runner arguments
2025-08-07 04:47:08 -07:00
Mark Backman
23343dd7e7 Remove idle_timeout_secs from quickstart 2025-08-07 07:44:21 -04:00
Mark Backman
88de5dd415 Merge pull request #2383 from pipecat-ai/aleix/riva-stt-iterator-exception
properly handle concurrent.futures.CancelledError
2025-08-07 04:39:56 -07:00
Mark Backman
33f87589d1 Merge pull request #2384 from pipecat-ai/aleix/release-evals-soniox-inworld-asyncai
scripts(evals): added soniox, inworld and asyncai
2025-08-07 04:35:18 -07:00
Aleix Conchillo Flaqué
7ed14ad91f scripts(evals): added soniox, inworld and asyncai 2025-08-06 23:14:50 -07:00
Aleix Conchillo Flaqué
86c6141580 DailyTransport: handle future cancellation 2025-08-06 23:03:20 -07:00
Aleix Conchillo Flaqué
c97643c797 RivaSTTService: always use WatchdogQueue 2025-08-06 23:00:03 -07:00
Aleix Conchillo Flaqué
434d346079 RivaSTTService: handle future cancellation 2025-08-06 22:59:52 -07:00
vipyne
64ae8d2394 log: warning -> trace for elevenlabs tts unavailable context 2025-08-06 22:40:47 -05:00
Aleix Conchillo Flaqué
786f24c9db examples(foundational): use RunnerArgs.pipeline_idle_timeout_secs 2025-08-06 19:38:06 -07:00
Aleix Conchillo Flaqué
38951aab56 scripts(evals): use RunnerArguments.pipeline_idle_timeout_secs 2025-08-06 19:37:29 -07:00
Aleix Conchillo Flaqué
ed8b0655a8 scripts(evals): fix runner eval cancellation
We need to call asyncio.gather() just once, not for every cancelled task.
2025-08-06 19:36:42 -07:00
Aleix Conchillo Flaqué
0b2b9f5f1b RunnerArguments: add pipeline_idle_timeout_secs 2025-08-06 19:35:40 -07:00
Filipi da Silva Fuchter
ad1841b739 Merge pull request #2377 from pipecat-ai/filipi/fast_api_freeze_issue
Fixed an issue in BaseOutputTransport where the loop could consume all CPU.
2025-08-06 14:58:36 -03:00
Mark Backman
b0c002c128 Merge pull request #2378 from pipecat-ai/mb/pyproject-compat-updates
Add new python-compatiblity workflow to check for dependency compatib…
2025-08-06 10:40:29 -07:00
Mark Backman
820176084c Add support for 3.13 by bumping min version for vllm to 0.9.0, adding support for torch and torchaudio up to the next major version 2025-08-06 13:36:01 -04:00
Mark Backman
5b7e31beff README updates for python versions 2025-08-06 13:36:01 -04:00
Mark Backman
41a22d3bf4 Add new python-compatiblity workflow to check for dependency compatibility across supported python versions 2025-08-06 13:36:01 -04:00
Filipi Fuchter
84fecabac5 Removing audio sleep from FastAPI and WebSocket server when they are not connected. 2025-08-06 14:02:51 -03:00
Filipi Fuchter
bbe01d10ef Fixed an issue in BaseOutputTransport where the loop could consume all CPU. 2025-08-06 12:42:58 -03:00
Mark Backman
4364990fd0 Merge pull request #2375 from fabrice404/gladia-region-selection
Gladia region selection
2025-08-06 07:01:24 -07:00
Fabrice Lamant
e576fa481f Add new region feature for GladiaSTTService in CHANGELOG 2025-08-06 15:31:10 +02:00
Mark Backman
ac6b59cae2 Merge pull request #2372 from pipecat-ai/mb/dotenv-dev
Wider package support for python-dotenv dev dep
2025-08-06 06:06:01 -07:00
Mark Backman
12e168e740 Wider package support for python-dotenv dev dep 2025-08-06 09:04:01 -04:00
Mark Backman
ac354f66ed Merge pull request #2371 from pipecat-ai/mb/docs-gen-with-uv
Update docs auto-generation to use uv
2025-08-06 06:02:52 -07:00
Mark Backman
eead793927 Merge pull request #2370 from pipecat-ai/mb/update-workflows-for-uv
Update workflows for uv
2025-08-06 05:54:55 -07:00
Fabrice Lamant
0594a203fc Add new region parameter to Gladia 2025-08-06 14:28:06 +02:00
Mark Backman
2337a2d92d Remove dev-requirements.txt and mentions of it 2025-08-05 21:46:50 -04:00
Mark Backman
b3e2603553 Update workflows for uv 2025-08-05 21:45:48 -04:00
Mark Backman
29229df719 Speed up builds, mocking large packages 2025-08-05 21:34:40 -04:00
Aleix Conchillo Flaqué
61f4dd2ff2 scripts(evals): fix 14e-function-calling-google 2025-08-05 17:44:45 -07:00
Mark Backman
42094fb206 Update docs auto-generation to use uv 2025-08-05 20:37:27 -04:00
Aleix Conchillo Flaqué
58c41f112a DailyRunnerArguments: make body optional (fix) 2025-08-05 16:59:36 -07:00
Aleix Conchillo Flaqué
fa55e2ca9b Merge pull request #2369 from pipecat-ai/aleix/pipeline-task-cancellation-fix
PipelineTask: always try to cancel things
2025-08-05 16:56:23 -07:00
Aleix Conchillo Flaqué
313fdc92a1 DailyRunnerArguments: make body optional 2025-08-05 16:39:18 -07:00
Aleix Conchillo Flaqué
d22d2da03d PipelineTask: always try to cancel things
In a previous commit we only cleanup things if the user run
`task.cancel()`. However, if the task finishes cleanly we were not cancelling
anything.
2025-08-05 16:24:59 -07:00
Aleix Conchillo Flaqué
de2ae9a2ec Merge pull request #2368 from pipecat-ai/aleix/release-evals-runner-args-fix
pass runner arguments to release evals
2025-08-05 16:23:32 -07:00
Aleix Conchillo Flaqué
52a6d8013c scripts(evals): pass runner arguments to run_bot() 2025-08-05 16:13:32 -07:00
Aleix Conchillo Flaqué
f14cbae9b5 DailyRunnerArguments: make token optional
DailyTransport can get a None token value.
2025-08-05 15:46:12 -07:00
Aleix Conchillo Flaqué
8fe906438a Merge pull request #2358 from pipecat-ai/aleix/system-frames-queued
system frames are now queued
2025-08-05 15:09:52 -07:00
Mark Backman
d8f4db8827 Merge pull request #2367 from richtermb/richtermb/fix-errorframe-docstring
Rename 'source' parameter to 'processor' in ErrorFrame class document…
2025-08-05 15:09:18 -07:00
Aleix Conchillo Flaqué
a5ea6e1642 FrameProcessor: system frames are now queued
System frames are now queued. Before, system frames could be generated from any
task and would not guarantee any order which was causing undesired
behavior. Also, it was possible to get into some rare recursion issues because
of the way system frames were executed (they were executed in-place, meaning
calling `push_frame()` would finish after the system frame traversed all the
pipeline). This makes system frames more deterministic.
2025-08-05 15:05:50 -07:00
richtermb
e777e78510 Rename 'source' parameter to 'processor' in ErrorFrame class documentation for clarity. 2025-08-05 15:02:00 -07:00
Aleix Conchillo Flaqué
49a5a1e375 PipelineTask: improve task cancellation 2025-08-05 14:49:23 -07:00
Aleix Conchillo Flaqué
61cb45d61b PipelineTask: also wait on CancelFrame
Before CancelFrames didn't need to be waited for because system frames were
processed in-place and therefore calling push_frame() would finalize after it
traversed all the pipeline. Now, system frames are queued so we need to wait
until CancelFrame reaches the end of the pipeline.
2025-08-05 14:49:23 -07:00
Aleix Conchillo Flaqué
6c6deb4e85 Merge pull request #2366 from pipecat-ai/aleix/run-bot-runner-arguments
add sigint/sigterm to RunnerArguments
2025-08-05 14:46:19 -07:00
Aleix Conchillo Flaqué
66ad29b2b1 example: pass RunnerArguments to run_bot()
This lets us get handle_sigint from RunnerArguments which knows where the
application is running and if SIGINT/SIGTERM should be handled or not.
2025-08-05 14:38:55 -07:00
Aleix Conchillo Flaqué
21e4f0d56d PipelineRunner: argument ordering 2025-08-05 14:38:55 -07:00
Aleix Conchillo Flaqué
627b44bac2 runner: use new RunnerArguments handle_sigint/handle_sigterm
This allow us to control applications behavior from the runner arguments, which
depen on the environment they run.
2025-08-05 14:38:55 -07:00
Aleix Conchillo Flaqué
e2a576beca RunnerArguments: add handle_sigint/handle_sigterm 2025-08-05 14:32:28 -07:00
Mark Backman
2981afb117 Merge pull request #2361 from pipecat-ai/mb/fix-changelog-simli
Fix Simli changelog entry placement
2025-08-05 14:12:38 -07:00
Mark Backman
d422c57b52 Merge pull request #2304 from pipecat-ai/mb/cartesia-cjk-lang-support
CartesiaTTSService: Add CJK lang support for word timestamps
2025-08-05 14:08:53 -07:00
Mark Backman
06d8bbd154 Fix Simli changelog entry placement 2025-08-05 17:07:58 -04:00
Mark Backman
35108afeb8 Merge pull request #2360 from pipecat-ai/mb/add-heygen-readme
Add HeyGen to the README page
2025-08-05 14:05:33 -07:00
Mark Backman
a0e2a2754a Merge pull request #2327 from richtermb/richtermb/push-more-error-frames
Add source parameter to ErrorFrame and set it in FrameProcessor. Upda…
2025-08-05 14:04:52 -07:00
Mark Backman
b8d620c8bb Merge pull request #2362 from pipecat-ai/mb/aws-stt-languages
AWSTranscribeSTTService add support for new languages
2025-08-05 14:00:50 -07:00
Mark Backman
f26bbe4092 Merge pull request #2363 from pipecat-ai/mb/update-14p
Update 14p, add 14p to evals, add Google creds to env.example
2025-08-05 14:00:13 -07:00
Mark Backman
52cb23f8d5 Merge pull request #2364 from pipecat-ai/mb/11labs-default-model
ElevenLabs TTS services: revert to Turbo v2.5 as default model
2025-08-05 13:59:59 -07:00
Filipi da Silva Fuchter
17e7f8a2cd Merge pull request #2352 from pipecat-ai/filipi/webrtc_audio_frame
Implementing if the bot it is speaking or not based on the SpeechOutputAudioRawFrame
2025-08-05 17:26:44 -03:00
richtermb
efddc4732c Refactor ErrorFrame: rename source field to processor for clarity and update related references in FrameProcessor. 2025-08-05 13:25:08 -07:00
richtermb
4476a76ad7 Merge branch 'main' into richtermb/push-more-error-frames 2025-08-05 13:23:24 -07:00
Filipi Fuchter
64592b274b Fixed an issue where BotStartedSpeakingFrame and BotStoppedSpeakingFrame
were not emitted when using `TavusVideoService` or `HeyGenVideoService`.
2025-08-05 17:11:34 -03:00
Aleix Conchillo Flaqué
95c661bdaa Merge pull request #2365 from pipecat-ai/aleix/update-release-evals-for-new-runner
scripts(evals): update to use new runner function
2025-08-05 13:07:57 -07:00
Aleix Conchillo Flaqué
5546c8e01c scripts(evals): update to use new runner function 2025-08-05 11:46:28 -07:00
Mark Backman
14e02c1b08 ElevenLabs TTS services: revert to Turbo v2.5 as default model 2025-08-05 13:44:37 -04:00
Mark Backman
ba5a5c7187 Update 14p, add 14p to evals, add Google creds to env.example 2025-08-05 13:30:36 -04:00
Mark Backman
2378cba155 AWSTranscribeSTTService add support for new languages 2025-08-05 13:01:06 -04:00
Mark Backman
1138c92a00 Merge pull request #2217 from simliai/main
feat: Add Simli Trinity models support to pipecat
2025-08-05 09:01:20 -07:00
Antonyesk601
fb82dc8308 Update CHANGELOG.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-08-05 17:46:01 +02:00
Mark Backman
c8a15f30fa Add HeyGen to the README page 2025-08-05 10:54:49 -04:00
antonyesk601
72168070f1 update changelog 2025-08-05 14:18:41 +00:00
Mark Backman
50083d1144 Merge pull request #2342 from pipecat-ai/mb/runner-connect-request-body
Development runner handles body information in the RTVI connect request
2025-08-05 05:15:55 -07:00
Mark Backman
64732518c6 Development runner handles body information in the RTVI connect request 2025-08-05 07:26:34 -04:00
Mark Backman
c3d8ea210f CartesiaTTSService: Add CJK lang support for word timestamps 2025-08-05 07:17:40 -04:00
Filipi da Silva Fuchter
98ed614f63 Merge pull request #2357 from pipecat-ai/filipi/latency_observer
Added detailed latency logging to UserBotLatencyLogObserver.
2025-08-05 08:11:48 -03:00
Filipi Fuchter
e43bdff31e Added detailed latency logging to UserBotLatencyLogObserver. 2025-08-04 19:36:30 -03:00
Mark Backman
42e48381fe Merge pull request #2355 from pipecat-ai/mb/update-readme-for-uv
Update the README with uv-centric steps
2025-08-04 15:28:07 -07:00
Mark Backman
df7ba64b4a Merge pull request #2354 from pipecat-ai/mb/revert-43-inline-script
Remove inline script from foundational 43a
2025-08-04 15:27:28 -07:00
Mark Backman
ac9b2e67a7 Merge pull request #2349 from pipecat-ai/mb/runner-support-daily-url-arg
daily runner util: remove arg parsing, add auto room, token generation
2025-08-04 13:44:25 -07:00
Mark Backman
c9918607cf Merge pull request #2335 from pipecat-ai/mb/quickstart-runner-improvements
Improve quickstart logging, runner startup message
2025-08-04 13:43:42 -07:00
Mark Backman
cfda410a43 Remove foundational requirements.txt file 2025-08-04 16:38:37 -04:00
Mark Backman
c773ddf83d Update foundational examples README 2025-08-04 16:26:11 -04:00
Mark Backman
54d5ebbc20 Update the README with uv-centric steps 2025-08-04 16:11:38 -04:00
Mark Backman
35002cd727 Remove inline script from foundational 43a 2025-08-04 15:46:18 -04:00
Mark Backman
53d75faa47 Merge pull request #2330 from pipecat-ai/mb/runner-clean-proxy-name
Runner: strip protocol from proxy address
2025-08-04 10:42:16 -07:00
Mark Backman
2901dddc2b Merge pull request #2338 from pipecat-ai/mb/update-release-evals-tavus
Add Tavus, HeyGen, Simli to release-evals
2025-08-04 10:38:27 -07:00
Mark Backman
3a8d809837 Runner: strip protocol from proxy address 2025-08-04 13:38:02 -04:00
Mark Backman
1b3c2bee30 Merge pull request #2331 from pipecat-ai/mb/more-foundational
Updating more foundational examples
2025-08-04 10:37:15 -07:00
Mark Backman
69f049cb63 Merge pull request #2328 from pipecat-ai/mb/04b-example-cleanup
Align 04b livekit example with other foundational examples
2025-08-04 10:36:57 -07:00
Vanessa Pyne
96b1000e52 Merge pull request #2341 from getchannel/realtime-text
Hotfix: Correct Gemini Live API class to fix 1007 payload error.
2025-08-04 11:03:57 -05:00
Filipi da Silva Fuchter
0184a8c231 Merge pull request #2351 from pipecat-ai/filipi/tavus_transport_ready
Changed `TavusVideoService` to send audio or video frames only after the   transport is ready, preventing warning messages at startup.
2025-08-04 11:48:27 -03:00
Filipi Fuchter
c22866ed58 Mentioning the TavusVideoService fix in the changelog. 2025-08-04 11:46:24 -03:00
Filipi Fuchter
0e533d21be Only send audio and video from the tavus video service if the transport is ready. 2025-08-04 10:52:30 -03:00
Mark Backman
6f6f4c3dea Merge pull request #2348 from sam-s10s/speechmatics-stt
Fix for Speechmatics STT
2025-08-04 06:15:39 -07:00
Mark Backman
f609971637 daily runner util: remove arg parsing, add auto room, token generation 2025-08-03 21:50:44 -04:00
Mark Backman
54ff10ae86 Merge pull request #2332 from hankehly/fix-piper-tts-json-payload
Fix PiperTTSService to send TTS input as JSON object
2025-08-03 17:39:04 -07:00
hankehly
77057eb829 Fix ruff formatting 2025-08-04 08:13:16 +09:00
Mark Backman
2b1a7b840d Merge pull request #2346 from adenta/heygen-testing
me trying to get heygen working
2025-08-03 14:11:14 -07:00
Sam Sykes
e07db88bc0 Updated changelog. 2025-08-03 22:11:10 +01:00
Sam Sykes
c2282b0e73 Set user_id to "" (not None) for RTVIProcessor. 2025-08-03 22:08:22 +01:00
Andrew Denta
593bf09d8d update script 2025-08-03 17:01:27 -04:00
Sam Sykes
534ed77ebf Merge branch 'main' into speechmatics-stt 2025-08-03 21:51:35 +01:00
Andrew Denta
193299988d me trying to get heygen working 2025-08-03 13:42:07 -04:00
Aleix Conchillo Flaqué
d589bcb345 Merge pull request #2344 from pipecat-ai/aleix/daily-python-0.19.6
pyproject: update daily-python to 0.19.6
2025-08-03 10:15:15 -07:00
Aleix Conchillo Flaqué
011ebc2801 Merge pull request #2345 from pipecat-ai/aleix/task-observer-signature-performance
TaskObserver: don't inspect on_push_frame signature for every frame
2025-08-03 10:15:00 -07:00
Aleix Conchillo Flaqué
3a72e94d0c LLMService: only do handle function inspection once 2025-08-03 10:09:19 -07:00
Aleix Conchillo Flaqué
d6d39fc873 TaskObserver: don't inspect on_push_frame signature for every frame 2025-08-03 10:09:17 -07:00
Pete
258e83c904 Fix: Correct Gemini Live API text input to prevent 1007 WebSocket errors
- Restore TextInputMessage.realtimeInput structure for correct API format
- Remove invalid turnComplete message from _send_user_text method
- turnComplete is only valid for clientContent, not realtimeInput messages
- realtimeInput text completion is automatically inferred by the API

This fixes WebSocket 1007 errors caused by mixing realtimeInput and
clientContent message types in violation of the Gemini Live API contract.
2025-08-03 10:58:59 -04:00
Mark Backman
061f2086b2 Merge pull request #2343 from pipecat-ai/mb/update-pre-commit-ruff-version
Update pre-commit-config ruff version
2025-08-03 03:38:54 -07:00
Aleix Conchillo Flaqué
a1f3f51168 pyproject: update daily-python to 0.19.6 2025-08-02 20:02:22 -07:00
hankehly
2177a2b805 Remove trailing space 2025-08-03 10:34:27 +09:00
hankehly
68164415ce Format changelog entry 2025-08-03 10:26:37 +09:00
hankehly
7646599b66 Add changelog entry 2025-08-03 10:23:58 +09:00
Mark Backman
e467eaf130 Merge pull request #2334 from Designedforusers/fix/tavus-transport-daily-callbacks
fix: Add missing transcription callbacks to TavusTransport for 0.0.77 compatibility
2025-08-02 16:09:57 -07:00
Designedforusers
9d6d53629e style: Apply ruff formatting to fix line length
Fixed a line length issue in tavus.py where the on_transcription_stopped callback was exceeding the maximum line length. Split the partial() call across multiple lines for better readability and compliance with project style guidelines.
2025-08-02 18:27:09 -04:00
Mark Backman
89596cfec4 Update pre-commit-config ruff version 2025-08-02 18:06:06 -04:00
Designedforusers
5e338ecaf1 refactor: Remove redundant transcription callback methods
As suggested in PR review, removed the _on_transcription_stopped and
_on_transcription_error method definitions. Now using the consistent
partial(self._on_handle_callback, ...) pattern for these callbacks,
matching how all other callbacks are handled.

This simplifies the code while maintaining the same functionality.
2025-08-02 15:02:54 -04:00
Designedforusers
62319021f8 docs: Add changelog entry for TavusVideoService fix
Added changelog entry as requested by maintainers for the fix addressing
missing transcription callbacks in TavusVideoService.
2025-08-02 14:53:44 -04:00
Pete
cccd82a617 Refactor TextInputMessage class to replace realtimeInput with a text attribute.
This was sending a 1007 because it was wrapping RealtimeInput in the json.

- Updated the `TextInputMessage` class to directly store text input as a string.
- Modified the `from_text` class method to create an instance using the new `text` attribute.
2025-08-02 14:34:00 -04:00
Mark Backman
f552ba1f5e Merge pull request #2336 from pipecat-ai/mb/suppress-pydub-warning
Suppress pydub (cartesia dependency) SyntaxWarning
2025-08-02 10:01:05 -07:00
Mark Backman
b9a2a9b729 Add Tavus, HeyGen, Simli to release-evals 2025-08-02 09:35:06 -04:00
Mark Backman
e43b3869c3 Suppress pydub SyntaxWarning from the cartesia module 2025-08-02 08:49:59 -04:00
Mark Backman
55731df999 Improve quickstart logging, runner startup message 2025-08-02 08:40:05 -04:00
Designedforusers
3a7ea25077 fix: Add missing transcription callbacks to TavusTransport for 0.0.77 compatibility
TavusTransport was broken in Pipecat 0.0.77 due to PR #2292 adding required
callbacks (on_transcription_stopped, on_transcription_error) to DailyCallbacks.

This fix adds placeholder implementations of these callbacks to TavusTransportClient,
allowing TavusTransport to initialize properly. These callbacks are not used by
Tavus (which handles avatar video, not transcription) but are required by the
DailyCallbacks validation.

Fixes initialization error:
- 2 validation errors for DailyCallbacks
- on_transcription_stopped: Field required
- on_transcription_error: Field required
2025-08-02 05:56:46 -04:00
Yohan Liyanage
248206e234 Fixes 2277 - SSML reserved characters in LLM generated text causes Azure TTS to fail. 2025-08-02 12:49:29 +05:30
hankehly
694922f627 Fix PiperTTSService to send TTS input as JSON object 2025-08-02 15:29:16 +09:00
Mark Backman
cc9950e72d Updating more foundational examples 2025-08-01 19:58:40 -04:00
richtermb
6814c390ba Update CHANGELOG to reflect the addition of the source field in ErrorFrame for improved error tracking. 2025-08-01 14:47:57 -07:00
Richter Brzeski
c2d05ad23b Merge branch 'pipecat-ai:main' into richtermb/push-more-error-frames 2025-08-01 14:47:08 -07:00
Mark Backman
ee56d8572d Merge pull request #2329 from pipecat-ai/mb/fix-livekit-empty-audio-frames
fix: LiveKitTransport, don't push empty AudioRawFrames
2025-08-01 12:53:05 -07:00
richtermb
91568eeddc Update type hint for source in ErrorFrame to use forward declaration for improved clarity. 2025-08-01 12:52:56 -07:00
richtermb
165d6b4c1d Update CHANGELOG to include new source field in ErrorFrame for error tracking. 2025-08-01 12:25:29 -07:00
Mark Backman
1d8abe3c1c fix: LiveKitTransport, don't push empty AudioRawFrames 2025-08-01 14:57:53 -04:00
Mark Backman
a6e69d6aad Merge pull request #2325 from pipecat-ai/mb/dependency-groups
Move dev to [dependency-groups], update uv.lock
2025-08-01 11:54:21 -07:00
Mark Backman
519da9cc61 Align 04b livekit example with other foundational examples 2025-08-01 14:28:15 -04:00
richtermb
ead4e97ab5 Add source parameter to ErrorFrame and set it in FrameProcessor. Updated error handling in AnthropicLLMService and DeepgramSTTService to include ErrorFrame with source information. 2025-08-01 11:14:50 -07:00
Mark Backman
0c021378b0 Merge pull request #2326 from pipecat-ai/readme-quickstart-link
Update README.md
2025-08-01 10:45:30 -07:00
Mark Backman
e22c7e8ad5 Update README.md 2025-08-01 13:40:03 -04:00
Mark Backman
b71057bf7c Move dev to [dependency-groups], update uv.lock 2025-08-01 09:43:56 -04:00
Mark Backman
0865f6cd7d Merge pull request #2318 from pipecat-ai/mb/add-asyncio-readme
Add AsyncAI TTS to README vendor list
2025-08-01 06:11:13 -07:00
Mark Backman
610b1ab065 Merge pull request #2319 from pipecat-ai/mb/use-new-runner
Update foundational examples to use new runner
2025-08-01 06:11:03 -07:00
Mark Backman
3a2a226668 Merge pull request #2320 from pipecat-ai/mb/uv-lock-init
Add initial uv.lock file
2025-08-01 06:07:53 -07:00
Mark Backman
8e4b7352fd Merge pull request #2321 from pipecat-ai/mb/dev-requirements
Add dev to optional-dependencies
2025-08-01 06:02:58 -07:00
Mark Backman
637d372fe4 Add dev to optional-dependencies 2025-07-31 23:39:23 -04:00
Mark Backman
ac15fe8ae4 Add workflow to update lockfile with pyproject.toml changes 2025-07-31 23:08:21 -04:00
Mark Backman
07239c0b8b Add initial uv.lock file 2025-07-31 22:46:44 -04:00
Mark Backman
367b2fbe3c Update requirements.txt 2025-07-31 22:12:57 -04:00
Mark Backman
f1b1d5b130 Update foundational examples to use the development runner 2025-07-31 22:11:32 -04:00
Mark Backman
ff45b77fdf Remove examples runner 2025-07-31 21:22:04 -04:00
Mark Backman
e522b7ae96 Add AsyncAI TTS to README vendor list 2025-07-31 19:33:37 -04:00
Mark Backman
b8eef4f93b Merge pull request #2314 from pipecat-ai/mb/sync-quickstart-example
Add workflow to sync quickstart to pipecat-quickstart repo
2025-07-31 15:34:26 -07:00
Mark Backman
dcc205996a Merge pull request #2317 from pipecat-ai/mb/release-prep-0.0.77
Changelog update for 0.0.77
2025-07-31 15:34:02 -07:00
Mark Backman
9f61af4d1b Changelog update for 0.0.77 2025-07-31 18:19:05 -04:00
Sam Sykes
e8faf28e6a Doc fix for incorrect argument name. 2025-07-31 22:30:54 +01:00
Filipi da Silva Fuchter
40d53b3d84 Merge pull request #2316 from sam-s10s/speechmatics-stt
Updated to SpeechmaticsSTTService
2025-07-31 18:28:16 -03:00
Sam Sykes
7c223a86c2 Fix to missing deprecated attribute enable_speaker_diarization. 2025-07-31 22:25:46 +01:00
Sam Sykes
2d3f61aa07 Updated Speechmatics Plugin (#2225)
Changes
Split out module attributes to make engine settings clearer
Removed internal audio buffer to use latest Speechmatics python SDK (0.4.0)
Use diarization for improved VAD in multi-speaker situations
Support custom dictionary / vocabulary with attributes
Deprecated attributes superseded by re-organised attributes

Diarization Enhancements
Focus on specific speakers (using speaker labels)
Ignore specific speakers (using speaker labels)
Separate transcription formats for active and inactive speakers
Support for known speakers
2025-07-31 17:51:38 -03:00
Mark Backman
e05a47744d Merge pull request #2311 from pipecat-ai/mb/quickstart-fixups
Set quickstart min pipecat-ai version to 0.0.77, remove non-quickstart examples
2025-07-31 13:42:10 -07:00
Aleix Conchillo Flaqué
6ffaab2b93 CHANGELOG cosmetics 2025-07-31 13:39:37 -07:00
Aleix Conchillo Flaqué
c2d8844903 Merge pull request #2312 from pipecat-ai/aleix/srhinos/main
Enable Interruption Support for LLMUserResponseAggregator
2025-07-31 13:30:57 -07:00
Mark Backman
e8caba7723 Add workflow to sync quickstart to pipecat-quickstart repo 2025-07-31 15:56:18 -04:00
Mark Backman
df96ef7d37 Remove non-quickstart demos 2025-07-31 15:38:54 -04:00
Aleix Conchillo Flaqué
7553f670af fix formatting and update CHANGELOG 2025-07-31 10:41:11 -07:00
Mark Backman
6960f5861b Set example min pipecat-ai version to 0.0.77 due to runner requirement 2025-07-31 12:21:58 -04:00
Mark Backman
b5edbbc0ca Merge pull request #2309 from pipecat-ai/mb/remove-runner-examples
Remove examples/runner-examples
2025-07-31 09:18:22 -07:00
Aleix Conchillo Flaqué
e78d9c2c95 Merge pull request #2293 from azain47/azain47/fix-piper-tts-service
Fix Piper TTS Service
2025-07-31 08:32:17 -07:00
Vanessa Pyne
b25547a98b Merge pull request #2305 from pipecat-ai/vp-changelog-text-input
update changelog
2025-07-31 10:16:47 -05:00
Mark Backman
e80281c3c4 Remove examples/runner-examples 2025-07-31 10:59:06 -04:00
Mark Backman
d692843e5b Merge pull request #2308 from pipecat-ai/mb/change-neuphonic-url
NeuphonicTTSService: change the default url value to the global endpoint
2025-07-31 07:38:57 -07:00
Mark Backman
eaad3c5d55 NeuphonicTTSService: change the default url value to the global endpoint 2025-07-31 10:24:54 -04:00
vipyne
f2a1c66379 update changelog 2025-07-31 08:55:25 -05:00
Vanessa Pyne
af8de227bb Merge pull request #2223 from getchannel/realtime-text
Add text input handling to unify context for realtimeInput stream of GeminiMultimodalLiveService
2025-07-31 08:53:39 -05:00
Mark Backman
7cd78dd286 Merge pull request #2303 from pipecat-ai/mb/add-new-quickstart-demos
Add quickstart demos
2025-07-31 05:56:00 -07:00
Mark Backman
226b516948 Add quickstart demos 2025-07-30 22:14:10 -04:00
Mark Backman
aa85fffa57 New runner module (#2269)
* Adds pipecat.runner.run - FastAPI-based development server with automatic bot discovery

* Adds new RunnerArguments types for different transports

* Adds new runner utils for creating transports and parsing data

* Adds new Daily and LiveKit utils for setup
2025-07-30 22:02:28 -04:00
srhinos
8b97ab70ff Enable Interruption Support for LLMUserResponseAggregator 2025-07-30 20:48:31 -04:00
Filipi da Silva Fuchter
9013b2929a Merge pull request #2300 from pipecat-ai/filipi/fast_api_race_condition
Fixed a race condition in FastAPIWebsocketClient
2025-07-30 18:10:09 -03:00
Filipi Fuchter
0c6e12a9b0 Fixed a race condition in FastAPIWebsocketClient that occurred when attempting to send a message while the client was disconnecting. 2025-07-30 18:07:40 -03:00
Aleix Conchillo Flaqué
efb24071d5 Merge pull request #2301 from pipecat-ai/aleix/daily-python-0.19.5
pyproject: update daily-python to 0.19.5
2025-07-30 14:01:27 -07:00
Filipi da Silva Fuchter
318ebec67e Merge pull request #2298 from pipecat-ai/filipi/google_interruptions
Fixed an issue in GoogleLLMService where interruptions did not work when an interruption strategy was used.
2025-07-30 17:49:07 -03:00
Aleix Conchillo Flaqué
c679227aa8 pyproject: update daily-python to 0.19.5 2025-07-30 13:19:48 -07:00
Filipi Fuchter
392853f5fa Fixed an issue in GoogleLLMService where interruptions did not work when an interruption strategy was used. 2025-07-30 12:10:32 -03:00
Mark Backman
98d27caab3 Merge pull request #2296 from pipecat-ai/mb/switch-rime-voices
Added the ability to switch voices to RimeTTSService
2025-07-30 07:55:52 -07:00
Mark Backman
0fa51968bf Added the ability to switch voices to RimeTTSService 2025-07-30 10:53:14 -04:00
Mark Backman
92aee2634b Merge pull request #2291 from pipecat-ai/mb/remove-on-client-closed 2025-07-30 06:36:32 -07:00
Filipi da Silva Fuchter
bff6a93f31 Merge pull request #2150 from pipecat-ai/filipi/hey_gen
HeyGen implementation for Pipecat - HeyGenVideoService
2025-07-30 09:10:07 -03:00
Filipi Fuchter
6e921cdf45 HeyGen implementation for Pipecat - HeyGenVideoService 2025-07-30 09:07:15 -03:00
Azain.
1e2b066cf3 Fix tts.py
Update Piper TTS Service to work with the newer Piper GPL Version, that uses JSON as its payload.
2025-07-30 13:41:27 +05:30
Pete
2af3b6329d Ruff format debug 2025-07-29 17:48:11 -04:00
Pete
8ca06e5887 Add InputTextRawFrame class for handling raw text input in frames
- Introduced `InputTextRawFrame` to represent raw text input from users or programs.
- Updated `GeminiMultimodalLiveLLMService` to process `InputTextRawFrame` and send user text via the Gemini Live API's realtime input stream.
- Enhanced `_send_user_text` method documentation for clarity on its functionality and usage.
2025-07-29 17:43:14 -04:00
Mark Backman
c145a9ef13 Merge pull request #2288 from pipecat-ai/mb/stt-mute-examples
Update placement of STTMuteFilter in examples to reflect the new reco…
2025-07-29 12:10:36 -07:00
Mark Backman
b523f9a4c6 Merge pull request #2248 from ashotbagh/feat/async-tts
feat(tts): integrate Async TTS engine into pipecat
2025-07-29 12:10:11 -07:00
Mark Backman
7f184422d0 Merge branch 'main' into feat/async-tts 2025-07-29 12:06:56 -07:00
Aleix Conchillo Flaqué
fa4c3ec6bf Merge pull request #2287 from pipecat-ai/aleix/asyncio-trace-logging
utils(asyncio): use trace logging for some cancelling messages
2025-07-29 11:56:25 -07:00
Aleix Conchillo Flaqué
9fafc10844 Merge pull request #2292 from richtermb/transcription-error-callback
Add on_transcription_error callback to DailyCallbacks and handle tran…
2025-07-29 11:56:02 -07:00
richtermb
67107d02ed Refactor callback invocation for on_transcription_stopped in DailyTransportClient for improved readability 2025-07-29 11:53:41 -07:00
richtermb
c1df19982c Add on_transcription_stopped callback to DailyTransport for handling transcription stop events 2025-07-29 11:50:16 -07:00
richtermb
444b1b5b02 Add on_transcription_stopped callback to DailyCallbacks and implement handling in DailyTransport for transcription stop events 2025-07-29 11:49:28 -07:00
Mark Backman
ebfa4f2d5e Push the STTMuteFrame upstream and downstream 2025-07-29 14:37:36 -04:00
Mark Backman
e961c438e7 Update placement of STTMuteFilter in examples to reflect the new recommendation 2025-07-29 14:36:39 -04:00
richtermb
d3d36a89e2 Add _on_transcription_error method to DailyTransport for handling transcription error events 2025-07-29 10:48:50 -07:00
richtermb
fa6e5ce4a7 Add on_transcription_error callback to DailyCallbacks and handle transcription errors in DailyTransportClient 2025-07-29 10:43:18 -07:00
Mark Backman
3ffb261864 Remove use of on_client_closed event in foundational examples 2025-07-29 13:28:33 -04:00
Mark Backman
f69a02b7a7 Merge pull request #2290 from pipecat-ai/mb/remove-examples
Removed most pipecat examples, relocating to pipecat-examples repo
2025-07-29 10:02:24 -07:00
Mark Backman
f1f4aed398 Remove random Dockerfile, update README links 2025-07-29 11:41:27 -04:00
Mark Backman
414c245c92 Remove android.yaml github workflow 2025-07-29 11:34:48 -04:00
Mark Backman
3f57d94c0b Update examples README 2025-07-29 11:28:40 -04:00
Mark Backman
15e3c69ddc Removed most pipecat examples, relocating to pipecat-examples repo 2025-07-29 11:17:01 -04:00
Ashot
39b00f5269 chore: address review comments 2025-07-29 18:20:50 +04:00
Mark Backman
4c368c78c6 Merge pull request #2289 from tomoima525/tomoima525/transcription-bucket-params
Add transcription_bucket param for rest helper
2025-07-29 05:05:08 -07:00
Tomoaki Imai
6eb00a99cb update Changelog for transcription_bucket params addition 2025-07-29 20:38:12 +09:00
Tomoaki Imai
3ae8cf1916 Add transcription_bucket param for rest helper 2025-07-29 18:58:38 +09:00
Aleix Conchillo Flaqué
03e87469df utils(asyncio): use trace logging for some cancelling messages 2025-07-28 17:43:41 -07:00
Mark Backman
70255d3c81 Merge pull request #2274 from pipecat-ai/mb/remove-message-name
Remove message["name"] addition when pushing
2025-07-28 17:29:23 -07:00
Mark Backman
96a72d0647 Remove message[name] addition when pushing 2025-07-28 20:13:50 -04:00
Mark Backman
27d4910694 Merge pull request #2286 from pipecat-ai/mb/fix-transcript-processor-newline
fix: Improve TranscriptProcessor detection for transcript type
2025-07-28 17:07:43 -07:00
Mark Backman
50242f4ad8 fix: Improve TranscriptProcessor detection for transcript type 2025-07-28 19:56:36 -04:00
Mark Backman
c9dda5251c Merge pull request #2284 from pipecat-ai/mb/yank-74-75
Add yanked notices to 0.0.74 and 0.0.75 changelogs
2025-07-28 12:55:47 -07:00
Mark Backman
419cc9ac68 Add yanked notices to 0.0.74 and 0.0.75 changelogs 2025-07-28 13:36:02 -04:00
Ashot
83b4747196 chore: address review comments 2025-07-28 17:52:17 +04:00
Ashot
a13b954415 formatting/cleanup: address Copilot PR review comments 2025-07-28 17:43:17 +04:00
Ashot
f2e9562f1b feat(tts): integrate Async TTS engine into pipecat 2025-07-28 17:42:57 +04:00
Mark Backman
afed9a61f2 Merge pull request #2268 from pipecat-ai/mb/inworld-changelog
Add changelog entry for InworldTTSService
2025-07-28 05:59:57 -07:00
Mark Backman
f0de27b35e Merge pull request #2273 from pipecat-ai/mb/gitignore-plivo-stream
.gitignore: add plivo-chatbot streams.xml, .python-version
2025-07-28 05:59:41 -07:00
Filipi da Silva Fuchter
9d5510ee47 Merge pull request #2265 from pipecat-ai/filipi/small_webrtc_buffer_processor
Fixed an issue in AudioBufferProcessor when using SmallWebRTCTransport
2025-07-28 09:23:58 -03:00
Mark Backman
434c3fc527 Merge pull request #2279 from Allenmylath/patch-26
Update README.md
2025-07-28 04:51:58 -07:00
allenmylath
aba79a9478 Update README.md
Explanatory comment.Eventhought code allows for transport params.It is not clearly given in readme, what all are the options are there.
2025-07-28 11:24:27 +05:30
Mark Backman
fc96e091a9 Update NVIDIA in README 2025-07-26 15:01:52 -04:00
Mark Backman
851a27c082 Add Groq TTS to README 2025-07-26 14:58:07 -04:00
Mark Backman
a72d93dc6d Add Inworld to README 2025-07-26 14:55:19 -04:00
Mark Backman
c971232f20 .gitignore: add plivo-chatbot streams.xml, .python-version 2025-07-26 10:19:50 -04:00
Mark Backman
4b2ba2d69f Merge pull request #2270 from Allenmylath/patch-25
Update README.md
2025-07-26 04:55:59 -07:00
allenmylath
240a698fab Update README.md
recreating examples without activating bedrock models will create errors . Warning and link added
2025-07-26 14:51:19 +05:30
Mark Backman
9aaae01063 Add changelog entry for InworldTTSService 2025-07-25 21:46:02 -04:00
Mark Backman
41c8d22cf3 Merge pull request #2208 from padillamt/mtp/add-inworld-tts
Inworld HTTP TTS Service
2025-07-25 17:13:37 -07:00
padillamt
b68f044ef7 mtpadilla: updated example to reflect parameter placement changes in base Inworld TTS class 2025-07-25 15:13:43 -07:00
padillamt
e140bd6960 mtpadilla: moved model and voice id setting into the class constructor 2025-07-25 14:04:49 -07:00
Filipi Fuchter
e86b55e2b3 Fixed an issue in AudioBufferProcessor when using SmallWebRTCTransport where, if the microphone was muted, track timing was not respected. 2025-07-25 17:01:41 -03:00
padillamt
4a9bec5b35 mtpadilla: stop metrics at result chunk 2025-07-25 11:14:20 -07:00
padillamt
37361391d9 mtpadilla: removed ability to set base_url via constructor, set internally based on streaming variable 2025-07-25 09:16:56 -07:00
Filipi da Silva Fuchter
4b3726eba4 Merge pull request #2260 from pipecat-ai/filipi/audio_resampler
Fixed an issue in `AudioBufferProcessor` that caused garbled audio
2025-07-25 09:27:42 -03:00
padillamt
8e66794759 mtpadilla: switch to Deepgram ASR for lower latency 2025-07-24 22:22:36 -07:00
padillamt
acc5b9f210 inworld: change to function that stops all processing metrics
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-07-24 22:07:15 -07:00
padillamt
f982ace4c5 inworld: removal of unnecessary setting of ssampling rate since matches default 2025-07-24 21:56:01 -07:00
padillamt
5fb1899aeb inworld: removal of unnecessary default assignment as already handled 2025-07-24 21:42:42 -07:00
padillamt
7483422bd9 inworld: change set_voice uto use self._settings
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-07-24 21:23:03 -07:00
padillamt
16c20f3a99 inworld: removal of unnecessary default assignment since already done
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-07-24 21:15:34 -07:00
padillamt
d248c102c8 inworld: removal of unnecessary default assignment since already done
Co-authored-by: Mark Backman <m.backman@gmail.com>
2025-07-24 21:15:20 -07:00
padillamt
662550cc5e mtpadilla: remove unused imports 2025-07-24 21:05:22 -07:00
padillamt
067f64389b mtpadilla: no longer needed so making empty 2025-07-24 20:44:27 -07:00
padillamt
81048ce43a mtpadilla: rename 07aa-interruptible-inworld-http.py to 07ab-interruptible-inworld-http.py 2025-07-24 20:42:29 -07:00
padillamt
f6440ee6e1 mtpadilla: correct Examples header in comments 2025-07-24 13:36:40 -07:00
padillamt
da8c67114a mtpadilla: make streaming the default for example 2025-07-24 13:35:29 -07:00
Mark Backman
d8ea1311ff Merge pull request #2254 from pipecat-ai/mb/11labs-create-context-id
ElevenLabsTTSService: Only reset the context_id when interruptions ar…
2025-07-24 13:32:37 -07:00
Mark Backman
75c2ffc0b5 Check if audio context is already available, create one if not 2025-07-24 13:57:46 -04:00
Mark Backman
2297eb217e ElevenLabsTTSService: Only reset the context_id when interruptions are enabled 2025-07-24 13:53:44 -04:00
Filipi Fuchter
970b8044a0 Fixed an issue in AudioBufferProcessor that caused garbled audio when enable_turn_audio was enabled and audio resampling was required. 2025-07-24 13:25:48 -03:00
padillamt
b6367965cb mtpadilla: consolidate streaming and non-streaming options into a single class with common API, with boolean switch variable added (streaming) 2025-07-23 16:50:32 -07:00
padillamt
147bf9cfe8 mtpadilla: addition of non-streaming option with own dedicated class, and related additional non-streaming test option 2025-07-23 15:28:43 -07:00
padillamt
a5d353030e mtpadilla: small formatting fix to comments 2025-07-23 12:02:58 -07:00
padillamt
f29024bcc0 mtpadilla: update coments regarding temperature parameter 2025-07-23 11:47:26 -07:00
antonyesk601
1cbf7ae480 fix: remove unused variable; fix: remove redundant logic 2025-07-23 08:26:44 +00:00
padillamt
1915407ff7 inworld: removed unreferenced is_first_chunk variable 2025-07-21 15:30:48 -07:00
padillamt
076a675a75 inworld: Fix...Set sample_rate=None in InworldHttpTTSService to match Cartesia pattern 2025-07-21 13:50:36 -07:00
padillamt
0d5292c4ef inworld: typo fix in voice name 2025-07-21 13:48:13 -07:00
padillamt
4853d5d55c inworld: updated InworldHttpTTSService initialization 2025-07-21 13:27:25 -07:00
padillamt
8eda2435a2 inworld: removed explicit references to language since our models currently infer that from the text. 2025-07-21 13:24:10 -07:00
padillamt
54ff946976 inworld: largely adjustments for docstring compatibility 2025-07-21 12:07:58 -07:00
padillamt
aadd088b50 inworld: commented out contents as per Pipecat guidance that this pattern is being retired 2025-07-21 10:52:55 -07:00
padillamt
4250aa6616 inworld: removal of backup copy, no longer needed 2025-07-21 10:11:50 -07:00
padillamt
e3711f96a3 inworld: added detailed comments 2025-07-20 17:06:35 -07:00
Pete
8f74b97591 Refactor _send_user_text method in Gemini multimodal service to streamline event creation for turn completion 2025-07-20 18:08:45 -04:00
Pete
1d69cd1a5e Remove debug logging from _send_user_text method in Gemini multimodal service 2025-07-20 18:04:57 -04:00
Pete
bd7a0f27cc Add text input handling to Gemini multimodal service
- Updated `RealtimeInput` to include an optional `text` parameter.
- Introduced `TextInputMessage` class for encapsulating text input data.
- Implemented `_send_user_text` method to send text input to the Gemini Live API.
- Enhanced message processing to support text input alongside media chunks.
2025-07-20 17:39:31 -04:00
padillamt
5d8c184d99 inworld: commit of original text file and changes that copy openai's with Inworld TTS as only change 2025-07-18 16:30:03 -07:00
padillamt
1bc442e329 inworld: docstring fix 2025-07-18 15:13:19 -07:00
antonyesk601
688031efd6 fix: use undeclared variable _preinitialized. fix: double send of start frame 2025-07-18 08:23:04 +00:00
antonyesk601
0f9e69d3c7 feat: Add Simli Trinity models support to pipecat 2025-07-17 11:55:40 +00:00
padillamt
f3984aec33 inworld: added (empty) requirements for Inworld to be explicit reg dependencies 2025-07-16 13:21:32 -07:00
padillamt
2b76823b01 inworld: added comments to track a few things to confirm 2025-07-15 18:17:30 -07:00
padillamt
ca936bd569 inworld: added Inworld to list of needed credentials 2025-07-15 18:11:50 -07:00
padillamt
c67b779b91 inworld: first commit of Inworld example file for TTS 2025-07-15 17:21:16 -07:00
padillamt
913dba3b74 inworld: class name change 2025-07-15 17:15:57 -07:00
padillamt
384838147a inworld: removed unnecessary code from stop() and cancel() 2025-07-15 16:56:18 -07:00
padillamt
7861b911c0 inworld: first commit of __init__ and tts.py files 2025-07-15 16:50:50 -07:00
1064 changed files with 18693 additions and 105689 deletions

View File

@@ -1,60 +0,0 @@
name: android
on:
push:
branches:
- main
paths:
- "examples/simple-chatbot/client/android/**"
- "examples/p2p-webrtc/video-transform/client/android/**"
pull_request:
branches:
- "**"
paths:
- "examples/simple-chatbot/client/android/**"
- "examples/p2p-webrtc/video-transform/client/android/**"
workflow_dispatch:
inputs:
sdk_git_ref:
type: string
description: "Which git ref of the app to build"
concurrency:
group: build-android-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
sdk:
name: "Demo apps"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.sdk_git_ref || github.ref }}
- name: "Install Java"
uses: actions/setup-java@v4
with:
distribution: 'temurin'
java-version: '17'
- name: "Example app: Simple Chatbot"
working-directory: examples/simple-chatbot/client/android
run: ./gradlew :simple-chatbot-client:assembleDebug
- name: Upload Simple Chatbot APK
uses: actions/upload-artifact@v4
with:
name: Simple Chatbot Android Client
path: examples/simple-chatbot/client/android/simple-chatbot-client/build/outputs/apk/debug/simple-chatbot-client-debug.apk
- name: "Example app: Small WebRTC Client"
working-directory: examples/p2p-webrtc/video-transform/client/android
run: ./gradlew :small-webrtc-client:assembleDebug
- name: Upload Small WebRTC APK
uses: actions/upload-artifact@v4
with:
name: Small WebRTC Android Client
path: examples/p2p-webrtc/video-transform/client/android/small-webrtc-client/build/outputs/apk/debug/small-webrtc-client-debug.apk

View File

@@ -21,24 +21,20 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
version: "latest"
- name: Set up Python
run: uv python install 3.10
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
run: |
source .venv/bin/activate
python -m build
- name: Install project and other Python dependencies
run: |
source .venv/bin/activate
pip install --editable .
run: uv build
- name: Install project in editable mode
run: uv pip install --editable .

View File

@@ -18,35 +18,28 @@ jobs:
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing dev-requirements.txt and test-requirements.txt which
# contain all dependencies needed to run the tests.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('test-requirements.txt') }}
path: .venv
run: uv python install 3.10
- name: Install system packages
id: install_system_packages
run: |
sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
- name: Install dependencies
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt -r test-requirements.txt
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
- name: Run tests with coverage
run: |
source .venv/bin/activate
coverage run
coverage xml
uv run coverage run
uv run coverage xml
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v5
with:

View File

@@ -22,25 +22,22 @@ jobs:
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: "3.10"
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install development Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
version: "latest"
- name: Set up Python
run: uv python install 3.10
- name: Install development dependencies
run: uv sync --group dev
- name: Ruff formatter
id: ruff-format
run: |
source .venv/bin/activate
ruff format --diff
run: uv run ruff format --diff
- name: Ruff linter (all rules)
id: ruff-check
run: |
source .venv/bin/activate
ruff check
run: uv run ruff check

View File

@@ -17,23 +17,17 @@ jobs:
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
version: "latest"
- name: Set up Python
run: uv python install 3.10
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
run: |
source .venv/bin/activate
python -m build
run: uv build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:

View File

@@ -12,23 +12,16 @@ jobs:
with:
fetch-tags: true
fetch-depth: 100
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: '3.10'
- name: Setup virtual environment
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
version: "latest"
- name: Set up Python
run: uv python install 3.10
- name: Install development dependencies
run: uv sync --group dev
- name: Build project
run: |
source .venv/bin/activate
python -m build
run: uv build
- name: Upload wheels
uses: actions/upload-artifact@v4
with:
@@ -38,7 +31,7 @@ jobs:
publish-to-test-pypi:
name: "Publish to Test PyPI"
runs-on: ubuntu-latest
needs: [ build ]
needs: [build]
environment:
name: testpypi
url: https://pypi.org/p/pipecat-ai

View File

@@ -0,0 +1,61 @@
name: Python Compatibility Test
on:
push:
branches: [main, develop]
paths: ['pyproject.toml']
pull_request:
branches: [main, develop]
paths: ['pyproject.toml']
jobs:
test-compatibility:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ['3.10.18', '3.11.13', '3.12.11', '3.13.5']
name: Python ${{ matrix.python-version }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y \
portaudio19-dev \
libcairo2-dev \
libgirepository1.0-dev \
pkg-config
- name: Install uv
uses: astral-sh/setup-uv@v4
with:
version: 'latest'
- name: Set up Python ${{ matrix.python-version }}
run: |
uv python install ${{ matrix.python-version }}
uv python pin ${{ matrix.python-version }}
- name: Test uv sync with all extras (Python < 3.13)
if: "!startsWith(matrix.python-version, '3.13.')"
run: |
uv sync --group dev --all-extras --no-extra krisp
- name: Test uv sync without PyTorch extras (Python 3.13+)
if: startsWith(matrix.python-version, '3.13.')
run: |
uv sync --group dev --all-extras \
--no-extra krisp \
--no-extra ultravox \
--no-extra local-smart-turn \
--no-extra moondream \
--no-extra mlx-whisper
- name: Verify installation
run: |
uv run python --version
uv run python -c "import pipecat; print('✅ Pipecat imports successfully')"

56
.github/workflows/sync-quickstart.yaml vendored Normal file
View File

@@ -0,0 +1,56 @@
name: Sync Quickstart to pipecat-quickstart repo
on:
push:
branches: [main]
paths:
- 'examples/quickstart/**'
workflow_dispatch: # Manual trigger
jobs:
sync-quickstart:
runs-on: ubuntu-latest
steps:
- name: Checkout main repo
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Checkout quickstart repo
uses: actions/checkout@v4
with:
repository: pipecat-ai/pipecat-quickstart
token: ${{ secrets.QUICKSTART_SYNC_TOKEN }}
path: quickstart-repo
- name: Sync files (excluding READMEs)
run: |
# Copy code files only, skip READMEs
cp examples/quickstart/bot.py quickstart-repo/
cp examples/quickstart/requirements.txt quickstart-repo/
cp examples/quickstart/env.example quickstart-repo/
# Copy any other files that aren't README.md
find examples/quickstart -type f \
-not -name "README.md" \
-not -name "*.md" \
-exec cp {} quickstart-repo/ \;
- name: Commit and push changes
run: |
cd quickstart-repo
git config user.name "GitHub Action"
git config user.email "action@github.com"
git add .
# Only commit if there are changes
if ! git diff --staged --quiet; then
git commit -m "Sync from pipecat main repo
Updated files from examples/quickstart/
Commit: ${{ github.sha }}
"
git push
else
echo "No changes to sync"
fi

View File

@@ -22,31 +22,23 @@ jobs:
steps:
- name: Checkout repo
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing dev-requirements.txt and test-requirements.txt which
# contain all dependencies needed to run the tests.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('test-requirements.txt') }}
path: .venv
run: uv python install 3.10
- name: Install system packages
id: install_system_packages
run: |
sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
- name: Install dependencies
run: |
python -m venv .venv
- name: Install basic Python dependencies
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt -r test-requirements.txt
uv sync --group dev --extra anthropic --extra aws --extra google --extra langchain
- name: Test with pytest
run: |
source .venv/bin/activate
pytest
uv run pytest

7
.gitignore vendored
View File

@@ -31,8 +31,6 @@ MANIFEST
fly.toml
# Examples
examples/telnyx-chatbot/templates/streams.xml
examples/twilio-chatbot/templates/streams.xml
examples/**/node_modules/
examples/**/.expo/
examples/**/dist/
@@ -50,4 +48,7 @@ examples/**/web-build/
# Documentation
docs/api/_build/
docs/api/api
docs/api/api
# uv
.python-version

View File

@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.9.7
rev: v0.12.1
hooks:
- id: ruff
language_version: python3

View File

@@ -9,22 +9,14 @@ build:
- python3-dev
- libasound2-dev
jobs:
pre_build:
- python -m pip install --upgrade pip
- pip install wheel setuptools
post_build:
- echo "Build completed"
post_install:
- pip install uv
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra ultravox --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
sphinx:
configuration: docs/api/conf.py
fail_on_warning: false
python:
install:
- requirements: docs/api/requirements.txt
- method: pip
path: .
search:
ranking:
api/*: 5

View File

@@ -5,13 +5,311 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.0.80] - 2025-08-13
### Added
- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to `False`.
This field handles SIGTERM signals. The `handle_sigint` field still defaults
to `True`, but now it handles only SIGINT signals.
- Added `GeminiTTSService` which uses Google Gemini to generate TTS output. The
Gemini model can be prompted to insert styled speech to control the TTS
output.
- For `OpenAILLMService` and its subclasses, added the ability to retry
executing a chat completion after a timeout period. The new args are
`retry_timeout_secs` and `retry_on_timeout`. This feature is disabled by
default.
- Added Exotel support to Pipecat's development runner. You can now connect
using the runner with `uv run bot.py -t exotel` and an ngrok connection to
HTTP port 7860.
- Added `enable_direct_mode` argument to `FrameProcessor`. The direct mode is
for processors which require very little I/O or compute resources, that is
processors that can perform their task almost immediately. These type of
processors don't need any of the internal tasks and queues usually created by
frame processors which means overall application performance might be slightly
increased. Use with care.
- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.
- Added `endpoint_id` parameter to `AzureSTTService`. ([Custom EndpointId](https://docs.azure.cn/en-us/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-python#use-a-custom-endpoint))
### Changed
- `WatchdogPriorityQueue` now requires the items to be inserted to always be
tuples and the size of the tuple needs to be specified in the constructor when
creating the queue with the `tuple_size` argument.
- Updated Moondream to revision `2025-01-09`.
- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client to remove
compatibility issues with other packages. Now you can use the PlayHT HTTP
service with other services, like GoogleLLMService.
- Updated `pyproject.toml` to once again pin `numba` to `>=0.61.2` in order to
resolve package versioning issues.
- Updated the `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and
`VADUserStoppedSpeakingFrame` in the list of frames to filter when the
filtering is on.
### Performance
- Improving the latency of the `HeyGenVideoService`.
- Improved some frame processors performance by using the new frame processor
direct mode. In direct mode a frame processor will process frames right away
avoiding the need for internal queues and tasks. This is useful for some
simple processors. For example, in processors that wrap other processors
(e.g. `Pipeline`, `ParallelPipeline`), we add one processor before and one
after the wrapped processors (internally, you will see them as sources and
sinks). These sources and sinks don't do any special processing and they
basically forward frames. So, for these simple processors we now enable the
new direct mode which avoids creating any internal tasks (and queues) and
therefore improves performance.
### Fixed
- Fixed an issue with the `BaseWhisperSTTService` where the language was
specified as an enum and not a string.
- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.
- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying a `text`
`modalities` didn't result in text being outputted from the model.
- Added SSML reserved character escaping to `AzureBaseTTSService` to properly
handle special characters in text sent to Azure TTS. This fixes an issue
where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would
cause TTS failures.
- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when
compating watchdog cancel sentinel items with other items in the queue.
- Fixed an issue that would cause system frames to not be processed with higher
priority than other frames. This could cause slower interruption times.
- Fixed an issue where retrying a websocket connection error would result in an
error.
### Other
- Add foundation example `19b-openai-realtime-beta-text.py`, showing how to use
`OpenAIRealtimeBetaLLMService` to output text to a TTS service.
- Add vision support to release evals so we can run the foundational examples 12
series.
- Added foundational example `15a-switch-languages.py` to release evals. It is
able to detect if we switched the language properly.
- Updated foundational examples to show how to enclose complex logic
(e.g. `ParallelPipeline`) into a single processor so the main pipeline becomes
simpler.
- Added `07n-interruptible-gemini.py`, demonstrating how to use
`GeminiTTSService`.
## [0.0.79] - 2025-08-07
### Changed
- Changed `pipecat-ai`'s `openai` dependency to `>=1.74.0,<=1.99.1` due to a
breaking change in `openai` 1.99.2 ([commit](https://github.com/openai/openai-python/commit/657f551dbe583ffb259d987dafae12c6211fba06))
### Deprecated
- `TTSService.say()` is deprecated, push a `TTSSpeakFrame` instead. Calling
functions directly is a discouraged pattern in Pipecat because, for example,
it might cause issues with frame ordering.
- `LLMMessagesFrame` is deprecated, in favor of either:
- `LLMMessagesUpdateFrame` with `run_llm=True`
- `OpenAILLMContextFrame` with desired messages in a new context
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are
deprecated, as they depended on the now-deprecated `LLMMessagesFrame`. Use
`LLMUserContextAggregator` and `LLMAssistantResponseAggregator` (or
LLM-specific subclasses thereof) instead.
## [0.0.78] - 2025-08-07
### Added
- Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`.
When user speech is emulated (e.g. when a transcription is received but
VAD doesn't detect speech), this parameter controls whether the emulated
speech can interrupt the bot. Default is False (emulated speech is ignored
while the bot is speaking).
- Added new `handle_sigint` and `handle_sigterm` to `RunnerArguments`. This
allows applications to know what settings they should use for the environment
they are running on. Also, added `pipeline_idle_timeout_secs` to be able to
control the `PipelineTask` idle timeout.
- Added `processor` field to `ErrorFrame` to indicate `FrameProcessor` that
generated the error.
- Added new language support for `AWSTranscribeSTTService`. All languages
supporting streaming data input are now supported:
https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
- Added support for Simli Trinity Avatars. A new `is_trinity_avatar` parameter
has been introduced to specify whether the provided `faceId` corresponds to a
Trinity avatar, which is required for optimal Trinity avatar performance.
- The development runner how handles custom `body` data for `DailyTransport`.
The `body` data is passed to the Pipecat client. You can POST to the `/start`
endpoint with a request body of:
```
{
"createDailyRoom": true,
"dailyRoomProperties": { "start_video_off": true },
"body": { "custom_data": "value" }
}
```
The `body` information is parsed and used in the application. The
`dailyRoomProperties` are currently not handled.
- Added detailed latency logging to `UserBotLatencyLogObserver`, capturing
average response time between user stop and bot start, as well as minimum and
maximum response latency.
- Added Chinese, Japanese, Korean word timestamp support to
`CartesiaTTSService`.
- Added `region` parameter to `GladiaSTTService`. Accepted values: eu-west
(default), us-west.
### Changed
- System frames are now queued. Before, system frames could be generated from
any task and would not guarantee any order which was causing undesired
behavior. Also, it was possible to get into some rare recursion issues because
of the way system frames were executed (they were executed in-place, meaning
calling `push_frame()` would finish after the system frame traversed all the
pipeline). This makes system frames more deterministic.
- Changed the default model for both `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService` to `eleven_turbo_v2_5`. The rationale for this
change is that the Turbo v2.5 model exhibits the most stable voice quality
along with very low latency TTFB; latencies are on par with the Flash v2.5
model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with
correct spacing.
- The development runners `/connect` and `/start` endpoint now both return
`dailyRoom` and `dailyToken` in place of the previous `room_url` and `token`.
- Updated the `pipecat.runner.daily` utility to only a take `DAILY_API_URL` and
`DAILY_SAMPLE_ROOM_URL` environment variables instead of argparsing `-u` and
`-k`, respectively.
- Updated `daily-python` to 0.19.6.
- Changed `TavusVideoService` to send audio or video frames only after the
transport is ready, preventing warning messages at startup.
- The development runner now strips any provided protocol (e.g. https://) from
the proxy address and issues a warning. It also strips trailing `/`.
### Deprecated
- In the `pipecat.runner.daily`, the `configure_with_args()` function is
deprecated. Use the `configure()` function instead.
- The development runner's `/connect` endpoint is deprecated and will be
removed in a future version. Use the `/start` endpoint in its place. In the
meantime, both endpoints work and deliver equivalent functionality.
### Fixed
- Fixed a `DailyTransport` issue that would result in an unhandled
`concurrent.futures.CancelledError` when a future is cancelled.
- Fixed a `RivaSTTService` issue that would result in an unhandled
`concurrent.futures.CancelledError` when a future is cancelled when reading
from the audio chunks from the incoming audio stream.
- Fixed an issue in the `BaseOutputTransport`, mainly reproducible with
`FastAPIWebsocketOutputTransport` when the audio mixer was enabled, where the
loop could consume 100% CPU by continuously returning without delay, preventing
other asyncio tasks (such as cancellation or shutdown signals) from being
processed.
- Fixed an issue where `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame`
were not emitted when using `TavusVideoService` or `HeyGenVideoService`.
- Fixed an issue in `LiveKitTransport` where empty `AudioRawFrame`s were pushed
down the pipeline. This resulted in warnings by the STT processor.
- Fixed `PiperTTSService` to send text as a JSON object in the request body,
resolving compatibility with Piper's HTTP API.
- Fixed an issue with the `TavusVideoService` where an error was thrown due to
missing transcription callbacks.
- Fixed an issue in `SpeechmaticsSTTService` where the `user_id` was set to
`None` when diarization is not enabled.
### Performance
- Fixed an issue in `TaskObserver` (a proxy to all observers) that was degrading
global performance.
### Other
- Added `07aa-interruptible-soniox.py`, `07ab-interruptible-inworld-http.py`,
`07ac-interruptible-asyncai.py` and `07ac-interruptible-asyncai-http.py`
release evals.
## [0.0.77] - 2025-07-31
### Added
- Added `InputTextRawFrame` frame type to handle user text input with Gemini
Multimodal Live.
- Added `HeyGenVideoService`. This is an integration for HeyGen Interactive
Avatar. A video service that handles audio streaming and requests HeyGen to
generate avatar video responses. (see https://www.heygen.com/)
- Added the ability to switch voices to `RimeTTSService`.
- Added unified development runner for building voice AI bots across multiple
transports
- `pipecat.runner.run` FastAPI-based development server with automatic bot
discovery
- `pipecat.runner.types` Runner session argument types
(`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`,
`WebSocketRunnerArguments`)
- `pipecat.runner.utils.create_transport()` Factory function for creating
transports from session arguments
- `pipecat.runner.daily` and `pipecat.runner.livekit` Configuration
utilities for Daily and LiveKit setups
- Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
- Automatic telephony provider detection and serializer configuration
- ESP32 WebRTC compatibility with SDP munging
- Environment detection (`ENV=local`) for conditional features
- Added Async.ai TTS integration (https://async.ai/)
- `AsyncAITTSService` WebSocket-based streaming TTS with interruption
support
- `AsyncAIHttpTTSService` HTTP-based streaming TTS service
- Example scripts:
- `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo)
- `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)
- Added `transcription_bucket` params support to the `DailyRESTHelper`.
- Added a new TTS service, `InworldTTSService`. This service provides
low-latency, high-quality speech generation using Inworld's streaming API.
- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to
`False`. This field handles SIGTERM signals. The `handle_sigint` field still
defaults to `True`, but now it handles only SIGINT signals.
- Added foundational example `14u-function-calling-ollama.py` for Ollama
function calling.
@@ -22,8 +320,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `set_log_level` to `DailyTransport`, allowing setting the logging level
for Daily's internal logging system.
- Added `on_transcription_stopped` and `on_transcription_error` to Daily
callbacks.
### Changed
- Changed the default `url` for `NeuphonicTTSService` to
`wss://api.neuphonic.com` as it provides better global performance. You can
set the URL to other URLs, such as the previous default:
`wss://eu-west-1.api.neuphonic.com`.
- Update `daily-python` to 0.19.5.
- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to
allow for more flexible `STTMuteFilter` placement.
- Play delayed messages from `ElevenLabsTTSService` if they still belong to the
current context.
@@ -45,6 +356,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
`pyneuphonic` package. This removes a package requirement, allowing Neuphonic
to work with more services.
- Updated `ElevenLabsTTSService` to handle the case where
`allow_interruptions=False`. Now, when interruptions are disabled, the same
context ID will be used throughout the conversation.
- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the
`tasks cancelled error` to a debug log. This removes the log from appearing
in Pipecat logs upon leaving.
@@ -65,9 +380,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's
recommended model.
- Updated `SpeechmaticsSTTService`:
- Added support for additional diarization options.
- Added foundational example `07a-interruptible-speechmatics-vad.py`, which
uses VAD detection provided by `SpeechmaticsSTTService`.
### Fixed
- Fixed a dependency issue for uv users where an `llvmlite` version required python 3.9.
- Fixed a `LLMUserResponseAggregator` issue where interruptions were not being
handled properly.
- Fixed `PiperTTSService` to work with newer Piper GPL.
- Fixed a race condition in `FastAPIWebsocketClient` that occurred when
attempting to send a message while the client was disconnecting.
- Fixed an issue in `GoogleLLMService` where interruptions did not work when an
interruption strategy was used.
- Fixed an issue in the `TranscriptProcessor` where newline characters could
cause the transcript output to be corrupted (e.g. missing all spaces).
- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport`
where, if the microphone was muted, track timing was not respected.
- Fixed an error that occurs when pushing an `LLMMessagesFrame`. Only some LLM
services, like Grok, are impacted by this issue. The fix is to remove the
optional `name` property that was being added to the message.
- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when
`enable_turn_audio` was enabled and audio resampling was required.
- Fixed a dependency issue for uv users where an `llvmlite` version required
python 3.9.
- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param was the
incorrect type.
@@ -81,15 +426,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed an issue in `ElevenLabsTTSService` where the word/timestamp pairs were
calculating word boundaries incorrectly.
- Fixed an issue where, in some edge cases, the `EmulateUserStartedSpeakingFrame`
could be created even if we didn't have a transcription.
- Fixed an issue where, in some edge cases, the
`EmulateUserStartedSpeakingFrame` could be created even if we didn't have a
transcription.
- Fixed an issue in `GoogleLLMContext` where it would inject the
`system_message` as a "user" message into cases where it was not meant to;
it was only meant to do that when there were no "regular" (non-function-call)
messages in the context, to ensure that inference would run properly.
- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` was never emitted.
- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` was
never emitted.
### Other
- Added new quickstart demos:
- examples/quickstart: voice AI bot quickstart
- examples/client-server-web: client/server starter example
- examples/phone-bot-twilio: twilio starter example
- Removed most of the examples from the pipecat repo. Examples can now be
found in: https://github.com/pipecat-ai/pipecat-examples.
## [0.0.76] - 2025-07-11
@@ -132,7 +490,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
user started early, while the bot was still working through
`trigger_assistant_response()`.
## [0.0.75] - 2025-07-08
## [0.0.75] - 2025-07-08 [YANKED]
**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**
**Please upgrade to version 0.0.76 or later.**
### Added
@@ -193,7 +556,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Remove unncessary push task in each `FrameProcessor`.
## [0.0.74] - 2025-07-03
## [0.0.74] - 2025-07-03 [YANKED]
**This release has been yanked due to resampling issues affecting audio output
quality and critical bugs impacting `ParallelPipelines` functionality.**
**Please upgrade to version 0.0.76 or later.**
### Added

View File

@@ -31,6 +31,23 @@ git push origin your-branch-name
Our maintainers will review your PR, and once everything is good, your contributions will be merged!
## Dependency Management
This project uses [uv](https://docs.astral.sh/uv/) for dependency management. The `uv.lock` file is committed to ensure reproducible builds.
### Adding or Updating Dependencies
1. Edit `pyproject.toml` to add/update dependencies
2. Run `uv lock` to update the lockfile with new dependency resolution
3. Run `uv sync` to install the updated dependencies locally
4. Always commit both files together:
```bash
git add pyproject.toml uv.lock
git commit -m "feat: add new dependency for feature X"
```
**Important:** Never manually edit `uv.lock`. It's auto-generated by `uv lock`.
## Code Style and Documentation
### Python Code Style

View File

@@ -1,40 +0,0 @@
# setup
FROM python:3.11.5
WORKDIR /app
COPY requirements.txt /app
COPY *.py /app
COPY pyproject.toml /app
COPY src/ /app/src/
COPY examples/ /app/examples/
WORKDIR /app
RUN ls --recursive /app/
RUN pip3 install --upgrade -r requirements.txt
RUN python -m build .
RUN pip3 install .
RUN pip3 install gunicorn
# If running on Ubuntu, Azure TTS requires some extra config
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi
RUN wget -O - https://www.openssl.org/source/openssl-1.1.1w.tar.gz | tar zxf -
WORKDIR openssl-1.1.1w
RUN ./config --prefix=/usr/local
RUN make -j $(nproc)
RUN make install_sw install_ssldirs
RUN ldconfig -v
ENV SSL_CERT_DIR=/etc/ssl/certs
#ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
RUN apt clean
RUN apt-get update
RUN apt-get -y install build-essential libssl-dev ca-certificates libasound2 wget
ENV PYTHONUNBUFFERED=1
WORKDIR /app
EXPOSE 8000
# run
CMD ["gunicorn", "--workers=2", "--log-level", "debug", "--chdir", "examples/server", "--capture-output", "daily-bot-manager:app", "--bind=0.0.0.0:8000"]

150
README.md
View File

@@ -8,7 +8,7 @@
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
> Want to dive right in? [Install Pipecat](https://docs.pipecat.ai/getting-started/installation) then try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
> Want to dive right in? Try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
## 🚀 What You Can Build
@@ -31,11 +31,11 @@
## 🎬 See it in action
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="400" /></a>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/simple-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="400" /></a>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/moondream-chatbot/image.png" width="400" /></a>
</p>
## 📱 Client SDKs
@@ -51,98 +51,124 @@ You can connect to Pipecat from any platform using our official SDKs:
## 🧩 Available services
| Category | Services |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
## ⚡ Getting started
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when youre ready.
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready.
```shell
# Install the module
pip install pipecat-ai
1. Install uv
# Set up your environment
cp dot-env.template .env
```
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:
> **Need help?** Refer to the [uv install documentation](https://docs.astral.sh/uv/getting-started/installation/).
```shell
pip install "pipecat-ai[option,...]"
```
2. Install the module
```bash
# For new projects
uv init my-pipecat-app
cd my-pipecat-app
uv add pipecat-ai
# Or for existing projects
uv add pipecat-ai
```
3. Set up your environment
```bash
cp env.example .env
```
4. To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:
```bash
uv add "pipecat-ai[option,...]"
```
> **Using pip?** You can still use `pip install pipecat-ai` and `pip install "pipecat-ai[option,...]"` to get set up.
## 🧪 Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
- [Example apps](https://github.com/pipecat-ai/pipecat-examples) — complete applications that you can use as starting points for development
## 🛠️ Hacking on the framework itself
## 🛠️ Contributing to the framework
1. Set up a virtual environment before following these instructions. From the root of the repo:
### Prerequisites
```shell
python3 -m venv venv
source venv/bin/activate
**Minimum Python Version:** 3.10
**Recommended Python Version:** 3.11-3.12
### Setup Steps
1. Clone the repository and navigate to it:
```bash
git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat
```
2. Install the development dependencies:
2. Install development and testing dependencies:
```shell
pip install -r dev-requirements.txt
```bash
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp --no-extra local
```
3. Install the git pre-commit hooks (these help ensure your code follows project rules):
3. Install the git pre-commit hooks:
```shell
pre-commit install
```bash
uv run pre-commit install
```
4. Install the `pipecat-ai` package locally in editable mode:
### Python 3.13+ Compatibility
```shell
pip install -e .
```
Some features require PyTorch, which doesn't yet support Python 3.13+. Install using:
> The `-e` or `--editable` option allows you to modify the code without reinstalling.
```bash
uv sync --group dev --all-extras \
--no-extra gstreamer \
--no-extra krisp \
--no-extra local \
--no-extra local-smart-turn \
--no-extra mlx-whisper \
--no-extra moondream \
--no-extra ultravox
```
5. Include optional dependencies as needed. For example:
> **Tip:** For full compatibility, use Python 3.12: `uv python pin 3.12`
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]"
```
6. (Optional) If you want to use this package from another directory:
```shell
pip install "path_to_this_repo[option,...]"
```
> **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.
### Running tests
Install the test dependencies:
To run all tests, from the root directory:
```shell
pip install -r test-requirements.txt
```bash
uv run pytest
```
From the root directory, run:
Run a specific test suite:
```shell
pytest
```bash
uv run pytest tests/test_name.py
```
### Setting up your editor

View File

@@ -1,13 +0,0 @@
build~=1.2.2
coverage~=7.9.1
grpcio-tools~=1.67.1
pip-tools~=7.4.1
pre-commit~=4.2.0
pyright~=1.1.402
pytest~=8.4.1
pytest-asyncio~=1.0.0
pytest-aiohttp==1.1.0
ruff~=0.12.1
setuptools~=78.1.1
setuptools_scm~=8.3.1
python-dotenv~=1.1.1

View File

@@ -1,10 +1,27 @@
#!/bin/bash
# Build docs using uv
echo "Installing dependencies with uv..."
uv sync --group docs --all-extras --no-extra krisp --no-extra gstreamer --no-extra ultravox --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
# Check if sphinx-build is available
if ! uv run sphinx-build --version &> /dev/null; then
echo "Error: sphinx-build is not available" >&2
exit 1
fi
# Clean previous build
rm -rf _build
echo "Building documentation..."
# Build docs matching ReadTheDocs configuration
sphinx-build -b html -d _build/doctrees . _build/html -W --keep-going
uv run sphinx-build -b html -d _build/doctrees . _build/html -W --keep-going
# Open docs (MacOS)
open _build/html/index.html
if [ $? -eq 0 ]; then
echo "Documentation built successfully!"
# Open docs (MacOS)
open _build/html/index.html
else
echo "Documentation build failed!" >&2
exit 1
fi

View File

@@ -1,4 +1,5 @@
import logging
import os
import sys
from datetime import datetime
from pathlib import Path
@@ -28,6 +29,7 @@ extensions = [
suppress_warnings = [
"autodoc.mocked_object",
"toc.not_included",
]
# Napoleon settings
@@ -45,85 +47,40 @@ autodoc_default_options = {
# Mock imports for optional dependencies
autodoc_mock_imports = [
"riva",
"livekit",
"pyht", # Base PlayHT package
"pyht.async_client", # PlayHT specific imports
"pyht.client",
"pyht.protos",
"pyht.protos.api_pb2",
"pipecat_ai_playht", # PlayHT wrapper
"aiortc",
"aiortc.mediastreams",
"cv2",
"av",
"pyneuphonic",
"mem0",
"mlx_whisper",
"anthropic",
"assemblyai",
"boto3",
"azure",
"cartesia",
"deepgram",
"elevenlabs",
"fal",
"gladia",
"google",
"krisp",
"langchain",
"lmnt",
"noisereduce",
"openpipe",
"simli",
"soundfile",
"soniox",
# Krisp - has build issues on some platforms
"pipecat_ai_krisp",
"pyaudio",
"krisp",
# System-specific GUI libraries
"_tkinter",
"tkinter",
"daily",
"daily_python",
# Moondream dependencies
"torch",
"transformers",
"intel_extension_for_pytorch",
# Ultravox dependencies
"huggingface_hub",
# Platform-specific audio libraries (if needed)
"gi",
"gi.require_version",
"gi.repository",
# OpenCV - sometimes has import issues during docs build
"cv2",
# Heavy ML packages excluded from ReadTheDocs
# ultravox dependencies
"vllm",
"vllm.engine.arg_utils",
# local-smart-turn dependencies
"coremltools",
"coremltools.models",
"coremltools.models.MLModel",
"torch",
"torch.nn",
"torch.nn.functional",
"torchaudio",
# moondream dependencies
"transformers",
"transformers.AutoTokenizer",
# Langchain dependencies
"langchain_core",
"langchain_core.messages",
"langchain_core.runnables",
"langchain_core.messages.AIMessageChunk",
"langchain_core.runnables.Runnable",
# LiveKit dependencies
"livekit",
"livekit.rtc",
"livekit_api",
"livekit_protocol",
"tenacity",
"tenacity.retry",
"tenacity.stop_after_attempt",
"tenacity.wait_exponential",
"rtc",
"rtc.Room",
"rtc.RoomOptions",
"rtc.AudioSource",
"rtc.LocalAudioTrack",
"rtc.TrackPublishOptions",
"rtc.TrackSource",
"rtc.AudioStream",
"rtc.AudioFrameEvent",
"rtc.AudioFrame",
"rtc.Track",
"rtc.TrackKind",
"rtc.RemoteParticipant",
"rtc.RemoteTrackPublication",
"rtc.DataPacket",
# Riva dependencies
"transformers.AutoFeatureExtractor",
"AutoFeatureExtractor",
"timm",
"einops",
"intel_extension_for_pytorch",
"huggingface_hub",
# riva dependencies
"riva",
"riva.client",
"riva.client.Auth",
@@ -133,57 +90,14 @@ autodoc_mock_imports = [
"riva.client.AudioEncoding",
"riva.client.proto.riva_tts_pb2",
"riva.client.SpeechSynthesisService",
# Local CoreML Smart Turn dependencies
"coremltools",
"coremltools.models",
"coremltools.models.MLModel",
"torch",
"torch.nn",
"torch.nn.functional",
"transformers",
"transformers.AutoFeatureExtractor",
# Also add specific classes that are imported
"AutoFeatureExtractor",
# Sentry dependencies
"sentry_sdk",
# AWS Nova Sonic dependencies
"aws_sdk_bedrock_runtime",
"aws_sdk_bedrock_runtime.client",
"aws_sdk_bedrock_runtime.config",
"aws_sdk_bedrock_runtime.models",
"smithy_aws_core",
"smithy_aws_core.credentials_resolvers",
"smithy_aws_core.credentials_resolvers.static",
"smithy_aws_core.identity",
"smithy_core",
"smithy_core.aio",
"smithy_core.aio.eventstream",
# MCP dependencies (you may already have these)
"mcp",
"mcp.client",
"mcp.client.session_group",
"mcp.client.sse",
"mcp.client.stdio",
"mcp.ClientSession",
"mcp.StdioServerParameters",
# gstreamer
"gi",
"gi.require_version",
"gi.repository",
# Protobuf mocks
"pipecat.frames.protobufs.frames_pb2",
"pipecat.serializers.protobuf",
"google.protobuf",
"google.protobuf.descriptor",
"google.protobuf.descriptor_pool",
"google.protobuf.runtime_version",
"google.protobuf.symbol_database",
"google.protobuf.internal.builder",
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
]
# HTML output settings
html_theme = "sphinx_rtd_theme"
html_static_path = ["_static"]
html_static_path = ["_static"] if os.path.exists("_static") else []
autodoc_typehints = "signature" # Show type hints in the signature only, not in the docstring
html_show_sphinx = False
@@ -202,6 +116,7 @@ def import_core_modules():
"pipecat.clocks",
"pipecat.metrics",
"pipecat.observers",
"pipecat.runner",
"pipecat.serializers",
"pipecat.sync",
"pipecat.transcriptions",

View File

@@ -14,7 +14,7 @@ Quick Links
* `Join our Community <https://discord.gg/pipecat>`_
.. toctree::
:maxdepth: 3
:maxdepth: 2
:caption: API Reference
:hidden:
@@ -26,6 +26,7 @@ Quick Links
Observers <api/pipecat.observers>
Pipeline <api/pipecat.pipeline>
Processors <api/pipecat.processors>
Runner <api/pipecat.runner>
Serializers <api/pipecat.serializers>
Services <api/pipecat.services>
Sync <api/pipecat.sync>

View File

@@ -1,56 +0,0 @@
# Sphinx dependencies
sphinx>=8.1.3
sphinx-rtd-theme
sphinx-markdown-builder
sphinx-autodoc-typehints
toml
# Install all extras individually to ensure they're properly resolved
pipecat-ai[anthropic]
pipecat-ai[assemblyai]
pipecat-ai[aws]
pipecat-ai[azure]
pipecat-ai[cartesia]
pipecat-ai[cerebras]
pipecat-ai[deepseek]
pipecat-ai[daily]
pipecat-ai[deepgram]
pipecat-ai[elevenlabs]
pipecat-ai[fal]
pipecat-ai[fireworks]
pipecat-ai[fish]
pipecat-ai[gladia]
pipecat-ai[google]
pipecat-ai[grok]
pipecat-ai[groq]
# pipecat-ai[krisp] # Mocked
pipecat-ai[koala]
# pipecat-ai[langchain] # Mocked
# pipecat-ai[livekit] # Mocked
pipecat-ai[lmnt]
pipecat-ai[local]
# pipecat-ai[local-smart-turn] # Mocked
# pipecat-ai[mem0] # Mocked
# pipecat-ai[mlx-whisper] # Mocked
# pipecat-ai[moondream] # Mocked
pipecat-ai[nim]
# pipecat-ai[neuphonic] # Mocked
pipecat-ai[noisereduce]
pipecat-ai[openai]
# pipecat-ai[openpipe]
# pipecat-ai[playht] # Mocked due to grpcio conflict with riva
pipecat-ai[qwen]
pipecat-ai[remote-smart-turn]
# pipecat-ai[riva] # Mocked
pipecat-ai[sambanova]
pipecat-ai[silero]
pipecat-ai[simli]
pipecat-ai[soundfile]
pipecat-ai[soniox]
pipecat-ai[speechmatics]
pipecat-ai[tavus]
pipecat-ai[together]
# pipecat-ai[ultravox] # Mocked
# pipecat-ai[webrtc] # Mocked
pipecat-ai[websocket]
pipecat-ai[whisper]

View File

@@ -1,6 +1,10 @@
# Anthropic
ANTHROPIC_API_KEY=...
# Async
ASYNCAI_API_KEY=...
ASYNCAI_VOICE_ID=...
# AWS
AWS_SECRET_ACCESS_KEY=...
AWS_ACCESS_KEY_ID=...
@@ -25,6 +29,9 @@ CARTESIA_API_KEY=...
DAILY_API_KEY=...
DAILY_SAMPLE_ROOM_URL=https://...
# Deepgram
DEEPGRAM_API_KEY=...
# ElevenLabs
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
@@ -40,6 +47,13 @@ FIREWORKS_API_KEY=...
# Gladia
GLADIA_API_KEY=...
GLADIA_REGION=...
# Google
GOOGLE_API_KEY=...
GOOGLE_CLOUD_PROJECT_ID=...
GOOGLE_TEST_CREDENTIALS=...
GOOGLE_VERTEX_TEST_CREDENTIALS=...
# LMNT
LMNT_API_KEY=...
@@ -76,6 +90,9 @@ GROQ_API_KEY=...
# Grok
GROK_API_KEY=...
# Inworld
INWORLD_API_KEY=...
# Together.ai
TOGETHER_API_KEY=...
@@ -115,9 +132,11 @@ SONIOX_API_KEY=
# Speechmatics
SPEECHMATICS_API_KEY=...
# SambaNova
SAMBANOVA_API_KEY=...
# Sentry
SENTRY_DSN=...
# Heygen
HEYGEN_API_KEY=...

View File

@@ -1,88 +1,31 @@
# Pipecat Examples
This directory contains examples to help you learn how to build with Pipecat.
# Pipecat &mdash; Examples
## Getting Started
## Foundational snippets
Small snippets that build on each other, introducing one or two concepts at a time.
New to Pipecat? Start here:
➡️ [Take a look](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational)
- **[Quickstart](quickstart/)** - Get your first voice AI bot running in 5 minutes _(coming soon)_
- **[Client/Server Web](client-server-web/)** - Learn to build web applications with Pipecat's client SDKs _(coming soon)_
- **[Phone Bot with Twilio](phone-bot-twilio/)** - Connect your bot to a phone number _(coming soon)_
## Chatbot examples
Collection of self-contained real-time voice and video AI demo applications built with Pipecat.
## Foundational Examples
### Quickstart
Single-file examples that introduce core Pipecat concepts one at a time. These examples:
Each project has its own set of dependencies and configuration variables. They intentionally avoids shared code across projects &mdash; you can grab whichever demo folder you want to work with as a starting point.
- Build on each other progressively
- Focus on specific features or integrations
- Are used for testing with every Pipecat release
We recommend you start with a virtual environment:
See the **[Foundational Examples README](foundational/)** for the complete list.
```shell
cd pipecat-ai/examples/simple-chatbot
## More Advanced Examples
python -m venv venv
Ready to explore complex use cases? Visit **[pipecat-examples](https://github.com/pipecat-ai/pipecat-examples)** for:
source venv/bin/activate
pip install -r requirements.txt
```
Next, follow the steps in the README for each demo.
Make sure you `pip install -r requirements.txt` for each demo project, so you can be sure to have the necessary service dependencies that extend the functionality of Pipecat. You can read more about the framework architecture [here](https://github.com/pipecat-ai/pipecat/tree/main/docs).
## Projects:
| Project | Description | Services |
|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Phone Chatbot](phone-chatbot) | A chatbot that connects to PSTN/SIP phone calls, powered by Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities. | Cartesia, Deepgram, OpenAI, Websockets |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
> It provides a quick way to join a real-time session with your bot and test your ideas without building any frontend code. If you'd like to see an example of a custom UI, try Storybot.
## FAQ
### Deployment
For each of these demos we've included a `Dockerfile`. Out of the box, this should provide everything needed to get the respective demo running on a VM:
```shell
docker build username/app:tag .
docker run -p 7860:7860 --env-file ./.env username/app:tag
docker push ...
```
### SSL
If you're working with a custom UI (such as with the Storytelling Chatbot), it's important to ensure your deployment platform supports HTTPS, as accessing user devices such as mics and webcams requires SSL.
If you try to run a custom UI without SSL, you may see an error in the console telling you that `navigator` is undefined, or no devices are available.
### Are these examples production ready?
Yes, kind of.
These demos attempt to keep things simple and are unopinionated regarding environment or scalability.
We're using FastAPI to spawn a subprocess for the bots / agents &mdash; useful for small tests, but not so great for production grade apps with many concurrent users. You can see how this works in each project's `start` endpoint in `server.py`.
Creating virtualized worker pools and on-demand instances is out of scope for these examples, but we hope to add some examples to this repo soon!
For projects that have CUDA as a requirement, such as Moondream Chatbot, be sure to deploy to a GPU-powered platform (such as [fly.io](https://fly.io) or [Runpod](https://runpod.io).)
## Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)
➡️ [Reach us on Twitter](https://x.com/pipecat_ai)
- Production-ready applications
- Multi-platform client implementations
- Telephony integrations
- Multimodal and creative applications
- Deployment and monitoring examples

View File

@@ -1,60 +0,0 @@
# AWS Strands Examples
This folder contains two Python examples demonstrating how to use Pipecat with the AWS Strands agent.
## Overview
These examples show how to delegate complex, multi-step tasks to a Strands agent, which can reason step-by-step and call tools to accomplish user requests.
These examples are intentionally simplified for demonstration, using mock API calls. They work best if you ask it:
> What's the weather where the Golden Gate Bridge is?
## Example Scripts
### `black-box.py`
A minimal example that demonstrates how to use the Strands agent with Pipecat. The agent can handle multi-step queries by calling tools, but does not explain its reasoning out loud.
### `explain-thinking.py`
An enhanced example where the Strands agent explains each step of its reasoning in clear, simple language as it works through a multi-step task.
## Quick Start
1. **Clone the repository and navigate to this example:**
```bash
git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat/examples/aws-strands
```
2. **Set up a virtual environment:**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Configure environment variables:**
Copy the provided `env.example` file to `.env` and fill in the necessary credentials:
```bash
cp env.example .env
# Then edit .env with your preferred editor
```
5. **Run an example:**
```bash
python black-box.py
# or
python explain-thinking.py
```

View File

@@ -1,206 +0,0 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import os
from dotenv import load_dotenv
from loguru import logger
from strands import Agent, tool
from strands.models import BedrockModel
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
"""This example demonstrates how to use the Strands agent with Pipecat.
You can delegate complex, multi-step tasks to the Strands agent, which can cycle through LLM-based reasoning and tool calls to accomplish the task.
Try asking: "What's the weather where the Golden Gate Bridge is?"
"""
# Strands agent tools
@tool
def get_location_name_from_landmark(landmark: str) -> str:
"""
Get the location name from a landmark.
Args:
landmark (str): The name of the landmark, e.g. "Golden Gate Bridge".
"""
# Simulate fetching location
return "San Francisco, CA"
@tool
def get_lat_long_from_location_name(location: str) -> dict:
"""
Get the latitude and longitude for a location name.
Args:
location (str): The city and state, e.g. "San Francisco, CA".
"""
# Simulate fetching lat/long from a geocoding service
return {"lat": 37.7749, "long": -122.4194}
@tool
def get_current_weather_from_lat_long(lat: float, long: float) -> dict:
"""
Get the current weather for a specific latitude and longitude.
Args:
lat (float): The latitude of the location.
long (float): The longitude of the location.
"""
# Simulate fetching weather data from a weather service
return {"conditions": "nice", "temperature": "75"}
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
strands_agent = Agent(
model=BedrockModel(
model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0", max_tokens=64000
),
tools=[
get_location_name_from_landmark,
get_lat_long_from_location_name,
get_current_weather_from_lat_long,
],
system_prompt="""
You are a helpful personal assistant who can look up information about places and weather.
Your key capabilities:
1. Look up where landmarks are located.
2. Find latitude and longitude for a location.
3. Look up the current weather for a specific latitude and longitude.
Explain each step of your reasoning in clear, simple, and concise language. Your responses will be converted to audio, so avoid special characters and numbered lists.
""",
)
async def handle_location_or_weather_related_queries(params: FunctionCallParams, query: str):
"""
Handle location or weather related queries.
Args:
query (str): The user's query, e.g. "What's the weather where the Golden Gate Bridge is?".
"""
# Run in a background thread
# (Otherwise the agent blocks the event loop; one effect of that is that we don't hear
# "let me check on that" until the agent finishes)
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, strands_agent, query)
await params.result_callback(result.message)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm.register_direct_function(handle_location_or_weather_related_queries)
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
tools = ToolsSchema(standard_tools=[handle_location_or_weather_related_queries])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. Start by suggesting that the user ask about the weather where the Golden Gate Bridge is.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from pipecat.examples.run import main
main(run_example, transport_params=transport_params)

View File

@@ -1,8 +0,0 @@
OPENAI_API_KEY=
CARTESIA_API_KEY=
DEEPGRAM_API_KEY=
DAILY_API_KEY=
DAILY_SAMPLE_ROOM_URL=
AWS_SECRET_ACCESS_KEY=
AWS_ACCESS_KEY_ID=
AWS_REGION=

View File

@@ -1,249 +0,0 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import os
import threading
import time
from dotenv import load_dotenv
from loguru import logger
from strands import Agent, tool
from strands.models import BedrockModel
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)
"""This example demonstrates how to use the Strands agent with Pipecat in a way where the agent explains its reasoning step-by-step.
You can delegate complex, multi-step tasks to the Strands agent, which can cycle through LLM-based reasoning and tool calls to accomplish the task.
Try asking: "What's the weather where the Golden Gate Bridge is?"
"""
# Strands agent tools
@tool
def get_location_name_from_landmark(landmark: str) -> str:
"""
Get the location name from a landmark.
Args:
landmark (str): The name of the landmark, e.g. "Golden Gate Bridge".
"""
# Simulate fetching location (slowly)
time.sleep(3)
return "San Francisco, CA"
@tool
def get_lat_long_from_location_name(location: str) -> dict:
"""
Get the latitude and longitude for a location name.
Args:
location (str): The city and state, e.g. "San Francisco, CA".
"""
# Simulate fetching lat/long from a geocoding service (slowly)
time.sleep(3)
return {"lat": 37.7749, "long": -122.4194}
@tool
def get_current_weather_from_lat_long(lat: float, long: float) -> dict:
"""
Get the current weather for a specific latitude and longitude.
Args:
lat (float): The latitude of the location.
long (float): The longitude of the location.
"""
# Simulate fetching weather data from a weather service (slowly)
time.sleep(3)
return {"conditions": "nice", "temperature": "75"}
# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
}
async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
next_strands_message_is_last = False
strands_messages_queue = asyncio.Queue()
def strands_callback_handler(**kwargs):
"""
Handle events from the Strands agent.
"""
nonlocal next_strands_message_is_last
if "event" in kwargs:
event_obj = kwargs["event"]
if event_obj and "messageStop" in event_obj:
message_stop = event_obj["messageStop"]
if message_stop and "stopReason" in message_stop:
stop_reason = message_stop["stopReason"]
if stop_reason == "end_turn":
next_strands_message_is_last = True
elif "message" in kwargs:
message_obj = kwargs["message"]
if message_obj and "content" in message_obj and "role" in message_obj:
role = message_obj["role"]
content = message_obj["content"]
if role == "assistant" and isinstance(content, list):
for content_obj in content:
if isinstance(content_obj, dict) and "text" in content_obj:
message = content_obj["text"]
if not next_strands_message_is_last:
strands_messages_queue.put_nowait(message)
async def process_strands_messages():
while True:
message = await strands_messages_queue.get()
await tts.queue_frame(TTSSpeakFrame(message))
strands_messages_queue.task_done()
asyncio.create_task(process_strands_messages())
strands_agent = Agent(
model=BedrockModel(
model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0", max_tokens=64000
),
tools=[
get_location_name_from_landmark,
get_lat_long_from_location_name,
get_current_weather_from_lat_long,
],
system_prompt="""
You are a helpful personal assistant who can look up information about places and weather.
Your key capabilities:
1. Look up where landmarks are located.
2. Find latitude and longitude for a location.
3. Look up the current weather for a specific latitude and longitude.
Explain each step of your reasoning in clear, simple, and concise language. Your responses will be converted to audio, so avoid special characters and numbered lists.
""",
callback_handler=strands_callback_handler,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
async def handle_location_or_weather_related_queries(params: FunctionCallParams, query: str):
"""
Handle location or weather related queries.
Args:
query (str): The user's query, e.g. "What's the weather where the Golden Gate Bridge is?".
"""
# Run in a background thread
# (Otherwise the agent blocks the event loop; one effect of that is that we don't hear
# the agent's "thinking" messages until the agent finishes)
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, strands_agent, query)
await params.result_callback(result.message)
llm.register_direct_function(handle_location_or_weather_related_queries)
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
tools = ToolsSchema(standard_tools=[handle_location_or_weather_related_queries])
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way. Start by suggesting that the user ask about the weather where the Golden Gate Bridge is.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=handle_sigint)
await runner.run(task)
if __name__ == "__main__":
from pipecat.examples.run import main
main(run_example, transport_params=transport_params)

View File

@@ -1,6 +0,0 @@
fastapi
uvicorn
python-dotenv
pipecat-ai[webrtc,daily,deepgram,cartesia]
pipecat-ai-small-webrtc-prebuilt
strands-agents

View File

@@ -1,45 +0,0 @@
# Bot ready signaling
A simple Pipecat example demonstrating how to handle signaling between the client and the bot,
ensuring that the bot starts sending audio only when the client is available,
thereby avoiding the risk of cutting off the beginning of the audio.
## Quick Start
### First, start the bot server:
1. Navigate to the server directory:
```bash
cd server
```
2. Create and activate a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install requirements:
```bash
pip install -r requirements.txt
```
4. Copy env.example to .env and configure:
- Add your API keys
5. Start the server:
```bash
python server.py
```
### Next, connect using the client app:
For client-side setup, refer to the [JavaScript Guide](client/javascript/README.md).
## Important Note
Ensure the bot server is running before using any client implementations.
## Requirements
- Python 3.10+
- Node.js 16+ (for JavaScript)
- Daily API key
- Cartesia API key
- Modern web browser with WebRTC support

View File

@@ -1,27 +0,0 @@
# JavaScript Implementation
Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/client/js/introduction).
## Setup
1. Run the bot server. See the [server README](../../README).
2. Navigate to the `client/javascript` directory:
```bash
cd client/javascript
```
3. Install dependencies:
```bash
npm install
```
4. Run the client app:
```
npm run dev
```
5. Visit http://localhost:5173 in your browser.

View File

@@ -1,34 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<audio id="bot-audio" autoplay></audio>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css">
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -1,20 +0,0 @@
{
"name": "client",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"keywords": [],
"author": "",
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
},
"dependencies": {
"@daily-co/daily-js": "0.74.0"
}
}

View File

@@ -1,216 +0,0 @@
/**
* Copyright (c) 20242025, Daily
*
* SPDX-License-Identifier: BSD 2-Clause License
*/
import Daily from "@daily-co/daily-js";
/**
* ChatbotClient handles the connection and media management for a real-time
* voice interaction with an AI bot.
*/
class ChatbotClient {
constructor() {
// Initialize client state
this.dailyCallObject = null;
this.setupDOMElements();
this.setupEventListeners();
}
/**
* Set up references to DOM elements and create necessary media elements
*/
setupDOMElements() {
// Get references to UI control elements
this.connectBtn = document.getElementById('connect-btn');
this.disconnectBtn = document.getElementById('disconnect-btn');
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
this.botAudio.playsInline = true;
document.body.appendChild(this.botAudio);
}
/**
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.disconnectBtn.addEventListener('click', () => this.disconnect());
}
/**
* Add a timestamped message to the debug log
*/
log(message) {
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
console.log(message);
}
/**
* Update the connection status display
*/
updateStatus(status) {
this.statusSpan.textContent = status;
this.log(`Status: ${status}`);
}
handleEventToConsole (evt) {
this.log(`Received event: ${evt.action}`);
};
/**
* Set up listeners for track events (start/stop)
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.dailyCallObject) return;
this.dailyCallObject.on("joined-meeting", () => {
this.updateStatus('Connected');
this.connectBtn.disabled = true;
this.disconnectBtn.disabled = false;
this.log('Client connected');
});
this.dailyCallObject.on("track-started", (evt) => {
if (evt.track.kind === "audio" && evt.participant.local === false) {
this.log("Audio track started.")
this.setupAudioTrack(evt.track);
}
});
this.dailyCallObject.on("track-stopped", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-joined", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-updated", this.handleEventToConsole.bind(this));
this.dailyCallObject.on("participant-left", () => {
// When the bot leaves, we are also disconnecting from the call
this.disconnect()
});
this.dailyCallObject.on("left-meeting", () => {
this.updateStatus('Disconnected');
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
});
this.dailyCallObject.on("error", this.handleEventToConsole.bind(this));
}
/**
* Set up an audio track for playback
* Handles both initial setup and track updates
*/
setupAudioTrack(track) {
this.log(`Setting up audio track, track state: ${track.readyState}, muted: ${track.muted}`);
// Check if we're already playing this track
if (this.botAudio.srcObject) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the audio source
this.botAudio.srcObject = new MediaStream([track]);
this.botAudio.onplaying = async (event) => {
this.log("onplaying")
this.log("Will send the audio message to play the audio at the next tick")
this.dailyCallObject.sendAppMessage("playable")
}
}
async fetchRoomInfo() {
let connectUrl = '/connect'
let res = await fetch(connectUrl, {
method: "POST",
mode: "cors",
headers: new Headers({
"Content-Type": "application/json"
}),
})
if (res.ok) {
return res.json();
}
}
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
*/
async connect() {
try {
// Initialize the client
this.dailyCallObject = Daily.createCallObject({
subscribeToTracksAutomatically: true,
});
// Set up listeners for media track events
this.setupTrackListeners();
this.log('Creating the bot...');
let roomInfo = await this.fetchRoomInfo()
// Connect to the bot
this.log('Connecting to bot...');
// Only for making debugger easier
window.callObject = this.dailyCallObject;
await this.dailyCallObject.join({
url: roomInfo.room_url,
});
this.log('Connection complete');
} catch (error) {
// Handle any errors during connection
this.log(`Error connecting: ${error.message}`);
this.log(`Error stack: ${error.stack}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.dailyCallObject) {
try {
await this.dailyCallObject.leave();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
}
}
}
/**
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.dailyCallObject) {
try {
// Disconnect the RTVI client
await this.dailyCallObject.leave();
await this.dailyCallObject.destroy();
this.dailyCallObject = null;
// Clean up audio
if (this.botAudio.srcObject) {
this.botAudio.srcObject.getTracks().forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
} catch (error) {
this.log(`Error disconnecting: ${error.message}`);
}
}
}
}
// Initialize the client when the page loads
window.addEventListener('DOMContentLoaded', () => {
new ChatbotClient();
});

View File

@@ -1,98 +0,0 @@
body {
margin: 0;
padding: 20px;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.status-bar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px;
background-color: #fff;
border-radius: 8px;
margin-bottom: 20px;
}
.controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
border-radius: 4px;
cursor: pointer;
}
#connect-btn {
background-color: #4caf50;
color: white;
}
#disconnect-btn {
background-color: #f44336;
color: white;
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.main-content {
background-color: #fff;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.bot-container {
display: flex;
flex-direction: column;
align-items: center;
}
#bot-video-container {
width: 640px;
height: 360px;
background-color: #e0e0e0;
border-radius: 8px;
margin: 20px auto;
overflow: hidden;
display: flex;
align-items: center;
justify-content: center;
}
#bot-video-container video {
width: 100%;
height: 100%;
object-fit: cover;
}
.debug-panel {
background-color: #fff;
border-radius: 8px;
padding: 20px;
}
.debug-panel h3 {
margin: 0 0 10px 0;
font-size: 16px;
font-weight: bold;
}
#debug-log {
height: 200px;
overflow-y: auto;
background-color: #f8f8f8;
padding: 10px;
border-radius: 4px;
font-family: monospace;
font-size: 12px;
line-height: 1.4;
}

View File

@@ -1,13 +0,0 @@
import { defineConfig } from 'vite';
export default defineConfig({
server: {
proxy: {
// Proxy /api requests to the backend server
'/connect': {
target: 'http://0.0.0.0:7860', // Replace with your backend URL
changeOrigin: true,
},
},
},
});

View File

@@ -1,60 +0,0 @@
# React Native Implementation
Basic implementation using the [Pipecat React Native SDK](https://docs.pipecat.ai/client/react-native/introduction).
## Usage
### Expo requirements
This project cannot be used with an [Expo Go](https://docs.expo.dev/workflow/expo-go/) app because [it requires custom native code](https://docs.expo.io/workflow/customizing/).
When a project requires custom native code or a config plugin, we need to transition from using [Expo Go](https://docs.expo.dev/workflow/expo-go/)
to a [development build](https://docs.expo.dev/development/introduction/).
More details about the custom native code used by this demo can be found in [rn-daily-js-expo-config-plugin](https://github.com/daily-co/rn-daily-js-expo-config-plugin).
### Building remotely
If you do not have experience with Xcode and Android Studio builds or do not have them installed locally on your computer, you will need to follow [this guide from Expo to use EAS Build](https://docs.expo.dev/development/create-development-builds/#create-and-install-eas-build).
### Building locally
You will need to have installed locally on your computer:
- [Xcode](https://developer.apple.com/xcode/) to build for iOS;
- [Android Studio](https://developer.android.com/studio) to build for Android;
#### Install the demo dependencies
```bash
# Use the version of node specified in .nvmrc
nvm i
# Install dependencies
npm i
# Before a native app can be compiled, the native source code must be generated.
npx expo prebuild
# Configure the environment variable to connect to the local server
cp env.example .env
# edit .env and add your local ip address, for example: http://192.168.1.16:7860
```
#### Running on Android
After plugging in an Android device [configured for debugging](https://developer.android.com/studio/debug/dev-options), run the following command:
```
npm run android
```
#### Running on iOS
Run the following command:
```
npm run ios
```
#### Connect to the server
Use the http://localhost:5173 in your app.

View File

@@ -1,75 +0,0 @@
{
"expo": {
"name": "bot-ready-rn",
"slug": "bot-ready-rn",
"version": "1.0.0",
"orientation": "portrait",
"icon": "./assets/icon.png",
"userInterfaceStyle": "light",
"splash": {
"image": "./assets/splash.png",
"resizeMode": "contain",
"backgroundColor": "#ffffff"
},
"updates": {
"fallbackToCacheTimeout": 0
},
"assetBundlePatterns": [
"**/*"
],
"ios": {
"supportsTablet": true,
"bitcode": false,
"bundleIdentifier": "co.daily.expo.BotReady",
"infoPlist": {
"UIBackgroundModes": [
"voip"
]
},
"appleTeamId": "EEBGKV9N3N"
},
"android": {
"adaptiveIcon": {
"foregroundImage": "./assets/adaptive-icon.png",
"backgroundColor": "#FFFFFF"
},
"package": "co.daily.expo.BotReady",
"permissions": [
"android.permission.ACCESS_NETWORK_STATE",
"android.permission.BLUETOOTH",
"android.permission.CAMERA",
"android.permission.INTERNET",
"android.permission.MODIFY_AUDIO_SETTINGS",
"android.permission.RECORD_AUDIO",
"android.permission.SYSTEM_ALERT_WINDOW",
"android.permission.WAKE_LOCK",
"android.permission.FOREGROUND_SERVICE",
"android.permission.FOREGROUND_SERVICE_CAMERA",
"android.permission.FOREGROUND_SERVICE_MICROPHONE",
"android.permission.FOREGROUND_SERVICE_MEDIA_PROJECTION",
"android.permission.POST_NOTIFICATIONS"
]
},
"web": {
"favicon": "./assets/favicon.png"
},
"plugins": [
"@config-plugins/react-native-webrtc",
"@daily-co/config-plugin-rn-daily-js",
[
"expo-build-properties",
{
"android": {
"minSdkVersion": 24,
"compileSdkVersion": 35,
"targetSdkVersion": 34,
"buildToolsVersion": "35.0.0"
},
"ios": {
"deploymentTarget": "15.1"
}
}
]
]
}
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.4 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 46 KiB

View File

@@ -1,7 +0,0 @@
module.exports = function(api) {
api.cache(true);
return {
presets: ['babel-preset-expo'],
plugins: [["module:react-native-dotenv"]],
};
};

View File

@@ -1 +0,0 @@
API_BASE_URL=http://YOUR_LOCAL_IP:7860

View File

@@ -1,7 +0,0 @@
import { registerRootComponent } from "expo";
import App from "./src/App";
// registerRootComponent calls AppRegistry.registerComponent('main', () => App);
// It also ensures that the environment is set up appropriately
registerRootComponent(App);

View File

@@ -1,4 +0,0 @@
// Learn more https://docs.expo.io/guides/customizing-metro
const { getDefaultConfig } = require('expo/metro-config');
module.exports = getDefaultConfig(__dirname);

File diff suppressed because it is too large Load Diff

View File

@@ -1,31 +0,0 @@
{
"name": "bot-ready-rn",
"version": "1.0.0",
"scripts": {
"start": "expo start --dev-client",
"android": "expo run:android --device",
"ios": "expo run:ios --device",
"web": "expo start --web"
},
"dependencies": {
"@config-plugins/react-native-webrtc": "^10.0.0",
"@daily-co/config-plugin-rn-daily-js": "0.0.7",
"@daily-co/react-native-daily-js": "^0.70.0",
"@daily-co/react-native-webrtc": "^118.0.3-daily.2",
"@react-native-async-storage/async-storage": "1.23.1",
"expo": "^52.0.0",
"expo-build-properties": "~0.13.1",
"expo-dev-client": "~5.0.5",
"expo-splash-screen": "~0.29.16",
"expo-status-bar": "~2.0.0",
"react": "18.3.1",
"react-native": "0.76.3",
"react-native-background-timer": "^2.4.1",
"react-native-dotenv": "^3.4.11",
"react-native-get-random-values": "^1.11.0"
},
"devDependencies": {
"@babel/core": "^7.12.9"
},
"private": true
}

View File

@@ -1,121 +0,0 @@
import React, { useState, useEffect } from 'react';
import {SafeAreaView, View, Text, Button, StyleSheet, ScrollView} from 'react-native';
import Daily from "@daily-co/react-native-daily-js";
import { API_BASE_URL } from "@env";
const CallScreen = () => {
const [connectionStatus, setConnectionStatus] = useState('Disconnected');
const [isConnected, setIsConnected] = useState(false);
const [callObject, setCallObject] = useState(null);
const [logs, setLogs] = useState([]);
useEffect(() => {
if (callObject) {
setupTrackListeners(callObject);
}
}, [callObject]);
const log = (message) => {
setLogs((prevLogs) => [...prevLogs, `${new Date().toISOString()} - ${message}`]);
console.log(message);
};
const setupTrackListeners = (callObject) => {
callObject.on("joined-meeting", () => {
setConnectionStatus('Connected');
setIsConnected(true);
log('Client connected');
});
callObject.on("left-meeting", () => {
setConnectionStatus('Disconnected');
setIsConnected(false);
log('Client disconnected');
});
callObject.on("participant-left", () => {
// When the bot leaves, we are also disconnecting from the call
disconnect().catch((err) => {
log(`Failed to disconnect ${err}`);
})
});
// Trigger so the bot can start sending audio
callObject.on("track-started", (evt) => {
if (evt.track.kind === "audio" && evt.participant.local === false) {
handleEventToConsole(evt)
log("Sending the message that will trigger the bot to play the audio.")
callObject.sendAppMessage("playable")
}
});
callObject.on("error", (evt) => log(`Error: ${evt.error}`));
// Other events just for awareness
callObject.on("track-stopped", handleEventToConsole);
callObject.on("participant-joined", handleEventToConsole);
callObject.on("participant-updated", handleEventToConsole);
};
const handleEventToConsole = (evt) => {
log(`Received event: ${evt.action}`);
};
const connect = async () => {
try {
const callObject = Daily.createCallObject({ subscribeToTracksAutomatically: true });
setCallObject(callObject);
const connectionUrl = `${API_BASE_URL}/connect`
const res = await fetch(connectionUrl, { method: "POST", headers: { "Content-Type": "application/json" } });
const roomInfo = await res.json();
await callObject.join({ url: roomInfo.room_url });
} catch (error) {
log(`Error connecting: ${error.message}`);
}
};
const disconnect = async () => {
if (callObject) {
try {
await callObject.leave();
await callObject.destroy();
setCallObject(null);
} catch (error) {
log(`Error disconnecting: ${error.message}`);
}
}
};
return (
<SafeAreaView style={styles.safeArea}>
<View style={styles.container}>
<View style={styles.statusBar}>
<Text>Status: <Text style={styles.status}>{connectionStatus}</Text></Text>
<View style={styles.controls}>
<Button
title={isConnected ? "Disconnect" : "Connect"}
onPress={isConnected ? disconnect : connect}
/>
</View>
</View>
<View style={styles.debugPanel}>
<Text style={styles.debugTitle}>Debug Info</Text>
<ScrollView style={styles.debugLog}>
{logs.map((logEntry, index) => (
<Text key={index} style={styles.logText}>{logEntry}</Text>
))}
</ScrollView>
</View>
</View>
</SafeAreaView>
);
};
const styles = StyleSheet.create({
safeArea: { flex: 1, backgroundColor: '#f0f0f0', padding: 20 },
container: { flex: 1, margin: 20 },
statusBar: { flexDirection: 'row', justifyContent: 'space-between', alignItems: 'center', padding: 10, backgroundColor: '#fff', borderRadius: 8, marginBottom: 20 },
status: { fontWeight: 'bold' },
controls: { flexDirection: 'row', gap: 10 },
debugPanel: { height: '80%', backgroundColor: '#fff', borderRadius: 8, padding: 20},
debugTitle: { fontSize: 16, fontWeight: 'bold' },
debugLog: { height: '100%', overflow: 'scroll', backgroundColor: '#f8f8f8', padding: 10, borderRadius: 4, fontFamily: 'monospace', fontSize: 12, lineHeight: 1.4 },
});
export default CallScreen;

View File

@@ -1,50 +0,0 @@
# Bot ready signaling Server
A FastAPI server that manages bot instances and provide endpoint for Pipecat client connections.
## Endpoints
- `POST /connect` - Pipecat client connection endpoint
## Environment Variables
Copy `env.example` to `.env` and configure:
```ini
# Required API Keys
DAILY_API_KEY= # Your Daily API key
CARTESIA_API_KEY= # Your Cartesia API key
# Optional Configuration
DAILY_API_URL= # Optional: Daily API URL (defaults to https://api.daily.co/v1)
DAILY_SAMPLE_ROOM_URL= # Optional: Fixed room URL for development
HOST= # Optional: Host address (defaults to 0.0.0.0)
FAST_API_PORT= # Optional: Port number (defaults to 7860)
```
## Running the Server
Set up and activate your virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
If you want to use the local version of `pipecat` in this repo rather than the last published version, also run:
```bash
pip install --editable "../../../[daily,cartesia,openai]"
```
Run the server:
```bash
python server.py
```

View File

@@ -1,3 +0,0 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=
CARTESIA_API_KEY=

View File

@@ -1,4 +0,0 @@
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,cartesia,openai]

View File

@@ -1,64 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
from typing import Optional
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
(url, token, _) = await configure_with_args(aiohttp_session)
return (url, token)
async def configure_with_args(
aiohttp_session: aiohttp.ClientSession, parser: Optional[argparse.ArgumentParser] = None
):
if not parser:
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token, args)

View File

@@ -1,147 +0,0 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
from typing import Any, Dict
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
# Load environment variables from .env file
load_dotenv(override=True)
# Dictionary to track bot processes: {pid: (process, room_url)}
bot_procs = {}
# Store Daily API helpers
daily_helpers = {}
def cleanup():
"""Cleanup function to terminate all bot processes.
Called during server shutdown.
"""
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan manager that handles startup and shutdown tasks.
- Creates aiohttp session
- Initializes Daily API helper
- Cleans up resources on shutdown
"""
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
# Initialize FastAPI app with lifespan manager
app = FastAPI(lifespan=lifespan)
# Configure CORS to allow requests from any origin
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
async def create_room_and_token() -> tuple[str, str]:
"""Helper function to create a Daily room and generate an access token.
Returns:
tuple[str, str]: A tuple containing (room_url, token)
Raises:
HTTPException: If room creation or token generation fails
"""
room = await daily_helpers["rest"].create_room(DailyRoomParams())
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
return room.url, token
@app.post("/connect")
async def bot_connect(request: Request) -> Dict[Any, Any]:
"""Connect endpoint that creates a room and returns connection credentials.
This endpoint is called by client to establish a connection.
Returns:
Dict[Any, Any]: Authentication bundle containing room_url and token
Raises:
HTTPException: If room creation, token generation, or bot startup fails
"""
print("Creating room for RTVI connection")
room_url, token = await create_room_and_token()
print(f"Room URL: {room_url}")
# Start the bot process
try:
bot_file = "signalling_bot"
proc = subprocess.Popen(
[f"python3 -m {bot_file} -u {room_url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room_url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
# Return the authentication bundle in format expected by DailyTransport
return {"room_url": room_url, "token": token}
if __name__ == "__main__":
import uvicorn
# Parse command line arguments for server configuration
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Travel Companion FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
# Start the FastAPI server
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)

View File

@@ -1,95 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from dataclasses import dataclass
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import AudioRawFrame, EndFrame, OutputAudioRawFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
@dataclass
class SilenceFrame(OutputAudioRawFrame):
def __init__(
self,
*,
sample_rate: int,
duration: float,
):
# Initialize the parent class with the silent frame's data
super().__init__(
audio=self.create_silent_audio_frame(sample_rate, 1, duration).audio,
sample_rate=sample_rate,
num_channels=1,
)
@staticmethod
def create_silent_audio_frame(
sample_rate: int, num_channels: int, duration: float
) -> AudioRawFrame:
"""Create an AudioRawFrame containing silence."""
frame_size = num_channels * 2 # 2 bytes per sample for 16-bit audio
total_frames = int(sample_rate * duration)
total_bytes = total_frames * frame_size
silent_audio = bytes(total_bytes) # Create a byte array filled with zeros
return AudioRawFrame(audio=silent_audio, sample_rate=sample_rate, num_channels=num_channels)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
runner = PipelineRunner()
task = PipelineTask(Pipeline([tts, transport.output()]))
# Register an event handler so we can play the audio when we receive a specific message
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
logger.debug(f"Received app message: {message} - {sender}")
if "playable" not in message:
return
await task.queue_frames(
[
SilenceFrame(
sample_rate=task.params.audio_out_sample_rate,
duration=0.5,
),
TTSSpeakFrame(f"Hello there, how are you doing today ?"),
EndFrame(),
]
)
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,161 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml

View File

@@ -1,15 +0,0 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -1,37 +0,0 @@
# Simple Chatbot
<img src="image.png" width="420px">
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion.
See a video of it in action: https://x.com/kwindla/status/1778628911817183509
And a quick video walkthrough of the code: https://www.loom.com/share/13df1967161f4d24ade054e7f8753416
The first time, things might take extra time to get started since VAD (Voice Activity Detection) model needs to be downloaded.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser to start a chatbot session.
## Build and test the Docker image
```
docker build -t chatbot .
docker run --env-file .env -p 7860:7860 chatbot
```

View File

@@ -1,170 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import datetime
import io
import os
import sys
import wave
import aiofiles
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
# Create the recordings directory if it doesn't exist
os.makedirs("recordings", exist_ok=True)
async def save_audio(audio: bytes, sample_rate: int, num_channels: int, name: str):
if len(audio) > 0:
filename = os.path.join(
"recordings",
f"{name}_conversation_recording{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.wav",
)
with io.BytesIO() as buffer:
with wave.open(buffer, "wb") as wf:
wf.setsampwidth(2)
wf.setnchannels(num_channels)
wf.setframerate(sample_rate)
wf.writeframes(audio)
async with aiofiles.open(filename, "wb") as file:
await file.write(buffer.getvalue())
print(f"Merged audio saved to {filename}")
else:
print("No audio data to save")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
audio_in_enabled=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
#
# English
#
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your response to 12 words or fewer.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
# NOTE: Watch out! This will save all the conversation in memory. You
# can pass `buffer_size` to get periodic callbacks.
audiobuffer = AudioBufferProcessor(enable_turn_audio=True)
pipeline = Pipeline(
[
transport.input(), # microphone
context_aggregator.user(),
llm,
tts,
transport.output(),
audiobuffer, # used to buffer the audio in the pipeline
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels, "full")
@audiobuffer.event_handler("on_user_turn_audio_data")
async def on_user_turn_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels, "user")
@audiobuffer.event_handler("on_bot_turn_audio_data")
async def on_bot_turn_audio_data(buffer, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels, "bot")
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audiobuffer.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,4 +0,0 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...

View File

@@ -1,5 +0,0 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,openai,silero,elevenlabs]

View File

@@ -1,55 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -1,139 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
daily_helpers = {}
load_dotenv(override=True)
def cleanup():
# Clean up function, just to be extra safe
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/")
async def start_agent(request: Request):
print(f"!!! Creating room")
room = await daily_helpers["rest"].create_room(DailyRoomParams())
print(f"!!! Room URL: {room.url}")
# Ensure the room property is present
if not room.url:
raise HTTPException(
status_code=500,
detail="Missing 'room' property in request data. Cannot start agent without a target room!",
)
# Check if there is already an existing process running in this room
num_bots_in_room = sum(
1 for proc in bot_procs.values() if proc[1] == room.url and proc[0].poll() is None
)
if num_bots_in_room >= MAX_BOTS_PER_ROOM:
raise HTTPException(status_code=500, detail=f"Max bot limited reach for room: {room.url}")
# Get the token for the room
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
try:
proc = subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room.url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return RedirectResponse(room.url)
@app.get("/status/{pid}")
def get_status(pid: int):
# Look up the subprocess
proc = bot_procs.get(pid)
# If the subprocess doesn't exist, return an error
if not proc:
raise HTTPException(status_code=404, detail=f"Bot with process id: {pid} not found")
# Check the status of the subprocess
if proc[0].poll() is None:
status = "running"
else:
status = "finished"
return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
import uvicorn
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Storyteller FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)

View File

@@ -1,39 +0,0 @@
# Daily Custom Tracks
This example shows how to send and receive Daily custom tracks. We will run a simple `daily-python` application to send an audio file with a custom track (named "pipecat") to a room. Then, the Pipecat bot will mirror that custom track into another custom track (named "pipecat-mirror") in the same room.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
## Run the bot
Start the bot by giving it a Daily room URL.
```bash
python bot.py -u ROOM_URL
```
The bot will wait for the first participant to join. Then, it will mirror a custom track named "pipecat" into a new custom track named "pipecat-mirror".
## Run the sender
Now, run the custom track sender. This is a simple `daily-python` application that opens and audio file and sends it as a custom track to the same Daily room.
```bash
python custom_track_sender.py -u ROOM_URL -i office-ambience-mono-16000.mp3
```
## Open client
Finally, open the client so you can hear both custom tracks.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room and join it. You should be able to select which custom track you want to hear.

View File

@@ -1,89 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import aiohttp
from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, InputAudioRawFrame, OutputAudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class CustomTrackMirrorProcessor(FrameProcessor):
def __init__(self, transport_destination: str, **kwargs):
super().__init__(**kwargs)
self._transport_destination = transport_destination
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, InputAudioRawFrame) and frame.transport_source:
output_frame = OutputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels,
)
output_frame.transport_destination = self._transport_destination
await self.push_frame(output_frame)
else:
await self.push_frame(frame, direction)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Custom tracks mirror",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
microphone_out_enabled=False, # Disable since we just use custom tracks
audio_out_destinations=["pipecat-mirror"],
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
CustomTrackMirrorProcessor("pipecat-mirror"),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_audio(participant["id"], audio_source="pipecat")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,74 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import time
from daily import CallClient, CustomAudioSource, Daily
from pydub import AudioSegment
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room to join")
parser.add_argument(
"-i", "--input", type=str, required=True, help="Input audio file (needs 16000 sample rate)"
)
args, _ = parser.parse_known_args()
audio = AudioSegment.from_mp3(args.input)
raw_bytes = audio.raw_data
sample_rate = audio.frame_rate
channels = audio.channels
print(f"Length: {len(raw_bytes)} bytes")
print(f"Sample rate: {sample_rate}, Channels: {channels}")
# Initialize the Daily context & create call client
Daily.init()
client = CallClient()
# Join the room and indicate we have a custom track named "pipecat".
client.join(
args.url,
client_settings={
"publishing": {
"camera": False,
"microphone": False,
"customAudio": {"pipecat": True},
},
},
)
# Just sleep for a couple of seconds. To do this well we should really use
# completions.
time.sleep(2)
# Create the custom audio source. This is where we will write our audio.
audio_source = CustomAudioSource(sample_rate, channels)
# Create an audio track and assign it our audio source.
client.add_custom_audio_track("pipecat", audio_source)
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
try:
# Just write one second of audio until we have read all the file.
chunk_size = sample_rate * channels * 2
while len(raw_bytes) > 0:
chunk = raw_bytes[:chunk_size]
raw_bytes = raw_bytes[chunk_size:]
audio_source.write_frames(chunk)
except KeyboardInterrupt:
client.leave()
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
client.release()

View File

@@ -1,173 +0,0 @@
<html>
<head>
<title>daily custom tracks</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: false,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
</body>
</html>

View File

@@ -1,2 +0,0 @@
pydub
pipecat-ai[daily]

View File

@@ -1,55 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -1,15 +0,0 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -1,39 +0,0 @@
# Daily Multi Translation
This example shows how to use Daily to stream multiple simultaneous translations using a single transport. Daily provides custom tracks and in this example we will simultaneously translate incoming audio in English to Spanish, French and German, each of them being sent to a custom track.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser. This will open a Daily Prebuilt room where you will speak in English (make sure you are not muted).
## Open client
Next, you need to open the client that will listen to the translations.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room created above and join it. You should be able to select which translation you want to hear.
## Build and test the Docker image
```
docker build -t daily-multi-translation .
docker run --env-file .env -p 7860:7860 daily-multi-translation
```

View File

@@ -1,163 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
BACKGROUND_SOUND_FILE = "office-ambience-mono-16000.mp3"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Multi translation bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_out_mixer={
"spanish": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"french": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"german": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
},
audio_out_destinations=["spanish", "french", "german"],
microphone_out_enabled=False, # Disable since we just use custom tracks
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts_spanish = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="cefcb124-080b-4655-b31f-932f3ee743de",
transport_destination="spanish",
)
tts_french = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="8832a0b5-47b2-4751-bb22-6a8e2149303d",
transport_destination="french",
)
tts_german = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="38aabb6a-f52b-4fb0-a3d1-988518f4dc06",
transport_destination="german",
)
messages_spanish = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into Spanish.",
},
]
messages_french = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into French.",
},
]
messages_german = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into German.",
},
]
llm_spanish = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_french = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_german = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
context_spanish = OpenAILLMContext(messages_spanish)
context_aggregator_spanish = llm_spanish.create_context_aggregator(context_spanish)
context_french = OpenAILLMContext(messages_french)
context_aggregator_french = llm_french.create_context_aggregator(context_french)
context_german = OpenAILLMContext(messages_german)
context_aggregator_german = llm_german.create_context_aggregator(context_german)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
ParallelPipeline(
# Spanish pipeline.
[
context_aggregator_spanish.user(),
llm_spanish,
tts_spanish,
context_aggregator_spanish.assistant(),
],
# French pipeline.
[
context_aggregator_french.user(),
llm_french,
tts_french,
context_aggregator_french.assistant(),
],
# German pipeline.
[
context_aggregator_german.user(),
llm_german,
tts_german,
context_aggregator_german.assistant(),
],
),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[TranscriptionLogObserver()],
)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,5 +0,0 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
DEEPGRAM_API_KEY=efb...
CARTESIA_API_KEY=aeb...

View File

@@ -1,202 +0,0 @@
<html>
<head>
<title>daily multi translation</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous"
></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`video[data-participant-id="${participantId}"]`);
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildVideoPlayer(track, participantId) {
const videoContainer = document.getElementById("video-container");
const player = document.createElement("video");
player.dataset.participantId = participantId;
videoContainer.appendChild(player);
await startPlayer(player, track);
await player.play();
return player;
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: true,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "video") {
await buildVideoPlayer(e.track, e.participant.session_id);
} else if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const videoContainer = document.getElementById("video-container");
videoContainer.replaceChildren();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="video-container" class="ui segment"></div>
</div>
</div>
</body>
</html>

View File

@@ -1,5 +0,0 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,deepgram,openai,silero,cartesia,soundfile]

View File

@@ -1,55 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -1,139 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
daily_helpers = {}
load_dotenv(override=True)
def cleanup():
# Clean up function, just to be extra safe
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/")
async def start_agent(request: Request):
print(f"!!! Creating room")
room = await daily_helpers["rest"].create_room(DailyRoomParams())
print(f"!!! Room URL: {room.url}")
# Ensure the room property is present
if not room.url:
raise HTTPException(
status_code=500,
detail="Missing 'room' property in request data. Cannot start agent without a target room!",
)
# Check if there is already an existing process running in this room
num_bots_in_room = sum(
1 for proc in bot_procs.values() if proc[1] == room.url and proc[0].poll() is None
)
if num_bots_in_room >= MAX_BOTS_PER_ROOM:
raise HTTPException(status_code=500, detail=f"Max bot limited reach for room: {room.url}")
# Get the token for the room
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
try:
proc = subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room.url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return RedirectResponse(room.url)
@app.get("/status/{pid}")
def get_status(pid: int):
# Look up the subprocess
proc = bot_procs.get(pid)
# If the subprocess doesn't exist, return an error
if not proc:
raise HTTPException(status_code=404, detail=f"Bot with process id: {pid} not found")
# Check the status of the subprocess
if proc[0].poll() is None:
status = "running"
else:
status = "finished"
return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
import uvicorn
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Storyteller FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)

View File

@@ -1,13 +0,0 @@
FROM python:3.11-bullseye
# Open port 7860 for http service
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Start the FastAPI server
CMD python3 bot_runner.py --port ${FAST_API_PORT}

View File

@@ -1,39 +0,0 @@
# Fly.io deployment example
This project modifies the `bot_runner.py` server to launch a new machine for each user session. This is a recommended approach for production vs. running shell processess as your deployment will quickly run out of system resources under load.
For this example, we are using Daily as a WebRTC transport and provisioning a new room and token for each session. You can use another transport, such as WebSockets, by modifying the `bot.py` and `bot_runner.py` files accordingly.
## Setting up your fly.io deployment
### Create your fly.toml file
You can copy the `example-fly.toml` as a reference. Be sure to change the app name to something unique.
### Create your .env file
Copy the base `env.example` to `.env` and enter the necessary API keys.
`FLY_APP_NAME` should match that in the `fly.toml` file.
### Launch a new fly.io project
`fly launch` or `fly launch --org your-org-name`
### Set the necessary app secrets from your .env
Note: you can do this manually via the fly.io dashboard under the "secrets" sub-section of your deployment (e.g. "https://fly.io/apps/fly-app-name/secrets") or run the following terminal command:
`cat .env | tr '\n' ' ' | xargs flyctl secrets set`
### Deploy your machine
`fly deploy`
## Connecting to your bot
Send a post request to your running fly.io instance:
`curl --location --request POST 'https://YOUR_FLY_APP_NAME/'`
This request will wait until the machine enters into a `starting` state, before returning the a room URL and token to join.

View File

@@ -1,113 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str):
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(),
context_aggregator.user(),
llm,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
# Here we don't want to cancel, we just want to finish sending
# whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Bot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
config = parser.parse_args()
asyncio.run(main(config.u, config.t))

View File

@@ -1,209 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import subprocess
from contextlib import asynccontextmanager
import aiohttp
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomObject,
DailyRoomParams,
DailyRoomProperties,
)
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = [
"DAILY_API_KEY",
"OPENAI_API_KEY",
"ELEVENLABS_API_KEY",
"ELEVENLABS_VOICE_ID",
"FLY_API_KEY",
"FLY_APP_NAME",
]
FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
FLY_APP_NAME = os.getenv("FLY_APP_NAME", "pipecat-fly-example")
FLY_API_KEY = os.getenv("FLY_API_KEY", "")
FLY_HEADERS = {"Authorization": f"Bearer {FLY_API_KEY}", "Content-Type": "application/json"}
daily_helpers = {}
# ----------------- API ----------------- #
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# ----------------- Main ----------------- #
async def spawn_fly_machine(room_url: str, token: str):
async with aiohttp.ClientSession() as session:
# Use the same image as the bot runner
async with session.get(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Unable to get machine info from Fly: {text}")
data = await r.json()
image = data[0]["config"]["image"]
# Machine configuration
cmd = f"python3 bot.py -u {room_url} -t {token}"
cmd = cmd.split()
worker_props = {
"config": {
"image": image,
"auto_destroy": True,
"init": {"cmd": cmd},
"restart": {"policy": "no"},
"guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 1024},
},
}
# Spawn a new machine instance
async with session.post(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS, json=worker_props
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Problem starting a bot worker: {text}")
data = await r.json()
# Wait for the machine to enter the started state
vm_id = data["id"]
async with session.get(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
headers=FLY_HEADERS,
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Bot was unable to enter started state: {text}")
print(f"Machine joined room: {room_url}")
@app.post("/")
async def start_bot(request: Request) -> JSONResponse:
try:
data = await request.json()
# Is this a webhook creation request?
if "test" in data:
return JSONResponse({"test": True})
except Exception as e:
pass
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
if not room_url:
params = DailyRoomParams(properties=DailyRoomProperties())
try:
room: DailyRoomObject = await daily_helpers["rest"].create_room(params=params)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Unable to provision room {e}")
else:
# Check passed room URL exists, we should assume that it already has a sip set up
try:
room: DailyRoomObject = await daily_helpers["rest"].get_room_from_url(room_url)
except Exception:
raise HTTPException(status_code=500, detail=f"Room not found: {room_url}")
# Give the agent a token to join the session
token = await daily_helpers["rest"].get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room_url}")
# Launch a new fly.io machine, or run as a shell process (not recommended)
run_as_process = os.getenv("RUN_AS_PROCESS", False)
if run_as_process:
try:
subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
else:
try:
await spawn_fly_machine(room.url, token)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to spawn VM: {e}")
# Grab a token for the user to join with
user_token = await daily_helpers["rest"].get_token(room.url, MAX_SESSION_TIME)
return JSONResponse(
{
"room_url": room.url,
"token": user_token,
}
)
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument(
"--host", type=str, default=os.getenv("HOST", "0.0.0.0"), help="Host address"
)
parser.add_argument("--port", type=int, default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument(
"--reload", action="store_true", default=False, help="Reload code on change"
)
config = parser.parse_args()
try:
import uvicorn
uvicorn.run("bot_runner:app", host=config.host, port=config.port, reload=config.reload)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

View File

@@ -1,8 +0,0 @@
DAILY_API_KEY=
DAILY_SAMPLE_ROOM_URL= # Enter a Daily room URL to use a set room URL each time (useful for local testing)
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
FLY_API_KEY=
FLY_APP_NAME=
RUN_AS_PROCESS= # Spawn fly.io machine for each session or run as local process

View File

@@ -1,25 +0,0 @@
# fly.toml app configuration file generated for pipecat-fly-example on 2024-07-01T15:04:53+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-fly-example'
primary_region = 'sjc'
[build]
[env]
FLY_APP_NAME = 'pipecat-fly-example'
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[[vm]]
memory = 512
cpu_kind = 'shared'
cpus = 1

View File

@@ -1,5 +0,0 @@
pipecat-ai[daily,openai,silero]
fastapi
uvicorn
python-dotenv
loguru

View File

@@ -1,94 +0,0 @@
# Modal clone
modal-examples
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.installed.cfg
.eggs/
downloads/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
MANIFEST
# Virtual Environments
venv/
env/
.env
.venv/
ENV/
env.bak/
venv.bak/
# IDE
.idea/
.vscode/
.spyderproject
.spyproject
.ropeproject
# Testing and Coverage
.coverage
.coverage.*
htmlcov/
.pytest_cache/
.tox/
.nox/
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
cover/
# Logs and Databases
*.log
*.db
db.sqlite3
db.sqlite3-journal
pip-log.txt
# System Files
.DS_Store
Thumbs.db
desktop.ini
*.swp
*.swo
*.bak
*.tmp
*~
# Build and Documentation
docs/_build/
.pybuilder/
target/
instance/
.webassets-cache
.pdm.toml
.pdm-python
.pdm-build/
__pypackages__/
# Other
*.mo
*.pot
*.sage.py
.mypy_cache/
.dmypy.json
dmypy.json
.pyre/
.pytype/
cython_debug/
.ipynb_checkpoints

View File

@@ -1,91 +0,0 @@
# Deploying Pipecat to Modal.com
Deployment example for [modal.com](https://www.modal.com). This example demonstrates how to deploy a FastAPI webapp to Modal with an RTVI compatible `/connect` endpoint that launches a Pipecat pipeline in a separate Modal container and returns a room/token for the client to join. This example also supports providing a parameter to the `/connect` endpoint for specifying which Pipecat pipeline to launch; openai, gemini, or vllm. The vllm pipeline points to a self-hosted OpenAI compatible LLM, using a llama model (neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16), deployed to Modal.
![](diagram.jpg)
# Running this Example
## Install the Modal CLI
Setup a Modal account and install it on your machine if you have not already, following their easy 3 steps in their [Getting Started Guide](https://modal.com/docs/guide#getting-started)
## Deploy a self-serve LLM
1. Deploy Modal's OpenAI-compatible LLM service:
```bash
git clone https://github.com/modal-labs/modal-examples
cd modal-examples
modal deploy 06_gpu_and_ml/llm-serving/vllm_inference.py
```
Refer to Modal's guide and example for [Deploying an OpenAI-compatible LLM service with vLLM](https://modal.com/docs/examples/vllm_inference) for more details.
2. Take note of the endpoint URL from the previous step, which will look like:
```
https://{your-workspace}--example-vllm-openai-compatible-serve.modal.run
```
You'll need this for the `bot_vllm.py` file in the next section.
**Note:** The default Modal LLM example uses Llama-3.1 and will shut down after 15 minutes of inactivity. Cold starts take 5-10 minutes. To prepare the service, we recommend visiting the `/docs` endpoint (`https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs`) for your deployed LLM and wait for it to fully load before connecting your client.
## Deploy FastAPI App and Pipecat pipeline to Modal
1. Setup environment variables
```bash
cd server
cp env.example .env
# Modify .env to provide your service API Keys
```
Alternatively, you can configure your Modal app to use [secrets](https://modal.com/docs/guide/secrets)
2. Update the `modal_url` in `server/src/bot_vllm.py` to point to the url produced from the self-serve llm deploy, mentioned above.
3. From within the `server` directory, test the app locally:
```bash
modal serve app.py
```
4. Deploy to production
```bash
modal deploy app.py
```
5. Note the endpoint URL produced from this deployment. It will look like:
```bash
https://{your-workspace}--pipecat-modal-fastapi-app.modal.run
```
You'll need this URL for the client's `app.js` configuration mentioned in its README.
## Launch your bots on Modal
### Option 1: Direct Link
Simply click on the url displayed after running the server or deploy step to launch an agent and be redirected to a Daily room to talk with the launched bot. This will use the OpenAI pipeline.
### Option 2: Connect via an RTVI Client
Follow the instructions provided in the [client folder's README](client/javascript/README.md) for building and running a custom client that connects to your Modal endpoint. The provided client provides a dropdown for choosing which bot pipeline to run.
# Navigating your llm, server, and Pipecat logs
In your [Modal dashboard](https://modal.com/apps), you should have two Apps listed under Live Apps:
1. `example-vllm-openai-compatible`: This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: `serve`. Click on this function to view logs for your LLM.
2. `pipecat-modal`: This App contains the containers and logs used to run your `connect` endpoints and Pipecat pipelines. It will list two App Functions:
1. `fastapi_app`: This function is running the endpoints that your client will interact with and initiate starting a new pipeline (`/`, `/connect`, `/status`). Click on this function to see logs for each endpoint hit.
2. `bot_runner`: This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each run's logs.
# Modal + Pipecat Tips
- In most other Pipecat examples, we use `Popen` to launch the pipeline process from the `/connect` endpoint. In this example, we use a Modal function instead. This allows us to run the pipelines using a separately defined Modal image as well as run each pipeline in an isolated container.
- For the FastAPI and most common Pipecat Pipeline containers, a default `debian_slim` CPU-only should be all that's required to run. GPU containers are needed for self-hosted services.
- To minimize cold starts of the pipeline and reduce latency for users, set `min_containers=1` on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available.
- For next steps on running a self-hosted llm and reducing latency, check out all of [Modal's LLM examples](https://modal.com/docs/examples/vllm_inference).

View File

@@ -1 +0,0 @@
node_modules

View File

@@ -1,29 +0,0 @@
# JavaScript Implementation
Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/client/js/introduction).
## Setup
1. Deploy the Modal server. See the main [README](../../README).
2. Navigate to the `client/javascript` directory:
```bash
cd client/javascript
```
3. Modify the baseUrl in src/app.js to point to your deployed Modal endpoint
4. Install dependencies:
```bash
npm install
```
5. Run the client app:
```
npm run dev
```
6. Visit http://localhost:5173 in your browser.

View File

@@ -1,49 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<select id="bot-selector">
<option value="openai">OpenAI</option>
<option value="gemini">Gemini</option>
<option value="vllm">Llama</option>
</select>
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<div class="main-content">
<div class="bot-container">
<div id="bot-video-container"></div>
<audio id="bot-audio" autoplay></audio>
</div>
</div>
<div class="device-bar">
<div class="device-controls">
<select id="device-selector"></select>
<button id="mic-toggle-btn">Mute Mic</button>
</div>
</div>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css" />
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -1,21 +0,0 @@
{
"name": "client",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"keywords": [],
"author": "",
"license": "ISC",
"description": "",
"devDependencies": {
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0"
}
}

View File

@@ -1,376 +0,0 @@
/**
* Copyright (c) 20242025, Daily
*
* SPDX-License-Identifier: BSD 2-Clause License
*/
/**
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
*
* Requirements:
* - A running RTVI bot server (defaults to http://localhost:7860)
* - The server must implement the /connect endpoint that returns Daily.co room credentials
* - Browser with WebRTC support
*/
import { PipecatClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
/**
* ChatbotClient handles the connection and media management for a real-time
* voice and video interaction with an AI bot.
*/
class ChatbotClient {
constructor() {
// Initialize client state
this.pcClient = null;
this.setupDOMElements();
this.initializeClientAndTransport();
this.setupEventListeners();
}
/**
* Set up references to DOM elements and create necessary media elements
*/
setupDOMElements() {
// Get references to UI control elements
this.connectBtn = document.getElementById('connect-btn');
this.disconnectBtn = document.getElementById('disconnect-btn');
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
this.botVideoContainer = document.getElementById('bot-video-container');
this.deviceSelector = document.getElementById('device-selector');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
this.botAudio.playsInline = true;
document.body.appendChild(this.botAudio);
}
/**
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.disconnectBtn.addEventListener('click', () => this.disconnect());
// Populate device selector
this.pcClient.getAllMics().then((mics) => {
console.log('Available mics:', mics);
mics.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.textContent = device.label || `Microphone ${device.deviceId}`;
this.deviceSelector.appendChild(option);
});
});
this.deviceSelector.addEventListener('change', (event) => {
const selectedDeviceId = event.target.value;
console.log('Selected device ID:', selectedDeviceId);
this.pcClient.updateMic(selectedDeviceId);
});
// Handle mic mute/unmute toggle
const micToggleBtn = document.getElementById('mic-toggle-btn');
micToggleBtn.addEventListener('click', () => {
let micEnabled = this.pcClient.isMicEnabled;
micToggleBtn.textContent = micEnabled ? 'Unmute Mic' : 'Mute Mic';
this.pcClient.enableMic(!micEnabled);
// Add logic to mute/unmute the mic
if (micEnabled) {
console.log('Mic muted');
// Add code to mute the mic
} else {
console.log('Mic unmuted');
// Add code to unmute the mic
}
});
}
/**
* Set up the Pipecat client and Daily transport
*/
async initializeClientAndTransport() {
// Initialize the Pipecat client with a DailyTransport and our configuration
this.pcClient = new PipecatClient({
transport: new DailyTransport(),
enableMic: true, // Enable microphone for user input
enableCam: false,
callbacks: {
// Handle connection state changes
onConnected: () => {
this.updateStatus('Connected');
this.connectBtn.disabled = true;
this.disconnectBtn.disabled = false;
this.log('Client connected');
},
onDisconnected: () => {
this.updateStatus('Disconnected');
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
},
// Handle transport state changes
onTransportStateChanged: (state) => {
this.updateStatus(`Transport: ${state}`);
this.log(`Transport state changed: ${state}`);
if (state === 'connecting') {
window.startTime = Date.now();
}
if (state === 'ready') {
this.setupMediaTracks();
console.warn('TIME TO BOT READY:', Date.now() - window.startTime);
}
},
// Handle bot connection events
onBotConnected: (participant) => {
this.log(`Bot connected: ${JSON.stringify(participant)}`);
},
onBotDisconnected: (participant) => {
this.log(`Bot disconnected: ${JSON.stringify(participant)}`);
},
onBotReady: (data) => {
this.log(`Bot ready: ${JSON.stringify(data)}`);
this.setupMediaTracks();
},
// Transcript events
onUserTranscript: (data) => {
// Only log final transcripts
if (data.final) {
this.log(`User: ${data.text}`);
}
},
onBotTranscript: (data) => {
this.log(`Bot: ${data.text}`);
},
// Error handling
onMessageError: (error) => {
console.log('Message error:', error);
},
onMicUpdated: (data) => {
console.log('Mic updated:', data);
this.deviceSelector.value = data.deviceId;
},
onError: (error) => {
console.log('Error:', JSON.stringify(error));
},
},
});
// Set up listeners for media track events
this.setupTrackListeners();
await this.pcClient.initDevices();
window.client = this.pcClient;
}
/**
* Add a timestamped message to the debug log
*/
log(message) {
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
console.log(message);
}
/**
* Update the connection status display
*/
updateStatus(status) {
this.statusSpan.textContent = status;
this.log(`Status: ${status}`);
}
/**
* Check for available media tracks and set them up if present
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.pcClient) return;
// Get current tracks from the client
const tracks = this.pcClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
this.setupAudioTrack(tracks.bot.audio);
}
if (tracks.bot?.video) {
this.setupVideoTrack(tracks.bot.video);
}
}
/**
* Set up listeners for track events (start/stop)
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.pcClient) return;
// Listen for new tracks starting
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local) {
if (track.kind === 'audio') {
this.setupAudioTrack(track);
} else if (track.kind === 'video') {
this.setupVideoTrack(track);
}
this.log(
`Track started event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
} else {
this.log('Local mic unmuted');
}
});
// Listen for tracks stopping
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
if (participant.local) {
this.log('Local mic muted');
return;
}
this.log(
`Track stopped event: ${track.kind} from ${
participant?.name || 'unknown'
}`
);
});
}
/**
* Set up an audio track for playback
* Handles both initial setup and track updates
*/
setupAudioTrack(track) {
this.log('Setting up audio track');
// Check if we're already playing this track
if (this.botAudio.srcObject) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the audio source
this.botAudio.srcObject = new MediaStream([track]);
}
/**
* Set up a video track for display
* Handles both initial setup and track updates
*/
setupVideoTrack(track) {
this.log('Setting up video track');
const videoEl = document.createElement('video');
videoEl.autoplay = true;
videoEl.playsInline = true;
videoEl.muted = true;
videoEl.style.width = '100%';
videoEl.style.height = '100%';
videoEl.style.objectFit = 'cover';
// Check if we're already displaying this track
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const oldTrack = this.botVideoContainer
.querySelector('video')
.srcObject.getVideoTracks()[0];
if (oldTrack?.id === track.id) return;
}
// Create a new MediaStream with the track and set it as the video source
videoEl.srcObject = new MediaStream([track]);
this.botVideoContainer.innerHTML = '';
this.botVideoContainer.appendChild(videoEl);
}
/**
* Initialize and connect to the bot
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
async connect() {
try {
const botSelector = document.getElementById('bot-selector');
const selectedBot = botSelector.value;
// Initialize audio/video devices
this.log('Initializing devices...');
await this.pcClient.initDevices();
// Connect to the bot
this.log(`Connecting to bot: ${selectedBot}`);
await this.pcClient.connect({
// REPLACE WITH YOUR MODAL URL ENDPOINT
endpoint:
'https://<your-workspace>--pipecat-modal-fastapi-app.modal.run/connect',
requestData: {
bot_name: selectedBot,
},
});
this.log('Connection complete');
} catch (error) {
// Handle any errors during connection
console.error('Connection error:', error);
this.log(`Error connecting: ${JSON.stringify(error.message)}`);
this.log(`Error stack: ${error.stack}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.pcClient) {
try {
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
}
}
}
/**
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.pcClient) {
try {
// Disconnect the Pipecat client
await this.pcClient.disconnect();
// Clean up audio
if (this.botAudio.srcObject) {
this.botAudio.srcObject.getTracks().forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
// Clean up video
if (this.botVideoContainer.querySelector('video')?.srcObject) {
const video = this.botVideoContainer.querySelector('video');
video.srcObject.getTracks().forEach((track) => track.stop());
video.srcObject = null;
}
this.botVideoContainer.innerHTML = '';
} catch (error) {
this.log(`Error disconnecting: ${error.message}`);
}
}
}
}
// Initialize the client when the page loads
window.addEventListener('DOMContentLoaded', () => {
new ChatbotClient();
});

View File

@@ -1,135 +0,0 @@
body {
margin: 0;
padding: 20px;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.status-bar,
.device-bar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px;
background-color: #fff;
border-radius: 8px;
margin-bottom: 20px;
}
.controls,
.device-controls {
display: flex;
align-items: center;
gap: 10px; /* Adds spacing between elements */
}
.device-controls {
margin-left: auto;
}
.controls button,
.device-controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
border-radius: 4px;
cursor: pointer;
}
#bot-selector,
#device-selector {
padding: 8px 16px;
padding-right: 40px;
border: none;
border-radius: 4px;
background-color: #6c757d; /* Gray background */
color: white; /* White text */
cursor: pointer;
appearance: none; /* Removes default browser styling for dropdowns */
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='white'%3E%3Cpath d='M7 10l5 5 5-5z'/%3E%3C/svg%3E"); /* Custom arrow */
background-repeat: no-repeat;
background-position: right 8px center; /* Position the arrow */
}
#bot-selector:focus,
#device-selector:focus {
outline: none;
box-shadow: 0 0 4px rgba(0, 0, 0, 0.3); /* Add a subtle focus effect */
}
#connect-btn {
background-color: #4caf50;
color: white;
}
#disconnect-btn {
background-color: #f44336;
color: white;
}
#mic-toggle-btn {
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.main-content {
background-color: #fff;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.bot-container {
display: flex;
flex-direction: column;
align-items: center;
}
#bot-video-container {
width: 640px;
height: 360px;
background-color: #e0e0e0;
border-radius: 8px;
margin: 20px auto;
overflow: hidden;
display: flex;
align-items: center;
justify-content: center;
}
#bot-video-container video {
width: 100%;
height: 100%;
object-fit: cover;
}
.debug-panel {
background-color: #fff;
border-radius: 8px;
padding: 20px;
}
.debug-panel h3 {
margin: 0 0 10px 0;
font-size: 16px;
font-weight: bold;
}
#debug-log {
height: 200px;
overflow-y: auto;
background-color: #f8f8f8;
padding: 10px;
border-radius: 4px;
font-family: monospace;
font-size: 12px;
line-height: 1.4;
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 114 KiB

View File

@@ -1,307 +0,0 @@
"""modal_example.
This module shows a simple example of how to deploy a bot using Modal and FastAPI.
It includes:
- FastAPI endpoints for starting agents and checking bot statuses.
- Dynamic loading of bot implementations.
- Use of a Daily transport for bot communication.
"""
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import importlib
import os
from contextlib import asynccontextmanager
from typing import Any, Dict, Literal
import aiohttp
import modal
from fastapi import APIRouter, FastAPI, HTTPException
from fastapi.responses import JSONResponse, RedirectResponse
from pydantic import BaseModel
# container specifications for the FastAPI web server
web_image = (
modal.Image.debian_slim(python_version="3.13")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily]")
.add_local_dir("src", remote_path="/root/src")
)
# container specifications for the Pipecat pipeline
bot_image = (
modal.Image.debian_slim(python_version="3.13")
.apt_install("ffmpeg")
.pip_install_from_requirements("requirements.txt")
.pip_install("pipecat-ai[daily,elevenlabs,openai,silero,google]")
.add_local_dir("src", remote_path="/root/src")
)
app = modal.App("pipecat-modal", secrets=[modal.Secret.from_dotenv()])
router = APIRouter()
bot_jobs = {}
daily_helpers = {}
# Names of all supported bot implementations
# These correspond to the bot files in the src directory
BotName = Literal["openai", "gemini", "vllm"]
def cleanup():
"""Cleanup function to terminate all bot processes.
Called during server shutdown.
"""
for entry in bot_jobs.values():
func = modal.FunctionCall.from_id(entry[0])
if func:
func.cancel()
def get_bot_file(bot_name: BotName) -> str:
"""Retrieve the bot file name corresponding to the provided bot_name.
Args:
bot_name (BotName): The name of the bot (e.g., 'openai', 'gemini', 'vllm').
Returns:
str: The file name corresponding to the bot implementation.
Raises:
ValueError: If the bot name is invalid or not supported.
"""
# bot_implementation = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
bot_implementation = bot_name.lower().strip()
if not bot_implementation:
bot_implementation = "openai"
if bot_implementation not in ["openai", "gemini", "vllm"]:
raise ValueError(
f"Invalid BOT_IMPLEMENTATION: {bot_implementation}. Must be 'openai' or 'gemini' or 'vllm'"
)
return f"bot_{bot_implementation}"
def get_runner(path: str, bot_file: str) -> callable:
"""Dynamically import the run_bot function based on the bot name.
Args:
path (str): The path to the bot files (e.g., 'src').
bot_file (str): The file name of the bot implementation (e.g., 'openai', 'gemini', 'vllm').
Returns:
function: The run_bot function from the specified bot module.
Raises:
ImportError: If the specified bot module or run_bot function is not found.
"""
try:
# Dynamically construct the module name
module_name = f"{path}.{bot_file}"
# Import the module
module = importlib.import_module(module_name)
# Get the run_bot function from the module
return getattr(module, "run_bot")
except (ImportError, AttributeError) as e:
raise ImportError(f"Failed to import run_bot from {module_name}: {e}")
async def create_room_and_token() -> tuple[str, str]:
"""Create a Daily room and generate an authentication token.
This function checks for existing room URL and token in the environment variables.
If not found, it creates a new room using the Daily API and generates a token for it.
Returns:
tuple[str, str]: A tuple containing the room URL and the authentication token.
Raises:
HTTPException: If room creation or token generation fails.
"""
from pipecat.transports.services.helpers.daily_rest import DailyRoomParams
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
token = os.getenv("DAILY_SAMPLE_ROOM_TOKEN", None)
if not room_url:
room = await daily_helpers["rest"].create_room(DailyRoomParams())
if not room.url:
raise HTTPException(status_code=500, detail="Failed to create room")
room_url = room.url
token = await daily_helpers["rest"].get_token(room_url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room_url}")
return room_url, token
@app.function(image=bot_image, min_containers=1)
async def bot_runner(room_url, token, bot_name: BotName = "openai"):
"""Launch the provided bot process, providing the given room URL and token for the bot to join.
Args:
room_url (str): The URL of the Daily room where the bot and client will communicate.
token (str): The authentication token for the room.
bot_name (BotName): The name of the bot implementation to use. Defaults to "openai".
Raises:
HTTPException: If the bot pipeline fails to start.
"""
try:
path = "src"
bot_file = get_bot_file(bot_name)
run_bot = get_runner(path, bot_file)
print(f"Starting bot process: {bot_file} -u {room_url} -t {token}")
await run_bot(room_url, token)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start bot pipeline: {e}")
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan manager that handles startup and shutdown tasks.
- Creates aiohttp session
- Initializes Daily API helper
- Cleans up resources on shutdown
"""
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
class ConnectData(BaseModel):
"""Data provided by client to specify the bot pipeline.
Attributes:
bot_name (BotName): The name of the bot to connect to. Defaults to "openai".
"""
bot_name: BotName = "openai"
async def start(data: ConnectData):
"""Internal method to start a bot agent and return the room URL and token.
Args:
data (ConnectData): The data containing the bot name to use.
Returns:
tuple[str, str]: A tuple containing the room URL and token.
"""
room_url, token = await create_room_and_token()
launch_bot_func = modal.Function.from_name("pipecat-modal", "bot_runner")
function_id = launch_bot_func.spawn(room_url, token, data.bot_name)
bot_jobs[function_id] = (function_id, room_url)
return room_url, token
@router.get("/")
async def start_agent():
"""A user endpoint for launching a bot agent and redirecting to the created room URL.
This function retrieves the bot implementation from the environment,
starts the bot agent, and redirects the user to the room URL to
interact with the bot through a Daily Prebuilt Interface.
Returns:
RedirectResponse: A response that redirects to the room URL.
"""
bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
print(f"Starting bot: {bot_name}")
room_url, token = await start(ConnectData(bot_name=bot_name))
return RedirectResponse(room_url)
@router.post("/connect")
async def rtvi_connect(data: ConnectData) -> Dict[Any, Any]:
"""A user endpoint for launching a bot agent and retrieving the room/token credentials.
This function retrieves the bot implementation from the request, if provided,
starts the bot agent, and returns the room URL and token for the bot. This allows the
client to then connect to the bot using their own RTVI interface.
Args:
data (ConnectData): Optional. The data containing the bot name to use.
Returns:
Dict[Any, Any]: A dictionary containing the room URL and token.
"""
print(f"Starting bot: {data.bot_name}")
if data is None or not data.bot_name:
data.bot_name = os.getenv("BOT_IMPLEMENTATION", "openai").lower().strip()
room_url, token = await start(data)
return {"room_url": room_url, "token": token}
@router.get("/status/{fid}")
def get_status(fid: str):
"""Retrieve the status of a bot process by its function ID.
Args:
fid (str): The function ID of the bot process.
Returns:
JSONResponse: A JSON response containing the bot's status and result code.
Raises:
HTTPException: If the bot process with the given ID is not found.
"""
func = modal.FunctionCall.from_id(fid)
if not func:
raise HTTPException(status_code=404, detail=f"Bot with process id: {fid} not found")
try:
result = func.get(timeout=0)
return JSONResponse({"bot_id": fid, "status": "finished", "code": result})
except modal.exception.OutputExpiredError:
return JSONResponse({"bot_id": fid, "status": "finished", "code": 404})
except TimeoutError:
return JSONResponse({"bot_id": fid, "status": "running", "code": 202})
@app.function(image=web_image, min_containers=1)
@modal.concurrent(max_inputs=1)
@modal.asgi_app()
def fastapi_app():
"""Create and configure the FastAPI application.
This function initializes the FastAPI app with middleware, routes, and lifespan management.
It is decorated to be used as a Modal ASGI app.
"""
from fastapi.middleware.cors import CORSMiddleware
# Initialize FastAPI app
web_app = FastAPI(lifespan=lifespan)
web_app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include the endpoints from this file
web_app.include_router(router)
return web_app

View File

@@ -1,14 +0,0 @@
DAILY_API_KEY=
# determines which bot file to default to: 'openai', 'gemini', or 'vllm'
BOT_IMPLEMENTATION=openai
# needed for the openai bot pipeline
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
# needed for the gemini live bot pipeline
GOOGLE_API_KEY=
# needed if you modified the API Key for your self-hosted LLM
VLLM_API_KEY=

Some files were not shown because too many files have changed in this diff Show More