Compare commits

...

660 Commits

Author SHA1 Message Date
Kwindla Hultman Kramer
8f2941c575 Merge pull request #492 from pipecat-ai/khk/flush-more-audio
add calls to flush_audio for say() and rtvi action
2024-09-25 12:35:50 -07:00
Mark Backman
2703813e8a Merge pull request #496 from pipecat-ai/mb/azure-tts-inputs
Add Azure TTS input params
2024-09-25 14:38:01 -04:00
Mark Backman
521e152150 Merge pull request #495 from pipecat-ai/mb/elevenlabs-input-lang
Add language_code support for ElevenLabs TTS
2024-09-25 14:37:44 -04:00
Kwindla Hultman Kramer
3d43ad0f4d actually save the file 2024-09-25 10:59:00 -07:00
Kwindla Hultman Kramer
3621fceae2 fixes as noted by aleix 2024-09-25 09:19:58 -07:00
Aleix Conchillo Flaqué
e123f33c03 Merge pull request #506 from pipecat-ai/aleix/async-generator-processor
processors: add AsyncGeneratorProcessor
2024-09-25 00:04:09 -07:00
Aleix Conchillo Flaqué
b8713666c2 processors: add AsyncGeneratorProcessor 2024-09-25 00:01:04 -07:00
Aleix Conchillo Flaqué
cf0ab85e2c Merge pull request #505 from pipecat-ai/aleix/init-task-variables
initialize task variables and add minor description
2024-09-24 23:59:38 -07:00
Aleix Conchillo Flaqué
8502c7c801 Merge pull request #504 from pipecat-ai/aleix/rtvi-handle-frame
rtvi: add RTVIProcessor.handle_message()
2024-09-24 23:59:26 -07:00
Aleix Conchillo Flaqué
e89814dc6b Merge pull request #503 from pipecat-ai/aleix/end-cancel-task-frames
frames: add EndTaskFrame and CancelTaskFrame
2024-09-24 23:59:10 -07:00
Aleix Conchillo Flaqué
9461bacf0d pyproject: update fastapi to 0.115.0 2024-09-24 19:24:37 -07:00
Aleix Conchillo Flaqué
e276dcbab7 initialize task variables and add minor description 2024-09-24 19:19:00 -07:00
Aleix Conchillo Flaqué
1a3de0e819 rtvi: add RTVIProcessor.handle_message() 2024-09-24 19:12:06 -07:00
Aleix Conchillo Flaqué
ee3786fe15 frames: add EndTaskFrame and CancelTaskFrame 2024-09-24 19:10:22 -07:00
Aleix Conchillo Flaqué
31b5667cee frames: log text with [] so we can distinguish spaces better 2024-09-24 13:10:40 -07:00
Aleix Conchillo Flaqué
a483f1a083 rtvi: handle all actions from the action task 2024-09-24 10:48:15 -07:00
Aleix Conchillo Flaqué
2ecec1c9f8 Merge pull request #500 from pipecat-ai/aleix/rtvi-action-frames-task
RTVI action frames task
2024-09-24 10:13:43 -07:00
Aleix Conchillo Flaqué
08ac311971 rtvi: use task to process incoming action frames 2024-09-24 09:36:53 -07:00
Aleix Conchillo Flaqué
cb49b6a0d6 rtvi: add llm-text and tts-text server messages 2024-09-24 09:36:43 -07:00
Aleix Conchillo Flaqué
016da177db Merge pull request #499 from mercuryyy/main
Fix syntax error in deepgram.py
2024-09-24 09:10:05 -07:00
mercuryyy
b1e17ee347 Fix syntax error in deepgram.py 2024-09-24 07:45:29 -04:00
Mark Backman
8ee9621d66 Add setter functions 2024-09-23 21:12:01 -04:00
Mark Backman
8edee8155d Add input params to Azure TTS 2024-09-23 17:52:23 -04:00
chadbailey59
c262b272fa Added RTVIActionFrame (#464)
* added RTVIActionFrame

* server-sent events

* reverted log changes

* fixup
2024-09-23 14:51:17 -05:00
Aleix Conchillo Flaqué
9ef9c1c58a Merge pull request #497 from pipecat-ai/aleix/ruff-formater
introduce Ruff formatting
2024-09-23 10:42:54 -07:00
Aleix Conchillo Flaqué
c7ff79a652 processors: fix formatting string 2024-09-23 09:53:37 -07:00
Aleix Conchillo Flaqué
da81df5284 github: install dev-requirements when running tests 2024-09-23 09:53:37 -07:00
Aleix Conchillo Flaqué
a4420dc88b README: add vscode and emacs ruff instructions 2024-09-23 09:53:37 -07:00
Aleix Conchillo Flaqué
eeb8338dce introduce Ruff formatting 2024-09-23 09:53:37 -07:00
Cyril S.
dfa4ac81fd Implement Sentry instrumentation for performance and error tracking (#470)
* feat: Add Sentry support in FrameProcessor

This update add optional Sentry integration for performance tracking and error monitoring.

Key changes include:

- Add conditional Sentry import and initialization check
- Implement Sentry spans in FrameProcessorMetrics to measure TTFB (Time To First Byte) and processing time when Sentry is available
- Maintain existing metrics functionality with MetricsFrame regardless of Sentry availability

* feat: Enable metrics in DeepgramSTTService for Sentry

This commit enhances the DeepgramSTTService class to enable metrics generation for use with Sentry.

Key changes include:

1. Enable general metrics generation:
   - Implement `can_generate_metrics` method, returning True when VAD is enabled
   - This allows metrics to be collected and used by both Sentry and the metrics system in frame_processor.py

2. Integrate Sentry-compatible performance tracking:
   - Add start_ttfb_metrics and start_processing_metrics calls in the VAD speech detection handler
   - Implement stop_ttfb_metrics call when receiving transcripts
   - Add stop_processing_metrics for final transcripts

3. Enhance VAD support for metrics:
   - Add `vad_enabled` property to check VAD event availability
   - Implement VAD-based speech detection handler for precise metric timing

These changes enable detailed performance tracking via both Sentry and the general metrics system when VAD is active. This allows for better monitoring and analysis of the speech-to-text process, providing valuable insights through Sentry and any other metrics consumers in the pipeline.

* Update frame_processor.py

* Refactor to support flexible metrics implementation

- Modified the __init__ method to accept a metrics parameter that is either FrameProcessorMetrics or one of its subclasses
- Updated the metrics initialization to create an instance with the processor's name
- Moved all FrameProcessorMetrics-related logic to a new processors\metrics\base.py file

* Implement flexible metrics system with Sentry integration

1. Created a new metrics module in processors/metrics/

2. Implemented FrameProcessorMetrics base class in base.py:

3. Implemented SentryMetrics class in sentry.py:
   - Inherits from FrameProcessorMetrics
   - Integrates with Sentry SDK for advanced metrics tracking
   - Implements Sentry-specific span creation and management for TTFB and processing metrics
   - Handles cases where Sentry is not available or initialized
2024-09-23 08:44:14 -07:00
Lewis Wolfgang
ea16dca8aa Merge pull request #469 from pipecat-ai/lewis/remove_torch_dependency
Remove torch dependency for using silero_vad
2024-09-23 09:59:40 -04:00
Mark Backman
306632b29a Add language_code support for ElevenLabs TTS 2024-09-23 09:01:02 -04:00
Mark Backman
9a4e749c7c Merge pull request #491 from pipecat-ai/mb/elevenlabs-inputs
Add voice_settings and optimize_streaming_latency to ElevenLabs
2024-09-22 21:54:21 -04:00
Mark Backman
55c645c614 Add voice_settings and optimize_streaming_latency to ElevenLabs 2024-09-22 13:58:50 -04:00
Mark Backman
a1024bb365 Merge pull request #490 from pipecat-ai/mb/llm-rtvi-service-option
Add control frames for LLM param updates
2024-09-21 20:10:17 -04:00
Mark Backman
dfc82c3ba4 Merge pull request #486 from pipecat-ai/mb/llm-extra-params
Add extra input param to LLMs
2024-09-21 18:25:47 -04:00
Mark Backman
9e27a8aad0 Add control frames for LLM param updates 2024-09-21 00:02:58 -04:00
Mark Backman
c73111afea Add extra input param to LLMs 2024-09-21 00:01:25 -04:00
Kwindla Hultman Kramer
26a64afd8d Merge pull request #485 from pipecat-ai/khk/metrics-model-exclude-none
fixup for serialization issue
2024-09-20 18:24:19 -07:00
Kwindla Hultman Kramer
78a3f081de fixup for serialization issue 2024-09-20 18:21:06 -07:00
Mark Backman
e8f8a49646 Merge pull request #484 from pipecat-ai/mb/llm-input-params
Add input params for OpenAI, Anthropic, Together AI LLMs
2024-09-20 20:35:49 -04:00
Mark Backman
219304c5ee Added Changelog entries 2024-09-20 20:31:42 -04:00
Mark Backman
f3fd312b83 Add Together AI interruptible example 2024-09-20 20:21:19 -04:00
Mark Backman
357e66d64d Input params for Together AI LLM 2024-09-20 20:21:19 -04:00
Mark Backman
4fa1ea8c4b Input params for Anthropic LLM 2024-09-20 20:21:19 -04:00
Mark Backman
3b81cd462d Input params to OpenAI LLM 2024-09-20 20:21:19 -04:00
Aleix Conchillo Flaqué
14acf05a26 Merge pull request #480 from pipecat-ai/aleix/input-output-frames
introduce input/output audio and image frames
2024-09-20 14:44:37 -07:00
Mattie Ruth
58d9c84bc9 Merge pull request #474 from pipecat-ai/ruthless/improve-metrics-types-2
Ruthless/improve metrics types 2
2024-09-20 09:47:24 -04:00
Aleix Conchillo Flaqué
7e39d9ad3d introduce input/output audio and image frames
We now distinguish between input and output audio and image frames. We introduce
`InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and
`OutputImageRawFrame` (and other subclasses of those). The input frames usually
come from an input transport and are meant to be processed inside the pipeline
to generate new frames. However, the input frames will not be sent through an
output transport. The output frames can also be processed by any frame processor
in the pipeline and they are allowed to be sent by the output transport.
2024-09-19 23:11:03 -07:00
mattie ruth backman
a4edb3dab1 Cleanup on aisle METRICS. Note: See below, this is a breaking change
1. Fleshed out MetricsFrames and broke it into a proper set of types
2. Add model_name as a property to the AIService so that it can be
   automatically included in metrics and also remove that
   overhead from all the various services themselves

Breaking change!

Because of the types improvements, the MetricsFrame type has
changed. Each frame will have a list of metrics simlilar to before
except each item in the list will only contain one type of metric:
"ttfb", "tokens", "characters", or "processing". Previously these
fields would be in every entry but set to None if they didn't apply.

While this changes internal handling of the MetricsFrame, it does NOT
break the RTVI/daily messaging of metrics. That format remains the same.

Also. Remember to use model_name for accessing a service's current
model and set_model_name for setting it.
2024-09-19 21:30:34 -04:00
Mattie Ruth
ed409d0460 Merge pull request #478 from pipecat-ai/ruthless/get-tests-running
Ruthless/get tests running
2024-09-19 21:01:27 -04:00
mattie ruth backman
50b45ac2da get the test infrastructure running again
disable broken tests for now
2024-09-19 20:58:17 -04:00
Kwindla Hultman Kramer
29bcbc68c5 Merge pull request #479 from pipecat-ai/khk/small-fixes
fix small issues that crept into main
2024-09-19 17:25:27 -07:00
Kwindla Hultman Kramer
affbe9ac7d fix small issues that crept into main 2024-09-19 17:17:33 -07:00
Aleix Conchillo Flaqué
1790fa452f Merge pull request #436 from pipecat-ai/aleix/frameprocessor-single-task
introduce synchronous and asynchronous frame processors
2024-09-19 11:22:56 -07:00
Aleix Conchillo Flaqué
607a246572 updated CHANGELOG with sync/async frame processors 2024-09-19 01:32:17 -07:00
Aleix Conchillo Flaqué
4f1b06e6b2 pipeline: renamed ParallelTask to SyncParallelPipeline 2024-09-19 01:32:17 -07:00
Aleix Conchillo Flaqué
62e9a33a70 examples: use CartesiaHttpTTSService to synchronize frames 2024-09-19 01:32:17 -07:00
Aleix Conchillo Flaqué
3298f935ef services(fal,moondream): add missing **kwargs 2024-09-19 01:32:17 -07:00
Aleix Conchillo Flaqué
0e8f56c752 services: move TTSService push_stop_frames to AsyncTTSService 2024-09-19 01:32:15 -07:00
Aleix Conchillo Flaqué
8224538372 services(cartesia): added CartesiaHttpTTSService 2024-09-19 01:31:12 -07:00
Aleix Conchillo Flaqué
fbf6eef68f transports(base_output): wait for sink tasks before canceling audio/video tasks 2024-09-19 01:31:12 -07:00
Aleix Conchillo Flaqué
f078d156de frames: StartFrame is now a SystemFrame 2024-09-19 01:31:12 -07:00
Aleix Conchillo Flaqué
23d6eed5ea transports: input()/output() return subclass instead of base class 2024-09-19 01:31:12 -07:00
Aleix Conchillo Flaqué
0ed3d118d6 services(moondream); update revision to 2024-08-26 2024-09-19 01:31:12 -07:00
Aleix Conchillo Flaqué
337f048864 introduce synchronous and asynchronous frame processors
Pipecat has a pipeline-based architecture. The pipeline consists of frame
processors linked to each other. The elements travelling across the pipeline are
called frames.

To have a deterministic behavior the frames travelling through the pipeline
should always be ordered, except system frames which are out-of-band frames. To
achieve that, each frame processor should only output frames from a single task.

There are synchronous and asynchronous frame processors. The synchronous
processors push output frames from the same task that they receive input frames,
and therefore only pushing frames from one task. Asynchrnous frame processors
can have internal tasks to perform things asynchrnously (e.g. receiving data
from a websocket) but they also have a single task where they push frames from.
2024-09-19 01:31:10 -07:00
Mark Backman
6f3c421621 Merge pull request #475 from pipecat-ai/mb/tts-sample-rate
Add sample_rate setting to TTS services
2024-09-18 14:59:09 -04:00
Mark Backman
eadd68d40b Add sample_rate setting to TTS services 2024-09-18 14:50:20 -04:00
Lewis Wolfgang
71202e3cd5 Remove torch dependency for using silero_vad 2024-09-17 16:48:52 -04:00
Aleix Conchillo Flaqué
13a4a05388 Merge pull request #466 from pipecat-ai/aleix/elevenlabs-cartesia-close-websocket-first
services(cartesia,elevenlabs): close websocket before the receiving task
2024-09-16 23:55:28 -07:00
Aleix Conchillo Flaqué
20c019ae16 services(cartesia,elevenlabs): close websocket before the receiving task 2024-09-16 23:54:21 -07:00
Aleix Conchillo Flaqué
d9d6571c73 Merge pull request #465 from kunal-cai/ks--fix-ws
[Cartesia] Fix streaming truncation bug with Twilio Fast API WS
2024-09-16 17:17:13 -07:00
Kunal Shah
540cad4844 Undo sorting 2024-09-16 16:07:19 -07:00
Kunal Shah
0a26b650c0 Undo sorting 2024-09-16 16:06:25 -07:00
Kunal Shah
adaac003e5 [Cartesia] Fix streaming truncation bug with Twilio Fast API WS 2024-09-16 15:59:06 -07:00
Aleix Conchillo Flaqué
3d4f125071 Merge pull request #454 from pipecat-ai/aleix/initial-pipeline-clock-support
initial pipeline clock support
2024-09-13 13:51:04 -07:00
Aleix Conchillo Flaqué
bce87f8717 update CHANGELOG.md 2024-09-13 13:50:03 -07:00
Aleix Conchillo Flaqué
1fe940bd6b servceis(cartesia,elevenlabs): use word start times instead 2024-09-13 13:10:44 -07:00
Aleix Conchillo Flaqué
cb36a71381 fix some linting 2024-09-13 09:56:12 -07:00
Aleix Conchillo Flaqué
5acc4928fe examples: add 07d-interruptible-elevenlabs.py 2024-09-13 09:43:18 -07:00
Aleix Conchillo Flaqué
434493b8aa services(elevenlabs): implement word-by-word support through websockets 2024-09-13 09:31:35 -07:00
Aleix Conchillo Flaqué
f08b25dbb2 examples: assistant aggregator should always goes after transport 2024-09-12 00:37:34 -07:00
Aleix Conchillo Flaqué
3665734972 transports(output): initial sink clock synchronization 2024-09-12 00:37:34 -07:00
Aleix Conchillo Flaqué
a98d78cdea services(lmnt): change to subclass of AsyncTTSService 2024-09-12 00:37:34 -07:00
Aleix Conchillo Flaqué
80f6d74e80 services(cartesia): change to subclass of AsyncWordTTSService 2024-09-12 00:37:34 -07:00
Aleix Conchillo Flaqué
02d926e9bd services: create AsyncTTSService and AsyncWordTTSService 2024-09-12 00:31:48 -07:00
Aleix Conchillo Flaqué
7749692f72 processors: get pipeline clock from StartFrame 2024-09-12 00:31:48 -07:00
Aleix Conchillo Flaqué
7807cbeb39 pipeline(task): add a clock to the pipeline task 2024-09-12 00:31:48 -07:00
Aleix Conchillo Flaqué
72f231b327 frames: add a presentation timestamp (pts) to each frame 2024-09-12 00:31:48 -07:00
Aleix Conchillo Flaqué
3cbe97d346 clocks: added new BaseClock and SystemClock 2024-09-12 00:31:48 -07:00
Kwindla Hultman Kramer
b880e1a60e Merge pull request #448 from pipecat-ai/khk/aggregation-leading-space
fix for leading space in context aggregator strings
2024-09-10 09:57:35 -07:00
Aleix Conchillo Flaqué
886046e696 Merge pull request #445 from dleybz/patch-1
Update requirements.txt
2024-09-09 17:54:33 -07:00
Aleix Conchillo Flaqué
9106a5f8ae Merge pull request #449 from pipecat-ai/aleix/audio-out-bitrate
transports(daily): allow setting audio output bitrate (default 96kpbs)
2024-09-09 08:39:06 -07:00
Aleix Conchillo Flaqué
98286336bf transports(daily): allow setting audio output bitrate (default 96kpbs)
Fixes #388
2024-09-08 19:39:17 -07:00
Kwindla Hultman Kramer
081b001c8b fix for leading space in context aggregator strings 2024-09-07 16:42:52 -07:00
Danny D. Leybzon
c92531a02f Update requirements.txt
request.form() throws an error if you don't have python-multipart installed
2024-09-06 20:22:18 +02:00
Aleix Conchillo Flaqué
748a7af602 update CHANGELOG.md 2024-09-05 19:05:29 -07:00
Aleix Conchillo Flaqué
f4a0de6327 Merge pull request #444 from pipecat-ai/aleix/elevenlabs-streaming
services(elevenlabs): add elevenlabs package and use streaming
2024-09-05 11:24:12 -07:00
Aleix Conchillo Flaqué
e405d7af9f services(elevenlabs): add elevenlabs package and use streaming 2024-09-05 11:20:01 -07:00
Aashraya
51cd7fd285 twiliohandle interruption (#422)
* add interuption handler in twilio serializer

* fix autopep8

* revert ruff autoformatting

* address pr comments

* change interruption frame to user started frame in serializer

* remove overrrident handle interrupt

* remove unused import

* change userstarted to interuption frame
2024-09-02 11:06:38 -07:00
Aleix Conchillo Flaqué
aba5f89174 Merge pull request #437 from soof-golan/soof-obj-id-generation
Generate ids with itertools.count
2024-09-02 10:53:48 -07:00
Soof Golan
5c0f5a1613 Generate ids with itertools.count
Avoids the critical section with threading.Lock in favor of itertools.count.

`count` objects are threadsafe, and their critical section is implemented in C and provide better performance that Python level locking.
2024-09-02 15:39:58 +02:00
Aleix Conchillo Flaqué
7c342f7ba2 Merge pull request #433 from pipecat-ai/aleix/process-all-startframes
StartFrame should be the first frame every processor receives
2024-08-30 14:17:38 -07:00
Aleix Conchillo Flaqué
37e2388758 StartFrame should be the first frame every processor receives
Fixes #427
2024-08-29 22:43:44 -07:00
Aleix Conchillo Flaqué
05f0492a8d Merge pull request #421 from pipecat-ai/aleix/improve-multi-lingual-support
improve multi lingual support
2024-08-29 13:19:40 -07:00
Aleix Conchillo Flaqué
c0ac5c6ae8 services(lmnt): fix example and update README and CHANGELOG 2024-08-29 11:11:24 -07:00
Aleix Conchillo Flaqué
be923687fb processors(rtvi): user decices if bot interrupts on update config 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
5f32fb125d updated CHANGELOG.md 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
ae6fbb3146 services: just set model, voice, language independently 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
864768635a services: add voice and language to set_model() 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
d7c9679977 services: allow TTSModelUpdateFrame to also update language and voice 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
fedfc366f6 services(deepgram): fix strenum values 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
b3b39626e1 services: allow switching STT language and mdoel at the same time 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
4e0ece17b6 services: added support for setting STT model and language 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
fd3fdacdee transcriptions: added more languages 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
a253606d50 services(daily): on_joined now returns all data not only participant 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
568d9dc0a3 services(whisper): inherit from SegmentedSTTService 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
6629b853c5 services(deepgram): inherit from STTService instead of AsyncAIService 2024-08-29 11:00:03 -07:00
Aleix Conchillo Flaqué
3931cb3235 services(cartesia): allow setting language and language voices 2024-08-29 11:00:01 -07:00
Aleix Conchillo Flaqué
38cd86ad52 services: added language to transcription frames 2024-08-29 10:59:02 -07:00
Aleix Conchillo Flaqué
c0cdabf61d frames: adde TTSLanguageUpdateFrame and TTSLanguageVoicesUpdateFrame 2024-08-29 10:59:02 -07:00
Aleix Conchillo Flaqué
51270a96c5 frames: add language to transcription frames 2024-08-29 10:59:02 -07:00
Kwindla Hultman Kramer
84d72c0d5c Merge pull request #425 from pipecat-ai/khk/rtvi-together-function-calling
fixup type mismatches between rtvi data structures and together.py
2024-08-28 13:11:52 -07:00
Aleix Conchillo Flaqué
79aca8169a Merge pull request #391 from sharvil/pr/add-lmnt
LMNT TTS
2024-08-27 21:40:46 -07:00
Kwindla Hultman Kramer
b9d362bd62 fixup type mismatches between rtvi data structures and together.py 2024-08-27 17:39:21 -07:00
Sharvil Nanavati
87c4a1bee1 Move stop frame task creation into TTSService.start 2024-08-27 04:45:21 +00:00
Sharvil Nanavati
c979762b70 Handle cancellation, stopping, and restarting 2024-08-27 01:24:00 +00:00
Sharvil Nanavati
1d92fc3199 Merge branch 'main' into pr/add-lmnt 2024-08-24 10:07:52 -07:00
Sharvil Nanavati
8ac7fb1a67 Use a single long-lived Task to push TTSStoppedFrame 2024-08-24 16:18:07 +00:00
Sharvil Nanavati
60c3d33def Default LMNT to 24kHz, add example 2024-08-24 15:40:29 +00:00
Sharvil Nanavati
8a39d3f4eb services: add a generic mechanism to produce TTSStoppedFrames 2024-08-24 15:40:12 +00:00
Aleix Conchillo Flaqué
e038767b6f Merge pull request #413 from pipecat-ai/aleix/pipecat-0.0.41
prepare pipecat 0.0.41
2024-08-22 17:01:43 -07:00
Aleix Conchillo Flaqué
0c46b3e481 prepare pipecat 0.0.41 2024-08-22 11:50:20 -07:00
Aleix Conchillo Flaqué
d42f072ff5 examples: fix studypal errors and update requirements 2024-08-22 11:50:05 -07:00
Aleix Conchillo Flaqué
9b6f29c24a Merge pull request #414 from pipecat-ai/aleix/add-livekit-dependency
added livekit dependency
2024-08-22 10:55:43 -07:00
Aleix Conchillo Flaqué
873d5dc23f added livekit dependency 2024-08-22 10:54:18 -07:00
Aleix Conchillo Flaqué
6d141fd47f Merge pull request #396 from nulyang/feat/livekit-serializers
Add livekit audio serializers
2024-08-22 10:44:24 -07:00
Aleix Conchillo Flaqué
c6f6cb2947 Merge pull request #412 from pipecat-ai/aleix/fastapi-variable-clash
transports(fastapi): fix variable name clash
2024-08-22 09:50:23 -07:00
Aleix Conchillo Flaqué
0eb189ce7f transports(fastapi): fix variable name clash 2024-08-22 08:50:03 -07:00
Sharvil Nanavati
f4fd7b7028 LMNT TTS 2024-08-22 00:47:41 +00:00
Aleix Conchillo Flaqué
21de8e0a35 transport(out): log bot started/stopped speaking 2024-08-21 17:23:44 -07:00
Aleix Conchillo Flaqué
6f55d494bd frames: use VADParams type in VADParamsUpdateFrame 2024-08-21 17:23:12 -07:00
Aleix Conchillo Flaqué
d216edc567 Merge pull request #409 from aashsach/anthropic-empty-tool-argument
handle empty parameters for anthropic function calling
2024-08-21 16:14:51 -07:00
Aashraya
ec6063ecc4 system is not a list, it is handled and assisgned as string 2024-08-21 16:31:50 +05:30
Aashraya
40fe4ce6fb handle empty parameters for anthropic function calling 2024-08-21 15:49:36 +05:30
Aleix Conchillo Flaqué
31d87a4048 update CHANGELOG.md for 0.0.40 2024-08-20 11:48:40 -07:00
Aleix Conchillo Flaqué
ac8b171fa9 Merge pull request #406 from pipecat-ai/hush/cartesiaDocs
Hush/cartesia docs
2024-08-20 11:17:52 -07:00
James Hush
1f06d78213 github: remove *requirements.txt from tests.yaml 2024-08-20 11:16:25 -07:00
James Hush
28eba17df8 docs: update Cartesia references 2024-08-20 11:13:13 -07:00
Aleix Conchillo Flaqué
dfc2e62339 Merge pull request #405 from pipecat-ai/aleix/revert-dailysettings-aliases
Revert "transports(daily): use aliases in DailyDialinSettings"
2024-08-20 08:53:31 -07:00
Aleix Conchillo Flaqué
80c89a39c9 processors(rtvi): add support for client-ready message (fix) 2024-08-20 07:54:11 -07:00
Aleix Conchillo Flaqué
9d1c16e996 Revert "transports(daily): use aliases in DailyDialinSettings"
This reverts commit 47d375309d.
2024-08-20 07:52:35 -07:00
Aleix Conchillo Flaqué
86604c2353 examples(studypal): use aiohttp instead of requests 2024-08-19 18:11:30 -07:00
Aleix Conchillo Flaqué
8f31a02938 Merge pull request #403 from yashn35/studypal-demo
Add studypal
2024-08-19 17:39:19 -07:00
Aleix Conchillo Flaqué
47d375309d transports(daily): use aliases in DailyDialinSettings 2024-08-19 17:27:43 -07:00
Yash Narayan
980265ca97 Add studypal 2024-08-19 16:58:29 -07:00
Aleix Conchillo Flaqué
90479fff95 processors(rtvi): add set_client_ready() 2024-08-19 16:41:43 -07:00
Aleix Conchillo Flaqué
1ce1fcb0ce Merge pull request #401 from pipecat-ai/aleix/use-cartesia-in-more-examples
examples: use Cartesia TTS in most examples
2024-08-19 16:07:35 -07:00
Aleix Conchillo Flaqué
1a662376fc examples: use Cartesia TTS in most examples 2024-08-19 15:31:34 -07:00
Aleix Conchillo Flaqué
1d24f926ec Merge pull request #400 from pipecat-ai/aleix/rtvi-client-ready
processors(rtvi): add support for client-ready message
2024-08-19 10:53:49 -07:00
Aleix Conchillo Flaqué
4f2c37c940 processors(rtvi): add support for client-ready message 2024-08-19 10:33:18 -07:00
Aleix Conchillo Flaqué
042115a6bb processors(rtvi): update initial config when sending bot-ready message 2024-08-19 09:32:27 -07:00
Aleix Conchillo Flaqué
c9f1469b41 transports(daily/helpers): add server error message to the logs 2024-08-19 08:44:05 -07:00
Aleix Conchillo Flaqué
54c9f604c9 updated CHANGELOG with VADParamsUpdateFrame 2024-08-18 21:20:40 -07:00
Kwindla Hultman Kramer
56fbcd6562 Merge pull request #397 from pipecat-ai/khk/rtvi-vad-params
VADParamsUpdateFrame and handling thereof
2024-08-18 21:14:58 -07:00
Kwindla Hultman Kramer
e6b0500568 make VADAnalyzer:set_params() public 2024-08-18 21:11:18 -07:00
Aleix Conchillo Flaqué
41038b6673 Merge pull request #394 from pipecat-ai/aleix/fix-function-calling-examples
fix function calling examples
2024-08-18 20:55:29 -07:00
Aleix Conchillo Flaqué
26d03f26c9 services(openai, anthropic): a None result should not run inference 2024-08-18 20:48:43 -07:00
Aleix Conchillo Flaqué
f3a4e54996 function calling: start callback should have function name first 2024-08-18 20:48:20 -07:00
Kwindla Hultman Kramer
925e80bb20 VADParamsUpdateFrame and handling thereof 2024-08-18 13:34:46 -07:00
nulyang
9bda09b1a8 serializers(livekit): Add audio serializers 2024-08-18 23:40:32 +08:00
Aleix Conchillo Flaqué
ef0d0531fa services: moved request_image_frame() to LLMService 2024-08-17 23:59:38 -07:00
Aleix Conchillo Flaqué
6520f20ffe fix function calling examples 2024-08-17 23:32:39 -07:00
Aleix Conchillo Flaqué
ebc4e0924b Merge pull request #387 from pipecat-ai/aleix/update-reqs-081624
update pyproject.toml and remove requirements files
2024-08-17 23:29:47 -07:00
Aleix Conchillo Flaqué
9e7c0e6033 Merge pull request #390 from sharvil/pr/websocket-fix
transports(websocket): fix `_audio_buffer` being accidentally overwritten
2024-08-17 23:26:35 -07:00
Aleix Conchillo Flaqué
cf5720f316 update CHANGELOG.md 2024-08-17 21:00:32 -07:00
Kwindla Hultman Kramer
655b468269 Merge pull request #393 from pipecat-ai/khk/anthropic-tools-ordering
fix for out-of-order image messages in anthropic context
2024-08-17 15:07:27 -07:00
Kwindla Hultman Kramer
17f8c93e44 fix for out-of-order image messages in anthropic context 2024-08-17 14:47:29 -07:00
Aleix Conchillo Flaqué
5b4061b0d5 processors(rtvi): fix send_error() 2024-08-16 23:46:57 -07:00
Aleix Conchillo Flaqué
6ce0227e98 processors(rtvi): error-response should always include and error 2024-08-16 23:23:55 -07:00
Aleix Conchillo Flaqué
a583a28850 processors(rtvi): error message should use error field 2024-08-16 23:22:27 -07:00
Aleix Conchillo Flaqué
32daf65adc processors(rtvi): send to the client if errors are fatal 2024-08-16 23:17:55 -07:00
Aleix Conchillo Flaqué
e22c80610e frames: add new FatalErrorFrame 2024-08-16 23:17:31 -07:00
Sharvil Nanavati
374f1e7e01 transports(websocket): fix _audio_buffer being accidentally overwritten
`BaseOutputTransport` declares an `_audio_buffer` instance variable.
`WebsocketServerOutputTransport` accidentally reuses that variable
internally assuming it's class-local and not inherited.

This PR renames the variable in `WebsocketServerOutputTransport`
to avoid the name collision.
2024-08-17 05:28:05 +00:00
Aleix Conchillo Flaqué
d2dfa93bf1 processors(rtvi): send bot-ready when participant joins 2024-08-16 13:58:21 -07:00
Aleix Conchillo Flaqué
fa8c6712c6 transports(daily): fix multiple DailyTransport initialization 2024-08-16 13:32:34 -07:00
Aleix Conchillo Flaqué
4c2b84cb4d update pyproject.toml and remove requirements files 2024-08-16 09:28:46 -07:00
Aleix Conchillo Flaqué
b57c9d569b Merge pull request #352 from pipecat-ai/aleix/rtvi-0.1
processors(rtvi): rtvi 0.1 message protocol
2024-08-15 17:35:50 -07:00
Aleix Conchillo Flaqué
f0e50ba000 Merge pull request #336 from nulyang/fix/azure-transcriptionframe
services(azure): fix TranscriptionFrame parameter type
2024-08-15 17:08:56 -07:00
Mattie Ruth
4a6638f749 Merge pull request #385 from pipecat-ai/mrkb/anthropic-beta-caching
Mrkb/anthropic beta caching
2024-08-15 18:26:51 -04:00
Aleix Conchillo Flaqué
31577252f3 processors(rtvi): handle ErrorFrames 2024-08-15 15:23:31 -07:00
Aleix Conchillo Flaqué
5d71c50080 transports(daily): make sure audio_in_task exists before canceling 2024-08-15 15:23:07 -07:00
Aleix Conchillo Flaqué
981269d594 pipeline(task): process ErrorFrame in same task and stop pipeline task 2024-08-15 15:22:40 -07:00
mattie ruth backman
848db985fc bump anthropic in 3.10 requirements 2024-08-15 16:51:48 -04:00
mattie ruth backman
d5d8e31447 add cache tokens to metrics event 2024-08-15 16:51:48 -04:00
Aleix Conchillo Flaqué
66670a2370 Merge pull request #384 from pipecat-ai/aleix/enable-prompt-caching-frames
services(anthropic): allow setting enable prompt caching via frame
2024-08-15 13:26:39 -07:00
Aleix Conchillo Flaqué
5637f349c6 services(anthropic): allow setting enable prompt caching via frame 2024-08-15 12:43:29 -07:00
Aleix Conchillo Flaqué
93248e1d00 Merge pull request #382 from pipecat-ai/khk/anthropic-beta-caching
Support for Anthropic prompt caching beta
2024-08-15 12:34:54 -07:00
Kwindla Hultman Kramer
187769357f update version number of anthropic dependency 2024-08-15 12:28:41 -07:00
Aleix Conchillo Flaqué
5be6422cc8 Revert "processors(rtvi): process options in the order they are defined"
This reverts commit 61ac83e2d9.
2024-08-15 11:51:00 -07:00
Aleix Conchillo Flaqué
8670b2d994 utils: add match_endofsentence and use it in processors 2024-08-15 11:26:25 -07:00
Aleix Conchillo Flaqué
0bc6db428d processors(rtvi): implement bot-started-speaking and bot-stopped-speaking 2024-08-15 11:05:10 -07:00
Aleix Conchillo Flaqué
67d565930e services: send TTSStartFrame/TTSStopFrame when really needed 2024-08-15 11:05:10 -07:00
Aleix Conchillo Flaqué
b2a7ff6fd3 processors(rtvi): all transport messages should be urgent 2024-08-15 11:05:10 -07:00
Aleix Conchillo Flaqué
425a730d7c transports(base_output): send urgent transport messages immediately 2024-08-15 11:05:10 -07:00
Aleix Conchillo Flaqué
84c5709722 frames: add urgent field to TransportMessageFrame 2024-08-15 11:05:10 -07:00
Kwindla Hultman Kramer
94deec01c9 okay, both files now 2024-08-15 00:57:10 -07:00
Kwindla Hultman Kramer
6e0dd4a779 Anthropic beta prompt caching 2024-08-15 00:54:43 -07:00
Kwindla Hultman Kramer
14bde340dd Merge pull request #381 from pipecat-ai/khk/anthropic-fixup-0814.2
Fixup anthropic context set_messages
2024-08-14 23:34:31 -07:00
Kwindla Hultman Kramer
253765c611 and fixing anthropic demos 2024-08-14 23:14:20 -07:00
Kwindla Hultman Kramer
2b26d7182f replaces 379 2024-08-14 22:40:09 -07:00
Aleix Conchillo Flaqué
61ac83e2d9 processors(rtvi): process options in the order they are defined 2024-08-14 22:26:49 -07:00
Aleix Conchillo Flaqué
d5c7b28cad Merge pull request #380 from pipecat-ai/aleix/rtvi-0.1-context-aggregators-updates
processors(aggregators): multiple LLM aggregators updates
2024-08-14 20:43:50 -07:00
Aleix Conchillo Flaqué
959580a708 processors(logger): fix linting 2024-08-14 20:39:24 -07:00
Aleix Conchillo Flaqué
3a5cd17ea3 processors(aggregators): multiple LLM aggregators updates 2024-08-14 20:23:18 -07:00
Kwindla Hultman Kramer
b78981bb9d Merge pull request #374 from pipecat-ai/khk/together
Together.ai service implementation with Llama 3.1 function calling
2024-08-14 17:29:07 -07:00
Kwindla Hultman Kramer
a6d90b0a00 linting fixes to anthropic.py 2024-08-14 17:27:00 -07:00
Aleix Conchillo Flaqué
67016492f2 transports(daily/helpers): add delete_room_from_url() 2024-08-14 17:14:02 -07:00
Aleix Conchillo Flaqué
2c38089527 processors(rtvi): handle incoming messages in a separate task 2024-08-14 15:34:02 -07:00
Kwindla Hultman Kramer
48f68ba6dc Service for together.ai, including Llama 3.1 function calling support 2024-08-13 15:01:54 -07:00
Aleix Conchillo Flaqué
574df4ba3d processors(rtvi): make sure to send bot-ready when transport is joined 2024-08-13 13:25:15 -07:00
Aleix Conchillo Flaqué
49ca16d125 pipeline(task): only send initial metrics frames if metrics enabled 2024-08-13 12:22:37 -07:00
Aleix Conchillo Flaqué
87525b085e processors(rtvi): linting and make send_error() public 2024-08-13 11:21:51 -07:00
Aleix Conchillo Flaqué
6b53c6add3 transports(daily): DailyTransport default DailyParams 2024-08-13 11:13:18 -07:00
Kwindla Hultman Kramer
29ca1b7855 Anthropic tool use core Pipecat pieces refactored (#369)
* processors(rtvi): rtvi 0.1 message protocol

* added a single function call handler

* wip - function calling

* fixup

* fixup

* fixup

* processors(rtvi): no need for configure_on_start()

* processors(rtvi): add new option values if they haven't been set yet

* Add the model name to the LLM usage metrics

* wip - anthropic tool calling

* still wip - anthropic tool use and vision

* anthropic tools and vision working

* anthropic tool calling and vision

* Cartesia error handling

* Anthropic tool use core Pipecat pieces refactored as per plan

* aleix has good ideas

* Usage metrics for Anthropic LLMs

* fix function call result state not getting cleared bug

* Pass **kwargs through from AnthropicLLMService constructor

* about to tinker with anthropic

* added openai function calling

* openai function calling

* fixup

---------

Co-authored-by: Aleix Conchillo Flaqué <aleix@daily.co>
Co-authored-by: Chad Bailey <chadbailey@gmail.com>
Co-authored-by: mattie ruth backman <mattieruth@gmail.com>
Co-authored-by: chadbailey59 <chadbailey59@users.noreply.github.com>
2024-08-13 13:01:24 -05:00
Aleix Conchillo Flaqué
a42d0c9907 processors(rtvi): add interrupt_bot() 2024-08-13 09:22:43 -07:00
marcus-daily
8bc6ceaa3d Fixing pep8 2024-08-13 15:32:23 +01:00
marcus-daily
0b8a1ab5d1 Handle describe-actions message 2024-08-13 15:32:23 +01:00
Brian Hill
358c287db2 chore: Enable build without git 2024-08-12 11:38:41 -04:00
Brian Hill
2e68453655 Merge pull request #371 from pipecat-ai/cbrianhill/allow-build-without-git
chore: Enable build without git
2024-08-12 10:15:55 -04:00
Brian Hill
89b8a9de7d chore: Enable build without git 2024-08-12 09:36:25 -04:00
Aleix Conchillo Flaqué
c4c2058df9 processors(rtvi): handle frames pushed from outside in order 2024-08-11 23:09:11 -07:00
Aleix Conchillo Flaqué
0d85c0085f processors(rtvi): interrupt the bot if a new config is received 2024-08-11 23:09:11 -07:00
Mattie Ruth
6fa8a8f84f Merge pull request #365 from pipecat-ai/ruthless/metrics 2024-08-11 20:35:05 -04:00
mattie ruth backman
a97775bff3 Add the model name to the LLM usage metrics 2024-08-11 12:08:46 -04:00
Aleix Conchillo Flaqué
32640e054d processors(rtvi): add new option values if they haven't been set yet 2024-08-10 21:25:39 -07:00
Aleix Conchillo Flaqué
aa42da5658 processors(rtvi): no need for configure_on_start() 2024-08-10 21:25:21 -07:00
nulyang
900a94a825 services(azure): fix TranscriptionFrame parameter type 2024-08-10 13:00:03 +08:00
Aleix Conchillo Flaqué
c37552de70 processors(rtvi): add support for action responses 2024-08-09 18:12:37 -07:00
Aleix Conchillo Flaqué
916b37926c processors(rtvi): rtvi 0.1 message protocol 2024-08-09 17:24:38 -07:00
Aleix Conchillo Flaqué
2b76c3c15a update macos-py3.10-requirements 2024-08-09 17:18:30 -07:00
Aleix Conchillo Flaqué
cedd7dde18 update linux-py3.10-requirements.txt 2024-08-09 17:14:46 -07:00
Lewis Wolfgang
d088608d8e Merge pull request #340 from pipecat-ai/lewis/silero-vad-via-pip
Install Silero VAD via pip
2024-08-09 13:27:29 -04:00
Aleix Conchillo Flaqué
06ee29bb8b Merge pull request #359 from pipecat-ai/aleix/twilio-elevenlabs-sample-rates
twilio and elevenlabs sample rates
2024-08-09 09:38:35 -07:00
Aleix Conchillo Flaqué
d255e954d6 services(elevenlabs): allow specifying output_format 2024-08-09 09:38:20 -07:00
Aleix Conchillo Flaqué
6a7ab6b8ac serializers(twilio): allow specifying input and output sample rates 2024-08-09 09:37:51 -07:00
Aleix Conchillo Flaqué
45b18cc0b1 Merge pull request #358 from pipecat-ai/aleix/daily-create-room-exp-fixes
transports(daily): fixed create_room expirations
2024-08-09 09:37:01 -07:00
Aleix Conchillo Flaqué
0479431f0a Merge pull request #357 from pipecat-ai/aleix/daily-on-participant-updated
transports(daily): added on_participant_updated event
2024-08-09 09:36:46 -07:00
Aleix Conchillo Flaqué
ec58dbd791 transports(daily): added on_participant_updated event
Fixes #353
2024-08-09 09:36:24 -07:00
Aleix Conchillo Flaqué
91de68aab3 Merge pull request #355 from pipecat-ai/aleix/usage-metrics-update
processors(base): add start_llm_usage_metrics and start_tts_usage_met…
2024-08-09 09:35:36 -07:00
Aleix Conchillo Flaqué
85efc30145 Merge pull request #356 from pipecat-ai/aleix/eleven_turbo_v2_5
services(elevenlabs): update default model to eleven_turbo_v2_5
2024-08-09 09:34:47 -07:00
Aleix Conchillo Flaqué
0032594f21 transports(daily): fixed create_room expirations
Fixes #348
2024-08-08 22:04:22 -07:00
Aleix Conchillo Flaqué
829fdc5679 services(elevenlabs): update default model to eleven_turbo_v2_5
Fixes #349
2024-08-08 21:38:18 -07:00
Aleix Conchillo Flaqué
22e176e329 processors(base): add start_llm_usage_metrics and start_tts_usage_metrics 2024-08-08 16:46:56 -07:00
Lewis Wolfgang
826a70a137 Merge pull request #354 from pipecat-ai/lewis/delete_room_by_name
Add delete_room_by_name to DailyRESTHelper
2024-08-08 17:09:21 -04:00
Lewis Wolfgang
dd0ea674af Treat 404 (room not found) as a success for deletion 2024-08-08 16:57:58 -04:00
Lewis Wolfgang
a4761b8921 Add delete_room_by_name to DailyRESTHelper 2024-08-08 16:31:01 -04:00
chadbailey59
3958bb7903 Additional LLM and TTS metrics (#343)
* added llm and tts usage metrics

* Metrics debug logging

* cleanup
2024-08-07 08:55:51 -05:00
Aleix Conchillo Flaqué
83a037a7ce Merge pull request #345 from pipecat-ai/aleix/base-output-render-time-fixes
transports(base_output): improve render sleep computation
2024-08-06 17:30:47 -07:00
Aleix Conchillo Flaqué
a3eb8337a6 Merge pull request #342 from pipecat-ai/aleix/base-output-transport-push-audio
transport(base_output): push audio downstream
2024-08-06 17:30:32 -07:00
Aleix Conchillo Flaqué
541072f8e0 transports(base_output): improve render sleep computation 2024-08-06 17:20:41 -07:00
Aleix Conchillo Flaqué
881248cbd6 transport(base_output): push audio downstream 2024-08-05 14:00:09 -07:00
Aleix Conchillo Flaqué
d4979f5e64 Merge pull request #337 from pipecat-ai/aleix/audio-video-sync-and-gstreamer
audio/video sync and gstreamer
2024-08-05 09:28:11 -07:00
Aleix Conchillo Flaqué
4133cd03bb processors(gstreamer): add clock_sync property 2024-08-05 09:23:25 -07:00
Lewis Wolfgang
9f07c3ca27 Fly.io example: remove step to cache silero models.
No longer necessary.
2024-08-05 10:12:35 -04:00
Lewis Wolfgang
b20bacb9ed Remove no longer needed code 2024-08-05 10:10:39 -04:00
Lewis Wolfgang
97cfbfee1d Install silero via pip 2024-08-05 10:01:27 -04:00
Aleix Conchillo Flaqué
fa7c941792 examples(gstreamer): add new GStreamer examples 2024-08-04 12:29:36 -07:00
Aleix Conchillo Flaqué
4738879f32 processors(gstreamer): add new GStreamerPipelineSource 2024-08-04 12:29:34 -07:00
Aleix Conchillo Flaqué
d5d88f756a transport(output): improve audio and image handling for video use cases 2024-08-04 12:29:08 -07:00
Aleix Conchillo Flaqué
65b136bf15 Merge pull request #334 from pipecat-ai/aleix/cleanup-examples-remove-requests
cleanup examples and remove requests
2024-08-01 22:05:01 -07:00
Aleix Conchillo Flaqué
bee0b238e4 examples(storytelling-chatbot): include package-lock.json 2024-08-01 18:23:30 -07:00
Aleix Conchillo Flaqué
c891168ffb services: revert optional aiohttp.ClientSession 2024-08-01 18:22:56 -07:00
Aleix Conchillo Flaqué
6376c2f6aa transport(websocket): fix cancel 2024-08-01 18:09:16 -07:00
Aleix Conchillo Flaqué
4d9b7cdd61 DailyRESTHelper now receives an aiohttp client session 2024-08-01 18:08:57 -07:00
Aleix Conchillo Flaqué
8263d1dd6f update CHANGELOG with latest changes 2024-07-31 23:44:07 -07:00
Aleix Conchillo Flaqué
faf41c0b36 services: ignore yielded None values 2024-07-31 23:41:03 -07:00
Aleix Conchillo Flaqué
27a09c0b2c cleanup examples and remove requests library 2024-07-31 23:39:51 -07:00
Aleix Conchillo Flaqué
3db7f6a284 Merge pull request #333 from pipecat-ai/aleix/allow-internal-http-sessions-rebased
services: allow internal http sessions if none is given
2024-07-31 21:57:00 -07:00
Aleix Conchillo Flaqué
3bfeb5b5ef services: allow internal http sessions if none is given 2024-07-31 21:56:19 -07:00
Aleix Conchillo Flaqué
62a7a555b5 Merge pull request #330 from pipecat-ai/aleix/stop-and-cancel-are-different
EndFrame tries to end gracefully CancelFrame cancels tasks
2024-07-31 15:51:29 -07:00
Aleix Conchillo Flaqué
d60e99a043 examples(06a-image-sync): make sure frames go downstream 2024-07-30 11:41:58 -07:00
Aleix Conchillo Flaqué
77723b34c7 EndFrame tries to end gracefully CancelFrame cancels tasks 2024-07-30 11:41:19 -07:00
Aleix Conchillo Flaqué
c466d34a06 Merge pull request #328 from pipecat-ai/aleix/rtvi-towards-custom-pipelines
processors(rtvi): refactor to allow future custom pipelines
2024-07-29 15:07:57 -07:00
Aleix Conchillo Flaqué
f816897833 Merge pull request #327 from pipecat-ai/aleix/bot-start-stop-speaking-frames
bot start stop speaking frames
2024-07-27 17:21:23 -07:00
Aleix Conchillo Flaqué
c1e8a5e522 processors(rtvi): refactor to allow future custom pipelines 2024-07-26 10:26:36 -07:00
Aleix Conchillo Flaqué
76aca32f2e transport(output): emit new bot start|stop speaking frames 2024-07-25 14:50:33 -07:00
Aleix Conchillo Flaqué
7e31b2a795 processors(user_idle): use user speaking instead of interruption frames 2024-07-25 14:47:56 -07:00
Aleix Conchillo Flaqué
028e38a86b Merge pull request #326 from pipecat-ai/aleix/rtvi-bot-ready-fixes
rtvi: send bot-ready when pipeline is ready and first participant joins
2024-07-25 11:39:14 -07:00
Aleix Conchillo Flaqué
8cf7649855 processors(rtvi): send bot-ready when pipeline AND first participant joins 2024-07-25 11:25:51 -07:00
Aleix Conchillo Flaqué
64f5119b08 transports(base): allow registering event handlers without decorators 2024-07-25 11:24:24 -07:00
Aleix Conchillo Flaqué
4d606aefb3 update CHANGELOG 2024-07-25 09:57:01 -07:00
Ankur Duggal
4bafdaa04d Deepgram Adjustments (#313) 2024-07-25 09:51:51 -07:00
Aleix Conchillo Flaqué
5afe1abf82 Merge pull request #323 from pipecat-ai/aleix/base-input-handle-incoming-interruptions
transports(inputs): handle start/stop interruption frames
2024-07-24 15:16:18 -07:00
Aleix Conchillo Flaqué
f066d50b98 transports(inputs): handle start/stop interruption frames 2024-07-24 15:15:09 -07:00
Aleix Conchillo Flaqué
91103e21cc github(publish_test): download tags and depth to 100 2024-07-24 14:49:09 -07:00
Aleix Conchillo Flaqué
f44dabcd65 Merge pull request #322 from pipecat-ai/aleix/base-input-transport-system-frames-fix
transports(inputs): don't queue incoming system frames
2024-07-24 14:44:18 -07:00
Aleix Conchillo Flaqué
0fd2fca231 frames: StartFrame is now a control frame 2024-07-24 14:42:59 -07:00
Aleix Conchillo Flaqué
5bb64098e7 transports(inputs): don't queue incoming system frames 2024-07-24 14:35:00 -07:00
Aleix Conchillo Flaqué
3fc85e75e0 Merge pull request #320 from pipecat-ai/aleix/req-updates-072324
update project requirements and dependencies
2024-07-23 17:45:18 -07:00
Aleix Conchillo Flaqué
3f61ea16b7 update project requirements and dependencies 2024-07-23 17:35:47 -07:00
Aleix Conchillo Flaqué
4b393092b5 Merge pull request #319 from pipecat-ai/aleix/daily-completion-callbacks-0.0.39-fix
transports(daily): fix completion callbacks handling
2024-07-23 15:27:26 -07:00
Aleix Conchillo Flaqué
b583f5162b transports(daily): fix completion callbacks handling 2024-07-23 15:25:59 -07:00
Aleix Conchillo Flaqué
060a22f395 github: only run publish_test manually
We need to run this manually to avoid test.pypi.org project size limits.
2024-07-23 14:19:24 -07:00
Aleix Conchillo Flaqué
d3e85355f1 Merge pull request #318 from pipecat-ai/aleix/prepare-0.0.38
update CHANGELOG for 0.0.38
2024-07-23 14:12:01 -07:00
Aleix Conchillo Flaqué
83e730b768 update CHANGELOG for 0.0.38 2024-07-23 14:10:10 -07:00
Aleix Conchillo Flaqué
5fcc96446c Merge pull request #317 from pipecat-ai/aleix/silero-repo-params
vad(silero): expose cache and repo parameters
2024-07-23 12:13:20 -07:00
Aleix Conchillo Flaqué
ad88925154 vad(silero): expose cache and repo parameters 2024-07-23 12:12:28 -07:00
Aleix Conchillo Flaqué
0a6ddbf15c Merge pull request #316 from pipecat-ai/aleix/metrics-improvements
metrics improvements
2024-07-23 11:23:57 -07:00
Aleix Conchillo Flaqué
08e0722d97 fix initial metrics format 2024-07-23 11:23:03 -07:00
Aleix Conchillo Flaqué
05d4fba551 processors(rtvi): send initial empty metrics 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
f41c2b3c9f transports(daily): don't send empty metrics 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
69f64899fe pipeline: add send_initial_empty_metrics flag 2024-07-23 11:22:41 -07:00
Aleix Conchillo Flaqué
33f0865430 Merge pull request #315 from pipecat-ai/aleix/stop-transcription-error
transports(daily): wait until start|stop_transcription are finished
2024-07-23 11:18:59 -07:00
Aleix Conchillo Flaqué
ad5b9202ab transports(daily): wait until start|stop_transcription are finished
Fixes #305
2024-07-22 22:59:30 -07:00
Aleix Conchillo Flaqué
1676693091 Merge pull request #314 from pipecat-ai/aleix/transcription-timestamps
services: transcription timestamp should use ISO8601 format
2024-07-22 22:43:01 -07:00
Aleix Conchillo Flaqué
0852b50b8f services: transcription timestamp should use ISO8601 format 2024-07-22 22:40:28 -07:00
Aleix Conchillo Flaqué
eb998aa502 Merge pull request #312 from pipecat-ai/aleix/rtvi-support
RTVI support
2024-07-22 16:58:40 -07:00
Aleix Conchillo Flaqué
6dab0e9de7 update CHANGELOG for 0.0.37 2024-07-22 16:00:30 -07:00
Aleix Conchillo Flaqué
95ff1d141c update CHANGELOG with RTVIProcessor 2024-07-22 16:00:26 -07:00
Aleix Conchillo Flaqué
87bc8a9da6 examples: remove RTVI since there are full demos elsewhere 2024-07-22 15:53:39 -07:00
Aleix Conchillo Flaqué
087fe9a537 services(cartesia): fix TTFB 2024-07-22 15:30:16 -07:00
Aleix Conchillo Flaqué
c1170260b5 processors(rtvi): use generic LLM and TTS names 2024-07-22 15:27:33 -07:00
Aleix Conchillo Flaqué
65cdf50774 processors(rtvi): fix task cleanup 2024-07-22 15:01:45 -07:00
Aleix Conchillo Flaqué
9233bb490c processors(rtvi): add support for "tts-text" messages 2024-07-22 11:40:17 -07:00
Aleix Conchillo Flaqué
43932220f7 processors(rtvi): use only user-transcription 2024-07-22 09:40:16 -07:00
Aleix Conchillo Flaqué
cea4d1894e processors(rtvi): change voice before LLM updates 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
80baa0358d processors(rtvi): lable is now rtvi 2024-07-22 09:32:18 -07:00
Chad Bailey
5d73db53a0 initial pseudo function calling 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
302ea90dce processors(rtvi): messages now require an id 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
37b04ed283 processors(rtvi): use send a type=response as command responses 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
be6995cfdf processors(rtvi): renamed realtime-ai to rtvi 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
dfbc11300c processors(realtime-ai): use label instead of tag 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
82d539d174 processors(realtime-ai): add support for interrupting the bot 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
6e00f31014 updated CHANGELOG with new frames and realtime-ai changes 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
a46ac3cc92 examples: moved 18-realtime-ai.py to examples/realtime-ai 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
6fbf98d8e2 processors(realtime-ai): llm-context now uses a data field 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f094c42728 processors(realtime-ai): add transcription messages 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
13827e1282 processors(realtime-ai): send a successful response for every command 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
32170b47d9 processors(realtime-ai): add user-[start|stopped]-speaking messages 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
09c05354c2 processors(realtime-ai): fix voice initialization 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
b0b1475563 processors(realtime-ai): add support making TTS to speak 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
b85dd7283a processors(realtime-ai): add support for appending to the LLM context 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
846ae765e5 services(TTSService): fix sentence cleanup 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
4c629e538e processors(realtime-ai): add assistant before output transport
Cartesia can do word-to-word output instead of full sentences. This means that
for properly adding things into the context we need to add it before the
transport, otherwise some words might be lost.
2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f6e22bb3b9 processors(realtime-ai): add silero vad to the transport 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
46a048d7f6 processors(realtime-ai): allow default setup to be None 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
bd9f4eea06 processors(realtime-ai): provide default values 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
0a672e61e2 processors(realtime-ai): update it to use groq by default 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
29a8530221 processors(realtime-ai): add support for updating config (model, voice...) 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
3e738642a7 processors(realtime-ai): add support for getting/updating LLM context 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
f551f55f03 examples: add new foundational/18-realtime-ai.py 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
9f012c8002 processors: add new RealtimeAIProcessor 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
0a69a9e5ef transport(daily): also accept TransportMessageFrame 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
194790183a processor: add support for setting a processor parent 2024-07-22 09:32:18 -07:00
Aleix Conchillo Flaqué
2227721173 update CHANGELOG with StatelessTextTransformer fix (update) 2024-07-22 09:30:45 -07:00
Aleix Conchillo Flaqué
77a53da5f5 update CHANGELOG with StatelessTextTransformer fix 2024-07-22 09:28:38 -07:00
Aleix Conchillo Flaqué
ab63ff275d Merge pull request #310 from weedge/fix/StatelessTextTransformer
fix: push_frame use TextFrame
2024-07-22 09:25:27 -07:00
weedge
e5363f65f0 fix: push_frame use TextFrame
Signed-off-by: weedge <weege007@gmail.com>
2024-07-22 17:29:06 +08:00
Lewis Wolfgang
ffc157de65 Merge pull request #307 from pipecat-ai/lewis/increase_openai_keepalive_expiry
Allow openai http connections to remain open in the pool indefinitely.
2024-07-19 07:09:17 -04:00
Lewis Wolfgang
f9fdadb4c0 Allow openai http connections to remain open in the pool indefinitely.
Rather than expiring in 5 seconds.
2024-07-18 11:18:21 -04:00
Aleix Conchillo Flaqué
4efccb79f2 Merge pull request #306 from pipecat-ai/aleix/remove-llm-response-start-end-frame
remove LLMResponseStartFrame and LLMResponseEndFrame
2024-07-17 21:51:02 -07:00
Aleix Conchillo Flaqué
337968199a update CHANGELOG with CartesiaTTSService and TTSService updates 2024-07-17 20:58:10 -07:00
Aleix Conchillo Flaqué
37027f68cb remove LLMResponseStartFrame and LLMResponseEndFrame
This was added in the past to properly handle interruptions for the
LLMAssistantContextAggregator. But this is not necessary anymore since we can
handle interruptions by just processing the StartInterruptionFrame, so there's
no need for these extra frames.
2024-07-17 20:53:35 -07:00
Kwindla Hultman Kramer
d1b62c5495 Merge pull request #304 from pipecat-ai/khk/cartesia-continue
Cartesia streaming (WebSocket) and word-level timestamps support
2024-07-17 20:29:15 -07:00
Kwindla Hultman Kramer
355fe01cb7 fixed forgotten renames 2024-07-17 20:28:27 -07:00
Kwindla Hultman Kramer
9d050a16c7 committing an uncommitted file 2024-07-17 20:23:41 -07:00
Kwindla Hultman Kramer
fa53c67606 comments re fixes 2024-07-17 18:30:45 -07:00
Kwindla Hultman Kramer
5006376fe6 undo changes to 02-llm-say-one-thing.py 2024-07-17 15:18:47 -07:00
Kwindla Hultman Kramer
2204b8e205 cartesia streaming and context management via word-level timestamps 2024-07-17 15:17:00 -07:00
Kwindla Hultman Kramer
270007b17c wip - using cartesia word timestamps for context management 2024-07-17 14:13:52 -07:00
Kwindla Hultman Kramer
568eb2ef4c cartesia websockets and streaming 2024-07-17 14:13:52 -07:00
Kwindla Hultman Kramer
73ca9184a8 wip cartesia continuation (not working yet) 2024-07-17 14:13:52 -07:00
Aleix Conchillo Flaqué
5e8e11e16e pyproject: require python >= 3.10 2024-07-17 09:52:42 -07:00
Aleix Conchillo Flaqué
029bbc16f2 Merge pull request #286 from TomTom101/feat/regex_endofsentence
fix: No more falsely detect a sentence end on "U.S.A", "3:00 a.m."
2024-07-17 09:49:21 -07:00
Aleix Conchillo Flaqué
9e3d87e4f6 Merge pull request #291 from adidoit/main
Fix error with readme example - SyntaxError: positional argument follows keyword argument
2024-07-15 13:10:17 -04:00
Aleix Conchillo Flaqué
f1410a1127 Merge pull request #297 from wtlow003/main
fix: minor typo
2024-07-15 13:08:23 -04:00
wtlow003
2b980d16c3 fix: minor typo 2024-07-12 18:27:57 +08:00
Adi Pradhan
b2b97aafb8 fix error with readme example - SyntaxError: positional argument follows keyword argument 2024-07-10 09:50:20 -04:00
TomTom101
da2082b025 chore: Combined combinable lookaheads 2024-07-06 11:11:40 +02:00
TomTom101
327ea9d547 chore: Make it a const 2024-07-06 11:08:51 +02:00
TomTom101
b23db4a202 chore: commented regex 2024-07-06 11:06:52 +02:00
TomTom101
d1a36004ab fix: No more falsely detect a sentence end on "U.S.A", "3:00 a.m." and more 2024-07-06 11:01:32 +02:00
Jon Taylor
6071920c45 Merge pull request #284 from pipecat-ai/jpt/storybot-load-balance
Update storybot demo
2024-07-03 19:48:32 +01:00
Jon Taylor
5f539e1fba fixed teardown 2024-07-03 17:02:54 +01:00
Jon Taylor
8e1539c360 virtualized deployment and added room-based balancing 2024-07-03 16:48:14 +01:00
Aleix Conchillo Flaqué
065cfb2aca Merge pull request #280 from pipecat-ai/aleix/library-updates-070224
library updates 070224 and pipecat 0.0.36
2024-07-02 10:14:03 -07:00
Aleix Conchillo Flaqué
3147534e86 update CHANGELOG for 0.0.36 2024-07-02 10:13:26 -07:00
Aleix Conchillo Flaqué
be5603bf16 examples: fix 06a-image-sync.py 2024-07-02 10:11:50 -07:00
Aleix Conchillo Flaqué
b9b0bcdcbd services(azure): close the audio stream on exit 2024-07-02 10:11:35 -07:00
Aleix Conchillo Flaqué
5bcece56f3 services(cartesia): make sure we close the client on exit 2024-07-02 10:11:16 -07:00
Aleix Conchillo Flaqué
d67faef88c pyproject: multiple library updates 2024-07-02 09:05:37 -07:00
Aleix Conchillo Flaqué
8f6db5e905 Merge pull request #279 from pipecat-ai/aleix/gladia-stt-support
add Gladia STT support
2024-07-02 08:07:35 -07:00
Aleix Conchillo Flaqué
82e93a0560 use exclude_none=True when dumping BaseModels 2024-07-02 08:03:31 -07:00
Aleix Conchillo Flaqué
a9a82c083b services: add GladiaSTTService support 2024-07-02 08:03:29 -07:00
Aleix Conchillo Flaqué
974d9c33ed Merge pull request #278 from pipecat-ai/aleix/detect-user-idle
add support for detecting user idle
2024-07-02 08:01:27 -07:00
Jon Taylor
c1957ab694 Merge pull request #274 from pipecat-ai/jpt/deployment-examples
Example deployment pattern for fly.io
2024-07-02 10:17:13 +01:00
Jon Taylor
b20a10a4bc fixed double fly 2024-07-02 10:17:01 +01:00
Aleix Conchillo Flaqué
be14ce465d transports(daily): make sure we don't send data if client is closed 2024-07-01 18:26:13 -07:00
Aleix Conchillo Flaqué
d1ca0c5614 examples: added new 17-detect-user-idle.py 2024-07-01 18:17:43 -07:00
Aleix Conchillo Flaqué
535514f506 processors: added new UserIdleProcessor 2024-07-01 18:17:43 -07:00
Aleix Conchillo Flaqué
933b63cf13 processors: added new IdleFrameProcessor 2024-07-01 14:57:42 -07:00
Aleix Conchillo Flaqué
d7c3e380a5 added BotSpeakingFrame 2024-07-01 14:57:18 -07:00
Aleix Conchillo Flaqué
c5298f78cb add more missing keyword-only arguments 2024-07-01 12:34:53 -07:00
Jon Taylor
4f8f7b8d1d added on_call_state event to prevent idle vms 2024-07-01 19:21:16 +01:00
Aleix Conchillo Flaqué
d7d46919ac update macos-py3.10-requirements.txt 2024-07-01 11:00:59 -07:00
Aleix Conchillo Flaqué
e5d73d2e2e update linux-py3.10-requirements.txt 2024-07-01 10:58:49 -07:00
Aleix Conchillo Flaqué
b145e8ec90 update README with XTTS 2024-07-01 10:49:43 -07:00
Aleix Conchillo Flaqué
97ff4a1fb8 Merge pull request #275 from pipecat-ai/aleix/add-missing-keyword-separators
add missing keyword separators
2024-07-01 10:45:31 -07:00
Aleix Conchillo Flaqué
5018a552c1 services(xtts): no need the WAV header 2024-07-01 10:44:32 -07:00
Aleix Conchillo Flaqué
7f9fd9ffce examples: added 07i-interruptible-xtts 2024-07-01 10:41:34 -07:00
Aleix Conchillo Flaqué
ddd0ca6a8f update CHANGELOG 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
06f817c7e3 transport(websocket): don't send if serializer returns None 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
df4c3e56c4 services: add missing * keyword separator 2024-07-01 10:27:26 -07:00
Aleix Conchillo Flaqué
9d5c2b9656 Merge pull request #276 from eddieoz/feature/xtts
Added service XTTS
2024-07-01 10:26:53 -07:00
eddieoz
7ce59c5e2e added service xtts 2024-07-01 20:17:19 +03:00
Aleix Conchillo Flaqué
1c9631fc78 Merge pull request #271 from pipecat-ai/aleix/silero-vad-version
vad(silero): allow specifying a Silero VAD version
2024-07-01 09:39:59 -07:00
Aleix Conchillo Flaqué
efbe7297f7 vad(silero): allow specifying a Silero VAD version 2024-07-01 09:38:43 -07:00
Aleix Conchillo Flaqué
1b45946a61 Merge pull request #270 from pipecat-ai/aleix/async-frame-processor
add new AsyncFrameProcessor and AsyncAIService
2024-07-01 09:37:51 -07:00
Aleix Conchillo Flaqué
cbf5a6362c add new AsyncFrameProcessor and AsyncAIService 2024-07-01 09:37:02 -07:00
Aleix Conchillo Flaqué
583b96c341 Merge pull request #269 from pipecat-ai/aleix/improve-error-handling
improve error handling and don't swallow exceptions
2024-07-01 09:36:00 -07:00
Aleix Conchillo Flaqué
fc0920504d improve error handling and don't swallow exceptions 2024-07-01 09:35:45 -07:00
Aleix Conchillo Flaqué
abd65a93b2 Merge pull request #268 from pipecat-ai/aleix/websocket-dont-send-if-closed
transports(websocket): don't send data if websocket closed
2024-07-01 09:33:45 -07:00
Aleix Conchillo Flaqué
c3244fdd7a transports(websocket): don't send data if websocket closed 2024-07-01 09:31:58 -07:00
Aleix Conchillo Flaqué
e8f58938b0 Merge pull request #267 from pipecat-ai/aleix/processing-metrics
add support for processing metrics
2024-07-01 09:31:05 -07:00
Jon Taylor
602b4f34b1 added example fly.toml 2024-07-01 16:50:53 +01:00
Jon Taylor
0399c84dfa added flyio deployment example 2024-07-01 16:46:38 +01:00
Aleix Conchillo Flaqué
fd5d879bf5 add support for processing metrics
Processing metrics indicate how much time a processor takes to generate all of
its output.
2024-06-28 14:26:57 -07:00
Aleix Conchillo Flaqué
8dff460307 Merge pull request #266 from pipecat-ai/aleix/silero-num-frames-fixes
vad: fix Silero VAD required number of frames
2024-06-28 11:25:55 -07:00
Aleix Conchillo Flaqué
cce1ddb183 vad: fix Silero VAD required number of frames 2024-06-28 10:45:48 -07:00
Aleix Conchillo Flaqué
8691d14289 Merge pull request #255 from Viking5274/main
Fix twilio error
2024-06-26 10:17:03 -07:00
daniil5701133
dd402da9e5 added handling streamSid after first wss connect
fixx name
2024-06-26 18:56:30 +03:00
Aleix Conchillo Flaqué
2fd04248f1 examples(storytelling-chatbot): upgrade npm vulnerabilities 2024-06-25 22:04:55 -07:00
Aleix Conchillo Flaqué
0ac42006f8 Merge pull request #260 from pipecat-ai/aleix/more-interruption-fixes
more interruption fixes
2024-06-25 21:52:02 -07:00
Aleix Conchillo Flaqué
66e331248d update CHANGELOG for 0.0.34 2024-06-25 21:43:23 -07:00
Aleix Conchillo Flaqué
4be3e8c87d aggregators: revert using intermediate results 2024-06-25 21:33:17 -07:00
Aleix Conchillo Flaqué
dac033fe61 services(azure): allow transcriptions during interruptions
If the user interrupts we can't just discard transcriptions because the user is
actually interrupting and talking.
2024-06-25 21:33:06 -07:00
Aleix Conchillo Flaqué
d302cbb114 services(deepgram): allow transcriptions during interruptions
If the user interrupts we can't just discard transcriptions because the user is
actually interrupting and talking.
2024-06-25 21:32:21 -07:00
Aleix Conchillo Flaqué
e3b407db28 Merge pull request #259 from pipecat-ai/aleix/prepare-0.0.33
update CHANGELOG for 0.0.33
2024-06-25 12:05:07 -07:00
Aleix Conchillo Flaqué
4ef623f09e update CHANGELOG for 0.0.33 2024-06-25 11:53:07 -07:00
Aleix Conchillo Flaqué
253530a63d Merge pull request #258 from pipecat-ai/aleix/upgrade-cartesia-1.0.0
services(cartesia): upgrade to new cartesia 1.0.0
2024-06-25 11:52:04 -07:00
Aleix Conchillo Flaqué
4f38d989f5 services(cartesia): upgrade to new cartesia 1.0.0 2024-06-25 11:51:34 -07:00
Aleix Conchillo Flaqué
84074e90ee Merge pull request #257 from pipecat-ai/aleix/cancel-all-tasks-when-interrutpted
cancel all tasks when interrutpted
2024-06-25 11:16:00 -07:00
Aleix Conchillo Flaqué
38aee7d8f2 services(azure): cancel tasks when interrupted and ignore incoming transcriptions 2024-06-25 11:15:26 -07:00
Aleix Conchillo Flaqué
64198313c6 services(deepgram): cancel tasks when interrupted and ignore incoming transcriptions 2024-06-25 11:15:07 -07:00
Aleix Conchillo Flaqué
d61b6c301c transports(base_input): create push tasks after pushing interruption 2024-06-25 11:15:07 -07:00
Aleix Conchillo Flaqué
83d1931266 Merge pull request #256 from pipecat-ai/aleix/tts-cleanup-when-interrupted
services(tts): strip before TTS and cleanup when interrupted
2024-06-25 11:14:32 -07:00
Aleix Conchillo Flaqué
c31f2ab285 services(tts): strip before TTS and cleanup when interrupted 2024-06-25 11:13:19 -07:00
Aleix Conchillo Flaqué
0ddc5721b4 Merge pull request #252 from pipecat-ai/aleix/daily-check-size-read-audio-frames
transports(daily): always check size of read audio frames
2024-06-25 09:45:05 -07:00
Aleix Conchillo Flaqué
98bd183bc4 pyproject: fix cartesia version and update requirements files 2024-06-25 09:43:54 -07:00
Aleix Conchillo Flaqué
aaa154524c Merge pull request #253 from pipecat-ai/aleix/llm-response-use-intermediate-results
aggregators: uses intermediate results for LLMAssistantResponseAggreg…
2024-06-24 19:21:14 -07:00
Aleix Conchillo Flaqué
beced68337 aggregators: uses intermediate results for LLMAssistantResponseAggregator 2024-06-24 17:33:45 -07:00
Aleix Conchillo Flaqué
94823ab952 transports(daily): always check size of read audio frames 2024-06-24 14:56:24 -07:00
Kwindla Hultman Kramer
0b6a19802f Merge pull request #250 from pipecat-ai/lewis/flush-tts-on-llm-response-end
Flush output from TTSService on LLMFullResponseEndFrame
2024-06-22 20:37:45 -04:00
Lewis Wolfgang
c4a2d2197c Flush output from TTSService on LLMFullResponseEndFrame
To cover cases when the LLM response does not end in punctuation.
2024-06-22 14:57:44 -04:00
Aleix Conchillo Flaqué
269d06aa15 Merge pull request #249 from pipecat-ai/aleix/pipecat-0.0.32
update CHANGELOG.md for 0.0.32
2024-06-22 09:21:21 -07:00
Aleix Conchillo Flaqué
dfef1f2c54 update CHANGELOG.md for 0.0.32 2024-06-22 09:19:22 -07:00
Aleix Conchillo Flaqué
b62beaba0b Merge pull request #248 from pipecat-ai/aleix/deepgramstt-url
services(deepgram): add url to DeepgramSTTService
2024-06-21 22:26:23 -07:00
Aleix Conchillo Flaqué
adf414e40f services(deepgram): add url to DeepgramSTTService 2024-06-21 16:52:28 -07:00
Aleix Conchillo Flaqué
dc64e57f63 Merge pull request #241 from pipecat-ai/aleix/transports-async
transports: fully use asyncio in all read/write operations
2024-06-21 16:00:08 -07:00
Aleix Conchillo Flaqué
d3e410b2ac transports: fully use asyncio in all read/write operations 2024-06-21 15:55:15 -07:00
Aleix Conchillo Flaqué
c544b2474b update linux-py3.10-requirements with fastapi and new daily-python 2024-06-21 15:44:01 -07:00
Aleix Conchillo Flaqué
18243de358 add fastapi and update macos-py3.10-requirements.txt 2024-06-21 13:16:47 -07:00
Aleix Conchillo Flaqué
6625895d1f update macos-py3.10-requirements.txt 2024-06-21 13:13:02 -07:00
Aleix Conchillo Flaqué
f9ecce739e Merge pull request #247 from pipecat-ai/aleix/twilio-updates
some twilio updates
2024-06-21 10:14:40 -07:00
Aleix Conchillo Flaqué
0075dd8386 update linux/macos-py3.10-requirements.txt 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
eef1cde816 updated CHANGELOG.md with fastapi and twilio updates 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
8d867c30c6 transports(websocket): verify websockets module 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
42c668b7ae examples(twilio-chatbot): update instructions and renames 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
b62227b4ae serializers(twilio): formatting and allow str | bytes | None 2024-06-21 09:47:17 -07:00
Aleix Conchillo Flaqué
25ef0cb87b serializers: allow str | bytes | None 2024-06-21 09:42:43 -07:00
Aleix Conchillo Flaqué
e195941aa5 Merge pull request #246 from pipecat-ai/aleix/daily-dialout-answered
transports(daily): added dialout_answered event
2024-06-20 18:37:24 -07:00
Aleix Conchillo Flaqué
e09eef1dd7 Merge pull request #243 from Viking5274/main
Add twilio_websocket_service with example
2024-06-20 14:09:48 -07:00
Aleix Conchillo Flaqué
7c13663a4e transports(daily): added dialout_answered event 2024-06-20 13:01:25 -07:00
daniil5701133
5753869e5e add twilio-chatbot example with README.md info how to start app
created twilio_websocket_service.py, TwilioFrameSerializer.py

moved pcm_16000_to_ulaw_8000 and ulaw_8000_to_pcm_16000 to src/pipecat/utils/audio.py
fixed callback on disconnect
2024-06-20 23:00:01 +03:00
chadbailey59
ba878a19f4 fixed "Dr." interruption (#245) 2024-06-19 20:53:04 -05:00
Aleix Conchillo Flaqué
55a9de78cd Merge pull request #239 from pipecat-ai/aleix/azure-stt
azure stt support
2024-06-14 14:07:07 +08:00
Aleix Conchillo Flaqué
ff51fc9091 updated CHANGELOG and README 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
a4f857ee34 examples: use new AzureSTTService in 07f-interruptible-azure 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
3250d74bef services(azure): new AzureSTTService 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
c086160239 examples: cleanup some 07 interruptible examples 2024-06-13 16:36:10 -07:00
Aleix Conchillo Flaqué
6cdccaff53 Merge pull request #238 from pipecat-ai/aleix/pipecat-0.0.31
pipecat 0.0.31
2024-06-14 06:31:41 +08:00
Aleix Conchillo Flaqué
a9ab8de25d update CHANGELOG for 0.0.31 2024-06-13 15:31:03 -07:00
Aleix Conchillo Flaqué
2a29cb18a5 transports(base_output): chunk audio into 20ms instead of 10ms 2024-06-13 15:30:41 -07:00
Aleix Conchillo Flaqué
4193a4f415 Merge pull request #237 from pipecat-ai/aleix/pipecat-0.0.30
update CHANGELOG for 0.0.30
2024-06-14 05:28:14 +08:00
Aleix Conchillo Flaqué
0226ec450a update CHANGELOG for 0.0.30 2024-06-13 14:27:37 -07:00
Aleix Conchillo Flaqué
020b8ebb35 Merge pull request #236 from pipecat-ai/aleix/report-only-initial-ttfb
report only initial ttfb
2024-06-14 05:24:52 +08:00
Aleix Conchillo Flaqué
1170b30c1b aggregator(user_response): also handle small VADParams.stop_secs 2024-06-13 13:30:31 -07:00
Aleix Conchillo Flaqué
0004d4a906 vad: reduce smoothing factor and increase confidence 2024-06-13 13:30:11 -07:00
Aleix Conchillo Flaqué
cb27e86266 metrics: allow sending only initial TTFB metrics 2024-06-13 13:30:00 -07:00
Aleix Conchillo Flaqué
77a3b2ea5c Merge pull request #235 from pipecat-ai/aleix/openpipe-refactoring
openpipe refactoring
2024-06-14 01:28:50 +08:00
Aleix Conchillo Flaqué
099e65f3b6 report processor name in error logs 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
befb8db120 update pyproject and requirements 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
9992d826b1 examples: renamed 06b-listen... to 07h-inte... 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
18604e1a39 re-add removed CHANGELOG lines 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
312c569182 services(openpipe): refactored so it's based on BaseOpenAILLMService 2024-06-13 09:30:50 -07:00
Aleix Conchillo Flaqué
b43e0ed130 Merge pull request #233 from KwalAI/openpipe-integration
OpenPipe Integration
2024-06-13 22:41:57 +08:00
Aleix Conchillo Flaqué
289debea34 Merge pull request #234 from pipecat-ai/aleix/fix-daily-room-properties-exp
transports(helpers): fix DailyRoomProperties.exp
2024-06-13 22:38:41 +08:00
Aleix Conchillo Flaqué
ccd6af7016 transports(helpers): fix DailyRoomProperties.exp 2024-06-12 23:15:22 -07:00
Ankur Duggal
effc69e4e4 formatting 2024-06-12 15:01:19 -07:00
Ankur Duggal
c7a0d0db64 OpenPipe Integration 2024-06-12 14:23:56 -07:00
Aleix Conchillo Flaqué
50d69a1ca4 Merge pull request #231 from pipecat-ai/aleix/websocket-deserializer-none
serializer: allow deserialize() to return None
2024-06-13 04:36:03 +08:00
Aleix Conchillo Flaqué
8a6b8fe70a Merge pull request #232 from pipecat-ai/aleix/pyproject-deepgram
pyproject: add deepgram-sdk
2024-06-13 03:53:08 +08:00
Aleix Conchillo Flaqué
c4e53aea71 update macos-py3.10-requirements with deepgram 2024-06-12 12:52:20 -07:00
Aleix Conchillo Flaqué
ad5125e93f pyproject: add deepgram-sdk 2024-06-12 12:50:18 -07:00
Aleix Conchillo Flaqué
8d92cbac93 Merge pull request #230 from pipecat-ai/aleix/processor-names
processor names
2024-06-13 03:16:07 +08:00
Aleix Conchillo Flaqué
0225443ec8 transports(base): always send MetricsFrame 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
71e1d0a334 pipeline: send initial TTFB initial metrics from PipelineTask 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
83f69e02fd allow specifying frame processor names 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
e1b2da1ff0 serializer: allow deserialize() to return None 2024-06-12 12:11:36 -07:00
Kwindla Hultman Kramer
5eb1b90a4b Merge pull request #229 from pipecat-ai/khk-deepgram-url-configurable
Deepgram TTS service improvements
2024-06-12 14:52:04 -04:00
Kwindla Hultman Kramer
9c4ee74b91 bot to test for demo 2024-06-12 10:41:49 -07:00
Aleix Conchillo Flaqué
f65f566829 re-add transports/services/helpers/__init__.py 2024-06-12 10:37:28 -07:00
Aleix Conchillo Flaqué
c8ad3123b7 Merge pull request #207 from pipecat-ai/dialin-example
New example: Dialin bot (call your Pipecat via phone)
2024-06-13 01:36:00 +08:00
Jon Taylor
8cefce28cf added example fly toml 2024-06-12 10:35:03 -07:00
Jon Taylor
a834d26885 removed https from daily boy 2024-06-12 10:35:03 -07:00
Jon Taylor
810e3cd551 added fly.example.toml due to gitignore 2024-06-12 10:35:03 -07:00
Jon Taylor
f258fa96cd added env to dockerignore 2024-06-12 10:35:03 -07:00
Jon Taylor
757ec61f14 added deepgram to readme 2024-06-12 10:35:03 -07:00
Jon Taylor
2c933f43d8 linting errors and removed unusued sip url 2024-06-12 10:35:03 -07:00
Jon Taylor
cc5bfa8af8 removed helps and fixed linting 2024-06-12 10:35:03 -07:00
Jon Taylor
de9f3e55f1 new example: dialin 2024-06-12 10:35:03 -07:00
Aleix Conchillo Flaqué
ed0c986218 Merge pull request #228 from pipecat-ai/aleix/websocket-fixes
websocket fixes
2024-06-13 01:30:21 +08:00
Aleix Conchillo Flaqué
72c27215b6 transports(websocket): use push_audio_frame() 2024-06-12 10:29:39 -07:00
Aleix Conchillo Flaqué
c23b14f768 examples: use DeepgramSTTService in websocker-server 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
81282f9c4d services(deepgram): keep conenction alive 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
2b324f6f81 Merge pull request #227 from pipecat-ai/aleix/daily-room-properties-extra
transports(daily): DailyRoomProperties now allow extra unknown parame…
2024-06-13 00:25:07 +08:00
Kwindla Hultman Kramer
049f110344 PipelineTask should not exit when Deepgram TTS returns a Bad Request "unutterable" 2024-06-12 09:24:09 -07:00
Kwindla Hultman Kramer
448a0307a8 rebasing 2024-06-12 07:54:18 -07:00
Aleix Conchillo Flaqué
7390e42f5c transports(daily): DailyRoomProperties now allow extra unknown parameters 2024-06-11 22:31:32 -07:00
Aleix Conchillo Flaqué
ee880d229f Merge pull request #223 from pipecat-ai/aleix/fix-lower-vad-stop-secs
processors: fix LLMResponseAggregator with lower VAD values
2024-06-12 13:30:34 +08:00
Aleix Conchillo Flaqué
9cd07d81f8 processors: fix LLMResponseAggregator with lower VAD values 2024-06-11 22:30:06 -07:00
Aleix Conchillo Flaqué
b453d089c3 Merge pull request #226 from pipecat-ai/aleix/chunk-audio-output
transport: chunk longer audio frames
2024-06-12 13:28:28 +08:00
Aleix Conchillo Flaqué
7410fe1d1e transport: chunk longer audio frames 2024-06-11 17:50:51 -07:00
Aleix Conchillo Flaqué
6323a77431 Merge pull request #224 from pipecat-ai/aleix/deepgram-stt-simple
deepgram stt simple
2024-06-12 08:48:19 +08:00
Aleix Conchillo Flaqué
0aedaa8553 services(deepgram): abstract StartFrame/EndFrame/CancelFrame 2024-06-10 21:18:42 -07:00
Aleix Conchillo Flaqué
6554479d39 transports: don't queue system frames 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ce2ebd3198 examples: updated 07c-interruptible-deepgram to usee DeepgramSTTService 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
13ea1efc96 examples: add new 13b-deepgram-transcription 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ef380321cf services: added new DeepgramSTTService 2024-06-10 21:00:01 -07:00
Kwindla Hultman Kramer
294b037730 configurable deepgram base url 2024-06-08 09:38:48 -04:00
Aleix Conchillo Flaqué
7603996612 Merge pull request #220 from pipecat-ai/aleix/pipecat-0.0.29
update CHANGELOG for 0.0.29
2024-06-08 04:43:52 +08:00
Aleix Conchillo Flaqué
3048d2b0b1 update CHANGELOG for 0.0.29 2024-06-07 13:43:00 -07:00
Aleix Conchillo Flaqué
0bb47a09d2 Merge pull request #218 from pipecat-ai/aleix/send-inital-metrics-mapping
send inital metrics mapping
2024-06-08 04:41:59 +08:00
Aleix Conchillo Flaqué
1afe6901d9 processors: add processors_with_metrics() and can_generate_metrics() 2024-06-07 13:38:21 -07:00
Aleix Conchillo Flaqué
3e019fb512 services(openai): remove unused _chat_completions 2024-06-07 13:18:11 -07:00
Aleix Conchillo Flaqué
e069aa9608 updated CHANGELOG with BasePipeline 2024-06-07 13:18:09 -07:00
Aleix Conchillo Flaqué
0b32e42d25 transports(daily): fix extra super().process_frame() 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
8d18be5069 services(anthropic): fix metrics 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
e715d99d0c pipeline: send initial ttfb metrics mapping 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
dc28590247 moved ParallelTask to pipecat.pipeline.parallel_task 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
139f158ea1 Merge pull request #219 from pipecat-ai/aleix/switch-voices
switch voices and languages
2024-06-08 04:13:25 +08:00
Aleix Conchillo Flaqué
4b2a18837f services(whisper): add text logging 2024-06-07 13:12:51 -07:00
Aleix Conchillo Flaqué
b4340d0185 services(whisper): increase no speech probability to 0.4 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
90d11398e6 examples: add 15a-switch-languages 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
bf8c73b25b examples: add 15-switch-voices 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
21cd21de1b processors(filters): add FunctionFilter 2024-06-07 13:12:18 -07:00
Aleix Conchillo Flaqué
c25f6e56e7 Merge pull request #217 from pipecat-ai/khk-tts-timings
Added TTFB timings for all TTS services
2024-06-07 05:42:52 +08:00
Aleix Conchillo Flaqué
a1f1d1995c transports: allow sending metrics 2024-06-06 14:35:34 -07:00
Aleix Conchillo Flaqué
390582d7f3 services: use start/stop_ttfb_metrics to report TTFB metrics 2024-06-06 14:00:10 -07:00
Aleix Conchillo Flaqué
e765a29ca2 processors: implement base process_frame(). all subsclassed should call it 2024-06-06 10:54:21 -07:00
Kwindla Hultman Kramer
cf5c244487 Merge branch 'main' into khk-tts-timings 2024-06-06 13:05:42 -04:00
Kwindla Hultman Kramer
a5eb30a93d changelog 2024-06-06 11:49:05 -04:00
Kwindla Hultman Kramer
ac7bc35944 azure tts ttfb 2024-06-06 11:45:48 -04:00
Kwindla Hultman Kramer
ddfd721f6e openai tts ttfb 2024-06-06 11:32:47 -04:00
Kwindla Hultman Kramer
aee3916cd1 cartesia async fixed 2024-06-06 11:24:26 -04:00
Kwindla Hultman Kramer
3eff1e559b pipecat async working, but maybe needs a threaded implementation 2024-06-06 11:11:06 -04:00
Kwindla Hultman Kramer
1a542c91fa temp commit, woring on playht 2024-06-06 10:48:22 -04:00
Aleix Conchillo Flaqué
cd60a84f8a Merge pull request #215 from pipecat-ai/aleix/silero-vad-memory-fix
vad(silero): fix memory issue
2024-06-06 05:50:47 +08:00
Aleix Conchillo Flaqué
3dd4bac6e6 vad(silero): fix memory issue 2024-06-05 14:50:28 -07:00
Kwindla Hultman Kramer
06ff9cfede added timing logs for cartesia, deepgram, elevenlabs 2024-06-05 16:12:10 -04:00
Aleix Conchillo Flaqué
2d1ed9a304 Merge pull request #214 from pipecat-ai/aleix/pipecat-0.0.27
transports(daily): added participants() and participant_counts()
2024-06-06 03:15:34 +08:00
Aleix Conchillo Flaqué
50b51c05f6 transports(daily): added participants() and participant_counts() 2024-06-05 12:14:00 -07:00
Aleix Conchillo Flaqué
5ce4b8dd5b update CHANGELOG with OpenAITTSService 2024-06-05 11:44:24 -07:00
Aleix Conchillo Flaqué
2f4467b5a5 Merge pull request #213 from pipecat-ai/aleix/pipecat-0.0.26
update CHANGELOG for 0.0.26
2024-06-06 01:10:01 +08:00
Aleix Conchillo Flaqué
e91ab54a69 update CHANGELOG for 0.0.26 2024-06-05 10:07:45 -07:00
Aleix Conchillo Flaqué
6a33432c82 Merge pull request #212 from pipecat-ai/aleix/make-pinlesscallupdate-public
transports(daily): move pinlessCallUpdate to public api
2024-06-05 23:14:14 +08:00
Aleix Conchillo Flaqué
135654a080 transports(daily): move pinlessCallUpdate to public api 2024-06-05 08:08:56 -07:00
Aleix Conchillo Flaqué
7b708a2bee Merge pull request #211 from pipecat-ai/aleix/base-transport-async
various fixes and improvements
2024-06-05 22:57:35 +08:00
Aleix Conchillo Flaqué
b515c28417 services(cartesia): allow output_format and model_id 2024-06-04 19:24:33 -07:00
Aleix Conchillo Flaqué
854ffb0323 update CHANGELOG for DailyRESTHelper 2024-06-04 15:45:17 -07:00
Aleix Conchillo Flaqué
891b7b22ea transports: push EndFrame/CancelFrame before stopping push task 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
c8d37a7227 pipeline(runner): add support for SIGTERM 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
489060881d update macos-py3.10-requirements 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
d56a4cce1b update CHANGELOG with latest changes 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
7eb9dfde38 pyproject: include langchain-community and langchain-openai 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
571e10f83e services(anthropic): fix interruptions with anthropic 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
af202d4fe5 pipeline(task): introduce has_finished() 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
4057fbbcfd transports(tk): fix pyaudio output stream cleanup 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
5cdb8a79a1 examples: use camera_out_is_live for live video 2024-06-04 15:43:54 -07:00
Aleix Conchillo Flaqué
a674b43243 transport: remove redundant camera thread and switch audio pull for push 2024-06-04 15:43:54 -07:00
Jon Taylor
ac41f13b7c Merge pull request #205 from pipecat-ai/daily_rest_helpers
Created REST helpers for Daily covering commonly used methods for running / deployment
2024-06-04 22:26:39 +02:00
Jon Taylor
003b9887b1 made sip and sipuri optional and None 2024-06-04 19:03:58 +02:00
Jon Taylor
ba45c2ab5b addressed review (urllib import and linting 2024-06-04 18:39:35 +02:00
Aleix Conchillo Flaqué
9d36a48a80 Merge pull request #208 from pipecat-ai/aleix/cartesia-voice-load-startup
services(cartesia): load voices on startup
2024-06-04 22:54:25 +08:00
Aleix Conchillo Flaqué
20a525635e Merge pull request #201 from TomTom101/TomTom101/openai_tts
Added OpenAI TTS (#196)
2024-06-04 22:53:56 +08:00
Aleix Conchillo Flaqué
659eceea95 services(cartesia): load voices on startup 2024-06-03 14:08:04 -07:00
TomTom101
d462c03d00 chore: Review comments 2024-06-03 20:13:15 +02:00
Jon Taylor
6591e07eb4 removed hardcoded 'https' from API url 2024-06-03 19:32:14 +02:00
Aleix Conchillo Flaqué
fe71825954 Merge pull request #206 from pipecat-ai/aleix/fix-deepgram-tts
services(deepgram): fixed DeepgramTTSService
2024-06-04 00:28:53 +08:00
Aleix Conchillo Flaqué
43516f84fe services(deepgram): fixed DeepgramTTSService 2024-06-03 07:53:46 -07:00
Jon Taylor
0849edb00b added Daily REST helpers file for common methods used in Pipecat bots 2024-06-03 16:38:13 +02:00
Aleix Conchillo Flaqué
dd3b4083eb Merge pull request #204 from TomTom101/TomTom101/langchain
fix: Fixed imports, support new PipelineParams
2024-06-03 03:16:30 +08:00
TomTom101
89673a4040 test(langchain): Use new PipelineParams in test 2024-06-02 20:19:55 +02:00
TomTom101
410dbd3dfc fix: Fixed imports, support new PipelineParams 2024-06-02 20:16:11 +02:00
TomTom101
7085b1ea3f doc(openai): Added hint re the 24kHz sample rate 2024-06-01 20:35:46 +02:00
TomTom101
8683cae719 feat: OpenAITTS 2024-06-01 10:13:28 +02:00
Aleix Conchillo Flaqué
0197efa524 Merge pull request #200 from pipecat-ai/aleix/changelog-0.0.25
update CHANGELOG.md for version 0.0.25
2024-06-01 07:48:42 +08:00
Aleix Conchillo Flaqué
16e76caa33 update CHANGELOG.md for version 0.0.25 2024-05-31 16:48:03 -07:00
Aleix Conchillo Flaqué
1f5240694d Merge pull request #199 from pipecat-ai/aleix/langchain-changelog
move LangchainProcessor to processors/frameworks and update CHANGELOG
2024-06-01 07:46:51 +08:00
Aleix Conchillo Flaqué
f087151db7 move LangchainProcessor to processors/frameworks and update CHANGELOG 2024-05-31 16:45:39 -07:00
Aleix Conchillo Flaqué
0b691ff597 Merge pull request #198 from pipecat-ai/aleix/websocket-transport
websocket transport support
2024-06-01 04:40:39 +08:00
TomTom101
ae049961b7 wip: untested 2024-05-31 22:30:52 +02:00
Aleix Conchillo Flaqué
0d6eee705f Merge pull request #190 from TomTom101/TomTom101/langchain
Langchain service
2024-06-01 04:21:12 +08:00
Aleix Conchillo Flaqué
58d20ec9dc transport(websocket-server): add on_client_disconnected 2024-05-31 12:52:43 -07:00
Aleix Conchillo Flaqué
38befe1dc1 examples(websocket): rename server.py to bot.py 2024-05-31 12:09:54 -07:00
Aleix Conchillo Flaqué
2f335100a5 remove storage folder 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
3fef818843 examples(websocket-server): use VAD analyzer from transport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
428c8af77e transports(websocket): base class from BaseInputTransport 2024-05-31 11:54:18 -07:00
Aleix Conchillo Flaqué
54fccd2e25 pipeline: cleanup processors one by one 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
66c6a5dc0f transports(websocket): base class from BaseOutputTransport 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
92561ae19d some event loop parameter updates 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
b85e93410b transports(daily): fix event handlers callback 2024-05-31 11:37:43 -07:00
Aleix Conchillo Flaqué
593993ba97 transports(base_input): remove unnecessary task 2024-05-31 11:37:41 -07:00
Aleix Conchillo Flaqué
7b8b606278 update CHANGELOG and create websocker-server instructions 2024-05-31 11:37:19 -07:00
Aleix Conchillo Flaqué
7116ad0607 examples: fix websocket-client audio playback 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
c507044277 examples: use gpt-4o model by default 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
5f45a9d90f examples: websocket-server updates 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e31e87aabd transport(websocket): update audio_frame_size 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
2957416d90 serializers(protobuf): support id and name fields 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
b9b761b67a added sample_rate and num_channels to protobuf AudioRawFrame 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
a7539e9317 transports: simplify and fix async and nested decorators 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
75575c0c68 use get_event_loop() and move event handlers to BaseTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
77b3e08214 examples: add and update wbesocket eaxmples 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
956b783c1a transports: added new WebsocketServerTransport 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e90c080470 serializers: added BaseSerializer 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
37aabaa03a frames: generate protobuf pb2 file for pipecat package 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
3e289a7bef pyproject: add protobuf dependency 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
6dd5e3fdf5 dev-requirements: add grpcio-tools 2024-05-31 11:36:52 -07:00
Aleix Conchillo Flaqué
e60df3c7c0 Merge pull request #195 from pipecat-ai/aleix/function-calling-move-to-llmservice
function calling move to LLMService
2024-06-01 02:36:29 +08:00
Aleix Conchillo Flaqué
42f772beed examples: some function calling examples cleanup 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
3655c4a0fc services: move function calling registration to LLMService 2024-05-31 11:36:04 -07:00
Aleix Conchillo Flaqué
012dbffd94 update CHANGELOG.md for function calling 2024-05-31 11:36:03 -07:00
TomTom101
4b39efeee3 fix(langchain): try/catch langchain import in service; Only langchain is installed with the [langchain] extra (#190) 2024-05-31 10:19:27 +02:00
Kwindla Hultman Kramer
19caf750fd Merge pull request #194 from pipecat-ai/khk-cartesia-changelog
Added cartesia line to CHANGELOG.md
2024-05-30 14:18:41 -07:00
Kwindla Hultman Kramer
296611714f added cartesia line to CHANGELOG.md 2024-05-30 10:41:00 -07:00
chadbailey59
4c3d19cc8b Function calling (#175)
* added function calling code back

* removed old llm_context file

* added integration testing for openai

* added function calling example

* added function callbacks

* added function start callback

* fixup

* fixup

* added different return type support for function calling

* intake example working

* added frame loggers

* cleanup

* fixup

* Update openai.py

* removed function call frame types

* fixup

* re-added example

* renumbered wake phrase

* fixup for autopep8

* remove unused imports
2024-05-30 12:25:39 -05:00
Aleix Conchillo Flaqué
a3ba07c7a3 Merge pull request #193 from pipecat-ai/aleix/fix-camera-out-enabled-cpu
transport(output): fix high CPU usage with camera_out_enabled and no …
2024-05-31 01:25:06 +08:00
Kwindla Hultman Kramer
a1579808b2 Merge pull request #189 from pipecat-ai/khk-cartesia-etc
Cartesia TTS
2024-05-30 10:24:45 -07:00
Aleix Conchillo Flaqué
aecb9f5816 transport(output): fix high CPU usage with camera_out_enabled and no images 2024-05-30 10:18:43 -07:00
Aleix Conchillo Flaqué
a5d42a526c Merge pull request #191 from pipecat-ai/aleix/fix-silero-vad
vad: fix silero vad frame processor
2024-05-30 23:25:52 +08:00
Aleix Conchillo Flaqué
a9472f8116 vad: fix silero vad frame processor 2024-05-30 07:50:58 -07:00
TomTom101
b19243ab75 fix: corrected hint to install Langchain libs 2024-05-30 10:53:42 +02:00
TomTom101
2bf094b950 test(langchain): Rewrite to unittest, make it meaningful 2024-05-30 10:43:33 +02:00
Kwindla Hultman Kramer
d5f106ae19 pr fixes 2024-05-29 23:41:35 -07:00
Kwindla Hultman Kramer
920745345a cartesia tts support 2024-05-29 23:35:35 -07:00
TomTom101
143033d7db fix: install langchain-community with the langchain extra 2024-05-30 03:15:14 +02:00
TomTom101
335990c145 wip: hint to install langchain_community 2024-05-30 03:15:14 +02:00
TomTom101
6d24e836b0 wip: Example using LC message history 2024-05-30 03:15:14 +02:00
TomTom101
278a2fed56 wip: First stab at langchain support
Is this a service or processor?
How to deal with conversation history? LC has sophisticated means of this, but might get in the way of `LLMResponseAggregator`
2024-05-30 03:15:14 +02:00
250 changed files with 22619 additions and 5010 deletions

View File

@@ -1,4 +1,4 @@
name: lint
name: format
on:
workflow_dispatch:
@@ -12,12 +12,12 @@ on:
- "docs/**"
concurrency:
group: build-lint-${{ github.event.pull_request.number || github.ref }}
group: build-format-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
autopep8:
name: "Formatting lints"
ruff-format:
name: "Formatting checker"
runs-on: ubuntu-latest
steps:
- name: Checkout repo
@@ -25,7 +25,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: "3.10"
- name: Setup virtual environment
run: |
python -m venv .venv
@@ -34,11 +34,8 @@ jobs:
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r dev-requirements.txt
- name: autopep8
id: autopep8
- name: Ruff formatter
id: ruff
run: |
source .venv/bin/activate
autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
- name: Fail if autopep8 requires changes
if: steps.autopep8.outputs.exit-code == 2
run: exit 1
ruff format --config line-length=100 --diff --exclude "*_pb2.py"

View File

@@ -1,10 +1,6 @@
name: publish-test
on:
workflow_dispatch:
push:
branches:
- main
on: workflow_dispatch
jobs:
build:
@@ -14,7 +10,6 @@ jobs:
- name: Checkout repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.gitref }}
fetch-tags: true
fetch-depth: 100
- name: Set up Python

View File

@@ -20,21 +20,17 @@ jobs:
name: "Unit and Integration Tests"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Checkout repo
uses: actions/checkout@v4
- name: Set up Python
id: setup_python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Cache virtual environment
uses: actions/cache@v3
with:
# We are hashing requirements-dev.txt and requirements-extra.txt which
# contain all dependencies needed to run the tests and examples.
key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version}}-${{ hashFiles('linux-py3.10-requirements.txt') }}-${{ hashFiles('dev-requirements.txt') }}
path: .venv
python-version: "3.10"
- name: Install system packages
run: sudo apt-get install -y portaudio19-dev
id: install_system_packages
run: |
sudo apt-get install -y portaudio19-dev
- name: Setup virtual environment
run: |
python -m venv .venv
@@ -42,8 +38,8 @@ jobs:
run: |
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r linux-py3.10-requirements.txt -r dev-requirements.txt
pip install -r dev-requirements.txt -r test-requirements.txt
- name: Test with pytest
run: |
source .venv/bin/activate
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
pytest --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests

View File

@@ -5,6 +5,645 @@ All notable changes to **pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Added `AsyncGeneratorProcessor`. This processor can be used together with a
`FrameSerializer` as an async generator. It provides a `generator()` function
that returns an `AsyncGenerator` and that yields serialized frames.
- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are
meant to be pushed upstream to tell the pipeline task to stop nicely or
immediately respectively.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed)
for OpenAI, Anthropic, and Together AI services along with corresponding
setter functions.
- Added `sample_rate` as a constructor parameter for TTS services.
- Pipecat has a pipeline-based architecture. The pipeline consists of frame
processors linked to each other. The elements traveling across the pipeline
are called frames.
To have a deterministic behavior the frames traveling through the pipeline
should always be ordered, except system frames which are out-of-band
frames. To achieve that, each frame processor should only output frames from a
single task.
In this version we introduce synchronous and asynchronous frame
processors. The synchronous processors push output frames from the same task
that they receive input frames, and therefore only pushing frames from one
task. Asynchronous frame processors can have internal tasks to perform things
asynchronously (e.g. receiving data from a websocket) but they also have a
single task where they push frames from.
By default, frame processors are synchronous. To change a frame processor to
asynchronous you only need to pass `sync=False` to the base class constructor.
- Added pipeline clocks. A pipeline clock is used by the output transport to
know when a frame needs to be presented. For that, all frames now have an
optional `pts` field (prensentation timestamp). There's currently just one
clock implementation `SystemClock` and the `pts` field is currently only used
for `TextFrame`s (audio and image frames will be next).
- A clock can now be specified to `PipelineTask` (defaults to
`SystemClock`). This clock will be passed to each frame processor via the
`StartFrame`.
- Added `CartesiaHttpTTSService`. This is a synchronous frame processor
(i.e. given an input text frame it will wait for the whole output before
returning).
- `DailyTransport` now supports setting the audio bitrate to improve audio
quality through the `DailyParams.audio_out_bitrate` parameter. The new
default is 96kbps.
- `DailyTransport` now uses the number of audio output channels (1 or 2) to set
mono or stereo audio when needed.
- Interruptions support has been added to `TwilioFrameSerializer` when using
`FastAPIWebsocketTransport`.
- Added new `LmntTTSService` text-to-speech service.
(see https://www.lmnt.com/)
- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`,
and `STTLanguageUpdateFrame` frames to allow you to switch models, language
and voices in TTS and STT services.
- Added new `transcriptions.Language` enum.
### Changed
- We now distinguish between input and output audio and image frames. We
introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame`
and `OutputImageRawFrame` (and other subclasses of those). The input frames
usually come from an input transport and are meant to be processed inside the
pipeline to generate new frames. However, the input frames will not be sent
through an output transport. The output frames can also be processed by any
frame processor in the pipeline and they are allowed to be sent by the output
transport.
- `ParallelTask` has been renamed to `SyncParallelPipeline`. A
`SyncParallelPipeline` is a frame processor that contains a list of different
pipelines to be executed concurrently. The difference between a
`SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame,
the `SyncParallelPipeline` will wait for all the internal pipelines to
complete. This is achieved by ensuring all the processors in each of the
internal pipelines are synchronous.
- `StartFrame` is back a system frame so we make sure it's processed immediately
by all processors. `EndFrame` stays a control frame since it needs to be
ordered allowing the frames in the pipeline to be processed.
- Updated `MoondreamService` revision to `2024-08-26`.
- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation
timestamps to their text output. This allows the output transport to push the
text frames downstream at almost the same time the words are spoken. We say
"almost" because currently the audio frames don't have presentation timestamp
but they should be played at roughly the same time.
- `DailyTransport.on_joined` event now returns the full session data instead of
just the participant.
- `CartesiaTTSService` is now a subclass of `TTSService`.
- `DeepgramSTTService` is now a subclass of `STTService`.
- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A
`SegmentedSTTService` is a `STTService` where the provided audio is given in a
big chunk (i.e. from when the user starts speaking until the user stops
speaking) instead of a continous stream.
### Fixed
- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering
tasks (after receiving and `EndFrame`) before the internal queue was emptied,
causing the pipeline to finish prematurely.
- `StartFrame` should be the first frame every processor receives to avoid
situations where things are not initialized (because initialization happens on
`StartFrame`) and other frames come in resulting in undesired behavior.
### Performance
- `obj_id()` and `obj_count()` now use `itertools.count` avoiding the need of
`threading.Lock`.
## [0.0.41] - 2024-08-22
### Added
- Added `LivekitFrameSerializer` audio frame serializer.
### Fixed
- Fix `FastAPIWebsocketOutputTransport` variable name clash with subclass.
- Fix an `AnthropicLLMService` issue with empty arguments in function calling.
### Other
- Fixed `studypal` example errors.
## [0.0.40] - 2024-08-20
### Added
- VAD parameters can now be dynamicallt updated using the
`VADParamsUpdateFrame`.
- `ErrorFrame` has now a `fatal` field to indicate the bot should exit if a
fatal error is pushed upstream (false by default). A new `FatalErrorFrame`
that sets this flag to true has been added.
- `AnthropicLLMService` now supports function calling and initial support for
prompt caching.
(see https://www.anthropic.com/news/prompt-caching)
- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as
`output_format`.
- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample
rates to use.
- Added new `on_participant_updated` event to `DailyTransport`.
- Added `DailyRESTHelper.delete_room_by_name()` and
`DailyRESTHelper.delete_room_by_url()`.
- Added LLM and TTS usage metrics. Those are enabled when
`PipelineParams.enable_usage_metrics` is True.
- `AudioRawFrame`s are now pushed downstream from the base output
transport. This allows capturing the exact words the bot says by adding an STT
service at the end of the pipeline.
- Added new `GStreamerPipelineSource`. This processor can generate image or
audio frames from a GStreamer pipeline (e.g. reading an MP4 file, and RTP
stream or anything supported by GStreamer).
- Added `TransportParams.audio_out_is_live`. This flag is False by default and
it is useful to indicate we should not synchronize audio with sporadic images.
- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control
frames. These frames are pushed upstream and they should wrap
`BotSpeakingFrame`.
- Transports now allow you to register event handlers without decorators.
### Changed
- Support RTVI message protocol 0.1. This includes new messages, support for
messages responses, support for actions, configuration, webhooks and a bunch
of new cool stuff.
(see https://docs.rtvi.ai/)
- `SileroVAD` dependency is now imported via pip's `silero-vad` package.
- `ElevenLabsTTSService` now uses `eleven_turbo_v2_5` model by default.
- `BotSpeakingFrame` is now a control frame.
- `StartFrame` is now a control frame similar to `EndFrame`.
- `DeepgramTTSService` now is more customizable. You can adjust the encoding and
sample rate.
### Fixed
- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and
stops. This allows for knowing when the bot starts and stops speaking even
with asynchronous services (like Cartesia).
- Fixed `AzureSTTService` transcription frame timestamps.
- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would
cause this function to stop working after the initial expiration elapsed.
- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things
gracefully while a `CancelFrame` should cancel all running tasks as soon as
possible.
- Fixed an issue in `AIService` that would cause a yielded `None` value to be
processed.
- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and
a first participant joins.
- Fixed a `BaseInputTransport` issue that was causing incoming system frames to
be queued instead of being pushed immediately.
- Fixed a `BaseInputTransport` issue that was causing start/stop interruptions
incoming frames to not cancel tasks and be processed properly.
### Other
- Added `studypal` example (from to the Cartesia folks!).
- Most examples now use Cartesia.
- Added examples `foundational/19a-tools-anthropic.py`,
`foundational/19b-tools-video-anthropic.py` and
`foundational/19a-tools-togetherai.py`.
- Added examples `foundational/18-gstreamer-filesrc.py` and
`foundational/18a-gstreamer-videotestsrc.py` that show how to use
`GStreamerPipelineSource`
- Remove `requests` library usage.
- Cleanup examples and use `DailyRESTHelper`.
## [0.0.39] - 2024-07-23
### Fixed
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription
to stop the Pipeline.
## [0.0.38] - 2024-07-23
### Added
- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and
`SileroVADAnalyzer`. This allows caching and various GitHub repo validations.
- Added `send_initial_empty_metrics` flag to `PipelineParams` to request for
initial empty metrics (zero values). True by default.
### Fixed
- Fixed initial metrics format. It was using the wrong keys name/time instead of
processor/value.
- STT services should be using ISO 8601 time format for transcription frames.
- Fixed an issue that would cause Daily transport to show a stop transcription
error when actually none occurred.
## [0.0.37] - 2024-07-22
### Added
- Added `RTVIProcessor` which implements the RTVI-AI standard.
See https://github.com/rtvi-ai
- Added `BotInterruptionFrame` which allows interrupting the bot while talking.
- Added `LLMMessagesAppendFrame` which allows appending messages to the current
LLM context.
- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
one provided in this new frame.
- Added `LLMModelUpdateFrame` which allows updating the LLM model.
- Added `TTSSpeakFrame` which causes the bot say some text. This text will not
be part of the LLM context.
- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.
### Removed
- We remove the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
were added in the past to properly handle interruptions for the
`LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based
on `LLMResponseAggregator` which handles interruptions properly by just
processing the `StartInterruptionFrame`, so there's no need for these extra
frames any more.
### Fixed
- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
instead of a `TextFrame`.
- `TTSService` end of sentence detection has been improved. It now works with
acronyms, numbers, hours and others.
- Fixed an issue in `TTSService` that would not properly flush the current
aggregated sentence if an `LLMFullResponseEndFrame` was found.
### Performance
- `CartesiaTTSService` now uses websockets which improves speed. It also
leverages the new Cartesia contexts which maintains generated audio prosody
when multiple inputs are sent, therefore improving audio quality a lot.
## [0.0.36] - 2024-07-02
### Added
- Added `GladiaSTTService`.
See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
- Added `XTTSService`. This is a local Text-To-Speech service.
See https://github.com/coqui-ai/TTS
- Added `UserIdleProcessor`. This processor can be used to wait for any
interaction with the user. If the user doesn't say anything within a given
timeout a provided callback is called.
- Added `IdleFrameProcessor`. This processor can be used to wait for frames
within a given timeout. If no frame is received within the timeout a provided
callback is called.
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
upstream while the bot is talking.
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
or `SileroVAD`.
- Added `AysncFrameProcessor` and `AsyncAIService`. Some services like
`DeepgramSTTService` need to process things asynchronously. For example, audio
is sent to Deepgram but transcriptions are not returned immediately. In these
cases we still require all frames (except system frames) to be pushed
downstream from a single task. That's what `AsyncFrameProcessor` is for. It
creates a task and all frames should be pushed from that task. So, whenever a
new Deepgram transcription is ready that transcription will also be pushed
from this internal task.
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The
processing metrics indicate the time a processor needs to generate all its
output. Note that not all processors generate these kind of metrics.
### Changed
- `WhisperSTTService` model can now also be a string.
- Added missing \* keyword separators in services.
### Fixed
- `WebsocketServerTransport` doesn't try to send frames anymore if serializers
returns `None`.
- Fixed an issue where exceptions that occurred inside frame processors were
being swallowed and not displayed.
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
data to the websocket after being closed.
### Other
- Added Fly.io deployment example in `examples/deployment/flyio-example`.
- Added new `17-detect-user-idle.py` example that shows how to use the new
`UserIdleProcessor`.
## [0.0.35] - 2024-06-28
### Changed
- `FastAPIWebsocketParams` now require a serializer.
- `TwilioFrameSerializer` now requires a `streamSid`.
### Fixed
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for
8000 sample rate.
## [0.0.34] - 2024-06-25
### Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
interruptions to ignore transcriptions.
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
shorter output.
## [0.0.33] - 2024-06-25
### Changed
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
expects a voice ID instead of a voice name (you can get the voice ID from
Cartesia's playground). You can also specify the audio `sample_rate` and
`encoding` instead of the previous `output_format`.
### Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
cause static audio issues and interruptions to not work properly when dealing
with multiple LLMs sentences.
- Fixed an issue that could mix new LLM responses with previous ones when
handling interruptions.
- Fixed a Daily transport blocking situation that occurred while reading audio
frames after a participant left the room. Needs daily-python >= 0.10.1.
## [0.0.32] - 2024-06-22
### Added
- Allow specifying a `DeepgramSTTService` url which allows using on-prem
Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
can be integrated with FastAPI websockets.
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
### Performance
- Convert `BaseOutputTransport` and `BaseOutputTransport` to fully use asyncio
and remove the use of threads.
### Other
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
`AzureSTTService` and `AzureTTSService`.
## [0.0.31] - 2024-06-13
### Performance
- Break long audio frames into 20ms chunks instead of 10ms.
## [0.0.30] - 2024-06-13
### Added
- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so
only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through
OpenPipe's SDK.
- Allow specifying frame processors' name through a new `name` constructor
argument.
- Added `DeepgramSTTService`. This service has an ongoing websocket
connection. To handle this, it subclasses `AIService` instead of
`STTService`. The output of this service will be pushed from the same task,
except system frames like `StartFrame`, `CancelFrame` or
`StartInterruptionFrame`.
### Changed
- `FrameSerializer.deserialize()` can now return `None` in case it is not
possible to desearialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
### Fixed
- Fixed an issue where `DailyRoomProperties.exp` always had the same old
timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use
`push_audio_frame()` and also VAD was not working properly.
- Fixed an issue that would cause LLM aggregator to fail with small
`VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames
preventing interruptions.
### Other
- Added new `07h-interruptible-openpipe.py` example. This example shows how to
use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added new `dialin-chatbot` example. This examples shows how to call the bot
using a phone number.
## [0.0.29] - 2024-06-07
### Added
- Added a new `FunctionFilter`. This filter will let you filter frames based on
a given function, except system messages which should never be filtered.
- Added `FrameProcessor.can_generate_metrics()` method to indicate if a
processor can generate metrics. In the future this might get an extra argument
to ask for a specific type of metric.
- Added `BasePipeline`. All pipeline classes should be based on this class. All
subclasses should implement a `processors_with_metrics()` method that returns
a list of all `FrameProcessor`s in the pipeline that can generate metrics.
- Added `enable_metrics` to `PipelineParams`.
- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the
system. Right now, it can report TTFB (Time To First Byte) values for
different services, that is the time spent between the arrival of a `Frame` to
the processor/service until the first `DataFrame` is pushed downstream. If
metrics are enabled an intial `MetricsFrame` with all the services in the
pipeline will be sent.
- Added TTFB metrics and debug logging for TTS services.
### Changed
- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.
### Fixed
- Fixed PlayHT TTS service to work properly async.
## [0.0.28] - 2024-06-05
### Fixed
- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep
growing indefinitely.
## [0.0.27] - 2024-06-05
### Added
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
## [0.0.26] - 2024-06-05
### Added
- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
audio sample format and the model to use.
- Added `DailyRESTHelper` which helps you create Daily rooms and tokens in an
easy way.
- `PipelineTask` now has a `has_finished()` method to indicate if the task has
completed. If a task is never ran `has_finished()` will return False.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be
canceled.
### Fixed
- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` where
stopping push tasks before pushing `EndFrame` frames could cause the bots to
get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a
`user` message could be appended after the previous `user` message. Anthropic
does not allow that because it requires alternate `user` and `assistant`
messages.
### Performance
- The `BaseInputTransport` does not pull audio frames from sub-classes any
more. Instead, sub-classes now push audio frames into a queue in the base
class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead
of 10ms.
- Remove redundant camera input thread from `DailyInputTransport`. This should
improve performance a little bit when processing participant videos.
- Load Cartesia voice on startup.
## [0.0.25] - 2024-05-31
### Added
- Added WebsocketServerTransport. This will create a websocket server and will
read messages coming from a client. The messages are serialized/deserialized
with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (LLMService.register_function()). This will allow the
LLM to call functions you have registered when needed. For example, if you
register a function to get the weather in Los Angeles and ask the LLM about
the weather in Los Angeles, the LLM will call your function.
See https://platform.openai.com/docs/guides/function-calling
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/)
### Fixed
- Fixed SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause the highg CPU usage if
no image was provided.
### Performance
- Removed unnecessary audio input tasks.
## [0.0.24] - 2024-05-29
### Added
@@ -52,7 +691,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`,
`on_dialout_error` and `on_dialout_warning`. See
`on_dialout_error` and `on_dialout_warning`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
## [0.0.21] - 2024-05-22

View File

@@ -1,6 +1,6 @@
BSD 2-Clause License
Copyright (c) 2024, Kwindla Hultman Kramer
Copyright (c) 2024, Daily
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

View File

@@ -4,8 +4,7 @@
# Pipecat
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021
)](https://discord.gg/pipecat)
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
`pipecat` is a framework for building voice (and multimodal) conversational agents. Things like personal coaches, meeting assistants, [story-telling toys for kids](https://storytelling-chatbot.fly.dev/), customer support bots, [intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0), and snarky social companions.
@@ -39,7 +38,7 @@ pip install "pipecat-ai[option,...]"
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **AI services**: `anthropic`, `azure`, `deepgram`, `gladia`, `google`, `fal`, `lmnt`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`, `xtts`
- **Transports**: `local`, `websocket`, `daily`
## Code examples
@@ -49,7 +48,7 @@ Your project may or may not need these, so they're made available as optional re
## A simple voice agent running locally
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [ElevenLabs](https://elevenlabs.io/) for text-to-speech.
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [Cartesia](https://cartesia.ai/) for text-to-speech.
```python
#app.py
@@ -61,7 +60,7 @@ from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
async def main():
@@ -70,14 +69,13 @@ async def main():
transport = DailyTransport(
room_url=...,
token=...,
"Bot Name",
DailyParams(audio_out_enabled=True))
bot_name="Bot Name",
params=DailyParams(audio_out_enabled=True))
# Use Eleven Labs for Text-to-Speech
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=...,
voice_id=...,
# Use Cartesia for Text-to-Speech
tts = CartesiaTTSService(
api_key=...,
voice_id=...
)
# Simple pipeline that will process text to speech and output the result
@@ -94,7 +92,7 @@ async def main():
@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
# Queue a TextFrame that will get spoken by the TTS service (Eleven Labs)
# Queue a TextFrame that will get spoken by the TTS service (Cartesia)
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
# Run the pipeline task
@@ -125,7 +123,7 @@ Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://
Voice Activity Detection &mdash; very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural feeling conversation.
Pipecast makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
```shell
pip install pipecat-ai[silero]
@@ -146,20 +144,20 @@ source venv/bin/activate
From the root of this repo, run the following:
```shell
pip install -r dev-requirements.txt -r {env}-requirements.txt
pip install -r dev-requirements.txt
python -m build
```
This builds the package. To use the package locally (eg to run sample files), run
This builds the package. To use the package locally (e.g. to run sample files), run
```shell
pip install --editable .
pip install --editable ".[option,...]"
```
If you want to use this package from another directory, you can run:
```shell
pip install path_to_this_repo
pip install "path_to_this_repo[option,...]"
```
### Running tests
@@ -167,27 +165,29 @@ pip install path_to_this_repo
From the root directory, run:
```shell
pytest --doctest-modules --ignore-glob="*to_be_updated*" src tests
pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
```
## Setting up your editor
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting.
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting via [Ruff](https://github.com/astral-sh/ruff).
### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install [py-autopep8](https://codeberg.org/ideasman42/emacs-py-autopep8) package and configure `autopep8` arguments:
You can use [use-package](https://github.com/jwiegley/use-package) to install [emacs-lazy-ruff](https://github.com/christophermadsen/emacs-lazy-ruff) package and configure `ruff` arguments:
```elisp
(use-package py-autopep8
(use-package lazy-ruff
:ensure t
:defer t
:hook ((python-mode . py-autopep8-mode))
:hook ((python-mode . lazy-ruff-mode))
:config
(setq py-autopep8-options '("-a" "-a", "--max-line-length=100")))
(setq lazy-ruff-format-command "ruff format --config line-length=100")
(setq lazy-ruff-only-format-block t)
(setq lazy-ruff-only-format-region t)
(setq lazy-ruff-only-format-buffer t))
```
`autopep8` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
`ruff` was installed in the `venv` environment described before, so you should be able to use [pyvenv-auto](https://github.com/ryotaro612/pyvenv-auto) to automatically load that environment inside Emacs.
```elisp
(use-package pyvenv-auto
@@ -200,18 +200,14 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [p
### Visual Studio Code
Install the
[autopep8](https://marketplace.visualstudio.com/items?itemName=ms-python.autopep8) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `autopep8` arguments:
[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, enable formatting on save and configure `ruff` arguments:
```json
"[python]": {
"editor.defaultFormatter": "ms-python.autopep8",
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true
},
"autopep8.args": [
"-a",
"-a",
"--max-line-length=100"
],
"ruff.format.args": ["--config", "line-length=100"]
```
## Getting help

View File

@@ -1,6 +1,8 @@
autopep8~=2.1.0
build~=1.2.1
grpcio-tools~=1.62.2
pip-tools~=7.4.1
pytest~=8.2.0
setuptools~=69.5.1
pyright~=1.1.376
pytest~=8.3.2
ruff~=0.6.7
setuptools~=72.2.0
setuptools_scm~=8.1.0

View File

@@ -27,9 +27,19 @@ FAL_KEY=...
# Fireworks
FIREWORKS_API_KEY=...
# Gladia
GLADIA_API_KEY=...
# LMNT
LMNT_API_KEY=...
LMNT_VOICE_ID=...
# PlayHT
PLAY_HT_USER_ID=...
PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...
#OpenPipe
OPENPIPE_API_KEY=...

View File

@@ -32,13 +32,16 @@ Next, follow the steps in the README for each demo.
## Projects:
| Project | Description | Services |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, Open AI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| Function-calling Chatbot (TBC) | A chatbot that can call functions in response to user input | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
| Project | Description | Services |
|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.

View File

@@ -0,0 +1,13 @@
FROM python:3.11-bullseye
# Open port 7860 for http service
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Start the FastAPI server
CMD python3 bot_runner.py --port ${FAST_API_PORT}

View File

@@ -0,0 +1,39 @@
# Fly.io deployment example
This project modifies the `bot_runner.py` server to launch a new machine for each user session. This is a recommended approach for production vs. running shell processess as your deployment will quickly run out of system resources under load.
For this example, we are using Daily as a WebRTC transport and provisioning a new room and token for each session. You can use another transport, such as WebSockets, by modifying the `bot.py` and `bot_runner.py` files accordingly.
## Setting up your fly.io deployment
### Create your fly.toml file
You can copy the `example-fly.toml` as a reference. Be sure to change the app name to something unique.
### Create your .env file
Copy the base `env.example` to `.env` and enter the necessary API keys.
`FLY_APP_NAME` should match that in the `fly.toml` file.
### Launch a new fly.io project
`fly launch` or `fly launch --org your-org-name`
### Set the necessary app secrets from your .env
Note: you can do this manually via the fly.io dashboard under the "secrets" sub-section of your deployment (e.g. "https://fly.io/apps/fly-app-name/secrets") or run the following terminal command:
`cat .env | tr '\n' ' ' | xargs flyctl secrets set`
### Deploy your machine
`fly deploy`
## Connecting to your bot
Send a post request to your running fly.io instance:
`curl --location --request POST 'https://YOUR_FLY_APP_NAME/start_bot'`
This request will wait until the machine enters into a `starting` state, before returning the a room URL and token to join.

View File

@@ -0,0 +1,104 @@
import asyncio
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str):
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Bot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
config = parser.parse_args()
asyncio.run(main(config.u, config.t))

View File

@@ -0,0 +1,211 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import subprocess
import os
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomObject,
DailyRoomProperties,
DailyRoomParams,
)
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = [
"DAILY_API_KEY",
"OPENAI_API_KEY",
"ELEVENLABS_API_KEY",
"ELEVENLABS_VOICE_ID",
"FLY_API_KEY",
"FLY_APP_NAME",
]
FLY_API_HOST = os.getenv("FLY_API_HOST", "https://api.machines.dev/v1")
FLY_APP_NAME = os.getenv("FLY_APP_NAME", "pipecat-fly-example")
FLY_API_KEY = os.getenv("FLY_API_KEY", "")
FLY_HEADERS = {"Authorization": f"Bearer {FLY_API_KEY}", "Content-Type": "application/json"}
daily_helpers = {}
# ----------------- API ----------------- #
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# ----------------- Main ----------------- #
async def spawn_fly_machine(room_url: str, token: str):
async with aiohttp.ClientSession() as session:
# Use the same image as the bot runner
async with session.get(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Unable to get machine info from Fly: {text}")
data = await r.json()
image = data[0]["config"]["image"]
# Machine configuration
cmd = f"python3 bot.py -u {room_url} -t {token}"
cmd = cmd.split()
worker_props = {
"config": {
"image": image,
"auto_destroy": True,
"init": {"cmd": cmd},
"restart": {"policy": "no"},
"guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 1024},
},
}
# Spawn a new machine instance
async with session.post(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines", headers=FLY_HEADERS, json=worker_props
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Problem starting a bot worker: {text}")
data = await r.json()
# Wait for the machine to enter the started state
vm_id = data["id"]
async with session.get(
f"{FLY_API_HOST}/apps/{FLY_APP_NAME}/machines/{vm_id}/wait?state=started",
headers=FLY_HEADERS,
) as r:
if r.status != 200:
text = await r.text()
raise Exception(f"Bot was unable to enter started state: {text}")
print(f"Machine joined room: {room_url}")
@app.post("/start_bot")
async def start_bot(request: Request) -> JSONResponse:
try:
data = await request.json()
# Is this a webhook creation request?
if "test" in data:
return JSONResponse({"test": True})
except Exception as e:
pass
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
if not room_url:
params = DailyRoomParams(properties=DailyRoomProperties())
try:
room: DailyRoomObject = await daily_helpers["rest"].create_room(params=params)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Unable to provision room {e}")
else:
# Check passed room URL exists, we should assume that it already has a sip set up
try:
room: DailyRoomObject = await daily_helpers["rest"].get_room_from_url(room_url)
except Exception:
raise HTTPException(status_code=500, detail=f"Room not found: {room_url}")
# Give the agent a token to join the session
token = await daily_helpers["rest"].get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room_url}")
# Launch a new fly.io machine, or run as a shell process (not recommended)
run_as_process = os.getenv("RUN_AS_PROCESS", False)
if run_as_process:
try:
subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
else:
try:
await spawn_fly_machine(room.url, token)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to spawn VM: {e}")
# Grab a token for the user to join with
user_token = await daily_helpers["rest"].get_token(room.url, MAX_SESSION_TIME)
return JSONResponse(
{
"room_url": room.url,
"token": user_token,
}
)
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument(
"--host", type=str, default=os.getenv("HOST", "0.0.0.0"), help="Host address"
)
parser.add_argument("--port", type=int, default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument(
"--reload", action="store_true", default=False, help="Reload code on change"
)
config = parser.parse_args()
try:
import uvicorn
uvicorn.run("bot_runner:app", host=config.host, port=config.port, reload=config.reload)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

View File

@@ -0,0 +1,8 @@
DAILY_API_KEY=
DAILY_SAMPLE_ROOM_URL= # Enter a Daily room URL to use a set room URL each time (useful for local testing)
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
FLY_API_KEY=
FLY_APP_NAME=
RUN_AS_PROCESS= # Spawn fly.io machine for each session or run as local process

View File

@@ -0,0 +1,25 @@
# fly.toml app configuration file generated for pipecat-fly-example on 2024-07-01T15:04:53+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-fly-example'
primary_region = 'sjc'
[build]
[env]
FLY_APP_NAME = 'pipecat-fly-example'
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[[vm]]
memory = 512
cpu_kind = 'shared'
cpus = 1

View File

@@ -0,0 +1,5 @@
pipecat-ai[daily,openai,silero]
fastapi
uvicorn
python-dotenv
loguru

View File

@@ -0,0 +1,3 @@
**/.DS_Store
.env
.env.*

165
examples/dialin-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,165 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml
# custom script to recursively upgrade items in requirements.py
upgrade_requirements.py
.DS_Store

View File

@@ -0,0 +1,40 @@
FROM python:3.11-bullseye
ARG DEBIAN_FRONTEND=noninteractive
ARG USE_PERSISTENT_DATA
ENV PYTHONUNBUFFERED=1
# Expose FastAPI port
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
build-essential \
git \
ffmpeg \
google-perftools \
ca-certificates curl gnupg \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user
# Set home to the user's home directory
ENV HOME=/home/user \
PATH=/home/user/.local/bin:$PATH \
PYTHONPATH=$HOME/app \
PYTHONUNBUFFERED=1
# Switch to the "user" user
USER user
# Set the working directory to the user's home directory
WORKDIR $HOME/app
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Start the FastAPI server
CMD python3 bot_runner.py --host "0.0.0.0" --port ${FAST_API_PORT}

View File

@@ -0,0 +1,85 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="image.png">
</div>
# Dialin example
Example project that demonstrates how to add phone number dialin to your Pipecat bots. We include examples for both Daily (`bot_daily.py`) and Twilio (`bot_twilio.py`), depending on who you want to use as a phone vendor.
- 🔁 Transport: Daily WebRTC
- 💬 Speech-to-Text: Deepgram via Daily transport
- 🤖 LLM: GPT4-o / OpenAI
- 🔉 Text-to-Speech: ElevenLabs
#### Should I use Daily or Twilio as a vendor?
If you're starting from scratch, using Daily to provision phone numbers alongside Daily as a transport offers some convenience (such as automatic call forwarding.)
If you already have Twilio numbers and workflows that you want to connect to your Pipecat bots, there is some additional configuration required (you'll need to create a `on_dialin_ready` and use the Twilio client to trigger the forward.)
You can read more about this, as well as see respective walkthroughs in our docs.
## Setup
```shell
# Install the requirements
pip install -r requirements.txt
# Setup your env
mv env.example .env
```
## Using Daily numbers
Run `bot_runner.py` to handle incoming HTTP requests:
`python bot_runner.py --host localhost`
Then target the following URL:
`POST /daily_start_bot`
For more configuration options, please consult Daily's API documentation.
## Using Twilio numbers
As above, but target the following URL:
`POST /twilio_start_bot`
For more configuration options, please consult Twilio's API documentation.
## Deployment example
A Dockerfile is included in this demo for convenience. Here is an example of how to build and deploy your bot to [fly.io](https://fly.io).
*Please note: This demo spawns agents as subprocesses for convenience / demonstration purposes. You would likely not want to do this in production as it would limit concurrency to available system resources. For more information on how to deploy your bots using VMs, refer to the Pipecat documentation.*
### Build the docker image
`docker build -t tag:project .`
### Launch the fly project
`mv fly.example.toml fly.toml`
`fly launch` (using the included fly.toml)
### Setup your secrets on Fly
Set the necessary secrets (found in `env.example`)
`fly secrets set DAILY_API_KEY=... OPENAI_API_KEY=... ELEVENLABS_API_KEY=... ELEVENLABS_VOICE_ID=...`
If you're using Twilio as a number vendor:
`fly secrets set TWILIO_ACCOUNT_SID=... TWILIO_AUTH_TOKEN=...`
### Deploy!
`fly deploy`
## Need to do something more advanced?
This demo covers the basics of bot telephony. If you want to know more about working with PSTN / SIP, please ping us on [Discord](https://discord.gg/pipecat).

View File

@@ -0,0 +1,106 @@
import asyncio
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyDialinSettings
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str, callId: str, callDomain: str):
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
diallin_settings = DailyDialinSettings(call_id=callId, call_domain=callDomain)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
dialin_settings=diallin_settings,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Oh, hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-d", type=str, help="Call Domain")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.d))

View File

@@ -0,0 +1,220 @@
"""
bot_runner.py
HTTP service that listens for incoming calls from either Daily or Twilio,
provisioning a room and starting a Pipecat bot in response.
Refer to README for more information.
"""
import aiohttp
import os
import argparse
import subprocess
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
from pipecat.transports.services.helpers.daily_rest import (
DailyRESTHelper,
DailyRoomObject,
DailyRoomProperties,
DailyRoomSipParams,
DailyRoomParams,
)
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "DAILY_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"]
daily_helpers = {}
# ----------------- API ----------------- #
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
"""
Create Daily room, tell the bot if the room is created for Twilio's SIP or Daily's SIP (vendor).
When the vendor is Daily, the bot handles the call forwarding automatically,
i.e, forwards the call from the "hold music state" to the Daily Room's SIP URI.
Alternatively, when the vendor is Twilio (not Daily), the bot is responsible for
updating the state on Twilio. So when `dialin-ready` fires, it takes appropriate
action using the Twilio Client library.
"""
async def _create_daily_room(room_url, callId, callDomain=None, vendor="daily"):
if not room_url:
params = DailyRoomParams(
properties=DailyRoomProperties(
# Note: these are the default values, except for the display name
sip=DailyRoomSipParams(
display_name="dialin-user", video=False, sip_mode="dial-in", num_endpoints=1
)
)
)
print(f"Creating new room...")
room: DailyRoomObject = await daily_helpers["rest"].create_room(params=params)
else:
# Check passed room URL exist (we assume that it already has a sip set up!)
try:
print(f"Joining existing room: {room_url}")
room: DailyRoomObject = await daily_helpers["rest"].get_room_from_url(room_url)
except Exception:
raise HTTPException(status_code=500, detail=f"Room not found: {room_url}")
print(f"Daily room: {room.url} {room.config.sip_endpoint}")
# Give the agent a token to join the session
token = await daily_helpers["rest"].get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(status_code=500, detail=f"Failed to get room or token token")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in docs)
if vendor == "daily":
bot_proc = f"python3 - m bot_daily - u {room.url} - t {token} - i {
callId} - d {callDomain}"
else:
bot_proc = f"python3 - m bot_twilio - u {room.url} - t {
token} - i {callId} - s {room.config.sip_endpoint}"
try:
subprocess.Popen(
[bot_proc], shell=True, bufsize=1, cwd=os.path.dirname(os.path.abspath(__file__))
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return room
@app.post("/twilio_start_bot", response_class=PlainTextResponse)
async def twilio_start_bot(request: Request):
print(f"POST /twilio_voice_bot")
# twilio_start_bot is invoked directly by Twilio (as a web hook).
# On Twilio, under Active Numbers, pick the phone number
# Click Configure and under Voice Configuration,
# "a call comes in" choose webhook and point the URL to
# where this code is hosted.
data = {}
try:
# shouldnt have received json, twilio sends form data
form_data = await request.form()
data = dict(form_data)
except Exception:
pass
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
callId = data.get("CallSid")
if not callId:
raise HTTPException(status_code=500, detail="Missing 'CallSid' in request")
print("CallId: %s" % callId)
# create room and tell the bot to join the created room
# note: Twilio does not require a callDomain
room: DailyRoomObject = await _create_daily_room(room_url, callId, None, "twilio")
print(f"Put Twilio on hold...")
# We have the room and the SIP URI,
# but we do not know if the Daily SIP Worker and the Bot have joined the call
# put the call on hold until the 'on_dialin_ready' fires.
# Then, the bot will update the called sid with the sip uri.
# http://com.twilio.music.classical.s3.amazonaws.com/BusyStrings.mp3
resp = VoiceResponse()
resp.play(
url="http://com.twilio.sounds.music.s3.amazonaws.com/MARKOVICHAMP-Borghestral.mp3", loop=10
)
return str(resp)
@app.post("/daily_start_bot")
async def daily_start_bot(request: Request) -> JSONResponse:
# The /daily_start_bot is invoked when a call is received on Daily's SIP URI
# daily_start_bot will create the room, put the call on hold until
# the bot and sip worker are ready. Daily will automatically
# forward the call to the SIP URi when dialin_ready fires.
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
# Get the dial-in properties from the request
try:
data = await request.json()
if "test" in data:
# Pass through any webhook checks
return JSONResponse({"test": True})
callId = data.get("callId", None)
callDomain = data.get("callDomain", None)
except Exception:
raise HTTPException(status_code=500, detail="Missing properties 'callId' or 'callDomain'")
print(f"CallId: {callId}, CallDomain: {callDomain}")
room: DailyRoomObject = await _create_daily_room(room_url, callId, callDomain, "daily")
# Grab a token for the user to join with
return JSONResponse({"room_url": room.url, "sipUri": room.config.sip_endpoint})
# ----------------- Main ----------------- #
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument(
"--host", type=str, default=os.getenv("HOST", "0.0.0.0"), help="Host address"
)
parser.add_argument("--port", type=int, default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument("--reload", action="store_true", default=True, help="Reload code on change")
config = parser.parse_args()
try:
import uvicorn
uvicorn.run("bot_runner:app", host=config.host, port=config.port, reload=config.reload)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

View File

@@ -0,0 +1,123 @@
import asyncio
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from twilio.rest import Client
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
twilio_account_sid = os.getenv("TWILIO_ACCOUNT_SID")
twilio_auth_token = os.getenv("TWILIO_AUTH_TOKEN")
twilioclient = Client(twilio_account_sid, twilio_auth_token)
daily_api_key = os.getenv("DAILY_API_KEY", "")
async def main(room_url: str, token: str, callId: str, sipUri: str):
# dialin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_key=daily_api_key,
dialin_settings=None, # Not required for Twilio
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
# For Twilio, Telnyx, etc. You need to update the state of the call
# and forward it to the sip_uri..
print(f"Forwarding call: {callId} {sipUri}")
try:
# The TwiML is updated using Twilio's client library
call = twilioclient.calls(callId).update(
twiml=f"<Response><Dial><Sip>{sipUri}</Sip></Dial></Response>"
)
except Exception as e:
raise Exception(f"Failed to forward call: {str(e)}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-s", type=str, help="SIP URI")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.s))

View File

@@ -0,0 +1,8 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (optional: for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=.
DAILY_API_URL=api.daily.co/v1
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=

View File

@@ -0,0 +1,19 @@
# fly.toml app configuration file generated for pipecat-dialin-demo on 2024-06-03T15:57:57+02:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-dialin-demo'
primary_region = 'sjc'
[build]
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[vm]]
size = 'performance-1x'

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

View File

@@ -0,0 +1,6 @@
pipecat-ai[daily,elevenlabs,openai,silero]
fastapi
uvicorn
python-dotenv
twilio
python-multipart

View File

@@ -13,7 +13,7 @@ from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
@@ -21,21 +21,24 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async def main():
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))
(room_url, _) = await configure(session)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
transport = DailyTransport(
room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True)
)
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
runner = PipelineRunner()
@@ -46,11 +49,11 @@ async def main(room_url):
# participant joins.
@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
participant_name = participant["info"]["userName"] or ""
await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -9,17 +9,18 @@ import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -30,10 +31,9 @@ async def main():
async with aiohttp.ClientSession() as session:
transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
pipeline = Pipeline([tts, transport.output()])
@@ -42,7 +42,7 @@ async def main():
async def say_something():
await asyncio.sleep(1)
await task.queue_frames([TextFrame("Hello there!"), EndFrame()])
await task.queue_frame(TextFrame("Hello there!"))
runner = PipelineRunner()

View File

@@ -13,7 +13,7 @@ from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -22,35 +22,34 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async def main():
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
None,
"Say One Thing From an LLM",
DailyParams(audio_out_enabled=True))
(room_url, _) = await configure(session)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
transport = DailyTransport(
room_url, None, "Say One Thing From an LLM", DailyParams(audio_out_enabled=True)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}]
}
]
runner = PipelineRunner()
@@ -64,5 +63,4 @@ async def main(room_url):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -21,29 +21,26 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Show a still frame image",
DailyParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
DailyParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -64,5 +61,4 @@ async def main(room_url):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -22,6 +22,7 @@ from pipecat.transports.local.tk import TkLocalTransport
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -35,15 +36,11 @@ async def main():
transport = TkLocalTransport(
tk_root,
TransportParams(
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
TransportParams(camera_out_enabled=True, camera_out_width=1024, camera_out_height=1024),
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -56,7 +53,7 @@ async def main():
runner = PipelineRunner()
async def run_tk():
while runner.is_active():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)

View File

@@ -4,6 +4,10 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
#
# This example broken on latest pipecat and needs updating.
#
import aiohttp
import asyncio
import os
@@ -24,14 +28,17 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(room_url, None, "Static And Dynamic Speech")
meeting = TransportServiceOutput(transport, mic_enabled=True)
@@ -52,8 +59,7 @@ async def main(room_url: str):
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
messages = [{"role": "system",
"content": "tell the user a joke about llamas"}]
messages = [{"role": "system", "content": "tell the user a joke about llamas"}]
# Start a task to run the LLM to create a joke, and convert the LLM
# output to audio frames. This task will run in parallel with generating
@@ -71,8 +77,7 @@ async def main(room_url: str):
]
)
merge_pipeline = SequentialMergePipeline(
[simple_tts_pipeline, llm_pipeline])
merge_pipeline = SequentialMergePipeline([simple_tts_pipeline, llm_pipeline])
await asyncio.gather(
transport.run(merge_pipeline),
@@ -82,5 +87,4 @@ async def main(room_url: str):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -13,23 +13,19 @@ from dataclasses import dataclass
from pipecat.frames.frames import (
AppFrame,
EndFrame,
Frame,
ImageRawFrame,
LLMFullResponseStartFrame,
LLMMessagesFrame,
TextFrame
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.gated import GatedAggregator
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.parallel_task import ParallelTask
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -38,6 +34,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -59,6 +56,8 @@ class MonthPrepender(FrameProcessor):
self.prepend_to_next_text_frame = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, MonthFrame):
self.most_recent_month = frame.month
elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):
@@ -71,8 +70,10 @@ class MonthPrepender(FrameProcessor):
await self.push_frame(frame, direction)
async def main(room_url):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
@@ -81,48 +82,44 @@ async def main(room_url):
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024
)
camera_out_height=1024,
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
gated_aggregator = GatedAggregator(
gate_open_fn=lambda frame: isinstance(frame, ImageRawFrame),
gate_close_fn=lambda frame: isinstance(frame, LLMFullResponseStartFrame),
start_open=False
)
sentence_aggregator = SentenceAggregator()
month_prepender = MonthPrepender()
llm_full_response_aggregator = LLMFullResponseAggregator()
pipeline = Pipeline([
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
ParallelTask( # Run pipelines in parallel aggregating the result
[month_prepender, tts], # Create "Month: sentence" and output audio
[llm_full_response_aggregator, imagegen] # Aggregate full LLM response
),
gated_aggregator, # Queues everything until an image is available
transport.output() # Transport output
])
# With `SyncParallelPipeline` we synchronize audio and images by pushing
# them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2 I3 A3). To do
# that, each pipeline runs concurrently and `SyncParallelPipeline` will
# wait for the input frame to be processed.
#
# Note that `SyncParallelPipeline` requires all processors in it to be
# synchronous (which is the default for most processors).
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
SyncParallelPipeline( # Run pipelines in parallel aggregating the result
[month_prepender, tts], # Create "Month: sentence" and output audio
[imagegen], # Generate image
),
transport.output(), # Transport output
]
)
frames = []
for month in [
@@ -148,8 +145,6 @@ async def main(room_url):
frames.append(MonthFrame(month=month))
frames.append(LLMMessagesFrame(messages))
frames.append(EndFrame())
runner = PipelineRunner()
task = PipelineTask(pipeline)
@@ -160,5 +155,4 @@ async def main(room_url):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -11,18 +11,25 @@ import sys
import tkinter as tk
from pipecat.frames.frames import AudioRawFrame, Frame, URLImageRawFrame, LLMMessagesFrame, TextFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.frames.frames import (
Frame,
OutputAudioRawFrame,
TTSAudioRawFrame,
URLImageRawFrame,
LLMMessagesFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.local.tk import TkLocalTransport, TkOutputTransport
from loguru import logger
@@ -42,7 +49,12 @@ async def main():
runner = PipelineRunner()
async def get_month_data(month):
messages = [{"role": "system", "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.", }]
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
class ImageDescription(FrameProcessor):
def __init__(self):
@@ -50,6 +62,8 @@ async def main():
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
@@ -58,12 +72,16 @@ async def main():
def __init__(self):
super().__init__()
self.audio = bytearray()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, AudioRawFrame):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
self.audio.extend(frame.audio)
self.frame = AudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels)
self.frame = OutputAudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels
)
class ImageGrabber(FrameProcessor):
def __init__(self):
@@ -71,26 +89,25 @@ async def main():
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="square_hd"
),
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"))
key=os.getenv("FAL_KEY"),
)
aggregator = LLMFullResponseAggregator()
sentence_aggregator = SentenceAggregator()
description = ImageDescription()
@@ -98,13 +115,25 @@ async def main():
image_grabber = ImageGrabber()
pipeline = Pipeline([
llm,
aggregator,
description,
ParallelPipeline([tts, audio_grabber],
[imagegen, image_grabber])
])
# With `SyncParallelPipeline` we synchronize audio and images by
# pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
# I3 A3). To do that, each pipeline runs concurrently and
# `SyncParallelPipeline` will wait for the input frame to be
# processed.
#
# Note that `SyncParallelPipeline` requires all processors in it to
# be synchronous (which is the default for most processors).
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
description, # Store sentence
SyncParallelPipeline(
[tts, audio_grabber], # Generate and store audio for the given sentence
[imagegen, image_grabber], # Generate and storeimage for the given sentence
),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(LLMMessagesFrame(messages))
@@ -125,20 +154,19 @@ async def main():
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024))
camera_out_height=1024,
),
)
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify 5 months as we create tasks all at once and we might
# get rate limited otherwise.
# We only specify a few months as we create tasks all at once and we
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
# "March",
# "April",
# "May",
]
# We create one task per month. This will be executed concurrently.
@@ -156,7 +184,7 @@ async def main():
await runner.stop_when_done()
async def run_tk():
while True:
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)

View File

@@ -9,16 +9,22 @@ import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.frames.frames import Frame, LLMMessagesFrame, MetricsFrame
from pipecat.metrics.metrics import (
TTFBMetricsData,
ProcessingMetricsData,
LLMUsageMetricsData,
TTSUsageMetricsData,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -28,14 +34,37 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
class MetricsLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, MetricsFrame):
for d in frame.data:
if isinstance(d, TTFBMetricsData):
print(f"!!! MetricsFrame: {frame}, ttfb: {d.value}")
elif isinstance(d, ProcessingMetricsData):
print(f"!!! MetricsFrame: {frame}, processing: {d.value}")
elif isinstance(d, LLMUsageMetricsData):
tokens = d.value
print(
f"!!! MetricsFrame: {frame}, tokens: {
tokens.prompt_tokens}, characters: {
tokens.completion_tokens}"
)
elif isinstance(d, TTSUsageMetricsData):
print(f"!!! MetricsFrame: {frame}, characters: {d.value}")
await self.push_frame(frame, direction)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -44,22 +73,18 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
fl_in = FrameLogger("Inner")
fl_out = FrameLogger("Outer")
ml = MetricsLogger()
messages = [
{
@@ -70,16 +95,17 @@ async def main(room_url: str, token):
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
fl_in,
transport.input(),
tma_in,
llm,
fl_out,
tts,
transport.output(),
tma_out
])
pipeline = Pipeline(
[
transport.input(),
tma_in,
llm,
tts,
ml,
transport.output(),
tma_out,
]
)
task = PipelineTask(pipeline)
@@ -87,8 +113,7 @@ async def main(room_url: str, token):
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
@@ -97,5 +122,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -11,18 +11,19 @@ import sys
from PIL import Image
from pipecat.frames.frames import ImageRawFrame, Frame, SystemFrame, TextFrame
from pipecat.frames.frames import Frame, OutputImageRawFrame, SystemFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMAssistantContextAggregator,
LLMUserContextAggregator,
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams
from runner import configure
@@ -30,6 +31,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -48,37 +50,53 @@ class ImageSyncAggregator(FrameProcessor):
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame, direction: FrameDirection):
if not isinstance(frame, SystemFrame):
await self.push_frame(ImageRawFrame(image=self._speaking_image_bytes, size=(1024, 1024), format=self._speaking_image_format))
await super().process_frame(frame, direction)
if not isinstance(frame, SystemFrame) and direction == FrameDirection.DOWNSTREAM:
await self.push_frame(
OutputImageRawFrame(
image=self._speaking_image_bytes,
size=(1024, 1024),
format=self._speaking_image_format,
)
)
await self.push_frame(frame)
await self.push_frame(ImageRawFrame(image=self._waiting_image_bytes, size=(1024, 1024), format=self._waiting_image_format))
await self.push_frame(
OutputImageRawFrame(
image=self._waiting_image_bytes,
size=(1024, 1024),
format=self._waiting_image_format,
)
)
else:
await self.push_frame(frame)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1024,
camera_out_height=1024,
transcription_enabled=True
)
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -87,31 +105,33 @@ async def main(room_url: str, token):
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
image_sync_aggregator = ImageSyncAggregator(
os.path.join(os.path.dirname(__file__), "assets", "speaking.png"),
os.path.join(os.path.dirname(__file__), "assets", "waiting.png"),
)
pipeline = Pipeline([
transport.input(),
image_sync_aggregator,
tma_in,
llm,
tts,
transport.output(),
tma_out
])
pipeline = Pipeline(
[
transport.input(),
image_sync_aggregator,
tma_in,
llm,
tts,
transport.output(),
tma_out,
]
)
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant["info"]["userName"] or ''
participant_name = participant["info"]["userName"] or ""
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([TextFrame(f"Hi, this is {participant_name}.")])
await task.queue_frames([TextFrame(f"Hi there {participant_name}!")])
runner = PipelineRunner()
@@ -119,5 +139,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -14,8 +14,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -25,14 +27,17 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -41,19 +46,16 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -65,23 +67,32 @@ async def main(room_url: str, token):
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
@@ -90,5 +101,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -14,8 +14,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -25,14 +27,17 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -41,19 +46,18 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-opus-20240229")
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-opus-20240229"
)
# todo: think more about how to handle system prompts in a more general way. OpenAI,
# Google, and Anthropic all have slightly different approaches to providing a system
@@ -68,14 +72,16 @@ async def main(room_url: str, token):
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@@ -91,5 +97,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -0,0 +1,127 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.frameworks.langchain import LangchainProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
from loguru import logger
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
message_store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
if session_id not in message_store:
message_store[session_id] = ChatMessageHistory()
return message_store[session_id]
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),
]
)
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0.7)
history_chain = RunnableWithMessageHistory(
chain,
get_session_history,
history_messages_key="chat_history",
input_messages_key="input",
)
lc = LangchainProcessor(history_chain)
tma_in = LLMUserResponseAggregator()
tma_out = LLMAssistantResponseAggregator()
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
lc, # Langchain
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
lc.set_participant_id(participant["id"])
# Kick off the conversation.
# the `LLMMessagesFrame` will be picked up by the LangchainProcessor using
# only the content of the last message to inject it in the prompt defined
# above. So no role is required here.
messages = [({"content": "Please briefly introduce yourself to the user."})]
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -14,8 +14,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramTTSService
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -25,35 +27,36 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-helios-en"
aiohttp_session=session, api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -65,14 +68,17 @@ async def main(room_url: str, token):
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@@ -80,8 +86,7 @@ async def main(room_url: str, token):
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
@@ -90,5 +95,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,98 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.playht import PlayHTTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = PlayHTTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,107 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.azure import AzureLLMService, AzureSTTService, AzureTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.openai import OpenAITTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="alloy")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openpipe import OpenPipeLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
import time
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
timestamp = int(time.time())
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
model="gpt-4o",
tags={"conversation_id": f"pipecat-{timestamp}"},
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,99 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.xtts import XTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = XTTSService(
aiohttp_session=session,
voice_id="Claribel Dervla",
language="en",
base_url="http://localhost:8000",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.gladia import GladiaSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY"),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,94 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.lmnt import LmntTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,107 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.together import TogetherLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
model=os.getenv("TOGETHER_MODEL"),
params=TogetherLLMService.InputParams(
temperature=1.0,
top_p=0.9,
top_k=40,
extra={
"frequency_penalty": 2.0,
"presence_penalty": 0.0,
},
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -3,18 +3,19 @@ import aiohttp
import asyncio
import logging
import os
from pipecat.pipeline.aggregators import SentenceAggregator
from pipecat.processors.aggregators import SentenceAggregator
from pipecat.pipeline.pipeline import Pipeline
from pipecat.transports.daily_transport import DailyTransport
from pipecat.services.azure_ai_services import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.services.fal_ai_services import FalImageGenService
from pipecat.pipeline.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from pipecat.transports.services.daily import DailyTransport
from pipecat.services.azure import AzureLLMService, AzureTTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
from pipecat.frames.frames import AudioFrame, EndFrame, ImageFrame, LLMMessagesFrame, TextFrame
from runner import configure
from dotenv import load_dotenv
load_dotenv(override=True)
logging.basicConfig(format=f"%(levelno)s %(asctime)s %(message)s")
@@ -22,8 +23,10 @@ logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
async def main(room_url: str):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
@@ -51,9 +54,7 @@ async def main(room_url: str):
voice_id="jBpfuIE2acCO8z3wKNLl",
)
dalle = FalImageGenService(
params=FalImageGenService.InputParams(
image_size="1024x1024"
),
params=FalImageGenService.InputParams(image_size="1024x1024"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -73,13 +74,11 @@ async def main(room_url: str):
async def get_text_and_audio(messages) -> Tuple[str, bytearray]:
"""This function streams text from the LLM and uses the TTS service to convert
that text to speech as it's received. """
that text to speech as it's received."""
source_queue = asyncio.Queue()
sink_queue = asyncio.Queue()
sentence_aggregator = SentenceAggregator()
pipeline = Pipeline(
[llm, sentence_aggregator, tts1], source_queue, sink_queue
)
pipeline = Pipeline([llm, sentence_aggregator, tts1], source_queue, sink_queue)
await source_queue.put(LLMMessagesFrame(messages))
await source_queue.put(EndFrame())
@@ -144,5 +143,4 @@ async def main(room_url: str):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -4,12 +4,21 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
from pipecat.frames.frames import (
Frame,
InputAudioRawFrame,
InputImageRawFrame,
OutputAudioRawFrame,
OutputImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyTransport, DailyParams
from runner import configure
@@ -17,37 +26,63 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
transport = DailyTransport(
room_url, token, "Test",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720
class MirrorProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, InputAudioRawFrame):
await self.push_frame(
OutputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels,
)
)
elif isinstance(frame, InputImageRawFrame):
await self.push_frame(
OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
)
else:
await self.push_frame(frame, direction)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Test",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
camera_out_width=1280,
camera_out_height=720,
),
)
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
pipeline = Pipeline([transport.input(), transport.output()])
pipeline = Pipeline([transport.input(), MirrorProcessor(), transport.output()])
runner = PipelineRunner()
runner = PipelineRunner()
task = PipelineTask(pipeline)
task = PipelineTask(pipeline)
await runner.run(task)
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -4,14 +4,23 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
import tkinter as tk
from pipecat.frames.frames import (
Frame,
InputAudioRawFrame,
InputImageRawFrame,
OutputAudioRawFrame,
OutputImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.tk import TkLocalTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -21,45 +30,73 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url, token):
tk_root = tk.Tk()
tk_root.title("Local Mirror")
class MirrorProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
daily_transport = DailyTransport(room_url, token, "Test", DailyParams(audio_in_enabled=True))
if isinstance(frame, InputAudioRawFrame):
await self.push_frame(
OutputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels,
)
)
elif isinstance(frame, InputImageRawFrame):
await self.push_frame(
OutputImageRawFrame(image=frame.image, size=frame.size, format=frame.format)
)
else:
await self.push_frame(frame, direction)
tk_transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720))
@daily_transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
pipeline = Pipeline([daily_transport.input(), tk_transport.output()])
tk_root = tk.Tk()
tk_root.title("Local Mirror")
runner = PipelineRunner()
daily_transport = DailyTransport(
room_url, token, "Test", DailyParams(audio_in_enabled=True)
)
async def run_tk():
while runner.is_active():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
tk_transport = TkLocalTransport(
tk_root,
TransportParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_is_live=True,
camera_out_width=1280,
camera_out_height=720,
),
)
task = PipelineTask(pipeline)
@daily_transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_video(participant["id"])
await asyncio.gather(runner.run(task), run_tk())
pipeline = Pipeline([daily_transport.input(), MirrorProcessor(), tk_transport.output()])
task = PipelineTask(pipeline)
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
runner = PipelineRunner()
await asyncio.gather(runner.run(task), run_tk())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -14,8 +14,10 @@ from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.elevenlabs import ElevenLabsTTSService
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -25,15 +27,17 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -42,19 +46,16 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -67,15 +68,17 @@ async def main(room_url: str, token):
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
hey_robot_filter, # Filter out speech not directed at the robot
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
pipeline = Pipeline(
[
transport.input(), # Transport user input
hey_robot_filter, # Filter out speech not directed at the robot
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@@ -90,5 +93,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -1,156 +0,0 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import random
import sys
from PIL import Image
from pipecat.frames.frames import Frame, ImageRawFrame, SpriteFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
)
from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sprites = {}
image_files = [
"sc-default.png",
"sc-talk.png",
"sc-listen-1.png",
"sc-think-1.png",
"sc-think-2.png",
"sc-think-3.png",
"sc-think-4.png",
]
script_dir = os.path.dirname(__file__)
for file in image_files:
# Build the full path to the image file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites[file] = ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format)
# When the bot isn't talking, show a static image of the cat listening
quiet_frame = sprites["sc-listen-1.png"]
# When the bot is talking, build an animation from two sprites
talking_list = [sprites["sc-default.png"], sprites["sc-talk.png"]]
talking = [random.choice(talking_list) for x in range(30)]
talking_frame = SpriteFrame(talking)
# TODO: Support "thinking" as soon as we get a valid transcript, while LLM
# is processing
thinking_list = [
sprites["sc-think-1.png"],
sprites["sc-think-2.png"],
sprites["sc-think-3.png"],
sprites["sc-think-4.png"],
]
thinking_frame = SpriteFrame(thinking_list)
class ImageSyncAggregator(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await self.push_frame(talking_frame)
await self.push_frame(frame)
await self.push_frame(quiet_frame)
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Santa Cat",
DailyParams(
audio_out_enabled=True,
camera_out_enabled=True,
camera_out_width=720,
camera_out_height=1280,
camera_out_framerate=10,
transcription_enabled=True
)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="jBpfuIE2acCO8z3wKNLl",
)
isa = ImageSyncAggregator()
messages = [
{
"role": "system",
"content": "You are Santa Cat, a cat that lives in Santa's workshop at the North Pole. You should be clever, and a bit sarcastic. You should also tell jokes every once in a while. Your responses should only be a few sentences long.",
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
wcf = WakeCheckFilter(["Santa Cat", "Santa"])
pipeline = Pipeline([
transport.input(), # Transport user input
isa, # Cat talking/quiet images
wcf, # Filter out speech not directed at Santa Cat
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Santa Cat spoken responses
])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Send some greeting at the beginning.
await tts.say("Hi! If you want to talk to me, just say 'hey Santa Cat'.")
transport.capture_participant_transcription(participant["id"])
async def starting_image():
await transport.send_image(quiet_frame)
runner = PipelineRunner()
task = PipelineTask(pipeline)
await asyncio.gather(runner.run(task), starting_image())
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -12,28 +12,30 @@ import wave
from pipecat.frames.frames import (
Frame,
AudioRawFrame,
LLMFullResponseEndFrame,
LLMMessagesFrame,
OutputAudioRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import (
LLMUserContextAggregator,
LLMAssistantContextAggregator,
from pipecat.processors.aggregators.llm_response import (
LLMUserResponseAggregator,
LLMAssistantResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.logger import FrameLogger
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaHttpTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -52,13 +54,15 @@ for file in sound_files:
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the image and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = AudioRawFrame(audio_file.readframes(-1),
audio_file.getframerate(), audio_file.getnchannels())
sounds[file] = OutputAudioRawFrame(
audio_file.readframes(-1), audio_file.getframerate(), audio_file.getnchannels()
)
class OutboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMFullResponseEndFrame):
await self.push_frame(sounds["ding1.wav"])
# In case anything else downstream needs it
@@ -68,8 +72,9 @@ class OutboundSoundEffectWrapper(FrameProcessor):
class InboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMMessagesFrame):
await self.push_frame(sounds["ding2.wav"])
# In case anything else downstream needs it
@@ -78,23 +83,27 @@ class InboundSoundEffectWrapper(FrameProcessor):
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(audio_out_enabled=True, transcription_enabled=True)
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="ErXwobaYiN019PkySvjV",
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
messages = [
@@ -104,25 +113,27 @@ async def main(room_url: str, token):
},
]
tma_in = LLMUserContextAggregator(messages)
tma_out = LLMAssistantContextAggregator(messages)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
out_sound = OutboundSoundEffectWrapper()
in_sound = InboundSoundEffectWrapper()
fl = FrameLogger("LLM Out")
fl2 = FrameLogger("Transcription In")
pipeline = Pipeline([
transport.input(),
tma_in,
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
tma_out
])
pipeline = Pipeline(
[
transport.input(),
tma_in,
in_sound,
fl2,
llm,
fl,
tts,
out_sound,
transport.output(),
tma_out,
]
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
@@ -138,5 +149,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -16,7 +16,7 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -26,6 +26,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -33,7 +34,6 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
@@ -42,13 +42,19 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(
UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -57,14 +63,8 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
vad_analyzer=SileroVADAnalyzer(),
),
)
user_response = UserResponseAggregator()
@@ -76,10 +76,9 @@ async def main(room_url: str, token):
# If you run into weird description, try with use_cpu=True
moondream = MoondreamService()
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@transport.event_handler("on_first_participant_joined")
@@ -89,15 +88,17 @@ async def main(room_url: str, token):
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output()
])
pipeline = Pipeline(
[
transport.input(),
user_response,
image_requester,
vision_aggregator,
moondream,
tts,
transport.output(),
]
)
task = PipelineTask(pipeline)
@@ -105,6 +106,6 @@ async def main(room_url: str, token):
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -16,7 +16,7 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.google import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -26,6 +26,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -33,7 +34,6 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
@@ -42,13 +42,19 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(
UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -58,8 +64,8 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
user_response = UserResponseAggregator()
@@ -69,13 +75,12 @@ async def main(room_url: str, token):
vision_aggregator = VisionImageFrameAggregator()
google = GoogleLLMService(
model="gemini-1.5-flash-latest",
api_key=os.getenv("GOOGLE_API_KEY"))
model="gemini-1.5-flash-latest", api_key=os.getenv("GOOGLE_API_KEY")
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@transport.event_handler("on_first_participant_joined")
@@ -85,15 +90,17 @@ async def main(room_url: str, token):
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output()
])
pipeline = Pipeline(
[
transport.input(),
user_response,
image_requester,
vision_aggregator,
google,
tts,
transport.output(),
]
)
task = PipelineTask(pipeline)
@@ -101,6 +108,6 @@ async def main(room_url: str, token):
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -16,7 +16,7 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -26,6 +26,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -33,7 +34,6 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
@@ -42,13 +42,19 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(
UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -57,8 +63,8 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
user_response = UserResponseAggregator()
@@ -67,15 +73,11 @@ async def main(room_url: str, token):
vision_aggregator = VisionImageFrameAggregator()
openai = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o"
)
openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
@transport.event_handler("on_first_participant_joined")
@@ -85,15 +87,17 @@ async def main(room_url: str, token):
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output()
])
pipeline = Pipeline(
[
transport.input(),
user_response,
image_requester,
vision_aggregator,
openai,
tts,
transport.output(),
]
)
task = PipelineTask(pipeline)
@@ -101,6 +105,6 @@ async def main(room_url: str, token):
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -16,7 +16,7 @@ from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.user_response import UserResponseAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -26,6 +26,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -33,7 +34,6 @@ logger.add(sys.stderr, level="DEBUG")
class UserImageRequester(FrameProcessor):
def __init__(self, participant_id: str | None = None):
super().__init__()
self._participant_id = participant_id
@@ -42,13 +42,19 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(
UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM
)
await self.push_frame(frame, direction)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -57,8 +63,8 @@ async def main(room_url: str, token):
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
user_response = UserResponseAggregator()
@@ -67,15 +73,12 @@ async def main(room_url: str, token):
vision_aggregator = VisionImageFrameAggregator()
anthropic = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-sonnet-20240229"
)
anthropic = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
sample_rate=16000,
)
@transport.event_handler("on_first_participant_joined")
@@ -85,15 +88,17 @@ async def main(room_url: str, token):
transport.capture_participant_transcription(participant["id"])
image_requester.set_participant_id(participant["id"])
pipeline = Pipeline([
transport.input(),
user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output()
])
pipeline = Pipeline(
[
transport.input(),
user_response,
image_requester,
vision_aggregator,
anthropic,
tts,
transport.output(),
]
)
task = PipelineTask(pipeline)
@@ -101,6 +106,6 @@ async def main(room_url: str, token):
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import sys
@@ -20,6 +21,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -27,29 +29,33 @@ logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
stt = WhisperSTTService()
transport = DailyTransport(
room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
)
tl = TranscriptionLogger()
stt = WhisperSTTService()
pipeline = Pipeline([transport.input(), stt, tl])
tl = TranscriptionLogger()
task = PipelineTask(pipeline)
pipeline = Pipeline([transport.input(), stt, tl])
runner = PipelineRunner()
task = PipelineTask(pipeline)
await runner.run(task)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -16,11 +16,10 @@ from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -28,13 +27,14 @@ logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
async def main():
transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))
stt = WhisperSTTService()
@@ -51,5 +51,4 @@ async def main(room_url: str):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))
asyncio.run(main())

View File

@@ -0,0 +1,62 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url, None, "Transcription bot", DailyParams(audio_in_enabled=True)
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,134 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.logger import FrameLogger
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def start_fetch_weather(function_name, llm, context):
await llm.push_frame(TextFrame("Let me check on that."))
async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
await result_callback({"conditions": "nice", "temperature": "75"})
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
# Register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function(None, fetch_weather_from_api, start_callback=start_fetch_weather)
fl_in = FrameLogger("Inner")
fl_out = FrameLogger("Outer")
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
},
)
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
fl_in,
transport.input(),
context_aggregator.user(),
llm,
fl_out,
tts,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await tts.say("Hi! Ask me about the weather in San Francisco.")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,160 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_voice = "News Lady"
async def switch_voice(function_name, tool_call_id, args, llm, context, result_callback):
global current_voice
current_voice = args["voice"]
await result_callback(
{
"voice": f"You are now using your {current_voice} voice. Your responses should now be as if you were a {current_voice}."
}
)
async def news_lady_filter(frame) -> bool:
return current_voice == "News Lady"
async def british_lady_filter(frame) -> bool:
return current_voice == "British Lady"
async def barbershop_man_filter(frame) -> bool:
return current_voice == "Barbershop Man"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="bf991597-6c13-47e4-8411-91ec2de5c466", # Newslady
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="a0e99841-438c-4a64-b679-ae501e7d6091", # Barbershop Man
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("switch_voice", switch_voice)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_voice",
"description": "Switch your voice only when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"voice": {
"type": "string",
"description": "The voice the user wants you to use",
},
},
"required": ["voice"],
},
},
)
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (one of the following vocies)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[FunctionFilter(british_lady_filter), british_lady], # British Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
),
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the voices you can do. Your initial responses should be as if you were a {current_voice}.",
}
)
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,151 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import Model, WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_language = "English"
async def switch_language(function_name, tool_call_id, args, llm, context, result_callback):
global current_language
current_language = args["language"]
await result_callback({"voice": f"Your answers from now on should be in {current_language}."})
async def english_filter(frame) -> bool:
return current_language == "English"
async def spanish_filter(frame) -> bool:
return current_language == "Spanish"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
),
)
stt = WhisperSTTService(model=Model.LARGE)
english_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
spanish_tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="846d6cb0-2301-48b6-9683-48f5618ea2f6", # Spanish-speaking Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm.register_function("switch_language", switch_language)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_language",
"description": "Switch to another language when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "The language the user wants you to speak",
},
},
"required": ["language"],
},
},
)
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can speak the following languages: 'English' and 'Spanish'.",
},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
context_aggregator.user(), # User responses
llm, # LLM
ParallelPipeline( # TTS (bot will speak the chosen language)
[FunctionFilter(english_filter), english_tts], # English
[FunctionFilter(spanish_filter), spanish_tts], # Spanish
),
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the languages you speak. Your initial responses should be in {current_language}.",
}
)
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,142 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import (
DailyParams,
DailyTransport,
DailyTransportMessageFrame,
)
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-asteria-en",
base_url="http://0.0.0.0:8080/v1/speak",
)
llm = OpenAILLMService(
# To use OpenAI
# api_key=os.getenv("OPENAI_API_KEY"),
# model="gpt-4o"
# Or, to use a local vLLM (or similar) api server
model="meta-llama/Meta-Llama-3-8B-Instruct",
base_url="http://0.0.0.0:8000/v1",
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline(
[
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
# When a participant joins, start transcription for that participant so the
# bot can "hear" and respond to them.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# When the first participant joins, the bot should introduce itself.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
# Handle "latency-ping" messages. The client will send app messages that look like
# this:
# { "latency-ping": { ts: <client-side timestamp> }}
#
# We want to send an immediate pong back to the client from this handler function.
# Also, we will push a frame into the top of the pipeline and send it after the
#
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
try:
if "latency-ping" in message:
logger.debug(f"Received latency ping app message: {message}")
ts = message["latency-ping"]["ts"]
# Send immediately
transport.output().send_message(
DailyTransportMessageFrame(
message={"latency-pong-msg-handler": {"ts": ts}}, participant_id=sender
)
)
# And push to the pipeline for the Daily transport.output to send
await tma_in.push_frame(
DailyTransportMessageFrame(
message={"latency-pong-pipeline-delivery": {"ts": ts}},
participant_id=sender,
)
)
except Exception as e:
logger.debug(f"message handling error: {e} - {message}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,116 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.processors.user_idle_processor import UserIdleProcessor
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
async def user_idle_callback(user_idle: UserIdleProcessor):
messages.append(
{
"role": "system",
"content": "Ask the user if they are still there and try to prompt for some input, but be short.",
}
)
await user_idle.push_frame(LLMMessagesFrame(messages))
user_idle = UserIdleProcessor(callback=user_idle_callback, timeout=5.0)
pipeline = Pipeline(
[
transport.input(), # Transport user input
user_idle, # Idle user check-in
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
report_only_initial_ttfb=True,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,76 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import argparse
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.gstreamer.pipeline_source import GStreamerPipelineSource
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure_with_args
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument("-i", "--input", type=str, required=True, help="Input video file")
(room_url, _, args) = await configure_with_args(session, parser)
transport = DailyTransport(
room_url,
None,
"GStreamer",
DailyParams(
audio_out_enabled=True,
audio_out_is_live=True,
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720,
camera_out_is_live=True,
),
)
gst = GStreamerPipelineSource(
pipeline=f"filesrc location={args.input}",
out_params=GStreamerPipelineSource.OutputParams(
video_width=1280,
video_height=720,
audio_sample_rate=16000,
audio_channels=1,
),
)
pipeline = Pipeline(
[
gst, # GStreamer file source
transport.output(), # Transport bot output
]
)
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,67 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.gstreamer.pipeline_source import GStreamerPipelineSource
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"GStreamer",
DailyParams(
camera_out_enabled=True,
camera_out_width=1280,
camera_out_height=720,
camera_out_is_live=True,
),
)
gst = GStreamerPipelineSource(
pipeline='videotestsrc ! capsfilter caps="video/x-raw,width=1280,height=720,framerate=30/1"',
out_params=GStreamerPipelineSource.OutputParams(
video_width=1280, video_height=720, clock_sync=False
),
)
pipeline = Pipeline(
[
gst, # GStreamer file source
transport.output(), # Transport bot output
]
)
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,118 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
location = arguments["location"]
await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-3-5-sonnet-20240620"
)
llm.register_function("get_weather", get_weather)
tools = [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
}
]
# todo: test with very short initial user message
# messages = [{"role": "system",
# "content": "You are a helpful assistant who can report the weather in any location in the universe. Respond concisely. Your response will be turned into speech so use only simple words and punctuation."},
# {"role": "user",
# "content": " Start the conversation by introducing yourself."}]
messages = [{"role": "user", "content": "Say 'hello' to start the conversation."}]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User spoken responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses and tool context
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,173 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
video_participant_id = None
async def get_weather(function_name, tool_call_id, arguments, llm, context, result_callback):
location = arguments["location"]
await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
async def get_image(function_name, tool_call_id, arguments, llm, context, result_callback):
question = arguments["question"]
await llm.request_image_frame(user_id=video_participant_id, text_content=question)
async def main():
global llm
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = AnthropicLLMService(
api_key=os.getenv("ANTHROPIC_API_KEY"),
model="claude-3-5-sonnet-20240620",
enable_prompt_caching_beta=True,
)
llm.register_function("get_weather", get_weather)
llm.register_function("get_image", get_image)
tools = [
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
},
{
"name": "get_image",
"description": "Get an image from the video stream.",
"input_schema": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "The question that the user is asking about the image.",
}
},
"required": ["question"],
},
},
]
# todo: test with very short initial user message
system_prompt = """\
You are a helpful assistant who converses with a user and answers questions. Respond concisely to general questions.
Your response will be turned into speech so use only simple words and punctuation.
You have access to two tools: get_weather and get_image.
You can respond to questions about the weather using the get_weather tool.
You can answer questions about the user's video stream using the get_image tool. Some examples of phrases that \
indicate you should use the get_image tool are:
- What do you see?
- What's in the video?
- Can you describe the video?
- Tell me about what you see.
- Tell me something interesting about what you see.
- What's happening in the video?
If you need to use a tool, simply use the tool. Do not tell the user the tool you are using. Be brief and concise.
"""
messages = [
{
"role": "system",
"content": [
{
"type": "text",
"text": system_prompt,
}
],
},
{"role": "user", "content": "Start the conversation by introducing yourself."},
]
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User speech to text
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses and tool context
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
global video_participant_id
video_participant_id = participant["id"]
transport.capture_participant_transcription(video_participant_id)
transport.capture_participant_video(video_participant_id, framerate=0)
# Kick off the conversation.
await task.queue_frames([context_aggregator.user().get_context_frame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,137 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import json
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.together import TogetherLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def get_current_weather(
function_name, tool_call_id, arguments, llm, context, result_callback
):
logger.debug("IN get_current_weather")
location = arguments["location"]
await result_callback(f"The weather in {location} is currently 72 degrees and sunny.")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = TogetherLLMService(
api_key=os.getenv("TOGETHER_API_KEY"),
model=os.getenv("TOGETHER_MODEL"),
)
llm.register_function("get_current_weather", get_current_weather)
weatherTool = {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
"required": ["location"],
},
}
system_prompt = f"""\
You have access to the following functions:
Use the function '{weatherTool["name"]}' to '{weatherTool["description"]}':
{json.dumps(weatherTool)}
If you choose to call a function ONLY reply in the following format with no prefix or suffix:
<function=example_function_name>{{\"example_name\": \"example_value\"}}</function>
Reminder:
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls
"""
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Wait for the user to say something."},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
pipeline = Pipeline(
[
transport.input(), # Transport user input
context_aggregator.user(), # User speech to text
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses and tool context
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,18 +1,29 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import os
import time
import urllib
import requests
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
def configure():
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
async def configure(aiohttp_session: aiohttp.ClientSession):
(url, token, _) = await configure_with_args(aiohttp_session)
return (url, token)
async def configure_with_args(
aiohttp_session: aiohttp.ClientSession, parser: argparse.ArgumentParser | None = None
):
if not parser:
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
@@ -28,31 +39,24 @@ def configure():
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
expiry_time: float = 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
token = await daily_rest_helper.get_token(url, expiry_time)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
return (url, token)
return (url, token, args)

View File

@@ -1,25 +0,0 @@
syntax = "proto3";
package pipecat_proto;
message TextFrame {
string text = 1;
}
message AudioFrame {
bytes audio = 1;
}
message TranscriptionFrame {
string text = 1;
string participant_id = 2;
string timestamp = 3;
}
message Frame {
oneof frame {
TextFrame text = 1;
AudioFrame audio = 2;
TranscriptionFrame transcription = 3;
}
}

View File

@@ -1,134 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="//cdn.jsdelivr.net/npm/protobufjs@7.X.X/dist/protobuf.min.js"></script>
<title>WebSocket Audio Stream</title>
</head>
<body>
<h1>WebSocket Audio Stream</h1>
<button id="startAudioBtn">Start Audio</button>
<button id="stopAudioBtn">Stop Audio</button>
<script>
const SAMPLE_RATE = 16000;
const BUFFER_SIZE = 8192;
const MIN_AUDIO_SIZE = 6400;
let audioContext;
let microphoneStream;
let scriptProcessor;
let source;
let frame;
let audioChunks = [];
let isPlaying = false;
let ws;
const proto = protobuf.load("frames.proto", (err, root) => {
if (err) throw err;
frame = root.lookupType("pipecat_proto.Frame");
});
function initWebSocket() {
ws = new WebSocket('ws://localhost:8765');
ws.addEventListener('open', () => console.log('WebSocket connection established.'));
ws.addEventListener('message', handleWebSocketMessage);
ws.addEventListener('close', (event) => console.log("WebSocket connection closed.", event.code, event.reason));
ws.addEventListener('error', (event) => console.error('WebSocket error:', event));
}
async function handleWebSocketMessage(event) {
const arrayBuffer = await event.data.arrayBuffer();
enqueueAudioFromProto(arrayBuffer);
}
function enqueueAudioFromProto(arrayBuffer) {
const parsedFrame = frame.decode(new Uint8Array(arrayBuffer));
if (!parsedFrame?.audio) return false;
const frameCount = parsedFrame.audio.data.length / 2;
const audioOutBuffer = audioContext.createBuffer(1, frameCount, SAMPLE_RATE);
const nowBuffering = audioOutBuffer.getChannelData(0);
const view = new Int16Array(parsedFrame.audio.data.buffer);
for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
}
audioChunks.push(audioOutBuffer);
if (!isPlaying) playNextChunk();
}
function playNextChunk() {
if (audioChunks.length === 0) {
isPlaying = false;
return;
}
isPlaying = true;
const audioOutBuffer = audioChunks.shift();
const source = audioContext.createBufferSource();
source.buffer = audioOutBuffer;
source.connect(audioContext.destination);
source.onended = playNextChunk;
source.start();
}
function startAudio() {
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
alert('getUserMedia is not supported in your browser.');
return;
}
navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
microphoneStream = stream;
audioContext = new (window.AudioContext || window.webkitAudioContext)();
scriptProcessor = audioContext.createScriptProcessor(BUFFER_SIZE, 1, 1);
source = audioContext.createMediaStreamSource(stream);
source.connect(scriptProcessor);
scriptProcessor.connect(audioContext.destination);
const audioBuffer = [];
const skipRatio = Math.floor(audioContext.sampleRate / (SAMPLE_RATE * 2));
scriptProcessor.onaudioprocess = (event) => {
const rawLeftChannelData = event.inputBuffer.getChannelData(0);
for (let i = 0; i < rawLeftChannelData.length; i += skipRatio) {
const normalized = ((rawLeftChannelData[i] * 32768.0) + 32768) % 65536 - 32768;
const swappedBytes = ((normalized & 0xff) << 8) | ((normalized >> 8) & 0xff);
audioBuffer.push(swappedBytes);
}
if (audioBuffer.length >= MIN_AUDIO_SIZE) {
const audioFrame = frame.create({ audio: { audio: audioBuffer.slice(0, MIN_AUDIO_SIZE) } });
const encodedFrame = new Uint8Array(frame.encode(audioFrame).finish());
ws.send(encodedFrame);
audioBuffer.splice(0, MIN_AUDIO_SIZE);
}
};
initWebSocket();
})
.catch((error) => console.error('Error accessing microphone:', error));
}
function stopAudio() {
if (ws) {
ws.close();
scriptProcessor.disconnect();
source.disconnect();
ws = undefined;
}
}
document.getElementById('startAudioBtn').addEventListener('click', startAudio);
document.getElementById('stopAudioBtn').addEventListener('click', stopAudio);
</script>
</body>
</html>

View File

@@ -1,50 +0,0 @@
import asyncio
import aiohttp
import logging
import os
from pipecat.pipeline.frame_processor import FrameProcessor
from pipecat.pipeline.frames import TextFrame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.elevenlabs_ai_services import ElevenLabsTTSService
from pipecat.transports.websocket_transport import WebsocketTransport
from pipecat.services.whisper_ai_services import WhisperSTTService
logging.basicConfig(format="%(levelno)s %(asctime)s %(message)s")
logger = logging.getLogger("pipecat")
logger.setLevel(logging.DEBUG)
class WhisperTranscriber(FrameProcessor):
async def process_frame(self, frame):
if isinstance(frame, TranscriptionFrame):
print(f"Transcribed: {frame.text}")
else:
yield frame
async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketTransport(
mic_enabled=True,
speaker_enabled=True,
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
pipeline = Pipeline([
WhisperSTTService(),
WhisperTranscriber(),
tts,
])
@transport.on_connection
async def queue_frame():
await pipeline.queue_frames([TextFrame("Hello there!")])
await transport.run(pipeline)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,3 +1,9 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
@@ -7,10 +13,11 @@ from PIL import Image
from pipecat.frames.frames import (
ImageRawFrame,
OutputImageRawFrame,
SpriteFrame,
Frame,
LLMMessagesFrame,
AudioRawFrame,
TTSAudioRawFrame,
TTSStoppedFrame,
TextFrame,
UserImageRawFrame,
@@ -25,7 +32,7 @@ from pipecat.processors.aggregators.llm_response import LLMUserResponseAggregato
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.vision_image_frame import VisionImageFrameAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.moondream import MoondreamService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
@@ -36,6 +43,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -53,7 +61,7 @@ for i in range(1, 26):
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
flipped = sprites[::-1]
sprites.extend(flipped)
@@ -74,7 +82,9 @@ class TalkingAnimation(FrameProcessor):
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, AudioRawFrame):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
@@ -93,9 +103,13 @@ class UserImageRequester(FrameProcessor):
self.participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self.participant_id and isinstance(frame, TextFrame):
if frame.text == user_request_answer:
await self.push_frame(UserImageRequestFrame(self.participant_id), FrameDirection.UPSTREAM)
await self.push_frame(
UserImageRequestFrame(self.participant_id), FrameDirection.UPSTREAM
)
await self.push_frame(TextFrame("Describe the image in a short sentence."))
elif isinstance(frame, UserImageRawFrame):
await self.push_frame(frame)
@@ -107,6 +121,8 @@ class TextFilterProcessor(FrameProcessor):
self.text = text
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
if frame.text != self.text:
await self.push_frame(frame)
@@ -116,12 +132,16 @@ class TextFilterProcessor(FrameProcessor):
class ImageFilterProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if not isinstance(frame, ImageRawFrame):
await self.push_frame(frame)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -133,19 +153,16 @@ async def main(room_url: str, token):
camera_out_height=576,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
vad_analyzer=SileroVADAnalyzer(),
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="pNInz6obpgDQGcFmaJgB",
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
ta = TalkingAnimation()
@@ -168,17 +185,17 @@ async def main(room_url: str, token):
ura = LLMUserResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
ura,
llm,
ParallelPipeline(
[sa, ir, va, moondream],
[tf, imgf]),
tts,
ta,
transport.output()
])
pipeline = Pipeline(
[
transport.input(),
ura,
llm,
ParallelPipeline([sa, ir, va, moondream], [tf, imgf]),
tts,
ta,
transport.output(),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(quiet_frame)
@@ -196,5 +213,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -1,5 +1,4 @@
python-dotenv
requests
fastapi[all]
uvicorn
pipecat-ai[daily,moondream,openai,silero]
pipecat-ai[daily,cartesia,moondream,openai,silero]

View File

@@ -1,18 +1,21 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import os
import time
import urllib
import requests
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
def configure():
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
@@ -28,31 +31,24 @@ def configure():
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
expiry_time: float = 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -1,31 +1,52 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import os
import argparse
import subprocess
import atexit
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from utils.daily_helpers import create_room as _create_room, get_token
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
daily_helpers = {}
def cleanup():
# Clean up function, just to be extra safe
for proc in bot_procs.values():
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
atexit.register(cleanup)
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
app = FastAPI()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
@@ -39,45 +60,42 @@ app.add_middleware(
@app.get("/start")
async def start_agent(request: Request):
print(f"!!! Creating room")
room_url, room_name = _create_room()
print(f"!!! Room URL: {room_url}")
room = await daily_helpers["rest"].create_room(DailyRoomParams())
print(f"!!! Room URL: {room.url}")
# Ensure the room property is present
if not room_url:
if not room.url:
raise HTTPException(
status_code=500,
detail="Missing 'room' property in request data. Cannot start agent without a target room!")
detail="Missing 'room' property in request data. Cannot start agent without a target room!",
)
# Check if there is already an existing process running in this room
num_bots_in_room = sum(
1 for proc in bot_procs.values() if proc[1] == room_url and proc[0].poll() is None)
1 for proc in bot_procs.values() if proc[1] == room.url and proc[0].poll() is None
)
if num_bots_in_room >= MAX_BOTS_PER_ROOM:
raise HTTPException(
status_code=500, detail=f"Max bot limited reach for room: {room_url}")
raise HTTPException(status_code=500, detail=f"Max bot limited reach for room: {room.url}")
# Get the token for the room
token = get_token(room_url)
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(
status_code=500, detail=f"Failed to get token for room: {room_url}")
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
try:
proc = subprocess.Popen(
[
f"python3 -m bot -u {room_url} -t {token}"
],
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__))
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room_url)
bot_procs[proc.pid] = (proc, room.url)
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to start subprocess: {e}")
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return RedirectResponse(room_url)
return RedirectResponse(room.url)
@app.get("/status/{pid}")
@@ -87,8 +105,7 @@ def get_status(pid: int):
# If the subprocess doesn't exist, return an error
if not proc:
raise HTTPException(
status_code=404, detail=f"Bot with process id: {pid} not found")
raise HTTPException(status_code=404, detail=f"Bot with process id: {pid} not found")
# Check the status of the subprocess
if proc[0].poll() is None:
@@ -105,14 +122,10 @@ if __name__ == "__main__":
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(
description="Daily Moondream FastAPI server")
parser.add_argument("--host", type=str,
default=default_host, help="Host address")
parser.add_argument("--port", type=int,
default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true",
help="Reload code on change")
parser = argparse.ArgumentParser(description="Daily Moondream FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()

View File

@@ -1,109 +0,0 @@
import urllib.parse
import os
import time
import urllib
import requests
from dotenv import load_dotenv
load_dotenv()
daily_api_path = os.getenv("DAILY_API_URL") or "api.daily.co/v1"
daily_api_key = os.getenv("DAILY_API_KEY")
def create_room() -> tuple[str, str]:
"""
Helper function to create a Daily room.
# See: https://docs.daily.co/reference/rest-api/rooms
Returns:
tuple: A tuple containing the room URL and room name.
Raises:
Exception: If the request to create the room fails or if the response does not contain the room URL or room name.
"""
room_props = {
"exp": time.time() + 60 * 60, # 1 hour
"enable_chat": True,
"enable_emoji_reactions": True,
"eject_at_room_exp": True,
"enable_prejoin_ui": False, # Important for the bot to be able to join headlessly
}
res = requests.post(
f"https://{daily_api_path}/rooms",
headers={"Authorization": f"Bearer {daily_api_key}"},
json={
"properties": room_props
},
)
if res.status_code != 200:
raise Exception(f"Unable to create room: {res.text}")
data = res.json()
room_url: str = data.get("url")
room_name: str = data.get("name")
if room_url is None or room_name is None:
raise Exception("Missing room URL or room name in response")
return room_url, room_name
def get_name_from_url(room_url: str) -> str:
"""
Extracts the name from a given room URL.
Args:
room_url (str): The URL of the room.
Returns:
str: The extracted name from the room URL.
"""
return urllib.parse.urlparse(room_url).path[1:]
def get_token(room_url: str) -> str:
"""
Retrieves a meeting token for the specified Daily room URL.
# See: https://docs.daily.co/reference/rest-api/meeting-tokens
Args:
room_url (str): The URL of the Daily room.
Returns:
str: The meeting token.
Raises:
Exception: If no room URL is specified or if no Daily API key is specified.
Exception: If there is an error creating the meeting token.
"""
if not room_url:
raise Exception(
"No Daily room specified. You must specify a Daily room in order a token to be generated.")
if not daily_api_key:
raise Exception(
"No Daily API key specified. set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
expiration: float = time.time() + 60 * 60
room_name = get_name_from_url(room_url)
res: requests.Response = requests.post(
f"https://{daily_api_path}/meeting-tokens",
headers={
"Authorization": f"Bearer {daily_api_key}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True, # Owner tokens required for transcription
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
return token

View File

@@ -0,0 +1,16 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
copy assets/* /app/assets/
copy utils/* /app/utils/
WORKDIR /app
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "server.py"]

View File

@@ -0,0 +1,37 @@
# Simple Chatbot
<img src="image.png" width="420px">
This app connects you to a chatbot powered by GPT-4, complete with animations generated by Stable Video Diffusion.
See a video of it in action: https://x.com/kwindla/status/1778628911817183509
And a quick video walkthrough of the code: https://www.loom.com/share/13df1967161f4d24ade054e7f8753416
The first time, things might take extra time to get started since VAD (Voice Activity Detection) model needs to be downloaded.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/start` in your browser to start a chatbot session.
## Build and test the Docker image
```
docker build -t chatbot .
docker run --env-file .env -p 7860:7860 chatbot
```

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1,365 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import wave
from pipecat.frames.frames import OutputAudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.logger import FrameLogger
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame, OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
sounds = {}
sound_files = [
"clack-short.wav",
"clack.wav",
"clack-short-quiet.wav",
"ding.wav",
"ding2.wav",
]
script_dir = os.path.dirname(__file__)
for file in sound_files:
# Build the full path to the sound file
full_path = os.path.join(script_dir, "assets", file)
# Get the filename without the extension to use as the dictionary key
filename = os.path.splitext(os.path.basename(full_path))[0]
# Open the sound and convert it to bytes
with wave.open(full_path) as audio_file:
sounds[file] = OutputAudioRawFrame(
audio_file.readframes(-1), audio_file.getframerate(), audio_file.getnchannels()
)
class IntakeProcessor:
def __init__(self, context: OpenAILLMContext):
print(f"Initializing context from IntakeProcessor")
context.add_message(
{
"role": "system",
"content": "You are Jessica, an agent for a company called Tri-County Health Services. Your job is to collect important information from the user before their doctor visit. You're talking to Chad Bailey. You should address the user by their first name and be polite and professional. You're not a medical professional, so you shouldn't provide any advice. Keep your responses short. Your job is to collect information to give to a doctor. Don't make assumptions about what values to plug into functions. Ask for clarification if a user response is ambiguous. Start by introducing yourself. Then, ask the user to confirm their identity by telling you their birthday, including the year. When they answer with their birthday, call the verify_birthday function.",
}
)
context.set_tools(
[
{
"type": "function",
"function": {
"name": "verify_birthday",
"description": "Use this function to verify the user has provided their correct birthday.",
"parameters": {
"type": "object",
"properties": {
"birthday": {
"type": "string",
"description": "The user's birthdate, including the year. The user can provide it in any format, but convert it to YYYY-MM-DD format to call this function.",
}
},
},
},
}
]
)
async def verify_birthday(
self, function_name, tool_call_id, args, llm, context, result_callback
):
if args["birthday"] == "1983-01-01":
context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_prescriptions",
"description": "Once the user has provided a list of their prescription medications, call this function.",
"parameters": {
"type": "object",
"properties": {
"prescriptions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"medication": {
"type": "string",
"description": "The medication's name",
},
"dosage": {
"type": "string",
"description": "The prescription's dosage",
},
},
},
}
},
},
},
}
]
)
# It's a bit weird to push this to the LLM, but it gets it into the pipeline
# await llm.push_frame(sounds["ding2.wav"], FrameDirection.DOWNSTREAM)
# We don't need the function call in the context, so just return a new
# system message and let the framework re-prompt
await result_callback(
[
{
"role": "system",
"content": "Next, thank the user for confirming their identity, then ask the user to list their current prescriptions. Each prescription needs to have a medication name and a dosage. Do not call the list_prescriptions function with any unknown dosages.",
}
]
)
else:
# The user provided an incorrect birthday; ask them to try again
await result_callback(
[
{
"role": "system",
"content": "The user provided an incorrect birthday. Ask them for their birthday again. When they answer, call the verify_birthday function.",
}
]
)
async def start_prescriptions(self, function_name, llm, context):
print(f"!!! doing start prescriptions")
# Move on to allergies
context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_allergies",
"description": "Once the user has provided a list of their allergies, call this function.",
"parameters": {
"type": "object",
"properties": {
"allergies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "What the user is allergic to",
}
},
},
}
},
},
},
}
]
)
context.add_message(
{
"role": "system",
"content": "Next, ask the user if they have any allergies. Once they have listed their allergies or confirmed they don't have any, call the list_allergies function.",
}
)
print(f"!!! about to await llm process frame in start prescrpitions")
await llm.process_frame(OpenAILLMContextFrame(context), FrameDirection.DOWNSTREAM)
print(f"!!! past await process frame in start prescriptions")
async def start_allergies(self, function_name, llm, context):
print("!!! doing start allergies")
# Move on to conditions
context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_conditions",
"description": "Once the user has provided a list of their medical conditions, call this function.",
"parameters": {
"type": "object",
"properties": {
"conditions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's medical condition",
}
},
},
}
},
},
},
},
]
)
context.add_message(
{
"role": "system",
"content": "Now ask the user if they have any medical conditions the doctor should know about. Once they've answered the question, call the list_conditions function.",
}
)
await llm.process_frame(OpenAILLMContextFrame(context), FrameDirection.DOWNSTREAM)
async def start_conditions(self, function_name, llm, context):
print("!!! doing start conditions")
# Move on to visit reasons
context.set_tools(
[
{
"type": "function",
"function": {
"name": "list_visit_reasons",
"description": "Once the user has provided a list of the reasons they are visiting a doctor today, call this function.",
"parameters": {
"type": "object",
"properties": {
"visit_reasons": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The user's reason for visiting the doctor",
}
},
},
}
},
},
},
}
]
)
context.add_message(
{
"role": "system",
"content": "Finally, ask the user the reason for their doctor visit today. Once they answer, call the list_visit_reasons function.",
}
)
await llm.process_frame(OpenAILLMContextFrame(context), FrameDirection.DOWNSTREAM)
async def start_visit_reasons(self, function_name, llm, context):
print("!!! doing start visit reasons")
# move to finish call
context.set_tools([])
context.add_message(
{"role": "system", "content": "Now, thank the user and end the conversation."}
)
await llm.process_frame(OpenAILLMContextFrame(context), FrameDirection.DOWNSTREAM)
async def save_data(self, function_name, tool_call_id, args, llm, context, result_callback):
logger.info(f"!!! Saving data: {args}")
# Since this is supposed to be "async", returning None from the callback
# will prevent adding anything to context or re-prompting
await result_callback(None)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22", # British Lady
)
# tts = CartesiaTTSService(
# api_key=os.getenv("CARTESIA_API_KEY"),
# voice_id="846d6cb0-2301-48b6-9683-48f5618ea2f6", # Spanish-speaking Lady
# )
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = []
context = OpenAILLMContext(messages=messages)
context_aggregator = llm.create_context_aggregator(context)
intake = IntakeProcessor(context)
llm.register_function("verify_birthday", intake.verify_birthday)
llm.register_function(
"list_prescriptions", intake.save_data, start_callback=intake.start_prescriptions
)
llm.register_function(
"list_allergies", intake.save_data, start_callback=intake.start_allergies
)
llm.register_function(
"list_conditions", intake.save_data, start_callback=intake.start_conditions
)
llm.register_function(
"list_visit_reasons", intake.save_data, start_callback=intake.start_visit_reasons
)
fl = FrameLogger("LLM Output")
pipeline = Pipeline(
[
transport.input(), # Transport input
context_aggregator.user(), # User responses
llm, # LLM
fl, # Frame logger
tts, # TTS
transport.output(), # Transport output
context_aggregator.assistant(), # Assistant responses
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=False))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
print(f"Context is: {context}")
await task.queue_frames([OpenAILLMContextFrame(context)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,4 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...

Binary file not shown.

After

Width:  |  Height:  |  Size: 733 KiB

View File

@@ -0,0 +1,4 @@
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,cartesia,openai,silero]

View File

@@ -0,0 +1,54 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import os
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -0,0 +1,137 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import os
import argparse
import subprocess
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, RedirectResponse
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams
MAX_BOTS_PER_ROOM = 1
# Bot sub-process dict for status reporting and concurrency control
bot_procs = {}
daily_helpers = {}
def cleanup():
# Clean up function, just to be extra safe
for entry in bot_procs.values():
proc = entry[0]
proc.terminate()
proc.wait()
@asynccontextmanager
async def lifespan(app: FastAPI):
aiohttp_session = aiohttp.ClientSession()
daily_helpers["rest"] = DailyRESTHelper(
daily_api_key=os.getenv("DAILY_API_KEY", ""),
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
yield
await aiohttp_session.close()
cleanup()
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/start")
async def start_agent(request: Request):
print(f"!!! Creating room")
room = await daily_helpers["rest"].create_room(DailyRoomParams())
print(f"!!! Room URL: {room.url}")
# Ensure the room property is present
if not room.url:
raise HTTPException(
status_code=500,
detail="Missing 'room' property in request data. Cannot start agent without a target room!",
)
# Check if there is already an existing process running in this room
num_bots_in_room = sum(
1 for proc in bot_procs.values() if proc[1] == room.url and proc[0].poll() is None
)
if num_bots_in_room >= MAX_BOTS_PER_ROOM:
raise HTTPException(status_code=500, detail=f"Max bot limited reach for room: {room.url}")
# Get the token for the room
token = await daily_helpers["rest"].get_token(room.url)
if not token:
raise HTTPException(status_code=500, detail=f"Failed to get token for room: {room.url}")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in README)
try:
proc = subprocess.Popen(
[f"python3 -m bot -u {room.url} -t {token}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),
)
bot_procs[proc.pid] = (proc, room.url)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Failed to start subprocess: {e}")
return RedirectResponse(room.url)
@app.get("/status/{pid}")
def get_status(pid: int):
# Look up the subprocess
proc = bot_procs.get(pid)
# If the subprocess doesn't exist, return an error
if not proc:
raise HTTPException(status_code=404, detail=f"Bot with process id: {pid} not found")
# Check the status of the subprocess
if proc[0].poll() is None:
status = "running"
else:
status = "finished"
return JSONResponse({"bot_id": pid, "status": status})
if __name__ == "__main__":
import uvicorn
default_host = os.getenv("HOST", "0.0.0.0")
default_port = int(os.getenv("FAST_API_PORT", "7860"))
parser = argparse.ArgumentParser(description="Daily Storyteller FastAPI server")
parser.add_argument("--host", type=str, default=default_host, help="Host address")
parser.add_argument("--port", type=int, default=default_port, help="Port number")
parser.add_argument("--reload", action="store_true", help="Reload code on change")
config = parser.parse_args()
print(f"to join a test room, visit http://localhost:{config.port}/start")
uvicorn.run(
"server:app",
host=config.host,
port=config.port,
reload=config.reload,
)

View File

@@ -1,3 +1,9 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
@@ -8,19 +14,22 @@ from PIL import Image
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.frames.frames import (
AudioRawFrame,
ImageRawFrame,
OutputImageRawFrame,
SpriteFrame,
Frame,
LLMMessagesFrame,
TTSStoppedFrame
TTSAudioRawFrame,
TTSStoppedFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTranscriptionSettings, DailyTransport
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
@@ -28,6 +37,7 @@ from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
@@ -43,7 +53,7 @@ for i in range(1, 26):
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
flipped = sprites[::-1]
sprites.extend(flipped)
@@ -64,7 +74,9 @@ class TalkingAnimation(FrameProcessor):
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
if isinstance(frame, AudioRawFrame):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
@@ -75,8 +87,10 @@ class TalkingAnimation(FrameProcessor):
await self.push_frame(frame)
async def main(room_url: str, token):
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
@@ -97,17 +111,15 @@ async def main(room_url: str, token):
# tier="nova",
# model="2-general"
# )
)
),
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="pNInz6obpgDQGcFmaJgB",
#
# Spanish
#
@@ -115,9 +127,7 @@ async def main(room_url: str, token):
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4-turbo-preview")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
@@ -126,7 +136,6 @@ async def main(room_url: str, token):
# English
#
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
#
# Spanish
#
@@ -139,15 +148,17 @@ async def main(room_url: str, token):
ta = TalkingAnimation()
pipeline = Pipeline([
transport.input(),
user_response,
llm,
tts,
ta,
transport.output(),
assistant_response,
])
pipeline = Pipeline(
[
transport.input(),
user_response,
llm,
tts,
ta,
transport.output(),
assistant_response,
]
)
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
await task.queue_frame(quiet_frame)
@@ -163,5 +174,4 @@ async def main(room_url: str, token):
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))
asyncio.run(main())

View File

@@ -1,5 +1,4 @@
python-dotenv
requests
fastapi[all]
uvicorn
pipecat-ai[daily,openai,silero]
pipecat-ai[daily,elevenlabs,openai,silero]

View File

@@ -1,18 +1,21 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import argparse
import os
import time
import urllib
import requests
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
def configure():
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u",
"--url",
type=str,
required=False,
help="URL of the Daily room to join")
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
@@ -28,31 +31,24 @@ def configure():
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL.")
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception("No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers.")
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
room_name: str = urllib.parse.urlparse(url).path[1:]
expiration: float = time.time() + 60 * 60
expiry_time: float = 60 * 60
res: requests.Response = requests.post(
f"https://api.daily.co/v1/meeting-tokens",
headers={
"Authorization": f"Bearer {key}"},
json={
"properties": {
"room_name": room_name,
"is_owner": True,
"exp": expiration}},
)
if res.status_code != 200:
raise Exception(
f"Failed to create meeting token: {res.status_code} {res.text}")
token: str = res.json()["token"]
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

Some files were not shown because too many files have changed in this diff Show More