Compare commits

...

127 Commits

Author SHA1 Message Date
Jon Taylor
5bd5d22270 removed space from event handler 2024-06-26 18:30:56 +01:00
Jon Taylor
6ee7932337 added pause to start and new intro prompt 2024-06-26 18:24:14 +01:00
Jon Taylor
c407445dd1 removed header comment from bot runner 2024-06-24 17:35:26 +01:00
Jon Taylor
447f37167e added VAD stop seconds env 2024-06-24 17:34:25 +01:00
Jon Taylor
354c21500e prompt tweaks 2024-06-24 17:28:10 +01:00
Jon Taylor
5728e25b5a added fastbot example 2024-06-24 16:25:36 +01:00
Kwindla Hultman Kramer
0b6a19802f Merge pull request #250 from pipecat-ai/lewis/flush-tts-on-llm-response-end
Flush output from TTSService on LLMFullResponseEndFrame
2024-06-22 20:37:45 -04:00
Lewis Wolfgang
c4a2d2197c Flush output from TTSService on LLMFullResponseEndFrame
To cover cases when the LLM response does not end in punctuation.
2024-06-22 14:57:44 -04:00
Aleix Conchillo Flaqué
269d06aa15 Merge pull request #249 from pipecat-ai/aleix/pipecat-0.0.32
update CHANGELOG.md for 0.0.32
2024-06-22 09:21:21 -07:00
Aleix Conchillo Flaqué
dfef1f2c54 update CHANGELOG.md for 0.0.32 2024-06-22 09:19:22 -07:00
Aleix Conchillo Flaqué
b62beaba0b Merge pull request #248 from pipecat-ai/aleix/deepgramstt-url
services(deepgram): add url to DeepgramSTTService
2024-06-21 22:26:23 -07:00
Aleix Conchillo Flaqué
adf414e40f services(deepgram): add url to DeepgramSTTService 2024-06-21 16:52:28 -07:00
Aleix Conchillo Flaqué
dc64e57f63 Merge pull request #241 from pipecat-ai/aleix/transports-async
transports: fully use asyncio in all read/write operations
2024-06-21 16:00:08 -07:00
Aleix Conchillo Flaqué
d3e410b2ac transports: fully use asyncio in all read/write operations 2024-06-21 15:55:15 -07:00
Aleix Conchillo Flaqué
c544b2474b update linux-py3.10-requirements with fastapi and new daily-python 2024-06-21 15:44:01 -07:00
Aleix Conchillo Flaqué
18243de358 add fastapi and update macos-py3.10-requirements.txt 2024-06-21 13:16:47 -07:00
Aleix Conchillo Flaqué
6625895d1f update macos-py3.10-requirements.txt 2024-06-21 13:13:02 -07:00
Aleix Conchillo Flaqué
f9ecce739e Merge pull request #247 from pipecat-ai/aleix/twilio-updates
some twilio updates
2024-06-21 10:14:40 -07:00
Aleix Conchillo Flaqué
0075dd8386 update linux/macos-py3.10-requirements.txt 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
eef1cde816 updated CHANGELOG.md with fastapi and twilio updates 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
8d867c30c6 transports(websocket): verify websockets module 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
42c668b7ae examples(twilio-chatbot): update instructions and renames 2024-06-21 09:48:12 -07:00
Aleix Conchillo Flaqué
b62227b4ae serializers(twilio): formatting and allow str | bytes | None 2024-06-21 09:47:17 -07:00
Aleix Conchillo Flaqué
25ef0cb87b serializers: allow str | bytes | None 2024-06-21 09:42:43 -07:00
Aleix Conchillo Flaqué
e195941aa5 Merge pull request #246 from pipecat-ai/aleix/daily-dialout-answered
transports(daily): added dialout_answered event
2024-06-20 18:37:24 -07:00
Aleix Conchillo Flaqué
e09eef1dd7 Merge pull request #243 from Viking5274/main
Add twilio_websocket_service with example
2024-06-20 14:09:48 -07:00
Aleix Conchillo Flaqué
7c13663a4e transports(daily): added dialout_answered event 2024-06-20 13:01:25 -07:00
daniil5701133
5753869e5e add twilio-chatbot example with README.md info how to start app
created twilio_websocket_service.py, TwilioFrameSerializer.py

moved pcm_16000_to_ulaw_8000 and ulaw_8000_to_pcm_16000 to src/pipecat/utils/audio.py
fixed callback on disconnect
2024-06-20 23:00:01 +03:00
chadbailey59
ba878a19f4 fixed "Dr." interruption (#245) 2024-06-19 20:53:04 -05:00
Aleix Conchillo Flaqué
55a9de78cd Merge pull request #239 from pipecat-ai/aleix/azure-stt
azure stt support
2024-06-14 14:07:07 +08:00
Aleix Conchillo Flaqué
ff51fc9091 updated CHANGELOG and README 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
a4f857ee34 examples: use new AzureSTTService in 07f-interruptible-azure 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
3250d74bef services(azure): new AzureSTTService 2024-06-13 17:03:49 -07:00
Aleix Conchillo Flaqué
c086160239 examples: cleanup some 07 interruptible examples 2024-06-13 16:36:10 -07:00
Aleix Conchillo Flaqué
6cdccaff53 Merge pull request #238 from pipecat-ai/aleix/pipecat-0.0.31
pipecat 0.0.31
2024-06-14 06:31:41 +08:00
Aleix Conchillo Flaqué
a9ab8de25d update CHANGELOG for 0.0.31 2024-06-13 15:31:03 -07:00
Aleix Conchillo Flaqué
2a29cb18a5 transports(base_output): chunk audio into 20ms instead of 10ms 2024-06-13 15:30:41 -07:00
Aleix Conchillo Flaqué
4193a4f415 Merge pull request #237 from pipecat-ai/aleix/pipecat-0.0.30
update CHANGELOG for 0.0.30
2024-06-14 05:28:14 +08:00
Aleix Conchillo Flaqué
0226ec450a update CHANGELOG for 0.0.30 2024-06-13 14:27:37 -07:00
Aleix Conchillo Flaqué
020b8ebb35 Merge pull request #236 from pipecat-ai/aleix/report-only-initial-ttfb
report only initial ttfb
2024-06-14 05:24:52 +08:00
Aleix Conchillo Flaqué
1170b30c1b aggregator(user_response): also handle small VADParams.stop_secs 2024-06-13 13:30:31 -07:00
Aleix Conchillo Flaqué
0004d4a906 vad: reduce smoothing factor and increase confidence 2024-06-13 13:30:11 -07:00
Aleix Conchillo Flaqué
cb27e86266 metrics: allow sending only initial TTFB metrics 2024-06-13 13:30:00 -07:00
Aleix Conchillo Flaqué
77a3b2ea5c Merge pull request #235 from pipecat-ai/aleix/openpipe-refactoring
openpipe refactoring
2024-06-14 01:28:50 +08:00
Aleix Conchillo Flaqué
099e65f3b6 report processor name in error logs 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
befb8db120 update pyproject and requirements 2024-06-13 10:20:45 -07:00
Aleix Conchillo Flaqué
9992d826b1 examples: renamed 06b-listen... to 07h-inte... 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
18604e1a39 re-add removed CHANGELOG lines 2024-06-13 10:18:20 -07:00
Aleix Conchillo Flaqué
312c569182 services(openpipe): refactored so it's based on BaseOpenAILLMService 2024-06-13 09:30:50 -07:00
Aleix Conchillo Flaqué
b43e0ed130 Merge pull request #233 from KwalAI/openpipe-integration
OpenPipe Integration
2024-06-13 22:41:57 +08:00
Aleix Conchillo Flaqué
289debea34 Merge pull request #234 from pipecat-ai/aleix/fix-daily-room-properties-exp
transports(helpers): fix DailyRoomProperties.exp
2024-06-13 22:38:41 +08:00
Aleix Conchillo Flaqué
ccd6af7016 transports(helpers): fix DailyRoomProperties.exp 2024-06-12 23:15:22 -07:00
Ankur Duggal
effc69e4e4 formatting 2024-06-12 15:01:19 -07:00
Ankur Duggal
c7a0d0db64 OpenPipe Integration 2024-06-12 14:23:56 -07:00
Aleix Conchillo Flaqué
50d69a1ca4 Merge pull request #231 from pipecat-ai/aleix/websocket-deserializer-none
serializer: allow deserialize() to return None
2024-06-13 04:36:03 +08:00
Aleix Conchillo Flaqué
8a6b8fe70a Merge pull request #232 from pipecat-ai/aleix/pyproject-deepgram
pyproject: add deepgram-sdk
2024-06-13 03:53:08 +08:00
Aleix Conchillo Flaqué
c4e53aea71 update macos-py3.10-requirements with deepgram 2024-06-12 12:52:20 -07:00
Aleix Conchillo Flaqué
ad5125e93f pyproject: add deepgram-sdk 2024-06-12 12:50:18 -07:00
Aleix Conchillo Flaqué
8d92cbac93 Merge pull request #230 from pipecat-ai/aleix/processor-names
processor names
2024-06-13 03:16:07 +08:00
Aleix Conchillo Flaqué
0225443ec8 transports(base): always send MetricsFrame 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
71e1d0a334 pipeline: send initial TTFB initial metrics from PipelineTask 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
83f69e02fd allow specifying frame processor names 2024-06-12 12:15:29 -07:00
Aleix Conchillo Flaqué
e1b2da1ff0 serializer: allow deserialize() to return None 2024-06-12 12:11:36 -07:00
Kwindla Hultman Kramer
5eb1b90a4b Merge pull request #229 from pipecat-ai/khk-deepgram-url-configurable
Deepgram TTS service improvements
2024-06-12 14:52:04 -04:00
Kwindla Hultman Kramer
9c4ee74b91 bot to test for demo 2024-06-12 10:41:49 -07:00
Aleix Conchillo Flaqué
f65f566829 re-add transports/services/helpers/__init__.py 2024-06-12 10:37:28 -07:00
Aleix Conchillo Flaqué
c8ad3123b7 Merge pull request #207 from pipecat-ai/dialin-example
New example: Dialin bot (call your Pipecat via phone)
2024-06-13 01:36:00 +08:00
Jon Taylor
8cefce28cf added example fly toml 2024-06-12 10:35:03 -07:00
Jon Taylor
a834d26885 removed https from daily boy 2024-06-12 10:35:03 -07:00
Jon Taylor
810e3cd551 added fly.example.toml due to gitignore 2024-06-12 10:35:03 -07:00
Jon Taylor
f258fa96cd added env to dockerignore 2024-06-12 10:35:03 -07:00
Jon Taylor
757ec61f14 added deepgram to readme 2024-06-12 10:35:03 -07:00
Jon Taylor
2c933f43d8 linting errors and removed unusued sip url 2024-06-12 10:35:03 -07:00
Jon Taylor
cc5bfa8af8 removed helps and fixed linting 2024-06-12 10:35:03 -07:00
Jon Taylor
de9f3e55f1 new example: dialin 2024-06-12 10:35:03 -07:00
Aleix Conchillo Flaqué
ed0c986218 Merge pull request #228 from pipecat-ai/aleix/websocket-fixes
websocket fixes
2024-06-13 01:30:21 +08:00
Aleix Conchillo Flaqué
72c27215b6 transports(websocket): use push_audio_frame() 2024-06-12 10:29:39 -07:00
Aleix Conchillo Flaqué
c23b14f768 examples: use DeepgramSTTService in websocker-server 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
81282f9c4d services(deepgram): keep conenction alive 2024-06-12 10:29:22 -07:00
Aleix Conchillo Flaqué
2b324f6f81 Merge pull request #227 from pipecat-ai/aleix/daily-room-properties-extra
transports(daily): DailyRoomProperties now allow extra unknown parame…
2024-06-13 00:25:07 +08:00
Kwindla Hultman Kramer
049f110344 PipelineTask should not exit when Deepgram TTS returns a Bad Request "unutterable" 2024-06-12 09:24:09 -07:00
Kwindla Hultman Kramer
448a0307a8 rebasing 2024-06-12 07:54:18 -07:00
Aleix Conchillo Flaqué
7390e42f5c transports(daily): DailyRoomProperties now allow extra unknown parameters 2024-06-11 22:31:32 -07:00
Aleix Conchillo Flaqué
ee880d229f Merge pull request #223 from pipecat-ai/aleix/fix-lower-vad-stop-secs
processors: fix LLMResponseAggregator with lower VAD values
2024-06-12 13:30:34 +08:00
Aleix Conchillo Flaqué
9cd07d81f8 processors: fix LLMResponseAggregator with lower VAD values 2024-06-11 22:30:06 -07:00
Aleix Conchillo Flaqué
b453d089c3 Merge pull request #226 from pipecat-ai/aleix/chunk-audio-output
transport: chunk longer audio frames
2024-06-12 13:28:28 +08:00
Aleix Conchillo Flaqué
7410fe1d1e transport: chunk longer audio frames 2024-06-11 17:50:51 -07:00
Aleix Conchillo Flaqué
6323a77431 Merge pull request #224 from pipecat-ai/aleix/deepgram-stt-simple
deepgram stt simple
2024-06-12 08:48:19 +08:00
Aleix Conchillo Flaqué
0aedaa8553 services(deepgram): abstract StartFrame/EndFrame/CancelFrame 2024-06-10 21:18:42 -07:00
Aleix Conchillo Flaqué
6554479d39 transports: don't queue system frames 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ce2ebd3198 examples: updated 07c-interruptible-deepgram to usee DeepgramSTTService 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
13ea1efc96 examples: add new 13b-deepgram-transcription 2024-06-10 21:00:01 -07:00
Aleix Conchillo Flaqué
ef380321cf services: added new DeepgramSTTService 2024-06-10 21:00:01 -07:00
Kwindla Hultman Kramer
294b037730 configurable deepgram base url 2024-06-08 09:38:48 -04:00
Aleix Conchillo Flaqué
7603996612 Merge pull request #220 from pipecat-ai/aleix/pipecat-0.0.29
update CHANGELOG for 0.0.29
2024-06-08 04:43:52 +08:00
Aleix Conchillo Flaqué
3048d2b0b1 update CHANGELOG for 0.0.29 2024-06-07 13:43:00 -07:00
Aleix Conchillo Flaqué
0bb47a09d2 Merge pull request #218 from pipecat-ai/aleix/send-inital-metrics-mapping
send inital metrics mapping
2024-06-08 04:41:59 +08:00
Aleix Conchillo Flaqué
1afe6901d9 processors: add processors_with_metrics() and can_generate_metrics() 2024-06-07 13:38:21 -07:00
Aleix Conchillo Flaqué
3e019fb512 services(openai): remove unused _chat_completions 2024-06-07 13:18:11 -07:00
Aleix Conchillo Flaqué
e069aa9608 updated CHANGELOG with BasePipeline 2024-06-07 13:18:09 -07:00
Aleix Conchillo Flaqué
0b32e42d25 transports(daily): fix extra super().process_frame() 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
8d18be5069 services(anthropic): fix metrics 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
e715d99d0c pipeline: send initial ttfb metrics mapping 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
dc28590247 moved ParallelTask to pipecat.pipeline.parallel_task 2024-06-07 13:17:50 -07:00
Aleix Conchillo Flaqué
139f158ea1 Merge pull request #219 from pipecat-ai/aleix/switch-voices
switch voices and languages
2024-06-08 04:13:25 +08:00
Aleix Conchillo Flaqué
4b2a18837f services(whisper): add text logging 2024-06-07 13:12:51 -07:00
Aleix Conchillo Flaqué
b4340d0185 services(whisper): increase no speech probability to 0.4 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
90d11398e6 examples: add 15a-switch-languages 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
bf8c73b25b examples: add 15-switch-voices 2024-06-07 13:12:21 -07:00
Aleix Conchillo Flaqué
21cd21de1b processors(filters): add FunctionFilter 2024-06-07 13:12:18 -07:00
Aleix Conchillo Flaqué
c25f6e56e7 Merge pull request #217 from pipecat-ai/khk-tts-timings
Added TTFB timings for all TTS services
2024-06-07 05:42:52 +08:00
Aleix Conchillo Flaqué
a1f1d1995c transports: allow sending metrics 2024-06-06 14:35:34 -07:00
Aleix Conchillo Flaqué
390582d7f3 services: use start/stop_ttfb_metrics to report TTFB metrics 2024-06-06 14:00:10 -07:00
Aleix Conchillo Flaqué
e765a29ca2 processors: implement base process_frame(). all subsclassed should call it 2024-06-06 10:54:21 -07:00
Kwindla Hultman Kramer
cf5c244487 Merge branch 'main' into khk-tts-timings 2024-06-06 13:05:42 -04:00
Kwindla Hultman Kramer
a5eb30a93d changelog 2024-06-06 11:49:05 -04:00
Kwindla Hultman Kramer
ac7bc35944 azure tts ttfb 2024-06-06 11:45:48 -04:00
Kwindla Hultman Kramer
ddfd721f6e openai tts ttfb 2024-06-06 11:32:47 -04:00
Kwindla Hultman Kramer
aee3916cd1 cartesia async fixed 2024-06-06 11:24:26 -04:00
Kwindla Hultman Kramer
3eff1e559b pipecat async working, but maybe needs a threaded implementation 2024-06-06 11:11:06 -04:00
Kwindla Hultman Kramer
1a542c91fa temp commit, woring on playht 2024-06-06 10:48:22 -04:00
Aleix Conchillo Flaqué
cd60a84f8a Merge pull request #215 from pipecat-ai/aleix/silero-vad-memory-fix
vad(silero): fix memory issue
2024-06-06 05:50:47 +08:00
Aleix Conchillo Flaqué
3dd4bac6e6 vad(silero): fix memory issue 2024-06-05 14:50:28 -07:00
Kwindla Hultman Kramer
06ff9cfede added timing logs for cartesia, deepgram, elevenlabs 2024-06-05 16:12:10 -04:00
Aleix Conchillo Flaqué
2d1ed9a304 Merge pull request #214 from pipecat-ai/aleix/pipecat-0.0.27
transports(daily): added participants() and participant_counts()
2024-06-06 03:15:34 +08:00
Aleix Conchillo Flaqué
50b51c05f6 transports(daily): added participants() and participant_counts() 2024-06-05 12:14:00 -07:00
Aleix Conchillo Flaqué
5ce4b8dd5b update CHANGELOG with OpenAITTSService 2024-06-05 11:44:24 -07:00
108 changed files with 4748 additions and 718 deletions

View File

@@ -5,10 +5,144 @@ All notable changes to **pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.32] - 2024-06-22
### Added
- Allow specifying a `DeepgramSTTService` url which allows using on-prem
Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that
can be integrated with FastAPI websockets.
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to
serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See
https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
### Performance
- Convert `BaseOutputTransport` and `BaseOutputTransport` to fully use asyncio
and remove the use of threads.
### Other
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio
phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`,
`AzureSTTService` and `AzureTTSService`.
## [0.0.31] - 2024-06-13
### Performance
- Break long audio frames into 20ms chunks instead of 10ms.
## [0.0.30] - 2024-06-13
### Added
- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so
only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through
OpenPipe's SDK.
- Allow specifying frame processors' name through a new `name` constructor
argument.
- Added `DeepgramSTTService`. This service has an ongoing websocket
connection. To handle this, it subclasses `AIService` instead of
`STTService`. The output of this service will be pushed from the same task,
except system frames like `StartFrame`, `CancelFrame` or
`StartInterruptionFrame`.
### Changed
- `FrameSerializer.deserialize()` can now return `None` in case it is not
possible to desearialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
### Fixed
- Fixed an issue where `DailyRoomProperties.exp` always had the same old
timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use
`push_audio_frame()` and also VAD was not working properly.
- Fixed an issue that would cause LLM aggregator to fail with small
`VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames
preventing interruptions.
### Other
- Added new `07h-interruptible-openpipe.py` example. This example shows how to
use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added new `dialin-chatbot` example. This examples shows how to call the bot
using a phone number.
## [0.0.29] - 2024-06-07
### Added
- Added a new `FunctionFilter`. This filter will let you filter frames based on
a given function, except system messages which should never be filtered.
- Added `FrameProcessor.can_generate_metrics()` method to indicate if a
processor can generate metrics. In the future this might get an extra argument
to ask for a specific type of metric.
- Added `BasePipeline`. All pipeline classes should be based on this class. All
subclasses should implement a `processors_with_metrics()` method that returns
a list of all `FrameProcessor`s in the pipeline that can generate metrics.
- Added `enable_metrics` to `PipelineParams`.
- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the
system. Right now, it can report TTFB (Time To First Byte) values for
different services, that is the time spent between the arrival of a `Frame` to
the processor/service until the first `DataFrame` is pushed downstream. If
metrics are enabled an intial `MetricsFrame` with all the services in the
pipeline will be sent.
- Added TTFB metrics and debug logging for TTS services.
### Changed
- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.
### Fixed
- Fixed PlayHT TTS service to work properly async.
## [0.0.28] - 2024-06-05
### Fixed
- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep
growing indefinitely.
## [0.0.27] - 2024-06-05
### Added
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
## [0.0.26] - 2024-06-05
### Added
- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change
audio sample format and the model to use.

View File

@@ -39,7 +39,7 @@ pip install "pipecat-ai[option,...]"
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `playht`, `silero`, `whisper`
- **AI services**: `anthropic`, `azure`, `deepgram`, `google`, `fal`, `moondream`, `openai`, `openpipe`, `playht`, `silero`, `whisper`
- **Transports**: `local`, `websocket`, `daily`
## Code examples

View File

@@ -2,6 +2,7 @@ autopep8~=2.1.0
build~=1.2.1
grpcio-tools~=1.62.2
pip-tools~=7.4.1
pyright~=1.1.367
pytest~=8.2.0
setuptools~=69.5.1
setuptools_scm~=8.1.0

View File

@@ -33,3 +33,6 @@ PLAY_HT_API_KEY=...
# OpenAI
OPENAI_API_KEY=...
#OpenPipe
OPENPIPE_API_KEY=...

View File

@@ -32,13 +32,15 @@ Next, follow the steps in the README for each demo.
## Projects:
| Project | Description | Services |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, Open AI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| Function-calling Chatbot (TBC) | A chatbot that can call functions in response to user input | Deepgram, OpenAI, Fireworks, Daily, Daily Prebuilt UI |
| Project | Description | Services |
|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [Simple Chatbot](simple-chatbot) | Basic voice-driven conversational bot. A good starting point for learning the flow of the framework. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Storytelling Chatbot](storytelling-chatbot) | Stitches together multiple third-party services to create a collaborative storytime experience. | Deepgram, ElevenLabs, OpenAI, Fal, Daily, Custom UI |
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.

View File

@@ -0,0 +1,3 @@
**/.DS_Store
.env
.env.*

165
examples/dialin-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,165 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml
# custom script to recursively upgrade items in requirements.py
upgrade_requirements.py
.DS_Store

View File

@@ -0,0 +1,40 @@
FROM python:3.11-bullseye
ARG DEBIAN_FRONTEND=noninteractive
ARG USE_PERSISTENT_DATA
ENV PYTHONUNBUFFERED=1
# Expose FastAPI port
ENV FAST_API_PORT=7860
EXPOSE 7860
# Install system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
build-essential \
git \
ffmpeg \
google-perftools \
ca-certificates curl gnupg \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user
# Set home to the user's home directory
ENV HOME=/home/user \
PATH=/home/user/.local/bin:$PATH \
PYTHONPATH=$HOME/app \
PYTHONUNBUFFERED=1
# Switch to the "user" user
USER user
# Set the working directory to the user's home directory
WORKDIR $HOME/app
# Install Python dependencies
COPY *.py .
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
# Start the FastAPI server
CMD python3 bot_runner.py --host "0.0.0.0" --port ${FAST_API_PORT}

View File

@@ -0,0 +1,85 @@
<div align="center">
 <img alt="pipecat" width="300px" height="auto" src="image.png">
</div>
# Dialin example
Example project that demonstrates how to add phone number dialin to your Pipecat bots. We include examples for both Daily (`bot_daily.py`) and Twilio (`bot_twilio.py`), depending on who you want to use as a phone vendor.
- 🔁 Transport: Daily WebRTC
- 💬 Speech-to-Text: Deepgram via Daily transport
- 🤖 LLM: GPT4-o / OpenAI
- 🔉 Text-to-Speech: ElevenLabs
#### Should I use Daily or Twilio as a vendor?
If you're starting from scratch, using Daily to provision phone numbers alongside Daily as a transport offers some convenience (such as automatic call forwarding.)
If you already have Twilio numbers and workflows that you want to connect to your Pipecat bots, there is some additional configuration required (you'll need to create a `on_dialin_ready` and use the Twilio client to trigger the forward.)
You can read more about this, as well as see respective walkthroughs in our docs.
## Setup
```shell
# Install the requirements
pip install -r requirements.txt
# Setup your env
mv env.example .env
```
## Using Daily numbers
Run `bot_runner.py` to handle incoming HTTP requests:
`python bot_runner.py --host localhost`
Then target the following URL:
`POST /daily_start_bot`
For more configuration options, please consult Daily's API documentation.
## Using Twilio numbers
As above, but target the following URL:
`POST /twilio_start_bot`
For more configuration options, please consult Twilio's API documentation.
## Deployment example
A Dockerfile is included in this demo for convenience. Here is an example of how to build and deploy your bot to [fly.io](https://fly.io).
*Please note: This demo spawns agents as subprocesses for convenience / demonstration purposes. You would likely not want to do this in production as it would limit concurrency to available system resources. For more information on how to deploy your bots using VMs, refer to the Pipecat documentation.*
### Build the docker image
`docker build -t tag:project .`
### Launch the fly project
`mv fly.example.toml fly.toml`
`fly launch` (using the included fly.toml)
### Setup your secrets on Fly
Set the necessary secrets (found in `env.example`)
`fly secrets set DAILY_API_KEY=... OPENAI_API_KEY=... ELEVENLABS_API_KEY=... ELEVENLABS_VOICE_ID=...`
If you're using Twilio as a number vendor:
`fly secrets set TWILIO_ACCOUNT_SID=... TWILIO_AUTH_TOKEN=...`
### Deploy!
`fly deploy`
## Need to do something more advanced?
This demo covers the basics of bot telephony. If you want to know more about working with PSTN / SIP, please ping us on [Discord](https://discord.gg/pipecat).

View File

@@ -0,0 +1,111 @@
import asyncio
import aiohttp
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.frames.frames import (
LLMMessagesFrame,
EndFrame
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyDialinSettings
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
daily_api_key = os.getenv("DAILY_API_KEY", "")
daily_api_url = os.getenv("DAILY_API_URL", "https://api.daily.co/v1")
async def main(room_url: str, token: str, callId: str, callDomain: str):
async with aiohttp.ClientSession() as session:
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
diallin_settings = DailyDialinSettings(
call_id=callId,
call_domain=callDomain
)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_url=daily_api_url,
api_key=daily_api_key,
dialin_settings=diallin_settings,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Oh, hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-d", type=str, help="Call Domain")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.d))

View File

@@ -0,0 +1,220 @@
"""
bot_runner.py
HTTP service that listens for incoming calls from either Daily or Twilio,
provisioning a room and starting a Pipecat bot in response.
Refer to README for more information.
"""
import os
import argparse
import subprocess
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomSipParams, DailyRoomParams
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, PlainTextResponse
from twilio.twiml.voice_response import VoiceResponse
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = ['OPENAI_API_KEY', 'DAILY_API_KEY',
'ELEVENLABS_API_KEY', 'ELEVENLABS_VOICE_ID']
daily_rest_helper = DailyRESTHelper(
os.getenv("DAILY_API_KEY", ""),
os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
# ----------------- API ----------------- #
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"]
)
"""
Create Daily room, tell the bot if the room is created for Twilio's SIP or Daily's SIP (vendor).
When the vendor is Daily, the bot handles the call forwarding automatically,
i.e, forwards the call from the "hold music state" to the Daily Room's SIP URI.
Alternatively, when the vendor is Twilio (not Daily), the bot is responsible for
updating the state on Twilio. So when `dialin-ready` fires, it takes appropriate
action using the Twilio Client library.
"""
def _create_daily_room(room_url, callId, callDomain=None, vendor="daily"):
if not room_url:
params = DailyRoomParams(
properties=DailyRoomProperties(
# Note: these are the default values, except for the display name
sip=DailyRoomSipParams(
display_name="dialin-user",
video=False,
sip_mode="dial-in",
num_endpoints=1
)
)
)
print(f"Creating new room...")
room: DailyRoomObject = daily_rest_helper.create_room(params=params)
else:
# Check passed room URL exist (we assume that it already has a sip set up!)
try:
print(f"Joining existing room: {room_url}")
room: DailyRoomObject = daily_rest_helper.get_room_from_url(
room_url)
except Exception:
raise HTTPException(
status_code=500, detail=f"Room not found: {room_url}")
print(f"Daily room: {room.url} {room.config.sip_endpoint}")
# Give the agent a token to join the session
token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(
status_code=500, detail=f"Failed to get room or token token")
# Spawn a new agent, and join the user session
# Note: this is mostly for demonstration purposes (refer to 'deployment' in docs)
if vendor == "daily":
bot_proc = f"python3 -m bot_daily -u {room.url} -t {token} -i {
callId} -d {callDomain}"
else:
bot_proc = f"python3 -m bot_twilio -u {room.url} -t {
token} -i {callId} -s {room.config.sip_endpoint}"
try:
subprocess.Popen(
[bot_proc],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__))
)
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to start subprocess: {e}")
return room
@app.post("/twilio_start_bot", response_class=PlainTextResponse)
async def twilio_start_bot(request: Request):
print(f"POST /twilio_voice_bot")
# twilio_start_bot is invoked directly by Twilio (as a web hook).
# On Twilio, under Active Numbers, pick the phone number
# Click Configure and under Voice Configuration,
# "a call comes in" choose webhook and point the URL to
# where this code is hosted.
data = {}
try:
# shouldnt have received json, twilio sends form data
form_data = await request.form()
data = dict(form_data)
except Exception:
pass
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
callId = data.get('CallSid')
if not callId:
raise HTTPException(
status_code=500, detail="Missing 'CallSid' in request")
print("CallId: %s" % callId)
# create room and tell the bot to join the created room
# note: Twilio does not require a callDomain
room: DailyRoomObject = _create_daily_room(
room_url, callId, None, "twilio")
print(f"Put Twilio on hold...")
# We have the room and the SIP URI,
# but we do not know if the Daily SIP Worker and the Bot have joined the call
# put the call on hold until the 'on_dialin_ready' fires.
# Then, the bot will update the called sid with the sip uri.
# http://com.twilio.music.classical.s3.amazonaws.com/BusyStrings.mp3
resp = VoiceResponse()
resp.play(
url="http://com.twilio.sounds.music.s3.amazonaws.com/MARKOVICHAMP-Borghestral.mp3", loop=10)
return str(resp)
@app.post("/daily_start_bot")
async def daily_start_bot(request: Request) -> JSONResponse:
# The /daily_start_bot is invoked when a call is received on Daily's SIP URI
# daily_start_bot will create the room, put the call on hold until
# the bot and sip worker are ready. Daily will automatically
# forward the call to the SIP URi when dialin_ready fires.
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", None)
# Get the dial-in properties from the request
try:
data = await request.json()
if "test" in data:
# Pass through any webhook checks
return JSONResponse({"test": True})
callId = data.get("callId", None)
callDomain = data.get("callDomain", None)
except Exception:
raise HTTPException(
status_code=500,
detail="Missing properties 'callId' or 'callDomain'")
print(f"CallId: {callId}, CallDomain: {callDomain}")
room: DailyRoomObject = _create_daily_room(
room_url, callId, callDomain, "daily")
# Grab a token for the user to join with
return JSONResponse({
"room_url": room.url,
"sipUri": room.config.sip_endpoint
})
# ----------------- Main ----------------- #
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("--host", type=str,
default=os.getenv("HOST", "0.0.0.0"), help="Host address")
parser.add_argument("--port", type=int,
default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument("--reload", action="store_true",
default=True, help="Reload code on change")
config = parser.parse_args()
try:
import uvicorn
uvicorn.run(
"bot_runner:app",
host=config.host,
port=config.port,
reload=config.reload
)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

View File

@@ -0,0 +1,125 @@
import asyncio
import aiohttp
import os
import sys
import argparse
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator
from pipecat.frames.frames import (
LLMMessagesFrame,
EndFrame
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from twilio.rest import Client
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
twilio_account_sid = os.getenv('TWILIO_ACCOUNT_SID')
twilio_auth_token = os.getenv('TWILIO_AUTH_TOKEN')
twilioclient = Client(twilio_account_sid, twilio_auth_token)
daily_api_key = os.getenv("DAILY_API_KEY", "")
async def main(room_url: str, token: str, callId: str, sipUri: str):
async with aiohttp.ClientSession() as session:
# diallin_settings are only needed if Daily's SIP URI is used
# If you are handling this via Twilio, Telnyx, set this to None
# and handle call-forwarding when on_dialin_ready fires.
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
api_key=daily_api_key,
dialin_settings=None, # Not required for Twilio
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying 'Hello! Who dares dial me at this hour?!'.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(),
tma_in,
llm,
tts,
transport.output(),
tma_out,
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
# For Twilio, Telnyx, etc. You need to update the state of the call
# and forward it to the sip_uri..
print(f"Forwarding call: {callId} {sipUri}")
try:
# The TwiML is updated using Twilio's client library
call = twilioclient.calls(callId).update(
twiml=f'<Response><Dial><Sip>{sipUri}</Sip></Dial></Response>'
)
except Exception as e:
raise Exception(f"Failed to forward call: {str(e)}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Simple ChatBot")
parser.add_argument("-u", type=str, help="Room URL")
parser.add_argument("-t", type=str, help="Token")
parser.add_argument("-i", type=str, help="Call ID")
parser.add_argument("-s", type=str, help="SIP URI")
config = parser.parse_args()
asyncio.run(main(config.u, config.t, config.i, config.s))

View File

@@ -0,0 +1,8 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (optional: for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=.
DAILY_API_URL=api.daily.co/v1
OPENAI_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=

View File

@@ -0,0 +1,19 @@
# fly.toml app configuration file generated for pipecat-dialin-demo on 2024-06-03T15:57:57+02:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = 'pipecat-dialin-demo'
primary_region = 'sjc'
[build]
[http_service]
internal_port = 7860
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[vm]]
size = 'performance-1x'

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

View File

@@ -0,0 +1,7 @@
pipecat-ai[daily,openai,silero]
fastapi
uvicorn
requests
python-dotenv
loguru
twilio

165
examples/fast-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,165 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml
# custom script to recursively upgrade items in requirements.py
upgrade_requirements.py
.DS_Store

View File

View File

@@ -0,0 +1,164 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from loguru import logger
import argparse
import asyncio
import aiohttp
import os
import sys
import time
from typing import Optional
from pydantic import BaseModel, ValidationError
from pipecat.vad.vad_analyzer import VADParams
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.pipeline import Pipeline
from pipecat.frames.frames import LLMMessagesFrame, EndFrame
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator
)
from helpers import (
ClearableDeepgramTTSService,
AudioVolumeTimer,
TranscriptionTimingLogger
)
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level=os.getenv("LOG_LEVEL", "DEBUG"))
class BotSettings(BaseModel):
room_url: str
room_token: str
bot_name: str = "Pipecat"
prompt: Optional[str] = "You are a helpful assistant."
deepgram_api_key: Optional[str] = os.getenv("DEEPGRAM_API_KEY", None)
deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE", "aura-asteria-en")
deepgram_tts_base_url: Optional[str] = os.getenv(
"DEEPGRAM_TTS_BASE_URL", "https://api.deepgram.com/v1/speak")
deepgram_stt_base_url: Optional[str] = os.getenv(
"DEEPGRAM_STT_BASE_URL", "https://api.deepgram.com/v1/speak")
openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY", None),
openai_model: Optional[str] = os.getenv("OPENAI_MODEL", None),
openai_base_url: Optional[str] = os.getenv("OPENAI_BASE_URL", None)
vad_stop_secs: Optional[float] = os.getenv("VAD_STOP_SECS", 0.200)
async def main(settings: BotSettings):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
settings.room_url,
settings.room_token,
settings.bot_name,
DailyParams(
audio_out_enabled=True,
transcription_enabled=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(params=VADParams(
stop_secs=settings.vad_stop_secs
)),
vad_audio_passthrough=True
)
)
stt = DeepgramSTTService(
name="STT",
api_key=settings.deepgram_api_key,
url=settings.deepgram_stt_base_url
)
tts = ClearableDeepgramTTSService(
name="Voice",
aiohttp_session=session,
api_key=settings.deepgram_api_key,
voice=settings.deepgram_voice,
**({'base_url': url} if (url := settings.deepgram_tts_base_url) else {})
)
llm = OpenAILLMService(
name="LLM",
api_key=settings.openai_api_key,
model=settings.openai_model,
base_url=settings.openai_base_url,
)
messages = [
{
"role": "system",
"content": settings.prompt,
},
]
avt = AudioVolumeTimer()
tl = TranscriptionTimingLogger(avt)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
avt, # Audio volume timer
stt, # Speech-to-text
tl, # Transcription timing logger
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out, # Assistant spoken responses
])
task = PipelineTask(
pipeline,
PipelineParams(
allow_interruptions=True,
enable_metrics=True,
report_only_initial_ttfb=True
))
# When the participant leaves, we exit the bot.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
# When the first participant joins, the bot should introduce itself.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
# Provide some air whilst tracks subscribe
time.sleep(2)
messages.append(
{
"role": "system",
"content": "Briefly introduce yourself by saying 'hello, I'm FastBot, how can I help you today?'"})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pipecat Bot")
parser.add_argument("-s", "--settings", type=str, required=True, help="Pipecat bot settings")
args, unknown = parser.parse_known_args()
try:
settings = BotSettings.model_validate_json(args.settings)
asyncio.run(main(settings))
except ValidationError as e:
print(e)

View File

@@ -0,0 +1,164 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import argparse
import subprocess
from pydantic import BaseModel, ValidationError
from typing import Optional
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams
from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from bot import BotSettings
from dotenv import load_dotenv
load_dotenv(override=True)
# ------------ Configuration ------------ #
MAX_SESSION_TIME = 5 * 60 # 5 minutes
REQUIRED_ENV_VARS = ['DAILY_API_URL', 'DAILY_API_KEY', 'DEEPGRAM_API_KEY']
daily_rest_helper = DailyRESTHelper(
os.getenv("DAILY_API_KEY", ""),
os.getenv("DAILY_API_URL", 'https://api.daily.co/v1'))
class RunnerSettings(BaseModel):
prompt: Optional[
str] = "You are a fast, low-latency chatbot. Your goal is to demonstrate voice-driven AI capabilities at human-like speeds. When introducing yourself briefly mention your goal is to showcase speed and conversational flow. The technology powering you is Daily for transport, Cerebrium for GPU hosting, Llama 3 (8-B version) LLM, and Deepgram for speech-to-text and text-to-speech. You are hosted on the east coast of the United States. Respond to what the user said in a creative and helpful way, but keep responses short and legible. Ensure responses contain only words. Check again that you have not included special characters other than '?' or '!'."
deepgram_voice: Optional[str] = os.getenv("DEEPGRAM_VOICE")
openai_model: Optional[str] = os.getenv("OPENAI_MODEL", "gpt-4o")
openai_api_key: Optional[str] = os.getenv("OPENAI_API_KEY")
test: Optional[bool] = None
# ----------------- API ----------------- #
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"]
)
# ----------------- Main ----------------- #
@app.post("/start_bot")
async def start_bot(request: Request) -> JSONResponse:
runner_settings = RunnerSettings()
try:
request_body = await request.body()
if len(request_body) > 0:
runner_settings = RunnerSettings.model_validate_json(request_body)
except ValidationError as e:
raise HTTPException(
status_code=400,
detail=f"Invalid request: {e}")
except Exception as e:
# If no data in request, pass
pass
# Is this a webhook creation request?
if runner_settings.test is not None:
return JSONResponse({"test": True})
# Use specified room URL, or create a new one if not specified
room_url = os.getenv("DAILY_SAMPLE_ROOM_URL", "")
if not room_url:
params = DailyRoomParams(
properties=DailyRoomProperties()
)
try:
room: DailyRoomObject = daily_rest_helper.create_room(params=params)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Unable to provision room {e}")
else:
# Check passed room URL exists, we should assume that it already has a sip set up
try:
room: DailyRoomObject = daily_rest_helper.get_room_from_url(room_url)
except Exception:
raise HTTPException(
status_code=500, detail=f"Room not found: {room_url}")
# Give the agent a token to join the session
token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
if not room or not token:
raise HTTPException(
status_code=500, detail=f"Failed to get token for room: {room_url}")
# Spawn a new agent, and join the user session
try:
bot_settings = BotSettings(
room_url=room.url,
room_token=token,
prompt=runner_settings.prompt,
deepgram_voice=runner_settings.deepgram_voice,
openai_model=runner_settings.openai_model,
openai_api_key=runner_settings.openai_api_key,
)
bot_settings_str = bot_settings.model_dump_json(exclude_none=True)
subprocess.Popen(
[f"python3 -m bot -s '{bot_settings_str}'"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)))
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to start subprocess: {e}")
# Grab a token for the user to join with
user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME)
return JSONResponse({
"room_url": room.url,
"token": user_token,
})
if __name__ == "__main__":
# Check environment variables
for env_var in REQUIRED_ENV_VARS:
if env_var not in os.environ:
raise Exception(f"Missing environment variable: {env_var}.")
parser = argparse.ArgumentParser(description="Pipecat Bot Runner")
parser.add_argument("--host", type=str,
default=os.getenv("HOST", "0.0.0.0"), help="Host address")
parser.add_argument("--port", type=int,
default=os.getenv("PORT", 7860), help="Port number")
parser.add_argument("--reload", action="store_true",
default=True, help="Reload code on change")
config = parser.parse_args()
try:
import uvicorn
uvicorn.run(
"bot_runner:app",
host=config.host,
port=config.port,
reload=config.reload
)
except KeyboardInterrupt:
print("Pipecat runner shutting down...")

View File

@@ -0,0 +1,12 @@
DAILY_SAMPLE_ROOM_URL= #optional: use the same room each time, or create a new one if unset
DAILY_API_KEY=
DAILY_API_URL=
DEEPGRAM_API_KEY=
DEEPGRAM_VOICE=
DEEPGRAM_STT_URL=
DEEPGRAM_TTS_BASE_URL=
OPENAI_API_KEY=
OPENAI_MODEL=
OPENAI_BASE_URL=

View File

@@ -0,0 +1,267 @@
from loguru import logger
import asyncio
import math
import struct
import time
from dataclasses import dataclass, field
from typing import List
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.frames.frames import (
Frame,
AudioRawFrame,
InterimTranscriptionFrame,
TranscriptionFrame,
TextFrame,
StartInterruptionFrame,
LLMFullResponseStartFrame,
TTSStoppedFrame,
MetricsFrame
)
from pipecat.vad.vad_analyzer import VADAnalyzer, VADState
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.openai import OpenAILLMContext, OpenAILLMContextFrame
class GreedyLLMAggregator(FrameProcessor):
def __init__(self, context: OpenAILLMContext = None, **kwargs):
super().__init__(**kwargs)
self.context: OpenAILLMContext = context if context else OpenAILLMContext()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
logger.debug(f"{frame}")
try:
if isinstance(frame, InterimTranscriptionFrame):
return
if isinstance(frame, TranscriptionFrame):
# append transcribed text to last "user" frame
if self.context.messages and self.context.messages[-1]["role"] == "user":
last_frame = self.context.messages.pop()
else:
last_frame = {"role": "user", "content": ""}
last_frame["content"] += " " + frame.text
self.context.messages.append(last_frame)
oai_context_frame = OpenAILLMContextFrame(context=self.context)
logger.debug(f"pushing frame {oai_context_frame}")
await self.push_frame(oai_context_frame)
return
await self.push_frame(frame, direction)
except Exception as e:
logger.debug(f"error: {e}")
class ClearableDeepgramTTSService(DeepgramTTSService):
def __init___(self, **kwargs):
super().__init(**kwargs)
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, StartInterruptionFrame):
self._current_sentence = ""
@dataclass
class BufferedSentence:
audio_frames: List[AudioRawFrame] = field(default_factory=list)
text_frame: TextFrame = None
class VADGate(FrameProcessor):
def __init__(
self,
vad_analyzer: VADAnalyzer = None,
context: OpenAILLMContext = None,
**kwargs):
super().__init__(**kwargs)
self.vad_analyzer = vad_analyzer
self.context = context
self._audio_pusher_task = None
self._expect_text_frame_next = False
self._sentences: List[BufferedSentence] = []
# queue output from tts one sentence at a time. associate a buffer of audio frames with the content of
# each text frame.
#
# start a coroutine to service the queue and send sentences down the pipeline when possible.
# 1. do not send anything when we are not in VADState.QUIET
# 2. if we are in VADState.QUIET, send a sentence, estimate how long it will take for that sentence
# to output, sleep until it's time to send another sentence
# 3. each time we send a sentence, append it to the conversation context
# 3. when the sentence buffer becomes empty, cancel the coroutine
# 4. if we get a new LLMFullResponse, treat that as a cancellation, too
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
try:
# A TTSService will emit a series of AudioRawFrame objects, then a TTSStoppedFrame,
# then a TextFrame.
if self._expect_text_frame_next:
self._expect_text_frame_next = False
if isinstance(frame, TextFrame):
self._sentences[-1].text_frame = frame
else:
logger.debug(f"expected a text frame, but received {frame}")
await self.push_frame(frame, direction)
return
else:
if isinstance(frame, TextFrame):
logger.error(f"XXXXXXXXXXXXXXXXXXX received a text frame, wasn't expecting it.")
if isinstance(frame, AudioRawFrame):
# if our buffer is empty or has a "finished" sentence at the end,
# then we need to start buffering a new sentence
if not self._sentences or self._sentences[-1].text_frame:
self._sentences.append(BufferedSentence())
self._sentences[-1].audio_frames.append(frame)
await self.maybe_start_audio_pusher_task()
return
if isinstance(frame, TTSStoppedFrame):
self._expect_text_frame_next = True
await self.push_frame(frame, direction)
return
# There are two ways we can be interrupted. During greedy inference, a new
# LLM response can start. Or, during playout, we can get a traditional
# user interruption frame.
if (isinstance(frame, LLMFullResponseStartFrame) or
isinstance(frame, StartInterruptionFrame)):
logger.debug(f"{frame} - Handle interruption in VADGate")
self._sentences = []
if self._audio_pusher_task:
self._audio_pusher_task.cancel()
self._audio_pusher_task = None
await self.push_frame(frame, direction)
return
await self.push_frame(frame, direction)
except Exception as e:
logger.debug(f"error: {e}")
async def maybe_start_audio_pusher_task(self):
try:
if self._audio_pusher_task:
return
self._audio_pusher_task = self.get_event_loop().create_task(self.push_audio())
except Exception as e:
logger.debug(f"Exception {e}")
async def push_audio(self):
try:
while True:
if not self._sentences:
await asyncio.sleep(0.01)
continue
if self.vad_analyzer._vad_state != VADState.QUIET:
await asyncio.sleep(0.01)
continue
# we only want to push completed sentence buffers
if not self._sentences[0].text_frame:
await asyncio.sleep(0.01)
continue
s = self._sentences.pop(0)
if not s.audio_frames:
continue
sample_rate = s.audio_frames[0].sample_rate
duration = 0
logger.debug(f"Pushing {len(s.audio_frames)} audio frames")
for frame in s.audio_frames:
await self.push_frame(frame)
# assume linear16 encoding (2 bytes per sample). todo: add some more
# metadata to AudioRawFrame, maybe
duration += (len(frame.audio) / 2 / frame.num_channels) / sample_rate
await asyncio.sleep(duration - 20 / 1000)
if self.context:
logger.debug(f"Appending assistant message to context: [{s.text_frame.text}]")
self.context.messages.append(
{"role": "assistant", "content": s.text_frame.text}
)
await self.push_frame(s.text_frame)
except Exception as e:
logger.debug(f"Exception {e}")
class TranscriptionTimingLogger(FrameProcessor):
def __init__(self, avt):
super().__init__()
self.name = "Transcription"
self._avt = avt
async def process_frame(self, frame: Frame, direction: FrameDirection):
try:
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
elapsed = time.time() - self._avt.last_transition_ts
logger.debug(f"Transcription TTF: {elapsed}")
await self.push_frame(MetricsFrame(ttfb={self.name: elapsed}))
await self.push_frame(frame, direction)
except Exception as e:
logger.debug(f"Exception {e}")
class AudioVolumeTimer(FrameProcessor):
def __init__(self):
super().__init__()
self.last_transition_ts = 0
self._prev_volume = -80
self._speech_volume_threshold = -50
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, AudioRawFrame):
volume = self.calculate_volume(frame)
# print(f"Audio volume: {volume:.2f} dB")
if (volume >= self._speech_volume_threshold and
self._prev_volume < self._speech_volume_threshold):
# logger.debug("transition above speech volume threshold")
self.last_transition_ts = time.time()
elif (volume < self._speech_volume_threshold and
self._prev_volume >= self._speech_volume_threshold):
# logger.debug("transition below non-speech volume threshold")
self.last_transition_ts = time.time()
self._prev_volume = volume
await self.push_frame(frame, direction)
def calculate_volume(self, frame: AudioRawFrame) -> float:
if frame.num_channels != 1:
raise ValueError(f"Expected 1 channel, got {frame.num_channels}")
# Unpack audio data into 16-bit integers
fmt = f"{len(frame.audio) // 2}h"
audio_samples = struct.unpack(fmt, frame.audio)
# Calculate RMS
sum_squares = sum(sample**2 for sample in audio_samples)
rms = math.sqrt(sum_squares / len(audio_samples))
# Convert RMS to decibels (dB)
# Reference: maximum value for 16-bit audio is 32767
if rms > 0:
db = 20 * math.log10(rms / 32767)
else:
db = -96 # Minimum value (almost silent)
return db

View File

@@ -0,0 +1,6 @@
pipecat-ai[daily,openai,silero,deepgram]
fastapi
uvicorn
requests
python-dotenv
loguru

View File

@@ -23,11 +23,11 @@ from pipecat.frames.frames import (
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.parallel_task import ParallelTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.aggregators.gated import GatedAggregator
from pipecat.processors.aggregators.llm_response import LLMFullResponseAggregator
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.aggregators.parallel_task import ParallelTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.fal import FalImageGenService
@@ -59,6 +59,8 @@ class MonthPrepender(FrameProcessor):
self.prepend_to_next_text_frame = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, MonthFrame):
self.most_recent_month = frame.month
elif self.prepend_to_next_text_frame and isinstance(frame, TextFrame):

View File

@@ -50,6 +50,8 @@ async def main():
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
@@ -60,6 +62,8 @@ async def main():
self.audio = bytearray()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, AudioRawFrame):
self.audio.extend(frame.audio)
self.frame = AudioRawFrame(
@@ -71,6 +75,8 @@ async def main():
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame

View File

@@ -49,6 +49,8 @@ class ImageSyncAggregator(FrameProcessor):
self._waiting_image_bytes = self._waiting_image.tobytes()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if not isinstance(frame, SystemFrame):
await self.push_frame(ImageRawFrame(image=self._speaking_image_bytes, size=(1024, 1024), format=self._speaking_image_format))
await self.push_frame(frame)

View File

@@ -74,7 +74,11 @@ async def main(room_url: str, token):
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(
allow_interruptions=True,
enable_metrics=True,
report_only_initial_ttfb=True,
))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):

View File

@@ -15,7 +15,7 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.deepgram import DeepgramSTTService, DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -39,12 +39,14 @@ async def main(room_url: str, token):
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True
)
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
@@ -67,6 +69,7 @@ async def main(room_url: str, token):
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS

View File

@@ -5,7 +5,6 @@
#
import asyncio
import aiohttp
import os
import sys
@@ -20,6 +19,7 @@ from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
@@ -32,62 +32,61 @@ logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=44100,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=44100,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="British Lady",
output_format="pcm_44100"
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="British Lady",
output_format="pcm_44100"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
runner = PipelineRunner()
await runner.run(task)
await runner.run(task)
if __name__ == "__main__":

View File

@@ -0,0 +1,93 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.playht import PlayHTTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = PlayHTTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,100 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.azure import AzureLLMService, AzureSTTService, AzureTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=16000,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True,
)
)
stt = AzureSTTService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
tts = AzureTTSService(
api_key=os.getenv("AZURE_SPEECH_API_KEY"),
region=os.getenv("AZURE_SPEECH_REGION"),
)
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,92 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.openai import OpenAITTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=24000,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
voice="alloy"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,102 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator,
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openpipe import OpenPipeLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
import time
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
timestamp = int(time.time())
llm = OpenPipeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
model="gpt-4o",
tags={
"conversation_id": f"pipecat-{timestamp}"
}
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -60,6 +60,8 @@ for file in sound_files:
class OutboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMFullResponseEndFrame):
await self.push_frame(sounds["ding1.wav"])
# In case anything else downstream needs it
@@ -71,6 +73,8 @@ class OutboundSoundEffectWrapper(FrameProcessor):
class InboundSoundEffectWrapper(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMMessagesFrame):
await self.push_frame(sounds["ding2.wav"])
# In case anything else downstream needs it

View File

@@ -42,6 +42,8 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)

View File

@@ -42,6 +42,8 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)

View File

@@ -42,6 +42,8 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)

View File

@@ -42,6 +42,8 @@ class UserImageRequester(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._participant_id and isinstance(frame, TextFrame):
await self.push_frame(UserImageRequestFrame(self._participant_id), FrameDirection.UPSTREAM)
await self.push_frame(frame, direction)

View File

@@ -29,6 +29,8 @@ logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")

View File

@@ -28,6 +28,8 @@ logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")

View File

@@ -0,0 +1,58 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class TranscriptionLogger(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame):
print(f"Transcription: {frame.text}")
async def main(room_url: str):
transport = DailyTransport(room_url, None, "Transcription bot",
DailyParams(audio_in_enabled=True))
stt = DeepgramSTTService(os.getenv("DEEPGRAM_API_KEY"))
tl = TranscriptionLogger()
pipeline = Pipeline([transport.input(), stt, tl])
task = PipelineTask(pipeline)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url))

View File

@@ -41,7 +41,7 @@ async def start_fetch_weather(llm):
async def fetch_weather_from_api(llm, args):
return ({"conditions": "nice", "temperature": "75"})
return {"conditions": "nice", "temperature": "75"}
async def main(room_url: str, token):

View File

@@ -0,0 +1,159 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_voice = "News Lady"
async def switch_voice(llm, args):
global current_voice
current_voice = args["voice"]
return {"voice": f"You are now using your {current_voice} voice. Your responses should now be as if you were a {current_voice}."}
async def news_lady_filter(frame) -> bool:
return current_voice == "News Lady"
async def british_lady_filter(frame) -> bool:
return current_voice == "British Lady"
async def barbershop_man_filter(frame) -> bool:
return current_voice == "Barbershop Man"
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_out_enabled=True,
audio_out_sample_rate=44100,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
news_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="Newslady",
output_format="pcm_44100"
)
british_lady = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="British Lady",
output_format="pcm_44100"
)
barbershop_man = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_name="Barbershop Man",
output_format="pcm_44100"
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function("switch_voice", switch_voice)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_voice",
"description": "Switch your voice only when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"voice": {
"type": "string",
"description": "The voice the user wants you to use",
},
},
"required": ["voice"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can do the following voices: 'News Lady', 'British Lady' and 'Barbershop Man'.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
ParallelPipeline( # TTS (one of the following vocies)
[FunctionFilter(news_lady_filter), news_lady], # News Lady voice
[FunctionFilter(british_lady_filter), british_lady], # British Lady voice
[FunctionFilter(barbershop_man_filter), barbershop_man], # Barbershop Man voice
),
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the voices you can do. Your initial responses should be as if you were a {current_voice}."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,153 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantContextAggregator,
LLMUserContextAggregator
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.filters.function_filter import FunctionFilter
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import Model, WhisperSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer
from openai.types.chat import ChatCompletionToolParam
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
current_language = "English"
async def switch_language(llm, args):
global current_language
current_language = args["language"]
return {"voice": f"Your answers from now on should be in {current_language}."}
async def english_filter(frame) -> bool:
return current_language == "English"
async def spanish_filter(frame) -> bool:
return current_language == "Spanish"
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Pipecat",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True
)
)
stt = WhisperSTTService(model=Model.LARGE)
english_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id="pNInz6obpgDQGcFmaJgB",
)
spanish_tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
model="eleven_multilingual_v2",
voice_id="9F4C8ztpNUmXkdDDbz3J",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
llm.register_function("switch_language", switch_language)
tools = [
ChatCompletionToolParam(
type="function",
function={
"name": "switch_language",
"description": "Switch to another language when the user asks you to",
"parameters": {
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "The language the user wants you to speak",
},
},
"required": ["language"],
},
})]
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities. Respond to what the user said in a creative and helpful way. Your output should not include non-alphanumeric characters. You can speak the following languages: 'English' and 'Spanish'.",
},
]
context = OpenAILLMContext(messages, tools)
tma_in = LLMUserContextAggregator(context)
tma_out = LLMAssistantContextAggregator(context)
pipeline = Pipeline([
transport.input(), # Transport user input
stt, # STT
tma_in, # User responses
llm, # LLM
ParallelPipeline( # TTS (bot will speak the chosen language)
[FunctionFilter(english_filter), english_tts], # English
[FunctionFilter(spanish_filter), spanish_tts], # Spanish
),
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append(
{
"role": "system",
"content": f"Please introduce yourself to the user and let them know the languages you speak. Your initial responses should be in {current_language}."})
await task.queue_frames([LLMMessagesFrame(messages)])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -0,0 +1,130 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import aiohttp
import os
import sys
import json
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator, LLMUserResponseAggregator)
from pipecat.services.deepgram import DeepgramTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyTransportMessageFrame
from pipecat.vad.silero import SileroVADAnalyzer
from runner import configure
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token):
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
room_url,
token,
"Respond bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer()
)
)
tts = DeepgramTTSService(
aiohttp_session=session,
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-asteria-en",
base_url="http://0.0.0.0:8080/v1/speak"
)
llm = OpenAILLMService(
# To use OpenAI
# api_key=os.getenv("OPENAI_API_KEY"),
# model="gpt-4o"
# Or, to use a local vLLM (or similar) api server
model="meta-llama/Meta-Llama-3-8B-Instruct",
base_url="http://0.0.0.0:8000/v1"
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Transport user input
tma_in, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
tma_out # Assistant spoken responses
])
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))
# When a participant joins, start transcription for that participant so the
# bot can "hear" and respond to them.
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
transport.capture_participant_transcription(participant["id"])
# When the first participant joins, the bot should introduce itself.
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
# Handle "latency-ping" messages. The client will send app messages that look like
# this:
# { "latency-ping": { ts: <client-side timestamp> }}
#
# We want to send an immediate pong back to the client from this handler function.
# Also, we will push a frame into the top of the pipeline and send it after the
#
@transport.event_handler("on_app_message")
async def on_app_message(transport, message, sender):
try:
if "latency-ping" in message:
logger.debug(f"Received latency ping app message: {message}")
ts = message["latency-ping"]["ts"]
# Send immediately
transport.output().send_message(DailyTransportMessageFrame(
message={"latency-pong-msg-handler": {"ts": ts}},
participant_id=sender))
# And push to the pipeline for the Daily transport.output to send
await tma_in.push_frame(
DailyTransportMessageFrame(
message={"latency-pong-pipeline-delivery": {"ts": ts}},
participant_id=sender))
except Exception as e:
logger.debug(f"message handling error: {e} - {message}")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
(url, token) = configure()
asyncio.run(main(url, token))

View File

@@ -74,6 +74,8 @@ class TalkingAnimation(FrameProcessor):
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, AudioRawFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
@@ -93,6 +95,8 @@ class UserImageRequester(FrameProcessor):
self.participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self.participant_id and isinstance(frame, TextFrame):
if frame.text == user_request_answer:
await self.push_frame(UserImageRequestFrame(self.participant_id), FrameDirection.UPSTREAM)
@@ -107,6 +111,8 @@ class TextFilterProcessor(FrameProcessor):
self.text = text
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
if frame.text != self.text:
await self.push_frame(frame)
@@ -116,6 +122,8 @@ class TextFilterProcessor(FrameProcessor):
class ImageFilterProcessor(FrameProcessor):
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if not isinstance(frame, ImageRawFrame):
await self.push_frame(frame)

View File

@@ -64,6 +64,8 @@ class TalkingAnimation(FrameProcessor):
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, AudioRawFrame):
if not self._is_talking:
await self.push_frame(talking_frame)

View File

@@ -52,6 +52,8 @@ class StoryImageProcessor(FrameProcessor):
self._fal_service = fal_service
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, StoryImageFrame):
try:
async with timeout(7):
@@ -86,6 +88,8 @@ class StoryProcessor(FrameProcessor):
self._story = story
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, UserStoppedSpeakingFrame):
# Send an app message to the UI
await self.push_frame(DailyTransportMessageFrame(CUE_ASSISTANT_TURN))

View File

@@ -40,6 +40,8 @@ class TranslationProcessor(FrameProcessor):
self._language = language
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
context = [
{
@@ -65,6 +67,8 @@ class TranslationSubtitles(FrameProcessor):
# subtitles.
#
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
message = {
"language": self._language,

161
examples/twilio-chatbot/.gitignore vendored Normal file
View File

@@ -0,0 +1,161 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml

View File

@@ -0,0 +1,20 @@
# Use an official Python runtime as a parent image
FROM python:3.10-bullseye
# Set the working directory in the container
WORKDIR /twilio-chatbot
# Copy the requirements file into the container
COPY requirements.txt .
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy the current directory contents into the container
COPY . .
# Expose the desired port
EXPOSE 8765
# Run the application
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8765"]

View File

@@ -0,0 +1,82 @@
# Twilio Chatbot
This project is a FastAPI-based chatbot that integrates with Twilio to handle WebSocket connections and provide real-time communication. The project includes endpoints for starting a call and handling WebSocket connections.
## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Configure Twilio URLs](#configure-twilio-urls)
- [Running the Application](#running-the-application)
- [Usage](#usage)
## Features
- **FastAPI**: A modern, fast (high-performance), web framework for building APIs with Python 3.6+.
- **WebSocket Support**: Real-time communication using WebSockets.
- **CORS Middleware**: Allowing cross-origin requests for testing.
- **Dockerized**: Easily deployable using Docker.
## Requirements
- Python 3.10
- Docker (for containerized deployment)
- ngrok (for tunneling)
- Twilio Account
## Installation
1. **Set up a virtual environment** (optional but recommended):
```sh
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
2. **Install dependencies**:
```sh
pip install -r requirements.txt
```
3. **Create .env**:
create .env based on env.example
4. **Install ngrok**:
Follow the instructions on the [ngrok website](https://ngrok.com/download) to download and install ngrok.
## Configure Twilio URLs
1. **Update the Twilio Webhook**:
Copy the ngrok URL and update your Twilio phone number webhook URL to `http://<ngrok_url>/start_call`.
2. **Update the streams.xml**:
Copy the ngrok URL and update templates/streams.xml with `wss://<ngrok_url>/ws`.
## Running the Application
### Using Python
1. **Run the FastAPI application**:
```sh
python server.py
```
2. **Start ngrok**:
In a new terminal, start ngrok to tunnel the local server:
```sh
ngrok http 8765
```
### Using Docker
1. **Build the Docker image**:
```sh
docker build -t twilio-chatbot .
```
2. **Run the Docker container**:
```sh
docker run -it --rm -p 8765:8765 twilio-chatbot
```
## Usage
To start a call, simply make a call to your Twilio phone number. The webhook URL will direct the call to your FastAPI application, which will handle it accordingly.

View File

@@ -0,0 +1,88 @@
import aiohttp
import os
import sys
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator
)
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
from pipecat.vad.silero import SileroVADAnalyzer
from loguru import logger
from dotenv import load_dotenv
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def run_bot(websocket_client):
async with aiohttp.ClientSession() as session:
transport = FastAPIWebsocketTransport(
websocket=websocket_client,
params=FastAPIWebsocketParams(
audio_out_enabled=True,
add_wav_header=False,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
vad_audio_passthrough=True
)
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
stt = DeepgramSTTService(api_key=os.getenv('DEEPGRAM_API_KEY'))
tts = ElevenLabsTTSService(
aiohttp_session=session,
api_key=os.getenv("ELEVENLABS_API_KEY"),
voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in an audio call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
},
]
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)
pipeline = Pipeline([
transport.input(), # Websocket input from client
stt, # Speech-To-Text
tma_in, # User responses
llm, # LLM
tts, # Text-To-Speech
transport.output(), # Websocket output to client
tma_out # LLM responses
])
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
# Kick off the conversation.
messages.append(
{"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMMessagesFrame(messages)])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
await task.queue_frames([EndFrame()])
runner = PipelineRunner(handle_sigint=False)
await runner.run(task)

View File

@@ -0,0 +1,4 @@
OPENAI_API_KEY=
DEEPGRAM_API_KEY=
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=

View File

@@ -0,0 +1,5 @@
pipecat-ai[daily,openai,silero,deepgram]
fastapi
uvicorn
python-dotenv
loguru

View File

@@ -0,0 +1,34 @@
import uvicorn
from fastapi import FastAPI, WebSocket
from fastapi.middleware.cors import CORSMiddleware
from starlette.responses import HTMLResponse
from bot import run_bot
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"], # Allow all origins for testing
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.post('/start_call')
async def start_call():
print("POST TwiML")
return HTMLResponse(content=open("templates/streams.xml").read(), media_type="application/xml")
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
print("WebSocket connection accepted")
await run_bot(websocket)
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8765)

View File

@@ -0,0 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://<your server url>/ws"></Stream>
</Connect>
<Pause length="40"/>
</Response>

View File

@@ -12,14 +12,14 @@ import sys
from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
LLMAssistantResponseAggregator,
LLMUserResponseAggregator
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.network.websocket_server import WebsocketServerParams, WebsocketServerTransport
from pipecat.vad.silero import SileroVADAnalyzer
@@ -36,7 +36,6 @@ async def main():
async with aiohttp.ClientSession() as session:
transport = WebsocketServerTransport(
params=WebsocketServerParams(
audio_in_enabled=True,
audio_out_enabled=True,
add_wav_header=True,
vad_enabled=True,
@@ -49,7 +48,7 @@ async def main():
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o")
stt = WhisperSTTService()
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = ElevenLabsTTSService(
aiohttp_session=session,

View File

@@ -4,9 +4,12 @@
#
# pip-compile --all-extras pyproject.toml
#
aiofiles==23.2.1
# via deepgram-sdk
aiohttp==3.9.5
# via
# cartesia
# deepgram-sdk
# langchain
# langchain-community
# pipecat-ai (pyproject.toml)
@@ -15,18 +18,24 @@ aiosignal==1.3.1
annotated-types==0.7.0
# via pydantic
anthropic==0.25.9
# via pipecat-ai (pyproject.toml)
# via
# openpipe
# pipecat-ai (pyproject.toml)
anyio==4.4.0
# via
# anthropic
# httpx
# openai
# starlette
# watchfiles
async-timeout==4.0.3
# via
# aiohttp
# langchain
attrs==23.2.0
# via aiohttp
# via
# aiohttp
# openpipe
av==12.1.0
# via faster-whisper
azure-cognitiveservices-speech==1.37.0
@@ -47,30 +56,45 @@ cffi==1.16.0
charset-normalizer==3.3.2
# via requests
click==8.1.7
# via flask
# via
# flask
# typer
# uvicorn
coloredlogs==15.0.1
# via onnxruntime
ctranslate2==4.2.1
ctranslate2==4.3.1
# via faster-whisper
daily-python==0.9.1
daily-python==0.10.0
# via pipecat-ai (pyproject.toml)
dataclasses-json==0.6.7
# via
# deepgram-sdk
# langchain-community
deepgram-sdk==3.2.7
# via pipecat-ai (pyproject.toml)
dataclasses-json==0.6.6
# via langchain-community
distro==1.9.0
# via
# anthropic
# openai
dnspython==2.6.1
# via email-validator
einops==0.8.0
# via pipecat-ai (pyproject.toml)
email-validator==2.2.0
# via fastapi
exceptiongroup==1.2.1
# via
# anyio
# pytest
fal-client==0.4.0
# via pipecat-ai (pyproject.toml)
fastapi==0.111.0
# via pipecat-ai (pyproject.toml)
fastapi-cli==0.0.4
# via fastapi
faster-whisper==1.0.2
# via pipecat-ai (pyproject.toml)
filelock==3.14.0
filelock==3.15.3
# via
# huggingface-hub
# pyht
@@ -102,9 +126,9 @@ google-api-core[grpc]==2.19.0
# google-ai-generativelanguage
# google-api-python-client
# google-generativeai
google-api-python-client==2.132.0
google-api-python-client==2.134.0
# via google-generativeai
google-auth==2.29.0
google-auth==2.30.0
# via
# google-ai-generativelanguage
# google-api-core
@@ -129,22 +153,29 @@ grpcio==1.64.1
grpcio-status==1.62.2
# via google-api-core
h11==0.14.0
# via httpcore
# via
# httpcore
# uvicorn
httpcore==1.0.5
# via httpx
httplib2==0.22.0
# via
# google-api-python-client
# google-auth-httplib2
httptools==0.6.1
# via uvicorn
httpx==0.27.0
# via
# anthropic
# cartesia
# deepgram-sdk
# fal-client
# fastapi
# openai
# openpipe
httpx-sse==0.4.0
# via fal-client
huggingface-hub==0.23.2
huggingface-hub==0.23.4
# via
# faster-whisper
# timm
@@ -155,6 +186,7 @@ humanfriendly==10.0
idna==3.7
# via
# anyio
# email-validator
# httpx
# requests
# yarl
@@ -164,41 +196,46 @@ itsdangerous==2.2.0
# via flask
jinja2==3.1.4
# via
# fastapi
# flask
# torch
jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
jsonpointer==3.0.0
# via jsonpatch
langchain==0.2.1
langchain==0.2.5
# via
# langchain-community
# pipecat-ai (pyproject.toml)
langchain-community==0.2.1
langchain-community==0.2.5
# via pipecat-ai (pyproject.toml)
langchain-core==0.2.3
langchain-core==0.2.9
# via
# langchain
# langchain-community
# langchain-openai
# langchain-text-splitters
langchain-openai==0.1.8
langchain-openai==0.1.9
# via pipecat-ai (pyproject.toml)
langchain-text-splitters==0.2.0
langchain-text-splitters==0.2.1
# via langchain
langsmith==0.1.69
langsmith==0.1.81
# via
# langchain
# langchain-community
# langchain-core
loguru==0.7.2
# via pipecat-ai (pyproject.toml)
markdown-it-py==3.0.0
# via rich
markupsafe==2.1.5
# via
# jinja2
# werkzeug
marshmallow==3.21.2
marshmallow==3.21.3
# via dataclasses-json
mdurl==0.1.2
# via markdown-it-py
mpmath==1.3.0
# via sympy
multidict==6.0.5
@@ -256,10 +293,15 @@ onnxruntime==1.18.0
openai==1.26.0
# via
# langchain-openai
# openpipe
# pipecat-ai (pyproject.toml)
orjson==3.10.3
# via langsmith
packaging==23.2
openpipe==4.14.0
# via pipecat-ai (pyproject.toml)
orjson==3.10.5
# via
# fastapi
# langsmith
packaging==24.1
# via
# huggingface-hub
# langchain-core
@@ -273,7 +315,7 @@ pillow==10.3.0
# torchvision
pluggy==1.5.0
# via pytest
proto-plus==1.23.0
proto-plus==1.24.0
# via
# google-ai-generativelanguage
# google-api-core
@@ -298,9 +340,10 @@ pyaudio==0.2.14
# via pipecat-ai (pyproject.toml)
pycparser==2.22
# via cffi
pydantic==2.7.3
pydantic==2.7.4
# via
# anthropic
# fastapi
# google-generativeai
# langchain
# langchain-core
@@ -308,6 +351,8 @@ pydantic==2.7.3
# openai
pydantic-core==2.18.4
# via pydantic
pygments==2.18.0
# via rich
pyht==0.0.28
# via pipecat-ai (pyproject.toml)
pyloudnorm==0.1.1
@@ -318,8 +363,14 @@ pytest==8.2.2
# via pytest-asyncio
pytest-asyncio==0.23.7
# via cartesia
python-dateutil==2.9.0.post0
# via openpipe
python-dotenv==1.0.1
# via pipecat-ai (pyproject.toml)
# via
# pipecat-ai (pyproject.toml)
# uvicorn
python-multipart==0.0.9
# via fastapi
pyyaml==6.0.1
# via
# ctranslate2
@@ -329,6 +380,7 @@ pyyaml==6.0.1
# langchain-core
# timm
# transformers
# uvicorn
regex==2024.5.15
# via
# tiktoken
@@ -344,6 +396,8 @@ requests==2.32.3
# pyht
# tiktoken
# transformers
rich==13.7.1
# via typer
rsa==4.9
# via google-auth
safetensors==0.4.3
@@ -352,6 +406,10 @@ safetensors==0.4.3
# transformers
scipy==1.13.1
# via pyloudnorm
shellingham==1.5.4
# via typer
six==1.16.0
# via python-dateutil
sniffio==1.3.1
# via
# anthropic
@@ -360,15 +418,17 @@ sniffio==1.3.1
# openai
sounddevice==0.4.7
# via pipecat-ai (pyproject.toml)
sqlalchemy==2.0.30
sqlalchemy==2.0.31
# via
# langchain
# langchain-community
starlette==0.37.2
# via fastapi
sympy==1.12.1
# via
# onnxruntime
# torch
tenacity==8.3.0
tenacity==8.4.1
# via
# langchain
# langchain-community
@@ -384,15 +444,15 @@ tokenizers==0.19.1
# transformers
tomli==2.0.1
# via pytest
torch==2.3.0
torch==2.3.1
# via
# pipecat-ai (pyproject.toml)
# timm
# torchaudio
# torchvision
torchaudio==2.3.0
torchaudio==2.3.1
# via pipecat-ai (pyproject.toml)
torchvision==0.18.0
torchvision==0.18.1
# via timm
tqdm==4.66.4
# via
@@ -402,12 +462,16 @@ tqdm==4.66.4
# transformers
transformers==4.40.2
# via pipecat-ai (pyproject.toml)
triton==2.3.0
triton==2.3.1
# via torch
typing-extensions==4.12.1
typer==0.12.3
# via fastapi-cli
typing-extensions==4.12.2
# via
# anthropic
# anyio
# deepgram-sdk
# fastapi
# google-generativeai
# huggingface-hub
# openai
@@ -416,17 +480,31 @@ typing-extensions==4.12.1
# pydantic-core
# sqlalchemy
# torch
# typer
# typing-inspect
# uvicorn
typing-inspect==0.9.0
# via dataclasses-json
ujson==5.10.0
# via fastapi
uritemplate==4.1.1
# via google-api-python-client
urllib3==2.2.1
urllib3==2.2.2
# via requests
uvicorn[standard]==0.30.1
# via fastapi
uvloop==0.19.0
# via uvicorn
verboselogs==1.7
# via deepgram-sdk
watchfiles==0.22.0
# via uvicorn
websockets==12.0
# via
# cartesia
# deepgram-sdk
# pipecat-ai (pyproject.toml)
# uvicorn
werkzeug==3.0.3
# via flask
yarl==1.9.4

View File

@@ -1,12 +1,15 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --all-extras pyproject.toml
#
aiofiles==23.2.1
# via deepgram-sdk
aiohttp==3.9.5
# via
# cartesia
# deepgram-sdk
# langchain
# langchain-community
# pipecat-ai (pyproject.toml)
@@ -15,18 +18,20 @@ aiosignal==1.3.1
annotated-types==0.7.0
# via pydantic
anthropic==0.25.9
# via pipecat-ai (pyproject.toml)
# via
# openpipe
# pipecat-ai (pyproject.toml)
anyio==4.4.0
# via
# anthropic
# httpx
# openai
async-timeout==4.0.3
# starlette
# watchfiles
attrs==23.2.0
# via
# aiohttp
# langchain
attrs==23.2.0
# via aiohttp
# openpipe
av==12.1.0
# via faster-whisper
azure-cognitiveservices-speech==1.37.0
@@ -47,30 +52,41 @@ cffi==1.16.0
charset-normalizer==3.3.2
# via requests
click==8.1.7
# via flask
# via
# flask
# typer
# uvicorn
coloredlogs==15.0.1
# via onnxruntime
ctranslate2==4.2.1
ctranslate2==4.3.1
# via faster-whisper
daily-python==0.9.1
daily-python==0.10.0
# via pipecat-ai (pyproject.toml)
dataclasses-json==0.6.7
# via
# deepgram-sdk
# langchain-community
deepgram-sdk==3.2.7
# via pipecat-ai (pyproject.toml)
dataclasses-json==0.6.6
# via langchain-community
distro==1.9.0
# via
# anthropic
# openai
dnspython==2.6.1
# via email-validator
einops==0.8.0
# via pipecat-ai (pyproject.toml)
exceptiongroup==1.2.1
# via
# anyio
# pytest
email-validator==2.2.0
# via fastapi
fal-client==0.4.0
# via pipecat-ai (pyproject.toml)
fastapi==0.111.0
# via pipecat-ai (pyproject.toml)
fastapi-cli==0.0.4
# via fastapi
faster-whisper==1.0.2
# via pipecat-ai (pyproject.toml)
filelock==3.14.0
filelock==3.15.3
# via
# huggingface-hub
# pyht
@@ -101,9 +117,9 @@ google-api-core[grpc]==2.19.0
# google-ai-generativelanguage
# google-api-python-client
# google-generativeai
google-api-python-client==2.132.0
google-api-python-client==2.134.0
# via google-generativeai
google-auth==2.29.0
google-auth==2.30.0
# via
# google-ai-generativelanguage
# google-api-core
@@ -126,22 +142,29 @@ grpcio==1.64.1
grpcio-status==1.62.2
# via google-api-core
h11==0.14.0
# via httpcore
# via
# httpcore
# uvicorn
httpcore==1.0.5
# via httpx
httplib2==0.22.0
# via
# google-api-python-client
# google-auth-httplib2
httptools==0.6.1
# via uvicorn
httpx==0.27.0
# via
# anthropic
# cartesia
# deepgram-sdk
# fal-client
# fastapi
# openai
# openpipe
httpx-sse==0.4.0
# via fal-client
huggingface-hub==0.23.2
huggingface-hub==0.23.4
# via
# faster-whisper
# timm
@@ -152,6 +175,7 @@ humanfriendly==10.0
idna==3.7
# via
# anyio
# email-validator
# httpx
# requests
# yarl
@@ -161,41 +185,46 @@ itsdangerous==2.2.0
# via flask
jinja2==3.1.4
# via
# fastapi
# flask
# torch
jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
jsonpointer==3.0.0
# via jsonpatch
langchain==0.2.2
langchain==0.2.5
# via
# langchain-community
# pipecat-ai (pyproject.toml)
langchain-community==0.2.2
langchain-community==0.2.5
# via pipecat-ai (pyproject.toml)
langchain-core==0.2.4
langchain-core==0.2.9
# via
# langchain
# langchain-community
# langchain-openai
# langchain-text-splitters
langchain-openai==0.1.8
langchain-openai==0.1.9
# via pipecat-ai (pyproject.toml)
langchain-text-splitters==0.2.1
# via langchain
langsmith==0.1.69
langsmith==0.1.81
# via
# langchain
# langchain-community
# langchain-core
loguru==0.7.2
# via pipecat-ai (pyproject.toml)
markdown-it-py==3.0.0
# via rich
markupsafe==2.1.5
# via
# jinja2
# werkzeug
marshmallow==3.21.2
marshmallow==3.21.3
# via dataclasses-json
mdurl==0.1.2
# via markdown-it-py
mpmath==1.3.0
# via sympy
multidict==6.0.5
@@ -222,10 +251,15 @@ onnxruntime==1.18.0
openai==1.26.0
# via
# langchain-openai
# openpipe
# pipecat-ai (pyproject.toml)
orjson==3.10.3
# via langsmith
packaging==23.2
openpipe==4.14.0
# via pipecat-ai (pyproject.toml)
orjson==3.10.5
# via
# fastapi
# langsmith
packaging==24.1
# via
# huggingface-hub
# langchain-core
@@ -239,7 +273,7 @@ pillow==10.3.0
# torchvision
pluggy==1.5.0
# via pytest
proto-plus==1.23.0
proto-plus==1.24.0
# via
# google-ai-generativelanguage
# google-api-core
@@ -264,9 +298,10 @@ pyaudio==0.2.14
# via pipecat-ai (pyproject.toml)
pycparser==2.22
# via cffi
pydantic==2.7.3
pydantic==2.7.4
# via
# anthropic
# fastapi
# google-generativeai
# langchain
# langchain-core
@@ -274,6 +309,8 @@ pydantic==2.7.3
# openai
pydantic-core==2.18.4
# via pydantic
pygments==2.18.0
# via rich
pyht==0.0.28
# via pipecat-ai (pyproject.toml)
pyloudnorm==0.1.1
@@ -284,8 +321,14 @@ pytest==8.2.2
# via pytest-asyncio
pytest-asyncio==0.23.7
# via cartesia
python-dateutil==2.9.0.post0
# via openpipe
python-dotenv==1.0.1
# via pipecat-ai (pyproject.toml)
# via
# pipecat-ai (pyproject.toml)
# uvicorn
python-multipart==0.0.9
# via fastapi
pyyaml==6.0.1
# via
# ctranslate2
@@ -295,6 +338,7 @@ pyyaml==6.0.1
# langchain-core
# timm
# transformers
# uvicorn
regex==2024.5.15
# via
# tiktoken
@@ -310,6 +354,8 @@ requests==2.32.3
# pyht
# tiktoken
# transformers
rich==13.7.1
# via typer
rsa==4.9
# via google-auth
safetensors==0.4.3
@@ -318,6 +364,10 @@ safetensors==0.4.3
# transformers
scipy==1.13.1
# via pyloudnorm
shellingham==1.5.4
# via typer
six==1.16.0
# via python-dateutil
sniffio==1.3.1
# via
# anthropic
@@ -326,15 +376,17 @@ sniffio==1.3.1
# openai
sounddevice==0.4.7
# via pipecat-ai (pyproject.toml)
sqlalchemy==2.0.30
sqlalchemy==2.0.31
# via
# langchain
# langchain-community
starlette==0.37.2
# via fastapi
sympy==1.12.1
# via
# onnxruntime
# torch
tenacity==8.3.0
tenacity==8.4.1
# via
# langchain
# langchain-community
@@ -348,17 +400,15 @@ tokenizers==0.19.1
# anthropic
# faster-whisper
# transformers
tomli==2.0.1
# via pytest
torch==2.3.0
torch==2.3.1
# via
# pipecat-ai (pyproject.toml)
# timm
# torchaudio
# torchvision
torchaudio==2.3.0
torchaudio==2.3.1
# via pipecat-ai (pyproject.toml)
torchvision==0.18.0
torchvision==0.18.1
# via timm
tqdm==4.66.4
# via
@@ -368,10 +418,13 @@ tqdm==4.66.4
# transformers
transformers==4.40.2
# via pipecat-ai (pyproject.toml)
typing-extensions==4.12.1
typer==0.12.3
# via fastapi-cli
typing-extensions==4.12.2
# via
# anthropic
# anyio
# deepgram-sdk
# fastapi
# google-generativeai
# huggingface-hub
# openai
@@ -380,17 +433,30 @@ typing-extensions==4.12.1
# pydantic-core
# sqlalchemy
# torch
# typer
# typing-inspect
typing-inspect==0.9.0
# via dataclasses-json
ujson==5.10.0
# via fastapi
uritemplate==4.1.1
# via google-api-python-client
urllib3==2.2.1
urllib3==2.2.2
# via requests
uvicorn[standard]==0.30.1
# via fastapi
uvloop==0.19.0
# via uvicorn
verboselogs==1.7
# via deepgram-sdk
watchfiles==0.22.0
# via uvicorn
websockets==12.0
# via
# cartesia
# deepgram-sdk
# pipecat-ai (pyproject.toml)
# uvicorn
werkzeug==3.0.3
# via flask
yarl==1.9.4

View File

@@ -37,7 +37,8 @@ Website = "https://pipecat.ai"
anthropic = [ "anthropic~=0.25.7" ]
azure = [ "azure-cognitiveservices-speech~=1.37.0" ]
cartesia = [ "numpy~=1.26.0", "sounddevice", "cartesia" ]
daily = [ "daily-python~=0.9.0" ]
daily = [ "daily-python~=0.10.0" ]
deepgram = [ "deepgram-sdk~=3.2.7" ]
examples = [ "python-dotenv~=1.0.0", "flask~=3.0.3", "flask_cors~=4.0.1" ]
fal = [ "fal-client~=0.4.0" ]
google = [ "google-generativeai~=0.5.3" ]
@@ -46,9 +47,10 @@ langchain = [ "langchain~=0.2.1", "langchain-community~=0.2.1", "langchain-opena
local = [ "pyaudio~=0.2.0" ]
moondream = [ "einops~=0.8.0", "timm~=0.9.16", "transformers~=4.40.2" ]
openai = [ "openai~=1.26.0" ]
openpipe = [ "openpipe~=4.14.0" ]
playht = [ "pyht~=0.0.28" ]
silero = [ "torch~=2.3.0", "torchaudio~=2.3.0" ]
websocket = [ "websockets~=12.0" ]
websocket = [ "websockets~=12.0", "fastapi~=0.111.0" ]
whisper = [ "faster-whisper~=1.0.2" ]
[tool.setuptools.packages.find]

View File

@@ -4,7 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
from typing import Any, List, Tuple
from typing import Any, List, Mapping, Tuple
from dataclasses import dataclass, field
@@ -188,6 +188,8 @@ class SystemFrame(Frame):
class StartFrame(SystemFrame):
"""This is the first frame that should be pushed down a pipeline."""
allow_interruptions: bool = False
enable_metrics: bool = False
report_only_initial_ttfb: bool = False
@dataclass
@@ -238,6 +240,13 @@ class StopInterruptionFrame(SystemFrame):
pass
@dataclass
class MetricsFrame(SystemFrame):
"""Emitted by processor that can compute metrics like latencies.
"""
ttfb: Mapping[str, float]
#
# Control frames
#

View File

@@ -0,0 +1,21 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from abc import abstractmethod
from typing import List
from pipecat.processors.frame_processor import FrameProcessor
class BasePipeline(FrameProcessor):
def __init__(self):
super().__init__()
@abstractmethod
def processors_with_metrics(self) -> List[FrameProcessor]:
pass

View File

@@ -6,6 +6,10 @@
import asyncio
from itertools import chain
from typing import List
from pipecat.pipeline.base_pipeline import BasePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.frames.frames import CancelFrame, EndFrame, Frame, StartFrame
@@ -20,6 +24,8 @@ class Source(FrameProcessor):
self._up_queue = upstream_queue
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self._up_queue.put(frame)
@@ -34,6 +40,8 @@ class Sink(FrameProcessor):
self._down_queue = downstream_queue
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self.push_frame(frame, direction)
@@ -41,7 +49,7 @@ class Sink(FrameProcessor):
await self._down_queue.put(frame)
class ParallelPipeline(FrameProcessor):
class ParallelPipeline(BasePipeline):
def __init__(self, *args):
super().__init__()
@@ -77,6 +85,13 @@ class ParallelPipeline(FrameProcessor):
logger.debug(f"Finished creating {self} pipelines")
#
# BasePipeline
#
def processors_with_metrics(self) -> List[FrameProcessor]:
return list(chain.from_iterable(p.processors_with_metrics() for p in self._pipelines))
#
# Frame processor
#
@@ -90,6 +105,8 @@ class ParallelPipeline(FrameProcessor):
self._down_task = loop.create_task(self._process_down_queue())
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, StartFrame):
await self._start_tasks()

View File

@@ -6,8 +6,10 @@
import asyncio
from itertools import chain
from typing import List
from pipecat.pipeline.base_pipeline import BasePipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.frames.frames import Frame
@@ -22,6 +24,8 @@ class Source(FrameProcessor):
self._up_queue = upstream_queue
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self._up_queue.put(frame)
@@ -36,6 +40,8 @@ class Sink(FrameProcessor):
self._down_queue = downstream_queue
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self.push_frame(frame, direction)
@@ -43,7 +49,7 @@ class Sink(FrameProcessor):
await self._down_queue.put(frame)
class ParallelTask(FrameProcessor):
class ParallelTask(BasePipeline):
def __init__(self, *args):
super().__init__()
@@ -75,11 +81,20 @@ class ParallelTask(FrameProcessor):
self._pipelines.append(pipeline)
logger.debug(f"Finished creating {self} pipelines")
#
# BasePipeline
#
def processors_with_metrics(self) -> List[FrameProcessor]:
return list(chain.from_iterable(p.processors_with_metrics() for p in self._pipelines))
#
# Frame processor
#
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if direction == FrameDirection.UPSTREAM:
# If we get an upstream frame we process it in each sink.
await asyncio.gather(*[s.process_frame(frame, direction) for s in self._sinks])

View File

@@ -4,11 +4,10 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
from typing import Callable, Coroutine, List
from pipecat.frames.frames import Frame
from pipecat.pipeline.base_pipeline import BasePipeline
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
@@ -19,6 +18,8 @@ class PipelineSource(FrameProcessor):
self._upstream_push_frame = upstream_push_frame
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self._upstream_push_frame(frame, direction)
@@ -33,6 +34,8 @@ class PipelineSink(FrameProcessor):
self._downstream_push_frame = downstream_push_frame
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self.push_frame(frame, direction)
@@ -40,7 +43,7 @@ class PipelineSink(FrameProcessor):
await self._downstream_push_frame(frame, direction)
class Pipeline(FrameProcessor):
class Pipeline(BasePipeline):
def __init__(self, processors: List[FrameProcessor]):
super().__init__()
@@ -53,6 +56,19 @@ class Pipeline(FrameProcessor):
self._link_processors()
#
# BasePipeline
#
def processors_with_metrics(self):
services = []
for p in self._processors:
if isinstance(p, BasePipeline):
services += p.processors_with_metrics()
elif p.can_generate_metrics():
services.append(p)
return services
#
# Frame processor
#
@@ -61,6 +77,8 @@ class Pipeline(FrameProcessor):
await self._cleanup_processors()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if direction == FrameDirection.DOWNSTREAM:
await self._source.process_frame(frame, FrameDirection.DOWNSTREAM)
elif direction == FrameDirection.UPSTREAM:

View File

@@ -10,7 +10,8 @@ from typing import AsyncIterable, Iterable
from pydantic import BaseModel
from pipecat.frames.frames import CancelFrame, EndFrame, ErrorFrame, Frame, StartFrame, StopTaskFrame
from pipecat.frames.frames import CancelFrame, EndFrame, ErrorFrame, Frame, MetricsFrame, StartFrame, StopTaskFrame
from pipecat.pipeline.base_pipeline import BasePipeline
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.utils import obj_count, obj_id
@@ -19,6 +20,8 @@ from loguru import logger
class PipelineParams(BaseModel):
allow_interruptions: bool = False
enable_metrics: bool = False
report_only_initial_ttfb: bool = False
class Source(FrameProcessor):
@@ -28,6 +31,8 @@ class Source(FrameProcessor):
self._up_queue = up_queue
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
match direction:
case FrameDirection.UPSTREAM:
await self._up_queue.put(frame)
@@ -37,7 +42,7 @@ class Source(FrameProcessor):
class PipelineTask:
def __init__(self, pipeline: FrameProcessor, params: PipelineParams = PipelineParams()):
def __init__(self, pipeline: BasePipeline, params: PipelineParams = PipelineParams()):
self.id: int = obj_id()
self.name: str = f"{self.__class__.__name__}#{obj_count(self)}"
@@ -66,6 +71,8 @@ class PipelineTask:
await self._source.process_frame(CancelFrame(), FrameDirection.DOWNSTREAM)
self._process_down_task.cancel()
self._process_up_task.cancel()
await self._process_down_task
await self._process_up_task
async def run(self):
self._process_up_task = asyncio.create_task(self._process_up_queue())
@@ -86,9 +93,20 @@ class PipelineTask:
else:
raise Exception("Frames must be an iterable or async iterable")
def _initial_metrics_frame(self) -> MetricsFrame:
processors = self._pipeline.processors_with_metrics()
ttfb = dict(zip([p.name for p in processors], [0] * len(processors)))
return MetricsFrame(ttfb=ttfb)
async def _process_down_queue(self):
await self._source.process_frame(
StartFrame(allow_interruptions=self._params.allow_interruptions), FrameDirection.DOWNSTREAM)
start_frame = StartFrame(
allow_interruptions=self._params.allow_interruptions,
enable_metrics=self._params.enable_metrics,
report_only_initial_ttfb=self._params.report_only_initial_ttfb
)
await self._source.process_frame(start_frame, FrameDirection.DOWNSTREAM)
await self._source.process_frame(self._initial_metrics_frame(), FrameDirection.DOWNSTREAM)
running = True
should_cleanup = True
while running:
@@ -106,6 +124,7 @@ class PipelineTask:
await self._pipeline.cleanup()
# We just enqueue None to terminate the task gracefully.
self._process_up_task.cancel()
await self._process_up_task
async def _process_up_queue(self):
while True:

View File

@@ -48,6 +48,8 @@ class GatedAggregator(FrameProcessor):
self._accumulator: List[Frame] = []
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
# We must not block system frames.
if isinstance(frame, SystemFrame):
await self.push_frame(frame, direction)

View File

@@ -71,6 +71,8 @@ class LLMResponseAggregator(FrameProcessor):
# S I T E -> X
# S I E T -> X
# S I E I T -> X
# S E T -> X
# S E I T -> X
#
# The following case would not be supported:
#
@@ -79,6 +81,8 @@ class LLMResponseAggregator(FrameProcessor):
# and T2 would be dropped.
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
send_aggregation = False
if isinstance(frame, self._start_frame):
@@ -87,6 +91,7 @@ class LLMResponseAggregator(FrameProcessor):
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False
await self.push_frame(frame, direction)
elif isinstance(frame, self._end_frame):
self._seen_end_frame = True
self._seen_start_frame = False
@@ -94,11 +99,12 @@ class LLMResponseAggregator(FrameProcessor):
# We might have received the end frame but we might still be
# aggregating (i.e. we have seen interim results but not the final
# text).
self._aggregating = self._seen_interim_results
self._aggregating = self._seen_interim_results or len(self._aggregation) == 0
# Send the aggregation if we are not aggregating anymore (i.e. no
# more interim results received).
send_aggregation = not self._aggregating
await self.push_frame(frame, direction)
elif isinstance(frame, self._accumulator_frame):
if self._aggregating:
self._aggregation += f" {frame.text}"
@@ -207,6 +213,8 @@ class LLMFullResponseAggregator(FrameProcessor):
self._aggregation = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self._aggregation += frame.text
elif isinstance(frame, LLMFullResponseEndFrame):

View File

@@ -33,6 +33,8 @@ class SentenceAggregator(FrameProcessor):
self._aggregation = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
# We ignore interim description at this point.
if isinstance(frame, InterimTranscriptionFrame):
return

View File

@@ -74,6 +74,8 @@ class ResponseAggregator(FrameProcessor):
# S I T E -> X
# S I E T -> X
# S I E I T -> X
# S E T -> X
# S E I T -> X
#
# The following case would not be supported:
#
@@ -82,6 +84,8 @@ class ResponseAggregator(FrameProcessor):
# and T2 would be dropped.
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
send_aggregation = False
if isinstance(frame, self._start_frame):
@@ -89,6 +93,7 @@ class ResponseAggregator(FrameProcessor):
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False
await self.push_frame(frame, direction)
elif isinstance(frame, self._end_frame):
self._seen_end_frame = True
self._seen_start_frame = False
@@ -96,11 +101,12 @@ class ResponseAggregator(FrameProcessor):
# We might have received the end frame but we might still be
# aggregating (i.e. we have seen interim results but not the final
# text).
self._aggregating = self._seen_interim_results
self._aggregating = self._seen_interim_results or len(self._aggregation) == 0
# Send the aggregation if we are not aggregating anymore (i.e. no
# more interim results received).
send_aggregation = not self._aggregating
await self.push_frame(frame, direction)
elif isinstance(frame, self._accumulator_frame):
if self._aggregating:
self._aggregation += f" {frame.text}"

View File

@@ -30,6 +30,8 @@ class VisionImageFrameAggregator(FrameProcessor):
self._describe_text = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self._describe_text = frame.text
elif isinstance(frame, ImageRawFrame):

View File

@@ -30,5 +30,7 @@ class FrameFilter(FrameProcessor):
or isinstance(frame, SystemFrame))
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if self._should_passthrough_frame(frame):
await self.push_frame(frame, direction)

View File

@@ -0,0 +1,30 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from typing import Awaitable, Callable
from pipecat.frames.frames import Frame, SystemFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
class FunctionFilter(FrameProcessor):
def __init__(self, filter: Callable[[Frame], Awaitable[bool]]):
super().__init__()
self._filter = filter
#
# Frame processor
#
def _should_passthrough_frame(self, frame):
return isinstance(frame, SystemFrame)
async def process_frame(self, frame: Frame, direction: FrameDirection):
passthrough = self._should_passthrough_frame(frame)
allowed = await self._filter(frame)
if passthrough or allowed:
await self.push_frame(frame, direction)

View File

@@ -43,6 +43,8 @@ class WakeCheckFilter(FrameProcessor):
self._wake_patterns.append(pattern)
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
try:
if isinstance(frame, TranscriptionFrame):
p = self._participant_states.get(frame.user_id)

View File

@@ -5,10 +5,11 @@
#
import asyncio
import time
from enum import Enum
from pipecat.frames.frames import ErrorFrame, Frame
from pipecat.frames.frames import ErrorFrame, Frame, MetricsFrame, StartFrame, UserStoppedSpeakingFrame
from pipecat.utils.utils import obj_count, obj_id
from loguru import logger
@@ -21,13 +22,53 @@ class FrameDirection(Enum):
class FrameProcessor:
def __init__(self, loop: asyncio.AbstractEventLoop | None = None):
def __init__(
self,
name: str | None = None,
loop: asyncio.AbstractEventLoop | None = None,
**kwargs):
self.id: int = obj_id()
self.name = f"{self.__class__.__name__}#{obj_count(self)}"
self.name = name or f"{self.__class__.__name__}#{obj_count(self)}"
self._prev: "FrameProcessor" | None = None
self._next: "FrameProcessor" | None = None
self._loop: asyncio.AbstractEventLoop = loop or asyncio.get_running_loop()
# Properties
self._allow_interruptions = False
self._enable_metrics = False
self._report_only_initial_ttfb = False
# Metrics
self._start_ttfb_time = 0
self._should_report_ttfb = True
@property
def interruptions_allowed(self):
return self._allow_interruptions
@property
def metrics_enabled(self):
return self._enable_metrics
@property
def report_only_initial_ttfb(self):
return self._report_only_initial_ttfb
def can_generate_metrics(self) -> bool:
return False
async def start_ttfb_metrics(self):
if self.metrics_enabled and self._should_report_ttfb:
self._start_ttfb_time = time.time()
self._should_report_ttfb = not self._report_only_initial_ttfb
async def stop_ttfb_metrics(self):
if self.metrics_enabled and self._start_ttfb_time > 0:
ttfb = time.time() - self._start_ttfb_time
logger.debug(f"{self.name} TTFB: {ttfb}")
await self.push_frame(MetricsFrame(ttfb={self.name: ttfb}))
self._start_ttfb_time = 0
async def cleanup(self):
pass
@@ -40,7 +81,12 @@ class FrameProcessor:
return self._loop
async def process_frame(self, frame: Frame, direction: FrameDirection):
pass
if isinstance(frame, StartFrame):
self._allow_interruptions = frame.allow_interruptions
self._enable_metrics = frame.enable_metrics
self._report_only_initial_ttfb = frame.report_only_initial_ttfb
elif isinstance(frame, UserStoppedSpeakingFrame):
self._should_report_ttfb = True
async def push_error(self, error: ErrorFrame):
await self.push_frame(error, FrameDirection.UPSTREAM)

View File

@@ -39,6 +39,8 @@ class LangchainProcessor(FrameProcessor):
self._participant_id = participant_id
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, LLMMessagesFrame):
# Messages are accumulated by the `LLMUserResponseAggregator` in a list of messages.
# The last one by the human is the one we want to send to the LLM.
@@ -71,7 +73,7 @@ class LangchainProcessor(FrameProcessor):
await self.push_frame(TextFrame(self.__get_token_value(token)))
await self.push_frame(LLMResponseEndFrame())
except GeneratorExit:
logger.warning("Generator was closed prematurely")
logger.warning(f"{self} generator was closed prematurely")
except Exception as e:
logger.error(f"An unknown error occurred: {e}")
logger.error(f"{self} an unknown error occurred: {e}")
await self.push_frame(LLMFullResponseEndFrame())

View File

@@ -27,6 +27,8 @@ class StatelessTextTransformer(FrameProcessor):
self._transform_fn = transform_fn
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
result = self._transform_fn(frame.text)
if isinstance(result, Coroutine):

View File

@@ -12,9 +12,9 @@ from pipecat.frames.frames import Frame
class FrameSerializer(ABC):
@abstractmethod
def serialize(self, frame: Frame) -> bytes:
def serialize(self, frame: Frame) -> str | bytes | None:
pass
@abstractmethod
def deserialize(self, data: bytes) -> Frame:
def deserialize(self, data: str | bytes) -> Frame | None:
pass

View File

@@ -11,6 +11,8 @@ import pipecat.frames.protobufs.frames_pb2 as frame_protos
from pipecat.frames.frames import AudioRawFrame, Frame, TextFrame, TranscriptionFrame
from pipecat.serializers.base_serializer import FrameSerializer
from loguru import logger
class ProtobufFrameSerializer(FrameSerializer):
SERIALIZABLE_TYPES = {
@@ -24,7 +26,7 @@ class ProtobufFrameSerializer(FrameSerializer):
def __init__(self):
pass
def serialize(self, frame: Frame) -> bytes:
def serialize(self, frame: Frame) -> str | bytes | None:
proto_frame = frame_protos.Frame()
if type(frame) not in self.SERIALIZABLE_TYPES:
raise ValueError(
@@ -39,7 +41,7 @@ class ProtobufFrameSerializer(FrameSerializer):
result = proto_frame.SerializeToString()
return result
def deserialize(self, data: bytes) -> Frame:
def deserialize(self, data: str | bytes) -> Frame | None:
"""Returns a Frame object from a Frame protobuf. Used to convert frames
passed over the wire as protobufs to Frame objects used in pipelines
and frame processors.
@@ -61,8 +63,8 @@ class ProtobufFrameSerializer(FrameSerializer):
proto = frame_protos.Frame.FromString(data)
which = proto.WhichOneof("frame")
if which not in self.SERIALIZABLE_FIELDS:
raise ValueError(
"Proto does not contain a valid frame. You may need to add a new case to ProtobufFrameSerializer.deserialize.")
logger.error("Unable to deserialize a valid frame")
return None
class_name = self.SERIALIZABLE_FIELDS[which]
args = getattr(proto, which)

View File

@@ -0,0 +1,55 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import base64
import json
from pipecat.frames.frames import AudioRawFrame, Frame
from pipecat.serializers.base_serializer import FrameSerializer
from pipecat.utils.audio import ulaw_8000_to_pcm_16000, pcm_16000_to_ulaw_8000
class TwilioFrameSerializer(FrameSerializer):
SERIALIZABLE_TYPES = {
AudioRawFrame: "audio",
}
def __init__(self):
self._sid = None
def serialize(self, frame: Frame) -> str | bytes | None:
if not isinstance(frame, AudioRawFrame):
return None
data = frame.audio
serialized_data = pcm_16000_to_ulaw_8000(data)
payload = base64.b64encode(serialized_data).decode("utf-8")
answer = {
"event": "media",
"streamSid": self._sid,
"media": {
"payload": payload
}
}
return json.dumps(answer)
def deserialize(self, data: str | bytes) -> Frame | None:
message = json.loads(data)
if not self._sid:
self._sid = message["streamSid"] if "streamSid" in message else None
if message["event"] != "media":
return None
else:
payload_base64 = message["media"]["payload"]
payload = base64.b64decode(payload_base64)
deserialized_data = ulaw_8000_to_pcm_16000(payload)
audio_frame = AudioRawFrame(audio=deserialized_data, num_channels=1, sample_rate=16000)
return audio_frame

View File

@@ -16,10 +16,12 @@ from pipecat.frames.frames import (
EndFrame,
ErrorFrame,
Frame,
StartFrame,
TTSStartedFrame,
TTSStoppedFrame,
TextFrame,
VisionImageRawFrame,
LLMFullResponseEndFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.audio import calculate_audio_volume
@@ -27,8 +29,27 @@ from pipecat.utils.utils import exp_smoothing
class AIService(FrameProcessor):
def __init__(self):
super().__init__()
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def start(self, frame: StartFrame):
pass
async def stop(self, frame: EndFrame):
pass
async def cancel(self, frame: CancelFrame):
pass
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, StartFrame):
await self.start(frame)
elif isinstance(frame, CancelFrame):
await self.cancel(frame)
elif isinstance(frame, EndFrame):
await self.stop(frame)
async def process_generator(self, generator: AsyncGenerator[Frame, None]):
async for f in generator:
@@ -41,8 +62,8 @@ class AIService(FrameProcessor):
class LLMService(AIService):
"""This class is a no-op but serves as a base class for LLM services."""
def __init__(self):
super().__init__()
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._callbacks = {}
self._start_callbacks = {}
@@ -71,8 +92,8 @@ class LLMService(AIService):
class TTSService(AIService):
def __init__(self, aggregate_sentences: bool = True):
super().__init__()
def __init__(self, aggregate_sentences: bool = True, **kwargs):
super().__init__(**kwargs)
self._aggregate_sentences: bool = aggregate_sentences
self._current_sentence: str = ""
@@ -90,7 +111,9 @@ class TTSService(AIService):
text = frame.text
else:
self._current_sentence += frame.text
if self._current_sentence.strip().endswith((".", "?", "!")):
if self._current_sentence.strip().endswith(
(".", "?", "!")) and not self._current_sentence.strip().endswith(
("Mr,", "Mrs.", "Ms.", "Dr.")):
text = self._current_sentence.strip()
self._current_sentence = ""
@@ -106,12 +129,19 @@ class TTSService(AIService):
await self.push_frame(TextFrame(text))
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
await self._process_text_frame(frame)
elif isinstance(frame, EndFrame):
if self._current_sentence:
await self._push_tts_frames(self._current_sentence)
await self.push_frame(frame)
elif isinstance(frame, LLMFullResponseEndFrame):
if self._current_sentence:
await self._push_tts_frames(self._current_sentence.strip())
self._current_sentence = ""
await self.push_frame(frame)
else:
await self.push_frame(frame, direction)
@@ -124,8 +154,9 @@ class STTService(AIService):
max_silence_secs: float = 0.3,
max_buffer_secs: float = 1.5,
sample_rate: int = 16000,
num_channels: int = 1):
super().__init__()
num_channels: int = 1,
**kwargs):
super().__init__(**kwargs)
self._min_volume = min_volume
self._max_silence_secs = max_silence_secs
self._max_buffer_secs = max_buffer_secs
@@ -134,8 +165,8 @@ class STTService(AIService):
(self._content, self._wave) = self._new_wave()
self._silence_num_frames = 0
# Volume exponential smoothing
self._smoothing_factor = 0.4
self._prev_volume = 1 - self._smoothing_factor
self._smoothing_factor = 0.2
self._prev_volume = 0
@abstractmethod
async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
@@ -179,6 +210,8 @@ class STTService(AIService):
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Processes a frame of audio data, either buffering or transcribing it."""
await super().process_frame(frame, direction)
if isinstance(frame, CancelFrame) or isinstance(frame, EndFrame):
self._wave.close()
await self.push_frame(frame, direction)
@@ -192,8 +225,8 @@ class STTService(AIService):
class ImageGenService(AIService):
def __init__(self):
super().__init__()
def __init__(self, **kwargs):
super().__init__(**kwargs)
# Renders the image. Returns an Image object.
@abstractmethod
@@ -201,6 +234,8 @@ class ImageGenService(AIService):
pass
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
await self.push_frame(frame, direction)
await self.process_generator(self.run_image_gen(frame.text))
@@ -211,8 +246,8 @@ class ImageGenService(AIService):
class VisionService(AIService):
"""VisionService is a base class for vision services."""
def __init__(self):
super().__init__()
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._describe_text = None
@abstractmethod
@@ -220,6 +255,8 @@ class VisionService(AIService):
pass
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, VisionImageRawFrame):
await self.process_generator(self.run_vision(frame))
else:

View File

@@ -4,7 +4,6 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import time
import base64
from pipecat.frames.frames import (
@@ -50,6 +49,9 @@ class AnthropicLLMService(LLMService):
self._model = model
self._max_tokens = max_tokens
def can_generate_metrics(self) -> bool:
return True
def _get_messages_from_openai_context(
self, context: OpenAILLMContext):
openai_messages = context.get_messages()
@@ -102,13 +104,16 @@ class AnthropicLLMService(LLMService):
messages = self._get_messages_from_openai_context(context)
start_time = time.time()
await self.start_ttfb_metrics()
response = await self._client.messages.create(
messages=messages,
model=self._model,
max_tokens=self._max_tokens,
stream=True)
logger.debug(f"Anthropic LLM TTFB: {time.time() - start_time}")
await self.stop_ttfb_metrics()
async for event in response:
# logger.debug(f"Anthropic LLM event: {event}")
if (event.type == "content_block_delta"):
@@ -117,11 +122,13 @@ class AnthropicLLMService(LLMService):
await self.push_frame(LLMResponseEndFrame())
except Exception as e:
logger.error(f"Anthrophic exception: {e}")
logger.error(f"{self} exception: {e}")
finally:
await self.push_frame(LLMFullResponseEndFrame())
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
context = None
if isinstance(frame, OpenAILLMContextFrame):

View File

@@ -7,27 +7,30 @@
import aiohttp
import asyncio
import io
import time
from PIL import Image
from typing import AsyncGenerator
from numpy import str_
from openai import AsyncAzureOpenAI
from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame, URLImageRawFrame
from pipecat.services.ai_services import TTSService, ImageGenService
from pipecat.frames.frames import AudioRawFrame, CancelFrame, EndFrame, ErrorFrame, Frame, StartFrame, SystemFrame, TranscriptionFrame, URLImageRawFrame
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.ai_services import AIService, TTSService, ImageGenService
from pipecat.services.openai import BaseOpenAILLMService
from loguru import logger
# See .env.example for Azure configuration needed
try:
from openai import AsyncAzureOpenAI
from azure.cognitiveservices.speech import (
SpeechSynthesizer,
SpeechConfig,
SpeechRecognizer,
SpeechSynthesizer,
ResultReason,
CancellationReason,
)
from azure.cognitiveservices.speech.audio import AudioStreamFormat, PushAudioInputStream
from azure.cognitiveservices.speech.dialog import AudioConfig
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
@@ -35,41 +38,6 @@ except ModuleNotFoundError as e:
raise Exception(f"Missing module: {e}")
class AzureTTSService(TTSService):
def __init__(self, *, api_key: str, region: str, voice="en-US-SaraNeural", **kwargs):
super().__init__(**kwargs)
self.speech_config = SpeechConfig(subscription=api_key, region=region)
self.speech_synthesizer = SpeechSynthesizer(
speech_config=self.speech_config, audio_config=None
)
self._voice = voice
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: {text}")
ssml = (
"<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' "
"xmlns:mstts='http://www.w3.org/2001/mstts'>"
f"<voice name='{self._voice}'>"
"<mstts:silence type='Sentenceboundary' value='20ms' />"
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>"
"<prosody rate='1.05'>"
f"{text}"
"</prosody></mstts:express-as></voice></speak> ")
result = await asyncio.to_thread(self.speech_synthesizer.speak_ssml, (ssml))
if result.reason == ResultReason.SynthesizingAudioCompleted:
# Azure always sends a 44-byte header. Strip it off.
yield AudioRawFrame(audio=result.audio_data[44:], sample_rate=16000, num_channels=1)
elif result.reason == ResultReason.Canceled:
cancellation_details = result.cancellation_details
logger.warning(f"Speech synthesis canceled: {cancellation_details.reason}")
if cancellation_details.reason == CancellationReason.Error:
logger.error(f"Error details: {cancellation_details.error_details}")
class AzureLLMService(BaseOpenAILLMService):
def __init__(
self,
@@ -84,7 +52,7 @@ class AzureLLMService(BaseOpenAILLMService):
self._api_version = api_version
super().__init__(api_key=api_key, model=model)
def create_client(self, api_key=None, base_url=None):
def create_client(self, api_key=None, base_url=None, **kwargs):
return AsyncAzureOpenAI(
api_key=api_key,
azure_endpoint=self._endpoint,
@@ -92,6 +60,116 @@ class AzureLLMService(BaseOpenAILLMService):
)
class AzureTTSService(TTSService):
def __init__(self, *, api_key: str, region: str, voice="en-US-SaraNeural", **kwargs):
super().__init__(**kwargs)
speech_config = SpeechConfig(subscription=api_key, region=region)
self._speech_synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=None)
self._voice = voice
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: {text}")
await self.start_ttfb_metrics()
ssml = (
"<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' "
"xmlns:mstts='http://www.w3.org/2001/mstts'>"
f"<voice name='{self._voice}'>"
"<mstts:silence type='Sentenceboundary' value='20ms' />"
"<mstts:express-as style='lyrical' styledegree='2' role='SeniorFemale'>"
"<prosody rate='1.05'>"
f"{text}"
"</prosody></mstts:express-as></voice></speak> ")
result = await asyncio.to_thread(self._speech_synthesizer.speak_ssml, (ssml))
if result.reason == ResultReason.SynthesizingAudioCompleted:
await self.stop_ttfb_metrics()
# Azure always sends a 44-byte header. Strip it off.
yield AudioRawFrame(audio=result.audio_data[44:], sample_rate=16000, num_channels=1)
elif result.reason == ResultReason.Canceled:
cancellation_details = result.cancellation_details
logger.warning(f"Speech synthesis canceled: {cancellation_details.reason}")
if cancellation_details.reason == CancellationReason.Error:
logger.error(f"{self} error: {cancellation_details.error_details}")
class AzureSTTService(AIService):
def __init__(
self,
*,
api_key: str,
region: str,
language="en-US",
sample_rate=16000,
channels=1,
**kwargs):
super().__init__(**kwargs)
speech_config = SpeechConfig(subscription=api_key, region=region)
speech_config.speech_recognition_language = language
stream_format = AudioStreamFormat(samples_per_second=sample_rate, channels=channels)
self._audio_stream = PushAudioInputStream(stream_format)
audio_config = AudioConfig(stream=self._audio_stream)
self._speech_recognizer = SpeechRecognizer(
speech_config=speech_config, audio_config=audio_config)
self._speech_recognizer.recognized.connect(self._on_handle_recognized)
self._create_push_task()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, SystemFrame):
await self.push_frame(frame, direction)
elif isinstance(frame, AudioRawFrame):
self._audio_stream.write(frame.audio)
else:
await self._push_queue.put((frame, direction))
async def start(self, frame: StartFrame):
self._speech_recognizer.start_continuous_recognition_async()
async def stop(self, frame: EndFrame):
self._speech_recognizer.stop_continuous_recognition_async()
await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
await self._push_frame_task
async def cancel(self, frame: CancelFrame):
self._speech_recognizer.stop_continuous_recognition_async()
self._push_frame_task.cancel()
await self._push_frame_task
def _create_push_task(self):
self._push_queue = asyncio.Queue()
self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
async def _push_frame_task_handler(self):
running = True
while running:
try:
(frame, direction) = await self._push_queue.get()
await self.push_frame(frame, direction)
running = not isinstance(frame, EndFrame)
except asyncio.CancelledError:
break
def _on_handle_recognized(self, event):
if event.result.reason == ResultReason.RecognizedSpeech and len(event.result.text) > 0:
direction = FrameDirection.DOWNSTREAM
frame = TranscriptionFrame(event.result.text, "", int(time.time_ns() / 1000000))
asyncio.run_coroutine_threadsafe(
self._push_queue.put((frame, direction)), self.get_event_loop())
class AzureImageGenServiceREST(ImageGenService):
def __init__(
@@ -138,7 +216,7 @@ class AzureImageGenServiceREST(ImageGenService):
while status != "succeeded":
attempts_left -= 1
if attempts_left == 0:
logger.error("Image generation timed out")
logger.error(f"{self} error: image generation timed out")
yield ErrorFrame("Image generation timed out")
return
@@ -151,7 +229,7 @@ class AzureImageGenServiceREST(ImageGenService):
image_url = json_response["result"]["data"][0]["url"] if json_response else None
if not image_url:
logger.error("Image generation failed")
logger.error(f"{self} error: image generation failed")
yield ErrorFrame("Image generation failed")
return

View File

@@ -37,12 +37,17 @@ class CartesiaTTSService(TTSService):
voice_id = voices[self._voice_name]["id"]
self._voice = self._client.get_voice_embedding(voice_id=voice_id)
except Exception as e:
logger.error(f"Cartesia initialization error: {e}")
logger.error(f"{self} initialization error: {e}")
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: [{text}]")
try:
await self.start_ttfb_metrics()
chunk_generator = await self._client.generate(
stream=True,
transcript=text,
@@ -52,6 +57,7 @@ class CartesiaTTSService(TTSService):
)
async for chunk in chunk_generator:
await self.stop_ttfb_metrics()
yield AudioRawFrame(chunk["audio"], chunk["sampling_rate"], 1)
except Exception as e:
logger.error(f"Cartesia exception: {e}")
logger.error(f"{self} exception: {e}")

View File

@@ -5,11 +5,30 @@
#
import aiohttp
import asyncio
import time
from typing import AsyncGenerator
from pipecat.frames.frames import AudioRawFrame, ErrorFrame, Frame
from pipecat.services.ai_services import TTSService
from pipecat.frames.frames import (
AudioRawFrame,
CancelFrame,
EndFrame,
ErrorFrame,
Frame,
InterimTranscriptionFrame,
StartFrame,
SystemFrame,
TranscriptionFrame)
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.ai_services import AIService, TTSService
from deepgram import (
DeepgramClient,
DeepgramClientOptions,
LiveTranscriptionEvents,
LiveOptions,
)
from loguru import logger
@@ -22,31 +41,122 @@ class DeepgramTTSService(TTSService):
aiohttp_session: aiohttp.ClientSession,
api_key: str,
voice: str = "aura-helios-en",
base_url: str = "https://api.deepgram.com/v1/speak",
**kwargs):
super().__init__(**kwargs)
self._voice = voice
self._api_key = api_key
self._aiohttp_session = aiohttp_session
self._base_url = base_url
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: [{text}]")
base_url = "https://api.deepgram.com/v1/speak"
base_url = self._base_url
request_url = f"{base_url}?model={self._voice}&encoding=linear16&container=none&sample_rate=16000"
headers = {"authorization": f"token {self._api_key}"}
body = {"text": text}
try:
await self.start_ttfb_metrics()
async with self._aiohttp_session.post(request_url, headers=headers, json=body) as r:
if r.status != 200:
text = await r.text()
logger.error(f"Error getting audio (status: {r.status}, error: {text})")
yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
response_text = await r.text()
# If we get a a "Bad Request: Input is unutterable", just print out a debug log.
# All other unsuccesful requests should emit an error frame. If not specifically
# handled by the running PipelineTask, the ErrorFrame will cancel the task.
if "unutterable" in response_text:
logger.debug(f"Unutterable text: [{text}]")
return
logger.error(
f"{self} error getting audio (status: {r.status}, error: {response_text})")
yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {response_text})")
return
async for data in r.content:
await self.stop_ttfb_metrics()
frame = AudioRawFrame(audio=data, sample_rate=16000, num_channels=1)
yield frame
except Exception as e:
logger.error(f"Deepgram exception: {e}")
logger.error(f"{self} exception: {e}")
class DeepgramSTTService(AIService):
def __init__(self,
api_key: str,
url: str = "",
live_options: LiveOptions = LiveOptions(
encoding="linear16",
language="en-US",
model="nova-2-conversationalai",
sample_rate=16000,
channels=1,
interim_results=True,
smart_format=True,
),
**kwargs):
super().__init__(**kwargs)
self._live_options = live_options
self._client = DeepgramClient(
api_key, config=DeepgramClientOptions(url=url, options={"keepalive": "true"}))
self._connection = self._client.listen.asynclive.v("1")
self._connection.on(LiveTranscriptionEvents.Transcript, self._on_message)
self._create_push_task()
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, SystemFrame):
await self.push_frame(frame, direction)
elif isinstance(frame, AudioRawFrame):
await self._connection.send(frame.audio)
else:
await self._push_queue.put((frame, direction))
async def start(self, frame: StartFrame):
if await self._connection.start(self._live_options):
logger.debug(f"{self}: Connected to Deepgram")
else:
logger.error(f"{self}: Unable to connect to Deepgram")
async def stop(self, frame: EndFrame):
await self._connection.finish()
await self._push_queue.put((frame, FrameDirection.DOWNSTREAM))
await self._push_frame_task
async def cancel(self, frame: CancelFrame):
await self._connection.finish()
self._push_frame_task.cancel()
await self._push_frame_task
def _create_push_task(self):
self._push_queue = asyncio.Queue()
self._push_frame_task = self.get_event_loop().create_task(self._push_frame_task_handler())
async def _push_frame_task_handler(self):
running = True
while running:
try:
(frame, direction) = await self._push_queue.get()
await self.push_frame(frame, direction)
running = not isinstance(frame, EndFrame)
except asyncio.CancelledError:
break
async def _on_message(self, *args, **kwargs):
result = kwargs["result"]
is_final = result.is_final
transcript = result.channel.alternatives[0].transcript
if len(transcript) > 0:
if is_final:
await self._push_queue.put((TranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))
else:
await self._push_queue.put((InterimTranscriptionFrame(transcript, "", int(time.time_ns() / 1000000)), FrameDirection.DOWNSTREAM))

View File

@@ -31,6 +31,9 @@ class ElevenLabsTTSService(TTSService):
self._aiohttp_session = aiohttp_session
self._model = model
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: [{text}]")
@@ -47,14 +50,17 @@ class ElevenLabsTTSService(TTSService):
"Content-Type": "application/json",
}
await self.start_ttfb_metrics()
async with self._aiohttp_session.post(url, json=payload, headers=headers, params=querystring) as r:
if r.status != 200:
text = await r.text()
logger.error(f"Error getting audio (status: {r.status}, error: {text})")
logger.error(f"{self} error getting audio (status: {r.status}, error: {text})")
yield ErrorFrame(f"Error getting audio (status: {r.status}, error: {text})")
return
async for chunk in r.content:
if len(chunk) > 0:
await self.stop_ttfb_metrics()
frame = AudioRawFrame(chunk, 16000, 1)
yield frame

View File

@@ -62,7 +62,7 @@ class FalImageGenService(ImageGenService):
image_url = response["images"][0]["url"] if response else None
if not image_url:
logger.error("Image generation failed")
logger.error(f"{self} error: image generation failed")
yield ErrorFrame("Image generation failed")
return

View File

@@ -1,8 +1,10 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import json
import os
import asyncio
import time
from typing import List
@@ -45,6 +47,9 @@ class GoogleLLMService(LLMService):
gai.configure(api_key=api_key)
self._client = gai.GenerativeModel(model)
def can_generate_metrics(self) -> bool:
return True
def _get_messages_from_openai_context(
self, context: OpenAILLMContext) -> List[glm.Content]:
openai_messages = context.get_messages()
@@ -81,9 +86,11 @@ class GoogleLLMService(LLMService):
messages = self._get_messages_from_openai_context(context)
start_time = time.time()
await self.start_ttfb_metrics()
response = self._client.generate_content(messages, stream=True)
logger.debug(f"Google LLM TTFB: {time.time() - start_time}")
await self.stop_ttfb_metrics()
async for chunk in self._async_generator_wrapper(response):
try:
@@ -97,14 +104,16 @@ class GoogleLLMService(LLMService):
logger.debug(
f"LLM refused to generate content for safety reasons - {messages}.")
else:
logger.error(f"Error {e}")
logger.error(f"{self} error: {e}")
except Exception as e:
logger.error(f"Exception: {e}")
logger.error(f"{self} exception: {e}")
finally:
await self.push_frame(LLMFullResponseEndFrame())
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
context = None
if isinstance(frame, OpenAILLMContextFrame):

View File

@@ -71,7 +71,7 @@ class MoondreamService(VisionService):
async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
if not self._model:
logger.error("Moondream model not available")
logger.error(f"{self} error: Moondream model not available")
yield ErrorFrame("Moondream model not available")
return

View File

@@ -3,13 +3,14 @@
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import aiohttp
import base64
import io
import json
import time
from typing import AsyncGenerator, List, Literal
import aiohttp
from typing import Any, AsyncGenerator, List, Literal
from loguru import logger
from PIL import Image
@@ -40,7 +41,6 @@ from pipecat.services.ai_services import (
try:
from openai import AsyncOpenAI, AsyncStream, BadRequestError
from openai.types.chat import (
ChatCompletion,
ChatCompletionChunk,
ChatCompletionFunctionMessageParam,
ChatCompletionMessageParam,
@@ -67,17 +67,32 @@ class BaseOpenAILLMService(LLMService):
calls from the LLM.
"""
def __init__(self, model: str, api_key=None, base_url=None):
super().__init__()
def __init__(self, model: str, api_key=None, base_url=None, **kwargs):
super().__init__(**kwargs)
self._model: str = model
self._client = self.create_client(api_key=api_key, base_url=base_url)
self._client = self.create_client(api_key=api_key, base_url=base_url, **kwargs)
def create_client(self, api_key=None, base_url=None):
def create_client(self, api_key=None, base_url=None, **kwargs):
return AsyncOpenAI(api_key=api_key, base_url=base_url)
def can_generate_metrics(self) -> bool:
return True
async def get_chat_completions(
self,
context: OpenAILLMContext,
messages: List[ChatCompletionMessageParam]) -> AsyncStream[ChatCompletionChunk]:
chunks = await self._client.chat.completions.create(
model=self._model,
stream=True,
messages=messages,
tools=context.tools,
tool_choice=context.tool_choice,
)
return chunks
async def _stream_chat_completions(
self, context: OpenAILLMContext
) -> AsyncStream[ChatCompletionChunk]:
self, context: OpenAILLMContext) -> AsyncStream[ChatCompletionChunk]:
logger.debug(f"Generating chat: {context.get_messages_json()}")
messages: List[ChatCompletionMessageParam] = context.get_messages()
@@ -94,35 +109,20 @@ class BaseOpenAILLMService(LLMService):
del message["data"]
del message["mime_type"]
start_time = time.time()
chunks: AsyncStream[ChatCompletionChunk] = (
await self._client.chat.completions.create(
model=self._model,
stream=True,
messages=messages,
tools=context.tools,
tool_choice=context.tool_choice,
)
)
logger.debug(f"OpenAI LLM TTFB: {time.time() - start_time}")
try:
chunks = await self.get_chat_completions(context, messages)
except Exception as e:
logger.error(f"{self} exception: {e}")
return chunks
async def _chat_completions(self, messages) -> str | None:
response: ChatCompletion = await self._client.chat.completions.create(
model=self._model, stream=False, messages=messages
)
if response and len(response.choices) > 0:
return response.choices[0].message.content
else:
return None
async def _process_context(self, context: OpenAILLMContext):
function_name = ""
arguments = ""
tool_call_id = ""
await self.start_ttfb_metrics()
chunk_stream: AsyncStream[ChatCompletionChunk] = (
await self._stream_chat_completions(context)
)
@@ -131,6 +131,8 @@ class BaseOpenAILLMService(LLMService):
if len(chunk.choices) == 0:
continue
await self.stop_ttfb_metrics()
if chunk.choices[0].delta.tool_calls:
# We're streaming the LLM response to enable the fastest response times.
# For text, we just yield each chunk as we receive it and count on consumers
@@ -215,6 +217,8 @@ class BaseOpenAILLMService(LLMService):
raise BaseException(f"Unknown return type from function callback: {type(result)}")
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
context = None
if isinstance(frame, OpenAILLMContextFrame):
context: OpenAILLMContext = frame.context
@@ -266,7 +270,7 @@ class OpenAIImageGenService(ImageGenService):
image_url = image.data[0].url
if not image_url:
logger.error(f"No image provided in response: {image}")
logger.error(f"{self} No image provided in response: {image}")
yield ErrorFrame("Image generation failed")
return
@@ -303,10 +307,15 @@ class OpenAITTSService(TTSService):
self._client = AsyncOpenAI(api_key=api_key)
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
logger.debug(f"Generating TTS: [{text}]")
try:
await self.start_ttfb_metrics()
async with self._client.audio.speech.with_streaming_response.create(
input=text,
model=self._model,
@@ -315,12 +324,14 @@ class OpenAITTSService(TTSService):
) as r:
if r.status_code != 200:
error = await r.text()
logger.error(f"Error getting audio (status: {r.status_code}, error: {error})")
logger.error(
f"{self} error getting audio (status: {r.status_code}, error: {error})")
yield ErrorFrame(f"Error getting audio (status: {r.status_code}, error: {error})")
return
async for chunk in r.iter_bytes(8192):
if len(chunk) > 0:
await self.stop_ttfb_metrics()
frame = AudioRawFrame(chunk, 24_000, 1)
yield frame
except BadRequestError as e:
logger.error(f"Error generating TTS: {e}")
logger.error(f"{self} error generating TTS: {e}")

View File

@@ -0,0 +1,70 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
from typing import Dict, List
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import BaseOpenAILLMService
from loguru import logger
try:
from openpipe import AsyncOpenAI as OpenPipeAI, AsyncStream
from openai.types.chat import (ChatCompletionMessageParam, ChatCompletionChunk)
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use OpenPipe, you need to `pip install pipecat-ai[openpipe]`. Also, set `OPENPIPE_API_KEY` and `OPENAI_API_KEY` environment variables.")
raise Exception(f"Missing module: {e}")
class OpenPipeLLMService(BaseOpenAILLMService):
def __init__(
self,
model: str = "gpt-4o",
api_key: str | None = None,
base_url: str | None = None,
openpipe_api_key: str | None = None,
openpipe_base_url: str = "https://app.openpipe.ai/api/v1",
tags: Dict[str, str] | None = None,
**kwargs):
super().__init__(
model,
api_key,
base_url,
openpipe_api_key=openpipe_api_key,
openpipe_base_url=openpipe_base_url,
**kwargs)
self._tags = tags
def create_client(self, api_key=None, base_url=None, **kwargs):
openpipe_api_key = kwargs.get("openpipe_api_key") or ""
openpipe_base_url = kwargs.get("openpipe_base_url") or ""
client = OpenPipeAI(
api_key=api_key,
base_url=base_url,
openpipe={
"api_key": openpipe_api_key,
"base_url": openpipe_base_url
}
)
return client
async def get_chat_completions(
self,
context: OpenAILLMContext,
messages: List[ChatCompletionMessageParam]) -> AsyncStream[ChatCompletionChunk]:
chunks = await self._client.chat.completions.create(
model=self._model,
stream=True,
messages=messages,
openpipe={
"tags": self._tags,
"log_request": True
}
)
return chunks

View File

@@ -15,8 +15,8 @@ from pipecat.services.ai_services import TTSService
from loguru import logger
try:
from pyht import Client
from pyht.client import TTSOptions
from pyht.async_client import AsyncClient
from pyht.protos.api_pb2 import Format
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
@@ -25,7 +25,7 @@ except ModuleNotFoundError as e:
raise Exception(f"Missing module: {e}")
class PlayHTAIService(TTSService):
class PlayHTTTSService(TTSService):
def __init__(self, *, api_key: str, user_id: str, voice_url: str, **kwargs):
super().__init__(**kwargs)
@@ -33,7 +33,7 @@ class PlayHTAIService(TTSService):
self._user_id = user_id
self._speech_key = api_key
self._client = Client(
self._client = AsyncClient(
user_id=self._user_id,
api_key=self._speech_key,
)
@@ -43,32 +43,41 @@ class PlayHTAIService(TTSService):
quality="higher",
format=Format.FORMAT_WAV)
def __del__(self):
self._client.close()
def can_generate_metrics(self) -> bool:
return True
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
b = bytearray()
in_header = True
for chunk in self._client.tts(text, self._options):
# skip the RIFF header.
if in_header:
b.extend(chunk)
if len(b) <= 36:
continue
else:
fh = io.BytesIO(b)
fh.seek(36)
(data, size) = struct.unpack('<4sI', fh.read(8))
logger.debug(
f"first attempt: data: {data}, size: {hex(size)}, position: {fh.tell()}")
while data != b'data':
fh.read(size)
logger.debug(f"Generating TTS: [{text}]")
try:
b = bytearray()
in_header = True
await self.start_ttfb_metrics()
playht_gen = self._client.tts(
text,
voice_engine="PlayHT2.0-turbo",
options=self._options)
async for chunk in playht_gen:
# skip the RIFF header.
if in_header:
b.extend(chunk)
if len(b) <= 36:
continue
else:
fh = io.BytesIO(b)
fh.seek(36)
(data, size) = struct.unpack('<4sI', fh.read(8))
logger.debug(
f"subsequent data: {data}, size: {hex(size)}, position: {fh.tell()}, data != data: {data != b'data'}")
logger.debug("position: ", fh.tell())
in_header = False
else:
if len(chunk):
frame = AudioRawFrame(chunk, 16000, 1)
yield frame
while data != b'data':
fh.read(size)
(data, size) = struct.unpack('<4sI', fh.read(8))
in_header = False
else:
if len(chunk):
await self.stop_ttfb_metrics()
frame = AudioRawFrame(chunk, 16000, 1)
yield frame
except Exception as e:
logger.error(f"{self} error generating TTS: {e}")

View File

@@ -45,7 +45,7 @@ class WhisperSTTService(STTService):
model: Model = Model.DISTIL_MEDIUM_EN,
device: str = "auto",
compute_type: str = "default",
no_speech_prob: float = 0.1,
no_speech_prob: float = 0.4,
**kwargs):
super().__init__(**kwargs)
@@ -56,6 +56,9 @@ class WhisperSTTService(STTService):
self._model: WhisperModel | None = None
self._load()
def can_generate_metrics(self) -> bool:
return True
def _load(self):
"""Loads the Whisper model. Note that if this is the first time
this model is being run, it will take time to download."""
@@ -69,10 +72,12 @@ class WhisperSTTService(STTService):
async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
"""Transcribes given audio using Whisper"""
if not self._model:
logger.error(f"{self} error: Whisper model not available")
yield ErrorFrame("Whisper model not available")
logger.error("Whisper model not available")
return
await self.start_ttfb_metrics()
# Divide by 32768 because we have signed 16-bit data.
audio_float = np.frombuffer(audio, dtype=np.int16).astype(np.float32) / 32768.0
@@ -83,4 +88,6 @@ class WhisperSTTService(STTService):
text += f"{segment.text} "
if text:
await self.stop_ttfb_metrics()
logger.debug(f"Transcription: [{text}]")
yield TranscriptionFrame(text, "", int(time.time_ns() / 1000000))

View File

@@ -5,7 +5,6 @@
#
import asyncio
import queue
from concurrent.futures import ThreadPoolExecutor
@@ -28,14 +27,11 @@ from loguru import logger
class BaseInputTransport(FrameProcessor):
def __init__(self, params: TransportParams):
super().__init__()
def __init__(self, params: TransportParams, **kwargs):
super().__init__(**kwargs)
self._params = params
self._running = False
self._allow_interruptions = False
self._executor = ThreadPoolExecutor(max_workers=5)
# Create push frame task. This is the task that will push frames in
@@ -43,55 +39,40 @@ class BaseInputTransport(FrameProcessor):
self._create_push_task()
async def start(self, frame: StartFrame):
# Make sure we have the latest params. Note that this transport might
# have been started on another task that might not need interruptions,
# for example.
self._allow_interruptions = frame.allow_interruptions
if self._running:
return
self._running = True
# Create audio input queue and thread if needed.
# Create audio input queue and task if needed.
if self._params.audio_in_enabled or self._params.vad_enabled:
self._audio_in_queue = queue.Queue()
self._audio_thread = self._loop.run_in_executor(
self._executor, self._audio_thread_handler)
self._audio_in_queue = asyncio.Queue()
self._audio_task = self.get_event_loop().create_task(self._audio_task_handler())
async def stop(self):
if not self._running:
return
# This will exit all threads.
self._running = False
# Wait for the threads to finish.
# Wait for the task to finish.
if self._params.audio_in_enabled or self._params.vad_enabled:
await self._audio_thread
self._push_frame_task.cancel()
self._audio_task.cancel()
await self._audio_task
def vad_analyzer(self) -> VADAnalyzer | None:
return self._params.vad_analyzer
def push_audio_frame(self, frame: AudioRawFrame):
self._audio_in_queue.put_nowait(frame)
async def push_audio_frame(self, frame: AudioRawFrame):
if self._params.audio_in_enabled or self._params.vad_enabled:
self._audio_in_queue.put_nowait(frame)
#
# Frame processor
#
async def cleanup(self):
pass
self._push_frame_task.cancel()
await self._push_frame_task
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, CancelFrame):
await self.stop()
# We don't queue a CancelFrame since we want to stop ASAP.
await self.push_frame(frame, direction)
await self.stop()
elif isinstance(frame, StartFrame):
self._allow_interruption = frame.allow_interruptions
await self.start(frame)
await self._internal_push_frame(frame, direction)
elif isinstance(frame, EndFrame):
@@ -106,8 +87,8 @@ class BaseInputTransport(FrameProcessor):
def _create_push_task(self):
loop = self.get_event_loop()
self._push_frame_task = loop.create_task(self._push_frame_task_handler())
self._push_queue = asyncio.Queue()
self._push_frame_task = loop.create_task(self._push_frame_task_handler())
async def _internal_push_frame(
self,
@@ -128,11 +109,12 @@ class BaseInputTransport(FrameProcessor):
#
async def _handle_interruptions(self, frame: Frame):
if self._allow_interruptions:
if self.interruptions_allowed:
# Make sure we notify about interruptions quickly out-of-band
if isinstance(frame, UserStartedSpeakingFrame):
logger.debug("User started speaking")
self._push_frame_task.cancel()
await self._push_frame_task
self._create_push_task()
await self.push_frame(StartInterruptionFrame())
elif isinstance(frame, UserStoppedSpeakingFrame):
@@ -144,15 +126,16 @@ class BaseInputTransport(FrameProcessor):
# Audio input
#
def _vad_analyze(self, audio_frames: bytes) -> VADState:
async def _vad_analyze(self, audio_frames: bytes) -> VADState:
state = VADState.QUIET
vad_analyzer = self.vad_analyzer()
if vad_analyzer:
state = vad_analyzer.analyze_audio(audio_frames)
state = await self.get_event_loop().run_in_executor(
self._executor, vad_analyzer.analyze_audio, audio_frames)
return state
def _handle_vad(self, audio_frames: bytes, vad_state: VADState):
new_vad_state = self._vad_analyze(audio_frames)
async def _handle_vad(self, audio_frames: bytes, vad_state: VADState):
new_vad_state = await self._vad_analyze(audio_frames)
if new_vad_state != vad_state and new_vad_state != VADState.STARTING and new_vad_state != VADState.STOPPING:
frame = None
if new_vad_state == VADState.SPEAKING:
@@ -161,33 +144,29 @@ class BaseInputTransport(FrameProcessor):
frame = UserStoppedSpeakingFrame()
if frame:
future = asyncio.run_coroutine_threadsafe(
self._handle_interruptions(frame), self.get_event_loop())
future.result()
await self._handle_interruptions(frame)
vad_state = new_vad_state
return vad_state
def _audio_thread_handler(self):
async def _audio_task_handler(self):
vad_state: VADState = VADState.QUIET
while self._running:
while True:
try:
frame: AudioRawFrame = self._audio_in_queue.get(timeout=1)
frame: AudioRawFrame = await self._audio_in_queue.get()
audio_passthrough = True
# Check VAD and push event if necessary. We just care about
# changes from QUIET to SPEAKING and vice versa.
if self._params.vad_enabled:
vad_state = self._handle_vad(frame.audio, vad_state)
vad_state = await self._handle_vad(frame.audio, vad_state)
audio_passthrough = self._params.vad_audio_passthrough
# Push audio downstream if passthrough.
# Push audio downstream if passthrough.
if audio_passthrough:
future = asyncio.run_coroutine_threadsafe(
self._internal_push_frame(frame), self._loop)
future.result()
except queue.Empty:
pass
await self._internal_push_frame(frame)
except asyncio.CancelledError:
break
except BaseException as e:
logger.error(f"Error reading audio frames: {e}")
logger.error(f"{self} error reading audio frames: {e}")

View File

@@ -7,11 +7,6 @@
import asyncio
import itertools
import queue
import time
import threading
from concurrent.futures import ThreadPoolExecutor
from PIL import Image
from typing import List
@@ -20,6 +15,7 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.frames.frames import (
AudioRawFrame,
CancelFrame,
MetricsFrame,
SpriteFrame,
StartFrame,
EndFrame,
@@ -27,6 +23,7 @@ from pipecat.frames.frames import (
ImageRawFrame,
StartInterruptionFrame,
StopInterruptionFrame,
SystemFrame,
TransportMessageFrame)
from pipecat.transports.base_transport import TransportParams
@@ -35,68 +32,56 @@ from loguru import logger
class BaseOutputTransport(FrameProcessor):
def __init__(self, params: TransportParams):
super().__init__()
def __init__(self, params: TransportParams, **kwargs):
super().__init__(**kwargs)
self._params = params
self._running = False
self._allow_interruptions = False
self._executor = ThreadPoolExecutor(max_workers=5)
# These are the images that we should send to the camera at our desired
# framerate.
self._camera_images = None
# Create media threads queues.
if self._params.camera_out_enabled:
self._camera_out_queue = queue.Queue()
self._sink_queue = queue.Queue()
# We will write 20ms audio at a time. If we receive long audio frames we
# will chunk them. This will help with interruption handling.
audio_bytes_10ms = int(self._params.audio_out_sample_rate / 100) * \
self._params.audio_out_channels * 2
self._audio_chunk_size = audio_bytes_10ms * 2
self._stopped_event = asyncio.Event()
self._is_interrupted = threading.Event()
# Create sink frame task. This is the task that will actually write
# audio or video frames. We write audio/video in a task so we can keep
# generating frames upstream while, for example, the audio is playing.
self._create_sink_task()
# Create push frame task. This is the task that will push frames in
# order. We also guarantee that all frames are pushed in the same task.
self._create_push_task()
async def start(self, frame: StartFrame):
# Make sure we have the latest params. Note that this transport might
# have been started on another task that might not need interruptions,
# for example.
self._allow_interruptions = frame.allow_interruptions
if self._running:
return
self._running = True
loop = self.get_event_loop()
# Create queues and threads.
# Create media threads queues.
if self._params.camera_out_enabled:
self._camera_out_thread = loop.run_in_executor(
self._executor, self._camera_out_thread_handler)
self._sink_thread = loop.run_in_executor(self._executor, self._sink_thread_handler)
self._camera_out_queue = asyncio.Queue()
self._camera_out_task = self.get_event_loop().create_task(self._camera_out_task_handler())
async def stop(self):
if not self._running:
return
# This will exit all threads.
self._running = False
# Wait on the threads to finish.
if self._params.camera_out_enabled:
self._camera_out_task.cancel()
await self._camera_out_task
self._stopped_event.set()
def send_message(self, frame: TransportMessageFrame):
async def send_message(self, frame: TransportMessageFrame):
pass
def write_frame_to_camera(self, frame: ImageRawFrame):
async def send_metrics(self, frame: MetricsFrame):
pass
def write_raw_audio_frames(self, frames: bytes):
async def write_frame_to_camera(self, frame: ImageRawFrame):
pass
async def write_raw_audio_frames(self, frames: bytes):
pass
#
@@ -104,13 +89,16 @@ class BaseOutputTransport(FrameProcessor):
#
async def cleanup(self):
# Wait on the threads to finish.
if self._params.camera_out_enabled:
await self._camera_out_thread
if self._sink_task:
self._sink_task.cancel()
await self._sink_task
await self._sink_thread
self._push_frame_task.cancel()
await self._push_frame_task
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
#
# Out-of-band frames like (CancelFrame or StartInterruptionFrame) are
# pushed immediately. Other frames require order so they are put in the
@@ -118,16 +106,23 @@ class BaseOutputTransport(FrameProcessor):
#
if isinstance(frame, StartFrame):
await self.start(frame)
self._sink_queue.put_nowait(frame)
# EndFrame is managed in the queue handler.
await self.push_frame(frame, direction)
# EndFrame is managed in the sink queue handler.
elif isinstance(frame, CancelFrame):
await self.push_frame(frame, direction)
await self.stop()
elif isinstance(frame, StartInterruptionFrame) or isinstance(frame, StopInterruptionFrame):
await self.push_frame(frame, direction)
elif isinstance(frame, StartInterruptionFrame) or isinstance(frame, StopInterruptionFrame):
await self._handle_interruptions(frame)
await self.push_frame(frame, direction)
elif isinstance(frame, MetricsFrame):
await self.send_metrics(frame)
await self.push_frame(frame, direction)
elif isinstance(frame, SystemFrame):
await self.push_frame(frame, direction)
elif isinstance(frame, AudioRawFrame):
await self._handle_audio(frame)
else:
self._sink_queue.put_nowait(frame)
await self._sink_queue.put(frame)
# If we are finishing, wait here until we have stopped, otherwise we might
# close things too early upstream. We need this event because we don't
@@ -136,59 +131,57 @@ class BaseOutputTransport(FrameProcessor):
await self._stopped_event.wait()
async def _handle_interruptions(self, frame: Frame):
if not self._allow_interruptions:
if not self.interruptions_allowed:
return
if isinstance(frame, StartInterruptionFrame):
self._is_interrupted.set()
# Stop sink task.
self._sink_task.cancel()
await self._sink_task
self._create_sink_task()
# Stop push task.
self._push_frame_task.cancel()
await self._push_frame_task
self._create_push_task()
elif isinstance(frame, StopInterruptionFrame):
self._is_interrupted.clear()
def _sink_thread_handler(self):
# 10ms bytes
bytes_size_10ms = int(self._params.audio_out_sample_rate / 100) * \
self._params.audio_out_channels * 2
async def _handle_audio(self, frame: AudioRawFrame):
audio = frame.audio
for i in range(0, len(audio), self._audio_chunk_size):
chunk = AudioRawFrame(audio[i: i + self._audio_chunk_size],
sample_rate=frame.sample_rate, num_channels=frame.num_channels)
await self._sink_queue.put(chunk)
# We will send at least 100ms bytes.
smallest_write_size = bytes_size_10ms * 10
def _create_sink_task(self):
loop = self.get_event_loop()
self._sink_queue = asyncio.Queue()
self._sink_task = loop.create_task(self._sink_task_handler())
async def _sink_task_handler(self):
# Audio accumlation buffer
buffer = bytearray()
while self._running:
while True:
try:
frame = self._sink_queue.get(timeout=1)
if not self._is_interrupted.is_set():
if isinstance(frame, AudioRawFrame):
if self._params.audio_out_enabled:
buffer.extend(frame.audio)
buffer = self._send_audio_truncated(buffer, smallest_write_size)
elif isinstance(frame, ImageRawFrame) and self._params.camera_out_enabled:
self._set_camera_image(frame)
elif isinstance(frame, SpriteFrame) and self._params.camera_out_enabled:
self._set_camera_images(frame.images)
elif isinstance(frame, TransportMessageFrame):
self.send_message(frame)
else:
future = asyncio.run_coroutine_threadsafe(
self._internal_push_frame(frame), self.get_event_loop())
future.result()
frame = await self._sink_queue.get()
if isinstance(frame, AudioRawFrame) and self._params.audio_out_enabled:
buffer.extend(frame.audio)
buffer = await self._maybe_send_audio(buffer)
elif isinstance(frame, ImageRawFrame) and self._params.camera_out_enabled:
await self._set_camera_image(frame)
elif isinstance(frame, SpriteFrame) and self._params.camera_out_enabled:
await self._set_camera_images(frame.images)
elif isinstance(frame, TransportMessageFrame):
await self.send_message(frame)
else:
# If we get interrupted just clear the output buffer.
buffer = bytearray()
await self._internal_push_frame(frame)
if isinstance(frame, EndFrame):
# Send all remaining audio before stopping (multiple of 10ms of audio).
self._send_audio_truncated(buffer, bytes_size_10ms)
future = asyncio.run_coroutine_threadsafe(self.stop(), self.get_event_loop())
future.result()
await self.stop()
self._sink_queue.task_done()
except queue.Empty:
pass
except asyncio.CancelledError:
break
except BaseException as e:
logger.error(f"Error processing sink queue: {e}")
logger.error(f"{self} error processing sink queue: {e}")
#
# Push frames task
@@ -196,8 +189,8 @@ class BaseOutputTransport(FrameProcessor):
def _create_push_task(self):
loop = self.get_event_loop()
self._push_frame_task = loop.create_task(self._push_frame_task_handler())
self._push_queue = asyncio.Queue()
self._push_frame_task = loop.create_task(self._push_frame_task_handler())
async def _internal_push_frame(
self,
@@ -220,7 +213,7 @@ class BaseOutputTransport(FrameProcessor):
async def send_image(self, frame: ImageRawFrame | SpriteFrame):
await self.process_frame(frame, FrameDirection.DOWNSTREAM)
def _draw_image(self, frame: ImageRawFrame):
async def _draw_image(self, frame: ImageRawFrame):
desired_size = (self._params.camera_out_width, self._params.camera_out_height)
if frame.size != desired_size:
@@ -230,34 +223,34 @@ class BaseOutputTransport(FrameProcessor):
f"{frame} does not have the expected size {desired_size}, resizing")
frame = ImageRawFrame(resized_image.tobytes(), resized_image.size, resized_image.format)
self.write_frame_to_camera(frame)
await self.write_frame_to_camera(frame)
def _set_camera_image(self, image: ImageRawFrame):
async def _set_camera_image(self, image: ImageRawFrame):
if self._params.camera_out_is_live:
self._camera_out_queue.put_nowait(image)
await self._camera_out_queue.put(image)
else:
self._camera_images = itertools.cycle([image])
def _set_camera_images(self, images: List[ImageRawFrame]):
async def _set_camera_images(self, images: List[ImageRawFrame]):
self._camera_images = itertools.cycle(images)
def _camera_out_thread_handler(self):
while self._running:
async def _camera_out_task_handler(self):
while True:
try:
if self._params.camera_out_is_live:
image = self._camera_out_queue.get(timeout=1)
self._draw_image(image)
image = await self._camera_out_queue.get()
await self._draw_image(image)
self._camera_out_queue.task_done()
elif self._camera_images:
image = next(self._camera_images)
self._draw_image(image)
time.sleep(1.0 / self._params.camera_out_framerate)
await self._draw_image(image)
await asyncio.sleep(1.0 / self._params.camera_out_framerate)
else:
time.sleep(1.0 / self._params.camera_out_framerate)
except queue.Empty:
pass
await asyncio.sleep(1.0 / self._params.camera_out_framerate)
except asyncio.CancelledError:
break
except Exception as e:
logger.error(f"Error writing to camera: {e}")
logger.error(f"{self} error writing to camera: {e}")
#
# Audio out
@@ -266,13 +259,8 @@ class BaseOutputTransport(FrameProcessor):
async def send_audio(self, frame: AudioRawFrame):
await self.process_frame(frame, FrameDirection.DOWNSTREAM)
def _send_audio_truncated(self, buffer: bytearray, smallest_write_size: int) -> bytearray:
try:
truncated_length: int = len(buffer) - (len(buffer) % smallest_write_size)
if truncated_length:
self.write_raw_audio_frames(bytes(buffer[:truncated_length]))
buffer = buffer[truncated_length:]
return buffer
except BaseException as e:
logger.error(f"Error writing audio frames: {e}")
return buffer
async def _maybe_send_audio(self, buffer: bytearray) -> bytearray:
if len(buffer) >= self._audio_chunk_size:
await self.write_raw_audio_frames(bytes(buffer[:self._audio_chunk_size]))
buffer = buffer[self._audio_chunk_size:]
return buffer

View File

@@ -41,7 +41,12 @@ class TransportParams(BaseModel):
class BaseTransport(ABC):
def __init__(self, loop: asyncio.AbstractEventLoop | None):
def __init__(self,
input_name: str | None = None,
output_name: str | None = None,
loop: asyncio.AbstractEventLoop | None = None):
self._input_name = input_name
self._output_name = output_name
self._loop = loop or asyncio.get_running_loop()
self._event_handlers: dict = {}

View File

@@ -6,6 +6,8 @@
import asyncio
from concurrent.futures import ThreadPoolExecutor
from pipecat.frames.frames import AudioRawFrame, StartFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.transports.base_input import BaseInputTransport
@@ -43,26 +45,20 @@ class LocalAudioInputTransport(BaseInputTransport):
await super().start(frame)
self._in_stream.start_stream()
async def stop(self):
await super().stop()
self._in_stream.stop_stream()
async def cleanup(self):
await super().cleanup()
self._in_stream.stop_stream()
# This is not very pretty (taken from PyAudio docs).
while self._in_stream.is_active():
await asyncio.sleep(0.1)
self._in_stream.close()
await super().cleanup()
def _audio_in_callback(self, in_data, frame_count, time_info, status):
if not self._running:
return (None, pyaudio.paAbort)
frame = AudioRawFrame(audio=in_data,
sample_rate=self._params.audio_in_sample_rate,
num_channels=self._params.audio_in_channels)
self.push_audio_frame(frame)
asyncio.run_coroutine_threadsafe(self.push_audio_frame(frame), self.get_event_loop())
return (None, pyaudio.paContinue)
@@ -72,19 +68,29 @@ class LocalAudioOutputTransport(BaseOutputTransport):
def __init__(self, py_audio: pyaudio.PyAudio, params: TransportParams):
super().__init__(params)
self._executor = ThreadPoolExecutor(max_workers=5)
self._out_stream = py_audio.open(
format=py_audio.get_format_from_width(2),
channels=params.audio_out_channels,
rate=params.audio_out_sample_rate,
output=True)
def write_raw_audio_frames(self, frames: bytes):
self._out_stream.write(frames)
async def start(self, frame: StartFrame):
await super().start(frame)
self._out_stream.start_stream()
async def cleanup(self):
await super().cleanup()
self._out_stream.stop_stream()
# This is not very pretty (taken from PyAudio docs).
while self._out_stream.is_active():
await asyncio.sleep(0.1)
self._out_stream.close()
async def write_raw_audio_frames(self, frames: bytes):
await self.get_event_loop().run_in_executor(self._executor, self._out_stream.write, frames)
class LocalAudioTransport(BaseTransport):

View File

@@ -6,6 +6,8 @@
import asyncio
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import tkinter as tk
@@ -53,25 +55,20 @@ class TkInputTransport(BaseInputTransport):
await super().start(frame)
self._in_stream.start_stream()
async def stop(self):
await super().stop()
self._in_stream.stop_stream()
async def cleanup(self):
await super().cleanup()
self._in_stream.stop_stream()
# This is not very pretty (taken from PyAudio docs).
while self._in_stream.is_active():
await asyncio.sleep(0.1)
self._in_stream.close()
def _audio_in_callback(self, in_data, frame_count, time_info, status):
if not self._running:
return (None, pyaudio.paAbort)
frame = AudioRawFrame(audio=in_data,
sample_rate=self._params.audio_in_sample_rate,
num_channels=self._params.audio_in_channels)
self.push_audio_frame(frame)
asyncio.run_coroutine_threadsafe(self.push_audio_frame(frame), self.get_event_loop())
return (None, pyaudio.paContinue)
@@ -81,6 +78,8 @@ class TkOutputTransport(BaseOutputTransport):
def __init__(self, tk_root: tk.Tk, py_audio: pyaudio.PyAudio, params: TransportParams):
super().__init__(params)
self._executor = ThreadPoolExecutor(max_workers=5)
self._out_stream = py_audio.open(
format=py_audio.get_format_from_width(2),
channels=params.audio_out_channels,
@@ -94,16 +93,24 @@ class TkOutputTransport(BaseOutputTransport):
self._image_label = tk.Label(tk_root, image=photo)
self._image_label.pack()
def write_raw_audio_frames(self, frames: bytes):
self._out_stream.write(frames)
def write_frame_to_camera(self, frame: ImageRawFrame):
self.get_event_loop().call_soon(self._write_frame_to_tk, frame)
async def start(self, frame: StartFrame):
await super().start(frame)
self._out_stream.start_stream()
async def cleanup(self):
await super().cleanup()
self._out_stream.stop_stream()
# This is not very pretty (taken from PyAudio docs).
while self._out_stream.is_active():
await asyncio.sleep(0.1)
self._out_stream.close()
async def write_raw_audio_frames(self, frames: bytes):
await self.get_event_loop().run_in_executor(self._executor, self._out_stream.write, frames)
async def write_frame_to_camera(self, frame: ImageRawFrame):
self.get_event_loop().call_soon(self._write_frame_to_tk, frame)
def _write_frame_to_tk(self, frame: ImageRawFrame):
width = frame.size[0]
height = frame.size[1]

View File

@@ -0,0 +1,160 @@
#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import io
import wave
from typing import Awaitable, Callable
from pydantic.main import BaseModel
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.frames.frames import AudioRawFrame, StartFrame
from pipecat.processors.frame_processor import FrameProcessor
from pipecat.serializers.base_serializer import FrameSerializer
from pipecat.transports.base_input import BaseInputTransport
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.transports.base_transport import BaseTransport, TransportParams
from loguru import logger
try:
from fastapi import WebSocket
from starlette.websockets import WebSocketState
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(
"In order to use FastAPI websockets, you need to `pip install pipecat-ai[websocket]`.")
raise Exception(f"Missing module: {e}")
class FastAPIWebsocketParams(TransportParams):
add_wav_header: bool = False
audio_frame_size: int = 6400 # 200ms
serializer: FrameSerializer = TwilioFrameSerializer()
class FastAPIWebsocketCallbacks(BaseModel):
on_client_connected: Callable[[WebSocket], Awaitable[None]]
on_client_disconnected: Callable[[WebSocket], Awaitable[None]]
class FastAPIWebsocketInputTransport(BaseInputTransport):
def __init__(
self,
websocket: WebSocket,
params: FastAPIWebsocketParams,
callbacks: FastAPIWebsocketCallbacks,
**kwargs):
super().__init__(params, **kwargs)
self._websocket = websocket
self._params = params
self._callbacks = callbacks
async def start(self, frame: StartFrame):
await self._callbacks.on_client_connected(self._websocket)
await super().start(frame)
self._receive_task = self.get_event_loop().create_task(self._receive_messages())
async def stop(self):
if self._websocket.client_state != WebSocketState.DISCONNECTED:
await self._websocket.close()
await super().stop()
async def _receive_messages(self):
async for message in self._websocket.iter_text():
frame = self._params.serializer.deserialize(message)
if not frame:
continue
if isinstance(frame, AudioRawFrame):
await self.push_audio_frame(frame)
await self._callbacks.on_client_disconnected(self._websocket)
class FastAPIWebsocketOutputTransport(BaseOutputTransport):
def __init__(self, websocket: WebSocket, params: FastAPIWebsocketParams, **kwargs):
super().__init__(params, **kwargs)
self._websocket = websocket
self._params = params
self._audio_buffer = bytes()
async def write_raw_audio_frames(self, frames: bytes):
self._audio_buffer += frames
while len(self._audio_buffer) >= self._params.audio_frame_size:
frame = AudioRawFrame(
audio=self._audio_buffer[:self._params.audio_frame_size],
sample_rate=self._params.audio_out_sample_rate,
num_channels=self._params.audio_out_channels
)
if self._params.add_wav_header:
content = io.BytesIO()
ww = wave.open(content, "wb")
ww.setsampwidth(2)
ww.setnchannels(frame.num_channels)
ww.setframerate(frame.sample_rate)
ww.writeframes(frame.audio)
ww.close()
content.seek(0)
wav_frame = AudioRawFrame(
content.read(),
sample_rate=frame.sample_rate,
num_channels=frame.num_channels)
frame = wav_frame
payload = self._params.serializer.serialize(frame)
if payload:
await self._websocket.send_text(payload)
self._audio_buffer = self._audio_buffer[self._params.audio_frame_size:]
class FastAPIWebsocketTransport(BaseTransport):
def __init__(
self,
websocket: WebSocket,
params: FastAPIWebsocketParams = FastAPIWebsocketParams(),
input_name: str | None = None,
output_name: str | None = None,
loop: asyncio.AbstractEventLoop | None = None):
super().__init__(input_name=input_name, output_name=output_name, loop=loop)
self._params = params
self._callbacks = FastAPIWebsocketCallbacks(
on_client_connected=self._on_client_connected,
on_client_disconnected=self._on_client_disconnected
)
self._input = FastAPIWebsocketInputTransport(
websocket, self._params, self._callbacks, name=self._input_name)
self._output = FastAPIWebsocketOutputTransport(
websocket, self._params, name=self._output_name)
# Register supported handlers. The user will only be able to register
# these handlers.
self._register_event_handler("on_client_connected")
self._register_event_handler("on_client_disconnected")
def input(self) -> FrameProcessor:
return self._input
def output(self) -> FrameProcessor:
return self._output
async def _on_client_connected(self, websocket):
await self._call_event_handler("on_client_connected", websocket)
async def _on_client_disconnected(self, websocket):
await self._call_event_handler("on_client_disconnected", websocket)

Some files were not shown because too many files have changed in this diff Show More