Compare commits

...

456 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
41695806e8 process SystemFrames in a queue as high priority frames 2025-05-07 17:05:35 -07:00
Mark Backman
7280e390d9 Merge pull request #1774 from pipecat-ai/mb/moondream-ex-server
Add load_dotenv to moondream example server
2025-05-07 19:02:30 -04:00
Mark Backman
4efc3f0a39 Merge pull request #1775 from pipecat-ai/mb/patient-ex-env
Add load_dotenv to patient-intake server file
2025-05-07 19:02:20 -04:00
Mark Backman
cb7e7a8aa3 Add load_dotenv to patient-intake server file 2025-05-07 18:40:04 -04:00
Mark Backman
9136402846 Add load_dotenv to moondream example server 2025-05-07 18:29:27 -04:00
Aleix Conchillo Flaqué
260fc76137 Merge pull request #1773 from pipecat-ai/aleix/pipecat-0.0.67
update CHANGELOG for 0.0.67
2025-05-07 15:05:55 -07:00
Aleix Conchillo Flaqué
7cfb9a4d15 update CHANGELOG for 0.0.67 2025-05-07 14:59:16 -07:00
Mark Backman
2089e0c974 Merge pull request #1768 from pipecat-ai/mb/update-observers
Add DebugLogObserver
2025-05-07 17:30:49 -04:00
Mark Backman
9e0b4fe5d1 Replace list with tuple 2025-05-07 17:21:09 -04:00
Mark Backman
75ce632f84 Add DebugLogObserver 2025-05-07 17:21:08 -04:00
Mark Backman
efeb96c4e8 Remove unused imports 2025-05-07 17:20:42 -04:00
kompfner
fb5438e9c2 Merge pull request #1770 from pipecat-ai/pk/amazon-nova-sonic-interruption-reliability
AWS Nova Sonic service - make interruption handling more reliable, in…
2025-05-07 17:16:06 -04:00
Mark Backman
7da9f66e1c Merge pull request #1761 from pipecat-ai/mb/elevenlabs-context-id
Update ElevenLabsTTSService to use the new websocket API
2025-05-07 17:12:06 -04:00
Mark Backman
9e16e3d614 Update ElevenLabsTTSService to use the new websocket API 2025-05-07 17:09:52 -04:00
Paul Kompfner
84d040c6d0 AWS Nova Sonic service - make interruption handling more reliable, in terms of:
- not getting the conversation into a "stuck" state
- not losing assistant text that should've made it into the context
2025-05-07 16:34:18 -04:00
Mark Backman
f3e0beb8f1 Merge pull request #1762 from pipecat-ai/iss-1734-rtvi-function-call-breakage
Revert breaking change in RTVI protocol for function calling
2025-05-07 15:25:22 -04:00
Aleix Conchillo Flaqué
e00a1196ef Merge pull request #1767 from pipecat-ai/aleix/daily-python-0.18.2
pyproject: update daily-python to 0.18.2
2025-05-07 12:19:59 -07:00
Aleix Conchillo Flaqué
3867c0f8e7 Merge pull request #1766 from pipecat-ai/aleix/daily-fix-multiple-audio-video-sources
fix multiple audio video sources
2025-05-07 12:19:46 -07:00
Aleix Conchillo Flaqué
cdf0953722 pyproject: update daily-python to 0.18.2 2025-05-07 11:56:36 -07:00
Aleix Conchillo Flaqué
ed00f7d071 add video_source field to UserImageRequestFrame 2025-05-07 11:50:21 -07:00
Aleix Conchillo Flaqué
a3038afa02 DailyTransport: fix multiple audio/video sources 2025-05-07 11:50:00 -07:00
kompfner
f9ca0b8cc6 Merge pull request #1704 from pipecat-ai/pk/amazon-nova-sonic
Amazon Nova Sonic LLM service
2025-05-07 14:45:28 -04:00
Paul Kompfner
2920aa5af4 [WIP] AWS Nova Sonic service - pull AWS Nova Sonic support out of the aws optional dependency in pyproject.toml and into its own aws-nova-sonic optional dependency. That's because it requires Python >= 3.12, a higher version than the base project's 3.10. This change allows anyone using any of the other AWS services (including our own unit tests) to continue using the lower Python version. 2025-05-07 14:32:32 -04:00
Paul Kompfner
93c9cc4a0e [WIP] AWS Nova Sonic service - minor fix 2025-05-07 13:54:06 -04:00
Paul Kompfner
b53f9235e4 [WIP] AWS Nova Sonic service - remove unnecessary _context_available state, instead just relying on the presence of _context 2025-05-07 13:54:06 -04:00
Paul Kompfner
1491462d15 [WIP] AWS Nova Sonic service - remove _handling_bot_stopped_speaking, which no longer seems to be necessary; I'm no longer observing back-to-back BotStoppedSpeaking frames 2025-05-07 13:54:06 -04:00
Paul Kompfner
c78f779800 [WIP] AWS Nova Sonic service - log an error message if you try to use AWS Nova Sonic without the proper dependency (e.g. without having done pip install pipecat-ai[aws]) 2025-05-07 13:54:06 -04:00
Paul Kompfner
b013e375fb [WIP] AWS Nova Sonic service - simplify a bit of logic (and do the same simplification in the OpenAI Realtime service) 2025-05-07 13:54:06 -04:00
Paul Kompfner
52036138c1 [WIP] AWS Nova Sonic service - remove unnecessary (no-op) code 2025-05-07 13:54:06 -04:00
Paul Kompfner
4ba9a42861 [WIP] AWS Nova Sonic service - add more accurate typing 2025-05-07 13:54:06 -04:00
Paul Kompfner
27bff7a759 [WIP] AWS Nova Sonic service - fix comment 2025-05-07 13:54:06 -04:00
Paul Kompfner
896f8d85f7 [WIP] AWS Nova Sonic service - remove out-of-date TODO comment 2025-05-07 13:54:06 -04:00
Paul Kompfner
ed06cdd2c7 [WIP] AWS Nova Sonic service - add CHANGELOG entry 2025-05-07 13:54:02 -04:00
Paul Kompfner
8473647269 [WIP] AWS Nova Sonic service - update persistent-context example to better avoid saving "transitional", as opposed to meaningful, context messages 2025-05-07 13:52:51 -04:00
Paul Kompfner
5579145a06 [WIP] AWS Nova Sonic service - post-rebase, update examples to play nicely with recent pipecat changes 2025-05-07 13:52:51 -04:00
Paul Kompfner
35848d10b3 [WIP] AWS Nova Sonic service - remove various TODO comments 2025-05-07 13:52:51 -04:00
Paul Kompfner
c7e223e85a [WIP] AWS Nova Sonic service - remove print statements in favor of logger 2025-05-07 13:52:51 -04:00
Paul Kompfner
885b2d1d2f [WIP] AWS Nova Sonic service - make parameters configurable 2025-05-07 13:52:51 -04:00
Paul Kompfner
73020be511 [WIP] AWS Nova Sonic service - minor fix: only try to read received JSON if we have it 2025-05-07 13:52:51 -04:00
Paul Kompfner
d388c057c0 [WIP] AWS Nova Sonic service - recover from unwanted disconnection due to an error 2025-05-07 13:52:51 -04:00
Paul Kompfner
c4d0f91a7f [WIP] AWS Nova Sonic service - remove some old code that was accidentally still there, possibly sending a duplicate system instruction 2025-05-07 13:52:51 -04:00
Paul Kompfner
467233be04 [WIP] AWS Nova Sonic service - support multi-line system prompt 2025-05-07 13:52:51 -04:00
Paul Kompfner
2b02d08f4c [WIP] AWS Nova Sonic service - add comments to examples pointing out the us-east-1 is the only supported region so far 2025-05-07 13:52:51 -04:00
Paul Kompfner
9fe265ea64 [WIP] AWS Nova Sonic service - implement ability to persist and load conversations 2025-05-07 13:52:51 -04:00
Paul Kompfner
cc1f4ba81c [WIP] AWS Nova Sonic service - add a hacky way of programmatically triggering an assistant response 2025-05-07 13:52:51 -04:00
Paul Kompfner
3784bdbd27 [WIP] AWS Nova Sonic service - in our hacky direct manipulation of the context, aggregate assistant text rather than recording every chunk as a separate message 2025-05-07 13:52:51 -04:00
Paul Kompfner
4ffdc3b77c [WIP] AWS Nova Sonic service - do hacky direct manipulation of the context for now, since I can't seem to get assistant context aggregation working properly with frames, grr 2025-05-07 13:52:51 -04:00
Paul Kompfner
38c9fa681a [WIP] AWS Nova Sonic service - Protect against back-to-back BotStoppedSpeaking calls, which I've observed 2025-05-07 13:52:51 -04:00
Paul Kompfner
c477039954 [WIP] AWS Nova Sonic service - just for safety, add a short delay after BotStoppedSpeaking before sending LLMFullResponseEndFrame + TTSStoppedFrame, to give a bit of leeway for the LLM to deliver the "FINAL" text block describing what was said 2025-05-07 13:52:51 -04:00
Paul Kompfner
d6ef3d64ac [WIP] AWS Nova Sonic service - fix context problems of double-counting LLM text, and mis-categorizing user text as LLM text 2025-05-07 13:52:51 -04:00
Paul Kompfner
6938152db6 [WIP] AWS Nova Sonic service - fix comment 2025-05-07 13:52:51 -04:00
Paul Kompfner
2154db07f0 [WIP] AWS Nova Sonic service - remove unnecessary error log 2025-05-07 13:52:51 -04:00
Paul Kompfner
5e0803479e [WIP] AWS Nova Sonic service - add send_transcription_frames option 2025-05-07 13:52:51 -04:00
Paul Kompfner
3960c604a4 [WIP] AWS Nova Sonic service - fix empty assistant conversation history item in the context after tool use 2025-05-07 13:52:51 -04:00
Paul Kompfner
394648f1c9 [WIP] AWS Nova Sonic service - fix user utterances not making it into the context 2025-05-07 13:52:51 -04:00
Paul Kompfner
da5c4953d5 [WIP] AWS Nova Sonic service - allow passing in tools into initializer 2025-05-07 13:52:51 -04:00
Paul Kompfner
2b7e1cb5b1 [WIP] AWS Nova Sonic service - add tool calling 2025-05-07 13:52:51 -04:00
Paul Kompfner
f182eafb40 [WIP] AWS Nova Sonic service - add ability to pass in OpenAILLMContext 2025-05-07 13:52:51 -04:00
Paul Kompfner
9f7f42e885 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
9b8bce1914 [WIP] AWS Nova Sonic service - add voice_id 2025-05-07 13:52:51 -04:00
Paul Kompfner
96d05e12fc [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
68c1069548 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
5b64613f65 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
1f9baefba8 [WIP] AWS Nova Sonic service - added stubs for handling interruption and user-started-speaking frames 2025-05-07 13:52:51 -04:00
Paul Kompfner
0c255d2618 [WIP] AWS Nova Sonic service - added TTSTextFrame and reworked/cleaned up some bookkeeping logic 2025-05-07 13:52:51 -04:00
Paul Kompfner
a38206de9c [WIP] AWS Nova Sonic service - added TranscriptionFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
260f7c9b85 [WIP] AWS Nova Sonic service - format 2025-05-07 13:52:51 -04:00
Paul Kompfner
de294caed9 [WIP] AWS Nova Sonic service - added LLMFullResponseStartFrame, LLMTextFrame, and LLMFullResponseEndFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
e40aa4f99a [WIP] AWS Nova Sonic service - added TTSStartedFrame and TTSStoppedFrame 2025-05-07 13:52:51 -04:00
Paul Kompfner
b1d413b9be [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
8cbad070ad [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
13569a5a5a [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
d789334a60 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
7668b27fc0 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
6d30f441e8 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
a9e395b366 [WIP] AWS Nova Sonic service 2025-05-07 13:52:51 -04:00
Paul Kompfner
5e5626f04f [WIP] AWS Nova Sonic service 2025-05-07 13:52:47 -04:00
Aleix Conchillo Flaqué
d80aa5b44e Merge pull request #1753 from pipecat-ai/aleix/add-bedrock-support
Add support for Amazon Bedrock LLMs
2025-05-07 09:31:48 -07:00
Aleix Conchillo Flaqué
80ef6dc4de update README with AWS Bedrock and Transcribe 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
458549f7df AWSBedrockLLMService: fix function calling 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
a8405649d0 aws: use AWS prefix for all services 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
ce1a72850b tests: add bedrock context aggregator tests 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
58de381746 AWS: add missing utils 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
bed2e894a2 BedrockLLMService: pull initial system frame from messages 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
b4de98cfb7 AWS: various cleanups (logs, imports...) 2025-05-07 09:26:26 -07:00
Aleix Conchillo Flaqué
a4b9db9e07 fix formatting 2025-05-07 09:26:26 -07:00
Adithya Suresh
664111a3c9 Added cache related info to metrics 2025-05-07 09:26:26 -07:00
Adithya Suresh
aa964847f3 System param to be a list 2025-05-07 09:26:26 -07:00
Adithya Suresh
fa5cac7e0a Bug fix in content format 2025-05-07 09:26:26 -07:00
Adithya Suresh
b2b01861b2 Remove model restriction 2025-05-07 09:26:26 -07:00
Adithya Suresh
f014f718eb Restructured STT and enabled prosody tags for generative Polly 2025-05-07 09:26:26 -07:00
Adithya Suresh
05ae8d3ffa Removed OpenAI based context formatting 2025-05-07 09:26:26 -07:00
Adithya Suresh
88c9e08bd8 Updated tools parsing logic 2025-05-07 09:26:26 -07:00
Adithya Suresh
844f61dfea Initial implementation 2025-05-07 09:26:26 -07:00
Tico Ballagas
acb7d597cb Change example to use generative voices 2025-05-07 09:19:49 -07:00
Tico Ballagas
2b18f60261 Initial implementation of AWS Transcribe TTS 2025-05-07 09:19:49 -07:00
mattie ruth backman
5b66133a6c Revert breaking change in RTVI protocol for function calling 2025-05-07 12:08:28 -04:00
Mark Backman
0c5bc6a57a Merge pull request #1760 from WebinarGeek/wg/daily-active-speaker-event
DailyTransport: added on_active_speaker_changed event handler
2025-05-07 11:17:49 -04:00
Mark Backman
7981e00955 Merge pull request #1759 from pipecat-ai/mb/readme-nvidia-riva
Update README with Riva services
2025-05-07 11:13:51 -04:00
Dan Berg
5e39c0cfeb DailyTransport: added on_active_speaker_changed event handler 2025-05-07 15:22:30 +02:00
Mark Backman
a444701929 Update README with Riva services 2025-05-07 09:02:08 -04:00
Mark Backman
f6c1eb5d9d Merge pull request #1757 from pipecat-ai/mb/remove-canonical
Removing CanonicalMetricsService
2025-05-06 21:38:03 -04:00
Mark Backman
a1d46cb26b Removing CanonicalMetricsService 2025-05-06 21:23:23 -04:00
Aleix Conchillo Flaqué
99ab148d88 Merge pull request #1739 from pipecat-ai/aleix/observers-frame-pushed-class
BaseObserver: add FramePushed class and deprecate multiple arguments
2025-05-06 15:29:05 -07:00
Aleix Conchillo Flaqué
d69fa5dba5 update CHANGELOG with UltravoxSTTService fix 2025-05-06 15:26:25 -07:00
Aleix Conchillo Flaqué
0d30b000af BaseObserver: add FramePushed class and deprecated multiple arguments 2025-05-06 15:26:23 -07:00
Mark Backman
e7c0e742d2 Merge pull request #1752 from pipecat-ai/mb/deepgram-tts-aura-2
Update Deepgram TTS default voice to Aura 2 voice
2025-05-06 16:26:26 -04:00
Mark Backman
2aff2dcca3 Merge pull request #1751 from pipecat-ai/mb/11labs-enable_ssml_parsing
Add enable_ssml_parsing and enable_logging to ElevenLabsTTSService
2025-05-06 16:25:20 -04:00
Mark Backman
288f8865c8 Add enable_logging to ElevenLabsTTSService 2025-05-06 12:13:26 -04:00
Mark Backman
8691870bcb Update Deepgram TTS default voice to Aura 2 voice 2025-05-06 11:29:32 -04:00
Mark Backman
e06146c237 Add enable_ssml_parsing to ElevenLabsTTSService 2025-05-06 11:06:57 -04:00
Aleix Conchillo Flaqué
c68e990cda Merge pull request #1748 from pipecat-ai/aleix/task-manager-dictionary
task manager dictionary and cleanup PipelineTask
2025-05-06 07:57:53 -07:00
Aleix Conchillo Flaqué
4583905313 PipelineTask: cleanup if task is cancelled from outside Pipecat 2025-05-05 21:33:21 -07:00
Aleix Conchillo Flaqué
9cc498b1fa TaskManager: use a dictionary instead of a set to store tasks 2025-05-05 21:27:49 -07:00
Mark Backman
b3c5dc4045 Merge pull request #1443 from adithyaxx/anthropic-client-bug-fixes
Handle missing token counts in AsyncAnthropicBedrock client properly
2025-05-05 21:13:11 -04:00
Aleix Conchillo Flaqué
3824da7261 Merge pull request #1745 from pipecat-ai/aleix/make-sure-transports-are-ready
only send data to transports after they are really ready
2025-05-05 14:21:36 -07:00
Aleix Conchillo Flaqué
855d567b1e only send data to transports after they are really ready 2025-05-05 14:06:58 -07:00
Mark Backman
b323a7bd88 Merge pull request #1742 from pipecat-ai/mb/pcc-krisp-filter
Update pipecat-cloud-example to use Krisp in PCC deployment only
2025-05-05 15:46:12 -04:00
Mark Backman
fa011d0018 Update pipecat-cloud-example to use Krisp in PCC deployment only 2025-05-05 15:09:29 -04:00
Aleix Conchillo Flaqué
e15fa8777a Merge pull request #1737 from CerebriumAI/kyle/fix-ultravox-spacing
[Fix] Ultravox frame spacing issue
2025-05-05 09:34:49 -07:00
Aleix Conchillo Flaqué
2143a6d927 Merge pull request #1732 from pipecat-ai/aleix/daily-remote-custom-tracks
DailyTransport: remove custom tracks before leaving
2025-05-05 08:44:11 -07:00
Aleix Conchillo Flaqué
044e2d3e73 DailyTransport: remove custom tracks before leaving 2025-05-05 08:35:35 -07:00
Kyle Gani
be112ec63f Merge branch 'kyle/fix-ultravox-performance' of github.com:CerebriumAI/pipecat into kyle/fix-ultravox-performance 2025-05-05 17:13:26 +02:00
Kyle Gani
d2f56c4e8f Fix: Spacing issue 2025-05-05 17:13:21 +02:00
Mark Backman
ddc6a9c695 Merge pull request #1670 from pipecat-ai/mb/daily-twilio-sip-example
Add standalone Daily + Twilio SIP example
2025-05-05 10:57:16 -04:00
Mark Backman
2bebdbc371 Merge pull request #1671 from pipecat-ai/khk/rime-arcana
support for rime arcana model
2025-05-05 10:54:50 -04:00
Mark Backman
8b9f1f0608 Add a changelog entry 2025-05-05 10:51:46 -04:00
Kwindla Hultman Kramer
b25f3b2ed2 support for rime arcana model 2025-05-05 10:50:46 -04:00
Mark Backman
a995cf81b6 Merge pull request #1724 from pipecat-ai/mb/demo-fixes
Demo fixes
2025-05-05 08:44:57 -04:00
Aleix Conchillo Flaqué
75d261639f Merge pull request #1726 from pipecat-ai/aleix/pipecat-0.0.66
update CHANGELOG for pipecat 0.0.66
2025-05-02 20:54:57 -07:00
Aleix Conchillo Flaqué
f720d795d0 update CHANGELOG for pipecat 0.0.66 2025-05-02 20:29:51 -07:00
Aleix Conchillo Flaqué
f6fe83e358 Merge pull request #1725 from pipecat-ai/aleix/update-daily-python-0.18.1
update to daily-python 0.18.1
2025-05-02 20:27:50 -07:00
Mark Backman
0513d0b6a8 Update README 2025-05-02 22:44:50 -04:00
Mark Backman
0679bb217d Remove Twilio from phone-chatbot directory 2025-05-02 22:18:50 -04:00
Mark Backman
38bd55e518 Update README 2025-05-02 22:18:50 -04:00
Mark Backman
65c7423280 Add other dial-in event handlers 2025-05-02 22:18:50 -04:00
Mark Backman
f24a85cc94 Add logic to only forward the first on_dialin_ready event 2025-05-02 22:18:50 -04:00
Mark Backman
53887b7c98 Display phone number in WebRTC call 2025-05-02 22:18:50 -04:00
Mark Backman
523c012c38 Use a Twilio asset to ring the phone throughout 2025-05-02 22:18:50 -04:00
Mark Backman
97c28989c1 Add standalone Daily + Twilio SIP example 2025-05-02 22:18:50 -04:00
Mark Backman
c19be6ebb2 Demo fixes 2025-05-02 20:58:10 -04:00
Aleix Conchillo Flaqué
54971a0735 update to daily-python 0.18.1 2025-05-02 17:47:44 -07:00
Mark Backman
4513e81e13 Merge pull request #1723 from pipecat-ai/mb/base-output-bot-speaking-log
Only display the destination in the bot started/stopped speaking log …
2025-05-02 17:32:47 -04:00
Mark Backman
872204b795 Only display the destination in the bot started/stopped speaking log when there is a desintation 2025-05-02 17:29:28 -04:00
Aleix Conchillo Flaqué
a94cbfe6f5 Merge pull request #1722 from pipecat-ai/aleix/base-output-transport-audio-task-fix
BaseOutputTransport: always initialize audio task
2025-05-02 14:26:30 -07:00
Aleix Conchillo Flaqué
7152faafb2 BaseOutputTransport: always initialize audio task
We also use the audio task to also send synchronized images with audio.
2025-05-02 14:23:15 -07:00
Mark Backman
e6aadaccd8 Merge pull request #1721 from pipecat-ai/mb/simli-silent-frames
Fix: SimliVideoService was continuously emitting audio, preventing Bo…
2025-05-02 16:44:39 -04:00
Mark Backman
3a73aa71b8 Merge pull request #1613 from pipecat-ai/mb/improve-storybot-readme
demo: Restructure storytelling-chatbot directory, update README steps…
2025-05-02 16:39:59 -04:00
Mark Backman
814e7509e1 demo: Restructure storytelling-chatbot directory, update README steps, link to vercel demo 2025-05-02 16:37:37 -04:00
Vanessa Pyne
e0cf5ec016 Merge pull request #1705 from pipecat-ai/vp-update-nvidia-models
Riva Service: add magpie-tts-multilingual model
2025-05-02 15:34:23 -05:00
vipyne
667bd32e6a Riva: remove deprecated lines in example 2025-05-02 15:33:10 -05:00
vipyne
b2ecd83706 update CHANGELOG with Riva details 2025-05-02 15:33:10 -05:00
vipyne
b2754117c8 Riva: refactor function_id and model_name 2025-05-02 15:33:10 -05:00
vipyne
6c428c303b update magpie voice 2025-05-02 15:33:10 -05:00
Mark Backman
e7d889a143 Update RivaSTTService to use by default 2025-05-02 15:33:10 -05:00
Mark Backman
da60e7069b Update pyproject.toml to use nvidia-riva-client 2.19.1 2025-05-02 15:33:10 -05:00
Mark Backman
c14406a3b9 Demos use the latest services 2025-05-02 15:33:10 -05:00
Mark Backman
725ab5ec21 Small fixes: No default api_key of None, ParakeetSTTService uses RivaSTTService.InputParams 2025-05-02 15:33:10 -05:00
Mark Backman
daf9d47e58 Update RivaSegmentedSTTService 2025-05-02 15:33:10 -05:00
vipyne
63a65627a2 Riva Service: add magpie-tts-multilingual model 2025-05-02 15:33:10 -05:00
Mark Backman
02c07755b0 Add Changelog entry for PR 1707 2025-05-02 15:33:10 -05:00
Matt Kim
15cbd18acc [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-05-02 15:33:10 -05:00
Kwindla Hultman Kramer
93c40b87dc small groq updates 2025-05-02 15:33:10 -05:00
Mark Backman
eeaa9f67a1 Fix: SimliVideoService was continuously emitting audio, preventing BotStoppedSpeakingFrame from being sent 2025-05-02 16:32:42 -04:00
Mark Backman
b60691c7b2 Merge pull request #1720 from pipecat-ai/mb/changelog-pr-1707
Add Changelog entry for PR 1707
2025-05-02 16:13:40 -04:00
Mark Backman
2bb1b0b343 Add Changelog entry for PR 1707 2025-05-02 16:09:50 -04:00
Mark Backman
047ef9f86c Merge pull request #1707 from rimelabs/matt/rime/url_param_serialization
[Rime] Add new params to RimeTTSService
2025-05-02 16:08:01 -04:00
Kwindla Hultman Kramer
9a2c603c91 Merge pull request #1711 from pipecat-ai/khk/groq-updates 2025-05-02 12:21:15 -07:00
Filipi da Silva Fuchter
94c4169407 Merge pull request #1717 from pipecat-ai/local_smart_turn_torch
Local smart turn torch
2025-05-02 15:53:30 -03:00
Filipi Fuchter
cb8a551db8 Mentioning the new LocalSmartTurnAnalyzer in the changelog. 2025-05-02 14:32:18 -03:00
Filipi Fuchter
779f09af70 Fixing lint. 2025-05-02 14:22:38 -03:00
Filipi Fuchter
19dc0f2bfb New example using the local smart turn 2025-05-02 14:21:42 -03:00
Filipi Fuchter
f0709e22ba Creating a local smart turn using torch. 2025-05-02 14:21:29 -03:00
Mark Backman
8250736f5e Merge pull request #1708 from pipecat-ai/mb/gemini-user-context
Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remo…
2025-05-02 13:10:27 -04:00
Mark Backman
83348a9f93 Merge pull request #1714 from pipecat-ai/mb/fix-gemini-text-modality
Restore TEXT modalities support to GeminiMultimodalLiveLLMService
2025-05-02 10:41:05 -04:00
Mark Backman
96d40903a9 Only send TTSStoppedFrame from Gemini when in AUDIO mode, only send one LLMFullResponseEndFrame 2025-05-02 10:18:53 -04:00
Aleix Conchillo Flaqué
2560811805 Merge pull request #1697 from pipecat-ai/aleix/daily-custom-audio-tracks
add support for multiple transport destinations
2025-05-02 06:34:09 -07:00
Mark Backman
2b8c44c008 Merge pull request #1710 from pipecat-ai/mb/openai-context-aggregation
fix: OpenAIRealtimeBetaLLMService writes two assistant messages to th…
2025-05-02 07:43:35 -04:00
Mark Backman
38e2d37674 Restore TEXT modalities support to GeminiMultimodalLiveLLMService 2025-05-02 07:36:12 -04:00
Vanessa Pyne
6278561f88 Merge pull request #1709 from pipecat-ai/vp-fix-fastpitch-params-update
Riva TTS: update FastPitch params
2025-05-01 21:23:10 -05:00
Aleix Conchillo Flaqué
750e79c1ce DailyParams: rename to camera/microphone_out_enabled 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
71eb2963c5 examples: added daily-custom-tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
f44e2c86ea BaseOutputTransport: compute sample_rate and audio_chunk_size in main class 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
afe1f0df8c DailyTransport: make sure we can write audio frames to destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
458fddfb48 update CHANGELOG with new Daily and Transport features 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
8d915c5ccb DailyParams: allow enabling/disabling camera/microphone tracks 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
304153dd03 TTSService: set transport destination to all TTS frames 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
a6781b7352 rename destination to transport_destination 2025-05-01 19:17:14 -07:00
Aleix Conchillo Flaqué
5ad0058303 update CHANGELOG with frame source/destination support 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
75c039de33 examples: add daily-multi-translation 2025-05-01 19:11:13 -07:00
Aleix Conchillo Flaqué
74e3c3677e DailyTransport: fix audio/video renderers registration 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
dc20327f10 DailyTransport: register audio destination and use custom tracks 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
e738affd29 BaseOutputTransport: allow sending audio/video to multiple destinations 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
ef3d732607 DailyTransport: allow capturing multiple simultaneous audio/video sources 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
6d63cff1bf DailyTransport: custom audio tracks support 2025-05-01 18:58:44 -07:00
Aleix Conchillo Flaqué
12f42605a1 pyproject: update daily-python to 0.18.0 2025-05-01 18:58:44 -07:00
Kwindla Hultman Kramer
fac3337927 small groq updates 2025-05-01 17:09:15 -07:00
Mark Backman
76d198151c Push GeminiMultimodalLiveLLMService TranscriptionFrame Upstream, remove direct context addition 2025-05-01 15:41:04 -04:00
Mark Backman
6a907058de fix: OpenAIRealtimeBetaLLMService writes two assistant messages to the context 2025-05-01 15:37:39 -04:00
vipyne
6e1f531f64 Riva TTS: update FastPitch params
91138c3f66 (diff-ece228577b1d233ce600a948243f90cece53e3a9b89554a0b27a48bc4d6e0fdfR45)
2025-05-01 11:14:41 -05:00
Matt Kim
4232cca5b6 [Rime] Add phonemizeBetweenBrackets and pauseBetweenBrackets to RimeTTSService (ws)
There is a fix incoming in
2025-04-30 18:09:22 -07:00
Mark Backman
a6a4d3d71f Merge pull request #1706 from rimelabs/matt/rime/update_url
[Rime] - Update url for Websockets API
2025-04-30 19:14:04 -04:00
Mark Backman
c52de0f5de Merge pull request #1696 from pipecat-ai/mb/fix-gemini-live-context
Fix: GeminiMultimodalLiveLLMService was appending tokens to the context
2025-04-30 19:12:06 -04:00
Mark Backman
a1e1255f16 Strip newlines from generated user transcript 2025-04-30 18:27:46 -04:00
Mark Backman
c4f758725e Ignore TranscriptionFrames too 2025-04-30 18:22:43 -04:00
Aleix Conchillo Flaqué
7bc9a78ce6 udpate CHANGELOG with RTVIObserverParams 2025-04-30 15:13:14 -07:00
Aleix Conchillo Flaqué
f8be71b32c Merge pull request #1688 from pipecat-ai/aleix/add-rtvi-observer-params
RTVIObserver: add RTVIObserverParams to configure what to send
2025-04-30 15:11:18 -07:00
Aleix Conchillo Flaqué
957fa5546d RTVIObserver: add RTVIObserverParams to configure what to send 2025-04-30 15:09:02 -07:00
Aleix Conchillo Flaqué
039cb8fcae Merge pull request #1690 from pipecat-ai/aleix/rtvi-function-call-single-param
RTVIProcessor: use single FunctionCallParams
2025-04-30 15:04:05 -07:00
Mark Backman
8e05f2f1a1 Merge pull request #1702 from pipecat-ai/mb/stt-mute-transcription-frames
Add InterimTranscriptionFrame and TranscriptionFrame to STTMuteFilter…
2025-04-30 17:54:24 -04:00
Matt Kim
8467aa1ed3 [Rime] - Update url for Websockets API
Rime has migrated their Websockets api to the base url `user.rime.ai` along with all other tts endpoints. 

See the [docs](https://docs.rime.ai/api-reference/endpoint/websockets)

`users-ws.rime.ai` is deprecated and will not reflect upgrades to the rime ws api.
2025-04-30 14:20:13 -07:00
Mark Backman
9c5878af3d OpenAI Realtime and Gemini Live push LLMTextFrame again, overwrite the assitant context aggregator for LLMTextFrame 2025-04-30 17:18:20 -04:00
Mark Backman
ef29800fe9 Update the changelog 2025-04-30 16:28:17 -04:00
Mark Backman
7e09933070 OpenAI Realtime should push TTSTextFrame only 2025-04-30 16:28:17 -04:00
Mark Backman
82a9d7f992 Gemini Mulitmodal Live to push TTSTextFrame only 2025-04-30 16:28:17 -04:00
Mark Backman
facbebb15f Transcribe user audio in 26b 2025-04-30 16:28:16 -04:00
Mark Backman
2ba60fc41f Update TranscriptProcessor to handle GeminiMultimodalLiveLLMService changes 2025-04-30 16:28:16 -04:00
Mark Backman
685f951ae2 Fix: GeminiMultimodalLiveLLMService was appending tokens to the context 2025-04-30 16:28:16 -04:00
Mark Backman
27d4c927a8 Merge pull request #1701 from pipecat-ai/mb/gemini-extend-session
Add context_window_compression support to GeminiMultimodalLiveLLMService
2025-04-30 14:35:50 -04:00
Mark Backman
20a59e8c56 Add InterimTranscriptionFrame and TranscriptionFrame to STTMuteFilter frame processing 2025-04-30 10:50:56 -04:00
Mark Backman
d9a0a93667 Add context_window_compression support to GeminiMultimodalLiveLLMService 2025-04-30 09:55:34 -04:00
Mark Backman
154d5d1859 Merge pull request #1699 from pipecat-ai/mb/more-docs-mocks
Additional import mocks to fix docs failure
2025-04-30 08:36:57 -04:00
Mark Backman
a192217256 Additional import mocks to fix docs failure 2025-04-29 21:45:27 -04:00
Mark Backman
10cdc47e05 Merge pull request #1689 from pipecat-ai/mb/handle-http-smart-turn-errors
Handle case where Fal Smart Turn returns a 500 error
2025-04-29 14:21:45 -04:00
Mark Backman
2b4d41a548 Merge pull request #1693 from flixoflax/bump-modal-example-dependencies
Bump modal deployment example dependencies
2025-04-29 14:21:31 -04:00
Filipi da Silva Fuchter
962f8062a5 Merge pull request #1581 from pipecat-ai/voice_agent_ice_servers
Configuring the voice-agent example to wait for all the ice candidates.
2025-04-29 13:30:12 -03:00
Filipi Fuchter
d80d385b2f Adding a section explaining about ice servers. 2025-04-29 12:19:59 -03:00
Filipi Fuchter
b347ca472f Checking the state again to avoid any eventual race condition 2025-04-29 12:04:20 -03:00
Filipi Fuchter
c3c4952abf Reducing the timeout to 2 seconds for gathering the ice candidates. 2025-04-29 11:34:58 -03:00
Filipi Fuchter
f369ab4c1a Printing each new ice candidate. 2025-04-29 11:23:33 -03:00
flixoflax
62b41c6789 removed aiohttp from deps, and version constraint from pipecat-ai 2025-04-29 16:14:51 +02:00
Filipi da Silva Fuchter
2872bc7902 Merge pull request #1587 from pipecat-ai/improving_ice_servers
Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
2025-04-29 11:07:32 -03:00
Filipi Fuchter
9658b75a10 Configuring the voice-agent example to use ice-servers and wait for all the ice candidates. 2025-04-29 10:52:07 -03:00
flixoflax
63de9039e6 Bumb modal deployment example deps 2025-04-29 15:50:56 +02:00
Filipi Fuchter
9352396d7e Mentioning the new feature in the changelog. 2025-04-29 10:40:09 -03:00
Filipi Fuchter
d1ab1d38b7 Fixing the examples to use the new IceServer structure. 2025-04-29 10:33:19 -03:00
Filipi Fuchter
080f70d91c Allowing to define the username and credential for the ice servers. 2025-04-29 10:32:42 -03:00
Mark Backman
ebed1fc6ea Restructure _send_raw_request to raise errors early, then process the successful response 2025-04-29 09:05:14 -04:00
Aleix Conchillo Flaqué
6821b1cdab RTVIProcessor: use single FunctionCallParams 2025-04-29 05:56:23 -07:00
Mark Backman
144ae9b611 Handle case where Fal Smart Turn returns a 500 error 2025-04-29 08:53:02 -04:00
Aleix Conchillo Flaqué
a2e7331ce2 Merge pull request #1680 from pipecat-ai/aleix/local-input-select-stt-update
examples: update local-input-select-stt
2025-04-28 12:01:12 -07:00
Aleix Conchillo Flaqué
8accd3e387 Merge pull request #1681 from pipecat-ai/aleix/tts-service-llm-full-response-end-fix
TTSService: do not push LLMFullResponseEndFrame if not needed
2025-04-28 12:00:55 -07:00
Mark Backman
3d05a74dc0 Merge pull request #1678 from Dev-Khant/integrate-mem0-oss
Integrate with Mem0 OSS
2025-04-28 14:15:40 -04:00
Aleix Conchillo Flaqué
e3c965f4d5 TTSService: do not push LLMFullResponseEndFrame if not needed 2025-04-28 11:03:22 -07:00
Dev-Khant
5354e5d891 formatting 2025-04-28 22:57:18 +05:30
Aleix Conchillo Flaqué
5784e91cff update .gitignore 2025-04-28 09:34:45 -07:00
Aleix Conchillo Flaqué
bc5f098aaa examples: update local-input-select-stt 2025-04-28 09:28:26 -07:00
Aleix Conchillo Flaqué
93534b4692 Merge pull request #1679 from pipecat-ai/aleix/update-package-lock-apr-28
examples: update package-lock.json
2025-04-28 09:27:01 -07:00
Aleix Conchillo Flaqué
b23d54c609 examples: update package-lock.json 2025-04-28 09:15:14 -07:00
Dev-Khant
aa23a7b1e6 update changelog 2025-04-28 20:03:20 +05:30
Dev-Khant
c0c41789ab Integrate with Mem0 OSS 2025-04-28 17:15:53 +05:30
Aleix Conchillo Flaqué
029ef4f8c2 Merge pull request #1667 from pipecat-ai/aleix/function-call-single-parameter
function call single parameter
2025-04-25 13:53:10 -07:00
Aleix Conchillo Flaqué
9cad6dfce9 RTVIProcessor: simplify handle_function_call() and depreacted handle_function_call_start() 2025-04-25 13:34:05 -07:00
Aleix Conchillo Flaqué
4df6444832 examples: update with single FunctionCallParams parameter 2025-04-25 13:34:05 -07:00
Aleix Conchillo Flaqué
944bc23135 LLMService: use a single FunctionCallParams parameter for function calls 2025-04-25 13:34:03 -07:00
Aleix Conchillo Flaqué
1d863ee7de Merge pull request #1669 from pipecat-ai/aleix/short-utterances-fixes
short utterances fixes
2025-04-25 13:25:15 -07:00
Vanessa Pyne
9ca775d1ab Merge pull request #1668 from pipecat-ai/vp-update-transport-params-in-ex
Update examples with new transport param names
2025-04-25 15:24:28 -05:00
Aleix Conchillo Flaqué
03002ad685 LLMUserContextAggregator: reduce aggregation_timeout to 0.5 2025-04-25 13:21:00 -07:00
Aleix Conchillo Flaqué
99a4154cbc LLMUserContextAggregator: ignore short uterrances while bot speaking 2025-04-25 13:21:00 -07:00
vipyne
0f68cc182d Update examples with new transport param names 2025-04-25 15:18:56 -05:00
Aleix Conchillo Flaqué
3ac50b9902 Merge pull request #1664 from CerebriumAI/kyle/fix-ultravox-performance
Improved: Ultravox performance
2025-04-25 12:59:25 -07:00
Michael Louis
9557705b53 Changed warmup to internal function 2025-04-25 14:40:19 -04:00
Mark Backman
a7718926e9 Merge pull request #1666 from pipecat-ai/mb/add-vad-start-stop-events
Add VADUserStartedSpeakingFrame and VADUserStoppedSpeakingFrame
2025-04-25 10:49:17 -04:00
Mark Backman
dfa10af6ed Simplify VAD events to be detected and emitted from BaseInputTransport 2025-04-25 10:28:44 -04:00
Mark Backman
8485ea6c5e Merge pull request #1650 from pipecat-ai/mb/fal-smart-turn-readme
Add hosted demo link to Fal smart turn README
2025-04-25 09:39:54 -04:00
Mark Backman
b298376766 Add VADUserStartedSpeakingFrame and VADUserStoppedSpeakingFrame 2025-04-25 09:06:54 -04:00
Mark Backman
7cfefe4f84 Merge pull request #1593 from WebinarGeek/wg/gladia-translations
Push gladia translations as a TranscriptionFrame
2025-04-25 08:35:36 -04:00
Mark Backman
71c7373987 Update CHANGELOG 2025-04-25 08:26:40 -04:00
Mark Backman
d1086914fe Add TranslationFrame and use in GladiaSTTService; add 13c-gladia-translation.py 2025-04-25 08:25:39 -04:00
Dan Berg
2fb85941d3 Add hint installing test requirements 2025-04-25 08:24:51 -04:00
Dan Berg
be8788e4da Push gladia translations as a TranscriptionFrame 2025-04-25 08:24:51 -04:00
Mark Backman
acb6abd761 Merge pull request #1644 from pipecat-ai/mb/add-transport-examples
Add transports examples to foundational examples
2025-04-25 08:22:26 -04:00
Mark Backman
a528aad957 Merge pull request #1642 from pipecat-ai/mb/foundational-requirements
Add deepgram and cartesia to foundational example requirements to mak…
2025-04-25 08:22:08 -04:00
Mark Backman
5c13252801 Merge pull request #1658 from pipecat-ai/mb/update-fal-smart-turn-version
Update daily-transport version in fal-smart-turn demo
2025-04-25 08:21:50 -04:00
Mark Backman
a4422ac6c2 Add transports examples to foundational examples 2025-04-25 08:20:04 -04:00
Mark Backman
985a031353 Merge pull request #1657 from pipecat-ai/mb/word-wrangler-example
Add Word Wrangler demos
2025-04-25 08:16:02 -04:00
Kyle Gani
5c4079b286 Cleaned up: layout 2025-04-25 13:18:03 +02:00
Kyle Gani
5489ac5a73 Merge branch 'main' into kyle/fix-ultravox-performance 2025-04-25 13:16:48 +02:00
Kyle Gani
49fbcc86ac Improved: Ultravox performance 2025-04-25 13:12:08 +02:00
Aleix Conchillo Flaqué
d20c3307b9 Merge pull request #1648 from pipecat-ai/aleix/always-push-audio
input transports now always push audio frames
2025-04-24 18:59:09 -07:00
Aleix Conchillo Flaqué
9fd76923fd STTService: passthrough audio frames by default 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
a753a623d4 examples: allow setting custom program arguments 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
4ee6c4b59e BaseOutputTransport: reword camera with video 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
e79a002e5a examples: update camera_* with video_* 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
420912dd4b transports: deprecate TransportParams.camera_* in favor of video_* 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
de7185e8db examples: remove vad_enabled=True 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
8bfcfe8b1d transports: deprecate TransportParams.vad_enabled 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
26d2ce5926 examples: remove vad_audio_passthrough=True 2025-04-24 17:14:18 -07:00
Aleix Conchillo Flaqué
9ee56bff9e transports: push audio by default. s/vad_audio_passthrough/audio_in_passthrough/ 2025-04-24 17:14:16 -07:00
Vanessa Pyne
2e0d77e4f0 Merge pull request #1627 from pipecat-ai/vp-mcp-take-3
MCP Service
2025-04-24 18:16:19 -05:00
vipyne
4cc8a4312c Revert "update 39* examples as per #1648"
This reverts commit b29ffeef29.
2025-04-24 18:11:35 -05:00
vipyne
cb7cb381aa Add MCPClient changelog line 2025-04-24 18:05:20 -05:00
vipyne
b29ffeef29 update 39* examples as per #1648 2025-04-24 17:19:50 -05:00
vipyne
b7b2a5b7a1 mcp service fix and add multiple mcp example 2025-04-24 17:13:40 -05:00
vipyne
3384598e07 mcp_service: pr notes 2025-04-24 17:13:40 -05:00
vipyne
c420dbe57f MCP Service
WIP getting mcp.run to work

add mcp[cli] to toml and lint

mcp stdio example

mcp sse example

mcp_run POC

ruff formatting
2025-04-24 17:13:40 -05:00
Mark Backman
fa8aafc7a5 Merge pull request #1660 from pipecat-ai/mb/foundational-examples-readme
Update foundational README with ToC
2025-04-24 18:12:27 -04:00
Mark Backman
4b364dda29 Update foundational README with ToC 2025-04-24 18:04:36 -04:00
Mark Backman
6bb765e40f Update daily-transport version in fal-smart-turn demo 2025-04-24 15:06:08 -04:00
Mark Backman
c80d09f66c Add Word Wrangler demos 2025-04-24 14:31:48 -04:00
Mark Backman
f8ff10c5d5 Add hosted demo link to Fal smart turn README 2025-04-23 21:27:53 -04:00
Mark Backman
09ff836ef6 Merge pull request #1640 from pipecat-ai/mb/fal-smart-turn-example
Fal Smart Turn example
2025-04-23 17:27:27 -04:00
Mark Backman
e446ecac14 Merge pull request #1649 from pipecat-ai/mb/fix-smart-turn-metrics
fix: SmartTurnMetricsData was reporting 0 for inference and processin…
2025-04-23 17:23:16 -04:00
Mark Backman
8c0c8a6153 Code review fixes 2025-04-23 17:22:19 -04:00
Mark Backman
70033ae00b fix: SmartTurnMetricsData was reporting 0 for inference and processing time 2025-04-23 17:17:46 -04:00
Mark Backman
ac9dce63ae README updates 2025-04-23 16:55:35 -04:00
Mark Backman
8b2df48fab Fix DailyTransport bot name 2025-04-23 16:43:54 -04:00
Mark Backman
156a5690fc Merge pull request #1647 from mattmatters/fix-typo
Fix wating -> waiting typo
2025-04-23 16:27:00 -04:00
Matt Lewis
d42c618398 Fix wating -> waiting typo 2025-04-23 15:55:14 -04:00
Mark Backman
b23ca5a4a8 Merge pull request #1641 from pipecat-ai/mb/11labs-input-params
ElevenLabs: InputParams can be set individually
2025-04-23 14:31:37 -04:00
Mark Backman
63a6697a90 ElevenLabs: InputParams can be set individually 2025-04-23 14:28:38 -04:00
Aleix Conchillo Flaqué
f1e45d0f02 Merge pull request #1646 from pipecat-ai/aleix/pipecat-0.0.65
update CHANGELOG for 0.0.65
2025-04-23 11:27:25 -07:00
Mark Backman
4ad227ca2d Merge pull request #1643 from pipecat-ai/mb/gladia-keepalive
Add a keepalive task to GladiaSTTService
2025-04-23 14:27:15 -04:00
Aleix Conchillo Flaqué
66cc18194b update CHANGELOG for 0.0.65 2025-04-23 11:25:32 -07:00
Mark Backman
7d65132c93 Add a keepalive task to GladiaSTTService 2025-04-23 14:21:44 -04:00
Mark Backman
db7d7a4204 Merge pull request #1645 from pipecat-ai/mb/telnyx-auto-hang-up
Add auto_hang_up to Telnyx serializer
2025-04-23 14:20:28 -04:00
Mark Backman
7bbac11084 Add docstrings to TelnyxFrameSerializer 2025-04-23 14:13:20 -04:00
Mark Backman
76c8322b57 Make call_sid optional in TwilioFrameSerializer 2025-04-23 14:08:50 -04:00
Mark Backman
7b1cd3523d Twilio: send only one hangup command 2025-04-23 13:41:36 -04:00
Mark Backman
6bd821ac9a Add auto_hang_up to Telnyx serializer 2025-04-23 13:29:54 -04:00
Mark Backman
a6d51c343e Add deepgram and cartesia to foundational example requirements to make quickstart smoother 2025-04-23 08:47:47 -04:00
Mark Backman
1a5cf7a521 Add local run and deployment steps to README 2025-04-22 21:37:35 -04:00
Mark Backman
69491417ec Fal Smart Turn example 2025-04-22 21:16:41 -04:00
Aleix Conchillo Flaqué
b91780ced2 Merge pull request #1638 from pipecat-ai/aleix/pipecat-0.0.64
update CHANGELOG for 0.0.64
2025-04-22 17:35:25 -07:00
Aleix Conchillo Flaqué
8ded666958 update CHANGELOG for 0.0.64 2025-04-22 17:32:06 -07:00
Filipi da Silva Fuchter
2490c804a5 Merge pull request #1631 from pipecat-ai/smart_turn_timeout
Returning the turn as complete if the request don’t return a result within SmartTurnParams stop_secs
2025-04-22 19:51:10 -03:00
Filipi Fuchter
dd8856a673 Merge branch 'main' into smart_turn_timeout
# Conflicts:
#	dot-env.template
2025-04-22 19:49:32 -03:00
Aleix Conchillo Flaqué
e7da08dab1 move smart turn files to audio.turn.smart_turn package 2025-04-22 15:29:31 -07:00
Aleix Conchillo Flaqué
ae60d42016 s/SmartTurnAnalyzer/HttpSmartTurnAnalyzer/ and add FalSmartTurnAnalyzer 2025-04-22 15:13:12 -07:00
Aleix Conchillo Flaqué
50e8d82ece SmartTurn: some linting cleanup 2025-04-22 14:39:02 -07:00
Mark Backman
cc9901a82f Replace httpx with aiohttp 2025-04-22 17:14:19 -04:00
Aleix Conchillo Flaqué
1fd43e8a3f Merge pull request #1636 from pipecat-ai/aleix/examples-logging
examples: always use loguru for logging
2025-04-22 13:06:40 -07:00
Aleix Conchillo Flaqué
fdc508a1a5 examples: always use loguru for logging 2025-04-22 11:51:49 -07:00
Mark Backman
37269db247 Merge pull request #1634 from pipecat-ai/mb/twilio-end-call
Automatically hangup Twilio calls
2025-04-22 14:05:10 -04:00
Mark Backman
51269aabbd Added cancel method to WebsocketServerOutputTransport 2025-04-22 13:58:39 -04:00
Mark Backman
74ecc19e09 Code review feedback 2025-04-22 13:54:12 -04:00
Mark Backman
c6d48c16df Add twilio to pyproject.toml, update demo to use twilio option 2025-04-22 13:01:56 -04:00
Mark Backman
873d84aa09 Twilio serializer to return None 2025-04-22 12:50:11 -04:00
Mark Backman
7360866c97 Add docstrings 2025-04-22 12:49:17 -04:00
Mark Backman
81f4768661 Automatically hangup Twilio calls 2025-04-22 12:45:34 -04:00
Vanessa Pyne
972d65f61b Merge pull request #1628 from pipecat-ai/vp-typo-fixes
typo fixes in phone-chatbot example
2025-04-22 10:05:56 -05:00
Mark Backman
1da9d398e3 Merge pull request #1619 from pipecat-ai/mb/grok-3-beta
GrokLLMService uses grok-3-beta as default model
2025-04-22 10:33:32 -04:00
Filipi Fuchter
7358bc6428 Returning the turn as complete if the request don’t return a result within SmartTurnParams stop_secs 2025-04-22 10:35:14 -03:00
vipyne
a6af499f84 typo fixes in phone-chatbot example 2025-04-21 23:49:13 -05:00
Aleix Conchillo Flaqué
f9d1a53e28 Merge pull request #1609 from pipecat-ai/aleix/pyproject-py-typed
pyproject: fix license fields
2025-04-21 16:14:22 -07:00
Mark Backman
3f3010af79 Add a SmartTurnMetricsData class, emitted by Metrics Frame in response to smart turn responses 2025-04-21 18:56:14 -04:00
Aleix Conchillo Flaqué
a02d47ddbd Merge pull request #1625 from 0xPatryk/patch-1
Fixed AttributeError: object has no attribute '_sample_rate"
2025-04-21 15:40:54 -07:00
Patryk
a649aff3e7 Fixed AttributeError: 'OpenAITTSService' object has no attribute '_sample_rate' 2025-04-21 11:03:45 +02:00
Mark Backman
a9b551d73e GrokLLMService uses grok-3-beta as default model 2025-04-19 08:05:59 -04:00
Mark Backman
747a821943 Merge pull request #1614 from pipecat-ai/mb/changelog-for-1525
Add CHANGELOG entry for PR 1525
2025-04-19 07:10:13 -04:00
Aleix Conchillo Flaqué
010db3ccd5 README: minor update 2025-04-18 20:57:05 -07:00
Aleix Conchillo Flaqué
db773b8b93 Merge pull request #1616 from pipecat-ai/aleix/new-readme
make README more fun
2025-04-18 18:15:35 -07:00
Mark Backman
16b7bf71b4 Additional README changes 2025-04-18 21:00:57 -04:00
Aleix Conchillo Flaqué
82d19508a4 make README more fun 2025-04-18 14:37:28 -07:00
Mark Backman
dc3646f0e7 Merge pull request #1615 from pipecat-ai/mb/issue-template
Add issue templates and move the pull request template to .github
2025-04-18 14:58:09 -04:00
Mark Backman
62e659cd3a Update to .yml templates so that types are used 2025-04-18 13:21:01 -04:00
Mark Backman
b2945f44fd Add issue templates and move the pull request template to .github 2025-04-18 12:17:46 -04:00
Mark Backman
618fbef81c Add CHANGELOG entry for PR 1525 2025-04-18 11:32:34 -04:00
Mark Backman
70c42dfa6e Merge pull request #1525 from shaiyon/google-default-creds
Enable usage of Application Default Credentials in Google services
2025-04-18 11:31:08 -04:00
Mark Backman
9ab374dd1f Merge pull request #1612 from pipecat-ai/mb/07g-stt-model
examples: Fix 07g by changing STT model
2025-04-18 08:04:20 -04:00
Mark Backman
cc6d284417 examples: Fix 07g by changing STT model 2025-04-18 07:13:34 -04:00
Filipi da Silva Fuchter
f77d8f0b6f Merge pull request #1611 from pipecat-ai/smart_turn_changelog
Mentioning the Smart Turn Detection into the changelog.
2025-04-17 23:02:57 -03:00
Varun Singh
9c0beb05cf Merge pull request #1597 from pipecat-ai/vr000m-opus-added
Changing default codec to OPUS for telephony
2025-04-17 18:42:12 -07:00
Aleix Conchillo Flaqué
858981c404 Merge pull request #1610 from pipecat-ai/aleix/add-base-turn-analyzer
audio: add BaseTurnAnalyzer class
2025-04-17 18:38:08 -07:00
Aleix Conchillo Flaqué
9eed225aa2 audio: add BaseTurnAnalyzer class 2025-04-17 18:37:52 -07:00
Filipi Fuchter
9f7371e485 Mentioning the Smart Turn Detection into the changelog. 2025-04-17 22:31:40 -03:00
Aleix Conchillo Flaqué
d77c37ff14 pyproject: add py.typed (PEP 561) 2025-04-17 17:29:04 -07:00
Aleix Conchillo Flaqué
b4916f9dae pyproject: fix license fields 2025-04-17 17:28:14 -07:00
Aleix Conchillo Flaqué
004a920920 Merge pull request #1563 from Bnowako/packaging-type-information
Add marker file for static type checkers
2025-04-17 17:26:15 -07:00
Filipi da Silva Fuchter
203c5a3a60 Merge pull request #1592 from pipecat-ai/smart_turn
Smart turn
2025-04-17 18:21:47 -03:00
Filipi Fuchter
7f6fb1754b Merge remote-tracking branch 'origin/smart_turn' into smart_turn 2025-04-17 17:53:53 -03:00
Filipi Fuchter
a390ce13a4 Removing the UserEndOfTurnFrame 2025-04-17 17:53:31 -03:00
Filipi da Silva Fuchter
61d31d1c40 Restoring stop_secs to default value.
Co-authored-by: Mark Backman <mark@daily.co>
2025-04-17 17:44:47 -03:00
Filipi da Silva Fuchter
e872ff943a Using the default model for OpenAi.
Co-authored-by: Mark Backman <mark@daily.co>
2025-04-17 17:43:39 -03:00
Filipi da Silva Fuchter
c71005e249 Using the default model for OpenAi.
Co-authored-by: Mark Backman <mark@daily.co>
2025-04-17 17:43:23 -03:00
Filipi Fuchter
6e06bf97c0 Preventing emitting the UserStartedSpeaking event multiple times. 2025-04-17 17:21:29 -03:00
Filipi Fuchter
a80dc94e91 Fixing ruff format. 2025-04-17 16:47:17 -03:00
Filipi Fuchter
3ea9cfd251 Keeping the _speech_triggered as true if the state is incomplete. 2025-04-17 16:46:15 -03:00
Filipi Fuchter
a80f82cdb6 Moving the environment variables to inside the demo. 2025-04-17 16:28:50 -03:00
Aleix Conchillo Flaqué
d24bab354f Merge pull request #1607 from pipecat-ai/aleix/fix-websocket-disconnects
services: fix TTS websocket services disconnections
2025-04-17 12:27:52 -07:00
Filipi Fuchter
53ee3fb64c Changing the log levels used in smart_turn 2025-04-17 16:14:13 -03:00
Filipi Fuchter
3599761e4e Changing the default behavior to only use the last vad segment, and increasing the default stop_secs to 3 2025-04-17 16:07:03 -03:00
Aleix Conchillo Flaqué
c0b3fe3985 services: only read from TTS websocket if websocket connection established 2025-04-17 11:54:07 -07:00
Aleix Conchillo Flaqué
497d48b6c8 services: fix TTS websocket services disconnections
Fixes #1467
2025-04-17 11:29:49 -07:00
Filipi Fuchter
e179916c9c Creating a new param use_only_last_vad_segment 2025-04-17 11:49:51 -03:00
Filipi Fuchter
b0b38beb19 Returning the max duration back to 8 seconds. 2025-04-17 11:39:48 -03:00
Filipi Fuchter
8577139d21 Fixing to keep the last max samples. 2025-04-17 11:39:06 -03:00
Filipi Fuchter
e2fbbb4b40 Renaming the smart turn classes. 2025-04-17 10:43:21 -03:00
Filipi Fuchter
88ce117e84 Changing the max duration default value to 16 seconds. 2025-04-17 10:35:13 -03:00
Filipi Fuchter
266537c3f4 Fixing to respect the stop_secs. 2025-04-17 10:07:08 -03:00
Filipi Fuchter
230d2f80fa Merge branch 'main' into smart_turn 2025-04-17 09:36:30 -03:00
Filipi Fuchter
3f0688aefa Testing smart turn using stop_secs as 5 seconds 2025-04-17 09:36:03 -03:00
Filipi da Silva Fuchter
5be3e6979e Merge pull request #1533 from pipecat-ai/daily_small_webrtc
Example interoping between SmallWebRTC and Daily
2025-04-17 09:19:23 -03:00
Mark Backman
9c19cff818 Merge pull request #1585 from ArmanJR/main
Troubleshooting SSL error
2025-04-16 22:46:45 -04:00
Mark Backman
95f3537bde Merge pull request #1598 from pipecat-ai/mb/11labs-http-timestamps
Added word/timestamp pairs to ElevenLabsHttpTTSService
2025-04-16 22:38:26 -04:00
Mark Backman
7ff748defd Merge pull request #1600 from pipecat-ai/mb/11labs-previous-text
Add previous_text context to ElevenLabsHttpTTSService
2025-04-16 22:33:38 -04:00
Mark Backman
2dafbee2aa Code review fixes 2025-04-16 22:29:33 -04:00
Mark Backman
1e0a9d7b06 Add previous_text context to ElevenLabsHttpTTSService 2025-04-16 22:22:08 -04:00
Mark Backman
4a23e138b1 Added word/timestamp pairs to ElevenLabsHttpTTSService 2025-04-16 22:20:51 -04:00
Mark Backman
384f80983f Added word/timestamp pairs to ElevenLabsHttpTTSService 2025-04-16 21:55:00 -04:00
Aleix Conchillo Flaqué
f6f01ea7e4 Merge pull request #1588 from pipecat-ai/aleix/llm-aggregator-params
LLM aggregator params
2025-04-16 15:25:21 -07:00
Aleix Conchillo Flaqué
f385cc0460 pyproject: add websockets as google dependency 2025-04-16 15:19:25 -07:00
Aleix Conchillo Flaqué
e97de43de2 add LLMUserAggregatorParams and LLMAssistantAggregatorParams 2025-04-16 15:19:19 -07:00
Aleix Conchillo Flaqué
8299c96ad4 Merge pull request #1603 from pipecat-ai/aleix/deepgram-tavus-fixes
deepgram/tavus fixes
2025-04-16 14:55:45 -07:00
Aleix Conchillo Flaqué
e9af585edd DeepgramTTSService: re-add base_url to constructor 2025-04-16 14:54:02 -07:00
Aleix Conchillo Flaqué
31f7082d12 DeepgramTTSService: use Deepgram's asyncrest instead of asyncio.to_thread 2025-04-16 14:40:59 -07:00
Aleix Conchillo Flaqué
6cea71270e tts: use smaller audio chunk sizes 2025-04-16 14:40:59 -07:00
Aleix Conchillo Flaqué
d05b2d0e8d TavusVideoService: fix rate limiting and max size 2025-04-16 14:40:59 -07:00
Filipi Fuchter
a458c1e92b Improving the README and fixing the env.example 2025-04-16 18:38:48 -03:00
Filipi Fuchter
5bbf1d0209 Example interoping between SmallWebRTC and Daily. 2025-04-16 17:14:12 -03:00
Mark Backman
235cd9cecc Merge pull request #1586 from rahultayal22/rah_google_vertex_issue
Fixed params issue in Google Vertex ai
2025-04-16 14:56:46 -04:00
Mark Backman
829f3ed2db Merge pull request #1601 from pipecat-ai/mb/eject-at-exp-token
Add eject_at_token_exp to Daily REST helpers, modify default values
2025-04-16 14:54:41 -04:00
Rahul Tayal
ac64f0ba91 Run ruff on code 2025-04-16 23:19:09 +05:30
Rahul Tayal
ce41a7585b Resolved comment to update change log 2025-04-16 22:24:25 +05:30
Mark Backman
ce92dfb5ec Add eject_at_token_exp to Daily REST helpers, modify default values 2025-04-16 12:26:33 -04:00
Mark Backman
ee132a2188 Merge pull request #1596 from pipecat-ai/mb/gpt-4.1
Update services and examples to use gpt-4.1 by default
2025-04-16 08:37:48 -04:00
Mark Backman
5f3bbf9828 Rely on default OpenAI model for examples and tests 2025-04-16 08:33:34 -04:00
Mark Backman
55d1d81430 Merge pull request #1595 from pipecat-ai/mb/rtvi-start-convo
Update client/server demos to kick off conversation in on_client_read…
2025-04-16 08:23:16 -04:00
Filipi Fuchter
8e36bdbed7 Adding some comments to the code. 2025-04-16 09:11:27 -03:00
Filipi Fuchter
cd8bd7f487 Adding some comments to the code. 2025-04-16 08:58:40 -03:00
Filipi Fuchter
5fa47b7a5c Adding the dependencies for the remote smart turn 2025-04-16 08:45:01 -03:00
Filipi Fuchter
616961b487 Stop removing segments from the end 2025-04-16 08:04:38 -03:00
Filipi Fuchter
650d4d9ee2 Changing the start speech time and adding logs. 2025-04-16 07:55:20 -03:00
Filipi Fuchter
2627cb6bf2 Allowing to define SmartTurnParams 2025-04-16 07:13:13 -03:00
Filipi Fuchter
0e4115049b Refactoring to use keep alive sessions. 2025-04-16 06:44:57 -03:00
Filipi Fuchter
3ebef9346f Adding support for RemoteSmartTurn 2025-04-16 06:33:42 -03:00
Filipi Fuchter
3e2d21779f Refactoring the BaseEndOfTurnAnalyzer to include most of the logic 2025-04-16 06:11:56 -03:00
Filipi Fuchter
cfefcac35f Resetting the silence frames when the user speaks. 2025-04-15 20:51:36 -03:00
Filipi Fuchter
57b39c084f Triggering to check if the turn is complete based on the maximum timeout 2025-04-15 20:42:41 -03:00
Filipi Fuchter
11b6de0900 Triggering to check if the turn is complete each time the user stops speaking based on the vad 2025-04-15 17:28:00 -03:00
Varun Singh
824bc9bf16 Update dial.js 2025-04-15 12:48:33 -07:00
Varun Singh
d0ddef6c12 Update server.py 2025-04-15 12:37:33 -07:00
Mark Backman
ad40a0f076 Update OpenAILLMService and OpenPipeLLMService to use gpt-4.1 by default 2025-04-15 15:11:05 -04:00
Filipi Fuchter
e6325a8229 Integrating with the smart turn model to predict 2025-04-15 16:01:09 -03:00
Mark Backman
6d10732889 Update OpenAILLMService examples to use gpt-4.1 2025-04-15 14:59:55 -04:00
Mark Backman
fdb46a0fa9 Update client/server demos to kick off conversation in on_client_ready handler 2025-04-15 14:50:38 -04:00
Filipi Fuchter
3588b06718 Adding missing torch dependency. 2025-04-15 12:28:36 -03:00
Filipi Fuchter
73874f6ec0 Loading the smart turn model. 2025-04-15 12:11:06 -03:00
Filipi Fuchter
6ab9a8ad7f Starting to create a local smart turn 2025-04-15 11:24:39 -03:00
Filipi Fuchter
821e303249 Bringing Aleix initial implementation for the smart turn. 2025-04-15 10:21:40 -03:00
chadbailey59
efae26a5a8 Client connect/disconnect events for DailyTransport (#1544)
* added multi transport example

* added working example

* restructured example and added readme

* removed image

* cleanup

* changed data type of callback signature

* removed pipecat example

* added changelog
2025-04-14 15:56:41 -05:00
Aleix Conchillo Flaqué
d16ace22ac Merge pull request #1583 from pipecat-ai/aleix/soundfilemixer-constructor-updates
SoundfileMixer: add mixing argument and require keywords
2025-04-14 10:59:30 -07:00
Rahul Tayal
001c26b79c Fixed params issue in Google Vertex ai 2025-04-14 23:29:16 +05:30
Arman
8dc4f1cda0 Troubleshooting SSL error 2025-04-14 13:39:53 -04:00
Aleix Conchillo Flaqué
ab6be11a0e SoundfileMixer: add mixing argument and require keywords 2025-04-14 08:30:56 -07:00
Filipi da Silva Fuchter
054158b0ff Merge pull request #1579 from pipecat-ai/fixing_smallwebrtc_issue
Fixed an issue in `SmallWebRTCTransport`
2025-04-14 10:44:22 -03:00
Filipi da Silva Fuchter
174cf13abd Merge pull request #1580 from pipecat-ai/fixing_voice_agent_example
Fixing the voice agent example to always create the video transceiver.
2025-04-14 10:44:07 -03:00
Filipi Fuchter
099d2c02e1 Fixing the voice agent example to always create the video transceiver. 2025-04-14 10:41:39 -03:00
Filipi Fuchter
e1108466f6 Fixed an issue in SmallWebRTCTransport where an error was thrown if the client did not create a video transceiver. 2025-04-14 10:36:25 -03:00
Mark Backman
edd53d425e Merge pull request #1577 from pipecat-ai/hush/trackStoppedSimpleChatbot
docs: Fix TrackStopped typo in SimpleChatbot
2025-04-14 08:32:58 -04:00
James Hush
b160cf34e9 Remove formatting 2025-04-14 15:13:45 +08:00
James Hush
dae3b927e1 docs: Fix TrackStopped typo in SimpleChatbot 2025-04-14 15:12:17 +08:00
Bnowako
61cba0136f Add marker file for static type checkers 2025-04-11 11:00:57 +02:00
Prem Adithya
c510870736 Merge branch 'pipecat-ai:main' into anthropic-client-bug-fixes 2025-04-04 16:41:04 +11:00
Shaiyon Hariri
af23200511 Use default google creds as fallback when not provided in llm_vertex,stt, and tts 2025-04-03 16:42:58 -04:00
Adithya Suresh
e8783f6a33 Handle cache token counts being none 2025-03-31 15:25:11 +11:00
490 changed files with 39564 additions and 8795 deletions

87
.github/ISSUE_TEMPLATE/1-bug_report.yml vendored Normal file
View File

@@ -0,0 +1,87 @@
name: Bug report
description: Report a bug or unexpected behavior
type: Bug
body:
- type: markdown
attributes:
value: |
## Bug Report
Thank you for taking the time to fill out this bug report.
- type: markdown
attributes:
value: |
### Environment
- type: input
id: pipecat-version
attributes:
label: pipecat version
description: Which version are you using?
placeholder: e.g., 0.0.63
validations:
required: true
- type: input
id: python-version
attributes:
label: Python version
description: Which Python version are you using?
placeholder: e.g., 3.12.8
validations:
required: true
- type: input
id: os
attributes:
label: Operating System
description: Which OS are you using?
placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
validations:
required: true
- type: textarea
id: description
attributes:
label: Issue description
description: Provide a clear description of the issue.
validations:
required: true
- type: textarea
id: repro
attributes:
label: Reproduction steps
description: List the steps to reproduce the issue.
placeholder: |
1. Do this...
2. Then do that...
3. Observe the error...
validations:
required: true
- type: textarea
id: expected
attributes:
label: Expected behavior
description: What did you expect to happen?
validations:
required: true
- type: textarea
id: actual
attributes:
label: Actual behavior
description: What actually happened?
validations:
required: true
- type: textarea
id: logs
attributes:
label: Logs
description: If applicable, include any relevant logs or error messages
render: shell
validations:
required: false

67
.github/ISSUE_TEMPLATE/2-question.yml vendored Normal file
View File

@@ -0,0 +1,67 @@
name: Question
description: Ask a question or get help
type: Question
body:
- type: markdown
attributes:
value: |
## Question
Use this form to ask a question about pipecat.
- type: markdown
attributes:
value: |
### Environment (if applicable)
- type: input
id: pipecat-version
attributes:
label: pipecat version
description: Which version are you using? (if applicable)
placeholder: e.g., 0.0.63
validations:
required: false
- type: input
id: python-version
attributes:
label: Python version
description: Which Python version are you using? (if applicable)
placeholder: e.g., 3.12.8
validations:
required: false
- type: input
id: os
attributes:
label: Operating System
description: Which OS are you using? (if applicable)
placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
validations:
required: false
- type: textarea
id: question
attributes:
label: Question
description: Provide your question in detail here.
validations:
required: true
- type: textarea
id: tried
attributes:
label: What I've tried
description: Describe what you've already tried or research you've done.
placeholder: I've looked at the documentation and tried...
validations:
required: false
- type: textarea
id: context
attributes:
label: Context
description: Any additional context or information that might help others understand your question better.
validations:
required: false

View File

@@ -0,0 +1,52 @@
name: Feature request
description: Suggest an enhancement or new feature
type: Enhancement
body:
- type: markdown
attributes:
value: |
## Feature Request
Thank you for suggesting an enhancement to pipecat.
- type: textarea
id: problem
attributes:
label: Problem Statement
description: A clear description of the problem this feature would solve.
placeholder: I'm always frustrated when...
validations:
required: true
- type: textarea
id: solution
attributes:
label: Proposed Solution
description: A clear and concise description of what you want to happen.
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternative Solutions
description: Any alternative solutions or features you've considered.
validations:
required: false
- type: textarea
id: context
attributes:
label: Additional Context
description: Add any other context, mockups, or screenshots about the feature request here.
placeholder: You can drag and drop images here to include them.
validations:
required: false
- type: checkboxes
id: contribution
attributes:
label: Would you be willing to help implement this feature?
options:
- label: Yes, I'd like to contribute
- label: No, I'm just suggesting

View File

@@ -0,0 +1,82 @@
name: Service Issue
description: An issue with a third-party service
type: Service Issue
body:
- type: markdown
attributes:
value: |
## Service Issue
Use this form to report an issue with a third-party service integration.
- type: input
id: pipecat-version
attributes:
label: pipecat version
description: Which version are you using?
placeholder: e.g., 0.0.63
validations:
required: true
- type: input
id: service-name
attributes:
label: Service Name
description: Which third-party service is having issues?
placeholder: e.g., OpenAI, ElevenLabs, Anthropic
validations:
required: true
- type: input
id: service-version
attributes:
label: Service or model version
description: Which version of the service API or model are you using?
placeholder: e.g., v1, gpt-4.1
validations:
required: false
- type: textarea
id: description
attributes:
label: Issue Description
description: Provide a clear description of the service issue.
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Reproduction Steps
description: Provide steps to reproduce the issue.
placeholder: |
1. Configure service X
2. Call method Y
3. See error Z
validations:
required: true
- type: textarea
id: expected
attributes:
label: Expected Behavior
description: What did you expect to happen?
validations:
required: true
- type: textarea
id: actual
attributes:
label: Actual Behavior
description: What actually happened?
validations:
required: true
- type: textarea
id: logs
attributes:
label: Error Logs
description: If available, include any error messages or logs.
render: shell
validations:
required: false

View File

@@ -0,0 +1,56 @@
name: New Service
description: Request to support a new third-party service
type: New Service
body:
- type: markdown
attributes:
value: |
## New Service Request
Use this form to request support for a new third-party service in pipecat.
- type: input
id: service-name
attributes:
label: Service Name
description: What is the name of the third-party service?
placeholder: e.g., NewAPI, SomeService
validations:
required: true
- type: input
id: service-website
attributes:
label: Service Website
description: Link to the service's website or documentation
placeholder: e.g., https://newapi.com
validations:
required: true
- type: textarea
id: service-description
attributes:
label: Service Description
description: Briefly describe what this service does and how it works.
validations:
required: true
- type: textarea
id: api-info
attributes:
label: API Information
description: If available, provide details about the service's API.
placeholder: |
- API documentation link
- Authentication method
- Key endpoints you'd like supported
validations:
required: false
- type: checkboxes
id: contribution
attributes:
label: Would you be willing to help implement this service?
options:
- label: Yes, I'd like to contribute
- label: No, I'm just suggesting

74
.github/ISSUE_TEMPLATE/6-dependency.yml vendored Normal file
View File

@@ -0,0 +1,74 @@
name: Dependency Issue
description: An issue with a Pipecat dependency (not a third-party service)
type: Dependency Issue
body:
- type: markdown
attributes:
value: |
## Dependency Issue
Use this form to report an issue with a Pipecat dependency.
- type: input
id: pipecat-version
attributes:
label: pipecat version
description: Which version are you using?
placeholder: e.g., 0.0.63
validations:
required: true
- type: input
id: dependency-name
attributes:
label: Dependency Name
description: Which Pipecat dependency is causing the issue?
placeholder: e.g., openai, anthropic, fastapi
validations:
required: true
- type: input
id: dependency-version
attributes:
label: Dependency Version
description: Which version of the dependency are you using?
placeholder: e.g., 1.2.3
validations:
required: true
- type: textarea
id: description
attributes:
label: Issue Description
description: Provide a clear description of the dependency issue.
validations:
required: true
- type: textarea
id: impact
attributes:
label: Impact
description: How is this dependency issue affecting your usage of pipecat?
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Reproduction Steps
description: If applicable, provide steps to reproduce the issue.
placeholder: |
1. Install dependency X
2. Run command Y
3. See error Z
validations:
required: false
- type: textarea
id: logs
attributes:
label: Error Logs
description: If applicable, include any relevant error messages or logs.
render: shell
validations:
required: false

View File

@@ -0,0 +1,70 @@
name: Troubleshooting
description: Help with a specific use case
type: Troubleshooting
body:
- type: markdown
attributes:
value: |
## Troubleshooting Request
Use this form to get help with a specific use case or implementation.
- type: input
id: pipecat-version
attributes:
label: pipecat version
description: Which version are you using?
placeholder: e.g., 0.0.63
validations:
required: true
- type: input
id: python-version
attributes:
label: Python version
description: Which version of Python are you using?
placeholder: e.g., 3.12.8
validations:
required: true
- type: input
id: os
attributes:
label: Operating System
description: Which OS are you using?
placeholder: e.g., Ubuntu 24.04, Windows 11, macOS 12.5
validations:
required: true
- type: textarea
id: use-case
attributes:
label: Use Case Description
description: Describe what you're trying to accomplish with pipecat.
validations:
required: true
- type: textarea
id: current-approach
attributes:
label: Current Approach
description: What have you tried so far? Include code snippets if relevant.
render: python
validations:
required: true
- type: textarea
id: errors
attributes:
label: Errors or Unexpected Behavior
description: Describe any errors or unexpected behavior you're encountering.
validations:
required: true
- type: textarea
id: additional-context
attributes:
label: Additional Context
description: Any other information that might help us understand your situation.
validations:
required: false

1
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1 @@
blank_issues_enabled: false

30
.gitignore vendored
View File

@@ -7,7 +7,7 @@ venv
/.idea
#*#
# Distribution / packaging
# Distribution / Packaging
.Python
build/
develop-eggs/
@@ -30,24 +30,24 @@ MANIFEST
.env
fly.toml
# Example files
pipecat/examples/twilio-chatbot/templates/streams.xml
pipecat/examples/bot-ready-signalling/client/react-native/node_modules/
pipecat/examples/bot-ready-signalling/client/react-native/.expo/
pipecat/examples/bot-ready-signalling/client/react-native/dist/
pipecat/examples/bot-ready-signalling/client/react-native/npm-debug.*
pipecat/examples/bot-ready-signalling/client/react-native/*.jks
pipecat/examples/bot-ready-signalling/client/react-native/*.p8
pipecat/examples/bot-ready-signalling/client/react-native/*.p12
pipecat/examples/bot-ready-signalling/client/react-native/*.key
pipecat/examples/bot-ready-signalling/client/react-native/*.mobileprovision
pipecat/examples/bot-ready-signalling/client/react-native/*.orig.*
pipecat/examples/bot-ready-signalling/client/react-native/web-build/
# Examples
examples/telnyx-chatbot/templates/streams.xml
examples/twilio-chatbot/templates/streams.xml
examples/**/node_modules/
examples/**/.expo/
examples/**/dist/
examples/**/npm-debug.*
examples/**/*.jks
examples/**/*.p8
examples/**/*.p12
examples/**/*.key
examples/**/*.mobileprovision
examples/**/*.orig.*
examples/**/web-build/
# macOS
.DS_Store
# Documentation
docs/api/_build/
docs/api/api

View File

@@ -5,6 +5,355 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.67] - 2025-05-07
### Added
- Added `DebugLogObserver` for detailed frame logging with configurable
filtering by frame type and endpoint. This observer automatically extracts
and formats all frame data fields for debug logging.
- `UserImageRequestFrame.video_source` field has been added to request an image
from the desired video source.
- Added support for the AWS Nova Sonic speech-to-speech model with the new
`AWSNovaSonicLLMService`.
See https://docs.aws.amazon.com/nova/latest/userguide/speech.html.
Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.
- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.
- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.
- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in
`ElevenLabsTTSService`.
- Added support to `RimeHttpTTSService` for the `arcana` model.
### Changed
- Updated `ElevenLabsTTSService` to use the beta websocket API
(multi-stream-input). This new API supports context_ids and cancelling those
contexts, which greatly improves interruption handling.
- Observers `on_push_frame()` now take a single argument `FramePushed` instead
of multiple arguments.
- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.
### Deprecated
- `PollyTTSService` is now deprecated, use `AWSPollyTTSService` instead.
- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now
deprecated, use `on_push_frame(data: FramePushed)` instead.
### Fixed
- Fixed a `DailyTransport` issue that was causing issues when multiple audio or
video sources where being captured.
- Fixed a `UltravoxSTTService` issue that would cause the service to generate
all tokens as one word.
- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if
task was cancelled from outside of Pipecat.
- Fixed a `TaskManager` that was causing dangling tasks to be reported.
- Fixed an issue that could cause data to be sent to the transports when they
were still not ready.
- Remove custom audio tracks from `DailyTransport` before leaving.
### Removed
- Removed `CanonicalMetricsService` as it's no longer maintained.
## [0.0.66] - 2025-05-02
### Added
- Added two new input parameters to `RimeTTSService`: `pause_between_brackets`
and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use
`LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport
implementation supports it (e.g. Daily's custom tracks). With multiple
destinations it is possible to send different audio or video tracks with a
single transport simultaneously. To do that, you need to set the new
`Frame.transport_destination` field with your desired transport destination
(e.g. custom track name), tell the transport you want a new destination with
`TransportParams.audio_out_destinations` or
`TransportParams.video_out_destinations` and the transport should take care of
the rest.
- Similar to the new `Frame.transport_destination`, there's a new
`Frame.transport_source` field which is set by the `BaseInputTransport` if the
incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This
parameter will be used to update the `Frame.transport_destination` field for
each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to
multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and
`DailyTransportParams.microphone_out_enabled` which allows you to
enable/disable the main output camera or microphone tracks. This is useful if
you only want to use custom tracks and not send the main tracks. Note that you
still need `audio_out_enabled=True` or `video_out_enabled`.
- Added `DailyTransport.capture_participant_audio()` which allows you to capture
an audio source (e.g. "microphone", "screenAudio" or a custom track name) from
a remote participant.
- Added `DailyTransport.update_publishing()` which allows you to update the call
video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams` which allows you to configure what RTVI messages
are sent to the clients.
- Added a `context_window_compression` InputParam to
`GeminiMultimodalLiveLLMService` which allows you to enable a sliding context
window for the session as well as set the token limit of the sliding window.
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`,
indicating when the VAD detected the user to start and stop speaking. These
events are helpful when using smart turn detection, as the user's stop time
can differ from when their turn ends (signified by UserStoppedSpeakingFrame).
- Added `TranslationFrame`, a new frame type that contains a translated
transcription.
- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming
audio will be pushed downstream.
- Added `MCPClient`; a way to connect to MCP servers and use the MCP servers'
tools.
- Added `Mem0 OSS`, along with Mem0 cloud support now the OSS version is also
available.
### Changed
- `TransportParams.audio_mixer` now supports a string and also a dictionary to
provide a mixer per destination. For example:
```python
audio_out_mixer={
"track-1": SoundfileMixer(...),
"track-2": SoundfileMixer(...),
"track-N": SoundfileMixer(...),
},
```
- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and
`TranscriptionFrame` which allows the `STTMuteFilter` to be used in
conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single parameter `FunctionCallParams` instead of
`(function_name, tool_call_id, args, llm, context, result_callback)` which is
now deprecated.
- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s
(`LLMUserAggregatorParams.aggregation_timeout`). Sometimes, the STT services
might give us more than one transcription which could come after the user
stopped speaking. We still want to include these additional transcriptions
with the first one because it's part of the user turn. This is what this
timeout is helpful with.
- Short utterances not detected by VAD while the bot is speaking are now
ignored. This reduces the amount of bot interruptions significantly providing
a more natural conversation experience.
- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a
`translation` and `translation_config`.
- STT services now passthrough audio frames by default. This allows you to add
audio recording without worrying about what's wrong in your pipeline when it
doesn't work the first time.
- Input transports now always push audio downstream unless disabled with
`TransportParams.audio_in_passthrough`. After many Pipecat releases, we
realized this is the common use case. There are use cases where the input
transport already provides STT and you also don't want recordings, in which
case there's no need to push audio to the rest of the pipeline, but this is
not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such
as to be "canary-1b-asr" used in Pipecat.
### Deprecated
- Function calls with parameters
`(function_name, tool_call_id, args, llm, context, result_callback)` are
deprectated, use a single `FunctionCallParams` parameter instead.
- `TransportParams.camera_*` parameters are now deprecated, use
`TransportParams.video_*` instead.
- `TransportParams.vad_enabled` parameter is now deprecated, use
`TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.
- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use
`TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses
the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses
the model "magpie-tts-multilingual" by default.
### Fixed
- Fixed an issue with `SimliVideoService` where the bot was continuously outputting
audio, which prevents the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant
messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context
contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling, where the service returns a 500
error. Previously, this would cause an unhandled exception. Now, a 500 error
is treated as an incomplete response.
- Fixed a TTS services issue that could cause assistant output not to be
aggregated to the context when also using `TTSSpeakFrame`s.
- Fixed an issue where the `SmartTurnMetricsData` was reporting 0ms for
inference and processing time when using the `FalSmartTurnAnalyzer`.
### Other
- Added `examples/daily-custom-tracks` to show how to send and receive Daily
custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple
simulataneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed
`29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
- Added foundational example `13c-gladia-translation.py` showing how to use
`TranscriptionFrame` and `TranslationFrame`.
## [0.0.65] - 2025-04-23 "Sant Jordi's release" 🌹📕
https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia
### Added
- Added automatic hangup logic to the Telnyx serializer. This feature hangs up
the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is
enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added a keepalive task to `GladiaSTTService` to prevent the websocket from
disconnecting after 30 seconds of no audio input.
### Changed
- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`
no longer require that `stability` and `similarity_boost` be set. You can
individually set each param.
- In `TwilioFrameSerializer`, `call_sid` is Optional so as to avoid a breaking
changed. `call_sid` is required to automatically hang up.
### Fixed
- Fixed an issue where `TwilioFrameSerializer` would send two hang up commands:
one for the `EndFrame` and one for the `CancelFrame`.
## [0.0.64] - 2025-04-22
### Added
- Added automatic hangup logic to the Twilio serializer. This feature hangs up
the Twilio call when an `EndFrame` or `CancelFrame` is received. It is
enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics,
to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction
confidence scores and processing time metrics from the smart turn analyzers.
- Added support for Application Default Credentials in Google services,
`GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.
- Added support for Smart Turn Detection via the `turn_analyzer` transport
parameter. You can now choose between `HttpSmartTurnAnalyzer()` or
`FalSmartTurnAnalyzer()` for remote inference or
`LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.
- `DeepgramTTSService` accepts `base_url` argument again, allowing you to
connect to an on-prem service.
- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams` which allow
you to control aggregator settings. You can now pass these arguments when
creating aggregator pairs with `create_context_aggregator()`.
- Added `previous_text` context support to ElevenLabsHttpTTSService, improving
speech consistency across sentences within an LLM response.
- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.
- It is now possible to disable `SoundfileMixer` when created. You can then use
`MixerEnableFrame` to dynamically enable it when necessary.
- Added `on_client_connected` and `on_client_disconnected` event handlers to
the `DailyTransport` class. These handlers map to the same underlying Daily
events as `on_participant_joined` and `on_participant_left`, respectively.
This makes it easier to write a single bot pipeline that can also use other
transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.
### Changed
- `GrokLLMService` now uses `grok-3-beta` as its default model.
- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects
the user when their token expires. This new parameter defaults to False.
Also, the default value for `enable_prejoin_ui` changed to False and
`eject_at_room_exp` changed to False.
- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their
default model.
- `SoundfileMixer` constructor arguments need to be keywords.
### Deprecated
- `DeepgramSTTService` parameter `url` is now deprecated, use `base_url`
instead.
### Removed
- Parameters `user_kwargs` and `assistant_kwargs` when creating a context
aggregator pair using `create_context_aggregator()` have been removed. Use
`user_params` and `assistant_params` instead.
### Fixed
- Fixed an issue that would cause TTS websocket-based services to not cleanup
resources properly when disconnecting.
- Fixed a `TavusVideoService` issue that was causing audio choppiness.
- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the
client did not create a video transceiver.
- Fixed an issue where LLM input parameters were not working and applied
correctly in `GoogleVertexLLMService`, causing unexpected behavior during
inference.
### Other
- Updated the `twilio-chatbot` example to use the auto-hangup feature.
## [0.0.63] - 2025-04-11
### Added

239
README.md
View File

@@ -1,43 +1,72 @@
<h1><div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
<img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
</div></h1>
[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat)
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
# 🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents
## What you can build
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
- **Voice Assistants**: [Natural, real-time conversations with AI](https://demo.dailybots.ai/)
- **Interactive Agents**: Personal coaches and meeting assistants
- **Multimodal Apps**: Combine voice, video, images, and text
- **Creative Tools**: [Story-telling experiences](https://storytelling-chatbot.fly.dev/) and social companions
- **Business Solutions**: [Customer intake flows](https://www.youtube.com/watch?v=lDevgsp9vn0) and support bots
- **Complex conversational flows**: [Refer to Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) to learn more
## 🚀 What You Can Build
## See it in action
- **Voice Assistants** natural, streaming conversations with AI
- **AI Companions** coaches, meeting assistants, characters
- **Multimodal Interfaces** voice, video, images, and more
- **Interactive Storytelling** creative tools with generative media
- **Business Agents** customer intake, support bots, guided flows
- **Complex Dialog Systems** design logic with structured conversations
🧭 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
## 🧠 Why Pipecat?
- **Voice-first**: Integrates speech recognition, text-to-speech, and conversation handling
- **Pluggable**: Supports many AI services and tools
- **Composable Pipelines**: Build complex behavior from modular components
- **Real-Time**: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)
## 🎬 See it in action
<p float="left">
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="280" /></a>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/simple-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/storytelling-chatbot/image.png" width="400" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="280" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="280" /></a>
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/tree/main/examples/moondream-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/moondream-chatbot/image.png" width="400" /></a>
</p>
## Key features
## 📱 Client SDKs
- **Voice-first Design**: Built-in speech recognition, TTS, and conversation handling
- **Flexible Integration**: Works with popular AI services (OpenAI, ElevenLabs, etc.)
- **Pipeline Architecture**: Build complex apps from simple, reusable components
- **Real-time Processing**: Frame-based pipeline architecture for fluid interactions
- **Production Ready**: Enterprise-grade WebRTC and Websocket support
You can connect to Pipecat from any platform using our official SDKs:
💡 Looking to build structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows) for managing complex conversational states and transitions.
| Platform | SDK Repo | Description |
| -------- | ------------------------------------------------------------------------------ | -------------------------------- |
| Web | [pipecat-client-web](https://github.com/pipecat-ai/pipecat-client-web) | JavaScript and React client SDKs |
| iOS | [pipecat-client-ios](https://github.com/pipecat-ai/pipecat-client-ios) | Swift SDK for iOS |
| Android | [pipecat-client-android](https://github.com/pipecat-ai/pipecat-client-android) | Kotlin SDK for Android |
| C++ | [pipecat-client-cxx](https://github.com/pipecat-ai/pipecat-client-cxx) | C++ client SDK |
## Getting started
## 🧩 Available services
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when youre ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
| Category | Services |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) |
| Analytics & Metrics | [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
## ⚡ Getting started
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when youre ready.
```shell
# Install the module
@@ -53,155 +82,71 @@ To keep things lightweight, only the core framework is included by default. If y
pip install "pipecat-ai[option,...]"
```
### Available services
| Category | Services | Install Command Example |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Parakeet (NVIDIA)](https://docs.pipecat.ai/server/services/stt/parakeet), [Ultravox](https://docs.pipecat.ai/server/services/stt/ultravox), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) | `pip install "pipecat-ai[deepgram]"` |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [Together AI](https://docs.pipecat.ai/server/services/llm/together) | `pip install "pipecat-ai[openai]"` |
| Text-to-Speech | [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [FastPitch (NVIDIA)](https://docs.pipecat.ai/server/services/tts/fastpitch), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) | `pip install "pipecat-ai[cartesia]"` |
| Speech-to-Speech | [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai) | `pip install "pipecat-ai[google]"` |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local | `pip install "pipecat-ai[daily]"` |
| Video | [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) | `pip install "pipecat-ai[tavus,simli]"` |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) | `pip install "pipecat-ai[mem0]"` |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/fal), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) | `pip install "pipecat-ai[moondream]"` |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [Noisereduce](https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter) | `pip install "pipecat-ai[silero]"` |
| Analytics & Metrics | [Canonical AI](https://docs.pipecat.ai/server/services/analytics/canonical), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) | `pip install "pipecat-ai[canonical]"` |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
## Code examples
## 🧪 Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat/tree/main/examples/) — complete applications that you can use as starting points for development
## A simple voice agent running locally
## 🛠️ Hacking on the framework itself
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use [Daily](https://daily.co) for real-time media transport, and [Cartesia](https://cartesia.ai/) for text-to-speech.
1. Set up a virtual environment before following these instructions. From the root of the repo:
```python
import asyncio
```shell
python3 -m venv venv
source venv/bin/activate
```
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
2. Install the development dependencies:
async def main():
# Use Daily as a real-time media transport (WebRTC)
transport = DailyTransport(
room_url=...,
token="", # leave empty. Note: token is _not_ your api key
bot_name="Bot Name",
params=DailyParams(audio_out_enabled=True))
```shell
pip install -r dev-requirements.txt
```
# Use Cartesia for Text-to-Speech
tts = CartesiaTTSService(
api_key=...,
voice_id=...
)
3. Install the git pre-commit hooks (these help ensure your code follows project rules):
# Simple pipeline that will process text to speech and output the result
pipeline = Pipeline([tts, transport.output()])
```shell
pre-commit install
```
# Create Pipecat processor that can run one or more pipelines tasks
runner = PipelineRunner()
4. Install the `pipecat-ai` package locally in editable mode:
# Assign the task callable to run the pipeline
task = PipelineTask(pipeline)
```shell
pip install -e .
```
# Register an event handler to play audio when a
# participant joins the transport WebRTC session
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
# Queue a TextFrame that will get spoken by the TTS service (Cartesia)
await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))
> The `-e` or `--editable` option allows you to modify the code without reinstalling.
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.cancel()
5. Include optional dependencies as needed. For example:
# Run the pipeline task
await runner.run(task)
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]"
```
if __name__ == "__main__":
asyncio.run(main())
```
6. (Optional) If you want to use this package from another directory:
Run it with:
```shell
python app.py
```
Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit at `https://<yourdomain>.daily.co/<room_url>` and listen to the bot say hello!
## WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. But for production use, youll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see [this post.](https://www.daily.co/blog/how-to-talk-to-an-llm-with-your-voice/#webrtc))
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up [here](https://dashboard.daily.co/u/signup) and [create a room](https://docs.daily.co/reference/rest-api/rooms) in the developer Dashboard.
## Hacking on the framework itself
_Note: You may need to set up a virtual environment before following these instructions. From the root of the repo:_
```shell
python3 -m venv venv
source venv/bin/activate
```
Install the development dependencies:
```shell
pip install -r dev-requirements.txt
```
Install the git pre-commit hooks (these help ensure your code follows project rules):
```shell
pre-commit install
```
Install the `pipecat-ai` package locally in editable mode:
```shell
pip install -e .
```
The `-e` or `--editable` option allows you to modify the code without reinstalling.
To include optional dependencies, add them to the install command. For example:
```shell
pip install -e ".[daily,deepgram,cartesia,openai,silero]" # Updated for the services you're using
```
If you want to use this package from another directory:
```shell
pip install "path_to_this_repo[option,...]"
```
```shell
pip install "path_to_this_repo[option,...]"
```
### Running tests
Install the test dependencies:
```shell
pip install -r test-requirements.txt
```
From the root directory, run:
```shell
pytest
```
## Setting up your editor
### Setting up your editor
This project uses strict [PEP 8](https://peps.python.org/pep-0008/) formatting via [Ruff](https://github.com/astral-sh/ruff).
### Emacs
#### Emacs
You can use [use-package](https://github.com/jwiegley/use-package) to install [emacs-lazy-ruff](https://github.com/christophermadsen/emacs-lazy-ruff) package and configure `ruff` arguments:
@@ -223,7 +168,7 @@ You can use [use-package](https://github.com/jwiegley/use-package) to install [e
:hook ((python-mode . pyvenv-auto-run)))
```
### Visual Studio Code
#### Visual Studio Code
Install the
[Ruff](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) extension. Then edit the user settings (_Ctrl-Shift-P_ `Open User Settings (JSON)`) and set it as the default Python formatter, and enable formatting on save:
@@ -235,7 +180,7 @@ Install the
}
```
### PyCharm
#### PyCharm
`ruff` was installed in the `venv` environment described before, now to enable autoformatting on save, go to `File` -> `Settings` -> `Tools` -> `File Watchers` and add a new watcher with the following settings:
@@ -245,7 +190,7 @@ Install the
4. **Arguments**: `format $FilePath$`
5. **Program**: `$PyInterpreterDirectory$/ruff`
## Contributing
## 🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:
@@ -258,7 +203,7 @@ Before submitting a pull request, please check existing issues and PRs to avoid
We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.
## Getting help
## 🛟 Getting help
➡️ [Join our Discord](https://discord.gg/pipecat)

View File

@@ -1,22 +0,0 @@
# Description
Is this reporting a bug or feature request?
If reporting a bug, please fill out the following:
### Environment
- pipecat-ai version:
- python version:
- OS:
### Issue description
Provide a clear description of the issue.
### Repro steps
List the steps to reproduce the issue.
### Expected behavior
### Actual behavior
### Logs

View File

@@ -50,7 +50,6 @@ autodoc_mock_imports = [
"pyht.protos",
"pyht.protos.api_pb2",
"pipecat_ai_playht", # PlayHT wrapper
"vllm",
"aiortc",
"aiortc.mediastreams",
"cv2",
@@ -76,7 +75,6 @@ autodoc_mock_imports = [
"openpipe",
"simli",
"soundfile",
# Existing mocks
"pipecat_ai_krisp",
"pyaudio",
"_tkinter",
@@ -87,6 +85,66 @@ autodoc_mock_imports = [
"pydantic.Field",
"pydantic._internal._model_construction",
"pydantic._internal._fields",
# Moondream dependencies
"torch",
"transformers",
"intel_extension_for_pytorch",
# Ultravox dependencies
"huggingface_hub",
"vllm",
"vllm.engine.arg_utils",
"transformers.AutoTokenizer",
# Langchain dependencies
"langchain_core",
"langchain_core.messages",
"langchain_core.runnables",
"langchain_core.messages.AIMessageChunk",
"langchain_core.runnables.Runnable",
# LiveKit dependencies
"livekit",
"livekit.rtc",
"livekit_api",
"livekit_protocol",
"tenacity",
"tenacity.retry",
"tenacity.stop_after_attempt",
"tenacity.wait_exponential",
"rtc",
"rtc.Room",
"rtc.RoomOptions",
"rtc.AudioSource",
"rtc.LocalAudioTrack",
"rtc.TrackPublishOptions",
"rtc.TrackSource",
"rtc.AudioStream",
"rtc.AudioFrameEvent",
"rtc.AudioFrame",
"rtc.Track",
"rtc.TrackKind",
"rtc.RemoteParticipant",
"rtc.RemoteTrackPublication",
"rtc.DataPacket",
# Riva dependencies
"riva",
"riva.client",
"riva.client.Auth",
"riva.client.ASRService",
"riva.client.StreamingRecognitionConfig",
"riva.client.RecognitionConfig",
"riva.client.AudioEncoding",
"riva.client.proto.riva_tts_pb2",
"riva.client.SpeechSynthesisService",
# Local CoreML Smart Turn dependencies
"coremltools",
"coremltools.models",
"coremltools.models.MLModel",
"torch",
"torch.nn",
"torch.nn.functional",
"transformers",
"transformers.AutoFeatureExtractor",
# Also add specific classes that are imported
"AutoFeatureExtractor",
]
# HTML output settings
@@ -118,12 +176,25 @@ def verify_modules():
},
}
# Skip importing modules that are in autodoc_mock_imports
skipped_modules = set(autodoc_mock_imports)
missing = []
for category, modules in required_modules.items():
if isinstance(modules, dict):
# Handle nested structure
for subcategory, submodules in modules.items():
for module in submodules:
# Check if module is in autodoc_mock_imports
if (
f"pipecat.{category}.{subcategory}.{module}" in skipped_modules
or module in skipped_modules
):
logger.info(
f"Skipping import of mocked module: pipecat.{category}.{subcategory}.{module}"
)
continue
try:
__import__(f"pipecat.{category}.{subcategory}.{module}")
logger.info(
@@ -137,6 +208,11 @@ def verify_modules():
else:
# Handle flat structure
for module in modules:
# Check if module is in autodoc_mock_imports
if f"pipecat.{category}.{module}" in skipped_modules or module in skipped_modules:
logger.info(f"Skipping import of mocked module: pipecat.{category}.{module}")
continue
try:
__import__(f"pipecat.{category}.{module}")
logger.info(f"Successfully imported pipecat.{category}.{module}")

View File

@@ -10,7 +10,6 @@ pipecat-ai[anthropic]
pipecat-ai[assemblyai]
pipecat-ai[aws]
pipecat-ai[azure]
pipecat-ai[canonical]
pipecat-ai[cartesia]
pipecat-ai[cerebras]
pipecat-ai[deepseek]
@@ -26,20 +25,23 @@ pipecat-ai[grok]
pipecat-ai[groq]
# pipecat-ai[krisp] # Mocked
pipecat-ai[koala]
pipecat-ai[langchain]
pipecat-ai[livekit]
# pipecat-ai[langchain] # Mocked
# pipecat-ai[livekit] # Mocked
pipecat-ai[lmnt]
pipecat-ai[local]
# pipecat-ai[local-smart-turn] # Mocked
# pipecat-ai[mem0] # Mocked
# pipecat-ai[mlx-whisper] # Mocked
pipecat-ai[moondream]
# pipecat-ai[moondream] # Mocked
pipecat-ai[nim]
# pipecat-ai[neuphonic] # Mocked
pipecat-ai[noisereduce]
pipecat-ai[openai]
# pipecat-ai[openpipe]
# pipecat-ai[playht] # Mocked due to grpcio conflict with riva
pipecat-ai[riva]
pipecat-ai[qwen]
pipecat-ai[remote-smart-turn]
# pipecat-ai[riva] # Mocked
pipecat-ai[silero]
pipecat-ai[simli]
pipecat-ai[soundfile]

View File

@@ -92,4 +92,12 @@ ASSEMBLYAI_API_KEY=...
OPENROUTER_API_KEY=...
# Piper
PIPER_BASE_URL=...
PIPER_BASE_URL=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=
FAL_SMART_TURN_API_KEY=...
# Twilio
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,161 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
recordings/
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
runpod.toml

View File

@@ -1,66 +0,0 @@
# Chatbot with canonical-metrics
This project implements a chatbot using a pipeline architecture that integrates audio processing, transcription, and a language model for conversational interactions. The chatbot operates within a daily communication environment, utilizing various services for text-to-speech and language model responses.
## Features
- **Audio Input and Output**: Captures microphone input and plays back audio responses.
- **Voice Activity Detection**: Utilizes Silero VAD to manage audio input intelligently.
- **Text-to-Speech**: Integrates ElevenLabs TTS service to convert text responses into audio.
- **Language Model Interaction**: Uses OpenAI's GPT-4 model to generate responses based on user input.
- **Transcription Services**: Captures and transcribes participant speech for analytics.
- **Metrics Collection**: Sends audio data for analysis via Canonical Metrics Service.
## Requirements
- Python 3.10+
- `python-dotenv`
- Additional libraries from the `pipecat` package.
## Setup
1. Clone the repository.
2. Install the required packages.
3. Set up environment variables for API keys:
- `OPENAI_API_KEY`
- `ELEVENLABS_API_KEY`
- `CANONICAL_API_KEY`
- `CANONICAL_API_URL`
4. Run the script.
## Usage
The chatbot introduces itself and engages in conversations, providing brief and creative responses. Designed for flexibility, it can support multiple languages with appropriate configuration.
## Events
- Participants joining or leaving the call are handled dynamically, adjusting the chatbot's behavior accordingly.
The first time, things might take extra time to get started since VAD (Voice Activity Detection) model needs to be downloaded.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser to start a chatbot session.
## Build and test the Docker image
```
docker build -t chatbot .
docker run --env-file .env -p 7860:7860 chatbot
```

View File

@@ -1,148 +0,0 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import uuid
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor
from pipecat.services.canonical.metrics import CanonicalMetricsService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Chatbot",
DailyParams(
audio_out_enabled=True,
audio_in_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_audio_passthrough=True,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
# Spanish
#
# transcription_settings=DailyTranscriptionSettings(
# language="es",
# tier="nova",
# model="2-general"
# )
),
)
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY"),
#
# English
#
voice_id="cgSgspJ2msm6clMCkdW9",
#
# Spanish
#
# model="eleven_multilingual_v2",
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
messages = [
{
"role": "system",
#
# English
#
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer.",
#
# Spanish
#
# "content": "Eres Chatbot, un amigable y útil robot. Tu objetivo es demostrar tus capacidades de una manera breve. Tus respuestas se convertiran a audio así que nunca no debes incluir caracteres especiales. Contesta a lo que el usuario pregunte de una manera creativa, útil y breve. Empieza por presentarte a ti mismo.",
},
]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
"""
CanonicalMetrics uses AudioBufferProcessor under the hood to buffer the audio. On
call completion, CanonicalMetrics will send the audio buffer to Canonical for
analysis. Visit https://voice.canonical.chat to learn more.
"""
audio_buffer_processor = AudioBufferProcessor(num_channels=2)
canonical = CanonicalMetricsService(
audio_buffer_processor=audio_buffer_processor,
aiohttp_session=session,
api_key=os.getenv("CANONICAL_API_KEY"),
call_id=str(uuid.uuid4()),
assistant="pipecat-chatbot",
assistant_speaks_first=True,
context=context,
)
pipeline = Pipeline(
[
transport.input(), # microphone
context_aggregator.user(),
llm,
tts,
transport.output(),
canonical, # uploads audio buffer to Canonical AI for metrics
audio_buffer_processor, # captures audio into a buffer
context_aggregator.assistant(),
]
)
task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await audio_buffer_processor.start_recording()
await transport.capture_participant_transcription(participant["id"])
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
# Here we don't want to cancel, we just want to finish sending
# whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,5 +0,0 @@
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,openai,silero,elevenlabs,canonical]

View File

@@ -66,9 +66,7 @@ async def main():
DailyParams(
audio_out_enabled=True,
audio_in_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
vad_audio_passthrough=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
#
@@ -95,7 +93,7 @@ async def main():
# voice_id="gD1IexrzCvsXPHUuT0s3",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{

View File

@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
return (url, token)

View File

@@ -0,0 +1,39 @@
# Daily Custom Tracks
This example shows how to send and receive Daily custom tracks. We will run a simple `daily-python` application to send an audio file with a custom track (named "pipecat") to a room. Then, the Pipecat bot will mirror that custom track into another custom track (named "pipecat-mirror") in the same room.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
## Run the bot
Start the bot by giving it a Daily room URL.
```bash
python bot.py -u ROOM_URL
```
The bot will wait for the first participant to join. Then, it will mirror a custom track named "pipecat" into a new custom track named "pipecat-mirror".
## Run the sender
Now, run the custom track sender. This is a simple `daily-python` application that opens and audio file and sends it as a custom track to the same Daily room.
```bash
python custom_track_sender.py -u ROOM_URL -i office-ambience-mono-16000.mp3
```
## Open client
Finally, open the client so you can hear both custom tracks.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room and join it. You should be able to select which custom track you want to hear.

View File

@@ -0,0 +1,87 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import sys
import aiohttp
from loguru import logger
from runner import configure
from pipecat.frames.frames import Frame, InputAudioRawFrame, OutputAudioRawFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyParams, DailyTransport
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
class CustomTrackMirrorProcessor(FrameProcessor):
def __init__(self, transport_destination: str, **kwargs):
super().__init__(**kwargs)
self._transport_destination = transport_destination
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, InputAudioRawFrame) and frame.transport_source:
output_frame = OutputAudioRawFrame(
audio=frame.audio,
sample_rate=frame.sample_rate,
num_channels=frame.num_channels,
)
output_frame.transport_destination = self._transport_destination
await self.push_frame(output_frame)
else:
await self.push_frame(frame, direction)
async def main():
async with aiohttp.ClientSession() as session:
(room_url, _) = await configure(session)
transport = DailyTransport(
room_url,
None,
"Custom tracks mirror",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
microphone_out_enabled=False, # Disable since we just use custom tracks
audio_out_destinations=["pipecat-mirror"],
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
CustomTrackMirrorProcessor("pipecat-mirror"),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
),
)
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_audio(participant["id"], audio_source="pipecat")
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,74 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import time
from daily import CallClient, CustomAudioSource, Daily
from pydub import AudioSegment
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument("-u", "--url", type=str, required=True, help="URL of the Daily room to join")
parser.add_argument(
"-i", "--input", type=str, required=True, help="Input audio file (needs 16000 sample rate)"
)
args, _ = parser.parse_known_args()
audio = AudioSegment.from_mp3(args.input)
raw_bytes = audio.raw_data
sample_rate = audio.frame_rate
channels = audio.channels
print(f"Length: {len(raw_bytes)} bytes")
print(f"Sample rate: {sample_rate}, Channels: {channels}")
# Initialize the Daily context & create call client
Daily.init()
client = CallClient()
# Join the room and indicate we have a custom track named "pipecat".
client.join(
args.url,
client_settings={
"publishing": {
"camera": False,
"microphone": False,
"customAudio": {"pipecat": True},
},
},
)
# Just sleep for a couple of seconds. To do this well we should really use
# completions.
time.sleep(2)
# Create the custom audio source. This is where we will write our audio.
audio_source = CustomAudioSource(sample_rate, channels)
# Create an audio track and assign it our audio source.
client.add_custom_audio_track("pipecat", audio_source)
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
try:
# Just write one second of audio until we have read all the file.
chunk_size = sample_rate * channels * 2
while len(raw_bytes) > 0:
chunk = raw_bytes[:chunk_size]
raw_bytes = raw_bytes[chunk_size:]
audio_source.write_frames(chunk)
except KeyboardInterrupt:
client.leave()
# Just sleep for a second. To do this well we should really use completions.
time.sleep(1)
client.release()

View File

@@ -0,0 +1,173 @@
<html>
<head>
<title>daily custom tracks</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: false,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,2 @@
pydub
pipecat-ai[daily]

View File

@@ -1,7 +1,12 @@
FROM python:3.10-bullseye
RUN mkdir /app
RUN mkdir /app/assets
RUN mkdir /app/utils
COPY *.py /app/
COPY requirements.txt /app/
WORKDIR /app
RUN pip3 install -r requirements.txt

View File

@@ -0,0 +1,39 @@
# Daily Multi Translation
This example shows how to use Daily to stream multiple simultaneous translations using a single transport. Daily provides custom tracks and in this example we will simultaneously translate incoming audio in English to Spanish, French and German, each of them being sent to a custom track.
## Get started
```python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env # and add your credentials
```
## Run the server
```bash
python server.py
```
Then, visit `http://localhost:7860/` in your browser. This will open a Daily Prebuilt room where you will speak in English (make sure you are not muted).
## Open client
Next, you need to open the client that will listen to the translations.
```bash
open index.html
```
Once the client is opened, copy the URL of the Daily room created above and join it. You should be able to select which translation you want to hear.
## Build and test the Docker image
```
docker build -t daily-multi-translation .
docker run --env-file .env -p 7860:7860 daily-multi-translation
```

View File

@@ -0,0 +1,165 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure
from pipecat.audio.mixers.soundfile_mixer import SoundfileMixer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
BACKGROUND_SOUND_FILE = "office-ambience-mono-16000.mp3"
async def main():
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Multi translation bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_out_mixer={
"spanish": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"french": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
"german": SoundfileMixer(
sound_files={"office": BACKGROUND_SOUND_FILE}, default_sound="office"
),
},
audio_out_destinations=["spanish", "french", "german"],
microphone_out_enabled=False, # Disable since we just use custom tracks
vad_analyzer=SileroVADAnalyzer(),
),
)
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts_spanish = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="cefcb124-080b-4655-b31f-932f3ee743de",
transport_destination="spanish",
)
tts_french = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="8832a0b5-47b2-4751-bb22-6a8e2149303d",
transport_destination="french",
)
tts_german = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="38aabb6a-f52b-4fb0-a3d1-988518f4dc06",
transport_destination="german",
)
messages_spanish = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into Spanish.",
},
]
messages_french = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into French.",
},
]
messages_german = [
{
"role": "system",
"content": "You will be provided with a sentence in English, and your task is to only translate it into German.",
},
]
llm_spanish = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_french = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm_german = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
context_spanish = OpenAILLMContext(messages_spanish)
context_aggregator_spanish = llm_spanish.create_context_aggregator(context_spanish)
context_french = OpenAILLMContext(messages_french)
context_aggregator_french = llm_french.create_context_aggregator(context_french)
context_german = OpenAILLMContext(messages_german)
context_aggregator_german = llm_german.create_context_aggregator(context_german)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
ParallelPipeline(
# Spanish pipeline.
[
context_aggregator_spanish.user(),
llm_spanish,
tts_spanish,
context_aggregator_spanish.assistant(),
],
# French pipeline.
[
context_aggregator_french.user(),
llm_french,
tts_french,
context_aggregator_french.assistant(),
],
# German pipeline.
[
context_aggregator_german.user(),
llm_german,
tts_german,
context_aggregator_german.assistant(),
],
),
transport.output(), # Transport bot output
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
audio_in_sample_rate=16000,
audio_out_sample_rate=16000,
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
report_only_initial_ttfb=True,
),
observers=[TranscriptionLogObserver()],
)
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,6 +1,5 @@
DAILY_SAMPLE_ROOM_URL=https://yourdomain.daily.co/yourroom # (for joining the bot to the same room repeatedly for local dev)
DAILY_API_KEY=7df...
OPENAI_API_KEY=sk-PL...
ELEVENLABS_API_KEY=aeb...
CANONICAL_API_KEY=can...
CANONICAL_API_URL=
DEEPGRAM_API_KEY=efb...
CARTESIA_API_KEY=aeb...

View File

@@ -0,0 +1,202 @@
<html>
<head>
<title>daily multi translation</title>
</head>
<script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous"
></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.js"></script>
<link
rel="stylesheet"
type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/fomantic-ui/2.8.6/semantic.min.css"
/>
<script>
function enableButton(buttonId, enable) {
const button = document.getElementById(buttonId);
button.disabled = !enable;
}
function enableJoinButton(enable) {
enableButton("join-button", enable);
}
function enableLeaveButton(enable) {
enableButton("leave-button", enable);
}
function destroyPlayers(query) {
const items = document.querySelectorAll(query);
if (items) {
for (const item of items) {
item.remove();
}
}
}
function destroyParticipantPlayers(participantId) {
destroyPlayers(`video[data-participant-id="${participantId}"]`);
destroyPlayers(`audio[data-participant-id="${participantId}"]`);
destroyPlayers(`button[data-participant-id="${participantId}"]`);
}
async function startPlayer(player, track) {
player.muted = false;
player.autoplay = true;
if (track != null) {
player.srcObject = new MediaStream([track]);
}
}
async function buildVideoPlayer(track, participantId) {
const videoContainer = document.getElementById("video-container");
const player = document.createElement("video");
player.dataset.participantId = participantId;
videoContainer.appendChild(player);
await startPlayer(player, track);
await player.play();
return player;
}
async function buildAudioPlayer(track, participantId) {
const audioContainer = document.getElementById("audio-container");
const player = document.createElement("audio");
player.dataset.participantId = participantId;
// Create a new button for controlling audio
const audioControlButton = document.createElement("button");
audioControlButton.className = "ui primary green button"
audioControlButton.innerText = track._mediaTag == "cam-audio" ? "english" : track._mediaTag;
audioControlButton.dataset.participantId = participantId;
audioControlButton.onclick = () => {
if (player.paused) {
player.play();
audioControlButton.className = "ui primary red button"
} else {
player.pause();
audioControlButton.className = "ui primary green button"
}
};
audioContainer.appendChild(player);
audioContainer.appendChild(audioControlButton);
await startPlayer(player, track);
player.pause()
return player;
}
function subscribeToTracks(participantId) {
console.log(`subscribing to track`);
if (participantId === "local") {
return;
}
callObject.updateParticipant(participantId, {
setSubscribedTracks: {
audio: true,
video: true,
custom: true,
},
});
}
function startDaily() {
enableJoinButton(true);
enableLeaveButton(false);
window.callObject = window.DailyIframe.createCallObject({});
callObject.on("participant-joined", (e) => {
if (!e.participant.local) {
console.log("participant-joined", e.participant);
subscribeToTracks(e.participant.session_id);
}
});
callObject.on("participant-left", (e) => {
console.log("participant-left", e.participant.session_id);
destroyParticipantPlayers(e.participant.session_id);
});
callObject.on("track-started", async (e) => {
console.log("track-started", e.track);
if (e.track.kind === "video") {
await buildVideoPlayer(e.track, e.participant.session_id);
} else if (e.track.kind === "audio") {
await buildAudioPlayer(e.track, e.participant.session_id);
}
});
}
async function joinRoom() {
enableJoinButton(false);
enableLeaveButton(true);
const meetingUrl = document.getElementById("meeting-url").value;
callObject.join({
url: meetingUrl,
startVideoOff: true,
startAudioOff: true,
subscribeToTracksAutomatically: false,
receiveSettings: {
base: { video: { layer: 0 } },
},
});
}
async function leaveRoom() {
enableJoinButton(true);
enableLeaveButton(false);
callObject.leave();
const videoContainer = document.getElementById("video-container");
videoContainer.replaceChildren();
const audioContainer = document.getElementById("audio-container");
audioContainer.replaceChildren();
}
</script>
<body onload="startDaily()">
<div class="ui centered page grid" style="margin-top: 30px">
<div class="ten wide column">
<div class="ui form" style="margin-top: 30px">
<div class="field">
<label>Meeting URL</label>
<input id="meeting-url" value="" />
</div>
</div>
</div>
</div>
<div class="ui centered aligned header" style="margin-top: 30px">
<button id="join-button" class="ui primary button" onclick="joinRoom()">
Join
</button>
<button id="leave-button" class="ui button" onclick="leaveRoom()">
Leave
</button>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="audio-container"></div><br/>
</div>
</div>
<div id="tile" class="ui container" style="margin-top: 30px">
<div id="tile" class="ui center aligned grid">
<div id="video-container" class="ui segment"></div>
</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,5 @@
aiofiles
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[daily,deepgram,openai,silero,cartesia]

View File

@@ -0,0 +1,55 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

View File

@@ -41,8 +41,7 @@ async def main(room_url: str, token: str):
api_key=daily_api_key,
audio_in_enabled=True,
audio_out_enabled=True,
camera_out_enabled=False,
vad_enabled=True,
video_out_enabled=False,
vad_analyzer=SileroVADAnalyzer(),
transcription_enabled=True,
),
@@ -53,7 +52,7 @@ async def main(room_url: str, token: str):
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{

View File

@@ -32,9 +32,9 @@ async def main(room_url: str, token: str):
token,
"bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
@@ -43,7 +43,7 @@ async def main(room_url: str, token: str):
api_key=os.getenv("CARTESIA_API_KEY", ""), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{

View File

@@ -1,5 +1,4 @@
python-dotenv==1.0.1
modal==0.71.3
pipecat-ai[daily,silero,cartesia,openai]==0.0.52
pipecat-ai[daily,silero,cartesia,openai]
fastapi==0.115.6
aiohttp==3.11.11

View File

@@ -141,6 +141,7 @@ async def dial(request: RoomRequest, raw_request: Request):
"display_name": request.From,
"sip_mode": "dial-in",
"num_endpoints": 2 if request.call_transfer is not None else 1,
"codecs": {"audio": ["OPUS"]},
}
daily_room_properties["sip"] = sip_config

View File

@@ -103,6 +103,7 @@ export default async function handler(req, res) {
display_name: From,
sip_mode: 'dial-in',
num_endpoints: call_transfer !== null ? 2 : 1,
codecs: {"audio": ["OPUS"]},
};
daily_room_properties.sip = sip_config;
}
@@ -172,4 +173,4 @@ export const config = {
sizeLimit: '1mb',
},
},
};
};

View File

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import aiohttp
@@ -21,47 +22,26 @@ from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
if LOCAL_RUN:
import asyncio
import webbrowser
try:
from local_runner import configure
except ImportError:
logger.error("Could not import local_runner module. Local development mode may not work.")
# Load environment variables
load_dotenv(override=True)
# Check if we're in local development mode
LOCAL_RUN = os.getenv("LOCAL_RUN")
async def main(room_url: str, token: str):
async def main(transport: DailyTransport):
"""Main pipeline setup and execution function.
Args:
room_url: The Daily room URL
token: The Daily room token
transport: The DailyTransport object for the bot
"""
logger.debug("Starting bot in room: {}", room_url)
transport = DailyTransport(
room_url,
token,
"bot",
DailyParams(
audio_out_enabled=True,
transcription_enabled=True,
vad_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
logger.debug("Starting bot")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22"
api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
@@ -126,10 +106,25 @@ async def bot(args: DailySessionArguments):
body: The configuration object from the request body
session_id: The session ID for logging
"""
from pipecat.audio.filters.krisp_filter import KrispFilter
logger.info(f"Bot process initialized {args.room_url} {args.token}")
transport = DailyTransport(
args.room_url,
args.token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_in_filter=None if LOCAL_RUN else KrispFilter(),
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
try:
await main(args.room_url, args.token)
await main(transport)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
@@ -137,18 +132,27 @@ async def bot(args: DailySessionArguments):
# Local development functions
async def local_main():
async def local_daily():
"""Function for local development testing."""
from local_runner import configure
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
logger.warning("_")
logger.warning("_")
logger.warning(f"Talk to your voice agent here: {room_url}")
logger.warning("_")
logger.warning("_")
webbrowser.open(room_url)
await main(room_url, token)
transport = DailyTransport(
room_url,
token,
"Pipecat Bot",
DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
transcription_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
)
await main(transport)
except Exception as e:
logger.exception(f"Error in local development mode: {e}")
@@ -156,6 +160,6 @@ async def local_main():
# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
try:
asyncio.run(local_main())
asyncio.run(local_daily())
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")

View File

@@ -1,2 +1,4 @@
CARTESIA_API_KEY=
OPENAI_API_KEY=
OPENAI_API_KEY=
# Local dev only
DAILY_API_KEY=

View File

@@ -7,6 +7,7 @@
import os
import aiohttp
from fastapi import HTTPException
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomParams

View File

@@ -1,6 +1,8 @@
agent_name = "my-first-agent"
image = "your-username/my-first-agent:0.1"
image_credentials = "your-dockerhub-creds"
secret_set = "my-first-agent-secrets"
enable_krisp = true
[scaling]
min_instances = 0

51
examples/fal-smart-turn/.gitignore vendored Normal file
View File

@@ -0,0 +1,51 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
.installed.cfg
*.egg
.pytest_cache/
.coverage
.coverage.*
.env
.venv
env/
venv/
ENV/
.mypy_cache/
.dmypy.json
dmypy.json
# JavaScript/Node.js
node_modules/
dist/
dist-ssr/
*.local
.env.local
.env.development.local
.env.test.local
.env.production.local
# Logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
# Editor/IDE
.vscode/*
!.vscode/extensions.json
.idea/
*.swp
*.swo
.DS_Store
# Project specific
runpod.toml

View File

@@ -0,0 +1,152 @@
# Smart Turn Detection Demo
This demo showcases Pipecat's Smart Turn Detection feature - an advanced conversational turn detection system that uses machine learning to identify when a speaker has finished their turn in a conversation. Unlike basic Voice Activity Detection (VAD) which only detects speech vs. silence, Smart Turn detects natural conversational cues like intonation patterns, pacing, and linguistic signals.
This demo uses the [pipecat-ai/smart-turn](https://huggingface.co/pipecat-ai/smart-turn) model - an open-source, community-driven conversational turn detection model designed to provide more natural turn-taking in voice interactions. The model is being hosted on Fal's infrastructure for GPU acceleration, offering inference times between 50-60ms.
In the client UI, you can see the transcription messages along with the smart-turn model's prediction results in real-time.
## Try the demo
Try the hosted version of the demo here: https://pcc-smart-turn.vercel.app/.
## Run the demo locally
### Run the Server
1. Set up and activate your virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create your .env file and set your env vars:
```bash
cp env.example .env
```
Keys to provide:
- GOOGLE_API_KEY
- CARTESIA_API_KEY
- DEEPGRAM_API_KEY
- DAILY_API_KEY
- FAL_SMART_TURN_API_KEY
4. Run the server:
```bash
LOCAL=1 python server.py
```
### Run the client
1. Open a new terminal and navigate to the client directory:
```bash
cd client
```
2. Install dependencies:
```bash
npm install
```
3. Create your .env.local file:
```bash
cp env.local.example .env.local
```
> Note: No keys need to be modified. `NEXT_PUBLIC_API_BASE_URL` is already configured for local use.
4. Start the development server:
```bash
npm run dev
```
5. Open [http://localhost:3000](http://localhost:3000) in your browser.
## Deploy the app
### Deploy the server to Pipecat Cloud
1. Navigate to server
```bash
cd server
```
2. You should already have a .env set up from running locally. If not, do that now.
3. Update your build and deploy scripts.
- In build.sh, set `DOCKER_USERNAME` and `AGENT_NAME`.
- In pcc-deploy.toml, set `image`, which specifies where your Docker image is stored.
4. Build your Docker image by running the build script:
```bash
./build.sh
```
> Note: This builds, tags and pushes your docker image and assumes Docker Hub is the container registry.
5. Make sure you have the Pipecat Cloud CLI installed:
```bash
pip install pipecatcloud
```
6. Login via the Pipecat Cloud CLI:
```bash
pcc auth login
```
> Note: If you don't have an account, sign up at https://pipecat.daily.co.
7. Add a secrets set:
```bash
pcc secrets set pcc-smart-turn-secrets --file .env
```
8. Deploy your agent:
```bash
pcc deploy
```
> Note: This uses your pcc-deploy.toml settings. Modify as needed.
### Deploy the client to Vercel
This project uses TypeScript, React, and Next.js, making it a perfect fit for [Vercel](https://vercel.com/).
- In your client directory, install Vercel's CLI tool: `npm install -g vercel`
- Verify it's installed using `vercel --version`
- Log in your Vercel account using `vercel login`
- Deploy your client to Vercel using `vercel`
Follow the vercel prompts to deploy your project.
### Test your deployed app
Now with the client and server deployed, you can join the call using your Vercel URL.
See the debug information for the Smart Turn data. It prints a log line for each smart-turn inference:
```
Smart Turn: COMPLETE, Probability: 95.3%, Model inference: 65.23ms, Server processing: 82.09ms, End-to-end: 245.43ms
```

View File

@@ -0,0 +1,41 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/versions
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# env files (can opt-in for committing if needed)
.env*
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts

View File

@@ -0,0 +1,3 @@
NEXT_PUBLIC_API_BASE_URL=http://localhost:7860
PIPECAT_CLOUD_API_KEY=
AGENT_NAME=pcc-smart-turn

View File

@@ -0,0 +1,16 @@
import { dirname } from "path";
import { fileURLToPath } from "url";
import { FlatCompat } from "@eslint/eslintrc";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const compat = new FlatCompat({
baseDirectory: __dirname,
});
const eslintConfig = [
...compat.extends("next/core-web-vitals", "next/typescript"),
];
export default eslintConfig;

View File

@@ -0,0 +1,7 @@
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
/* config options here */
};
export default nextConfig;

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,28 @@
{
"name": "my-nextjs-app",
"version": "0.1.0",
"private": true,
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "next lint"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/client-react": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10",
"next": "15.3.1",
"react": "^19.0.0",
"react-dom": "^19.0.0"
},
"devDependencies": {
"@eslint/eslintrc": "^3",
"@types/node": "^20",
"@types/react": "^19",
"@types/react-dom": "^19",
"eslint": "^9",
"eslint-config-next": "15.2.3",
"typescript": "^5"
}
}

View File

@@ -0,0 +1,7 @@
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M3.3088 5.05615C3.64682 4.92779 4.02833 5.02411 4.26653 5.29797L7.36884 8.86461H16.6312L19.7335 5.29797C19.9717 5.02411 20.3532 4.92779 20.6912 5.05615C21.0292 5.18452 21.253 5.51072 21.253 5.87504V13.75H24V15.5H19.5181V8.19909L17.6762 10.3167C17.5115 10.506 17.2738 10.6146 17.0241 10.6146H6.9759C6.72616 10.6146 6.48854 10.506 6.32383 10.3167L4.48193 8.19909V15.5H0V13.75H2.74699V5.87504C2.74699 5.51072 2.97078 5.18452 3.3088 5.05615Z" fill="black"/>
<path d="M19.5181 17.25H24V19H19.5181V17.25Z" fill="black"/>
<path d="M0 17.25H4.48193V19H0V17.25Z" fill="black"/>
<path d="M9.25301 14.3333C9.25301 14.9777 8.73517 15.5 8.09639 15.5C7.4576 15.5 6.93976 14.9777 6.93976 14.3333C6.93976 13.689 7.4576 13.1667 8.09639 13.1667C8.73517 13.1667 9.25301 13.689 9.25301 14.3333Z" fill="black"/>
<path d="M17.0602 14.3333C17.0602 14.9777 16.5424 15.5 15.9036 15.5C15.2648 15.5 14.747 14.9777 14.747 14.3333C14.747 13.689 15.2648 13.1667 15.9036 13.1667C16.5424 13.1667 17.0602 13.689 17.0602 14.3333Z" fill="black"/>
</svg>

After

Width:  |  Height:  |  Size: 1.1 KiB

View File

@@ -0,0 +1,44 @@
import { NextResponse, NextRequest } from 'next/server';
export async function POST(request: NextRequest) {
const { MY_CUSTOM_DATA } = await request.json();
try {
const response = await fetch(
`https://api.pipecat.daily.co/v1/public/${process.env.AGENT_NAME}/start`,
{
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.PIPECAT_CLOUD_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
// Create Daily room
createDailyRoom: true,
// Optionally set Daily room properties
dailyRoomProperties: { start_video_off: true },
// Optionally pass custom data to the bot
body: { MY_CUSTOM_DATA },
}),
}
);
if (!response.ok) {
throw new Error(`API responded with status: ${response.status}`);
}
const data = await response.json();
// Transform the response to match what RTVI client expects
return NextResponse.json({
room_url: data.dailyRoom,
token: data.dailyToken,
});
} catch (error) {
console.error('API error:', error);
return NextResponse.json(
{ error: 'Failed to start agent' },
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,82 @@
body {
margin: 0;
padding: 20px;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
}
.app {
max-width: 1200px;
margin: 0 auto;
}
.status-bar {
display: flex;
justify-content: space-between;
align-items: center;
padding: 10px;
background-color: #fff;
border-radius: 8px;
margin-bottom: 20px;
}
.controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
border-radius: 4px;
cursor: pointer;
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.connect-btn {
background-color: #4caf50;
color: white;
}
.disconnect-btn {
background-color: #f44336;
color: white;
}
.main-content {
background-color: #fff;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.bot-container {
display: flex;
flex-direction: column;
align-items: center;
}
.video-container {
width: 640px;
height: 360px;
background-color: #ddd;
margin-bottom: 20px;
border-radius: 8px;
overflow: hidden;
}
.video-container video {
width: 100%;
height: 100%;
object-fit: cover;
}
.mic-enabled {
background-color: #4caf50;
color: white;
}
.mic-disabled {
background-color: #f44336;
color: white;
}

View File

@@ -0,0 +1,27 @@
import './globals.css';
import { RTVIProvider } from '@/providers/RTVIProvider';
export const metadata = {
title: 'Pipecat React Client',
description: 'Pipecat RTVI Client using Next.js',
icons: {
icon: [{ url: '/favicon.svg', type: 'image/svg+xml' }],
},
};
export default function RootLayout({
children,
}: {
children: React.ReactNode;
}) {
return (
<html lang="en">
<head>
<link rel="icon" href="/favicon.svg" type="image/svg+xml" />
</head>
<body>
<RTVIProvider>{children}</RTVIProvider>
</body>
</html>
);
}

View File

@@ -0,0 +1,41 @@
'use client';
import {
RTVIClientAudio,
RTVIClientVideo,
useRTVIClientTransportState,
} from '@pipecat-ai/client-react';
import { ConnectButton } from '../components/ConnectButton';
import { StatusDisplay } from '../components/StatusDisplay';
import { DebugDisplay } from '../components/DebugDisplay';
function BotVideo() {
const transportState = useRTVIClientTransportState();
const isConnected = transportState !== 'disconnected';
return (
<div className="bot-container">
<div className="video-container">
{isConnected && <RTVIClientVideo participant="bot" fit="cover" />}
</div>
</div>
);
}
export default function Home() {
return (
<div className="app">
<div className="status-bar">
<StatusDisplay />
<ConnectButton />
</div>
<div className="main-content">
<BotVideo />
</div>
<DebugDisplay />
<RTVIClientAudio />
</div>
);
}

View File

@@ -0,0 +1,40 @@
import {
useRTVIClient,
useRTVIClientTransportState,
} from '@pipecat-ai/client-react';
export function ConnectButton() {
const client = useRTVIClient();
const transportState = useRTVIClientTransportState();
const isConnected = ['connected', 'ready'].includes(transportState);
const handleClick = async () => {
if (!client) {
console.error('RTVI client is not initialized');
return;
}
try {
if (isConnected) {
await client.disconnect();
} else {
await client.connect();
}
} catch (error) {
console.error('Connection error:', error);
}
};
return (
<div className="controls">
<button
className={isConnected ? 'disconnect-btn' : 'connect-btn'}
onClick={handleClick}
disabled={
!client || ['connecting', 'disconnecting'].includes(transportState)
}>
{isConnected ? 'Disconnect' : 'Connect'}
</button>
</div>
);
}

View File

@@ -0,0 +1,26 @@
.debug-panel {
background-color: #fff;
border-radius: 8px;
padding: 20px;
}
.debug-panel h3 {
margin: 0 0 10px 0;
font-size: 16px;
font-weight: bold;
}
.debug-log {
height: 200px;
overflow-y: auto;
background-color: #f8f8f8;
padding: 10px;
border-radius: 4px;
font-family: monospace;
font-size: 12px;
line-height: 1.4;
}
.debug-log div {
margin-bottom: 4px;
}

View File

@@ -0,0 +1,171 @@
import { useRef, useCallback } from 'react';
import {
Participant,
RTVIEvent,
TransportState,
TranscriptData,
BotLLMTextData,
} from '@pipecat-ai/client-js';
import { useRTVIClient, useRTVIClientEvent } from '@pipecat-ai/client-react';
import './DebugDisplay.css';
interface SmartTurnResultData {
type: 'smart_turn_result';
is_complete: boolean;
probability: number;
inference_time_ms: number; // Pure model inference time
server_total_time_ms: number; // Server processing time
e2e_processing_time_ms: number; // Complete end-to-end time
}
export function DebugDisplay() {
const debugLogRef = useRef<HTMLDivElement>(null);
const client = useRTVIClient();
const log = useCallback((message: string) => {
if (!debugLogRef.current) return;
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
// Add styling based on message type
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3'; // blue for user
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50'; // green for bot
} else if (message.includes('Smart Turn:')) {
entry.style.color = '#9C27B0'; // purple for smart turn
}
debugLogRef.current.appendChild(entry);
debugLogRef.current.scrollTop = debugLogRef.current.scrollHeight;
}, []);
// Log transport state changes
useRTVIClientEvent(
RTVIEvent.TransportStateChanged,
useCallback(
(state: TransportState) => {
log(`Transport state changed: ${state}`);
},
[log]
)
);
// Log bot connection events
useRTVIClientEvent(
RTVIEvent.BotConnected,
useCallback(
(participant?: Participant) => {
log(`Bot connected: ${JSON.stringify(participant)}`);
},
[log]
)
);
useRTVIClientEvent(
RTVIEvent.BotDisconnected,
useCallback(
(participant?: Participant) => {
log(`Bot disconnected: ${JSON.stringify(participant)}`);
},
[log]
)
);
// Log track events
useRTVIClientEvent(
RTVIEvent.TrackStarted,
useCallback(
(track: MediaStreamTrack, participant?: Participant) => {
log(
`Track started: ${track.kind} from ${participant?.name || 'unknown'}`
);
},
[log]
)
);
useRTVIClientEvent(
RTVIEvent.TrackStopped,
useCallback(
(track: MediaStreamTrack, participant?: Participant) => {
log(
`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`
);
},
[log]
)
);
// Log bot ready state and check tracks
useRTVIClientEvent(
RTVIEvent.BotReady,
useCallback(() => {
log(`Bot ready`);
if (!client) return;
const tracks = client.tracks();
log(
`Available tracks: ${JSON.stringify({
local: {
audio: !!tracks.local.audio,
video: !!tracks.local.video,
},
bot: {
audio: !!tracks.bot?.audio,
video: !!tracks.bot?.video,
},
})}`
);
}, [client, log])
);
// Log transcripts
useRTVIClientEvent(
RTVIEvent.UserTranscript,
useCallback(
(data: TranscriptData) => {
// Only log final transcripts
if (data.final) {
log(`User: ${data.text}`);
}
},
[log]
)
);
useRTVIClientEvent(
RTVIEvent.BotTranscript,
useCallback(
(data: BotLLMTextData) => {
log(`Bot: ${data.text}`);
},
[log]
)
);
useRTVIClientEvent(
RTVIEvent.ServerMessage,
useCallback(
(data: SmartTurnResultData) => {
log(
`Smart Turn:
${data.is_complete ? 'COMPLETE' : 'INCOMPLETE'},
Probability: ${(data.probability * 100).toFixed(1)}%,
Model inference: ${data.inference_time_ms?.toFixed(2) || 'N/A'}ms,
Server processing: ${data.server_total_time_ms?.toFixed(2) || 'N/A'}ms,
End-to-end: ${data.e2e_processing_time_ms?.toFixed(2) || 'N/A'}ms`
);
},
[log]
)
);
return (
<div className="debug-panel">
<h3>Debug Info</h3>
<div ref={debugLogRef} className="debug-log" />
</div>
);
}

View File

@@ -0,0 +1,11 @@
import { useRTVIClientTransportState } from '@pipecat-ai/client-react';
export function StatusDisplay() {
const transportState = useRTVIClientTransportState();
return (
<div className="status">
Status: <span>{transportState}</span>
</div>
);
}

View File

@@ -0,0 +1,43 @@
'use client';
import { RTVIClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { RTVIClientProvider } from '@pipecat-ai/client-react';
import { PropsWithChildren, useEffect, useState } from 'react';
// Get the API base URL from environment variables
// Default to "/api" if not specified
// "/api" is the default for Next.js API routes and used
// for the Pipecat Cloud deployed agent
const API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || '/api';
console.log('Using API base URL:', API_BASE_URL);
export function RTVIProvider({ children }: PropsWithChildren) {
const [client, setClient] = useState<RTVIClient | null>(null);
useEffect(() => {
const transport = new DailyTransport();
const rtviClient = new RTVIClient({
transport,
params: {
baseUrl: API_BASE_URL,
endpoints: {
connect: '/connect',
},
requestData: { foo: 'bar' },
},
enableMic: true,
enableCam: false,
});
setClient(rtviClient);
}, []);
if (!client) {
return null;
}
return <RTVIClientProvider client={client}>{children}</RTVIClientProvider>;
}

View File

@@ -0,0 +1,28 @@
{
"compilerOptions": {
"target": "ES2017",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/components/*": ["./src/components/*"],
"@/providers/*": ["./src/providers/*"]
}
},
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
"exclude": ["node_modules"]
}

View File

@@ -0,0 +1,8 @@
FROM dailyco/pipecat-base:latest
COPY ./requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt
COPY ./assets assets
COPY ./bot.py bot.py

Binary file not shown.

After

Width:  |  Height:  |  Size: 759 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 884 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 876 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 874 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 882 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 885 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 888 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 890 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 898 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 836 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 908 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 905 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 903 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 849 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 866 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 864 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 858 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 875 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 881 KiB

View File

@@ -0,0 +1,299 @@
#
# Copyright (c) 2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from PIL import Image
from pipecatcloud.agent import DailySessionArguments
from pipecat.audio.turn.smart_turn.fal_smart_turn import FalSmartTurnAnalyzer
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import (
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
Frame,
MetricsFrame,
OutputImageRawFrame,
SpriteFrame,
)
from pipecat.metrics.metrics import SmartTurnMetricsData
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi import (
RTVIConfig,
RTVIObserver,
RTVIProcessor,
RTVIServerMessageFrame,
)
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
# Check if we're in local development mode
LOCAL = os.getenv("LOCAL")
logger.remove()
logger.add(sys.stderr, level="DEBUG")
sprites = []
script_dir = os.path.dirname(__file__)
# Load sequential animation frames
for i in range(1, 26):
# Build the full path to the image file
full_path = os.path.join(script_dir, f"assets/robot0{i}.png")
# Get the filename without the extension to use as the dictionary key
# Open the image and convert it to bytes
with Image.open(full_path) as img:
sprites.append(OutputImageRawFrame(image=img.tobytes(), size=img.size, format=img.format))
# Create a smooth animation by adding reversed frames
flipped = sprites[::-1]
sprites.extend(flipped)
# Define static and animated states
quiet_frame = sprites[0] # Static frame for when bot is listening
talking_frame = SpriteFrame(images=sprites) # Animation sequence for when bot is talking
class TalkingAnimation(FrameProcessor):
"""Manages the bot's visual animation states.
Switches between static (listening) and animated (talking) states based on
the bot's current speaking status.
"""
def __init__(self):
super().__init__()
self._is_talking = False
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and update animation state.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Switch to talking animation when bot starts speaking
if isinstance(frame, BotStartedSpeakingFrame):
if not self._is_talking:
await self.push_frame(talking_frame)
self._is_talking = True
# Return to static frame when bot stops speaking
elif isinstance(frame, BotStoppedSpeakingFrame):
await self.push_frame(quiet_frame)
self._is_talking = False
await self.push_frame(frame, direction)
class SmartTurnMetricsProcessor(FrameProcessor):
"""Processes the metrics data from Smart Turn Analyzer.
This processor is responsible for handling smart turn metrics data
and forwarding it to the client UI via RTVI.
"""
async def process_frame(self, frame: Frame, direction: FrameDirection):
"""Process incoming frames and handle Smart Turn metrics.
Args:
frame: The incoming frame to process
direction: The direction of frame flow in the pipeline
"""
await super().process_frame(frame, direction)
# Handle Smart Turn metrics
if isinstance(frame, MetricsFrame):
for metrics in frame.data:
if isinstance(metrics, SmartTurnMetricsData):
logger.info(f"Smart Turn metrics: {metrics}")
# Create a payload with the smart turn prediction data
smart_turn_data = {
"type": "smart_turn_result",
"is_complete": metrics.is_complete,
"probability": metrics.probability,
"inference_time_ms": metrics.inference_time_ms,
"server_total_time_ms": metrics.server_total_time_ms,
"e2e_processing_time_ms": metrics.e2e_processing_time_ms,
}
# Send the data to the client via RTVI
rtvi_frame = RTVIServerMessageFrame(data=smart_turn_data)
await self.push_frame(rtvi_frame)
await self.push_frame(frame, direction)
async def main(transport: DailyTransport):
# Configure your STT, LLM, and TTS services here
# Swap out different processors or properties to customize your bot
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
# Set up the initial context for the conversation
# You can specified initial system and assistant messages here
messages = [
{
"role": "system",
"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
},
]
# This sets up the LLM context by providing messages and tools
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
ta = TalkingAnimation()
smart_turn_metrics_processor = SmartTurnMetricsProcessor()
# RTVI events for Pipecat client UI
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
# A core voice AI pipeline
# Add additional processors to customize the bot's behavior
pipeline = Pipeline(
[
transport.input(),
rtvi,
smart_turn_metrics_processor,
stt,
context_aggregator.user(),
llm,
tts,
ta,
transport.output(),
context_aggregator.assistant(),
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
allow_interruptions=True,
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[RTVIObserver(rtvi)],
)
@rtvi.event_handler("on_client_ready")
async def on_client_ready(rtvi):
logger.debug("Client ready event received")
await rtvi.set_bot_ready()
# Kick off the conversation
await task.queue_frames([context_aggregator.user().get_context_frame()])
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
logger.info("First participant joined: {}", participant["id"])
# Push a static frame to show the bot is listening
await task.queue_frame(quiet_frame)
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
logger.info("Participant left: {}", participant)
await task.cancel()
runner = PipelineRunner(handle_sigint=False, force_gc=True)
await runner.run(task)
async def bot(args: DailySessionArguments):
"""Main bot entry point compatible with the FastAPI route handler.
Args:
room_url: The Daily room URL
token: The Daily room token
body: The configuration object from the request body
session_id: The session ID for logging
"""
from pipecat.audio.filters.krisp_filter import KrispFilter
logger.info(f"Bot process initialized {args.room_url} {args.token}")
async with aiohttp.ClientSession() as session:
transport = DailyTransport(
args.room_url,
args.token,
"Smart Turn Bot",
params=DailyParams(
audio_in_enabled=True,
audio_in_filter=KrispFilter(),
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=576,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=FalSmartTurnAnalyzer(
api_key=os.getenv("FAL_SMART_TURN_API_KEY"), aiohttp_session=session
),
),
)
try:
await main(transport)
logger.info("Bot process completed")
except Exception as e:
logger.exception(f"Error in bot process: {str(e)}")
raise
# Local development
async def local_daily():
"""Daily transport for local development."""
from runner import configure
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
"Smart Turn Bot",
params=DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=576,
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
turn_analyzer=FalSmartTurnAnalyzer(
api_key=os.getenv("FAL_SMART_TURN_API_KEY"), aiohttp_session=session
),
),
)
await main(transport)
except Exception as e:
logger.exception(f"Error in local development mode: {e}")
# Local development entry point
if LOCAL and __name__ == "__main__":
try:
asyncio.run(local_daily())
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")

View File

@@ -0,0 +1,19 @@
#!/bin/bash
set -e
VERSION="0.1"
DOCKER_USERNAME=""
AGENT_NAME="pcc-smart-turn"
# Build the Docker image with the correct context
echo "Building Docker image..."
docker build --platform=linux/arm64 -t "$DOCKER_USERNAME/$AGENT_NAME:$VERSION" -t "$DOCKER_USERNAME/$AGENT_NAME:latest" .
# Push the Docker images
echo "Pushing Docker image $DOCKER_USERNAME/$AGENT_NAME:$VERSION..."
docker push "$DOCKER_USERNAME/$AGENT_NAME:$VERSION"
echo "Pushing Docker image $DOCKER_USERNAME/$AGENT_NAME:latest..."
docker push "$DOCKER_USERNAME/$AGENT_NAME:latest"
echo "Successfully built and pushed $DOCKER_USERNAME/$AGENT_NAME:$VERSION and $DOCKER_USERNAME/$AGENT_NAME:latest"

View File

@@ -0,0 +1,5 @@
GOOGLE_API_KEY=
CARTESIA_API_KEY=
DEEPGRAM_API_KEY=
DAILY_API_KEY=
FAL_SMART_TURN_API_KEY=

View File

@@ -0,0 +1,7 @@
agent_name = "pcc-smart-turn"
image = "your-username/pcc-smart-turn:0.1"
secret_set = "pcc-smart-turn-secrets"
enable_krisp = true
[scaling]
min_instances = 0

View File

@@ -0,0 +1,3 @@
pipecatcloud
pipecat-ai[google,daily,deepgram,cartesia,silero]
python-dotenv

View File

@@ -0,0 +1,56 @@
#
# Copyright (c) 20242025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import os
import aiohttp
from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper
async def configure(aiohttp_session: aiohttp.ClientSession):
"""Configure the Daily room and Daily REST helper."""
parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")
parser.add_argument(
"-u", "--url", type=str, required=False, help="URL of the Daily room to join"
)
parser.add_argument(
"-k",
"--apikey",
type=str,
required=False,
help="Daily API Key (needed to create an owner token for the room)",
)
args, unknown = parser.parse_known_args()
url = args.url or os.getenv("DAILY_SAMPLE_ROOM_URL")
key = args.apikey or os.getenv("DAILY_API_KEY")
if not url:
raise Exception(
"No Daily room specified. use the -u/--url option from the command line, or set DAILY_SAMPLE_ROOM_URL in your environment to specify a Daily room URL."
)
if not key:
raise Exception(
"No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
)
daily_rest_helper = DailyRESTHelper(
daily_api_key=key,
daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
aiohttp_session=aiohttp_session,
)
# Create a meeting token for the given room with an expiration 1 hour in
# the future.
expiry_time: float = 60 * 60
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)

Some files were not shown because too many files have changed in this diff Show More