Compare commits

...

36 Commits

Author SHA1 Message Date
Mark Backman
d98b6b418d Release prep for 0.0.75 (#2169)
* Update CHANGELOG for 0.0.75 release

* Add RTVI changes to CHANGELOG

* Update CHANGELOG, add deprecation directives to rtvi.py

---------

Co-authored-by: mattie ruth backman <mattieruth@gmail.com>
2025-07-08 15:39:44 -04:00
Mark Backman
deea29b5e8 Merge pull request #2170 from pipecat-ai/mb/update-packages-0.0.75
Package updates to run the release evals
2025-07-08 12:02:22 -07:00
Mark Backman
0bdbc83ed9 Package updates to run the release evals 2025-07-08 11:39:49 -07:00
Mark Backman
6c591f0990 Merge pull request #2167 from pipecat-ai/mb/fix-riva-watchdog
RivaSTTService: reset the watchdog in an async function
2025-07-08 11:29:44 -07:00
Mark Backman
b55b9c257b RivaSTTService: remove reset_watchdog, which is handled in the WatchdogQueue already 2025-07-08 11:19:47 -07:00
Mark Backman
5156c21d14 Merge pull request #2168 from pipecat-ai/mb/fix-neuophonic-tts
Fix: NeuphonicTTSService to use latest websocket API
2025-07-08 11:17:58 -07:00
Mark Backman
a9d824753b Fix: NeuphonicTTSService to use latest websocket API 2025-07-08 11:08:08 -07:00
Filipi da Silva Fuchter
3c6a208101 Merge pull request #2148 from pipecat-ai/filipi/aws_bedrock
Refactoring AWSBedrockLLMService to work async
2025-07-08 12:14:28 -03:00
Mark Backman
b1032a1ca4 Merge pull request #2151 from pipecat-ai/mb/ollama-kwargs
OLLamaLLMService: Pass kwargs
2025-07-08 07:21:09 -07:00
Mark Backman
931f34fccd OLLamaLLMService: Pass kwargs 2025-07-08 07:17:18 -07:00
Mark Backman
f2509adec1 Merge pull request #2162 from pipecat-ai/mb/tts-service-aggregate-sentences
TTS services: Add aggregate_sentences arg
2025-07-08 07:14:58 -07:00
Filipi Fuchter
285b82eb65 Mentioning the AWSBedrockLLMService and AWSPollyTTSService refactors in the changelog. 2025-07-08 07:30:30 -03:00
Filipi Fuchter
74da197304 Refactored AWSBedrockLLMService and AWSPollyTTSService to work asynchronously using aioboto3 instead of the boto3 library. 2025-07-08 07:28:23 -03:00
mattie ruth backman
a6de16f92f Bump all client dependencies to use client-js/react/transports 1.0.0 2025-07-07 15:56:08 -07:00
mattie ruth backman
fc09854d7f fix cam light always on 2025-07-07 15:56:08 -07:00
mattie ruth backman
2959029151 PR Review fixes 2025-07-07 15:56:08 -07:00
mattie ruth backman
e590441b7b Add support for about info in ready messages and add deprecation comments to deprecated types 2025-07-07 15:56:08 -07:00
mattie ruth backman
dc41ec7cb1 Updated all examples with clients to use the new PipecatClient 2025-07-07 15:56:08 -07:00
mattie ruth backman
43049c865c Add support for new RTVI client message protocol: handling and responding 2025-07-07 15:56:08 -07:00
mattie ruth backman
c4a9fc7f88 video-transport typescript formatting 2025-07-07 15:56:08 -07:00
mattie ruth backman
faf4026cf4 Add device controls to the simple chatbot example 2025-07-07 15:56:08 -07:00
Mark Backman
f53f45a6cd TTS services: Add aggregate_sentences arg 2025-07-07 15:38:31 -07:00
Mark Backman
e04e876f44 Merge pull request #2156 from shahrukhx01/add-additional-whisper-models
WhisperSTTService: Add additional whisper model variants
2025-07-07 12:53:06 -07:00
shahrukhx01
a84e7e30da WhisperSTTService: Add additional whisper model variants 2025-07-07 21:43:48 +02:00
Mark Backman
6eed6ff779 Merge pull request #2147 from pipecat-ai/mb/user-idle-long-function-call
UserIdleProcessor: Account for function calls in progress
2025-07-04 14:11:16 -07:00
Mark Backman
1375211610 UserIdleProcessor: Account for function calls in progress 2025-07-04 14:05:05 -07:00
Mark Backman
4e9369a702 Merge pull request #2149 from pipecat-ai/mb/twilio-hang-up-handling 2025-07-04 12:44:17 -07:00
Mark Backman
f9e8748a96 TwilioFrameSerializer: Handle user hanging up before the serializer 2025-07-04 09:42:16 -07:00
Filipi da Silva Fuchter
20d6bf267a Merge pull request #2146 from pipecat-ai/remove_gemini_duplicated_code
Removing duplicated code inside Gemini.
2025-07-04 11:59:10 -03:00
Filipi Fuchter
b573f9dab2 Removing duplicated code inside Gemini. 2025-07-04 10:57:53 -03:00
Mark Backman
dbc76389d8 Merge pull request #2140 from pipecat-ai/mb/fix-26-imports
Fix: missing import in 26f foundational example
2025-07-03 14:12:54 -07:00
Aleix Conchillo Flaqué
c27f838444 Merge pull request #2124 from pipecat-ai/aleix/frame-processor-no-push-queue
FrameProcessor: remove unnecessary push task
2025-07-03 14:03:05 -07:00
Aleix Conchillo Flaqué
ce84485e26 Merge pull request #2142 from pipecat-ai/aleix/publish-workflow-message
github: update publish message to make it clear
2025-07-03 14:02:51 -07:00
Mark Backman
6cf254e2f9 Fix: missing import in 26f foundational example, update twilio transport_params to FastAPIWebsocketParams 2025-07-03 13:58:18 -07:00
Aleix Conchillo Flaqué
02b63c28a5 FrameProcessor: remove unnecessary push task
When we call `FrameProcessor.push_frame()` we end up calling
`FrameProcessor.queue_frame()` on the next or previous processor which already
uses the input queue and guarantees frame ordering. So, there's no need to have
a two queues next to each other.
2025-07-03 13:57:32 -07:00
Aleix Conchillo Flaqué
57c6ce7ffa github: update publish message to make it clear 2025-07-03 13:55:02 -07:00
76 changed files with 6476 additions and 3940 deletions

View File

@@ -5,7 +5,7 @@ on:
inputs:
gitref:
type: string
description: "what git ref to build"
description: "what git tag to build (e.g. v0.0.74)"
required: true
jobs:

View File

@@ -5,6 +5,67 @@ All notable changes to **Pipecat** will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.75] - 2025-07-08
### Added
- Added an `aggregate_sentences` arg in `CartesiaTTSService`,
`ElevenLabsTTSService`, `NeuphonicTTSService` and `RimeTTSService`, where the
default value is True. When `aggregate_sentences` is True, the `TTSService`
aggregates the LLM streamed tokens into sentences by default. Note: setting
the value to False requires a custom processor before the `TTSService` to
aggregate LLM tokens.
- Added `kwargs` to the `OLLamaLLMService` to allow for configuration args to
be passed to Ollama.
- Added call hang-up error handling in `TwilioFrameSerializer`, which handles
the case where the user has hung up before the `TwilioFrameSerializer` hangs
up the call.
### Changed
- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0 protocol.
This includes:
- Deprecating support for all messages related to service configuaration and
actions.
- Adding support for obtaining and logging data about client, including its
RTVI version and optionally included system information (OS/browser/etc.)
- Adding support for handling the new `client-message` RTVI message through
either a `on_client_message` event handler or listening for a new
`RTVIClientMessageFrame`
- Adding support for responding to a `client-message` with a `server-response`
via either a direct call on the `RTVIProcessor` or via pushing a new
`RTVIServerResponseFrame`
- Adding built-in support for handling the new `append-to-context` RTVI message
which allows a client to add to the user or assistant llm context. No extra
code is required for supporting this behavior.
- Updating all JavaScript and React client RTVI examples to use versions 1.0.0
of the clients.
Get started migrating to RTVI protocol 1.0.0 by following the migration guide:
https://docs.pipecat.ai/client/migration-guide
- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work
asynchronously using `aioboto3` instead of the `boto3` library.
- The `UserIdleProcessor` now handles the scenario where function calls take
longer than the idle timeout duration. This allows you to use the
`UserIdleProcessor` in conjunction with function calls that take a while to
return a result.
### Fixed
- Updated the `NeuphonicTTSService` to work with the updated websocket API.
- Fixed an issue with `RivaSTTService` where the watchdog feature was causing
an error on initialization.
### Performance
- Remove unncessary push task in each `FrameProcessor`.
## [0.0.74] - 2025-07-03
### Added

File diff suppressed because it is too large Load Diff

View File

@@ -15,7 +15,7 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10"
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0"
}
}

View File

@@ -5,7 +5,7 @@
*/
/**
* RTVI Client Implementation
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
@@ -16,7 +16,7 @@
* - Browser with WebRTC support
*/
import { RTVIClient, RTVIEvent } from '@pipecat-ai/client-js';
import { PipecatClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
/**
@@ -26,7 +26,7 @@ import { DailyTransport } from '@pipecat-ai/daily-transport';
class ChatbotClient {
constructor() {
// Initialize client state
this.rtviClient = null;
this.pcClient = null;
this.setupDOMElements();
this.initializeClientAndTransport();
this.setupEventListeners();
@@ -59,7 +59,7 @@ class ChatbotClient {
this.disconnectBtn.addEventListener('click', () => this.disconnect());
// Populate device selector
this.rtviClient.getAllMics().then((mics) => {
this.pcClient.getAllMics().then((mics) => {
console.log('Available mics:', mics);
mics.forEach((device) => {
const option = document.createElement('option');
@@ -71,16 +71,16 @@ class ChatbotClient {
this.deviceSelector.addEventListener('change', (event) => {
const selectedDeviceId = event.target.value;
console.log('Selected device ID:', selectedDeviceId);
this.rtviClient.updateMic(selectedDeviceId);
this.pcClient.updateMic(selectedDeviceId);
});
// Handle mic mute/unmute toggle
const micToggleBtn = document.getElementById('mic-toggle-btn');
micToggleBtn.addEventListener('click', () => {
let micEnabled = this.rtviClient.isMicEnabled;
let micEnabled = this.pcClient.isMicEnabled;
micToggleBtn.textContent = micEnabled ? 'Unmute Mic' : 'Mute Mic';
this.rtviClient.enableMic(!micEnabled);
this.pcClient.enableMic(!micEnabled);
// Add logic to mute/unmute the mic
if (micEnabled) {
console.log('Mic muted');
@@ -93,23 +93,12 @@ class ChatbotClient {
}
/**
* Set up the RTVI client and Daily transport
* Set up the Pipecat client and Daily transport
*/
async initializeClientAndTransport() {
// Initialize the RTVI client with a DailyTransport and our configuration
this.rtviClient = new RTVIClient({
// Initialize the Pipecat client with a DailyTransport and our configuration
this.pcClient = new PipecatClient({
transport: new DailyTransport(),
params: {
// REPLACE WITH YOUR MODAL URL ENDPOINT
baseUrl:
'https://<Modal workspace>--pipecat-modal-bot-launcher.modal.run',
endpoints: {
connect: '/connect',
},
requestData: {
bot_name: 'openai',
},
},
enableMic: true, // Enable microphone for user input
enableCam: false,
callbacks: {
@@ -176,8 +165,8 @@ class ChatbotClient {
// Set up listeners for media track events
this.setupTrackListeners();
await this.rtviClient.initDevices();
window.client = this.rtviClient;
await this.pcClient.initDevices();
window.client = this.pcClient;
}
/**
@@ -212,10 +201,10 @@ class ChatbotClient {
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Get current tracks from the client
const tracks = this.rtviClient.tracks();
const tracks = this.pcClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
@@ -231,10 +220,10 @@ class ChatbotClient {
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local) {
if (track.kind === 'audio') {
@@ -253,7 +242,7 @@ class ChatbotClient {
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
if (participant.local) {
this.log('Local mic muted');
return;
@@ -311,21 +300,27 @@ class ChatbotClient {
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
async connect() {
try {
const botSelector = document.getElementById('bot-selector');
const selectedBot = botSelector.value;
this.rtviClient.params.requestData.bot_name = selectedBot;
// Initialize audio/video devices
this.log('Initializing devices...');
await this.rtviClient.initDevices();
await this.pcClient.initDevices();
// Connect to the bot
this.log(`Connecting to bot: ${selectedBot}`);
await this.rtviClient.connect();
await this.pcClient.connect({
// REPLACE WITH YOUR MODAL URL ENDPOINT
endpoint:
'https://<your-workspace>--pipecat-modal-fastapi-app.modal.run/connect',
requestData: {
bot_name: selectedBot,
},
});
this.log('Connection complete');
} catch (error) {
@@ -336,9 +331,9 @@ class ChatbotClient {
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
@@ -350,10 +345,10 @@ class ChatbotClient {
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.rtviClient) {
if (this.pcClient) {
try {
// Disconnect the RTVI client
await this.rtviClient.disconnect();
// Disconnect the Pipecat client
await this.pcClient.disconnect();
// Clean up audio
if (this.botAudio.srcObject) {

View File

@@ -1,2 +1,3 @@
python-dotenv==1.0.1
modal==0.71.3
modal==1.0.5
fastapi[all]

View File

@@ -44,7 +44,7 @@ Try the hosted version of the demo here: https://pcc-smart-turn.vercel.app/.
4. Run the server:
```bash
LOCAL=1 python server.py
LOCAL_RUN=1 python server.py
```
### Run the client

File diff suppressed because it is too large Load Diff

View File

@@ -9,9 +9,9 @@
"lint": "next lint"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/client-react": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10",
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/client-react": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0",
"next": "15.3.1",
"react": "^19.0.0",
"react-dom": "^19.0.0"

View File

@@ -1,5 +1,5 @@
import './globals.css';
import { RTVIProvider } from '@/providers/RTVIProvider';
import { PipecatProvider } from '@/providers/PipecatProvider';
export const metadata = {
title: 'Pipecat React Client',
@@ -20,7 +20,7 @@ export default function RootLayout({
<link rel="icon" href="/favicon.svg" type="image/svg+xml" />
</head>
<body>
<RTVIProvider>{children}</RTVIProvider>
<PipecatProvider>{children}</PipecatProvider>
</body>
</html>
);

View File

@@ -1,22 +1,22 @@
'use client';
import {
RTVIClientAudio,
RTVIClientVideo,
useRTVIClientTransportState,
PipecatClientAudio,
PipecatClientVideo,
usePipecatClientTransportState,
} from '@pipecat-ai/client-react';
import { ConnectButton } from '../components/ConnectButton';
import { StatusDisplay } from '../components/StatusDisplay';
import { DebugDisplay } from '../components/DebugDisplay';
function BotVideo() {
const transportState = useRTVIClientTransportState();
const transportState = usePipecatClientTransportState();
const isConnected = transportState !== 'disconnected';
return (
<div className="bot-container">
<div className="video-container">
{isConnected && <RTVIClientVideo participant="bot" fit="cover" />}
{isConnected && <PipecatClientVideo participant="bot" fit="cover" />}
</div>
</div>
);
@@ -35,7 +35,7 @@ export default function Home() {
</div>
<DebugDisplay />
<RTVIClientAudio />
<PipecatClientAudio />
</div>
);
}

View File

@@ -1,11 +1,17 @@
import {
useRTVIClient,
useRTVIClientTransportState,
usePipecatClient,
usePipecatClientTransportState,
} from '@pipecat-ai/client-react';
// Get the API base URL from environment variables
// Default to "/api" if not specified
// "/api" is the default for Next.js API routes and used
// for the Pipecat Cloud deployed agent
const API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || '/api';
export function ConnectButton() {
const client = useRTVIClient();
const transportState = useRTVIClientTransportState();
const client = usePipecatClient();
const transportState = usePipecatClientTransportState();
const isConnected = ['connected', 'ready'].includes(transportState);
const handleClick = async () => {
@@ -18,7 +24,10 @@ export function ConnectButton() {
if (isConnected) {
await client.disconnect();
} else {
await client.connect();
await client.connect({
endpoint: `${API_BASE_URL}/connect`,
requestData: { foo: 'bar' },
});
}
} catch (error) {
console.error('Connection error:', error);

View File

@@ -6,7 +6,7 @@ import {
TranscriptData,
BotLLMTextData,
} from '@pipecat-ai/client-js';
import { useRTVIClient, useRTVIClientEvent } from '@pipecat-ai/client-react';
import { usePipecatClient, useRTVIClientEvent } from '@pipecat-ai/client-react';
import './DebugDisplay.css';
interface SmartTurnResultData {
@@ -20,7 +20,7 @@ interface SmartTurnResultData {
export function DebugDisplay() {
const debugLogRef = useRef<HTMLDivElement>(null);
const client = useRTVIClient();
const client = usePipecatClient();
const log = useCallback((message: string) => {
if (!debugLogRef.current) return;

View File

@@ -1,7 +1,7 @@
import { useRTVIClientTransportState } from '@pipecat-ai/client-react';
import { usePipecatClientTransportState } from '@pipecat-ai/client-react';
export function StatusDisplay() {
const transportState = useRTVIClientTransportState();
const transportState = usePipecatClientTransportState();
return (
<div className="status">

View File

@@ -0,0 +1,28 @@
'use client';
import { PipecatClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { PipecatClientProvider } from '@pipecat-ai/client-react';
import { PropsWithChildren, useEffect, useState } from 'react';
export function PipecatProvider({ children }: PropsWithChildren) {
const [client, setClient] = useState<PipecatClient | null>(null);
useEffect(() => {
const pcClient = new PipecatClient({
transport: new DailyTransport(),
enableMic: true,
enableCam: false,
});
setClient(pcClient);
}, []);
if (!client) {
return null;
}
return (
<PipecatClientProvider client={client}>{children}</PipecatClientProvider>
);
}

View File

@@ -1,43 +0,0 @@
'use client';
import { RTVIClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { RTVIClientProvider } from '@pipecat-ai/client-react';
import { PropsWithChildren, useEffect, useState } from 'react';
// Get the API base URL from environment variables
// Default to "/api" if not specified
// "/api" is the default for Next.js API routes and used
// for the Pipecat Cloud deployed agent
const API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || '/api';
console.log('Using API base URL:', API_BASE_URL);
export function RTVIProvider({ children }: PropsWithChildren) {
const [client, setClient] = useState<RTVIClient | null>(null);
useEffect(() => {
const transport = new DailyTransport();
const rtviClient = new RTVIClient({
transport,
params: {
baseUrl: API_BASE_URL,
endpoints: {
connect: '/connect',
},
requestData: { foo: 'bar' },
},
enableMic: true,
enableCam: false,
});
setClient(rtviClient);
}, []);
if (!client) {
return null;
}
return <RTVIClientProvider client={client}>{children}</RTVIClientProvider>;
}

View File

@@ -45,7 +45,7 @@ from pipecat.transports.services.daily import DailyParams, DailyTransport
load_dotenv(override=True)
# Check if we're in local development mode
LOCAL = os.getenv("LOCAL")
LOCAL = os.getenv("LOCAL_RUN")
logger.remove()
logger.add(sys.stderr, level="DEBUG")

View File

@@ -35,7 +35,7 @@ transport_params = {
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),

View File

@@ -42,7 +42,7 @@ transport_params = {
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),

View File

@@ -33,7 +33,7 @@ transport_params = {
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),

View File

@@ -55,7 +55,7 @@ transport_params = {
# endpointing, for now.
vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
# set stop_secs to something roughly similar to the internal setting

View File

@@ -18,10 +18,10 @@ from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.gemini_multimodal_live.gemini import (
GeminiMultimodalLiveContext,
GeminiMultimodalLiveLLMService,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
load_dotenv(override=True)

View File

@@ -20,11 +20,10 @@ import {
} from '@pipecat-ai/client-js';
import {
ProtobufFrameSerializer,
WebSocketTransport
} from "@pipecat-ai/websocket-transport";
WebSocketTransport,
} from '@pipecat-ai/websocket-transport';
class RecordingSerializer extends ProtobufFrameSerializer {
private lastTimestamp: number | null = null;
private recordingAudioToSend: boolean = false;
private _recordedAudio: { data: ArrayBuffer; delay: number }[] = [];
@@ -40,7 +39,11 @@ class RecordingSerializer extends ProtobufFrameSerializer {
}
// @ts-ignore
serializeAudio(data: ArrayBuffer, sampleRate: number, numChannels: number): Uint8Array | null {
serializeAudio(
data: ArrayBuffer,
sampleRate: number,
numChannels: number
): Uint8Array | null {
if (this.recordingAudioToSend) {
const now = Date.now();
// Compute delay since last packet
@@ -55,13 +58,13 @@ class RecordingSerializer extends ProtobufFrameSerializer {
}
public get recordedAudio() {
return this._recordedAudio
return this._recordedAudio;
}
}
class WebsocketClientApp {
private ENABLE_RECORDING_MODE = false
private RECORDING_TIME_MS = 10000
private ENABLE_RECORDING_MODE = false;
private RECORDING_TIME_MS = 10000;
private rtviClient: RTVIClient | null = null;
private connectBtn: HTMLButtonElement | null = null;
@@ -71,7 +74,7 @@ class WebsocketClientApp {
private botAudio: HTMLAudioElement;
private declare websocketTransport: WebSocketTransport;
private sendRecordedAudio: boolean = false
private sendRecordedAudio: boolean = false;
private declare recordingSerializer: RecordingSerializer;
private playBtn: HTMLButtonElement | null = null;
@@ -91,8 +94,12 @@ class WebsocketClientApp {
* Set up references to DOM elements and create necessary media elements
*/
private setupDOMElements(): void {
this.connectBtn = document.getElementById('connect-btn') as HTMLButtonElement;
this.disconnectBtn = document.getElementById('disconnect-btn') as HTMLButtonElement;
this.connectBtn = document.getElementById(
'connect-btn'
) as HTMLButtonElement;
this.disconnectBtn = document.getElementById(
'disconnect-btn'
) as HTMLButtonElement;
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
this.playBtn = document.getElementById('play-btn') as HTMLButtonElement;
@@ -105,8 +112,12 @@ class WebsocketClientApp {
private setupEventListeners(): void {
this.connectBtn?.addEventListener('click', () => this.connect());
this.disconnectBtn?.addEventListener('click', () => this.disconnect());
this.playBtn?.addEventListener('click', () => this.startSendingRecordedAudio());
this.stopBtn?.addEventListener('click', () => this.stopSendingRecordedAudio());
this.playBtn?.addEventListener('click', () =>
this.startSendingRecordedAudio()
);
this.stopBtn?.addEventListener('click', () =>
this.stopSendingRecordedAudio()
);
}
/**
@@ -165,7 +176,9 @@ class WebsocketClientApp {
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`);
this.log(
`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`
);
});
}
@@ -175,7 +188,10 @@ class WebsocketClientApp {
*/
private setupAudioTrack(track: MediaStreamTrack): void {
this.log('Setting up audio track');
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
@@ -190,27 +206,17 @@ class WebsocketClientApp {
try {
const startTime = Date.now();
this.recordingSerializer = new RecordingSerializer()
const transport = this.ENABLE_RECORDING_MODE ?
new WebSocketTransport({
serializer: this.recordingSerializer,
recorderSampleRate: 8000,
playerSampleRate:8000
}) :
new WebSocketTransport({
serializer: new ProtobufFrameSerializer(),
recorderSampleRate: 8000,
playerSampleRate:8000
});
this.websocketTransport = transport
this.recordingSerializer = new RecordingSerializer();
const ws_opts = {
serializer: this.ENABLE_RECORDING_MODE
? this.recordingSerializer
: new ProtobufFrameSerializer(),
recorderSampleRate: 8000,
playerSampleRate: 8000,
};
const RTVIConfig: RTVIClientOptions = {
transport,
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:7860',
endpoints: { connect: '/connect' },
},
transport: new WebSocketTransport(ws_opts),
enableMic: true,
enableCam: false,
callbacks: {
@@ -238,27 +244,34 @@ class WebsocketClientApp {
onMessageError: (error) => console.error('Message error:', error),
onError: (error) => console.error('Error:', error),
},
}
};
this.rtviClient = new RTVIClient(RTVIConfig);
this.websocketTransport = this.rtviClient.transport;
this.setupTrackListeners();
this.log('Initializing devices...');
await this.rtviClient.initDevices();
this.log('Connecting to bot...');
await this.rtviClient.connect();
await this.rtviClient.connect({
endpoint: 'http://localhost:7860/connect',
});
const timeTaken = Date.now() - startTime;
this.log(`Connection complete, timeTaken: ${timeTaken}`);
if (this.ENABLE_RECORDING_MODE) {
this.log(`Starting to recording the next ${(this.RECORDING_TIME_MS/1000)}s of audio`);
this.recordingSerializer.startRecording()
await this.sleep(this.RECORDING_TIME_MS)
this.recordingSerializer.stopRecording()
this.log("Recording stopped");
this.rtviClient.enableMic(false)
this.startSendingRecordedAudio()
this.log(
`Starting to recording the next ${
this.RECORDING_TIME_MS / 1000
}s of audio`
);
this.recordingSerializer.startRecording();
await this.sleep(this.RECORDING_TIME_MS);
this.recordingSerializer.stopRecording();
this.log('Recording stopped');
this.rtviClient.enableMic(false);
this.startSendingRecordedAudio();
}
} catch (error) {
this.log(`Error connecting: ${(error as Error).message}`);
@@ -280,11 +293,16 @@ class WebsocketClientApp {
public async disconnect(): Promise<void> {
if (this.rtviClient) {
try {
this.stopSendingRecordedAudio()
this.stopSendingRecordedAudio();
await this.rtviClient.disconnect();
this.rtviClient = null;
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
this.botAudio.srcObject.getAudioTracks().forEach((track) => track.stop());
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
this.botAudio.srcObject
.getAudioTracks()
.forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
} catch (error) {
@@ -294,21 +312,21 @@ class WebsocketClientApp {
}
private startSendingRecordedAudio() {
this.sendRecordedAudio = true
this.sendRecordedAudio = true;
if (this.playBtn) this.playBtn.disabled = true;
if (this.stopBtn) this.stopBtn.disabled = false;
void this.replayAudio()
void this.replayAudio();
}
private stopSendingRecordedAudio() {
if (this.stopBtn) this.stopBtn.disabled = true;
if (this.playBtn) this.playBtn.disabled = false;
this.sendRecordedAudio = false
this.sendRecordedAudio = false;
}
private async replayAudio() {
if (this.sendRecordedAudio) {
this.log("Sending recorded audio")
this.log('Sending recorded audio');
for (const chunk of this.recordingSerializer.recordedAudio) {
await this.sleep(chunk.delay);
this.websocketTransport.handleUserAudioStream(chunk.data);
@@ -316,14 +334,13 @@ class WebsocketClientApp {
const randomDelay = 1000 + Math.random() * (10000 - 500);
await this.sleep(randomDelay);
void this.replayAudio()
void this.replayAudio();
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
return new Promise((resolve) => setTimeout(resolve, ms));
}
}
declare global {

File diff suppressed because it is too large Load Diff

View File

@@ -18,7 +18,7 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.8"
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0"
}
}

View File

@@ -5,7 +5,7 @@
*/
/**
* RTVI Client Implementation
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
@@ -18,20 +18,22 @@
import {
Participant,
RTVIClient,
RTVIClientOptions,
PipecatClient,
PipecatClientOptions,
RTVIEvent,
} from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import {
DailyEventCallbacks,
DailyTransport,
} from '@pipecat-ai/daily-transport';
import SoundUtils from './util/soundUtils';
import { InstantVoiceHelper } from './util/instantVoiceHelper';
/**
* InstantVoiceClient handles the connection and media management for a real-time
* voice and video interaction with an AI bot.
*/
class InstantVoiceClient {
private declare rtviClient: RTVIClient;
private declare pcClient: PipecatClient;
private connectBtn: HTMLButtonElement | null = null;
private disconnectBtn: HTMLButtonElement | null = null;
private statusSpan: HTMLElement | null = null;
@@ -46,7 +48,7 @@ class InstantVoiceClient {
document.body.appendChild(this.botAudio);
this.setupDOMElements();
this.setupEventListeners();
this.initializeRTVIClient();
this.initializePipecatClient();
}
/**
@@ -72,16 +74,11 @@ class InstantVoiceClient {
this.disconnectBtn?.addEventListener('click', () => this.disconnect());
}
private initializeRTVIClient(): void {
const RTVIConfig: RTVIClientOptions = {
private initializePipecatClient(): void {
const PipecatConfig: PipecatClientOptions = {
transport: new DailyTransport({
bufferLocalAudioUntilBotReady: true,
}),
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:7860',
endpoints: { connect: '/connect' },
},
enableMic: true,
enableCam: false,
callbacks: {
@@ -113,30 +110,23 @@ class InstantVoiceClient {
onBotTranscript: (data) => this.log(`Bot: ${data.text}`),
onMessageError: (error) => console.error('Message error:', error),
onError: (error) => console.error('Error:', error),
},
onAudioBufferingStarted: () => {
SoundUtils.beep();
this.updateBufferingStatus('Yes');
this.log(
`onMicCaptureStarted, timeTaken: ${Date.now() - this.startTime}`
);
},
onAudioBufferingStopped: () => {
this.updateBufferingStatus('No');
this.log(
`onMicCaptureStopped, timeTaken: ${Date.now() - this.startTime}`
);
},
} as DailyEventCallbacks,
};
this.rtviClient = new RTVIClient(RTVIConfig);
this.rtviClient.registerHelper(
'transport',
new InstantVoiceHelper({
callbacks: {
onAudioBufferingStarted: () => {
SoundUtils.beep();
this.updateBufferingStatus('Yes');
this.log(
`onMicCaptureStarted, timeTaken: ${Date.now() - this.startTime}`
);
},
onAudioBufferingStopped: () => {
this.updateBufferingStatus('No');
this.log(
`onMicCaptureStopped, timeTaken: ${Date.now() - this.startTime}`
);
},
},
})
);
this.pcClient = new PipecatClient(PipecatConfig);
this.setupTrackListeners();
}
@@ -182,8 +172,8 @@ class InstantVoiceClient {
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
const tracks = this.rtviClient.tracks();
if (!this.pcClient) return;
const tracks = this.pcClient.tracks();
if (tracks.bot?.audio) {
this.setupAudioTrack(tracks.bot.audio);
}
@@ -194,10 +184,10 @@ class InstantVoiceClient {
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local && track.kind === 'audio') {
this.setupAudioTrack(track);
@@ -205,7 +195,7 @@ class InstantVoiceClient {
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(
`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`
);
@@ -230,22 +220,25 @@ class InstantVoiceClient {
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
public async connect(): Promise<void> {
try {
this.startTime = Date.now();
this.log('Connecting to bot...');
await this.rtviClient.connect();
await this.pcClient.connect({
// The baseURL and endpoint of your bot server that the client will connect to
endpoint: 'http://localhost:7860/connect',
});
} catch (error) {
this.log(`Error connecting: ${(error as Error).message}`);
this.updateStatus('Error');
this.updateBufferingStatus('No');
// Clean up if there's an error
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError}`);
}
@@ -258,7 +251,7 @@ class InstantVoiceClient {
*/
public async disconnect(): Promise<void> {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject

View File

@@ -1,39 +0,0 @@
import {RTVIClientHelper, RTVIClientHelperOptions, RTVIMessage} from "@pipecat-ai/client-js";
import {DailyRTVIMessageType} from '@pipecat-ai/daily-transport';
export type InstantVoiceHelperCallbacks = Partial<{
onAudioBufferingStarted: () => void;
onAudioBufferingStopped: () => void;
}>;
// --- Interface and class
export interface InstantVoiceHelperOptions extends RTVIClientHelperOptions {
callbacks?: InstantVoiceHelperCallbacks;
}
export class InstantVoiceHelper extends RTVIClientHelper {
protected declare _options: InstantVoiceHelperOptions;
constructor(options: InstantVoiceHelperOptions) {
super(options);
}
handleMessage(rtviMessage: RTVIMessage): void {
switch (rtviMessage.type) {
case DailyRTVIMessageType.AUDIO_BUFFERING_STARTED:
if (this._options.callbacks?.onAudioBufferingStarted) {
this._options.callbacks?.onAudioBufferingStarted()
}
break;
case DailyRTVIMessageType.AUDIO_BUFFERING_STOPPED:
if (this._options.callbacks?.onAudioBufferingStopped) {
this._options.callbacks?.onAudioBufferingStopped()
}
break;
}
}
getMessageTypes(): string[] {
return [DailyRTVIMessageType.AUDIO_BUFFERING_STARTED, DailyRTVIMessageType.AUDIO_BUFFERING_STOPPED];
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -15,7 +15,7 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.8"
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0"
}
}

View File

@@ -5,7 +5,7 @@
*/
/**
* RTVI Client Implementation
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
@@ -16,78 +16,9 @@
* - Browser with WebRTC support
*/
import {
LogLevel,
RTVIClient,
RTVIClientHelper,
RTVIEvent,
} from '@pipecat-ai/client-js';
import { LogLevel, PipecatClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
class SearchResponseHelper extends RTVIClientHelper {
constructor(contentPanel) {
super();
this.contentPanel = contentPanel;
}
handleMessage(rtviMessage) {
console.log('SearchResponseHelper, received message:', rtviMessage);
if (rtviMessage.data) {
// Clear existing content
this.contentPanel.innerHTML = '';
// Create a container for all content
const contentContainer = document.createElement('div');
contentContainer.className = 'content-container';
// Add the search_result
if (rtviMessage.data.search_result) {
const searchResultDiv = document.createElement('div');
searchResultDiv.className = 'search-result';
searchResultDiv.textContent = rtviMessage.data.search_result;
contentContainer.appendChild(searchResultDiv);
}
// Add the sources
if (rtviMessage.data.origins) {
const sourcesDiv = document.createElement('div');
sourcesDiv.className = 'sources';
const sourcesTitle = document.createElement('h3');
sourcesTitle.className = 'sources-title';
sourcesTitle.textContent = 'Sources:';
sourcesDiv.appendChild(sourcesTitle);
rtviMessage.data.origins.forEach((origin) => {
const sourceLink = document.createElement('a');
sourceLink.className = 'source-link';
sourceLink.href = origin.site_uri;
sourceLink.target = '_blank';
sourceLink.textContent = origin.site_title;
sourcesDiv.appendChild(sourceLink);
});
contentContainer.appendChild(sourcesDiv);
}
// Add the rendered_content in an iframe
if (rtviMessage.data.rendered_content) {
const iframe = document.createElement('iframe');
iframe.className = 'iframe-container';
iframe.srcdoc = rtviMessage.data.rendered_content;
contentContainer.appendChild(iframe);
}
// Append the content container to the content panel
this.contentPanel.appendChild(contentContainer);
}
}
getMessageTypes() {
return ['bot-llm-search-response'];
}
}
/**
* ChatbotClient handles the connection and media management for a real-time
* voice and video interaction with an AI bot.
@@ -95,7 +26,7 @@ class SearchResponseHelper extends RTVIClientHelper {
class ChatbotClient {
constructor() {
// Initialize client state
this.rtviClient = null;
this.pcClient = null;
this.setupDOMElements();
this.setupEventListeners();
}
@@ -160,10 +91,10 @@ class ChatbotClient {
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Get current tracks from the client
const tracks = this.rtviClient.tracks();
const tracks = this.pcClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
@@ -176,10 +107,10 @@ class ChatbotClient {
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local && track.kind === 'audio') {
this.setupAudioTrack(track);
@@ -187,7 +118,7 @@ class ChatbotClient {
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(
`Track stopped event: ${track.kind} from ${
participant?.name || 'unknown'
@@ -213,20 +144,13 @@ class ChatbotClient {
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
async connect() {
try {
// Initialize the RTVI client with a Daily WebRTC transport and our configuration
this.rtviClient = new RTVIClient({
// Initialize the Pipecat client with a Daily WebRTC transport and our configuration
this.pcClient = new PipecatClient({
transport: new DailyTransport(),
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:7860',
endpoints: {
connect: '/connect',
},
},
enableMic: true, // Enable microphone for user input
enableCam: false,
callbacks: {
@@ -251,6 +175,8 @@ class ChatbotClient {
this.setupMediaTracks();
}
},
// Handle search response events
onBotLlmSearchResponse: this.handleSearchResponse.bind(this),
// Handle bot connection events
onBotConnected: (participant) => {
this.log(`Bot connected: ${JSON.stringify(participant)}`);
@@ -281,22 +207,22 @@ class ChatbotClient {
},
},
});
//this.rtviClient.setLogLevel(LogLevel.DEBUG)
this.rtviClient.registerHelper(
'llm',
new SearchResponseHelper(this.searchResultContainer)
);
//this.pcClient.setLogLevel(LogLevel.DEBUG)
// Set up listeners for media track events
this.setupTrackListeners();
// Initialize audio devices
this.log('Initializing devices...');
await this.rtviClient.initDevices();
await this.pcClient.initDevices();
// Connect to the bot
this.log('Connecting to bot...');
await this.rtviClient.connect();
await this.pcClient.connect({
// The baseURL and endpoint of your bot server that the client will connect to
endpoint: 'http://localhost:7860/connect',
});
this.log('Connection complete');
} catch (error) {
@@ -306,9 +232,9 @@ class ChatbotClient {
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
@@ -320,11 +246,11 @@ class ChatbotClient {
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.rtviClient) {
if (this.pcClient) {
try {
// Disconnect the RTVI client
await this.rtviClient.disconnect();
this.rtviClient = null;
// Disconnect the Pipecat client
await this.pcClient.disconnect();
this.pcClient = null;
// Clean up audio
if (this.botAudio.srcObject) {
@@ -339,6 +265,57 @@ class ChatbotClient {
}
}
}
handleSearchResponse(response) {
console.log('SearchResponseHelper, received message:', response);
// Clear existing content
this.searchResultContainer.innerHTML = '';
// Create a container for all content
const contentContainer = document.createElement('div');
contentContainer.className = 'content-container';
// Add the search_result
if (response.search_result) {
const searchResultDiv = document.createElement('div');
searchResultDiv.className = 'search-result';
searchResultDiv.textContent = response.search_result;
contentContainer.appendChild(searchResultDiv);
}
// Add the sources
if (response.origins) {
const sourcesDiv = document.createElement('div');
sourcesDiv.className = 'sources';
const sourcesTitle = document.createElement('h3');
sourcesTitle.className = 'sources-title';
sourcesTitle.textContent = 'Sources:';
sourcesDiv.appendChild(sourcesTitle);
response.origins.forEach((origin) => {
const sourceLink = document.createElement('a');
sourceLink.className = 'source-link';
sourceLink.href = origin.site_uri;
sourceLink.target = '_blank';
sourceLink.textContent = origin.site_title;
sourcesDiv.appendChild(sourceLink);
});
contentContainer.appendChild(sourcesDiv);
}
// Add the rendered_content in an iframe
if (response.rendered_content) {
const iframe = document.createElement('iframe');
iframe.className = 'iframe-container';
iframe.srcdoc = response.rendered_content;
contentContainer.appendChild(iframe);
}
// Append the content container to the content panel
this.searchResultContainer.appendChild(contentContainer);
}
}
// Initialize the client when the page loads

View File

@@ -24,6 +24,7 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.utils.tracing.setup import setup_tracing
@@ -61,7 +62,7 @@ transport_params = {
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),

View File

@@ -24,6 +24,7 @@ from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
from pipecat.utils.tracing.setup import setup_tracing
@@ -58,7 +59,7 @@ transport_params = {
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: TransportParams(
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),

File diff suppressed because it is too large Load Diff

View File

@@ -18,7 +18,7 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.2",
"@pipecat-ai/small-webrtc-transport": "^0.0.2"
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/small-webrtc-transport": "^1.0.0"
}
}

View File

@@ -1,217 +1,236 @@
import { SmallWebRTCTransport } from '@pipecat-ai/small-webrtc-transport';
import {
SmallWebRTCTransport
} from "@pipecat-ai/small-webrtc-transport";
import {Participant, RTVIClient, RTVIClientOptions, Transport} from "@pipecat-ai/client-js";
BotLLMTextData,
Participant,
PipecatClient,
PipecatClientOptions,
TranscriptData,
TransportState,
} from '@pipecat-ai/client-js';
class WebRTCApp {
private declare connectBtn: HTMLButtonElement;
private declare disconnectBtn: HTMLButtonElement;
private declare muteBtn: HTMLButtonElement;
private declare connectBtn: HTMLButtonElement;
private declare disconnectBtn: HTMLButtonElement;
private declare muteBtn: HTMLButtonElement;
private declare audioInput: HTMLSelectElement;
private declare videoInput: HTMLSelectElement;
private declare audioCodec: HTMLSelectElement;
private declare videoCodec: HTMLSelectElement;
private declare audioInput: HTMLSelectElement;
private declare videoInput: HTMLSelectElement;
private declare audioCodec: HTMLSelectElement;
private declare videoCodec: HTMLSelectElement;
private declare videoElement: HTMLVideoElement;
private declare audioElement: HTMLAudioElement;
private declare videoElement: HTMLVideoElement;
private declare audioElement: HTMLAudioElement;
private debugLog: HTMLElement | null = null;
private statusSpan: HTMLElement | null = null;
private debugLog: HTMLElement | null = null;
private statusSpan: HTMLElement | null = null;
private declare smallWebRTCTransport: SmallWebRTCTransport;
private declare pcClient: PipecatClient;
private declare smallWebRTCTransport: SmallWebRTCTransport;
private declare rtviClient: RTVIClient;
constructor() {
this.setupDOMElements();
this.setupDOMEventListeners();
this.initializePipecatClient();
void this.populateDevices();
}
constructor() {
this.setupDOMElements();
this.setupDOMEventListeners();
this.initializeRTVIClient()
void this.populateDevices();
private initializePipecatClient(): void {
const opts: PipecatClientOptions = {
transport: new SmallWebRTCTransport({ connectionUrl: '/api/offer' }),
enableMic: true,
enableCam: true,
callbacks: {
onTransportStateChanged: (state: TransportState) => {
this.log(`Transport state: ${state}`);
},
onConnected: () => {
this.onConnectedHandler();
},
onBotReady: () => {
this.log('Bot is ready.');
},
onDisconnected: () => {
this.onDisconnectedHandler();
},
onUserStartedSpeaking: () => {
this.log('User started speaking.');
},
onUserStoppedSpeaking: () => {
this.log('User stopped speaking.');
},
onBotStartedSpeaking: () => {
this.log('Bot started speaking.');
},
onBotStoppedSpeaking: () => {
this.log('Bot stopped speaking.');
},
onUserTranscript: (transcript: TranscriptData) => {
if (transcript.final) {
this.log(`User transcript: ${transcript.text}`);
}
},
onBotTranscript: (data: BotLLMTextData) => {
this.log(`Bot transcript: ${data.text}`);
},
onTrackStarted: (
track: MediaStreamTrack,
participant?: Participant
) => {
if (participant?.local) {
return;
}
this.onBotTrackStarted(track);
},
onServerMessage: (msg: unknown) => {
this.log(`Server message: ${msg}`);
},
},
};
this.pcClient = new PipecatClient(opts);
this.smallWebRTCTransport = this.pcClient.transport as SmallWebRTCTransport;
}
private setupDOMElements(): void {
this.connectBtn = document.getElementById(
'connect-btn'
) as HTMLButtonElement;
this.disconnectBtn = document.getElementById(
'disconnect-btn'
) as HTMLButtonElement;
this.muteBtn = document.getElementById('mute-btn') as HTMLButtonElement;
this.audioInput = document.getElementById(
'audio-input'
) as HTMLSelectElement;
this.videoInput = document.getElementById(
'video-input'
) as HTMLSelectElement;
this.audioCodec = document.getElementById(
'audio-codec'
) as HTMLSelectElement;
this.videoCodec = document.getElementById(
'video-codec'
) as HTMLSelectElement;
this.videoElement = document.getElementById(
'bot-video'
) as HTMLVideoElement;
this.audioElement = document.getElementById(
'bot-audio'
) as HTMLAudioElement;
this.debugLog = document.getElementById('debug-log');
this.statusSpan = document.getElementById('connection-status');
}
private setupDOMEventListeners(): void {
this.connectBtn.addEventListener('click', () => this.start());
this.disconnectBtn.addEventListener('click', () => this.stop());
this.audioInput.addEventListener('change', (e) => {
// @ts-ignore
let audioDevice = e.target?.value;
this.pcClient.updateMic(audioDevice);
});
this.videoInput.addEventListener('change', (e) => {
// @ts-ignore
let videoDevice = e.target?.value;
this.pcClient.updateCam(videoDevice);
});
this.muteBtn.addEventListener('click', () => {
let isCamEnabled = this.pcClient.isCamEnabled;
this.pcClient.enableCam(!isCamEnabled);
this.muteBtn.textContent = isCamEnabled ? '📵' : '📷';
});
}
private log(message: string): void {
if (!this.debugLog) return;
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3';
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50';
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
}
private initializeRTVIClient(): void {
const transport = new SmallWebRTCTransport();
const RTVIConfig: RTVIClientOptions = {
params: {
baseUrl: "/api/offer"
},
transport: transport as Transport,
enableMic: true,
enableCam: true,
callbacks: {
onTransportStateChanged: (state) => {
this.log(`Transport state: ${state}`)
},
onConnected: () => {
this.onConnectedHandler()
},
onBotReady: () => {
this.log("Bot is ready.")
},
onDisconnected: () => {
this.onDisconnectedHandler()
},
onUserStartedSpeaking: () => {
this.log("User started speaking.")
},
onUserStoppedSpeaking: () => {
this.log("User stopped speaking.")
},
onBotStartedSpeaking: () => {
this.log("Bot started speaking.")
},
onBotStoppedSpeaking: () => {
this.log("Bot stopped speaking.")
},
onUserTranscript: (transcript) => {
if (transcript.final) {
this.log(`User transcript: ${transcript.text}`)
}
},
onBotTranscript: (transcript) => {
this.log(`Bot transcript: ${transcript.text}`)
},
onTrackStarted: (track: MediaStreamTrack, participant?: Participant) => {
if (participant?.local) {
return
}
this.onBotTrackStarted(track)
},
onServerMessage: (msg) => {
this.log(`Server message: ${msg}`)
}
},
}
RTVIConfig.customConnectHandler = () => Promise.resolve();
this.rtviClient = new RTVIClient(RTVIConfig);
this.smallWebRTCTransport = transport
private clearAllLogs() {
this.debugLog!.innerText = '';
}
private updateStatus(status: string): void {
if (this.statusSpan) {
this.statusSpan.textContent = status;
}
this.log(`Status: ${status}`);
}
private setupDOMElements(): void {
this.connectBtn = document.getElementById('connect-btn') as HTMLButtonElement;
this.disconnectBtn = document.getElementById('disconnect-btn') as HTMLButtonElement;
this.muteBtn = document.getElementById('mute-btn') as HTMLButtonElement;
private onConnectedHandler() {
this.updateStatus('Connected');
if (this.connectBtn) this.connectBtn.disabled = true;
if (this.disconnectBtn) this.disconnectBtn.disabled = false;
}
this.audioInput = document.getElementById('audio-input') as HTMLSelectElement;
this.videoInput = document.getElementById('video-input') as HTMLSelectElement;
this.audioCodec = document.getElementById('audio-codec') as HTMLSelectElement;
this.videoCodec = document.getElementById('video-codec') as HTMLSelectElement;
private onDisconnectedHandler() {
this.updateStatus('Disconnected');
if (this.connectBtn) this.connectBtn.disabled = false;
if (this.disconnectBtn) this.disconnectBtn.disabled = true;
}
this.videoElement = document.getElementById('bot-video') as HTMLVideoElement;
this.audioElement = document.getElementById('bot-audio') as HTMLAudioElement;
this.debugLog = document.getElementById('debug-log');
this.statusSpan = document.getElementById('connection-status');
private onBotTrackStarted(track: MediaStreamTrack) {
if (track.kind === 'video') {
this.videoElement.srcObject = new MediaStream([track]);
} else {
this.audioElement.srcObject = new MediaStream([track]);
}
}
private setupDOMEventListeners(): void {
this.connectBtn.addEventListener("click", () => this.start());
this.disconnectBtn.addEventListener("click", () => this.stop());
this.audioInput.addEventListener("change", (e) => {
// @ts-ignore
let audioDevice = e.target?.value
this.rtviClient.updateMic(audioDevice)
})
this.videoInput.addEventListener("change", (e) => {
// @ts-ignore
let videoDevice = e.target?.value
this.rtviClient.updateCam(videoDevice)
})
this.muteBtn.addEventListener('click', () => {
let isCamEnabled = this.rtviClient.isCamEnabled
this.rtviClient.enableCam(!isCamEnabled)
this.muteBtn.textContent = isCamEnabled ? '📵' : '📷';
});
private async populateDevices(): Promise<void> {
const populateSelect = (
select: HTMLSelectElement,
devices: MediaDeviceInfo[]
): void => {
let counter = 1;
devices.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.text = device.label || 'Device #' + counter;
select.appendChild(option);
counter += 1;
});
};
try {
const audioDevices = await this.pcClient.getAllMics();
populateSelect(this.audioInput, audioDevices);
const videoDevices = await this.pcClient.getAllCams();
populateSelect(this.videoInput, videoDevices);
} catch (e) {
alert(e);
}
}
private log(message: string): void {
if (!this.debugLog) return;
const entry = document.createElement('div');
entry.textContent = `${new Date().toISOString()} - ${message}`;
if (message.startsWith('User: ')) {
entry.style.color = '#2196F3';
} else if (message.startsWith('Bot: ')) {
entry.style.color = '#4CAF50';
}
this.debugLog.appendChild(entry);
this.debugLog.scrollTop = this.debugLog.scrollHeight;
private async start(): Promise<void> {
this.clearAllLogs();
this.connectBtn.disabled = true;
this.updateStatus('Connecting');
this.smallWebRTCTransport.setAudioCodec(this.audioCodec.value);
this.smallWebRTCTransport.setVideoCodec(this.videoCodec.value);
try {
await this.pcClient.connect();
} catch (e) {
console.log(`Failed to connect ${e}`);
this.stop();
}
}
private clearAllLogs() {
this.debugLog!.innerText = ''
}
private updateStatus(status: string): void {
if (this.statusSpan) {
this.statusSpan.textContent = status;
}
this.log(`Status: ${status}`);
}
private onConnectedHandler() {
this.updateStatus('Connected');
if (this.connectBtn) this.connectBtn.disabled = true;
if (this.disconnectBtn) this.disconnectBtn.disabled = false;
}
private onDisconnectedHandler() {
this.updateStatus('Disconnected');
if (this.connectBtn) this.connectBtn.disabled = false;
if (this.disconnectBtn) this.disconnectBtn.disabled = true;
}
private onBotTrackStarted(track: MediaStreamTrack) {
if (track.kind === 'video') {
this.videoElement.srcObject = new MediaStream([track]);
} else {
this.audioElement.srcObject = new MediaStream([track]);
}
}
private async populateDevices(): Promise<void> {
const populateSelect = (select: HTMLSelectElement, devices: MediaDeviceInfo[]): void => {
let counter = 1;
devices.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.text = device.label || ('Device #' + counter);
select.appendChild(option);
counter += 1;
});
};
try {
const audioDevices = await this.rtviClient.getAllMics();
populateSelect(this.audioInput, audioDevices);
const videoDevices = await this.rtviClient.getAllCams();
populateSelect(this.videoInput, videoDevices);
} catch (e) {
alert(e);
}
}
private async start(): Promise<void> {
this.clearAllLogs()
this.connectBtn.disabled = true;
this.updateStatus("Connecting")
this.smallWebRTCTransport.setAudioCodec(this.audioCodec.value)
this.smallWebRTCTransport.setVideoCodec(this.videoCodec.value)
try {
await this.rtviClient.connect()
} catch (e) {
console.log(`Failed to connect ${e}`)
this.stop()
}
}
private stop(): void {
void this.rtviClient.disconnect()
}
private stop(): void {
void this.pcClient.disconnect();
}
}
// Create the WebRTCConnection instance

View File

@@ -1,40 +1,44 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AI Chatbot</title>
</head>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chatbot</title>
</head>
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<div class="controls">
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<div class="main-content">
<div class="bot-container">
<div id="bot-video-container">
<body>
<div class="container">
<div class="status-bar">
<div class="status">
Status: <span id="connection-status">Disconnected</span>
</div>
<audio id="bot-audio" autoplay></audio>
<div class="controls">
<button id="connect-btn">Connect</button>
<button id="disconnect-btn" disabled>Disconnect</button>
</div>
</div>
<div class="main-content">
<div class="bot-container">
<div id="bot-video-container"></div>
<audio id="bot-audio" autoplay></audio>
</div>
</div>
<div class="device-bar">
<div class="device-controls">
<select id="device-selector"></select>
<button id="mic-toggle-btn">Unmute Mic</button>
</div>
</div>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<div class="debug-panel">
<h3>Debug Info</h3>
<div id="debug-log"></div>
</div>
</div>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css">
</body>
</html>
<script type="module" src="/src/app.js"></script>
<link rel="stylesheet" href="/src/style.css" />
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -15,7 +15,7 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.8"
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0"
}
}

View File

@@ -5,7 +5,7 @@
*/
/**
* RTVI Client Implementation
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebRTC (via Daily).
* It handles audio/video streaming and manages the connection lifecycle.
@@ -16,7 +16,7 @@
* - Browser with WebRTC support
*/
import { RTVIClient, RTVIEvent } from '@pipecat-ai/client-js';
import { PipecatClient, RTVIEvent } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
/**
@@ -26,9 +26,8 @@ import { DailyTransport } from '@pipecat-ai/daily-transport';
class ChatbotClient {
constructor() {
// Initialize client state
this.rtviClient = null;
this.pcClient = null;
this.setupDOMElements();
this.setupEventListeners();
this.initializeClientAndTransport();
}
@@ -42,6 +41,8 @@ class ChatbotClient {
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
this.botVideoContainer = document.getElementById('bot-video-container');
this.deviceSelector = document.getElementById('device-selector');
this.micToggleBtn = document.getElementById('mic-toggle-btn');
// Create an audio element for bot's voice output
this.botAudio = document.createElement('audio');
@@ -54,25 +55,53 @@ class ChatbotClient {
* Set up event listeners for connect/disconnect buttons
*/
setupEventListeners() {
this.connectBtn.addEventListener('click', () => this.connect());
this.connectBtn.addEventListener('click', () => {
console.log('click');
this.connect();
});
this.disconnectBtn.addEventListener('click', () => this.disconnect());
// Populate device selector
this.pcClient.getAllMics().then((mics) => {
console.log('Available mics:', mics);
mics.forEach((device) => {
const option = document.createElement('option');
option.value = device.deviceId;
option.textContent = device.label || `Microphone ${device.deviceId}`;
this.deviceSelector.appendChild(option);
});
});
this.deviceSelector.addEventListener('change', (event) => {
const selectedDeviceId = event.target.value;
console.log('Selected device ID:', selectedDeviceId);
this.pcClient.updateMic(selectedDeviceId);
});
// Handle mic mute/unmute toggle
const micToggleBtn = document.getElementById('mic-toggle-btn');
micToggleBtn.addEventListener('click', async () => {
if (this.pcClient.state === 'disconnected') {
await this.pcClient.initDevices();
} else {
this.pcClient.enableMic(!this.pcClient.isMicEnabled);
}
});
}
updateMicToggleButton(micEnabled) {
console.log('Mic enabled:', micEnabled, this.pcClient?.isMicEnabled);
this.micToggleBtn.textContent = micEnabled ? 'Mute Mic' : 'Unmute Mic';
}
/**
* Set up the RTVI client and Daily transport
* Set up the Pipecat client and Daily transport
*/
initializeClientAndTransport() {
// Initialize the RTVI client with a DailyTransport and our configuration
this.rtviClient = new RTVIClient({
async initializeClientAndTransport() {
console.log('Initializing Pipecat client and transport...');
// Initialize the Pipecat client with a DailyTransport and our configuration
this.pcClient = new PipecatClient({
transport: new DailyTransport(),
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:7860',
endpoints: {
connect: '/connect',
},
},
enableMic: true, // Enable microphone for user input
enableMic: true,
enableCam: false,
callbacks: {
// Handle connection state changes
@@ -87,6 +116,7 @@ class ChatbotClient {
this.connectBtn.disabled = false;
this.disconnectBtn.disabled = true;
this.log('Client disconnected');
this.updateMicToggleButton(false);
},
// Handle transport state changes
onTransportStateChanged: (state) => {
@@ -121,14 +151,20 @@ class ChatbotClient {
onMessageError: (error) => {
console.log('Message error:', error);
},
onMicUpdated: (data) => {
console.log('Mic updated:', data);
this.deviceSelector.value = data.deviceId;
},
onError: (error) => {
console.log('Error:', JSON.stringify(error));
},
},
});
window.client = this; // Expose client globally for debugging
// Set up listeners for media track events
this.setupTrackListeners();
this.setupEventListeners();
}
/**
@@ -163,10 +199,10 @@ class ChatbotClient {
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Get current tracks from the client
const tracks = this.rtviClient.tracks();
const tracks = this.pcClient.tracks();
// Set up any available bot tracks
if (tracks.bot?.audio) {
@@ -182,27 +218,34 @@ class ChatbotClient {
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
if (!participant?.local) {
if (track.kind === 'audio') {
this.setupAudioTrack(track);
} else if (track.kind === 'video') {
this.setupVideoTrack(track);
}
} else if (track.kind === 'audio') {
console.log(`Local audio track started: `, this.pcClient.tracks());
// If local audio track starts, update mic
this.updateMicToggleButton(true);
}
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(
`Track stopped event: ${track.kind} from ${
participant?.name || 'unknown'
participant ? (participant.local ? 'local' : 'bot') : 'unknown'
}`
);
if (participant?.local && track.kind === 'audio') {
// If local audio track stops, update mic toggle button
this.updateMicToggleButton(false);
}
});
}
@@ -251,17 +294,16 @@ class ChatbotClient {
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
async connect() {
try {
// Initialize audio/video devices
this.log('Initializing devices...');
await this.rtviClient.initDevices();
// Connect to the bot
this.log('Connecting to bot...');
await this.rtviClient.connect();
await this.pcClient.connect({
endpoint: 'http://localhost:7860/connect',
timeout: 25000,
});
this.log('Connection complete');
} catch (error) {
@@ -271,9 +313,9 @@ class ChatbotClient {
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError.message}`);
}
@@ -285,10 +327,10 @@ class ChatbotClient {
* Disconnect from the bot and clean up media resources
*/
async disconnect() {
if (this.rtviClient) {
if (this.pcClient) {
try {
// Disconnect the RTVI client
await this.rtviClient.disconnect();
// Disconnect the Pipecat client
await this.pcClient.disconnect();
// Clean up audio
if (this.botAudio.srcObject) {

View File

@@ -10,7 +10,8 @@ body {
margin: 0 auto;
}
.status-bar {
.status-bar,
.device-bar {
display: flex;
justify-content: space-between;
align-items: center;
@@ -20,7 +21,19 @@ body {
margin-bottom: 20px;
}
.controls button {
.controls,
.device-controls {
display: flex;
align-items: center;
gap: 10px; /* Adds spacing between elements */
}
.device-controls {
margin-left: auto;
}
.controls button,
.device-controls button {
padding: 8px 16px;
margin-left: 10px;
border: none;
@@ -28,6 +41,27 @@ body {
cursor: pointer;
}
#bot-selector,
#device-selector {
padding: 8px 16px;
padding-right: 40px;
border: none;
border-radius: 4px;
background-color: #6c757d; /* Gray background */
color: white; /* White text */
cursor: pointer;
appearance: none; /* Removes default browser styling for dropdowns */
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='white'%3E%3Cpath d='M7 10l5 5 5-5z'/%3E%3C/svg%3E"); /* Custom arrow */
background-repeat: no-repeat;
background-position: right 8px center; /* Position the arrow */
}
#bot-selector:focus,
#device-selector:focus {
outline: none;
box-shadow: 0 0 4px rgba(0, 0, 0, 0.3); /* Add a subtle focus effect */
}
#connect-btn {
background-color: #4caf50;
color: white;
@@ -38,6 +72,9 @@ body {
color: white;
}
#mic-toggle-btn {
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;

File diff suppressed because it is too large Load Diff

View File

@@ -10,9 +10,9 @@
"preview": "vite preview"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/client-react": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.8",
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/client-react": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0",
"react": "^18.3.1",
"react-dom": "^18.3.1"
},

View File

@@ -1,22 +1,22 @@
import {
RTVIClientAudio,
RTVIClientVideo,
useRTVIClientTransportState,
PipecatClientAudio,
PipecatClientVideo,
usePipecatClientTransportState,
} from '@pipecat-ai/client-react';
import { RTVIProvider } from './providers/RTVIProvider';
import { PipecatProvider } from './providers/PipecatProvider';
import { ConnectButton } from './components/ConnectButton';
import { StatusDisplay } from './components/StatusDisplay';
import { DebugDisplay } from './components/DebugDisplay';
import './App.css';
function BotVideo() {
const transportState = useRTVIClientTransportState();
const transportState = usePipecatClientTransportState();
const isConnected = transportState !== 'disconnected';
return (
<div className="bot-container">
<div className="video-container">
{isConnected && <RTVIClientVideo participant="bot" fit="cover" />}
{isConnected && <PipecatClientVideo participant="bot" fit="cover" />}
</div>
</div>
);
@@ -35,16 +35,16 @@ function AppContent() {
</div>
<DebugDisplay />
<RTVIClientAudio />
<PipecatClientAudio />
</div>
);
}
function App() {
return (
<RTVIProvider>
<PipecatProvider>
<AppContent />
</RTVIProvider>
</PipecatProvider>
);
}

View File

@@ -1,16 +1,16 @@
import {
useRTVIClient,
useRTVIClientTransportState,
usePipecatClient,
usePipecatClientTransportState,
} from '@pipecat-ai/client-react';
export function ConnectButton() {
const client = useRTVIClient();
const transportState = useRTVIClientTransportState();
const client = usePipecatClient();
const transportState = usePipecatClientTransportState();
const isConnected = ['connected', 'ready'].includes(transportState);
const handleClick = async () => {
if (!client) {
console.error('RTVI client is not initialized');
console.error('Pipecat client is not initialized');
return;
}
@@ -18,7 +18,7 @@ export function ConnectButton() {
if (isConnected) {
await client.disconnect();
} else {
await client.connect();
await client.connect({ endpoint: 'http://localhost:7860/connect' });
}
} catch (error) {
console.error('Connection error:', error);

View File

@@ -6,12 +6,12 @@ import {
TranscriptData,
BotLLMTextData,
} from '@pipecat-ai/client-js';
import { useRTVIClient, useRTVIClientEvent } from '@pipecat-ai/client-react';
import { usePipecatClient, useRTVIClientEvent } from '@pipecat-ai/client-react';
import './DebugDisplay.css';
export function DebugDisplay() {
const debugLogRef = useRef<HTMLDivElement>(null);
const client = useRTVIClient();
const client = usePipecatClient();
const log = useCallback((message: string) => {
if (!debugLogRef.current) return;

View File

@@ -1,7 +1,7 @@
import { useRTVIClientTransportState } from '@pipecat-ai/client-react';
import { usePipecatClientTransportState } from '@pipecat-ai/client-react';
export function StatusDisplay() {
const transportState = useRTVIClientTransportState();
const transportState = usePipecatClientTransportState();
return (
<div className="status">

View File

@@ -0,0 +1,16 @@
import { type PropsWithChildren } from 'react';
import { PipecatClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { PipecatClientProvider } from '@pipecat-ai/client-react';
const client = new PipecatClient({
transport: new DailyTransport(),
enableMic: true,
enableCam: false,
});
export function PipecatProvider({ children }: PropsWithChildren) {
return (
<PipecatClientProvider client={client}>{children}</PipecatClientProvider>
);
}

View File

@@ -1,22 +0,0 @@
import { type PropsWithChildren } from 'react';
import { RTVIClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { RTVIClientProvider } from '@pipecat-ai/client-react';
const transport = new DailyTransport();
const client = new RTVIClient({
transport,
params: {
baseUrl: 'http://localhost:7860',
endpoints: {
connect: '/connect',
},
},
enableMic: true,
enableCam: false,
});
export function RTVIProvider({ children }: PropsWithChildren) {
return <RTVIClientProvider client={client}>{children}</RTVIClientProvider>;
}

View File

@@ -12,12 +12,11 @@ import {
import {
WebSocketTransport,
TwilioSerializer,
} from "@pipecat-ai/websocket-transport";
} from '@pipecat-ai/websocket-transport';
class WebsocketClientApp {
private static STREAM_SID = "ws_mock_stream_sid"
private static CALL_SID = "ws_mock_call_sid"
private static STREAM_SID = 'ws_mock_stream_sid';
private static CALL_SID = 'ws_mock_call_sid';
private rtviClient: RTVIClient | null = null;
private connectBtn: HTMLButtonElement | null = null;
@@ -38,8 +37,12 @@ class WebsocketClientApp {
* Set up references to DOM elements and create necessary media elements
*/
private setupDOMElements(): void {
this.connectBtn = document.getElementById('connect-btn') as HTMLButtonElement;
this.disconnectBtn = document.getElementById('disconnect-btn') as HTMLButtonElement;
this.connectBtn = document.getElementById(
'connect-btn'
) as HTMLButtonElement;
this.disconnectBtn = document.getElementById(
'disconnect-btn'
) as HTMLButtonElement;
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
}
@@ -80,13 +83,23 @@ class WebsocketClientApp {
}
private async emulateTwilioMessages() {
const connectedMessage={"event": "connected", "protocol": "Call", "version": "1.0.0"}
const connectedMessage = {
event: 'connected',
protocol: 'Call',
version: '1.0.0',
};
const websocketTransport = this.rtviClient?.transport as WebSocketTransport
void websocketTransport?.sendRawMessage(connectedMessage)
const websocketTransport = this.rtviClient?.transport as WebSocketTransport;
void websocketTransport?.sendRawMessage(connectedMessage);
const startMessage={"event": "start", "start": {"streamSid": WebsocketClientApp.STREAM_SID, "callSid": WebsocketClientApp.CALL_SID}}
void websocketTransport?.sendRawMessage(startMessage)
const startMessage = {
event: 'start',
start: {
streamSid: WebsocketClientApp.STREAM_SID,
callSid: WebsocketClientApp.CALL_SID,
},
};
void websocketTransport?.sendRawMessage(startMessage);
}
/**
@@ -118,7 +131,9 @@ class WebsocketClientApp {
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`);
this.log(
`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`
);
});
}
@@ -128,7 +143,10 @@ class WebsocketClientApp {
*/
private setupAudioTrack(track: MediaStreamTrack): void {
this.log('Setting up audio track');
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
@@ -143,23 +161,19 @@ class WebsocketClientApp {
try {
const startTime = Date.now();
const transport = new WebSocketTransport({
const ws_opts = {
serializer: new TwilioSerializer(),
recorderSampleRate: 8000,
playerSampleRate: 8000
});
playerSampleRate: 8000,
ws_url: 'http://localhost:8765/ws',
};
const RTVIConfig: RTVIClientOptions = {
transport,
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:8765',
endpoints: { connect: '/' },
},
transport: new WebSocketTransport(ws_opts),
enableMic: true,
enableCam: false,
callbacks: {
onConnected: () => {
this.emulateTwilioMessages()
this.emulateTwilioMessages();
this.updateStatus('Connected');
if (this.connectBtn) this.connectBtn.disabled = true;
if (this.disconnectBtn) this.disconnectBtn.disabled = false;
@@ -183,13 +197,7 @@ class WebsocketClientApp {
onMessageError: (error) => console.error('Message error:', error),
onError: (error) => console.error('Error:', error),
},
}
// @ts-ignore
RTVIConfig.customConnectHandler = () => Promise.resolve(
{
ws_url: "/ws",
}
);
};
this.rtviClient = new RTVIClient(RTVIConfig);
this.setupTrackListeners();
@@ -223,8 +231,13 @@ class WebsocketClientApp {
try {
await this.rtviClient.disconnect();
this.rtviClient = null;
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
this.botAudio.srcObject.getAudioTracks().forEach((track) => track.stop());
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
this.botAudio.srcObject
.getAudioTracks()
.forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
} catch (error) {
@@ -232,7 +245,6 @@ class WebsocketClientApp {
}
}
}
}
declare global {

File diff suppressed because it is too large Load Diff

View File

@@ -19,8 +19,8 @@
"vite": "^6.3.5"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.4.0",
"@pipecat-ai/websocket-transport": "^0.4.2",
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/websocket-transport": "^1.0.0",
"protobufjs": "^7.4.0"
}
}

View File

@@ -5,7 +5,7 @@
*/
/**
* RTVI Client Implementation
* Pipecat Client Implementation
*
* This client connects to an RTVI-compatible bot server using WebSocket.
*
@@ -14,16 +14,14 @@
*/
import {
RTVIClient,
RTVIClientOptions,
PipecatClient,
PipecatClientOptions,
RTVIEvent,
} from '@pipecat-ai/client-js';
import {
WebSocketTransport
} from "@pipecat-ai/websocket-transport";
import { WebSocketTransport } from '@pipecat-ai/websocket-transport';
class WebsocketClientApp {
private rtviClient: RTVIClient | null = null;
private pcClient: PipecatClient | null = null;
private connectBtn: HTMLButtonElement | null = null;
private disconnectBtn: HTMLButtonElement | null = null;
private statusSpan: HTMLElement | null = null;
@@ -31,7 +29,7 @@ class WebsocketClientApp {
private botAudio: HTMLAudioElement;
constructor() {
console.log("WebsocketClientApp");
console.log('WebsocketClientApp');
this.botAudio = document.createElement('audio');
this.botAudio.autoplay = true;
//this.botAudio.playsInline = true;
@@ -45,8 +43,12 @@ class WebsocketClientApp {
* Set up references to DOM elements and create necessary media elements
*/
private setupDOMElements(): void {
this.connectBtn = document.getElementById('connect-btn') as HTMLButtonElement;
this.disconnectBtn = document.getElementById('disconnect-btn') as HTMLButtonElement;
this.connectBtn = document.getElementById(
'connect-btn'
) as HTMLButtonElement;
this.disconnectBtn = document.getElementById(
'disconnect-btn'
) as HTMLButtonElement;
this.statusSpan = document.getElementById('connection-status');
this.debugLog = document.getElementById('debug-log');
}
@@ -91,8 +93,8 @@ class WebsocketClientApp {
* This is called when the bot is ready or when the transport state changes to ready
*/
setupMediaTracks() {
if (!this.rtviClient) return;
const tracks = this.rtviClient.tracks();
if (!this.pcClient) return;
const tracks = this.pcClient.tracks();
if (tracks.bot?.audio) {
this.setupAudioTrack(tracks.bot.audio);
}
@@ -103,10 +105,10 @@ class WebsocketClientApp {
* This handles new tracks being added during the session
*/
setupTrackListeners() {
if (!this.rtviClient) return;
if (!this.pcClient) return;
// Listen for new tracks starting
this.rtviClient.on(RTVIEvent.TrackStarted, (track, participant) => {
this.pcClient.on(RTVIEvent.TrackStarted, (track, participant) => {
// Only handle non-local (bot) tracks
if (!participant?.local && track.kind === 'audio') {
this.setupAudioTrack(track);
@@ -114,8 +116,10 @@ class WebsocketClientApp {
});
// Listen for tracks stopping
this.rtviClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`);
this.pcClient.on(RTVIEvent.TrackStopped, (track, participant) => {
this.log(
`Track stopped: ${track.kind} from ${participant?.name || 'unknown'}`
);
});
}
@@ -125,7 +129,10 @@ class WebsocketClientApp {
*/
private setupAudioTrack(track: MediaStreamTrack): void {
this.log('Setting up audio track');
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
const oldTrack = this.botAudio.srcObject.getAudioTracks()[0];
if (oldTrack?.id === track.id) return;
}
@@ -134,21 +141,15 @@ class WebsocketClientApp {
/**
* Initialize and connect to the bot
* This sets up the RTVI client, initializes devices, and establishes the connection
* This sets up the Pipecat client, initializes devices, and establishes the connection
*/
public async connect(): Promise<void> {
try {
const startTime = Date.now();
//const transport = new DailyTransport();
const transport = new WebSocketTransport();
const RTVIConfig: RTVIClientOptions = {
transport,
params: {
// The baseURL and endpoint of your bot server that the client will connect to
baseUrl: 'http://localhost:7860',
endpoints: { connect: '/connect' },
},
const PipecatConfig: PipecatClientOptions = {
transport: new WebSocketTransport(),
enableMic: true,
enableCam: false,
callbacks: {
@@ -176,15 +177,20 @@ class WebsocketClientApp {
onMessageError: (error) => console.error('Message error:', error),
onError: (error) => console.error('Error:', error),
},
}
this.rtviClient = new RTVIClient(RTVIConfig);
};
this.pcClient = new PipecatClient(PipecatConfig);
// @ts-ignore
window.pcClient = this.pcClient; // Expose for debugging
this.setupTrackListeners();
this.log('Initializing devices...');
await this.rtviClient.initDevices();
await this.pcClient.initDevices();
this.log('Connecting to bot...');
await this.rtviClient.connect();
await this.pcClient.connect({
// The baseURL and endpoint of your bot server that the client will connect to
endpoint: 'http://localhost:7860/connect',
});
const timeTaken = Date.now() - startTime;
this.log(`Connection complete, timeTaken: ${timeTaken}`);
@@ -192,9 +198,9 @@ class WebsocketClientApp {
this.log(`Error connecting: ${(error as Error).message}`);
this.updateStatus('Error');
// Clean up if there's an error
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
await this.pcClient.disconnect();
} catch (disconnectError) {
this.log(`Error during disconnect: ${disconnectError}`);
}
@@ -206,12 +212,17 @@ class WebsocketClientApp {
* Disconnect from the bot and clean up media resources
*/
public async disconnect(): Promise<void> {
if (this.rtviClient) {
if (this.pcClient) {
try {
await this.rtviClient.disconnect();
this.rtviClient = null;
if (this.botAudio.srcObject && "getAudioTracks" in this.botAudio.srcObject) {
this.botAudio.srcObject.getAudioTracks().forEach((track) => track.stop());
await this.pcClient.disconnect();
this.pcClient = null;
if (
this.botAudio.srcObject &&
'getAudioTracks' in this.botAudio.srcObject
) {
this.botAudio.srcObject
.getAudioTracks()
.forEach((track) => track.stop());
this.botAudio.srcObject = null;
}
} catch (error) {
@@ -219,7 +230,6 @@ class WebsocketClientApp {
}
}
}
}
declare global {

File diff suppressed because it is too large Load Diff

View File

@@ -9,11 +9,12 @@
"lint": "next lint"
},
"dependencies": {
"@pipecat-ai/client-js": "^0.3.5",
"@pipecat-ai/client-react": "^0.3.5",
"@pipecat-ai/daily-transport": "^0.3.10",
"@pipecat-ai/client-js": "^1.0.0",
"@pipecat-ai/client-react": "^1.0.0",
"@pipecat-ai/daily-transport": "^1.0.0",
"@tabler/icons-react": "^3.31.0",
"@tailwindcss/postcss": "^4.1.3",
"jotai": "^2.12.5",
"js-confetti": "^0.12.0",
"next": "15.2.4",
"react": "^19.0.0",

View File

@@ -1,16 +1,26 @@
import { useEffect, useCallback } from 'react';
import {
useRTVIClient,
useRTVIClientTransportState,
usePipecatClient,
usePipecatClientTransportState,
} from '@pipecat-ai/client-react';
import { CONNECTION_STATES } from '@/constants/gameConstants';
import { useConfigurationSettings } from '@/contexts/Configuration';
// Get the API base URL from environment variables
// Default to "/api" if not specified
// "/api" is the default for Next.js API routes and used
// for the Pipecat Cloud deployed agent
const API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || '/api';
console.log('Using API base URL:', API_BASE_URL);
export function useConnectionState(
onConnected?: () => void,
onDisconnected?: () => void
) {
const client = useRTVIClient();
const transportState = useRTVIClientTransportState();
const client = usePipecatClient();
const transportState = usePipecatClientTransportState();
const config = useConfigurationSettings();
const isConnected = CONNECTION_STATES.ACTIVE.includes(transportState);
const isConnecting = CONNECTION_STATES.CONNECTING.includes(transportState);
@@ -35,12 +45,17 @@ export function useConnectionState(
if (isConnected) {
await client.disconnect();
} else {
await client.connect();
await client.connect({
endpoint: `${API_BASE_URL}/connect`,
requestData: {
personality: config.personality,
},
});
}
} catch (error) {
console.error('Connection error:', error);
}
}, [client, isConnected]);
}, [client, config, isConnected]);
return {
isConnected,

View File

@@ -1,15 +1,15 @@
import { ConfigurationProvider } from "@/contexts/Configuration";
import { RTVIProvider } from "@/providers/RTVIProvider";
import { RTVIClientAudio } from "@pipecat-ai/client-react";
import type { AppProps } from "next/app";
import { Nunito } from "next/font/google";
import Head from "next/head";
import "../styles/globals.css";
import { ConfigurationProvider } from '@/contexts/Configuration';
import { PipecatProvider } from '@/providers/PipecatProvider';
import { PipecatClientAudio } from '@pipecat-ai/client-react';
import type { AppProps } from 'next/app';
import { Nunito } from 'next/font/google';
import Head from 'next/head';
import '../styles/globals.css';
const nunito = Nunito({
subsets: ["latin"],
display: "swap",
variable: "--font-sans",
subsets: ['latin'],
display: 'swap',
variable: '--font-sans',
});
export default function App({ Component, pageProps }: AppProps) {
@@ -21,10 +21,10 @@ export default function App({ Component, pageProps }: AppProps) {
</Head>
<main className={`${nunito.variable}`}>
<ConfigurationProvider>
<RTVIProvider>
<RTVIClientAudio />
<PipecatProvider>
<PipecatClientAudio />
<Component {...pageProps} />
</RTVIProvider>
</PipecatProvider>
</ConfigurationProvider>
</main>
</>

View File

@@ -1,11 +1,11 @@
import type { NextApiRequest, NextApiResponse } from "next";
import type { NextApiRequest, NextApiResponse } from 'next';
export default async function handler(
req: NextApiRequest,
res: NextApiResponse
) {
if (req.method !== "POST") {
return res.status(405).json({ error: "Method not allowed" });
if (req.method !== 'POST') {
return res.status(405).json({ error: 'Method not allowed' });
}
try {
@@ -15,16 +15,16 @@ export default async function handler(
if (!personality) {
return res
.status(400)
.json({ error: "Missing required configuration parameters" });
.json({ error: 'Missing required configuration parameters' });
}
const response = await fetch(
`https://api.pipecat.daily.co/v1/public/${process.env.AGENT_NAME}/start`,
{
method: "POST",
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.PIPECAT_CLOUD_API_KEY}`,
"Content-Type": "application/json",
'Content-Type': 'application/json',
},
body: JSON.stringify({
createDailyRoom: true,
@@ -37,15 +37,15 @@ export default async function handler(
const data = await response.json();
console.log("Response from API:", JSON.stringify(data, null, 2));
console.log('Response from API:', JSON.stringify(data, null, 2));
// Transform the response to match what RTVI client expects
// Transform the response to match what Pipecat client expects
return res.status(200).json({
room_url: data.dailyRoom,
token: data.dailyToken,
});
} catch (error) {
console.error("Error starting agent:", error);
return res.status(500).json({ error: "Failed to start agent" });
console.error('Error starting agent:', error);
return res.status(500).json({ error: 'Failed to start agent' });
}
}

View File

@@ -0,0 +1,43 @@
'use client';
import { PipecatClient } from '@pipecat-ai/client-js';
import { DailyTransport } from '@pipecat-ai/daily-transport';
import { PipecatClientProvider } from '@pipecat-ai/client-react';
import { PropsWithChildren, useEffect, useState, useRef } from 'react';
export function PipecatProvider({ children }: PropsWithChildren) {
const [client, setClient] = useState<PipecatClient | null>(null);
const clientCreated = useRef(false);
useEffect(() => {
// Only create the client once
if (clientCreated.current) return;
const pcClient = new PipecatClient({
transport: new DailyTransport(),
enableMic: true,
enableCam: false,
});
setClient(pcClient);
clientCreated.current = true;
// Cleanup when component unmounts
return () => {
if (pcClient) {
pcClient.disconnect().catch((err) => {
console.error('Error disconnecting client:', err);
});
}
clientCreated.current = false;
};
}, []);
if (!client) {
return null;
}
return (
<PipecatClientProvider client={client}>{children}</PipecatClientProvider>
);
}

View File

@@ -1,72 +0,0 @@
"use client";
import { RTVIClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
import { RTVIClientProvider } from "@pipecat-ai/client-react";
import { PropsWithChildren, useEffect, useState, useRef } from "react";
import { useConfigurationSettings } from "@/contexts/Configuration";
// Get the API base URL from environment variables
// Default to "/api" if not specified
// "/api" is the default for Next.js API routes and used
// for the Pipecat Cloud deployed agent
const API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || "/api";
console.log("Using API base URL:", API_BASE_URL);
export function RTVIProvider({ children }: PropsWithChildren) {
const [client, setClient] = useState<RTVIClient | null>(null);
const config = useConfigurationSettings();
const clientCreated = useRef(false);
useEffect(() => {
// Only create the client once
if (clientCreated.current) return;
const transport = new DailyTransport();
const rtviClient = new RTVIClient({
transport,
params: {
baseUrl: API_BASE_URL,
endpoints: {
connect: "/connect",
},
requestData: {
personality: config.personality,
},
},
enableMic: true,
enableCam: false,
});
setClient(rtviClient);
clientCreated.current = true;
// Cleanup when component unmounts
return () => {
if (rtviClient) {
rtviClient.disconnect().catch((err) => {
console.error("Error disconnecting client:", err);
});
}
clientCreated.current = false;
};
}, []);
// Update the connectParams when config changes
useEffect(() => {
if (!client) return;
// Update the connect params without recreating the client
client.params.requestData = {
personality: config.personality,
};
}, [client, config.personality]);
if (!client) {
return null;
}
return <RTVIClientProvider client={client}>{children}</RTVIClientProvider>;
}

View File

@@ -4,6 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import argparse
import asyncio
import os
import sys
@@ -198,16 +199,15 @@ async def bot(args: DailySessionArguments):
# Local development
async def local_daily():
async def local_daily(args: DailySessionArguments):
"""Daily transport for local development."""
from runner import configure
# from runner import configure
try:
async with aiohttp.ClientSession() as session:
(room_url, token) = await configure(session)
transport = DailyTransport(
room_url,
token,
room_url=args.room_url,
token=args.token,
bot_name="Bot",
params=DailyParams(
audio_in_enabled=True,
@@ -217,7 +217,7 @@ async def local_daily():
)
test_config = {
"personality": "witty",
"personality": args.personality,
}
await main(transport, test_config)
@@ -227,7 +227,24 @@ async def local_daily():
# Local development entry point
if LOCAL_RUN and __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Run the Word Wrangler bot in local development mode"
)
parser.add_argument(
"-u", "--room-url", type=str, default=os.getenv("DAILY_SAMPLE_ROOM_URL", "")
)
parser.add_argument(
"-t", "--token", type=str, default=os.getenv("DAILY_SAMPLE_ROOM_TOKEN", None)
)
parser.add_argument(
"-p",
"--personality",
default="witty",
choices=["friendly", "professional", "enthusiastic", "thoughtful", "witty"],
help="Personality preset for the bot (friendly, professional, enthusiastic, thoughtful, witty)",
)
args = parser.parse_args()
try:
asyncio.run(local_daily())
asyncio.run(local_daily(args))
except Exception as e:
logger.exception(f"Failed to run in local mode: {e}")

View File

@@ -160,14 +160,15 @@ async def rtvi_connect(request: Request) -> Dict[Any, Any]:
Raises:
HTTPException: If room creation, token generation, or bot startup fails
"""
print("Creating room for RTVI connection")
body = await request.json()
print("Creating room for RTVI connection", body)
room_url, token = await create_room_and_token()
print(f"Room URL: {room_url}")
# Start the bot process
try:
proc = subprocess.Popen(
[f"python3 -m bot -u {room_url} -t {token}"],
[f"python3 -m bot -u {room_url} -t {token} -p {body.get('personality', 'witty')}"],
shell=True,
bufsize=1,
cwd=os.path.dirname(os.path.abspath(__file__)),

View File

@@ -32,7 +32,7 @@ dependencies = [
"pyloudnorm~=0.1.1",
"resampy~=0.4.3",
"soxr~=0.5.0",
"openai~=1.70.0",
"openai~=1.74.0",
]
[project.urls]
@@ -42,7 +42,7 @@ Website = "https://pipecat.ai"
[project.optional-dependencies]
anthropic = [ "anthropic~=0.49.0" ]
assemblyai = [ "websockets~=13.1" ]
aws = [ "boto3~=1.37.16", "websockets~=13.1" ]
aws = [ "aioboto3~=15.0.0", "websockets~=13.1" ]
aws-nova-sonic = [ "aws_sdk_bedrock_runtime~=0.0.2" ]
azure = [ "azure-cognitiveservices-speech~=1.42.0"]
cartesia = [ "cartesia~=2.0.3", "websockets~=13.1" ]
@@ -79,7 +79,7 @@ perplexity = []
playht = [ "pyht~=0.1.12", "websockets~=13.1" ]
qwen = []
rime = [ "websockets~=13.1" ]
riva = [ "nvidia-riva-client~=2.19.1" ]
riva = [ "nvidia-riva-client~=2.21.1" ]
sambanova = []
sentry = [ "sentry-sdk~=2.23.1" ]
local-smart-turn = [ "coremltools>=8.0", "transformers", "torch==2.5.0", "torchaudio==2.5.0" ]

View File

@@ -152,11 +152,6 @@ class FrameProcessor(BaseObject):
self.__input_event = None
self.__input_frame_task: Optional[asyncio.Task] = None
# Every processor in Pipecat should only output frames from a single
# task. This avoid problems like audio overlapping. System frames are the
# exception to this rule. This create this task.
self.__push_frame_task: Optional[asyncio.Task] = None
@property
def id(self) -> int:
"""Get the unique identifier for this processor.
@@ -385,7 +380,6 @@ class FrameProcessor(BaseObject):
"""Clean up processor resources."""
await super().cleanup()
await self.__cancel_input_task()
await self.__cancel_push_task()
if self._metrics is not None:
await self._metrics.cleanup()
@@ -512,10 +506,7 @@ class FrameProcessor(BaseObject):
if not self._check_started(frame):
return
if isinstance(frame, SystemFrame):
await self.__internal_push_frame(frame, direction)
else:
await self.__push_queue.put((frame, direction))
await self.__internal_push_frame(frame, direction)
async def __start(self, frame: StartFrame):
"""Handle the start frame to initialize processor state.
@@ -530,7 +521,6 @@ class FrameProcessor(BaseObject):
self._interruption_strategies = frame.interruption_strategies
self._report_only_initial_ttfb = frame.report_only_initial_ttfb
self.__create_input_task()
self.__create_push_task()
async def __cancel(self, frame: CancelFrame):
"""Handle the cancel frame to stop processor operation.
@@ -540,7 +530,6 @@ class FrameProcessor(BaseObject):
"""
self._cancelling = True
await self.__cancel_input_task()
await self.__cancel_push_task()
async def __pause(self, frame: FrameProcessorPauseFrame | FrameProcessorPauseUrgentFrame):
"""Handle pause frame to pause processor operation.
@@ -567,9 +556,6 @@ class FrameProcessor(BaseObject):
async def _start_interruption(self):
"""Start handling an interruption by canceling current tasks."""
try:
# Cancel the push frame task. This will stop pushing frames downstream.
await self.__cancel_push_task()
# Cancel the input task. This will stop processing queued frames.
await self.__cancel_input_task()
except Exception as e:
@@ -579,9 +565,6 @@ class FrameProcessor(BaseObject):
# Create a new input queue and task.
self.__create_input_task()
# Create a new output queue and task.
self.__create_push_task()
async def _stop_interruption(self):
"""Stop handling an interruption."""
# Nothing to do right now.
@@ -677,23 +660,3 @@ class FrameProcessor(BaseObject):
await self.push_error(ErrorFrame(str(e)))
finally:
self.__input_queue.task_done()
def __create_push_task(self):
"""Create the frame pushing task."""
if not self.__push_frame_task:
self.__push_queue = WatchdogQueue(self.task_manager)
self.__push_frame_task = self.create_task(self.__push_frame_task_handler())
async def __cancel_push_task(self):
"""Cancel the frame pushing task."""
if self.__push_frame_task:
self.__push_queue.cancel()
await self.cancel_task(self.__push_frame_task)
self.__push_frame_task = None
async def __push_frame_task_handler(self):
"""Handle frames from the push queue."""
while True:
(frame, direction) = await self.__push_queue.get()
await self.__internal_push_frame(frame, direction)
self.__push_queue.task_done()

View File

@@ -44,6 +44,7 @@ from pipecat.frames.frames import (
InterimTranscriptionFrame,
LLMFullResponseEndFrame,
LLMFullResponseStartFrame,
LLMMessagesAppendFrame,
LLMTextFrame,
MetricsFrame,
StartFrame,
@@ -71,13 +72,14 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.llm_service import (
FunctionCallParams, # TODO(aleix): we shouldn't import `services` from `processors`
)
from pipecat.services.openai.llm import OpenAIContextAggregatorPair
from pipecat.transports.base_input import BaseInputTransport
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.transports.base_transport import BaseTransport
from pipecat.utils.asyncio.watchdog_queue import WatchdogQueue
from pipecat.utils.string import match_endofsentence
RTVI_PROTOCOL_VERSION = "0.3.0"
RTVI_PROTOCOL_VERSION = "1.0.0"
RTVI_MESSAGE_LABEL = "rtvi-ai"
RTVIMessageLiteral = Literal["rtvi-ai"]
@@ -90,6 +92,10 @@ class RTVIServiceOption(BaseModel):
Defines a configurable option that can be set for an RTVI service,
including its name, type, and handler function.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -104,6 +110,10 @@ class RTVIService(BaseModel):
Represents a service that can be configured and used within the RTVI protocol,
containing a name and list of configurable options.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -122,6 +132,10 @@ class RTVIActionArgumentData(BaseModel):
"""Data for an RTVI action argument.
Contains the name and value of an argument passed to an RTVI action.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -132,6 +146,10 @@ class RTVIActionArgument(BaseModel):
"""Definition of an RTVI action argument.
Specifies the name and expected type of an argument for an RTVI action.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -143,6 +161,10 @@ class RTVIAction(BaseModel):
Represents an action that can be executed within the RTVI protocol,
including its service, name, arguments, and handler function.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
service: str
@@ -166,6 +188,10 @@ class RTVIServiceOptionConfig(BaseModel):
"""Configuration value for an RTVI service option.
Contains the name and value to set for a specific service option.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -176,6 +202,10 @@ class RTVIServiceConfig(BaseModel):
"""Configuration for an RTVI service.
Contains the service name and list of option configurations to apply.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
service: str
@@ -186,6 +216,10 @@ class RTVIConfig(BaseModel):
"""Complete RTVI configuration.
Contains the full configuration for all RTVI services.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
config: List[RTVIServiceConfig]
@@ -196,10 +230,15 @@ class RTVIConfig(BaseModel):
#
# deprecated
class RTVIUpdateConfig(BaseModel):
"""Request to update RTVI configuration.
Contains new configuration settings and whether to interrupt the bot.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
config: List[RTVIServiceConfig]
@@ -210,6 +249,10 @@ class RTVIActionRunArgument(BaseModel):
"""Argument for running an RTVI action.
Contains the name and value of an argument to pass to an action.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
name: str
@@ -220,6 +263,10 @@ class RTVIActionRun(BaseModel):
"""Request to run an RTVI action.
Contains the service, action name, and optional arguments.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
service: str
@@ -234,12 +281,80 @@ class RTVIActionFrame(DataFrame):
Parameters:
rtvi_action_run: The action to execute.
message_id: Optional message ID for response correlation.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
rtvi_action_run: RTVIActionRun
message_id: Optional[str] = None
class RTVIRawClientMessageData(BaseModel):
"""Data structure expected from client messages sent to the RTVI server."""
t: str
d: Optional[Any] = None
class RTVIClientMessage(BaseModel):
"""Cleansed data structure for client messages for handling."""
msg_id: str
type: str
data: Optional[Any] = None
@dataclass
class RTVIClientMessageFrame(SystemFrame):
"""A frame for sending messages from the client to the RTVI server.
This frame is meant for custom messaging from the client to the server
and expects a server-response message.
"""
msg_id: str
type: str
data: Optional[Any] = None
@dataclass
class RTVIServerResponseFrame(SystemFrame):
"""A frame for responding to a client RTVI message.
This frame should be sent in response to an RTVIClientMessageFrame
and include the original RTVIClientMessageFrame to ensure the response
is properly attributed to the original request. To respond with an error,
set the `error` field to a string describing the error. This will result
in the client receiving a `response-error` message instead of a
`server-response` message.
"""
client_msg: RTVIClientMessageFrame
data: Optional[Any] = None
error: Optional[str] = None
class RTVIRawServerResponseData(BaseModel):
"""Data structure for server responses to client messages."""
t: str
d: Optional[Any] = None
class RTVIServerResponse(BaseModel):
"""The RTVI-formatted message response from the server to the client.
This message is used to respond to custom messages sent by the client.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
type: Literal["server-response"] = "server-response"
id: str
data: RTVIRawServerResponseData
class RTVIMessage(BaseModel):
"""Base RTVI message structure.
@@ -269,7 +384,7 @@ class RTVIErrorResponseData(BaseModel):
class RTVIErrorResponse(BaseModel):
"""RTVI error response message.
Sent in response to a client request that resulted in an error.
RTVI Formatted error response message for relaying failed client requests.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -285,13 +400,13 @@ class RTVIErrorData(BaseModel):
"""
error: str
fatal: bool
fatal: bool # Indicates the pipeline has stopped due to this error
class RTVIError(BaseModel):
"""RTVI error event message.
Sent when an error occurs that isn't in response to a specific request.
RTVI Formatted error message for relaying errors in the pipeline.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -303,6 +418,10 @@ class RTVIDescribeConfigData(BaseModel):
"""Data for describing available RTVI configuration.
Contains the list of available services and their options.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
config: List[RTVIService]
@@ -312,6 +431,10 @@ class RTVIDescribeConfig(BaseModel):
"""Message describing available RTVI configuration.
Sent in response to a describe-config request.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -324,6 +447,10 @@ class RTVIDescribeActionsData(BaseModel):
"""Data for describing available RTVI actions.
Contains the list of available actions that can be executed.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
actions: List[RTVIAction]
@@ -333,6 +460,10 @@ class RTVIDescribeActions(BaseModel):
"""Message describing available RTVI actions.
Sent in response to a describe-actions request.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -345,6 +476,10 @@ class RTVIConfigResponse(BaseModel):
"""Response containing current RTVI configuration.
Sent in response to a get-config request.
.. deprecated:: 0.0.75
Pipeline Configuration has been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -357,6 +492,10 @@ class RTVIActionResponseData(BaseModel):
"""Data for an RTVI action response.
Contains the result of executing an action.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
result: ActionResult
@@ -366,6 +505,10 @@ class RTVIActionResponse(BaseModel):
"""Response to an RTVI action execution.
Sent after successfully executing an action.
.. deprecated:: 0.0.75
Actions have been removed as part of the RTVI protocol 1.0.0.
Use custom client and server messages instead.
"""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
@@ -374,6 +517,30 @@ class RTVIActionResponse(BaseModel):
data: RTVIActionResponseData
class AboutClientData(BaseModel):
"""Data about the RTVI client.
Contains information about the client, including which RTVI library it
is using, what platform it is on and any additional details, if available.
"""
library: str
library_version: Optional[str] = None
platform: Optional[str] = None
platform_version: Optional[str] = None
platform_details: Optional[Any] = None
class RTVIClientReadyData(BaseModel):
"""Data format of client ready messages.
Contains the RTVIprotocol version and client information.
"""
version: str
about: AboutClientData
class RTVIBotReadyData(BaseModel):
"""Data for bot ready notification.
@@ -381,7 +548,10 @@ class RTVIBotReadyData(BaseModel):
"""
version: str
config: List[RTVIServiceConfig]
# The config field is deprecated and will not be included if
# the client's rtvi version is 1.0.0 or higher.
config: Optional[List[RTVIServiceConfig]] = None
about: Optional[Mapping[str, Any]] = None
class RTVIBotReady(BaseModel):
@@ -418,6 +588,25 @@ class RTVILLMFunctionCallMessage(BaseModel):
data: RTVILLMFunctionCallMessageData
class RTVIAppendToContextData(BaseModel):
"""Data format for appending messages to the context.
Contains the role, content, and whether to run the message immediately.
"""
role: Literal["user", "assistant"] | str
content: Any
run_immediately: bool = False
class RTVIAppendToContext(BaseModel):
"""RTVI Message format to append content to the LLM context."""
label: RTVIMessageLiteral = RTVI_MESSAGE_LABEL
type: Literal["append-to-context"] = "append-to-context"
data: RTVIAppendToContextData
class RTVILLMFunctionCallStartMessageData(BaseModel):
"""Data for LLM function call start notification.
@@ -752,6 +941,11 @@ class RTVIObserver(BaseObserver):
elif isinstance(frame, RTVIServerMessageFrame):
message = RTVIServerMessage(data=frame.data)
await self.push_transport_message_urgent(message)
elif isinstance(frame, RTVIServerResponseFrame):
if frame.error is not None:
await self._send_error_response(frame)
else:
await self._send_server_response(frame)
if mark_as_seen:
self._frames_seen.add(frame.id)
@@ -879,6 +1073,22 @@ class RTVIObserver(BaseObserver):
message = RTVIMetricsMessage(data=metrics)
await self.push_transport_message_urgent(message)
async def _send_server_response(self, frame: RTVIServerResponseFrame):
"""Send a response to the client for a specific request."""
message = RTVIServerResponse(
id=str(frame.client_msg.msg_id),
data=RTVIRawServerResponseData(t=frame.client_msg.type, d=frame.data),
)
await self.push_transport_message_urgent(message)
async def _send_error_response(self, frame: RTVIServerResponseFrame):
"""Send a response to the client for a specific request."""
if self._params.errors_enabled:
message = RTVIErrorResponse(
id=str(frame.client_msg.msg_id), data=RTVIErrorResponseData(error=frame.error)
)
await self.push_transport_message_urgent(message)
class RTVIProcessor(FrameProcessor):
"""Main processor for handling RTVI protocol messages and actions.
@@ -908,6 +1118,7 @@ class RTVIProcessor(FrameProcessor):
self._bot_ready = False
self._client_ready = False
self._client_ready_id = ""
self._client_version = []
self._errors_enabled = True
self._registered_actions: Dict[str, RTVIAction] = {}
@@ -921,6 +1132,7 @@ class RTVIProcessor(FrameProcessor):
self._register_event_handler("on_bot_started")
self._register_event_handler("on_client_ready")
self._register_event_handler("on_client_message")
self._input_transport = None
self._transport = transport
@@ -936,6 +1148,15 @@ class RTVIProcessor(FrameProcessor):
Args:
action: The action to register.
"""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"The actions API is deprecated, use server and client messages instead.",
DeprecationWarning,
)
id = self._action_id(action.service, action.action)
self._registered_actions[id] = action
@@ -945,6 +1166,15 @@ class RTVIProcessor(FrameProcessor):
Args:
service: The service to register.
"""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"The actions API is deprecated, use server and client messages instead.",
DeprecationWarning,
)
self._registered_services[service.name] = service
async def set_client_ready(self):
@@ -970,6 +1200,22 @@ class RTVIProcessor(FrameProcessor):
"""Send a bot interruption frame upstream."""
await self.push_frame(BotInterruptionFrame(), FrameDirection.UPSTREAM)
async def send_server_message(self, data: Any):
"""Send a server message to the client."""
message = RTVIServerMessage(data=data)
await self._send_server_message(message)
async def send_server_response(self, client_msg: RTVIClientMessage, data: Any):
"""Send a server response for a given client message."""
message = RTVIServerResponse(
id=client_msg.msg_id, data=RTVIRawServerResponseData(t=client_msg.type, d=data)
)
await self._send_server_message(message)
async def send_error_response(self, client_msg: RTVIClientMessage, error: str):
"""Send an error response for a given client message."""
await self._send_error_response(id=client_msg.msg_id, error=error)
async def send_error(self, error: str):
"""Send an error message to the client.
@@ -1013,9 +1259,6 @@ class RTVIProcessor(FrameProcessor):
function_name: Name of the function being called.
llm: The LLM processor making the call.
context: The LLM context.
Note:
This method is deprecated. Use handle_function_call() instead.
"""
import warnings
@@ -1136,7 +1379,15 @@ class RTVIProcessor(FrameProcessor):
try:
match message.type:
case "client-ready":
await self._handle_client_ready(message.id)
data = None
try:
data = RTVIClientReadyData.model_validate(message.data)
except ValidationError:
# Not all clients have been updated to RTVI 1.0.0.
# For now, that's okay, we just log their info as unknown.
data = None
pass
await self._handle_client_ready(message.id, data)
case "describe-actions":
await self._handle_describe_actions(message.id)
case "describe-config":
@@ -1148,6 +1399,9 @@ class RTVIProcessor(FrameProcessor):
await self._handle_update_config(message.id, update_config)
case "disconnect-bot":
await self.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
case "client-message":
data = RTVIRawClientMessageData.model_validate(message.data)
await self._handle_client_message(message.id, data)
case "action":
action = RTVIActionRun.model_validate(message.data)
action_frame = RTVIActionFrame(message_id=message.id, rtvi_action_run=action)
@@ -1155,6 +1409,9 @@ class RTVIProcessor(FrameProcessor):
case "llm-function-call-result":
data = RTVILLMFunctionCallResultData.model_validate(message.data)
await self._handle_function_call_result(data)
case "append-to-context":
data = RTVIAppendToContextData.model_validate(message.data)
await self._handle_update_context(data, message.id)
case "raw-audio" | "raw-audio-batch":
await self._handle_audio_buffer(message.data)
@@ -1168,9 +1425,20 @@ class RTVIProcessor(FrameProcessor):
await self._send_error_response(message.id, f"Exception processing message: {e}")
logger.warning(f"Exception processing message: {e}")
async def _handle_client_ready(self, request_id: str):
"""Handle a client-ready message."""
logger.debug("Received client-ready")
async def _handle_client_ready(self, request_id: str, data: RTVIClientReadyData | None):
"""Handle the client-ready message from the client."""
version = data.version if data else "unknown"
logger.debug(f"Received client-ready: version {version}")
if version == "unknown":
self._client_version = [0, 3, 0] # Default to 0.3.0 if unknown
else:
try:
self._client_version = [int(v) for v in version.split(".")]
except ValueError:
logger.warning(f"Invalid client version format: {version}")
self._client_version = [0, 3, 0]
about = data.about if data else {"library": "unknown"}
logger.debug(f"Client Details: {about}")
if self._input_transport:
await self._input_transport.start_audio_in_streaming()
@@ -1201,18 +1469,45 @@ class RTVIProcessor(FrameProcessor):
async def _handle_describe_config(self, request_id: str):
"""Handle a describe-config request."""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"Configuration helpers are deprecated. If your application needs this behavior, use custom server and client messages.",
DeprecationWarning,
)
services = list(self._registered_services.values())
message = RTVIDescribeConfig(id=request_id, data=RTVIDescribeConfigData(config=services))
await self._push_transport_message(message)
async def _handle_describe_actions(self, request_id: str):
"""Handle a describe-actions request."""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"The Actions API is deprecated, use custom server and client messages instead.",
DeprecationWarning,
)
actions = list(self._registered_actions.values())
message = RTVIDescribeActions(id=request_id, data=RTVIDescribeActionsData(actions=actions))
await self._push_transport_message(message)
async def _handle_get_config(self, request_id: str):
"""Handle a get-config request."""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"Configuration helpers are deprecated. If your application needs this behavior, use custom server and client messages.",
DeprecationWarning,
)
message = RTVIConfigResponse(id=request_id, data=self._config)
await self._push_transport_message(message)
@@ -1230,6 +1525,15 @@ class RTVIProcessor(FrameProcessor):
async def _update_service_config(self, config: RTVIServiceConfig):
"""Update configuration for a specific service."""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"Configuration helpers are deprecated. If your application needs this behavior, use custom server and client messages.",
DeprecationWarning,
)
service = self._registered_services[config.service]
for option in config.options:
handler = service._options_dict[option.name].handler
@@ -1238,6 +1542,15 @@ class RTVIProcessor(FrameProcessor):
async def _update_config(self, data: RTVIConfig, interrupt: bool):
"""Update the RTVI configuration."""
import warnings
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"Configuration helpers are deprecated. If your application needs this behavior, use custom server and client messages.",
DeprecationWarning,
)
if interrupt:
await self.interrupt_bot()
for service_config in data.config:
@@ -1248,6 +1561,33 @@ class RTVIProcessor(FrameProcessor):
await self._update_config(RTVIConfig(config=data.config), data.interrupt)
await self._handle_get_config(request_id)
async def _handle_update_context(self, data: RTVIAppendToContextData):
if data.run_immediately:
await self.interrupt_bot()
frame = LLMMessagesAppendFrame(
messages=[{"role": data.role, "content": data.content}],
run_llm=data.run_immediately,
)
await self.push_frame(frame)
async def _handle_client_message(self, msg_id: str, data: RTVIRawClientMessageData):
"""Handle a client message frame."""
if not data:
await self._send_error_response(msg_id, "Malformed client message")
return
# Create a RTVIClientMessageFrame to push the message
frame = RTVIClientMessageFrame(msg_id=msg_id, type=data.t, data=data.d)
await self.push_frame(frame)
await self._call_event_handler(
"on_client_message",
RTVIClientMessage(
msg_id=msg_id,
type=data.t,
data=data.d,
),
)
async def _handle_function_call_result(self, data):
"""Handle a function call result from the client."""
frame = FunctionCallResultFrame(
@@ -1278,12 +1618,19 @@ class RTVIProcessor(FrameProcessor):
async def _send_bot_ready(self):
"""Send the bot-ready message to the client."""
config = None
if self._client_version[0] < 1:
config = self._config.config
message = RTVIBotReady(
id=self._client_ready_id,
data=RTVIBotReadyData(version=RTVI_PROTOCOL_VERSION, config=self._config.config),
data=RTVIBotReadyData(version=RTVI_PROTOCOL_VERSION, config=config),
)
await self._push_transport_message(message)
async def _send_server_message(self, message: RTVIServerMessage | RTVIServerResponse):
"""Send a message or response to the client."""
await self._push_transport_message(message)
async def _send_error_frame(self, frame: ErrorFrame):
"""Send an error frame as an RTVI error message."""
if self._errors_enabled:

View File

@@ -15,6 +15,8 @@ from pipecat.frames.frames import (
CancelFrame,
EndFrame,
Frame,
FunctionCallInProgressFrame,
FunctionCallResultFrame,
StartFrame,
UserStartedSpeakingFrame,
UserStoppedSpeakingFrame,
@@ -168,6 +170,13 @@ class UserIdleProcessor(FrameProcessor):
self._idle_event.set()
elif isinstance(frame, BotSpeakingFrame):
self._idle_event.set()
elif isinstance(frame, FunctionCallInProgressFrame):
# Function calls can take longer than the timeout, so we want to prevent idle callbacks
self._interrupted = True
self._idle_event.set()
elif isinstance(frame, FunctionCallResultFrame):
self._interrupted = False
self._idle_event.set()
async def cleanup(self) -> None:
"""Cleans up resources when processor is shutting down."""

View File

@@ -185,8 +185,26 @@ class TwilioFrameSerializer(FrameSerializer):
async with session.post(endpoint, auth=auth, data=params) as response:
if response.status == 200:
logger.info(f"Successfully terminated Twilio call {call_sid}")
elif response.status == 404:
# Handle the case where the call has already ended
# Error code 20404: "The requested resource was not found"
# Source: https://www.twilio.com/docs/errors/20404
try:
error_data = await response.json()
if error_data.get("code") == 20404:
logger.debug(f"Twilio call {call_sid} was already terminated")
return
except:
pass # Fall through to log the raw error
# Log other 404 errors
error_text = await response.text()
logger.error(
f"Failed to terminate Twilio call {call_sid}: "
f"Status {response.status}, Response: {error_text}"
)
else:
# Get the error details for better debugging
# Log other errors
error_text = await response.text()
logger.error(
f"Failed to terminate Twilio call {call_sid}: "

View File

@@ -55,7 +55,7 @@ from pipecat.services.llm_service import LLMService
from pipecat.utils.tracing.service_decorators import traced_llm
try:
import boto3
import aioboto3
import httpx
from botocore.config import Config
except ModuleNotFoundError as e:
@@ -749,13 +749,17 @@ class AWSBedrockLLMService(LLMService):
read_timeout=300, # 5 minutes
retries={"max_attempts": 3},
)
session = boto3.Session(
aws_access_key_id=aws_access_key,
aws_secret_access_key=aws_secret_key,
aws_session_token=aws_session_token,
region_name=aws_region,
)
self._client = session.client(service_name="bedrock-runtime", config=client_config)
self._aws_session = aioboto3.Session()
# Store AWS session parameters for creating client in async context
self._aws_params = {
"aws_access_key_id": aws_access_key,
"aws_secret_access_key": aws_secret_key,
"aws_session_token": aws_session_token,
"region_name": aws_region,
"config": client_config,
}
self.set_model_name(model)
self._settings = {
@@ -903,70 +907,74 @@ class AWSBedrockLLMService(LLMService):
logger.debug(f"Calling AWS Bedrock model with: {request_params}")
# Call AWS Bedrock with streaming
response = self._client.converse_stream(**request_params)
async with self._aws_session.client(
service_name="bedrock-runtime", **self._aws_params
) as client:
# Call AWS Bedrock with streaming
response = await client.converse_stream(**request_params)
await self.stop_ttfb_metrics()
await self.stop_ttfb_metrics()
# Process the streaming response
tool_use_block = None
json_accumulator = ""
# Process the streaming response
tool_use_block = None
json_accumulator = ""
function_calls = []
for event in response["stream"]:
self.reset_watchdog()
function_calls = []
# Handle text content
if "contentBlockDelta" in event:
delta = event["contentBlockDelta"]["delta"]
if "text" in delta:
await self.push_frame(LLMTextFrame(delta["text"]))
completion_tokens_estimate += self._estimate_tokens(delta["text"])
elif "toolUse" in delta and "input" in delta["toolUse"]:
# Handle partial JSON for tool use
json_accumulator += delta["toolUse"]["input"]
completion_tokens_estimate += self._estimate_tokens(
delta["toolUse"]["input"]
)
async for event in response["stream"]:
self.reset_watchdog()
# Handle tool use start
elif "contentBlockStart" in event:
content_block_start = event["contentBlockStart"]["start"]
if "toolUse" in content_block_start:
tool_use_block = {
"id": content_block_start["toolUse"].get("toolUseId", ""),
"name": content_block_start["toolUse"].get("name", ""),
}
json_accumulator = ""
# Handle text content
if "contentBlockDelta" in event:
delta = event["contentBlockDelta"]["delta"]
if "text" in delta:
await self.push_frame(LLMTextFrame(delta["text"]))
completion_tokens_estimate += self._estimate_tokens(delta["text"])
elif "toolUse" in delta and "input" in delta["toolUse"]:
# Handle partial JSON for tool use
json_accumulator += delta["toolUse"]["input"]
completion_tokens_estimate += self._estimate_tokens(
delta["toolUse"]["input"]
)
# Handle message completion with tool use
elif "messageStop" in event and "stopReason" in event["messageStop"]:
if event["messageStop"]["stopReason"] == "tool_use" and tool_use_block:
try:
arguments = json.loads(json_accumulator) if json_accumulator else {}
# Handle tool use start
elif "contentBlockStart" in event:
content_block_start = event["contentBlockStart"]["start"]
if "toolUse" in content_block_start:
tool_use_block = {
"id": content_block_start["toolUse"].get("toolUseId", ""),
"name": content_block_start["toolUse"].get("name", ""),
}
json_accumulator = ""
# Only call function if it's not the no_operation tool
if not using_noop_tool:
function_calls.append(
FunctionCallFromLLM(
context=context,
tool_call_id=tool_use_block["id"],
function_name=tool_use_block["name"],
arguments=arguments,
# Handle message completion with tool use
elif "messageStop" in event and "stopReason" in event["messageStop"]:
if event["messageStop"]["stopReason"] == "tool_use" and tool_use_block:
try:
arguments = json.loads(json_accumulator) if json_accumulator else {}
# Only call function if it's not the no_operation tool
if not using_noop_tool:
function_calls.append(
FunctionCallFromLLM(
context=context,
tool_call_id=tool_use_block["id"],
function_name=tool_use_block["name"],
arguments=arguments,
)
)
)
else:
logger.debug("Ignoring no_operation tool call")
except json.JSONDecodeError:
logger.error(f"Failed to parse tool arguments: {json_accumulator}")
else:
logger.debug("Ignoring no_operation tool call")
except json.JSONDecodeError:
logger.error(f"Failed to parse tool arguments: {json_accumulator}")
# Handle usage metrics if available
if "metadata" in event and "usage" in event["metadata"]:
usage = event["metadata"]["usage"]
prompt_tokens += usage.get("inputTokens", 0)
completion_tokens += usage.get("outputTokens", 0)
cache_read_input_tokens += usage.get("cacheReadInputTokens", 0)
cache_creation_input_tokens += usage.get("cacheWriteInputTokens", 0)
# Handle usage metrics if available
if "metadata" in event and "usage" in event["metadata"]:
usage = event["metadata"]["usage"]
prompt_tokens += usage.get("inputTokens", 0)
completion_tokens += usage.get("outputTokens", 0)
cache_read_input_tokens += usage.get("cacheReadInputTokens", 0)
cache_creation_input_tokens += usage.get("cacheWriteInputTokens", 0)
await self.run_function_calls(function_calls)
except asyncio.CancelledError:

View File

@@ -30,7 +30,7 @@ from pipecat.transcriptions.language import Language
from pipecat.utils.tracing.service_decorators import traced_tts
try:
import boto3
import aioboto3
from botocore.exceptions import BotoCoreError, ClientError
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
@@ -177,13 +177,25 @@ class AWSPollyTTSService(TTSService):
params = params or AWSPollyTTSService.InputParams()
self._polly_client = boto3.client(
"polly",
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=api_key,
aws_session_token=aws_session_token,
region_name=region,
)
# Get credentials from environment variables if not provided
self._aws_params = {
"aws_access_key_id": aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID"),
"aws_secret_access_key": api_key or os.getenv("AWS_SECRET_ACCESS_KEY"),
"aws_session_token": aws_session_token or os.getenv("AWS_SESSION_TOKEN"),
"region_name": region or os.getenv("AWS_REGION", "us-east-1"),
}
# Validate that we have the required credentials
if (
not self._aws_params["aws_access_key_id"]
or not self._aws_params["aws_secret_access_key"]
):
raise ValueError(
"AWS credentials not found. Please provide them either through constructor parameters "
"or set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables."
)
self._aws_session = aioboto3.Session()
self._settings = {
"engine": params.engine,
"language": self.language_to_service_language(params.language)
@@ -199,24 +211,6 @@ class AWSPollyTTSService(TTSService):
self.set_voice(voice_id)
# Get credentials from environment variables if not provided
self._credentials = {
"aws_access_key_id": aws_access_key_id or os.getenv("AWS_ACCESS_KEY_ID"),
"aws_secret_access_key": api_key or os.getenv("AWS_SECRET_ACCESS_KEY"),
"aws_session_token": aws_session_token or os.getenv("AWS_SESSION_TOKEN"),
"region": region or os.getenv("AWS_REGION", "us-east-1"),
}
# Validate that we have the required credentials
if (
not self._credentials["aws_access_key_id"]
or not self._credentials["aws_secret_access_key"]
):
raise ValueError(
"AWS credentials not found. Please provide them either through constructor parameters "
"or set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables."
)
def can_generate_metrics(self) -> bool:
"""Check if this service can generate processing metrics.
@@ -279,14 +273,6 @@ class AWSPollyTTSService(TTSService):
Yields:
Frame: Audio frames containing the synthesized speech.
"""
def read_audio_data(**args):
response = self._polly_client.synthesize_speech(**args)
if "AudioStream" in response:
audio_data = response["AudioStream"].read()
return audio_data
return None
logger.debug(f"{self}: Generating TTS [{text}]")
try:
@@ -309,30 +295,32 @@ class AWSPollyTTSService(TTSService):
# Filter out None values
filtered_params = {k: v for k, v in params.items() if v is not None}
audio_data = await asyncio.to_thread(read_audio_data, **filtered_params)
async with self._aws_session.client("polly", **self._aws_params) as polly:
response = await polly.synthesize_speech(**filtered_params)
if "AudioStream" in response:
# Get the streaming body and read it
stream = response["AudioStream"]
audio_data = await stream.read()
else:
logger.error(f"{self} No audio stream in response")
audio_data = None
if not audio_data:
logger.error(f"{self} No audio data returned")
yield None
return
audio_data = await self._resampler.resample(audio_data, 16000, self.sample_rate)
audio_data = await self._resampler.resample(audio_data, 16000, self.sample_rate)
await self.start_tts_usage_metrics(text)
await self.start_tts_usage_metrics(text)
yield TTSStartedFrame()
yield TTSStartedFrame()
CHUNK_SIZE = self.chunk_size
CHUNK_SIZE = self.chunk_size
for i in range(0, len(audio_data), CHUNK_SIZE):
chunk = audio_data[i : i + CHUNK_SIZE]
if len(chunk) > 0:
await self.stop_ttfb_metrics()
frame = TTSAudioRawFrame(chunk, self.sample_rate, 1)
yield frame
yield TTSStoppedFrame()
for i in range(0, len(audio_data), CHUNK_SIZE):
chunk = audio_data[i : i + CHUNK_SIZE]
if len(chunk) > 0:
await self.stop_ttfb_metrics()
frame = TTSAudioRawFrame(chunk, self.sample_rate, 1)
yield frame
yield TTSStoppedFrame()
except (BotoCoreError, ClientError) as error:
logger.exception(f"{self} error generating TTS: {error}")
error_message = f"AWS Polly TTS error: {str(error)}"

View File

@@ -121,6 +121,7 @@ class CartesiaTTSService(AudioContextWordTTSService):
container: str = "raw",
params: Optional[InputParams] = None,
text_aggregator: Optional[BaseTextAggregator] = None,
aggregate_sentences: Optional[bool] = True,
**kwargs,
):
"""Initialize the Cartesia TTS service.
@@ -136,6 +137,7 @@ class CartesiaTTSService(AudioContextWordTTSService):
container: Audio container format.
params: Additional input parameters for voice customization.
text_aggregator: Custom text aggregator for processing input text.
aggregate_sentences: Whether to aggregate sentences within the TTSService.
**kwargs: Additional arguments passed to the parent service.
"""
# Aggregating sentences still gives cleaner-sounding results and fewer
@@ -149,7 +151,7 @@ class CartesiaTTSService(AudioContextWordTTSService):
# can use those to generate text frames ourselves aligned with the
# playout timing of the audio!
super().__init__(
aggregate_sentences=True,
aggregate_sentences=aggregate_sentences,
push_text_frames=False,
pause_frame_processing=True,
sample_rate=sample_rate,

View File

@@ -238,6 +238,7 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
url: str = "wss://api.elevenlabs.io",
sample_rate: Optional[int] = None,
params: Optional[InputParams] = None,
aggregate_sentences: Optional[bool] = True,
**kwargs,
):
"""Initialize the ElevenLabs TTS service.
@@ -249,6 +250,7 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
url: WebSocket URL for ElevenLabs TTS API.
sample_rate: Audio sample rate. If None, uses default.
params: Additional input parameters for voice customization.
aggregate_sentences: Whether to aggregate sentences within the TTSService.
**kwargs: Additional arguments passed to the parent service.
"""
# Aggregating sentences still gives cleaner-sounding results and fewer
@@ -266,7 +268,7 @@ class ElevenLabsTTSService(AudioContextWordTTSService):
# speaking for a while, so we want the parent class to send TTSStopFrame
# after a short period not receiving any audio.
super().__init__(
aggregate_sentences=True,
aggregate_sentences=aggregate_sentences,
push_text_frames=False,
push_stop_frames=True,
pause_frame_processing=True,

View File

@@ -572,9 +572,6 @@ class GeminiMultimodalLiveLLMService(LLMService):
# Initialize the File API client
self.file_api = GeminiFileAPI(api_key=api_key, base_url=file_api_base_url)
# Initialize the File API client
self.file_api = GeminiFileAPI(api_key=api_key, base_url=file_api_base_url)
def can_generate_metrics(self) -> bool:
"""Check if the service can generate usage metrics.

View File

@@ -106,10 +106,11 @@ class NeuphonicTTSService(InterruptibleTTSService):
*,
api_key: str,
voice_id: Optional[str] = None,
url: str = "wss://api.neuphonic.com",
url: str = "wss://eu-west-1.api.neuphonic.com",
sample_rate: Optional[int] = 22050,
encoding: str = "pcm_linear",
params: Optional[InputParams] = None,
aggregate_sentences: Optional[bool] = True,
**kwargs,
):
"""Initialize the Neuphonic TTS service.
@@ -121,10 +122,11 @@ class NeuphonicTTSService(InterruptibleTTSService):
sample_rate: Audio sample rate in Hz. Defaults to 22050.
encoding: Audio encoding format. Defaults to "pcm_linear".
params: Additional input parameters for TTS configuration.
aggregate_sentences: Whether to aggregate sentences within the TTSService.
**kwargs: Additional arguments passed to parent InterruptibleTTSService.
"""
super().__init__(
aggregate_sentences=True,
aggregate_sentences=aggregate_sentences,
push_text_frames=False,
push_stop_frames=True,
stop_frame_timeout_s=2.0,
@@ -279,14 +281,18 @@ class NeuphonicTTSService(InterruptibleTTSService):
"voice_id": self._voice_id,
}
query_params = [f"api_key={self._api_key}"]
query_params = []
for key, value in tts_config.items():
if value is not None:
query_params.append(f"{key}={value}")
url = f"{self._url}/speak/{self._settings['lang_code']}?{'&'.join(query_params)}"
url = f"{self._url}/speak/{self._settings['lang_code']}"
if query_params:
url += f"?{'&'.join(query_params)}"
self._websocket = await websockets.connect(url)
headers = {"x-api-key": self._api_key}
self._websocket = await websockets.connect(url, extra_headers=headers)
except Exception as e:
logger.error(f"{self} initialization error: {e}")
self._websocket = None
@@ -311,7 +317,7 @@ class NeuphonicTTSService(InterruptibleTTSService):
async for message in WatchdogAsyncIterator(self._websocket, manager=self.task_manager):
if isinstance(message, str):
msg = json.loads(message)
if msg.get("data", {}).get("audio") is not None:
if msg.get("data") and msg["data"].get("audio"):
await self.stop_ttfb_metrics()
audio = base64.b64decode(msg["data"]["audio"])
@@ -324,12 +330,19 @@ class NeuphonicTTSService(InterruptibleTTSService):
while True:
self.reset_watchdog()
await asyncio.sleep(KEEPALIVE_SLEEP)
await self._send_text("")
await self._send_keepalive()
async def _send_keepalive(self):
"""Send keepalive message to maintain connection."""
if self._websocket:
# Send empty text for keepalive
msg = {"text": ""}
await self._websocket.send(json.dumps(msg))
async def _send_text(self, text: str):
"""Send text to Neuphonic WebSocket for synthesis."""
if self._websocket:
msg = {"text": text}
msg = {"text": f"{text} <STOP>"}
logger.debug(f"Sending text to websocket: {msg}")
await self._websocket.send(json.dumps(msg))

View File

@@ -6,6 +6,8 @@
"""OLLama LLM service implementation for Pipecat AI framework."""
from loguru import logger
from pipecat.services.openai.llm import OpenAILLMService
@@ -16,12 +18,28 @@ class OLLamaLLMService(OpenAILLMService):
providing a compatible interface for running large language models locally.
"""
def __init__(self, *, model: str = "llama2", base_url: str = "http://localhost:11434/v1"):
def __init__(
self, *, model: str = "llama2", base_url: str = "http://localhost:11434/v1", **kwargs
):
"""Initialize OLLama LLM service.
Args:
model: The OLLama model to use. Defaults to "llama2".
base_url: The base URL for the OLLama API endpoint.
Defaults to "http://localhost:11434/v1".
**kwargs: Additional keyword arguments passed to OpenAILLMService.
"""
super().__init__(model=model, base_url=base_url, api_key="ollama")
super().__init__(model=model, base_url=base_url, api_key="ollama", **kwargs)
def create_client(self, base_url=None, **kwargs):
"""Create OpenAI-compatible client for Ollama.
Args:
base_url: The base URL for the API. If None, uses instance base_url.
**kwargs: Additional keyword arguments passed to the parent create_client method.
Returns:
An OpenAI-compatible client configured for Ollama.
"""
logger.debug(f"Creating Ollama client with api {base_url}")
return super().create_client(base_url, **kwargs)

View File

@@ -99,6 +99,7 @@ class RimeTTSService(AudioContextWordTTSService):
sample_rate: Optional[int] = None,
params: Optional[InputParams] = None,
text_aggregator: Optional[BaseTextAggregator] = None,
aggregate_sentences: Optional[bool] = True,
**kwargs,
):
"""Initialize Rime TTS service.
@@ -111,11 +112,12 @@ class RimeTTSService(AudioContextWordTTSService):
sample_rate: Audio sample rate in Hz.
params: Additional configuration parameters.
text_aggregator: Custom text aggregator for processing input text.
aggregate_sentences: Whether to aggregate sentences within the TTSService.
**kwargs: Additional arguments passed to parent class.
"""
# Initialize with parent class settings for proper frame handling
super().__init__(
aggregate_sentences=True,
aggregate_sentences=aggregate_sentences,
push_text_frames=False,
push_stop_frames=True,
pause_frame_processing=True,

View File

@@ -279,7 +279,6 @@ class RivaSTTService(STTService):
streaming_config=self._config,
)
for response in responses:
self.reset_watchdog()
if not response.results:
continue
asyncio.run_coroutine_threadsafe(

View File

@@ -49,8 +49,10 @@ class Model(Enum):
Parameters:
TINY: Smallest multilingual model, fastest inference.
BASE: Basic multilingual model, good speed/quality balance.
SMALL: Small multilingual model, better speed/quality balance than BASE.
MEDIUM: Medium-sized multilingual model, better quality.
LARGE: Best quality multilingual model, slower inference.
LARGE_V3_TURBO: Fast multilingual model, slightly lower quality than LARGE.
DISTIL_LARGE_V2: Fast multilingual distilled model.
DISTIL_MEDIUM_EN: Fast English-only distilled model.
"""
@@ -58,8 +60,10 @@ class Model(Enum):
# Multilingual models
TINY = "tiny"
BASE = "base"
SMALL = "small"
MEDIUM = "medium"
LARGE = "large-v3"
LARGE_V3_TURBO = "deepdml/faster-whisper-large-v3-turbo-ct2"
DISTIL_LARGE_V2 = "Systran/faster-distil-whisper-large-v2"
# English-only models