RTSP stream example

Merge pull request #2816 from pipecat-ai/pk/gemini-live-await-ongoing-response-after-endframe
Implement ending `GeminiMultimodalLiveLLMService` gracefully (i.e. af…
2025-10-09 13:54:32 +08:00 · 2025-10-08 17:20:14 -04:00 · 2025-10-08 17:19:46 -04:00 · 2025-10-08 17:16:31 -04:00 · 2025-10-08 17:13:04 -04:00 · 2025-10-08 17:10:45 -04:00
10 changed files with 800 additions and 856 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,46 @@ All notable changes to **Pipecat** will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+
+- Added some new configuration options to `GeminiMultimodalLiveLLMService`:
+
+  - `thinking`
+  - `enable_affective_dialog`
+  - `proactivity`
+
+  Note that these new configuration options require using a newer model than
+  the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last
+  two require specifying `http_options=HttpOptions(api_version="v1alpha")`.
+
+- Added `on_pipeline_error` event to `PipelineTask`. This event will get fired
+  when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).
+
+  ```python
+  @task.event_handler("on_pipeline_error")
+  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
+      ...
+  ```
+
+- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This
+  parameter can influence the latency of the response. For example `"priority"`
+  will result in faster completions, but in exchange for a higher price.
+
+### Changed
+
+- Updated `GeminiMultimodalLiveLLMService` to use the `google-genai` library
+  rather than use WebSockets directly.
+
+### Fixed
+
+- `GeminiMultimodalLiveLLMService` will now end gracefully (i.e. after the bot
+  has finished) upon receiving an `EndFrame`.
+
+- `GeminiMultimodalLiveLLMService` will try to seamlessly reconnect when it
+  loses its connection.
+
 ## [0.0.89] - 2025-10-07

 ### Fixed
@@ -23,8 +63,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added `HumeTTSService` for text-to-speech synthesis using Hume AI's expressive
  voice models. Provides high-quality, emotionally expressive speech synthesis
  with support for various voice models. Includes example in
-  `examples/foundational/07ad-interruptible-hume.py`. Use with `uv pip install
-  pipecat-ai[hume]`.
+  `examples/foundational/07ad-interruptible-hume.py`. Use with:
+  `uv pip install pipecat-ai[hume]`.

 ### Changed

--- a/README.md
+++ b/README.md
@@ -51,6 +51,10 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt

 Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.

+### 📺️ Pipecat TV Channel
+
+Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
+
 ## 🎬 See it in action

 <p float="left">
--- a/examples/foundational/18-gstreamer-filesrc.py
+++ b/examples/foundational/18-gstreamer-filesrc.py
@@ -21,7 +21,7 @@ from pipecat.transports.daily.transport import DailyParams
 load_dotenv(override=True)

 parser = argparse.ArgumentParser(description="Pipecat Video Streaming Bot")
-parser.add_argument("-i", "--input", type=str, required=True, help="Input video file")
+parser.add_argument("-i", "--input", type=str, required=False, help="Input video file")
 args = parser.parse_args()

 # We store functions so objects (e.g. SileroVADAnalyzer) don't get
@@ -48,8 +48,9 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot with video input: {args.input}")

+    location = "rtsp://rtspstream:9bGdZ6NKfRXnMbFAg71al@zephyr.rtsp.stream/people"
    gst = GStreamerPipelineSource(
-        pipeline=f"filesrc location={args.input}",
+        pipeline=(f"rtspsrc location={location} ! decodebin ! autovideosink"),
        out_params=GStreamerPipelineSource.OutputParams(
            video_width=1280,
            video_height=720,
--- a/examples/foundational/26i-gemini-multimodal-live-graceful-end.py
+++ b/examples/foundational/26i-gemini-multimodal-live-graceful-end.py
@@ -0,0 +1,206 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import asyncio
+import os
+from datetime import datetime
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.adapters.schemas.function_schema import FunctionSchema
+from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.frames.frames import EndTaskFrame, LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.processors.frame_processor import FrameDirection
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
+from pipecat.services.llm_service import FunctionCallParams
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+
+async def fetch_weather_from_api(params: FunctionCallParams):
+    temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
+    await params.result_callback(
+        {
+            "conditions": "nice",
+            "temperature": temperature,
+            "format": params.arguments["format"],
+            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
+        }
+    )
+
+
+async def fetch_restaurant_recommendation(params: FunctionCallParams):
+    await params.result_callback({"name": "The Golden Dragon"})
+
+
+async def end_conversation(params: FunctionCallParams):
+    await params.result_callback({"success": True})
+    await params.llm.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
+
+
+system_instruction = """
+You are a helpful assistant who can answer questions and use tools.
+
+You have three tools available to you:
+1. get_current_weather: Use this tool to get the current weather in a specific location.
+2. get_restaurant_recommendation: Use this tool to get a restaurant recommendation in a specific location.
+3. end_conversation: Use this tool to gracefully end the conversation.
+
+After you've responded to the user three times, do two things, in order:
+1. Politely let them know that that's all the time you have today and say goodbye.
+2. Call the end_conversation tool to gracefully end the conversation.
+"""
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        # set stop_secs to something roughly similar to the internal setting
+        # of the Multimodal Live api, just to align events. This doesn't really
+        # matter because we can only use the Multimodal Live API's phrase
+        # endpointing, for now.
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        # set stop_secs to something roughly similar to the internal setting
+        # of the Multimodal Live api, just to align events. This doesn't really
+        # matter because we can only use the Multimodal Live API's phrase
+        # endpointing, for now.
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        # set stop_secs to something roughly similar to the internal setting
+        # of the Multimodal Live api, just to align events. This doesn't really
+        # matter because we can only use the Multimodal Live API's phrase
+        # endpointing, for now.
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    weather_function = FunctionSchema(
+        name="get_current_weather",
+        description="Get the current weather",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+            "format": {
+                "type": "string",
+                "enum": ["celsius", "fahrenheit"],
+                "description": "The temperature unit to use. Infer this from the user's location.",
+            },
+        },
+        required=["location", "format"],
+    )
+    restaurant_function = FunctionSchema(
+        name="get_restaurant_recommendation",
+        description="Get a restaurant recommendation",
+        properties={
+            "location": {
+                "type": "string",
+                "description": "The city and state, e.g. San Francisco, CA",
+            },
+        },
+        required=["location"],
+    )
+    end_conversation_function = FunctionSchema(
+        name="end_conversation",
+        description="Gracefully end the conversation",
+        properties={},
+        required=[],
+    )
+    search_tool = {"google_search": {}}
+    tools = ToolsSchema(
+        standard_tools=[weather_function, restaurant_function, end_conversation_function],
+        custom_tools={AdapterType.GEMINI: [search_tool]},
+    )
+
+    llm = GeminiMultimodalLiveLLMService(
+        api_key=os.getenv("GOOGLE_API_KEY"),
+        system_instruction=system_instruction,
+        tools=tools,
+    )
+
+    llm.register_function("get_current_weather", fetch_weather_from_api)
+    llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
+    llm.register_function("end_conversation", end_conversation)
+
+    context = OpenAILLMContext(
+        [{"role": "user", "content": "Say hello."}],
+    )
+    context_aggregator = llm.create_context_aggregator(context)
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            context_aggregator.user(),
+            llm,
+            transport.output(),
+            context_aggregator.assistant(),
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/src/pipecat/pipeline/task.py
+++ b/src/pipecat/pipeline/task.py
@@ -138,6 +138,8 @@ class PipelineTask(BasePipelineTask):
          Use this event for cleanup, logging, or post-processing tasks. Users can inspect
          the frame if they need to handle specific cases.

+    - on_pipeline_error: Called when an error occurs with ErrorFrame
+
    Example::

        @task.event_handler("on_frame_reached_upstream")
@@ -148,9 +150,17 @@ class PipelineTask(BasePipelineTask):
        async def on_pipeline_idle_timeout(task):
            ...

+        @task.event_handler("on_pipeline_started")
+        async def on_pipeline_started(task, frame):
+            ...
+
        @task.event_handler("on_pipeline_finished")
        async def on_pipeline_finished(task, frame):
            ...
+
+        @task.event_handler("on_pipeline_error")
+        async def on_pipeline_error(task, frame):
+            ...
    """

    def __init__(
@@ -288,6 +298,7 @@ class PipelineTask(BasePipelineTask):
        self._register_event_handler("on_pipeline_ended")
        self._register_event_handler("on_pipeline_cancelled")
        self._register_event_handler("on_pipeline_finished")
+        self._register_event_handler("on_pipeline_error")

    @property
    def params(self) -> PipelineParams:
@@ -694,12 +705,11 @@ class PipelineTask(BasePipelineTask):
            logger.debug(f"{self}: received interruption task frame {frame}")
            await self._pipeline.queue_frame(InterruptionFrame())
        elif isinstance(frame, ErrorFrame):
+            await self._call_event_handler("on_pipeline_error", frame)
            if frame.fatal:
                logger.error(f"A fatal error occurred: {frame}")
                # Cancel all tasks downstream.
                await self.queue_frame(CancelFrame())
-                # Tell the task we should stop.
-                await self.queue_frame(StopTaskFrame())
            else:
                logger.warning(f"{self}: Something went wrong: {frame}")

--- a/src/pipecat/services/gemini_multimodal_live/events.py
+++ b/src/pipecat/services/gemini_multimodal_live/events.py
@@ -4,527 +4,38 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-"""Event models and utilities for Google Gemini Multimodal Live API."""
-
-import base64
-import io
-import json
-from enum import Enum
-from typing import List, Literal, Optional
-
-from PIL import Image
-from pydantic import BaseModel, Field
-
-from pipecat.frames.frames import ImageRawFrame
-
-#
-# Client events
-#
-
-
-class MediaChunk(BaseModel):
-    """Represents a chunk of media data for transmission.
-
-    Parameters:
-        mimeType: MIME type of the media content.
-        data: Base64-encoded media data.
-    """
-
-    mimeType: str
-    data: str
-
-
-class ContentPart(BaseModel):
-    """Represents a part of content that can contain text or media.
-
-    Parameters:
-        text: Text content. Defaults to None.
-        inlineData: Inline media data. Defaults to None.
-    """
-
-    text: Optional[str] = Field(default=None, validate_default=False)
-    inlineData: Optional[MediaChunk] = Field(default=None, validate_default=False)
-    fileData: Optional["FileData"] = Field(default=None, validate_default=False)
-
-
-class FileData(BaseModel):
-    """Represents a file reference in the Gemini File API."""
-
-    mimeType: str
-    fileUri: str
-
-
-ContentPart.model_rebuild()  # Rebuild model to resolve forward reference
-
-
-class Turn(BaseModel):
-    """Represents a conversational turn in the dialogue.
-
-    Parameters:
-        role: The role of the speaker, either "user" or "model". Defaults to "user".
-        parts: List of content parts that make up the turn.
-    """
-
-    role: Literal["user", "model"] = "user"
-    parts: List[ContentPart]
-
-
-class StartSensitivity(str, Enum):
-    """Determines how start of speech is detected."""
-
-    UNSPECIFIED = "START_SENSITIVITY_UNSPECIFIED"  # Default is HIGH
-    HIGH = "START_SENSITIVITY_HIGH"  # Detect start of speech more often
-    LOW = "START_SENSITIVITY_LOW"  # Detect start of speech less often
-
-
-class EndSensitivity(str, Enum):
-    """Determines how end of speech is detected."""
-
-    UNSPECIFIED = "END_SENSITIVITY_UNSPECIFIED"  # Default is HIGH
-    HIGH = "END_SENSITIVITY_HIGH"  # End speech more often
-    LOW = "END_SENSITIVITY_LOW"  # End speech less often
-
-
-class AutomaticActivityDetection(BaseModel):
-    """Configures automatic detection of voice activity.
-
-    Parameters:
-        disabled: Whether automatic activity detection is disabled. Defaults to None.
-        start_of_speech_sensitivity: Sensitivity for detecting speech start. Defaults to None.
-        prefix_padding_ms: Padding before speech start in milliseconds. Defaults to None.
-        end_of_speech_sensitivity: Sensitivity for detecting speech end. Defaults to None.
-        silence_duration_ms: Duration of silence to detect speech end. Defaults to None.
-    """
-
-    disabled: Optional[bool] = None
-    start_of_speech_sensitivity: Optional[StartSensitivity] = None
-    prefix_padding_ms: Optional[int] = None
-    end_of_speech_sensitivity: Optional[EndSensitivity] = None
-    silence_duration_ms: Optional[int] = None
-
-
-class RealtimeInputConfig(BaseModel):
-    """Configures the realtime input behavior.
-
-    Parameters:
-        automatic_activity_detection: Voice activity detection configuration. Defaults to None.
-    """
-
-    automatic_activity_detection: Optional[AutomaticActivityDetection] = None
-
-
-class RealtimeInput(BaseModel):
-    """Contains realtime input media chunks and text.
-
-    Parameters:
-        mediaChunks: List of media chunks for realtime processing.
-        text: Text for realtime processing.
-    """
-
-    mediaChunks: Optional[List[MediaChunk]] = None
-    text: Optional[str] = None
-
-
-class ClientContent(BaseModel):
-    """Content sent from client to the Gemini Live API.
-
-    Parameters:
-        turns: List of conversation turns. Defaults to None.
-        turnComplete: Whether the client's turn is complete. Defaults to False.
-    """
-
-    turns: Optional[List[Turn]] = None
-    turnComplete: bool = False
-
-
-class AudioInputMessage(BaseModel):
-    """Message containing audio input data.
-
-    Parameters:
-        realtimeInput: Realtime input containing audio chunks.
-    """
-
-    realtimeInput: RealtimeInput
-
-    @classmethod
-    def from_raw_audio(cls, raw_audio: bytes, sample_rate: int) -> "AudioInputMessage":
-        """Create an audio input message from raw audio data.
-
-        Args:
-            raw_audio: Raw audio bytes.
-            sample_rate: Audio sample rate in Hz.
-
-        Returns:
-            AudioInputMessage instance with encoded audio data.
-        """
-        data = base64.b64encode(raw_audio).decode("utf-8")
-        return cls(
-            realtimeInput=RealtimeInput(
-                mediaChunks=[MediaChunk(mimeType=f"audio/pcm;rate={sample_rate}", data=data)]
-            )
-        )
-
-
-class VideoInputMessage(BaseModel):
-    """Message containing video/image input data.
-
-    Parameters:
-        realtimeInput: Realtime input containing video/image chunks.
-    """
-
-    realtimeInput: RealtimeInput
-
-    @classmethod
-    def from_image_frame(cls, frame: ImageRawFrame) -> "VideoInputMessage":
-        """Create a video input message from an image frame.
-
-        Args:
-            frame: Image frame to encode.
-
-        Returns:
-            VideoInputMessage instance with encoded image data.
-        """
-        buffer = io.BytesIO()
-        Image.frombytes(frame.format, frame.size, frame.image).save(buffer, format="JPEG")
-        data = base64.b64encode(buffer.getvalue()).decode("utf-8")
-        return cls(
-            realtimeInput=RealtimeInput(mediaChunks=[MediaChunk(mimeType=f"image/jpeg", data=data)])
-        )
-
-
-class TextInputMessage(BaseModel):
-    """Message containing text input data."""
-
-    realtimeInput: RealtimeInput
-
-    @classmethod
-    def from_text(cls, text: str) -> "TextInputMessage":
-        """Create a text input message from a string.
-
-        Args:
-            text: The text to send.
-
-        Returns:
-            A TextInputMessage instance.
-        """
-        return cls(realtimeInput=RealtimeInput(text=text))
-
-
-class ClientContentMessage(BaseModel):
-    """Message containing client content for the API.
-
-    Parameters:
-        clientContent: The client content to send.
-    """
-
-    clientContent: ClientContent
-
-
-class SystemInstruction(BaseModel):
-    """System instruction for the model.
-
-    Parameters:
-        parts: List of content parts that make up the system instruction.
-    """
-
-    parts: List[ContentPart]
-
-
-class AudioTranscriptionConfig(BaseModel):
-    """Configuration for audio transcription."""
-
-    pass
-
-
-class Setup(BaseModel):
-    """Setup configuration for the Gemini Live session.
-
-    Parameters:
-        model: Model identifier to use.
-        system_instruction: System instruction for the model. Defaults to None.
-        tools: List of available tools/functions. Defaults to None.
-        generation_config: Generation configuration parameters. Defaults to None.
-        input_audio_transcription: Input audio transcription config. Defaults to None.
-        output_audio_transcription: Output audio transcription config. Defaults to None.
-        realtime_input_config: Realtime input configuration. Defaults to None.
-    """
-
-    model: str
-    system_instruction: Optional[SystemInstruction] = None
-    tools: Optional[List[dict]] = None
-    generation_config: Optional[dict] = None
-    input_audio_transcription: Optional[AudioTranscriptionConfig] = None
-    output_audio_transcription: Optional[AudioTranscriptionConfig] = None
-    realtime_input_config: Optional[RealtimeInputConfig] = None
-
-
-class Config(BaseModel):
-    """Configuration message for session setup.
-
-    Parameters:
-        setup: Setup configuration for the session.
-    """
-
-    setup: Setup
-
-
-#
-# Grounding metadata models
-#
-
-
-class SearchEntryPoint(BaseModel):
-    """Represents the search entry point with rendered content for search suggestions."""
-
-    renderedContent: Optional[str] = None
-
-
-class WebSource(BaseModel):
-    """Represents a web source from grounding chunks."""
-
-    uri: Optional[str] = None
-    title: Optional[str] = None
-
-
-class GroundingChunk(BaseModel):
-    """Represents a grounding chunk containing web source information."""
-
-    web: Optional[WebSource] = None
-
-
-class GroundingSegment(BaseModel):
-    """Represents a segment of text that is grounded."""
-
-    startIndex: Optional[int] = None
-    endIndex: Optional[int] = None
-    text: Optional[str] = None
-
-
-class GroundingSupport(BaseModel):
-    """Represents support information for grounded text segments."""
-
-    segment: Optional[GroundingSegment] = None
-    groundingChunkIndices: Optional[List[int]] = None
-    confidenceScores: Optional[List[float]] = None
-
-
-class GroundingMetadata(BaseModel):
-    """Represents grounding metadata from Google Search."""
-
-    searchEntryPoint: Optional[SearchEntryPoint] = None
-    groundingChunks: Optional[List[GroundingChunk]] = None
-    groundingSupports: Optional[List[GroundingSupport]] = None
-    webSearchQueries: Optional[List[str]] = None
-
-
-#
-# Server events
-#
-
-
-class SetupComplete(BaseModel):
-    """Indicates that session setup is complete."""
-
-    pass
-
-
-class InlineData(BaseModel):
-    """Inline data embedded in server responses.
-
-    Parameters:
-        mimeType: MIME type of the data.
-        data: Base64-encoded data content.
-    """
-
-    mimeType: str
-    data: str
-
-
-class Part(BaseModel):
-    """Part of a server response containing data or text.
-
-    Parameters:
-        inlineData: Inline binary data. Defaults to None.
-        text: Text content. Defaults to None.
-    """
-
-    inlineData: Optional[InlineData] = None
-    text: Optional[str] = None
-
-
-class ModelTurn(BaseModel):
-    """Represents a turn from the model in the conversation.
-
-    Parameters:
-        parts: List of content parts in the model's response.
-    """
-
-    parts: List[Part]
-
-
-class ServerContentInterrupted(BaseModel):
-    """Indicates server content was interrupted.
-
-    Parameters:
-        interrupted: Whether the content was interrupted.
-    """
-
-    interrupted: bool
-
-
-class ServerContentTurnComplete(BaseModel):
-    """Indicates the server's turn is complete.
-
-    Parameters:
-        turnComplete: Whether the turn is complete.
-    """
-
-    turnComplete: bool
-
-
-class BidiGenerateContentTranscription(BaseModel):
-    """Transcription data from bidirectional content generation.
-
-    Parameters:
-        text: The transcribed text content.
-    """
-
-    text: str
-
-
-class ServerContent(BaseModel):
-    """Content sent from server to client.
-
-    Parameters:
-        modelTurn: Model's conversational turn. Defaults to None.
-        interrupted: Whether content was interrupted. Defaults to None.
-        turnComplete: Whether the turn is complete. Defaults to None.
-        inputTranscription: Transcription of input audio. Defaults to None.
-        outputTranscription: Transcription of output audio. Defaults to None.
-    """
-
-    modelTurn: Optional[ModelTurn] = None
-    interrupted: Optional[bool] = None
-    turnComplete: Optional[bool] = None
-    inputTranscription: Optional[BidiGenerateContentTranscription] = None
-    outputTranscription: Optional[BidiGenerateContentTranscription] = None
-    groundingMetadata: Optional[GroundingMetadata] = None
-
-
-class FunctionCall(BaseModel):
-    """Represents a function call from the model.
-
-    Parameters:
-        id: Unique identifier for the function call.
-        name: Name of the function to call.
-        args: Arguments to pass to the function.
-    """
-
-    id: str
-    name: str
-    args: dict
-
-
-class ToolCall(BaseModel):
-    """Contains one or more function calls.
-
-    Parameters:
-        functionCalls: List of function calls to execute.
-    """
-
-    functionCalls: List[FunctionCall]
-
-
-class Modality(str, Enum):
-    """Modality types in token counts."""
-
-    UNSPECIFIED = "MODALITY_UNSPECIFIED"
-    TEXT = "TEXT"
-    IMAGE = "IMAGE"
-    AUDIO = "AUDIO"
-    VIDEO = "VIDEO"
-
-
-class ModalityTokenCount(BaseModel):
-    """Token count for a specific modality.
-
-    Parameters:
-        modality: The modality type.
-        tokenCount: Number of tokens for this modality.
-    """
-
-    modality: Modality
-    tokenCount: int
-
-
-class UsageMetadata(BaseModel):
-    """Usage metadata about the API response.
-
-    Parameters:
-        promptTokenCount: Number of tokens in the prompt. Defaults to None.
-        cachedContentTokenCount: Number of cached content tokens. Defaults to None.
-        responseTokenCount: Number of tokens in the response. Defaults to None.
-        toolUsePromptTokenCount: Number of tokens for tool use prompts. Defaults to None.
-        thoughtsTokenCount: Number of tokens for model thoughts. Defaults to None.
-        totalTokenCount: Total number of tokens used. Defaults to None.
-        promptTokensDetails: Detailed breakdown of prompt tokens by modality. Defaults to None.
-        cacheTokensDetails: Detailed breakdown of cache tokens by modality. Defaults to None.
-        responseTokensDetails: Detailed breakdown of response tokens by modality. Defaults to None.
-        toolUsePromptTokensDetails: Detailed breakdown of tool use tokens by modality. Defaults to None.
-    """
-
-    promptTokenCount: Optional[int] = None
-    cachedContentTokenCount: Optional[int] = None
-    responseTokenCount: Optional[int] = None
-    toolUsePromptTokenCount: Optional[int] = None
-    thoughtsTokenCount: Optional[int] = None
-    totalTokenCount: Optional[int] = None
-    promptTokensDetails: Optional[List[ModalityTokenCount]] = None
-    cacheTokensDetails: Optional[List[ModalityTokenCount]] = None
-    responseTokensDetails: Optional[List[ModalityTokenCount]] = None
-    toolUsePromptTokensDetails: Optional[List[ModalityTokenCount]] = None
-
-
-class ServerEvent(BaseModel):
-    """Server event received from the Gemini Live API.
-
-    Parameters:
-        setupComplete: Setup completion notification. Defaults to None.
-        serverContent: Content from the server. Defaults to None.
-        toolCall: Tool/function call request. Defaults to None.
-        usageMetadata: Token usage metadata. Defaults to None.
-    """
-
-    setupComplete: Optional[SetupComplete] = None
-    serverContent: Optional[ServerContent] = None
-    toolCall: Optional[ToolCall] = None
-    usageMetadata: Optional[UsageMetadata] = None
-
-
-def parse_server_event(str):
-    """Parse a server event from JSON string.
-
-    Args:
-        str: JSON string containing the server event.
-
-    Returns:
-        ServerEvent instance if parsing succeeds, None otherwise.
-    """
-    try:
-        evt = json.loads(str)
-        return ServerEvent.model_validate(evt)
-    except Exception as e:
-        print(f"Error parsing server event: {e}")
-        return None
-
-
-class ContextWindowCompressionConfig(BaseModel):
-    """Configuration for context window compression.
-
-    Parameters:
-        sliding_window: Whether to use sliding window compression. Defaults to True.
-        trigger_tokens: Token count threshold to trigger compression. Defaults to None.
-    """
-
-    sliding_window: Optional[bool] = Field(default=True)
-    trigger_tokens: Optional[int] = Field(default=None)
+"""Event models and utilities for Google Gemini Multimodal Live API.
+
+.. deprecated:: 0.0.90
+    Importing StartSensitivity and EndSensitivity from this module is deprecated.
+    Import them directly from google.genai.types instead.
+"""
+
+import warnings
+
+from loguru import logger
+
+try:
+    from google.genai.types import (
+        EndSensitivity as _EndSensitivity,
+    )
+    from google.genai.types import (
+        StartSensitivity as _StartSensitivity,
+    )
+except ModuleNotFoundError as e:
+    logger.error(f"Exception: {e}")
+    logger.error("In order to use Google AI, you need to `pip install pipecat-ai[google]`.")
+    raise Exception(f"Missing module: {e}")
+
+# These aliases are just here for backward compatibility, since we used to
+# define public-facing StartSensitivity and EndSensitivity enums in this
+# module.
+warnings.warn(
+    "Importing StartSensitivity and EndSensitivity from "
+    "pipecat.services.gemini_multimodal_live.events is deprecated. "
+    "Please import them directly from google.genai.types instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
+StartSensitivity = _StartSensitivity
+EndSensitivity = _EndSensitivity
--- a/src/pipecat/services/gemini_multimodal_live/gemini.py
+++ b/src/pipecat/services/gemini_multimodal_live/gemini.py
--- a/src/pipecat/services/openai/base_llm.py
+++ b/src/pipecat/services/openai/base_llm.py
@@ -66,6 +66,7 @@ class BaseOpenAILLMService(LLMService):
            top_p: Top-p (nucleus) sampling parameter (0.0 to 1.0).
            max_tokens: Maximum tokens in response (deprecated, use max_completion_tokens).
            max_completion_tokens: Maximum completion tokens to generate.
+            service_tier: Service tier to use (e.g., "auto", "flex", "priority").
            extra: Additional model-specific parameters.
        """

@@ -83,6 +84,7 @@ class BaseOpenAILLMService(LLMService):
        top_p: Optional[float] = Field(default_factory=lambda: NOT_GIVEN, ge=0.0, le=1.0)
        max_tokens: Optional[int] = Field(default_factory=lambda: NOT_GIVEN, ge=1)
        max_completion_tokens: Optional[int] = Field(default_factory=lambda: NOT_GIVEN, ge=1)
+        service_tier: Optional[str] = Field(default_factory=lambda: NOT_GIVEN)
        extra: Optional[Dict[str, Any]] = Field(default_factory=dict)

    def __init__(
@@ -125,6 +127,7 @@ class BaseOpenAILLMService(LLMService):
            "top_p": params.top_p,
            "max_tokens": params.max_tokens,
            "max_completion_tokens": params.max_completion_tokens,
+            "service_tier": params.service_tier,
            "extra": params.extra if isinstance(params.extra, dict) else {},
        }
        self._retry_timeout_secs = retry_timeout_secs
@@ -236,6 +239,7 @@ class BaseOpenAILLMService(LLMService):
            "top_p": self._settings["top_p"],
            "max_tokens": self._settings["max_tokens"],
            "max_completion_tokens": self._settings["max_completion_tokens"],
+            "service_tier": self._settings["service_tier"],
        }

        # Messages, tools, tool_choice
--- a/src/pipecat/utils/tracing/service_decorators.py
+++ b/src/pipecat/utils/tracing/service_decorators.py
@@ -651,9 +651,9 @@ def traced_gemini_live(operation: str) -> Callable:

                        elif operation == "llm_tool_call" and args:
                            # Extract tool call information
-                            evt = args[0] if args else None
-                            if evt and hasattr(evt, "toolCall") and evt.toolCall.functionCalls:
-                                function_calls = evt.toolCall.functionCalls
+                            msg = args[0] if args else None
+                            if msg and hasattr(msg, "tool_call") and msg.tool_call.function_calls:
+                                function_calls = msg.tool_call.function_calls
                                if function_calls:
                                    # Add information about the first function call
                                    call = function_calls[0]
@@ -722,19 +722,19 @@ def traced_gemini_live(operation: str) -> Callable:

                        elif operation == "llm_response" and args:
                            # Extract usage and response metadata from turn complete event
-                            evt = args[0] if args else None
-                            if evt and hasattr(evt, "usageMetadata") and evt.usageMetadata:
-                                usage = evt.usageMetadata
+                            msg = args[0] if args else None
+                            if msg and hasattr(msg, "usage_metadata") and msg.usage_metadata:
+                                usage = msg.usage_metadata

                                # Token usage - basic attributes for span visibility
-                                if hasattr(usage, "promptTokenCount"):
-                                    operation_attrs["tokens.prompt"] = usage.promptTokenCount or 0
-                                if hasattr(usage, "responseTokenCount"):
+                                if hasattr(usage, "prompt_token_count"):
+                                    operation_attrs["tokens.prompt"] = usage.prompt_token_count or 0
+                                if hasattr(usage, "response_token_count"):
                                    operation_attrs["tokens.completion"] = (
-                                        usage.responseTokenCount or 0
+                                        usage.response_token_count or 0
                                    )
-                                if hasattr(usage, "totalTokenCount"):
-                                    operation_attrs["tokens.total"] = usage.totalTokenCount or 0
+                                if hasattr(usage, "total_token_count"):
+                                    operation_attrs["tokens.total"] = usage.total_token_count or 0

                            # Get output text and modality from service state
                            text = getattr(self, "_bot_text_buffer", "")
@@ -751,9 +751,9 @@ def traced_gemini_live(operation: str) -> Callable:

                            # Add turn completion status
                            if (
-                                evt
-                                and hasattr(evt, "serverContent")
-                                and evt.serverContent.turnComplete
+                                msg
+                                and hasattr(msg, "server_content")
+                                and msg.server_content.turn_complete
                            ):
                                operation_attrs["turn_complete"] = True

@@ -772,16 +772,16 @@ def traced_gemini_live(operation: str) -> Callable:

                        # For llm_response operation, also handle token usage metrics
                        if operation == "llm_response" and hasattr(self, "start_llm_usage_metrics"):
-                            evt = args[0] if args else None
-                            if evt and hasattr(evt, "usageMetadata") and evt.usageMetadata:
-                                usage = evt.usageMetadata
+                            msg = args[0] if args else None
+                            if msg and hasattr(msg, "usage_metadata") and msg.usage_metadata:
+                                usage = msg.usage_metadata
                                # Create LLMTokenUsage object
                                from pipecat.metrics.metrics import LLMTokenUsage

                                tokens = LLMTokenUsage(
-                                    prompt_tokens=usage.promptTokenCount or 0,
-                                    completion_tokens=usage.responseTokenCount or 0,
-                                    total_tokens=usage.totalTokenCount or 0,
+                                    prompt_tokens=usage.prompt_token_count or 0,
+                                    completion_tokens=usage.response_token_count or 0,
+                                    total_tokens=usage.total_token_count or 0,
                                )
                                _add_token_usage_to_span(current_span, tokens)

--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@@ -11,6 +11,7 @@ import unittest
 from pipecat.frames.frames import (
    CancelFrame,
    EndFrame,
+    ErrorFrame,
    Frame,
    HeartbeatFrame,
    InputAudioRawFrame,
@@ -450,3 +451,34 @@ class TestPipelineTask(unittest.IsolatedAsyncioTestCase):
            await task.run(PipelineTaskParams(loop=asyncio.get_event_loop()))
        except asyncio.CancelledError:
            assert cancelled
+
+    async def test_task_error(self):
+        class ErrorProcessor(FrameProcessor):
+            def __init__(self, **kwargs):
+                super().__init__(**kwargs)
+
+            async def process_frame(self, frame: Frame, direction: FrameDirection):
+                await super().process_frame(frame, direction)
+
+                if isinstance(frame, TextFrame):
+                    await self.push_error(ErrorFrame("Boo!"))
+
+                await self.push_frame(frame, direction)
+
+        error_received = False
+
+        pipeline = Pipeline([ErrorProcessor()])
+        task = PipelineTask(pipeline)
+
+        @task.event_handler("on_pipeline_error")
+        async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
+            nonlocal error_received
+            error_received = True
+            await task.cancel()
+
+        await task.queue_frame(TextFrame(text="Hello from Pipecat!"))
+
+        try:
+            await task.run(PipelineTaskParams(loop=asyncio.get_event_loop()))
+        except asyncio.CancelledError:
+            assert error_received
Author	SHA1	Message	Date
James Hush	0656b8bf08	RTSP stream example	2025-10-09 13:54:32 +08:00
kompfner	106db69e8e	Merge pull request #2816 from pipecat-ai/pk/gemini-live-await-ongoing-response-after-endframe Implement ending `GeminiMultimodalLiveLLMService` gracefully (i.e. af…	2025-10-08 17:20:14 -04:00
Paul Kompfner	cf90071926	Format fix	2025-10-08 17:19:46 -04:00
Paul Kompfner	deaeb75a1f	Fix changelog after rebase (and add a missing item)	2025-10-08 17:16:31 -04:00
Paul Kompfner	a666327d70	Implement ending `GeminiMultimodalLiveLLMService` gracefully (i.e. after the bot is finished)	2025-10-08 17:13:04 -04:00
kompfner	13a0522546	Merge pull request #2804 from pipecat-ai/pk/gemini-live-session-resumption Add (relatively spartan) reconnection logic to `GeminiMultimodalLiveLLMService`	2025-10-08 17:10:45 -04:00
Paul Kompfner	7da37a0d1f	Pull `_connection_established_threshold` and `_max_consecutive_failures` into file-level constants	2025-10-08 17:04:05 -04:00
Paul Kompfner	7efb22a323	Add (relatively spartan) reconnection logic to `GeminiMultimodalLiveLLMService`, leveraging the Gemini Live session resumption mechanism	2025-10-08 16:53:21 -04:00
kompfner	8084e2f909	Merge pull request #2776 from pipecat-ai/pk/gemini-live-gen-ai-library Gemini Live service uses the `genai` library rather than WebSockets directly	2025-10-08 16:50:16 -04:00
Paul Kompfner	86127c6a6e	Add to the changelog the `GeminiMultimodalLiveLLMService` update to use `google-genai`	2025-10-08 16:46:41 -04:00
Paul Kompfner	402e019ae2	Make a bit of code clearer	2025-10-08 16:45:55 -04:00
Paul Kompfner	f09e4e238b	Fix some mishandling of enum values	2025-10-08 16:45:55 -04:00
Paul Kompfner	2921162b3b	Add deprecation warning around importing StartSensitivity and EndSensitivity from pipecat.services.gemini_multimodal_live.events	2025-10-08 16:45:55 -04:00
Paul Kompfner	ac1582c906	Let users directly use `google-genai` types rather than aliased re-exported types	2025-10-08 16:45:55 -04:00
Paul Kompfner	e4b01a5844	Bumping deprecation version of `GeminiMultimodalLiveLLMService`'s `base_url` arg	2025-10-08 16:45:55 -04:00
Paul Kompfner	fa663abbbc	Add CHANGELOG entry for new `GeminiMultimodalLiveLLMService` configuration options	2025-10-08 16:45:55 -04:00
Paul Kompfner	d19e6111c3	Bumping deprecation version of `GeminiMultimodalLiveLLMService`'s `base_url` arg	2025-10-08 16:45:55 -04:00
Paul Kompfner	8a6d504a7e	Add `enable_affective_dialog` and `proactivity` settings to `GeminiMultimodalLiveLLMService`	2025-10-08 16:45:55 -04:00
Paul Kompfner	43915937f2	Update how we import and re-export some types in `GeminiMultimodalLiveLLMService`	2025-10-08 16:45:55 -04:00
Paul Kompfner	48e92a22fe	Add thinking settings to `GeminiMultimodalLiveLLMService`	2025-10-08 16:45:55 -04:00
Paul Kompfner	566af6b0b8	Minor comment improvement	2025-10-08 16:45:55 -04:00
Paul Kompfner	12e7613d5f	Deprecate the `base_url` argument to `GeminiMultimodalLiveLLMService`. It expected a WebSocket URL, but we're no longer (directly) using WebSockets to talk to Gemini. Instead of trying to (potentially erroneously) map a given custom WebSocket URL to an `HttpOptions` object (the new preferred way of customizing requests made by the Gemini API client), we're simply deprecating `base_url` and pointing users to the `http_options` argument instead.	2025-10-08 16:45:55 -04:00
Paul Kompfner	04a68f2c57	Fix tracing in `GeminiMultimodalLiveLLMService`	2025-10-08 16:45:55 -04:00
Paul Kompfner	9b4ca12f49	Revert to only supporting providing a single modality - looks like specifying a list of modalities results in an API error. Also, fix some missing `await`s in error handling.	2025-10-08 16:45:55 -04:00
Paul Kompfner	453ce715a6	Add some error handling to `GeminiMultimodalLiveLLMService`	2025-10-08 16:45:55 -04:00
Paul Kompfner	d87b6189ba	Update `GeminiMultimodalLiveLLMService` to use the `google-genai` library, which is the new recommended approach for interfacing with Gemini Live.	2025-10-08 16:45:55 -04:00
Mark Backman	8293347b77	Merge pull request #2814 from pipecat-ai/mb/openai-service-tier Add service_tier to BaseOpenAILLMService	2025-10-08 16:44:27 -04:00
Mark Backman	c85a3f0b94	Add service_tier to BaseOpenAILLMService	2025-10-08 16:33:36 -04:00
Aleix Conchillo Flaqué	233fb25e6c	Merge pull request #2810 from pipecat-ai/aleix/on-pipeline-error PipelineTask: add on_pipeline_error event	2025-10-08 11:26:46 -07:00
Aleix Conchillo Flaqué	080978daa6	Merge pull request #2808 from pipecat-ai/aleix/readme-pipecat-tv README: add Pipecat TV reference	2025-10-08 11:26:17 -07:00
Aleix Conchillo Flaqué	62b7c3d3b2	PipelineTask: add on_pipeline_error event	2025-10-07 18:36:38 -07:00
Aleix Conchillo Flaqué	066b77fba0	README: add Pipecat TV reference	2025-10-07 15:01:28 -07:00