Files
pipecat/docs/frame.md
2024-12-09 20:44:10 -05:00

4.0 KiB

Understanding Different Frame Types in the Pipecat System

In the Pipecat system, frames are used to represent different types of data and control signals that flow through the pipeline. Understanding these frame types is crucial for working with the system effectively. This tutorial will cover the main categories of frames and their specific uses.

1. Base Frame Classes

Frame

The Frame class is the base class for all frames. It includes:

  • id: A unique identifier
  • name: A descriptive name
  • pts: Presentation timestamp (optional)

DataFrame

DataFrame is a subclass of Frame and serves as a base for most data-carrying frames.

2. Audio Frames

AudioRawFrame

Represents a chunk of audio with properties:

  • audio: Raw audio data
  • sample_rate: Audio sample rate
  • num_channels: Number of audio channels

Subclasses include:

  • InputAudioRawFrame: For audio from input sources
  • OutputAudioRawFrame: For audio to be played by output devices
  • TTSAudioRawFrame: For audio generated by Text-to-Speech services

3. Image Frames

ImageRawFrame

Represents an image with properties:

  • image: Raw image data
  • size: Image dimensions
  • format: Image format (e.g., JPEG, PNG)

Subclasses include:

  • InputImageRawFrame: For images from input sources
  • OutputImageRawFrame: For images to be displayed
  • UserImageRawFrame: For images associated with a specific user
  • VisionImageRawFrame: For images with associated text for description
  • URLImageRawFrame: For images with an associated URL

SpriteFrame

Represents an animated sprite, containing a list of ImageRawFrame objects.

4. Text and Transcription Frames

TextFrame

Represents a chunk of text, used for various purposes in the pipeline.

TranscriptionFrame

A specialized TextFrame for speech transcriptions, including:

  • user_id: ID of the speaking user
  • timestamp: When the transcription was generated
  • language: Detected language of the speech

InterimTranscriptionFrame

Similar to TranscriptionFrame, but for interim (not final) transcriptions.

5. LLM (Language Model) Frames

LLMMessagesFrame

Contains a list of messages for an LLM service to process.

LLMMessagesAppendFrame and LLMMessagesUpdateFrame

Used to modify the current context of LLM messages.

LLMSetToolsFrame

Specifies tools (functions) available for the LLM to use.

LLMEnablePromptCachingFrame

Controls prompt caching in certain LLMs.

6. System and Control Frames

SystemFrame

Base class for system-level frames.

Important system frames include:

  • StartFrame: Initiates a pipeline
  • CancelFrame: Stops a pipeline immediately
  • ErrorFrame: Notifies of errors (with FatalErrorFrame for unrecoverable errors)
  • EndTaskFrame and CancelTaskFrame: Control pipeline tasks
  • StartInterruptionFrame and StopInterruptionFrame: Indicate user speech for interruptions

ControlFrame

Base class for control-flow frames.

Notable control frames:

  • EndFrame: Signals the end of a pipeline
  • LLMFullResponseStartFrame and LLMFullResponseEndFrame: Bracket LLM responses
  • UserStartedSpeakingFrame and UserStoppedSpeakingFrame: Indicate user speech activity
  • BotStartedSpeakingFrame and BotStoppedSpeakingFrame: Indicate bot speech activity
  • TTSStartedFrame and TTSStoppedFrame: Bracket Text-to-Speech responses

7. Special Purpose Frames

MetricsFrame

Contains performance metrics data.

FunctionCallInProgressFrame and FunctionCallResultFrame

Used for handling LLM function (tool) calls.

ServiceUpdateSettingsFrame

Base class for updating service settings, with specific subclasses for LLM, TTS, and STT services.

Conclusion

Understanding these frame types is essential for working with the Pipecat system. Each frame type serves a specific purpose in the pipeline, whether it's carrying data (like audio or images), controlling the flow of the pipeline, or managing system-level operations. By using the appropriate frame types, you can effectively process and transmit various kinds of information through your pipeline.