4.0 KiB
Understanding Different Frame Types in the Pipecat System
In the Pipecat system, frames are used to represent different types of data and control signals that flow through the pipeline. Understanding these frame types is crucial for working with the system effectively. This tutorial will cover the main categories of frames and their specific uses.
1. Base Frame Classes
Frame
The Frame class is the base class for all frames. It includes:
id: A unique identifiername: A descriptive namepts: Presentation timestamp (optional)
DataFrame
DataFrame is a subclass of Frame and serves as a base for most data-carrying frames.
2. Audio Frames
AudioRawFrame
Represents a chunk of audio with properties:
audio: Raw audio datasample_rate: Audio sample ratenum_channels: Number of audio channels
Subclasses include:
InputAudioRawFrame: For audio from input sourcesOutputAudioRawFrame: For audio to be played by output devicesTTSAudioRawFrame: For audio generated by Text-to-Speech services
3. Image Frames
ImageRawFrame
Represents an image with properties:
image: Raw image datasize: Image dimensionsformat: Image format (e.g., JPEG, PNG)
Subclasses include:
InputImageRawFrame: For images from input sourcesOutputImageRawFrame: For images to be displayedUserImageRawFrame: For images associated with a specific userVisionImageRawFrame: For images with associated text for descriptionURLImageRawFrame: For images with an associated URL
SpriteFrame
Represents an animated sprite, containing a list of ImageRawFrame objects.
4. Text and Transcription Frames
TextFrame
Represents a chunk of text, used for various purposes in the pipeline.
TranscriptionFrame
A specialized TextFrame for speech transcriptions, including:
user_id: ID of the speaking usertimestamp: When the transcription was generatedlanguage: Detected language of the speech
InterimTranscriptionFrame
Similar to TranscriptionFrame, but for interim (not final) transcriptions.
5. LLM (Language Model) Frames
LLMMessagesFrame
Contains a list of messages for an LLM service to process.
LLMMessagesAppendFrame and LLMMessagesUpdateFrame
Used to modify the current context of LLM messages.
LLMSetToolsFrame
Specifies tools (functions) available for the LLM to use.
LLMEnablePromptCachingFrame
Controls prompt caching in certain LLMs.
6. System and Control Frames
SystemFrame
Base class for system-level frames.
Important system frames include:
StartFrame: Initiates a pipelineCancelFrame: Stops a pipeline immediatelyErrorFrame: Notifies of errors (withFatalErrorFramefor unrecoverable errors)EndTaskFrameandCancelTaskFrame: Control pipeline tasksStartInterruptionFrameandStopInterruptionFrame: Indicate user speech for interruptions
ControlFrame
Base class for control-flow frames.
Notable control frames:
EndFrame: Signals the end of a pipelineLLMFullResponseStartFrameandLLMFullResponseEndFrame: Bracket LLM responsesUserStartedSpeakingFrameandUserStoppedSpeakingFrame: Indicate user speech activityBotStartedSpeakingFrameandBotStoppedSpeakingFrame: Indicate bot speech activityTTSStartedFrameandTTSStoppedFrame: Bracket Text-to-Speech responses
7. Special Purpose Frames
MetricsFrame
Contains performance metrics data.
FunctionCallInProgressFrame and FunctionCallResultFrame
Used for handling LLM function (tool) calls.
ServiceUpdateSettingsFrame
Base class for updating service settings, with specific subclasses for LLM, TTS, and STT services.
Conclusion
Understanding these frame types is essential for working with the Pipecat system. Each frame type serves a specific purpose in the pipeline, whether it's carrying data (like audio or images), controlling the flow of the pipeline, or managing system-level operations. By using the appropriate frame types, you can effectively process and transmit various kinds of information through your pipeline.