Processing metrics were an early addition that predated a clear
understanding of what timing measurements matter in real-time pipelines.
They were inconsistently implemented across services, often broken, and
overlapped with the better-defined TTFB metric.
- Remove ProcessingMetricsData class and all start/stop_processing_metrics
methods from FrameProcessorMetrics, FrameProcessor, and SentryMetrics
- Remove all processing metrics calls from 31 service files (LLM, TTS,
STT, image, vision, realtime)
- Clean up empty _start_metrics() stubs left in STT services
- Remove processing metrics handling from RTVI, metrics log observer,
pipeline task initial metrics, and strands agents framework
- Update tests and examples
Remaining metrics (TTFB, LLM token usage, TTS character usage, text
aggregation time) are well-defined and consistently implemented.
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
- Voice mode: push LLMTextFrame (append_to_context=False) and
TTSTextFrame for deltas, skip duplicated final text
- Text mode: push LLMTextFrame with proper response lifecycle,
no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.
- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
The only (rare) exception—where a service directly still needs to directly call `self._sync_model_name_to_metrics()`—is when the model name need to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.
This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy _handle_input_audio path
so they are pushed immediately when fresh.
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.
Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.
Fixes#3517