pipecat/changelog/+ultravox-server-interruption.fixed.md
Paul Kompfner a00211627f Surface server-side interruption from Nova Sonic and Ultravox
BaseOutputTransport only clears buffered audio mid-playback on
InterruptionFrame. Realtime services stream audio downstream as fast as
they produce it, and playback necessarily trails the buffer — so when the
user interrupts, the bot keeps talking past the interruption unless the
service surfaces the interruption to the pipeline.
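
The buffering problem described above can be illustrated with a toy model. This is a minimal sketch, not pipecat code: `ToyOutputTransport`, `AudioChunk`, and `simulate` are hypothetical stand-ins that only mimic the behavior the message describes (audio is queued faster than it plays, and only an interruption signal clears what is still buffered).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AudioChunk:
    """Stand-in for one chunk of bot audio (hypothetical, not a pipecat class)."""
    label: str


@dataclass
class ToyOutputTransport:
    """Toy model of an output transport with a playback buffer (hypothetical)."""
    buffer: deque = field(default_factory=deque)
    played: list = field(default_factory=list)

    def queue_audio(self, chunk: AudioChunk) -> None:
        # The realtime service pushes audio as fast as it produces it.
        self.buffer.append(chunk)

    def play_next(self) -> None:
        # Playback drains the buffer at real-time speed, one chunk per tick.
        if self.buffer:
            self.played.append(self.buffer.popleft().label)

    def handle_interruption(self) -> None:
        # Only an interruption signal drops audio that is buffered but unplayed.
        self.buffer.clear()


def simulate(surface_interruption: bool) -> list:
    transport = ToyOutputTransport()
    for i in range(5):
        transport.queue_audio(AudioChunk(f"chunk-{i}"))
    transport.play_next()  # the user interrupts after the first chunk has played
    if surface_interruption:
        transport.handle_interruption()
    for _ in range(5):
        transport.play_next()
    return transport.played
```

In this model, `simulate(False)` plays all five chunks even though the user interrupted after the first, while `simulate(True)` stops after `chunk-0`.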

Two realtime services were missing this signal:

  - AWS Nova Sonic acknowledged the INTERRUPTED stop reason internally
    (closing its own response state) but never broadcast InterruptionFrame.
  - Ultravox's playback_clear_buffer message — the server's explicit
    "drop buffered output audio" signal for interruptions — was not
    handled at all.

In both cases the latent bug was masked by enabling local VAD on the
user aggregator, which produced UserStartedSpeakingFrame and triggered
the aggregator-side interruption path. The realtime context aggregator
work makes local VAD optional, so the underlying gap needs fixing first.

Wire broadcast_interruption() into both services on the server-side
interruption signal, firing before the response-end signal so the
assistant aggregator marks the message interrupted=True before
LLMFullResponseEndFrame closes the turn.
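
Why the ordering matters can be sketched with a toy aggregator. The classes below reuse the frame names from this message but are local stand-ins, and `ToyAssistantAggregator` is a hypothetical simplification rather than pipecat's actual assistant aggregator; the sketch assumes the aggregator finalizes a message when the response-end frame arrives.

```python
from dataclasses import dataclass


class InterruptionFrame:
    """Local stand-in for the interruption signal."""


class LLMFullResponseEndFrame:
    """Local stand-in for the response-end signal."""


@dataclass
class TextFrame:
    """Local stand-in for a piece of assistant output."""
    text: str


class ToyAssistantAggregator:
    """Hypothetical aggregator: collects text, finalizes a message on response end."""

    def __init__(self) -> None:
        self.messages = []
        self._text = []
        self._interrupted = False

    def process(self, frame) -> None:
        if isinstance(frame, TextFrame):
            self._text.append(frame.text)
        elif isinstance(frame, InterruptionFrame):
            # Only a turn that is still open can be marked as interrupted.
            if self._text:
                self._interrupted = True
        elif isinstance(frame, LLMFullResponseEndFrame):
            # Closing the turn freezes the interrupted flag for this message.
            self.messages.append(
                {"content": "".join(self._text), "interrupted": self._interrupted}
            )
            self._text = []
            self._interrupted = False


def run(frames) -> list:
    aggregator = ToyAssistantAggregator()
    for frame in frames:
        aggregator.process(frame)
    return aggregator.messages
```

Under these assumptions, an interruption delivered before the response-end frame yields a message with `interrupted` set to `True`; delivered after, the turn is already closed and the flag is lost.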
2026-05-21 11:25:29 -04:00


  • Fixed Ultravox Realtime not surfacing server-side interruption. When the user interrupts the bot mid-speech, the server sends a playback_clear_buffer message instructing clients to drop buffered output audio. This message was previously unhandled, so BaseOutputTransport kept playing the buffered audio and the bot kept talking past the interruption. Ultravox now broadcasts InterruptionFrame on playback_clear_buffer. The bug was masked when local VAD was enabled on the user aggregator, because local VAD generated UserStartedSpeakingFrame and triggered the aggregator-side interruption path; with this fix, interruption works correctly without relying on local VAD.
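
The shape of the handling can be sketched as a small dispatch function. This is an illustration only: `handle_server_message` is a hypothetical helper, the JSON envelope with a `type` field is an assumption about the wire format, and `broadcast_interruption` is passed in as a plain callable rather than called on a real service object.

```python
import json
from typing import Callable


def handle_server_message(raw: str, broadcast_interruption: Callable[[], None]) -> bool:
    """Return True if the message was treated as a server-side interruption."""
    message = json.loads(raw)
    # Assumed envelope: {"type": "playback_clear_buffer", ...}
    if message.get("type") == "playback_clear_buffer":
        # Surface the interruption so downstream buffered audio is dropped.
        broadcast_interruption()
        return True
    return False
```

For example, passing `'{"type": "playback_clear_buffer"}'` invokes the callback once, while any other message type leaves it untouched.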