Fix Soniox processing metrics to measure token-to-transcript time

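Remove the start_processing_metrics/stop_processing_metrics pair from run_stt, which is called once per audio chunk and therefore only produced noisy 0ms logs. start_processing_metrics is now called in _receive_messages when the first final token of a new utterance arrives, that is, while _final_transcription_buffer is still empty. The existing stop_processing_metrics in send_endpoint_transcript completes the pair, so the metric now measures the time from the first recognized final token to the finalized transcript.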
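
The timing pattern described above can be sketched in isolation as follows. This is a minimal illustration and not the service's code: `UtteranceMetrics`, `handle_tokens`, the `(text, is_final)` token tuples, and the use of an `"<end>"` marker to signal the endpoint are all assumptions made for the example, and the injectable clock exists only so the behaviour is deterministic.

```python
import time
from typing import Callable, List, Optional, Tuple


class UtteranceMetrics:
    """Hypothetical stand-in for a start/stop processing-metrics pair."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at: Optional[float] = None
        self.durations: List[float] = []

    def start(self) -> None:
        # Record when the first final token of an utterance was seen.
        self._started_at = self._clock()

    def stop(self) -> None:
        # Close the measurement only if one is in progress.
        if self._started_at is not None:
            self.durations.append(self._clock() - self._started_at)
            self._started_at = None


def handle_tokens(
    tokens: List[Tuple[str, bool]],
    metrics: UtteranceMetrics,
    final_buffer: List[str],
) -> List[str]:
    """Buffer final tokens and emit one transcript per endpoint marker."""
    transcripts: List[str] = []
    for text, is_final in tokens:
        if not is_final:
            continue  # interim tokens do not affect the measurement
        if text == "<end>":  # assumed endpoint marker for this sketch
            if final_buffer:
                transcripts.append("".join(final_buffer))
                final_buffer.clear()
            metrics.stop()
        else:
            if not final_buffer:
                # An empty buffer means this token opens a new utterance.
                metrics.start()
            final_buffer.append(text)
    return transcripts


# Deterministic demo: the fake clock returns one value per start/stop call.
fake_clock = iter([1.0, 1.5, 3.0, 3.2]).__next__
metrics = UtteranceMetrics(clock=fake_clock)
transcripts = handle_tokens(
    [
        ("Hel", True),
        ("lo", True),
        ("maybe", False),
        ("<end>", True),
        ("Hi", True),
        ("<end>", True),
    ],
    metrics,
    [],
)
print(transcripts, metrics.durations)
```

Because the timer starts only when the buffer is empty, each utterance yields exactly one measurement no matter how many final tokens it contains, which is the property the per-chunk placement in run_stt lacked.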
2026-02-24 13:09:29 -05:00
parent 6f7664846c
commit 23ad181515
1 changed file with 2 additions and 2 deletions
--- a/src/pipecat/services/soniox/stt.py
+++ b/src/pipecat/services/soniox/stt.py
@@ -301,10 +301,8 @@ class SonioxSTTService(WebsocketSTTService):
        Yields:
            Frame: None (transcription results come via WebSocket callbacks).
        """
-        await self.start_processing_metrics()
        if self._websocket and self._websocket.state is State.OPEN:
            await self._websocket.send(audio)
-        await self.stop_processing_metrics()

        yield None

@@ -485,6 +483,8 @@ class SonioxSTTService(WebsocketSTTService):
                            # the rest will be sent as interim tokens (even final tokens).
                            await send_endpoint_transcript()
                        else:
+                            if not self._final_transcription_buffer:
+                                await self.start_processing_metrics()
                            self._final_transcription_buffer.append(token)
                    else:
                        non_final_transcription.append(token)