If we aggregate transcriptions, we get incorrect interruptions. For example, with a strategy configured with min_words=3, saying "One" and pausing, then "Two" and pausing, and then "Three" would trigger the start of the turn when it shouldn't, because no single transcription reaches three words. We should only look at the incoming transcription text and not aggregate it with the previous ones.
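
To illustrate the difference, here is a minimal sketch. The class and method names (`MinWordsStrategy`, `AggregatingMinWordsStrategy`, `should_interrupt`) are hypothetical and chosen for this example only; they are not the project's actual API. The sketch assumes each call receives the text of one transcription and that words are separated by whitespace.

```python
class MinWordsStrategy:
    """Hypothetical strategy: checks only the incoming transcription."""

    def __init__(self, min_words: int) -> None:
        self.min_words = min_words

    def should_interrupt(self, text: str) -> bool:
        # Count words in this transcription alone; nothing is carried over.
        return len(text.split()) >= self.min_words


class AggregatingMinWordsStrategy(MinWordsStrategy):
    """Hypothetical variant showing the problem: it accumulates text."""

    def __init__(self, min_words: int) -> None:
        super().__init__(min_words)
        self._aggregated = ""

    def should_interrupt(self, text: str) -> bool:
        # Append the new text to everything heard so far, then count.
        self._aggregated = f"{self._aggregated} {text}".strip()
        return super().should_interrupt(self._aggregated)


utterances = ["One", "Two", "Three"]  # each separated by a pause

aggregating = AggregatingMinWordsStrategy(min_words=3)
per_transcription = MinWordsStrategy(min_words=3)

aggregating_results = [aggregating.should_interrupt(u) for u in utterances]
per_transcription_results = [per_transcription.should_interrupt(u) for u in utterances]
```

With aggregation, the third one-word utterance pushes the accumulated count to three and triggers the interruption. Checking each transcription on its own never triggers for these inputs, while a single transcription such as "One two three" still does.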