Only create the EmulateUserStartedSpeakingFrame if we have received a transcription.
@@ -12,6 +12,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's
   recommended model.
 
+### Fixed
+
+- Fixed an issue where, in some edge cases, the `EmulateUserStartedSpeakingFrame`
+  could be created even if we didn't have a transcription.
+
 ## [0.0.76] - 2025-07-11
 
 ### Added
@@ -693,7 +693,11 @@ class LLMUserContextAggregator(LLMContextResponseAggregator):
         # to emulate VAD (i.e. user start/stopped speaking), but we do it only
         # if the bot is not speaking. If the bot is speaking and we really have
         # a short utterance we don't really want to interrupt the bot.
-        if not self._user_speaking and not self._waiting_for_aggregation:
+        if (
+            not self._user_speaking
+            and not self._waiting_for_aggregation
+            and len(self._aggregation) > 0
+        ):
             if self._bot_speaking:
                 # If we reached this case and the bot is speaking, let's ignore
                 # what the user said.
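
The effect of the extra `len(self._aggregation) > 0` check can be sketched in isolation. The snippet below is only an illustration, not the aggregator's code: `AggregatorState` and `decide` are hypothetical names, the string type of `aggregation` is an assumption, and the "emulate" outcome is inferred from the code comment in the hunk rather than from code visible in this diff.

```python
from dataclasses import dataclass


@dataclass
class AggregatorState:
    """Hypothetical stand-in for the aggregator's private flags."""

    user_speaking: bool = False
    waiting_for_aggregation: bool = False
    bot_speaking: bool = False
    aggregation: str = ""  # assumed to hold the accumulated transcription text


def decide(state: AggregatorState) -> str:
    """Return "skip", "ignore", or "emulate" for the given state."""
    guard = (
        not state.user_speaking
        and not state.waiting_for_aggregation
        and len(state.aggregation) > 0  # the condition added by this commit
    )
    if not guard:
        # No transcription (or the user is already speaking): do nothing.
        return "skip"
    if state.bot_speaking:
        # Short utterance while the bot talks: ignore what the user said.
        return "ignore"
    # Transcription received without VAD detection: emulate user speaking.
    return "emulate"


print(decide(AggregatorState()))                  # empty aggregation
print(decide(AggregatorState(aggregation="hi")))  # transcription present
```

Without the length check, the first call would fall through to the emulation branch even though no transcription had been received, which is the edge case the changelog entry describes.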