1. TTSTextFrames now include metadata indicating whether the text was
spoken, along with a type string describing what the text represents,
e.g. "sentence", "word", or "custom aggregation".
2. Expanded how aggregators work so that the aggregate method returns
the aggregated text along with the type of aggregation used to create it.
3. Deprecated the RTVI bot-transcription event in favor of...
4. Introduced support for a new bot-output event. This event is meant
to be the one-stop shop for communicating what the bot actually "says".
It is based on TTSTextFrames and communicates output both sentence by
sentence (or by whatever aggregation is used) and word by word. In
addition, it includes LLMTextFrames, aggregated by sentence, when TTS is
turned off (i.e. skip_tts is true).
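
To make items 1, 2, and 4 concrete, here is a minimal, self-contained sketch of the idea. It does not use the actual pipecat classes: `Aggregation`, `SentenceAggregator`, `bot_output_payload`, and the payload field names (`text`, `spoken`, `aggregated_by`) are hypothetical stand-ins chosen for illustration, so treat them as assumptions rather than the real API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregation:
    """Hypothetical aggregator result: the text plus how it was aggregated."""

    text: str
    type: str  # e.g. "sentence", "word", or a custom label


class SentenceAggregator:
    """Toy aggregator (not pipecat's): buffers text, emits one sentence at a time."""

    def __init__(self) -> None:
        self._buffer = ""

    def aggregate(self, text: str) -> Optional[Aggregation]:
        self._buffer += text
        # Locate the earliest sentence terminator in the buffered text.
        ends = [i for i in (self._buffer.find(p) for p in ".!?") if i != -1]
        if not ends:
            return None  # No complete sentence yet; keep buffering.
        cut = min(ends) + 1
        sentence = self._buffer[:cut].strip()
        self._buffer = self._buffer[cut:]
        return Aggregation(text=sentence, type="sentence")


def bot_output_payload(agg: Aggregation, spoken: bool) -> dict:
    """Build an assumed bot-output message; the field names are illustrative only."""
    return {
        "type": "bot-output",
        "data": {
            "text": agg.text,
            "spoken": spoken,  # False when TTS is skipped (skip_tts is true)
            "aggregated_by": agg.type,
        },
    }
```

In this sketch, feeding `"Hello "` and then `"there. How"` to `aggregate` yields nothing for the first call and a "sentence" aggregation for the second, which can then be wrapped with `spoken=True` when it came from TTS or `spoken=False` when TTS was skipped.
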
Resolves pipecat-ai/pipecat-client-web#158