Modified the BaseTextAggregator type so that when text gets aggregated, metadata can
be associated with it. Currently, that just means a `type`, so that the aggregation
can be classified or described. Changes made to support this:
- **IMPORTANT**: Aggregators are now expected to strip leading/trailing whitespace
characters before returning their aggregation from `aggregate()` or `.text`. This
gives all aggregators a consistent contract, so downstream code knows how
to stitch aggregations back together.
- Introduced a new `Aggregation` dataclass to represent both the aggregated `text` and
a string identifying the `type` of aggregation (e.g. "sentence", "word", "my custom
aggregation").
- **BREAKING**: `BaseTextAggregator.text` now returns an `Aggregation` (instead of `str`).
To update: `aggregated_text = myAggregator.text` -> `aggregated_text = myAggregator.text.text`
- **BREAKING**: `BaseTextAggregator.aggregate()` now returns `Optional[Aggregation]`
(instead of `Optional[str]`). To update:
```
aggregation = await myAggregator.aggregate(text)
if aggregation:
    print(f"successfully aggregated text: {aggregation.text}")  # instead of {aggregation}
```
- `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator` updated to
produce/consume `Aggregation` objects.
- All uses of the above Aggregators have been updated accordingly.
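
To make the new contract concrete, here is a minimal, self-contained sketch. It assumes nothing about pipecat's internals: `Aggregation` below is a stand-in that only mirrors the `text` and `type` fields described above, `ToySentenceAggregator` and `stitch` are hypothetical names made up for illustration, and the toy `aggregate()` is synchronous for brevity (the real method is awaited).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Aggregation:
    """Stand-in for the new dataclass: aggregated text plus a type label."""

    text: str
    type: str


class ToySentenceAggregator:
    """Hypothetical aggregator for illustration; not pipecat's implementation."""

    def __init__(self) -> None:
        self._buffer = ""

    @property
    def text(self) -> Aggregation:
        # Pending text is stripped before it is exposed, per the new contract.
        return Aggregation(text=self._buffer.strip(), type="sentence")

    def aggregate(self, text: str) -> Optional[Aggregation]:
        self._buffer += text
        # Naive end-of-sentence check, good enough for a sketch.
        match = re.search(r"[.!?]", self._buffer)
        if match is None:
            return None
        sentence = self._buffer[: match.end()]
        self._buffer = self._buffer[match.end():]
        # Completed aggregations are stripped as well.
        return Aggregation(text=sentence.strip(), type="sentence")


def stitch(aggregations: List[Aggregation]) -> str:
    """Rejoin stripped aggregations with a single space between them."""
    return " ".join(a.text for a in aggregations if a.text)
```

Because every aggregation arrives already stripped, a consumer can rejoin them with one separator of its choosing (a single space here) without checking each piece for stray whitespace, and it can branch on `aggregation.type` where it previously only had a bare string.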
#
# Copyright (c) 2024-2025 Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import unittest

from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator


class TestSimpleTextAggregator(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.aggregator = SimpleTextAggregator()

    async def test_reset_aggregations(self):
        assert await self.aggregator.aggregate("Hello ") is None
        assert self.aggregator.text.text == "Hello"
        await self.aggregator.reset()
        assert self.aggregator.text.text == ""

    async def test_simple_sentence(self):
        assert await self.aggregator.aggregate("Hello ") is None
        aggregate = await self.aggregator.aggregate("Pipecat!")
        assert aggregate.text == "Hello Pipecat!"
        assert aggregate.type == "sentence"
        assert self.aggregator.text.text == ""

    async def test_multiple_sentences(self):
        aggregate = await self.aggregator.aggregate("Hello Pipecat! How are ")
        assert aggregate.text == "Hello Pipecat!"
        # Aggregators should strip leading/trailing spaces when returning text
        assert self.aggregator.text.text == "How are"
        aggregate = await self.aggregator.aggregate("you?")
        assert aggregate.text == "How are you?"