Add changelog for PR #3851

Remove processing metrics (ProcessingMetricsData)
Processing metrics were an early addition that predated a clear understanding of what timing measurements matter in real-time pipelines. They were inconsistently implemented across services, often broken, and overlapped with the better-defined TTFB metric.

- Remove ProcessingMetricsData class and all start/stop_processing_metrics methods from FrameProcessorMetrics, FrameProcessor, and SentryMetrics
- Remove all processing metrics calls from 31 service files (LLM, TTS, STT, image, vision, realtime)
- Clean up empty _start_metrics() stubs left in STT services
- Remove processing metrics handling from RTVI, metrics log observer, pipeline task initial metrics, and strands agents framework
- Update tests and examples

Remaining metrics (TTFB, LLM token usage, TTS character usage, text aggregation time) are well-defined and consistently implemented.
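
To illustrate why TTFB is the easier metric to define consistently, here is a minimal sketch of a time-to-first-byte timer. It is not Pipecat's `FrameProcessorMetrics` API: the `TTFBTimer` class, its method names, and the injectable `clock` parameter are all hypothetical and exist only for this example. The measurement has one unambiguous start (the request is sent) and one unambiguous end (the first output arrives).

```python
import time
from typing import Callable, Optional


class TTFBTimer:
    """Hypothetical helper: time from request start to the first output chunk."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None

    def start(self) -> None:
        # Call when the request is sent to the service.
        self._start = self._clock()
        self.ttfb = None

    def first_output(self) -> None:
        # Safe to call on every output chunk: only the first call after
        # start() records a value, later calls are ignored.
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start
```

In this sketch a service would call `start()` when it issues a request and `first_output()` whenever a chunk arrives. A "processing time" metric, by contrast, needs an agreed end point for every service type, which is where the inconsistencies described above came from.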
2026-02-26 18:27:50 -05:00 · 2026-02-26 18:20:49 -05:00
341 changed files with 4465 additions and 10136 deletions
--- a/.claude/skills/update-docs/SKILL.md
+++ b/.claude/skills/update-docs/SKILL.md
@@ -157,11 +157,7 @@ After processing all mapped pairs, check for two kinds of gaps:

 **Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.

-If the user wants a new page, do all three of the following:
-
-#### 8a: Create the doc page
-
-Create the new `.mdx` file using this template structure:
+If the user wants a new page, create it using this template structure:
 ```
 ---
 title: "Service Name"
@@ -211,53 +207,6 @@ pip install "pipecat-ai[package-name]"
 [Event table and example code]
 ```

-#### 8b: Add to docs.json
-
-Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
-
-Find the matching group in the navigation structure:
- **STT** → `"group": "Speech-to-Text"` under Services
- **TTS** → `"group": "Text-to-Speech"` under Services
- **LLM** → `"group": "LLM"` under Services
- **S2S** → `"group": "Speech-to-Speech"` under Services
- **Transport** → `"group": "Transport"` under Services
- **Serializer** → `"group": "Serializers"` under Services
- **Image generation** → `"group": "Image Generation"` under Services
- **Video** → `"group": "Video"` under Services
- **Memory** → `"group": "Memory"` under Services
- **Vision** → `"group": "Vision"` under Services
- **Analytics** → `"group": "Analytics & Monitoring"` under Services
-
-Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
-```json
-{
-  "group": "Speech-to-Text",
-  "pages": [
-    "server/services/stt/assemblyai",
-    "server/services/stt/aws",
-    ...
-    "server/services/stt/foo",
-    ...
-  ]
-}
-```
-
-#### 8c: Add to supported-services.mdx
-
-Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
-
-Use this format:
-```
-| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
-```
-
-To determine the correct values:
- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
- If no pip dependencies are required, use `No dependencies required` instead.
-
-Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
-
 ### Step 9: Output summary

 After all edits are complete, print a summary:
@@ -272,9 +221,6 @@ After all edits are complete, print a summary:
 ### Updated guides
 - `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)

-### New service pages
- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
-
 ### Unmapped source files
 - `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)

@@ -301,6 +247,4 @@ Before finishing, verify:
 - [ ] New parameters have accurate types and defaults from source
 - [ ] Formatting matches the existing page style
 - [ ] Guides referencing changed APIs were checked and updated
- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
 - [ ] Unmapped files were reported to the user
--- a/.github/workflows/update-docs.yml
+++ b/.github/workflows/update-docs.yml
@@ -1,147 +0,0 @@
-name: Update Documentation on PR Merge
-
-on:
-  pull_request_target:
-    types: [closed]
-    branches: [main]
-    paths:
-      - "src/pipecat/services/**"
-      - "src/pipecat/transports/**"
-      - "src/pipecat/serializers/**"
-      - "src/pipecat/processors/**"
-      - "src/pipecat/audio/**"
-      - "src/pipecat/turns/**"
-      - "src/pipecat/observers/**"
-      - "src/pipecat/pipeline/**"
-  workflow_dispatch:
-    inputs:
-      pr_number:
-        description: "PR number to generate docs for"
-        required: true
-        type: string
-
-jobs:
-  update-docs:
-    if: >-
-      github.event_name == 'workflow_dispatch' ||
-      github.event.pull_request.merged == true
-    runs-on: ubuntu-latest
-    timeout-minutes: 15
-    permissions:
-      contents: read
-      pull-requests: read
-      id-token: write
-    steps:
-      - name: Checkout pipecat
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - name: Checkout docs
-        uses: actions/checkout@v4
-        with:
-          repository: pipecat-ai/docs
-          token: ${{ secrets.DOCS_SYNC_TOKEN }}
-          path: _docs
-
-      - name: Resolve PR number
-        id: pr
-        run: |
-          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
-            echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
-          else
-            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Update documentation
-        uses: anthropics/claude-code-action@v1
-        env:
-          DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
-        with:
-          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
-          github_token: ${{ secrets.GITHUB_TOKEN }}
-          prompt: |
-            You are updating documentation for the pipecat-ai/docs repository based on
-            changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
-
-            ## Setup
-
-            1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
-            2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
-            3. The docs repository is checked out at `./_docs/`
-
-            ## Get the diff
-
-            Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
-            Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
-            Filter to source files matching the directories listed in SKILL.md Step 3.
-
-            If no relevant source files were changed, exit with "No documentation changes needed."
-
-            ## Follow the skill instructions
-
-            Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
-
-            ### Docs path
-            Use `./_docs/` — it's already checked out. Do not ask for a path.
-
-            ### Branch management
-            - Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
-            - Work inside `./_docs/` for all doc edits and git operations
-            - Check if the branch already exists on the remote:
-              ```bash
-              cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
-              ```
-              - If it exists: check it out (supports workflow re-runs)
-              - If not: create it from main
-
-            ### Git config
-            Before committing in `_docs`, set:
-            ```bash
-            git config user.name "github-actions[bot]"
-            git config user.email "github-actions[bot]@users.noreply.github.com"
-            ```
-
-            ### No interactive questions
-            Do not ask questions. If you encounter gaps (unmapped files, missing sections,
-            ambiguous changes), note them in the PR body under "## Gaps identified".
-
-            ### Creating the docs PR
-            After committing all changes in `_docs`, push and create a PR:
-            ```bash
-            cd _docs
-            git push -u origin docs/pr-${{ steps.pr.outputs.number }}
-            GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
-              --repo pipecat-ai/docs \
-              --label auto-docs \
-              --title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
-              --body "$(cat <<'BODY'
-            Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
-
-            ## Changes
-            <summarize each doc page updated and what changed>
-
-            ## Gaps identified
-            <any unmapped files, missing doc pages, or missing sections — or "None">
-            BODY
-            )"
-            ```
-
-            ### Re-run handling
-            If `gh pr create` fails because a PR from that branch already exists,
-            push the updated commits and use `gh pr edit` to update the body instead.
-
-            ### No-op
-            If after analyzing the diff you determine no documentation changes are needed
-            (e.g., only skip-listed files changed, or changes don't affect public API docs),
-            exit cleanly without creating a branch or PR. Output "No documentation changes needed."
-
-            ## Important rules
-            - Only modify files inside `./_docs/` — never modify pipecat source code
-            - Follow the conservative editing rules from SKILL.md Step 6
-            - Read each doc page fully before editing (SKILL.md Guidelines)
-            - Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
-          claude_args: |
-            --model claude-sonnet-4-5-20250929
-            --max-turns 30
-            --allowedTools "Read,Write,Edit,Glob,Grep,Bash"
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,389 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 <!-- towncrier release notes start -->

-## [0.0.104] - 2026-03-02
-
-### Added
-
- Added `TextAggregationMetricsData` metric measuring the time from the first
-  LLM token to the first complete sentence, representing the latency cost of
-  sentence aggregation in the TTS pipeline.
-  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
-
- Added support for using strongly-typed objects instead of dicts for updating
-  service settings at runtime.
-
-    Instead of, say:
-
-    ```python
-    await task.queue_frame(
-        STTUpdateSettingsFrame(settings={"language": Language.ES})
-    )
-    ```
-
-    you'd do:
-
-    ```python
-    await task.queue_frame(
-        STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
-    )
-    ```
-
-  Each service now vends strongly-typed classes like `DeepgramSTTSettings`
-  representing the service's runtime-updatable settings.
-  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
-
- Added support for specifying private endpoints for Azure Speech-to-Text,
-  enabling use in private networks behind firewalls.
-  (PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))
-
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
-  LemonSlice Avatars to any Daily room.
-  (PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))
-
- Added `output_medium` parameter to `AgentInputParams` and
-  `OneShotInputParams` in Ultravox service to control initial output medium
-  (text or voice) at call creation time.
-  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
-
- Added `TurnMetricsData` as a generic metrics class for turn detection, with
-  e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
-  with `e2e_processing_time_ms` tracking the interval from VAD
-  speech-to-silence transition to turn completion.
-  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
-
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
-  callbacks to `AudioContextTTSService`. Subclasses can override these to
-  perform provider-specific cleanup instead of overriding
-  `_handle_interruption()`.
-  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
-
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
-  providing message counts before and after context summarization.
-  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
-
- Added `summary_message_template` to `LLMContextSummarizationConfig` for
-  customizing how summaries are formatted when injected into context (e.g.,
-  wrapping in XML tags).
-  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
-
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
-  120s) to prevent hung LLM calls from permanently blocking future
-  summarizations.
-  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
-
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
-  summarization to a dedicated LLM service (e.g., a cheaper/faster model)
-  instead of the pipeline's primary model.
-  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
-
- Add AssemblyAI u3-rt-pro model support with built-in turn detection mode
-  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
-
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
-  from anywhere in the pipeline (e.g. a function call tool). Accepts an
-  optional `config: LLMContextSummaryConfig` to override summary generation
-  settings per request.
-  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
-
- Added `LLMContextSummaryConfig` (summary generation params:
-  `target_context_tokens`, `min_messages_after_summary`,
-  `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
-  thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
-  `summary_config`). These replace the monolithic
-  `LLMContextSummarizationConfig`.
-  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
-
- Added support for the `speed_alpha` parameter to the `arcana` model in
-  `RimeTTSService`.
-  (PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))
-
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
-  (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
-  Tavus) when a client connects. Enables observers to track transport readiness
-  timing.
-  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
-
- Added `StartupTimingObserver` for measuring how long each processor's
-  `start()` method takes during pipeline startup. Also measures transport
-  readiness — the time from `StartFrame` to first client connection — via the
-  `on_transport_timing_report` event.
-  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
-
- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
-  event to `StartupTimingObserver` with bot and client connection timing.
-  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
-
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
-  `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
-  end of the pipeline.
-  (PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))
-
- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
-  per-service TTFB, text aggregation, user turn duration, and function call
-  latency metrics for each user-to-bot response cycle.
-  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
-
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
-  measuring the time from client connection to first bot speech. An
-  `on_latency_breakdown` is also emitted for this first speech event.
-  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
-
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
-  `InterruptionFrame` both upstream and downstream directly from the calling
-  processor, avoiding the round-trip through the pipeline task that
-  `push_interruption_task_frame_and_wait()` required.
-  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
-
-### Changed
-
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
-  subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
-  text now flows through text aggregators regardless of mode, enabling pattern
-  detection and tag handling in TOKEN mode.
-  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
-
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
-  classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
-  subclasses) instead of plain dicts. Each service's `_settings` now holds
-  these strongly-typed objects. For service maintainers, see changes in
-  COMMUNITY_INTEGRATIONS.md.
-  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
-
- Word timestamp support has been moved from `WordTTSService` into `TTSService`
-  via a new `supports_word_timestamps` parameter. Services that previously
-  extended `WordTTSService`, `AudioContextWordTTSService`, or
-  `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
-  parent `__init__` instead.
-  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
-
- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
-  instead of `UserStoppedSpeakingFrame` timing.
-  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
-
- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
-  realtime services: added `InterruptionFrame` handling with metrics cleanup,
-  processing metrics at response boundaries, and improved agent transcript
-  handling for both voice and text output modalities.
-  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
-
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
-  (PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))
-
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
-  `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
-  `KRISP_VIVA_API_KEY` environment variable.
-  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
-
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
-  vulnerability.
-  (PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))
-
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
-  speaking, you don't want a user interruption to prevent a service setting
-  change from going into effect. Note that you usually don't use
-  `ServiceSettingsUpdateFrame` directly, you use one of its subclasses:
-    - `LLMUpdateSettingsFrame`
-    - `TTSUpdateSettingsFrame`
-    - `STTUpdateSettingsFrame`
-  (PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))
-
- Updated context summarization to use `user` role instead of `assistant` for
-  summary messages.
-  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
-
- Rename `AssemblyAISTTService` parameter
-  `min_end_of_turn_silence_when_confident` to `min_turn_silence` (old
-  name still supported with deprecation warning)
-  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
-
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
-  `enable_context_summarization` → `enable_auto_context_summarization` and
-  `context_summarization_config` → `auto_context_summarization_config` (now
-  accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
-  `DeprecationWarning` for one release cycle.
-  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
-
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
-  `True` when using `CommitStrategy.MANUAL`.
-  (PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))
-
- Updated numba version pin from == to >=0.61.2
-  (PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))
-
- Updated tracing code to use `ServiceSettings` dataclass API
-  (`given_fields()`, attribute access) instead of dict-style access
-  (`.items()`, `in`, subscript).
-  (PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))
-
- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
-  Removed `event` field from `InterruptionTaskFrame`. These are no longer
-  needed since `broadcast_interruption()` does not require a round-trip
-  completion signal.
-  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
-
- Moved `pipecat.services.deepgram.stt_sagemaker` and
-  `pipecat.services.deepgram.tts_sagemaker` to
-  `pipecat.services.deepgram.sagemaker.stt` and
-  `pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
-  but emit a `DeprecationWarning`.
-  (PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))
-
-### Deprecated
-
- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
-  subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
-  `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
-  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
-
- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
-  in favor of runtime updates via `TTSUpdateSettingsFrame`,
-  `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
-
-  ⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas
-  previously only `set_language()` caused the service to actually react to the
-  update (e.g. by reconnecting to a remote service so it can pick up the
-  change), now all these methods do. This change was made as part of a refactor
-  making them all work the same way under the hood.
-  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
-
- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of
-  passing typed settings delta objects with
-  `*UpdateSettingsFrame(delta={...})`.
-  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
-
- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
-  `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
-  non-word counterparts with `supports_word_timestamps=True` instead:
-    - `WordTTSService` → `TTSService(supports_word_timestamps=True)`
-    - `WebsocketWordTTSService` →
-  `WebsocketTTSService(supports_word_timestamps=True)`
-    - `AudioContextWordTTSService` →
-  `AudioContextTTSService(supports_word_timestamps=True)`
-    - `InterruptibleWordTTSService` →
-  `InterruptibleTTSService(supports_word_timestamps=True)`
-  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
-
- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
-  `BaseSmartTurn` now emits `TurnMetricsData` directly.
-  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
-
- Deprecated `LLMContextSummarizationConfig`. Use
-  `LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
-  instead. The old class emits a `DeprecationWarning`.
-  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
-
- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
-  `broadcast_interruption()` instead. The old method now delegates to
-  `broadcast_interruption()` and logs a deprecation warning.
-  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
-
-### Removed
-
- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
-  `transformers` and `onnxruntime` packages are now always installed as core
-  dependencies since they are required by the default turn stop strategy,
-  `TurnAnalyzerUserTurnStopStrategy` which uses `LocalSmartTurnAnalyzerV3`.
-  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
-
- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
-  shut down and is no longer available.
-  (PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))
-
-### Fixed
-
- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
-  provider-specific messages during context summarization.
-  (PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))
-
- Treated `response_cancel_not_active` as a non-fatal error in realtime
-  services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
-  `OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
-  cancelling an inactive response.
-  (PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))
-
- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
-  (`transformers`, `onnxruntime`) into core dependencies instead of using a
-  self-referential extra.
-  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
-
- Fixed `SentryMetrics` method signatures to match updated
-  `FrameProcessorMetrics` base class, resolving `TypeError` when using
-  `start_time`/`end_time` keyword arguments.
-  (PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))
-
- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
-  `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
-  (PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))
-
- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
-  ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
-  contexts after normal speech completion, only on interruption.
-  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
-
- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
-  transcript arrival time.
-  (PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))
-
- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
-  unintentionally pushed downstream in `LLMUserAggregator`. They are now
-  consumed like `TranscriptionFrame`.
-  (PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))
-
- Fixed misleading "Empty audio frame received for STT service" warnings when
-  using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
-  that buffer audio internally.
-  (PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))
-
- Fixed issues with `RimeNonJsonTTSService` where trailing punctuation is
-  sometimes vocalized
-  (PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))
-
- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
-  when used outside of an LLM response (e.g., bot greetings or injected
-  speech).
-  (PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))
-
- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
-  that flooded production logs.
-  (PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))
-
- Add beta feature warning when using custom prompts with AssemblyAI
-  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
-
- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
-  at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
-  resampling to 16kHz before Whisper feature extraction.
-  (PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))
-
- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
-  when the user provides both an `RTVIProcessor` in the pipeline and a custom
-  `RTVIObserver` subclass in observers.
-  (PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))
-
- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
-  replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
-  turn completion system message is now re-injected after context replacement.
-  (PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))
-
- Fixed Azure TTS and STT services silently swallowing cancellation errors
-  (invalid API key, network failures, rate limiting) instead of propagating
-  them as `ErrorFrame`s to the pipeline.
-  (PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))
-
-### Performance
-
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
-  `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
-  every interruption by using `client_req_id`-based multiplexing.
-  (PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))
-
-### Other
-
- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
-  Pipecat SDK identity in websocket requests.
-  (PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))
-
 ## [0.0.103] - 2026-02-20

 ### Added
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
 | Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
 | Serializers         | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
-| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
 | Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
 | Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
 | Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
--- a/changelog/3696.added.md
+++ b/changelog/3696.added.md
@@ -0,0 +1 @@
+- Added `TextAggregationMetricsData` metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline.
--- a/changelog/3696.changed.md
+++ b/changelog/3696.changed.md
@@ -0,0 +1 @@
+- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode.
--- a/changelog/3696.deprecated.md
+++ b/changelog/3696.deprecated.md
@@ -0,0 +1 @@
+- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
--- a/changelog/3714.added.md
+++ b/changelog/3714.added.md
@@ -0,0 +1,19 @@
+- Added support for using strongly-typed objects instead of dicts for updating service settings at runtime.
+
+  Instead of, say:
+
+  ```python
+  await task.queue_frame(
+      STTUpdateSettingsFrame(settings={"language": Language.ES})
+  )
+  ```
+
+  you'd do:
+
+  ```python
+  await task.queue_frame(
+      STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
+  )
+  ```
+
+  Each service now vends strongly-typed classes like `DeepgramSTTSettings` representing the service's runtime-updatable settings.
--- a/changelog/3714.changed.md
+++ b/changelog/3714.changed.md
@@ -0,0 +1 @@
+- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md.
--- a/changelog/3714.deprecated.2.md
+++ b/changelog/3714.deprecated.2.md
@@ -0,0 +1 @@
+- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of passing typed settings delta objects with `*UpdateSettingsFrame(delta=...)`.
--- a/changelog/3714.deprecated.md
+++ b/changelog/3714.deprecated.md
@@ -0,0 +1,3 @@
+- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services in favor of runtime updates via `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
+
+  ⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas previously only `set_language()` caused the service to actually react to the update (e.g. by reconnecting to a remote service so it can pick up the change), now all these methods do. This change was made as part of a refactor making them all work the same way under the hood.
--- a/changelog/3759.performance.md
+++ b/changelog/3759.performance.md
@@ -0,0 +1 @@
+- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.
--- a/changelog/3786.changed.md
+++ b/changelog/3786.changed.md
@@ -0,0 +1 @@
+- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead.
--- a/changelog/3786.deprecated.md
+++ b/changelog/3786.deprecated.md
@@ -0,0 +1,5 @@
+- Deprecated `WordTTSService`, `WebsocketWordTTSService`, `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their non-word counterparts with `supports_word_timestamps=True` instead:
+  - `WordTTSService` → `TTSService(supports_word_timestamps=True)`
+  - `WebsocketWordTTSService` → `WebsocketTTSService(supports_word_timestamps=True)`
+  - `AudioContextWordTTSService` → `AudioContextTTSService(supports_word_timestamps=True)`
+  - `InterruptibleWordTTSService` → `InterruptibleTTSService(supports_word_timestamps=True)`
--- a/changelog/3803.fixed.md
+++ b/changelog/3803.fixed.md
@@ -0,0 +1 @@
+- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies (`transformers`, `onnxruntime`) into core dependencies instead of using a self-referential extra.
--- a/changelog/3803.removed.md
+++ b/changelog/3803.removed.md
@@ -0,0 +1 @@
+- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The `transformers` and `onnxruntime` packages are now always installed as core dependencies since they are required by the default turn stop strategy, `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
--- a/changelog/3806.added.md
+++ b/changelog/3806.added.md
@@ -0,0 +1 @@
+- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in Ultravox service to control initial output medium (text or voice) at call creation time.
--- a/changelog/3806.changed.2.md
+++ b/changelog/3806.changed.2.md
@@ -0,0 +1 @@
+- Improved Ultravox TTFB measurement accuracy by using VAD speech end time instead of `UserStoppedSpeakingFrame` timing.
--- a/changelog/3806.changed.md
+++ b/changelog/3806.changed.md
@@ -0,0 +1 @@
+- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini realtime services: added `InterruptionFrame` handling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities.
--- a/changelog/3807.changed.md
+++ b/changelog/3807.changed.md
@@ -0,0 +1 @@
+- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
--- a/changelog/3808.fixed.md
+++ b/changelog/3808.fixed.md
@@ -0,0 +1 @@
+- Fixed `SentryMetrics` method signatures to match updated `FrameProcessorMetrics` base class, resolving `TypeError` when using `start_time`/`end_time` keyword arguments.
--- a/changelog/3809.added.md
+++ b/changelog/3809.added.md
@@ -0,0 +1 @@
+- Added `TurnMetricsData` as a generic metrics class for turn detection, with e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData` with `e2e_processing_time_ms` tracking the interval from VAD speech-to-silence transition to turn completion.
--- a/changelog/3809.changed.md
+++ b/changelog/3809.changed.md
@@ -0,0 +1 @@
+- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to `KRISP_VIVA_API_KEY` environment variable.
--- a/changelog/3809.deprecated.md
+++ b/changelog/3809.deprecated.md
@@ -0,0 +1 @@
+- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`. `BaseSmartTurn` now emits `TurnMetricsData` directly.
--- a/changelog/3811.changed.md
+++ b/changelog/3811.changed.md
@@ -0,0 +1 @@
+- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability.
--- a/changelog/3813.fixed.md
+++ b/changelog/3813.fixed.md
@@ -0,0 +1 @@
+- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
--- a/changelog/3814.added.md
+++ b/changelog/3814.added.md
@@ -0,0 +1 @@
+- Added `on_audio_context_interrupted()` and `on_audio_context_completed()` callbacks to `AudioContextTTSService`. Subclasses can override these to perform provider-specific cleanup instead of overriding `_handle_interruption()`.
--- a/changelog/3814.fixed.md
+++ b/changelog/3814.fixed.md
@@ -0,0 +1 @@
+- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI, ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio contexts after normal speech completion, only on interruption.
--- a/changelog/3819.changed.md
+++ b/changelog/3819.changed.md
@@ -0,0 +1,4 @@
+- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use `ServiceSettingsUpdateFrame` directly; you use one of its subclasses:
+  - `LLMUpdateSettingsFrame`
+  - `TTSUpdateSettingsFrame`
+  - `STTUpdateSettingsFrame`
--- a/changelog/3822.fixed.md
+++ b/changelog/3822.fixed.md
@@ -0,0 +1 @@
+- Fixed STT TTFB metrics measuring timeout expiry time instead of actual transcript arrival time.
--- a/changelog/3825.fixed.md
+++ b/changelog/3825.fixed.md
@@ -0,0 +1 @@
+- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being unintentionally pushed downstream in `LLMUserAggregator`. They are now consumed like `TranscriptionFrame`.
--- a/changelog/3828.fixed.md
+++ b/changelog/3828.fixed.md
@@ -0,0 +1 @@
+- Fixed misleading "Empty audio frame received for STT service" warnings when using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`) that buffer audio internally.
--- a/changelog/3837.fixed.md
+++ b/changelog/3837.fixed.md
@@ -0,0 +1 @@
+- Fixed an issue with `RimeNonJsonTTSService` where trailing punctuation was sometimes vocalized.
--- a/changelog/3838.removed.md
+++ b/changelog/3838.removed.md
@@ -0,0 +1 @@
+- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been shut down and is no longer available.
--- a/changelog/3848.changed.md
+++ b/changelog/3848.changed.md
@@ -1 +0,0 @@
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
--- a/changelog/3848.fixed.md
+++ b/changelog/3848.fixed.md
@@ -1 +0,0 @@
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout.
--- a/changelog/3851.removed.md
+++ b/changelog/3851.removed.md
@@ -0,0 +1 @@
+- ⚠️ Removed `ProcessingMetricsData` and all `start_processing_metrics()`/`stop_processing_metrics()` methods from `FrameProcessor` and `FrameProcessorMetrics`. These metrics were inconsistently implemented across services and overlapped with the better-defined TTFB metric. TTFB, LLM token usage, TTS character usage, and text aggregation metrics are unaffected.
--- a/changelog/3889.changed.md
+++ b/changelog/3889.changed.md
@@ -1,3 +0,0 @@
- Support for Voice Focus 2.0 models.
-  - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
-  - Cleaned unused `ParameterFixedError` exception handling in `AICFilter` parameter setup. 
--- a/changelog/3889.fixed.md
+++ b/changelog/3889.fixed.md
@@ -1 +0,0 @@
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter` caused by holding a `memoryview` on the mutable audio buffer across async yield points.
--- a/changelog/3914.changed.md
+++ b/changelog/3914.changed.md
@@ -1 +0,0 @@
- `max_context_tokens` and `max_unsummarized_messages` in `LLMAutoContextSummarizationConfig` (and deprecated `LLMContextSummarizationConfig`) can now be set to `None` independently to disable that summarization threshold. At least one must remain set.
--- a/changelog/3915.added.md
+++ b/changelog/3915.added.md
@@ -1 +0,0 @@
- Added optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default.
--- a/changelog/3916.added.md
+++ b/changelog/3916.added.md
@@ -1 +0,0 @@
- Added `cloud-audio-only` recording option to Daily transport's `enable_recording` property.
--- a/changelog/3918.added.md
+++ b/changelog/3918.added.md
@@ -1,15 +0,0 @@
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.
-
-  ```python
-  llm = OpenAILLMService(
-      api_key=os.getenv("OPENAI_API_KEY"),
-      system_instruction="You are a helpful assistant.",
-  )
-
-  context = LLMContext()
-
-  @transport.event_handler("on_client_connected")
-  async def on_client_connected(transport, client):
-      context.add_message({"role": "user", "content": "Please introduce yourself."})
-      await task.queue_frames([LLMRunFrame()])
-  ```
--- a/changelog/3918.other.md
+++ b/changelog/3918.other.md
@@ -1 +0,0 @@
- Updated foundational examples to use `system_instruction` on LLM services instead of adding system messages to `LLMContext`.
--- a/env.example
+++ b/env.example
@@ -108,10 +108,6 @@ KRISP_VIVA_API_KEY=...
 KRISP_VIVA_FILTER_MODEL_PATH=...
 KRISP_VIVA_TURN_MODEL_PATH=...

-# LemonSlice
-LEMONSLICE_API_KEY=...
-LEMONSLICE_AGENT_ID=...
-
 # LiveKit
 LIVEKIT_API_KEY=...
 LIVEKIT_API_SECRET=...
--- a/examples/foundational/02-llm-say-one-thing.py
+++ b/examples/foundational/02-llm-say-one-thing.py
@@ -42,10 +42,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are an LLM in a WebRTC session, and this is a 'hello world' demo.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+
+    messages = [
+        {
+            "role": "system",
+            "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
+        }
+    ]

    task = PipelineTask(
        Pipeline([llm, tts, transport.output()]),
@@ -55,9 +59,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Register an event handler so we can play the audio when the client joins
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
-        context = LLMContext()
-        context.add_message({"role": "system", "content": "Say hello to the world."})
-        await task.queue_frames([LLMContextFrame(context), EndFrame()])
+        await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

--- a/examples/foundational/04-transports-small-webrtc.py
+++ b/examples/foundational/04-transports-small-webrtc.py
@@ -70,12 +70,16 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +109,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/04a-transports-daily.py
+++ b/examples/foundational/04a-transports-daily.py
@@ -53,13 +53,16 @@ async def main():
            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            model="gpt-4o",
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -88,9 +91,7 @@ async def main():
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_participant_left")
--- a/examples/foundational/04b-transports-livekit.py
+++ b/examples/foundational/04b-transports-livekit.py
@@ -55,17 +55,24 @@ async def main():

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. "
+            "Your goal is to demonstrate your capabilities in a succinct way. "
+            "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
+            "Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/06-listen-and-respond.py
+++ b/examples/foundational/06-listen-and-respond.py
@@ -13,7 +13,6 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import Frame, LLMRunFrame, MetricsFrame
 from pipecat.metrics.metrics import (
    LLMUsageMetricsData,
-    ProcessingMetricsData,
    TTFBMetricsData,
    TTSUsageMetricsData,
 )
@@ -46,8 +45,6 @@ class MetricsLogger(FrameProcessor):
            for d in frame.data:
                if isinstance(d, TTFBMetricsData):
                    print(f"!!! MetricsFrame: {frame}, ttfb: {d.value}")
-                elif isinstance(d, ProcessingMetricsData):
-                    print(f"!!! MetricsFrame: {frame}, processing: {d.value}")
                elif isinstance(d, LLMUsageMetricsData):
                    tokens = d.value
                    print(
@@ -86,14 +83,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    ml = MetricsLogger()

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -125,7 +126,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -103,12 +103,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/07-interruptible-cartesia-http.py
+++ b/examples/foundational/07-interruptible-cartesia-http.py
@@ -59,12 +59,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -95,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -62,12 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # text_aggregation_mode=TextAggregationMode.TOKEN,
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-flux.py
+++ b/examples/foundational/07c-interruptible-deepgram-flux.py
@@ -10,7 +10,6 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

-from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -61,18 +60,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=ExternalUserTurnStrategies(),
-            vad_analyzer=SileroVADAnalyzer(),
-        ),
+        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
    )

    pipeline = Pipeline(
@@ -100,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-http.py
+++ b/examples/foundational/07c-interruptible-deepgram-http.py
@@ -63,12 +63,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        messages = []
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]

        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
@@ -101,9 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-sagemaker.py
+++ b/examples/foundational/07c-interruptible-deepgram-sagemaker.py
@@ -23,8 +23,8 @@ from pipecat.processors.aggregators.llm_response_universal import (
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.aws.llm import AWSBedrockLLMService
-from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
-from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
+from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
+from pipecat.services.deepgram.tts_sagemaker import DeepgramSageMakerTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -76,10 +76,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        aws_region=os.getenv("AWS_REGION"),
        model="us.amazon.nova-pro-v1:0",
        params=AWSBedrockLLMService.InputParams(temperature=0.8),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -110,7 +116,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-vad.py
+++ b/examples/foundational/07c-interruptible-deepgram-vad.py
@@ -61,12 +61,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -97,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram.py
+++ b/examples/foundational/07c-interruptible-deepgram.py
@@ -57,12 +57,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -93,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs-http.py
+++ b/examples/foundational/07d-interruptible-elevenlabs-http.py
@@ -67,12 +67,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,9 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs.py
+++ b/examples/foundational/07d-interruptible-elevenlabs.py
@@ -60,12 +60,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07f-interruptible-azure-http.py
+++ b/examples/foundational/07f-interruptible-azure-http.py
@@ -66,10 +66,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
        model=os.getenv("AZURE_CHATGPT_MODEL"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07f-interruptible-azure.py
+++ b/examples/foundational/07f-interruptible-azure.py
@@ -66,10 +66,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
        model=os.getenv("AZURE_CHATGPT_MODEL"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai-http.py
+++ b/examples/foundational/07g-interruptible-openai-http.py
@@ -11,6 +11,7 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -60,12 +61,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai.py
+++ b/examples/foundational/07g-interruptible-openai.py
@@ -66,12 +66,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +107,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07h-interruptible-openpipe.py
+++ b/examples/foundational/07h-interruptible-openpipe.py
@@ -65,10 +65,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("OPENAI_API_KEY"),
        openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
        tags={"conversation_id": f"pipecat-{timestamp}"},
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -63,12 +63,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            base_url="http://localhost:8000",
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,9 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07k-interruptible-lmnt.py
+++ b/examples/foundational/07k-interruptible-lmnt.py
@@ -56,12 +56,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -92,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07l-interruptible-groq.py
+++ b/examples/foundational/07l-interruptible-groq.py
@@ -55,14 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))

    llm = GroqLLMService(
-        api_key=os.getenv("GROQ_API_KEY"),
-        model="meta-llama/llama-4-maverick-17b-128e-instruct",
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
    )

    tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -93,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07m-interruptible-aws.py
+++ b/examples/foundational/07m-interruptible-aws.py
@@ -62,10 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        aws_region="us-west-2",
        model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
        params=AWSBedrockLLMService.InputParams(temperature=0.8),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini-image.py
+++ b/examples/foundational/07n-interruptible-gemini-image.py
@@ -83,11 +83,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini-2.5-flash-image",
-        # model="gemini-3-pro-image-preview", # A more powerful model, but slower,
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        # model="gemini-3-pro-image-preview", # A more powerful model, but slower
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -118,7 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation with a styled introduction
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini.py
+++ b/examples/foundational/07n-interruptible-gemini.py
@@ -71,7 +71,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini-2.5-flash",
-        system_instruction="""You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
+    )
+
+    # System message that instructs the AI on how to speak
+    messages = [
+        {
+            "role": "system",
+            "content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.

            IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
            - [sigh] - Insert a sigh sound
@@ -89,9 +95,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            - "The answer is... [long pause] ...42!"

            Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
-    )
+        },
+    ]

-    context = LLMContext()
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -122,7 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation
-        context.add_message(
+        messages.append(
            {
                "role": "system",
                "content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
--- a/examples/foundational/07n-interruptible-google-http.py
+++ b/examples/foundational/07n-interruptible-google-http.py
@@ -72,10 +72,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # params=GoogleLLMService.InputParams(
        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
        # ),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-google.py
+++ b/examples/foundational/07n-interruptible-google.py
@@ -72,10 +72,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # params=GoogleLLMService.InputParams(
        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
        # ),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
+++ b/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
@@ -1,175 +0,0 @@
-#
-# Copyright (c) 2024-2026, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-
-import os
-
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMRunFrame
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.task import PipelineParams, PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import (
-    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
-)
-from pipecat.runner.types import RunnerArguments
-from pipecat.runner.utils import create_transport
-from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
-from pipecat.services.assemblyai.stt import AssemblyAISTTService
-from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_transport import BaseTransport, TransportParams
-from pipecat.transports.daily.transport import DailyParams
-from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
-from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
-
-load_dotenv(override=True)
-
-
-# We use lambdas to defer transport parameter creation until the transport
-# type is selected at runtime.
-transport_params = {
-    "daily": lambda: DailyParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-    ),
-    "twilio": lambda: FastAPIWebsocketParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-    ),
-    "webrtc": lambda: TransportParams(
-        audio_in_enabled=True,
-        audio_out_enabled=True,
-    ),
-}
-
-
-async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
-    """AssemblyAI u3-rt-pro with Built-in Turn Detection
-
-    This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
-    with AssemblyAI's built-in turn detection for more natural conversation flow.
-
-    Key features:
-
-    1. AssemblyAI Turn Detection
-       - Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
-       - AssemblyAI's model determines when user starts/stops speaking
-       - Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
-       - More natural turn detection based on speech patterns and pauses
-
-    2. Advanced Turn Detection Tuning
-       - `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
-         Lower values = faster responses. Default: 100ms
-       - `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
-         Prevents long pauses. Default: 1000ms
-
-    3. Prompt-Based Transcription Enhancement
-       - Use `prompt` parameter to improve accuracy for specific names/terms
-       - Particularly useful for proper nouns, technical terms, domain vocabulary
-       - Example: "Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth."
-
-    4. Speaker Diarization (Optional)
-       - Enable with `speaker_labels=True`
-       - Automatically identifies different speakers in multi-party conversations
-       - TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")
-
-    5. Language Detection (Optional, multilingual model only)
-       - Enable with `language_detection=True`
-       - Automatically detects spoken language
-       - Available with universal-streaming-multilingual model
-
-    For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
-    """
-    logger.info(f"Starting bot")
-
-    stt = AssemblyAISTTService(
-        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
-        vad_force_turn_endpoint=False,  # Use AssemblyAI's built-in turn detection
-        connection_params=AssemblyAIConnectionParams(
-            speech_model="u3-rt-pro",
-            # Optional: Tune turn detection timing (defaults shown below)
-            # min_turn_silence=100,  # Default
-            # max_turn_silence=1000,  # Default
-            # Optional: Boost accuracy for specific names/terms
-            # prompt="Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth.",
-            # Optional: Enable speaker diarization
-            # speaker_labels=True,
-        ),
-    )
-
-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
-
-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
-
-    context = LLMContext()
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(
-            user_turn_strategies=ExternalUserTurnStrategies(),
-            vad_analyzer=SileroVADAnalyzer(),
-        ),
-    )
-
-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
-
-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
-
-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
-
-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
-
-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
-
-
-async def bot(runner_args: RunnerArguments):
-    """Main bot entry point compatible with Pipecat Cloud."""
-    transport = await create_transport(runner_args, transport_params)
-    await run_bot(transport, runner_args)
-
-
-if __name__ == "__main__":
-    from pipecat.runner.run import main
-
-    main()
--- a/examples/foundational/07o-interruptible-assemblyai.py
+++ b/examples/foundational/07o-interruptible-assemblyai.py
@@ -62,12 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp-viva.py
+++ b/examples/foundational/07p-interruptible-krisp-viva.py
@@ -87,12 +87,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("CARTESIA_API_KEY"), voice_id="71a7ad14-091c-4e8e-a314-022ece01c121"
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
@@ -129,7 +133,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp.py
+++ b/examples/foundational/07p-interruptible-krisp.py
@@ -60,12 +60,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime-http.py
+++ b/examples/foundational/07q-interruptible-rime-http.py
@@ -65,12 +65,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -101,9 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime.py
+++ b/examples/foundational/07q-interruptible-rime.py
@@ -59,12 +59,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="luna",
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -95,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07r-interruptible-nvidia.py
+++ b/examples/foundational/07r-interruptible-nvidia.py
@@ -55,14 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = NvidiaSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

    llm = NvidiaLLMService(
-        api_key=os.getenv("NVIDIA_API_KEY"),
-        model="meta/llama-3.3-70b-instruct",
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
    )

    tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -93,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07t-interruptible-fish.py
+++ b/examples/foundational/07t-interruptible-fish.py
@@ -60,12 +60,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic-http.py
+++ b/examples/foundational/07v-interruptible-neuphonic-http.py
@@ -64,12 +64,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,9 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic.py
+++ b/examples/foundational/07v-interruptible-neuphonic.py
@@ -59,12 +59,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -95,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07w-interruptible-fal.py
+++ b/examples/foundational/07w-interruptible-fal.py
@@ -62,12 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07x-interruptible-local.py
+++ b/examples/foundational/07x-interruptible-local.py
@@ -47,12 +47,16 @@ async def main():
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -78,7 +82,7 @@ async def main():
        ),
    )

-    context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+    messages.append({"role": "system", "content": "Please introduce yourself to the user."})
    await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner()
--- a/examples/foundational/07y-interruptible-minimax.py
+++ b/examples/foundational/07y-interruptible-minimax.py
@@ -66,12 +66,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,9 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam-http.py
+++ b/examples/foundational/07z-interruptible-sarvam-http.py
@@ -68,12 +68,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            params=SarvamHttpTTSService.InputParams(language=Language.EN),
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,9 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam.py
+++ b/examples/foundational/07z-interruptible-sarvam.py
@@ -62,12 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        model="bulbul:v2",
        voice_id="manisha",
    )
-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

        # Optionally, you can wait for 30 seconds and then change the voice.
--- a/examples/foundational/07za-interruptible-soniox.py
+++ b/examples/foundational/07za-interruptible-soniox.py
@@ -64,12 +64,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zb-interruptible-inworld-http.py
+++ b/examples/foundational/07zb-interruptible-inworld-http.py
@@ -64,12 +64,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            streaming=True,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,9 +111,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info("Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zb-interruptible-inworld.py
+++ b/examples/foundational/07zb-interruptible-inworld.py
@@ -61,12 +61,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        temperature=1.1,
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zc-interruptible-asyncai-http.py
+++ b/examples/foundational/07zc-interruptible-asyncai-http.py
@@ -64,12 +64,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(
-            api_key=os.getenv("OPENAI_API_KEY"),
-            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        )
+        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-        context = LLMContext()
+        messages = [
+            {
+                "role": "system",
+                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+            },
+        ]
+
+        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,9 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            context.add_message(
-                {"role": "system", "content": "Please introduce yourself to the user."}
-            )
+            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zc-interruptible-asyncai.py
+++ b/examples/foundational/07zc-interruptible-asyncai.py
@@ -60,12 +60,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zd-interruptible-aicoustics.py
+++ b/examples/foundational/07zd-interruptible-aicoustics.py
@@ -40,7 +40,7 @@ def _create_aic_filter() -> AICFilter:

    return AICFilter(
        license_key=license_key,
-        model_id="quail-vf-2.0-l-16khz",
+        model_id="quail-vf-l-16khz",
    )


@@ -80,12 +80,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=aic_vad_analyzer),
@@ -124,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client connected")
        await audiobuffer.start_recording()
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @audiobuffer.event_handler("on_audio_data")
--- a/examples/foundational/07ze-interruptible-hume.py
+++ b/examples/foundational/07ze-interruptible-hume.py
@@ -62,12 +62,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        voice_id="f898a92e-685f-43fa-985b-a46920f0650b",
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +113,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            "💡 Word timestamps are enabled! Watch the console for TTSTextFrame logs showing each word with its PTS."
        )
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zf-interruptible-gradium.py
+++ b/examples/foundational/07zf-interruptible-gradium.py
@@ -66,12 +66,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        url="wss://us.api.gradium.ai/api/speech/tts",
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zg-interruptible-camb.py
+++ b/examples/foundational/07zg-interruptible-camb.py
@@ -59,12 +59,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        model="mars-flash",
    )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful voice assistant powered by Camb AI text-to-speech. ",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful voice assistant powered by Camb AI text-to-speech. "
+            "Keep your responses concise and conversational since they will be spoken aloud. "
+            "Avoid special characters, emojis, or bullet points.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -95,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zh-interruptible-hathora.py
+++ b/examples/foundational/07zh-interruptible-hathora.py
@@ -64,10 +64,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
        api_key=os.getenv("HATHORA_API_KEY"),
        model=None,
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
    )

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zi-interruptible-piper.py
+++ b/examples/foundational/07zi-interruptible-piper.py
@@ -56,12 +56,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = PiperTTSService(voice_id="en_US-ryan-high")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -92,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zj-interruptible-kokoro.py
+++ b/examples/foundational/07zj-interruptible-kokoro.py
@@ -56,12 +56,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = KokoroTTSService(voice_id="af_heart")

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-    )
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    context = LLMContext()
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        },
+    ]
+
+    context = LLMContext(messages)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -92,7 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
				`@@ -0,0 +1 @@`
				- Added `TextAggregationMetricsData` metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline.
				`@@ -0,0 +1 @@`
				- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode.
				`@@ -0,0 +1 @@`
				- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
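To make the two aggregation modes and the new aggregation-time metric concrete, here is a minimal, self-contained sketch. It does not import or reproduce pipecat's implementation: `AggregationMode`, `aggregate`, and `first_sentence_latency` are hypothetical names, and the sentence splitter is a deliberately naive regex that only illustrates the idea of buffering tokens until a sentence boundary is reached.

```python
import enum
import re
from typing import Iterable, Iterator, Optional, Tuple


class AggregationMode(enum.Enum):
    """Hypothetical stand-in for the changelog's TextAggregationMode enum."""

    SENTENCE = "sentence"
    TOKEN = "token"


# Naive boundary: ., ! or ? followed by whitespace or the end of the buffer.
_SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def aggregate(tokens: Iterable[str], mode: AggregationMode) -> Iterator[str]:
    """Yield text chunks to hand to TTS, per the chosen mode (illustrative only)."""
    buffer = ""
    for token in tokens:
        if mode is AggregationMode.TOKEN:
            # TOKEN mode: forward each token as soon as it arrives.
            yield token
            continue
        # SENTENCE mode: buffer until at least one full sentence is available.
        buffer += token
        match = _SENTENCE_END.search(buffer)
        while match:
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            match = _SENTENCE_END.search(buffer)
    # Flush any trailing text that never reached a sentence boundary.
    if buffer.strip():
        yield buffer.strip()


def first_sentence_latency(
    timed_tokens: Iterable[Tuple[float, str]],
) -> Optional[float]:
    """Seconds from the first token to the token that completes the first sentence.

    This mirrors the quantity the changelog describes for the aggregation-time
    metric; returns None if no sentence boundary is ever seen.
    """
    first_token_time: Optional[float] = None
    buffer = ""
    for timestamp, token in timed_tokens:
        if first_token_time is None:
            first_token_time = timestamp
        buffer += token
        if _SENTENCE_END.search(buffer):
            return timestamp - first_token_time
    return None
```

In this sketch, SENTENCE mode trades latency for complete phrases, and `first_sentence_latency` measures exactly that cost for the first sentence of a response.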
				`@@ -0,0 +1 @@`
				- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see changes in COMMUNITY_INTEGRATIONS.md.
				`@@ -0,0 +1 @@`
				- Dict-based `UpdateSettingsFrame(settings={...})` is deprecated in favor of passing typed settings delta objects with `UpdateSettingsFrame(delta={...})`.
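The motivation for typed settings deltas over plain dicts can be sketched as follows. This is an illustration under stated assumptions, not pipecat's actual classes: `SketchTTSSettings` and `apply_delta` are hypothetical, and treating `None` as "field not changed" is an assumption made only for this example.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SketchTTSSettings:
    """Hypothetical typed settings object; field names are illustrative."""

    voice: Optional[str] = None
    language: Optional[str] = None
    speed: Optional[float] = None


def apply_delta(
    current: SketchTTSSettings, delta: SketchTTSSettings
) -> SketchTTSSettings:
    """Return new settings with only the fields set on `delta` overridden.

    Assumption for this sketch: a field left as None in the delta means
    "leave the current value unchanged".
    """
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not None
    }
    return replace(current, **changes)


current = SketchTTSSettings(voice="narrator", language="en", speed=1.0)
updated = apply_delta(current, SketchTTSSettings(speed=1.25))
```

With a typed object, a misspelled field such as `SketchTTSSettings(sped=1.25)` fails immediately with a `TypeError`, whereas a misspelled key in a plain dict would typically go unnoticed until the setting silently failed to apply.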
				`@@ -0,0 +1 @@`
				- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.
				`@@ -0,0 +1 @@`
				- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead.
				`@@ -0,0 +1 @@`
				- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies (`transformers`, `onnxruntime`) into core dependencies instead of using a self-referential extra.
				`@@ -0,0 +1 @@`
				- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The `transformers` and `onnxruntime` packages are now always installed as core dependencies, since they are required by the default turn stop strategy, `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
				`@@ -0,0 +1 @@`
				- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in Ultravox service to control initial output medium (text or voice) at call creation time.