fix: propagate append_to_context from TextFrame through TTS _process_text_frame

Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
Initial plan
2026-03-18 04:24:25 +00:00 · 2026-03-18 04:21:01 +00:00 · 2026-03-17 21:05:10 -04:00 · 2026-03-17 18:24:20 -04:00 · 2026-03-17 16:41:11 -04:00 · 2026-03-17 16:39:24 -04:00
550 changed files with 41905 additions and 13339 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -0,0 +1,27 @@
+{
+  "name": "pipecat-dev-skills",
+  "owner": {
+    "name": "Pipecat"
+  },
+  "metadata": {
+    "description": "Development workflow skills for contributing to the Pipecat project",
+    "version": "1.0.0"
+  },
+  "plugins": [
+    {
+      "name": "pipecat-dev",
+      "description": "Development workflow skills for contributing to the Pipecat project",
+      "version": "1.0.0",
+      "source": "./",
+      "skills": [
+        "./.claude/skills/changelog",
+        "./.claude/skills/cleanup",
+        "./.claude/skills/code-review",
+        "./.claude/skills/docstring",
+        "./.claude/skills/pr-description",
+        "./.claude/skills/pr-submit",
+        "./.claude/skills/update-docs"
+      ]
+    }
+  ]
+}
--- a/.claude/skills/changelog/SKILL.md
+++ b/.claude/skills/changelog/SKILL.md
@@ -32,6 +32,20 @@ Create changelog files for the important commits in this PR. The PR number is pr

 6. Use ⚠️ emoji prefix for breaking changes.

+7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
+
+   **Good** (user-facing first, implementation detail as context):
+   ```
+   - Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
+   ```
+
+   **Bad** (implementation detail only, no user-facing framing):
+   ```
+   - Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
+   ```
+
+   Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
+
 ## Example

 For PR #3519 with a new feature and a bug fix:
@@ -43,5 +57,5 @@ For PR #3519 with a new feature and a bug fix:

 `changelog/3519.fixed.md`:
 ```
- Fixed an issue where something was not working correctly.
+- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
 ```
--- a/.claude/skills/cleanup/SKILL.md
+++ b/.claude/skills/cleanup/SKILL.md
@@ -1,6 +1,6 @@
 # Code Cleanup Skill

-The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat’s architecture, coding standards, and example patterns**.
+The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
 It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.

 ---
@@ -28,9 +28,9 @@ This skill analyzes all changes introduced in your branch and performs the follo

 Invoke the skill using any of the following commands:

- “Clean up my branch code”
- “Refactor the changes in my branch”
- “Review and improve my branch code”
+- "Clean up my branch code"
+- "Refactor the changes in my branch"
+- "Review and improve my branch code"
 - `/cleanup`

 ---
--- a/.claude/skills/docstring/SKILL.md
+++ b/.claude/skills/docstring/SKILL.md
@@ -3,21 +3,20 @@ name: docstring
 description: Document a Python module and its classes using Google style
 ---

-Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
+Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.

 ## Instructions

-1. First, find the class in the codebase:
-   ```
-   Search for "class ClassName" in src/pipecat/
-   ```
+1. Determine what to document based on the argument:

-2. If multiple files contain that class name:
-   - List all matches with their file paths
-   - Ask the user which one they want to document
-   - Wait for confirmation before proceeding
+   **If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
+   - Use that file directly

-3. Once the file is identified, read the module to understand its structure:
+   **If a class name is provided** (e.g. `VADAnalyzer`):
+   - Search for `class ClassName` in `src/pipecat/`
+   - If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
+
+2. Once the file is identified, read the module to understand its structure:
   - Identify all classes, functions, and important type aliases
   - Understand the purpose of each component

--- a/.claude/skills/update-docs/SKILL.md
+++ b/.claude/skills/update-docs/SKILL.md
@@ -157,7 +157,11 @@ After processing all mapped pairs, check for two kinds of gaps:

 **Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.

-If the user wants a new page, create it using this template structure:
+If the user wants a new page, do all three of the following:
+
+#### 8a: Create the doc page
+
+Create the new `.mdx` file using this template structure:
 ```
 ---
 title: "Service Name"
@@ -207,6 +211,53 @@ pip install "pipecat-ai[package-name]"
 [Event table and example code]
 ```

+#### 8b: Add to docs.json
+
+Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
+
+Find the matching group in the navigation structure:
+- **STT** → `"group": "Speech-to-Text"` under Services
+- **TTS** → `"group": "Text-to-Speech"` under Services
+- **LLM** → `"group": "LLM"` under Services
+- **S2S** → `"group": "Speech-to-Speech"` under Services
+- **Transport** → `"group": "Transport"` under Services
+- **Serializer** → `"group": "Serializers"` under Services
+- **Image generation** → `"group": "Image Generation"` under Services
+- **Video** → `"group": "Video"` under Services
+- **Memory** → `"group": "Memory"` under Services
+- **Vision** → `"group": "Vision"` under Services
+- **Analytics** → `"group": "Analytics & Monitoring"` under Services
+
+Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
+```json
+{
+  "group": "Speech-to-Text",
+  "pages": [
+    "server/services/stt/assemblyai",
+    "server/services/stt/aws",
+    ...
+    "server/services/stt/foo",
+    ...
+  ]
+}
+```
+
+#### 8c: Add to supported-services.mdx
+
+Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
+
+Use this format:
+```
+| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
+```
+
+To determine the correct values:
+- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
+- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
+- If no pip dependencies are required, use `No dependencies required` instead.
+
+Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
+
 ### Step 9: Output summary

 After all edits are complete, print a summary:
@@ -221,6 +272,9 @@ After all edits are complete, print a summary:
 ### Updated guides
 - `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)

+### New service pages
+- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
+
 ### Unmapped source files
 - `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)

@@ -247,4 +301,6 @@ Before finishing, verify:
 - [ ] New parameters have accurate types and defaults from source
 - [ ] Formatting matches the existing page style
 - [ ] Guides referencing changed APIs were checked and updated
+- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
+- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
 - [ ] Unmapped files were reported to the user
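Step 8b above asks for the new page path to be inserted alphabetically into the matching group's `pages` array. A minimal sketch of that insertion follows. It assumes `docs.json` has been loaded into a Python dict and that each navigation group is a dict with `"group"` and `"pages"` keys, as in the example in the skill. The helper name `add_page` and the nesting of the sample data are hypothetical, not taken from the docs repository.

```python
def add_page(docs: dict, group_name: str, page_path: str) -> bool:
    """Insert page_path alphabetically into the named group's pages list.

    Returns True if the page was added, False if it was already present.
    Raises KeyError if no group with that name exists.
    Hypothetical helper; the real docs.json layout may differ.
    """

    def find_group(node):
        # Walk nested dicts and lists until a dict with the matching "group" is found.
        if isinstance(node, dict):
            if node.get("group") == group_name and isinstance(node.get("pages"), list):
                return node
            for value in node.values():
                found = find_group(value)
                if found is not None:
                    return found
        elif isinstance(node, list):
            for item in node:
                found = find_group(item)
                if found is not None:
                    return found
        return None

    group = find_group(docs)
    if group is None:
        raise KeyError(group_name)

    pages = group["pages"]
    if page_path in pages:
        return False

    # Insert before the first string entry that sorts after the new path.
    # Non-string entries (for example nested groups) are skipped, not reordered.
    index = len(pages)
    for i, entry in enumerate(pages):
        if isinstance(entry, str) and entry > page_path:
            index = i
            break
    pages.insert(index, page_path)
    return True
```

Calling `add_page(docs, "Speech-to-Text", "server/services/stt/foo")` on a structure like the one in Step 8b would place the new entry between the existing paths that sort before and after it, and a second call with the same path leaves the list unchanged.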
--- a/.github/workflows/coverage.yaml
+++ b/.github/workflows/coverage.yaml
@@ -37,11 +37,12 @@ jobs:
          uv sync --group dev \
            --extra anthropic \
            --extra aws \
+            --extra deepgram \
            --extra google \
            --extra langchain \
            --extra livekit \
-            --extra local-smart-turn-v3 \
            --extra piper \
+            --extra sagemaker \
            --extra tracing \
            --extra websocket

--- a/.github/workflows/tests.yaml
+++ b/.github/workflows/tests.yaml
@@ -41,11 +41,12 @@ jobs:
          uv sync --group dev \
            --extra anthropic \
            --extra aws \
+            --extra deepgram \
            --extra google \
            --extra langchain \
            --extra livekit \
-            --extra local-smart-turn-v3 \
            --extra piper \
+            --extra sagemaker \
            --extra tracing \
            --extra websocket

--- a/.github/workflows/update-docs.yml
+++ b/.github/workflows/update-docs.yml
@@ -0,0 +1,147 @@
+name: Update Documentation on PR Merge
+
+on:
+  pull_request_target:
+    types: [closed]
+    branches: [main]
+    paths:
+      - "src/pipecat/services/**"
+      - "src/pipecat/transports/**"
+      - "src/pipecat/serializers/**"
+      - "src/pipecat/processors/**"
+      - "src/pipecat/audio/**"
+      - "src/pipecat/turns/**"
+      - "src/pipecat/observers/**"
+      - "src/pipecat/pipeline/**"
+  workflow_dispatch:
+    inputs:
+      pr_number:
+        description: "PR number to generate docs for"
+        required: true
+        type: string
+
+jobs:
+  update-docs:
+    if: >-
+      github.event_name == 'workflow_dispatch' ||
+      github.event.pull_request.merged == true
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      contents: read
+      pull-requests: read
+      id-token: write
+    steps:
+      - name: Checkout pipecat
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Checkout docs
+        uses: actions/checkout@v4
+        with:
+          repository: pipecat-ai/docs
+          token: ${{ secrets.DOCS_SYNC_TOKEN }}
+          path: _docs
+
+      - name: Resolve PR number
+        id: pr
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Update documentation
+        uses: anthropics/claude-code-action@v1
+        env:
+          DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          prompt: |
+            You are updating documentation for the pipecat-ai/docs repository based on
+            changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
+
+            ## Setup
+
+            1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
+            2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
+            3. The docs repository is checked out at `./_docs/`
+
+            ## Get the diff
+
+            Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
+            Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
+            Filter to source files matching the directories listed in SKILL.md Step 3.
+
+            If no relevant source files were changed, exit with "No documentation changes needed."
+
+            ## Follow the skill instructions
+
+            Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
+
+            ### Docs path
+            Use `./_docs/` — it's already checked out. Do not ask for a path.
+
+            ### Branch management
+            - Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
+            - Work inside `./_docs/` for all doc edits and git operations
+            - Check if the branch already exists on the remote:
+              ```bash
+              cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
+              ```
+              - If it exists: check it out (supports workflow re-runs)
+              - If not: create it from main
+
+            ### Git config
+            Before committing in `_docs`, set:
+            ```bash
+            git config user.name "github-actions[bot]"
+            git config user.email "github-actions[bot]@users.noreply.github.com"
+            ```
+
+            ### No interactive questions
+            Do not ask questions. If you encounter gaps (unmapped files, missing sections,
+            ambiguous changes), note them in the PR body under "## Gaps identified".
+
+            ### Creating the docs PR
+            After committing all changes in `_docs`, push and create a PR:
+            ```bash
+            cd _docs
+            git push -u origin docs/pr-${{ steps.pr.outputs.number }}
+            GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
+              --repo pipecat-ai/docs \
+              --label auto-docs \
+              --title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
+              --body "$(cat <<'BODY'
+            Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
+
+            ## Changes
+            <summarize each doc page updated and what changed>
+
+            ## Gaps identified
+            <any unmapped files, missing doc pages, or missing sections — or "None">
+            BODY
+            )"
+            ```
+
+            ### Re-run handling
+            If `gh pr create` fails because a PR from that branch already exists,
+            push the updated commits and use `gh pr edit` to update the body instead.
+
+            ### No-op
+            If after analyzing the diff you determine no documentation changes are needed
+            (e.g., only skip-listed files changed, or changes don't affect public API docs),
+            exit cleanly without creating a branch or PR. Output "No documentation changes needed."
+
+            ## Important rules
+            - Only modify files inside `./_docs/` — never modify pipecat source code
+            - Follow the conservative editing rules from SKILL.md Step 6
+            - Read each doc page fully before editing (SKILL.md Guidelines)
+            - Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
+          claude_args: |
+            --model claude-sonnet-4-5-20250929
+            --max-turns 30
+            --allowedTools "Read,Write,Edit,Glob,Grep,Bash"
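The workflow above only triggers for changes under a fixed set of source directories, and its prompt asks for the changed-file list to be filtered the same way before deciding whether any docs work is needed. A rough sketch of that filter and of the branch naming convention follows. The prefixes are copied from the workflow's `paths` trigger, on the assumption that they match the directories in SKILL.md Step 3. The function names are hypothetical, and the plain prefix match only approximates GitHub's `**` glob semantics.

```python
# Directory prefixes taken from the workflow's `paths` trigger.
WATCHED_PREFIXES = (
    "src/pipecat/services/",
    "src/pipecat/transports/",
    "src/pipecat/serializers/",
    "src/pipecat/processors/",
    "src/pipecat/audio/",
    "src/pipecat/turns/",
    "src/pipecat/observers/",
    "src/pipecat/pipeline/",
)


def relevant_source_files(changed_files: list[str]) -> list[str]:
    """Keep only changed files that live under a watched source directory."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [path for path in changed_files if path.startswith(WATCHED_PREFIXES)]


def docs_branch_name(pr_number: int) -> str:
    """Build the docs branch name used by the workflow for a given PR."""
    return f"docs/pr-{pr_number}"
```

If `relevant_source_files` returns an empty list for the output of `gh pr diff <number> --name-only`, that corresponds to the "No documentation changes needed." exit path described in the prompt.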
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,654 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 <!-- towncrier release notes start -->

+## [0.0.105] - 2026-03-10
+
+### Added
+
+- Added concurrent audio context support: `CartesiaTTSService` can now
+  synthesize the next sentence while the previous one is still playing, by
+  setting `pause_frame_processing=False` and routing each sentence through its
+  own audio context queue.
+  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
+
+- Added custom video track support to Daily transport. Use
+  `video_out_destinations` in `DailyParams` to publish multiple video tracks
+  simultaneously, mirroring the existing `audio_out_destinations` feature.
+  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
+
+- Added `ServiceSwitcherStrategyFailover` that automatically switches to the
+  next service when the active service reports a non-fatal error. Recovery
+  policies can be implemented via the `on_service_switched` event handler.
+  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
+
+- Added optional `timeout_secs` parameter to `register_function()` and
+  `register_direct_function()` for per-tool function call timeout control,
+  overriding the global `function_call_timeout_secs` default.
+  (PR [#3915](https://github.com/pipecat-ai/pipecat/pull/3915))
+
+- Added `cloud-audio-only` recording option to Daily transport's
+  `enable_recording` property.
+  (PR [#3916](https://github.com/pipecat-ai/pipecat/pull/3916))
+
+- Wired up `system_instruction` in `BaseOpenAILLMService`,
+  `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default
+  system prompt, matching the behavior of the Google services. This enables
+  sharing a single `LLMContext` across multiple LLM services, where each
+  service provides its own system instruction independently.
+
+    ```python
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        system_instruction="You are a helpful assistant.",
+    )
+
+    context = LLMContext()
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        context.add_message({"role": "user", "content": "Please introduce yourself."})
+        await task.queue_frames([LLMRunFrame()])
+    ```
+  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
+
+- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for
+  configuring voice activity detection sensitivity in U3 Pro. Aligning this
+  with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone"
+  where AssemblyAI transcribes speech that VAD hasn't detected yet.
+  (PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
+
+- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and
+  `OpenAISTTService` to allow empty transcripts to be pushed downstream as
+  `TranscriptionFrame` instead of discarding them (the default behavior). This
+  is intended for situations where VAD fires even though the user did not
+  speak. In these cases, it is useful to know that nothing was transcribed so
+  that the agent can resume speaking, instead of waiting longer for a
+  transcription.
+  (PR [#3930](https://github.com/pipecat-ai/pipecat/pull/3930))
+
+- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`,
+  `AWSBedrockLLMService`) now log a warning when both `system_instruction` and
+  a system message in the context are set. The constructor's
+  `system_instruction` takes precedence.
+  (PR [#3932](https://github.com/pipecat-ai/pipecat/pull/3932))
+
+- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for AWS
+  Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and
+  Soniox STT services. Previously, changing settings at runtime only stored the
+  new values without reconnecting.
+  (PR [#3946](https://github.com/pipecat-ai/pipecat/pull/3946))
+
+- Exposed `on_summary_applied` event on `LLMAssistantAggregator`, allowing
+  users to listen for context summarization events without accessing private
+  members.
+  (PR [#3947](https://github.com/pipecat-ai/pipecat/pull/3947))
+
+- Deepgram Flux STT settings (`keyterm`, `eot_threshold`,
+  `eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via
+  `STTUpdateSettingsFrame` without triggering a reconnect. The new values are
+  sent to Deepgram as a Configure WebSocket message on the existing connection.
+  (PR [#3953](https://github.com/pipecat-ai/pipecat/pull/3953))
+
+- Added `system_instruction` parameter to `run_inference` across all LLM
+  services, allowing callers to override the system prompt for one-shot
+  inference calls. Used by `_generate_summary` to pass the summarization prompt
+  cleanly.
+  (PR [#3968](https://github.com/pipecat-ai/pipecat/pull/3968))
+
+### Changed
+
+- Audio context management (previously in `AudioContextTTSService`) is now
+  built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`,
+  `asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from
+  `WebsocketTTSService` directly. Word-timestamp baseline is set automatically
+  on the first audio chunk of each context instead of requiring each provider
+  to call `start_word_timestamps()` in their receive loop.
+  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
+
+- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of
+  `VirtualCameraDevice` for the default camera output, mirroring how audio
+  already works with `CustomAudioSource`/`CustomAudioTrack`.
+  (PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
+
+- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions`
+  class was removed from the SDK and is now provided by pipecat directly;
+  import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
+  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
+
+- `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for
+  subclasses to implement error-based switching. `ServiceSwitcher` defaults to
+  `ServiceSwitcherStrategyManual` and `strategy_type` is now optional.
+  (PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
+
+- Added support for Voice Focus 2.0 models.
+    - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
+    - Removed unused `ParameterFixedError` exception handling in `AICFilter`
+      parameter setup.
+  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
+
+- `max_context_tokens` and `max_unsummarized_messages` in
+  `LLMAutoContextSummarizationConfig` (and deprecated
+  `LLMContextSummarizationConfig`) can now be set to `None` independently to
+  disable that summarization threshold. At least one must remain set.
+  (PR [#3914](https://github.com/pipecat-ai/pipecat/pull/3914))
+
+- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from
+  `AssemblyAIConnectionParams` as these were v2 API parameters not supported in
+  v3. Clarified that `format_turns` only applies to Universal-Streaming models;
+  U3 Pro has automatic formatting built-in.
+  (PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
+
+- Changed `DeepgramTTSService` to send a Clear message on interruption instead
+  of disconnecting and reconnecting the WebSocket, allowing the connection to
+  persist throughout the session.
+  (PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
+
+- Re-added `enhancement_level` support to `AICFilter` with runtime
+  `FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and
+  `ProcessorParameter.EnhancementLevel` together.
+  (PR [#3961](https://github.com/pipecat-ai/pipecat/pull/3961))
+
+- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`.
+  (PR [#3970](https://github.com/pipecat-ai/pipecat/pull/3970))
+
+- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching
+  Fish Audio's latest recommended model for improved quality and speed.
+  (PR [#3973](https://github.com/pipecat-ai/pipecat/pull/3973))
+
+- `AzureSTTService` `region` parameter is now optional when `private_endpoint`
+  is provided. A `ValueError` is raised if neither is given, and a warning is
+  logged if both are provided (`private_endpoint` takes priority).
+  (PR [#3974](https://github.com/pipecat-ai/pipecat/pull/3974))
+
+### Deprecated
+
+- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`.
+  Subclass `WebsocketTTSService` directly instead; audio context management is
+  now part of the base `TTSService`.
+  - Deprecated `WordTTSService`, `WebsocketWordTTSService`, and
+    `InterruptibleWordTTSService`. Word timestamp logic is now always active in
+    `TTSService` and no longer needs to be opted into via a subclass.
+  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
+
+- Deprecated `pipecat.services.google.llm_vertex`,
+  `pipecat.services.google.llm_openai`, and
+  `pipecat.services.google.gemini_live.llm_vertex` modules. Use
+  `pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`,
+  and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import
+  paths still work but will emit a `DeprecationWarning`.
+  (PR [#3980](https://github.com/pipecat-ai/pipecat/pull/3980))
+
+### Removed
+
+- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`.
+  Word timestamp logic is now always active. Remove this argument from any
+  custom subclass `super().__init__()` calls.
+  (PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
+
+### Fixed
+
+- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The
+  deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit
+  `KeepAlive` messages every 5 seconds, within the recommended 3–5 second
+  interval before Deepgram's 10-second inactivity timeout.
+  (PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
+
+- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in
+  `AICFilter` caused by holding a `memoryview` on the mutable audio buffer
+  across async yield points.
+  (PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
+
+- Fixed TTS context not being appended to the assistant message history when
+  using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
+  (PR [#3936](https://github.com/pipecat-ai/pipecat/pull/3936))
+
+- Fixed context summarization leaving orphaned tool responses in the kept
+  context when tool calls were moved to the summarized portion.
+  (PR [#3937](https://github.com/pipecat-ai/pipecat/pull/3937))
+
+- Fixed turn completion state not resetting at end of LLM responses.
+  `LLMFullResponseEndFrame` is pushed (not received) by the LLM service, so the
+  mixin now handles it in `push_frame` instead of `process_frame`.
+  (PR [#3956](https://github.com/pipecat-ai/pipecat/pull/3956))
+
+- Fixed turn completion instructions being injected as a context system message
+  instead of using `system_instruction`. This caused warning spam when
+  `system_instruction` was also set and didn't persist across full context
+  updates.
+  (PR [#3957](https://github.com/pipecat-ai/pipecat/pull/3957))
+
+- Fixed `TTSService` audio context queue getting blocked when
+  `append_to_audio_context()` was called with a `None` context ID, which
+  prevented subsequent audio from being delivered.
+  (PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
+
+- Fixed `on_call_state_updated` event handler in LiveKit transport receiving
+  an incorrect number of arguments due to a redundant `self` passed to
+  `_call_event_handler`.
+  (PR [#3959](https://github.com/pipecat-ai/pipecat/pull/3959))
+
+- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services
+  treating `conversation_already_has_active_response` as a fatal error. These
+  services now log it as a non-fatal debug event when a response is already in
+  progress.
+  (PR [#3960](https://github.com/pipecat-ai/pipecat/pull/3960))
+
+- Fixed `SmallWebRTCConnection` silently discarding messages sent before the
+  data channel is open by queuing them and flushing once the channel is ready.
+  A bounded queue (`MAX_MESSAGE_QUEUE_SIZE = 50`) prevents unbounded memory
+  growth, and a 10-second timeout after connection clears the queue and falls
+  back to discard mode if the data channel never opens.
+  (PR [#3962](https://github.com/pipecat-ai/pipecat/pull/3962))
+
+- Fixed `AzureSTTService` failing to initialize when `private_endpoint` is
+  provided. The Azure Speech SDK's `SpeechConfig` does not accept both `region`
+  and `endpoint` simultaneously, so they are now passed conditionally.
+  (PR [#3967](https://github.com/pipecat-ai/pipecat/pull/3967))
+
+- Fixed `GoogleLLMService` ignoring the `system_instruction` set via
+  constructor or `GoogleLLMSettings` when a system message was also present in
+  the context. The settings value now correctly takes priority, and a warning
+  is logged when both are set.
+  (PR [#3976](https://github.com/pipecat-ai/pipecat/pull/3976))
+
+### Other
+
+- Updated foundational examples to use `system_instruction` on LLM services
+  instead of adding system messages to `LLMContext`.
+  (PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
+
+- Updated AssemblyAI turn detection example to use `keyterms_prompt` list
+  format instead of `prompt` string for improved clarity.
+  (PR [#3929](https://github.com/pipecat-ai/pipecat/pull/3929))
+
+- Updated foundational examples and eval scripts to use `"user"` role instead
+  of `"system"` when adding messages to `LLMContext`, since system prompts
+  should be set via `system_instruction` on the LLM service.
+  (PR [#3931](https://github.com/pipecat-ai/pipecat/pull/3931))
+
+## [0.0.104] - 2026-03-02
+
+### Added
+
+- Added `TextAggregationMetricsData` metric measuring the time from the first
+  LLM token to the first complete sentence, representing the latency cost of
+  sentence aggregation in the TTS pipeline.
+  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
+
+- Added support for using strongly-typed objects instead of dicts for updating
+  service settings at runtime.
+
+    Instead of, say:
+
+    ```python
+    await task.queue_frame(
+        STTUpdateSettingsFrame(settings={"language": Language.ES})
+    )
+    ```
+
+    you'd do:
+
+    ```python
+    await task.queue_frame(
+        STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
+    )
+    ```
+
+  Each service now vends strongly-typed classes like `DeepgramSTTSettings`
+  representing the service's runtime-updatable settings.
+  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
+
+- Added support for specifying private endpoints for Azure Speech-to-Text,
+  enabling use in private networks behind firewalls.
+  (PR [#3764](https://github.com/pipecat-ai/pipecat/pull/3764))
+
+- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time
+  LemonSlice Avatars to any Daily room.
+  (PR [#3791](https://github.com/pipecat-ai/pipecat/pull/3791))
+
+- Added `output_medium` parameter to `AgentInputParams` and
+  `OneShotInputParams` in Ultravox service to control initial output medium
+  (text or voice) at call creation time.
+  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
+
+- Added `TurnMetricsData` as a generic metrics class for turn detection, with
+  e2e processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData`
+  with `e2e_processing_time_ms` tracking the interval from VAD
+  speech-to-silence transition to turn completion.
+  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
+
+- Added `on_audio_context_interrupted()` and `on_audio_context_completed()`
+  callbacks to `AudioContextTTSService`. Subclasses can override these to
+  perform provider-specific cleanup instead of overriding
+  `_handle_interruption()`.
+  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
+
+- Added `on_summary_applied` event to `LLMContextSummarizer` for observability,
+  providing message counts before and after context summarization.
+  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
+
+- Added `summary_message_template` to `LLMContextSummarizationConfig` for
+  customizing how summaries are formatted when injected into context (e.g.,
+  wrapping in XML tags).
+  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
+
+- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default
+  120s) to prevent hung LLM calls from permanently blocking future
+  summarizations.
+  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
+
+- Added optional `llm` field to `LLMContextSummarizationConfig` for routing
+  summarization to a dedicated LLM service (e.g., a cheaper/faster model)
+  instead of the pipeline's primary model.
+  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
+
+- Added AssemblyAI `u3-rt-pro` model support with built-in turn detection mode.
+  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
+
+- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization
+  from anywhere in the pipeline (e.g. a function call tool). Accepts an
+  optional `config: LLMContextSummaryConfig` to override summary generation
+  settings per request.
+  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
+
+- Added `LLMContextSummaryConfig` (summary generation params:
+  `target_context_tokens`, `min_messages_after_summary`,
+  `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger
+  thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested
+  `summary_config`). These replace the monolithic
+  `LLMContextSummarizationConfig`.
+  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
+
+- Added support for the `speed_alpha` parameter to the `arcana` model in
+  `RimeTTSService`.
+  (PR [#3873](https://github.com/pipecat-ai/pipecat/pull/3873))
+
+- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports
+  (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen,
+  Tavus) when a client connects. Enables observers to track transport readiness
+  timing.
+  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
+
+- Added `StartupTimingObserver` for measuring how long each processor's
+  `start()` method takes during pipeline startup. Also measures transport
+  readiness — the time from `StartFrame` to first client connection — via the
+  `on_transport_timing_report` event.
+  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
+
+- Added `BotConnectedFrame` for SFU transports and `on_transport_timing_report`
+  event to `StartupTimingObserver` with bot and client connection timing.
+  (PR [#3881](https://github.com/pipecat-ai/pipecat/pull/3881))
+
+- Added optional `direction` parameter to `PipelineTask.queue_frame()` and
+  `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the
+  end of the pipeline.
+  (PR [#3883](https://github.com/pipecat-ai/pipecat/pull/3883))
+
+- Added `on_latency_breakdown` event to `UserBotLatencyObserver` providing
+  per-service TTFB, text aggregation, user turn duration, and function call
+  latency metrics for each user-to-bot response cycle.
+  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
+
+- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`
+  measuring the time from client connection to first bot speech. An
+  `on_latency_breakdown` is also emitted for this first speech event.
+  (PR [#3885](https://github.com/pipecat-ai/pipecat/pull/3885))
+
+- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an
+  `InterruptionFrame` both upstream and downstream directly from the calling
+  processor, avoiding the round-trip through the pipeline task that
+  `push_interruption_task_frame_and_wait()` required.
+  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
+
+### Changed
+
+- Added `text_aggregation_mode` parameter to `TTSService` and all TTS
+  subclasses with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All
+  text now flows through text aggregators regardless of mode, enabling pattern
+  detection and tag handling in TOKEN mode.
+  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
+
+- ⚠️ Refactored runtime-updatable service settings to use strongly-typed
+  classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific
+  subclasses) instead of plain dicts. Each service's `_settings` now holds
+  these strongly-typed objects. For service maintainers, see changes in
+  COMMUNITY_INTEGRATIONS.md.
+  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
+
+- Word timestamp support has been moved from `WordTTSService` into `TTSService`
+  via a new `supports_word_timestamps` parameter. Services that previously
+  extended `WordTTSService`, `AudioContextWordTTSService`, or
+  `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their
+  parent `__init__` instead.
+  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
+
+- Improved Ultravox TTFB measurement accuracy by using VAD speech end time
+  instead of `UserStoppedSpeakingFrame` timing.
+  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
+
+- Aligned `UltravoxRealtimeLLMService` frame handling with OpenAI/Gemini
+  realtime services: added `InterruptionFrame` handling with metrics cleanup,
+  processing metrics at response boundaries, and improved agent transcript
+  handling for both voice and text output modalities.
+  (PR [#3806](https://github.com/pipecat-ai/pipecat/pull/3806))
+
+- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`.
+  (PR [#3807](https://github.com/pipecat-ai/pipecat/pull/3807))
+
+- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and
+  `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to
+  `KRISP_VIVA_API_KEY` environment variable.
+  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
+
+- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security
+  vulnerability.
+  (PR [#3811](https://github.com/pipecat-ai/pipecat/pull/3811))
+
+- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally
+  speaking, you don't want a user interruption to prevent a service setting
+  change from going into effect. Note that you usually don't use
+  `ServiceSettingsUpdateFrame` directly; you use one of its subclasses:
+    - `LLMUpdateSettingsFrame`
+    - `TTSUpdateSettingsFrame`
+    - `STTUpdateSettingsFrame`
+  (PR [#3819](https://github.com/pipecat-ai/pipecat/pull/3819))
+
+- Updated context summarization to use `user` role instead of `assistant` for
+  summary messages.
+  (PR [#3855](https://github.com/pipecat-ai/pipecat/pull/3855))
+
+- Renamed the `AssemblyAISTTService` parameter
+  `min_end_of_turn_silence_when_confident` to `min_turn_silence` (the old
+  name is still supported with a deprecation warning).
+  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
+
+- ⚠️ Renamed `LLMAssistantAggregatorParams` fields:
+  `enable_context_summarization` → `enable_auto_context_summarization` and
+  `context_summarization_config` → `auto_context_summarization_config` (now
+  accepts `LLMAutoContextSummarizationConfig`). The old names still work with a
+  `DeprecationWarning` for one release cycle.
+  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
+
+- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to
+  `True` when using `CommitStrategy.MANUAL`.
+  (PR [#3865](https://github.com/pipecat-ai/pipecat/pull/3865))
+
+- Relaxed the `numba` version pin from `==` to `>=0.61.2`.
+  (PR [#3868](https://github.com/pipecat-ai/pipecat/pull/3868))
+
+- Updated tracing code to use `ServiceSettings` dataclass API
+  (`given_fields()`, attribute access) instead of dict-style access
+  (`.items()`, `in`, subscript).
+  (PR [#3879](https://github.com/pipecat-ai/pipecat/pull/3879))
+
+- ⚠️ Removed `event` field and `complete()` method from `InterruptionFrame`.
+  Removed `event` field from `InterruptionTaskFrame`. These are no longer
+  needed since `broadcast_interruption()` does not require a round-trip
+  completion signal.
+  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
+
+- Moved `pipecat.services.deepgram.stt_sagemaker` and
+  `pipecat.services.deepgram.tts_sagemaker` to
+  `pipecat.services.deepgram.sagemaker.stt` and
+  `pipecat.services.deepgram.sagemaker.tts`. The old import paths still work
+  but emit a `DeprecationWarning`.
+  (PR [#3902](https://github.com/pipecat-ai/pipecat/pull/3902))
+
+### Deprecated
+
+- ⚠️ Deprecated `aggregate_sentences` parameter on `TTSService` and all TTS
+  subclasses. Use `text_aggregation_mode=TextAggregationMode.SENTENCE` or
+  `text_aggregation_mode=TextAggregationMode.TOKEN` instead.
+  (PR [#3696](https://github.com/pipecat-ai/pipecat/pull/3696))
+
+- Deprecated `set_model()`, `set_voice()`, and `set_language()` on AI services
+  in favor of runtime updates via `TTSUpdateSettingsFrame`,
+  `STTUpdateSettingsFrame`, and `LLMUpdateSettingsFrame`.
+
+  ⚠️ Note, too, a subtle behavior change in these deprecated methods. Whereas
+  previously only `set_language()` caused the service to actually react to the
+  update (e.g. by reconnecting to a remote service so it can pick up the
+  change), now all these methods do. This change was made as part of a refactor
+  making them all work the same way under the hood.
+  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
+
+- Dict-based `*UpdateSettingsFrame(settings={...})` is deprecated in favor of
+  passing typed settings delta objects with
+  `*UpdateSettingsFrame(delta={...})`.
+  (PR [#3714](https://github.com/pipecat-ai/pipecat/pull/3714))
+
+- Deprecated `WordTTSService`, `WebsocketWordTTSService`,
+  `AudioContextWordTTSService`, and `InterruptibleWordTTSService`. Use their
+  non-word counterparts with `supports_word_timestamps=True` instead:
+    - `WordTTSService` → `TTSService(supports_word_timestamps=True)`
+    - `WebsocketWordTTSService` →
+  `WebsocketTTSService(supports_word_timestamps=True)`
+    - `AudioContextWordTTSService` →
+  `AudioContextTTSService(supports_word_timestamps=True)`
+    - `InterruptibleWordTTSService` →
+  `InterruptibleTTSService(supports_word_timestamps=True)`
+  (PR [#3786](https://github.com/pipecat-ai/pipecat/pull/3786))
+
+- Deprecated `SmartTurnMetricsData` in favor of `TurnMetricsData`.
+  `BaseSmartTurn` now emits `TurnMetricsData` directly.
+  (PR [#3809](https://github.com/pipecat-ai/pipecat/pull/3809))
+
+- Deprecated `LLMContextSummarizationConfig`. Use
+  `LLMAutoContextSummarizationConfig` with a nested `LLMContextSummaryConfig`
+  instead. The old class emits a `DeprecationWarning`.
+  (PR [#3863](https://github.com/pipecat-ai/pipecat/pull/3863))
+
+- Deprecated `push_interruption_task_frame_and_wait()` in `FrameProcessor`. Use
+  `broadcast_interruption()` instead. The old method now delegates to
+  `broadcast_interruption()` and logs a deprecation warning.
+  (PR [#3896](https://github.com/pipecat-ai/pipecat/pull/3896))
+
+### Removed
+
+- Removed `local-smart-turn-v3` optional extra from `pyproject.toml`. The
+  `transformers` and `onnxruntime` packages are now always installed as core
+  dependencies since they are required by the default turn stop strategy,
+  `TurnAnalyzerUserTurnStopStrategy`, which uses `LocalSmartTurnAnalyzerV3`.
+  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
+
+- ⚠️ Removed `PlayHTTTSService` and `PlayHTHttpTTSService`. PlayHT has been
+  shut down and is no longer available.
+  (PR [#3838](https://github.com/pipecat-ai/pipecat/pull/3838))
+
+### Fixed
+
+- Added `LLMSpecificMessage` handling in `LLMContextSummarizationUtil` to skip
+  provider-specific messages during context summarization.
+  (PR [#3794](https://github.com/pipecat-ai/pipecat/pull/3794))
+
+- Treated `response_cancel_not_active` as a non-fatal error in realtime
+  services (`OpenAIRealtimeLLMService`, `GrokRealtimeLLMService`,
+  `OpenAIRealtimeBetaLLMService`) to prevent WebSocket disconnection when
+  cancelling an inactive response.
+  (PR [#3795](https://github.com/pipecat-ai/pipecat/pull/3795))
+
+- Fixed Poetry compatibility by inlining `local-smart-turn-v3` dependencies
+  (`transformers`, `onnxruntime`) into core dependencies instead of using a
+  self-referential extra.
+  (PR [#3803](https://github.com/pipecat-ai/pipecat/pull/3803))
+
+- Fixed `SentryMetrics` method signatures to match updated
+  `FrameProcessorMetrics` base class, resolving `TypeError` when using
+  `start_time`/`end_time` keyword arguments.
+  (PR [#3808](https://github.com/pipecat-ai/pipecat/pull/3808))
+
+- Fixed STT TTFB metrics not being reported for `SonioxSTTService` and
+  `AWSTranscribeSTTService` due to missing `can_generate_metrics()` override.
+  (PR [#3813](https://github.com/pipecat-ai/pipecat/pull/3813))
+
+- Fixed an issue where `AudioContextTTSService`-based providers (AsyncAI,
+  ElevenLabs, Inworld, Rime) did not close or clean up their server-side audio
+  contexts after normal speech completion, only on interruption.
+  (PR [#3814](https://github.com/pipecat-ai/pipecat/pull/3814))
+
+- Fixed STT TTFB metrics measuring timeout expiry time instead of actual
+  transcript arrival time.
+  (PR [#3822](https://github.com/pipecat-ai/pipecat/pull/3822))
+
+- Fixed `InterimTranscriptionFrame` and `TranslationFrame` being
+  unintentionally pushed downstream in `LLMUserAggregator`. They are now
+  consumed like `TranscriptionFrame`.
+  (PR [#3825](https://github.com/pipecat-ai/pipecat/pull/3825))
+
+- Fixed misleading "Empty audio frame received for STT service" warnings when
+  using audio filters (e.g. `RNNoiseFilter`, `KrispVivaFilter`, `AICFilter`)
+  that buffer audio internally.
+  (PR [#3828](https://github.com/pipecat-ai/pipecat/pull/3828))
+
+- Fixed an issue with `RimeNonJsonTTSService` where trailing punctuation was
+  sometimes vocalized.
+  (PR [#3837](https://github.com/pipecat-ai/pipecat/pull/3837))
+
+- Fixed `TTSSpeakFrame` not committing spoken text to the conversation context
+  when used outside of an LLM response (e.g., bot greetings or injected
+  speech).
+  (PR [#3845](https://github.com/pipecat-ai/pipecat/pull/3845))
+
+- Removed verbose per-chunk audio logging from `GenesysAudioHookSerializer`
+  that flooded production logs.
+  (PR [#3850](https://github.com/pipecat-ai/pipecat/pull/3850))
+
+- Added a beta feature warning when using custom prompts with AssemblyAI.
+  (PR [#3856](https://github.com/pipecat-ai/pipecat/pull/3856))
+
+- Fixed `LocalSmartTurnAnalyzerV3` producing incorrect end-of-turn predictions
+  at non-16kHz sample rates (e.g. 8kHz Twilio telephony) by adding automatic
+  resampling to 16kHz before Whisper feature extraction.
+  (PR [#3857](https://github.com/pipecat-ai/pipecat/pull/3857))
+
+- Fixed `PipelineTask` double-inserting `RTVIProcessor` into the frame chain
+  when the user provides both an `RTVIProcessor` in the pipeline and a custom
+  `RTVIObserver` subclass in observers.
+  (PR [#3867](https://github.com/pipecat-ai/pipecat/pull/3867))
+
+- Fixed turn completion instructions being lost when `LLMMessagesUpdateFrame`
+  replaces the LLM context. When `filter_incomplete_user_turns` is enabled, the
+  turn completion system message is now re-injected after context replacement.
+  (PR [#3888](https://github.com/pipecat-ai/pipecat/pull/3888))
+
+- Fixed Azure TTS and STT services silently swallowing cancellation errors
+  (invalid API key, network failures, rate limiting) instead of propagating
+  them as `ErrorFrame`s to the pipeline.
+  (PR [#3893](https://github.com/pipecat-ai/pipecat/pull/3893))
+
+### Performance
+
+- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to
+  `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on
+  every interruption by using `client_req_id`-based multiplexing.
+  (PR [#3759](https://github.com/pipecat-ai/pipecat/pull/3759))
+
+### Other
+
+- Standardized Sarvam STT/TTS User-Agent header handling to consistently send
+  Pipecat SDK identity in websocket requests.
+  (PR [#3886](https://github.com/pipecat-ai/pipecat/pull/3886))
+
 ## [0.0.103] - 2026-02-20

 ### Added
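
Note on the changelog hunk above: several entries in the new release section describe renamed parameters whose old names keep working with a `DeprecationWarning` (for example `enable_context_summarization` → `enable_auto_context_summarization`). The sketch below shows one common way such an alias can be wired up. It is not Pipecat's implementation: `ExampleAggregatorParams` is a hypothetical stand-in written with a stdlib dataclass, and only the two field names are taken from the changelog.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleAggregatorParams:
    """Hypothetical stand-in, not the real LLMAssistantAggregatorParams."""

    # New, preferred field name.
    enable_auto_context_summarization: bool = False
    # Deprecated alias; None means the caller did not pass it.
    enable_context_summarization: Optional[bool] = None

    def __post_init__(self) -> None:
        # If the old name was used, warn and forward its value to the new field.
        if self.enable_context_summarization is not None:
            warnings.warn(
                "`enable_context_summarization` is deprecated; "
                "use `enable_auto_context_summarization` instead.",
                DeprecationWarning,
                stacklevel=3,
            )
            self.enable_auto_context_summarization = self.enable_context_summarization
```

With this shape, callers using the old keyword get the same behavior plus a warning, and callers using the new keyword see no warning at all.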
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -25,7 +25,7 @@ uv run pytest tests/test_name.py
 uv run pytest tests/test_name.py::test_function_name

 # Preview changelog
-towncrier build --draft --version Unreleased
+uv run towncrier build --draft --version Unreleased

 # Lint and format check
 uv run ruff check
@@ -74,7 +74,7 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
 - **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input

 - **Turn Management**: Turn management is done through `LLMUserAggregator` and
-`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
+  `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`

 - **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.

@@ -90,23 +90,26 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:

 ### Key Directories

-| Directory                 | Purpose                                            |
-|---------------------------|----------------------------------------------------|
-| `src/pipecat/frames/`     | Frame definitions (100+ types)                     |
-| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio  |
-| `src/pipecat/pipeline/`   | Pipeline orchestration                             |
-| `src/pipecat/services/`   | AI service integrations (60+ providers)            |
-| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
-| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols        |
-| `src/pipecat/observers/`  | Pipeline observers for monitoring frame flow       |
-| `src/pipecat/audio/`      | VAD, filters, mixers, turn detection, DTMF         |
-| `src/pipecat/turns/`      | User turn management                               |
+| Directory                  | Purpose                                            |
+| -------------------------- | -------------------------------------------------- |
+| `src/pipecat/frames/`      | Frame definitions (100+ types)                     |
+| `src/pipecat/processors/`  | FrameProcessor base + aggregators, filters, audio  |
+| `src/pipecat/pipeline/`    | Pipeline orchestration                             |
+| `src/pipecat/services/`    | AI service integrations (60+ providers)            |
+| `src/pipecat/transports/`  | Transport layer (Daily, LiveKit, WebSocket, Local) |
+| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols        |
+| `src/pipecat/observers/`   | Pipeline observers for monitoring frame flow       |
+| `src/pipecat/audio/`       | VAD, filters, mixers, turn detection, DTMF         |
+| `src/pipecat/turns/`       | User turn management                               |

 ## Code Style

 - **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
 - **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
 - **Type hints**: Required for complex async code.
+- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
+  - `@dataclass`: Frame types, context aggregator pairs, internal data containers
+  - `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params

 ### Docstring Example

@@ -152,4 +155,3 @@ When adding a new service:
 ## Testing

 Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
-
--- a/COMMUNITY_INTEGRATIONS.md
+++ b/COMMUNITY_INTEGRATIONS.md
@@ -25,7 +25,6 @@ Your repository must contain these components:
 - **Source code** - Complete implementation following Pipecat patterns
 - **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
 - **README.md** - Must include:
-
  - Introduction and explanation of your integration
  - Installation instructions
  - Usage instructions with Pipecat Pipeline
@@ -110,7 +109,6 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
 #### Key requirements:

 - **Frame sequence:** Output must follow this frame sequence pattern:
-
  - `LLMFullResponseStartFrame` - Signals the start of an LLM response
  - `LLMTextFrame` - Contains LLM content, typically streamed as tokens
  - `LLMFullResponseEndFrame` - Signals the end of an LLM response
@@ -233,24 +231,137 @@ def can_generate_metrics(self) -> bool:
    return True
 ```

-### Dynamic Settings Updates
+### Service Settings

-STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
+Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
+
+1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
+2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
+
+#### Defining your Settings class
+
+Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:

 ```python
-async def set_language(self, language: Language):
-    """Set the recognition language and reconnect.
+from dataclasses import dataclass, field

-    Args:
-        language: The language to use for speech recognition.
+from pipecat.services.settings import TTSSettings, NOT_GIVEN
+
+@dataclass
+class MyTTSSettings(TTSSettings):
+    """Settings for MyTTS service.
+
+    Parameters:
+        speaking_rate: Speed multiplier (0.5–2.0).
    """
-    logger.info(f"Switching STT language to: [{language}]")
-    self._settings["language"] = language
-    await self._disconnect()
-    await self._connect()
+
+    speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
 ```

-Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
+**What goes in Settings vs. `__init__` params:**
+
+| Belongs in Settings                                      | Stays as `__init__` params                |
+| -------------------------------------------------------- | ----------------------------------------- |
+| Model name, voice, language                              | API keys, auth tokens                     |
+| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides             |
+| Anything users may want to change mid-session            | Audio encoding, sample format             |
+|                                                          | Connection parameters (timeouts, retries) |
+
+The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
+
+#### Wiring settings into `__init__`
+
+Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
+
+Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
+
+```python
+from typing import Optional
+
+class MyTTSService(TTSService):
+    Settings = MyTTSSettings
+    _settings: Settings
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        settings: Optional[Settings] = None,
+        **kwargs,
+    ):
+        # 1. Defaults — every field has a real value (store mode).
+        default_settings = self.Settings(
+            model="my-model-v1",
+            voice="default-voice",
+            language="en",
+            speaking_rate=1.0,
+        )
+
+        # 2. Merge caller overrides (only given fields win).
+        if settings is not None:
+            default_settings.apply_update(settings)
+
+        # 3. Pass the fully-populated settings to the base class.
+        super().__init__(settings=default_settings, **kwargs)
+
+        # 4. Init-only config stored separately.
+        self._api_key = api_key
+```
+
+This pattern lets callers override only what they care about:
+
+```python
+# Uses all defaults
+svc = MyTTSService(api_key="sk-xxx")
+
+# Overrides just the voice — access Settings through the service class
+svc = MyTTSService(
+    api_key="sk-xxx",
+    settings=MyTTSService.Settings(voice="custom-voice"),
+)
+```
+
+#### Reacting to runtime changes
+
+AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
+
+To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
+
+```python
+async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
+    """Apply a settings update, reconfiguring the connection if needed."""
+    changed = await super()._update_settings(update)
+
+    if not changed:
+        return changed
+
+    await self._disconnect()
+    await self._connect()
+
+    return changed
+```
+
+The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
+
+Note that, in this example, the service reconnects to apply any changed setting. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
+
+If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
+
+```python
+async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
+    changed = await super()._update_settings(update)
+
+    if not changed:
+        return changed
+
+    if "language" in changed:
+        await self._update_language()
+    else:
+        # TODO: this should be temporary - handle changes to other settings soon!
+        self._warn_unhandled_updated_settings(changed.keys() - {"language"})
+
+    return changed
+```

 ### Sample Rate Handling

@@ -260,7 +371,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
 async def start(self, frame: StartFrame):
    """Start the service."""
    await super().start(frame)
-    self._settings["output_format"]["sample_rate"] = self.sample_rate
+    self._settings.output_sample_rate = self.sample_rate
    await self._connect()
 ```
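
Note on the "Service Settings" section in the hunks above: the store-mode / delta-mode split may be easier to follow with a standalone model. The following is a minimal sketch that does not import Pipecat. `ExampleSettings`, the local `NOT_GIVEN` sentinel, and this `apply_update` are hypothetical stand-ins, and the return value (changed field name mapped to its previous value) is an assumption modeled on what the section says `_update_settings` returns.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGiven:
    """Sentinel type meaning "this field was not specified"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # hypothetical stand-in for Pipecat's NOT_GIVEN


@dataclass
class ExampleSettings:
    """Hypothetical settings class; every field defaults to NOT_GIVEN."""

    voice: Any = NOT_GIVEN
    language: Any = NOT_GIVEN
    speaking_rate: Any = NOT_GIVEN

    def apply_update(self, delta: "ExampleSettings") -> dict[str, Any]:
        """Copy given fields from delta; return changed names -> previous values."""
        changed: dict[str, Any] = {}
        for f in fields(self):
            new = getattr(delta, f.name)
            if new is NOT_GIVEN:
                continue  # field not specified in the delta, keep the stored value
            old = getattr(self, f.name)
            if new != old:
                changed[f.name] = old
                setattr(self, f.name, new)
        return changed


# Store mode: every field holds a real value.
store = ExampleSettings(voice="default-voice", language="en", speaking_rate=1.0)

# Delta mode: only the fields that were given are considered.
changed = store.apply_update(ExampleSettings(language="es", speaking_rate=1.0))
```

Here `speaking_rate` is given but unchanged, so only `language` appears in `changed`, and the membership test and set-difference idioms described in the section (`"language" in changed`, `changed.keys() - {"language"}`) work on the result.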

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
   ```

 2. Choose the appropriate type:
-
   - `added.md` - New features
   - `changed.md` - Changes in existing functionality
   - `deprecated.md` - Soon-to-be removed features
   - `removed.md` - Removed features
   - `fixed.md` - Bug fixes
+   - `performance.md` - Performance improvements
   - `security.md` - Security fixes
   - `other.md` - Other changes (documentation, dependencies, etc.)

@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en

 ```markdown
 - Updated service configuration:
-
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
 ```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md

 ```markdown
 - Updated service configuration:
-
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
 ```
--- a/README.md
+++ b/README.md
@@ -55,6 +55,16 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt

 Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.

+### 🤖 Claude Code Skills
+
+Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
+
+```
+claude plugin marketplace add pipecat-ai/skills
+```
+
+and install any of the available plugins.
+
 ### 📺️ Pipecat TV Channel

 Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +81,19 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout

 ## 🧩 Available services

-| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                                                                                                                                                            |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                                                                                                                                                                                                                              |
-| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
-| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
-| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Serializers         | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
-| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                                                                                                                                                                                                                             |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                               |
+| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
+| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| Serializers         | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |

 📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

@@ -163,6 +173,15 @@ You can get started with Pipecat running on your local machine, then move your a

 > **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.

+### Claude Code Skills
+
+Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
+
+```
+claude plugin marketplace add pipecat-ai/pipecat
+claude plugin install pipecat-dev@pipecat-dev-skills
+```
+
 ### Running tests

 To run all tests, from the root directory:
--- a/changelog/3457.changed.md
+++ b/changelog/3457.changed.md
@@ -0,0 +1 @@
+- Changed tool result JSON serialization to use `ensure_ascii=False`, preserving UTF-8 characters instead of escaping them. This reduces context size and token usage for non-English languages.
--- a/changelog/3759.performance.md
+++ b/changelog/3759.performance.md
@@ -1 +0,0 @@
-- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.
--- a/changelog/3991.changed.md
+++ b/changelog/3991.changed.md
@@ -0,0 +1 @@
+- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of `OpenAIRealtimeSTTSettings`, making it runtime-updatable via `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is deprecated as of 0.0.106.
--- a/changelog/3997.changed.md
+++ b/changelog/3997.changed.md
@@ -0,0 +1 @@
+- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable release).
--- a/changelog/4000.fixed.md
+++ b/changelog/4000.fixed.md
@@ -0,0 +1 @@
+- Fixed an issue where the default model for `OpenAILLMService` and `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now restored to `gpt-4.1`.
--- a/changelog/4001.changed.md
+++ b/changelog/4001.changed.md
@@ -0,0 +1 @@
+- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`, aligning it with the HeyGen and Tavus video services. It supports `SimliVideoService.Settings(...)` for configuration and uses `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage (`api_key`, `face_id`, etc.) remains unchanged.
--- a/changelog/4001.deprecated.md
+++ b/changelog/4001.deprecated.md
@@ -0,0 +1 @@
+- `SimliVideoService.InputParams` is deprecated. Use the direct constructor parameters `max_session_length`, `max_idle_time`, and `enable_logging` instead.
--- a/changelog/4004.added.md
+++ b/changelog/4004.added.md
@@ -0,0 +1 @@
+- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`) to target a specific service instance. When `service` is set, only the matching service applies the settings; others forward the frame unchanged. This enables updating a single service when multiple services of the same type exist in the pipeline.
--- a/changelog/4005.added.md
+++ b/changelog/4005.added.md
@@ -0,0 +1 @@
+- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily runner. These convenience parameters let callers specify a SIP provider name and geographic region directly without manually constructing `DailyRoomProperties` and `DailyRoomSipParams`.
--- a/changelog/4006.fixed.md
+++ b/changelog/4006.fixed.md
@@ -0,0 +1 @@
+- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut down before in-flight frames (e.g. LLM function call responses) finished processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline as `ControlFrame`s, ensuring all pending work is flushed before shutdown begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate (`SystemFrame`).
--- a/changelog/4007.fixed.2.md
+++ b/changelog/4007.fixed.2.md
@@ -0,0 +1 @@
+- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The stop sequence now waits for all queued audio contexts to finish processing before canceling the stop frame task.
--- a/changelog/4007.fixed.md
+++ b/changelog/4007.fixed.md
@@ -0,0 +1 @@
+- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle synchronization. Buffered frames are now flushed in the correct order relative to synchronization frames (`StartFrame` goes first, `EndFrame`/`CancelFrame` go after), and frames added to the buffer during flush are also drained.
--- a/changelog/4009.added.md
+++ b/changelog/4009.added.md
@@ -0,0 +1 @@
+- Added `PerplexityLLMAdapter` that automatically transforms conversation messages to satisfy Perplexity's stricter API constraints (strict role alternation, no non-initial system messages, last message must be user/tool). Previously, certain conversation histories could cause Perplexity API errors that didn't occur with OpenAI (`PerplexityLLMService` subclasses `OpenAILLMService` since Perplexity uses an OpenAI-compatible API).
--- a/changelog/4012.deprecated.md
+++ b/changelog/4012.deprecated.md
@@ -0,0 +1 @@
+- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use `LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now emit a `DeprecationWarning`.
--- a/changelog/4023.changed.md
+++ b/changelog/4023.changed.md
@@ -0,0 +1 @@
+- Updated `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`.
--- a/changelog/4024.fixed.md
+++ b/changelog/4024.fixed.md
@@ -0,0 +1 @@
+- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to service-specific codes when passed via `settings=Service.Settings(language=Language.ES)` at init time. This caused API errors (e.g. 400 from Rime) because the raw enum was sent instead of the expected language code (e.g. `"spa"`). Runtime updates via `UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the base `TTSService` and `STTService` classes so all services handle this consistently.
--- a/changelog/4026.fixed.md
+++ b/changelog/4026.fixed.md
@@ -0,0 +1 @@
+- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://` or `http://`. Previously these were silently overwritten with `wss://` / `https://`, breaking air-gapped or private deployments that don't use TLS. All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or bare hostname) are now respected.
--- a/changelog/4035.security.md
+++ b/changelog/4035.security.md
@@ -0,0 +1 @@
+- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the `livekit` extra to address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted unknown `crit` header extensions.
--- a/changelog/4037.fixed.md
+++ b/changelog/4037.fixed.md
@@ -0,0 +1 @@
+- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not accepting or forwarding the `timeout_secs` parameter.
--- a/changelog/4046.fixed.md
+++ b/changelog/4046.fixed.md
@@ -0,0 +1 @@
+- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crashing when language parameters contain plain strings instead of `Language` enum values.
--- a/changelog/4047.added.md
+++ b/changelog/4047.added.md
@@ -0,0 +1 @@
+- Added DTMF input event support to the Daily transport. Incoming DTMF tones are now received via Daily's `on_dtmf_event` callback and pushed into the pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from phone callers.
--- a/changelog/4047.changed.md
+++ b/changelog/4047.changed.md
@@ -0,0 +1 @@
+- Updated `daily-python` dependency to 0.25.0.
--- a/changelog/4048.changed.md
+++ b/changelog/4048.changed.md
@@ -0,0 +1 @@
+- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily` to support dial-out rooms. Also narrowed misleading `Optional` type hints and deduplicated token expiry calculation.
--- a/changelog/4057.fixed.md
+++ b/changelog/4057.fixed.md
@@ -0,0 +1 @@
+- Fixed premature user turn stops caused by late transcriptions arriving between turns. A stale transcript from the previous turn could persist into the next turn and trigger a stop before the current turn's real transcript arrived. Stop strategies are now reset at both turn start and turn stop to prevent state from leaking across turn boundaries.
--- a/changelog/4058.fixed.md
+++ b/changelog/4058.fixed.md
@@ -0,0 +1 @@
+- Fixed raw language strings like `"de-DE"` silently failing when passed to TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go through the same `Language` enum resolution as enum values, so regional codes like `"de-DE"` are properly converted to service-expected formats like `"de"`. Unrecognized strings log a warning instead of failing silently.
--- a/changelog/4063.fixed.md
+++ b/changelog/4063.fixed.md
@@ -0,0 +1 @@
+- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`, `redact`, `replace`) being stringified instead of passed as lists to the SDK, which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the WebSocket query params.
--- a/docs/api/README.md
+++ b/docs/api/README.md
@@ -42,7 +42,7 @@ This script:

 - Creates a fresh virtual environment
 - Installs all dependencies as specified in requirements files
-- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
+- Handles conflicting dependencies (like grpcio versions for Riva)
 - Builds the documentation in an isolated environment
 - Provides detailed logging of the build process

@@ -74,7 +74,6 @@ start _build/html/index.html
 ├── index.rst       # Main documentation entry point
 ├── requirements-base.txt    # Base documentation dependencies
 ├── requirements-riva.txt    # Riva-specific dependencies
-├── requirements-playht.txt  # PlayHT-specific dependencies
 ├── build-docs.sh   # Local build script
 └── rtd-test.py     # ReadTheDocs test build script
 ```
--- a/env.example
+++ b/env.example
@@ -86,9 +86,6 @@ GROK_API_KEY=...
 # Groq
 GROQ_API_KEY=...

-# Hathora
-HATHORA_API_KEY=...
-
 # Heygen
 HEYGEN_API_KEY=...
 HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -104,9 +101,14 @@ INWORLD_API_KEY=...
 KRISP_MODEL_PATH=...

 # Krisp Viva
+KRISP_VIVA_API_KEY=...
 KRISP_VIVA_FILTER_MODEL_PATH=...
 KRISP_VIVA_TURN_MODEL_PATH=...

+# LemonSlice
+LEMONSLICE_API_KEY=...
+LEMONSLICE_AGENT_ID=...
+
 # LiveKit
 LIVEKIT_API_KEY=...
 LIVEKIT_API_SECRET=...
@@ -146,10 +148,6 @@ KOALA_ACCESS_KEY=...
 # Piper
 PIPER_BASE_URL=...

-# PlayHT
-PLAYHT_USER_ID=...
-PLAYHT_API_KEY=...
-
 # Plivo
 PLIVO_AUTH_ID=...
 PLIVO_AUTH_TOKEN=...
--- a/examples/foundational/01-say-one-thing-piper.py
+++ b/examples/foundational/01-say-one-thing-piper.py
@@ -39,7 +39,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        tts = PiperHttpTTSService(
-            base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
+            base_url=os.getenv("PIPER_BASE_URL"),
+            aiohttp_session=session,
+            sample_rate=24000,
        )

        task = PipelineTask(
--- a/examples/foundational/01-say-one-thing-rime.py
+++ b/examples/foundational/01-say-one-thing-rime.py
@@ -39,8 +39,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
-            voice_id="rex",
            aiohttp_session=session,
+            settings=RimeHttpTTSService.Settings(
+                voice="rex",
+            ),
        )

        task = PipelineTask(
--- a/examples/foundational/01-say-one-thing.py
+++ b/examples/foundational/01-say-one-thing.py
@@ -37,7 +37,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    task = PipelineTask(
--- a/examples/foundational/01a-local-audio.py
+++ b/examples/foundational/01a-local-audio.py
@@ -29,7 +29,9 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    pipeline = Pipeline([tts, transport.output()])
--- a/examples/foundational/01b-livekit-audio.py
+++ b/examples/foundational/01b-livekit-audio.py
@@ -37,7 +37,9 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    runner = PipelineRunner()
--- a/examples/foundational/02-llm-say-one-thing.py
+++ b/examples/foundational/02-llm-say-one-thing.py
@@ -39,17 +39,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
-        }
-    ]
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    task = PipelineTask(
        Pipeline([llm, tts, transport.output()]),
@@ -59,7 +59,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Register an event handler so we can play the audio when the client joins
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
-        await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
+        context = LLMContext()
+        context.add_message({"role": "user", "content": "Say hello to the world."})
+        await task.queue_frames([LLMContextFrame(context), EndFrame()])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

--- a/examples/foundational/03-still-frame.py
+++ b/examples/foundational/03-still-frame.py
@@ -45,7 +45,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
--- a/examples/foundational/03a-local-still-frame.py
+++ b/examples/foundational/03a-local-still-frame.py
@@ -37,7 +37,9 @@ async def main():
        )

        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
--- a/examples/foundational/04-transports-small-webrtc.py
+++ b/examples/foundational/04-transports-small-webrtc.py
@@ -67,19 +67,19 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/04a-transports-daily.py
+++ b/examples/foundational/04a-transports-daily.py
@@ -50,19 +50,19 @@ async def main():

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            settings=CartesiaTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -91,7 +91,9 @@ async def main():
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_participant_left")
--- a/examples/foundational/04b-transports-livekit.py
+++ b/examples/foundational/04b-transports-livekit.py
@@ -55,24 +55,21 @@ async def main():

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. "
-            "Your goal is to demonstrate your capabilities in a succinct way. "
-            "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-            "Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/05-sync-speech-and-image.py
+++ b/examples/foundational/05-sync-speech-and-image.py
@@ -98,11 +98,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = CartesiaHttpTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            settings=CartesiaHttpTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
        )

        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
@@ -148,7 +152,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ]:
            messages = [
                {
-                    "role": "system",
+                    "role": "user",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]
--- a/examples/foundational/05a-local-sync-speech-and-image.py
+++ b/examples/foundational/05a-local-sync-speech-and-image.py
@@ -49,7 +49,7 @@ async def main():
        async def get_month_data(month):
            messages = [
                {
-                    "role": "system",
+                    "role": "user",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]
@@ -98,11 +98,15 @@ async def main():

            tts = CartesiaHttpTTSService(
                api_key=os.getenv("CARTESIA_API_KEY"),
-                voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+                settings=CartesiaHttpTTSService.Settings(
+                    voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+                ),
            )

            imagegen = FalImageGenService(
-                params=FalImageGenService.InputParams(image_size="square_hd"),
+                settings=FalImageGenService.Settings(
+                    image_size="square_hd",
+                ),
                aiohttp_session=session,
                key=os.getenv("FAL_KEY"),
            )
--- a/examples/foundational/06-listen-and-respond.py
+++ b/examples/foundational/06-listen-and-respond.py
@@ -83,21 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    ml = MetricsLogger()

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,7 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -100,19 +100,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/07-interruptible-cartesia-http.py
+++ b/examples/foundational/07-interruptible-cartesia-http.py
@@ -6,6 +6,7 @@

 import os

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -52,64 +53,68 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
+    async with aiohttp.ClientSession() as session:
+        stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))

-    tts = CartesiaHttpTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        tts = CartesiaHttpTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            aiohttp_session=session,
+            settings=CartesiaHttpTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        context = LLMContext()
+        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+            context,
+            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        )

-    context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
+        @transport.event_handler("on_client_connected")
+        async def on_client_connected(transport, client):
+            logger.info(f"Client connected")
+            # Kick off the conversation.
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
+            await task.queue_frames([LLMRunFrame()])

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
+        @transport.event_handler("on_client_disconnected")
+        async def on_client_disconnected(transport, client):
+            logger.info(f"Client disconnected")
+            await task.cancel()

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
+        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
+        await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -55,19 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07a-interruptible-speechmatics-vad.py
+++ b/examples/foundational/07a-interruptible-speechmatics-vad.py
@@ -21,7 +21,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
 from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -93,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
+            settings=SpeechmaticsSTTService.Settings(
                language=Language.EN,
                turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
                # focus_speakers=["S1"],
@@ -104,32 +103,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
+            settings=SpeechmaticsTTSService.Settings(
+                voice="sarah",
+            ),
            aiohttp_session=session,
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
+            settings=OpenAILLMService.Settings(
+                temperature=0.75,
+                system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
+            ),
        )

-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
-                    "Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
-                ),
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -160,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
+            context.add_message({"role": "user", "content": "Say a short hello to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07a-interruptible-speechmatics.py
+++ b/examples/foundational/07a-interruptible-speechmatics.py
@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
 from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -76,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
+            settings=SpeechmaticsSTTService.Settings(
                language=Language.EN,
                speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
            ),
@@ -84,31 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
+            settings=SpeechmaticsTTSService.Settings(
+                voice="sarah",
+            ),
            aiohttp_session=session,
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
+            settings=OpenAILLMService.Settings(
+                temperature=0.75,
+                system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
+            ),
        )

-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
-                ),
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -139,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
+            context.add_message({"role": "user", "content": "Say a short hello to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07b-interruptible-langchain.py
+++ b/examples/foundational/07b-interruptible-langchain.py
@@ -71,15 +71,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
-                "Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
-                "Your response will be synthesized to voice and those characters will create unnatural sounds.",
+                "You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
            ),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
--- a/examples/foundational/07c-interruptible-deepgram-flux.py
+++ b/examples/foundational/07c-interruptible-deepgram-flux.py
@@ -10,6 +10,7 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

+from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -55,24 +56,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramFluxSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
+        settings=DeepgramFluxSTTService.Settings(
+            min_confidence=0.3,
+        ),
    )

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=ExternalUserTurnStrategies(),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
    )

    pipeline = Pipeline(
@@ -100,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-http.py
+++ b/examples/foundational/07c-interruptible-deepgram-http.py
@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = DeepgramHttpTTSService(
            api_key=os.getenv("DEEPGRAM_API_KEY"),
-            voice="aura-2-andromeda-en",
+            settings=DeepgramHttpTTSService.Settings(
+                voice="aura-2-andromeda-en",
+            ),
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-sagemaker.py
+++ b/examples/foundational/07c-interruptible-deepgram-sagemaker.py
@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.aws.llm import AWSBedrockLLMService
-from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
-from pipecat.services.deepgram.tts_sagemaker import DeepgramSageMakerTTSService
+from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
+from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
+from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -69,23 +69,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    tts = DeepgramSageMakerTTSService(
        endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
        region=os.getenv("AWS_REGION"),
-        voice="aura-2-andromeda-en",
+        settings=DeepgramSageMakerTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
    )

    llm = AWSBedrockLLMService(
        aws_region=os.getenv("AWS_REGION"),
-        model="us.amazon.nova-pro-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        settings=AWSBedrockLLMSettings(
+            model="us.amazon.nova-pro-v1:0",
+            temperature=0.8,
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -116,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-vad.py
+++ b/examples/foundational/07c-interruptible-deepgram-vad.py
@@ -7,7 +7,6 @@

 import os

-from deepgram import LiveOptions
 from dotenv import load_dotenv
 from loguru import logger

@@ -56,21 +55,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
+        settings=DeepgramSTTService.Settings(
+            vad_events=True,
+            utterance_end_ms="1000",
+        ),
    )

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -101,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram.py
+++ b/examples/foundational/07c-interruptible-deepgram.py
@@ -55,18 +55,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs-http.py
+++ b/examples/foundational/07d-interruptible-elevenlabs-http.py
@@ -63,20 +63,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = ElevenLabsHttpTTSService(
            api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
            aiohttp_session=session,
+            settings=ElevenLabsHttpTTSService.Settings(
+                voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +107,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs.py
+++ b/examples/foundational/07d-interruptible-elevenlabs.py
@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = ElevenLabsTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-        voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
+        settings=ElevenLabsTTSService.Settings(
+            voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07f-interruptible-azure-http.py
+++ b/examples/foundational/07f-interruptible-azure-http.py
@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = AzureLLMService(
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
-        model=os.getenv("AZURE_CHATGPT_MODEL"),
+        settings=AzureLLMService.Settings(
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07f-interruptible-azure.py
+++ b/examples/foundational/07f-interruptible-azure.py
@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = AzureLLMService(
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
-        model=os.getenv("AZURE_CHATGPT_MODEL"),
+        settings=AzureLLMService.Settings(
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai-http.py
+++ b/examples/foundational/07g-interruptible-openai-http.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -55,22 +54,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = OpenAISTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4o-transcribe",
-        prompt="Expect words related to dogs, such as breed names.",
+        settings=OpenAISTTService.Settings(
+            model="gpt-4o-transcribe",
+            prompt="Expect words related to dogs, such as breed names.",
+        ),
    )

-    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
+    tts = OpenAITTSService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAITTSService.Settings(
+            voice="ballad",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are very knowledgeable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai.py
+++ b/examples/foundational/07g-interruptible-openai.py
@@ -55,27 +55,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = OpenAIRealtimeSTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4o-transcribe",
-        prompt="Expect words related to dogs, such as breed names.",
-        language=Language.EN,
-        # Uses local VAD by default.
-        # To enable server-side VAD, set turn_detection=None or
-        # a dict with server_vad settings.
-        # turn_detection={"type": "server_vad", "threshold": 0.5},
+        settings=OpenAIRealtimeSTTService.Settings(
+            model="gpt-4o-transcribe",
+            prompt="Expect words related to dogs, such as breed names.",
+            language=Language.EN,
+        ),
    )

-    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
+    tts = OpenAITTSService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAITTSService.Settings(
+            voice="ballad",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are very knowledgeable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07h-interruptible-openpipe.py
+++ b/examples/foundational/07h-interruptible-openpipe.py
@@ -57,7 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    timestamp = int(time.time())
@@ -65,16 +67,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("OPENAI_API_KEY"),
        openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
        tags={"conversation_id": f"pipecat-{timestamp}"},
+        settings=OpenPipeLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = XTTSService(
            aiohttp_session=session,
-            voice_id="Claribel Dervla",
+            settings=XTTSService.Settings(
+                voice="Claribel Dervla",
+            ),
            base_url="http://localhost:8000",
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07j-interruptible-gladia-vad.py
+++ b/examples/foundational/07j-interruptible-gladia-vad.py
@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
+from pipecat.services.gladia.config import LanguageConfig
 from pipecat.services.gladia.stt import GladiaSTTService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transcriptions.language import Language
@@ -58,7 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GladiaSTTService(
        api_key=os.getenv("GLADIA_API_KEY", ""),
        region=os.getenv("GLADIA_REGION"),
-        params=GladiaInputParams(
+        settings=GladiaSTTService.Settings(
            language_config=LanguageConfig(
                languages=[Language.EN],
            ),
@@ -68,19 +68,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY", ""),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY", ""),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
@@ -114,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07j-interruptible-gladia.py
+++ b/examples/foundational/07j-interruptible-gladia.py
@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
+from pipecat.services.gladia.config import LanguageConfig
 from pipecat.services.gladia.stt import GladiaSTTService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transcriptions.language import Language
@@ -57,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GladiaSTTService(
        api_key=os.getenv("GLADIA_API_KEY", ""),
        region=os.getenv("GLADIA_REGION"),
-        params=GladiaInputParams(
+        settings=GladiaSTTService.Settings(
            language_config=LanguageConfig(
                languages=[Language.EN],
            )
@@ -66,19 +66,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY", ""),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY", ""),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07k-interruptible-lmnt.py
+++ b/examples/foundational/07k-interruptible-lmnt.py
@@ -54,18 +54,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
+    tts = LmntTTSService(
+        api_key=os.getenv("LMNT_API_KEY"),
+        settings=LmntTTSService.Settings(
+            voice="morgan",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07l-interruptible-groq.py
+++ b/examples/foundational/07l-interruptible-groq.py
@@ -55,19 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))

    llm = GroqLLMService(
-        api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
+        api_key=os.getenv("GROQ_API_KEY"),
+        settings=GroqLLMService.Settings(
+            model="llama-3.1-8b-instant",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

    tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07m-interruptible-aws-strands.py
+++ b/examples/foundational/07m-interruptible-aws-strands.py
@@ -95,13 +95,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = AWSPollyTTSService(
        region="us-west-2",  # only specific regions support generative TTS
-        voice_id="Joanna",
-        params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
+        settings=AWSPollyTTSService.Settings(
+            voice="Joanna",
+            engine="generative",
+            rate="1.1",
+        ),
    )

    # Create Strands agent processor
    try:
-        agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
+        agent = build_agent(model_id="us.anthropic.claude-sonnet-4-6", max_tokens=8000)
        llm = StrandsAgentsProcessor(agent=agent)
        logger.info("Successfully created Strands agent for NAB customer service coaching")
    except Exception as e:
@@ -149,7 +152,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
                    messages=[
                        {
                            "role": "user",
-                            "content": f"Greet the user and introduce yourself.",
+                            "content": "Greet the user and introduce yourself. Don't use emojis.",
                        }
                    ],
                    run_llm=True,
--- a/examples/foundational/07m-interruptible-aws.py
+++ b/examples/foundational/07m-interruptible-aws.py
@@ -54,24 +54,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = AWSPollyTTSService(
        region="us-west-2",  # only specific regions support generative TTS
-        voice_id="Joanna",
-        params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
+        settings=AWSPollyTTSService.Settings(
+            voice="Joanna",
+            engine="generative",
+            rate="1.1",
+        ),
    )

    llm = AWSBedrockLLMService(
        aws_region="us-west-2",
-        model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        settings=AWSBedrockLLMService.Settings(
+            model="us.anthropic.claude-sonnet-4-6",
+            temperature=0.8,
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "user", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini-image.py
+++ b/examples/foundational/07n-interruptible-gemini-image.py
@@ -70,30 +70,30 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+        ),
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash-image",
-        # model="gemini-3-pro-image-preview", # A more powerful model, but slower
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash-image",
+            # model="gemini-3-pro-image-preview",  # A more powerful model, but slower
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -124,7 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation with a styled introduction
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini.py
+++ b/examples/foundational/07n-interruptible-gemini.py
@@ -54,15 +54,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot with Gemini TTS")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    tts = GeminiTTSService(
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
-        model="gemini-2.5-flash-tts",
-        voice_id="Charon",
-        params=GeminiTTSService.InputParams(
+        settings=GeminiTTSService.Settings(
+            model="gemini-2.5-flash-tts",
+            voice="Charon",
            language=Language.EN_US,
            prompt="You are a helpful AI assistant. Speak in a natural, conversational tone.",
        ),
@@ -71,13 +73,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini-2.5-flash",
-    )
-
-    # System message that instructs the AI on how to speak
-    messages = [
-        {
-            "role": "system",
-            "content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
+        settings=GoogleLLMService.Settings(
+            system_instruction="""You are a helpful assistant in a voice conversation.

            IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
            - [sigh] - Insert a sigh sound
@@ -94,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            - "[whispering] Let me tell you a secret."
            - "The answer is... [long pause] ...42!"

-            Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
-        },
-    ]
+            Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise. Respond to what the user said in a creative and helpful way.""",
+        ),
+    )

-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,9 +126,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation
-        messages.append(
+        context.add_message(
            {
-                "role": "system",
+                "role": "user",
                "content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
            }
        )
--- a/examples/foundational/07n-interruptible-google-http.py
+++ b/examples/foundational/07n-interruptible-google-http.py
@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+            # Uncomment to use a specific model
+            # model="chirp_3",
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
        location="us",
    )

    tts = GoogleHttpTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleHttpTTSService.InputParams(language=Language.EN_US),
+        settings=GoogleHttpTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-google.py
+++ b/examples/foundational/07n-interruptible-google.py
@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+            # Uncomment to use a specific model
+            # model="chirp_3",
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
        location="us",
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
+++ b/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
@@ -0,0 +1,178 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.assemblyai.stt import AssemblyAISTTService
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
+
+load_dotenv(override=True)
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    """AssemblyAI u3-rt-pro with Built-in Turn Detection
+
+    This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
+    with AssemblyAI's built-in turn detection for more natural conversation flow.
+
+    Key features:
+
+    1. AssemblyAI Turn Detection
+       - Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
+       - AssemblyAI's model determines when the user starts and stops speaking
+       - Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
+       - More natural turn detection based on speech patterns and pauses
+
+    2. Advanced Turn Detection Tuning
+       - `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
+         Lower values = faster responses. Default: 100ms
+       - `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
+         Prevents long pauses. Default: 1000ms
+
+    3. Prompt-Based Transcription Enhancement
+       - Use `prompt` parameter to improve accuracy for specific names/terms
+       - Particularly useful for proper nouns, technical terms, domain vocabulary
+       - Example: "Names: Xiomara, Saoirse, Krzysztof. Technical terms: API, OAuth."
+
+    4. Speaker Diarization (Optional)
+       - Enable with `speaker_labels=True`
+       - Automatically identifies different speakers in multi-party conversations
+       - TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")
+
+    5. Language Detection (Optional, multilingual model only)
+       - Enable with `language_detection=True`
+       - Automatically detects spoken language
+       - Available with universal-streaming-multilingual model
+
+    For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
+    """
+    logger.info(f"Starting bot")
+
+    stt = AssemblyAISTTService(
+        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        vad_force_turn_endpoint=False,  # Use AssemblyAI's built-in turn detection
+        settings=AssemblyAISTTService.Settings(
+            model="u3-rt-pro",
+            # Optional: Tune turn detection timing (defaults shown below)
+            # min_turn_silence=100,  # Default
+            # max_turn_silence=1000,  # Default
+            # Optional: Boost accuracy for specific names/terms
+            # keyterms_prompt=["Xiomara", "Saoirse", "Krzysztof", "API", "OAuth"],
+            # Optional: Enable speaker diarization
+            # speaker_labels=True,
+        ),
+    )
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+
+    context = LLMContext()
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=ExternalUserTurnStrategies(),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/07o-interruptible-assemblyai.py
+++ b/examples/foundational/07o-interruptible-assemblyai.py
@@ -59,19 +59,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp-viva.py
+++ b/examples/foundational/07p-interruptible-krisp-viva.py
@@ -31,6 +31,8 @@ from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
 from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
+from pipecat.metrics.metrics import TurnMetricsData
+from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -41,32 +43,37 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.deepgram.tts import DeepgramTTSService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
+from pipecat.turns.user_turn_strategies import UserTurnStrategies

 load_dotenv(override=True)

+# Single Krisp VIVA filter instance shared by all transport configurations.
+krisp_viva_filter = KrispVivaFilter()
+
 # We use lambdas to defer transport parameter creation until the transport
 # type is selected at runtime.
 transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
 }

@@ -76,18 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
@@ -117,13 +127,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[MetricsLogObserver(include_metrics={TurnMetricsData})],
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp.py
+++ b/examples/foundational/07p-interruptible-krisp.py
@@ -58,18 +58,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-helios-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime-http.py
+++ b/examples/foundational/07q-interruptible-rime-http.py
@@ -60,21 +60,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
-            voice_id="luna",
+            settings=RimeHttpTTSService.Settings(
+                voice="luna",
+                model="arcana",
+            ),
            model="arcana",
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime.py
+++ b/examples/foundational/07q-interruptible-rime.py
@@ -56,19 +56,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = RimeTTSService(
        api_key=os.getenv("RIME_API_KEY", ""),
-        voice_id="luna",
+        settings=RimeTTSService.Settings(
+            voice="luna",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07r-interruptible-nvidia.py
+++ b/examples/foundational/07r-interruptible-nvidia.py
@@ -55,19 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = NvidiaSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

    llm = NvidiaLLMService(
-        api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
+        api_key=os.getenv("NVIDIA_API_KEY"),
+        settings=NvidiaLLMService.Settings(
+            model="meta/llama-3.3-70b-instruct",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

    tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07s-interruptible-google-audio-in.py
+++ b/examples/foundational/07s-interruptible-google-audio-in.py
@@ -48,7 +48,7 @@ load_dotenv(override=True)

 marker = "|----|"
 system_message = f"""
-You are a helpful LLM in a WebRTC call. Your goals are to be helpful and brief in your responses.
+You are a helpful LLM in a voice call. Your goals are to be helpful and brief in your responses.

 You are expert at transcribing audio to text. You will receive a mixture of audio and text input. When
 asked to transcribe what the user said, output an exact, word-for-word transcription.
@@ -216,31 +216,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            system_instruction=system_message,
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
+        ),
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        params=GoogleTTSService.InputParams(language=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": system_message,
-        },
-        {
-            "role": "user",
-            "content": "Start by saying hello.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -276,7 +269,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07t-interruptible-fish.py
+++ b/examples/foundational/07t-interruptible-fish.py
@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = FishAudioTTSService(
        api_key=os.getenv("FISH_API_KEY"),
-        model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
+        settings=FishAudioTTSService.Settings(
+            voice="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic-http.py
+++ b/examples/foundational/07v-interruptible-neuphonic-http.py
@@ -60,20 +60,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = NeuphonicHttpTTSService(
            api_key=os.getenv("NEUPHONIC_API_KEY"),
-            voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+            settings=NeuphonicHttpTTSService.Settings(
+                voice="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+            ),
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +104,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic.py
+++ b/examples/foundational/07v-interruptible-neuphonic.py
@@ -56,19 +56,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = NeuphonicTTSService(
        api_key=os.getenv("NEUPHONIC_API_KEY"),
-        voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+        settings=NeuphonicTTSService.Settings(
+            voice="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07w-interruptible-fal.py
+++ b/examples/foundational/07w-interruptible-fal.py
@@ -7,6 +7,7 @@

 import os

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -53,66 +54,70 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = FalSTTService(
-        api_key=os.getenv("FAL_KEY"),
-    )
+    async with aiohttp.ClientSession() as session:
+        stt = FalSTTService(
+            api_key=os.getenv("FAL_KEY"),
+            aiohttp_session=session,
+        )

-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            settings=CartesiaTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        context = LLMContext()
+        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+            context,
+            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        )

-    context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
+        @transport.event_handler("on_client_connected")
+        async def on_client_connected(transport, client):
+            logger.info(f"Client connected")
+            # Kick off the conversation.
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
+            await task.queue_frames([LLMRunFrame()])

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
+        @transport.event_handler("on_client_disconnected")
+        async def on_client_disconnected(transport, client):
+            logger.info(f"Client disconnected")
+            await task.cancel()

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
+        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
+        await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07x-interruptible-local.py
+++ b/examples/foundational/07x-interruptible-local.py
@@ -44,19 +44,19 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -82,7 +82,7 @@ async def main():
        ),
    )

-    messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+    context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
    await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner()
--- a/examples/foundational/07y-interruptible-minimax.py
+++ b/examples/foundational/07y-interruptible-minimax.py
@@ -63,19 +63,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            api_key=os.getenv("MINIMAX_API_KEY", ""),
            group_id=os.getenv("MINIMAX_GROUP_ID", ""),
            aiohttp_session=session,
-            params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
+            settings=MiniMaxHttpTTSService.Settings(
+                language=Language.EN,
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam-http.py
+++ b/examples/foundational/07z-interruptible-sarvam-http.py
@@ -59,25 +59,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SarvamSTTService(
            api_key=os.getenv("SARVAM_API_KEY"),
-            model="saarika:v2.5",
+            settings=SarvamSTTService.Settings(
+                model="saarika:v2.5",
+            ),
        )

        tts = SarvamHttpTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
            aiohttp_session=session,
-            params=SarvamHttpTTSService.InputParams(language=Language.EN),
+            settings=SarvamHttpTTSService.Settings(
+                language=Language.EN_IN,
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +110,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "user", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam.py
+++ b/examples/foundational/07z-interruptible-sarvam.py
@@ -54,24 +54,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
-        model="saarika:v2.5",
+        settings=SarvamSTTService.Settings(
+            model="saarika:v2.5",
+        ),
    )

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
-        model="bulbul:v2",
-        voice_id="manisha",
+        settings=SarvamTTSService.Settings(
+            model="bulbul:v2",
+            voice="manisha",
+        ),
+    )
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -101,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

        # Optionally, you can wait for 30 seconds and then change the voice.
				`@@ -0,0 +1 @@`
				- Changed tool result JSON serialization to use `ensure_ascii=False`, preserving UTF-8 characters instead of escaping them. This reduces context size and token usage for non-English languages.
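The effect of `ensure_ascii=False` can be seen with the standard library alone. This is a minimal sketch, not the project's serialization code: the `tool_result` payload is a made-up example, and only the documented `json.dumps` behavior is shown.

```python
import json

# Hypothetical tool result containing non-ASCII text (illustration only).
tool_result = {"city": "東京", "greeting": "こんにちは"}

# Default behavior: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(tool_result)

# With ensure_ascii=False the characters are written through as-is.
preserved = json.dumps(tool_result, ensure_ascii=False)

# Both strings decode to the same object; only the serialized size differs.
print(len(escaped), len(preserved))
```

Each escaped character costs six ASCII characters in the serialized string, which is where the context-size saving for non-English text comes from.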
				`@@ -1 +0,0 @@`
				- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.
				`@@ -0,0 +1 @@`
				- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of `OpenAIRealtimeSTTSettings`, making it runtime-updatable via `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is deprecated as of 0.0.106.
				`@@ -0,0 +1 @@`
				- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable release).
				`@@ -0,0 +1 @@`
				- Fixed an issue where the default model for `OpenAILLMService` and `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now restored to `gpt-4.1`.
				`@@ -0,0 +1 @@`
				- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`, aligning it with the HeyGen and Tavus video services. It supports `SimliVideoService.Settings(...)` for configuration and uses `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage (`api_key`, `face_id`, etc.) remains unchanged.
				`@@ -0,0 +1 @@`
				- `SimliVideoService.InputParams` is deprecated. Use the direct constructor parameters `max_session_length`, `max_idle_time`, and `enable_logging` instead.
				`@@ -0,0 +1 @@`
				- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`) to target a specific service instance. When `service` is set, only the matching service applies the settings; others forward the frame unchanged. This enables updating a single service when multiple services of the same type exist in the pipeline.
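The targeting rule described above can be modeled in a few lines. This is a toy sketch under stated assumptions: `ToyUpdateSettingsFrame` and `ToyService` are hypothetical stand-ins, not Pipecat classes, and the sketch assumes an unset `service` means every service of that type applies the update. It also forwards the frame in every case, which is a simplification.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyUpdateSettingsFrame:
    """Hypothetical stand-in for a settings-update frame."""

    settings: dict
    service: Optional[Any] = None  # None means "not targeted" (assumption)


class ToyService:
    """Hypothetical stand-in for one service in a pipeline."""

    def __init__(self, name: str):
        self.name = name
        self.settings: dict = {}

    def process(self, frame: ToyUpdateSettingsFrame) -> ToyUpdateSettingsFrame:
        # Apply only if the frame is untargeted or targets this exact instance.
        if frame.service is None or frame.service is self:
            self.settings.update(frame.settings)
        # Always pass the frame along in this toy model.
        return frame


tts_a = ToyService("tts_a")
tts_b = ToyService("tts_b")

# Target only the second service; the first one leaves its settings alone.
frame = ToyUpdateSettingsFrame(settings={"voice": "example-voice"}, service=tts_b)
for svc in (tts_a, tts_b):
    frame = svc.process(frame)
```

Matching on instance identity rather than on type is what lets two services of the same class coexist in one pipeline and be updated independently.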
				`@@ -0,0 +1 @@`
				- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily runner. These convenience parameters let callers specify a SIP provider name and geographic region directly without manually constructing `DailyRoomProperties` and `DailyRoomSipParams`.
				`@@ -0,0 +1 @@`
				- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut down before in-flight frames (e.g. LLM function call responses) finished processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline as `ControlFrame`s, ensuring all pending work is flushed before shutdown begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate (`SystemFrame`).