Update langchain to 1.x to fix CVE in langchain-core (dependabot #174)

langchain-core had a high-severity path traversal vulnerability in legacy load_prompt functions, fixed in 1.2.22. Declare langchain and openpipe extras as conflicting since langchain-openai now requires openai>=2.26 while openpipe caps openai<=1.97.1.
Update Pygments to 2.20.0 in uv.lock
2026-03-29 09:47:57 -04:00 · 2026-03-29 09:34:05 -04:00 · 2026-03-29 09:33:08 -04:00 · 2026-03-29 09:03:06 -04:00 · 2026-03-29 08:58:01 -04:00 · 2026-03-29 08:50:11 -04:00
616 changed files with 59164 additions and 18251 deletions
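The extras conflict in the commit message comes down to two version ranges for `openai` that do not intersect. A minimal Python sketch of that check follows, using only the two bounds quoted in the message. The helper names `parse_version` and `ranges_overlap` are illustrative and are not part of uv or any packaging library; a real resolver handles far more specifier syntax than plain dotted releases.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted release string such as "1.97.1" into a comparable tuple."""
    parts = [int(part) for part in text.split(".")]
    # Drop trailing zeros so that "2.26.0" and "2.26" compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def ranges_overlap(lower_bound: str, upper_bound: str) -> bool:
    """Return True if some version v satisfies lower_bound <= v <= upper_bound."""
    return parse_version(lower_bound) <= parse_version(upper_bound)


# Bounds taken from the commit message.
LANGCHAIN_OPENAI_MIN = "2.26"  # langchain-openai requires openai>=2.26
OPENPIPE_MAX = "1.97.1"        # openpipe caps openai<=1.97.1

compatible = ranges_overlap(LANGCHAIN_OPENAI_MIN, OPENPIPE_MAX)
print(compatible)  # prints False: no openai release satisfies both extras
```

Because the check fails, the two extras cannot be resolved into one environment, which is why they are declared as conflicting rather than pinned to a shared version.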
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -0,0 +1,27 @@
+{
+  "name": "pipecat-dev-skills",
+  "owner": {
+    "name": "Pipecat"
+  },
+  "metadata": {
+    "description": "Development workflow skills for contributing to the Pipecat project",
+    "version": "1.0.0"
+  },
+  "plugins": [
+    {
+      "name": "pipecat-dev",
+      "description": "Development workflow skills for contributing to the Pipecat project",
+      "version": "1.0.0",
+      "source": "./",
+      "skills": [
+        "./.claude/skills/changelog",
+        "./.claude/skills/cleanup",
+        "./.claude/skills/code-review",
+        "./.claude/skills/docstring",
+        "./.claude/skills/pr-description",
+        "./.claude/skills/pr-submit",
+        "./.claude/skills/update-docs"
+      ]
+    }
+  ]
+}
--- a/.claude/skills/changelog/SKILL.md
+++ b/.claude/skills/changelog/SKILL.md
@@ -26,12 +26,26 @@ Create changelog files for the important commits in this PR. The PR number is pr
   - `{PR_NUMBER}.performance.md` - for performance improvements
   - `{PR_NUMBER}.other.md` - for other changes

-4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change.
+4. Each changelog file should at least contain a main single line starting with `- ` followed by a clear description of the change. No line wrapping.

 5. If the change is complicated, changelog files can have indented lines after the main line with additional details or code samples.

 6. Use ⚠️ emoji prefix for breaking changes.

+7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
+
+   **Good** (user-facing first, implementation detail as context):
+   ```
+   - Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
+   ```
+
+   **Bad** (implementation detail only, no user-facing framing):
+   ```
+   - Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
+   ```
+
+   Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
+
 ## Example

 For PR #3519 with a new feature and a bug fix:
@@ -43,5 +57,5 @@ For PR #3519 with a new feature and a bug fix:

 `changelog/3519.fixed.md`:
 ```
-- Fixed an issue where something was not working correctly.
+- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
 ```
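The formatting rules in this skill (a main line starting with `- `, no line wrapping, optional indented detail lines) can be checked mechanically. The sketch below is one possible check in Python, not part of the repository. The function name `check_fragment` is hypothetical, and it assumes that every non-blank line after the first must be indented, which is a stricter reading than the skill text spells out.

```python
def check_fragment(text: str) -> list:
    """Return a list of problems found in one changelog fragment."""
    problems = []
    lines = text.splitlines()

    # Rule 4: the fragment needs a main line that starts with "- ".
    if not lines or not lines[0].startswith("- "):
        problems.append("first line must start with '- '")

    # Rules 4 and 5: later lines are indented details. An unindented
    # line here is most likely the main line wrapped by hand.
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() and not line.startswith((" ", "\t")):
            problems.append(f"line {number} is not indented (wrapped main line?)")

    return problems


good = "- Fixed an issue in some user-visible scenario.\n  Extra detail.\n"
wrapped = "- Fixed an issue where something\nwas not working correctly.\n"

print(check_fragment(good))
print(check_fragment(wrapped))
```

The check only covers layout. Whether an entry leads with the user-facing effect, as rule 7 asks, still needs a human or model reviewer.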
--- a/.claude/skills/cleanup/SKILL.md
+++ b/.claude/skills/cleanup/SKILL.md
@@ -1,6 +1,6 @@
 # Code Cleanup Skill

-The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat’s architecture, coding standards, and example patterns**.
+The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
 It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.

 ---
@@ -28,9 +28,9 @@ This skill analyzes all changes introduced in your branch and performs the follo

 Invoke the skill using any of the following commands:

-- “Clean up my branch code”
-- “Refactor the changes in my branch”
-- “Review and improve my branch code”
+- "Clean up my branch code"
+- "Refactor the changes in my branch"
+- "Review and improve my branch code"
 - `/cleanup`

 ---
--- a/.claude/skills/docstring/SKILL.md
+++ b/.claude/skills/docstring/SKILL.md
@@ -3,21 +3,20 @@ name: docstring
 description: Document a Python module and its classes using Google style
 ---

-Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
+Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.

 ## Instructions

-1. First, find the class in the codebase:
-   ```
-   Search for "class ClassName" in src/pipecat/
-   ```
+1. Determine what to document based on the argument:

-2. If multiple files contain that class name:
-   - List all matches with their file paths
-   - Ask the user which one they want to document
-   - Wait for confirmation before proceeding
+   **If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
+   - Use that file directly

-3. Once the file is identified, read the module to understand its structure:
+   **If a class name is provided** (e.g. `VADAnalyzer`):
+   - Search for `class ClassName` in `src/pipecat/`
+   - If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
+
+2. Once the file is identified, read the module to understand its structure:
   - Identify all classes, functions, and important type aliases
   - Understand the purpose of each component

--- a/.claude/skills/pr-submit/SKILL.md
+++ b/.claude/skills/pr-submit/SKILL.md
@@ -0,0 +1,28 @@
+---
+name: pr-submit
+description: Create and submit a GitHub PR from the current branch
+---
+
+Submit the current changes as a GitHub pull request.
+
+## Instructions
+
+1. Check the current state of the repository:
+   - Run `git status` to see staged, unstaged, and untracked changes
+   - Run `git diff` to see current changes
+   - Run `git log --oneline -10` to see recent commits
+
+2. If there are uncommitted changes relevant to the PR:
+   - Ask the user if they want a specific prefix for the branch name (e.g., `alice/`, `fix/`, `feat/`)
+   - Create a new branch based on the current branch
+   - Commit the changes using multiple commits if the changes are unrelated
+
+3. Push the branch and create the PR:
+   - Push with `-u` flag to set upstream tracking
+   - Create the PR using `gh pr create`
+
+4. After the PR is created:
+   - Run `/changelog <pr_number>` to generate changelog files, then commit and push them
+   - Run `/pr-description <pr_number>` to update the PR description
+
+5. Return the PR URL to the user.
--- a/.claude/skills/update-docs/SKILL.md
+++ b/.claude/skills/update-docs/SKILL.md
@@ -0,0 +1,306 @@
+---
+name: update-docs
+description: Update documentation pages to match source code changes on the current branch
+---
+
+Update documentation pages to reflect source code changes on the current branch. Analyzes the diff against main, maps changed source files to their corresponding doc pages, and makes targeted edits.
+
+## Arguments
+
+```
+/update-docs [DOCS_PATH]
+```
+
+- `DOCS_PATH` (optional): Path to the docs repository root. If not provided, ask the user.
+
+Examples:
+- `/update-docs /Users/me/src/docs`
+- `/update-docs`
+
+## Instructions
+
+### Step 1: Resolve docs path
+
+If `DOCS_PATH` was provided as an argument, use it. Otherwise, ask the user for the path to their docs repository.
+
+Verify the path exists and contains `server/services/` subdirectory.
+
+### Step 2: Create docs branch
+
+Get the current pipecat branch name:
+```bash
+git rev-parse --abbrev-ref HEAD
+```
+
+In the docs repo, create a new branch off main with a matching name:
+```bash
+cd DOCS_PATH && git checkout main && git pull && git checkout -b {branch-name}-docs
+```
+
+For example, if the pipecat branch is `feat/new-service`, the docs branch becomes `feat/new-service-docs`.
+
+All doc edits in subsequent steps are made on this branch.
+
+### Step 3: Detect changed source files
+
+Run:
+```bash
+git diff main..HEAD --name-only
+```
+
+Filter to files that could affect documentation:
+- `src/pipecat/services/**/*.py` (service implementations)
+- `src/pipecat/transports/**/*.py` (transport implementations)
+- `src/pipecat/serializers/**/*.py` (serializer implementations)
+- `src/pipecat/processors/**/*.py` (processor implementations)
+- `src/pipecat/audio/**/*.py` (audio utilities)
+- `src/pipecat/turns/**/*.py` (turn management)
+- `src/pipecat/observers/**/*.py` (observers)
+- `src/pipecat/pipeline/**/*.py` (pipeline core)
+
+Ignore `__init__.py`, `__pycache__`, test files, and files that only contain type re-exports.
+
+### Step 4: Map source files to doc pages
+
+For each changed source file, find the corresponding doc page. Read the mapping file at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md` and apply its tiered lookup: tier 1 (known exceptions) → tier 2 (pattern matching) → tier 3 (search fallback). **First match wins.**
+
+### Step 5: Analyze each source-doc pair
+
+For each mapped pair:
+
+1. **Read the full source file** to understand current state
+2. **Read the diff** for that file: `git diff main..HEAD -- <source_file>`
+3. **Read the current doc page** in full
+
+Identify what changed by comparing source to docs:
+
+- **Constructor parameters**: Compare `__init__` signature to the Configuration section's `<ParamField>` entries
+- **InputParams fields**: Compare `InputParams(BaseModel)` class fields to the InputParams table
+- **Event handlers**: Compare `_register_event_handler` calls and event handler definitions to Event Handlers section
+- **Class names / imports**: Check if Usage examples reference correct names
+- **Behavioral changes**: Check if Notes section needs updating
+
+### Step 6: Make targeted edits
+
+For each doc page that needs updates, edit **only the sections that need changes**. Preserve all other content exactly as-is.
+
+#### Rules
+
+- **Never remove content** unless the corresponding source code was removed
+- **Never rewrite sections** that are already accurate
+- **Match existing formatting** — if the page uses `<ParamField>` tags, use them; if it uses tables, use tables
+- **Keep descriptions concise** — match the tone and length of surrounding content
+- **Preserve CardGroup, links, and examples** unless they reference removed functionality
+- **Don't touch frontmatter** unless the class was renamed
+
+#### Section-specific guidance
+
+**Configuration** (constructor params):
+- Use `<ParamField path="name" type="type" default="value">` format if the page already uses it
+- Add new params in logical order (required first, then optional)
+- Remove params that no longer exist in source
+- Update types/defaults that changed
+
+**InputParams** (runtime settings):
+- Use markdown table format: `| Parameter | Type | Default | Description |`
+- Match the field names and types from the `InputParams(BaseModel)` class
+- Include the default values from the source
+
+**Usage** (code examples):
+- Update import paths, class names, and parameter names
+- Only modify examples if they would break or be misleading with the new API
+- Don't rewrite working examples just to add new optional params
+
+**Notes**:
+- Add notes for new behavioral gotchas or breaking changes
+- Remove notes about limitations that were fixed
+- Keep existing notes that are still accurate
+
+**Event Handlers**:
+- Update the event table and example code
+- Add new events, remove deleted ones
+- Update handler signatures if they changed
+
+**Overview / Key Features / Prerequisites**:
+- Only update if the PR fundamentally changes what the service does (new capability, removed capability, renamed class)
+- Most PRs will NOT need changes to these sections
+
+### Step 7: Update guides
+
+Guides at `DOCS_PATH/guides/` reference specific class names, parameters, imports, and code patterns. After completing reference doc edits, check if any guides need updates too.
+
+For each changed source file, collect the class names, renamed parameters, and changed imports from the diff. Search the guides directory:
+```bash
+grep -rl "ClassName\|old_param_name" DOCS_PATH/guides/
+```
+
+For each guide that references changed code:
+1. Read the full guide
+2. Update class names, parameter names, import paths, and code examples that are now incorrect
+3. **Don't rewrite prose** — only fix the specific references that changed
+4. Leave guides alone if they reference the service generally but don't use any changed APIs
+
+Guide directories:
+- `guides/learn/` — conceptual tutorials (pipeline, LLM, STT, TTS, etc.)
+- `guides/fundamentals/` — practical how-tos (metrics, recording, transcripts, etc.)
+- `guides/features/` — feature-specific guides (Gemini Live, OpenAI audio, WhatsApp, etc.)
+- `guides/telephony/` — telephony integration guides (Twilio, Plivo, Telnyx, etc.)
+
+### Step 8: Identify doc gaps
+
+After processing all mapped pairs, check for two kinds of gaps:
+
+**Missing pages**: Source files that had no doc page mapping (neither tier 1, 2, nor 3) and are not marked as "(skip)". For each, tell the user:
+- The source file path
+- The main class(es) it defines
+- Whether a new doc page should be created
+
+**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
+
+If the user wants a new page, do all three of the following:
+
+#### 8a: Create the doc page
+
+Create the new `.mdx` file using this template structure:
+```
+---
+title: "Service Name"
+description: "Brief description"
+---
+
+## Overview
+
+[Description from class docstring or source analysis]
+
+<CardGroup cols={2}>
+  [Cards for API reference and examples if available]
+</CardGroup>
+
+## Installation
+
+```bash
+pip install "pipecat-ai[package-name]"
+```
+
+## Prerequisites
+
+[Environment variables and account setup]
+
+## Configuration
+
+[ParamField entries for constructor params]
+
+## InputParams
+
+[Table of InputParams fields, if the service has them]
+
+## Usage
+
+### Basic Setup
+
+```python
+[Minimal working example]
+```
+
+## Notes
+
+[Important caveats]
+
+## Event Handlers
+
+[Event table and example code]
+```
+
+#### 8b: Add to docs.json
+
+Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
+
+Find the matching group in the navigation structure:
+- **STT** → `"group": "Speech-to-Text"` under Services
+- **TTS** → `"group": "Text-to-Speech"` under Services
+- **LLM** → `"group": "LLM"` under Services
+- **S2S** → `"group": "Speech-to-Speech"` under Services
+- **Transport** → `"group": "Transport"` under Services
+- **Serializer** → `"group": "Serializers"` under Services
+- **Image generation** → `"group": "Image Generation"` under Services
+- **Video** → `"group": "Video"` under Services
+- **Memory** → `"group": "Memory"` under Services
+- **Vision** → `"group": "Vision"` under Services
+- **Analytics** → `"group": "Analytics & Monitoring"` under Services
+
+Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
+```json
+{
+  "group": "Speech-to-Text",
+  "pages": [
+    "server/services/stt/assemblyai",
+    "server/services/stt/aws",
+    ...
+    "server/services/stt/foo",
+    ...
+  ]
+}
+```
+
+#### 8c: Add to supported-services.mdx
+
+Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
+
+Use this format:
+```
+| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
+```
+
+To determine the correct values:
+- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
+- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
+- If no pip dependencies are required, use `No dependencies required` instead.
+
+Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
+
+### Step 9: Output summary
+
+After all edits are complete, print a summary:
+
+```
+## Documentation Updates
+
+### Updated reference pages
+- `server/services/stt/deepgram.mdx` — Updated Configuration (added `new_param`), InputParams (updated `language` default)
+- `server/services/tts/elevenlabs.mdx` — Updated Event Handlers (added `on_connected`)
+
+### Updated guides
+- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
+
+### New service pages
+- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
+
+### Unmapped source files
+- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
+
+### Skipped files
+- `src/pipecat/services/ai_service.py` — internal base class
+```
+
+## Guidelines
+
+- **Be conservative** — only change what the diff warrants. Don't "improve" docs beyond what changed in source.
+- **Read before editing** — always read the full doc page before making changes so you understand the existing structure.
+- **Preserve voice** — match the writing style of the existing doc page, don't impose a different tone.
+- **One PR at a time** — this skill operates on the current branch's diff against main. Don't look at other branches.
+- **Parallel analysis** — when multiple source files map to different doc pages, analyze and edit them in parallel for efficiency.
+- **Shared source files** — files like `services/google/google.py` are shared bases. Check which services import from them and update all affected doc pages.
+
+## Checklist
+
+Before finishing, verify:
+
+- [ ] All changed source files were checked against the mapping table
+- [ ] Each doc page edit matches the actual source code change (not guessed)
+- [ ] No content was removed unless the corresponding source was removed
+- [ ] New parameters have accurate types and defaults from source
+- [ ] Formatting matches the existing page style
+- [ ] Guides referencing changed APIs were checked and updated
+- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
+- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
+- [ ] Unmapped files were reported to the user
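The Step 3 filter in this skill can be read as a simple predicate over the paths printed by `git diff main..HEAD --name-only`. The Python sketch below shows one way to express it. The name `is_doc_relevant` is hypothetical, treating `test_*.py` as the test-file convention is an assumption, and the "type re-exports only" rule is left out because it requires reading file contents.

```python
from pathlib import PurePosixPath

# Top-level packages under src/pipecat/ that Step 3 lists as doc-relevant.
DOC_RELEVANT_DIRS = (
    "services", "transports", "serializers", "processors",
    "audio", "turns", "observers", "pipeline",
)


def is_doc_relevant(path: str) -> bool:
    """Return True if a changed file could affect documentation."""
    candidate = PurePosixPath(path)
    parts = candidate.parts

    # Must live under src/pipecat/<listed dir>/ and be a Python file.
    if len(parts) < 4 or parts[:2] != ("src", "pipecat"):
        return False
    if parts[2] not in DOC_RELEVANT_DIRS or candidate.suffix != ".py":
        return False

    # Ignore package markers, bytecode caches, and (assumed) test modules.
    if candidate.name == "__init__.py" or "__pycache__" in parts:
        return False
    if candidate.name.startswith("test_"):
        return False
    return True


changed = [
    "src/pipecat/services/deepgram/stt.py",
    "src/pipecat/services/deepgram/__init__.py",
    "src/pipecat/frames/frames.py",
    "README.md",
]
relevant = [path for path in changed if is_doc_relevant(path)]
print(relevant)  # prints ['src/pipecat/services/deepgram/stt.py']
```

Skip-listed files such as internal base classes still pass this filter. They are dropped in the later mapping step, not here.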
--- a/.claude/skills/update-docs/SOURCE_DOC_MAPPING.md
+++ b/.claude/skills/update-docs/SOURCE_DOC_MAPPING.md
@@ -0,0 +1,79 @@
+# Source-to-Doc Mapping
+
+Maps pipecat source files to their documentation pages. Source paths are relative to `src/pipecat/`. Doc paths are relative to `DOCS_PATH`.
+
+## Name mismatches
+
+These source paths don't follow the standard `services/{provider}/{type}.py` → `server/services/{type}/{provider}.mdx` pattern.
+
+| Source path | Doc page |
+|---|---|
+| `services/google/llm.py` | `server/services/llm/gemini.mdx` |
+| `services/google/llm_vertex.py` | `server/services/llm/google-vertex.mdx` |
+| `services/google/google.py` | (shared base — check which services use it) |
+| `services/google/gemini_live/**` | `server/services/s2s/gemini-live.mdx` |
+| `services/google/gemini_live/llm_vertex.py` | `server/services/s2s/gemini-live-vertex.mdx` |
+| `services/aws_nova_sonic/**` | `server/services/s2s/aws.mdx` |
+| `services/ultravox/**` | `server/services/s2s/ultravox.mdx` |
+| `services/grok/realtime/**` | `server/services/s2s/grok.mdx` |
+| `services/openai/realtime/**` | `server/services/s2s/openai.mdx` |
+| `processors/frameworks/rtvi.py` | `server/frameworks/rtvi/rtvi-processor.mdx` and `server/frameworks/rtvi/rtvi-observer.mdx` |
+| `processors/transcript_processor.py` | `server/utilities/transcript-processor.mdx` |
+| `processors/user_idle_processor.py` | `server/utilities/user-idle-processor.mdx` |
+| `processors/idle_frame_processor.py` | `server/pipeline/pipeline-idle-detection.mdx` |
+| `pipeline/task.py` | `server/pipeline/pipeline-task.mdx` |
+| `pipeline/runner.py` | `server/utilities/runner/guide.mdx` |
+| `transports/base_transport.py` | `server/services/transport/transport-params.mdx` |
+
+## Skip list
+
+These files should never trigger doc updates.
+
+| Pattern | Reason |
+|---|---|
+| `services/ai_service.py` | Internal base class |
+| `services/stt_service.py` | Internal base class |
+| `services/tts_service.py` | Internal base class |
+| `services/llm_service.py` | Internal base class |
+| `services/websocket_service.py` | Internal base class |
+| `services/openai_realtime_beta/**` | Deprecated |
+| `services/openai_realtime/**` | Deprecated |
+| `services/gemini_multimodal_live/**` | Deprecated |
+| `services/aws/agent_core.py` | Internal |
+| `services/aws/sagemaker/**` | No doc page |
+| `transports/base_input.py` | Internal base class |
+| `transports/base_output.py` | Internal base class |
+| `transports/websocket/client.py` | No doc page |
+| `serializers/base_serializer.py` | Internal base class |
+| `serializers/protobuf.py` | Internal |
+| `processors/audio/**` | Internal |
+| `pipeline/pipeline.py` | Core architecture, not a service doc |
+
+## Pattern matching
+
+For files not in the tables above, apply these patterns. Convert underscores to hyphens in provider names for doc filenames.
+
+| Source pattern | Doc pattern |
+|---|---|
+| `services/{provider}/stt*.py` | `server/services/stt/{provider}.mdx` |
+| `services/{provider}/tts*.py` | `server/services/tts/{provider}.mdx` |
+| `services/{provider}/llm*.py` | `server/services/llm/{provider}.mdx` |
+| `services/{provider}/image*.py` | `server/services/image-generation/{provider}.mdx` |
+| `services/{provider}/video*.py` | `server/services/video/{provider}.mdx` |
+| `services/{provider}/realtime/**` | `server/services/s2s/{provider}.mdx` |
+| `transports/{name}/**` | `server/services/transport/{name}.mdx` |
+| `serializers/{name}.py` | `server/services/serializers/{name}.mdx` |
+| `observers/**` | `server/utilities/observers/` (match by class name) |
+| `audio/vad/**` | `server/utilities/audio/` (match by class name) |
+| `audio/filters/**` | `server/utilities/audio/` (match by class name) |
+| `audio/mixers/**` | `server/utilities/audio/` (match by class name) |
+| `processors/filters/**` | `server/utilities/filters/` (match by class name) |
+
+If the doc file doesn't exist at the resolved path, the file is **unmapped**.
+
+## Search fallback
+
+For files that don't match any table or pattern above:
+1. Extract the main class name(s) from the source file
+2. Search the docs directory for that class name: `grep -r "ClassName" DOCS_PATH/server/`
+3. If found in a doc page, use that as the mapping
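The lookup order described in this mapping file (name mismatches, then the skip list, then patterns) can be sketched as a small resolver. The Python below is an illustration under stated assumptions, not code from the repository. It copies only a few rows from each table, the name `resolve_doc_page` is hypothetical, and it checks neither that the resolved file exists nor the search fallback.

```python
import re
from typing import Optional

# A few rows copied from the "Name mismatches" table.
EXCEPTIONS = {
    "services/google/llm.py": "server/services/llm/gemini.mdx",
    "pipeline/task.py": "server/pipeline/pipeline-task.mdx",
}

# A few rows copied from the "Skip list" table.
SKIP = {"services/ai_service.py", "pipeline/pipeline.py"}

# Three rows from the "Pattern matching" table, written as regexes.
PATTERNS = [
    (re.compile(r"services/(?P<provider>[^/]+)/stt[^/]*\.py"),
     "server/services/stt/{provider}.mdx"),
    (re.compile(r"services/(?P<provider>[^/]+)/tts[^/]*\.py"),
     "server/services/tts/{provider}.mdx"),
    (re.compile(r"services/(?P<provider>[^/]+)/llm[^/]*\.py"),
     "server/services/llm/{provider}.mdx"),
]


def resolve_doc_page(source_path: str) -> Optional[str]:
    """Map a path relative to src/pipecat/ to a doc page; first match wins."""
    if source_path in EXCEPTIONS:
        return EXCEPTIONS[source_path]
    if source_path in SKIP:
        return None  # never triggers a doc update
    for regex, template in PATTERNS:
        match = regex.fullmatch(source_path)
        if match:
            # Doc filenames use hyphens where provider names use underscores.
            provider = match.group("provider").replace("_", "-")
            return template.format(provider=provider)
    return None  # a caller would try the search fallback next


print(resolve_doc_page("services/google/llm.py"))
print(resolve_doc_page("services/deepgram/stt.py"))
```

Returning `None` for both skipped and unmatched files keeps the sketch short. A fuller version would distinguish the two so that only unmatched files reach the search fallback and the gap report.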
--- a/.github/workflows/coverage.yaml
+++ b/.github/workflows/coverage.yaml
@@ -29,6 +29,7 @@ jobs:

      - name: Install system packages
        run: |
+          sudo apt-get update
          sudo apt-get install -y portaudio19-dev

      - name: Install dependencies
@@ -36,11 +37,13 @@ jobs:
          uv sync --group dev \
            --extra anthropic \
            --extra aws \
+            --extra deepgram \
            --extra google \
            --extra langchain \
            --extra livekit \
-            --extra local-smart-turn-v3 \
            --extra piper \
+            --extra sagemaker \
+            --extra tracing \
            --extra websocket

      - name: Run tests with coverage
--- a/.github/workflows/generate-changelog.yml
+++ b/.github/workflows/generate-changelog.yml
@@ -86,7 +86,7 @@ jobs:
          fi

          # Validate fragment types
-          VALID_TYPES="added changed deprecated removed fixed security other"
+          VALID_TYPES="added changed deprecated removed fixed performance security other"
          INVALID_FRAGMENTS=""

          for file in changelog/*.md; do
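The fragment-type validation that this hunk extends with `performance` can be mirrored outside the workflow, for example as a local pre-commit check. The Python sketch below is an assumption-laden equivalent and not a translation of the workflow's shell loop, which is not shown in full here. It assumes fragment names have the plain `{PR_NUMBER}.{type}.md` form from the changelog skill, and `fragment_kind` is a hypothetical name.

```python
import re
from typing import Optional

# Same list as the updated VALID_TYPES variable in the workflow.
VALID_TYPES = frozenset(
    "added changed deprecated removed fixed performance security other".split()
)

# Assumed fragment naming: digits, a lowercase type, then ".md".
FRAGMENT_NAME = re.compile(r"(?P<pr>\d+)\.(?P<kind>[a-z]+)\.md")


def fragment_kind(filename: str) -> Optional[str]:
    """Return the fragment type if the filename is valid, else None."""
    match = FRAGMENT_NAME.fullmatch(filename)
    if match is None:
        return None
    kind = match.group("kind")
    return kind if kind in VALID_TYPES else None


for name in ("3519.fixed.md", "3519.performance.md", "3519.bugfix.md"):
    print(name, fragment_kind(name))
```

With `performance` in the set, `3519.performance.md` is accepted, while an unknown type such as `bugfix` is still rejected.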
--- a/.github/workflows/tests.yaml
+++ b/.github/workflows/tests.yaml
@@ -33,6 +33,7 @@ jobs:

      - name: Install system packages
        run: |
+          sudo apt-get update
          sudo apt-get install -y portaudio19-dev

      - name: Install dependencies
@@ -40,11 +41,13 @@ jobs:
          uv sync --group dev \
            --extra anthropic \
            --extra aws \
+            --extra deepgram \
            --extra google \
            --extra langchain \
            --extra livekit \
-            --extra local-smart-turn-v3 \
            --extra piper \
+            --extra sagemaker \
+            --extra tracing \
            --extra websocket

      - name: Test with pytest
--- a/.github/workflows/update-docs.yml
+++ b/.github/workflows/update-docs.yml
@@ -0,0 +1,147 @@
+name: Update Documentation on PR Merge
+
+on:
+  pull_request_target:
+    types: [closed]
+    branches: [main]
+    paths:
+      - "src/pipecat/services/**"
+      - "src/pipecat/transports/**"
+      - "src/pipecat/serializers/**"
+      - "src/pipecat/processors/**"
+      - "src/pipecat/audio/**"
+      - "src/pipecat/turns/**"
+      - "src/pipecat/observers/**"
+      - "src/pipecat/pipeline/**"
+  workflow_dispatch:
+    inputs:
+      pr_number:
+        description: "PR number to generate docs for"
+        required: true
+        type: string
+
+jobs:
+  update-docs:
+    if: >-
+      github.event_name == 'workflow_dispatch' ||
+      github.event.pull_request.merged == true
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      contents: read
+      pull-requests: read
+      id-token: write
+    steps:
+      - name: Checkout pipecat
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Checkout docs
+        uses: actions/checkout@v4
+        with:
+          repository: pipecat-ai/docs
+          token: ${{ secrets.DOCS_SYNC_TOKEN }}
+          path: _docs
+
+      - name: Resolve PR number
+        id: pr
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Update documentation
+        uses: anthropics/claude-code-action@v1
+        env:
+          DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          prompt: |
+            You are updating documentation for the pipecat-ai/docs repository based on
+            changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.
+
+            ## Setup
+
+            1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
+            2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
+            3. The docs repository is checked out at `./_docs/`
+
+            ## Get the diff
+
+            Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
+            Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
+            Filter to source files matching the directories listed in SKILL.md Step 3.
+
+            If no relevant source files were changed, exit with "No documentation changes needed."
+
+            ## Follow the skill instructions
+
+            Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:
+
+            ### Docs path
+            Use `./_docs/` — it's already checked out. Do not ask for a path.
+
+            ### Branch management
+            - Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
+            - Work inside `./_docs/` for all doc edits and git operations
+            - Check if the branch already exists on the remote:
+              ```bash
+              cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
+              ```
+              - If it exists: check it out (supports workflow re-runs)
+              - If not: create it from main
+
+            ### Git config
+            Before committing in `_docs`, set:
+            ```bash
+            git config user.name "github-actions[bot]"
+            git config user.email "github-actions[bot]@users.noreply.github.com"
+            ```
+
+            ### No interactive questions
+            Do not ask questions. If you encounter gaps (unmapped files, missing sections,
+            ambiguous changes), note them in the PR body under "## Gaps identified".
+
+            ### Creating the docs PR
+            After committing all changes in `_docs`, push and create a PR:
+            ```bash
+            cd _docs
+            git push -u origin docs/pr-${{ steps.pr.outputs.number }}
+            GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
+              --repo pipecat-ai/docs \
+              --label auto-docs \
+              --title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
+              --body "$(cat <<'BODY'
+            Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).
+
+            ## Changes
+            <summarize each doc page updated and what changed>
+
+            ## Gaps identified
+            <any unmapped files, missing doc pages, or missing sections — or "None">
+            BODY
+            )"
+            ```
+
+            ### Re-run handling
+            If `gh pr create` fails because a PR from that branch already exists,
+            push the updated commits and use `gh pr edit` to update the body instead.
+
+            ### No-op
+            If after analyzing the diff you determine no documentation changes are needed
+            (e.g., only skip-listed files changed, or changes don't affect public API docs),
+            exit cleanly without creating a branch or PR. Output "No documentation changes needed."
+
+            ## Important rules
+            - Only modify files inside `./_docs/` — never modify pipecat source code
+            - Follow the conservative editing rules from SKILL.md Step 6
+            - Read each doc page fully before editing (SKILL.md Guidelines)
+            - Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
+          claude_args: |
+            --model claude-sonnet-4-5-20250929
+            --max-turns 30
+            --allowedTools "Read,Write,Edit,Glob,Grep,Bash"
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -25,7 +25,7 @@ uv run pytest tests/test_name.py
 uv run pytest tests/test_name.py::test_function_name

 # Preview changelog
-towncrier build --draft --version Unreleased
+uv run towncrier build --draft --version Unreleased

 # Lint and format check
 uv run ruff check
@@ -74,7 +74,7 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
 - **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input

 - **Turn Management**: Turn management is done through `LLMUserAggregator` and
-`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
+  `LLMAssistantAggregator`, created with `LLMContextAggregatorPair`

 - **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.

@@ -90,23 +90,26 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:

 ### Key Directories

-| Directory                 | Purpose                                            |
-|---------------------------|----------------------------------------------------|
-| `src/pipecat/frames/`     | Frame definitions (100+ types)                     |
-| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio  |
-| `src/pipecat/pipeline/`   | Pipeline orchestration                             |
-| `src/pipecat/services/`   | AI service integrations (60+ providers)            |
-| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
-| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols        |
-| `src/pipecat/observers/`  | Pipeline observers for monitoring frame flow       |
-| `src/pipecat/audio/`      | VAD, filters, mixers, turn detection, DTMF         |
-| `src/pipecat/turns/`      | User turn management                               |
+| Directory                  | Purpose                                            |
+| -------------------------- | -------------------------------------------------- |
+| `src/pipecat/frames/`      | Frame definitions (100+ types)                     |
+| `src/pipecat/processors/`  | FrameProcessor base + aggregators, filters, audio  |
+| `src/pipecat/pipeline/`    | Pipeline orchestration                             |
+| `src/pipecat/services/`    | AI service integrations (60+ providers)            |
+| `src/pipecat/transports/`  | Transport layer (Daily, LiveKit, WebSocket, Local) |
+| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols        |
+| `src/pipecat/observers/`   | Pipeline observers for monitoring frame flow       |
+| `src/pipecat/audio/`       | VAD, filters, mixers, turn detection, DTMF         |
+| `src/pipecat/turns/`       | User turn management                               |

 ## Code Style

 - **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
 - **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
 - **Type hints**: Required for complex async code.
+- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
+  - `@dataclass`: Frame types, context aggregator pairs, internal data containers
+  - `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params

 ### Docstring Example

@@ -152,7 +155,3 @@ When adding a new service:
 ## Testing

 Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
-
-## Pull Requests
-
-After creating a PR, use `/changelog <pr_number>` to generate the changelog file and `/pr-description <pr_number>` to update the PR description.
--- a/COMMUNITY_INTEGRATIONS.md
+++ b/COMMUNITY_INTEGRATIONS.md
@@ -25,7 +25,6 @@ Your repository must contain these components:
 - **Source code** - Complete implementation following Pipecat patterns
 - **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
 - **README.md** - Must include:
-
  - Introduction and explanation of your integration
  - Installation instructions
  - Usage instructions with Pipecat Pipeline
@@ -66,12 +65,25 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel

 #### Websocket-based Services

+**Base class:** `WebsocketSTTService`
+
+**Use for:** Services where you manage the websocket connection directly. Combines `STTService` with `WebsocketService` for automatic reconnection and keepalive support.
+
+**Examples:**
+
+- [CartesiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/stt.py)
+- [ElevenLabsRealtimeSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/stt.py)
+
+#### SDK-based Streaming Services
+
 **Base class:** `STTService`

+**Use for:** Streaming services where the provider's Python SDK manages the connection internally.
+
 **Examples:**

 - [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
+- [GoogleSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/stt.py)

 #### File-based Services

@@ -109,56 +121,59 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel

 #### Key requirements:

+- **`_process_context(self, context: LLMContext)`** — The main method that processes an LLM context and generates a response. Each LLM service overrides `process_frame` to extract context from `LLMContextFrame` and calls `_process_context`.
+
+- **`adapter_class`** — Class attribute pointing to a `BaseLLMAdapter` subclass. Defaults to `OpenAILLMAdapter`. Non-OpenAI services must implement their own adapter (see `src/pipecat/adapters/base_llm_adapter.py`) with methods:
+  - `get_llm_invocation_params(context)` — Extract provider-specific params from universal context
+  - `to_provider_tools_format(tools_schema)` — Convert standard tools to provider format
+  - `get_messages_for_logging(context)` — Format messages for logging
+  - Reference adapters: `src/pipecat/adapters/services/` (anthropic, gemini, bedrock, etc.)
+
 - **Frame sequence:** Output must follow this frame sequence pattern:
+  - `LLMFullResponseStartFrame` — Signals the start of an LLM response
+  - `LLMTextFrame` — Contains LLM content, typically streamed as tokens
+  - `LLMFullResponseEndFrame` — Signals the end of an LLM response

-  - `LLMFullResponseStartFrame` - Signals the start of an LLM response
-  - `LLMTextFrame` - Contains LLM content, typically streamed as tokens
-  - `LLMFullResponseEndFrame` - Signals the end of an LLM response
+- **Thought frames (reasoning models):** If the model supports extended thinking / chain-of-thought, emit thought frames alongside the response:
+  - `LLMThoughtStartFrame` — Signals the start of a thought
+  - `LLMThoughtTextFrame` — Contains thought content, streamed as tokens
+  - `LLMThoughtEndFrame` — Signals the end of a thought

- **Context aggregation:** Implement context aggregation to collect user and assistant content:
-  - Aggregators come in pairs with a `user()` instance and `assistant()` instance
-  - Context must adhere to the `LLMContext` universal format
-  - Aggregators should handle adding messages, function calls, and images to the context
+- **Context aggregation** is handled by the framework via `LLMContext` + `LLMContextAggregatorPair`. The LLM service just processes context it receives — no need to implement aggregators.

 ### TTS (Text-to-Speech) Services

-#### AudioContextWordTTSService
+#### WebsocketTTSService

-**Use for:** Websocket-based services supporting word/timestamp alignment
+**Use for:** Websocket-based streaming services (with or without word timestamps)

-**Example:**
+**Examples:**

 - [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
+- [ElevenLabsTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)

 #### InterruptibleTTSService

-**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
+**Use for:** Websocket-based services without word timestamps that must reconnect on interruption (e.g. services that don't support a context ID or an interruption message)

 **Example:**

 - [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)

-#### WordTTSService
-
-**Use for:** HTTP-based services supporting word/timestamp alignment
-
-**Example:**
-
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
-
 #### TTSService

-**Use for:** HTTP-based services without word/timestamp alignment
+**Use for:** HTTP-based services (word timestamps are supported in the base class)

-**Example:**
+**Examples:**

 - [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
+- [OpenAITTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/openai/tts.py)

 #### Key requirements:

- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
+- For websocket services, use asyncio WebSocket implementation
 - Handle idle service timeouts with keepalives
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
+- TTS services push both audio (`TTSAudioRawFrame`) and text (`TTSTextFrame`) frames

 ### Telephony Serializers

@@ -202,9 +217,9 @@ Vision services process images and provide analysis such as descriptions, object

 #### Key requirements:

- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
- The method processes the latest image in the context and yields frames with analysis results
- Typically yields `TextFrame` objects containing descriptions or answers
+- Must implement a `run_vision` method that takes a `UserImageRawFrame` and returns an `AsyncGenerator[Frame, None]`
+- The method processes the image frame and yields frames with analysis results
+- Must yield the frame sequence: `VisionFullResponseStartFrame`, `VisionTextFrame`, `VisionFullResponseEndFrame`

 ## Implementation Guidelines

@@ -233,24 +248,137 @@ def can_generate_metrics(self) -> bool:
    return True
 ```

-### Dynamic Settings Updates
+### Service Settings

-STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
+Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
+
+1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
+2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
+
+#### Defining your Settings class
+
+Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:

 ```python
-async def set_language(self, language: Language):
-    """Set the recognition language and reconnect.
+from dataclasses import dataclass, field

-    Args:
-        language: The language to use for speech recognition.
+from pipecat.services.settings import TTSSettings, NOT_GIVEN
+
+@dataclass
+class MyTTSSettings(TTSSettings):
+    """Settings for MyTTS service.
+
+    Parameters:
+        speaking_rate: Speed multiplier (0.5–2.0).
    """
-    logger.info(f"Switching STT language to: [{language}]")
-    self._settings["language"] = language
-    await self._disconnect()
-    await self._connect()
+
+    speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
 ```

-Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
+**What goes in Settings vs. `__init__` params:**
+
+| Belongs in Settings                                      | Stays as `__init__` params                |
+| -------------------------------------------------------- | ----------------------------------------- |
+| Model name, voice, language                              | API keys, auth tokens                     |
+| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides             |
+| Anything users may want to change mid-session            | Audio encoding, sample format             |
+|                                                          | Connection parameters (timeouts, retries) |
+
+The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
+
+#### Wiring settings into `__init__`
+
+Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
+
+Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
+
+```python
+from typing import Optional
+
+class MyTTSService(TTSService):
+    Settings = MyTTSSettings
+    _settings: Settings
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        settings: Optional[Settings] = None,
+        **kwargs,
+    ):
+        # 1. Defaults — every field has a real value (store mode).
+        default_settings = self.Settings(
+            model="my-model-v1",
+            voice="default-voice",
+            language="en",
+            speaking_rate=1.0,
+        )
+
+        # 2. Merge caller overrides (only given fields win).
+        if settings is not None:
+            default_settings.apply_update(settings)
+
+        # 3. Pass the fully-populated settings to the base class.
+        super().__init__(settings=default_settings, **kwargs)
+
+        # 4. Init-only config stored separately.
+        self._api_key = api_key
+```
+
+This pattern lets callers override only what they care about:
+
+```python
+# Uses all defaults
+svc = MyTTSService(api_key="sk-xxx")
+
+# Overrides just the voice — access Settings through the service class
+svc = MyTTSService(
+    api_key="sk-xxx",
+    settings=MyTTSService.Settings(voice="custom-voice"),
+)
+```
+
+#### Reacting to runtime changes
+
+AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
+
+To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
+
+```python
+async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
+    """Apply a settings update, reconfiguring the connection if needed."""
+    changed = await super()._update_settings(update)
+
+    if not changed:
+        return changed
+
+    await self._disconnect()
+    await self._connect()
+
+    return changed
+```
+
+The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
+
+Note that, in this example, the service reconnects whenever any setting changes. Consider, for each setting, whether your service requires reconnection or can apply the change in place.
+
+If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(...)` with the names of the unhandled fields so users get a clear log message:
+
+```python
+async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
+    changed = await super()._update_settings(update)
+
+    if not changed:
+        return changed
+
+    if "language" in changed:
+        await self._update_language()
+    else:
+        # TODO: this should be temporary - handle changes to other settings soon!
+        self._warn_unhandled_updated_settings(changed.keys() - {"language"})
+
+    return changed
+```

 ### Sample Rate Handling

@@ -260,7 +388,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
 async def start(self, frame: StartFrame):
    """Start the service."""
    await super().start(frame)
-    self._settings["output_format"]["sample_rate"] = self.sample_rate
+    self._settings.output_sample_rate = self.sample_rate
    await self._connect()
 ```

@@ -270,7 +398,7 @@ Note that `self.sample_rate` is a `@property` set in the TTSService base class,

 Use Pipecat's tracing decorators:

- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
+- **STT:** `@traced_stt` - decorate `_handle_transcription(self, transcript, is_final, language)` (the standard method name convention)
 - **LLM:** `@traced_llm` - decorate the `_process_context()` method
 - **TTS:** `@traced_tts` - decorate the `run_tts()` method

@@ -292,17 +420,15 @@ For REST-based communication, use aiohttp. Pipecat includes this as a required d
 - Wrap API calls in appropriate try/catch blocks
 - Handle rate limits and network failures gracefully
 - Provide meaningful error messages
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
+- When errors occur, raise exceptions AND push errors to notify the pipeline:

 ```python
-from pipecat.frames.frames import ErrorFrame
-
 try:
    # Your API call
    result = await self._make_api_call()
 except Exception as e:
-    # Push error frame to pipeline
-    await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
+    # Push error upstream to notify the pipeline
+    await self.push_error(f"{self} error: {e}", exception=e)
    # Raise or handle as appropriate
    raise
 ```
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
   ```

 2. Choose the appropriate type:
-
   - `added.md` - New features
   - `changed.md` - Changes in existing functionality
   - `deprecated.md` - Soon-to-be removed features
   - `removed.md` - Removed features
   - `fixed.md` - Bug fixes
+   - `performance.md` - Performance improvements
   - `security.md` - Security fixes
   - `other.md` - Other changes (documentation, dependencies, etc.)

@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en

 ```markdown
 - Updated service configuration:
-
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
 ```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md

 ```markdown
 - Updated service configuration:
-
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
 ```
--- a/README.md
+++ b/README.md
@@ -55,6 +55,20 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt

 Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.

+### 🤖 Claude Code Skills
+
+Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
+
+```
+claude plugin marketplace add pipecat-ai/skills
+```
+
+and install any of the available plugins.
+
+### 🧩 Community Integrations
+
+Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
+
 ### 📺️ Pipecat TV Channel

 Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +85,20 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout

 ## 🧩 Available services

-| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                                                                                                                                                            |
-| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                                                                                                                                                                                                                              |
-| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
-| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
-| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Serializers         | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
-| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| Category            | Services                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Speech-to-Text      | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| LLMs                | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together)                                                                                                                                                                                                                                                                                                                                                         |
+| Text-to-Speech      | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
+| Speech-to-Speech    | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| Transport           | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+| Serializers         | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| Video               | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+| Community           | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |

 📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

@@ -163,6 +178,15 @@ You can get started with Pipecat running on your local machine, then move your a

 > **Note**: Some extras (local, gstreamer) require system dependencies. See documentation if you encounter build errors.

+### Claude Code Skills
+
+Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
+
+```
+claude plugin marketplace add pipecat-ai/pipecat
+claude plugin install pipecat-dev@pipecat-dev-skills
+```
+
 ### Running tests

 To run all tests, from the root directory:
--- a/docs/api/README.md
+++ b/docs/api/README.md
@@ -42,7 +42,7 @@ This script:

 - Creates a fresh virtual environment
 - Installs all dependencies as specified in requirements files
-- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
+- Handles conflicting dependencies (like grpcio versions for Riva)
 - Builds the documentation in an isolated environment
 - Provides detailed logging of the build process

@@ -74,7 +74,6 @@ start _build/html/index.html
 ├── index.rst       # Main documentation entry point
 ├── requirements-base.txt    # Base documentation dependencies
 ├── requirements-riva.txt    # Riva-specific dependencies
-├── requirements-playht.txt  # PlayHT-specific dependencies
 ├── build-docs.sh   # Local build script
 └── rtd-test.py     # ReadTheDocs test build script
 ```
--- a/env.example
+++ b/env.example
@@ -47,7 +47,8 @@ DAILY_ROOM_URL=https://...

 # Deepgram
 DEEPGRAM_API_KEY=...
-SAGEMAKER_ENDPOINT_NAME=...
+SAGEMAKER_STT_ENDPOINT_NAME=...
+SAGEMAKER_TTS_ENDPOINT_NAME=...

 # DeepSeek
 DEEPSEEK_API_KEY=...
@@ -79,15 +80,9 @@ GOOGLE_TEST_CREDENTIALS=...
 # Gradium
 GRAPDIUM_API_KEY=...

-# Grok
-GROK_API_KEY=...
-
 # Groq
 GROQ_API_KEY=...

-# Hathora
-HATHORA_API_KEY=...
-
 # Heygen
 HEYGEN_API_KEY=...
 HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -103,9 +98,14 @@ INWORLD_API_KEY=...
 KRISP_MODEL_PATH=...

 # Krisp Viva
+KRISP_VIVA_API_KEY=...
 KRISP_VIVA_FILTER_MODEL_PATH=...
 KRISP_VIVA_TURN_MODEL_PATH=...

+# LemonSlice
+LEMONSLICE_API_KEY=...
+LEMONSLICE_AGENT_ID=...
+
 # LiveKit
 LIVEKIT_API_KEY=...
 LIVEKIT_API_SECRET=...
@@ -121,9 +121,15 @@ MINIMAX_GROUP_ID=...
 # Mistral
 MISTRAL_API_KEY=...

+# Nebius
+NEBIUS_API_KEY=...
+
 # Neuphonic
 NEUPHONIC_API_KEY=...

+# Novita
+NOVITA_API_KEY=...
+
 # NVIDIA
 NVIDIA_API_KEY=...

@@ -145,10 +151,6 @@ KOALA_ACCESS_KEY=...
 # Piper
 PIPER_BASE_URL=...

-# PlayHT
-PLAYHT_USER_ID=...
-PLAYHT_API_KEY=...
-
 # Plivo
 PLIVO_AUTH_ID=...
 PLIVO_AUTH_TOKEN=...
@@ -177,6 +179,9 @@ SENTRY_DSN=...
 SIMLI_API_KEY=...
 SIMLI_FACE_ID=...

+# Smallest
+SMALLEST_API_KEY=...
+
 # Smart turn
 LOCAL_SMART_TURN_MODEL_PATH=...
 FAL_SMART_TURN_API_KEY=...
@@ -210,3 +215,6 @@ WHATSAPP_TOKEN=...
 WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
 WHATSAPP_PHONE_NUMBER_ID=...
 WHATSAPP_APP_SECRET=...
+
+# xAI / Grok
+XAI_API_KEY=...
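The `env.example` hunks above split `SAGEMAKER_ENDPOINT_NAME` into `SAGEMAKER_STT_ENDPOINT_NAME` and `SAGEMAKER_TTS_ENDPOINT_NAME`, drop the `GROK_API_KEY`, `HATHORA_API_KEY`, and PlayHT entries, and add `XAI_API_KEY` under "xAI / Grok". A minimal sketch of how a local script might flag stale names in an existing environment is shown here. The `RENAMED` table and the `find_stale_vars` helper are hypothetical and not part of Pipecat, and pairing `GROK_API_KEY` with `XAI_API_KEY` is an assumption read off the diff.

```python
import os
from typing import Mapping

# Hypothetical table derived from the env.example hunks:
# old variable name -> names now listed in its place (empty tuple = removed).
RENAMED: dict[str, tuple[str, ...]] = {
    "SAGEMAKER_ENDPOINT_NAME": (
        "SAGEMAKER_STT_ENDPOINT_NAME",
        "SAGEMAKER_TTS_ENDPOINT_NAME",
    ),
    "GROK_API_KEY": ("XAI_API_KEY",),  # assumption: xAI key supersedes the Grok key
    "HATHORA_API_KEY": (),
    "PLAYHT_USER_ID": (),
    "PLAYHT_API_KEY": (),
}


def find_stale_vars(env: Mapping[str, str]) -> dict[str, tuple[str, ...]]:
    """Return stale variables that are still set, with their suggested replacements."""
    return {old: new for old, new in RENAMED.items() if old in env}


if __name__ == "__main__":
    for old, new in find_stale_vars(os.environ).items():
        hint = ", ".join(new) if new else "no replacement listed in env.example"
        print(f"{old} is no longer listed; see: {hint}")
```

Passing a plain mapping instead of reading `os.environ` inside the helper keeps it easy to check against a parsed `.env` file as well.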
--- a/examples/foundational/01-say-one-thing-piper.py
+++ b/examples/foundational/01-say-one-thing-piper.py
@@ -39,7 +39,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        tts = PiperHttpTTSService(
-            base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
+            base_url=os.getenv("PIPER_BASE_URL"),
+            aiohttp_session=session,
+            sample_rate=24000,
        )

        task = PipelineTask(
--- a/examples/foundational/01-say-one-thing-rime.py
+++ b/examples/foundational/01-say-one-thing-rime.py
@@ -39,8 +39,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
-            voice_id="rex",
            aiohttp_session=session,
+            settings=RimeHttpTTSService.Settings(
+                voice="rex",
+            ),
        )

        task = PipelineTask(
--- a/examples/foundational/01-say-one-thing.py
+++ b/examples/foundational/01-say-one-thing.py
@@ -37,7 +37,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    task = PipelineTask(
--- a/examples/foundational/01a-local-audio.py
+++ b/examples/foundational/01a-local-audio.py
@@ -29,7 +29,9 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    pipeline = Pipeline([tts, transport.output()])
--- a/examples/foundational/01b-livekit-audio.py
+++ b/examples/foundational/01b-livekit-audio.py
@@ -37,7 +37,9 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    runner = PipelineRunner()
--- a/examples/foundational/02-llm-say-one-thing.py
+++ b/examples/foundational/02-llm-say-one-thing.py
@@ -39,17 +39,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-    messages = [
-        {
-            "role": "system",
-            "content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
-        }
-    ]
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    task = PipelineTask(
        Pipeline([llm, tts, transport.output()]),
@@ -59,7 +59,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Register an event handler so we can play the audio when the client joins
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
-        await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
+        context = LLMContext()
+        context.add_message({"role": "developer", "content": "Say hello to the world."})
+        await task.queue_frames([LLMContextFrame(context), EndFrame()])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

--- a/examples/foundational/03-still-frame.py
+++ b/examples/foundational/03-still-frame.py
@@ -45,7 +45,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # Create an HTTP session
    async with aiohttp.ClientSession() as session:
        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
--- a/examples/foundational/03a-local-still-frame.py
+++ b/examples/foundational/03a-local-still-frame.py
@@ -37,7 +37,9 @@ async def main():
        )

        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
--- a/examples/foundational/04-transports-small-webrtc.py
+++ b/examples/foundational/04-transports-small-webrtc.py
@@ -67,19 +67,19 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,9 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
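The example hunks above share one pattern: the leading `system` message moves into `system_instruction` on the LLM service settings, the context starts empty, and later kick-off prompts are added with the `developer` role. A rough sketch of that split on plain message dicts follows. It does not import Pipecat; `split_legacy_messages` is a hypothetical helper, and treating only the first message as the system prompt is an assumption.

```python
from typing import Optional

Message = dict[str, str]


def split_legacy_messages(
    messages: list[Message],
) -> tuple[Optional[str], list[Message]]:
    """Split a legacy message list into (system_instruction, remaining messages).

    Assumption: a leading "system" message is the system prompt; any later
    "system" message is a mid-conversation instruction and becomes "developer".
    """
    system_instruction: Optional[str] = None
    rest = list(messages)  # copy so the caller's list is left untouched
    if rest and rest[0].get("role") == "system":
        system_instruction = rest.pop(0)["content"]

    converted = [
        {**m, "role": "developer"} if m.get("role") == "system" else dict(m)
        for m in rest
    ]
    return system_instruction, converted


legacy = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Please introduce yourself to the user."},
]
instruction, remaining = split_legacy_messages(legacy)
```

In terms of the diff, `instruction` corresponds to the value passed as `system_instruction`, and each entry in `remaining` corresponds to a `context.add_message(...)` call.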
--- a/examples/foundational/04a-transports-daily.py
+++ b/examples/foundational/04a-transports-daily.py
@@ -50,19 +50,19 @@ async def main():

        tts = CartesiaTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            settings=CartesiaTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -91,7 +91,9 @@ async def main():
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_participant_left")
--- a/examples/foundational/04b-transports-livekit.py
+++ b/examples/foundational/04b-transports-livekit.py
@@ -55,24 +55,21 @@ async def main():

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. "
-            "Your goal is to demonstrate your capabilities in a succinct way. "
-            "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-            "Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/05-sync-speech-and-image.py
+++ b/examples/foundational/05-sync-speech-and-image.py
@@ -16,11 +16,12 @@ from pipecat.frames.frames import (
    Frame,
    LLMContextFrame,
    LLMFullResponseStartFrame,
+    OutputImageRawFrame,
    TextFrame,
 )
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
+from pipecat.pipeline.sync_parallel_pipeline import FrameOrder, SyncParallelPipeline
 from pipecat.pipeline.task import PipelineTask
 from pipecat.processors.aggregators.llm_context import LLMContext
 from pipecat.processors.aggregators.sentence import SentenceAggregator
@@ -30,6 +31,7 @@ from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaHttpTTSService
 from pipecat.services.fal.image import FalImageGenService
 from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.tts_service import TextAggregationMode
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams

@@ -44,6 +46,18 @@ class MonthFrame(DataFrame):
        return f"{self.name}(month: {self.month})"


+class MarkImageForPlaybackSync(FrameProcessor):
+    """Marks output image frames to be synchronized with audio playback."""
+
+    async def process_frame(self, frame: Frame, direction: FrameDirection):
+        await super().process_frame(frame, direction)
+
+        if isinstance(frame, OutputImageRawFrame):
+            frame.sync_with_audio = True
+
+        await self.push_frame(frame, direction)
+
+
 class MonthPrepender(FrameProcessor):
    def __init__(self):
        super().__init__()
@@ -98,11 +112,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = CartesiaHttpTTSService(
            api_key=os.getenv("CARTESIA_API_KEY"),
-            voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            settings=CartesiaHttpTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
+            # No need to aggregate by sentences (the default): the input already arrives as full sentences.
+            # Otherwise the service would wait for follow-up input to confirm each sentence is complete,
+            # which breaks the synchronization mechanism.
+            text_aggregation_mode=TextAggregationMode.TOKEN,
        )

        imagegen = FalImageGenService(
-            params=FalImageGenService.InputParams(image_size="square_hd"),
+            settings=FalImageGenService.Settings(
+                image_size="square_hd",
+            ),
            aiohttp_session=session,
            key=os.getenv("FAL_KEY"),
        )
@@ -115,17 +137,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        # that, each pipeline runs concurrently and `SyncParallelPipeline` will
        # wait for the input frame to be processed.
        #
+        # We use `FrameOrder.PIPELINE` so that each synchronized batch of output
+        # frames is pushed in the order the pipelines are listed: image first,
+        # then audio. This ensures the transport receives the image before the
+        # audio frames it should accompany.
+        #
        # Note that `SyncParallelPipeline` requires the last processor in each
        # of the pipelines to be synchronous. In this case, we use
-        # `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
+        # `FalImageGenService` and `CartesiaHttpTTSService` which make HTTP
        # requests and wait for the response.
        pipeline = Pipeline(
            [
                llm,  # LLM
                sentence_aggregator,  # Aggregates LLM output into full sentences
                SyncParallelPipeline(  # Run pipelines in parallel aggregating the result
+                    [
+                        imagegen,  # Generate image
+                        MarkImageForPlaybackSync(),  # Mark image for sync with audio during playback
+                    ],
                    [month_prepender, tts],  # Create "Month: sentence" and output audio
-                    [imagegen],  # Generate image
+                    frame_order=FrameOrder.PIPELINE,
                ),
                transport.output(),  # Transport output
            ]
@@ -148,7 +179,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        ]:
            messages = [
                {
-                    "role": "system",
+                    "role": "user",
                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
                }
            ]
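The comment in the hunk above describes `FrameOrder.PIPELINE` as pushing each synchronized batch in the order the pipelines are listed (image first, then audio), even though the branches run concurrently. A toy sketch of that ordering idea with plain `asyncio` is given here. It is not Pipecat's implementation; `run_branch`, `sync_parallel`, and the delays are made up for illustration.

```python
import asyncio


async def run_branch(name: str, item: str, delay: float) -> list[str]:
    """Stand-in for one branch pipeline: waits, then returns its output frames."""
    await asyncio.sleep(delay)
    return [f"{name}({item})"]


async def sync_parallel(item: str, branches: list[tuple[str, float]]) -> list[str]:
    """Run all branches concurrently, then emit their outputs in listed order."""
    # asyncio.gather returns results in argument order, not completion order.
    results = await asyncio.gather(
        *(run_branch(name, item, delay) for name, delay in branches)
    )
    return [frame for branch_frames in results for frame in branch_frames]


async def main() -> list[str]:
    output: list[str] = []
    for sentence in ["s1", "s2"]:
        # The image branch is listed first but finishes last.
        output += await sync_parallel(sentence, [("image", 0.02), ("audio", 0.01)])
    return output


frames = asyncio.run(main())
print(frames)  # ['image(s1)', 'audio(s1)', 'image(s2)', 'audio(s2)']
```

The slower image branch still comes first in every batch because the batch is only released once all branches have finished, which mirrors the "image before the audio frames it should accompany" goal stated in the comment.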
--- a/examples/foundational/05a-local-sync-speech-and-image.py
+++ b/examples/foundational/05a-local-sync-speech-and-image.py
@@ -1,198 +0,0 @@
-#
-# Copyright (c) 2024-2026, Daily
-#
-# SPDX-License-Identifier: BSD 2-Clause License
-#
-
-import asyncio
-import os
-import sys
-import tkinter as tk
-
-import aiohttp
-from dotenv import load_dotenv
-from loguru import logger
-
-from pipecat.frames.frames import (
-    Frame,
-    LLMContextFrame,
-    OutputAudioRawFrame,
-    TextFrame,
-    TTSAudioRawFrame,
-    URLImageRawFrame,
-)
-from pipecat.pipeline.pipeline import Pipeline
-from pipecat.pipeline.runner import PipelineRunner
-from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
-from pipecat.pipeline.task import PipelineTask
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.sentence import SentenceAggregator
-from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
-from pipecat.services.cartesia.tts import CartesiaHttpTTSService
-from pipecat.services.fal.image import FalImageGenService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
-
-load_dotenv(override=True)
-
-logger.remove(0)
-logger.add(sys.stderr, level="DEBUG")
-
-
-async def main():
-    async with aiohttp.ClientSession() as session:
-        tk_root = tk.Tk()
-        tk_root.title("Calendar")
-
-        runner = PipelineRunner()
-
-        async def get_month_data(month):
-            messages = [
-                {
-                    "role": "system",
-                    "content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
-                }
-            ]
-
-            class ImageDescription(FrameProcessor):
-                def __init__(self):
-                    super().__init__()
-                    self.text = ""
-
-                async def process_frame(self, frame: Frame, direction: FrameDirection):
-                    await super().process_frame(frame, direction)
-
-                    if isinstance(frame, TextFrame):
-                        self.text = frame.text
-                    await self.push_frame(frame, direction)
-
-            class AudioGrabber(FrameProcessor):
-                def __init__(self):
-                    super().__init__()
-                    self.audio = bytearray()
-                    self.frame = None
-
-                async def process_frame(self, frame: Frame, direction: FrameDirection):
-                    await super().process_frame(frame, direction)
-
-                    if isinstance(frame, TTSAudioRawFrame):
-                        self.audio.extend(frame.audio)
-                        self.frame = OutputAudioRawFrame(
-                            bytes(self.audio), frame.sample_rate, frame.num_channels
-                        )
-                    await self.push_frame(frame, direction)
-
-            class ImageGrabber(FrameProcessor):
-                def __init__(self):
-                    super().__init__()
-                    self.frame = None
-
-                async def process_frame(self, frame: Frame, direction: FrameDirection):
-                    await super().process_frame(frame, direction)
-
-                    if isinstance(frame, URLImageRawFrame):
-                        self.frame = frame
-                    await self.push_frame(frame, direction)
-
-            llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
-
-            tts = CartesiaHttpTTSService(
-                api_key=os.getenv("CARTESIA_API_KEY"),
-                voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-            )
-
-            imagegen = FalImageGenService(
-                params=FalImageGenService.InputParams(image_size="square_hd"),
-                aiohttp_session=session,
-                key=os.getenv("FAL_KEY"),
-            )
-
-            sentence_aggregator = SentenceAggregator()
-
-            description = ImageDescription()
-
-            audio_grabber = AudioGrabber()
-
-            image_grabber = ImageGrabber()
-
-            # With `SyncParallelPipeline` we synchronize audio and images by
-            # pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
-            # I3 A3). To do that, each pipeline runs concurrently and
-            # `SyncParallelPipeline` will wait for the input frame to be
-            # processed.
-            #
-            # Note that `SyncParallelPipeline` requires the last processor in
-            # each of the pipelines to be synchronous. In this case, we use
-            # `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
-            # requests and wait for the response.
-            pipeline = Pipeline(
-                [
-                    llm,  # LLM
-                    sentence_aggregator,  # Aggregates LLM output into full sentences
-                    description,  # Store sentence
-                    SyncParallelPipeline(
-                        [tts, audio_grabber],  # Generate and store audio for the given sentence
-                        [imagegen, image_grabber],  # Generate and storeimage for the given sentence
-                    ),
-                ]
-            )
-
-            task = PipelineTask(pipeline)
-            await task.queue_frame(LLMContextFrame(LLMContext(messages)))
-            await task.stop_when_done()
-
-            await runner.run(task)
-
-            return {
-                "month": month,
-                "text": description.text,
-                "image": image_grabber.frame,
-                "audio": audio_grabber.frame,
-            }
-
-        transport = TkLocalTransport(
-            tk_root,
-            TkTransportParams(
-                audio_out_enabled=True,
-                video_out_enabled=True,
-                video_out_width=1024,
-                video_out_height=1024,
-            ),
-        )
-
-        pipeline = Pipeline([transport.output()])
-
-        task = PipelineTask(pipeline)
-
-        # We only specify a few months as we create tasks all at once and we
-        # might get rate limited otherwise.
-        months: list[str] = [
-            "January",
-            "February",
-        ]
-
-        # We create one task per month. This will be executed concurrently.
-        month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
-
-        # Now we wait for each month task in the order they're completed. The
-        # benefit is we'll have as little delay as possible before the first
-        # month, and likely no delay between months, but the months won't
-        # display in order.
-        async def show_images(month_tasks):
-            for month_data_task in asyncio.as_completed(month_tasks):
-                data = await month_data_task
-                await task.queue_frames([data["image"], data["audio"]])
-
-            await runner.stop_when_done()
-
-        async def run_tk():
-            while not task.has_finished():
-                tk_root.update()
-                tk_root.update_idletasks()
-                await asyncio.sleep(0.1)
-
-        await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
--- a/examples/foundational/06-listen-and-respond.py
+++ b/examples/foundational/06-listen-and-respond.py
@@ -83,21 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    ml = MetricsLogger()

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,7 +129,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/06a-image-sync.py
+++ b/examples/foundational/06a-image-sync.py
@@ -100,19 +100,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/07-interruptible-cartesia-http.py
+++ b/examples/foundational/07-interruptible-cartesia-http.py
@@ -6,6 +6,7 @@

 import os

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -52,64 +53,68 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
+    async with aiohttp.ClientSession() as session:
+        stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))

-    tts = CartesiaHttpTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        tts = CartesiaHttpTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            aiohttp_session=session,
+            settings=CartesiaHttpTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        context = LLMContext()
+        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+            context,
+            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        )

-    context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
+        @transport.event_handler("on_client_connected")
+        async def on_client_connected(transport, client):
+            logger.info(f"Client connected")
+            # Kick off the conversation.
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
+            await task.queue_frames([LLMRunFrame()])

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
+        @transport.event_handler("on_client_disconnected")
+        async def on_client_disconnected(transport, client):
+            logger.info(f"Client disconnected")
+            await task.cancel()

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
+        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
+        await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07-interruptible-openai-responses.py
+++ b/examples/foundational/07-interruptible-openai-responses.py
@@ -4,7 +4,6 @@
 # SPDX-License-Identifier: BSD 2-Clause License
 #

-
 import os

 from dotenv import load_dotenv
@@ -22,9 +21,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.playht.tts import PlayHTHttpTTSService
+from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -54,22 +53,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = PlayHTHttpTTSService(
-        user_id=os.getenv("PLAYHT_USER_ID"),
-        api_key=os.getenv("PLAYHT_API_KEY"),
-        voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAIResponsesLLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAIResponsesLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07-interruptible.py
+++ b/examples/foundational/07-interruptible.py
@@ -55,19 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07a-interruptible-speechmatics-vad.py
+++ b/examples/foundational/07a-interruptible-speechmatics-vad.py
@@ -21,7 +21,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
 from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -93,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
+            settings=SpeechmaticsSTTService.Settings(
                language=Language.EN,
                turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
                # focus_speakers=["S1"],
@@ -104,32 +103,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
+            settings=SpeechmaticsTTSService.Settings(
+                voice="sarah",
+            ),
            aiohttp_session=session,
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
+            settings=OpenAILLMService.Settings(
+                temperature=0.75,
+                system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
+            ),
        )

-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
-                    "Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
-                ),
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -160,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
+            context.add_message({"role": "developer", "content": "Say a short hello to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07a-interruptible-speechmatics.py
+++ b/examples/foundational/07a-interruptible-speechmatics.py
@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.base_llm import BaseOpenAILLMService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
 from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -76,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            params=SpeechmaticsSTTService.InputParams(
+            settings=SpeechmaticsSTTService.Settings(
                language=Language.EN,
                speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
            ),
@@ -84,31 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
-            voice_id="sarah",
+            settings=SpeechmaticsTTSService.Settings(
+                voice="sarah",
+            ),
            aiohttp_session=session,
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
-            params=BaseOpenAILLMService.InputParams(temperature=0.75),
+            settings=OpenAILLMService.Settings(
+                temperature=0.75,
+                system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
+            ),
        )

-        messages = [
-            {
-                "role": "system",
-                "content": (
-                    "You are a helpful British assistant called Sarah. "
-                    "Your goal is to demonstrate your capabilities in a succinct way. "
-                    "Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
-                    "Always include punctuation in your responses. "
-                    "Give very short replies - do not give longer replies unless strictly necessary. "
-                    "Respond to what the user said in a concise, funny, creative and helpful way. "
-                    "Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
-                ),
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -139,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Say a short hello to the user."})
+            context.add_message({"role": "developer", "content": "Say a short hello to the user."})
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07b-interruptible-langchain.py
+++ b/examples/foundational/07b-interruptible-langchain.py
@@ -71,15 +71,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
-                "Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
-                "Your response will be synthesized to voice and those characters will create unnatural sounds.",
+                "You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
            ),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
--- a/examples/foundational/07c-interruptible-deepgram-flux-sagemaker.py
+++ b/examples/foundational/07c-interruptible-deepgram-flux-sagemaker.py
@@ -0,0 +1,151 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
+from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
+from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
+
+load_dotenv(override=True)
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting bot")
+
+    # Initialize Deepgram Flux SageMaker STT Service
+    # This requires:
+    # - AWS credentials configured (via environment variables or AWS CLI)
+    # - A deployed SageMaker endpoint with a Deepgram Flux model
+    stt = DeepgramFluxSageMakerSTTService(
+        endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
+        region=os.getenv("AWS_REGION"),
+        settings=DeepgramFluxSageMakerSTTService.Settings(
+            min_confidence=0.3,
+        ),
+    )
+
+    # Initialize Deepgram SageMaker TTS Service
+    # This requires:
+    # - AWS credentials configured (via environment variables or AWS CLI)
+    # - A deployed SageMaker endpoint with a Deepgram TTS model
+    tts = DeepgramSageMakerTTSService(
+        endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
+        region=os.getenv("AWS_REGION"),
+        settings=DeepgramSageMakerTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )
+
+    llm = AWSBedrockLLMService(
+        aws_region=os.getenv("AWS_REGION"),
+        settings=AWSBedrockLLMSettings(
+            model="us.amazon.nova-pro-v1:0",
+            temperature=0.8,
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+
+    context = LLMContext()
+    # Use ExternalUserTurnStrategies since Flux handles turn detection natively
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=ExternalUserTurnStrategies(),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        # Kick off the conversation.
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    @stt.event_handler("on_update")
+    async def on_deepgram_flux_update(stt, transcript):
+        logger.debug(f"On deepgram flux update: {transcript}")
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/07c-interruptible-deepgram-flux.py
+++ b/examples/foundational/07c-interruptible-deepgram-flux.py
@@ -10,6 +10,7 @@ import os
 from dotenv import load_dotenv
 from loguru import logger

+from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -55,24 +56,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramFluxSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
+        settings=DeepgramFluxSTTService.Settings(
+            min_confidence=0.3,
+        ),
    )

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=ExternalUserTurnStrategies(),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
    )

    pipeline = Pipeline(
@@ -100,7 +109,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-http.py
+++ b/examples/foundational/07c-interruptible-deepgram-http.py
@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = DeepgramHttpTTSService(
            api_key=os.getenv("DEEPGRAM_API_KEY"),
-            voice="aura-2-andromeda-en",
+            settings=DeepgramHttpTTSService.Settings(
+                voice="aura-2-andromeda-en",
+            ),
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-sagemaker.py
+++ b/examples/foundational/07c-interruptible-deepgram-sagemaker.py
@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.aws.llm import AWSBedrockLLMService
-from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
-from pipecat.services.deepgram.tts import DeepgramTTSService
+from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
+from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
+from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -58,26 +58,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    # - AWS credentials configured (via environment variables or AWS CLI)
    # - A deployed SageMaker endpoint with Deepgram model
    stt = DeepgramSageMakerSTTService(
-        endpoint_name=os.getenv("SAGEMAKER_ENDPOINT_NAME"),
+        endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
        region=os.getenv("AWS_REGION"),
    )

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    # Initialize Deepgram SageMaker TTS Service
+    # This requires:
+    # - AWS credentials configured (via environment variables or AWS CLI)
+    # - A deployed SageMaker endpoint with a Deepgram TTS model
+    tts = DeepgramSageMakerTTSService(
+        endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
+        region=os.getenv("AWS_REGION"),
+        settings=DeepgramSageMakerTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

    llm = AWSBedrockLLMService(
        aws_region=os.getenv("AWS_REGION"),
-        model="us.amazon.nova-pro-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        settings=AWSBedrockLLMSettings(
+            model="us.amazon.nova-pro-v1:0",
+            temperature=0.8,
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +114,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram-vad.py
+++ b/examples/foundational/07c-interruptible-deepgram-vad.py
@@ -7,7 +7,6 @@

 import os

-from deepgram import LiveOptions
 from dotenv import load_dotenv
 from loguru import logger

@@ -56,21 +55,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
-        live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
+        settings=DeepgramSTTService.Settings(
+            vad_events=True,
+            utterance_end_ms="1000",
+        ),
    )

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -101,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07c-interruptible-deepgram.py
+++ b/examples/foundational/07c-interruptible-deepgram.py
@@ -55,18 +55,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-2-andromeda-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs-http.py
+++ b/examples/foundational/07d-interruptible-elevenlabs-http.py
@@ -63,20 +63,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = ElevenLabsHttpTTSService(
            api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-            voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
            aiohttp_session=session,
+            settings=ElevenLabsHttpTTSService.Settings(
+                voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +107,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07d-interruptible-elevenlabs.py
+++ b/examples/foundational/07d-interruptible-elevenlabs.py
@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = ElevenLabsTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY", ""),
-        voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
+        settings=ElevenLabsTTSService.Settings(
+            voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07e-interruptible-xai.py
+++ b/examples/foundational/07e-interruptible-xai.py
@@ -0,0 +1,128 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import os
+
+import aiohttp
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.xai.llm import GrokLLMService
+from pipecat.services.xai.tts import XAIHttpTTSService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+
+load_dotenv(override=True)
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    async with aiohttp.ClientSession() as session:
+        stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+        tts = XAIHttpTTSService(
+            api_key=os.getenv("XAI_API_KEY"),
+            aiohttp_session=session,
+            settings=XAIHttpTTSService.Settings(
+                voice="eve",
+            ),
+        )
+
+        llm = GrokLLMService(
+            api_key=os.getenv("XAI_API_KEY"),
+            settings=GrokLLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )
+
+        context = LLMContext()
+        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+            context,
+            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        )
+
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )
+
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        )
+
+        @transport.event_handler("on_client_connected")
+        async def on_client_connected(transport, client):
+            logger.info(f"Client connected")
+            # Kick off the conversation.
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
+            await task.queue_frames([LLMRunFrame()])
+
+        @transport.event_handler("on_client_disconnected")
+        async def on_client_disconnected(transport, client):
+            logger.info(f"Client disconnected")
+            await task.cancel()
+
+        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+        await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/07f-interruptible-azure-http.py
+++ b/examples/foundational/07f-interruptible-azure-http.py
@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = AzureLLMService(
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
-        model=os.getenv("AZURE_CHATGPT_MODEL"),
+        settings=AzureLLMService.Settings(
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07f-interruptible-azure.py
+++ b/examples/foundational/07f-interruptible-azure.py
@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = AzureLLMService(
        api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
        endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
-        model=os.getenv("AZURE_CHATGPT_MODEL"),
+        settings=AzureLLMService.Settings(
+            model=os.getenv("AZURE_CHATGPT_MODEL"),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai-http.py
+++ b/examples/foundational/07g-interruptible-openai-http.py
@@ -11,7 +11,6 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
 from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
@@ -55,22 +54,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = OpenAISTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4o-transcribe",
-        prompt="Expect words related to dogs, such as breed names.",
+        settings=OpenAISTTService.Settings(
+            model="gpt-4o-transcribe",
+            prompt="Expect words related to dogs, such as breed names.",
+        ),
    )

-    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
+    tts = OpenAITTSService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAITTSService.Settings(
+            voice="ballad",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are very knowledgeable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07g-interruptible-openai.py
+++ b/examples/foundational/07g-interruptible-openai.py
@@ -55,27 +55,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = OpenAIRealtimeSTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4o-transcribe",
-        prompt="Expect words related to dogs, such as breed names.",
-        language=Language.EN,
-        # Uses local VAD by default.
-        # To enable server-side VAD, set turn_detection=None or
-        # a dict with server_vad settings.
-        # turn_detection={"type": "server_vad", "threshold": 0.5},
+        settings=OpenAIRealtimeSTTService.Settings(
+            model="gpt-4o-transcribe",
+            prompt="Expect words related to dogs, such as breed names.",
+            language=Language.EN,
+        ),
    )

-    tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
+    tts = OpenAITTSService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAITTSService.Settings(
+            voice="ballad",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are very knowledgeable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +108,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07h-interruptible-openpipe.py
+++ b/examples/foundational/07h-interruptible-openpipe.py
@@ -57,7 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    timestamp = int(time.time())
@@ -65,16 +67,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        api_key=os.getenv("OPENAI_API_KEY"),
        openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
        tags={"conversation_id": f"pipecat-{timestamp}"},
+        settings=OpenPipeLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07i-interruptible-xtts.py
+++ b/examples/foundational/07i-interruptible-xtts.py
@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = XTTSService(
            aiohttp_session=session,
-            voice_id="Claribel Dervla",
+            settings=XTTSService.Settings(
+                voice="Claribel Dervla",
+            ),
            base_url="http://localhost:8000",
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07j-interruptible-gladia-vad.py
+++ b/examples/foundational/07j-interruptible-gladia-vad.py
@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
+from pipecat.services.gladia.config import LanguageConfig
 from pipecat.services.gladia.stt import GladiaSTTService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transcriptions.language import Language
@@ -58,7 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GladiaSTTService(
        api_key=os.getenv("GLADIA_API_KEY", ""),
        region=os.getenv("GLADIA_REGION"),
-        params=GladiaInputParams(
+        settings=GladiaSTTService.Settings(
            language_config=LanguageConfig(
                languages=[Language.EN],
            ),
@@ -68,19 +68,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY", ""),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY", ""),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
@@ -114,7 +114,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07j-interruptible-gladia.py
+++ b/examples/foundational/07j-interruptible-gladia.py
@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
-from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
+from pipecat.services.gladia.config import LanguageConfig
 from pipecat.services.gladia.stt import GladiaSTTService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transcriptions.language import Language
@@ -57,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GladiaSTTService(
        api_key=os.getenv("GLADIA_API_KEY", ""),
        region=os.getenv("GLADIA_REGION"),
-        params=GladiaInputParams(
+        settings=GladiaSTTService.Settings(
            language_config=LanguageConfig(
                languages=[Language.EN],
            )
@@ -66,19 +66,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY", ""),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY", ""),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07k-interruptible-lmnt.py
+++ b/examples/foundational/07k-interruptible-lmnt.py
@@ -54,18 +54,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
+    tts = LmntTTSService(
+        api_key=os.getenv("LMNT_API_KEY"),
+        settings=LmntTTSService.Settings(
+            voice="morgan",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07l-interruptible-groq.py
+++ b/examples/foundational/07l-interruptible-groq.py
@@ -55,19 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))

    llm = GroqLLMService(
-        api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
+        api_key=os.getenv("GROQ_API_KEY"),
+        settings=GroqLLMService.Settings(
+            model="llama-3.1-8b-instant",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

    tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +95,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07m-interruptible-aws-strands.py
+++ b/examples/foundational/07m-interruptible-aws-strands.py
@@ -95,13 +95,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = AWSPollyTTSService(
        region="us-west-2",  # only specific regions support generative TTS
-        voice_id="Joanna",
-        params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
+        settings=AWSPollyTTSService.Settings(
+            voice="Joanna",
+            engine="generative",
+            rate="1.1",
+        ),
    )

    # Create Strands agent processor
    try:
-        agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
+        agent = build_agent(model_id="us.anthropic.claude-sonnet-4-6", max_tokens=8000)
        llm = StrandsAgentsProcessor(agent=agent)
        logger.info("Successfully created Strands agent for NAB customer service coaching")
    except Exception as e:
@@ -148,8 +151,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
                LLMMessagesAppendFrame(
                    messages=[
                        {
-                            "role": "user",
-                            "content": f"Greet the user and introduce yourself.",
+                            "role": "developer",
+                            "content": f"Greet the user and introduce yourself. Don't use emojis.",
                        }
                    ],
                    run_llm=True,
--- a/examples/foundational/07m-interruptible-aws.py
+++ b/examples/foundational/07m-interruptible-aws.py
@@ -54,24 +54,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = AWSPollyTTSService(
        region="us-west-2",  # only specific regions support generative TTS
-        voice_id="Joanna",
-        params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
+        settings=AWSPollyTTSService.Settings(
+            voice="Joanna",
+            engine="generative",
+            rate="1.1",
+        ),
    )

    llm = AWSBedrockLLMService(
        aws_region="us-west-2",
-        model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        settings=AWSBedrockLLMService.Settings(
+            model="us.anthropic.claude-sonnet-4-6",
+            temperature=0.8,
+            # system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +101,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "user", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini-image.py
+++ b/examples/foundational/07n-interruptible-gemini-image.py
@@ -70,30 +70,30 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+        ),
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash-image",
-        # model="gemini-3-pro-image-preview", # A more powerful model, but slower
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash-image",
+            # model="gemini-3-pro-image-preview",  # A more powerful model, but slower
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -124,7 +124,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation with a styled introduction
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-gemini.py
+++ b/examples/foundational/07n-interruptible-gemini.py
@@ -54,15 +54,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot with Gemini TTS")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    tts = GeminiTTSService(
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
-        model="gemini-2.5-flash-tts",
-        voice_id="Charon",
-        params=GeminiTTSService.InputParams(
+        settings=GeminiTTSService.Settings(
+            model="gemini-2.5-flash-tts",
+            voice="Charon",
            language=Language.EN_US,
            prompt="You are a helpful AI assistant. Speak in a natural, conversational tone.",
        ),
@@ -71,13 +73,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini-2.5-flash",
-    )
-
-    # System message that instructs the AI on how to speak
-    messages = [
-        {
-            "role": "system",
-            "content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
+        settings=GoogleLLMService.Settings(
+            system_instruction="""You are a helpful assistant in a voice conversation.

            IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
            - [sigh] - Insert a sigh sound
@@ -94,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            - "[whispering] Let me tell you a secret."
            - "The answer is... [long pause] ...42!"

-            Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
-        },
-    ]
+            Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise. Respond to what the user said in a creative and helpful way.""",
+        ),
+    )

-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,9 +126,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation
-        messages.append(
+        context.add_message(
            {
-                "role": "system",
+                "role": "developer",
                "content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
            }
        )
--- a/examples/foundational/07n-interruptible-google-http.py
+++ b/examples/foundational/07n-interruptible-google-http.py
@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+            # Add model to use a specific model
+            # model="chirp_3",
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
        location="us",
    )

    tts = GoogleHttpTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleHttpTTSService.InputParams(language=Language.EN_US),
+        settings=GoogleHttpTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07n-interruptible-google.py
+++ b/examples/foundational/07n-interruptible-google.py
@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GoogleSTTService(
-        params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
+        settings=GoogleSTTService.Settings(
+            languages=[Language.EN_US],
+            # Add model to use a specific model
+            # model="chirp_3",
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
        location="us",
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
-        params=GoogleTTSService.InputParams(language=Language.EN_US),
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
+++ b/examples/foundational/07o-interruptible-assemblyai-turn-detection.py
@@ -0,0 +1,180 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.assemblyai.stt import AssemblyAISTTService
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
+
+load_dotenv(override=True)
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    """AssemblyAI u3-rt-pro with Built-in Turn Detection
+
+    This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
+    with AssemblyAI's built-in turn detection for more natural conversation flow.
+
+    Key features:
+
+    1. AssemblyAI Turn Detection
+       - Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
+       - AssemblyAI's model determines when the user starts/stops speaking
+       - Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
+       - More natural turn detection based on speech patterns and pauses
+
+    2. Advanced Turn Detection Tuning
+       - `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
+         Lower values = faster responses. Default: 100ms
+       - `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
+         Prevents long pauses. Default: 1000ms
+
+    3. Prompt-Based Transcription Enhancement
+       - Use the `prompt` parameter to improve accuracy for specific names/terms
+       - Particularly useful for proper nouns, technical terms, domain vocabulary
+       - Example: "Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth."
+
+    4. Speaker Diarization (Optional)
+       - Enable with `speaker_labels=True`
+       - Automatically identifies different speakers in multi-party conversations
+       - TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")
+
+    5. Language Detection (Optional, multilingual model only)
+       - Enable with `language_detection=True`
+       - Automatically detects spoken language
+       - Available with universal-streaming-multilingual model
+
+    For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
+    """
+    logger.info(f"Starting bot")
+
+    stt = AssemblyAISTTService(
+        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
+        vad_force_turn_endpoint=False,  # Use AssemblyAI's built-in turn detection
+        settings=AssemblyAISTTService.Settings(
+            model="u3-rt-pro",
+            # Optional: Tune turn detection timing (defaults shown below)
+            # min_turn_silence=100,  # Default
+            # max_turn_silence=1000,  # Default
+            # Optional: Boost accuracy for specific names/terms
+            # keyterms_prompt=["Xiomara", "Saoirse", "Krzystof", "API", "OAuth"],
+            # Optional: Enable speaker diarization
+            # speaker_labels=True,
+        ),
+    )
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )
+
+    context = LLMContext()
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=ExternalUserTurnStrategies(),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+        # Kick off the conversation.
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/07o-interruptible-assemblyai.py
+++ b/examples/foundational/07o-interruptible-assemblyai.py
@@ -59,19 +59,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp-viva.py
+++ b/examples/foundational/07p-interruptible-krisp-viva.py
@@ -31,6 +31,8 @@ from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
 from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
 from pipecat.audio.vad.silero import SileroVADAnalyzer
 from pipecat.frames.frames import LLMRunFrame
+from pipecat.metrics.metrics import TurnMetricsData
+from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -41,32 +43,37 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.deepgram.stt import DeepgramSTTService
-from pipecat.services.deepgram.tts import DeepgramTTSService
 from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
+from pipecat.turns.user_turn_strategies import UserTurnStrategies

 load_dotenv(override=True)

 # We use lambdas to defer transport parameter creation until the transport
 # type is selected at runtime.
+
+krisp_viva_filter = KrispVivaFilter()
+
 transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        audio_in_filter=KrispVivaFilter(),
+        audio_in_filter=krisp_viva_filter,
    ),
 }

@@ -76,25 +83,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=KrispVivaTurn())]
            ),
-            vad_analyzer=SileroVADAnalyzer(),
+            vad_analyzer=SileroVADAnalyzer(),  # or KrispVivaVadAnalyzer
        ),
    )

@@ -117,13 +127,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        observers=[MetricsLogObserver(include_metrics={TurnMetricsData})],
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07p-interruptible-krisp.py
+++ b/examples/foundational/07p-interruptible-krisp.py
@@ -58,18 +58,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
+    tts = DeepgramTTSService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramTTSService.Settings(
+            voice="aura-helios-en",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime-http.py
+++ b/examples/foundational/07q-interruptible-rime-http.py
@@ -60,21 +60,22 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = RimeHttpTTSService(
            api_key=os.getenv("RIME_API_KEY", ""),
-            voice_id="luna",
+            settings=RimeHttpTTSService.Settings(
+                voice="luna",
+                model="arcana",
+            ),
            model="arcana",
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07q-interruptible-rime.py
+++ b/examples/foundational/07q-interruptible-rime.py
@@ -56,19 +56,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = RimeTTSService(
        api_key=os.getenv("RIME_API_KEY", ""),
-        voice_id="rex",
+        settings=RimeTTSService.Settings(
+            voice="luna",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07r-interruptible-nvidia.py
+++ b/examples/foundational/07r-interruptible-nvidia.py
@@ -55,19 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = NvidiaSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

    llm = NvidiaLLMService(
-        api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-405b-instruct"
+        api_key=os.getenv("NVIDIA_API_KEY"),
+        settings=NvidiaLLMService.Settings(
+            model="meta/llama-3.3-70b-instruct",
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )

    tts = NvidiaTTSService(api_key=os.getenv("NVIDIA_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +95,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07s-interruptible-google-audio-in.py
+++ b/examples/foundational/07s-interruptible-google-audio-in.py
@@ -48,7 +48,7 @@ load_dotenv(override=True)

 marker = "|----|"
 system_message = f"""
-You are a helpful LLM in a WebRTC call. Your goals are to be helpful and brief in your responses.
+You are a helpful LLM in a voice call. Your goals are to be helpful and brief in your responses.

 You are expert at transcribing audio to text. You will receive a mixture of audio and text input. When
 asked to transcribe what the user said, output an exact, word-for-word transcription.
@@ -96,7 +96,7 @@ class UserAudioCollector(FrameProcessor):
            self._user_speaking = True
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._user_speaking = False
-            self._context.add_audio_frames_message(audio_frames=self._audio_frames)
+            await self._context.add_audio_frames_message(audio_frames=self._audio_frames)
            await self._user_context_aggregator.push_frame(LLMRunFrame())

        elif isinstance(frame, InputAudioRawFrame):
@@ -216,31 +216,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    llm = GoogleLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
-        model="gemini-2.5-flash",
-        # force a certain amount of thinking if you want it
-        # params=GoogleLLMService.InputParams(
-        #     thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
-        # ),
+        settings=GoogleLLMService.Settings(
+            model="gemini-2.5-flash",
+            system_instruction=system_message,
+            # force a certain amount of thinking if you want it
+            # thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
+        ),
    )

    tts = GoogleTTSService(
-        voice_id="en-US-Chirp3-HD-Charon",
+        settings=GoogleTTSService.Settings(
+            voice="en-US-Chirp3-HD-Charon",
+            language=Language.EN_US,
+        ),
        params=GoogleTTSService.InputParams(language=Language.EN_US),
        credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": system_message,
-        },
-        {
-            "role": "user",
-            "content": "Start by saying hello.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -276,7 +269,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07t-interruptible-fish.py
+++ b/examples/foundational/07t-interruptible-fish.py
@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = FishAudioTTSService(
        api_key=os.getenv("FISH_API_KEY"),
-        model="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
+        settings=FishAudioTTSService.Settings(
+            voice="4ce7e917cedd4bc2bb2e6ff3a46acaa1",  # Barack Obama
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic-http.py
+++ b/examples/foundational/07v-interruptible-neuphonic-http.py
@@ -60,20 +60,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = NeuphonicHttpTTSService(
            api_key=os.getenv("NEUPHONIC_API_KEY"),
-            voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+            settings=NeuphonicHttpTTSService.Settings(
+                voice="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+            ),
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +104,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07v-interruptible-neuphonic.py
+++ b/examples/foundational/07v-interruptible-neuphonic.py
@@ -56,19 +56,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = NeuphonicTTSService(
        api_key=os.getenv("NEUPHONIC_API_KEY"),
-        voice_id="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+        settings=NeuphonicTTSService.Settings(
+            voice="fc854436-2dac-4d21-aa69-ae17b54e98eb",  # Emily
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -99,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07w-interruptible-fal.py
+++ b/examples/foundational/07w-interruptible-fal.py
@@ -7,6 +7,7 @@

 import os

+import aiohttp
 from dotenv import load_dotenv
 from loguru import logger

@@ -53,66 +54,70 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = FalSTTService(
-        api_key=os.getenv("FAL_KEY"),
-    )
+    async with aiohttp.ClientSession() as session:
+        stt = FalSTTService(
+            api_key=os.getenv("FAL_KEY"),
+            aiohttp_session=session,
+        )

-    tts = CartesiaTTSService(
-        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
-    )
+        tts = CartesiaTTSService(
+            api_key=os.getenv("CARTESIA_API_KEY"),
+            settings=CartesiaTTSService.Settings(
+                voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+            ),
+        )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
+        context = LLMContext()
+        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+            context,
+            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        )

-    context = LLMContext(messages)
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
-    )
+        pipeline = Pipeline(
+            [
+                transport.input(),  # Transport user input
+                stt,  # STT
+                user_aggregator,  # User responses
+                llm,  # LLM
+                tts,  # TTS
+                transport.output(),  # Transport bot output
+                assistant_aggregator,  # Assistant spoken responses
+            ]
+        )

-    pipeline = Pipeline(
-        [
-            transport.input(),  # Transport user input
-            stt,  # STT
-            user_aggregator,  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            assistant_aggregator,  # Assistant spoken responses
-        ]
-    )
+        task = PipelineTask(
+            pipeline,
+            params=PipelineParams(
+                enable_metrics=True,
+                enable_usage_metrics=True,
+            ),
+            idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+        )

-    task = PipelineTask(
-        pipeline,
-        params=PipelineParams(
-            enable_metrics=True,
-            enable_usage_metrics=True,
-        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
-    )
+        @transport.event_handler("on_client_connected")
+        async def on_client_connected(transport, client):
+            logger.info(f"Client connected")
+            # Kick off the conversation.
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
+            await task.queue_frames([LLMRunFrame()])

-    @transport.event_handler("on_client_connected")
-    async def on_client_connected(transport, client):
-        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
-        await task.queue_frames([LLMRunFrame()])
+        @transport.event_handler("on_client_disconnected")
+        async def on_client_disconnected(transport, client):
+            logger.info(f"Client disconnected")
+            await task.cancel()

-    @transport.event_handler("on_client_disconnected")
-    async def on_client_disconnected(transport, client):
-        logger.info(f"Client disconnected")
-        await task.cancel()
+        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

-    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
-
-    await runner.run(task)
+        await runner.run(task)


 async def bot(runner_args: RunnerArguments):
--- a/examples/foundational/07x-interruptible-local.py
+++ b/examples/foundational/07x-interruptible-local.py
@@ -44,19 +44,19 @@ async def main():

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -82,7 +82,7 @@ async def main():
        ),
    )

-    messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+    context.add_message({"role": "developer", "content": "Please introduce yourself to the user."})
    await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner()
--- a/examples/foundational/07y-interruptible-minimax.py
+++ b/examples/foundational/07y-interruptible-minimax.py
@@ -63,19 +63,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            api_key=os.getenv("MINIMAX_API_KEY", ""),
            group_id=os.getenv("MINIMAX_GROUP_ID", ""),
            aiohttp_session=session,
-            params=MiniMaxHttpTTSService.InputParams(language=Language.EN),
+            settings=MiniMaxHttpTTSService.Settings(
+                language=Language.EN,
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam-http.py
+++ b/examples/foundational/07z-interruptible-sarvam-http.py
@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.sarvam.llm import SarvamLLMService
 from pipecat.services.sarvam.stt import SarvamSTTService
 from pipecat.services.sarvam.tts import SarvamHttpTTSService
 from pipecat.transcriptions.language import Language
@@ -59,25 +59,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async with aiohttp.ClientSession() as session:
        stt = SarvamSTTService(
            api_key=os.getenv("SARVAM_API_KEY"),
-            model="saarika:v2.5",
+            settings=SarvamSTTService.Settings(
+                model="saarika:v2.5",
+            ),
        )

        tts = SarvamHttpTTSService(
            api_key=os.getenv("SARVAM_API_KEY"),
            aiohttp_session=session,
-            params=SarvamHttpTTSService.InputParams(language=Language.EN),
+            settings=SarvamHttpTTSService.Settings(
+                language=Language.EN_IN,
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = SarvamLLMService(
+            api_key=os.getenv("SARVAM_API_KEY"),
+            settings=SarvamLLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -108,7 +110,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07z-interruptible-sarvam.py
+++ b/examples/foundational/07z-interruptible-sarvam.py
@@ -21,7 +21,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.sarvam.llm import SarvamLLMService
 from pipecat.services.sarvam.stt import SarvamSTTService
 from pipecat.services.sarvam.tts import SarvamTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -54,24 +54,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
-        model="saarika:v2.5",
+        settings=SarvamSTTService.Settings(
+            model="saaras:v3",
+        ),
    )

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
-        model="bulbul:v2",
-        voice_id="manisha",
+        settings=SarvamTTSService.Settings(
+            model="bulbul:v3",
+            voice="shubh",
+        ),
+    )
+    llm = SarvamLLMService(
+        api_key=os.getenv("SARVAM_API_KEY"),
+        settings=SarvamLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
    )
-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -94,6 +96,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
+            allow_interruptions=True,
        ),
    )

@@ -101,7 +104,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

        # Optionally, you can wait for 30 seconds and then change the voice.
--- a/examples/foundational/07za-interruptible-soniox.py
+++ b/examples/foundational/07za-interruptible-soniox.py
@@ -24,7 +24,7 @@ from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.services.soniox.stt import SonioxInputParams, SonioxSTTService
+from pipecat.services.soniox.stt import SonioxSTTService
 from pipecat.transcriptions.language import Language
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
@@ -53,7 +53,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SonioxSTTService(
        api_key=os.getenv("SONIOX_API_KEY"),
-        params=SonioxInputParams(
+        settings=SonioxSTTService.Settings(
+            # Add language hints to use a specific language
+            # Add strict mode to enforce the language hints
            language_hints=[Language.EN],
            language_hints_strict=True,
        ),
@@ -61,19 +63,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +105,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zb-interruptible-inworld-http.py
+++ b/examples/foundational/07zb-interruptible-inworld-http.py
@@ -58,22 +58,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        tts = InworldHttpTTSService(
            api_key=os.getenv("INWORLD_API_KEY", ""),
            aiohttp_session=session,
-            voice_id="Ashley",
-            model="inworld-tts-1",
            # Set to False for non-streaming mode or True for streaming mode.
            streaming=True,
+            settings=InworldHttpTTSService.Settings(
+                voice="Ashley",
+            ),
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -111,7 +110,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info("Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zb-interruptible-inworld.py
+++ b/examples/foundational/07zb-interruptible-inworld.py
@@ -10,8 +10,7 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import LLMRunFrame, TTSTextFrame
-from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint
+from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -25,7 +24,6 @@ from pipecat.runner.utils import create_transport
 from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.inworld.tts import InworldTTSService
 from pipecat.services.openai.llm import OpenAILLMService
-from pipecat.transports.base_output import BaseOutputTransport
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -56,21 +54,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = InworldTTSService(
        api_key=os.getenv("INWORLD_API_KEY", ""),
-        voice_id="Ashley",
-        model="inworld-tts-1",
-        temperature=1.1,
+        settings=InworldTTSService.Settings(
+            voice="Ashley",
+            temperature=1.1,
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful AI demonstrating Inworld AI's TTS. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a friendly and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -94,13 +91,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
-        observers=[
-            DebugLogObserver(
-                frame_types={
-                    TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
-                }
-            ),
-        ],
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

@@ -108,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zc-interruptible-asyncai-http.py
+++ b/examples/foundational/07zc-interruptible-asyncai-http.py
@@ -60,20 +60,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

        tts = AsyncAIHttpTTSService(
            api_key=os.getenv("ASYNCAI_API_KEY", ""),
-            voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
+            settings=AsyncAIHttpTTSService.Settings(
+                voice="e0f39dc4-f691-4e78-bba5-5c636692cc04",
+            ),
            aiohttp_session=session,
        )

-        llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+        llm = OpenAILLMService(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            settings=OpenAILLMService.Settings(
+                system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+            ),
+        )

-        messages = [
-            {
-                "role": "system",
-                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-            },
-        ]
-
-        context = LLMContext(messages)
+        context = LLMContext()
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -104,7 +104,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        async def on_client_connected(transport, client):
            logger.info(f"Client connected")
            # Kick off the conversation.
-            messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+            context.add_message(
+                {"role": "developer", "content": "Please introduce yourself to the user."}
+            )
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zc-interruptible-asyncai.py
+++ b/examples/foundational/07zc-interruptible-asyncai.py
@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = AsyncAITTSService(
        api_key=os.getenv("ASYNCAI_API_KEY", ""),
-        voice_id=os.getenv("ASYNCAI_VOICE_ID", "e0f39dc4-f691-4e78-bba5-5c636692cc04"),
+        settings=AsyncAITTSService.Settings(
+            voice="e0f39dc4-f691-4e78-bba5-5c636692cc04",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zd-interruptible-aicoustics.py
+++ b/examples/foundational/07zd-interruptible-aicoustics.py
@@ -40,7 +40,7 @@ def _create_aic_filter() -> AICFilter:

    return AICFilter(
        license_key=license_key,
-        model_id="quail-vf-l-16khz",
+        model_id="quail-vf-2.0-l-16khz",
    )


@@ -77,19 +77,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=aic_vad_analyzer),
@@ -128,7 +128,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        logger.info(f"Client connected")
        await audiobuffer.start_recording()
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @audiobuffer.event_handler("on_audio_data")
--- a/examples/foundational/07ze-interruptible-hume.py
+++ b/examples/foundational/07ze-interruptible-hume.py
@@ -59,19 +59,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    tts = HumeTTSService(
        api_key=os.getenv("HUME_API_KEY"),
        # Replace with your Hume voice ID
-        voice_id="f898a92e-685f-43fa-985b-a46920f0650b",
+        settings=HumeTTSService.Settings(
+            voice="f898a92e-685f-43fa-985b-a46920f0650b",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -113,7 +113,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            "💡 Word timestamps are enabled! Watch the console for TTSTextFrame logs showing each word with its PTS."
        )
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zf-interruptible-gradium.py
+++ b/examples/foundational/07zf-interruptible-gradium.py
@@ -55,27 +55,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    stt = GradiumSTTService(
        api_key=os.getenv("GRADIUM_API_KEY"),
        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
-        params=GradiumSTTService.InputParams(
+        settings=GradiumSTTService.Settings(
            language=Language.EN,
        ),
    )

    tts = GradiumTTSService(
        api_key=os.getenv("GRADIUM_API_KEY"),
-        voice_id="YTpq7expH9539ERJ",
        url="wss://us.api.gradium.ai/api/speech/tts",
+        settings=GradiumTTSService.Settings(
+            voice="YTpq7expH9539ERJ",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zg-interruptible-camb.py
+++ b/examples/foundational/07zg-interruptible-camb.py
@@ -56,21 +56,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CambTTSService(
        api_key=os.getenv("CAMB_API_KEY"),
-        model="mars-flash",
+        settings=CambTTSService.Settings(
+            model="mars-flash",
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful voice assistant powered by Camb AI text-to-speech. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful voice assistant powered by Camb AI text-to-speech. "
-            "Keep your responses concise and conversational since they will be spoken aloud. "
-            "Avoid special characters, emojis, or bullet points.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -101,7 +99,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zi-interruptible-piper.py
+++ b/examples/foundational/07zi-interruptible-piper.py
@@ -54,18 +54,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = PiperTTSService(voice_id="en_US-ryan-high")
+    tts = PiperTTSService(
+        settings=PiperTTSService.Settings(
+            voice="en_US-ryan-high",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zj-interruptible-kokoro.py
+++ b/examples/foundational/07zj-interruptible-kokoro.py
@@ -54,18 +54,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    tts = KokoroTTSService(voice_id="af_heart")
+    tts = KokoroTTSService(
+        settings=KokoroTTSService.Settings(
+            voice="af_heart",
+        ),
+    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zk-interruptible-resemble.py
+++ b/examples/foundational/07zk-interruptible-resemble.py
@@ -30,24 +30,20 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

 load_dotenv(override=True)

-# We store functions so objects (e.g. SileroVADAnalyzer) don't get
-# instantiated. The function will be called when the desired transport gets
-# selected.
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
 transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
    ),
 }

@@ -59,19 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = ResembleAITTSService(
        api_key=os.getenv("RESEMBLE_API_KEY"),
-        voice_id=os.getenv("RESEMBLE_VOICE_UUID"),
+        settings=ResembleAITTSService.Settings(
+            voice=os.getenv("RESEMBLE_VOICE_UUID"),
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/07zl-interruptible-smallest.py
+++ b/examples/foundational/07zl-interruptible-smallest.py
@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024–2026, Daily
+# Copyright (c) 2024-2026, Daily
 #
 # SPDX-License-Identifier: BSD 2-Clause License
 #
@@ -21,17 +21,16 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
-from pipecat.services.hathora.stt import HathoraSTTService
-from pipecat.services.hathora.tts import HathoraTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
 from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.services.smallest.tts import SmallestTTSService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams

 load_dotenv(override=True)

-# We use lambdas to defer transport parameter creation until the transport
-# type is selected at runtime.
+
 transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
@@ -51,43 +50,39 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = HathoraSTTService(
-        model="nvidia-parakeet-tdt-0.6b-v3",
+    stt = DeepgramSTTService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
    )

-    tts = HathoraTTSService(
-        model="hexgrad-kokoro-82m",
+    tts = SmallestTTSService(
+        api_key=os.getenv("SMALLEST_API_KEY"),
+        settings=SmallestTTSService.Settings(
+            voice="sophia",
+        ),
    )

-    # See https://models.hathora.dev/model/qwen3-30b-a3b
    llm = OpenAILLMService(
-        base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
-        api_key=os.getenv("HATHORA_API_KEY"),
-        model=None,
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
-    context_aggregator = LLMContextAggregatorPair(
+    context = LLMContext()
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
-            transport.input(),  # Transport user input
+            transport.input(),
            stt,
-            context_aggregator.user(),  # User responses
-            llm,  # LLM
-            tts,  # TTS
-            transport.output(),  # Transport bot output
-            context_aggregator.assistant(),  # Assistant spoken responses
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
        ]
    )

@@ -97,14 +92,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
-        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
-        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/08-custom-frame-processor.py
+++ b/examples/foundational/08-custom-frame-processor.py
@@ -95,19 +95,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -141,7 +141,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected: {client}")
        # Kick off the conversation.
-        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/10-wake-phrase.py
+++ b/examples/foundational/10-wake-phrase.py
@@ -10,7 +10,7 @@ from dotenv import load_dotenv
 from loguru import logger

 from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import TTSSpeakFrame
+from pipecat.frames.frames import LLMRunFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.runner import PipelineRunner
 from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -19,7 +19,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
 )
-from pipecat.processors.filters.wake_check_filter import WakeCheckFilter
 from pipecat.runner.types import RunnerArguments
 from pipecat.runner.utils import create_transport
 from pipecat.services.cartesia.tts import CartesiaTTSService
@@ -28,6 +27,11 @@ from pipecat.services.openai.llm import OpenAILLMService
 from pipecat.transports.base_transport import BaseTransport, TransportParams
 from pipecat.transports.daily.transport import DailyParams
 from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy
+from pipecat.turns.user_turn_strategies import (
+    UserTurnStrategies,
+    default_user_turn_start_strategies,
+)

 load_dotenv(override=True)

@@ -52,35 +56,49 @@ transport_params = {
 async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

-    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+    stt = DeepgramSTTService(
+        api_key=os.getenv("DEEPGRAM_API_KEY"),
+        settings=DeepgramSTTService.Settings(
+            keyterm=["pipecat"],
+        ),
+    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful assistant. Respond to what the user said in a creative and helpful way. Keep your responses brief.",
-        },
-    ]
-
-    hey_robot_filter = WakeCheckFilter(["hey robot", "hey, robot"])
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
-        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=UserTurnStrategies(
+                start=[
+                    WakePhraseUserTurnStartStrategy(
+                        phrases=["pipecat"],
+                        # Timeout before wake phrase must be spoken again
+                        timeout=5.0,
+                    ),
+                    *default_user_turn_start_strategies(),
+                ]
+            ),
+            vad_analyzer=SileroVADAnalyzer(),
+        ),
    )

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
-            stt,  # STT
-            hey_robot_filter,  # Filter out speech not directed at the robot
+            stt,
            user_aggregator,  # User responses
            llm,  # LLM
            tts,  # TTS
@@ -102,7 +120,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
-        await task.queue_frame(TTSSpeakFrame("Hi! If you want to talk to me, just say 'Hey Robot'"))
+        context.add_message(
+            {"role": "developer", "content": "Please introduce yourself to the user."}
+        )
+        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
--- a/examples/foundational/11-sound-effects.py
+++ b/examples/foundational/11-sound-effects.py
@@ -104,21 +104,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
+        ),
+    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
--- a/examples/foundational/12-describe-image-openai-responses.py
+++ b/examples/foundational/12-describe-image-openai-responses.py
@@ -0,0 +1,139 @@
+#
+# Copyright (c) 2024-2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+from PIL import Image
+
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.cartesia.tts import CartesiaTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+
+load_dotenv(override=True)
+
+
+# We use lambdas to defer transport parameter creation until the transport
+# type is selected at runtime.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info(f"Starting bot")
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    tts = CartesiaTTSService(
+        api_key=os.getenv("CARTESIA_API_KEY"),
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
+    )
+
+    llm = OpenAIResponsesLLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAIResponsesLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
+        ),
+    )
+
+    context = LLMContext()
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),  # Transport user input
+            stt,  # STT
+            user_aggregator,  # User responses
+            llm,  # LLM
+            tts,  # TTS
+            transport.output(),  # Transport bot output
+            assistant_aggregator,  # Assistant spoken responses
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info(f"Client connected")
+
+        if not runner_args.body:
+            script_dir = os.path.dirname(__file__)
+            runner_args.body = {
+                "image_path": os.path.join(script_dir, "assets", "cat.jpg"),
+                "question": "Describe this image",
+            }
+
+        image_path = runner_args.body["image_path"]
+        question = runner_args.body["question"]
+
+        # Kick off the conversation.
+        image = Image.open(image_path)
+        message = await LLMContext.create_image_message(
+            image=image.tobytes(),
+            format="RGB",
+            size=image.size,
+            text=question,
+        )
+        context.add_message(message)
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info(f"Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
--- a/examples/foundational/12-describe-image-openai.py
+++ b/examples/foundational/12-describe-image-openai.py
@@ -53,19 +53,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+    llm = OpenAILLMService(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        settings=OpenAILLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -114,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            size=image.size,
            text=question,
        )
-        messages.append(message)
+        context.add_message(message)
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/12a-describe-image-anthropic.py
+++ b/examples/foundational/12a-describe-image-anthropic.py
@@ -53,19 +53,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

-    llm = AnthropicLLMService(api_key=os.getenv("ANTHROPIC_API_KEY"))
+    llm = AnthropicLLMService(
+        api_key=os.getenv("ANTHROPIC_API_KEY"),
+        settings=AnthropicLLMService.Settings(
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
+        ),
+    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -114,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            size=image.size,
            text=question,
        )
-        messages.append(message)
+        context.add_message(message)
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
--- a/examples/foundational/12b-describe-image-aws.py
+++ b/examples/foundational/12b-describe-image-aws.py
@@ -53,26 +53,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
-        voice_id="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        settings=CartesiaTTSService.Settings(
+            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
+        ),
    )

    llm = AWSBedrockLLMService(
        aws_region="us-west-2",
-        model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
-        # Note: usually, prefer providing latency="optimized" param.
-        # Here we can't because AWS Bedrock doesn't support it for Claude 3.7,
-        # which we need for image input.
-        params=AWSBedrockLLMService.InputParams(temperature=0.8),
+        settings=AWSBedrockLLMService.Settings(
+            model="us.anthropic.claude-sonnet-4-6",
+            temperature=0.8,
+            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way. You are also able to describe images.",
+        ),
    )

-    messages = [
-        {
-            "role": "system",
-            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way. You are also able to describe images.",
-        },
-    ]
-
-    context = LLMContext(messages)
+    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -121,7 +116,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            size=image.size,
            text=question,
        )
-        messages.append(message)
+        context.add_message(message)
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")