pipecat

Author	SHA1	Message	Date
Filipi da Silva Fuchter	95db08646c	Merge pull request #4430 from pipecat-ai/filipi/flux_audio Implementing dynamic watchdog timeout for Deepgram Flux STT	2026-05-06 11:40:06 -03:00
filipi87	03e5ebb266	Improving watchdog_min_timeout description.	2026-05-06 11:37:18 -03:00
filipi87	5daf267c11	Adding changelogs.	2026-05-06 11:26:14 -03:00
filipi87	1cb77b422a	Created a watchdog_min_timeout to allow to change the default value.	2026-05-06 11:22:37 -03:00
filipi87	0c779b4c3d	Implementing dynamic watchdog timeout for Deepgram Flux STT	2026-05-06 11:01:58 -03:00
Mark Backman	eda98fb13f	Merge pull request #4424 from pipecat-ai/mb/revert-elevenlabs-tts-alignment fix(elevenlabs): only use normalizedAlignment when pronunciation dict is set	2026-05-06 08:27:25 -04:00
Mark Backman	3722ee223c	Merge pull request #4419 from pipecat-ai/mb/fix-changelog-entry-4416 Fix changelog filename for 4416	2026-05-05 14:50:24 -04:00
Mark Backman	2620e76dab	docs(elevenlabs): clarify alignment leading-space handling	2026-05-05 14:49:41 -04:00
Mark Backman	2447db766e	docs(changelog): add 4424 entry for elevenlabs alignment selection fix	2026-05-05 14:49:41 -04:00
Mark Backman	61a81ed87b	fix(elevenlabs): use alignment by default, normalizedAlignment only with pronunciation dicts PR #4344 unconditionally switched to normalizedAlignment to fix garbled words with pronunciation dictionaries (#4316). But normalizedAlignment returns the post-normalized form of what was spoken - including romanization of non-Latin scripts (Chinese rendered as pinyin), which ends up in the LLM context and degrades subsequent turns. Gate the switch on pronunciation_dictionary_locators being configured. Adds a _select_alignment helper with preferred-with-fallback (both fields are nullable per the API schema), used by both the WebSocket and HTTP services. Tests cover dictionary mode, default mode, fallback when preferred is missing or null, and HTTP field-name variants.	2026-05-05 14:49:41 -04:00
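The preferred-with-fallback selection described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `_select_alignment` helper: the function name, the `use_normalized` flag, and the key names `alignment` / `normalizedAlignment` are assumptions chosen for the example.

```python
from typing import Any, Mapping, Optional


def select_alignment(
    message: Mapping[str, Any],
    use_normalized: bool,
    plain_key: str = "alignment",  # assumed field name
    normalized_key: str = "normalizedAlignment",  # assumed field name
) -> Optional[Any]:
    """Return the preferred alignment payload, falling back to the other field."""
    if use_normalized:
        preferred, fallback = normalized_key, plain_key
    else:
        preferred, fallback = plain_key, normalized_key
    # Both fields may be missing or null, so None is treated like "absent".
    chosen = message.get(preferred)
    return chosen if chosen is not None else message.get(fallback)
```

In this sketch `use_normalized` would be true only when pronunciation dictionary locators are configured; a transport with different field names could pass its own `plain_key` and `normalized_key`.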
Mark Backman	735cd09c7e	Merge pull request #4422 from cshape/tts-2 feat(inworld): default to inworld-tts-2	2026-05-05 14:00:04 -04:00
Cale Shapera	84eefba4df	docs: add changelog fragment for tts-2 default flip	2026-05-05 09:20:16 -07:00
Cale Shapera	fe3af5d9f7	feat(inworld): default to inworld-tts-2 Flip the default Inworld TTS model from inworld-tts-1.5-max to inworld-tts-2 across: - InworldHttpTTSService (HTTP) - InworldTTSService (WebSocket) - InworldRealtimeLLMService (cascade Realtime) inworld-tts-1.5-max and inworld-tts-1.5-mini remain valid options; existing users can pin the prior model explicitly via the model setting. Docstring examples updated to reference the new default.	2026-05-05 09:20:16 -07:00
Mark Backman	7729eecfe4	Fix changelog filename for 4416	2026-05-04 21:54:58 -04:00
Mark Backman	fa31a2fd63	Merge pull request #4416 from pipecat-ai/mb/pr-4333-aws-credentials-review feat(aws): add shared credential resolver with boto3 chain fallback	2026-05-04 21:48:33 -04:00
Mark Backman	678d40e102	docs(changelog): add 4333 entries for AWS credential resolver expansion	2026-05-04 19:30:37 -04:00
Mark Backman	8becafee38	fix(aws): use shared credential resolver in Polly, Bedrock, AgentCore Polly TTS, Bedrock LLM, and AgentCore previously did `arg or os.getenv("AWS_...")` and handed the result straight to aioboto3. When only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` was set, aioboto3 received a half-populated kwarg and errored instead of falling through to the boto3 credential provider chain (instance profiles, IRSA, ECS task roles, SSO, etc.). Route credential resolution through the shared `resolve_credentials()` helper introduced for AWS Transcribe so all four services follow the same `explicit → env → boto3 chain` fallback. Add an `AWSCredentials.to_boto_kwargs()` method to bridge the dataclass field names (`access_key`, `secret_key`) to the aioboto3 kwargs (`aws_access_key_id`, `aws_secret_access_key`). No public API changes. Behaviour is identical for fully-explicit and fully-env-var configurations; partial env vars now correctly trigger the chain instead of erroring.	2026-05-04 19:23:53 -04:00
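One way to picture the half-populated-kwarg problem from the commit above is the sketch below. It is not the real `resolve_credentials()` or `AWSCredentials`: the names `StaticCredentials` and `resolve_static_credentials` are hypothetical, and instead of calling the boto3 chain directly it assumes that passing no static keys leaves the session free to use its default provider chain.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class StaticCredentials:  # hypothetical stand-in for the real dataclass
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    session_token: Optional[str] = None

    def to_boto_kwargs(self) -> Dict[str, str]:
        # Only emit static keys when the pair is complete; a lone key is dropped.
        if not (self.access_key and self.secret_key):
            return {}
        kwargs = {
            "aws_access_key_id": self.access_key,
            "aws_secret_access_key": self.secret_key,
        }
        if self.session_token:
            kwargs["aws_session_token"] = self.session_token
        return kwargs


def resolve_static_credentials(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> StaticCredentials:
    """Explicit arguments win, then environment variables, else nothing static."""
    access = access_key or env.get("AWS_ACCESS_KEY_ID")
    secret = secret_key or env.get("AWS_SECRET_ACCESS_KEY")
    token = session_token or env.get("AWS_SESSION_TOKEN")
    if access and secret:
        return StaticCredentials(access, secret, token)
    # Partial configuration: return empty credentials so no half-set kwargs leak out.
    return StaticCredentials()
```

With only `AWS_ACCESS_KEY_ID` set, `to_boto_kwargs()` returns an empty dict here, so the caller never hands a single key to the client constructor.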
Mark Backman	83190d38e9	Merge pull request #4414 from pipecat-ai/mb/fix-ttsspeakframe-assistant-turn-stopped	2026-05-04 18:12:33 -04:00
Mark Backman	7519c26ac5	Merge pull request #4417 from pipecat-ai/mb/resolve-runner-filepath	2026-05-04 18:09:34 -04:00
Mark Backman	b2b7e9ee6f	Merge pull request #4415 from pipecat-ai/mb/fix-elevenlabs-leading-spaces-flash	2026-05-04 18:08:31 -04:00
Mark Backman	e864d5778a	ci: install runner extra for the coverage job	2026-05-04 16:44:47 -04:00
Mark Backman	89f10dd9a1	test: drop webrtc-dependent test, remove webrtc extra from CI	2026-05-04 16:42:05 -04:00
Mark Backman	f67e3ef0b2	ci: install runner and webrtc extras for the test job	2026-05-04 16:29:58 -04:00
Mark Backman	5b087d6aeb	docs: add changelog for #4417	2026-05-04 16:22:26 -04:00
Mark Backman	e780f759d0	fix: validate download path containment in runner Resolve and contain the user-supplied filename before serving it from the runner's /files endpoint. Also raise a 404 (instead of returning None) when the downloads folder is unset, and use the resolved basename for Content-Disposition.	2026-05-04 16:20:27 -04:00
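The containment check described in the commit above might look something like this sketch. It is not the runner's actual code: the function name `resolve_download` and the choice of exceptions are assumptions, and a web handler would presumably map them to HTTP errors such as the 404 mentioned in the message.

```python
from pathlib import Path


def resolve_download(downloads_dir: str, filename: str) -> Path:
    """Resolve a user-supplied filename and require it to stay inside downloads_dir."""
    base = Path(downloads_dir).resolve()
    # resolve() collapses ".." segments and symlinks before the containment test.
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        raise PermissionError(f"{filename!r} escapes the downloads folder")
    if not candidate.is_file():
        raise FileNotFoundError(filename)
    return candidate
```

Using `candidate.name` from the resolved path (rather than the raw input) for a `Content-Disposition` header keeps the served name consistent with the file that was actually checked.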
Daniel Wirjo	35153de28e	feat(aws): add shared credential resolver with boto3 chain fallback AWS Transcribe STT previously only supported credentials via explicit parameters or environment variables. Services running with IAM roles (EKS pod roles, IRSA, ECS task roles, EC2 instance profiles) or SSO couldn't use Transcribe without exporting static credentials. Changes: - Add resolve_credentials() to utils.py providing a standard fallback chain: explicit params → environment variables → boto3 credential provider chain (instance profiles, IRSA, pod roles, SSO, etc.) - Add AWSCredentials dataclass for type-safe credential passing - Update AWSTranscribeSTTService to use resolve_credentials() instead of manual os.getenv() calls - The boto3 fallback is only attempted when both access key and secret key are unresolved, avoiding replacement of explicitly provided creds - boto3 is imported lazily inside the function to avoid hard dependency for services that don't need the fallback chain - Add 7 unit tests covering the credential resolution chain The Bedrock LLM and Polly TTS services already support the full credential chain via aioboto3.Session() and are not modified. Related to #4197	2026-05-04 15:40:06 -04:00
Mark Backman	9886d72f5e	Add changelog for PR #4415	2026-05-04 15:18:15 -04:00
Mark Backman	90e6b51acd	Fix ElevenLabs alignment chunk spacing	2026-05-04 15:15:37 -04:00
Mark Backman	61acdba3ae	docs: add changelog entry for #4414	2026-05-04 10:43:52 -04:00
Mark Backman	f1a3ee97de	fix: surface TTSSpeakFrame greetings in on_assistant_turn_stopped Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to silently lose their trailing words and never fire on_assistant_turn_stopped: - LLMAssistantPushAggregationFrame was emitted without a PTS, so the transport routed it through the audio (sync) queue while word-level TTSTextFrames travel through the clock queue. The aggregation could reach the assistant aggregator before the final words, leaving them orphaned in the buffer. Stamp the frame with `_word_last_pts + 1` when there are word timestamps so it can't overtake them. - The aggregator's LLMAssistantPushAggregationFrame handler called push_aggregation() directly, bypassing _trigger_assistant_turn_stopped. For TTS-only flows there is no LLMFullResponseStartFrame, so the turn start timestamp was never set and on_assistant_turn_stopped never fired. Open a turn (if needed) and trigger stopped from the handler. Fixes #4264.	2026-05-04 10:41:22 -04:00
Mark Backman	b363b91d12	Merge pull request #4401 from pipecat-ai/mb/grok-realtime-model fix(xai/realtime): pass model as query param on connect	2026-05-04 09:44:33 -04:00
Mark Backman	30efd11e15	Merge pull request #4397 from pipecat-ai/mb/smallwebrtc-trace-app-message	2026-05-01 20:47:04 -04:00
kompfner	a745e8d318	Merge pull request #4378 from pipecat-ai/pk/more-pyright-fixes More pyright fixes	2026-05-01 14:09:27 -04:00
Paul Kompfner	2730e47e61	ci: install all extras for the pyright type-check job The pyright job in `format.yaml` previously installed only `--extra daily --extra tracing`. That was sufficient when most optional-dep- using files were in the pyright ignore list, but as this PR has cleared dozens of files, those files now reference symbols from optional-dep modules (`aiortc.RTCIceServer` via `IceServer`, `google.genai.types.HttpOptions`, etc.). `reportMissingImports: false` tolerates the failed imports themselves, but the imported names become `Unknown` and using them as type expressions trips `reportInvalidTypeForm` / `reportAttributeAccessIssue` — errors that aren't gated by that flag. Switch to `--all-extras --no-extra gstreamer --no-extra local` (matching the dev setup in README.md), so pyright sees the same dependency set the code is intended to be type-checked against and the install-set scales naturally as more files leave the ignore list. Also reconcile CLAUDE.md's setup command, which only excluded `gstreamer`. README.md is canonical and additionally excludes `local` (pyaudio requires `portaudio` native libs that aren't installed by default on a clean Ubuntu CI runner).	2026-05-01 09:36:14 -04:00
Paul Kompfner	4703df8686	fix: clear 8 more services from pyright ignore list A fourth pass over low-error-count files. Drops 8 files (57 → 49) and full-pyright errors from 525 → 496. Default pyright stays clean. Optional access on transport/client receivers (4 files). Same fix shape as #4359 — a receiver typed `X | None` accessed without a guard. For "should never happen" cases (caller's lifecycle ensures the field is non-None when the method runs), used `assert` rather than silent early-return so an invariant violation surfaces loudly: - `transports/whatsapp/client.py` (5 errors): `_validate_whatsapp_webhook_request` was typed `bytes` / `str` but called with `bytes | None` / `str | None`. Widened the helper signature and pushed the explicit None-check inside (matching its existing empty-string check). Also handled `pipecat_connection.get_answer()` returning `None` — would have crashed at `.get("sdp")` before. - `transports/websocket/client.py` (5 errors): four are the deprecated `websockets.WebSocketClientProtocol` alias (same `# pyright: ignore[reportAttributeAccessIssue]` as the `services/websocket_service.py` fix from earlier in this PR). The fifth was `async for message in self._websocket` — traced the call chain and confirmed `_client_task` is created only after `self._websocket` is assigned and cancelled before it's cleared, so the field is never None when `_client_task_handler` runs. Used `assert`. - `services/openai/stt.py` (4 errors): same pattern. `_receive_messages` is started by `_connect()` only when `self._websocket` is set, and the reconnect loop in `WebsocketService._receive_task_handler` re-establishes it before each retry. `assert` at entry. Plus L478/L483: the `try`/`except ModuleNotFoundError` import-guard makes `websocket_connect` and `State` `<type> | None`; `__init__` already raises `ImportError` if either is None, so an `assert` at the `_connect_websocket` use site is honest.
Plus an L538 `Language | str` cast (same shape as last batch). - `services/deepgram/flux/base.py` (2 errors): `event = data.get("event")` flowed into `_handle_turn_resumed(event: str)` as `Any | None`. Tightened with an `isinstance(event, str)` guard before the `FluxEventType(event)` lookup. The other error (`average_confidence > min_confidence` where `min_confidence: float | None`) was a latent crash on missing confidence data — restored the original `not min_confidence` (which treats both `None` and `0.0` as "no filter") and added an explicit drop-on-missing-confidence-data branch. `gemini_live` Settings/InputParams (vertex). The deprecated `InputParams` declares `modalities: GeminiModalities | None` and `media_resolution: GeminiMediaResolution | None`, but their downstream usage at `services/google/gemini_live/llm.py:952,959` calls `.value` on each — `None` would crash. Rather than touching the deprecated input model, translate `None` to the canonical defaults (`GeminiModalities.AUDIO`, `GeminiMediaResolution.UNSPECIFIED`) at the assignment site in `vertex/llm.py`. Also fixed an unrelated annotation bug: `_get_credentials` was annotated `-> str` but actually returns `service_account.Credentials` (used correctly by the caller — only the annotation was wrong). `moondream/vision.py` (3 errors). `frame.format` is `str | None` but `Image.frombytes(mode, ...)` requires `str`; raise instead of crashing on missing format. The other two errors are pyright thinking the moondream2-custom `encode_image` and `query` methods are `Tensor` (rather than callables) — those are provided by the model code via `trust_remote_code=True` and aren't visible to pyright on the base `AutoModelForCausalLM` type. Scoped `# pyright: ignore[reportCallIssue]` on the two call sites. `transports/base_output.py` (3 errors).
Two are `self._mixer.mix(...)` calls in `with_mixer`, a closure invoked only when `self._mixer` is truthy at the call site — captured the mixer to a local variable inside the closure with an `assert`, then used that. Third is the PIL `frombytes(mode, ...)` shape — `frame.format is None` early-return guard at the top of `resize_frame` so the main resize logic reads cleanly. `elevenlabs/tts.py` (4 errors). The payload-building dict at L1271 was typed `dict[str, str | dict[str, float | bool]]` — an aspirational shape that matched only the first two assignments. Subsequent code assigned `list[dict[...]]` (pronunciation locators) and bools, all violating the annotation. Same pattern at L926 (the WebSocket-init `msg`). Both widened to `dict[str, Any]`, which is the honest shape for a JSON request payload and what similar code uses elsewhere. Files dropped from the ignore list (57 → 49): services/deepgram/flux/base.py, services/elevenlabs/tts.py, services/google/gemini_live/vertex/llm.py, services/moondream/vision.py, services/openai/stt.py, transports/base_output.py, transports/websocket/client.py, transports/whatsapp/client.py.	2026-05-01 09:36:14 -04:00
Paul Kompfner	26a40e2e62	fix: clear 10 more services from pyright ignore list A third pass over low-error-count files in the ignore list. Drops 10 files (67 → 57) and full-pyright errors from 555 → 525. Default pyright stays clean. Optional access guards (4 files). The same fix shape as 9e9b1f39e: a receiver typed `X | None` accessed without a guard, fixed with a local-var capture or an early return. - `mistral/stt.py`: `_connection.send_audio` could crash if `_connect()` swallowed an exception and left `_connection` unset; drop the audio chunk with a warning instead. `_receive_events` iterating `_connection.events()` got the same defensive narrowing. - `deepgram/flux/stt.py`: `_websocket_url` is set in `_connect` before `_connect_websocket` is called, but pyright doesn't track that across methods — assert at the use site. `websocket.response` is `Response | None` in the websockets stubs even though it's always populated post-handshake; guarded with a fallback. - `audio/filters/rnnoise_filter.py`: the module-level import sets `RNNoise` to `None` if `pyrnnoise` isn't installed; raise `ImportError` explicitly instead of relying on the existing try-block to catch the `None(...)` call. Also gated `filter()` with `or self._rnnoise is None` so pyright sees the narrowing. - `transports/smallwebrtc/request_handler.py`: `get_answer()` legitimately returns `None`; raise instead of crashing on three subscript accesses. `TTSService` `audio-context` API tightening. Mirroring the `append_to_audio_context` fix from the previous batch: `remove_audio_context` was typed `str` but is called with `str | None` from `get_active_audio_context_id()` results. Widened to `str | None` and the `None` handling lives in the function body (early debug log + return) — matching `append_to_audio_context`'s shape. `audio_context_available` keeps its narrow `str` signature; asking "is `None` available?" isn't a meaningful question (`_audio_contexts` is `dict[str, asyncio.Queue]`).
The internal call site in `on_turn_context_completed` narrows `_turn_context_id` explicitly before passing it. Side effect: deepgram/tts.py's L307 error clears without local changes. `deepgram/tts.py` (4 errors → 0): the same `push_error(ErrorFrame(...))` latent bug we fixed in resembleai earlier in this PR — `push_error` takes a string; there's a separate `push_error_frame` for frames. Two sites switched. The Optional `_websocket.response` access is guarded the same way as deepgram/flux/stt.py. The `remove_audio_context` error was cleared by the tightening above. `aws/utils.py` (3 errors → 0): `AWSTranscribePresignedURL` declared `session_token: str` but the dict source is `str | None` (AWS supports long-term IAM creds without a session token). Same for `vocabulary_name`/`vocabulary_filter_name` on `get_request_url`, which were typed `str = ""` even though the body uses truthy checks to skip them. Widened to `str | None = None` — matches actual runtime semantics. `audio/dtmf/utils.py` (2 errors → 0): `files("...").joinpath(...)` returns a `Traversable`, but `aiofiles.open` wants a real path. For regular pip installs this worked in practice (Traversable was a `Path`), but it would fail for zipped distributions (zipapp, zipimport) where the resource isn't on disk. Wrapped in `importlib.resources.as_file(...)` — the canonical bridge that extracts to a temp file when the resource isn't already on the filesystem. Validated end-to-end: regular install still reads bytes; ad-hoc zipapp test confirmed `as_file` extracts the resource and returns a real Path. `openai/image.py` (2 errors → 0): the `size` arg to `images.generate` is `Literal[...] | None` in the SDK but our settings field is `str | None`.
Mirrored the `groq/tts.py` hint-not-constraint pattern from the previous batch: defined a module-level `OpenAIImageSize = Literal[...]` alias with a comment attributing the upstream symbol and documenting the cast contract (callers can pass any string; invalid values surface as an OpenAI API error). Also guarded `image.data[0]` (response.data is `list[Image] | None`). `processors/frameworks/{langchain,strands_agents}.py` (4 + 4 → 0): both processors do `messages[-1]["content"]` on a value typed `LLMStandardMessage | LLMSpecificMessage` (the latter is a dataclass, not a dict, so `__getitem__` errors). Historically these only handled plain-text user messages, so the fix is two explicit guards (skip if the last message isn't a dict; skip if `content` isn't a string) plus a TODO noting that other shapes (multi-modal content, provider-specific messages) aren't supported yet. langchain's `__get_token_value` also got a small fix where `AIMessageChunk.content` is `str | list[parts]` but the function declares `-> str`; stringify the list case. strands_agents' surfaced two unrelated narrows: a `graph_exit_node: str | None` arg gated by an `__init__`-time assert, and `agent.stream_async` reached only when we're not in graph mode. Files dropped from the ignore list (67 → 57): audio/dtmf/utils.py, audio/filters/rnnoise_filter.py, processors/frameworks/langchain.py, processors/frameworks/strands_agents.py, services/aws/utils.py, services/deepgram/flux/stt.py, services/deepgram/tts.py, services/mistral/stt.py, services/openai/image.py, transports/smallwebrtc/request_handler.py.	2026-05-01 09:36:14 -04:00
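The `importlib.resources.as_file(...)` bridge mentioned in the commit above can be illustrated with a small sketch. It reads synchronously rather than through `aiofiles`, and the helper name `read_resource_bytes` is an assumption for the example, not the code in `audio/dtmf/utils.py`.

```python
from importlib.resources import as_file, files


def read_resource_bytes(package: str, name: str) -> bytes:
    """Read a packaged resource via a real filesystem path."""
    resource = files(package).joinpath(name)  # a Traversable, not necessarily a Path
    # as_file yields a concrete path, extracting to a temporary file when the
    # resource lives inside a zip archive instead of on disk.
    with as_file(resource) as path:
        return path.read_bytes()
```

For example, `read_resource_bytes("email", "charset.py")` reads a module file from the standard library's `email` package; any API that insists on a real path can be called inside the `with` block in the same way.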
Paul Kompfner	31ff07916f	fix: clear 10 more services from pyright ignore list A second pass over the low-error-count files in the ignore list. Drops 10 files (77 → 67) and full-pyright errors from 580 → 555. Default pyright stays clean. Three coherent shapes plus a handful of one-offs: `Language | str | None` → `Language | None` at STT frame boundaries. `assert_given(self._settings.language)` returns `Language | str | None` (strips `_NotGiven`, keeps the rest), but `TranscriptionFrame.language` expects `Language | None`. In practice both `_settings.language` and SDK-supplied codes resolve to a `Language` enum value, but technically they could be raw strings — and `Language` is a StrEnum, so downstream consumers (which mostly compare/serialize as strings) handle either. Used `cast("Language | None", ...)` at each call site rather than a runtime-validating helper, so an unrecognised code (e.g. one we haven't added to the enum yet) still flows through unchanged. Cleared azure/stt.py, aws/stt.py, gradium/stt.py; mistral/stt.py keeps the cast at the SDK boundary (storing under `_detected_language: Language | None`) but stays in the ignore list because of two unrelated Optional-access errors. aiobotocore `async with` stub gap. `aioboto3.Session().client(...)` is an async context manager at runtime but its stubs don't advertise `__aenter__`/`__aexit__` to pyright. Scoped `# pyright: ignore[reportGeneralTypeIssues]` on the two affected sites: aws/agent_core.py and aws/tts.py. aws/tts.py also had a latent bug on the no-`AudioStream` path: the original code set `audio_data = None` and then crashed in `resample(...)` and `len(audio_data)` below; replaced with an early `return` after logging — matches the convention elsewhere (OpenAI TTS, etc.) of not recording usage metrics on the error path. heygen `event_id: str | None` → `str` at transport→client boundary.
Three call sites in transports/heygen/transport.py passed `self._event_id` (`str | None`) into client methods that take `str`. Added a guard at each: `agent_speak_end` and `interrupt` only fire when `_event_id` is set; `write_audio_frame` warn-and-drops when there's no active bot event rather than sending a malformed message. `OpenAIResponsesLLMInvocationParams` TypedDict. `get_llm_invocation_params` always sets both `input` and `tools` in the same dict literal, but the TypedDict was `total=False` so direct subscript access (`invocation_params["input"]`) tripped `reportTypedDictNotRequiredAccess` in services/openai/responses/llm.py. Marked both keys `Required[...]`; `instructions` stays non-required since it's only added when a system instruction is present. Latent bug in heygen/api_interactive_avatar.py: the code accessed `request_data.voice.voiceId` and `request_data.voice.elevenlabsSettings`, but those names are Pydantic aliases; the actual attribute names (used for attribute access) are `voice_id` and `elevenlabs_settings`. Switched to the field names — those camelCase accesses would have raised AttributeError at runtime if `voice` was set. Other small fixes: - assemblyai/stt.py: the deprecated `connection_params=` init path was reading `formatted_finals` and `word_finalization_max_wait_time` off `AssemblyAIConnectionParams`, but those fields were never on the deprecated input model — they were added to Settings later. Removed the reads (with a comment noting they're only available via the canonical `settings=...` API); the deprecated input model is unchanged. - rtvi/processor.py: two `about: Mapping[str, Any] = None` parameter signatures — declared `Mapping`, defaulted to `None`, and both function bodies already handled the None case. Widened to `Mapping[str, Any] | None = None`. - aws/stt.py: `subprotocols=["mqtt"]` failed against websockets' `Sequence[Subprotocol] | None` (Subprotocol is a NewType wrapper). Wrapped: `subprotocols=[Subprotocol("mqtt")]`.
Files dropped from the ignore list (77 → 67): processors/frameworks/rtvi/processor.py, services/assemblyai/stt.py, services/aws/agent_core.py, services/aws/stt.py, services/aws/tts.py, services/azure/stt.py, services/gradium/stt.py, services/heygen/api_interactive_avatar.py, services/openai/responses/llm.py, transports/heygen/transport.py.	2026-05-01 09:36:14 -04:00
Paul Kompfner	814f00ce41	fix: clear 19 TTS/STT/etc. services from pyright ignore list Several adjacent fix shapes that together drop 19 files from the pyrightconfig.json ignore list (96 → 77) and full-pyright errors from 605 → 580. Default pyright stays clean. TTS voice/context_id None handling — most files in this batch had a single error of the shape "value typed `T | None` passed where `T` is required" coming out of `assert_given(self._settings.voice)` (which strips `_NotGiven` but not `None`) or `get_active_audio_context_id()`. Two patterns: - For services where a missing voice means the request can't proceed (hume, openai, xtts, groq, kokoro, piper), added an explicit None check. Inside `run_tts` we yield an `ErrorFrame` and return — matching each service's existing error-emission style (a few wrap `Exception` broadly and were fine; openai/hume/xtts had narrower or no try blocks so a bare `raise ValueError` would have escaped uncaught). Piper validates in `__init__`, where failing fast at construction is the right shape. OpenAI also gained a `voice not in VALID_VOICES` guard with a clear message listing supported voices. - For services where a missing audio context just means "skip this message" (fish, lmnt, smallest, sarvam, neuphonic), widened `TTSService.append_to_audio_context`'s `context_id` signature to `str | None`. The function body already explicitly handled the None case with a debug log + early return, so the prior `str` annotation was a lie; making it honest cleared call sites without local guards. inworld's `_close_context` got the same treatment. google.genai imports — switched `from google import genai` to `import google.genai as genai` in google/image.py and google/llm.py. The dotted form sidesteps a PEP 420 namespace-package stub gap (the `google` namespace stubs come from a different distribution and don't declare `genai`), which means pyright now resolves `genai` to the real module rather than `Unknown`.
IDE autocomplete on `genai.<x>` works for the first time. In image.py this surfaced three latent bugs that the `Unknown` resolution had been hiding (model was `str | _NotGiven | None` not narrowed before passing to the SDK; two spots accessed `.image_bytes` on an `Image | None` without a guard) — all fixed. llm.py's dotted import surfaced 8 errors (Content-list typing nuances, internal `_api_client` access, a few small Optionals); deferred to a future pass since they're outside this commit's scope, so the file stays in the ignore list with the dotted import. Latent bug fixes spotted along the way: - resembleai/tts.py was calling `push_error(ErrorFrame(...))`, but `push_error` takes a string — there's a separate `push_error_frame` for the frame case. Switched to the right method. - openai/base_llm.py: `max_completion_tokens` was the only sibling field on `OpenAILLMSettings` missing `| None` in its type, which caused the assignment in openai/llm.py from `params.max_completion_tokens` (`int | None`) to fail. Added `| None` for consistency with `max_tokens` etc. - heygen/base_api.py: `livekit_url: str = None` and `ws_url: str = None` declared `str` while defaulting to `None`. Removed the bogus defaults — both fields are required at construction in every in-tree call site, and the previous `str = None` was a Pydantic footgun. Other small ones: gladia/stt.py needed a None guard on `_session_url` before `websocket_connect`; openrouter/llm.py's `build_chat_completion_params` override widened to `dict[str, Any]` diverging from the parent's `OpenAILLMInvocationParams` — restored the parent's type; neuphonic/tts.py guarded the receive loop's `async for message in self._websocket` with a local-variable narrowing matching the pattern from 9e9b1f39e. groq/tts.py: tightened `output_format`'s typing to `Literal["flac","mp3","mulaw","ogg","wav"] | str = "wav"`.
The literal side gives IDE autocomplete hints for the currently-supported set; the `| str` side keeps callers unblocked if groq adds a new format before this list is updated. A `cast` at the API boundary satisfies groq's stricter `Literal` parameter type. The literal alias mirrors the inlined Literal on `groq.resources.audio.speech.AsyncSpeech.create`'s `response_format` (the SDK doesn't export it as a named symbol). websocket_service.py: scoped `# pyright: ignore[reportAttributeAccessIssue]` on `websockets.WebSocketClientProtocol`. That alias is now a deprecated re-export from the legacy submodule and pyright doesn't surface it on the top-level `websockets` namespace; runtime is fine. Migrating to `websockets.ClientConnection` is a separate piece of work (transports/websocket/client.py uses the same alias four times) and left for a future commit. Files dropped from the ignore list: fish/tts.py, gladia/stt.py, google/image.py, groq/tts.py, heygen/base_api.py, hume/tts.py, inworld/tts.py, kokoro/tts.py, lmnt/tts.py, neuphonic/tts.py, openai/llm.py, openai/tts.py, openrouter/llm.py, piper/tts.py, resembleai/tts.py, sarvam/tts.py, smallest/tts.py, websocket_service.py, xtts/tts.py.	2026-05-01 09:36:14 -04:00
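The hint-not-constraint typing described for `output_format` in the commit above could be sketched as below. `KnownFormat`, `sdk_create`, and `synthesize` are hypothetical names; `sdk_create` merely stands in for an SDK call that has a strict `Literal` parameter.

```python
from typing import Literal, Union, cast

# Hypothetical alias listing the formats known at the time of writing.
KnownFormat = Literal["flac", "mp3", "mulaw", "ogg", "wav"]


def sdk_create(response_format: KnownFormat) -> str:
    """Stand-in for an SDK call whose parameter is a strict Literal."""
    return f"format={response_format}"


def synthesize(output_format: Union[KnownFormat, str] = "wav") -> str:
    # The Literal half gives editors a completion list; the str half lets
    # callers pass a format added upstream before this alias is updated.
    # cast() only informs the type checker and performs no runtime validation.
    return sdk_create(cast(KnownFormat, output_format))
```

Because `cast` does nothing at runtime, an unsupported value passes straight through to the callee, which is where any rejection would surface.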
Paul Kompfner	96756bc1f6	fix: clean up TypedDict / Optional patterns in 6 more LLM adapters Same approach as the previous round — apply boundary casts where the code does dict-style mutation on TypedDict-typed values, narrow at return sites, and document the LLMSpecificMessage limitation in realtime adapters that pack history into a single text message. aws_nova_sonic_adapter.py — pure typing + small narrowing fixes: - Filter LLMSpecific items in `_from_universal_context_messages` (documented). - `_from_universal_context_message` now declared `-> AWSNovaSonicConversationHistoryMessage | None` (it already had paths returning None implicitly). - `get_messages_for_logging` returns `dict[str, Any]` per element via `dataclasses.asdict`, matching the declared return type. - Use a local `role` variable so pyright keeps the narrowing across the truthy-content guard. grok_realtime_adapter.py / inworld_realtime_adapter.py — same shape of fix as `open_ai_realtime_adapter.py` from the previous batch. The two files are essentially copies of the OpenAI Realtime adapter, so the same template applies: cast at the boundary, filter LLMSpecificMessage with a documented note, replace the implicit-None fallthrough with `raise ValueError`, and switch the `text_content +=` pattern (which fails when one of the parts is None) to a `text_parts.append(...)` + `" ".join(...)` pattern. open_ai_adapter.py — pure typing. Cast at the `OpenAILLMInvocationParams` return, narrow the system-instruction warning's `initial_content` to `str | None`, and cast the custom-tools list to `list[ChatCompletionToolParam]`. open_ai_responses_adapter.py — pure typing. Same shape: narrow `first_content` to `str | None` for the warning resolver, cast the constructed dict literals at append sites where the target is `ResponseInputItemParam`, and cast `get_messages_for_logging`'s return to the declared `list[dict[str, Any]]`. processors/aggregators/llm_context.py — pure typing.
Cast the deepcopied message in the redaction loop in `get_messages` to `dict[str, Any]` and the create_image/audio_message return-dict literals to `LLMContextMessage`. Removes 6 newly-clean files from the pyright ignore list. Net: -77 pyright errors (full-config: 680 -> 603).	2026-05-01 09:36:14 -04:00
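The switch from `text_content +=` to `text_parts.append(...)` plus `" ".join(...)` in the commit above might look like this sketch. The helper name `join_text_parts` is an assumption, and skipping empty strings as well as `None` is a choice made for the example.

```python
from typing import Iterable, List, Optional


def join_text_parts(parts: Iterable[Optional[str]]) -> str:
    """Join text parts with spaces, skipping parts that are None or empty."""
    text_parts: List[str] = []
    for part in parts:
        # `text += part` would raise TypeError as soon as a part is None.
        if part:
            text_parts.append(part)
    return " ".join(text_parts)
```

Collecting into a list first also avoids stray leading or doubled separators that string concatenation tends to produce.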
Paul Kompfner	5e24027fd5	fix: type fixes (and a few latent bug fixes) in 4 LLM adapters Same shape of fix we applied to anthropic_adapter.py earlier: these adapters do dict-style mutation on values typed as ChatCompletionMessageParam (a union of TypedDicts) or against Optional fields. Apply boundary casts (`cast(dict[str, Any], ...)` for the mutation block, cast back to the TypedDict at return sites). Most changes are pure typing (rename + cast); a handful in gemini and openai_realtime are small defensive bug fixes for code paths that were latently broken by Optional fields slipping through: perplexity_adapter.py: pure typing. Cast the deepcopied messages to `list[dict[str, Any]]` for the role-merging / system-conversion / trailing-assistant-removal transformations and cast back to ChatCompletionMessageParam at the return. bedrock_adapter.py: pure typing. Cast the message to `dict[str, Any]` at the top of `_from_standard_message` for the tool-result / tool-use / image-content transformations. Cast the constructed dict at the return site of `get_llm_invocation_params`. gemini_adapter.py: typing + several None guards on Content.parts and related Optional fields. Each guard turns a latent `TypeError`/`AttributeError` (when the type-system-allowed None showed up at runtime) into a defensive skip: the type annotations say these can be None and we now handle that. open_ai_realtime_adapter.py: - Typing: cast the deepcopied messages, cast back where needed. - LLMSpecificMessage handling: previously the function would crash on the first `.get()` call if any LLMSpecificMessage was in the list. Filter them out and document the limitation; this adapter's pack-into-single-text-message strategy doesn't compose with opaque per-provider payloads. - Real bug fix: `events.ConversationItem` is a Pydantic BaseModel, not a TypedDict. The bulk-packing path was constructing a raw dict where a ConversationItem was expected. Replaced with proper constructor calls (matches what the single-user-message path already does). - Real bug fix: `_from_universal_context_message` was declared `-> events.ConversationItem` but on the unhandled-message fallthrough it logged and returned None implicitly. Raise ValueError so the violation is loud, not silent. Removes 4 newly-clean files from the pyright ignore list: adapters/services/{perplexity,bedrock,gemini,open_ai_realtime}_adapter.py. Net: -95 pyright errors (full-config: 775 -> 680).	2026-05-01 09:36:14 -04:00
Paul Kompfner	ef226c8a8e	fix: silence _settings NotGiven leaks and tighten Google STT language method Six pyright errors followed the same pattern: a value flowed out of `self._settings.X` (typed `T | _NotGiven`) into a context that wanted the plain `T`. Wrap each with `assert_given(...)` so the sentinel gets stripped at the boundary: - aws/nova_sonic/llm.py: `_settings.model` (in InvokeModel...Input) and `_settings.system_instruction` (passed to the adapter). - deepgram/flux/base.py: iterating `_settings.keyterm`. - google/stt.py: iterating `_settings.languages`. - google/tts.py: iterating `_settings.speaker_configs`. - openai/base_llm.py: `_settings.system_instruction` passed to the adapter. Also takes a deeper pass at the related Google STT issue: the override of `language_to_service_language` had been broadened to take `Language | list[Language]` and return `str | list[str]`, a Liskov violation against the base's `Language -> str | None` contract. External callers always pass a single Language, and the only consumer of the list path was Google STT's own `_get_language_codes`. Restore the override to a single-Language signature and let `_get_language_codes` iterate. The override is also tightened to return `str` (narrower than the base's `str | None`, which is LSP-compatible) since it always falls back to `"en-US"` rather than returning None. Net: -7 pyright errors (full-config run: 782 -> 775).	2026-05-01 09:36:14 -04:00
Paul Kompfner	2a731336be	fix: tighten language_to_<service>_language return types to plain str These provider-specific helpers are all thin wrappers around `resolve_language(...)`, which itself returns `str`, never `None`. The `str | None` annotations were misleading and were producing spurious pyright errors at the call sites that assigned the result into a `str` field. Update each helper's signature to `str` and rewrite the `Returns:` docstring to describe the actual fallback behaviour (resolve to base or full code, with a warning). Importantly, the per-class `language_to_service_language(...)` methods on `STTService`/`TTSService` subclasses keep `str | None` as their return type. That signature is an extension hook for future and/or third-party subclasses that may genuinely not be able to produce a code for some languages, even though all in-tree first-party services currently return a string. Also includes one small unrelated tightening in azure/stt.py: wrap `self._settings.language` with `assert_given(...)` so the truthy fallback to `language_to_azure_language(Language.EN_US)` doesn't silently swallow a NotGiven sentinel. Net: -3 pyright errors (full-config run: 785 -> 782).	2026-05-01 09:36:14 -04:00
Paul Kompfner	bec407ce3a	fix: handle Optional websocket/client receivers across services Pyright flagged 19 sites where `await self._<connection>.send/recv/...` was called on a receiver typed `X | None`. Each kind of call site needed a slightly different fix to be both type-safe and behaviour-preserving: Streaming/user-facing paths (early return + warn; drop and warn is the right runtime fail-safe when reconnect didn't succeed): - cartesia/stt.py (run_stt) - soniox/stt.py (_send_keepalive) - elevenlabs/tts.py (run_tts: yields ErrorFrame and returns) - deepgram/sagemaker/tts.py (run_tts) - transports/lemonslice/transport.py (send_message) - transports/tavus/transport.py (send_message) "Should never happen" cases (early return with comment, no warn; caller already gated on a separate `_is_*` check, so a warn would be noise): - deepgram/flux/stt.py (transport methods, gated by _transport_is_active) - deepgram/flux/sagemaker/stt.py (same) - stt_service.py (_send_keepalive, gated by _is_keepalive_ready) - elevenlabs/stt.py (_send_keepalive, same) - llm_service.py (_ws_recv: raises ConnectionError to match _ensure_connected's contract) - heygen/client.py (receive loop, gated by self._connected) Just-assigned-above (use a local variable so pyright keeps the narrowing across statements): - lmnt/tts.py - gradium/stt.py - fish/tts.py Other: - transports/websocket/server.py: used the existing local `websocket` parameter in scope instead of `self._websocket` for the close call. - websocket_service.py: `send_with_retry` raises ConnectionError when `self._websocket` is None inside the existing try-block, so the broad `except Exception` triggers reconnect just as it would on a real send failure (preserving the prior behaviour where None silently fell through to the AttributeError-driven reconnect path). Drops three now-clean files from the pyright ignore list: cartesia/stt.py, elevenlabs/stt.py, and soniox/stt.py.	2026-05-01 09:36:14 -04:00
Paul Kompfner	1cd73b1ef8	refactor: give TAdapter a default to restore precise typing for unparameterized LLMService subclasses After making LLMService generic, an unparameterized subclass (`class MyService(LLMService):` with no bracket — the third-party provider pattern) saw `get_llm_adapter()` return `Unknown` rather than `BaseLLMAdapter` as it did before the refactor. Add `default=BaseLLMAdapter` (PEP 696) on the TypeVar — via `typing_extensions.TypeVar` so older Python targets keep working — so unparameterized callers get `LLMService[BaseLLMAdapter]` and `get_llm_adapter()` returns `BaseLLMAdapter`, matching the pre-refactor type precision. Two internal fallouts of having a default (where the default makes unannotated `LLMService` resolve invariantly to `LLMService[BaseLLMAdapter]`): - `FunctionCallParams.llm` is now `LLMService[Any]` so concrete parameterizations like `LLMService[OpenAILLMAdapter]` can be passed where the field is set. - The explicit `LLMService.__init__(self, **kwargs)` in `WebsocketLLMService.__init__` gets a `pyright: ignore[reportArgumentType]` comment — pyright's invariance handling can't see through the multi-inheritance + generic + default combination, but the runtime call is correct (generics are erased).	2026-05-01 09:36:14 -04:00
Paul Kompfner	c4f5f1ebbb	test, refactor: follow-ups to LLMService generic refactor Two follow-ups now that LLMService is generic over its adapter: - Add an explicit backward-compat test verifying that an LLMService subclass with no generic parameter (the third-party-provider pattern) instantiates and returns a usable adapter. The existing MockLLMService (declared without brackets) already exercised this implicitly, but it's worth a named assertion. - Drop the now-redundant `params: SomeLLMInvocationParams = ...` variable annotations on `adapter.get_llm_invocation_params()` results. Since `get_llm_adapter()` now returns the precise adapter type, and `BaseLLMAdapter` is generic in its invocation-params type, the call already infers the right TypedDict.	2026-05-01 09:36:14 -04:00
Paul Kompfner	49068ff557	refactor: make LLMService generic over its adapter type Previously, `LLMService.get_llm_adapter()` returned `BaseLLMAdapter`, which forced every caller that wanted the precise adapter type to write `adapter: SomeAdapter = self.get_llm_adapter()` and accept pyright's complaint that the assignment doesn't match the declared type. That pattern existed in 17 places across the LLM services. Make `LLMService` generic over its adapter type (`LLMService(..., Generic[TAdapter])` with `TAdapter = TypeVar("TAdapter", bound=BaseLLMAdapter)`) so subclasses opt in via `LLMService[XAdapter]` and callers get the precise type back from `get_llm_adapter()` automatically. Backward-compatible for third-party providers: code that says `class MyService(LLMService):` (no bracket) still type-checks, with TAdapter resolving to BaseLLMAdapter from the bound, identical to the pre-refactor behavior. The `adapter_class` attribute keeps its loose `type[BaseLLMAdapter] = OpenAILLMAdapter` typing so the default remains usable; one localized cast in `__init__` bridges the loose class attr to the precise instance attr. In-tree subclasses opted in: - AnthropicLLMService -> LLMService[AnthropicLLMAdapter] - AWSBedrockLLMService -> LLMService[AWSBedrockLLMAdapter] - AWSNovaSonicLLMService -> LLMService[AWSNovaSonicLLMAdapter] - BaseOpenAILLMService -> LLMService[OpenAILLMAdapter] (propagates to ~15 OpenAI-compatible providers like Cerebras, Groq, Together) - GeminiLiveLLMService -> LLMService[GeminiLLMAdapter] - GoogleLLMService -> LLMService[GeminiLLMAdapter] - GrokRealtimeLLMService -> LLMService[GrokRealtimeLLMAdapter] - InworldRealtimeLLMService -> LLMService[InworldRealtimeLLMAdapter] - OpenAIRealtimeLLMService -> LLMService[OpenAIRealtimeLLMAdapter] - _BaseOpenAIResponsesLLMService -> LLMService[OpenAIResponsesLLMAdapter] - WebsocketLLMService is also generic so the multi-inheritance case (OpenAIResponsesLLMService) can keep both bases agreeing on TAdapter. All 17 redundant `adapter: SomeAdapter = self.get_llm_adapter()` annotations are now plain `adapter = self.get_llm_adapter()`.	2026-05-01 09:36:14 -04:00
Paul Kompfner	d23bdaaacd	fix: handle NotGiven from from_standard_tools in Nova Sonic connect Same pattern as the earlier get_setup_params fix: when context tools are absent, the fallback `adapter.from_standard_tools(self._tools)` can return the NotGiven sentinel, and `_send_prompt_start_event` expects a list. Coerce via `or []` so the NotGiven case becomes an empty list.	2026-05-01 09:36:14 -04:00
Paul Kompfner	53ce57b7fa	fix: tighten _process_completed_function_calls in AWS Nova Sonic Three small changes that resolve pyright errors and sharpen the logic: - Guard `self._context` with the codebase's "should never happen" early-return pattern, so we don't blindly call `.get_messages()` on None. - Skip `LLMSpecificMessage` items in the iteration. They're opaque provider-specific payloads with no `.get()`, and the surrounding logic only applies to standard tool-result messages. - Match `role == "tool"` explicitly. The previous truthy-only check was working by accident — the `tool_call_id` filter further down was effectively narrowing to tool messages, but the intent is clearer when stated upfront.	2026-05-01 09:36:14 -04:00
Paul Kompfner	dabca70744	fix: warn and bail in reset_conversation when no context exists reset_conversation is part of the public AWSNovaSonicLLMService API and is also called internally from the receive-task error handler. Previously it captured `self._context` (typed `LLMContext | None`) and unconditionally passed it to `_handle_context`, which expects a real context, silently doing the wrong thing if no initial context had been received yet. Treat that as developer error: log a warning and return early. Nothing to preserve means nothing to reset.	2026-05-01 09:36:14 -04:00
Paul Kompfner	191bdc733f	fix: conform AWSNovaSonicLLMService.get_setup_params to its protocol The service implements the NovaSonicSessionSender protocol so the session-continuation helper can target either the current or next session. The protocol declares `get_setup_params(self) -> tuple[str | None, list]`, but the implementation was unannotated and could return NotGiven in the tools position when from_standard_tools fell through to its NotGiven sentinel. Add the matching return annotation and coerce the NotGiven case to an empty list.	2026-05-01 09:36:14 -04:00
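
The generic-adapter refactor described in commit 49068ff557 above can be illustrated with a minimal sketch. This is not the repository's code. The class and TypeVar names are taken from the commit message, but the class bodies here are simplified stand-ins, and `ThirdPartyService` and the `name()` helper are hypothetical, added only for illustration.

```python
from typing import Generic, TypeVar, cast


class BaseLLMAdapter:
    """Simplified stand-in for the adapter base class (assumption)."""

    def name(self) -> str:
        # Hypothetical helper, only here so the adapters do something visible.
        return type(self).__name__


class OpenAILLMAdapter(BaseLLMAdapter):
    pass


class AnthropicLLMAdapter(BaseLLMAdapter):
    pass


# Bound-only TypeVar, as quoted in the commit message.
TAdapter = TypeVar("TAdapter", bound=BaseLLMAdapter)


class LLMService(Generic[TAdapter]):
    # Loosely typed class attribute so the default adapter stays usable.
    adapter_class: type[BaseLLMAdapter] = OpenAILLMAdapter

    def __init__(self) -> None:
        # One localized cast bridges the loose class attribute to the
        # precisely typed instance attribute.
        self._adapter = cast(TAdapter, self.adapter_class())

    def get_llm_adapter(self) -> TAdapter:
        return self._adapter


class AnthropicLLMService(LLMService[AnthropicLLMAdapter]):
    # Opted-in subclass: a type checker sees AnthropicLLMAdapter returned.
    adapter_class = AnthropicLLMAdapter


class ThirdPartyService(LLMService):
    # Hypothetical unparameterized subclass (the third-party pattern).
    pass


if __name__ == "__main__":
    print(AnthropicLLMService().get_llm_adapter().name())
    print(ThirdPartyService().get_llm_adapter().name())
```

At runtime generics are erased, so both subclasses behave as they did before the refactor; the difference is only in what a type checker infers for `get_llm_adapter()`. The follow-up commit 1cd73b1ef8 adds `default=BaseLLMAdapter` via `typing_extensions.TypeVar`, which this sketch omits to stay within the standard library.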

9307 Commits