pipecat

Author	SHA1	Message	Date
Paul Kompfner	712e42533d	Introduce WebsocketLLMService and refactor OpenAIResponsesLLMService to use it. Add WebsocketLLMService as a base class for WebSocket-based LLM services, parallel to WebsocketTTSService/WebsocketSTTService but codifying a transactional request-response model rather than a continuous background receive loop. WebsocketLLMService provides: (1) connection lifecycle (start/stop/cancel → connect/disconnect); (2) _ws_send/_ws_recv with transparent ConnectionClosed handling (auto-reconnect via exponential backoff → WebsocketReconnectedError); (3) _ensure_connected with retry via _try_reconnect. OpenAIResponsesLLMService now inherits from WebsocketLLMService, removing duplicated connection management code (_connect, _disconnect, _reconnect, _ensure_connected, _ws_send, start, stop, cancel) and simplifying _process_context from a loop with attempt tracking to a flat try/except with a single retry.	2026-03-30 22:26:31 -04:00
Paul Kompfner	26f85687d6	Handle response cancellation by draining before next inference. Instead of trying to filter stale events inline (unreliable, because the API doesn't provide a way to correlate events to a specific response), drain remaining events from a cancelled response before starting the next one. On cancellation, send response.cancel and set a drain flag. At the start of the next _process_context, read and discard events until a terminal event arrives, ensuring a clean connection. Falls back to reconnecting if draining times out.	2026-03-30 09:59:03 -04:00
Paul Kompfner	9defff2a34	Skip server-known output items in previous_response_id optimization. When using previous_response_id, the server already knows its own output from the previous response. Store the raw response output and, on the next call, compare it against the items following the matched input prefix, checking role and text content for messages and call_id for function calls. If the items match, skip them and send only truly new input (user messages, tool results). Falls back to full context if either the prefix or the output comparison fails.	2026-03-30 09:59:03 -04:00
Paul Kompfner	f2a8a9e753	Add WebSocket-based OpenAI Responses LLM service with previous_response_id optimization. Introduce a WebSocket variant of the OpenAI Responses API service that maintains a persistent connection to wss://api.openai.com/v1/responses for lower-latency inference. The WebSocket variant automatically uses previous_response_id to send only incremental context when possible, falling back to full context on reconnection or cache miss. The WebSocket variant becomes the new default OpenAIResponsesLLMService, and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both share a private base class with common settings, parameter building, and run_inference (always HTTP) logic.	2026-03-30 09:58:56 -04:00
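
The incremental-context selection described in 9defff2a34 can be sketched roughly as follows. This is an illustration, not the code from the commit: the function names and the simplified item shape (dicts with `type`, `role`, `text`, `call_id`) are assumptions made for the example. Returning `None` stands for the "fall back to full context" case.

```python
from typing import Any, Optional

Item = dict[str, Any]  # assumed, simplified item shape (not the real API schema)


def _same_item(sent: Item, server: Item) -> bool:
    """Compare a context item with a stored server output item."""
    if sent.get("type") != server.get("type"):
        return False
    if sent.get("type") == "function_call":
        # Function calls are matched on call_id only.
        return sent.get("call_id") == server.get("call_id")
    # Messages are matched on role and text content.
    return sent.get("role") == server.get("role") and sent.get("text") == server.get("text")


def incremental_input(
    context: list[Item], prev_input: list[Item], prev_output: list[Item]
) -> Optional[list[Item]]:
    """Return only the items the server has not seen, or None to send everything."""
    n = len(prev_input)
    if context[:n] != prev_input:
        return None  # input prefix mismatch: fall back to full context
    m = len(prev_output)
    following = context[n : n + m]
    if len(following) != m or not all(
        _same_item(a, b) for a, b in zip(following, prev_output)
    ):
        return None  # server output mismatch: fall back to full context
    return context[n + m :]  # truly new input (user messages, tool results)
```

A caller would send the returned list together with `previous_response_id`, and send the whole context without it when the result is `None`.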

4 Commits
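
As a rough illustration of the drain step described in 26f85687d6, the sketch below discards events until a terminal one arrives and reports a timeout so the caller can reconnect. It is not the commit's implementation: `drain_cancelled_response`, the `recv` callable, and the event names in `TERMINAL_EVENTS` are hypothetical placeholders chosen for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed placeholder names for "terminal" events; the real set may differ.
TERMINAL_EVENTS = {"response.completed", "response.failed", "response.cancelled"}


async def drain_cancelled_response(
    recv: Callable[[], Awaitable[dict[str, Any]]], timeout: float = 2.0
) -> bool:
    """Read and discard events until a terminal event arrives.

    Returns True if the connection is clean, False if draining timed out
    (in which case the caller would reconnect instead).
    """

    async def _drain() -> None:
        while True:
            event = await recv()
            if event.get("type") in TERMINAL_EVENTS:
                return

    try:
        await asyncio.wait_for(_drain(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

In this sketch the next inference would call the function first whenever the drain flag is set, and trigger a reconnect when it returns `False`.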