* Add support for websocket via codex
* Add model alias and creds support
* fix: skip cost tracking for WS session wrapper call types
The @client decorator on _aresponses_websocket fires async_success_handler
with result=None after the session ends. This triggered cost tracking errors
because standard_logging_object is never built for None results.
Per-turn costs are correctly tracked by individual litellm.aresponses calls
inside the session. The outer session-level logging obj should not attempt
cost tracking.
Fix: skip _aresponses_websocket and _arealtime call types in deployment_callback_on_success,
RouterBudgetLimiting.async_log_success_event, and _PROXY_track_cost_callback.
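The skip described above can be sketched as a small guard. This is a minimal illustration, not the code in this change: the set name and helper name are hypothetical, and only the two call type strings come from the description.

```python
from typing import Optional

# Call types named in this commit; the constant name is hypothetical.
SESSION_WRAPPER_CALL_TYPES = frozenset({"_aresponses_websocket", "_arealtime"})


def should_track_cost(call_type: Optional[str]) -> bool:
    """Return False for session-level wrapper calls.

    These calls end with result=None, so there is nothing to price;
    per-turn calls inside the session are tracked on their own.
    """
    return call_type not in SESSION_WRAPPER_CALL_TYPES
```

A success callback would return early when `should_track_cost(...)` is False, before it touches the standard logging object.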
* fix: address Greptile review comments
Fix JSON injection: use json.dumps instead of f-string interpolation for model name in WS body.
Add a 30s timeout for the first WS frame so an idle connection cannot tie up resources indefinitely.
Restore per-event model override in streaming_iterator; fall back to connection-level model when event omits it.
Strengthen regression test: inject the alias into kwargs via a _update_kwargs_with_deployment mock so the test fails on unfixed code.
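To illustrate the JSON injection point, here is a small sketch comparing f-string interpolation with json.dumps. The function names and the body shape are assumptions made for the example, not the actual WS body built in this change.

```python
import json


def build_body_unsafe(model: str) -> str:
    # Interpolating the model name lets a quote in it break out of the string.
    return f'{{"type": "response.create", "model": "{model}"}}'


def build_body_safe(model: str) -> str:
    # json.dumps escapes quotes and control characters in the model name.
    return json.dumps({"type": "response.create", "model": model})


# A model name crafted to smuggle an extra key into the body.
malicious = 'gpt-4o", "injected": "yes'

unsafe = json.loads(build_body_unsafe(malicious))
safe = json.loads(build_body_safe(malicious))
```

With the unsafe builder the parsed body gains an `injected` key; with the safe builder the whole string stays inside the `model` value.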
* fix: handle nested response.create format in first-frame model extraction
When ?model= is omitted, the first WS frame can carry the model in either flat
format (first_event["model"]) or nested format (first_event["response"]["model"]).
The flat-only check would silently reject clients using the nested wire format.
This mirrors the two-format logic in _build_base_call_kwargs.
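The two wire formats can be handled roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and checking the flat key before the nested one is an illustrative choice rather than a statement about the real helper.

```python
from typing import Any, Dict, Optional


def extract_model_sketch(first_event: Dict[str, Any]) -> Optional[str]:
    """Return the model from a first WS frame, or None if absent."""
    # Flat format: {"type": "response.create", "model": "..."}
    model = first_event.get("model")
    if isinstance(model, str) and model:
        return model
    # Nested format: {"type": "response.create", "response": {"model": "..."}}
    response = first_event.get("response")
    if isinstance(response, dict):
        nested = response.get("model")
        if isinstance(nested, str) and nested:
            return nested
    return None
```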
* fix: don't force connection-level custom_llm_provider on per-event model overrides
If a client sends a different model per response.create turn, litellm needs to
re-resolve the provider from that model string. Forcing the connection-level
custom_llm_provider would silently route the request to the wrong backend.
Only inject custom_llm_provider when the per-event model matches the
connection-level model.
* refactor: extract WS model extraction into testable function
Pull the flat/nested model extraction into _extract_model_from_first_ws_event
so tests import and exercise the real function rather than a copy.
* fix: compare providers not full model strings in _inject_credentials
The model == self.model guard was too strict: same-provider model variants
(e.g., vertex_ai/gemini-2.0 -> vertex_ai/gemini-1.5 on one connection) would
lose custom_llm_provider, breaking routing when a custom api_base is in use.
Compare the provider extracted by get_llm_provider instead, so same-provider
variants still inherit the connection-level provider while cross-provider
overrides let litellm re-resolve.
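A minimal sketch of the provider comparison follows. It assumes a hypothetical `provider_of` stand-in that reads the `provider/` prefix of the model string; the real code uses get_llm_provider, whose resolution rules are richer. The function signature is also illustrative.

```python
from typing import Any, Dict, Optional


def provider_of(model: str) -> Optional[str]:
    # Hypothetical stand-in for get_llm_provider: read the "provider/" prefix.
    provider, sep, _rest = model.partition("/")
    return provider if sep and provider else None


def inject_credentials(
    kwargs: Dict[str, Any], connection_model: str, connection_provider: str
) -> Dict[str, Any]:
    """Copy kwargs, adding custom_llm_provider only for same-provider models."""
    out = dict(kwargs)
    event_model = out.get("model") or connection_model
    out["model"] = event_model
    conn = provider_of(connection_model)
    if conn is not None and conn == provider_of(event_model):
        # Same provider: keep the connection-level routing.
        out["custom_llm_provider"] = connection_provider
    # Different provider: leave it unset so it is re-resolved downstream.
    return out
```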
* style: black formatting
* refactor: extract first-frame model resolution to fix PLR0915 (too many statements)
* Fix responses WebSocket first-frame validation
* fix: classify WS first-frame read errors and clarify cost-skip log
Distinguish client disconnects from server errors when reading the
responses WebSocket first frame, make the cost-tracking skip log message
accurate for session wrappers (which do carry a model), and resolve the
connection-level provider once per session instead of on every
response.create event.
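The first-frame read with a timeout and error classification might look like the sketch below. Everything here is an assumption for illustration: `receive_text` stands in for the WebSocket receive call, ConnectionError stands in for the framework's disconnect exception, and the `FirstFrameError` kinds are hypothetical labels.

```python
import asyncio
import json
from typing import Awaitable, Callable

FIRST_FRAME_TIMEOUT_S = 30.0  # timeout value stated in this PR


class FirstFrameError(Exception):
    """Hypothetical error carrying a coarse classification in .kind."""

    def __init__(self, kind: str) -> None:
        super().__init__(kind)
        self.kind = kind


async def read_model_from_first_frame(
    receive_text: Callable[[], Awaitable[str]],
    timeout: float = FIRST_FRAME_TIMEOUT_S,
) -> str:
    try:
        raw = await asyncio.wait_for(receive_text(), timeout=timeout)
    except asyncio.TimeoutError:
        raise FirstFrameError("timeout") from None
    except ConnectionError:
        # Client went away: not a server-side failure.
        raise FirstFrameError("client_disconnect") from None
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        raise FirstFrameError("invalid_json") from None
    if not isinstance(event, dict):
        raise FirstFrameError("invalid_json")
    # Accept both the flat and the nested response.create formats.
    model = event.get("model")
    if not model and isinstance(event.get("response"), dict):
        model = event["response"].get("model")
    if not isinstance(model, str) or not model:
        raise FirstFrameError("missing_model")
    return model
```

Separating the disconnect case from the others lets the caller log it at a lower severity than a genuine server error.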
* test: cover WS first-frame read errors and same-provider credential injection
Adds regression tests for the still-uncovered responses WebSocket paths:
the timeout, invalid-JSON and missing-model branches of
_read_ws_model_from_first_frame, plus the provider comparison in
ManagedResponsesWebSocketHandler._same_provider and _inject_credentials
(same-provider model variants keep the connection provider; cross-provider
models re-resolve).
* fix(responses-ws): fall back to explicit custom_llm_provider when connection model is unresolvable
When a WebSocket session is opened with a custom deployment alias that litellm
cannot resolve to a provider, _connection_provider was None, so _same_provider
returned False for every resolvable per-event model and the connection-level
custom_llm_provider was dropped. Use the explicitly-set custom_llm_provider as
the connection provider in that case so same-provider per-event models still
inherit it while genuinely cross-provider models continue to re-resolve.
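The fallback can be sketched as below, again with a hypothetical `provider_of` stand-in for get_llm_provider that only reads a `provider/` prefix. The function names are illustrative and do not mirror the handler's actual methods.

```python
from typing import Optional


def provider_of(model: str) -> Optional[str]:
    # Hypothetical stand-in for get_llm_provider: read the "provider/" prefix.
    provider, sep, _rest = model.partition("/")
    return provider if sep and provider else None


def resolve_connection_provider(
    connection_model: str, explicit_provider: Optional[str]
) -> Optional[str]:
    """Resolve once per session, falling back to the explicit provider."""
    resolved = provider_of(connection_model)
    return resolved if resolved is not None else explicit_provider


def same_provider(connection_provider: Optional[str], event_model: str) -> bool:
    """True when the per-event model resolves to the connection provider."""
    return (
        connection_provider is not None
        and connection_provider == provider_of(event_model)
    )
```

For an unresolvable alias such as `my-alias` opened with an explicit `vertex_ai` provider, a per-event `vertex_ai/...` model still counts as same-provider, while an `openai/...` model does not.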
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>