ai-workspace-services/litellm

..
__init__.py
test_endpoints.py	Litellm websocket improvements (#29563)	2026-06-03 11:48:35 -07:00

Sameer Kankute 2453936a82

Litellm websocket improvements (#29563)

* Add support for websocket via codex

* Add model alias and creds support

* fix: skip cost tracking for WS session wrapper call types

The @client decorator on _aresponses_websocket fires async_success_handler
with result=None after the session ends. This triggered cost tracking errors
because standard_logging_object is never built for None results.

Per-turn costs are correctly tracked by individual litellm.aresponses calls
inside the session. The outer session-level logging obj should not attempt
cost tracking.

Fix: skip _aresponses_websocket and _arealtime call types in deployment_callback_on_success,
RouterBudgetLimiting.async_log_success_event, and _PROXY_track_cost_callback.
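
The skip described above can be pictured as a small guard at the top of a success callback. This is a minimal sketch, not the code in the repository: the call-type strings are assumed to match the function names mentioned in this commit, and `should_track_cost` and `track_cost_callback` are hypothetical names.

```python
from typing import Any, Dict, Optional

# Call types whose outer logging object wraps a whole WebSocket session.
# Assumption: the string values match the function names in the commit message.
SESSION_WRAPPER_CALL_TYPES = frozenset({"_aresponses_websocket", "_arealtime"})


def should_track_cost(call_type: Optional[str], result: Any) -> bool:
    """Return False for session wrappers, which finish with result=None."""
    if call_type in SESSION_WRAPPER_CALL_TYPES:
        return False
    return result is not None


def track_cost_callback(kwargs: Dict[str, Any], result: Any) -> Optional[float]:
    """Hypothetical success callback: returns the cost, or None when skipped."""
    if not should_track_cost(kwargs.get("call_type"), result):
        return None  # per-turn calls inside the session already tracked cost
    return float(kwargs.get("response_cost", 0.0))
```

With this shape, the session-level call returns early, while each per-turn call inside the session still reports its own cost.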

* fix: address Greptile review comments

Fix JSON injection: use json.dumps instead of f-string interpolation for model name in WS body.
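
To illustrate why f-string interpolation is unsafe here, the sketch below builds the same body both ways. The body shape (`type` and `model` keys) and the function names are assumptions for illustration, not the actual payload built by the handler.

```python
import json


def build_body_unsafe(model: str) -> str:
    # Vulnerable: a quote character in the model name can add or change keys.
    return f'{{"type": "response.create", "model": "{model}"}}'


def build_body_safe(model: str) -> str:
    # json.dumps escapes the value, so it always stays a single string field.
    return json.dumps({"type": "response.create", "model": model})


# A model name crafted to smuggle in an extra key.
malicious = 'gpt-4o", "injected": "yes'
unsafe_parsed = json.loads(build_body_unsafe(malicious))
safe_parsed = json.loads(build_body_safe(malicious))
```

In the unsafe variant the parsed body gains an `injected` key; in the safe variant the whole string round-trips as the `model` value.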

Add 30s timeout for first WS frame to prevent unbounded connection resource tie-up.
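
A first-frame timeout of this kind can be sketched with `asyncio.wait_for`. The `receive_text` parameter is a hypothetical stand-in for whatever coroutine reads a text frame from the socket, and returning `None` on timeout is an assumption of this sketch rather than the handler's actual behaviour.

```python
import asyncio
from typing import Awaitable, Callable, Optional

FIRST_FRAME_TIMEOUT_SECONDS = 30.0  # value taken from the commit message


async def read_first_frame(
    receive_text: Callable[[], Awaitable[str]],
    timeout: float = FIRST_FRAME_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return the first frame, or None if the client stays silent too long."""
    try:
        return await asyncio.wait_for(receive_text(), timeout=timeout)
    except asyncio.TimeoutError:
        # The caller can now close the socket instead of waiting forever.
        return None
```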

Restore per-event model override in streaming_iterator; fall back to connection-level model when event omits it.

Strengthen regression test: inject alias into kwargs via _update_kwargs_with_deployment mock so the test would fail on un-fixed code.

* fix: handle nested response.create format in first-frame model extraction

When ?model= is omitted, the first WS frame can carry the model in either flat
format (first_event["model"]) or nested format (first_event["response"]["model"]).
The flat-only check would silently reject clients using the nested wire format.

Mirrors the same two-format logic in _build_base_call_kwargs.
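
The two wire formats can be handled roughly as follows. This is a sketch under the assumption that the model is a non-empty string in either position; `extract_model` is a hypothetical name, not the helper in the repository.

```python
from typing import Any, Dict, Optional


def extract_model(first_event: Dict[str, Any]) -> Optional[str]:
    """Look for the model in flat format first, then in nested format."""
    model = first_event.get("model")
    if isinstance(model, str) and model:
        return model  # flat: {"model": "..."}
    response = first_event.get("response")
    if isinstance(response, dict):
        nested = response.get("model")
        if isinstance(nested, str) and nested:
            return nested  # nested: {"response": {"model": "..."}}
    return None  # neither format carried a usable model
```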

* fix: don't force connection-level custom_llm_provider on per-event model overrides

If a client sends a different model per response.create turn, litellm needs to
re-resolve the provider from that model string. Forcing the connection-level
custom_llm_provider would silently route the request to the wrong backend.

Only inject custom_llm_provider when the per-event model matches the
connection-level model.

* refactor: extract WS model extraction into testable function

Pull the flat/nested model extraction into _extract_model_from_first_ws_event
so tests import and exercise the real function rather than a copy.

* fix: compare providers not full model strings in _inject_credentials

The model == self.model guard was too strict: same-provider model variants
(e.g., vertex_ai/gemini-2.0 -> vertex_ai/gemini-1.5 on one connection) would
lose custom_llm_provider, breaking routing when a custom api_base is in use.

Compare the provider extracted by get_llm_provider instead, so same-provider
variants still inherit the connection-level provider while cross-provider
overrides let litellm re-resolve.
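
The difference between comparing model strings and comparing providers can be sketched as below. To stay self-contained, `provider_of` is a hypothetical stand-in that takes the prefix before the first `/`; the real code uses get_llm_provider, whose resolution rules are richer. `inject_provider` is likewise an illustrative name.

```python
from typing import Any, Dict, Optional


def provider_of(model: str) -> Optional[str]:
    """Hypothetical resolver: the prefix before the first '/', if any."""
    prefix, sep, _ = model.partition("/")
    return prefix if sep and prefix else None


def inject_provider(
    call_kwargs: Dict[str, Any],
    event_model: str,
    connection_model: str,
    connection_provider: str,
) -> Dict[str, Any]:
    """Copy call_kwargs, adding the connection provider only on a provider match."""
    out = dict(call_kwargs)
    event_provider = provider_of(event_model)
    if event_provider is not None and event_provider == provider_of(connection_model):
        out["custom_llm_provider"] = connection_provider
    # Otherwise the key is left out so the provider is re-resolved from the model.
    return out
```

A strict `event_model == connection_model` check would drop the provider in the vertex_ai/gemini-2.0 to vertex_ai/gemini-1.5 case; comparing providers keeps it.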

* style: black formatting

* refactor: extract first-frame model resolution to fix PLR0915 (too many statements)

* Fix responses WebSocket first-frame validation

* fix: classify WS first-frame read errors and clarify cost-skip log

Distinguish client disconnects from server errors when reading the
responses WebSocket first frame, make the cost-tracking skip log message
accurate for session wrappers (which do carry a model), and resolve the
connection-level provider once per session instead of on every
response.create event.

* test: cover WS first-frame read errors and same-provider credential injection

Adds regression tests for the still-uncovered responses WebSocket paths:
the timeout, invalid-JSON and missing-model branches of
_read_ws_model_from_first_frame, plus the provider comparison in
ManagedResponsesWebSocketHandler._same_provider and _inject_credentials
(same-provider model variants keep the connection provider; cross-provider
models re-resolve).

* fix(responses-ws): fall back to explicit custom_llm_provider when connection model is unresolvable

When a WebSocket session is opened with a custom deployment alias that litellm
cannot resolve to a provider, _connection_provider was None, so _same_provider
returned False for every resolvable per-event model and the connection-level
custom_llm_provider was dropped. Use the explicitly-set custom_llm_provider as
the connection provider in that case so same-provider per-event models still
inherit it while genuinely cross-provider models continue to re-resolve.
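
The fallback can be sketched as a one-time resolution step at session start. As before, `provider_of` is a hypothetical prefix-based stand-in for the real provider lookup, and the other function names are illustrative only.

```python
from typing import Optional


def provider_of(model: str) -> Optional[str]:
    """Hypothetical resolver: the prefix before the first '/', if any."""
    prefix, sep, _ = model.partition("/")
    return prefix if sep and prefix else None


def resolve_connection_provider(
    connection_model: str, explicit_provider: Optional[str]
) -> Optional[str]:
    """Resolve once per session; fall back to the explicit provider for aliases."""
    resolved = provider_of(connection_model)
    return resolved if resolved is not None else explicit_provider


def same_provider(event_model: str, connection_provider: Optional[str]) -> bool:
    """True when a per-event model resolves to the connection's provider."""
    if connection_provider is None:
        return False
    return provider_of(event_model) == connection_provider
```

With an unresolvable alias and an explicit provider of `vertex_ai`, a per-event `vertex_ai/...` model still matches, while an `openai/...` model does not and is re-resolved.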

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

2026-06-03 11:48:35 -07:00
