Commit Graph

38958 Commits

Author SHA1 Message Date
Sameer Kankute
fd7ff0f269
fix(hosted_vllm): normalize custom tools for chat completions (#25763)
* fix(hosted_vllm): normalize custom tools for chat completions

Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.
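A minimal sketch of the conversion. It assumes custom tools arrive either as `{"type": "custom", "custom": {...}}` or with the name at the top level. The helper name `normalize_custom_tools` and the single-string `input` fallback schema are hypothetical; the commit does not say which fallback schema the fix uses.

```python
import copy
from typing import Any, Dict, List

# Hypothetical fallback: custom tools take free-form text, so wrap it in one
# string argument when no JSON schema is supplied.
_FALLBACK_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {"input": {"type": "string"}},
    "required": ["input"],
}


def normalize_custom_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new tool list in which every custom tool is an OpenAI function tool."""
    normalized: List[Dict[str, Any]] = []
    for tool in tools:
        if tool.get("type") != "custom":
            normalized.append(tool)  # function tools pass through unchanged
            continue
        # Accept a nested {"custom": {...}} body or a flat body.
        body = tool.get("custom", tool)
        function: Dict[str, Any] = {
            "name": body["name"],
            "parameters": body.get("parameters")
            or copy.deepcopy(_FALLBACK_PARAMETERS),
        }
        if body.get("description"):
            function["description"] = body["description"]
        normalized.append({"type": "function", "function": function})
    return normalized
```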

Made-with: Cursor

* Fix black issue

* Fix hosted vllm custom tool schema fallback

* fix black

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-05 17:27:02 -07:00
yuneng-jiang
9a338e1b6b
[Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249)
Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.

- audio_tests/test_audio_speech.py: split env-var keys into separate
  azure/openai test functions sharing a helper; sync_mode parametrize
  preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
  azure_whisper functions sharing a helper; response_format parametrize
  preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
  cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
  into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
  cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
  into 5 named tests.
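A sketch of the "named tests sharing a helper" pattern. `extract_api_key` is a hypothetical stand-in for the code under test, and only three of the five prefix cases are shown. Because each case is its own function, pytest's test IDs are the function names rather than the key strings.

```python
def extract_api_key(header_value: str) -> str:
    """Hypothetical stand-in for the code under test: strip a Bearer prefix."""
    if header_value.lower().startswith("bearer "):
        return header_value[len("bearer "):]
    return header_value  # raw key sent without a scheme


def _assert_extracts(header_value: str, expected: str) -> None:
    # Shared helper: the literal key values live only inside the named tests,
    # never in a parametrize list that pytest would fold into the test ID.
    assert extract_api_key(header_value) == expected


def test_bearer_prefix() -> None:
    _assert_extracts("Bearer sk-test-123", "sk-test-123")


def test_lowercase_bearer_prefix() -> None:
    _assert_extracts("bearer sk-test-123", "sk-test-123")


def test_raw_key() -> None:
    _assert_extracts("sk-test-123", "sk-test-123")
```

Passing explicit IDs through `pytest.param(..., id="bearer")` would also keep keys out of test IDs; the commit chose separate named tests instead.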

Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
2026-05-05 17:21:18 -07:00
Sameer Kankute
e912e6d4ff
feat(audio_transcription): add NVIDIA Riva STT provider (#27185)
* feat(audio_transcription): add NVIDIA Riva STT provider

Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.

- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
  audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
  verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
  word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
  otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
  soundfile, audioread, numpy).
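A sketch of the SSL default and the word-offset mapping described above. The function names and the `start_time`/`end_time` keys (integer milliseconds) on the input word dicts are assumptions, not the provider's actual code.

```python
from typing import Any, Dict, List, Optional


def resolve_use_ssl(nvcf_function_id: Optional[str], use_ssl: Optional[bool] = None) -> bool:
    # An explicit use_ssl override always wins.
    if use_ssl is not None:
        return use_ssl
    # NVCF-hosted (function id set): SSL on; self-hosted: SSL off.
    return bool(nvcf_function_id)


def map_timestamp_granularities(granularities: Optional[List[str]]) -> Dict[str, Any]:
    # timestamp_granularities=["word"] -> enable_word_time_offsets
    return {"enable_word_time_offsets": "word" in (granularities or [])}


def words_to_verbose_json(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Riva word offsets are in milliseconds; verbose_json uses seconds.
    return [
        {"word": w["word"], "start": w["start_time"] / 1000.0, "end": w["end_time"] / 1000.0}
        for w in words
    ]
```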

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(nvidia_riva): address PR review feedback

- handler: forward call-level `timeout` to streaming_response_generator
  (kwarg-detected via inspect for older riva-client compat) so a stalled
  Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
  most audioread backends (FFmpeg, GStreamer) require a real filesystem
  path and previously raised TypeError on BytesIO, breaking the mp3/m4a
  fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
  (anti-aliased polyphase) when installed, falling back to linear only
  as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
  es-US, matching BCP-47 conventions.
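A sketch of two of the review fixes: forwarding `timeout` only when the target callable accepts it (detected with `inspect.signature`), and a bare-language map that sends `es` to `es-ES`. `call_with_optional_timeout`, `to_riva_language_code`, the map contents, and the `en-US` default are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def call_with_optional_timeout(
    fn: Callable[..., Any], *args: Any, timeout: Optional[float] = None, **kwargs: Any
) -> Any:
    if timeout is not None:
        try:
            params = dict(inspect.signature(fn).parameters)
        except (TypeError, ValueError):
            params = {}  # signature unavailable: do not guess
        accepts_timeout = "timeout" in params or any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        if accepts_timeout:
            kwargs["timeout"] = timeout  # older clients without the kwarg are skipped
    return fn(*args, **kwargs)


# Hypothetical subset of the bare-language map; full BCP-47 codes pass through.
_LANGUAGE_MAP: Dict[str, str] = {"en": "en-US", "es": "es-ES"}


def to_riva_language_code(language: Optional[str], default: str = "en-US") -> str:
    if not language:
        return default
    return _LANGUAGE_MAP.get(language.lower(), language)
```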

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* fix code qa

* fix lint

* fix mypy

* fix mypy

* Fix NVIDIA Riva ASR service lookup

* Fix NVIDIA Riva transcription payload logging

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-05-05 17:17:51 -07:00
Krrish Dholakia
454ce5073f
fix(anthropic, mcp): sanitize tool names to match Anthropic's [a-zA-Z0-9_-]{1,128} pattern (#26788)
* fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$`

Tool names with characters like `/` or `.` (commonly produced by the
OpenAPI -> MCP generator from `operationId`s such as
`actions/download-job-logs-for-workflow-run`) caused Anthropic to reject
requests with `tools.N.custom.name: String should match pattern
'^[a-zA-Z0-9_-]{1,128}$'`.

Two layers of fix:

1. Anthropic transformation: build a per-request forward map (original ->
   sanitized, disambiguated by suffix on collisions) and a reverse map
   (only for names actually rewritten). Forward map is applied to tool
   defs, `tool_choice`, and historical assistant tool_calls in messages.
   Reverse map is threaded through both the non-streaming and streaming
   response paths so callers continue to see their original tool names
   in `tool_use` blocks.

2. OpenAPI -> MCP generator: sanitize `operationId` (and the
   method+path fallback) at registration time so generated MCP tools are
   valid for any strict-name provider, not just Anthropic. The dashboard
   preview endpoint applies the same sanitization for parity.
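A minimal sketch of the sanitize rule and of reverse-mapping `tool_use` blocks in a response, assuming the reverse map goes from sanitized name to original name. Both function names are hypothetical, and collision disambiguation is left out here.

```python
import re
from typing import Any, Dict, List

_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_tool_name(name: str) -> str:
    # Replace every disallowed character with "_" and cap the length at 128.
    # Falling back to "_" for an empty name is an assumption.
    return _INVALID_CHARS.sub("_", name)[:128] or "_"


def restore_tool_names(
    content: List[Dict[str, Any]], reverse_map: Dict[str, str]
) -> List[Dict[str, Any]]:
    # Map sanitized names in tool_use blocks back to the caller's originals.
    restored: List[Dict[str, Any]] = []
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") in reverse_map:
            block = {**block, "name": reverse_map[block["name"]]}
        restored.append(block)
    return restored
```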

Includes unit tests covering: collision disambiguation between
`foo_bar` and `foo/bar` in the same request, reverse-map only firing
for actually-rewritten names, message rewrite for historical tool_calls,
streaming chunk_parser reverse-mapping, and sanitization of OpenAPI
operationIds plus the preview endpoint output.

Made-with: Cursor

* fix(anthropic): build tool-name maps in transform_request, not optional_params

The previous patch stashed the per-request forward and reverse tool-name
maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and
``optional_params["_anthropic_tool_name_map"]``. ``optional_params`` is
the dict that becomes the JSON body via ``data = {**optional_params}``,
so those internal keys leaked over the wire and Anthropic 400'd with:

  _anthropic_tool_name_forward_map: Extra inputs are not permitted

Worse, this meant *every* request whose tool list contained any name with
an invalid character (the exact case the patch was meant to fix) regressed
into a confusing meta-error pointing at LiteLLM's internal map instead of
the offending tool.

Fix: move all tool-name sanitization into ``transform_request``, which is
the single chokepoint already shared by ``AnthropicConfig``,
``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``,
and ``AzureAnthropicConfig`` (all call ``super().transform_request`` /
``AnthropicConfig.transform_request(self, ...)``). New static helper
``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped
``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted
tool names are reserved by Anthropic and must not be touched), builds
the per-request forward/reverse maps, and applies the forward map in
place to ``tools[*].name`` and ``tool_choice.name``. The reverse map is
stashed exclusively on ``litellm_params`` (which is never serialized to
a provider) under ``_anthropic_tool_name_map`` for the response paths
to consume.
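A stripped-down sketch of the data flow after this restructure, assuming Anthropic-shaped tools in `optional_params["tools"]`. The body is built from `optional_params` with sanitized `custom` tool names, and the reverse map goes only on `litellm_params`. This `transform_request` is a standalone illustration, not the `AnthropicConfig` method; collision handling and `tool_choice` rewriting are omitted.

```python
import re
from typing import Any, Dict, List

ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY = "_anthropic_tool_name_map"


def _sanitize(name: str) -> str:
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)[:128]


def transform_request(optional_params: Dict[str, Any], litellm_params: Dict[str, Any]) -> Dict[str, Any]:
    reverse: Dict[str, str] = {}
    tools: List[Dict[str, Any]] = []
    for tool in optional_params.get("tools", []):
        # Only custom tools are renamed; hosted tool names are reserved.
        if tool.get("type") == "custom" and _sanitize(tool["name"]) != tool["name"]:
            new_name = _sanitize(tool["name"])
            reverse[new_name] = tool["name"]
            tool = {**tool, "name": new_name}
        tools.append(tool)
    body = {**optional_params}  # everything here goes over the wire
    if tools:
        body["tools"] = tools
    if reverse:
        # litellm_params is never serialized to the provider.
        litellm_params[ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY] = reverse
    return body
```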

Side effect of this restructure: ``map_openai_params`` is now a pure
OpenAI->Anthropic param translator with no side-channel state, which
matches its contract everywhere else in the codebase.

Tests: replaced the now-incorrect "stashes maps in optional_params"
tests with regressions that assert no underscore-prefixed keys appear
in either ``optional_params`` after ``map_openai_params`` or in the
final ``transform_request`` body. Added end-to-end coverage for:
sanitization in ``transform_request``, ``tool_choice`` rewriting,
historical ``tool_calls`` rewriting in messages, and hosted-tool
passthrough.

Made-with: Cursor

* fix(anthropic): always sanitize empty text content blocks

Anthropic 400s on `{"role": "user", "content": ""}` with:
  "messages: text content blocks must be non-empty"

LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text
to a placeholder, but it was gated behind `litellm.modify_params=True`.
With that flag off (default), empty content from upstream agent
frameworks (e.g. pydantic-ai) flowed straight through and tripped the
Anthropic validator.

Fix:
- Always run `_sanitize_empty_text_content` at the top of
  `anthropic_messages_pt`, independent of `modify_params`. There is no
  way to "pass through" an empty text block, so this is non-optional.
  The richer tool-call sanitizations (Cases A/B/D, which actually
  mutate conversation structure) remain gated on `modify_params`.
- Extend `_sanitize_empty_text_content` to also handle list-of-blocks
  content (`[{"type": "text", "text": ""}]`), not just string content.
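A sketch of the extended sanitizer covering both content shapes. The `"."` placeholder is hypothetical (the commit does not name the placeholder text), and this version returns new message dicts instead of editing in place.

```python
from typing import Any, Dict, List

PLACEHOLDER = "."  # hypothetical; the real placeholder text is not given


def sanitize_empty_text_content(
    messages: List[Dict[str, Any]], placeholder: str = PLACEHOLDER
) -> List[Dict[str, Any]]:
    sanitized: List[Dict[str, Any]] = []
    for msg in messages:
        content = msg.get("content")
        if content == "":
            # String content: {"role": "user", "content": ""}
            msg = {**msg, "content": placeholder}
        elif isinstance(content, list):
            # List-of-blocks content: [{"type": "text", "text": ""}]
            blocks = [
                {**block, "text": placeholder}
                if isinstance(block, dict)
                and block.get("type") == "text"
                and block.get("text") == ""
                else block
                for block in content
            ]
            msg = {**msg, "content": blocks}
        sanitized.append(msg)
    return sanitized
```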

Adds 3 regression tests covering string content, list-of-blocks
content, and the no-op case (non-empty messages with modify_params off).

Made-with: Cursor

* fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation

- remove unused `name_forward_map` param from `_map_tool_choice`,
  `_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward`
  helper. Production sanitization runs in `_sanitize_tool_names_in_request`
  at `transform_request`; these params were never threaded through.
- handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of
  the hardcoded `"_anthropic_tool_name_map"` string.
- fix mypy `"object" has no attribute "__iter__"` in
  `_rewrite_tool_names_in_messages` by guarding `tool_calls` with
  `isinstance(..., list)`.
- `_sanitize_tool_names_in_request`: build a new tools list with copy-on-
  change entries (and copy `tool_choice` on rewrite) so a caller reusing
  the same tool list/dicts across requests doesn't see its inputs
  permanently rewritten.
- doc-comment `_build_request_tool_name_maps` clarifying it operates on
  OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs
  on Anthropic-format tools post-`_map_tools`).
- tests: drop 3 tests pinning the now-removed param paths; add coverage
  for tool_calls + None function_call rewrite and caller-dict immutability.
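A sketch of the copy-on-change rewrite, assuming a precomputed forward map and Anthropic's `{"type": "tool", "name": ...}` form of `tool_choice`. `apply_forward_map` is a hypothetical name. Only entries whose names change are copied, so a caller reusing its tool list across requests keeps its original names.

```python
from typing import Any, Dict, List, Optional, Tuple


def apply_forward_map(
    tools: List[Dict[str, Any]],
    tool_choice: Optional[Dict[str, Any]],
    forward: Dict[str, str],
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    new_tools: List[Dict[str, Any]] = []  # always a new list
    for tool in tools:
        name = tool.get("name")
        if tool.get("type") == "custom" and name in forward:
            tool = {**tool, "name": forward[name]}  # copy only changed entries
        new_tools.append(tool)
    if tool_choice and tool_choice.get("type") == "tool" and tool_choice.get("name") in forward:
        tool_choice = {**tool_choice, "name": forward[tool_choice["name"]]}
    return new_tools, tool_choice
```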

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): inherit stored credentials in test/tools/list for edit flow

When editing an existing MCP server, the Tool Configuration preview
calls POST /mcp-rest/test/tools/list with server_id but no credentials
(management API redacts them). The endpoint now calls
_inherit_credentials_from_existing_server() so stored bearer tokens
and OAuth2 M2M credentials are loaded from global_mcp_server_manager
automatically — tools load without re-entering credentials.

New servers (no server_id) and requests with explicit credentials are
unaffected (function is a no-op in both cases).
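A sketch of the no-op rules above, with stored servers passed in as a plain mapping instead of coming from `global_mcp_server_manager`. The function name and the credential field names are hypothetical; this is not the actual `_inherit_credentials_from_existing_server`.

```python
from typing import Any, Dict, Mapping

# Hypothetical credential field names; the real request model may differ.
CREDENTIAL_FIELDS = ("auth_value", "client_id", "client_secret")


def inherit_credentials_from_existing_server(
    request: Dict[str, Any], stored_servers: Mapping[str, Dict[str, Any]]
) -> Dict[str, Any]:
    server_id = request.get("server_id")
    if not server_id:
        return request  # new server: nothing stored to inherit
    if any(request.get(field) for field in CREDENTIAL_FIELDS):
        return request  # caller supplied credentials explicitly
    existing = stored_servers.get(server_id)
    if existing is None:
        return request
    merged = dict(request)  # do not mutate the caller's request
    for field in CREDENTIAL_FIELDS:
        if existing.get(field) is not None:
            merged[field] = existing[field]
    return merged
```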

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): show all tools in edit panel, not just allowed tools

Edit flow was passing externalTools (from GET /tools/list, filtered by
allowed_tools) to MCPToolConfiguration, disabling the internal hook.
Remove the external props so the internal hook fires via
POST /test/tools/list, which returns all tools unfiltered. Combined
with the credential inheritance fix, tools load automatically without
re-entering credentials and all tools are visible for re-configuration.

existingAllowedTools still pre-checks previously allowed tools.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix order-dependent collision in _build_anthropic_tool_name_maps

Use a two-pass approach: first pre-register all already-valid tool names
in the 'used' set, then sanitize/disambiguate names that need rewriting.
This ensures valid names always have priority regardless of input order,
preventing duplicate tool names on the wire when e.g. 'foo/bar' appears
before 'foo_bar' in the tool list.

Add regression test for the reversed ordering case.
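A sketch of the two-pass map builder, assuming the sanitize rule is "replace every character outside `[a-zA-Z0-9_-]` with `_`". It also includes the repeated-original skip from the "Skip duplicate originals" commit in this PR. `build_tool_name_maps` is a hypothetical name, and truncating the base so suffixed names stay within 128 characters is an assumption.

```python
import re
from typing import Dict, List, Set, Tuple

_VALID = re.compile(r"[a-zA-Z0-9_-]{1,128}")


def _sanitize(name: str) -> str:
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)[:128] or "_"


def build_tool_name_maps(original_names: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Pass 1: already-valid names claim their slot first, regardless of order.
    used: Set[str] = {n for n in original_names if _VALID.fullmatch(n)}
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    # Pass 2: sanitize and disambiguate the rest with _2, _3, ... suffixes.
    for name in original_names:
        if name in used or name in forward:
            continue  # valid as-is, or a repeated original already mapped
        base = _sanitize(name)
        candidate, n = base, 2
        while candidate in used:
            suffix = f"_{n}"
            candidate = base[: 128 - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        forward[name] = candidate
        reverse[candidate] = name
    return forward, reverse
```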

* Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes

sanitize_openapi_tool_name replaces all invalid chars with '_', but when
two operationIds differ only by sanitized characters (e.g. 'foo/list' and
'foo.list' both become 'foo_list'), the second registration silently
overwrites the first in the tool registry.

Add collision disambiguation in register_tools_from_openapi that appends
_2, _3, ... suffixes when a sanitized name is already taken, mirroring
the existing logic in _build_anthropic_tool_name_maps.
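A sketch of registration-time naming with collision suffixes. It assumes a plain-dict OpenAPI spec, a hypothetical `f"{method}_{path}"` fallback when `operationId` is missing, and a fixed HTTP-method order. A fixed order matters because suffixes depend on iteration order, so any preview that must match registration has to walk operations the same way.

```python
import re
from typing import Any, Dict, List, Set

HTTP_METHODS = ("get", "post", "put", "patch", "delete")  # hypothetical fixed order


def sanitize_openapi_tool_name(raw: str) -> str:
    return re.sub(r"[^a-zA-Z0-9_-]", "_", raw)[:128] or "_"


def assign_tool_names(spec: Dict[str, Any]) -> List[str]:
    used: Set[str] = set()
    names: List[str] = []
    for path, operations in spec.get("paths", {}).items():
        for method in HTTP_METHODS:
            op = operations.get(method)
            if op is None:
                continue
            raw = op.get("operationId") or f"{method}_{path}"
            base = sanitize_openapi_tool_name(raw)
            name, n = base, 2
            while name in used:  # 'foo/list' and 'foo.list' both sanitize to 'foo_list'
                name = f"{base}_{n}"
                n += 1
            used.add(name)
            names.append(name)
    return names
```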

* Fix preview endpoint missing collision disambiguation for tool names

Add used_names tracking and _2/_3 suffix disambiguation to
_preview_openapi_tools, matching the logic in register_tools_from_openapi.
Without this, two operationIds that sanitize to the same string (e.g.
'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate
names in the preview while registration would disambiguate them.

* Align preview HTTP method order with register_tools_from_openapi

The preview endpoint and register_tools_from_openapi both use
order-dependent collision disambiguation (_2, _3 suffixes). When the
iteration order differs, two operations on the same path with sanitized
names that collide get different suffixes in preview vs registration,
so the dashboard shows names that don't match what actually got
registered.

Also adds a regression test that fails on the swapped order.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Skip duplicate originals in _build_anthropic_tool_name_maps

If the same invalid tool name appeared twice in original_names (e.g.
['foo/bar', 'foo/bar']), the second occurrence overwrote the forward
map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar
orphaned in 'used' with no reverse mapping. _sanitize_tool_names_in_request
then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on
duplicate tool names.

Skip the rewrite if forward already has the original mapped.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-06 00:00:36 +00:00
Yassin Kortam
dbc8f5a937
helm: skip proxy startup prisma db push when migrations Job is enabled (#27200)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:53 -07:00
Yassin Kortam
618df94433
helm: increase default probe timeouts, disable debug logging by default (#27237)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:34 -07:00
Yassin Kortam
950074eea2
fix: atomic TPM rate limit (#27001)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:07 -07:00
Sameer Kankute
b8635bbc7a
feat(realtime): OpenAI Realtime GA support and beta compatibility (#27110)
* feat(realtime): OpenAI Realtime GA support and beta compatibility

- Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta
  event translation when client sends OpenAI-Beta: realtime=v1
- Default upstream WebSocket without OpenAI-Beta; forward header when client opts in
- Extend OpenAI realtime types for GA event names and conversation item shapes
- Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events
- Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground
- Add unit tests for remap, translation, and beta header helper
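A partial sketch of the beta-to-GA remap and the opt-in header check. It only moves a flat `turn_detection` field into `session.audio.input.turn_detection` and sets `session.type` to `"realtime"`; other beta fields (audio formats, voice, modalities) are not handled. Both function names are hypothetical.

```python
import copy
from typing import Any, Dict, Mapping


def client_opted_into_beta(headers: Mapping[str, str]) -> bool:
    # True when the client sent OpenAI-Beta: realtime=v1 (header name is case-insensitive).
    for key, value in headers.items():
        if key.lower() == "openai-beta":
            return "realtime=v1" in [v.strip() for v in value.split(",")]
    return False


def remap_beta_session_to_ga(event: Dict[str, Any]) -> Dict[str, Any]:
    if event.get("type") != "session.update":
        return event
    session = copy.deepcopy(event.get("session", {}))  # leave the caller's event intact
    session.setdefault("type", "realtime")
    if "turn_detection" in session:
        turn_detection = session.pop("turn_detection")
        session.setdefault("audio", {}).setdefault("input", {})["turn_detection"] = turn_detection
    return {**event, "session": session}
```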

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix results

* fix greptile

* Fix mypy issues

* Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES

These frozensets were defined as class-level constants in realtime_streaming.py
but never referenced anywhere in the codebase. Removing dead code.

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* fix(realtime): use GA-shaped session.update in guardrail injections

The guardrail VAD injection code sent a beta-style session.update with a
flat turn_detection field:

  {"session": {"turn_detection": {"create_response": false}}}

When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta
header forwarded), it requires the nested GA shape:

  {"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}}

The _remap_beta_session_to_ga helper was only applied to client-
originated session.update messages in client_ack_messages. Internally-
generated session.updates (sent via _send_to_backend) in two paths:
  - _handle_raw_backend_message (raw/no provider_config path, line 518)
  - backend_to_client_send_messages provider_config path (line 481)
bypassed the remap, so GA upstreams ignored or rejected them, breaking
audio transcription guardrails for all non-beta clients.

Fix: add _make_disable_auto_response_message() helper that always emits
the correct GA-shaped session.update, and replace both injection sites
with it.
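A sketch of what such a helper could return, assuming it produces a JSON string ready to send to the backend. The name mirrors the commit message, but the body and return type here are assumptions.

```python
import json


def make_disable_auto_response_message() -> str:
    # Always emit the GA nested shape, never the flat beta
    # {"session": {"turn_detection": ...}} form.
    return json.dumps(
        {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "audio": {"input": {"turn_detection": {"create_response": False}}},
            },
        }
    )
```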

Update existing tests to assert the GA nested shape instead of the old
flat beta shape, and add a new unit test for the helper itself.

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* Log realtime session type

* Fix beta realtime session payloads

* Fix realtime audio format remapping edge case

* Fix Azure realtime beta session shape

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-05-05 16:49:20 -07:00
harish-berri
4fec69dd1e
refactor(BaseAWSLLM): implement shared IAM cache and static credentia… (#27125)
* refactor(BaseAWSLLM): implement shared IAM cache and static credential caching

- Introduced a process-wide shared IAM cache to optimize credential management across instances.
- Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached.
- Updated the get_credentials method to utilize the new caching mechanism for static credential flows.
- Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage.

* refactor(BaseAWSLLM): enhance IAM credential caching and update related tests

- Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials.
- Renamed the caching method for clarity and updated comments to reflect the new caching behavior.
- Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries.
- Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions.

* refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests

- Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types.
- Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request.
- Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios.
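A sketch of the caching policy: a class-level dict shared by every instance in the process, with TTL expiry, that caches only static and ambient credentials and always calls the loader for everything else (for example AssumeRole). `SharedIAMCache`, the `kind` labels, and the one-hour default TTL are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

CACHEABLE_KINDS = frozenset({"static", "ambient"})  # AssumeRole etc. are never cached


class SharedIAMCache:
    # Class-level dict: shared by every instance in the process.
    _entries: Dict[Hashable, Tuple[float, Any]] = {}

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock

    def get_credentials(self, kind: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        if kind not in CACHEABLE_KINDS:
            return loader()  # e.g. AssumeRole: call STS on every request
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # unexpired entry, possibly stored by another instance
        creds = loader()
        self._entries[key] = (now + self.ttl, creds)
        return creds

    @classmethod
    def flush(cls) -> None:
        # Tests can call this between cases to avoid cross-test leakage.
        cls._entries.clear()
```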

* Code readability improvement for AWS auth path

* refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests

- Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials.
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached.
- Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests.

* refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption

- Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions.
- Enhanced unit tests to verify that the caching mechanism correctly distinguishes between already running roles and new role assumptions, ensuring that cached environment credentials are not reused incorrectly.
- Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.
2026-05-05 16:47:47 -07:00
yuneng-jiang
e84282b7b3
[Infra] Bump deps (#27157)
* bump: version 0.4.70 → 0.4.71

* bump: version 0.1.39 → 0.1.40

* uv lock
2026-05-05 15:58:05 -07:00
yuneng-jiang
7abafe50fb
chore: update Next.js build artifacts (2026-05-05 22:45 UTC, node v20.20.2) (#27240)
2026-05-05 22:51:21 +00:00
ryan-crabbe-berri
1b25f853ce
[Fix] Team UI: handle legacy dict shape for metadata.guardrails (#27224)
* [Fix] Team UI: handle legacy dict shape for metadata.guardrails

A team can have metadata.guardrails stored as {"modify_guardrails": bool}
(the permission-flag shape introduced in PR #4810) rather than the
expected string[]. The opt-out logic added in PR #25575 calls .filter()
on this field, which throws TypeError on a dict and crashes the team
detail page.

Add a safeGuardrailsList helper that returns [] when the field is not
an array, and route the three read sites through it.

* [Fix] Team UI: inline Array.isArray guards for guardrails metadata

Replace the safeGuardrailsList helper with inline Array.isArray checks
at each call site, and apply the same guard to opted_out_global_guardrails
for consistency. There are no known legacy dict rows for opted_out_global_guardrails,
but the unguarded `|| []` pattern carries the same shape risk.

Six call sites now defended directly: three for metadata.guardrails
and three for metadata.opted_out_global_guardrails.
2026-05-05 15:40:44 -07:00
Mateo Wang
7e13256fee
test: add 24hr Redis-backed VCR cache to additional test suites (#27159)
* test: add 24hr Redis-backed VCR cache to additional test suites

Extracts the existing llm_translation VCR plumbing into a reusable helper
(tests/_vcr_conftest_common.py) and wires it into the conftest.py files
of the test directories listed in LIT-2787:

  audio_tests, batches_tests, guardrails_tests, image_gen_tests,
  litellm_utils_tests, local_testing, logging_callback_tests,
  pass_through_unit_tests, router_unit_tests, unified_google_tests

The same helper is also adopted by the pre-existing llm_translation and
llm_responses_api_testing conftests to remove the copy-pasted VCR setup.

Each consuming conftest:
- registers the Redis persister via pytest_recording_configure
- auto-marks collected tests with pytest.mark.vcr (skipping respx-using
  files where applicable, since respx and vcrpy both patch httpx)
- gates cassette writes on test success via _vcr_outcome_gate

The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled
and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still
forces a bypass for ad-hoc local runs.
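A sketch of the opt-in rule using the two environment variables named above. The function name is hypothetical, and treating any non-empty `CASSETTE_REDIS_URL` as "enabled" is an assumption.

```python
import os
from typing import Mapping, Optional


def vcr_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    if env.get("LITELLM_VCR_DISABLE") == "1":
        return False  # explicit bypass for ad-hoc local runs
    return bool(env.get("CASSETTE_REDIS_URL"))  # opt-in: no URL means live providers
```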

Test directories that run LiteLLM proxy in Docker (build_and_test,
proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests)
are intentionally not included: VCR.py patches the in-process httpx
transport and cannot intercept calls made from inside a Docker container.
The installing_litellm_on_python* jobs make no LLM calls and don't
benefit from caching.

https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites

* test(vcr): add safe-body matcher to handle JSONL and binary request bodies

vcrpy's stock body matcher inspects Content-Type and unconditionally
runs json.loads on application/json bodies. JSON Lines payloads (used
by the Bedrock batch S3 PUT and other upload paths) crash that with
json.JSONDecodeError: Extra data, before the matcher can return
'not a match'.

This was the root cause of the batches_testing CI job failing on
test_async_create_file once VCR auto-marking was applied to the
batches_tests directory.

Add a conservative byte-equality body matcher and use it in place of
'body' in the shared match_on tuple. The matcher is strictly more
conservative than vcrpy's default — the only thing it gives up is
'different JSON key order is treated as the same body', which doesn't
apply to deterministic litellm-built request payloads. It can never
produce a false positive that the default would have rejected, so
there is no cross-contamination risk.
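A sketch of the comparison such a matcher could perform on two request bodies: direct equality first, then UTF-8 bytes equality, with no JSON parsing at all. `bodies_match` is hypothetical and works on raw body values; wiring it into vcrpy (for example through `VCR.register_matcher` with a thin wrapper over `request.body`) is not shown.

```python
from typing import Optional, Union

Body = Optional[Union[str, bytes]]


def _to_bytes(body: Union[str, bytes]) -> bytes:
    return body.encode("utf-8") if isinstance(body, str) else body


def bodies_match(a: Body, b: Body) -> bool:
    if a == b:
        return True
    if a is None or b is None:
        return False  # a missing body only matches another missing body
    # Compare raw bytes, so JSONL or binary payloads never hit json.loads.
    return _to_bytes(a) == _to_bytes(b)
```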

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): exclude tests that VCR replay actively breaks

A few tests are incompatible with cassette replay and were failing on
the latest CI run after VCR auto-marking was extended to local_testing
and logging_callback_tests:

- test_amazing_s3_logs.py (logging_callback_tests): the test asserts on
  a per-run response_id that should round-trip through a real S3
  PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays
  stale keys, so the freshly-generated id is never found.
- test_async_embedding_azure (logging_callback_tests) and
  test_amazing_sync_embedding (local_testing): the failure branches
  deliberately pass api_key='my-bad-key' to assert that the failure
  callback fires. We scrub auth headers from cassettes (so the bad-key
  request matches the prior good-key request), and vcrpy replays the
  recorded 200 — the failure callback never fires.
- test_assistants.py (local_testing): the OpenAI Assistants polling
  APIs mint fresh thread/run IDs every recording session and then poll
  until status=='completed'. Replays of those polled GETs can never
  match a freshly-generated run id, so every CI run effectively
  re-records and the suite blows past the 15m no_output_timeout.

Skip these from VCR auto-marking so they continue to hit live providers
as they did before this change. The remaining tests in each directory
still get cached.
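A sketch of the auto-marking decision with whole-file and single-test skips plus the respx exclusion mentioned earlier. The `"file.py::test_name"` entry format and the `source_text` check are hypothetical, not the conftest's actual representation.

```python
from pathlib import Path
from typing import Collection


def should_auto_mark_vcr(
    test_path: Path,
    test_name: str,
    incompatible: Collection[str],
    source_text: str = "",
) -> bool:
    # respx and vcrpy both patch httpx, so respx-using files are left alone.
    if "import respx" in source_text:
        return False
    file_name = test_path.name
    base_name = test_name.split("[", 1)[0]  # drop any parametrize suffix
    if file_name in incompatible:
        return False  # whole file opted out
    if f"{file_name}::{base_name}" in incompatible:
        return False  # single test opted out
    return True
```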

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): expand skip lists for second batch of incompatible tests

Followup to the previous commit. After re-running CI on the rebuilt
branch, several more tests surfaced as VCR-replay-incompatible:

- litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key
  Calls GET /v1/models with api_key='123' to assert the result is empty.
  We scrub auth headers, so the bad-key request matches the prior
  good-key cassette and replays the recorded model list.
- litellm_utils_testing :: test_litellm_overhead.py
  Measures litellm_overhead_time_ms as a percentage of total wall-clock
  time. With cached responses the upstream 'network' time collapses to
  microseconds, blowing past the 40% threshold the test asserts on.
  Skip the whole file (every parametrization is at risk).
- local_testing_part1 :: test_async_custom_handler_completion and
  test_async_custom_handler_embedding
  Same bad-key failure-callback pattern as the already-skipped
  test_amazing_sync_embedding.
- litellm_router_testing :: test_router_caching.py
  Asserts on litellm's own router-level response cache by comparing
  response1.id to response2.id across repeat upstream calls (test
  bypasses litellm cache via ttl=0 and expects upstream to return a
  *new* id). With VCR replay both upstream calls return the same
  cassette body, so the ids are identical. Skip the whole file.
- logging_callback_tests :: test_async_chat_azure (preemptive)
  Same shape as already-skipped test_async_embedding_azure; was masked
  by upstream OpenAI rate-limit failures on baseline.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): use item.path and tighten matcher docstring

- Replace pytest's deprecated item.fspath with item.path in
  apply_vcr_auto_marker_to_items so we don't emit deprecation
  warnings under pytest 8.
- Clarify _safe_body_matcher docstring to reflect actual behavior
  (direct == first, then UTF-8 bytes comparison, no repr fallback).

Addresses Greptile review feedback on PR #27159.

* test(vcr): swallow all RedisError on cassette save/load

Cassette persistence is strictly best-effort: any Redis-side failure
(connection blip, timeout, OutOfMemoryError when the maxmemory cap is
hit, READONLY replicas, etc.) should degrade to 'test passed but
cassette not cached' rather than fail the test on teardown.

Previously the persister only caught ConnectionError and TimeoutError,
so OutOfMemoryError — which Redis Cloud raises when the cassette cache
hits its memory cap and there are no evictable keys — propagated out of
vcrpy's autouse fixture and ERRORed otherwise-passing tests on
teardown. This caused the litellm_utils_testing CircleCI job to fail on
the latest commit's run, even though the underlying test was a unit
test that used mock_response and produced no real upstream traffic
(the cassette was dirtied by a background langfuse callback). The
rerun only succeeded because Redis evictions happened to free enough
room before the SET — i.e. it was timing-dependent flakiness.

Catch redis.exceptions.RedisError (the common base of all server- and
client-side Redis exceptions) on both save and load, and parametrize
the regression tests across ConnectionError, TimeoutError, and
OutOfMemoryError to pin the new behavior.
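A sketch of catching the common base class on save. To stay self-contained it defines `CacheBackendError` and two subclasses as hypothetical stand-ins for `redis.exceptions.RedisError` and its subclasses; the setter is any callable that stores a key.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class CacheBackendError(Exception):
    """Hypothetical stand-in for redis.exceptions.RedisError (the common base)."""


class BackendConnectionError(CacheBackendError):
    pass


class BackendOutOfMemoryError(CacheBackendError):
    pass


def save_cassette_best_effort(setter: Callable[[str, bytes], None], key: str, payload: bytes) -> bool:
    try:
        setter(key, payload)
        return True
    except CacheBackendError as exc:  # the base class, not a hand-picked list
        # Degrade to "test passed, cassette not cached" instead of failing teardown.
        logger.warning("cassette save failed for %s: %r", key, exc)
        return False
```

Errors outside the backend hierarchy (for example a `ValueError` from a bug in the persister) still propagate.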

* test(vcr): surface cassette-cache failures with warnings + session banner

When the persister silently swallows a Redis OOM (or any RedisError) on
save/load there is otherwise no visible signal that the cache is
degraded — tests pass, the cassette just isn't persisted, and the next
session still hits the same Redis at the same near-cap memory.

Add three layers of observability so that failure mode is loud:

1. Per-process health counters ("save_failures", "load_failures", and
   the last error string for each), exposed via cassette_cache_health()
   and reset via reset_cassette_cache_health(). The persister
   increments these in addition to logging.

2. VCRCassetteCacheWarning (UserWarning subclass) emitted via
   warnings.warn() inside the persister's except block. Pytest's
   built-in warnings summary at session end automatically lists every
   such warning, so the failure is visible in CI logs without any
   conftest-level wiring.

3. Session-end banner via emit_cassette_cache_session_banner() and a
   stderr-fallback atexit handler registered from
   register_persister_if_enabled(). Two states:
     - red "VCR CASSETTE CACHE DEGRADED" when save_failures or
       load_failures > 0
     - yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but
       used_memory >= 85% of maxmemory) so the next session knows
       the Redis is approaching OOM before any SET actually fails

Capacity comes from a best-effort INFO memory probe
(cassette_cache_capacity_snapshot) that returns None on any failure or
when maxmemory is uncapped. The atexit handler skips xdist workers so
only the controller emits.

Tests: parametrize the existing save/load swallow-error tests across
ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for
the health counters and warning emission, and a new
test_vcr_conftest_common_banner.py covering banner output for every
state (silent/red/yellow/disabled/xdist-worker).

* test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips

Tests that deliberately call an LLM API with a bad key (e.g. to assert
that the failure callback fires, or that check_valid_key returns False)
were being silently served the prior good-key cassette: we scrub the
real Authorization / x-api-key header from the cassette before storing
it, so a follow-up bad-key call is byte-identical to the good-key call
under the existing match_on tuple.

Add a 'key_fingerprint' custom matcher that distinguishes requests by
the SHA-256 of their API-key headers. The fingerprint is stamped into
a synthetic 'x-litellm-key-fp' header by a new before_record_request
hook, which then strips the real auth headers (we have to do the
scrubbing here instead of via vcrpy's filter_headers knob, because
filter_headers runs *first* and would erase the value we want to hash).

Bad-key requests now get a different cassette bucket than good-key
requests, so vcrpy will not replay a recorded 200 in place of the
expected 401. The fingerprint is a one-way hash of the secret, so
cassettes never contain the key.
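A minimal sketch of the hook and matcher follows. It assumes the secret headers are exactly `Authorization` and `x-api-key`, and that a request exposes a mutable `headers` mapping the way vcrpy's `Request` does. `FakeRequest` and `_key_fingerprint` are hypothetical stand-ins. A plain dict is case-sensitive, unlike vcrpy's headers, so header names are lower-cased before comparison.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

FP_HEADER = "x-litellm-key-fp"
AUTH_HEADERS = frozenset({"authorization", "x-api-key"})  # assumption


@dataclass
class FakeRequest:
    """Stand-in for vcrpy's Request; only .headers is used."""
    headers: Dict[str, str] = field(default_factory=dict)


def _key_fingerprint(headers: Dict[str, str]) -> str:
    secrets = sorted((k.lower(), v) for k, v in headers.items()
                     if k.lower() in AUTH_HEADERS)
    if not secrets:
        return "no-key"
    material = "\n".join(f"{k}:{v}" for k, v in secrets)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def before_record_request(request: FakeRequest) -> FakeRequest:
    # Scrub here rather than via filter_headers, which runs first and would
    # erase the secret before it could be hashed.
    if FP_HEADER not in request.headers:  # re-applying the hook is a no-op
        request.headers[FP_HEADER] = _key_fingerprint(request.headers)
    for name in [k for k in request.headers if k.lower() in AUTH_HEADERS]:
        del request.headers[name]
    return request


def key_fingerprint(r1: FakeRequest, r2: FakeRequest) -> bool:
    """Custom matcher: two requests match only if their fingerprints agree."""
    return r1.headers.get(FP_HEADER) == r2.headers.get(FP_HEADER)
```

With vcrpy, the hook would be passed as `before_record_request=`, and the matcher registered with `VCR.register_matcher("key_fingerprint", key_fingerprint)` before `"key_fingerprint"` is added to `match_on`.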

This permanently removes the 'bad-key' category of skips:

- tests/local_testing: dropped ::test_amazing_sync_embedding,
  ::test_async_custom_handler_completion,
  ::test_async_custom_handler_embedding
- tests/logging_callback_tests: dropped ::test_async_chat_azure,
  ::test_async_embedding_azure
- tests/litellm_utils_tests: dropped
  ::test_get_valid_models_from_dynamic_api_key

Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py
covering header stripping, fingerprint determinism, no-auth bucketing,
good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination,
and idempotence under replay.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): drop redundant comments and docstrings

Trim narration of code that is already self-evident from function and
variable names. Keep the two genuinely non-obvious bits:

- ordering constraint between filter_headers and before_record_request,
  which would invite a maintainer to re-introduce the bug if removed
- the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why
  exactly is this skipped' is not knowable from the test name alone

Also drop the 40-line commented-out drop-in conftest snippet at the
bottom of _vcr_conftest_common.py — the consuming conftests are the
canonical reference.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): make _before_record_request idempotent

vcrpy invokes before_record_request more than once per request:
can_play_response_for calls it, then __contains__ /
_responses (reached via play_response) call it again on the
result. The second invocation sees a request whose auth headers we
already stripped, so a naive recompute yields "no-key" and
overwrites the real fingerprint stored in the header.

This makes can_play_response_for and play_response disagree on
matchability — the former says "yes, we have a stored response for
this" (matching no-key to no-key) and the latter throws
UnhandledHTTPRequestError because it computes a fresh real
fingerprint that doesn't match the stored no-key.

In CI this manifested as ~30 failing tests across guardrails_testing,
audio_testing, batches_testing, image_gen_testing, llm_responses_api,
litellm_router_unit_testing, etc. Skip the recompute when the header
is already set, so re-applying the hook is a no-op.

Adds a regression test that fires the hook twice on the same dict and
asserts the fingerprint stays put.
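The failure mode can be reproduced without vcrpy. In this sketch, `naive_hook` and `guarded_hook` are hypothetical, operate on a plain header dict, and treat only `authorization` as a secret. The second call to `naive_hook` sees headers it already stripped and overwrites the real fingerprint with "no-key".

```python
import hashlib
from typing import Dict

FP_HEADER = "x-litellm-key-fp"


def _fingerprint(headers: Dict[str, str]) -> str:
    key = headers.get("authorization")
    return hashlib.sha256(key.encode("utf-8")).hexdigest() if key else "no-key"


def naive_hook(headers: Dict[str, str]) -> Dict[str, str]:
    headers[FP_HEADER] = _fingerprint(headers)  # recomputed on every call
    headers.pop("authorization", None)
    return headers


def guarded_hook(headers: Dict[str, str]) -> Dict[str, str]:
    if FP_HEADER not in headers:  # already stamped: keep the stored value
        headers[FP_HEADER] = _fingerprint(headers)
    headers.pop("authorization", None)
    return headers


# Regression check in the spirit of the commit: fire the hook twice on one dict.
headers = {"authorization": "Bearer sk-test"}
stamped = guarded_hook(headers)[FP_HEADER]
assert guarded_hook(headers)[FP_HEADER] == stamped
```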

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): drop more redundant docstrings and headers

* test(vcr): enable 24hr cache for ocr_tests and search_tests

These two directories were the only non-dockerized test suites in the
build_and_test workflow that make live LLM/provider API calls but were
not VCR-enabled by this PR. Together they account for 96 tests:

- tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document
  Intelligence, Vertex AI OCR. Pure-unit tests inside the same files
  (e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls
  and become benign VCR NOOPs.
- tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa,
  Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI,
  Searxng, Serper, Tavily.

Both directories use the canonical minimal conftest pattern from
tests/audio_tests/conftest.py with no skip lists. None of the test
files use respx, none assert on per-call upstream non-determinism
(no response1.id != response2.id, no overhead-as-fraction-of-total,
no live polling), so the default match_on tuple should cache cleanly.
If a flake surfaces during the first cassette-recording CI run, we
can add a targeted skip the same way we did for the other dirs.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-05 15:13:31 -07:00
yuneng-jiang
fa6a82ab0f
[Fix] UI: Clear Admin Session Cookies Before Establishing Invited User's Session (#27227)
The invite-signup form was writing the new user's token via raw
`document.cookie` at `path=/`, while the rest of the auth surface uses
`storeLoginToken` (which writes at `path=/ui` and mirrors to
sessionStorage). After signup the inviter's `path=/ui` cookie kept
winning path-specificity matching, and sessionStorage still held the
inviter's token, so the dashboard rendered as the inviter rather than
the newly created user.

Treat invite signup as a principal-change boundary — clear prior
session cookies first, then store the new token via the canonical
helper.
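The following toy model shows why the inviter kept winning. It assumes a client that reads the first `token` cookie in RFC 6265 order: longer paths come first, and ties keep creation order. `Cookie`, `path_matches` and `first_token` are hypothetical, the cookie name is assumed, and sessionStorage is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cookie:
    name: str
    value: str
    path: str


def path_matches(request_path: str, cookie_path: str) -> bool:
    # RFC 6265 section 5.1.4 path-match.
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def first_token(request_path: str, jar: List[Cookie]) -> Optional[str]:
    matching = [c for c in jar
                if c.name == "token" and path_matches(request_path, c.path)]
    # Longer paths first; list.sort is stable, so ties keep creation order.
    matching.sort(key=lambda c: len(c.path), reverse=True)
    return matching[0].value if matching else None


# Before the fix: signup wrote the new token at "/", the inviter's stays at "/ui".
jar = [Cookie("token", "inviter", "/ui"), Cookie("token", "invitee", "/")]
before = first_token("/ui/dashboard", jar)

# After the fix: clear prior session cookies, then store at "/ui".
jar = [c for c in jar if c.name != "token"] + [Cookie("token", "invitee", "/ui")]
after = first_token("/ui/dashboard", jar)
```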
2026-05-05 13:57:04 -07:00
yuneng-jiang
f318ef03bd
Merge pull request #27170 from BerriAI/litellm_/unruffled-mcclintock-62a296
[Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds
2026-05-04 21:43:14 -07:00
shin-berri
43c78057d4
Merge pull request #27169 from BerriAI/litellm_/vigorous-albattani-2b7480
[Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job
2026-05-04 21:39:43 -07:00
Yuneng Jiang
4ee586a321
[Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds
PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in
docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this
forces prisma to download the amd64 schema-engine into an arm64 image,
so 'prisma migrate deploy' fails at startup with 'Could not find
schema-engine binary'.

Removing the env lets prisma auto-detect per build platform: amd64
builds still resolve to debian-openssl-3.0.x (Wolfi falls back to
debian, same binary as before), and arm64 builds now correctly fetch
linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is
preserved — only which binaries fill the cache changes.

Fixes #19458
2026-05-04 21:37:16 -07:00
Yuneng Jiang
19ad964c4a
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/vigorous-albattani-2b7480 2026-05-04 21:19:34 -07:00
Yuneng Jiang
c1c0506d2c
[Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job
The cimg/python:3.12-browsers base image already ships every Chromium
system dependency Playwright needs (libnss3, libatk-bridge2.0-0,
libcups2, etc. — the install log shows them all as "already the newest
version"). Passing --with-deps to `npx playwright install` therefore
runs an apt-get update + install that changes nothing yet still pays the
full cost of hitting Ubuntu mirrors. On a recent run those mirrors stalled hard:
apt-get update alone took 6m53s at 81.5 kB/s with several archives
returning connection refused.

Drop --with-deps and persist ~/.cache/ms-playwright alongside
node_modules so the Chromium binary is also reused across runs. Bump
the cache key to v2 so the existing v1 entry (which contains only
node_modules) is not restored in place of a cache that includes the
new browser path.
2026-05-04 21:19:31 -07:00
yuneng-jiang
cd38ecd532
Merge pull request #27156 from BerriAI/yj_build_may4
[Infra] Build UI
2026-05-04 21:12:20 -07:00
shin-berri
ff3d089ab8
Merge pull request #27160 from BerriAI/litellm_/peaceful-gates-6e46e7
[Fix] Proxy: Break managed-resources import cycle on Python 3.13
2026-05-04 21:11:53 -07:00
yuneng-jiang
f2969ca78a
Merge pull request #27165 from BerriAI/litellm_/friendly-lichterman-35cf02
[Fix] CI: Enable VCR replay for test_azure_o_series
2026-05-04 20:59:46 -07:00
Yuneng Jiang
0976fbc6c4
[Fix] Tests: Restore /metrics access for prometheus test suite
/metrics now requires auth by default; tests/otel_tests/test_prometheus.py
makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
every prometheus test in CI now fails the metric assertion.

Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
to opt out for this test job, which scrapes /metrics directly. Verified
locally: 8/8 prometheus tests green (one flaky retry on
test_proxy_success_metrics that pre-dates this PR).

Also drop the -x stop-on-first-failure flag from the otel test command
so all failures in the job surface in a single CI run rather than
hiding behind whichever one trips first.
2026-05-04 20:54:54 -07:00
Yuneng Jiang
6a6c79d992
[Fix] CI: Enable VCR replay for test_azure_o_series
The Azure o-series tests were excluded from the conftest's VCR auto-marker
because of a respx/vcrpy transport-patching conflict, but the only respx
reference in the file was an unused `MockRouter` import. Drop the dead
import and remove the file from the conflict set so cassettes record on
first run and replay thereafter, eliminating the 60-95s live Azure latency
that was crashing xdist workers under --timeout=120 thread-mode timeouts.
2026-05-04 20:48:26 -07:00
Sameer Kankute
b0edffb883
Merge pull request #27103 from BerriAI/litellm_azure-deployment-image-body
fix(azure): omit model from deployment image gen and image edit bodies
2026-05-05 09:09:45 +05:30
Yuneng Jiang
e6f524f951
[Fix] Tests: Pick chat-completion OTEL trace by content, not recency
The /otel-spans endpoint returns process-wide spans and tags
most_recent_parent by max start_time. After tightening that route to
proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
spans that beat the chat-completion spans on start_time, so
most_recent_parent now points at the request's own auth trace
(['postgres', 'postgres']) and the >=5-span assertion fails.

Pick the chat-completion trace by content: it is the only trace whose
span list is a superset of {postgres, redis, raw_gen_ai_request,
batch_write_to_db}. Verified locally end-to-end against
otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
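A minimal sketch of content-based selection follows. It assumes the spans have already been grouped into a dict mapping trace id to span names; the exact /otel-spans response shape is not shown here. That grouping, the helper `pick_chat_completion_trace`, and the span name "extra_span" are hypothetical.

```python
from typing import Dict, Iterable, Optional

CHAT_COMPLETION_SPANS = frozenset(
    {"postgres", "redis", "raw_gen_ai_request", "batch_write_to_db"}
)


def pick_chat_completion_trace(traces: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the single trace whose spans include every chat-completion span.

    Start time is ignored on purpose: the GET /otel-spans call emits its own
    auth spans, which can start later than the chat-completion spans.
    """
    matches = [tid for tid, spans in traces.items()
               if CHAT_COMPLETION_SPANS <= set(spans)]
    # Exactly one trace should qualify; anything else is ambiguous.
    return matches[0] if len(matches) == 1 else None


traces = {
    "auth-trace": ["postgres", "postgres"],  # the GET's own trace, newest
    "chat-trace": ["postgres", "redis", "raw_gen_ai_request",
                   "batch_write_to_db", "extra_span"],
}
```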
2026-05-04 20:35:09 -07:00
Sameer Kankute
4487d8352f
Merge pull request #27115 from Sameerlite/litellm_health_check_reasoning_effort
feat(proxy): add health_check_reasoning_effort for model health checks
2026-05-05 09:00:09 +05:30
Yuneng Jiang
8a1b6635fa
[Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans
/otel-spans now requires proxy admin (returns 401 'Only proxy admin
can be used to generate, delete, update info for new keys/users/teams.
Route=/otel-spans' for non-admin callers). Switch the GET call to use
the master key sk-1234 while keeping the generated key for the
chat-completion request that produces the spans.
2026-05-04 20:23:11 -07:00
Sameer Kankute
b4ee6a2355
test(proxy): cover health_check_reasoning_effort for completion mode
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 08:52:57 +05:30
Sameer Kankute
bb0e4168ad
refactor(azure): move image gen JSON helper; rename image edit finalize hook
- Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
  from azure.py (keeps AzureChatCompletion focused on chat).
- Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
  docstring covering multipart and JSON POST payloads (review feedback).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 08:49:46 +05:30
Yuneng Jiang
193907a4a3
[Fix] Lint: Mark _user_has_admin_view re-export in common_utils
Ruff F401 flagged the aliased import as unused within common_utils.py
because the name is consumed only by external modules (~15 callers
across guardrails, spend tracking, MCP, agents, management endpoints).
Add `# noqa: F401  re-exported` so the alias survives lint while
keeping a single source of truth in litellm.proxy._types.
2026-05-04 20:16:59 -07:00
Yuneng Jiang
8cac6c5bff
[Fix] Proxy: Address Greptile feedback on hook-cycle PR
- Move _user_has_admin_view to litellm.proxy._types as
  user_api_key_has_admin_view (single source of truth). common_utils.py
  and isolation.py both import from there now, removing the duplicated
  role-check that could silently diverge if new admin roles are added.
- Add pytest.importorskip("litellm_enterprise") to the two regression
  tests that assert managed_files / managed_vector_stores are registered;
  those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
  unconditionally in a checkout without the enterprise extra installed.
2026-05-04 20:13:31 -07:00
Yuneng Jiang
727ab8dcc4
[Fix] Proxy: Break managed-resources import cycle on Python 3.13
The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
ImportError when loading the managed files hook chain:

  litellm.proxy.hooks/__init__ (mid-import)
    -> enterprise.enterprise_hooks
    -> litellm_enterprise.proxy.hooks.managed_files
    -> litellm.llms.base_llm.managed_resources.isolation
    -> litellm.proxy.management_endpoints.common_utils
    -> litellm.proxy.utils  (re-enters litellm.proxy.hooks)

The except ImportError block in hooks/__init__.py silently swallowed the
failure, leaving managed_files unregistered and POST /files returning
500 "Managed files hook not found".

Two-layer fix:
- Inline the 3-line _user_has_admin_view check in isolation.py instead
  of importing it from litellm.proxy.management_endpoints.common_utils.
  litellm.llms.* should not depend on litellm.proxy.* — removing this
  layering violation breaks the cycle at its root.
- Define PROXY_HOOKS and get_proxy_hook before the conditional
  enterprise import in litellm/proxy/hooks/__init__.py, so any future
  re-entry resolves the public names instead of hitting an
  ImportError on a partially-initialized module.
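The second layer can be reproduced with a throwaway package. In this sketch, the package names, `HOOKS`, `get_hook` and `IMPORT_ERROR` are hypothetical stand-ins for PROXY_HOOKS and get_proxy_hook. A plugin imported from the package's `__init__` re-enters the half-initialized package. The import fails when the public names are defined after the conditional import, and succeeds when they are defined before it. Reordering does not remove the cycle; it only makes re-entry harmless.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

PUBLIC_NAMES = textwrap.dedent("""
    HOOKS = {}

    def get_hook(key):
        return HOOKS.get(key)
    """)

OPTIONAL_IMPORT = textwrap.dedent("""
    try:
        from {pkg} import plugin  # re-enters {pkg} while it is mid-import
        HOOKS["plugin"] = plugin.Hook
        IMPORT_ERROR = None
    except ImportError as exc:
        IMPORT_ERROR = exc  # swallowed, like the original except block
    """)


def make_package(root: Path, pkg: str, public_first: bool) -> None:
    optional = OPTIONAL_IMPORT.format(pkg=pkg)
    init = PUBLIC_NAMES + optional if public_first else optional + PUBLIC_NAMES
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text(init)
    # The plugin needs a public name from its parent package at import time.
    (root / pkg / "plugin.py").write_text(
        f"from {pkg} import get_hook\n\nclass Hook:\n    pass\n"
    )


def load_both():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_package(root, "cycle_demo_broken", public_first=False)
        make_package(root, "cycle_demo_fixed", public_first=True)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return (importlib.import_module("cycle_demo_broken"),
                    importlib.import_module("cycle_demo_fixed"))
        finally:
            sys.path.remove(tmp)


broken, fixed = load_both()
```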

Also fold in two unrelated CCI repairs surfaced in the same staging run:
- tests/otel_tests/test_key_logging_callbacks.py: per-key
  gcs_bucket_name / gcs_path_service_account are now stripped by
  initialize_dynamic_callback_params, so the GCS client falls through
  to the env-only branch. Update the assertion to match the new
  "GCS_BUCKET_NAME is not set" message.
- .circleci/config.yml: tests/pass_through_tests now resolves
  google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
  which uses dynamic ESM imports Jest 29 cannot load without
  --experimental-vm-modules. Pass that flag in the Vertex JS test step.

Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
regression guard: managed_files / managed_vector_stores must register,
and isolation.py must not transitively import litellm.proxy.utils.
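One way to express the "must not transitively import" guard is sketched below, assuming a fresh interpreter per check is acceptable in the test suite. `imports_transitively` is a hypothetical helper, not the code in test_proxy_hooks_init.py.

```python
import subprocess
import sys


def imports_transitively(module: str, forbidden: str) -> bool:
    """True if importing `module` in a fresh interpreter also loads `forbidden`.

    A subprocess is used because the running test session's sys.modules may
    already contain `forbidden`, which would hide the answer.
    """
    code = (
        "import importlib, sys; "
        f"importlib.import_module({module!r}); "
        f"print({forbidden!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"
```

Applied to this repo, the check would presumably be `imports_transitively("litellm.llms.base_llm.managed_resources.isolation", "litellm.proxy.utils")`, which should return False.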
2026-05-04 20:05:24 -07:00
Yuneng Jiang
7c8409d013
chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2) 2026-05-04 19:13:25 -07:00
yuneng-jiang
9ea824d5bf
Merge pull request #27143 from BerriAI/cursor/fix-secret-fields-in-spend-logs-a532
fix(security): prevent secret_fields from leaking into spend logs
2026-05-04 19:07:54 -07:00
yuneng-jiang
be5f217aaf
Merge pull request #26861 from BerriAI/litellm_fix_scim_virtual_key_deactivation
fix(scim): revoke virtual keys when SCIM deprovisions a user
2026-05-04 19:03:55 -07:00
Cursor Agent
5923c3209b
fix(security): prevent secret_fields from leaking into spend logs
secret_fields (containing raw HTTP headers including Authorization
Bearer tokens) was being included in proxy_server_request['body']
because the body snapshot was a copy.copy(data) of the full request
dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
table, exposing user credentials in the database.

Root cause: data['secret_fields'] was set before the body snapshot at
data['proxy_server_request']['body'] = copy.copy(data), so the full
raw headers (including auth tokens) ended up in the snapshot.

Fix (defense in depth):
1. Exclude 'secret_fields' when creating the body snapshot in
   litellm_pre_call_utils.py (primary fix)
2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
   as a secondary safeguard

secret_fields remains available on the live data dict for legitimate
downstream consumers (MCP, Responses API).
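A minimal sketch of both layers follows, assuming the request is a plain dict. `snapshot_request_body` and `strip_secret_fields` are hypothetical names, not the functions in litellm_pre_call_utils.py. The `raw_headers` shape inside `secret_fields` and the example model name are illustrative.

```python
from typing import Any, Dict

SNAPSHOT_EXCLUDED_KEYS = frozenset({"secret_fields"})


def snapshot_request_body(data: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow copy of the request dict, minus keys that must never be logged.

    Behaves like copy.copy(data) for every key that is kept.
    """
    return {k: v for k, v in data.items() if k not in SNAPSHOT_EXCLUDED_KEYS}


def strip_secret_fields(body: Dict[str, Any]) -> Dict[str, Any]:
    """Secondary safeguard applied before a body is serialized to spend logs."""
    sanitized = dict(body)
    sanitized.pop("secret_fields", None)
    return sanitized


data = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "hi"}],
    "secret_fields": {"raw_headers": {"authorization": "Bearer sk-..."}},
}
data["proxy_server_request"] = {"body": snapshot_request_body(data)}
```

The live `data` dict keeps `secret_fields` for downstream consumers; only the persisted snapshot drops it.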

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-05-05 02:01:41 +00:00
yuneng-jiang
555a8131fe
Merge pull request #26951 from stuxf/codex/skills-containers-tenant-guard
chore(proxy): tighten resource ownership checks
2026-05-04 18:47:17 -07:00
yuneng-jiang
2f305050ce
Merge pull request #27004 from stuxf/fix/managed-resource-service-account-isolation
fix(proxy): isolate managed resources for service-account API keys
2026-05-04 18:45:55 -07:00
user
3dcb6bd3f9
Merge remote-tracking branch 'upstream/litellm_internal_staging' into codex/skills-containers-tenant-guard
# Conflicts:
#	litellm/proxy/auth/auth_utils.py
2026-05-05 01:41:25 +00:00
user
7faba9656f
Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation 2026-05-05 01:38:11 +00:00
yuneng-jiang
281296f9cf
Merge pull request #27151 from BerriAI/litellm_yj_may4
[Infra] Merge dev branch
2026-05-04 18:29:52 -07:00
user
aee064ad37
Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation 2026-05-05 01:29:05 +00:00
yuneng-jiang
dcb357ee2d
Merge pull request #27149 from BerriAI/litellm_/peaceful-bell-ba8ca5
[Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5
2026-05-04 18:27:45 -07:00
yuneng-jiang
efca16ccfa
Merge pull request #27043 from stuxf/fix/ssti-prompt-managers
fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers
2026-05-04 18:23:41 -07:00
Yuneng Jiang
e35cd5af76
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_may4 2026-05-04 18:22:47 -07:00
Yuneng Jiang
7f550a5d67
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/peaceful-bell-ba8ca5 2026-05-04 18:21:33 -07:00
Yassin Kortam
db2a3cafb6
Merge pull request #27131 from BerriAI/litellm_fix/routing-groups-ui
feat: routing groups ui
2026-05-04 18:16:49 -07:00
mateo-berri
4179159f0f Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_azure-deployment-image-body 2026-05-04 18:16:46 -07:00
Yassin Kortam
a56256e5ee feat: routing groups ui 2026-05-04 18:09:14 -07:00