litellm

Author	SHA1	Message	Date
Sameer Kankute	fd7ff0f269	fix(hosted_vllm): normalize custom tools for chat completions (#25763 ) * fix(hosted_vllm): normalize custom tools for chat completions Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot. Made-with: Cursor * Fix black issue * Fix hosted vllm custom tool schema fallback * fix black --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-05 17:27:02 -07:00
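Here is a minimal sketch of the hosted_vllm normalization in #25763. It assumes OpenAI's chat-completions custom-tool shape (`{"type": "custom", "custom": {"name": ..., "description": ...}}`). The helper `normalize_custom_tools` is hypothetical, and so is the single-string `input` schema given to the converted function. The commit only says that a schema fallback exists, not what it contains.

```python
import copy
from typing import Any, Dict, List

# Hypothetical schema for converted custom tools: one free-text argument.
# The commit mentions a schema fallback but does not say what it contains.
_FALLBACK_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {"input": {"type": "string"}},
    "required": ["input"],
}


def normalize_custom_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rewrite {"type": "custom"} tools as OpenAI {"type": "function"} tools.

    Other tool types are returned unchanged (same objects), and the input
    list is never mutated.
    """
    normalized: List[Dict[str, Any]] = []
    for tool in tools:
        if tool.get("type") != "custom":
            normalized.append(tool)
            continue
        custom = tool.get("custom") or {}
        function: Dict[str, Any] = {
            "name": custom["name"],
            # Deep copy so callers cannot mutate the shared fallback schema.
            "parameters": copy.deepcopy(_FALLBACK_PARAMETERS),
        }
        if custom.get("description"):
            function["description"] = custom["description"]
        normalized.append({"type": "function", "function": function})
    return normalized
```

Function tools come back as the same objects, and the caller's list is never mutated, so one tool list can be reused across requests.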
yuneng-jiang	9a338e1b6b	[Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249 ) Several tests parametrized over (model, api_key, ...) tuples or raw token strings, causing pytest to embed those values in the test ID and print them in CI logs. Refactored each affected test to keep the same coverage without putting key material into parametrize. - audio_tests/test_audio_speech.py: split env-var keys into separate azure/openai test functions sharing a helper; sync_mode parametrize preserved. - audio_tests/test_whisper.py: split into openai_whisper / azure_whisper functions sharing a helper; response_format parametrize preserved. - local_testing/test_embedding.py: single-case parametrize inlined. - proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize cases split into 5 named tests sharing an _assert helper. - proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split into 4 named tests. - test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split into 5 named tests. Verified: black clean; 14 refactored unit tests pass; pytest collects audio/embedding tests with safe IDs (no key material in test IDs).	2026-05-05 17:21:18 -07:00
Sameer Kankute	e912e6d4ff	feat(audio_transcription): add NVIDIA Riva STT provider (#27185 ) * feat(audio_transcription): add NVIDIA Riva STT provider Adds nvidia_riva as a new audio transcription provider, supporting both NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming. - Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy, audioread fallback) so callers can send any common format. - Maps OpenAI params: language (en -> en-US), response_format (text/json/ verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets, word offsets converted ms -> s for verbose_json. - Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted otherwise (SSL off by default), with explicit use_ssl override. - gRPC errors wrapped via NvidiaRivaException -> litellm exception classes. - Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client, soundfile, audioread, numpy). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(nvidia_riva): address PR review feedback - handler: forward call-level `timeout` to streaming_response_generator (kwarg-detected via inspect for older riva-client compat) so a stalled Riva server cannot block the caller indefinitely. - audio_utils: spill bytes to a tempfile before audioread.audio_open; most audioread backends (FFmpeg, GStreamer) require a real filesystem path and previously raised TypeError on BytesIO, breaking the mp3/m4a fallback path. - audio_utils: prefer soxr / scipy.signal.resample_poly for resampling (anti-aliased polyphase) when installed, falling back to linear only as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples. - transformation: bare `es` now maps to es-ES (Castilian) instead of es-US, matching BCP-47 conventions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore: trigger CI re-run [stabilize loop 1/3] * Update litellm/llms/nvidia_riva/audio_transcription/transformation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * chore: trigger CI re-run [stabilize loop 1/3] * fix code qa * fix lint * fix mypy * fix mypy * Fix NVIDIA Riva ASR service lookup * Fix NVIDIA Riva transcription payload logging --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 17:17:51 -07:00
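The parameter mapping in #27185 can be sketched as a few pure helpers. The commit supplies only these pieces: `en -> en-US`, `es -> es-ES`, `timestamp_granularities=["word"] -> enable_word_time_offsets`, and the conversion of word offsets from milliseconds to seconds. Everything else is an assumption: the function names, passing other language values through unchanged, and the `call_with_optional_timeout` wrapper. That wrapper stands in for the inspect-based `timeout` detection on `streaming_response_generator`.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional

# Only these two expansions are stated in the commit; the table is illustrative.
_LANGUAGE_DEFAULTS = {"en": "en-US", "es": "es-ES"}


def map_language(language: Optional[str]) -> Optional[str]:
    """Expand a bare language code to a BCP-47 locale; pass other values through."""
    if language is None:
        return None
    return _LANGUAGE_DEFAULTS.get(language.lower(), language)


def wants_word_offsets(timestamp_granularities: Optional[List[str]]) -> bool:
    """timestamp_granularities=["word"] maps to enable_word_time_offsets=True."""
    return "word" in (timestamp_granularities or [])


def word_to_verbose_json(word: str, start_ms: int, end_ms: int) -> Dict[str, Any]:
    """Riva reports word offsets in milliseconds; verbose_json uses seconds."""
    return {"word": word, "start": start_ms / 1000.0, "end": end_ms / 1000.0}


def call_with_optional_timeout(
    fn: Callable[..., Any], *args: Any, timeout: Optional[float] = None, **kwargs: Any
) -> Any:
    """Forward `timeout` only if `fn` accepts it (older clients may not)."""
    if timeout is not None:
        try:
            params = dict(inspect.signature(fn).parameters)
        except (TypeError, ValueError):
            params = {}  # signature not introspectable: do not forward
        accepts = "timeout" in params or any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        if accepts:
            kwargs["timeout"] = timeout
    return fn(*args, **kwargs)
```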
Krrish Dholakia	454ce5073f	fix(anthropic, mcp): sanitize tool names to match Anthropic's [a-zA-Z0-9_-]{1,128} pattern (#26788 ) * fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$` Tool names with characters like `/` or `.` (commonly produced by the OpenAPI -> MCP generator from `operationId`s such as `actions/download-job-logs-for-workflow-run`) caused Anthropic to reject requests with `tools.N.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'`. Two layers of fix: 1. Anthropic transformation: build a per-request forward map (original -> sanitized, disambiguated by suffix on collisions) and a reverse map (only for names actually rewritten). Forward map is applied to tool defs, `tool_choice`, and historical assistant tool_calls in messages. Reverse map is threaded through both the non-streaming and streaming response paths so callers continue to see their original tool names in `tool_use` blocks. 2. OpenAPI -> MCP generator: sanitize `operationId` (and the method+path fallback) at registration time so generated MCP tools are valid for any strict-name provider, not just Anthropic. The dashboard preview endpoint applies the same sanitization for parity. Includes unit tests covering: collision disambiguation between `foo_bar` and `foo/bar` in the same request, reverse-map only firing for actually-rewritten names, message rewrite for historical tool_calls, streaming chunk_parser reverse-mapping, and sanitization of OpenAPI operationIds plus the preview endpoint output. Made-with: Cursor * fix(anthropic): build tool-name maps in transform_request, not optional_params The previous patch stashed the per-request forward and reverse tool-name maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and ``optional_params["_anthropic_tool_name_map"]``. 
``optional_params`` is the dict that becomes the JSON body via ``data = {**optional_params}``, so those internal keys leaked over the wire and Anthropic 400'd with: _anthropic_tool_name_forward_map: Extra inputs are not permitted. Worse, this meant every request whose tool list contained any name with an invalid character (the exact case the patch was meant to fix) regressed into a confusing meta-error pointing at LiteLLM's internal map instead of the offending tool. Fix: move all tool-name sanitization into ``transform_request``, which is the single chokepoint already shared by ``AnthropicConfig``, ``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``, and ``AzureAnthropicConfig`` (all call ``super().transform_request`` / ``AnthropicConfig.transform_request(self, ...)``). New static helper ``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped ``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted tool names are reserved by Anthropic and must not be touched), builds the per-request forward/reverse maps, and applies the forward map in place to ``tools[].name`` and ``tool_choice.name``. The reverse map is stashed exclusively on ``litellm_params`` (which is never serialized to a provider) under ``_anthropic_tool_name_map`` for the response paths to consume. Side effect of this restructure: ``map_openai_params`` is now a pure OpenAI->Anthropic param translator with no side-channel state, which matches its contract everywhere else in the codebase. Tests: replaced the now-incorrect "stashes maps in optional_params" tests with regressions that assert no underscore-prefixed keys appear in either ``optional_params`` after ``map_openai_params`` or in the final ``transform_request`` body. Added end-to-end coverage for: sanitization in ``transform_request``, ``tool_choice`` rewriting, historical ``tool_calls`` rewriting in messages, and hosted-tool passthrough. 
Made-with: Cursor * fix(anthropic): always sanitize empty text content blocks Anthropic 400s on `{"role": "user", "content": ""}` with: "messages: text content blocks must be non-empty". LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text to a placeholder, but it was gated behind `litellm.modify_params=True`. With that flag off (default), empty content from upstream agent frameworks (e.g. pydantic-ai) flowed straight through and tripped the Anthropic validator. Fix: - Always run `_sanitize_empty_text_content` at the top of `anthropic_messages_pt`, independent of `modify_params`. There is no way to "pass through" an empty text block, so this is non-optional. The richer tool-call sanitizations (Cases A/B/D, which actually mutate conversation structure) remain gated on `modify_params`. - Extend `_sanitize_empty_text_content` to also handle list-of-blocks content (`[{"type": "text", "text": ""}]`), not just string content. Adds 3 regression tests covering string content, list-of-blocks content, and the no-op case (non-empty messages with modify_params off). Made-with: Cursor * fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation - remove unused `name_forward_map` param from `_map_tool_choice`, `_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward` helper. Production sanitization runs in `_sanitize_tool_names_in_request` at `transform_request`; these params were never threaded through. - handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of the hardcoded `"_anthropic_tool_name_map"` string. - fix mypy `"object" has no attribute "__iter__"` in `_rewrite_tool_names_in_messages` by guarding `tool_calls` with `isinstance(..., list)`. - `_sanitize_tool_names_in_request`: build a new tools list with copy-on-change entries (and copy `tool_choice` on rewrite) so a caller reusing the same tool list/dicts across requests doesn't see its inputs permanently rewritten. 
- doc-comment `_build_request_tool_name_maps` clarifying it operates on OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs on Anthropic-format tools post-`_map_tools`). - tests: drop 3 tests pinning the now-removed param paths; add coverage for tool_calls + None function_call rewrite and caller-dict immutability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(mcp): inherit stored credentials in test/tools/list for edit flow When editing an existing MCP server, the Tool Configuration preview calls POST /mcp-rest/test/tools/list with server_id but no credentials (management API redacts them). The endpoint now calls _inherit_credentials_from_existing_server() so stored bearer tokens and OAuth2 M2M credentials are loaded from global_mcp_server_manager automatically — tools load without re-entering credentials. New servers (no server_id) and requests with explicit credentials are unaffected (function is a no-op in both cases). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(mcp): show all tools in edit panel, not just allowed tools Edit flow was passing externalTools (from GET /tools/list, filtered by allowed_tools) to MCPToolConfiguration, disabling the internal hook. Remove the external props so the internal hook fires via POST /test/tools/list, which returns all tools unfiltered. Combined with the credential inheritance fix, tools load automatically without re-entering credentials and all tools are visible for re-configuration. existingAllowedTools still pre-checks previously allowed tools. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix order-dependent collision in _build_anthropic_tool_name_maps Use a two-pass approach: first pre-register all already-valid tool names in the 'used' set, then sanitize/disambiguate names that need rewriting. This ensures valid names always have priority regardless of input order, preventing duplicate tool names on the wire when e.g. 
'foo/bar' appears before 'foo_bar' in the tool list. Add regression test for the reversed ordering case. * Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes sanitize_openapi_tool_name replaces all invalid chars with '_', but when two operationIds differ only by sanitized characters (e.g. 'foo/list' and 'foo.list' both become 'foo_list'), the second registration silently overwrites the first in the tool registry. Add collision disambiguation in register_tools_from_openapi that appends _2, _3, ... suffixes when a sanitized name is already taken, mirroring the existing logic in _build_anthropic_tool_name_maps. * Fix preview endpoint missing collision disambiguation for tool names Add used_names tracking and _2/_3 suffix disambiguation to _preview_openapi_tools, matching the logic in register_tools_from_openapi. Without this, two operationIds that sanitize to the same string (e.g. 'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate names in the preview while registration would disambiguate them. * Align preview HTTP method order with register_tools_from_openapi The preview endpoint and register_tools_from_openapi both use order-dependent collision disambiguation (_2, _3 suffixes). When the iteration order differs, two operations on the same path with sanitized names that collide get different suffixes in preview vs registration, so the dashboard shows names that don't match what actually got registered. Also adds a regression test that fails on the swapped order. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * Skip duplicate originals in _build_anthropic_tool_name_maps If the same invalid tool name appeared twice in original_names (e.g. ['foo/bar', 'foo/bar']), the second occurrence overwrote the forward map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar orphaned in 'used' with no reverse mapping. 
_sanitize_tool_names_in_request then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on duplicate tool names. Skip the rewrite if forward already has the original mapped. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-06 00:00:36 +00:00
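Here is a sketch of the two-pass forward/reverse map construction described in #26788. Valid names claim their spelling first. Collisions get `_2`, `_3`, ... suffixes. A duplicate original is mapped only once. `build_tool_name_maps` is a hypothetical stand-in for `_build_anthropic_tool_name_maps`. Truncating a suffixed name so it stays within 128 characters is an assumption.

```python
import re
from typing import Dict, Iterable, Tuple

# Anthropic's custom tool-name constraint, as quoted in the commit.
_VALID_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def build_tool_name_maps(names: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (forward, reverse) maps for one request.

    forward: original -> sanitized, only for names that needed rewriting
    reverse: sanitized -> original, same entries inverted
    """
    name_list = list(names)
    # Pass 1: already-valid names claim their spelling first, so input order
    # can never make a rewritten name collide with a valid one.
    used = {n for n in name_list if _VALID_NAME.match(n)}
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    # Pass 2: sanitize and disambiguate only the invalid names.
    for original in name_list:
        if _VALID_NAME.match(original) or original in forward:
            continue  # valid name, or a duplicate original already mapped
        base = _INVALID_CHARS.sub("_", original)[:128] or "_"
        candidate, suffix = base, 2
        while candidate in used:
            tail = f"_{suffix}"
            candidate = base[: 128 - len(tail)] + tail  # keep within 128 chars
            suffix += 1
        used.add(candidate)
        forward[original] = candidate
        reverse[candidate] = original
    return forward, reverse
```

To rewrite a request, send `forward.get(name, name)` for each tool name. On the response path, map `tool_use` names back with `reverse.get(name, name)` so callers see their original names.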
Yassin Kortam	dbc8f5a937	helm: skip proxy startup prisma db push when migrations Job is enabled (#27200 ) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:53 -07:00
Yassin Kortam	618df94433	helm: increase default probe timeouts, disable debug logging by default (#27237 ) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:34 -07:00
Yassin Kortam	950074eea2	fix: atomic TPM rate limit (#27001 ) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:07 -07:00
Sameer Kankute	b8635bbc7a	feat(realtime): OpenAI Realtime GA support and beta compatibility (#27110 ) * feat(realtime): OpenAI Realtime GA support and beta compatibility - Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta event translation when client sends OpenAI-Beta: realtime=v1 - Default upstream WebSocket without OpenAI-Beta; forward header when client opts in - Extend OpenAI realtime types for GA event names and conversation item shapes - Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events - Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground - Add unit tests for remap, translation, and beta header helper Co-authored-by: Cursor <cursoragent@cursor.com> * fix results * fix greptile * Fix mypy issues * Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES These frozensets were defined as class-level constants in realtime_streaming.py but never referenced anywhere in the codebase. Removing dead code. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(realtime): use GA-shaped session.update in guardrail injections The guardrail VAD injection code sent a beta-style session.update with a flat turn_detection field: {"session": {"turn_detection": {"create_response": false}}} When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta header forwarded), it requires the nested GA shape: {"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}} The _remap_beta_session_to_ga helper was only applied to client-originated session.update messages in client_ack_messages. 
Internally-generated session.updates (sent via _send_to_backend) in two paths: - _handle_raw_backend_message (raw/no provider_config path, line 518) - backend_to_client_send_messages provider_config path (line 481) bypassed the remap, so GA upstreams ignored or rejected them, breaking audio transcription guardrails for all non-beta clients. Fix: add _make_disable_auto_response_message() helper that always emits the correct GA-shaped session.update, and replace both injection sites with it. Update existing tests to assert the GA nested shape instead of the old flat beta shape, and add a new unit test for the helper itself. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * Log realtime session type * Fix beta realtime session payloads * Fix realtime audio format remapping edge case * Fix Azure realtime beta session shape --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 16:49:20 -07:00
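Here is a sketch built from the two payload shapes quoted in #27110. It produces the GA-shaped disable-auto-response message and performs a beta-to-GA remap. Both functions are hypothetical stand-ins for `_make_disable_auto_response_message()` and `_remap_beta_session_to_ga`. Only the `turn_detection` relocation is shown. The real remap also handles other fields, such as audio formats, which the commit does not spell out.

```python
import copy
import json
from typing import Any, Dict


def make_disable_auto_response_message() -> str:
    """GA-shaped session.update that turns off automatic responses."""
    return json.dumps(
        {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "audio": {"input": {"turn_detection": {"create_response": False}}},
            },
        }
    )


def remap_beta_session_to_ga(event: Dict[str, Any]) -> Dict[str, Any]:
    """Move a flat beta `session.turn_detection` under `session.audio.input`.

    Only turn_detection is handled here; other fields pass through as-is.
    The input event is deep-copied, never mutated.
    """
    if event.get("type") != "session.update":
        return event
    out = copy.deepcopy(event)
    session = out.setdefault("session", {})
    session.setdefault("type", "realtime")
    if "turn_detection" in session:
        audio_input = session.setdefault("audio", {}).setdefault("input", {})
        audio_input["turn_detection"] = session.pop("turn_detection")
    return out
```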
harish-berri	4fec69dd1e	refactor(BaseAWSLLM): implement shared IAM cache and static credentia… (#27125 ) * refactor(BaseAWSLLM): implement shared IAM cache and static credential caching - Introduced a process-wide shared IAM cache to optimize credential management across instances. - Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached. - Updated the get_credentials method to utilize the new caching mechanism for static credential flows. - Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage. * refactor(BaseAWSLLM): enhance IAM credential caching and update related tests - Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials. - Renamed the caching method for clarity and updated comments to reflect the new caching behavior. - Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries. - Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions. * refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests - Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types. - Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request. - Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios. * Code Readability improvement for aws auth path * refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests - Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials. 
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached. - Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests. * refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption - Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions. - Enhanced unit tests to verify that the caching mechanism correctly distinguishes between already running roles and new role assumptions, ensuring that cached environment credentials are not reused incorrectly. - Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.	2026-05-05 16:47:47 -07:00
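Here is a sketch of the caching policy in #27125. A class-level (process-wide) TTL cache is shared by every instance. It caches only static access-key credentials and ambient environment credentials. Session-token, profile, and AssumeRole credentials are always rebuilt. `SharedIAMCache`, its key scheme, and the 300-second default TTL are assumptions. The `factory` callable stands in for whatever actually builds the credentials, for example a boto3 session.

```python
import hashlib
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, ...]


class SharedIAMCache:
    """Process-wide TTL cache: entries live on the class, so all instances share them."""

    _entries: Dict[CacheKey, Tuple[float, Any]] = {}
    _lock = threading.Lock()

    def __init__(self, ttl_seconds: float = 300.0) -> None:  # default TTL is an assumption
        self.ttl_seconds = ttl_seconds

    @staticmethod
    def cache_key(
        access_key_id: Optional[str],
        secret_access_key: Optional[str],
        session_token: Optional[str],
        profile_name: Optional[str],
        role_arn: Optional[str],
    ) -> Optional[CacheKey]:
        """Return a key for cacheable flows, or None for flows that must not be cached."""
        if session_token or profile_name or role_arn:
            return None  # temporary tokens, profiles and AssumeRole: never cached
        if access_key_id and secret_access_key:
            # Key on a hash of the secret so a rotated secret gets a fresh entry.
            digest = hashlib.sha256(secret_access_key.encode("utf-8")).hexdigest()
            return ("static", access_key_id, digest)
        if access_key_id or secret_access_key:
            return None  # incomplete static pair: leave it uncached
        return ("ambient-env",)

    @classmethod
    def flush(cls) -> None:
        with cls._lock:
            cls._entries.clear()

    def get_or_create(self, key: Optional[CacheKey], factory: Callable[[], Any]) -> Any:
        if key is None:
            return factory()  # uncached flow, e.g. a fresh STS call per request
        now = time.monotonic()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
        value = factory()  # built outside the lock; concurrent misses may both build
        with self._lock:
            self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

Because `_entries` lives on the class, two `SharedIAMCache()` instances share hits. Tests should therefore call `SharedIAMCache.flush()` between cases, mirroring the flush fixture the commit adds.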
yuneng-jiang	e84282b7b3	[Infra] Bump deps (#27157 ) * bump: version 0.4.70 → 0.4.71 * bump: version 0.1.39 → 0.1.40 * uv lock	2026-05-05 15:58:05 -07:00
yuneng-jiang	7abafe50fb	chore: update Next.js build artifacts (2026-05-05 22:45 UTC, node v20.20.2) (#27240 )	2026-05-05 22:51:21 +00:00
ryan-crabbe-berri	1b25f853ce	[Fix] Team UI: handle legacy dict shape for metadata.guardrails (#27224 ) * [Fix] Team UI: handle legacy dict shape for metadata.guardrails A team can have metadata.guardrails stored as {"modify_guardrails": bool} (the permission-flag shape introduced in PR #4810) rather than the expected string[]. The opt-out logic added in PR #25575 calls .filter() on this field, which throws TypeError on a dict and crashes the team detail page. Add a safeGuardrailsList helper that returns [] when the field is not an array, and route the three read sites through it. * [Fix] Team UI: inline Array.isArray guards for guardrails metadata Replace the safeGuardrailsList helper with inline Array.isArray checks at each call site, and apply the same guard to opted_out_global_guardrails for consistency. No known legacy dict rows for opted_out_global_guardrails, but the unguarded `|| []` pattern is the same shape risk. Six call sites now defended directly: three for metadata.guardrails and three for metadata.opted_out_global_guardrails.	2026-05-05 15:40:44 -07:00
Mateo Wang	7e13256fee	test: add 24hr Redis-backed VCR cache to additional test suites (#27159 ) * test: add 24hr Redis-backed VCR cache to additional test suites Extracts the existing llm_translation VCR plumbing into a reusable helper (tests/_vcr_conftest_common.py) and wires it into the conftest.py files of the test directories listed in LIT-2787: audio_tests, batches_tests, guardrails_tests, image_gen_tests, litellm_utils_tests, local_testing, logging_callback_tests, pass_through_unit_tests, router_unit_tests, unified_google_tests The same helper is also adopted by the pre-existing llm_translation and llm_responses_api_testing conftests to remove the copy-pasted VCR setup. Each consuming conftest: - registers the Redis persister via pytest_recording_configure - auto-marks collected tests with pytest.mark.vcr (skipping respx-using files where applicable, since respx and vcrpy both patch httpx) - gates cassette writes on test success via _vcr_outcome_gate The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still forces a bypass for ad-hoc local runs. Test directories that run LiteLLM proxy in Docker (build_and_test, proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests) are intentionally not included: VCR.py patches the in-process httpx transport and cannot intercept calls made from inside a Docker container. The installing_litellm_on_python* jobs make no LLM calls and don't benefit from caching. https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites * test(vcr): add safe-body matcher to handle JSONL and binary request bodies vcrpy's stock body matcher inspects Content-Type and unconditionally runs json.loads on application/json bodies. JSON Lines payloads (used by the Bedrock batch S3 PUT and other upload paths) crash that with json.JSONDecodeError: Extra data, before the matcher can return 'not a match'. 
This was the root cause of the batches_testing CI job failing on test_async_create_file once VCR auto-marking was applied to the batches_tests directory. Add a conservative byte-equality body matcher and use it in place of 'body' in the shared match_on tuple. The matcher is strictly more conservative than vcrpy's default — the only thing it gives up is 'different JSON key order is treated as the same body', which doesn't apply to deterministic litellm-built request payloads. It can never produce a false positive that the default would have rejected, so there is no cross-contamination risk. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): exclude tests that VCR replay actively breaks A few tests are incompatible with cassette replay and were failing on the latest CI run after VCR auto-marking was extended to local_testing and logging_callback_tests: - test_amazing_s3_logs.py (logging_callback_tests): the test asserts on a per-run response_id that should round-trip through a real S3 PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays stale keys, so the freshly-generated id is never found. - test_async_embedding_azure (logging_callback_tests) and test_amazing_sync_embedding (local_testing): the failure branches deliberately pass api_key='my-bad-key' to assert that the failure callback fires. We scrub auth headers from cassettes (so the bad-key request matches the prior good-key request), and vcrpy replays the recorded 200 — the failure callback never fires. - test_assistants.py (local_testing): the OpenAI Assistants polling APIs mint fresh thread/run IDs every recording session and then poll until status=='completed'. Replays of those polled GETs can never match a freshly-generated run id, so every CI run effectively re-records and the suite blows past the 15m no_output_timeout. Skip these from VCR auto-marking so they continue to hit live providers as they did before this change. 
The remaining tests in each directory still get cached. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): expand skip lists for second batch of incompatible tests Followup to the previous commit. After re-running CI on the rebuilt branch, three more tests surfaced as VCR-replay-incompatible: - litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key Calls GET /v1/models with api_key='123' to assert the result is empty. We scrub auth headers, so the bad-key request matches the prior good-key cassette and replays the recorded model list. - litellm_utils_testing :: test_litellm_overhead.py Measures litellm_overhead_time_ms as a percentage of total wall-clock time. With cached responses the upstream 'network' time collapses to microseconds, blowing past the 40%% threshold the test asserts on. Skip the whole file (every parametrization is at risk). - local_testing_part1 :: test_async_custom_handler_completion and test_async_custom_handler_embedding Same bad-key failure-callback pattern as the already-skipped test_amazing_sync_embedding. - litellm_router_testing :: test_router_caching.py Asserts on litellm's own router-level response cache by comparing response1.id to response2.id across repeat upstream calls (test bypasses litellm cache via ttl=0 and expects upstream to return a new id). With VCR replay both upstream calls return the same cassette body, so the ids are identical. Skip the whole file. - logging_callback_tests :: test_async_chat_azure (preemptive) Same shape as already-skipped test_async_embedding_azure; was masked by upstream OpenAI rate-limit failures on baseline. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): use item.path and tighten matcher docstring - Replace pytest's deprecated item.fspath with item.path in apply_vcr_auto_marker_to_items so we don't emit deprecation warnings under pytest 8. 
- Clarify _safe_body_matcher docstring to reflect actual behavior (direct == first, then UTF-8 bytes comparison, no repr fallback). Addresses Greptile review feedback on PR #27159. * test(vcr): swallow all RedisError on cassette save/load Cassette persistence is strictly best-effort: any Redis-side failure (connection blip, timeout, OutOfMemoryError when the maxmemory cap is hit, READONLY replicas, etc.) should degrade to 'test passed but cassette not cached' rather than fail the test on teardown. Previously the persister only caught ConnectionError and TimeoutError, so OutOfMemoryError — which Redis Cloud raises when the cassette cache hits its memory cap and there are no evictable keys — propagated out of vcrpy's autouse fixture and ERRORed otherwise-passing tests on teardown. This caused the litellm_utils_testing CircleCI job to fail on the latest commit's run, even though the underlying test was a unit test that used mock_response and produced no real upstream traffic (the cassette was dirtied by a background langfuse callback). The rerun only succeeded because Redis evictions happened to free enough room before the SET — i.e. it was timing-dependent flakiness. Catch redis.exceptions.RedisError (the common base of all server- and client-side Redis exceptions) on both save and load, and parametrize the regression tests across ConnectionError, TimeoutError, and OutOfMemoryError to pin the new behavior. * test(vcr): surface cassette-cache failures with warnings + session banner When the persister silently swallows a Redis OOM (or any RedisError) on save/load there is otherwise no visible signal that the cache is degraded — tests pass, the cassette just isn't persisted, and the next session still hits the same Redis at the same near-cap memory. Add three layers of observability so that failure mode is loud: 1. 
Per-process health counters ("save_failures", "load_failures", and the last error string for each), exposed via cassette_cache_health() and reset via reset_cassette_cache_health(). The persister increments these in addition to logging. 2. VCRCassetteCacheWarning (UserWarning subclass) emitted via warnings.warn() inside the persister's except block. Pytest's built-in warnings summary at session end automatically lists every such warning, so the failure is visible in CI logs without any conftest-level wiring. 3. Session-end banner via emit_cassette_cache_session_banner() and a stderr-fallback atexit handler registered from register_persister_if_enabled(). Two states: - red "VCR CASSETTE CACHE DEGRADED" when save_failures or load_failures > 0 - yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but used_memory >= 85% of maxmemory) so the next session knows the Redis is approaching OOM before any SET actually fails Capacity comes from a best-effort INFO memory probe (cassette_cache_capacity_snapshot) that returns None on any failure or when maxmemory is uncapped. The atexit handler skips xdist workers so only the controller emits. Tests: parametrize the existing save/load swallow-error tests across ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for the health counters and warning emission, and a new test_vcr_conftest_common_banner.py covering banner output for every state (silent/red/yellow/disabled/xdist-worker). * test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips Tests that deliberately call an LLM API with a bad key (e.g. to assert that the failure callback fires, or that check_valid_key returns False) were being silently served the prior good-key cassette: we scrub the real Authorization / x-api-key header from the cassette before storing it, so a follow-up bad-key call is byte-identical to the good-key call under the existing match_on tuple. 
Add a 'key_fingerprint' custom matcher that distinguishes requests by the SHA-256 of their API-key headers. The fingerprint is stamped into a synthetic 'x-litellm-key-fp' header by a new before_record_request hook, which then strips the real auth headers (we have to do the scrubbing here instead of via vcrpy's filter_headers knob, because filter_headers runs first and would erase the value we want to hash). Bad-key requests now get a different cassette bucket than good-key requests, so vcrpy will not replay a recorded 200 in place of the expected 401. The fingerprint is a one-way hash of the secret, so cassettes never contain the key. This permanently removes the 'bad-key' category of skips: - tests/local_testing: dropped ::test_amazing_sync_embedding, ::test_async_custom_handler_completion, ::test_async_custom_handler_embedding - tests/logging_callback_tests: dropped ::test_async_chat_azure, ::test_async_embedding_azure - tests/litellm_utils_tests: dropped ::test_get_valid_models_from_dynamic_api_key Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py covering header stripping, fingerprint determinism, no-auth bucketing, good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination, and idempotence under replay. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): drop redundant comments and docstrings Trim narration of code that is already self-evident from function and variable names. Keep the two genuinely non-obvious bits: - ordering constraint between filter_headers and before_record_request, which would invite a maintainer to re-introduce the bug if removed - the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why exactly is this skipped' is not knowable from the test name alone Also drop the 40-line commented-out drop-in conftest snippet at the bottom of _vcr_conftest_common.py — the consuming conftests are the canonical reference. 
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): make _before_record_request idempotent vcrpy invokes before_record_request more than once per request: can_play_response_for calls it, then __contains__ / _responses (reached via play_response) call it again on the result. The second invocation sees a request whose auth headers we already stripped, so a naive recompute yields "no-key" and overwrites the real fingerprint stored in the header. This makes can_play_response_for and play_response disagree on matchability — the former says "yes, we have a stored response for this" (matching no-key to no-key) and the latter throws UnhandledHTTPRequestError because it computes a fresh real fingerprint that doesn't match the stored no-key. In CI this manifested as ~30 failing tests across guardrails_testing, audio_testing, batches_testing, image_gen_testing, llm_responses_api, litellm_router_unit_testing, etc. Skip the recompute when the header is already set, so re-applying the hook is a no-op. Adds a regression test that fires the hook twice on the same dict and asserts the fingerprint stays put. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): drop more redundant docstrings and headers * test(vcr): enable 24hr cache for ocr_tests and search_tests These two directories were the only non-dockerized test suites in the build_and_test workflow that make live LLM/provider API calls but were not VCR-enabled by this PR. Together they account for 96 tests: - tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document Intelligence, Vertex AI OCR. Pure-unit tests inside the same files (e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls and become benign VCR NOOPs. - tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa, Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI, Searxng, Serper, Tavily. 
Both directories use the canonical minimal conftest pattern from tests/audio_tests/conftest.py with no skip lists. None of the test files use respx, none assert on per-call upstream non-determinism (no response1.id != response2.id, no overhead-as-fraction-of-total, no live polling), so the default match_on tuple should cache cleanly. If a flake surfaces during the first cassette-recording CI run, we can add a targeted skip the same way we did for the other dirs. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-05 15:13:31 -07:00
yuneng-jiang	fa6a82ab0f	[Fix] UI: Clear Admin Session Cookies Before Establishing Invited User's Session (#27227 ) The invite-signup form was writing the new user's token via raw `document.cookie` at `path=/`, while the rest of the auth surface uses `storeLoginToken` (which writes at `path=/ui` and mirrors to sessionStorage). After signup the inviter's `path=/ui` cookie kept winning path-specificity matching, and sessionStorage still held the inviter's token, so the dashboard rendered as the inviter rather than the newly created user. Treat invite signup as a principal-change boundary — clear prior session cookies first, then store the new token via the canonical helper.	2026-05-05 13:57:04 -07:00
yuneng-jiang	f318ef03bd	Merge pull request #27170 from BerriAI/litellm_/unruffled-mcclintock-62a296 [Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds	2026-05-04 21:43:14 -07:00
shin-berri	43c78057d4	Merge pull request #27169 from BerriAI/litellm_/vigorous-albattani-2b7480 [Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job	2026-05-04 21:39:43 -07:00
Yuneng Jiang	4ee586a321	[Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this forces prisma to download the amd64 schema-engine into an arm64 image, so 'prisma migrate deploy' fails at startup with 'Could not find schema-engine binary'. Removing the env lets prisma auto-detect per build platform: amd64 builds still resolve to debian-openssl-3.0.x (Wolfi falls back to debian, same binary as before), and arm64 builds now correctly fetch linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is preserved — only which binaries fill the cache changes. Fixes #19458	2026-05-04 21:37:16 -07:00
Yuneng Jiang	19ad964c4a	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/vigorous-albattani-2b7480	2026-05-04 21:19:34 -07:00
Yuneng Jiang	c1c0506d2c	[Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job The cimg/python:3.12-browsers base image already ships every Chromium system dependency Playwright needs (libnss3, libatk-bridge2.0-0, libcups2, etc.; the install log shows them all as "already the newest version"). Passing --with-deps to `npx playwright install` therefore runs an apt-get update + install for nothing, but still pays the full cost of hitting Ubuntu mirrors. On a recent run those mirrors stalled hard: apt-get update alone took 6m53s at 81.5 kB/s, with several archives returning connection refused. Drop --with-deps and persist ~/.cache/ms-playwright alongside node_modules so the Chromium binary is also reused across runs. Bump the cache key to v2 so the existing v1 entry (which contains only node_modules) is not restored and reused, which would leave the new browser path out of the cache.	2026-05-04 21:19:31 -07:00
yuneng-jiang	cd38ecd532	Merge pull request #27156 from BerriAI/yj_build_may4 [Infra] Build UI	2026-05-04 21:12:20 -07:00
shin-berri	ff3d089ab8	Merge pull request #27160 from BerriAI/litellm_/peaceful-gates-6e46e7 [Fix] Proxy: Break managed-resources import cycle on Python 3.13	2026-05-04 21:11:53 -07:00
yuneng-jiang	f2969ca78a	Merge pull request #27165 from BerriAI/litellm_/friendly-lichterman-35cf02 [Fix] CI: Enable VCR replay for test_azure_o_series	2026-05-04 20:59:46 -07:00
Yuneng Jiang	0976fbc6c4	[Fix] Tests: Restore /metrics access for prometheus test suite /metrics now requires auth by default; tests/otel_tests/test_prometheus.py makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so every prometheus test in CI now fails the metric assertion. Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml to opt out for this test job, which scrapes /metrics directly. Verified locally: 8/8 prometheus tests green (one flaky retry on test_proxy_success_metrics that pre-dates this PR). Also drop the -x stop-on-first-failure flag from the otel test command so all failures in the job surface in a single CI run rather than hiding behind whichever one trips first.	2026-05-04 20:54:54 -07:00
Yuneng Jiang	6a6c79d992	[Fix] CI: Enable VCR replay for test_azure_o_series The Azure o-series tests were excluded from the conftest's VCR auto-marker because of a respx/vcrpy transport-patching conflict, but the only respx reference in the file was an unused `MockRouter` import. Drop the dead import and remove the file from the conflict set so cassettes record on first run and replay thereafter, eliminating the 60-95s live Azure latency that was crashing xdist workers under --timeout=120 thread-mode timeouts.	2026-05-04 20:48:26 -07:00
Sameer Kankute	b0edffb883	Merge pull request #27103 from BerriAI/litellm_azure-deployment-image-body fix(azure): omit model from deployment image gen and image edit bodies	2026-05-05 09:09:45 +05:30
Yuneng Jiang	e6f524f951	[Fix] Tests: Pick chat-completion OTEL trace by content, not recency The /otel-spans endpoint returns process-wide spans and tags most_recent_parent by max start_time. After tightening that route to proxy_admin (sk-1234), the GET /otel-spans request itself emits auth spans that beat the chat-completion spans on start_time, so most_recent_parent now points at the request's own auth trace (['postgres', 'postgres']) and the >=5-span assertion fails. Pick the chat-completion trace by content: it is the only trace whose span list is a superset of {postgres, redis, raw_gen_ai_request, batch_write_to_db}. Verified locally end-to-end against otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.	2026-05-04 20:35:09 -07:00
Sameer Kankute	4487d8352f	Merge pull request #27115 from Sameerlite/litellm_health_check_reasoning_effort feat(proxy): add health_check_reasoning_effort for model health checks	2026-05-05 09:00:09 +05:30
Yuneng Jiang	8a1b6635fa	[Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans /otel-spans now requires proxy admin (returns 401 'Only proxy admin can be used to generate, delete, update info for new keys/users/teams. Route=/otel-spans' for non-admin callers). Switch the GET call to use the master key sk-1234 while keeping the generated key for the chat-completion request that produces the spans.	2026-05-04 20:23:11 -07:00
Sameer Kankute	b4ee6a2355	test(proxy): cover health_check_reasoning_effort for completion mode Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-05 08:52:57 +05:30
Sameer Kankute	bb0e4168ad	refactor(azure): move image gen JSON helper; rename image edit finalize hook - Add image_generation/http_utils.azure_deployment_image_generation_json_body; call from azure.py (keeps AzureChatCompletion focused on chat). - Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with docstring covering multipart and JSON POST payloads (review feedback). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-05 08:49:46 +05:30
Yuneng Jiang	193907a4a3	[Fix] Lint: Mark _user_has_admin_view re-export in common_utils Ruff F401 flagged the aliased import as unused within common_utils.py because the name is consumed only by external modules (~15 callers across guardrails, spend tracking, MCP, agents, management endpoints). Add `# noqa: F401 re-exported` so the alias survives lint while keeping a single source of truth in litellm.proxy._types.	2026-05-04 20:16:59 -07:00
Yuneng Jiang	8cac6c5bff	[Fix] Proxy: Address Greptile feedback on hook-cycle PR - Move _user_has_admin_view to litellm.proxy._types as user_api_key_has_admin_view (single source of truth). common_utils.py and isolation.py both import from there now, removing the duplicated role-check that could silently diverge if new admin roles are added. - Add pytest.importorskip("litellm_enterprise") to the two regression tests that assert managed_files / managed_vector_stores are registered; those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail unconditionally in a checkout without the enterprise extra installed.	2026-05-04 20:13:31 -07:00
Yuneng Jiang	727ab8dcc4	[Fix] Proxy: Break managed-resources import cycle on Python 3.13 The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module ImportError when loading the managed files hook chain: litellm.proxy.hooks/__init__ (mid-import) -> enterprise.enterprise_hooks -> litellm_enterprise.proxy.hooks.managed_files -> litellm.llms.base_llm.managed_resources.isolation -> litellm.proxy.management_endpoints.common_utils -> litellm.proxy.utils (re-enters litellm.proxy.hooks) The except ImportError block in hooks/__init__.py silently swallowed the failure, leaving managed_files unregistered and POST /files returning 500 "Managed files hook not found". Two-layer fix: - Inline the 3-line _user_has_admin_view check in isolation.py instead of importing it from litellm.proxy.management_endpoints.common_utils. litellm.llms.* should not depend on litellm.proxy.* — removing this layering violation breaks the cycle at its root. - Define PROXY_HOOKS and get_proxy_hook before the conditional enterprise import in litellm/proxy/hooks/__init__.py, so any future re-entry resolves the public names instead of hitting an ImportError on a partially-initialized module. Also fold in two unrelated CCI repairs surfaced in the same staging run: - tests/otel_tests/test_key_logging_callbacks.py: per-key gcs_bucket_name / gcs_path_service_account are now stripped by initialize_dynamic_callback_params, so the GCS client falls through to the env-only branch. Update the assertion to match the new "GCS_BUCKET_NAME is not set" message. - .circleci/config.yml: tests/pass_through_tests now resolves google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump, which uses dynamic ESM imports Jest 29 cannot load without --experimental-vm-modules. Pass that flag in the Vertex JS test step. 
Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a regression guard: managed_files / managed_vector_stores must register, and isolation.py must not transitively import litellm.proxy.utils.	2026-05-04 20:05:24 -07:00
Yuneng Jiang	7c8409d013	chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2)	2026-05-04 19:13:25 -07:00
yuneng-jiang	9ea824d5bf	Merge pull request #27143 from BerriAI/cursor/fix-secret-fields-in-spend-logs-a532 fix(security): prevent secret_fields from leaking into spend logs	2026-05-04 19:07:54 -07:00
yuneng-jiang	be5f217aaf	Merge pull request #26861 from BerriAI/litellm_fix_scim_virtual_key_deactivation fix(scim): revoke virtual keys when SCIM deprovisions a user	2026-05-04 19:03:55 -07:00
Cursor Agent	5923c3209b	fix(security): prevent secret_fields from leaking into spend logs secret_fields (containing raw HTTP headers including Authorization Bearer tokens) was being included in proxy_server_request['body'] because the body snapshot was a copy.copy(data) of the full request dict. This body gets serialized and persisted in the LiteLLM_SpendLogs table, exposing user credentials in the database. Root cause: data['secret_fields'] was set before the body snapshot at data['proxy_server_request']['body'] = copy.copy(data), so the full raw headers (including auth tokens) ended up in the snapshot. Fix (defense in depth): 1. Exclude 'secret_fields' when creating the body snapshot in litellm_pre_call_utils.py (primary fix) 2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload as a secondary safeguard secret_fields remains available on the live data dict for legitimate downstream consumers (MCP, Responses API). Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>	2026-05-05 02:01:41 +00:00
yuneng-jiang	555a8131fe	Merge pull request #26951 from stuxf/codex/skills-containers-tenant-guard chore(proxy): tighten resource ownership checks	2026-05-04 18:47:17 -07:00
yuneng-jiang	2f305050ce	Merge pull request #27004 from stuxf/fix/managed-resource-service-account-isolation fix(proxy): isolate managed resources for service-account API keys	2026-05-04 18:45:55 -07:00
user	3dcb6bd3f9	Merge remote-tracking branch 'upstream/litellm_internal_staging' into codex/skills-containers-tenant-guard # Conflicts: # litellm/proxy/auth/auth_utils.py	2026-05-05 01:41:25 +00:00
user	7faba9656f	Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation	2026-05-05 01:38:11 +00:00
yuneng-jiang	281296f9cf	Merge pull request #27151 from BerriAI/litellm_yj_may4 [Infra] Merge dev branch	2026-05-04 18:29:52 -07:00
user	aee064ad37	Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation	2026-05-05 01:29:05 +00:00
yuneng-jiang	dcb357ee2d	Merge pull request #27149 from BerriAI/litellm_/peaceful-bell-ba8ca5 [Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5	2026-05-04 18:27:45 -07:00
yuneng-jiang	efca16ccfa	Merge pull request #27043 from stuxf/fix/ssti-prompt-managers fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers	2026-05-04 18:23:41 -07:00
Yuneng Jiang	e35cd5af76	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_may4	2026-05-04 18:22:47 -07:00
Yuneng Jiang	7f550a5d67	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/peaceful-bell-ba8ca5	2026-05-04 18:21:33 -07:00
Yassin Kortam	db2a3cafb6	Merge pull request #27131 from BerriAI/litellm_fix/routing-groups-ui feat: routing groups ui	2026-05-04 18:16:49 -07:00
mateo-berri	4179159f0f	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_azure-deployment-image-body	2026-05-04 18:16:46 -07:00
Yassin Kortam	a56256e5ee	feat: routing groups ui	2026-05-04 18:09:14 -07:00
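As a rough illustration of the key-fingerprint bucketing and the idempotent `before_record_request` fix described in the `test(vcr)` commit messages above, here is a minimal sketch. It works on plain header dicts rather than vcrpy `Request` objects, and the function names `stamp_key_fingerprint` and `key_fingerprint_matches`, the exact set of auth headers, and the way header values are combined before hashing are all hypothetical; only the `x-litellm-key-fp` header name, the SHA-256 hash, and the `no-key` sentinel come from the commit text.

```python
import hashlib
from typing import Dict, MutableMapping

# Header name and "no-key" sentinel are taken from the commit messages;
# everything else in this sketch is an illustrative assumption.
FP_HEADER = "x-litellm-key-fp"
AUTH_HEADERS = {"authorization", "x-api-key"}  # assumed set of API-key headers


def stamp_key_fingerprint(headers: MutableMapping[str, str]) -> MutableMapping[str, str]:
    """Hypothetical before_record_request-style hook operating on a plain header dict."""
    # Idempotence guard: a second invocation sees the auth headers already
    # stripped, so recomputing would overwrite the real fingerprint with "no-key".
    if FP_HEADER in headers:
        return headers

    auth_keys = [k for k in headers if k.lower() in AUTH_HEADERS]
    # Sort so the digest does not depend on header insertion order or casing.
    material = sorted(f"{k.lower()}:{headers[k]}" for k in auth_keys)
    if material:
        fingerprint = hashlib.sha256("\n".join(material).encode("utf-8")).hexdigest()
    else:
        fingerprint = "no-key"

    # Scrub the real secrets only after hashing them; a header filter that ran
    # earlier would have erased the value we need to hash.
    for k in auth_keys:
        del headers[k]
    headers[FP_HEADER] = fingerprint
    return headers


def key_fingerprint_matches(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Hypothetical matcher: two requests match only if their key buckets agree."""
    return a.get(FP_HEADER) == b.get(FP_HEADER)


if __name__ == "__main__":
    good = stamp_key_fingerprint({"Authorization": "Bearer sk-good"})
    bad = stamp_key_fingerprint({"Authorization": "Bearer sk-bad"})
    print(key_fingerprint_matches(good, bad))  # False: different cassette buckets
    replay = stamp_key_fingerprint(dict(good))
    print(replay[FP_HEADER] == good[FP_HEADER])  # True: re-running the hook is a no-op
```

Because only a one-way digest is stored, a bad-key request lands in a different bucket from a good-key one without the cassette ever holding the secret, and the early return keeps repeated hook invocations from clobbering the stored fingerprint.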

38958 Commits