* fix(hosted_vllm): normalize custom tools for chat completions
Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.
Made-with: Cursor
* Fix black issue
* Fix hosted vllm custom tool schema fallback
* fix black
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.
- audio_tests/test_audio_speech.py: split env-var keys into separate
azure/openai test functions sharing a helper; sync_mode parametrize
preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
azure_whisper functions sharing a helper; response_format parametrize
preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
into 5 named tests.
Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
* feat(audio_transcription): add NVIDIA Riva STT provider
Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.
- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
soundfile, audioread, numpy).
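
The parameter mapping above can be sketched roughly as follows. This is an illustration under assumptions, not the provider's code: the function names, the `language_code` option key, and the `start_ms` / `end_ms` input keys are hypothetical, and the language table only contains the `en -> en-US` case named in this commit.

```python
from typing import Any, Dict, List

# Illustrative subset; only en -> en-US is stated in the commit message.
_BARE_LANGUAGE_DEFAULTS = {"en": "en-US"}


def map_language(language: str) -> str:
    """Expand a bare language code to a region-qualified tag when known."""
    return _BARE_LANGUAGE_DEFAULTS.get(language, language)


def map_openai_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a few OpenAI transcription params to Riva-style options."""
    riva: Dict[str, Any] = {}
    if "language" in params:
        riva["language_code"] = map_language(params["language"])
    if "word" in (params.get("timestamp_granularities") or []):
        riva["enable_word_time_offsets"] = True
    return riva


def words_to_verbose_json(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert word offsets from milliseconds to seconds for verbose_json."""
    return [
        {
            "word": w["word"],
            "start": w["start_ms"] / 1000.0,
            "end": w["end_ms"] / 1000.0,
        }
        for w in words
    ]
```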
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(nvidia_riva): address PR review feedback
- handler: forward call-level `timeout` to streaming_response_generator
(kwarg-detected via inspect for older riva-client compat) so a stalled
Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
most audioread backends (FFmpeg, GStreamer) require a real filesystem
path and previously raised TypeError on BytesIO, breaking the mp3/m4a
fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
(anti-aliased polyphase) when installed, falling back to linear only
as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
es-US, matching BCP-47 conventions.
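
One way to do the kwarg detection mentioned in the first bullet is with `inspect.signature`. The sketch below is an assumption about the approach, not the handler's code; `build_call_kwargs` and the two stand-in generator functions are hypothetical names.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def build_call_kwargs(fn: Callable[..., Any], timeout: Optional[float]) -> Dict[str, Any]:
    """Return {'timeout': ...} only if `fn` can accept a `timeout` keyword."""
    if timeout is None:
        return {}
    params = inspect.signature(fn).parameters
    accepts_timeout = "timeout" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return {"timeout": timeout} if accepts_timeout else {}


# Stand-ins for a newer and an older client method signature (hypothetical).
def new_client_generator(audio_chunks, config, timeout=None):
    return ("new", timeout)


def old_client_generator(audio_chunks, config):
    return ("old", None)
```

Calling `new_client_generator(chunks, cfg, **build_call_kwargs(new_client_generator, 30.0))` forwards the timeout, while the same call against `old_client_generator` passes no extra keyword and so does not raise `TypeError`.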
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore: trigger CI re-run [stabilize loop 1/3]
* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* chore: trigger CI re-run [stabilize loop 1/3]
* fix code qa
* fix lint
* fix mypy
* fix mypy
* Fix NVIDIA Riva ASR service lookup
* Fix NVIDIA Riva transcription payload logging
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$`
Tool names with characters like `/` or `.` (commonly produced by the
OpenAPI -> MCP generator from `operationId`s such as
`actions/download-job-logs-for-workflow-run`) caused Anthropic to reject
requests with `tools.N.custom.name: String should match pattern
'^[a-zA-Z0-9_-]{1,128}$'`.
Two layers of fix:
1. Anthropic transformation: build a per-request forward map (original ->
sanitized, disambiguated by suffix on collisions) and a reverse map
(only for names actually rewritten). Forward map is applied to tool
defs, `tool_choice`, and historical assistant tool_calls in messages.
Reverse map is threaded through both the non-streaming and streaming
response paths so callers continue to see their original tool names
in `tool_use` blocks.
2. OpenAPI -> MCP generator: sanitize `operationId` (and the
method+path fallback) at registration time so generated MCP tools are
valid for any strict-name provider, not just Anthropic. The dashboard
preview endpoint applies the same sanitization for parity.
Includes unit tests covering: collision disambiguation between
`foo_bar` and `foo/bar` in the same request, reverse-map only firing
for actually-rewritten names, message rewrite for historical tool_calls,
streaming chunk_parser reverse-mapping, and sanitization of OpenAPI
operationIds plus the preview endpoint output.
Made-with: Cursor
* fix(anthropic): build tool-name maps in transform_request, not optional_params
The previous patch stashed the per-request forward and reverse tool-name
maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and
``optional_params["_anthropic_tool_name_map"]``. ``optional_params`` is
the dict that becomes the JSON body via ``data = {**optional_params}``,
so those internal keys leaked over the wire and Anthropic 400'd with:
_anthropic_tool_name_forward_map: Extra inputs are not permitted
Worse, this meant *every* request whose tool list contained any name with
an invalid character (the exact case the patch was meant to fix) regressed
into a confusing meta-error pointing at LiteLLM's internal map instead of
the offending tool.
Fix: move all tool-name sanitization into ``transform_request``, which is
the single chokepoint already shared by ``AnthropicConfig``,
``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``,
and ``AzureAnthropicConfig`` (all call ``super().transform_request`` /
``AnthropicConfig.transform_request(self, ...)``). New static helper
``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped
``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted
tool names are reserved by Anthropic and must not be touched), builds
the per-request forward/reverse maps, and applies the forward map in
place to ``tools[*].name`` and ``tool_choice.name``. The reverse map is
stashed exclusively on ``litellm_params`` (which is never serialized to
a provider) under ``_anthropic_tool_name_map`` for the response paths
to consume.
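
The leak and the fix can be illustrated with plain dicts. This is a simplified sketch: `build_body` is a hypothetical stand-in for the `data = {**optional_params}` step, and the tool entries are placeholders.

```python
from typing import Any, Dict

REVERSE_MAP_KEY = "_anthropic_tool_name_map"  # key name taken from the commit message


def build_body(optional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Mimics `data = {**optional_params}`: every key ends up on the wire."""
    return {**optional_params}


# Before: internal state stored on optional_params leaks into the request body.
leaky_params = {
    "tools": [{"name": "foo_bar"}],
    REVERSE_MAP_KEY: {"foo_bar": "foo/bar"},
}
leaky_body = build_body(leaky_params)

# After: the reverse map lives on a separate dict that is never serialized.
optional_params = {"tools": [{"name": "foo_bar"}]}
litellm_params = {REVERSE_MAP_KEY: {"foo_bar": "foo/bar"}}
clean_body = build_body(optional_params)
```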
Side effect of this restructure: ``map_openai_params`` is now a pure
OpenAI->Anthropic param translator with no side-channel state, which
matches its contract everywhere else in the codebase.
Tests: replaced the now-incorrect "stashes maps in optional_params"
tests with regressions that assert no underscore-prefixed keys appear
in either ``optional_params`` after ``map_openai_params`` or in the
final ``transform_request`` body. Added end-to-end coverage for:
sanitization in ``transform_request``, ``tool_choice`` rewriting,
historical ``tool_calls`` rewriting in messages, and hosted-tool
passthrough.
Made-with: Cursor
* fix(anthropic): always sanitize empty text content blocks
Anthropic 400s on `{"role": "user", "content": ""}` with:
"messages: text content blocks must be non-empty"
LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text
to a placeholder, but it was gated behind `litellm.modify_params=True`.
With that flag off (default), empty content from upstream agent
frameworks (e.g. pydantic-ai) flowed straight through and tripped the
Anthropic validator.
Fix:
- Always run `_sanitize_empty_text_content` at the top of
`anthropic_messages_pt`, independent of `modify_params`. There is no
way to "pass through" an empty text block, so this is non-optional.
The richer tool-call sanitizations (Cases A/B/D, which actually
mutate conversation structure) remain gated on `modify_params`.
- Extend `_sanitize_empty_text_content` to also handle list-of-blocks
content (`[{"type": "text", "text": ""}]`), not just string content.
Adds 3 regression tests covering string content, list-of-blocks
content, and the no-op case (non-empty messages with modify_params off).
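
A minimal sketch of the two content shapes being handled, assuming a hypothetical placeholder string (the real placeholder text is not given here) and a standalone function rather than the actual `_sanitize_empty_text_content` implementation.

```python
import copy
from typing import Any, Dict, List

PLACEHOLDER = "[empty]"  # hypothetical; the actual placeholder is not stated


def sanitize_empty_text_content(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of `messages` with empty text replaced by a placeholder."""
    sanitized = copy.deepcopy(messages)
    for message in sanitized:
        content = message.get("content")
        if content == "":
            # String content: {"role": "user", "content": ""}
            message["content"] = PLACEHOLDER
        elif isinstance(content, list):
            # List-of-blocks content: [{"type": "text", "text": ""}]
            for block in content:
                if (
                    isinstance(block, dict)
                    and block.get("type") == "text"
                    and block.get("text") == ""
                ):
                    block["text"] = PLACEHOLDER
    return sanitized
```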
Made-with: Cursor
* fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation
- remove unused `name_forward_map` param from `_map_tool_choice`,
`_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward`
helper. Production sanitization runs in `_sanitize_tool_names_in_request`
at `transform_request`; these params were never threaded through.
- handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of
the hardcoded `"_anthropic_tool_name_map"` string.
- fix mypy `"object" has no attribute "__iter__"` in
`_rewrite_tool_names_in_messages` by guarding `tool_calls` with
`isinstance(..., list)`.
- `_sanitize_tool_names_in_request`: build a new tools list with copy-on-
change entries (and copy `tool_choice` on rewrite) so a caller reusing
the same tool list/dicts across requests doesn't see its inputs
permanently rewritten.
- doc-comment `_build_request_tool_name_maps` clarifying it operates on
OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs
on Anthropic-format tools post-`_map_tools`).
- tests: drop 3 tests pinning the now-removed param paths; add coverage
for tool_calls + None function_call rewrite and caller-dict immutability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(mcp): inherit stored credentials in test/tools/list for edit flow
When editing an existing MCP server, the Tool Configuration preview
calls POST /mcp-rest/test/tools/list with server_id but no credentials
(management API redacts them). The endpoint now calls
_inherit_credentials_from_existing_server() so stored bearer tokens
and OAuth2 M2M credentials are loaded from global_mcp_server_manager
automatically — tools load without re-entering credentials.
New servers (no server_id) and requests with explicit credentials are
unaffected (function is a no-op in both cases).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(mcp): show all tools in edit panel, not just allowed tools
Edit flow was passing externalTools (from GET /tools/list, filtered by
allowed_tools) to MCPToolConfiguration, disabling the internal hook.
Remove the external props so the internal hook fires via
POST /test/tools/list, which returns all tools unfiltered. Combined
with the credential inheritance fix, tools load automatically without
re-entering credentials and all tools are visible for re-configuration.
existingAllowedTools still pre-checks previously allowed tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix order-dependent collision in _build_anthropic_tool_name_maps
Use a two-pass approach: first pre-register all already-valid tool names
in the 'used' set, then sanitize/disambiguate names that need rewriting.
This ensures valid names always have priority regardless of input order,
preventing duplicate tool names on the wire when e.g. 'foo/bar' appears
before 'foo_bar' in the tool list.
Add regression test for the reversed ordering case.
* Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes
sanitize_openapi_tool_name replaces all invalid chars with '_', but when
two operationIds differ only by sanitized characters (e.g. 'foo/list' and
'foo.list' both become 'foo_list'), the second registration silently
overwrites the first in the tool registry.
Add collision disambiguation in register_tools_from_openapi that appends
_2, _3, ... suffixes when a sanitized name is already taken, mirroring
the existing logic in _build_anthropic_tool_name_maps.
* Fix preview endpoint missing collision disambiguation for tool names
Add used_names tracking and _2/_3 suffix disambiguation to
_preview_openapi_tools, matching the logic in register_tools_from_openapi.
Without this, two operationIds that sanitize to the same string (e.g.
'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate
names in the preview while registration would disambiguate them.
* Align preview HTTP method order with register_tools_from_openapi
The preview endpoint and register_tools_from_openapi both use
order-dependent collision disambiguation (_2, _3 suffixes). When the
iteration order differs, two operations on the same path with sanitized
names that collide get different suffixes in preview vs registration,
so the dashboard shows names that don't match what actually got
registered.
Also adds a regression test that fails on the swapped order.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Skip duplicate originals in _build_anthropic_tool_name_maps
If the same invalid tool name appeared twice in original_names (e.g.
['foo/bar', 'foo/bar']), the second occurrence overwrote the forward
map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar
orphaned in 'used' with no reverse mapping. _sanitize_tool_names_in_request
then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on
duplicate tool names.
Skip the rewrite if forward already has the original mapped.
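
The two-pass ordering and the duplicate skip described for `_build_anthropic_tool_name_maps` can be sketched as below. `build_tool_name_maps` is a hypothetical stand-in, and truncating to 128 characters before adding a suffix is an assumption of this sketch.

```python
import re
from typing import Dict, List, Tuple

_VALID_NAME = re.compile(r"[a-zA-Z0-9_-]{1,128}")
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def build_tool_name_maps(original_names: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (forward, reverse) maps containing only rewritten names."""
    # Pass 1: already-valid names claim their slot first, whatever the order.
    used = {name for name in original_names if _VALID_NAME.fullmatch(name)}

    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    # Pass 2: sanitize the rest and disambiguate against everything in `used`.
    for name in original_names:
        if _VALID_NAME.fullmatch(name) or name in forward:
            continue  # valid as-is, or a duplicate original that is already mapped
        base = _INVALID_CHARS.sub("_", name)[:128] or "_"
        candidate, n = base, 2
        while candidate in used:
            suffix = f"_{n}"
            candidate = base[: 128 - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        forward[name] = candidate
        reverse[candidate] = name
    return forward, reverse
```

With this shape, `["foo/bar", "foo_bar"]` and `["foo_bar", "foo/bar"]` both map `foo/bar` to `foo_bar_2`, and a repeated `foo/bar` keeps its first mapping.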
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(realtime): OpenAI Realtime GA support and beta compatibility
- Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta
event translation when client sends OpenAI-Beta: realtime=v1
- Default upstream WebSocket without OpenAI-Beta; forward header when client opts in
- Extend OpenAI realtime types for GA event names and conversation item shapes
- Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events
- Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground
- Add unit tests for remap, translation, and beta header helper
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix results
* fix greptile
* Fix mypy issues
* Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES
These frozensets were defined as class-level constants in realtime_streaming.py
but never referenced anywhere in the codebase. Removing dead code.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(realtime): use GA-shaped session.update in guardrail injections
The guardrail VAD injection code sent a beta-style session.update with a
flat turn_detection field:
{"session": {"turn_detection": {"create_response": false}}}
When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta
header forwarded), it requires the nested GA shape:
{"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}}
The _remap_beta_session_to_ga helper was only applied to
client-originated session.update messages in client_ack_messages.
Internally generated session.update messages (sent via _send_to_backend)
bypassed the remap in two paths:
- _handle_raw_backend_message (raw/no provider_config path, line 518)
- backend_to_client_send_messages provider_config path (line 481)
As a result, GA upstreams ignored or rejected them, breaking audio
transcription guardrails for all non-beta clients.
Fix: add _make_disable_auto_response_message() helper that always emits
the correct GA-shaped session.update, and replace both injection sites
with it.
Update existing tests to assert the GA nested shape instead of the old
flat beta shape, and add a new unit test for the helper itself.
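
A sketch of what a helper like `_make_disable_auto_response_message()` could look like, based on the GA payload quoted above. Returning a JSON string (rather than a dict) and the public function name are assumptions of this sketch.

```python
import json
from typing import Any, Dict


def make_disable_auto_response_message() -> str:
    """Build a GA-shaped session.update that turns off automatic responses."""
    message: Dict[str, Any] = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            # GA nests turn_detection under audio.input; the beta shape
            # placed it directly under "session".
            "audio": {"input": {"turn_detection": {"create_response": False}}},
        },
    }
    return json.dumps(message)
```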
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* Log realtime session type
* Fix beta realtime session payloads
* Fix realtime audio format remapping edge case
* Fix Azure realtime beta session shape
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* refactor(BaseAWSLLM): implement shared IAM cache and static credential caching
- Introduced a process-wide shared IAM cache to optimize credential management across instances.
- Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached.
- Updated the get_credentials method to utilize the new caching mechanism for static credential flows.
- Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage.
* refactor(BaseAWSLLM): enhance IAM credential caching and update related tests
- Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials.
- Renamed the caching method for clarity and updated comments to reflect the new caching behavior.
- Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries.
- Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions.
* refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests
- Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types.
- Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request.
- Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios.
* Code Readability improvement for aws auth path
* refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests
- Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials.
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached.
- Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests.
* refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption
- Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions.
- Enhanced unit tests to verify that the caching mechanism correctly distinguishes between already running roles and new role assumptions, ensuring that cached environment credentials are not reused incorrectly.
- Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.
* [Fix] Team UI: handle legacy dict shape for metadata.guardrails
A team can have metadata.guardrails stored as {"modify_guardrails": bool}
(the permission-flag shape introduced in PR #4810) rather than the
expected string[]. The opt-out logic added in PR #25575 calls .filter()
on this field, which throws TypeError on a dict and crashes the team
detail page.
Add a safeGuardrailsList helper that returns [] when the field is not
an array, and route the three read sites through it.
* [Fix] Team UI: inline Array.isArray guards for guardrails metadata
Replace the safeGuardrailsList helper with inline Array.isArray checks
at each call site, and apply the same guard to opted_out_global_guardrails
for consistency. There are no known legacy dict rows for
opted_out_global_guardrails, but the unguarded `|| []` pattern carries the
same shape risk.
Six call sites now defended directly: three for metadata.guardrails
and three for metadata.opted_out_global_guardrails.
* test: add 24hr Redis-backed VCR cache to additional test suites
Extracts the existing llm_translation VCR plumbing into a reusable helper
(tests/_vcr_conftest_common.py) and wires it into the conftest.py files
of the test directories listed in LIT-2787:
audio_tests, batches_tests, guardrails_tests, image_gen_tests,
litellm_utils_tests, local_testing, logging_callback_tests,
pass_through_unit_tests, router_unit_tests, unified_google_tests
The same helper is also adopted by the pre-existing llm_translation and
llm_responses_api_testing conftests to remove the copy-pasted VCR setup.
Each consuming conftest:
- registers the Redis persister via pytest_recording_configure
- auto-marks collected tests with pytest.mark.vcr (skipping respx-using
files where applicable, since respx and vcrpy both patch httpx)
- gates cassette writes on test success via _vcr_outcome_gate
The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled
and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still
forces a bypass for ad-hoc local runs.
Test directories that run LiteLLM proxy in Docker (build_and_test,
proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests)
are intentionally not included: VCR.py patches the in-process httpx
transport and cannot intercept calls made from inside a Docker container.
The installing_litellm_on_python* jobs make no LLM calls and don't
benefit from caching.
https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites
* test(vcr): add safe-body matcher to handle JSONL and binary request bodies
vcrpy's stock body matcher inspects Content-Type and unconditionally
runs json.loads on application/json bodies. JSON Lines payloads (used
by the Bedrock batch S3 PUT and other upload paths) crash that with
json.JSONDecodeError: Extra data, before the matcher can return
'not a match'.
This was the root cause of the batches_testing CI job failing on
test_async_create_file once VCR auto-marking was applied to the
batches_tests directory.
Add a conservative byte-equality body matcher and use it in place of
'body' in the shared match_on tuple. The matcher is strictly more
conservative than vcrpy's default — the only thing it gives up is
'different JSON key order is treated as the same body', which doesn't
apply to deterministic litellm-built request payloads. It can never
produce a false positive that the default would have rejected, so
there is no cross-contamination risk.
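
A byte-equality matcher along these lines could look like the sketch below. It follows vcrpy's convention that a custom matcher takes two request objects and signals a mismatch with `AssertionError`; the function names and the handling of unknown body types are assumptions, and `SimpleNamespace` is imported only to build stand-in requests when trying the matcher out.

```python
from types import SimpleNamespace
from typing import Any, Optional


def _as_bytes(body: Any) -> Optional[bytes]:
    """Normalize str/bytes bodies to bytes; return None for other types."""
    if isinstance(body, str):
        return body.encode("utf-8")
    if isinstance(body, (bytes, bytearray)):
        return bytes(body)
    return None


def safe_body_matcher(r1: Any, r2: Any) -> None:
    """Compare request bodies without ever parsing them as JSON."""
    if r1.body == r2.body:
        return  # direct equality covers identical values, including None
    b1, b2 = _as_bytes(r1.body), _as_bytes(r2.body)
    assert b1 is not None and b1 == b2, "request bodies differ"
```

A matcher like this would be registered with `register_matcher` on a `vcr.VCR` instance and named in `match_on` in place of `"body"`. For example, `safe_body_matcher(SimpleNamespace(body="a\nb"), SimpleNamespace(body=b"a\nb"))` returns without raising.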
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): exclude tests that VCR replay actively breaks
A few tests are incompatible with cassette replay and were failing on
the latest CI run after VCR auto-marking was extended to local_testing
and logging_callback_tests:
- test_amazing_s3_logs.py (logging_callback_tests): the test asserts on
a per-run response_id that should round-trip through a real S3
PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays
stale keys, so the freshly-generated id is never found.
- test_async_embedding_azure (logging_callback_tests) and
test_amazing_sync_embedding (local_testing): the failure branches
deliberately pass api_key='my-bad-key' to assert that the failure
callback fires. We scrub auth headers from cassettes (so the bad-key
request matches the prior good-key request), and vcrpy replays the
recorded 200 — the failure callback never fires.
- test_assistants.py (local_testing): the OpenAI Assistants polling
APIs mint fresh thread/run IDs every recording session and then poll
until status=='completed'. Replays of those polled GETs can never
match a freshly-generated run id, so every CI run effectively
re-records and the suite blows past the 15m no_output_timeout.
Skip these from VCR auto-marking so they continue to hit live providers
as they did before this change. The remaining tests in each directory
still get cached.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): expand skip lists for second batch of incompatible tests
Followup to the previous commit. After re-running CI on the rebuilt
branch, three more tests surfaced as VCR-replay-incompatible:
- litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key
Calls GET /v1/models with api_key='123' to assert the result is empty.
We scrub auth headers, so the bad-key request matches the prior
good-key cassette and replays the recorded model list.
- litellm_utils_testing :: test_litellm_overhead.py
Measures litellm_overhead_time_ms as a percentage of total wall-clock
time. With cached responses the upstream 'network' time collapses to
microseconds, blowing past the 40% threshold the test asserts on.
Skip the whole file (every parametrization is at risk).
- local_testing_part1 :: test_async_custom_handler_completion and
test_async_custom_handler_embedding
Same bad-key failure-callback pattern as the already-skipped
test_amazing_sync_embedding.
- litellm_router_testing :: test_router_caching.py
Asserts on litellm's own router-level response cache by comparing
response1.id to response2.id across repeat upstream calls (test
bypasses litellm cache via ttl=0 and expects upstream to return a
*new* id). With VCR replay both upstream calls return the same
cassette body, so the ids are identical. Skip the whole file.
- logging_callback_tests :: test_async_chat_azure (preemptive)
Same shape as already-skipped test_async_embedding_azure; was masked
by upstream OpenAI rate-limit failures on baseline.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): use item.path and tighten matcher docstring
- Replace pytest's deprecated item.fspath with item.path in
apply_vcr_auto_marker_to_items so we don't emit deprecation
warnings under pytest 8.
- Clarify _safe_body_matcher docstring to reflect actual behavior
(direct == first, then UTF-8 bytes comparison, no repr fallback).
Addresses Greptile review feedback on PR #27159.
* test(vcr): swallow all RedisError on cassette save/load
Cassette persistence is strictly best-effort: any Redis-side failure
(connection blip, timeout, OutOfMemoryError when the maxmemory cap is
hit, READONLY replicas, etc.) should degrade to 'test passed but
cassette not cached' rather than fail the test on teardown.
Previously the persister only caught ConnectionError and TimeoutError,
so OutOfMemoryError — which Redis Cloud raises when the cassette cache
hits its memory cap and there are no evictable keys — propagated out of
vcrpy's autouse fixture and ERRORed otherwise-passing tests on
teardown. This caused the litellm_utils_testing CircleCI job to fail on
the latest commit's run, even though the underlying test was a unit
test that used mock_response and produced no real upstream traffic
(the cassette was dirtied by a background langfuse callback). The
rerun only succeeded because Redis evictions happened to free enough
room before the SET — i.e. it was timing-dependent flakiness.
Catch redis.exceptions.RedisError (the common base of all server- and
client-side Redis exceptions) on both save and load, and parametrize
the regression tests across ConnectionError, TimeoutError, and
OutOfMemoryError to pin the new behavior.
* test(vcr): surface cassette-cache failures with warnings + session banner
When the persister silently swallows a Redis OOM (or any RedisError) on
save/load there is otherwise no visible signal that the cache is
degraded — tests pass, the cassette just isn't persisted, and the next
session still hits the same Redis at the same near-cap memory.
Add three layers of observability so that failure mode is loud:
1. Per-process health counters ("save_failures", "load_failures", and
the last error string for each), exposed via cassette_cache_health()
and reset via reset_cassette_cache_health(). The persister
increments these in addition to logging.
2. VCRCassetteCacheWarning (UserWarning subclass) emitted via
warnings.warn() inside the persister's except block. Pytest's
built-in warnings summary at session end automatically lists every
such warning, so the failure is visible in CI logs without any
conftest-level wiring.
3. Session-end banner via emit_cassette_cache_session_banner() and a
stderr-fallback atexit handler registered from
register_persister_if_enabled(). Two states:
- red "VCR CASSETTE CACHE DEGRADED" when save_failures or
load_failures > 0
- yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but
used_memory >= 85% of maxmemory) so the next session knows
the Redis is approaching OOM before any SET actually fails
Capacity comes from a best-effort INFO memory probe
(cassette_cache_capacity_snapshot) that returns None on any failure or
when maxmemory is uncapped. The atexit handler skips xdist workers so
only the controller emits.
Tests: parametrize the existing save/load swallow-error tests across
ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for
the health counters and warning emission, and a new
test_vcr_conftest_common_banner.py covering banner output for every
state (silent/red/yellow/disabled/xdist-worker).
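
The first two observability layers (health counters plus a `UserWarning` subclass) can be sketched with the standard library alone. The `last_*_error` key names, `record_failure`, and `best_effort_save` are hypothetical, and this sketch catches a generic `Exception` where the persister catches `redis.exceptions.RedisError`.

```python
import warnings
from typing import Any, Callable, Dict


class VCRCassetteCacheWarning(UserWarning):
    """Emitted when a cassette could not be saved to or loaded from the cache."""


_health: Dict[str, Any] = {
    "save_failures": 0,
    "load_failures": 0,
    "last_save_error": None,
    "last_load_error": None,
}


def cassette_cache_health() -> Dict[str, Any]:
    """Return a snapshot of the per-process counters."""
    return dict(_health)


def reset_cassette_cache_health() -> None:
    _health.update(
        save_failures=0, load_failures=0, last_save_error=None, last_load_error=None
    )


def record_failure(kind: str, exc: Exception) -> None:
    """Count a swallowed error and surface it through the warnings machinery."""
    _health[f"{kind}_failures"] += 1
    _health[f"last_{kind}_error"] = f"{type(exc).__name__}: {exc}"
    warnings.warn(f"cassette {kind} failed: {exc}", VCRCassetteCacheWarning, stacklevel=2)


def best_effort_save(save: Callable[[], None]) -> None:
    """Run `save`, degrading any failure to a counter bump and a warning."""
    try:
        save()
    except Exception as exc:  # stand-in for redis.exceptions.RedisError
        record_failure("save", exc)
```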
* test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips
Tests that deliberately call an LLM API with a bad key (e.g. to assert
that the failure callback fires, or that check_valid_key returns False)
were being silently served the prior good-key cassette: we scrub the
real Authorization / x-api-key header from the cassette before storing
it, so a follow-up bad-key call is byte-identical to the good-key call
under the existing match_on tuple.
Add a 'key_fingerprint' custom matcher that distinguishes requests by
the SHA-256 of their API-key headers. The fingerprint is stamped into
a synthetic 'x-litellm-key-fp' header by a new before_record_request
hook, which then strips the real auth headers (we have to do the
scrubbing here instead of via vcrpy's filter_headers knob, because
filter_headers runs *first* and would erase the value we want to hash).
Bad-key requests now get a different cassette bucket than good-key
requests, so vcrpy will not replay a recorded 200 in place of the
expected 401. The fingerprint is a one-way hash of the secret, so
cassettes never contain the key.
This permanently removes the 'bad-key' category of skips:
- tests/local_testing: dropped ::test_amazing_sync_embedding,
::test_async_custom_handler_completion,
::test_async_custom_handler_embedding
- tests/logging_callback_tests: dropped ::test_async_chat_azure,
::test_async_embedding_azure
- tests/litellm_utils_tests: dropped
::test_get_valid_models_from_dynamic_api_key
Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py
covering header stripping, fingerprint determinism, no-auth bucketing,
good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination,
and idempotence under replay.
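
The fingerprint-then-scrub hook can be sketched on a plain header dict. The real `before_record_request` hook receives a vcrpy request object; operating on a dict, the exact set of auth header names, the `"no-key"` value for unauthenticated requests, and the function names are simplifying assumptions of this sketch.

```python
import hashlib
from typing import Dict

AUTH_HEADERS = ("authorization", "x-api-key")  # assumed set of auth header names
FP_HEADER = "x-litellm-key-fp"


def before_record_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Stamp a key fingerprint, then strip the real auth headers."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if FP_HEADER in lowered:
        # Already processed: recomputing would see no auth headers and
        # replace the real fingerprint with the no-key value.
        return lowered
    secrets = [f"{name}={lowered[name]}" for name in AUTH_HEADERS if name in lowered]
    if secrets:
        fingerprint = hashlib.sha256("\n".join(secrets).encode("utf-8")).hexdigest()
    else:
        fingerprint = "no-key"
    scrubbed = {k: v for k, v in lowered.items() if k not in AUTH_HEADERS}
    scrubbed[FP_HEADER] = fingerprint
    return scrubbed


def key_fingerprint_matcher(h1: Dict[str, str], h2: Dict[str, str]) -> bool:
    """Two requests share a cassette bucket only if their fingerprints agree."""
    return h1.get(FP_HEADER) == h2.get(FP_HEADER)
```

A good-key request and a bad-key request now carry different fingerprints after scrubbing, so a recorded 200 is not replayed for the bad key, and applying the hook a second time leaves the stored fingerprint unchanged.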
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): drop redundant comments and docstrings
Trim narration of code that is already self-evident from function and
variable names. Keep the two genuinely non-obvious bits:
- ordering constraint between filter_headers and before_record_request,
which would invite a maintainer to re-introduce the bug if removed
- the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why
exactly is this skipped' is not knowable from the test name alone
Also drop the 40-line commented-out drop-in conftest snippet at the
bottom of _vcr_conftest_common.py — the consuming conftests are the
canonical reference.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): make _before_record_request idempotent
vcrpy invokes before_record_request more than once per request:
can_play_response_for calls it, then __contains__ /
_responses (reached via play_response) call it again on the
result. The second invocation sees a request whose auth headers we
already stripped, so a naive recompute yields "no-key" and
overwrites the real fingerprint stored in the header.
This makes can_play_response_for and play_response disagree on
matchability — the former says "yes, we have a stored response for
this" (matching no-key to no-key) and the latter throws
UnhandledHTTPRequestError because it computes a fresh real
fingerprint that doesn't match the stored no-key.
In CI this manifested as ~30 failing tests across guardrails_testing,
audio_testing, batches_testing, image_gen_testing, llm_responses_api,
litellm_router_unit_testing, etc. Skip the recompute when the header
is already set, so re-applying the hook is a no-op.
Adds a regression test that fires the hook twice on the same dict and
asserts the fingerprint stays put.
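A minimal sketch of the guard, assuming a simplified hook that takes a plain header dict (vcrpy passes a request object; the dict, the hashing details, and the single `authorization` header here are simplifications for illustration):

```python
import hashlib

FP_HEADER = "x-litellm-key-fp"


def before_record_request(headers: dict) -> dict:
    """Hypothetical hook over a plain header dict; safe to apply repeatedly."""
    if FP_HEADER in headers:
        # Already processed: the auth header is gone, so recomputing here
        # would yield "no-key" and overwrite the real fingerprint.
        return headers
    secret = headers.pop("authorization", None)
    if secret is None:
        headers[FP_HEADER] = "no-key"
    else:
        headers[FP_HEADER] = hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16]
    return headers


request_headers = {"authorization": "Bearer sk-example"}
first = dict(before_record_request(request_headers))   # first invocation
second = dict(before_record_request(request_headers))  # re-applied to the result
```

Without the early return, the second call would find no auth header and store "no-key", which is the mismatch described above.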
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): drop more redundant docstrings and headers
* test(vcr): enable 24hr cache for ocr_tests and search_tests
These two directories were the only non-dockerized test suites in the
build_and_test workflow that make live LLM/provider API calls but were
not VCR-enabled by this PR. Together they account for 96 tests:
- tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document
Intelligence, Vertex AI OCR. Pure-unit tests inside the same files
(e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls
and become benign VCR NOOPs.
- tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa,
Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI,
Searxng, Serper, Tavily.
Both directories use the canonical minimal conftest pattern from
tests/audio_tests/conftest.py with no skip lists. None of the test
files use respx, none assert on per-call upstream non-determinism
(no response1.id != response2.id, no overhead-as-fraction-of-total,
no live polling), so the default match_on tuple should cache cleanly.
If a flake surfaces during the first cassette-recording CI run, we
can add a targeted skip the same way we did for the other dirs.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
The invite-signup form was writing the new user's token via raw
`document.cookie` at `path=/`, while the rest of the auth surface uses
`storeLoginToken` (which writes at `path=/ui` and mirrors to
sessionStorage). After signup the inviter's `path=/ui` cookie kept
winning path-specificity matching, and sessionStorage still held the
inviter's token, so the dashboard rendered as the inviter rather than
the newly created user.
Treat invite signup as a principal-change boundary — clear prior
session cookies first, then store the new token via the canonical
helper.
PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in
docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this
forces prisma to download the amd64 schema-engine into an arm64 image,
so 'prisma migrate deploy' fails at startup with 'Could not find
schema-engine binary'.
Removing the env lets prisma auto-detect per build platform: amd64
builds still resolve to debian-openssl-3.0.x (Wolfi falls back to
debian, same binary as before), and arm64 builds now correctly fetch
linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is
preserved — only which binaries fill the cache changes.
Fixes #19458
The cimg/python:3.12-browsers base image already ships every Chromium
system dependency Playwright needs (libnss3, libatk-bridge2.0-0,
libcups2, etc. — the install log shows them all as "already the newest
version"). Passing --with-deps to `npx playwright install` therefore
runs an apt-get update + install for nothing, but pays the full cost of
hitting Ubuntu mirrors. On a recent run those mirrors stalled hard:
apt-get update alone took 6m53s at 81.5 kB/s with several archives
returning connection refused.
Drop --with-deps and persist ~/.cache/ms-playwright alongside
node_modules so the Chromium binary is also reused across runs. Bump
the cache key to v2 so the existing v1 entry (which only contained
node_modules) is not restored in place of one that includes the new
browser path.
/metrics now requires auth by default; tests/otel_tests/test_prometheus.py
makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
every prometheus test in CI now fails the metric assertion.
Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
to opt out for this test job, which scrapes /metrics directly. Verified
locally: 8/8 prometheus tests green (one flaky retry on
test_proxy_success_metrics that pre-dates this PR).
Also drop the -x stop-on-first-failure flag from the otel test command
so all failures in the job surface in a single CI run rather than
hiding behind whichever one trips first.
The Azure o-series tests were excluded from the conftest's VCR auto-marker
because of a respx/vcrpy transport-patching conflict, but the only respx
reference in the file was an unused `MockRouter` import. Drop the dead
import and remove the file from the conflict set so cassettes record on
first run and replay thereafter, eliminating the 60-95s live Azure latency
that was crashing xdist workers under --timeout=120 thread-mode timeouts.
The /otel-spans endpoint returns process-wide spans and tags
most_recent_parent by max start_time. After tightening that route to
proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
spans that beat the chat-completion spans on start_time, so
most_recent_parent now points at the request's own auth trace
(['postgres', 'postgres']) and the >=5-span assertion fails.
Pick the chat-completion trace by content: it is the only trace whose
span list is a superset of {postgres, redis, raw_gen_ai_request,
batch_write_to_db}. Verified locally end-to-end against
otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
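Selecting the trace by content can be sketched as below. The shape of the /otel-spans payload is not given here, so the `traces` mapping (trace id to list of span names), the function name, and the `extra_span` placeholder are assumptions; the required span names are the ones listed above.

```python
from typing import Dict, List, Optional

REQUIRED_SPANS = {"postgres", "redis", "raw_gen_ai_request", "batch_write_to_db"}


def find_chat_completion_trace(traces: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the first span list that contains every required span name."""
    for span_names in traces.values():
        if set(span_names) >= REQUIRED_SPANS:  # superset check
            return span_names
    return None


# Made-up data: the auth trace from the GET itself plus one chat-completion trace.
traces = {
    "auth-trace": ["postgres", "postgres"],
    "chat-trace": [
        "postgres",
        "redis",
        "raw_gen_ai_request",
        "batch_write_to_db",
        "extra_span",  # placeholder name, not a real span
    ],
}
chat_spans = find_chat_completion_trace(traces)
```

The auth trace is rejected regardless of its start_time, so the selection no longer depends on which request happened last.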
/otel-spans now requires proxy admin (returns 401 'Only proxy admin
can be used to generate, delete, update info for new keys/users/teams.
Route=/otel-spans' for non-admin callers). Switch the GET call to use
the master key sk-1234 while keeping the generated key for the
chat-completion request that produces the spans.
- Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
from azure.py (keeps AzureChatCompletion focused on chat).
- Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
docstring covering multipart and JSON POST payloads (review feedback).
Co-authored-by: Cursor <cursoragent@cursor.com>
Ruff F401 flagged the aliased import as unused within common_utils.py
because the name is consumed only by external modules (~15 callers
across guardrails, spend tracking, MCP, agents, management endpoints).
Add `# noqa: F401 re-exported` so the alias survives lint while
keeping a single source of truth in litellm.proxy._types.
- Move _user_has_admin_view to litellm.proxy._types as
user_api_key_has_admin_view (single source of truth). common_utils.py
and isolation.py both import from there now, removing the duplicated
role-check that could silently diverge if new admin roles are added.
- Add pytest.importorskip("litellm_enterprise") to the two regression
tests that assert managed_files / managed_vector_stores are registered;
those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
unconditionally in a checkout without the enterprise extra installed.
The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
ImportError when loading the managed files hook chain:
  litellm.proxy.hooks/__init__ (mid-import)
    -> enterprise.enterprise_hooks
    -> litellm_enterprise.proxy.hooks.managed_files
    -> litellm.llms.base_llm.managed_resources.isolation
    -> litellm.proxy.management_endpoints.common_utils
    -> litellm.proxy.utils (re-enters litellm.proxy.hooks)
The except ImportError block in hooks/__init__.py silently swallowed the
failure, leaving managed_files unregistered and POST /files returning
500 "Managed files hook not found".
Two-layer fix:
- Inline the 3-line _user_has_admin_view check in isolation.py instead
of importing it from litellm.proxy.management_endpoints.common_utils.
litellm.llms.* should not depend on litellm.proxy.* — removing this
layering violation breaks the cycle at its root.
- Define PROXY_HOOKS and get_proxy_hook before the conditional
enterprise import in litellm/proxy/hooks/__init__.py, so any future
re-entry resolves the public names instead of hitting an
ImportError on a partially-initialized module.
Also fold in two unrelated CCI repairs surfaced in the same staging run:
- tests/otel_tests/test_key_logging_callbacks.py: per-key
gcs_bucket_name / gcs_path_service_account are now stripped by
initialize_dynamic_callback_params, so the GCS client falls through
to the env-only branch. Update the assertion to match the new
"GCS_BUCKET_NAME is not set" message.
- .circleci/config.yml: tests/pass_through_tests now resolves
google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
which uses dynamic ESM imports Jest 29 cannot load without
--experimental-vm-modules. Pass that flag in the Vertex JS test step.
Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
regression guard: managed_files / managed_vector_stores must register,
and isolation.py must not transitively import litellm.proxy.utils.
secret_fields (containing raw HTTP headers including Authorization
Bearer tokens) was being included in proxy_server_request['body']
because the body snapshot was a copy.copy(data) of the full request
dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
table, exposing user credentials in the database.
Root cause: data['secret_fields'] was set before the body snapshot at
data['proxy_server_request']['body'] = copy.copy(data), so the full
raw headers (including auth tokens) ended up in the snapshot.
Fix (defense in depth):
1. Exclude 'secret_fields' when creating the body snapshot in
litellm_pre_call_utils.py (primary fix)
2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
as a secondary safeguard
secret_fields remains available on the live data dict for legitimate
downstream consumers (MCP, Responses API).
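The two layers can be sketched as follows. The helper names and the sample payload (including the `raw_headers` key) are hypothetical; only the 'secret_fields' key and the shallow-copy snapshot come from the description above.

```python
def snapshot_request_body(data: dict) -> dict:
    """Shallow-copy the request dict, leaving out 'secret_fields' (primary fix)."""
    return {k: v for k, v in data.items() if k != "secret_fields"}


def sanitize_for_spend_logs(body: dict) -> dict:
    """Strip 'secret_fields' again before persisting (secondary safeguard)."""
    sanitized = dict(body)
    sanitized.pop("secret_fields", None)
    return sanitized


# Made-up request data for illustration.
data = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "hi"}],
    "secret_fields": {"raw_headers": {"authorization": "Bearer sk-example"}},
}
body = snapshot_request_body(data)
persisted = sanitize_for_spend_logs(body)
```

The live `data` dict still carries 'secret_fields' for downstream consumers; only the snapshot and the persisted payload omit it.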
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>