Commit Graph

39002 Commits

Author SHA1 Message Date
user
5bafa8b3a2
Drop dep bumps + black-26 reformat to clear fork CI policy
PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs
cannot modify uv.lock. Reverting:

- uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295
  files of mechanical Black 26 reformat coupled to it
- pyproject.toml diskcache extra change (kept the runtime mitigation in
  litellm/caching/disk_cache.py via JSONDisk)

Kept:
- Dockerfile cache narrowing (drops ~660 MB of uv build cache that
  surfaced cached setuptools as CVE findings)
- litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872
- ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json:
  next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard)
- tests/test_litellm/caching/test_disk_cache.py
- tests/code_coverage_tests/liccheck.ini: harmless black authorization

Black + gitpython + langchain dep upgrades will need a follow-up from a
maintainer pushing a branch in the canonical BerriAI/litellm repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 23:04:52 +00:00
user
63bda3f001
Merge remote-tracking branch 'upstream/litellm_internal_staging' into cve-sweep-2026-05
# Conflicts:
#	uv.lock
2026-05-07 23:03:28 +00:00
yuneng-jiang
0fb88d50dd
Merge pull request #27415 from BerriAI/litellm_/sweet-mcclintock-2b3656
[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin
2026-05-07 15:46:51 -07:00
Yuneng Jiang
a43dc9f0b1
[Fix] Batches Tests: Remove VCR Auto-Marker
Strip VCR wiring from the batches test conftest. Drops:

- import of `_vcr_conftest_common` helpers
- the `vcr_config` fixture, `pytest_recording_configure`,
  `_vcr_outcome_gate`, `pytest_runtest_makereport`
- the `apply_vcr_auto_marker_to_items` call in
  `pytest_collection_modifyitems`
- `VerboseReporterState` / its `pytest_configure` /
  `pytest_runtest_logreport` hooks (purely VCR-verdict plumbing)

Why: every test in this directory creates ephemeral OpenAI / Bedrock /
vLLM resources whose IDs change per run (file-XXX, batch-XXX,
ft-XXX, ...). VCR's path/query/body matchers don't match across runs,
so `record_mode="new_episodes"` was silently passing through to the
live API and recording many new cassette entries every run. Cassette
bloat without replay benefit.

Behaviour after this change is identical to running the directory
without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs,
tests that don't continue to skip via their existing skipif markers.

Conftest now keeps only path setup and the session-scoped `event_loop`
fixture.
2026-05-07 15:39:46 -07:00
ishaan-berri
e4c14862fc
feat(mcp): add OBO MCP Auth (#27421)
* feat(mcp): add oauth2 token exchange auth

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* fix(mcp): cache token exchange fallback

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-07 15:35:21 -07:00
Yuneng Jiang
5256a1fdb6
[Fix] Fine-Tuning Test: Bump Off Deprecated gpt-3.5-turbo-0125
OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo
in general) for shutdown on 2026-10-23, with the announcement landing
2026-04-22. The hard-fail date is ~5 months out, but timing fits the
recent uptick in this test flaking and OpenAI may already be running
the deprecated model's pipeline with deprioritized infra.

Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning,
no announced shutdown. Updates the live test plus the mocked test for
consistency. Belt-and-suspenders with the existing propagation-retry
helper.
2026-05-07 14:41:58 -07:00
Yuneng Jiang
a8cad84dc7
[Fix] Fine-Tuning Test: Retry on File Propagation 400
Previous fix polled `litellm.afile_retrieve` for `status == "processed"`
before calling the fine-tuning endpoint. That doesn't actually solve
the race:

- OpenAI's `FileObject.status` field is deprecated per the SDK type and
  not authoritative — it can read "processed" before the file is usable.
- The retrieve and fine-tuning endpoints don't share a consistency
  model, so retrieve succeeding tells you nothing about FT visibility.

Replace with a retry around the actual `acreate_fine_tuning_job` call
that catches the OpenAI 400 `'file-... does not exist'` and backs off
exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The
operation succeeding is the only reliable signal that propagation
finished.
2026-05-07 14:17:45 -07:00
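The retry described in a8cad84dc7 might look roughly like the sketch below. It is not the code from the commit: `retry_until_propagated` and `FileNotReadyError` are hypothetical names, and the exception is a stand-in for however the test recognizes the OpenAI 400 `'file-... does not exist'`. With a 1s start, doubling, an 8s cap, and 12 attempts, the 11 sleeps add up to 71s, which matches the "~70s total budget" in the message.

```python
import asyncio


class FileNotReadyError(Exception):
    """Hypothetical stand-in for the provider's 400 'file-... does not exist'."""


async def retry_until_propagated(
    operation,
    *,
    attempts=12,
    base_delay=1.0,
    max_delay=8.0,
    sleep=asyncio.sleep,  # injectable so tests do not have to wait
):
    """Call `operation()` until it succeeds or `attempts` is exhausted.

    Only FileNotReadyError is retried; any other exception propagates
    immediately. The delay doubles after each failure, capped at `max_delay`.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except FileNotReadyError:
            if attempt == attempts:
                raise  # budget exhausted: surface the last error
            await sleep(delay)
            delay = min(delay * 2, max_delay)
```

Retrying the real operation, rather than polling a separate status endpoint, keeps the success signal tied to the call the test actually cares about.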
Yuneng Jiang
a64716ed5b
[Fix] Fine-Tuning Test: Wait for OpenAI File Propagation
OpenAI file uploads are eventually consistent — a freshly uploaded file
may briefly 404 from `retrieve` and is rejected by the fine-tuning
endpoint with `'file-... does not exist'` until processing finishes.
The async fine-tuning test called `acreate_fine_tuning_job` immediately
after `acreate_file` and flaked on this race.

Add a polling helper that waits up to ~30s for `status=processed` (and
short-circuits on `error`), called between upload and FT job creation.
Mirrors the same propagation lag covered by the `await asyncio.sleep(1)`
in the sister batches test, but more robust against longer delays.
2026-05-07 14:06:04 -07:00
Yuneng Jiang
4189d78a64
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/sweet-mcclintock-2b3656
2026-05-07 13:46:53 -07:00
Yuneng Jiang
9e7dc5ef68
[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin
OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot,
which caused these E2E tests to fail consistently in CI. Bump to the
unversioned gpt-4o-realtime-preview alias to match the sibling
test_openai_realtime_simple.py and stay current as OpenAI rolls the
alias forward.
2026-05-07 13:46:45 -07:00
michelligabriele
3b78a3a545
fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable (#27406)
* fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable

* fix(chat-completions): tolerate non-dict content items (e.g. token-ids from text_completion)
2026-05-07 13:17:04 -07:00
ishaan-berri
b891a201f8
Preserve LiteLLM headers for passthrough responses (#27412)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-07 12:59:36 -07:00
Shivam Rawat
29e4eb16da
Merge pull request #27222 from BerriAI/litellm_s3AuditParams
[Feat] Decouple S3 audit-log config via s3_audit_callback_params
2026-05-07 12:49:03 -07:00
yuneng-jiang
b9b315157b
Merge pull request #27409 from BerriAI/litellm_/inspiring-allen-ec64a4
[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching
2026-05-07 12:39:58 -07:00
Michael-RZ-Berri
db8198faba
[Fix] Allow non-admin compliance path reads (#27234)
* allow non-admin roles on /compliance/* read routes

* Restrict compliance routes to internal users

---------

Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-07 14:07:23 -05:00
Yuneng Jiang
2f9519d286
[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching
- Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in
  image-gen responses to a 4-byte placeholder before the cassette is saved.
  Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will
  shrink to <100 KB on next record. Tests assert response shape only, so
  coverage is preserved.
- Add `_normalize_multipart_boundary` filter: replaces httpx's per-request
  random multipart boundary with a fixed string in both Content-Type header
  and body bytes. Audio-transcription / Whisper tests have been effectively
  unmocked — every CI run hit live providers and was silently capped at
  MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical
  bytes; the safe_body matcher works.
- Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval
  names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped
  `managed_eval` fixture that creates and deletes the eval, and switch
  `get_eval` / `update_eval` from `list_evals().data[0].id` (which made
  the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is
  now actually exercised.
- Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B)
  served via sha-pinned jsdelivr.
- Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the
  in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin.
- Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests
  covering both new filters: replacement, idempotency, nesting, content-
  length update, two-distinct-boundaries-converge-after-normalize, etc.

Cassettes recorded with the prior patterns will mismatch on the first CI
run after merge; recommend flushing the cassette Redis once (post-merge)
so re-records save under the new format from the start.
2026-05-07 11:54:19 -07:00
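To illustrate the boundary normalization described in 2f9519d286, here is a minimal sketch. It is an assumption about the approach, not the repository's `_normalize_multipart_boundary`: the function name, the `FIXED_BOUNDARY` value, and the regex are all hypothetical. The idea is only that two requests that differ solely in their random boundary become byte-identical after rewriting.

```python
import re

# Hypothetical fixed value; the real filter may use a different string.
FIXED_BOUNDARY = b"vcr-fixed-boundary"

# Matches boundary=abc or boundary="abc" in a Content-Type header value.
_BOUNDARY_RE = re.compile(rb'boundary="?([^";,\s]+)"?', re.IGNORECASE)


def normalize_multipart_boundary(content_type: bytes, body: bytes):
    """Return (content_type, body) with the random boundary replaced.

    Non-multipart requests and already-normalized requests are returned
    unchanged, so the function is idempotent.
    """
    match = _BOUNDARY_RE.search(content_type)
    if match is None:
        return content_type, body
    original = match.group(1)
    if original == FIXED_BOUNDARY:
        return content_type, body
    new_content_type = content_type.replace(original, FIXED_BOUNDARY)
    # Delimiter lines in the body are the boundary prefixed with "--".
    new_body = body.replace(b"--" + original, b"--" + FIXED_BOUNDARY)
    return new_content_type, new_body
```

Because the body length changes, a caller would also need to recompute any `Content-Length` header from `len(new_body)`, which is the "content-length update" case the commit's unit tests mention.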
michelligabriele
9f1b41d206
fix(proxy): run model-level post_call guardrails on streaming requests (#26922)
2026-05-07 11:53:03 -07:00
ishaan-berri
fee5900acc
feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154)
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json

xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)

Pricing/specs sourced from xAI's published model metadata:
  - input:  $1.25 / 1M tokens (<=200k),  $2.50 / 1M tokens (>200k)
  - output: $2.50 / 1M tokens (<=200k),  $5.00 / 1M tokens (>200k)
  - cached: $0.20 / 1M tokens (<=200k),  $0.40 / 1M tokens (>200k)
  - context: 1,000,000 tokens
  - capabilities: vision, reasoning, function calling, structured outputs,
    prompt caching, web search

Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.

* test(xai): add model_info test for grok-4.3 + sync backup cost map

- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
  litellm/model_prices_and_context_window_backup.json so the bundled
  model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
  pricing tiers, capability flags, context window, provider routing,
  and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
  https://docs.x.ai/docs/models/grok-4.3 currently 404s).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-07 09:06:56 -07:00
harish-berri
a67b7a7e87
Refactor Bedrock response stream shape handling (#27257)
* Refactor Bedrock response stream shape handling

- Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`.
- Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape.
- Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules.
- Added a new mock server script for testing Bedrock pass-through functionality.

* Refactor response parsing for Bedrock and SageMaker

- Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes.
- Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability.

* Enhance error handling for Bedrock and SageMaker response stream shape loading

- Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions.
- Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly.
- Updated tests to verify that loading failures return `None` instead of propagating exceptions.

* Implement error handling for missing response stream shapes in Bedrock and SageMaker

- Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting.
- Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker.
- Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.
2026-05-06 17:39:38 -07:00
ishaan-berri
854456f58e
Fix Prometheus remaining metric zero values (#27348)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 17:22:20 -07:00
yuneng-jiang
f1c91d754d
[Chore] CI: Block PRs that drop overall code coverage (#27340)
* [Chore] CI: Block PRs that drop overall code coverage

Tighten Codecov project status threshold from 1% to 0% so any drop in
overall project coverage relative to the base commit fails the
codecov/project check. target: auto keeps the bar floating with the
codebase, no manual maintenance needed as coverage moves up over time.

* [Chore] CI: Always post Codecov status regardless of CI outcome

Set codecov.require_ci_to_pass: false and codecov.notify.wait_for_ci:
false so Codecov posts the codecov/project and codecov/patch checks as
soon as the expected uploads arrive, instead of withholding them when
unrelated CI jobs fail. The coverage-regression check is independent
of test pass/fail, and CI failures are already enforced by their own
required-status checks.
2026-05-06 16:41:50 -07:00
yuneng-jiang
a3a42c6c47
[Chore] CI: Assign test_request_size_limit_middleware To Proxy-Runtime Shard (#27341)
The assert-shard-coverage guard in test-unit-proxy-db.yml failed because
test_request_size_limit_middleware.py was added under tests/proxy_unit_tests/
but not referenced by any matrix entry. Assigning it to the proxy-runtime
shard, which already covers other server-runtime tests (proxy_routes,
proxy_gunicorn, server_root_path).
2026-05-06 16:34:45 -07:00
oss-agent-shin
b318231fe9
Add Azure Sentinel audit log support (#27280)
* Add Azure Sentinel audit log callback support

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Fix Azure Sentinel audit log batching

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Fix Azure Sentinel CI checks

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 15:50:06 -07:00
ishaan-berri
aba131d3cf
fix: Vertex Anthropic streaming status error hangs (#27310)
* Fix streaming HTTP status error hangs

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Fix sync streaming HTTP status error hangs

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Cap sync streaming error read workers

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 15:32:55 -07:00
ishaan-berri
c15718f9d1
Fix Anthropic streaming reasoning token usage (#27319)
* fix anthropic streaming reasoning token usage

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* test anthropic streaming reasoning usage end to end

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* address anthropic reasoning token text split

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* harden anthropic reasoning usage for mocked tokens

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 15:28:22 -07:00
ishaan-berri
bd1a05aed9
Fix MCP DB reload partial failures (#27314)
* Fix MCP database reload partial failures

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Avoid staged MCP registry exposure

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 15:18:18 -07:00
ishaan-berri
924c141843
Add new chat model metadata (#27313)
* add new model metadata

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* address review feedback

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 15:15:21 -07:00
ishaan-berri
487479eff7
perf: cap Prometheus end-user metric cardinality with TTL + LRU eviction (#27272)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-06 13:35:13 -07:00
oss-agent-shin
c8e47dcb43
Fix early proxy request size enforcement (#27311)
* Add early proxy request size guard

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Address request size review feedback

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 12:29:11 -07:00
Dibyo Mukherjee
169c436684
Fix/member access group team (#27317)
* fix(auth): pass team_id in member-level model access check

_check_team_member_model_access calls _can_object_call_model without
team_id, so access groups defined via model_info.access_groups cannot
resolve for team-scoped DB models (their internal router name is
model_name_<team>_<uuid>, not the public name). The team-level check
already passes team_id; this mirrors that.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(auth): add tests for member-level access group resolution with team_id

Eight tests covering _can_object_call_model and
_check_team_member_model_access with team-scoped DB models:

- access group resolves when team_id is passed
- access group fails without team_id (pre-fix behavior)
- literal model name still works with team_id (no regression)
- denied model still denied with team_id
- second model in group also reachable
- end-to-end member access via access group (mocked membership)
- end-to-end member denied for model not in allowed list
- no-override member inherits team-level check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-06 12:05:22 -07:00
oss-agent-shin
d90cf56245
Fix SCIM user lookup filters (#27308)
* Fix SCIM Okta userName lookup

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* fix scim user filter typing

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 11:58:47 -07:00
ishaan-berri
c92a08a307
Fix team member budget enforcement without user row (#27273)
* Fix team member budget enforcement without user row

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* Clarify regenerated key budget repro

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-06 11:42:29 -07:00
Yassin Kortam
b1f577199a
fix(proxy): keep spend log cleanup running after batch failures and surface DB errors (#27303)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-06 18:39:15 +00:00
Mateo Wang
b83d11351f
proxy: hot-reload config YAML when --reload is set (#27274)
* proxy: hot-reload config YAML when --reload is set

Uvicorn's --reload only watches *.py by default, so editing the
--config YAML did not restart the proxy. _get_reload_options() now
extends reload_dirs/reload_includes with the config file's directory
and basename when --config is provided.

* proxy: qualify reload_includes with absolute config path

Address Greptile review on PR #27274. When the --config file lives
outside cwd, reload_includes previously stored only the basename, which
meant uvicorn/watchfiles would also reload on edits to any same-named
file inside cwd. Use the absolute config path as the include pattern in
that case so only the actual proxy config triggers a restart.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(proxy): use basename for reload_includes config pattern

Uvicorn's resolve_reload_patterns() calls pathlib.Path.glob(), which
raises NotImplementedError on absolute patterns (uvicorn discussion
2156). Passing config_abs (an absolute path) when the config file lived
outside cwd crashed startup under --reload. The config_dir is already
added to reload_dirs, so using just the basename as the include pattern
is sufficient to match the specific config file.

* fix: make it reload app when yaml changes

* style: remove unneeded comments

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-06 16:06:58 +00:00
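The end state described in b83d11351f (config directory added to `reload_dirs`, config basename added to `reload_includes`) could be sketched as follows. This is not the proxy's `_get_reload_options()`; `build_reload_options` is a hypothetical helper. The only uvicorn-specific assumption is that `reload_dirs` and `reload_includes` are passed through as keyword options, and `"*.py"` is listed explicitly here only as a defensive choice.

```python
import os


def build_reload_options(config_path=None):
    """Build reload-related keyword options for a dev server.

    The include pattern is the config file's basename, never an absolute
    path, since the commit notes that absolute glob patterns crash startup.
    """
    options = {"reload": True}
    if not config_path:
        return options

    config_abs = os.path.abspath(config_path)
    config_dir = os.path.dirname(config_abs)

    reload_dirs = [os.getcwd()]
    if config_dir not in reload_dirs:
        reload_dirs.append(config_dir)  # watch the directory holding the YAML

    options["reload_dirs"] = reload_dirs
    options["reload_includes"] = ["*.py", os.path.basename(config_abs)]
    return options
```

One trade-off, already noted in the commit thread: a basename pattern also matches a same-named file in any other watched directory, which is the price of avoiding absolute patterns.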
Yassin Kortam
bd1ea0252a
perf(proxy): run daily activity aggregation off the event loop (#27264)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 20:19:28 -07:00
ishaan-berri
c32ad90823
Fix Prometheus custom metadata label counts (#27268) (#27271)
* Fix Prometheus custom metadata label counts (#27268)

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* fix enterprise test: update positional label assertions to keyword args

prometheus_label_factory now calls .labels() with keyword arguments.
Update test_async_log_failure_event assertion to match.

---------

Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-05 20:04:56 -07:00
ishaan-berri
e9fb29061a
Include model name + configured TPM/RPM in priority rate-limit 429 er… (#27216)
* Include model name + configured TPM/RPM in priority rate-limit 429 errors (#27215)

* Include model name + configured TPM/RPM in priority rate-limit 429 errors

The current 429 message ('Priority-based rate limit exceeded. Priority: prod,
Rate limit type: tokens, Remaining: -664145, Model saturation: 86.3%') doesn't
tell the operator which model was hit or what the configured limit is, so they
can't tell whether the priority allocation needs tuning or the model TPM is
just too small.

Add Model, Model TPM, and Model RPM to both the priority-based 429 and the
sibling Model-capacity 429 in dynamic_rate_limiter_v3._check_rate_limits.
Pure error-message change — no behavior or schema impact.

* test: assert priority 429 includes model name + configured TPM/RPM

Adds a regression test for the new fields in the priority-based 429 detail
('Model:', 'Model TPM:', 'Model RPM:'). Verified locally that the test
fails against the unpatched dynamic_rate_limiter_v3.py and passes after
the patch.

---------

Co-authored-by: shin-watcher <ext-agent-shin@berri.ai>

* Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: shin-watcher <ext-agent-shin@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-05-05 19:05:22 -07:00
Dennis Henry
73de892654
fix: replace user api key auth with authorization or cookie for mcp server creation (#27190)
* fix: replace user api key auth with authorization or cookie for mcp server creation

* updated tests
2026-05-05 18:36:22 -07:00
Michael-RZ-Berri
e75c7a312a
union x-litellm-tags with static team/key tags (#27247)
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
2026-05-05 17:46:42 -07:00
Mateo Wang
fdaa288607
ci(circleci): enable Rerun Failed Tests for all pytest jobs (#27155)
* ci(circleci): enable Rerun Failed Tests for all pytest suites

Migrated every pytest-based CircleCI job that uploads JUnit results to use
'circleci tests run' instead of invoking pytest directly. This is the
prerequisite for CircleCI's 'Rerun failed tests' feature to be available
on each job in the pipeline.

For each job:
- Glob test files via 'circleci tests glob' and pipe them into
  'circleci tests run --command="xargs ... pytest ..."' so the agent can
  feed the failed-test subset on rerun.
- Preserve all original pytest flags (parallelism, timeouts, retries,
  coverage, junit output paths).
- For jobs that previously lacked 'store_test_results' (proxy spend
  accuracy, proxy_build_from_pip, db_migration_disable_update_check),
  add the step so JUnit XML is uploaded and rerun is actually wired up.
- Replace the dynamic IGNORE_DIRS shell array in llm_translation_testing
  with a 'grep -v' filter on the glob output, matching the previous
  behavior of skipping tests/llm_translation/realtime.
- For 'build_and_test', glob 'tests/test_*.py' (top-level only) which
  matches the prior 'tests/*.py' shell glob; the long list of
  '--ignore=tests/<subdir>' flags was vestigial and is dropped.

Jobs already using 'circleci tests run' (local_testing_part1/2,
litellm_router_testing) are unchanged.

* fix(ci): convert classnames to file paths on rerun

CircleCI's Rerun Failed Tests sends each previously failed test as a
JUnit classname (e.g. 'tests.otel_tests.test_key_logging_callbacks'),
but pytest needs a file path. Without the awk preprocess step, rerun
runs fail with 'file or directory not found'.

Mirror the awk transform that local_testing_part1, local_testing_part2,
and litellm_router_testing already use, so rerun works in every job
that this PR migrated to 'circleci tests run'.

* ci: drop -x from OTEL pytest run so all failures are reported

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-05 17:27:09 -07:00
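The classname-to-path conversion from fdaa288607 is done with awk in the CI config; the Python sketch below only illustrates the same idea and is not a port of that script. `classname_to_path` is a hypothetical name, and the sketch assumes module-level classnames like the one quoted in the commit. A classname that ends in a test class (for example `tests.x.test_y.TestFoo`) would need extra handling that is not shown here.

```python
def classname_to_path(name: str) -> str:
    """Convert a dotted JUnit classname into a pytest file path.

    Inputs that already look like file paths are passed through, so the
    same filter works for both first runs (file globs) and reruns.
    """
    if name.endswith(".py") or "/" in name:
        return name
    return name.replace(".", "/") + ".py"
```

For example, `classname_to_path("tests.otel_tests.test_key_logging_callbacks")` yields `tests/otel_tests/test_key_logging_callbacks.py`, which pytest can resolve.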
Sameer Kankute
fd7ff0f269
fix(hosted_vllm): normalize custom tools for chat completions (#25763)
* fix(hosted_vllm): normalize custom tools for chat completions

Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.

Made-with: Cursor

* Fix black issue

* Fix hosted vllm custom tool schema fallback

* fix black

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-05 17:27:02 -07:00
yuneng-jiang
9a338e1b6b
[Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249)
Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.

- audio_tests/test_audio_speech.py: split env-var keys into separate
  azure/openai test functions sharing a helper; sync_mode parametrize
  preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
  azure_whisper functions sharing a helper; response_format parametrize
  preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
  cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
  into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
  cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
  into 5 named tests.

Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
2026-05-05 17:21:18 -07:00
Sameer Kankute
e912e6d4ff
feat(audio_transcription): add NVIDIA Riva STT provider (#27185)
* feat(audio_transcription): add NVIDIA Riva STT provider

Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.

- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
  audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
  verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
  word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
  otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
  soundfile, audioread, numpy).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(nvidia_riva): address PR review feedback

- handler: forward call-level `timeout` to streaming_response_generator
  (kwarg-detected via inspect for older riva-client compat) so a stalled
  Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
  most audioread backends (FFmpeg, GStreamer) require a real filesystem
  path and previously raised TypeError on BytesIO, breaking the mp3/m4a
  fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
  (anti-aliased polyphase) when installed, falling back to linear only
  as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
  es-US, matching BCP-47 conventions.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* fix code qa

* fix lint

* fix mypy

* fix mypy

* Fix NVIDIA Riva ASR service lookup

* Fix NVIDIA Riva transcription payload logging

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-05-05 17:17:51 -07:00
Krrish Dholakia
454ce5073f
fix(anthropic, mcp): sanitize tool names to match Anthropic's [a-zA-Z0-9_-]{1,128} pattern (#26788)
* fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$`

Tool names with characters like `/` or `.` (commonly produced by the
OpenAPI -> MCP generator from `operationId`s such as
`actions/download-job-logs-for-workflow-run`) caused Anthropic to reject
requests with `tools.N.custom.name: String should match pattern
'^[a-zA-Z0-9_-]{1,128}$'`.

Two layers of fix:

1. Anthropic transformation: build a per-request forward map (original ->
   sanitized, disambiguated by suffix on collisions) and a reverse map
   (only for names actually rewritten). Forward map is applied to tool
   defs, `tool_choice`, and historical assistant tool_calls in messages.
   Reverse map is threaded through both the non-streaming and streaming
   response paths so callers continue to see their original tool names
   in `tool_use` blocks.

2. OpenAPI -> MCP generator: sanitize `operationId` (and the
   method+path fallback) at registration time so generated MCP tools are
   valid for any strict-name provider, not just Anthropic. The dashboard
   preview endpoint applies the same sanitization for parity.

Includes unit tests covering: collision disambiguation between
`foo_bar` and `foo/bar` in the same request, reverse-map only firing
for actually-rewritten names, message rewrite for historical tool_calls,
streaming chunk_parser reverse-mapping, and sanitization of OpenAPI
operationIds plus the preview endpoint output.

Made-with: Cursor

* fix(anthropic): build tool-name maps in transform_request, not optional_params

The previous patch stashed the per-request forward and reverse tool-name
maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and
``optional_params["_anthropic_tool_name_map"]``. ``optional_params`` is
the dict that becomes the JSON body via ``data = {**optional_params}``,
so those internal keys leaked over the wire and Anthropic 400'd with:

  _anthropic_tool_name_forward_map: Extra inputs are not permitted

Worse, this meant *every* request whose tool list contained any name with
an invalid character (the exact case the patch was meant to fix) regressed
into a confusing meta-error pointing at LiteLLM's internal map instead of
the offending tool.

Fix: move all tool-name sanitization into ``transform_request``, which is
the single chokepoint already shared by ``AnthropicConfig``,
``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``,
and ``AzureAnthropicConfig`` (all call ``super().transform_request`` /
``AnthropicConfig.transform_request(self, ...)``). New static helper
``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped
``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted
tool names are reserved by Anthropic and must not be touched), builds
the per-request forward/reverse maps, and applies the forward map in
place to ``tools[*].name`` and ``tool_choice.name``. The reverse map is
stashed exclusively on ``litellm_params`` (which is never serialized to
a provider) under ``_anthropic_tool_name_map`` for the response paths
to consume.

Side effect of this restructure: ``map_openai_params`` is now a pure
OpenAI->Anthropic param translator with no side-channel state, which
matches its contract everywhere else in the codebase.
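
The leak is easy to reproduce in isolation. The sketch below is a simplified model, not the real transformation code: `build_body` is a hypothetical stand-in for the `data = {**optional_params}` step, and only the key name is taken from this message.

```python
from typing import Any, Dict

REVERSE_MAP_KEY = "_anthropic_tool_name_map"  # key name quoted in this commit message


def build_body(optional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Everything in optional_params is copied into the provider-bound JSON body."""
    return {**optional_params}


# Leaky variant: internal state stored next to real request parameters.
leaky_params = {"max_tokens": 64, REVERSE_MAP_KEY: {"foo_bar": "foo/bar"}}
leaky_body = build_body(leaky_params)

# Fixed variant: internal state lives in a separate dict that is never serialized.
optional_params = {"max_tokens": 64}
litellm_params = {REVERSE_MAP_KEY: {"foo_bar": "foo/bar"}}
clean_body = build_body(optional_params)
```

The regression tests described below check the same property: no underscore-prefixed key survives into the final body.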

Tests: replaced the now-incorrect "stashes maps in optional_params"
tests with regressions that assert no underscore-prefixed keys appear
in either ``optional_params`` after ``map_openai_params`` or in the
final ``transform_request`` body. Added end-to-end coverage for:
sanitization in ``transform_request``, ``tool_choice`` rewriting,
historical ``tool_calls`` rewriting in messages, and hosted-tool
passthrough.

Made-with: Cursor

* fix(anthropic): always sanitize empty text content blocks

Anthropic 400s on `{"role": "user", "content": ""}` with:
  "messages: text content blocks must be non-empty"

LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text
to a placeholder, but it was gated behind `litellm.modify_params=True`.
With that flag off (default), empty content from upstream agent
frameworks (e.g. pydantic-ai) flowed straight through and tripped the
Anthropic validator.

Fix:
- Always run `_sanitize_empty_text_content` at the top of
  `anthropic_messages_pt`, independent of `modify_params`. There is no
  way to "pass through" an empty text block, so this is non-optional.
  The richer tool-call sanitizations (Cases A/B/D, which actually
  mutate conversation structure) remain gated on `modify_params`.
- Extend `_sanitize_empty_text_content` to also handle list-of-blocks
  content (`[{"type": "text", "text": ""}]`), not just string content.

Adds 3 regression tests covering string content, list-of-blocks
content, and the no-op case (non-empty messages with modify_params off).
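
A minimal sketch of the two content shapes being handled, assuming a placeholder string is substituted for empty text. The function name and the `PLACEHOLDER` value are hypothetical; the real placeholder is not stated in this message.

```python
from typing import Any, Dict, List

PLACEHOLDER = "[empty]"  # hypothetical value, chosen only for illustration


def sanitize_empty_text(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return messages with empty text replaced; inputs are left unmodified."""
    result = []
    for message in messages:
        content = message.get("content")
        if content == "":
            # String content: {"role": "user", "content": ""}
            message = {**message, "content": PLACEHOLDER}
        elif isinstance(content, list):
            # List-of-blocks content: [{"type": "text", "text": ""}]
            blocks = [
                {**block, "text": PLACEHOLDER}
                if isinstance(block, dict)
                and block.get("type") == "text"
                and block.get("text") == ""
                else block
                for block in content
            ]
            message = {**message, "content": blocks}
        result.append(message)
    return result
```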

Made-with: Cursor

* fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation

- remove unused `name_forward_map` param from `_map_tool_choice`,
  `_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward`
  helper. Production sanitization runs in `_sanitize_tool_names_in_request`
  at `transform_request`; these params were never threaded through.
- handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of
  the hardcoded `"_anthropic_tool_name_map"` string.
- fix mypy `"object" has no attribute "__iter__"` in
  `_rewrite_tool_names_in_messages` by guarding `tool_calls` with
  `isinstance(..., list)`.
- `_sanitize_tool_names_in_request`: build a new tools list with copy-on-
  change entries (and copy `tool_choice` on rewrite) so a caller reusing
  the same tool list/dicts across requests doesn't see its inputs
  permanently rewritten.
- doc-comment `_build_request_tool_name_maps` clarifying it operates on
  OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs
  on Anthropic-format tools post-`_map_tools`).
- tests: drop 3 tests pinning the now-removed param paths; add coverage
  for tool_calls + None function_call rewrite and caller-dict immutability.
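
The copy-on-change behavior can be sketched roughly as follows. `apply_forward_map` is a hypothetical name, and the tool dicts are reduced to the fields that matter here.

```python
from typing import Any, Dict, List


def apply_forward_map(
    tools: List[Dict[str, Any]], forward: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Return a new list; copy only the entries whose name is rewritten."""
    rewritten = []
    for tool in tools:
        name = tool.get("name")
        new_name = forward.get(name) if isinstance(name, str) else None
        if new_name is not None and new_name != name:
            tool = {**tool, "name": new_name}  # shallow copy; caller's dict untouched
        rewritten.append(tool)
    return rewritten


caller_tools = [{"type": "custom", "name": "foo/bar"}, {"type": "custom", "name": "ok"}]
sent_tools = apply_forward_map(caller_tools, {"foo/bar": "foo_bar"})
```

Unchanged entries are shared by reference, so the copy cost is paid only for tools that actually need a new name.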

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): inherit stored credentials in test/tools/list for edit flow

When editing an existing MCP server, the Tool Configuration preview
calls POST /mcp-rest/test/tools/list with server_id but no credentials
(management API redacts them). The endpoint now calls
_inherit_credentials_from_existing_server() so stored bearer tokens
and OAuth2 M2M credentials are loaded from global_mcp_server_manager
automatically — tools load without re-entering credentials.

New servers (no server_id) and requests with explicit credentials are
unaffected (function is a no-op in both cases).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): show all tools in edit panel, not just allowed tools

Edit flow was passing externalTools (from GET /tools/list, filtered by
allowed_tools) to MCPToolConfiguration, disabling the internal hook.
Remove the external props so the internal hook fires via
POST /test/tools/list, which returns all tools unfiltered. Combined
with the credential inheritance fix, tools load automatically without
re-entering credentials and all tools are visible for re-configuration.

existingAllowedTools still pre-checks previously allowed tools.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix order-dependent collision in _build_anthropic_tool_name_maps

Use a two-pass approach: first pre-register all already-valid tool names
in the 'used' set, then sanitize/disambiguate names that need rewriting.
This ensures valid names always have priority regardless of input order,
preventing duplicate tool names on the wire when e.g. 'foo/bar' appears
before 'foo_bar' in the tool list.

Add regression test for the reversed ordering case.
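
The two-pass approach might be sketched as below. This is an illustration under assumptions, not the body of `_build_anthropic_tool_name_maps`: the function name, the `_2`, `_3` suffix scheme, and skipping repeated originals are choices made for the example.

```python
import re
from typing import Dict, List, Set, Tuple

_VALID = re.compile(r"[a-zA-Z0-9_-]{1,128}")  # used with fullmatch()
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def build_tool_name_maps(names: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (forward, reverse) maps covering only the names that get rewritten."""
    # Pass 1: names that are already valid claim their slot first.
    used: Set[str] = {n for n in names if _VALID.fullmatch(n)}
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    # Pass 2: sanitize the rest, adding _2, _3, ... on collision.
    for name in names:
        if _VALID.fullmatch(name) or name in forward:
            continue  # valid names are untouched; repeated originals map once
        base = _INVALID_CHARS.sub("_", name)[:128] or "_"
        candidate, n = base, 2
        while candidate in used:
            suffix = f"_{n}"
            candidate = base[: 128 - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        forward[name] = candidate
        reverse[candidate] = name
    return forward, reverse
```

Because valid names are registered before any rewriting happens, `["foo/bar", "foo_bar"]` and `["foo_bar", "foo/bar"]` produce the same maps.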

* Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes

sanitize_openapi_tool_name replaces all invalid chars with '_', but when
two operationIds differ only by sanitized characters (e.g. 'foo/list' and
'foo.list' both become 'foo_list'), the second registration silently
overwrites the first in the tool registry.

Add collision disambiguation in register_tools_from_openapi that appends
_2, _3, ... suffixes when a sanitized name is already taken, mirroring
the existing logic in _build_anthropic_tool_name_maps.

* Fix preview endpoint missing collision disambiguation for tool names

Add used_names tracking and _2/_3 suffix disambiguation to
_preview_openapi_tools, matching the logic in register_tools_from_openapi.
Without this, two operationIds that sanitize to the same string (e.g.
'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate
names in the preview while registration would disambiguate them.

* Align preview HTTP method order with register_tools_from_openapi

The preview endpoint and register_tools_from_openapi both use
order-dependent collision disambiguation (_2, _3 suffixes). When the
iteration order differs, two operations on the same path with sanitized
names that collide get different suffixes in preview vs registration,
so the dashboard shows names that don't match what actually got
registered.

Also adds a regression test that fails on the swapped order.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Skip duplicate originals in _build_anthropic_tool_name_maps

If the same invalid tool name appeared twice in original_names (e.g.
['foo/bar', 'foo/bar']), the second occurrence overwrote the forward
map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar
orphaned in 'used' with no reverse mapping. _sanitize_tool_names_in_request
then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on
duplicate tool names.

Skip the rewrite if forward already has the original mapped.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-06 00:00:36 +00:00
Yassin Kortam
dbc8f5a937
helm: skip proxy startup prisma db push when migrations Job is enabled (#27200)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:53 -07:00
Yassin Kortam
618df94433
helm: increase default probe timeouts, disable debug logging by default (#27237)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:34 -07:00
Yassin Kortam
950074eea2
fix: atomic TPM rate limit (#27001)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:07 -07:00
Sameer Kankute
b8635bbc7a
feat(realtime): OpenAI Realtime GA support and beta compatibility (#27110)
* feat(realtime): OpenAI Realtime GA support and beta compatibility

- Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta
  event translation when client sends OpenAI-Beta: realtime=v1
- Default upstream WebSocket without OpenAI-Beta; forward header when client opts in
- Extend OpenAI realtime types for GA event names and conversation item shapes
- Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events
- Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground
- Add unit tests for remap, translation, and beta header helper

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix results

* fix greptile

* Fix mypy issues

* Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES

These frozensets were defined as class-level constants in realtime_streaming.py
but never referenced anywhere in the codebase. Removing dead code.

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* fix(realtime): use GA-shaped session.update in guardrail injections

The guardrail VAD injection code sent a beta-style session.update with a
flat turn_detection field:

  {"session": {"turn_detection": {"create_response": false}}}

When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta
header forwarded), it requires the nested GA shape:

  {"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}}

The _remap_beta_session_to_ga helper was only applied to client-
originated session.update messages in client_ack_messages. Internally
generated session.update messages (sent via _send_to_backend) bypassed
the remap in two paths:
  - _handle_raw_backend_message (raw/no provider_config path, line 518)
  - backend_to_client_send_messages provider_config path (line 481)
As a result, GA upstreams ignored or rejected them, breaking audio
transcription guardrails for all non-beta clients.

Fix: add _make_disable_auto_response_message() helper that always emits
the correct GA-shaped session.update, and replace both injection sites
with it.

Update existing tests to assert the GA nested shape instead of the old
flat beta shape, and add a new unit test for the helper itself.
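
For reference, the two shapes can be compared with a small sketch. Both functions are simplified stand-ins for the helpers named above: the return type (a JSON string), the top-level `"type": "session.update"` envelope, and the remap covering only `turn_detection` are assumptions for illustration.

```python
import json
from typing import Any, Dict


def make_disable_auto_response_message() -> str:
    """Serialize a GA-shaped session.update that turns off auto responses."""
    payload: Dict[str, Any] = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {"input": {"turn_detection": {"create_response": False}}},
        },
    }
    return json.dumps(payload)


def remap_beta_session_to_ga(session: Dict[str, Any]) -> Dict[str, Any]:
    """Move a flat beta `turn_detection` under audio.input (illustrative subset)."""
    if "turn_detection" not in session:
        return session
    ga = {k: v for k, v in session.items() if k != "turn_detection"}
    ga.setdefault("type", "realtime")
    audio = dict(ga.get("audio") or {})
    audio_input = dict(audio.get("input") or {})
    audio_input["turn_detection"] = session["turn_detection"]
    audio["input"] = audio_input
    ga["audio"] = audio
    return ga
```

Emitting the GA shape directly at the injection sites avoids depending on the remap being applied to internally generated messages.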

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* Log realtime session type

* Fix beta realtime session payloads

* Fix realtime audio format remapping edge case

* Fix Azure realtime beta session shape

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-05-05 16:49:20 -07:00
harish-berri
4fec69dd1e
refactor(BaseAWSLLM): implement shared IAM cache and static credentia… (#27125)
* refactor(BaseAWSLLM): implement shared IAM cache and static credential caching

- Introduced a process-wide shared IAM cache to optimize credential management across instances.
- Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached.
- Updated the get_credentials method to utilize the new caching mechanism for static credential flows.
- Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage.

* refactor(BaseAWSLLM): enhance IAM credential caching and update related tests

- Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials.
- Renamed the caching method for clarity and updated comments to reflect the new caching behavior.
- Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries.
- Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions.

* refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests

- Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types.
- Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request.
- Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios.

* Code Readability improvement for aws auth path

* refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests

- Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials.
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached.
- Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests.

* refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption

- Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions.
- Enhanced unit tests to verify that the caching mechanism correctly distinguishes between a role the process is already running as and a new role assumption, ensuring that cached environment credentials are not reused incorrectly.
- Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.
2026-05-05 16:47:47 -07:00
yuneng-jiang
e84282b7b3
[Infra] Bump deps (#27157)
* bump: version 0.4.70 → 0.4.71

* bump: version 0.1.39 → 0.1.40

* uv lock
2026-05-05 15:58:05 -07:00