Commit Graph

39037 Commits

Author SHA1 Message Date
Mateo Wang
3b21441d2b
Merge pull request #26947 from BerriAI/litellm_rateLimitMetricLabels
[fix] fix metric labels for litellm-side rejects
2026-05-04 15:34:02 -07:00
user
a2473ef0c2
chore(caching): remove allow_legacy_unscoped_cache_hits opt-in
The flag was an opt-in escape hatch for the cross-tenant leak the rest
of the patch closes — flipping it on (env var or constructor param)
re-enables exactly the VERIA-54 primitive on either backend. There is
no operational need that the secure path doesn't already meet:

- Qdrant: legacy points without ``litellm_cache_key`` payload are
  excluded by the must-clause filter and treated as misses; new sets
  populate the cache key, so cold-start lasts only as long as the
  natural cache rebuild.
- Redis: existing unscoped index can't carry the new schema; the init
  path falls back to ``{name}_isolated`` (and recreates it on stale
  schema), leaving the legacy index untouched.

Drop the constructor param, env-var fallback, ``_using_legacy_unscoped_index``
flag, the legacy-reuse branch in ``_init_semantic_cache``, and the
matching guards in set/get paths. Update tests to drop the legacy-mode
cases and assert the secure-only behaviour.
2026-05-04 22:16:30 +00:00
user
7d7244986e
chore(caching): annotate qdrant quantization_params dict type
Mypy infers the dict's value type from the first branch
(Dict[str, bool]) which clashes with the scalar branch's mixed-type
inner dict. Explicit Dict[str, Any] annotation lifts the inference.
2026-05-04 22:10:59 +00:00
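A minimal sketch of the annotation fix described in 7d7244986e. The function name `build_quantization_params` and the dictionary contents are illustrative assumptions, not the actual litellm code; only the explicit `Dict[str, Any]` annotation comes from the commit message.

```python
from typing import Any, Dict


def build_quantization_params(quantization_config: str) -> Dict[str, Any]:
    # Hypothetical helper: names and values are illustrative only.
    # Without the explicit annotation, mypy infers the value type from the
    # first assignment (Dict[str, bool]) and then rejects the mixed-type
    # inner dict assigned in the second branch.
    quantization_params: Dict[str, Any]
    if quantization_config == "binary":
        quantization_params = {"binary": {"always_ram": False}}
    else:
        quantization_params = {
            "scalar": {"type": "int8", "quantile": 0.99, "always_ram": False}
        }
    return quantization_params
```

Declaring the variable once before the branches keeps a single annotation in force for every assignment.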
Yassin Kortam
f9ae559c1e
Merge pull request #27022 from BerriAI/litellm_fix/routing-strategy-model-filter
feat: selectively apply routing strategy according to model name
2026-05-04 15:06:22 -07:00
Michael-RZ-Berri
1a17c438b6
Merge pull request #27133 from BerriAI/litellm_zeroBudgetTreatedAsNoCap
[Fix] Treat 0 team_member_budget as no cap
2026-05-04 15:05:18 -07:00
yuneng-jiang
2aa4301fe7
Merge pull request #27027 from stuxf/fix/mcp-server-url-redact-non-admin
fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8)
2026-05-04 15:05:05 -07:00
yuneng-jiang
07807f760f
Merge pull request #27019 from stuxf/codex/cloud-storage-file-guard
fix(files): constrain cloud storage file paths (VERIA-45, VERIA-59)
2026-05-04 14:59:47 -07:00
Mateo Wang
196dbb4b43
Merge pull request #27074 from BerriAI/litellm_fix_reasoning_effort_followup-0f97
fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort
2026-05-04 14:59:27 -07:00
yuneng-jiang
e1083b1353
Merge pull request #27014 from stuxf/fix/activity-endpoint-tenant-scoping
fix(proxy): scope team and agent activity endpoints per-entity (VERIA-43)
2026-05-04 14:51:45 -07:00
yuneng-jiang
6f5678bcd8
Merge pull request #27007 from stuxf/fix/admin-viewer-write-route-blocklist
fix(auth): block missing write routes for proxy admin viewers
2026-05-04 14:45:14 -07:00
Michael Riad Zaky
28bf4647ef
Treat 0 team_member_budget as no cap
2026-05-04 14:45:13 -07:00
user
af7794272b
Add semantic cache legacy migration flag
2026-05-04 14:42:53 -07:00
yuneng-jiang
e995156462
Merge pull request #26953 from stuxf/chore/audit-log-cache-vault-settings
chore(audit): audit-log /cache/settings + /config_overrides/hashicorp_vault mutations
2026-05-04 14:42:23 -07:00
yuneng-jiang
c38e23e6d3
Merge pull request #26915 from stuxf/codex/provider-url-destination-guard
chore(providers): guard URL-valued model destinations
2026-05-04 14:38:37 -07:00
yuneng-jiang
c064170a18
Merge pull request #26819 from stuxf/fix/team-callback-idor
chore(team): require team-management role on /team/{id}/callback endpoints
2026-05-04 14:31:59 -07:00
mateo-berri
cdd777fb21
fix: remove unused import
2026-05-04 14:29:12 -07:00
user
abbefccad4
fix(guardrails): align banned_keywords + azure_content_safety call_type gates with runtime route_type
The hooks gated on ``call_type == "completion"`` but the proxy ingress
passes ``route_type`` straight through as ``call_type`` —
``"acompletion"`` for /v1/chat/completions and ``"aresponses"`` for
/v1/responses. Tests passed because they used the literal sync
``"completion"`` value, masking the gap.

Switch both hooks to ``is_text_content_call_type`` (matches the
canonical runtime values: completion / acompletion / aresponses) and
update existing tests to assert against runtime values, plus parametrize
a regression test that pins the gate.
2026-05-04 21:27:24 +00:00
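The gate change in abbefccad4 can be sketched as follows. The three accepted values come from the commit message; the constant name, the function body, and `old_gate` are assumptions made for illustration and may not match the real helper.

```python
from typing import FrozenSet

# Values listed in the commit message; the constant name is hypothetical.
TEXT_CONTENT_CALL_TYPES: FrozenSet[str] = frozenset(
    {"completion", "acompletion", "aresponses"}
)


def is_text_content_call_type(call_type: str) -> bool:
    """Return True for call types whose payload carries text to inspect."""
    return call_type in TEXT_CONTENT_CALL_TYPES


def old_gate(call_type: str) -> bool:
    # The pre-fix check: matches only the sync literal, so the runtime
    # values "acompletion" and "aresponses" slip past the guardrail.
    return call_type == "completion"
```

Under this sketch, `old_gate("acompletion")` is false while `is_text_content_call_type("acompletion")` is true, which is the gap the commit describes.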
user
9f1feaadeb
Clean up Redis semantic cache isolation fallback
2026-05-04 14:23:31 -07:00
yuneng-jiang
285e103db4
Merge pull request #27126 from stuxf/codex/dependency-refresh-2026-05-04
chore(deps): refresh dependency locks
2026-05-04 14:23:22 -07:00
Cursor Agent
0f04132cf8
refactor(anthropic,bedrock,databricks): factor BadRequestError for unknown reasoning_effort
Three call sites raised the same BadRequestError("Invalid reasoning_effort:
... Must be one of 'minimal', 'low', ...") block when REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT
returned None: anthropic chat map_openai_params, bedrock converse
_handle_reasoning_effort_parameter, and databricks chat reasoning_effort path.

Extract AnthropicConfig._raise_invalid_reasoning_effort(model, value, llm_provider)
so future copy edits / valid-set changes happen in one place. Typed as NoReturn
so type-checkers correctly narrow control flow at call sites.

Addresses Michael's review on PR #27074.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-04 21:16:40 +00:00
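A rough sketch of the `NoReturn` pattern that 0f04132cf8 describes. `BadRequestError` here is a local stand-in class, and `EFFORT_MAP`, `raise_invalid_reasoning_effort`, and `map_effort` are hypothetical names; the real mapping and its valid set may differ.

```python
from typing import Dict, NoReturn, Optional


class BadRequestError(Exception):
    """Stand-in exception so the sketch stays self-contained."""


# Hypothetical mapping; the real table and its valid set may differ.
EFFORT_MAP: Dict[str, str] = {"low": "low", "medium": "medium", "high": "high"}


def raise_invalid_reasoning_effort(
    model: str, value: object, llm_provider: str
) -> NoReturn:
    # Single place to edit the message or the valid set.
    raise BadRequestError(
        f"Invalid reasoning_effort: {value!r} for {llm_provider}/{model}. "
        f"Must be one of {sorted(EFFORT_MAP)}"
    )


def map_effort(model: str, value: str, llm_provider: str) -> str:
    mapped: Optional[str] = EFFORT_MAP.get(value)
    if mapped is None:
        raise_invalid_reasoning_effort(model, value, llm_provider)
    # The helper is typed NoReturn, so type checkers treat `mapped` as str here.
    return mapped
```

Without the `NoReturn` return type, a type checker would still consider `mapped` possibly `None` after the `if` block and flag the `return`.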
Cursor Agent
37ae24dbeb
refactor(anthropic,bedrock): hoist drop_params output_config warning to module constant
Three call sites (anthropic chat, bedrock converse, bedrock invoke messages)
emitted the same '...Effort is only supported on Opus 4.5+, Sonnet 4.6+, and
Mythos Preview' warning verbatim. Extract DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING
in litellm/llms/anthropic/chat/transformation.py and import it from the bedrock
sites so future copy edits live in one place.

Addresses Michael's review on PR #27074.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-04 21:09:08 +00:00
harish-berri
f79d3fae2d
Merge pull request #26902 from BerriAI/litellm_pyroscope_tag_wrapper
feat(proxy): add support for Grafana Cloud Pyroscope authentication
2026-05-04 14:08:17 -07:00
mateo-berri
1ac430cfea
style: make _model_supports_effort_param more concise
2026-05-04 13:54:05 -07:00
Yassin Kortam
516b741de1
feat: selectively apply routing strategy according to model name
2026-05-04 13:27:32 -07:00
user
1dcfc36393
chore(deps): align dashboard node engine
2026-05-04 13:21:03 -07:00
user
0f02a5f8f6
test: keep decode token test local
2026-05-04 13:14:59 -07:00
Cursor Agent
2cb3f0f027
refactor: remove unnecessary comments from #27074
Strip out the explanatory and historical comments that don't carry
business-logic justification. Comments that simply narrate what code
does — or that explain prior behavior, what was changed, or which PR
introduced a fix — are removed. Docstrings are reduced to a one-line
summary where the long form repeated information already evident from
the code or test data.

No code-behavior changes. All 643 affected unit tests still pass.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-04 19:34:56 +00:00
Cursor Agent
56070b86a3
test(model_prices): add supports_adaptive_thinking to schema
`test_aaamodel_prices_and_context_window_json_is_valid` validates the
model-map JSON against an explicit schema with `additionalProperties`,
so the new `supports_adaptive_thinking` flag added in
98ced0ae43 needs a matching schema entry.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-04 19:11:45 +00:00
user
e96d850b84
chore(deps): address dependency review notes
2026-05-04 12:09:04 -07:00
Cursor Agent
98ced0ae43
refactor(anthropic): drive adaptive-thinking gate via supports_adaptive_thinking flag
Three of greptile's open comments on #27074 (P2 converse:512, P1
databricks:361, and the underlying capability-flag policy rule) flagged
the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...)
used inline as a runtime 'is this an adaptive-thinking model?' check.
That requires a code release each time a new adaptive Claude lands.

Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model,
and switch the helper itself to read a new supports_adaptive_thinking
flag from `model_prices_and_context_window.json` via `_supports_factory`,
falling back to the family pattern only when the model-map entry doesn't
carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants
that route through the same code path with non-canonical ids).

Adds `supports_adaptive_thinking: true` to the four 4.6/4.7 anthropic
entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed
and Vertex-prefixed entries don't need the flag because both fall back
through the family pattern (the helper short-circuits early on True from
either path) and the bedrock/vertex Claude IDs all match the existing
opus-4-{6,7} / sonnet-4-{6,7} pattern.

Affected call sites:

- `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter`
- `anthropic/chat/transformation.py:_map_reasoning_effort`
- `anthropic/chat/transformation.py:map_openai_params` (output_config branch)
- `databricks/chat/transformation.py:map_openai_params` (output_config branch)

The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references
in `AnthropicConfig._validate_effort_for_model` and
`AnthropicConfig.get_supported_openai_params` are intentionally retained:
they're per-model gating fallbacks for variants whose model-map entries
don't yet carry the `supports_max_reasoning_effort` /
`supports_reasoning` flag. Those are documented in-place.

Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-04 18:58:22 +00:00
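The flag-first, pattern-fallback lookup from 98ced0ae43 might look roughly like the sketch below. The function signature, the regex, and the plain-dict model map are assumptions; the real helper reads the flag through `_supports_factory`, which is not reproduced here. How an explicit `false` flag should be treated is not stated in the commit, so this sketch simply falls through to the pattern.

```python
import re
from typing import Any, Dict, Mapping

# Family pattern named in the commit message: opus-4-{6,7} / sonnet-4-{6,7}.
_FAMILY_PATTERN = re.compile(r"(opus|sonnet)-4-[67]")


def is_adaptive_thinking_model(
    model: str, model_map: Mapping[str, Dict[str, Any]]
) -> bool:
    # 1) Prefer the data-driven flag, so a new model only needs a JSON entry.
    entry = model_map.get(model, {})
    if entry.get("supports_adaptive_thinking") is True:
        return True
    # 2) Fall back to the family pattern for prefixed or non-canonical ids
    #    that have no flagged entry in the model map.
    return _FAMILY_PATTERN.search(model) is not None
```

With an empty map, `claude-opus-4-6` and a Bedrock-prefixed `claude-sonnet-4-6` id both match through the pattern, while a model outside the pattern is recognized only once its map entry carries the flag.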
user
2d36596e8b
fix: preserve tokenizer decode round trips
2026-05-04 11:47:32 -07:00
user
0c3b4a06cf
chore(deps): authorize pytest license
2026-05-04 11:39:46 -07:00
user
bfdd786962
chore(deps): refresh dependency locks
2026-05-04 11:36:18 -07:00
user
7c94149aeb
Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/cloud-storage-file-guard
# Conflicts:
#	litellm/llms/vertex_ai/files/handler.py
#	litellm/llms/vertex_ai/files/transformation.py
2026-05-04 11:25:41 -07:00
user
a05d3b5851
Fix qdrant semantic cache miss metadata
2026-05-04 11:18:41 -07:00
harish-berri
ed3b056bc8
Implement normalize_nonempty_secret_str function to trim whitespace from secrets and treat empty values as unset. Update proxy_server to use this function for Grafana credentials. Enhance tests to validate the new normalization behavior.
2026-05-04 18:17:31 +00:00
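The normalization described in ed3b056bc8 could be as small as the sketch below. The function name comes from the commit message; the signature and the choice of `None` to represent "unset" are assumptions.

```python
from typing import Optional


def normalize_nonempty_secret_str(value: Optional[str]) -> Optional[str]:
    # Assumed contract: trim surrounding whitespace, and report
    # empty or whitespace-only secrets as unset (None).
    if value is None:
        return None
    stripped = value.strip()
    return stripped or None
```

Treating a whitespace-only credential as unset avoids sending a blank value to the authentication backend when an environment variable is defined but empty.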
Mateo Wang
790d8bbe1a
Merge pull request #26899 from BerriAI/litellm_suppress-spend-log-tracebacks-2208
feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs
2026-05-04 11:09:03 -07:00
Cursor Agent
5d124892d2
refactor(bedrock/converse): delegate effort gating to AnthropicConfig._validate_effort_for_model
Removes the duplicated max/xhigh gating logic in
_validate_anthropic_adaptive_effort and the now-unused
_supports_effort_level_on_bedrock helper. Per-model gating now flows
through the centralized AnthropicConfig._validate_effort_for_model
(whose _supports_effort_level already strips Bedrock prefixes), so the
chat completion, /v1/messages, and Bedrock Converse paths can't drift
when a new gated effort tier is added.
2026-05-04 18:01:30 +00:00
mateo-berri
c1708ddbba
fix(xai): fold reasoning_tokens into completion_tokens to satisfy OpenAI invariant
xAI's chat completions API accounts reasoning_tokens separately from
completion_tokens, but rolls them into total_tokens. This breaks the
OpenAI invariant total_tokens == prompt_tokens + completion_tokens
that downstream consumers (including litellm's own _usage_format_tests
in tests/llm_translation/base_llm_unit_tests.py:58) rely on.

Live capture (grok-3-mini-beta, 2026-05-04):
    prompt=14, completion=10, total=336, reasoning=312
    14 + 10 = 24, NOT 336.

OpenAI's o1/o3 reasoning models include reasoning_tokens in
completion_tokens, leaving the prompt+completion=total invariant
intact. xAI deviates. This patch aligns xAI to OpenAI semantics by
folding reasoning_tokens into completion_tokens after the parent
OpenAI parser runs.

The fold is idempotent and defensive:
- Only fires when total_tokens == prompt_tokens + completion_tokens
  + reasoning_tokens (the documented xAI shape). Refuses to fold if
  the gap doesn't match, guarding against silent corruption when xAI
  changes accounting.
- Skips if completion_tokens already covers the gap (already
  normalised — e.g. cost calc replays a previously-folded Usage).

xai.cost_calculator.cost_per_token already added reasoning_tokens to
the visible completion count for billing. Post-fold the Usage block
now satisfies that invariant directly, so the cost calc would
double-bill. Updated cost_per_token to detect the OpenAI-normalised
shape (total == prompt + completion) and skip the reasoning add-on
in that case, falling through to the legacy raw-shape behaviour for
callers that bypass the transformation (e.g. proxy log replay).

Tests:
- Adds TestXAIReasoningTokenFolding covering: gap-explained-fold,
  idempotent-no-double-fold, no-reasoning-skip, gap-mismatch-skip.
- Adds test_already_normalised_usage_does_not_double_count_reasoning
  to lock the cost-calc idempotency.
- Updates 7 pre-existing cost-calc tests whose total_tokens was
  internally inconsistent (used the OpenAI-normalised total but kept
  reasoning_tokens external) to use the documented xAI raw shape
  total = prompt + visible completion + reasoning. Pre-existing
  values masked the missing-fold by accident.

Verified end-to-end against the live xAI API:
    LITELLM_LOCAL_MODEL_COST_MAP=False (CI default) +
    XAI_API_KEY set +
    pytest tests/llm_translation/test_xai.py::TestXAIChat::test_prompt_caching
        -> PASSED in 18.81s (was: AssertionError on
        usage.total_tokens == usage.prompt_tokens + usage.completion_tokens)

20/20 tests in tests/test_litellm/llms/xai/test_xai_cost_calculator.py
and 8/8 in tests/test_litellm/llms/xai/test_xai_chat_transformation.py
pass.
2026-05-04 10:34:31 -07:00
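The fold rules listed in c1708ddbba can be sketched as below. The `Usage` dataclass and `fold_reasoning_tokens` are simplified, hypothetical stand-ins for the real litellm types and transformation; only the guard conditions and the sample numbers come from the commit message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Usage:
    # Simplified stand-in for the real usage object.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    reasoning_tokens: int = 0


def fold_reasoning_tokens(usage: Usage) -> Usage:
    if usage.reasoning_tokens <= 0:
        return usage  # nothing to fold
    visible = usage.prompt_tokens + usage.completion_tokens
    if usage.total_tokens == visible:
        return usage  # already normalised, so a second call is a no-op
    if usage.total_tokens != visible + usage.reasoning_tokens:
        return usage  # gap is not explained by reasoning tokens: refuse to fold
    return replace(
        usage,
        completion_tokens=usage.completion_tokens + usage.reasoning_tokens,
    )
```

Applied to the live capture quoted in the commit (prompt=14, completion=10, total=336, reasoning=312), the fold yields completion=322, and 14 + 322 equals the reported total of 336.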
harish-berri
a470309c3b
Merge branch 'litellm_internal_staging' of https://github.com/BerriAI/litellm into litellm_pyroscope_tag_wrapper
merge parent
2026-05-04 17:17:49 +00:00
yuneng-jiang
b2c270e653
Merge pull request #27123 from BerriAI/litellm_/intelligent-fermat-298a82
[Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests
2026-05-04 10:08:47 -07:00
Yuneng Jiang
25a5cccc7a
[Fix] Docker: Pin Uv To Multi-Arch Index Digest In Remaining Dockerfiles
Apply the same fix to the three Dockerfiles not in the release pipeline
today (alpine, dev, health_check) so they stay correct if/when they're
built for arm64 in the future.

Wolfi pins are not present in these files; the python:3.11-alpine and
python:3.13-slim digests they already use are multi-arch indexes that
include arm64/v8, so only the uv pin needed swapping.
2026-05-04 10:02:48 -07:00
Michael-RZ-Berri
675e49ed94
Merge pull request #26894 from BerriAI/litellm_langsmithRedactApiInfo
[Fix] Remove unwanted metadata info from LangSmith
2026-05-04 09:55:59 -07:00
Yuneng Jiang
08d130a8fe
[Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests
The previous pins resolved to single-platform amd64 manifests, so buildx
pulled the same amd64 base for both linux/amd64 and linux/arm64 targets.
The published OCI index then advertised an arm64 entry whose layers are
byte-identical to amd64 -- arm64 users got an amd64 binary.

Switch all three Dockerfiles to the multi-arch image-index digests:
  - cgr.dev/chainguard/wolfi-base   (index has linux/amd64 + linux/arm64)
  - ghcr.io/astral-sh/uv:0.11.7     (index has linux/amd64 + linux/arm64)

Resolved with `docker buildx imagetools inspect <ref>` -- that returns
the index digest. `docker pull` + `docker inspect` returns the per-host
platform digest, which is what slipped in last time.
2026-05-04 09:55:53 -07:00
Mateo Wang
196c7a0c09
Merge pull request #27077 from BerriAI/litellm_fix_responses_api_legacy_claude_4_sonnet-9574
test(responses): replace legacy `claude-4-sonnet-20250514` alias in multiturn tool-call test
2026-05-04 09:50:17 -07:00
mateo-berri
f4d6d5953d
test(anthropic/chat): force PR-local model_cost map via autouse fixture
CI runs without LITELLM_LOCAL_MODEL_COST_MAP=True, so litellm.model_cost
is loaded from main-branch JSON (default model_cost_map_url) instead of
the PR's checked-out model_prices_and_context_window.json. Tests that
assert per-model flags added in this PR (supports_max_reasoning_effort,
supports_xhigh_reasoning_effort) therefore pass locally but fail in CI
with 'AssertionError: assert False is True' on 5 cases:

  - test_anthropic_model_supports_effort_param_recognizes_supporting_models
    [anthropic.claude-mythos-preview, bedrock/.../mythos-preview,
     claude-opus-4-5-20251101]
  - test_supports_effort_level_handles_provider_prefixes
    [bedrock/invoke/us.anthropic.claude-sonnet-4-6-max-True,
     claude-sonnet-4-6-max-True]

Add an autouse fixture at tests/test_litellm/llms/anthropic/chat/conftest.py
that monkey-patches litellm.model_cost to the PR-local JSON for every test
in this directory. The parent conftest already snapshots+restores
litellm.model_cost per-function, so the mutation is contained.

This is a scoped workaround. The proper fix is to set the env var
globally in the test workflow once the ~10 inline self-set test files
are audited; tracking that as a follow-up issue.
2026-05-04 09:26:57 -07:00
Sameer Kankute
c53c71ad66
test(image_gen): align Azure image gen fixture with body omitting model
Expected JSON matches deployment-scoped Azure POST (#26316).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 18:09:30 +05:30
Sameer Kankute
32a5e77adf
feat(proxy): add health_check_reasoning_effort for model health checks
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 17:13:01 +05:30
mateo-berri
2c9166c4f3
fix(databricks): narrow reasoning_effort_value to str for mypy
`non_default_params.get("reasoning_effort")` returns `Any | None`,
but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`.
Mypy flagged this on the strict pass. Narrow with `isinstance` before
the lookup; non-strings fall through to the existing `BadRequestError`
below with a clean validation message, so behavior is unchanged.

Fixes a regression introduced by 1a10746e95 in this PR.
2026-05-04 00:52:09 -07:00
mateo-berri
f8f07c5cb7
refactor(anthropic): extract _validate_effort_for_model to prevent drift
The chat completion path (`_apply_output_config`) and the /v1/messages
pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`)
both gate `max` / `xhigh` per model. The two sites had diverged from
near-identical copies into separately maintained blocks, creating a real
drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor
could update one site and miss the other.

Centralise the gating in `AnthropicConfig._validate_effort_for_model`,
which returns an error message string or `None`. Each call site keeps
its own provider-appropriate exception type (`BadRequestError` for the
chat path, `AnthropicError` for the /v1/messages pass-through) but the
gating decision now comes from one place. Net -11 LOC.

Adds a parametrised unit test exercising the helper directly across
4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort
inputs. Existing tests at both call sites continue to pass unchanged.

Addresses Greptile finding on PR #27074.
2026-05-04 00:47:55 -07:00
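The shape of the refactor in f8f07c5cb7 (one shared validator returning a message or `None`, with each call site raising its own exception type) can be sketched as follows. The gating table, model names, and both exception classes are hypothetical placeholders, not the real litellm definitions or the real per-model support matrix.

```python
from typing import Dict, Optional, Tuple


class BadRequestError(Exception):
    """Stand-in for the chat-path exception type."""


class AnthropicError(Exception):
    """Stand-in for the /v1/messages pass-through exception type."""


# Hypothetical gating table: effort level -> model-name fragments that allow it.
GATED_EFFORTS: Dict[str, Tuple[str, ...]] = {
    "max": ("example-model-b",),
    "xhigh": ("example-model-b",),
}


def validate_effort_for_model(model: str, effort: str) -> Optional[str]:
    allowed = GATED_EFFORTS.get(effort)
    if allowed is None:
        return None  # ungated effort level: always accepted
    if any(fragment in model for fragment in allowed):
        return None
    return f"effort={effort!r} is not supported on model {model!r}"


def chat_path(model: str, effort: str) -> str:
    error = validate_effort_for_model(model, effort)
    if error is not None:
        raise BadRequestError(error)
    return effort


def messages_path(model: str, effort: str) -> str:
    error = validate_effort_for_model(model, effort)
    if error is not None:
        raise AnthropicError(error)
    return effort
```

Returning a message instead of raising keeps the gating decision in one function while leaving the choice of exception type to each caller.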