litellm

Author	SHA1	Message	Date
Mateo Wang	3b21441d2b	Merge pull request #26947 from BerriAI/litellm_rateLimitMetricLabels [fix] fix metric labels for litellm-side rejects	2026-05-04 15:34:02 -07:00
user	a2473ef0c2	chore(caching): remove allow_legacy_unscoped_cache_hits opt-in The flag was an opt-in escape hatch for the cross-tenant leak the rest of the patch closes — flipping it on (env var or constructor param) re-enables exactly the VERIA-54 primitive on either backend. There is no operational need that the secure path doesn't already meet: - Qdrant: legacy points without ``litellm_cache_key`` payload are excluded by the must-clause filter and treated as misses; new sets populate the cache key, so cold-start lasts only as long as the natural cache rebuild. - Redis: existing unscoped index can't carry the new schema; the init path falls back to ``{name}_isolated`` (and recreates it on stale schema), leaving the legacy index untouched. Drop the constructor param, env-var fallback, ``_using_legacy_unscoped_index`` flag, the legacy-reuse branch in ``_init_semantic_cache``, and the matching guards in set/get paths. Update tests to drop the legacy-mode cases and assert the secure-only behaviour.	2026-05-04 22:16:30 +00:00
user	7d7244986e	chore(caching): annotate qdrant quantization_params dict type Mypy infers the dict's value type from the first branch (Dict[str, bool]) which clashes with the scalar branch's mixed-type inner dict. Explicit Dict[str, Any] annotation lifts the inference.	2026-05-04 22:10:59 +00:00
Yassin Kortam	f9ae559c1e	Merge pull request #27022 from BerriAI/litellm_fix/routing-strategy-model-filter feat: selectively apply routing strategy according to model name	2026-05-04 15:06:22 -07:00
Michael-RZ-Berri	1a17c438b6	Merge pull request #27133 from BerriAI/litellm_zeroBudgetTreatedAsNoCap [Fix] Treat 0 team_member_budget as no cap	2026-05-04 15:05:18 -07:00
yuneng-jiang	2aa4301fe7	Merge pull request #27027 from stuxf/fix/mcp-server-url-redact-non-admin fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8)	2026-05-04 15:05:05 -07:00
yuneng-jiang	07807f760f	Merge pull request #27019 from stuxf/codex/cloud-storage-file-guard fix(files): constrain cloud storage file paths (VERIA-45, VERIA-59)	2026-05-04 14:59:47 -07:00
Mateo Wang	196dbb4b43	Merge pull request #27074 from BerriAI/litellm_fix_reasoning_effort_followup-0f97 fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort	2026-05-04 14:59:27 -07:00
yuneng-jiang	e1083b1353	Merge pull request #27014 from stuxf/fix/activity-endpoint-tenant-scoping fix(proxy): scope team and agent activity endpoints per-entity (VERIA-43)	2026-05-04 14:51:45 -07:00
yuneng-jiang	6f5678bcd8	Merge pull request #27007 from stuxf/fix/admin-viewer-write-route-blocklist fix(auth): block missing write routes for proxy admin viewers	2026-05-04 14:45:14 -07:00
Michael Riad Zaky	28bf4647ef	Treat 0 team_member_budget as no cap	2026-05-04 14:45:13 -07:00
user	af7794272b	Add semantic cache legacy migration flag	2026-05-04 14:42:53 -07:00
yuneng-jiang	e995156462	Merge pull request #26953 from stuxf/chore/audit-log-cache-vault-settings chore(audit): audit-log /cache/settings + /config_overrides/hashicorp_vault mutations	2026-05-04 14:42:23 -07:00
yuneng-jiang	c38e23e6d3	Merge pull request #26915 from stuxf/codex/provider-url-destination-guard chore(providers): guard URL-valued model destinations	2026-05-04 14:38:37 -07:00
yuneng-jiang	c064170a18	Merge pull request #26819 from stuxf/fix/team-callback-idor chore(team): require team-management role on /team/{id}/callback endpoints	2026-05-04 14:31:59 -07:00
mateo-berri	cdd777fb21	fix: remove unused import	2026-05-04 14:29:12 -07:00
user	abbefccad4	fix(guardrails): align banned_keywords + azure_content_safety call_type gates with runtime route_type The hooks gated on ``call_type == "completion"`` but the proxy ingress passes ``route_type`` straight through as ``call_type`` — ``"acompletion"`` for /v1/chat/completions and ``"aresponses"`` for /v1/responses. Tests passed because they used the literal sync ``"completion"`` value, masking the gap. Switch both hooks to ``is_text_content_call_type`` (matches the canonical runtime values: completion / acompletion / aresponses) and update existing tests to assert against runtime values, plus parametrize a regression test that pins the gate.	2026-05-04 21:27:24 +00:00
user	9f1feaadeb	Clean up Redis semantic cache isolation fallback	2026-05-04 14:23:31 -07:00
yuneng-jiang	285e103db4	Merge pull request #27126 from stuxf/codex/dependency-refresh-2026-05-04 chore(deps): refresh dependency locks	2026-05-04 14:23:22 -07:00
Cursor Agent	0f04132cf8	refactor(anthropic,bedrock,databricks): factor BadRequestError for unknown reasoning_effort Three call sites raised the same BadRequestError("Invalid reasoning_effort: ... Must be one of 'minimal', 'low', ...") block when REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT returned None: anthropic chat map_openai_params, bedrock converse _handle_reasoning_effort_parameter, and databricks chat reasoning_effort path. Extract AnthropicConfig._raise_invalid_reasoning_effort(model, value, llm_provider) so future copy edits / valid-set changes happen in one place. Typed as NoReturn so type-checkers correctly narrow control flow at call sites. Addresses Michael's review on PR #27074. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-04 21:16:40 +00:00
Cursor Agent	37ae24dbeb	refactor(anthropic,bedrock): hoist drop_params output_config warning to module constant Three call sites (anthropic chat, bedrock converse, bedrock invoke messages) emitted the same '...Effort is only supported on Opus 4.5+, Sonnet 4.6+, and Mythos Preview' warning verbatim. Extract DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING in litellm/llms/anthropic/chat/transformation.py and import it from the bedrock sites so future copy edits live in one place. Addresses Michael's review on PR #27074. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-04 21:09:08 +00:00
harish-berri	f79d3fae2d	Merge pull request #26902 from BerriAI/litellm_pyroscope_tag_wrapper feat(proxy): add support for Grafana Cloud Pyroscope authentication	2026-05-04 14:08:17 -07:00
mateo-berri	1ac430cfea	style: make _model_supports_effort_param more concise	2026-05-04 13:54:05 -07:00
Yassin Kortam	516b741de1	feat: selectively apply routing strategy according to model name	2026-05-04 13:27:32 -07:00
user	1dcfc36393	chore(deps): align dashboard node engine	2026-05-04 13:21:03 -07:00
user	0f02a5f8f6	test: keep decode token test local	2026-05-04 13:14:59 -07:00
Cursor Agent	2cb3f0f027	refactor: remove unnecessary comments from #27074 Strip out the explanatory and historical comments that don't carry business-logic justification. Comments that simply narrate what code does — or that explain prior behavior, what was changed, or which PR introduced a fix — are removed. Docstrings are reduced to a one-line summary where the long form repeated information already evident from the code or test data. No code-behavior changes. All 643 affected unit tests still pass. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-04 19:34:56 +00:00
Cursor Agent	56070b86a3	test(model_prices): add supports_adaptive_thinking to schema `test_aaamodel_prices_and_context_window_json_is_valid` validates the model-map JSON against an explicit schema with `additionalProperties`, so the new `supports_adaptive_thinking` flag added in `98ced0ae43` needs a matching schema entry. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-04 19:11:45 +00:00
user	e96d850b84	chore(deps): address dependency review notes	2026-05-04 12:09:04 -07:00
Cursor Agent	98ced0ae43	refactor(anthropic): drive adaptive-thinking gate via supports_adaptive_thinking flag Three of greptile's open comments on #27074 (P2 converse:512, P1 databricks:361, and the underlying capability-flag policy rule) flagged the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...) used inline as a runtime 'is this an adaptive-thinking model?' check. That requires a code release each time a new adaptive Claude lands. Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model, and switch the helper itself to read a new supports_adaptive_thinking flag from `model_prices_and_context_window.json` via `_supports_factory`, falling back to the family pattern only when the model-map entry doesn't carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants that route through the same code path with non-canonical ids). Adds `supports_adaptive_thinking: true` to the four 4.6/4.7 anthropic entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed and Vertex-prefixed entries don't need the flag because both fall back through the family pattern (the helper short-circuits early on True from either path) and the bedrock/vertex Claude IDs all match the existing opus-4-{6,7} / sonnet-4-{6,7} pattern. Affected call sites: - `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter` - `anthropic/chat/transformation.py:_map_reasoning_effort` - `anthropic/chat/transformation.py:map_openai_params` (output_config branch) - `databricks/chat/transformation.py:map_openai_params` (output_config branch) The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references in `AnthropicConfig._validate_effort_for_model` and `AnthropicConfig.get_supported_openai_params` are intentionally retained: they're per-model gating fallbacks for variants whose model-map entries don't yet carry the `supports_max_reasoning_effort` / `supports_reasoning` flag. Those are documented in-place. Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-04 18:58:22 +00:00
user	2d36596e8b	fix: preserve tokenizer decode round trips	2026-05-04 11:47:32 -07:00
user	0c3b4a06cf	chore(deps): authorize pytest license	2026-05-04 11:39:46 -07:00
user	bfdd786962	chore(deps): refresh dependency locks	2026-05-04 11:36:18 -07:00
user	7c94149aeb	Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/cloud-storage-file-guard # Conflicts: # litellm/llms/vertex_ai/files/handler.py # litellm/llms/vertex_ai/files/transformation.py	2026-05-04 11:25:41 -07:00
user	a05d3b5851	Fix qdrant semantic cache miss metadata	2026-05-04 11:18:41 -07:00
harish-berri	ed3b056bc8	Implement normalize_nonempty_secret_str function to trim whitespace from secrets and treat empty values as unset. Update proxy_server to use this function for Grafana credentials. Enhance tests to validate the new normalization behavior.	2026-05-04 18:17:31 +00:00
Mateo Wang	790d8bbe1a	Merge pull request #26899 from BerriAI/litellm_suppress-spend-log-tracebacks-2208 feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs	2026-05-04 11:09:03 -07:00
Cursor Agent	5d124892d2	refactor(bedrock/converse): delegate effort gating to AnthropicConfig._validate_effort_for_model Removes the duplicated max/xhigh gating logic in _validate_anthropic_adaptive_effort and the now-unused _supports_effort_level_on_bedrock helper. Per-model gating now flows through the centralized AnthropicConfig._validate_effort_for_model (whose _supports_effort_level already strips Bedrock prefixes), so the chat completion, /v1/messages, and Bedrock Converse paths can't drift when a new gated effort tier is added.	2026-05-04 18:01:30 +00:00
mateo-berri	c1708ddbba	fix(xai): fold reasoning_tokens into completion_tokens to satisfy OpenAI invariant xAI's chat completions API accounts reasoning_tokens separately from completion_tokens, but rolls them into total_tokens. This breaks the OpenAI invariant total_tokens == prompt_tokens + completion_tokens that downstream consumers (including litellm's own _usage_format_tests in tests/llm_translation/base_llm_unit_tests.py:58) rely on. Live capture (grok-3-mini-beta, 2026-05-04): prompt=14, completion=10, total=336, reasoning=312 14 + 10 = 24, NOT 336. OpenAI's o1/o3 reasoning models include reasoning_tokens in completion_tokens, leaving the prompt+completion=total invariant intact. xAI deviates. This patch aligns xAI to OpenAI semantics by folding reasoning_tokens into completion_tokens after the parent OpenAI parser runs. The fold is idempotent and defensive: - Only fires when total_tokens == prompt_tokens + completion_tokens + reasoning_tokens (the documented xAI shape). Refuses to fold if the gap doesn't match, guarding against silent corruption when xAI changes accounting. - Skips if completion_tokens already covers the gap (already normalised — e.g. cost calc replays a previously-folded Usage). xai.cost_calculator.cost_per_token already added reasoning_tokens to the visible completion count for billing. Post-fold the Usage block now satisfies that invariant directly, so the cost calc would double-bill. Updated cost_per_token to detect the OpenAI-normalised shape (total == prompt + completion) and skip the reasoning add-on in that case, falling through to the legacy raw-shape behaviour for callers that bypass the transformation (e.g. proxy log replay). Tests: - Adds TestXAIReasoningTokenFolding covering: gap-explained-fold, idempotent-no-double-fold, no-reasoning-skip, gap-mismatch-skip. - Adds test_already_normalised_usage_does_not_double_count_reasoning to lock the cost-calc idempotency. - Updates 7 pre-existing cost-calc tests whose total_tokens was internally inconsistent (used the OpenAI-normalised total but kept reasoning_tokens external) to use the documented xAI raw shape total = prompt + visible completion + reasoning. Pre-existing values masked the missing-fold by accident. Verified end-to-end against the live xAI API: LITELLM_LOCAL_MODEL_COST_MAP=False (CI default) + XAI_API_KEY set + pytest tests/llm_translation/test_xai.py::TestXAIChat::test_prompt_caching -> PASSED in 18.81s (was: AssertionError on usage.total_tokens == usage.prompt_tokens + usage.completion_tokens) 20/20 tests in tests/test_litellm/llms/xai/test_xai_cost_calculator.py and 8/8 in tests/test_litellm/llms/xai/test_xai_chat_transformation.py pass.	2026-05-04 10:34:31 -07:00
harish-berri	a470309c3b	Merge branch 'litellm_internal_staging' of https://github.com/BerriAI/litellm into litellm_pyroscope_tag_wrapper merge parent	2026-05-04 17:17:49 +00:00
yuneng-jiang	b2c270e653	Merge pull request #27123 from BerriAI/litellm_/intelligent-fermat-298a82 [Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests	2026-05-04 10:08:47 -07:00
Yuneng Jiang	25a5cccc7a	[Fix] Docker: Pin Uv To Multi-Arch Index Digest In Remaining Dockerfiles Apply the same fix to the three Dockerfiles not in the release pipeline today (alpine, dev, health_check) so they stay correct if/when they're built for arm64 in the future. Wolfi pins are not present in these files; the python:3.11-alpine and python:3.13-slim digests they already use are multi-arch indexes that include arm64/v8, so only the uv pin needed swapping.	2026-05-04 10:02:48 -07:00
Michael-RZ-Berri	675e49ed94	Merge pull request #26894 from BerriAI/litellm_langsmithRedactApiInfo [Fix] Remove unwanted metadata info from LangSmith	2026-05-04 09:55:59 -07:00
Yuneng Jiang	08d130a8fe	[Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests The previous pins resolved to single-platform amd64 manifests, so buildx pulled the same amd64 base for both linux/amd64 and linux/arm64 targets. The published OCI index then advertised an arm64 entry whose layers are byte-identical to amd64 -- arm64 users got an amd64 binary. Switch all three Dockerfiles to the multi-arch image-index digests: - cgr.dev/chainguard/wolfi-base (index has linux/amd64 + linux/arm64) - ghcr.io/astral-sh/uv:0.11.7 (index has linux/amd64 + linux/arm64) Resolved with `docker buildx imagetools inspect <ref>` -- that returns the index digest. `docker pull` + `docker inspect` returns the per-host platform digest, which is what slipped in last time.	2026-05-04 09:55:53 -07:00
Mateo Wang	196c7a0c09	Merge pull request #27077 from BerriAI/litellm_fix_responses_api_legacy_claude_4_sonnet-9574 test(responses): replace legacy `claude-4-sonnet-20250514` alias in multiturn tool-call test	2026-05-04 09:50:17 -07:00
mateo-berri	f4d6d5953d	test(anthropic/chat): force PR-local model_cost map via autouse fixture CI runs without LITELLM_LOCAL_MODEL_COST_MAP=True, so litellm.model_cost is loaded from main-branch JSON (default model_cost_map_url) instead of the PR's checked-out model_prices_and_context_window.json. Tests that assert per-model flags added in this PR (supports_max_reasoning_effort, supports_xhigh_reasoning_effort) therefore pass locally but fail in CI with 'AssertionError: assert False is True' on 5 cases: - test_anthropic_model_supports_effort_param_recognizes_supporting_models [anthropic.claude-mythos-preview, bedrock/.../mythos-preview, claude-opus-4-5-20251101] - test_supports_effort_level_handles_provider_prefixes [bedrock/invoke/us.anthropic.claude-sonnet-4-6-max-True, claude-sonnet-4-6-max-True] Add an autouse fixture at tests/test_litellm/llms/anthropic/chat/conftest.py that monkey-patches litellm.model_cost to the PR-local JSON for every test in this directory. The parent conftest already snapshots+restores litellm.model_cost per-function, so the mutation is contained. This is a scoped workaround. The proper fix is to set the env var globally in the test workflow once the ~10 inline self-set test files are audited; tracking that as a follow-up issue.	2026-05-04 09:26:57 -07:00
Sameer Kankute	c53c71ad66	test(image_gen): align Azure image gen fixture with body omitting model Expected JSON matches deployment-scoped Azure POST (#26316). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 18:09:30 +05:30
Sameer Kankute	32a5e77adf	feat(proxy): add health_check_reasoning_effort for model health checks Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 17:13:01 +05:30
mateo-berri	2c9166c4f3	fix(databricks): narrow reasoning_effort_value to str for mypy `non_default_params.get("reasoning_effort")` returns `Any | None`, but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`. Mypy flagged this on the strict pass. Narrow with `isinstance` before the lookup; non-strings fall through to the existing `BadRequestError` below with a clean validation message, so behavior is unchanged. Fixes a regression introduced by `1a10746e95` in this PR.	2026-05-04 00:52:09 -07:00
mateo-berri	f8f07c5cb7	refactor(anthropic): extract _validate_effort_for_model to prevent drift The chat completion path (`_apply_output_config`) and the /v1/messages pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`) both gate `max` / `xhigh` per model. The two sites had diverged from near-identical copies into separately maintained blocks, creating a real drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor could update one site and miss the other. Centralise the gating in `AnthropicConfig._validate_effort_for_model`, which returns an error message string or `None`. Each call site keeps its own provider-appropriate exception type (`BadRequestError` for the chat path, `AnthropicError` for the /v1/messages pass-through) but the gating decision now comes from one place. Net -11 LOC. Adds a parametrised unit test exercising the helper directly across 4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort inputs. Existing tests at both call sites continue to pass unchanged. Addresses Greptile finding on PR #27074.	2026-05-04 00:47:55 -07:00
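Commit c1708ddbba describes the xAI usage fold and the matching cost-calculator guard only in prose. The following is a minimal Python sketch of that decision logic, using the live-capture numbers quoted in the commit. It is not litellm's code: the `Usage` dataclass and the helper names `fold_reasoning_tokens` and `billable_completion_tokens` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Usage:
    # Hypothetical, simplified usage record; litellm's real Usage type has more fields.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    reasoning_tokens: int = 0


def fold_reasoning_tokens(usage: Usage) -> Usage:
    """Fold reasoning_tokens into completion_tokens when the raw xAI shape is detected."""
    if usage.reasoning_tokens <= 0:
        return usage  # no reasoning tokens: nothing to fold
    visible = usage.prompt_tokens + usage.completion_tokens
    if usage.total_tokens == visible:
        return usage  # already OpenAI-normalised: folding again would double count
    if usage.total_tokens != visible + usage.reasoning_tokens:
        return usage  # gap does not match the documented xAI shape: leave untouched
    return replace(usage, completion_tokens=usage.completion_tokens + usage.reasoning_tokens)


def billable_completion_tokens(usage: Usage) -> int:
    """Count completion tokens for billing without double-counting reasoning."""
    if usage.total_tokens == usage.prompt_tokens + usage.completion_tokens:
        return usage.completion_tokens  # normalised shape: reasoning already included
    return usage.completion_tokens + usage.reasoning_tokens  # raw xAI shape
```

With prompt=14, completion=10, total=336 and reasoning=312, the fold yields completion=322, so 14 + 322 = 336 and the invariant holds. Running the fold a second time leaves the record unchanged. The billing helper reports 322 completion tokens for both the raw and the folded shape, which is the double-billing guard the commit describes.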

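Commit ed3b056bc8 adds `normalize_nonempty_secret_str`, which trims whitespace from secrets and treats empty values as unset. The log does not show its signature, so the sketch below is an illustrative reconstruction that assumes an `Optional[str]` input and output. It is not copied from litellm.

```python
from typing import Optional


def normalize_nonempty_secret_str(value: Optional[str]) -> Optional[str]:
    # Illustrative reconstruction of the behaviour described in ed3b056bc8
    # (assumed signature): strip surrounding whitespace and map blank strings to None.
    if value is None:
        return None
    stripped = value.strip()
    return stripped if stripped else None
```

Under this sketch, a credential read from the environment as `"  token\n"` becomes `"token"`. A value of `""` or `"   "` becomes `None`, so a blank Grafana credential is treated the same as one that was never set.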
... 2 3 4 5 6 ...

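Commits f8f07c5cb7, 0f04132cf8 and 2c9166c4f3 describe a common pattern. One helper makes the effort-gating decision and returns an error message or `None`. Each call site raises its own exception type. A `NoReturn`-typed raise helper lets type checkers narrow a value to `str` after an `isinstance` check. The Python sketch below shows only that pattern. The capability table uses placeholder model names, and every function and exception name is hypothetical rather than litellm's API.

```python
from typing import Dict, NoReturn, Optional, Set

# Hypothetical capability table with placeholder model names. The commits describe
# litellm reading per-model flags from model_prices_and_context_window.json instead.
GATED_EFFORT_SUPPORT: Dict[str, Set[str]] = {
    "max": {"example-model-a"},
    "xhigh": {"example-model-a", "example-model-b"},
}


class ChatBadRequestError(ValueError):
    """Stand-in for the chat-completion path's exception type."""


class MessagesPassThroughError(ValueError):
    """Stand-in for the /v1/messages pass-through exception type."""


def validate_effort_for_model(model: str, effort: str) -> Optional[str]:
    """Single gating decision: an error message, or None if the effort is allowed."""
    supported = GATED_EFFORT_SUPPORT.get(effort)
    if supported is None or model in supported:
        return None  # effort tier is not gated, or this model supports it
    return f"reasoning_effort={effort!r} is not supported for model {model!r}"


def raise_invalid_reasoning_effort(model: str, value: object) -> NoReturn:
    # Typed NoReturn so type checkers treat code after a call as unreachable.
    raise ChatBadRequestError(f"Invalid reasoning_effort {value!r} for model {model!r}")


def chat_path(model: str, effort: object) -> str:
    if not isinstance(effort, str):
        raise_invalid_reasoning_effort(model, effort)
    # Past this point a type checker narrows `effort` to str.
    error = validate_effort_for_model(model, effort)
    if error is not None:
        raise ChatBadRequestError(error)
    return effort


def messages_path(model: str, effort: str) -> str:
    error = validate_effort_for_model(model, effort)
    if error is not None:
        raise MessagesPassThroughError(error)
    return effort
```

Because both paths call the same `validate_effort_for_model`, adding a new gated tier means editing one table and one function. This is the drift risk the refactor targets. Each path still surfaces its own exception type.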
39037 Commits