Commit Graph

417 Commits

Author SHA1 Message Date
yuneng-jiang
1480ec698b
chore(ci): bump versions (#28287)
* bump: version 0.4.72 → 0.4.73

* bump: version 1.86.0 → 1.87.0

* uv lock
2026-05-19 15:10:37 -07:00
Sameer Kankute
36c494fdd2
Litellm oss staging (#28161)
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)

Squash-merged by litellm-agent from Anai-Guo's PR.

* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)

Squash-merged by litellm-agent from yimao's PR.

* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)

Squash-merged by litellm-agent from krisxia0506's PR.

* Fix Gemini MIME detection for extensionless GCS URIs (#27278)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)

Squash-merged by litellm-agent from voidborne-d's PR.

* feat(chart): add support for autoscaling behavior in HPA (#27990)

Squash-merged by litellm-agent from FabrizioCafolla's PR.

* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)

Squash-merged by litellm-agent from Cyberfilo's PR.

* fix: pass socket timeouts to Redis cluster clients (#27920)

Squash-merged by litellm-agent from tomdee's PR.

* Fix/cache token (#28009)

Squash-merged by litellm-agent from escon1004's PR.

* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)

Squash-merged by litellm-agent from Divyansh8321's PR.

* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)

* fix: reset org and tag budgets (#27326)

* reset org budgets

* reset tag budgets

---------

Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>

* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)

* fix(ui): omit allowed_routes from key edit save when unchanged

When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).

Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.

* fix(ui): order-insensitive allowed_routes diff + cover null-original case

Address Greptile review:

- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
  a server-side reorder of the array doesn't register as a user edit and
  re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
  is untouched — patch should strip the field; (2) server returned routes in
  a different order than the user originally entered — patch should still
  recognize the value as unchanged.

* chore(ui): strip ticket refs and tighten comments in key edit fix

- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case

* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc

* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests

GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute.  When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.

Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
  so downstream loggers record 'guardrail_intervened' instead of
  'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion

Fixes #24348
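
The fallback described in this message can be illustrated with a minimal sketch. The class `GuardrailBlocked` and the helper `resolve_status` are hypothetical stand-ins, not LiteLLM's actual classes; only the `getattr(e, 'status_code', 500)` pattern is taken from the message.

```python
class GuardrailBlocked(Exception):
    """Hypothetical stand-in for a guardrail exception (not LiteLLM's class)."""

    def __init__(self, message: str, *, status_code: int = 400) -> None:
        super().__init__(message)
        # Keyword-only, defaulting to 400 so an intentional block reads as a client error.
        self.status_code = status_code


def resolve_status(exc: Exception) -> int:
    # Same fallback pattern the commit message quotes: exceptions without
    # a status_code attribute are reported as 500.
    return getattr(exc, "status_code", 500)
```

An exception without the attribute still falls back to 500, so only intentional blocks change status.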

---------

Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>

* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161

- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
  when a specifically-addressed deployment is administratively blocked; 429 misleads
  retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
  model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
  reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address bug detection findings (cache token order, mutable defaults)

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests

- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
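
The difference between detecting a usage shape by truthiness and by attribute presence can be sketched as follows. The field name `cache_read_input_tokens` and both helper functions are assumptions made for illustration; the real `cost_calculator` logic is not shown in this log.

```python
from types import SimpleNamespace


def looks_anthropic_by_truthiness(usage) -> bool:
    # Buggy variant: a legitimate value of 0 is falsy, so detection fails.
    return bool(getattr(usage, "cache_read_input_tokens", None))


def looks_anthropic_by_presence(usage) -> bool:
    # Presence check: the attribute exists, even when its value is 0.
    return hasattr(usage, "cache_read_input_tokens")


anthropic_usage = SimpleNamespace(cache_read_input_tokens=0)
openai_usage = SimpleNamespace(prompt_tokens=10)
```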

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix code qa

* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType

GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(vertex_ai/gemini): clarify mime-type error message string concatenation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Vincent <yimao1231@gmail.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: d 🔹 <liusway405@gmail.com>
Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Tom Denham <tom@tomdee.co.uk>
Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com>
Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com>
Co-authored-by: robin-fiddler <robin@fiddler.ai>
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
2026-05-18 16:27:44 -07:00
Yuneng Jiang
0aa439d919
bump: version 0.4.71 → 0.4.72
2026-05-13 21:51:11 -07:00
Krrish Dholakia
8bbc61e03c
fix: harden /key/update authorization checks (#27878)
* fix: patch Host-header auth bypass in get_request_route

Starlette reconstructs request.url from the Host header. A malformed
Host like `localhost/?x=1` causes Starlette to build the full URL as
`http://localhost/?x=1/health`, which url-parses to path="/". Since "/"
is in LiteLLMRoutes.public_routes, all protected routes became reachable
without authentication.

Fix: read scope["path"] (set by uvicorn from the HTTP request line,
not derivable from headers) instead of request.url.path. Sub-path
deployments are handled via scope["app_root_path"] / scope["root_path"],
mirroring Starlette's own base_url construction logic.

Affected variants confirmed fixed:
  Host: localhost/?x=1
  Host: localhost:4000/?x=1
  Host: localhost/#test
  Host: localhost:4000/#test
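
The parsing behavior behind this bypass can be reproduced with the standard library alone. This is a simplified model of rebuilding a URL from the Host header, not Starlette's actual code; `path_from_rebuilt_url` and `path_from_scope` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def path_from_rebuilt_url(host_header: str, request_path: str) -> str:
    # Naive reconstruction: scheme + Host header + request path, then re-parse.
    return urlsplit(f"http://{host_header}{request_path}").path


def path_from_scope(scope: dict) -> str:
    # Reads the path taken from the HTTP request line, independent of headers.
    return str(scope.get("path", ""))
```

With a malformed Host value, the `?` or `#` pushes the real path into the query or fragment, so the re-parsed path collapses to `/`.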

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* style: reduce comments in route fix

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block credential fields in RAG ingest vector_store options

Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.)
in ingest_options.vector_store are now rejected at the API boundary with
a 400 error. Credentials must be configured server-side.

Previously any authenticated user could supply a vertex_credentials dict
with type=external_account pointing credential_source.file at an
arbitrary path (e.g. /proc/1/environ) and token_url at an
attacker-controlled server. google-auth's identity_pool.Credentials
refresh() would read the file and POST its contents to the attacker.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block /key/update self-escalation by assigned users

Non-admin users who were assigned a key (created_by != caller) could
update any non-budget field — models, rpm_limit, guardrails, etc. —
without admin authorization, allowing privilege self-escalation.

Gate: only the key creator (created_by == caller) may edit their own
key without admin check; budget changes always require admin regardless
of creator status. All other callers must pass _check_key_admin_access.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block user-controlled api_base in RAG ingest vector_store options

A user-supplied api_base in ingest_options.vector_store caused the server
to forward its configured provider credentials (Gemini, OpenAI) to an
attacker-controlled endpoint via SSRF.

Add api_base to the blocked credential params set alongside api_key and
the existing credential fields.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check

Any authenticated internal_user could POST arbitrary provider config
(aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have
the server forward its credentials to an attacker-controlled endpoint.

- Gate the endpoint on PROXY_ADMIN role (403 for all other roles)
- Call is_request_body_safe() to reject banned params even for admins
- Convert ValueError from safety check to HTTP 400

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: apply banned-param check to /utils/transform_request

Without is_request_body_safe(), any authenticated user could pass
aws_sts_endpoint, api_base, or aws_web_identity_token to
/utils/transform_request and have the server forward its configured
provider credentials to an attacker-controlled endpoint during SDK
credential resolution.

Applies the same banned-param blocklist already used by LLM endpoints.
Endpoint remains accessible to all authenticated users.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter

Any frontmatter key not in ["model","input","output"] flowed into
optional_params and was merged into the LLM call data dict, bypassing
is_request_body_safe. An attacker with any bearer key could set
api_base in YAML to redirect the outbound LLM request — including the
provider API key — to an attacker-controlled host.

Fix: call is_request_body_safe on the constructed data dict after
optional_params are merged, before invoking ProxyBaseLLMRequestProcessing.
ValueError from the banned-param check is surfaced as HTTP 400.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Update litellm/proxy/rag_endpoints/endpoints.py

Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>

* fix: coerce nested config strings before banned-param check

_NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently
skipped litellm_embedding_config when delivered as a JSON string via
multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.)
nested inside the stringified value were invisible to is_request_body_safe.

_NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses
JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS.
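
A minimal sketch of the coercion step, assuming a two-entry banned set and a single nested key. `BANNED`, `NESTED_KEYS`, `coerce_to_dict`, and `find_banned` are illustrative names, not the real `is_request_body_safe` internals.

```python
import json
from typing import Any, Dict, Optional, Set

BANNED = {"api_base", "aws_sts_endpoint"}  # illustrative subset
NESTED_KEYS = ("litellm_embedding_config",)  # illustrative


def coerce_to_dict(value: Any) -> Optional[Dict[str, Any]]:
    # Accept a dict as-is, or a JSON string that decodes to a dict.
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


def find_banned(body: Dict[str, Any]) -> Set[str]:
    # Check top-level keys, then descend into coerced nested configs.
    found = {k for k in body if k in BANNED}
    for key in NESTED_KEYS:
        nested = coerce_to_dict(body.get(key))
        if nested:
            found |= {k for k in nested if k in BANNED}
    return found
```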

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: replace substring match with prefix match in is_llm_api_route

mapped_pass_through_routes used `_llm_passthrough_route in route` (substring)
so any admin-only path whose URL contained a provider name (openai, anthropic,
azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the
admin gate in non_proxy_admin_allowed_routes_check.

Confirmed live: non-admin key could GET /credentials/by_name/openai (read
masked provider API key) and DELETE /credentials/openai (delete credential).

Fix: use exact match or startswith(prefix + "/") — the same pattern used
everywhere else in RouteChecks — so only routes that actually start with a
passthrough prefix are allowed through.
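
The two matching strategies can be compared in a short sketch. The prefix tuple and both function names are hypothetical; only the substring-versus-`startswith(prefix + "/")` distinction comes from the message.

```python
PASSTHROUGH_PREFIXES = ("/openai", "/anthropic")  # hypothetical subset


def is_passthrough_substring(route: str) -> bool:
    # Old behavior: any route containing a prefix anywhere matches.
    return any(prefix in route for prefix in PASSTHROUGH_PREFIXES)


def is_passthrough_prefix(route: str) -> bool:
    # Fixed behavior: exact match, or the route starts with prefix + "/".
    return any(
        route == prefix or route.startswith(prefix + "/")
        for prefix in PASSTHROUGH_PREFIXES
    )
```

Appending `"/"` before the `startswith` check also keeps a route such as `/openai-evil` from matching.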

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: stabilize PR #27878 test failures

- key_management_endpoints: extend can_skip_admin_check to team keys so
  team members with /key/update permission can update non-budget fields.
  can_team_member_execute_key_management_endpoint already validates team
  membership + permission and raises if unauthorized; reaching the admin
  check on a team key means the caller was authorized.

- test: set created_by on mock key in
  test_update_key_non_budget_fields_allowed_for_internal_user so
  caller_is_creator resolves correctly (MagicMock default ≠ user_id).

- auth_utils.get_request_route: guard against non-dict request.scope
  (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into
  UserAPIKeyAuth.request_route and failing Pydantic validation.

- ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard
  in test-unit-proxy-db.yml to satisfy the shard-coverage check.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(lint): add explicit str() cast in get_request_route for MyPy

scope.get() returns Any|None which MyPy cannot coerce to str implicitly.
Wrap both scope.get() calls in str() to satisfy the type checker.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: guard bare-/ root_path strip + make total_spend migration idempotent

auth_utils.get_request_route: when Starlette sets scope["app_root_path"]
to "/" (e.g. behind some middleware), the old stripping logic would
remove the leading slash from every path ("/team/new" → "team/new"),
breaking route matching and causing auth to misclassify protected routes.
Skip stripping when root_path is bare "/".
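
One way to express the guard, as a sketch under the assumption that the root path is a plain string prefix. `strip_root_path` is a hypothetical helper, not the body of `get_request_route`.

```python
def strip_root_path(path: str, root_path: str) -> str:
    # Treat "" and "/" the same way: there is nothing to strip.
    root = root_path.rstrip("/")
    if not root:
        return path
    if path == root:
        return "/"
    # Only strip when the root is a whole leading segment of the path.
    if path.startswith(root + "/"):
        return path[len(root):]
    return path
```

Slicing by `len("/")` without the guard would turn `"/team/new"` into `"team/new"`, which is the failure described above.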

migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration
is safe to replay when a prior partial run already created the column.
Without this guard, prisma migrate deploy fails on CI DBs that were
partially migrated, causing all subsequent DB operations (including
/team/new) to 500.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: require creator still owns key for personal-key bypass in /key/update

caller_is_creator now requires both created_by == caller AND user_id ==
caller. Previously checking only created_by let a demoted admin who
originally created a key for another user continue editing non-budget
fields on it after reassignment, bypassing _check_key_admin_access.

Adds regression test: creator whose key was reassigned is blocked (403).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: extract auth checks to fix PLR0915 + broaden max_budget assertion

internal_user_endpoints._update_single_user_helper exceeded 50 statements
(PLR0915). Extract authorization checks into _check_user_update_authz helper
to bring statement count under the limit.

test_validate_max_budget: assert "negative" (substring of both the local
"cannot be negative" and the CI "non-negative finite number" messages) so
the test is stable regardless of which exact wording the function uses.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
2026-05-14 04:16:04 +00:00
Sameer Kankute
18f77ff7bc
feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough (#27834)
* feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough

Adds an opt-in per-server flag that lets clients (e.g. VS Code) complete
PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM
double-gating with its own API-key/SSO check. Only honored when
auth_type=oauth2 and the operator explicitly sets the flag; mixed-target
or non-oauth2 requests fail closed.

- Adds the field to Pydantic models, Prisma schema, and a migration
- New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate
  that runs only when no x-litellm-api-key is present, so authenticated
  users still get user_id resolution + stored-credential lookup
- Anonymous callers now see delegate servers in get_allowed_mcp_servers
  (scoped to delegate servers only; the upstream still enforces auth)
- mcp_management_endpoints: allow anonymous /authorize and /token for
  delegate servers so VS Code can complete PKCE without a LiteLLM session
- UI toggle (shown only for oauth2) + payload/view wiring
- Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets,
  no resolvable target, explicit key precedence, and 401 emission

Co-authored-by: Cursor <cursoragent@cursor.com>

* Enforce oauth2 for delegated MCP auth bypass

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): close secondary Authorization bypass for delegate servers

The delegate-auth bypass gated only on the primary `x-litellm-api-key`
header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the
secondary header) was silently dropped — skipping spend tracking and
rate limiting. Gate on the resolved litellm_api_key (which considers
both headers) so the bypass fires only when neither is present.

Also update the existing "Authorization header present" test to reflect
that an upstream OAuth token now flows through the existing oauth2
fallback (LiteLLM auth attempt → fail → anonymous), not via the
delegate branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Avoid duplicate MCP OAuth credential lookup

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): block delegate bypass for M2M and internal-only servers

Two security issues flagged in code review:

1. High – client_credentials (M2M) servers must not be delegatable:
   LiteLLM auto-fetches the upstream token using stored credentials, so
   allowing anonymous bypass would let any external caller invoke tools
   authenticated as LiteLLM's service account.
   Fix: check `server.has_client_credentials` in
   `_target_servers_delegate_auth_to_upstream`, the anonymous
   allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`.

2. Medium – internal-only servers exposed to public internet:
   The anonymous delegate allow-list was not filtering by
   `available_on_public_internet`, so external callers with an upstream
   OAuth token could invoke tools on servers marked internal-only.
   Fix: add `available_on_public_internet` guard to the anonymous
   delegate server list in `get_allowed_mcp_servers`.

Tests added for both cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Require public MCP delegate auth servers

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): align delegate auth path parsing with downstream routing

`_extract_target_server_names_from_path` used a naive segments-based
split while `server.py::_get_mcp_servers_in_path` uses a regex that
allows server names with one embedded slash and comma-separated lists.
With the old parser, a request to `/mcp/<delegated>/<garbage>` was
parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM
auth) while the routing layer parsed it as `<delegated>/<garbage>` —
when that name did not resolve, the request fell back to the anonymous
allow-list, which can include `allow_all_keys` servers that normally
require a LiteLLM key.

Replace the parser with the same regex logic as
`_get_mcp_servers_in_path` so auth gating sees the exact target name(s)
downstream routing sees. Add regression tests covering parser parity
and the specific extra-path-segment bypass attempt.

https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9

* fix(mcp): close header/path TOCTOU in MCP delegate auth gate

`_target_servers_delegate_auth_to_upstream` and
`_target_servers_use_oauth2` trusted the `x-mcp-servers` header when
present, but `server.py::extract_mcp_auth_context` overrides that
header with the path-derived list for `/mcp/...` routes. An attacker
could set `x-mcp-servers: <delegated>` while pointing the URL path at
a non-delegate server, flipping the auth gate without changing the
target downstream routing actually uses.

Extract a shared `_resolve_target_server_names` helper that mirrors
the downstream override (path-derived names for `/mcp/...` routes,
header value otherwise). Add regression tests covering the TOCTOU
attempt and the helper's path-vs-header precedence.
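
The precedence rule can be sketched in a few lines. This simplified `resolve_target_server_names` is hypothetical: it handles only comma-separated names and ignores the embedded-slash case that the real regex allows.

```python
from typing import List, Optional


def resolve_target_server_names(path: str, header_value: Optional[str]) -> List[str]:
    # For /mcp/... routes the path wins, mirroring the downstream override;
    # the x-mcp-servers header is consulted only for other routes.
    prefix = "/mcp/"
    if path.startswith(prefix):
        raw = path[len(prefix):]
    else:
        raw = header_value or ""
    return [name.strip() for name in raw.split(",") if name.strip()]
```

Because the auth gate and the router derive names from the same source, a header that disagrees with the path no longer changes the auth decision.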

https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9

* Fix delegated MCP OAuth test mock

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): drop unreachable /{server}/mcp branch in auth path parser

`_extract_target_server_names_from_path` also matched the
``/{server_name}/mcp`` form, but the downstream parser
``_get_mcp_servers_in_path`` only handles ``/mcp/...`` — and
``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp``
to ``/mcp/{name}`` on the scope before the MCP handler runs. Parsing
the un-rewritten form on the auth side was therefore unreachable in
production, and contradicted the docstring's claim of mirroring the
downstream parser — exactly the kind of mismatch that risks a future
header/path TOCTOU if any new entry point skips the rewrite.

Drop the branch; the canonical ``/mcp/...`` path matches both
parsers. Update the regression test to assert the new behavior.

https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9

* Fix MCP path auth target resolution

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): require auth for refresh_token grants on delegate-auth servers

`_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for
``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH
``/authorize`` and ``/token`` regardless of grant type. ``mcp_token``
accepts ``grant_type=refresh_token`` as well as ``authorization_code``,
and ``exchange_token_with_server`` attaches the server's stored
``client_secret`` to whatever is forwarded upstream. An unauthenticated
caller holding a refresh token issued to that OAuth client could mint
fresh upstream access tokens through LiteLLM.

Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code``
(the only grant PKCE actually protects via ``code_verifier``); fall
through to normal LiteLLM auth for ``refresh_token`` and any other grant.
``/authorize`` continues to allow anonymous PKCE redirects.
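
The resulting decision table is small enough to sketch directly. `allows_anonymous` and its parameters are hypothetical names; the real check lives in `_mcp_oauth_user_api_key_auth` and is not reproduced here.

```python
from typing import Optional


def allows_anonymous(endpoint: str, grant_type: Optional[str], delegate: bool) -> bool:
    # Only delegate-auth servers get any anonymous access at all.
    if not delegate:
        return False
    if endpoint == "/authorize":
        return True
    if endpoint == "/token":
        # PKCE protects only the authorization_code exchange.
        return grant_type == "authorization_code"
    return False
```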

https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9

* fix(ui): clear delegate_auth_to_upstream when switching off oauth2

The ``delegate_auth_to_upstream`` form field is rendered inside an
``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the
user changes ``auth_type`` away from ``oauth2``. The follow-up
``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after
the field has already deregistered, so ``onFinish`` receives
``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream``
preserved the old ``true``. The flag then persisted in the database for
a non-oauth2 server and silently re-activated if ``auth_type`` was later
switched back to ``oauth2``.

In the edit payload, force the flag to ``false`` whenever
``auth_type !== oauth2``; only trust the form value (and the existing
DB fallback) when the server is actually oauth2. Backend defense-in-depth
already ignores the flag for non-oauth2 servers, but the DB state should
stay clean too.

https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9

* Fix MCP delegate auth reset on edit

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
2026-05-13 12:06:13 -07:00
yuneng-jiang
e84282b7b3
[Infra] Bump deps (#27157)
* bump: version 0.4.70 → 0.4.71

* bump: version 0.1.39 → 0.1.40

* uv lock
2026-05-05 15:58:05 -07:00
user
7faba9656f
Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation
2026-05-05 01:38:11 +00:00
user
83971a8712
fix(proxy): normalize managed resource team owner field
2026-05-04 17:05:50 -07:00
user
bfdd786962
chore(deps): refresh dependency locks
2026-05-04 11:36:18 -07:00
user
799d79160a
fix(proxy): match Prisma index names + extend listing to team for user-keyed callers
Two follow-ups to the managed-resource isolation fix:

1. Rename the new composite indexes to match Prisma's auto-generated naming
   convention (`<Table>_created_by_team_id_created_at_idx`). The previous
   `*_team_owner_created_at_idx` names left `prisma migrate diff` reporting
   an outstanding `RENAME INDEX`, failing `test_aaaasschema_migration_check`.

2. Make `build_owner_filter` return an OR clause when the caller has both
   a `user_id` and a `team_id`, so listings include team-shared resources
   the same way `can_access_resource` already permits reading them. Without
   this a user could fetch a team-shared resource by id but never see it
   in their list view.
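
A rough sketch of the filter shape described in item 2, using plain dictionaries. `owner_filter_sketch` is a hypothetical stand-in for `build_owner_filter`, and the `OR` key layout is an assumption about how the where clause is expressed.

```python
from typing import Any, Dict, Optional


def owner_filter_sketch(user_id: Optional[str], team_id: Optional[str]) -> Optional[Dict[str, Any]]:
    # Both ids present: match records the user created OR records owned by the team.
    if user_id is not None and team_id is not None:
        return {"OR": [{"created_by": user_id}, {"created_by_team_id": team_id}]}
    if user_id is not None:
        return {"created_by": user_id}
    if team_id is not None:
        return {"created_by_team_id": team_id}
    # No identifying ids: the caller gets no filter and should be denied.
    return None
```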

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:44:51 +00:00
user
84fede37b4
fix(proxy): isolate managed resources for service-account API keys
Service-account API keys are issued without a `user_id`, and managed
file/batch/vector-store ownership checks compared
`resource.created_by == user_api_key_dict.user_id`. Because Python
evaluates `None == None` as True, any service-account key passed
ownership checks for any resource also created without a user id, and
listing endpoints skipped the `created_by` filter entirely when the
caller had no user id — returning every tenant's records.

Replace the bare equality with an identity-aware helper:

- Admins (PROXY_ADMIN, PROXY_ADMIN_VIEW_ONLY) keep their unscoped view.
- Callers with a `user_id` are scoped to records they created.
- Callers without a `user_id` but with a `team_id` are scoped to records
  created within their team via a new `created_by_team_id` column.
- Callers with no admin role and no identifying ids are denied — the
  listing path returns an empty page without issuing a query.

Schema migration adds `created_by_team_id` to LiteLLM_ManagedFileTable,
LiteLLM_ManagedObjectTable, and LiteLLM_ManagedVectorStoreTable, plus
indexes for the new filter. Writes in BaseManagedResource and the
enterprise managed_files hook now stamp the column from
`user_api_key_dict.team_id`. Reads in `can_user_access_unified_resource_id`,
`can_user_call_unified_file_id`, `can_user_call_unified_object_id`,
`list_user_resources`, `list_user_batches`, and `get_user_created_file_ids`
all delegate to the new helper.

Tests cover the helper in isolation, the base-class listing/access paths,
and the enterprise file-access hook (including a regression test for the
original `None == None` bypass).
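
The bypass and the scoping rules listed above can be sketched as follows. `naive_is_owner` and `can_access` are hypothetical helpers written for illustration; they follow the four rules in this message, not the actual helper's signature.

```python
from typing import Optional


def naive_is_owner(created_by: Optional[str], caller_user_id: Optional[str]) -> bool:
    # Bare equality: two missing ids compare equal, which is the bug.
    return created_by == caller_user_id


def can_access(
    *,
    is_admin: bool,
    caller_user_id: Optional[str],
    caller_team_id: Optional[str],
    created_by: Optional[str],
    created_by_team_id: Optional[str],
) -> bool:
    if is_admin:
        return True  # admins keep their unscoped view
    if caller_user_id is not None:
        return created_by == caller_user_id
    if caller_team_id is not None:
        return created_by_team_id == caller_team_id
    return False  # no admin role and no identifying ids: deny
```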

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:22:37 +00:00
Yuneng Jiang
dd549d9c50
bump: version 0.4.69 → 0.4.70
2026-04-30 21:39:37 -07:00
Sameer Kankute
6588564a88
Merge pull request #26691 from BerriAI/litellm_team_search_credentials_metadata
feat(proxy): add team-level search provider credentials
2026-04-30 08:35:17 +05:30
ishaan-berri
4a7af1ff68
feat(proxy): durable agent workflow run tracking via /v1/workflows/runs (#26793)
* feat(schema): add workflow run tracking tables (LiteLLM_WorkflowRun, LiteLLM_WorkflowEvent, LiteLLM_WorkflowMessage)

* feat(proxy): add /v1/workflows/runs endpoints for durable agent workflow tracking

* feat(proxy): register workflow management router in proxy_server

* docs(workflows): add README for workflow run tracking API

* test(workflows): add unit tests for /v1/workflows/runs endpoints

* fix(workflows): atomic event+status update via tx(), run_id 404 guard, sequence retry on collision

* test(workflows): add tx mock, 404 on unknown run_id, retry-on-collision tests

* fix(workflows): constrain status to Literal enum, rename total→count in list responses

* add tenant isolation and bounded limits to workflow endpoints

* add created_by column and index to LiteLLM_WorkflowRun

* add ownership and bounded-limit tests for workflow endpoints

* Fix workflow run ownership for null owners

* guard prisma import in workflow_management_endpoints

* sync schema.prisma copies with workflow run models

* black: format workflow_management_endpoints.py

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-04-29 17:12:18 -07:00
Sameer Kankute
b5c60d8873
Add migration script
2026-04-29 12:30:10 +05:30
Sameer Kankute
4b03cb68a2
feat(proxy): move search tool access to object permissions
Store search tool allowlists only on object permissions, wire auth/management/UI flows to object_permission.search_tools, and remove legacy team-metadata search credential code and tests.

Made-with: Cursor
2026-04-29 12:29:20 +05:30
Yuneng Jiang
67628a60c3
bump: version 0.4.68 → 0.4.69
2026-04-25 19:30:33 -07:00
Yuneng Jiang
4884b0b611
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr23
# Conflicts:
#	litellm/proxy/management_endpoints/key_management_endpoints.py
2026-04-25 09:47:47 -07:00
Krrish Dholakia
70492cee42
feat(proxy): add /v1/memory CRUD endpoints (#26218)
* feat(proxy): add /v1/memory CRUD endpoints with user/team scoping

New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.

Endpoints:
  POST   /v1/memory         - create
  GET    /v1/memory         - list (caller-scoped; admins see all)
  GET    /v1/memory/{key}   - fetch one
  PUT    /v1/memory/{key}   - upsert
  DELETE /v1/memory/{key}   - delete

Non-admin callers cannot set a user_id/team_id other than their own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy/memory): omit metadata field when None on create

Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.
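
Building the payload conditionally might look like the sketch below. It works on plain dictionaries and makes no Prisma calls; `build_create_payload` is a hypothetical helper, and any JSON wrapping the client requires is left out.

```python
from typing import Any, Dict, Optional


def build_create_payload(
    key: str, value: str, metadata: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    data: Dict[str, Any] = {"key": key, "value": value}
    # Omit the field entirely instead of passing metadata=None.
    if metadata is not None:
        data["metadata"] = metadata
    return data
```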

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ui): add Memory page to view/manage /v1/memory entries

Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.

- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
  wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
  use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav

Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
  scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
  are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
  clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
  `user:*` rows must not appear in the caller's results).

UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
  (alongside Agents, MCP Servers, Skills) — it's an API primitive,
  not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
  model ("type the namespace, see everything under it").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT

The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.

Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
   team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).

No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): make key globally unique, 409 on any duplicate

Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.

- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): full-width layout + user/teams-style columns

- Add `w-full` to the MemoryView outer div so the page fills the
  flex-flex-1 container (was collapsing to intrinsic width).
- Replace the combined "Scope" column with separate User ID / Team ID
  columns, matching the layout of the Users / Teams pages: ID, Name,
  Preview, User ID, Team ID, Updated, Actions.
- IDs render with a truncated mono label + copy-to-clipboard button,
  same pattern as view_users.
- Detail drawer now shows Memory ID / User ID / Team ID as separate
  fields instead of stacked color tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): use clean MCP-style ID pill, drop copy icons

The ID / User ID / Team ID columns showed a mono text blob with a
copy-to-clipboard icon next to each value — too busy compared to the
MCP Servers page. Swap the renderer for MCP's pill style:

- Truncated mono ID inside a blue Tailwind pill
  (`font-mono text-blue-600 bg-blue-50 ... rounded-md border`).
- No copy icon. Full ID surfaces via tooltip.
- ID column is a button that opens the detail drawer on click;
  user/team ID pills are static (not clickable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): address greptile review feedback

Addresses 5 greptile findings (3/5 → higher confidence target):

1. Identity-less orphan rows (P1): non-admin callers with no user_id AND
   no team_id could create rows that the visibility filter would never
   match again. Now rejected up front with 400 — caller must authenticate
   with a scoped key or act as PROXY_ADMIN.

2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic;
   a concurrent writer could slip a row in between the 404-check and the
   create call. Now catch unique-violation on create, re-read, and fall
   through to update — PUT stays idempotent. If the conflicting row
   belongs to a different scope, surface a 409 instead of 500.

3. PUT-create scope inconsistency (P2): PUT's create branch always used
   the caller's own user_id/team_id, so admins couldn't bootstrap rows
   scoped elsewhere via PUT (only POST). Now PUT-create calls the shared
   `_resolve_scope()` helper, matching POST semantics.

4. Stale schema comment (P2): schema said "Keyed by (key, user_id,
   team_id)" but `key` is globally unique. Updated all three schema
   copies to reflect the actual design.

5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200
   with no load-more. Swapped to real server-side pagination driven by
   `data.total`; page size is now 50 and the pager is a real AntD
   control.

Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()`
from create_memory so POST and PUT don't drift on the scope/error logic.

Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race →
update), 18/18 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): typed Prisma error + explicit-null metadata on PUT

Two more greptile threads from the last review:

- Unique-violation detection was string-matching "Unique"/"UniqueViolation"
  in the exception message, fragile across Prisma/driver versions. Now
  check the typed error `code == "P2002"` first, with string fallback.

- PUT could not distinguish "metadata omitted" from "metadata: null" —
  both parsed as `None`, so callers had no way to clear stored metadata.
  Switch to Pydantic v2's `model_fields_set` to tell which fields the
  caller actually sent; explicit null now clears the column.

New tests:
- explicit null clears metadata
- omitted metadata preserves existing value
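
A rough sketch of the typed-code-first check, assuming the raised exception exposes a `code` attribute as the first bullet describes. `is_unique_violation` is an illustrative name, not the module's `_is_unique_violation()`.

```python
def is_unique_violation(exc: Exception) -> bool:
    """Illustrative check: typed error code first, message text as fallback."""
    # Prefer the typed code when the exception carries one.
    if getattr(exc, "code", None) == "P2002":
        return True
    # Fallback for errors that expose only a message. Matching "Unique"
    # also covers messages that contain "UniqueViolation".
    return "Unique" in str(exc)
```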

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): send explicit null when user clears metadata

Addresses the remaining P1 from the last greptile review:

When the edit modal's metadata textarea was cleared and saved,
`metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key
entirely, and the backend's `model_fields_set` guard therefore left
the stored metadata untouched — UI showed success but nothing changed.

Now: empty textarea on edit → send explicit `null` so the backend
sees `metadata` in `model_fields_set` and clears the column.
Empty textarea on create still maps to `undefined` (field omitted)
to avoid Prisma's `Json? = None` quirk on insert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): preserve slashes in key path encoding

The backend route `/v1/memory/{key:path}` supports keys with slashes,
but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx
default, CloudFlare, AWS ALB) reject or re-decode `%2F` mid-flight,
so UI update/delete calls on slash-containing keys could fail or
silently misroute.

New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each
segment, then rejoins with literal `/`. Every other unsafe char
(spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as
path delimiters, matching what the `:path` converter expects.
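
For illustration, a Python analogue of the per-segment encoding (the actual helper lives in the TypeScript UI code). `encode_memory_key_for_path` is a hypothetical name, and `urllib.parse.quote` differs slightly from `encodeURIComponent` in which punctuation it leaves unescaped.

```python
from urllib.parse import quote


def encode_memory_key_for_path(key: str) -> str:
    """Percent-encode each path segment, keeping `/` as a literal delimiter."""
    # safe="" makes quote() encode everything except unreserved characters,
    # so a stray "/" can only come from the join below.
    return "/".join(quote(segment, safe="") for segment in key.split("/"))
```

Under this sketch, `"team/notes a?b"` becomes `"team/notes%20a%3Fb"`: the slash survives as a path delimiter while the space and `?` are encoded.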

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui/memory): drop misleading client-side column sorters

With server-side pagination, client sorters on `key` and `updated_at`
only reorder the current page while pretending to sort the full
dataset — users would see "sorted by name" but only the visible 50
rows would actually be sorted.

Remove the sorters. The backend already returns rows in
`updated_at DESC` order (sensible default for a memory view), and
users can narrow the result with the key-prefix filter.

Greptile also flagged missing `@@map` on the new model as a
"consistency" issue, but only 1 of 59 tables in this repo uses
`@@map` — the dominant pattern is to rely on Prisma's default
(model name == table name). Skipping that finding as a
false-positive on convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): compose visibility + key filters via explicit AND

Greptile P1 (filter-fragility): `where.update(vis)` was semantically
correct today, but dict-merging by key meant any future visibility
filter that grew a new top-level "OR" would silently clobber the
existing key filter.

Compose explicitly instead:

    where = {"AND": [key_filter, vis]}

Applied to both `list_memory` and `_find_memory_for_caller`. When
either side is empty (admin has no visibility filter; list has no
key filter), skip the wrapper and use the non-empty side directly
to keep the generated SQL clean.
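
A small sketch of the composition rule above, assuming plain dict filters. `compose_where` is a hypothetical name used only for this example.

```python
from typing import Any


def compose_where(key_filter: dict[str, Any], vis: dict[str, Any]) -> dict[str, Any]:
    """Combine the key filter and the visibility filter without dict-merging."""
    # Both present: wrap in an explicit AND so neither side can overwrite
    # a top-level key (for example "OR") belonging to the other.
    if key_filter and vis:
        return {"AND": [key_filter, vis]}
    # One side empty: use the non-empty side directly, with no wrapper.
    return key_filter or vis
```

With `where.update(vis)`, two filters that both carried a top-level `"OR"` would collapse into one; the explicit `AND` keeps them as separate clauses.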

Test fake's `_matches` now understands top-level `AND` too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ui/memory): wrap write helpers with react-query useMutation

Previously the Memory view read via `useQuery` but called the raw
create/update/delete fetch helpers directly in handlers, tracking
loading state with a local `submitting` flag and invalidating state
via `refetch()`. That mixes two concerns:

- it skips react-query's mutation state (isPending / isError / isSuccess)
- `refetch()` only retouches the currently-mounted query instance, not
  other cached pages, so navigating back to an older page could show
  stale rows

Switch the three write paths to `useMutation`:

- `createMutation`, `updateMutation`, `deleteMutation` — each owns
  the mutation fn, success toast, and error toast.
- Success handlers invalidate the whole `["memoryList", ...]` prefix
  via `queryClient.invalidateQueries`, so every cached page refetches
  (pagination + filter-aware).
- Refresh button now invalidates instead of `refetch()`, keeping all
  behavior consistent.
- handleSave/handleDelete become thin adapters that call `.mutateAsync`;
  their errors are swallowed locally since the mutation's onError has
  already surfaced the toast.

Also tightened the edit modal's key-field tooltip to reflect the
actual global-unique semantics (was "Unique per user/team scope").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): close cross-user write gap + sanitize 500 errors (Veria)

Addresses two Veria findings:

**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id == caller OR team_id == caller`)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.

New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:

- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
  team (a "pure team row" intended for shared writes).

Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.
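
The ownership rules above, sketched as a pure function. `Caller`, `Row`, and `can_write` are illustrative stand-ins, not the real `_assert_write_access` signature, and a later commit in this PR further restricts pure team rows to team admins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:  # hypothetical stand-in for the authenticated caller
    user_id: Optional[str]
    team_id: Optional[str]
    is_admin: bool = False


@dataclass
class Row:  # hypothetical stand-in for a memory row
    user_id: Optional[str]
    team_id: Optional[str]


def can_write(row: Row, caller: Caller) -> bool:
    """Return True when the caller may modify or delete the row."""
    if caller.is_admin:
        return True  # admins bypass the ownership check
    if row.user_id is not None:
        # Personal row: only the owning user may write, even if a
        # team_id is also stamped on it.
        return row.user_id == caller.user_id
    # Pure team row: no user_id, so the team match decides.
    return row.team_id is not None and row.team_id == caller.team_id
```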

**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.

Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.

Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)

25/25 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): require team admin to modify pure team rows

Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):

- Plain team members can READ team rows via the OR visibility filter
  (intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
  for the team's organization may MODIFY them. Plain members get 403.

`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.

Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403

Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.

27/27 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: mypy + UI page-metadata sync for memory page

Two CI failures:

1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
   `dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
   returned `dict[str, list[...]]`, so the join site failed
   `dict-item` typing. Annotate both intermediates as `dict` so mypy
   widens the value type.

2. UI test (`page_utils.test.ts > should have descriptions for all
   pages`): every leftnav entry must have a description in
   `page_metadata.ts`, and `memory` was missing. Added a one-line
   description, matching the style of neighboring entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449)

* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro

Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:

- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
  input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
  per 1M input/output/cached input

Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.

No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.

Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields

* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants

gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.

Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.

Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.

* fix(schema): close LiteLLM_MemoryTable model brace dropped during merge

The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:

    error: This line is not a valid field or attribute definition.
      -->  schema.prisma:1250
       |
    1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
    1250 | model LiteLLM_AdaptiveRouterState {

Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.

`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
2026-04-24 18:38:07 -07:00
yuneng-jiang
1a3db6dfa4
Merge pull request #26365 from stuxf/fix/deps-security-bumps
chore(deps): bump vulnerable dependencies
2026-04-24 13:02:58 -07:00
yuneng-jiang
3c3ef7ec9f
Merge pull request #26369 from stuxf/fix/license-metadata
chore(packaging): declare MIT license in litellm-proxy-extras metadata
2026-04-24 13:01:52 -07:00
user
5ba6bc0784
chore(deps): bump uv to 0.11.7 + drop dead npm sed
- UV_IMAGE across all Dockerfiles: 0.10.9 -> 0.11.7.
- Loosen `required-version` in enterprise/ and litellm-proxy-extras/
  from strict `==0.10.9` to `>=0.10.9` so the new Docker image can
  build those workspace members. Matches the main pyproject range.
- Drop the `sed` block that rewrote tar/minimatch version ranges in
  npm's bundled package.json files. The override loop above already
  swaps the vendored directories on disk; npm doesn't re-resolve at
  runtime, so the sed was cosmetic.
2026-04-24 00:36:59 +00:00
user
d60734392b
chore(packaging): declare MIT license in litellm-proxy-extras metadata
litellm-proxy-extras ships a LICENSE file with MIT terms but did not
declare a `license` SPDX expression in its pyproject.toml, so tools
that read the metadata (PyPI, Nexus IQ, pip-licenses) reported
License-None for every published version. Add the explicit expression
so downstream scanners resolve the declared license.
2026-04-23 23:57:22 +00:00
yuneng-jiang
6a25866f51
Merge pull request #26295 from BerriAI/yj_bump_apr22
[Infra] bump versions
2026-04-22 18:33:03 -07:00
Yuneng Jiang
3ddb3cbdf6
bump: version 0.4.67 → 0.4.68
2026-04-22 18:20:21 -07:00
ryan-crabbe-berri
c4c1861389
Merge pull request #26195 from BerriAI/litellm_team_member_total_spend
Track per-member total spend on team memberships
2026-04-22 18:20:16 -07:00
yuneng-jiang
24aec61e4b
Merge pull request #26049 from BerriAI/litellm_adaptive_routing
Litellm adaptive routing
2026-04-22 08:52:51 -07:00
Krrish Dholakia
f1da202d9e
fix(adaptive_router): P1 flusher hot-reload + P2 hook accumulation + CI
P1: start the adaptive-router flusher loop unconditionally at proxy boot
instead of gating on 'adaptive_routers is non-empty'. Adaptive routers
added via /config/reload after boot now have their queues drained.
State is lazy-loaded per router on first flush tick (new _state_loaded
flag on AdaptiveRouter) so hot-reloaded routers still get their
persisted priors.

P2: _finalize_adaptive_router_if_configured now prunes stale
AdaptiveRouterPostCallHook callbacks from every litellm callback list
before registering new ones. Without this, every Router replacement
left the old hooks wired up in litellm.callbacks and double-fired
signal recording for every request. Uses
logging_callback_manager.remove_callbacks_by_type (same pattern as the
semantic tool filter).

CI fixes:
- black --check failure: reformatted litellm/router.py
- schema migration diff: aligned @@index with the explicit index name
  ('idx_adaptive_router_session_activity') from the original migration
  by adding 'map:' to all three schema.prisma copies. No new migration
  needed.

Tests: 1 new covering the prune-on-hot-reload path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:49:38 -07:00
yuneng-jiang
5dc2926a1e
Merge pull request #26194 from BerriAI/litellm_fix_migration_thrashing
[Feature] Proxy: opt-in v2 migration resolver
2026-04-21 16:55:54 -07:00
Krrish Dholakia
ecd9a83e61
fix(adaptive_router): P2 review items — @updatedAt + snapshot samples
- Mark last_updated_at (AdaptiveRouterState) and last_activity_at
  (AdaptiveRouterSession) with @updatedAt so Prisma refreshes the
  timestamps on every write. Without this the fields stayed frozen at
  INSERT time and the last_activity_at index was misleading for any
  future TTL/eviction logic. Applied to all three schema.prisma copies;
  no migration SQL change needed (Prisma @updatedAt is a client-side
  annotation that doesn't touch DDL).

- get_state_snapshot: report cell.total_samples instead of alpha+beta
  for the 'samples' field. The previous value inflated every cell by
  the COLD_START_MASS prior (e.g. showed 10.0 before any real traffic
  arrived), which confused operators reading /adaptive_router/.../state.
  Updated docs + the snapshot test to match.

Also fixes two pre-existing merge-break syntax errors in router.py
(missing ')' on the AdaptiveRouter TYPE_CHECKING import; truncated
async_pre_routing_hook dispatch call for the adaptive router branch)
that were masking the rest of the file from the interpreter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:27:01 -07:00
Krrish Dholakia
c7342bdc4f
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
2026-04-21 16:22:38 -07:00
Yuneng Jiang
2b8b9502d9
[Fix] v2 resolver: swallow non-connection DB errors; wrap resolve failures
Addresses two further Greptile findings:

- `_warn_if_db_ahead_of_head` only caught `psycopg.OperationalError`.
  Non-connection DB errors (e.g. `InsufficientPrivilege` / 42501 if the
  runtime DB user lacks SELECT on `_prisma_migrations`) would propagate
  uncaught and crash startup — contradicting the docstring's
  "informational only, never blocks" guarantee. Widen the catch to
  `psycopg.DatabaseError` so all DB-layer errors are swallowed.

- In the P3009 and P3018 idempotent-recovery paths, the call to
  `_resolve_specific_migration(name)` was not wrapped in its own
  try/except. Being inside an active `except CalledProcessError`
  handler, a new `CalledProcessError` from the resolve call would NOT
  re-enter the same handler — it would propagate out as
  `CalledProcessError`, past `proxy_cli.py`'s `except RuntimeError`,
  crashing startup with an unhandled traceback instead of the intended
  clean `sys.exit(2)`. Wrap both call sites to convert to RuntimeError.
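
The handler behavior in the second bullet can be reproduced in isolation. The function names and argv strings below are placeholders for this demo, not the resolver's real calls.

```python
import subprocess


def resolve_step() -> None:
    """Placeholder for a resolve call that itself fails."""
    raise subprocess.CalledProcessError(1, ["resolve-step"])


def unwrapped() -> None:
    try:
        raise subprocess.CalledProcessError(1, ["deploy-step"])
    except subprocess.CalledProcessError:
        # A new CalledProcessError raised here is NOT re-caught by this
        # same except clause; it propagates to the caller unchanged.
        resolve_step()


def wrapped() -> None:
    try:
        raise subprocess.CalledProcessError(1, ["deploy-step"])
    except subprocess.CalledProcessError:
        try:
            resolve_step()
        except subprocess.CalledProcessError as e:
            # Convert so a caller catching RuntimeError can exit cleanly.
            raise RuntimeError(f"resolve failed: {e}") from e
```

A caller with only `except RuntimeError` handles `wrapped()` but sees a raw `CalledProcessError` from `unwrapped()`.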

Adds unit tests for both behaviors.
2026-04-21 15:53:07 -07:00
Yuneng Jiang
9049f37864
[Fix] v2 migration resolver: address Greptile review findings
- Open the psycopg connection in `_warn_if_db_ahead_of_head` with
  autocommit=True. Without it, psycopg3's `with conn` calls COMMIT on
  clean exit, which fails after the `UndefinedTable` (fresh-DB) branch
  left the transaction in an aborted state — crashing first-run startups.

- Wrap the v2 `prisma db push` path in try/except and raise RuntimeError
  on CalledProcessError/TimeoutExpired. Otherwise these propagate past
  proxy_cli.py's `except RuntimeError` as unhandled tracebacks.

- Reword the loop-exhaustion error to cover the non-timeout exit path
  (repeated P3005/P3009/P3018 idempotent-recovery `continue`s), not
  just persistent timeouts.

Adds a unit test for the db_push error wrapping.
2026-04-21 15:34:24 -07:00
Yuneng Jiang
a16c00e22c
[Feature] Proxy: opt-in v2 migration resolver (--use_v2_migration_resolver)
Default behavior (v1) is unchanged. Users who have seen schema thrashing
during rolling deploys can opt into the v2 resolver with
`--use_v2_migration_resolver`.

Why v2 is safer:
- Runs `prisma migrate deploy` only.
- Recovers from P3005 (baseline) and idempotent P3009/P3018 errors, same
  as v1.
- Never calls `_resolve_all_migrations`, which generates a schema diff
  between the live DB and the shipped schema.prisma and applies it via
  `prisma db execute`. That path bypassed every migration's SQL and was
  the root cause of thrashing when two LiteLLM versions contended for
  the same DB.
- Logs a non-blocking warning when the DB has migrations applied that
  are newer than anything this build ships (ahead-of-HEAD). It does not
  refuse to start — many users have unusual ledger state from past
  thrashing, and blocking startup would be a breaking change.

Also prints a message on startup when the default (v1) resolver is in
use, pointing operators at the opt-in flag.

Adds unit tests covering the v2 fail-fast paths, the stripping of
Prisma-specific query params from DATABASE_URL (needed for psycopg),
the timestamp helpers, and pins the default: v1 still invokes
`_resolve_all_migrations`, v2 must not.
2026-04-21 14:20:35 -07:00
Ryan Crabbe
e5f3e15969
Track per-member total spend on team memberships
Adds total_spend column to LiteLLM_TeamMembership that accumulates
continuously and is not zeroed by the budget cycle reset job. This
enables UI surfaces to distinguish current-cycle spend (the existing
spend column, which resets) from lifetime spend per team member.

Also exposes budget_reset_at on LiteLLM_BudgetTable so /team/info
callers can see when a member's budget window next resets. The field
was already stored in the DB but stripped by the response Pydantic
model.

Includes regression tests that:
- Guard the reset job against ever writing total_spend: 0
- Verify the spend writer increments both spend and total_spend in
  one UPDATE statement.
2026-04-21 13:56:44 -07:00
Yuneng Jiang
b39f210a6c
[Infra] Add freshness and destructive guards to migration workflow
Generating a migration from a stale branch could silently emit DROP
COLUMN for columns the stale branch did not know about, and the
script would write that SQL to a new migration file with no warning.

Adds two guards to ci_cd/run_migration.py:

- Branch freshness check: fetches origin/<base-branch> and exits 3 if
  HEAD is behind. Default base is litellm_internal_staging. New
  flags: --base-branch, --skip-freshness-check.
- Destructive guard: refuses (exit 2) if the generated diff contains
  DROP COLUMN / DROP TABLE / DROP INDEX, unless --allow-destructive
  is passed.
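
A minimal sketch of what such a destructive guard could look like, assuming a line-by-line regex scan of the generated SQL. The names and the exact pattern are illustrative and not taken from ci_cd/run_migration.py.

```python
import re

# Assumed pattern: flags the three statement kinds named above.
DESTRUCTIVE = re.compile(r"\bDROP\s+(COLUMN|TABLE|INDEX)\b", re.IGNORECASE)


def find_destructive_statements(sql: str) -> list[str]:
    """Return the lines of the diff that contain a destructive statement."""
    return [line.strip() for line in sql.splitlines() if DESTRUCTIVE.search(line)]


def check_diff(sql: str, allow_destructive: bool = False) -> int:
    """Return 0 to proceed, or 2 to refuse (mirroring the exit code above)."""
    if find_destructive_statements(sql) and not allow_destructive:
        return 2
    return 0
```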

Refusal banners include guidance and an explicit callout instructing
AI agents not to auto-bypass the flags. Also treats Prisma's
"-- This is an empty migration." output as a no-op rather than
writing an empty file.

Updates litellm-proxy-extras/migration_runbook.md with the new
workflow, flag documentation, and agent warnings.
2026-04-21 12:00:23 -07:00
Krrish Dholakia
b6fc75b3ce
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
2026-04-20 15:28:08 -07:00
Krrish Dholakia
fba736ca3c
fix(adaptive_router): 3 P1 review defects
- Use 'auto_router/adaptive_router' prefix in example yaml, docs, and
  README — the old 'adaptive_router/...' and 'openai/gpt-4o-mini' values
  silently skipped adaptive-router init because detection requires the
  'auto_router/adaptive_router' prefix.

- Read x-litellm-min-quality-tier from request headers (and the
  'min_quality_tier' metadata key as fallback) in async_pre_routing_hook.
  Previously the documented header was defined but never extracted, so
  the quality-floor feature was inert.

- Evict expired entries from _session_states. The cache grew without
  bound — added a parallel expiry map (same TTL as _owner_cache) and an
  opportunistic bulk sweep when the cache crosses a size threshold.

- Align adaptive-router migration SQL with Prisma schema: all count
  columns and the 'clean_credit_awarded' / 'last_processed_turn' fields
  are NOT NULL in the data model, so the migration now declares them
  NOT NULL. Fixes test_aaaasschema_migration_check.
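
The `_session_states` eviction in the list above (a parallel expiry map plus an opportunistic sweep past a size threshold) could be sketched as follows. `ExpiringCache` and its parameters are hypothetical; this is not the router's implementation.

```python
import time
from typing import Any, Optional


class ExpiringCache:
    """Dict-backed cache with a parallel expiry map and a size-triggered sweep."""

    def __init__(self, ttl_seconds: float, sweep_threshold: int) -> None:
        self._ttl = ttl_seconds
        self._sweep_threshold = sweep_threshold
        self._values: dict[str, Any] = {}
        self._expires_at: dict[str, float] = {}  # parallel expiry map

    def set(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._values[key] = value
        self._expires_at[key] = now + self._ttl
        # Opportunistic bulk sweep once the cache crosses the threshold.
        if len(self._values) > self._sweep_threshold:
            self._sweep(now)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        expires = self._expires_at.get(key)
        if expires is None:
            return None
        if expires <= now:
            self._evict(key)  # lazily drop an expired entry on read
            return None
        return self._values[key]

    def _evict(self, key: str) -> None:
        self._values.pop(key, None)
        self._expires_at.pop(key, None)

    def _sweep(self, now: float) -> None:
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            self._evict(key)

    def __len__(self) -> int:
        return len(self._values)
```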

Tests: 8 new covering header/metadata/precedence/invalid-value paths for
min_quality_tier and TTL-based eviction of _session_states.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:22:18 -07:00
ishaan-berri
2f22a1293e
bump litellm-proxy-extras to 0.4.67 (#26043)
* bump litellm-proxy-extras version to 0.4.67

* bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject

* regenerate uv.lock for litellm-proxy-extras 0.4.67

* bump litellm-enterprise version to 0.1.38

* bump litellm-enterprise pin to 0.1.38 in litellm pyproject

* regenerate uv.lock for litellm-enterprise 0.1.38
2026-04-18 19:03:56 -07:00
Krrish Dholakia
dd4a1d2be2
feat: add adaptive routing to litellm
allow model routing to improve based on conversation signals

ensures router is picking best model for task
2026-04-18 16:35:17 -07:00
Ishaan Jaffer
e6a20af646
fix(proxy-extras): skip post-deploy sanity check when no migrations pending
When prisma migrate deploy reports 'No pending migrations to apply' the DB
already matches schema — running _resolve_all_migrations (migrate diff +
prisma db execute) adds 25+ seconds unnecessarily, causing the proxy to
miss the 90-second startup timeout in test_litellm_proxy_server_config_no_general_settings.
2026-04-17 15:59:41 -07:00
Ishaan Jaffer
33175a8ee7
fix(proxy-extras): fall back to prisma db execute when migrate diff fails on pooler URL
When DIRECT_URL is not set and DATABASE_URL is a Neon pooler URL, prisma migrate diff
fails (pooler doesn't support extended query protocol for schema introspection). Previously
_resolve_all_migrations returned early without applying any migrations, leaving the
budget_limits column missing and causing test_auth_callback_new_user to fail.

Now falls back to running each migration SQL file via prisma db execute --file, which
works with pooler URLs and is safe to re-run due to IF NOT EXISTS guards.
2026-04-17 15:38:48 -07:00
Ishaan Jaffer
33a2cee4af
fix(proxy-extras): use DIRECT_URL for prisma migrate diff, tempfile for diff dir
2026-04-17 15:17:15 -07:00
Ishaan Jaffer
7c47bbd226
fix(migration): run schema sanity check after P3009/P3018 idempotent migration recovery
2026-04-17 15:01:10 -07:00
Ishaan Jaffer
9281147a1a
fix(schema): add budget_limits Json? to LiteLLM_TeamTable and LiteLLM_VerificationToken
2026-04-17 14:47:18 -07:00
Ishaan Jaffer
e8461b5b97
style: run black formatter on files from main merge
2026-04-17 13:02:59 -07:00
Ishaan Jaffer
f31d4faa87
Merge origin/main into litellm_ishaan_april6
2026-04-17 12:36:51 -07:00
Yuneng Jiang
073685136d
bump: version 0.4.65 → 0.4.66
2026-04-16 09:54:56 -07:00
Ishaan Jaffer
def9c4ec47
chore: merge litellm_internal_staging, resolve uv.lock conflict
2026-04-15 18:51:19 -07:00
harish876
5f99e52fbc
Added concurrent index creation. Added necessary disclaimers to index creation.
Index creation is scoped to a single statement and hence
Validated index creation in local env
2026-04-15 22:52:47 +00:00