* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)
Squash-merged by litellm-agent from Anai-Guo's PR.
* feat(dashscope): add embeddings and reranks (qwen3-rerank) support via OpenAI-compatible endpoint (#27508)
Squash-merged by litellm-agent from yimao's PR.
* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)
Squash-merged by litellm-agent from krisxia0506's PR.
* Fix Gemini MIME detection for extensionless GCS URIs (#27278)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)
Squash-merged by litellm-agent from voidborne-d's PR.
* feat(chart): add support for autoscaling behavior in HPA (#27990)
Squash-merged by litellm-agent from FabrizioCafolla's PR.
* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix: pass socket timeouts to Redis cluster clients (#27920)
Squash-merged by litellm-agent from tomdee's PR.
* Fix/cache token (#28009)
Squash-merged by litellm-agent from escon1004's PR.
* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)
Squash-merged by litellm-agent from Divyansh8321's PR.
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)
* fix: reset org and tag budgets (#27326)
* reset org budgets
* reset tag budgets
---------
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)
* fix(ui): omit allowed_routes from key edit save when unchanged
When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).
Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.
* fix(ui): order-insensitive allowed_routes diff + cover null-original case
Address Greptile review:
- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
a server-side reorder of the array doesn't register as a user edit and
re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
is untouched — patch should strip the field; (2) server returned routes in
a different order than the user originally entered — patch should still
recognize the value as unchanged.
* chore(ui): strip ticket refs and tighten comments in key edit fix
- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case
* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc
* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests
GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute. When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.
Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
so downstream loggers record 'guardrail_intervened' instead of
'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion
Fixes #24348
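A minimal sketch of the status-code pattern described above. `GuardrailBlocked` and `resolve_status` are hypothetical stand-ins for illustration, not the actual LiteLLM classes or handler:

```python
class GuardrailBlocked(Exception):
    """Hypothetical stand-in for a guardrail exception that carries an HTTP status."""

    def __init__(self, message: str, *, status_code: int = 400) -> None:
        super().__init__(message)
        # Keyword-only, defaults to 400 so intentional blocks read as client errors.
        self.status_code = status_code


def resolve_status(exc: Exception) -> int:
    # Same getattr fallback the commit describes for the proxy exception handler.
    return getattr(exc, "status_code", 500)


blocked_status = resolve_status(GuardrailBlocked("blocked by policy"))
plain_status = resolve_status(RuntimeError("unexpected failure"))
```

Without the attribute, the fallback in `resolve_status` would report 500 for the guardrail exception as well.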
---------
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161
- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
when a specifically-addressed deployment is administratively blocked; 429 misleads
retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address bug detection findings (cache token order, mutable defaults)
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests
- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
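The presence-versus-truthiness point in the cost_calculator item can be illustrated roughly as follows. The function names are hypothetical, and `cache_read_input_tokens` is assumed here to be the Anthropic-style field being inspected:

```python
from types import SimpleNamespace


def is_anthropic_style_truthy(usage) -> bool:
    # Truthiness check: a legitimate value of 0 is treated as "field missing".
    return bool(getattr(usage, "cache_read_input_tokens", None))


def is_anthropic_style_present(usage) -> bool:
    # Presence check: 0 still identifies an Anthropic-style usage object.
    return getattr(usage, "cache_read_input_tokens", None) is not None


# A response that read nothing from cache but is still Anthropic-shaped.
usage = SimpleNamespace(cache_read_input_tokens=0, cache_creation_input_tokens=120)
```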
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix code qa
* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType
GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
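As a rough sketch of the stripping step (the helper name `strip_mime_parameters` is hypothetical, not the function in the codebase):

```python
def strip_mime_parameters(content_type: str) -> str:
    # Keep only the type/subtype that precedes the first ';'.
    return content_type.split(";", 1)[0].strip()
```

For example, `strip_mime_parameters("text/html; charset=utf-8")` yields the bare type `text/html`.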
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(vertex_ai/gemini): clarify mime-type error message string concatenation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Vincent <yimao1231@gmail.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: d 🔹 <liusway405@gmail.com>
Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Tom Denham <tom@tomdee.co.uk>
Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com>
Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com>
Co-authored-by: robin-fiddler <robin@fiddler.ai>
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: patch Host-header auth bypass in get_request_route
Starlette reconstructs request.url from the Host header. A malformed
Host like `localhost/?x=1` causes Starlette to build the full URL as
`http://localhost/?x=1/health`, which url-parses to path="/". Since "/"
is in LiteLLMRoutes.public_routes, all protected routes became reachable
without authentication.
Fix: read scope["path"] (set by uvicorn from the HTTP request line,
not derivable from headers) instead of request.url.path. Sub-path
deployments are handled via scope["app_root_path"] / scope["root_path"],
mirroring Starlette's own base_url construction logic.
Affected variants confirmed fixed:
Host: localhost/?x=1
Host: localhost:4000/?x=1
Host: localhost/#test
Host: localhost:4000/#test
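The parsing behaviour can be reproduced with the standard library alone. This sketch uses `urllib.parse.urlsplit` as a stand-in for the URL reconstruction described above; `path_from_reconstructed_url` is a hypothetical helper, not Starlette or LiteLLM code:

```python
from urllib.parse import urlsplit


def path_from_reconstructed_url(host_header: str, request_path: str) -> str:
    # Naive reconstruction: scheme + Host header + request path, then parse.
    return urlsplit(f"http://{host_header}{request_path}").path


variants = [
    "localhost/?x=1",
    "localhost:4000/?x=1",
    "localhost/#test",
    "localhost:4000/#test",
]
# The real path ("/health") ends up inside the query string or fragment.
parsed_paths = {host: path_from_reconstructed_url(host, "/health") for host in variants}
```

Every malformed variant parses to the path `/`, while a well-formed `Host: localhost` keeps `/health`.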
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* style: reduce comments in route fix
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block credential fields in RAG ingest vector_store options
Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.)
in ingest_options.vector_store are now rejected at the API boundary with
a 400 error. Credentials must be configured server-side.
Previously any authenticated user could supply a vertex_credentials dict
with type=external_account pointing credential_source.file at an
arbitrary path (e.g. /proc/1/environ) and token_url at an
attacker-controlled server. google-auth's identity_pool.Credentials
refresh() would read the file and POST its contents to the attacker.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block /key/update self-escalation by assigned users
Non-admin users who were assigned a key (created_by != caller) could
update any non-budget field — models, rpm_limit, guardrails, etc. —
without admin authorization, allowing privilege self-escalation.
Gate: only the key creator (created_by == caller) may edit their own
key without admin check; budget changes always require admin regardless
of creator status. All other callers must pass _check_key_admin_access.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block user-controlled api_base in RAG ingest vector_store options
A user-supplied api_base in ingest_options.vector_store caused the server
to forward its configured provider credentials (Gemini, OpenAI) to an
attacker-controlled endpoint via SSRF.
Add api_base to the blocked credential params set alongside api_key and
the existing credential fields.
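A sketch of this kind of boundary check, under stated assumptions: the set below is only an illustrative subset of the blocked fields, and both function names are hypothetical:

```python
from typing import Any, Dict, List

# Illustrative subset; the real blocklist may contain more fields.
BLOCKED_VECTOR_STORE_PARAMS = frozenset(
    {"vertex_credentials", "aws_access_key_id", "api_key", "api_base"}
)


def find_blocked_params(vector_store_options: Dict[str, Any]) -> List[str]:
    # Intersect the blocklist with the keys the caller supplied.
    return sorted(BLOCKED_VECTOR_STORE_PARAMS.intersection(vector_store_options))


def validate_vector_store_options(vector_store_options: Dict[str, Any]) -> None:
    blocked = find_blocked_params(vector_store_options)
    if blocked:
        # An endpoint would translate this into an HTTP 400 response.
        raise ValueError(f"must be configured server-side: {blocked}")
```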
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check
Any authenticated internal_user could POST arbitrary provider config
(aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have
the server forward its credentials to an attacker-controlled endpoint.
- Gate the endpoint on PROXY_ADMIN role (403 for all other roles)
- Call is_request_body_safe() to reject banned params even for admins
- Convert ValueError from safety check to HTTP 400
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: apply banned-param check to /utils/transform_request
Without is_request_body_safe(), any authenticated user could pass
aws_sts_endpoint, api_base, or aws_web_identity_token to
/utils/transform_request and have the server forward its configured
provider credentials to an attacker-controlled endpoint during SDK
credential resolution.
Applies the same banned-param blocklist already used by LLM endpoints.
Endpoint remains accessible to all authenticated users.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter
Any frontmatter key not in ["model","input","output"] flowed into
optional_params and was merged into the LLM call data dict, bypassing
is_request_body_safe. An attacker with any bearer key could set
api_base in YAML to redirect the outbound LLM request — including the
provider API key — to an attacker-controlled host.
Fix: call is_request_body_safe on the constructed data dict after
optional_params are merged, before invoking ProxyBaseLLMRequestProcessing.
ValueError from the banned-param check is surfaced as HTTP 400.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/rag_endpoints/endpoints.py
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
* fix: coerce nested config strings before banned-param check
_NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently
skipped litellm_embedding_config when delivered as a JSON string via
multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.)
nested inside the stringified value were invisible to is_request_body_safe.
_NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses
JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS.
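The coercion step might look roughly like this. `coerce_to_dict`, `nested_banned_params`, and the `BANNED` subset are hypothetical names chosen for illustration:

```python
import json
from typing import Any, Dict, Optional, Set

BANNED = {"api_base", "aws_sts_endpoint"}  # illustrative subset


def coerce_to_dict(value: Any) -> Optional[Dict[str, Any]]:
    # Accept real dicts, and JSON strings that decode to a dict.
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


def nested_banned_params(body: Dict[str, Any], nested_key: str) -> Set[str]:
    nested = coerce_to_dict(body.get(nested_key))
    return BANNED & set(nested) if nested else set()
```

With a plain `isinstance(nested, dict)` check, the stringified form would be skipped and the banned key would go unnoticed.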
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: replace substring match with prefix match in is_llm_api_route
mapped_pass_through_routes used `_llm_passthrough_route in route` (substring)
so any admin-only path whose URL contained a provider name (openai, anthropic,
azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the
admin gate in non_proxy_admin_allowed_routes_check.
Confirmed live: non-admin key could GET /credentials/by_name/openai (read
masked provider API key) and DELETE /credentials/openai (delete credential).
Fix: use exact match or startswith(prefix + "/") — the same pattern used
everywhere else in RouteChecks — so only routes that actually start with a
passthrough prefix are allowed through.
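The difference between the two matching strategies, sketched with illustrative prefix values (the list and function names are hypothetical):

```python
PASSTHROUGH_PREFIXES = ["/openai", "/anthropic"]  # illustrative values


def matches_by_substring(route: str) -> bool:
    # Old behaviour: any route containing the prefix anywhere matches.
    return any(prefix in route for prefix in PASSTHROUGH_PREFIXES)


def matches_by_prefix(route: str) -> bool:
    # Fixed behaviour: exact match, or the prefix followed by a path separator.
    return any(
        route == prefix or route.startswith(prefix + "/")
        for prefix in PASSTHROUGH_PREFIXES
    )
```

Under the substring rule `/credentials/openai` is classified as a passthrough route; under the prefix rule it is not, while `/openai/v1/chat/completions` still is.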
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize PR #27878 test failures
- key_management_endpoints: extend can_skip_admin_check to team keys so
team members with /key/update permission can update non-budget fields.
can_team_member_execute_key_management_endpoint already validates team
membership + permission and raises if unauthorized; reaching the admin
check on a team key means the caller was authorized.
- test: set created_by on mock key in
test_update_key_non_budget_fields_allowed_for_internal_user so
caller_is_creator resolves correctly (MagicMock default ≠ user_id).
- auth_utils.get_request_route: guard against non-dict request.scope
(e.g. MagicMock in unit tests) to prevent a MagicMock leaking into
UserAPIKeyAuth.request_route and failing Pydantic validation.
- ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard
in test-unit-proxy-db.yml to satisfy the shard-coverage check.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(lint): add explicit str() cast in get_request_route for MyPy
scope.get() returns Any|None which MyPy cannot coerce to str implicitly.
Wrap both scope.get() calls in str() to satisfy the type checker.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: guard bare-/ root_path strip + make total_spend migration idempotent
auth_utils.get_request_route: when Starlette sets scope["app_root_path"]
to "/" (e.g. behind some middleware), the old stripping logic would
remove the leading slash from every path ("/team/new" → "team/new"),
breaking route matching and causing auth to misclassify protected routes.
Skip stripping when root_path is bare "/".
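A sketch of a root-path strip with the bare-"/" guard. `strip_root_path` is a hypothetical helper and only approximates the logic described above:

```python
def strip_root_path(path: str, root_path: str) -> str:
    root = root_path.rstrip("/")
    if not root:
        # Empty or bare "/" root path: nothing to strip.
        return path
    if path == root:
        return "/"
    if path.startswith(root + "/"):
        # Drop the mount prefix but keep the leading slash of the remainder.
        return path[len(root):]
    return path
```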
migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration
is safe to replay when a prior partial run already created the column.
Without this guard, prisma migrate deploy fails on CI DBs that were
partially migrated, causing all subsequent DB operations (including
/team/new) to 500.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: require creator still owns key for personal-key bypass in /key/update
caller_is_creator now requires both created_by == caller AND user_id ==
caller. Previously checking only created_by let a demoted admin who
originally created a key for another user continue editing non-budget
fields on it after reassignment, bypassing _check_key_admin_access.
Adds regression test: creator whose key was reassigned is blocked (403).
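The tightened gate can be sketched as below. Function and parameter names are hypothetical, the explicit `None` guard is an assumption added for the example, and the team-key path is left out:

```python
from typing import Optional


def caller_is_creator(
    created_by: Optional[str], key_user_id: Optional[str], caller_id: Optional[str]
) -> bool:
    # Assumption: an anonymous caller (None) never counts as the creator.
    if caller_id is None:
        return False
    # The caller must have created the key AND still own it.
    return created_by == caller_id and key_user_id == caller_id


def needs_admin_check(
    *,
    created_by: Optional[str],
    key_user_id: Optional[str],
    caller_id: Optional[str],
    changes_budget: bool,
) -> bool:
    if changes_budget:
        # Budget changes always require admin, regardless of creator status.
        return True
    return not caller_is_creator(created_by, key_user_id, caller_id)
```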
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: extract auth checks to fix PLR0915 + broaden max_budget assertion
internal_user_endpoints._update_single_user_helper exceeded 50 statements
(PLR0915). Extract authorization checks into _check_user_update_authz helper
to bring statement count under the limit.
test_validate_max_budget: assert "negative" (substring of both the local
"cannot be negative" and the CI "non-negative finite number" messages) so
the test is stable regardless of which exact wording the function uses.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
* feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough
Adds an opt-in per-server flag that lets clients (e.g. VS Code) complete
PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM
double-gating with its own API-key/SSO check. Only honored when
auth_type=oauth2 and the operator explicitly sets the flag; mixed-target
or non-oauth2 requests fail closed.
- Adds the field to Pydantic models, Prisma schema, and a migration
- New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate
that runs only when no x-litellm-api-key is present, so authenticated
users still get user_id resolution + stored-credential lookup
- Anonymous callers now see delegate servers in get_allowed_mcp_servers
(scoped to delegate servers only; the upstream still enforces auth)
- mcp_management_endpoints: allow anonymous /authorize and /token for
delegate servers so VS Code can complete PKCE without a LiteLLM session
- UI toggle (shown only for oauth2) + payload/view wiring
- Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets,
no resolvable target, explicit key precedence, and 401 emission
Co-authored-by: Cursor <cursoragent@cursor.com>
* Enforce oauth2 for delegated MCP auth bypass
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): close secondary Authorization bypass for delegate servers
The delegate-auth bypass gated only on the primary `x-litellm-api-key`
header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the
secondary header) was silently dropped — skipping spend tracking and
rate limiting. Gate on the resolved litellm_api_key (which considers
both headers) so the bypass fires only when neither is present.
Also update the existing "Authorization header present" test to reflect
that an upstream OAuth token now flows through the existing oauth2
fallback (LiteLLM auth attempt → fail → anonymous), not via the
delegate branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Avoid duplicate MCP OAuth credential lookup
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): block delegate bypass for M2M and internal-only servers
Two security issues flagged in code review:
1. High – client_credentials (M2M) servers must not be delegatable:
LiteLLM auto-fetches the upstream token using stored credentials, so
allowing anonymous bypass would let any external caller invoke tools
authenticated as LiteLLM's service account.
Fix: check `server.has_client_credentials` in
`_target_servers_delegate_auth_to_upstream`, the anonymous
allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`.
2. Medium – internal-only servers exposed to public internet:
The anonymous delegate allow-list was not filtering by
`available_on_public_internet`, so external callers with an upstream
OAuth token could invoke tools on servers marked internal-only.
Fix: add `available_on_public_internet` guard to the anonymous
delegate server list in `get_allowed_mcp_servers`.
Tests added for both cases.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Require public MCP delegate auth servers
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): align delegate auth path parsing with downstream routing
`_extract_target_server_names_from_path` used a naive segments-based
split while `server.py::_get_mcp_servers_in_path` uses a regex that
allows server names with one embedded slash and comma-separated lists.
With the old parser, a request to `/mcp/<delegated>/<garbage>` was
parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM
auth) while the routing layer parsed it as `<delegated>/<garbage>` —
when that name did not resolve, the request fell back to the anonymous
allow-list, which can include `allow_all_keys` servers that normally
require a LiteLLM key.
Replace the parser with the same regex logic as
`_get_mcp_servers_in_path` so auth gating sees the exact target name(s)
downstream routing sees. Add regression tests covering parser parity
and the specific extra-path-segment bypass attempt.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(mcp): close header/path TOCTOU in MCP delegate auth gate
`_target_servers_delegate_auth_to_upstream` and
`_target_servers_use_oauth2` trusted the `x-mcp-servers` header when
present, but `server.py::extract_mcp_auth_context` overrides that
header with the path-derived list for `/mcp/...` routes. An attacker
could set `x-mcp-servers: <delegated>` while pointing the URL path at
a non-delegate server, flipping the auth gate without changing the
target downstream routing actually uses.
Extract a shared `_resolve_target_server_names` helper that mirrors
the downstream override (path-derived names for `/mcp/...` routes,
header value otherwise). Add regression tests covering the TOCTOU
attempt and the helper's path-vs-header precedence.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix delegated MCP OAuth test mock
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): drop unreachable /{server}/mcp branch in auth path parser
`_extract_target_server_names_from_path` also matched the
``/{server_name}/mcp`` form, but the downstream parser
``_get_mcp_servers_in_path`` only handles ``/mcp/...`` — and
``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp``
to ``/mcp/{name}`` on the scope before the MCP handler runs. Parsing
the un-rewritten form on the auth side was therefore unreachable in
production, and contradicted the docstring's claim of mirroring the
downstream parser — exactly the kind of mismatch that risks a future
header/path TOCTOU if any new entry point skips the rewrite.
Drop the branch; the canonical ``/mcp/...`` path matches both
parsers. Update the regression test to assert the new behavior.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP path auth target resolution
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp): require auth for refresh_token grants on delegate-auth servers
`_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for
``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH
``/authorize`` and ``/token`` regardless of grant type. ``mcp_token``
accepts ``grant_type=refresh_token`` as well as ``authorization_code``,
and ``exchange_token_with_server`` attaches the server's stored
``client_secret`` to whatever is forwarded upstream. An unauthenticated
caller holding a refresh token issued to that OAuth client could mint
fresh upstream access tokens through LiteLLM.
Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code``
(the only grant PKCE actually protects via ``code_verifier``); fall
through to normal LiteLLM auth for ``refresh_token`` and any other grant.
``/authorize`` continues to allow anonymous PKCE redirects.
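The narrowed bypass condition, as a sketch. It assumes the target server has already been validated as a delegate-auth oauth2 server; `allows_anonymous_pkce` is a hypothetical name:

```python
from typing import Optional


def allows_anonymous_pkce(endpoint: str, grant_type: Optional[str] = None) -> bool:
    if endpoint == "authorize":
        # Anonymous PKCE redirects remain allowed.
        return True
    if endpoint == "token":
        # Only the code exchange, which PKCE protects via code_verifier.
        return grant_type == "authorization_code"
    return False
```

A `refresh_token` grant therefore falls through to normal authentication.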
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* fix(ui): clear delegate_auth_to_upstream when switching off oauth2
The ``delegate_auth_to_upstream`` form field is rendered inside an
``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the
user changes ``auth_type`` away from ``oauth2``. The follow-up
``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after
the field has already deregistered, so ``onFinish`` receives
``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream``
preserved the old ``true``. The flag then persisted in the database for
a non-oauth2 server and silently re-activated if ``auth_type`` was later
switched back to ``oauth2``.
In the edit payload, force the flag to ``false`` whenever
``auth_type !== oauth2``; only trust the form value (and the existing
DB fallback) when the server is actually oauth2. Backend defense-in-depth
already ignores the flag for non-oauth2 servers, but the DB state should
stay clean too.
https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9
* Fix MCP delegate auth reset on edit
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <claude@anthropic.com>
Two follow-ups to the managed-resource isolation fix:
1. Rename the new composite indexes to match Prisma's auto-generated naming
convention (`<Table>_created_by_team_id_created_at_idx`). The previous
`*_team_owner_created_at_idx` names left `prisma migrate diff` reporting
an outstanding `RENAME INDEX`, failing `test_aaaasschema_migration_check`.
2. Make `build_owner_filter` return an OR clause when the caller has both
a `user_id` and a `team_id`, so listings include team-shared resources
the same way `can_access_resource` already permits reading them. Without
this a user could fetch a team-shared resource by id but never see it
in their list view.
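One possible shape for such a filter, written as a Prisma-style `where` dict. This is only a sketch: `owner_filter_sketch` is a hypothetical name, and the real `build_owner_filter` signature and return type may differ:

```python
from typing import Any, Dict, Optional


def owner_filter_sketch(
    user_id: Optional[str], team_id: Optional[str]
) -> Optional[Dict[str, Any]]:
    if user_id is not None and team_id is not None:
        # Include both the caller's own records and team-shared ones.
        return {"OR": [{"created_by": user_id}, {"created_by_team_id": team_id}]}
    if user_id is not None:
        return {"created_by": user_id}
    if team_id is not None:
        return {"created_by_team_id": team_id}
    # No identifying ids: the caller should see nothing.
    return None
```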
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Service-account API keys are issued without a `user_id`, and managed
file/batch/vector-store ownership checks compared
`resource.created_by == user_api_key_dict.user_id`. Because Python
evaluates `None == None` as True, any service-account key passed
ownership checks for any resource also created without a user id, and
listing endpoints skipped the `created_by` filter entirely when the
caller had no user id — returning every tenant's records.
Replace the bare equality with an identity-aware helper:
- Admins (PROXY_ADMIN, PROXY_ADMIN_VIEW_ONLY) keep their unscoped view.
- Callers with a `user_id` are scoped to records they created.
- Callers without a `user_id` but with a `team_id` are scoped to records
created within their team via a new `created_by_team_id` column.
- Callers with no admin role and no identifying ids are denied — the
listing path returns an empty page without issuing a query.
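The rules above can be sketched as a small access check. `can_access` and `naive_check` are hypothetical, and the role strings are placeholders for the actual role enum:

```python
from typing import Optional

ADMIN_ROLES = {"PROXY_ADMIN", "PROXY_ADMIN_VIEW_ONLY"}  # placeholder role strings


def naive_check(created_by: Optional[str], caller_user_id: Optional[str]) -> bool:
    # The original bare equality: None == None evaluates to True.
    return created_by == caller_user_id


def can_access(
    *,
    role: str,
    caller_user_id: Optional[str],
    caller_team_id: Optional[str],
    created_by: Optional[str],
    created_by_team_id: Optional[str],
) -> bool:
    if role in ADMIN_ROLES:
        return True
    if caller_user_id is not None:
        return created_by == caller_user_id
    if caller_team_id is not None:
        return created_by_team_id == caller_team_id
    # No admin role and no identifying ids: deny.
    return False
```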
Schema migration adds `created_by_team_id` to LiteLLM_ManagedFileTable,
LiteLLM_ManagedObjectTable, and LiteLLM_ManagedVectorStoreTable, plus
indexes for the new filter. Writes in BaseManagedResource and the
enterprise managed_files hook now stamp the column from
`user_api_key_dict.team_id`. Reads in `can_user_access_unified_resource_id`,
`can_user_call_unified_file_id`, `can_user_call_unified_object_id`,
`list_user_resources`, `list_user_batches`, and `get_user_created_file_ids`
all delegate to the new helper.
Tests cover the helper in isolation, the base-class listing/access paths,
and the enterprise file-access hook (including a regression test for the
original `None == None` bypass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(schema): add workflow run tracking tables (LiteLLM_WorkflowRun, LiteLLM_WorkflowEvent, LiteLLM_WorkflowMessage)
* feat(proxy): add /v1/workflows/runs endpoints for durable agent workflow tracking
* feat(proxy): register workflow management router in proxy_server
* docs(workflows): add README for workflow run tracking API
* test(workflows): add unit tests for /v1/workflows/runs endpoints
* fix(workflows): atomic event+status update via tx(), run_id 404 guard, sequence retry on collision
* test(workflows): add tx mock, 404 on unknown run_id, retry-on-collision tests
* fix(workflows): constrain status to Literal enum, rename total→count in list responses
* add tenant isolation and bounded limits to workflow endpoints
* add created_by column and index to LiteLLM_WorkflowRun
* add ownership and bounded-limit tests for workflow endpoints
* Fix workflow run ownership for null owners
* guard prisma import in workflow_management_endpoints
* sync schema.prisma copies with workflow run models
* black: format workflow_management_endpoints.py
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Store search tool allowlists only on object permissions, wire auth/management/UI flows to object_permission.search_tools, and remove legacy team-metadata search credential code and tests.
Made-with: Cursor
* feat(proxy): add /v1/memory CRUD endpoints with user/team scoping
New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.
Endpoints:
POST /v1/memory - create
GET /v1/memory - list (caller-scoped; admins see all)
GET /v1/memory/{key} - fetch one
PUT /v1/memory/{key} - upsert
DELETE /v1/memory/{key} - delete
Non-admin callers cannot set a user_id/team_id other than their own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy/memory): omit metadata field when None on create
Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.
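A sketch of building the payload conditionally. `build_create_payload` is a hypothetical helper; how the JSON value itself is wrapped for the Prisma client is not shown:

```python
from typing import Any, Dict, Optional


def build_create_payload(
    key: str, value: str, metadata: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    data: Dict[str, Any] = {"key": key, "value": value}
    if metadata is not None:
        # Only include the field when there is something to store;
        # omitting it lets the column default to SQL NULL.
        data["metadata"] = metadata
    return data
```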
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ui): add Memory page to view/manage /v1/memory entries
Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.
- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav
Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
`user:*` rows must not appear in the caller's results).
UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
(alongside Agents, MCP Servers, Skills) — it's an API primitive,
not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
model ("type the namespace, see everything under it").
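For the backend prefix filter, one way to keep the prefix clause subordinate to the visibility filter is to AND the two together. This is a sketch: `build_memory_where` is hypothetical, and the `startswith` operator is assumed to follow the Prisma Python client's string-filter naming:

```python
from typing import Any, Dict, List, Optional


def build_memory_where(
    visibility: Dict[str, Any],
    key: Optional[str] = None,
    key_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    # The visibility clause is always present, so no key filter can widen scope.
    clauses: List[Dict[str, Any]] = [visibility]
    if key_prefix is not None:
        # key_prefix wins when both key and key_prefix are supplied.
        clauses.append({"key": {"startswith": key_prefix}})
    elif key is not None:
        clauses.append({"key": key})
    return {"AND": clauses}
```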
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT
The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.
Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).
No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.
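The NULL-distinct behaviour is easy to reproduce locally. This sketch uses SQLite through Python's `sqlite3` module as a stand-in, since SQLite also treats NULLs as distinct in unique indexes; the table and index names are illustrative, and it does not exercise the Postgres `NULLS NOT DISTINCT` option itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT, user_id TEXT, team_id TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX memory_scope_idx ON memory (key, user_id, team_id)"
)

# team_id is NULL, so the unique index does not see these rows as duplicates.
for _ in range(3):
    conn.execute("INSERT INTO memory VALUES ('greeting', 'user-1', NULL)")
duplicate_rows = conn.execute("SELECT COUNT(*) FROM memory").fetchone()[0]

# With every column non-NULL, the second insert is rejected as expected.
conn.execute("INSERT INTO memory VALUES ('greeting', 'user-1', 'team-1')")
try:
    conn.execute("INSERT INTO memory VALUES ('greeting', 'user-1', 'team-1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```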
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): make key globally unique, 409 on any duplicate
Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.
- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): full-width layout + user/teams-style columns
- Add `w-full` to the MemoryView outer div so the page fills the
`flex-1` container (was collapsing to intrinsic width).
- Replace the combined "Scope" column with separate User ID / Team ID
columns, matching the layout of the Users / Teams pages: ID, Name,
Preview, User ID, Team ID, Updated, Actions.
- IDs render with a truncated mono label + copy-to-clipboard button,
same pattern as view_users.
- Detail drawer now shows Memory ID / User ID / Team ID as separate
fields instead of stacked color tags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): use clean MCP-style ID pill, drop copy icons
The ID / User ID / Team ID columns showed a mono text blob with a
copy-to-clipboard icon next to each value — too busy compared to the
MCP Servers page. Swap the renderer for MCP's pill style:
- Truncated mono ID inside a blue Tailwind pill
(`font-mono text-blue-600 bg-blue-50 ... rounded-md border`).
- No copy icon. Full ID surfaces via tooltip.
- ID column is a button that opens the detail drawer on click;
user/team ID pills are static (not clickable).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): address greptile review feedback
Addresses 5 Greptile findings (3/5 → higher confidence target):
1. Identity-less orphan rows (P1): non-admin callers with no user_id AND
no team_id could create rows that the visibility filter would never
match again. Now rejected up front with 400 — caller must authenticate
with a scoped key or act as PROXY_ADMIN.
2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic;
a concurrent writer could slip a row in between the 404-check and the
create call. Now catch unique-violation on create, re-read, and fall
through to update — PUT stays idempotent. If the conflicting row
belongs to a different scope, surface a 409 instead of 500.
3. PUT-create scope inconsistency (P2): PUT's create branch always used
the caller's own user_id/team_id, so admins couldn't bootstrap rows
scoped elsewhere via PUT (only POST). Now PUT-create calls the shared
`_resolve_scope()` helper, matching POST semantics.
4. Stale schema comment (P2): schema said "Keyed by (key, user_id,
team_id)" but `key` is globally unique. Updated all three schema
copies to reflect the actual design.
5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200
with no load-more. Swapped to real server-side pagination driven by
`data.total`; page size is now 50 and the pager is a real AntD
control.
Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()`
from create_memory so POST and PUT don't drift on the scope/error logic.
Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race →
update), 18/18 pass.
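The race recovery in finding 2 can be sketched roughly as follows. `FakeStore`, `UniqueViolation`, and the `before_create` hook are hypothetical names used only to simulate a concurrent writer; the cross-scope 409 branch is left out.

```python
class UniqueViolation(Exception):
    """Hypothetical stand-in for the database's unique-constraint error."""


class FakeStore:
    """In-memory stand-in for the memory table, keyed by `key`."""

    def __init__(self):
        self.rows = {}

    def find(self, key):
        return self.rows.get(key)

    def create(self, key, value):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = value
        return value

    def update(self, key, value):
        self.rows[key] = value
        return value


def upsert(store, key, value, before_create=None):
    if store.find(key) is not None:
        return "updated", store.update(key, value)
    if before_create is not None:
        before_create()  # test hook: a concurrent writer slips in here
    try:
        return "created", store.create(key, value)
    except UniqueViolation:
        # Lost the race: re-read and fall through to update so PUT stays idempotent.
        if store.find(key) is None:
            raise
        return "updated", store.update(key, value)


store = FakeStore()
outcome, stored = upsert(
    store, "prefs", "mine", before_create=lambda: store.create("prefs", "theirs")
)
```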
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): typed Prisma error + explicit-null metadata on PUT
Two more Greptile threads from the last review:
- Unique-violation detection was string-matching "Unique"/"UniqueViolation"
in the exception message, fragile across Prisma/driver versions. Now
check the typed error `code == "P2002"` first, with string fallback.
- PUT could not distinguish "metadata omitted" from "metadata: null" —
both parsed as `None`, so callers had no way to clear stored metadata.
Switch to Pydantic v2's `model_fields_set` to tell which fields the
caller actually sent; explicit null now clears the column.
New tests:
- explicit null clears metadata
- omitted metadata preserves existing value
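The omitted-versus-null distinction can be illustrated without Pydantic by checking which keys were present in the parsed body, which is roughly what `model_fields_set` reports. This is a stdlib stand-in, not the endpoint's code; `apply_put` and the `value` field are hypothetical.

```python
import json


def apply_put(stored: dict, raw_body: str) -> dict:
    """Apply a PUT body, touching metadata only if the caller sent the field."""
    body = json.loads(raw_body)
    updated = dict(stored)
    updated["value"] = body["value"]
    if "metadata" in body:  # present, even if null -> overwrite (null clears)
        updated["metadata"] = body["metadata"]
    return updated


stored = {"value": "old", "metadata": {"source": "ui"}}
kept = apply_put(stored, '{"value": "new"}')
cleared = apply_put(stored, '{"value": "new", "metadata": null}')
```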
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): send explicit null when user clears metadata
Addresses the remaining P1 from the last Greptile review:
When the edit modal's metadata textarea was cleared and saved,
`metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key
entirely, and the backend's `model_fields_set` guard therefore left
the stored metadata untouched — UI showed success but nothing changed.
Now: empty textarea on edit → send explicit `null` so the backend
sees `metadata` in `model_fields_set` and clears the column.
Empty textarea on create still maps to `undefined` (field omitted)
to avoid Prisma's `Json? = None` quirk on insert.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): preserve slashes in key path encoding
The backend route `/v1/memory/{key:path}` supports keys with slashes,
but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx
default, Cloudflare, AWS ALB) reject or re-decode `%2F` mid-flight,
so UI update/delete calls on slash-containing keys could fail or
silently misroute.
New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each
segment, then rejoins with literal `/`. Every other unsafe char
(spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as
path delimiters, matching what the `:path` converter expects.
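A Python analogue of the per-segment encoding is sketched below. The real helper is TypeScript in the UI; `encode_memory_key_for_path` is a hypothetical name, and `urllib.parse.quote` differs slightly from `encodeURIComponent` in which punctuation it leaves unescaped.

```python
from urllib.parse import quote


def encode_memory_key_for_path(key: str) -> str:
    """Percent-encode each path segment, keeping '/' as a literal delimiter."""
    return "/".join(quote(segment, safe="") for segment in key.split("/"))


encoded = encode_memory_key_for_path("user:1/notes/a b?c#d")
percent = encode_memory_key_for_path("rates/100%")
```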
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): drop misleading client-side column sorters
With server-side pagination, client sorters on `key` and `updated_at`
only reorder the current page while pretending to sort the full
dataset — users would see "sorted by name" but only the visible 50
rows would actually be sorted.
Remove the sorters. The backend already returns rows in
`updated_at DESC` order (sensible default for a memory view), and
users can narrow the result with the key-prefix filter.
Greptile also flagged missing `@@map` on the new model as a
"consistency" issue, but only 1 of 59 tables in this repo uses
`@@map` — the dominant pattern is to rely on Prisma's default
(model name == table name). Skipping that finding as a
false-positive on convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): compose visibility + key filters via explicit AND
Greptile P1 (filter-fragility): `where.update(vis)` was semantically
correct today, but dict-merging by key meant any future visibility
filter that grew a new top-level "OR" would silently clobber the
existing key filter.
Compose explicitly instead:
where = {"AND": [key_filter, vis]}
Applied to both `list_memory` and `_find_memory_for_caller`. When
either side is empty (admin has no visibility filter; list has no
key filter), skip the wrapper and use the non-empty side directly
to keep the generated SQL clean.
Test fake's `_matches` now understands top-level `AND` too.
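The composition rule can be sketched as a small helper. `compose_where` is a hypothetical name, and the key filter below is contrived to carry its own top-level "OR" so the clobbering risk of dict-merging is visible.

```python
from typing import Any, Dict


def compose_where(key_filter: Dict[str, Any], vis: Dict[str, Any]) -> Dict[str, Any]:
    """AND the two filters; if either side is empty, return the other as-is."""
    if key_filter and vis:
        return {"AND": [key_filter, vis]}
    return key_filter or vis


# Contrived filters that both use a top-level "OR".
key_filter = {"OR": [{"key": "a"}, {"key": "b"}]}
vis = {"OR": [{"user_id": "u1"}, {"team_id": "t1"}]}

merged = dict(key_filter)
merged.update(vis)  # dict-merge: the key filter's "OR" is silently replaced

composed = compose_where(key_filter, vis)
```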
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(ui/memory): wrap write helpers with react-query useMutation
Previously the Memory view read via `useQuery` but called the raw
create/update/delete fetch helpers directly in handlers, tracking
loading state with a local `submitting` flag and invalidating state
via `refetch()`. That mixes two concerns:
- it skips react-query's mutation state (isPending / isError / isSuccess)
- `refetch()` only refetches the currently-mounted query instance, not
other cached pages, so navigating back to an older page could show
stale rows
Switch the three write paths to `useMutation`:
- `createMutation`, `updateMutation`, `deleteMutation` — each owns
the mutation fn, success toast, and error toast.
- Success handlers invalidate the whole `["memoryList", ...]` prefix
via `queryClient.invalidateQueries`, so every cached page refetches
(pagination + filter-aware).
- Refresh button now invalidates instead of `refetch()`, keeping all
behavior consistent.
- handleSave/handleDelete become thin adapters that call `.mutateAsync`;
their errors are swallowed locally since the mutation's onError has
already surfaced the toast.
Also tightened the edit modal's key-field tooltip to reflect the
actual global-unique semantics (was "Unique per user/team scope").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): close cross-user write gap + sanitize 500 errors (Veria)
Addresses two Veria findings:
**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id == caller's user OR team_id == caller's team`)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.
New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:
- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
team (a "pure team row" intended for shared writes).
Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.
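The non-admin rules stated in this commit message can be sketched as a pure predicate. `Row`, `Caller`, and `can_write` are hypothetical names; the actual `_assert_write_access` raises an HTTP error rather than returning a bool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    user_id: Optional[str]
    team_id: Optional[str]


@dataclass
class Caller:
    user_id: Optional[str]
    team_id: Optional[str]
    is_admin: bool = False


def can_write(row: Row, caller: Caller) -> bool:
    if caller.is_admin:
        return True  # admins bypass the check
    if row.user_id is not None:
        return row.user_id == caller.user_id  # personal ownership
    # "Pure team row": no user_id stamped, team must match the caller's team.
    return row.team_id is not None and row.team_id == caller.team_id


alice_row = Row(user_id="alice", team_id="t1")
team_row = Row(user_id=None, team_id="t1")
bob = Caller(user_id="bob", team_id="t1")
```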
**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.
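A rough sketch of the log-then-sanitize pattern follows. `HTTPError` is a stand-in for the framework's exception type, and the logger name and message wording are assumptions, not the endpoint's actual strings.

```python
import logging

logger = logging.getLogger("memory_endpoints")  # hypothetical logger name


class HTTPError(Exception):
    """Stand-in for the web framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def internal_error(exc: Exception, operation: str) -> HTTPError:
    # The real exception goes to server-side logs only.
    logger.error("memory %s failed: %r", operation, exc)
    # The caller gets a generic message with no DB internals.
    return HTTPError(500, f"Internal error during memory {operation}.")


leaky = Exception('relation "LiteLLM_MemoryTable" does not exist at db.internal:5432')
safe = internal_error(leaky, "create")
```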
Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.
Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)
25/25 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): require team admin to modify pure team rows
Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):
- Plain team members can READ team rows via the OR visibility filter
(intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
for the team's organization may MODIFY them. Plain members get 403.
`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.
Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403
Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.
27/27 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: mypy + UI page-metadata sync for memory page
Two CI failures:
1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
`dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
returned `dict[str, list[...]]`, so the join site failed
`dict-item` typing. Annotate both intermediates as `dict` so mypy
widens the value type.
2. UI test (`page_utils.test.ts > should have descriptions for all
pages`): every leftnav entry must have a description in
`page_metadata.ts`, and `memory` was missing. Added a one-line
description, matching the style of neighboring entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449)
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants
gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.
Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.
Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
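The parity assertion might look roughly like this. The flag names come from the commit message; the `model_map` shape and the flag values are illustrative assumptions, not the contents of the real pricing JSON.

```python
FLAGS = (
    "supports_none_reasoning_effort",
    "supports_minimal_reasoning_effort",
    "supports_xhigh_reasoning_effort",
)


def parity_gaps(model_map: dict, base: str, dated: str) -> list:
    """Return the flags whose value differs between a base alias and a snapshot."""
    return [f for f in FLAGS if model_map[base].get(f) != model_map[dated].get(f)]


# Illustrative entries only: the dated variant is missing two flags.
model_map = {
    "gpt-5.5": {flag: True for flag in FLAGS},
    "gpt-5.5-2026-04-23": {"supports_none_reasoning_effort": True},
}
gaps_before = parity_gaps(model_map, "gpt-5.5", "gpt-5.5-2026-04-23")

# Mirroring the base flags onto the snapshot closes the gap.
model_map["gpt-5.5-2026-04-23"].update({flag: model_map["gpt-5.5"][flag] for flag in FLAGS})
gaps_after = parity_gaps(model_map, "gpt-5.5", "gpt-5.5-2026-04-23")
```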
* fix(schema): close LiteLLM_MemoryTable model brace dropped during merge
The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:
error: This line is not a valid field or attribute definition.
--> schema.prisma:1250
|
1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
1250 | model LiteLLM_AdaptiveRouterState {
Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.
`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
- UV_IMAGE across all Dockerfiles: 0.10.9 -> 0.11.7.
- Loosen `required-version` in enterprise/ and litellm-proxy-extras/
from strict `==0.10.9` to `>=0.10.9` so the new Docker image can
build those workspace members. Matches the main pyproject range.
- Drop the `sed` block that rewrote tar/minimatch version ranges in
npm's bundled package.json files. The override loop above already
swaps the vendored directories on disk; npm doesn't re-resolve at
runtime, so the sed was cosmetic.
litellm-proxy-extras ships a LICENSE file with MIT terms but did not
declare a `license` SPDX expression in its pyproject.toml, so tools
that read the metadata (PyPI, Nexus IQ, pip-licenses) reported
License-None for every published version. Add the explicit expression
so downstream scanners resolve the declared license.
P1: start the adaptive-router flusher loop unconditionally at proxy boot
instead of gating on 'adaptive_routers is non-empty'. Adaptive routers
added via /config/reload after boot now have their queues drained.
State is lazy-loaded per router on first flush tick (new _state_loaded
flag on AdaptiveRouter) so hot-reloaded routers still get their
persisted priors.
P2: _finalize_adaptive_router_if_configured now prunes stale
AdaptiveRouterPostCallHook callbacks from every litellm callback list
before registering new ones. Without this, every Router replacement
left the old hooks wired up in litellm.callbacks and double-fired
signal recording for every request. Uses
logging_callback_manager.remove_callbacks_by_type (same pattern as the
semantic tool filter).
CI fixes:
- black --check failure: reformatted litellm/router.py
- schema migration diff: aligned @@index with the explicit index name
('idx_adaptive_router_session_activity') from the original migration
by adding 'map:' to all three schema.prisma copies. No new migration
needed.
Tests: 1 new covering the prune-on-hot-reload path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Mark last_updated_at (AdaptiveRouterState) and last_activity_at
(AdaptiveRouterSession) with @updatedAt so Prisma refreshes the
timestamps on every write. Without this the fields stayed frozen at
INSERT time and the last_activity_at index was misleading for any
future TTL/eviction logic. Applied to all three schema.prisma copies;
no migration SQL change needed (Prisma @updatedAt is a client-side
annotation that doesn't touch DDL).
- get_state_snapshot: report cell.total_samples instead of alpha+beta
for the 'samples' field. The previous value inflated every cell by
the COLD_START_MASS prior (e.g. showed 10.0 before any real traffic
arrived), which confused operators reading /adaptive_router/.../state.
Updated docs + the snapshot test to match.
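The difference between the two 'samples' values can be sketched as below. The 10.0 prior mass matches the example in the message; the even alpha/beta split and the `Cell` shape are assumptions.

```python
from dataclasses import dataclass

COLD_START_MASS = 10.0  # matches the "showed 10.0" example; the split is assumed


@dataclass
class Cell:
    alpha: float = COLD_START_MASS / 2
    beta: float = COLD_START_MASS / 2
    total_samples: int = 0

    def record(self, success: bool) -> None:
        if success:
            self.alpha += 1
        else:
            self.beta += 1
        self.total_samples += 1


cell = Cell()
inflated_before_traffic = cell.alpha + cell.beta  # prior mass only
real_before_traffic = cell.total_samples

for outcome in (True, True, False):
    cell.record(outcome)
inflated_after = cell.alpha + cell.beta
real_after = cell.total_samples
```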
Also fixes two pre-existing merge-break syntax errors in router.py
(missing ')' on the AdaptiveRouter TYPE_CHECKING import; truncated
async_pre_routing_hook dispatch call for the adaptive router branch)
that were masking the rest of the file from the interpreter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two further Greptile findings:
- `_warn_if_db_ahead_of_head` only caught `psycopg.OperationalError`.
Non-connection DB errors (e.g. `InsufficientPrivilege` / 42501 if the
runtime DB user lacks SELECT on `_prisma_migrations`) would propagate
uncaught and crash startup — contradicting the docstring's
"informational only, never blocks" guarantee. Widen the catch to
`psycopg.DatabaseError` so all DB-layer errors are swallowed.
- In the P3009 and P3018 idempotent-recovery paths, the call to
`_resolve_specific_migration(name)` was not wrapped in its own
try/except. Being inside an active `except CalledProcessError`
handler, a new `CalledProcessError` from the resolve call would NOT
re-enter the same handler — it would propagate out as
`CalledProcessError`, past `proxy_cli.py`'s `except RuntimeError`,
crashing startup with an unhandled traceback instead of the intended
clean `sys.exit(2)`. Wrap both call sites to convert to RuntimeError.
Adds unit tests for both behaviors.
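The handler semantics behind the second finding can be demonstrated in isolation. All function names below are hypothetical stand-ins; `run` mimics a caller that only handles `RuntimeError`.

```python
import subprocess


def resolve_specific_migration(name: str) -> None:
    # Hypothetical stand-in that always fails, to exercise the error path.
    raise subprocess.CalledProcessError(1, ["prisma", "migrate", "resolve", name])


def deploy_unwrapped() -> None:
    try:
        raise subprocess.CalledProcessError(1, ["prisma", "migrate", "deploy"])
    except subprocess.CalledProcessError:
        # A new CalledProcessError raised here does not re-enter this handler.
        resolve_specific_migration("example_migration")


def deploy_wrapped() -> None:
    try:
        raise subprocess.CalledProcessError(1, ["prisma", "migrate", "deploy"])
    except subprocess.CalledProcessError:
        try:
            resolve_specific_migration("example_migration")
        except subprocess.CalledProcessError as e:
            raise RuntimeError("failed to resolve migration") from e


def run(fn) -> str:
    try:
        fn()
    except RuntimeError:
        return "clean exit"
    except subprocess.CalledProcessError:
        return "unhandled traceback"  # what the CLI's handler would have missed
    return "ok"


unwrapped_result = run(deploy_unwrapped)
wrapped_result = run(deploy_wrapped)
```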
- Open the psycopg connection in `_warn_if_db_ahead_of_head` with
autocommit=True. Without it, psycopg3's `with conn` calls COMMIT on
clean exit, which fails after the `UndefinedTable` (fresh-DB) branch
left the transaction in an aborted state — crashing first-run startups.
- Wrap the v2 `prisma db push` path in try/except and raise RuntimeError
on CalledProcessError/TimeoutExpired. Otherwise these propagate past
proxy_cli.py's `except RuntimeError` as unhandled tracebacks.
- Reword the loop-exhaustion error to cover the non-timeout exit path
(repeated P3005/P3009/P3018 idempotent-recovery `continue`s), not
just persistent timeouts.
Adds a unit test for the db_push error wrapping.
Default behavior (v1) is unchanged. Users who have seen schema thrashing
during rolling deploys can opt into the v2 resolver with
`--use_v2_migration_resolver`.
Why v2 is safer:
- Runs `prisma migrate deploy` only.
- Recovers from P3005 (baseline) and idempotent P3009/P3018 errors, same
as v1.
- Never calls `_resolve_all_migrations`, which generates a schema diff
between the live DB and the shipped schema.prisma and applies it via
`prisma db execute`. That path bypassed every migration's SQL and was
the root cause of thrashing when two LiteLLM versions contended for
the same DB.
- Logs a non-blocking warning when the DB has migrations applied that
are newer than anything this build ships (ahead-of-HEAD). It does not
refuse to start — many users have unusual ledger state from past
thrashing, and blocking startup would be a breaking change.
Also prints a message on startup when the default (v1) resolver is in
use, pointing operators at the opt-in flag.
Adds unit tests covering the v2 fail-fast paths, the stripping of
Prisma-specific query params from DATABASE_URL (needed for psycopg),
the timestamp helpers, and pins the default: v1 still invokes
`_resolve_all_migrations`, v2 must not.
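Stripping Prisma-specific query params might look like the sketch below. The set of parameter names is an assumption based on Prisma's documented connection-URL arguments, not the list the commit actually uses; `strip_prisma_params` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of Prisma-only connection arguments.
PRISMA_ONLY_PARAMS = {"connection_limit", "pool_timeout", "pgbouncer", "schema"}


def strip_prisma_params(database_url: str) -> str:
    """Drop query params that Prisma understands but a plain driver may not."""
    parts = urlsplit(database_url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in PRISMA_ONLY_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_prisma_params(
    "postgresql://u:p@localhost:5432/litellm"
    "?connection_limit=5&sslmode=require&pool_timeout=10"
)
```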
Adds total_spend column to LiteLLM_TeamMembership that accumulates
continuously and is not zeroed by the budget cycle reset job. This
enables UI surfaces to distinguish current-cycle spend (the existing
spend column, which resets) from lifetime spend per team member.
Also exposes budget_reset_at on LiteLLM_BudgetTable so /team/info
callers can see when a member's budget window next resets. The field
was already stored in the DB but stripped by the response Pydantic
model.
Includes regression tests that:
- Guard the reset job against ever writing total_spend: 0
- Verify the spend writer increments both spend and total_spend in
one UPDATE statement.
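The two-counter behavior can be sketched with SQLite as a stand-in database. The table and function names are hypothetical; only the `spend` / `total_spend` column names come from the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_membership ("
    "user_id TEXT, team_id TEXT, spend REAL DEFAULT 0, total_spend REAL DEFAULT 0)"
)
conn.execute("INSERT INTO team_membership (user_id, team_id) VALUES ('u1', 't1')")


def record_spend(amount: float) -> None:
    # Both counters move in a single UPDATE so they cannot drift apart.
    conn.execute(
        "UPDATE team_membership SET spend = spend + ?, total_spend = total_spend + ? "
        "WHERE user_id = 'u1' AND team_id = 't1'",
        (amount, amount),
    )


def reset_budget_cycle() -> None:
    # The reset job zeroes only the current-cycle counter.
    conn.execute("UPDATE team_membership SET spend = 0")


record_spend(1.5)
record_spend(2.5)
reset_budget_cycle()
record_spend(1.0)
spend, total_spend = conn.execute(
    "SELECT spend, total_spend FROM team_membership"
).fetchone()
```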
Generating a migration from a stale branch could silently emit DROP
COLUMN for columns the stale branch did not know about, and the
script would write that SQL to a new migration file with no warning.
Adds two guards to ci_cd/run_migration.py:
- Branch freshness check: fetches origin/<base-branch> and exits 3 if
HEAD is behind. Default base is litellm_internal_staging. New
flags: --base-branch, --skip-freshness-check.
- Destructive guard: refuses (exit 2) if the generated diff contains
DROP COLUMN / DROP TABLE / DROP INDEX, unless --allow-destructive
is passed.
Refusal banners include guidance and an explicit callout instructing
AI agents not to auto-bypass the flags. Also treats Prisma's
"-- This is an empty migration." output as a no-op rather than
writing an empty file.
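A minimal sketch of the destructive guard and the empty-migration no-op, assuming a simple regex scan of the generated SQL. `review_diff` is a hypothetical name; the exit code 2 and the marker string come from the message, and the freshness check is omitted.

```python
import re

DESTRUCTIVE_SQL = re.compile(r"\bDROP\s+(COLUMN|TABLE|INDEX)\b", re.IGNORECASE)
EMPTY_MIGRATION_MARKER = "-- This is an empty migration."

EXIT_OK = 0
EXIT_DESTRUCTIVE = 2  # exit code named in the commit message


def review_diff(sql: str, allow_destructive: bool = False):
    """Return (exit_code, should_write_file) for a generated migration diff."""
    if not sql.strip() or EMPTY_MIGRATION_MARKER in sql:
        return EXIT_OK, False  # no-op: nothing to write
    if DESTRUCTIVE_SQL.search(sql) and not allow_destructive:
        return EXIT_DESTRUCTIVE, False  # refuse unless explicitly allowed
    return EXIT_OK, True


refused = review_diff('ALTER TABLE "T" DROP COLUMN "x";')
allowed = review_diff('ALTER TABLE "T" DROP COLUMN "x";', allow_destructive=True)
noop = review_diff("-- This is an empty migration.")
additive = review_diff('ALTER TABLE "T" ADD COLUMN "y" TEXT;')
```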
Updates litellm-proxy-extras/migration_runbook.md with the new
workflow, flag documentation, and agent warnings.
- Use 'auto_router/adaptive_router' prefix in example yaml, docs, and
README — the old 'adaptive_router/...' and 'openai/gpt-4o-mini' values
silently skipped adaptive-router init because detection requires the
'auto_router/adaptive_router' prefix.
- Read x-litellm-min-quality-tier from request headers (and the
'min_quality_tier' metadata key as fallback) in async_pre_routing_hook.
Previously the documented header was defined but never extracted, so
the quality-floor feature was inert.
- Evict expired entries from _session_states. The cache grew without
bound — added a parallel expiry map (same TTL as _owner_cache) and an
opportunistic bulk sweep when the cache crosses a size threshold.
- Align adaptive-router migration SQL with Prisma schema: all count
columns and the 'clean_credit_awarded' / 'last_processed_turn' fields
are NOT NULL in the data model, so the migration now declares them
NOT NULL. Fixes test_aaaasschema_migration_check.
Tests: 8 new covering header/metadata/precedence/invalid-value paths for
min_quality_tier and TTL-based eviction of _session_states.
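The parallel-expiry-map eviction could be sketched as follows. `SessionStates`, its parameters, and the injectable clock are hypothetical; the real cache's TTL and size threshold are not stated in the message.

```python
import time


class SessionStates:
    """Dict-backed cache with a parallel expiry map and an opportunistic sweep."""

    def __init__(self, ttl_seconds, sweep_threshold, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._sweep_threshold = sweep_threshold
        self._clock = clock
        self._states = {}
        self._expires_at = {}

    def set(self, session_id, state):
        now = self._clock()
        self._states[session_id] = state
        self._expires_at[session_id] = now + self._ttl
        if len(self._states) > self._sweep_threshold:
            self._sweep(now)  # bulk eviction only when the cache grows large

    def get(self, session_id):
        expires_at = self._expires_at.get(session_id)
        if expires_at is None:
            return None
        if expires_at <= self._clock():
            self._evict(session_id)
            return None
        return self._states[session_id]

    def _evict(self, session_id):
        self._states.pop(session_id, None)
        self._expires_at.pop(session_id, None)

    def _sweep(self, now):
        for sid in [s for s, exp in self._expires_at.items() if exp <= now]:
            self._evict(sid)

    def __len__(self):
        return len(self._states)


fake_now = [0.0]  # mutable fake clock for a deterministic demo
cache = SessionStates(ttl_seconds=10, sweep_threshold=2, clock=lambda: fake_now[0])
cache.set("a", 1)
cache.set("b", 2)
fake_now[0] = 11.0
cache.set("c", 3)  # crosses the threshold, sweeping the expired "a" and "b"
```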
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* bump litellm-proxy-extras version to 0.4.67
* bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject
* regenerate uv.lock for litellm-proxy-extras 0.4.67
* bump litellm-enterprise version to 0.1.38
* bump litellm-enterprise pin to 0.1.38 in litellm pyproject
* regenerate uv.lock for litellm-enterprise 0.1.38
When prisma migrate deploy reports 'No pending migrations to apply' the DB
already matches schema — running _resolve_all_migrations (migrate diff +
prisma db execute) adds 25+ seconds unnecessarily, causing the proxy to
miss the 90-second startup timeout in test_litellm_proxy_server_config_no_general_settings.
When DIRECT_URL is not set and DATABASE_URL is a Neon pooler URL, prisma migrate diff
fails (pooler doesn't support extended query protocol for schema introspection). Previously
_resolve_all_migrations returned early without applying any migrations, leaving the
budget_limits column missing and causing test_auth_callback_new_user to fail.
Now falls back to running each migration SQL file via prisma db execute --file, which
works with pooler URLs and is safe to re-run due to IF NOT EXISTS guards.