litellm

Author	SHA1	Message	Date
yuneng-jiang	1480ec698b	chore(ci): bump versions (#28287 ) * bump: version 0.4.72 → 0.4.73 * bump: version 1.86.0 → 1.87.0 * uv lock	2026-05-19 15:10:37 -07:00
Sameer Kankute	36c494fdd2	Litellm oss staging (#28161 ) * fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455) Squash-merged by litellm-agent from Anai-Guo's PR. * feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508) Squash-merged by litellm-agent from yimao's PR. * fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711) Squash-merged by litellm-agent from krisxia0506's PR. * fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503) Squash-merged by litellm-agent from krisxia0506's PR. * Fix Gemini MIME detection for extensionless GCS URIs (#27278) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107) Squash-merged by litellm-agent from voidborne-d's PR. * feat(chart): add support for autoscaling behavior in HPA (#27990) Squash-merged by litellm-agent from FabrizioCafolla's PR. * feat(proxy): add blocked flag to models for pause/resume from the UI (#27927) Squash-merged by litellm-agent from Cyberfilo's PR. * fix: pass socket timeouts to Redis cluster clients (#27920) Squash-merged by litellm-agent from tomdee's PR. * Fix/cache token (#28009) Squash-merged by litellm-agent from escon1004's PR. * fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080) Squash-merged by litellm-agent from Divyansh8321's PR. 
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617) * fix: reset org and tag budgets (#27326) * reset org budgets * reset tag budgets --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> * fix(ui): omit allowed_routes from key edit save when unchanged (#27553) * fix(ui): omit allowed_routes from key edit save when unchanged When a team admin opens Edit Settings on a key with key_type=AI APIs and saves without changing anything, the UI re-sends the existing allowed_routes value, which the backend's _check_allowed_routes_caller_permission gate rejects for non-proxy-admins (LIT-2681). Strip allowed_routes from the patch in handleSubmit when it deep-equals the original keyData.allowed_routes. The backend treats absence as "leave alone," so no-op saves now succeed for non-admins. Admins explicitly editing the field still send the new value. * fix(ui): order-insensitive allowed_routes diff + cover null-original case Address Greptile review: - Switch the "is allowed_routes unchanged" check to a Set-based comparison so a server-side reorder of the array doesn't register as a user edit and re-trigger LIT-2681. - Add two regression tests: (1) keyData.allowed_routes is null and the form is untouched — patch should strip the field; (2) server returned routes in a different order than the user originally entered — patch should still recognize the value as unchanged. * chore(ui): strip ticket refs and tighten comments in key edit fix - Remove internal-tracker references from in-code comments - Tighten the WHY comment in handleSubmit to two lines - Drop redundant test-block comments — test names already describe the case * fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc * fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests GuardrailRaisedException and BlockedPiiEntityError both lacked a status_code attribute. 
When these exceptions reached the proxy exception handler (getattr(e, 'status_code', 500)), the fallback defaulted to HTTP 500 — making intentional guardrail blocks indistinguishable from server errors and causing unnecessary client retries. Changes: - Add status_code=400 (keyword-only) to GuardrailRaisedException - Add status_code=400 (keyword-only) to BlockedPiiEntityError - Update _is_guardrail_intervention() to recognize both exceptions so downstream loggers record 'guardrail_intervened' instead of 'guardrail_failed_to_respond' - Add 6 unit tests for default/custom status codes and getattr pattern - Strengthen existing blocked-action test with status_code assertion Fixes #24348 --------- Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> * fix(router/proxy): address Greptile P1+P2 review comments on PR #28161 - router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429) when a specifically-addressed deployment is administratively blocked; 429 misleads retry-enabled clients into spinning forever against a paused model - proxy_server: compute get_fully_blocked_model_names() once before both branches in model_list() instead of duplicating the call in each branch - deepseek: upgrade silent debug log to warning when injecting placeholder reasoning_content so callers are clearly notified of degraded multi-turn quality - tests: update two blocked-deployment assertions to expect ServiceUnavailableError Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address bug detection findings (cache token order, mutable defaults) Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address bugs in async pass-through, anthropic cache token detection, rerank tests - async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments - cost_calculator: 
detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0 - dashscope rerank tests: pass request to httpx.Response constructions for consistency Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix code qa * fix(vertex_ai/gemini): strip MIME parameters from GCS contentType GCS object metadata's contentType field can include parameters such as 'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases so downstream get_file_extension_from_mime_type sees a bare MIME type. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex_ai/gemini): clarify mime-type error message string concatenation Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Vincent <yimao1231@gmail.com> Co-authored-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: d 🔹 <liusway405@gmail.com> Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Tom Denham <tom@tomdee.co.uk> Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com> Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com> Co-authored-by: robin-fiddler <robin@fiddler.ai> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>	2026-05-18 16:27:44 -07:00
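The guardrail fix in 36c494fdd2 turns on the proxy handler's `getattr(e, 'status_code', 500)` fallback: any exception that lacks a `status_code` attribute is reported as HTTP 500. The sketch below illustrates that pattern with two hypothetical exception classes, `LegacyGuardrailError` and `GuardrailBlockedError`. They are stand-ins, not LiteLLM's `GuardrailRaisedException` or `BlockedPiiEntityError`; only the keyword-only `status_code=400` default is taken from the commit message.

```python
class LegacyGuardrailError(Exception):
    """Hypothetical: a block exception with no status_code attribute (the pre-fix shape)."""


class GuardrailBlockedError(Exception):
    """Hypothetical stand-in for a guardrail exception with a keyword-only status_code."""

    def __init__(self, message: str, *, status_code: int = 400) -> None:
        super().__init__(message)
        self.status_code = status_code


def http_status_for(exc: Exception) -> int:
    # Same fallback as the handler quoted in the commit message:
    # anything without a status_code attribute is reported as a 500.
    return getattr(exc, "status_code", 500)
```

With this sketch, `http_status_for(LegacyGuardrailError("blocked"))` yields 500 while `http_status_for(GuardrailBlockedError("blocked"))` yields 400. Making `status_code` keyword-only means a positional call such as `GuardrailBlockedError("blocked", 403)` raises `TypeError` instead of silently binding 403 to some other parameter.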
Yuneng Jiang	0aa439d919	bump: version 0.4.71 → 0.4.72	2026-05-13 21:51:11 -07:00
Krrish Dholakia	8bbc61e03c	fix: harden /key/update authorization checks (#27878 ) * fix: patch Host-header auth bypass in get_request_route Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication. Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic. Affected variants confirmed fixed: Host: localhost/?x=1 Host: localhost:4000/?x=1 Host: localhost/#test Host: localhost:4000/#test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * style: reduce comments in route fix Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block credential fields in RAG ingest vector_store options Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side. Previously any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block /key/update self-escalation by assigned users Non-admin users who were assigned a key (created_by != caller) could update any non-budget field — models, rpm_limit, guardrails, etc. — without admin authorization, allowing privilege self-escalation. 
Gate: only the key creator (created_by == caller) may edit their own key without admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block user-controlled api_base in RAG ingest vector_store options A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint. - Gate the endpoint on PROXY_ADMIN role (403 for all other roles) - Call is_request_body_safe() to reject banned params even for admins - Convert ValueError from safety check to HTTP 400 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply banned-param check to /utils/transform_request Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users. 
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request — including the provider API key — to an attacker-controlled host. Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Update litellm/proxy/rag_endpoints/endpoints.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * fix: coerce nested config strings before banned-param check _NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: replace substring match with prefix match in is_llm_api_route mapped_pass_through_routes used `_llm_passthrough_route in route` (substring) so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential). 
Fix: use exact match or startswith(prefix + "/") — the same pattern used everywhere else in RouteChecks — so only routes that actually start with a passthrough prefix are allowed through. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize PR #27878 test failures - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized. - test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id). - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation. - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(lint): add explicit str() cast in get_request_route for MyPy scope.get() returns Any | None which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: guard bare-/ root_path strip + make total_spend migration idempotent auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/". 
migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: require creator still owns key for personal-key bypass in /key/update caller_is_creator now requires both created_by == caller AND user_id == caller. Previously checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. Adds regression test: creator whose key was reassigned is blocked (403). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: extract auth checks to fix PLR0915 + broaden max_budget assertion internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into _check_user_update_authz helper to bring statement count under the limit. test_validate_max_budget: assert "negative" (substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>	2026-05-14 04:16:04 +00:00
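Two of the findings in 8bbc61e03c can be reproduced with the standard library. The first function below shows how a URL rebuilt from a malformed `Host` header loses its real path once `urlparse` treats `?` or `#` as the start of the query or fragment; it is a simplified reconstruction, not Starlette's actual `base_url` logic. The other two contrast the substring check with the exact-match-or-`prefix + "/"` check the commit switched to. `PASSTHROUGH_PREFIXES` is a hypothetical list for illustration.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical passthrough prefixes; not LiteLLM's actual route table.
PASSTHROUGH_PREFIXES: List[str] = ["/openai", "/anthropic", "/bedrock"]


def path_from_reconstructed_url(host: str, raw_path: str) -> str:
    # Simplified: glue the untrusted Host value onto the request path and
    # re-parse. A "?" or "#" inside the Host value swallows the real path.
    return urlparse(f"http://{host}{raw_path}").path


def is_passthrough_substring(route: str) -> bool:
    # Pre-fix shape: any route that merely contains a prefix matches.
    return any(prefix in route for prefix in PASSTHROUGH_PREFIXES)


def is_passthrough_prefix(route: str) -> bool:
    # Post-fix shape: exact match, or the prefix followed by a path separator.
    return any(
        route == prefix or route.startswith(prefix + "/")
        for prefix in PASSTHROUGH_PREFIXES
    )
```

For example, `path_from_reconstructed_url("localhost/?x=1", "/health")` returns `"/"`, which is why reading `scope["path"]` from the request line is safer than `request.url.path`. Likewise `/credentials/by_name/openai` matches the substring check but not the prefix check.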
Sameer Kankute	18f77ff7bc	feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough (#27834 ) * feat(mcp): add delegate_auth_to_upstream flag for PKCE passthrough Adds an opt-in per-server flag that lets clients (e.g. VS Code) complete PKCE directly with an upstream OAuth2 MCP server, instead of LiteLLM double-gating with its own API-key/SSO check. Only honored when auth_type=oauth2 and the operator explicitly sets the flag; mixed-target or non-oauth2 requests fail closed. - Adds the field to Pydantic models, Prisma schema, and a migration - New MCPRequestHandler._target_servers_delegate_auth_to_upstream gate that runs only when no x-litellm-api-key is present, so authenticated users still get user_id resolution + stored-credential lookup - Anonymous callers now see delegate servers in get_allowed_mcp_servers (scoped to delegate servers only; the upstream still enforces auth) - mcp_management_endpoints: allow anonymous /authorize and /token for delegate servers so VS Code can complete PKCE without a LiteLLM session - UI toggle (shown only for oauth2) + payload/view wiring - Tests covering: oauth2 on/off, non-oauth2 with flag, mixed targets, no resolvable target, explicit key precedence, and 401 emission Co-authored-by: Cursor <cursoragent@cursor.com> * Enforce oauth2 for delegated MCP auth bypass Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): close secondary Authorization bypass for delegate servers The delegate-auth bypass gated only on the primary `x-litellm-api-key` header, so a LiteLLM key sent via `Authorization: Bearer sk-...` (the secondary header) was silently dropped — skipping spend tracking and rate limiting. Gate on the resolved litellm_api_key (which considers both headers) so the bypass fires only when neither is present. 
Also update the existing "Authorization header present" test to reflect that an upstream OAuth token now flows through the existing oauth2 fallback (LiteLLM auth attempt → fail → anonymous), not via the delegate branch. Co-authored-by: Cursor <cursoragent@cursor.com> * Avoid duplicate MCP OAuth credential lookup Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): block delegate bypass for M2M and internal-only servers Two security issues flagged in code review: 1. High – client_credentials (M2M) servers must not be delegatable: LiteLLM auto-fetches the upstream token using stored credentials, so allowing anonymous bypass would let any external caller invoke tools authenticated as LiteLLM's service account. Fix: check `server.has_client_credentials` in `_target_servers_delegate_auth_to_upstream`, the anonymous allow-list in `get_allowed_mcp_servers`, and `_mcp_oauth_user_api_key_auth`. 2. Medium – internal-only servers exposed to public internet: The anonymous delegate allow-list was not filtering by `available_on_public_internet`, so external callers with an upstream OAuth token could invoke tools on servers marked internal-only. Fix: add `available_on_public_internet` guard to the anonymous delegate server list in `get_allowed_mcp_servers`. Tests added for both cases. Co-authored-by: Cursor <cursoragent@cursor.com> * Require public MCP delegate auth servers Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): align delegate auth path parsing with downstream routing `_extract_target_server_names_from_path` used a naive segments-based split while `server.py::_get_mcp_servers_in_path` uses a regex that allows server names with one embedded slash and comma-separated lists. 
With the old parser, a request to `/mcp/<delegated>/<garbage>` was parsed as targeting `<delegated>` by the auth gate (bypassing LiteLLM auth) while the routing layer parsed it as `<delegated>/<garbage>` — when that name did not resolve, the request fell back to the anonymous allow-list, which can include `allow_all_keys` servers that normally require a LiteLLM key. Replace the parser with the same regex logic as `_get_mcp_servers_in_path` so auth gating sees the exact target name(s) downstream routing sees. Add regression tests covering parser parity and the specific extra-path-segment bypass attempt. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * fix(mcp): close header/path TOCTOU in MCP delegate auth gate `_target_servers_delegate_auth_to_upstream` and `_target_servers_use_oauth2` trusted the `x-mcp-servers` header when present, but `server.py::extract_mcp_auth_context` overrides that header with the path-derived list for `/mcp/...` routes. An attacker could set `x-mcp-servers: <delegated>` while pointing the URL path at a non-delegate server, flipping the auth gate without changing the target downstream routing actually uses. Extract a shared `_resolve_target_server_names` helper that mirrors the downstream override (path-derived names for `/mcp/...` routes, header value otherwise). Add regression tests covering the TOCTOU attempt and the helper's path-vs-header precedence. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix delegated MCP OAuth test mock Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): drop unreachable /{server}/mcp branch in auth path parser `_extract_target_server_names_from_path` also matched the ``/{server_name}/mcp`` form, but the downstream parser ``_get_mcp_servers_in_path`` only handles ``/mcp/...`` — and ``dynamic_mcp_route`` in ``proxy_server`` rewrites ``/{name}/mcp`` to ``/mcp/{name}`` on the scope before the MCP handler runs. 
Parsing the un-rewritten form on the auth side was therefore unreachable in production, and contradicted the docstring's claim of mirroring the downstream parser — exactly the kind of mismatch that risks a future header/path TOCTOU if any new entry point skips the rewrite. Drop the branch; the canonical ``/mcp/...`` path matches both parsers. Update the regression test to assert the new behavior. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix MCP path auth target resolution Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): require auth for refresh_token grants on delegate-auth servers `_mcp_oauth_user_api_key_auth` gates the unauthenticated PKCE flow for ``delegate_auth_to_upstream`` servers, but the bypass applied to BOTH ``/authorize`` and ``/token`` regardless of grant type. ``mcp_token`` accepts ``grant_type=refresh_token`` as well as ``authorization_code``, and ``exchange_token_with_server`` attaches the server's stored ``client_secret`` to whatever is forwarded upstream. An unauthenticated caller holding a refresh token issued to that OAuth client could mint fresh upstream access tokens through LiteLLM. Limit the anonymous bypass on ``/token`` to ``grant_type=authorization_code`` (the only grant PKCE actually protects via ``code_verifier``); fall through to normal LiteLLM auth for ``refresh_token`` and any other grant. ``/authorize`` continues to allow anonymous PKCE redirects. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * fix(ui): clear delegate_auth_to_upstream when switching off oauth2 The ``delegate_auth_to_upstream`` form field is rendered inside an ``isOAuth2 && (...)`` conditional, so the Form.Item unmounts when the user changes ``auth_type`` away from ``oauth2``. The follow-up ``form.setFieldValue("delegate_auth_to_upstream", false)`` runs after the field has already deregistered, so ``onFinish`` receives ``undefined`` and the fallback ``?? mcpServer.delegate_auth_to_upstream`` preserved the old ``true``. 
The flag then persisted in the database for a non-oauth2 server and silently re-activated if ``auth_type`` was later switched back to ``oauth2``. In the edit payload, force the flag to ``false`` whenever ``auth_type !== oauth2``; only trust the form value (and the existing DB fallback) when the server is actually oauth2. Backend defense-in-depth already ignores the flag for non-oauth2 servers, but the DB state should stay clean too. https://claude.ai/code/session_01SjyPmwfmrq8fveFgw9iHW9 * Fix MCP delegate auth reset on edit Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <claude@anthropic.com>	2026-05-13 12:06:13 -07:00
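The delegate-auth gate in 18f77ff7bc accumulates several fail-closed conditions over its review rounds. A minimal sketch of how they might compose, assuming a hypothetical `ServerConfig` record and a target list that has already been resolved from the request path (the header/path precedence that `_resolve_target_server_names` handles is out of scope here). `may_skip_litellm_auth` is an illustrative name, not a LiteLLM function.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ServerConfig:
    """Hypothetical minimal view of an MCP server record."""

    name: str
    auth_type: str
    delegate_auth_to_upstream: bool = False
    has_client_credentials: bool = False
    available_on_public_internet: bool = False


def may_skip_litellm_auth(
    targets: Sequence[ServerConfig],
    litellm_api_key: Optional[str],
    oauth_endpoint: Optional[str] = None,  # "authorize", "token", or None
    grant_type: Optional[str] = None,
) -> bool:
    # A LiteLLM key resolved from either header disables the bypass.
    if litellm_api_key:
        return False
    # No resolvable target: fail closed.
    if not targets:
        return False
    # Every target must qualify, so mixed targets fail closed.
    for server in targets:
        if server.auth_type != "oauth2" or not server.delegate_auth_to_upstream:
            return False
        if server.has_client_credentials:  # M2M servers never delegate
            return False
        if not server.available_on_public_internet:
            return False
    # /token is anonymous only for the PKCE authorization-code exchange.
    if oauth_endpoint == "token" and grant_type != "authorization_code":
        return False
    return True
```

Checking every resolved target, rather than any one of them, is what makes a mixed-target request fail closed in this sketch; a `refresh_token` grant falls through to normal LiteLLM auth.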
yuneng-jiang	e84282b7b3	[Infra] Bump deps (#27157 ) * bump: version 0.4.70 → 0.4.71 * bump: version 0.1.39 → 0.1.40 * uv lock	2026-05-05 15:58:05 -07:00
user	7faba9656f	Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation	2026-05-05 01:38:11 +00:00
user	83971a8712	fix(proxy): normalize managed resource team owner field	2026-05-04 17:05:50 -07:00
user	bfdd786962	chore(deps): refresh dependency locks	2026-05-04 11:36:18 -07:00
user	799d79160a	fix(proxy): match Prisma index names + extend listing to team for user-keyed callers Two follow-ups to the managed-resource isolation fix: 1. Rename the new composite indexes to match Prisma's auto-generated naming convention (`<Table>_created_by_team_id_created_at_idx`). The previous `*_team_owner_created_at_idx` names left `prisma migrate diff` reporting an outstanding `RENAME INDEX`, failing `test_aaaasschema_migration_check`. 2. Make `build_owner_filter` return an OR clause when the caller has both a `user_id` and a `team_id`, so listings include team-shared resources the same way `can_access_resource` already permits reading them. Without this a user could fetch a team-shared resource by id but never see it in their list view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:44:51 +00:00
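The listing change in 799d79160a can be sketched as a function that returns a Prisma-style `where` dictionary, plus a tiny in-memory evaluator so the behaviour can be checked without a database. The function names and dictionary rows are hypothetical; only the field names `created_by` and `created_by_team_id`, the OR clause for callers with both ids, and the "empty page without a query" rule come from the commit messages.

```python
from typing import Any, Dict, List, Optional


def owner_where_clause(
    user_id: Optional[str], team_id: Optional[str], is_admin: bool
) -> Optional[Dict[str, Any]]:
    """Hypothetical filter: {} = unscoped, dict = scoped, None = deny."""
    if is_admin:
        return {}
    if user_id and team_id:
        # Own records plus records created within the caller's team.
        return {"OR": [{"created_by": user_id}, {"created_by_team_id": team_id}]}
    if user_id:
        return {"created_by": user_id}
    if team_id:
        return {"created_by_team_id": team_id}
    return None  # no admin role and no identifying ids


def matches(row: Dict[str, Any], where: Dict[str, Any]) -> bool:
    # Evaluates only the two shapes produced above; {} matches every row.
    if "OR" in where:
        return any(matches(row, clause) for clause in where["OR"])
    return all(row.get(field) == value for field, value in where.items())


def list_resources(
    rows: List[Dict[str, Any]],
    user_id: Optional[str],
    team_id: Optional[str],
    is_admin: bool = False,
) -> List[Dict[str, Any]]:
    where = owner_where_clause(user_id, team_id, is_admin)
    if where is None:
        return []  # empty page, no query issued
    return [row for row in rows if matches(row, where)]
```

Returning `None` rather than `{}` for an unidentified caller keeps "deny" distinct from "unscoped", so a missing id can never fall through to the admin view.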
user	84fede37b4	fix(proxy): isolate managed resources for service-account API keys Service-account API keys are issued without a `user_id`, and managed file/batch/vector-store ownership checks compared `resource.created_by == user_api_key_dict.user_id`. Because Python evaluates `None == None` as True, any service-account key passed ownership checks for any resource also created without a user id, and listing endpoints skipped the `created_by` filter entirely when the caller had no user id — returning every tenant's records. Replace the bare equality with an identity-aware helper: - Admins (PROXY_ADMIN, PROXY_ADMIN_VIEW_ONLY) keep their unscoped view. - Callers with a `user_id` are scoped to records they created. - Callers without a `user_id` but with a `team_id` are scoped to records created within their team via a new `created_by_team_id` column. - Callers with no admin role and no identifying ids are denied — the listing path returns an empty page without issuing a query. Schema migration adds `created_by_team_id` to LiteLLM_ManagedFileTable, LiteLLM_ManagedObjectTable, and LiteLLM_ManagedVectorStoreTable, plus indexes for the new filter. Writes in BaseManagedResource and the enterprise managed_files hook now stamp the column from `user_api_key_dict.team_id`. Reads in `can_user_access_unified_resource_id`, `can_user_call_unified_file_id`, `can_user_call_unified_object_id`, `list_user_resources`, `list_user_batches`, and `get_user_created_file_ids` all delegate to the new helper. Tests cover the helper in isolation, the base-class listing/access paths, and the enterprise file-access hook (including a regression test for the original `None == None` bypass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:22:37 +00:00
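The `None == None` bypass described in 84fede37b4, and an identity-aware replacement, fit in a few lines. This is a hypothetical sketch, not LiteLLM's helper: each identity is compared only when the caller actually has it, and team matching applies whenever the caller has a `team_id`, in line with the read behaviour that the follow-up commit 799d79160a attributes to `can_access_resource`.

```python
from typing import Optional


def naive_can_access(resource_created_by: Optional[str], caller_user_id: Optional[str]) -> bool:
    # Pre-fix shape: a service-account key (user_id=None) "owns" every
    # resource that was also created without a user id, because None == None.
    return resource_created_by == caller_user_id


def can_access(
    resource_created_by: Optional[str],
    resource_team_id: Optional[str],
    caller_user_id: Optional[str],
    caller_team_id: Optional[str],
    is_admin: bool = False,
) -> bool:
    if is_admin:
        return True
    # Only compare identities the caller actually has.
    if caller_user_id is not None and resource_created_by == caller_user_id:
        return True
    if caller_team_id is not None and resource_team_id == caller_team_id:
        return True
    # No admin role and no matching identity: deny.
    return False
```

Here `naive_can_access(None, None)` is `True`, while `can_access(None, None, None, None)` is `False`: an absent id is treated as "no identity", never as a value that can match.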
Yuneng Jiang	dd549d9c50	bump: version 0.4.69 → 0.4.70	2026-04-30 21:39:37 -07:00
Sameer Kankute	6588564a88	Merge pull request #26691 from BerriAI/litellm_team_search_credentials_metadata feat(proxy): add team-level search provider credentials	2026-04-30 08:35:17 +05:30
ishaan-berri	4a7af1ff68	feat(proxy): durable agent workflow run tracking via /v1/workflows/runs (#26793 ) * feat(schema): add workflow run tracking tables (LiteLLM_WorkflowRun, LiteLLM_WorkflowEvent, LiteLLM_WorkflowMessage) * feat(proxy): add /v1/workflows/runs endpoints for durable agent workflow tracking * feat(proxy): register workflow management router in proxy_server * docs(workflows): add README for workflow run tracking API * test(workflows): add unit tests for /v1/workflows/runs endpoints * fix(workflows): atomic event+status update via tx(), run_id 404 guard, sequence retry on collision * test(workflows): add tx mock, 404 on unknown run_id, retry-on-collision tests * fix(workflows): constrain status to Literal enum, rename total→count in list responses * add tenant isolation and bounded limits to workflow endpoints * add created_by column and index to LiteLLM_WorkflowRun * add ownership and bounded-limit tests for workflow endpoints * Fix workflow run ownership for null owners * guard prisma import in workflow_management_endpoints * sync schema.prisma copies with workflow run models * black: format workflow_management_endpoints.py --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-04-29 17:12:18 -07:00
Sameer Kankute	b5c60d8873	Add migration script	2026-04-29 12:30:10 +05:30
Sameer Kankute	4b03cb68a2	feat(proxy): move search tool access to object permissions Store search tool allowlists only on object permissions, wire auth/management/UI flows to object_permission.search_tools, and remove legacy team-metadata search credential code and tests. Made-with: Cursor	2026-04-29 12:29:20 +05:30
Yuneng Jiang	67628a60c3	bump: version 0.4.68 → 0.4.69	2026-04-25 19:30:33 -07:00
Yuneng Jiang	4884b0b611	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr23 # Conflicts: # litellm/proxy/management_endpoints/key_management_endpoints.py	2026-04-25 09:47:47 -07:00
Krrish Dholakia	70492cee42	feat(proxy): add /v1/memory CRUD endpoints (#26218 ) * feat(proxy): add /v1/memory CRUD endpoints with user/team scoping New LiteLLM_MemoryTable stores user/team-scoped key/value entries with optional JSON metadata. Value is a String (LLM-readable text) and metadata is an optional Json? envelope, matching the Letta + mem0 hybrid model so future structured fields can be added without a schema migration. Endpoints: POST /v1/memory - create GET /v1/memory - list (caller-scoped; admins see all) GET /v1/memory/{key} - fetch one PUT /v1/memory/{key} - upsert DELETE /v1/memory/{key} - delete Non-admin callers cannot set a user_id/team_id other than their own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy/memory): omit metadata field when None on create Prisma's Python client rejects `metadata=None` on a `Json?` field with "A value is required but not set" — the field must be omitted from the `data` dict entirely to store SQL NULL. Build the create payload conditionally in both `create_memory` and the PUT-create branch of `upsert_memory`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): add Memory page to view/manage /v1/memory entries Adds a new "Memory" sidebar item under Tools so users can see what their agents have stored. Lists all memories visible to the caller (scoped by the backend), with a key-search filter, preview column, scope tags, and view/edit/delete actions. Create modal accepts optional JSON metadata. - networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory wired to the /v1/memory CRUD endpoints. - MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md: use antd for new UI, not tremor). - page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav Backend: - GET /v1/memory now accepts `key_prefix` for Redis-style namespace scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix` are passed, `key_prefix` wins. - Prefix filter sits under the visibility filter in the Prisma where clause, so it can never leak rows across user/team scopes. - New tests: prefix match, and cross-scope isolation (another user's `user:` rows must not appear in the caller's results). UI: - Memory moved from a Tools submenu to a top-level AI GATEWAY item (alongside Agents, MCP Servers, Skills) — it's an API primitive, not a tool-management surface. - Search box now drives prefix search, matching the Redis mental model ("type the namespace, see everything under it"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable silently allowed duplicates when user_id or team_id was NULL, because Postgres treats every NULL as distinct by default (ANSI semantics). A caller with no team_id could POST the same key three times and get three rows. Migration: 1. Dedupe existing rows, keeping the most recent per (key, user_id, team_id), using `IS NOT DISTINCT FROM` so NULL == NULL. 2. Drop the old unique index. 3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+). No code change: POST already returns 409 on unique-violation error messages — it just wasn't firing before because the constraint didn't catch the NULL-team case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): make key globally unique, 409 on any duplicate Switches from the compound unique `(key, user_id, team_id)` to a simple `key @unique`. 
The compound form silently allowed duplicates when user_id or team_id was NULL (Postgres treats each NULL as distinct), so callers could POST the same key repeatedly. Globally-unique key means one row per key, period — any duplicate create → 409. - schema.prisma (×3): `key String @unique`, drop `@@unique(...)`. - initial add_memory_table migration: unique index on (key) only. - Remove the now-unused follow-up NULLS NOT DISTINCT migration. - Endpoint error message simplified ("already exists" — no "for this scope"). - Test fake's create() now enforces global key uniqueness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui/memory): full-width layout + user/teams-style columns - Add `w-full` to the MemoryView outer div so the page fills the flex-flex-1 container (was collapsing to intrinsic width). - Replace the combined "Scope" column with separate User ID / Team ID columns, matching the layout of the Users / Teams pages: ID, Name, Preview, User ID, Team ID, Updated, Actions. - IDs render with a truncated mono label + copy-to-clipboard button, same pattern as view_users. - Detail drawer now shows Memory ID / User ID / Team ID as separate fields instead of stacked color tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui/memory): use clean MCP-style ID pill, drop copy icons The ID / User ID / Team ID columns showed a mono text blob with a copy-to-clipboard icon next to each value — too busy compared to the MCP Servers page. Swap the renderer for MCP's pill style: - Truncated mono ID inside a blue Tailwind pill (`font-mono text-blue-600 bg-blue-50 ... rounded-md border`). - No copy icon. Full ID surfaces via tooltip. - ID column is a button that opens the detail drawer on click; user/team ID pills are static (not clickable). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): address greptile review feedback Addresses 5 greptile findings (3/5 → higher confidence target): 1. 
Identity-less orphan rows (P1): non-admin callers with no user_id AND no team_id could create rows that the visibility filter would never match again. Now rejected up front with 400 — caller must authenticate with a scoped key or act as PROXY_ADMIN. 2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic; a concurrent writer could slip a row in between the 404-check and the create call. Now catch unique-violation on create, re-read, and fall through to update — PUT stays idempotent. If the conflicting row belongs to a different scope, surface a 409 instead of 500. 3. PUT-create scope inconsistency (P2): PUT's create branch always used the caller's own user_id/team_id, so admins couldn't bootstrap rows scoped elsewhere via PUT (only POST). Now PUT-create calls the shared `_resolve_scope()` helper, matching POST semantics. 4. Stale schema comment (P2): schema said "Keyed by (key, user_id, team_id)" but `key` is globally unique. Updated all three schema copies to reflect the actual design. 5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200 with no load-more. Swapped to real server-side pagination driven by `data.total`; page size is now 50 and the pager is a real AntD control. Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()` from create_memory so POST and PUT don't drift on the scope/error logic. Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race → update), 18/18 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): typed Prisma error + explicit-null metadata on PUT Two more greptile threads from the last review: - Unique-violation detection was string-matching "Unique"/"UniqueViolation" in the exception message, fragile across Prisma/driver versions. Now check the typed error `code == "P2002"` first, with string fallback. 
- PUT could not distinguish "metadata omitted" from "metadata: null" — both parsed as `None`, so callers had no way to clear stored metadata. Switch to Pydantic v2's `model_fields_set` to tell which fields the caller actually sent; explicit null now clears the column. New tests: - explicit null clears metadata - omitted metadata preserves existing value Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui/memory): send explicit null when user clears metadata Addresses the remaining P1 from the last greptile review: When the edit modal's metadata textarea was cleared and saved, `metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key entirely, and the backend's `model_fields_set` guard therefore left the stored metadata untouched — UI showed success but nothing changed. Now: empty textarea on edit → send explicit `null` so the backend sees `metadata` in `model_fields_set` and clears the column. Empty textarea on create still maps to `undefined` (field omitted) to avoid Prisma's `Json? = None` quirk on insert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui/memory): preserve slashes in key path encoding The backend route `/v1/memory/{key:path}` supports keys with slashes, but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx default, CloudFlare, AWS ALB) reject or re-decode `%2F` mid-flight, so UI update/delete calls on slash-containing keys could fail or silently misroute. New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each segment, then rejoins with literal `/`. Every other unsafe char (spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as path delimiters, matching what the `:path` converter expects. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui/memory): drop misleading client-side column sorters With server-side pagination, client sorters on `key` and `updated_at` only reorder the current page while pretending to sort the full dataset — users would see "sorted by name" but only the visible 50 rows would actually be sorted. Remove the sorters. The backend already returns rows in `updated_at DESC` order (sensible default for a memory view), and users can narrow the result with the key-prefix filter. Greptile also flagged missing `@@map` on the new model as a "consistency" issue, but only 1 of 59 tables in this repo uses `@@map` — the dominant pattern is to rely on Prisma's default (model name == table name). Skipping that finding as a false-positive on convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): compose visibility + key filters via explicit AND Greptile P1 (filter-fragility): `where.update(vis)` was semantically correct today, but dict-merging by key meant any future visibility filter that grew a new top-level "OR" would silently clobber the existing key filter. Compose explicitly instead: where = {"AND": [key_filter, vis]} Applied to both `list_memory` and `_find_memory_for_caller`. When either side is empty (admin has no visibility filter; list has no key filter), skip the wrapper and use the non-empty side directly to keep the generated SQL clean. Test fake's `_matches` now understands top-level `AND` too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ui/memory): wrap write helpers with react-query useMutation Previously the Memory view read via `useQuery` but called the raw create/update/delete fetch helpers directly in handlers, tracking loading state with a local `submitting` flag and invalidating state via `refetch()`. 
That mixes two concerns: - it skips react-query's mutation state (isPending / isError / isSuccess) - `refetch()` only retouches the currently-mounted query instance, not other cached pages, so navigating back to an older page could show stale rows Switch the three write paths to `useMutation`: - `createMutation`, `updateMutation`, `deleteMutation` — each owns the mutation fn, success toast, and error toast. - Success handlers invalidate the whole `["memoryList", ...]` prefix via `queryClient.invalidateQueries`, so every cached page refetches (pagination + filter-aware). - Refresh button now invalidates instead of `refetch()`, keeping all behavior consistent. - handleSave/handleDelete become thin adapters that call `.mutateAsync`; their errors are swallowed locally since the mutation's onError has already surfaced the toast. Also tightened the edit modal's key-field tooltip to reflect the actual global-unique semantics (was "Unique per user/team scope"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): close cross-user write gap + sanitize 500 errors (Veria) Addresses two Veria findings: High — cross-user memory tampering via team membership. The visibility filter uses an OR (`user_id == caller OR team_id == caller`) so team members can SEE each other's team-scoped rows. That's intentional for list/get. But because PUT/DELETE used the same filter to find the target row, any team member could overwrite or delete a teammate's personal row whenever both `user_id` and `team_id` were stamped on it — broader visibility was being silently treated as broader authority. New `_assert_write_access(row, caller)` enforces ownership for mutations. Non-admin rules: - The row's `user_id` must match the caller (personal ownership), OR - The row has no `user_id` and its `team_id` matches the caller's team (a "pure team row" intended for shared writes). Admins bypass the check. 
The same gate runs in PUT (both regular and post-race-recovery branches) and DELETE. Medium — DB internals leaked through 500 detail. Every `except` block was raising `HTTPException(500, detail=str(e))`, which surfaces Prisma error strings (table/column names, host:port, error class names) to API callers. New `_internal_error()` helper logs the real exception server-side and returns a generic, caller-safe `detail`. Applied to create, list, upsert (general fallthrough), and delete. Also tightened the race-recovery 409 message to drop the "in a different scope" wording — the caller never needs to know whose scope it lives in. Tests (+5): - teammate cannot overwrite personal row → 403 - teammate cannot delete personal row → 403 - teammate CAN modify pure team row (no user_id stamped) → 200 - admin bypasses write-auth → 200 - 500 response never echoes Prisma internals (table/host/class names) 25/25 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(memory): require team admin to modify pure team rows Tightens the write-authorization rule for "pure team rows" (rows with no user_id stamped, only team_id) to match the pattern used by team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`): - Plain team members can READ team rows via the OR visibility filter (intentional, unchanged). - Only PROXY_ADMIN, team admins of the row's team_id, or org admins for the team's organization may MODIFY them. Plain members get 403. `_assert_write_access` is now async and takes the prisma_client so it can fetch the team and run the existing `_is_user_team_admin` / `_is_user_org_admin_for_team` helpers from `litellm.proxy.management_endpoints.common_utils`. The org-admin path is best-effort: it calls `get_user_object`, which depends on the proxy_server module being initialized, so any exception there is treated as "not an org admin" rather than crashing the request. 
Tests: - team admin can modify pure team row → 200 - plain team member cannot modify pure team row → 403 - plain team member cannot delete pure team row → 403 Updates the test fake to add a tiny `litellm_teamtable.find_unique` implementation and a `_make_team(team_id, admin_user_ids=[...])` helper. 27/27 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: mypy + UI page-metadata sync for memory page Two CI failures: 1. mypy: `_find_memory_for_caller` had `key_filter` inferred as `dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}` returned `dict[str, list[...]]`, so the join site failed `dict-item` typing. Annotate both intermediates as `dict` so mypy widens the value type. 2. UI test (`page_utils.test.ts > should have descriptions for all pages`): every leftnav entry must have a description in `page_metadata.ts`, and `memory` was missing. Added a one-line description, matching the style of neighboring entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449) * feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro Add pricing + capability entries for the new GPT-5.5 family launched by OpenAI on 2026-04-24: - gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M input/output/cached input - gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6 per 1M input/output/cached input Other fees (long-context >272k, flex, batches, priority, cache discounts) follow the same ratios as GPT-5.4, with context window retained at 1.05M input / 128K output. No transformation / classifier code changes are required: OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via numeric version parsing, and model registration is driven from the JSON. The existing responses-API bridge for tools + reasoning_effort (litellm/main.py:970) already covers gpt-5.5-pro. 
Tests: - GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants - New test_generic_cost_per_token_gpt55_pro cost-calc test - Updated test_generic_cost_per_token_gpt55 for long-context fields * fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and supports_minimal_reasoning_effort flags that their non-dated counterparts define. Reasoning-effort routing in OpenAIGPT5Config is fully capability-driven from these JSON flags — since an absent flag is treated as False for opt-in levels (xhigh), users pinning to a dated snapshot would silently lose xhigh support and diverge from the base alias on logprobs + flexible temperature handling. Copy the flags onto both dated variants so every dated snapshot inherits the base model's reasoning-effort capability profile. Adds a parametrized regression test that asserts supports_{none,minimal,xhigh}_reasoning_effort parity between each dated variant and its non-dated counterpart, preventing future drift when new snapshots are added. * fix(schema): close LiteLLM_MemoryTable model brace dropped during merge The rebase against `litellm_internal_staging` (which added `LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left the closing brace of `LiteLLM_MemoryTable` missing in all three schema copies — the next model declaration ended up parsed as a field of the memory table, surfacing as the CI prisma error: error: This line is not a valid field or attribute definition. --> schema.prisma:1250 \| 1249 \| // Per-(router, request_type, model) Beta posterior for the adaptive router. 1250 \| model LiteLLM_AdaptiveRouterState { Add the missing `}` (and the standard blank line) after the memory table's `@@index([team_id])` in `schema.prisma`, `litellm/proxy/schema.prisma`, and `litellm-proxy-extras/litellm_proxy_extras/schema.prisma`. 
`prisma generate --schema litellm/proxy/schema.prisma` now runs clean; 27/27 memory unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>	2026-04-24 18:38:07 -07:00
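The memory commit separates two things. Read visibility is an OR over the caller's user_id and team_id. Write authority is narrower. Key filters are combined with visibility through an explicit AND. The sketch below is a minimal Python version of those rules. It assumes Prisma-style `where` dicts. `Caller`, `visibility_filter`, `compose_where`, and `can_write` are hypothetical names, and the team-admin and org-admin checks are collapsed into a plain `admin_of_teams` set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Caller:
    user_id: Optional[str]
    team_id: Optional[str]
    is_proxy_admin: bool = False
    # Hypothetical: ids of teams this caller administers (team admin or org admin).
    admin_of_teams: set = field(default_factory=set)


def visibility_filter(caller: Caller) -> dict:
    """Read scope: admins see everything; others see rows stamped with their user OR their team."""
    if caller.is_proxy_admin:
        return {}
    clauses = []
    if caller.user_id is not None:
        clauses.append({"user_id": caller.user_id})
    if caller.team_id is not None:
        clauses.append({"team_id": caller.team_id})
    return {"OR": clauses}


def compose_where(key_filter: dict, vis: dict) -> dict:
    """Combine with an explicit AND instead of dict.update(), so neither side can clobber the other."""
    if key_filter and vis:
        return {"AND": [key_filter, vis]}
    # One side is empty: use the non-empty side directly ({} if both are empty).
    return key_filter or vis


def can_write(row: dict, caller: Caller) -> bool:
    """Write scope is narrower than read scope."""
    if caller.is_proxy_admin:
        return True
    if row.get("user_id") is not None:
        # Personal row: only its owner may modify it, even if teammates can read it.
        return row["user_id"] == caller.user_id
    # Pure team row (no user_id stamped): only an admin of that team may modify it.
    team_id = row.get("team_id")
    return team_id is not None and team_id in caller.admin_of_teams
```

Suppose `bob` and `alice` are on the same team. `bob` can read `alice`'s team-stamped personal row through the OR filter, but `can_write` rejects his update. That matches the 403 the commit describes.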
yuneng-jiang	1a3db6dfa4	Merge pull request #26365 from stuxf/fix/deps-security-bumps chore(deps): bump vulnerable dependencies	2026-04-24 13:02:58 -07:00
yuneng-jiang	3c3ef7ec9f	Merge pull request #26369 from stuxf/fix/license-metadata chore(packaging): declare MIT license in litellm-proxy-extras metadata	2026-04-24 13:01:52 -07:00
user	5ba6bc0784	chore(deps): bump uv to 0.11.7 + drop dead npm sed - UV_IMAGE across all Dockerfiles: 0.10.9 -> 0.11.7. - Loosen `required-version` in enterprise/ and litellm-proxy-extras/ from strict `==0.10.9` to `>=0.10.9` so the new Docker image can build those workspace members. Matches the main pyproject range. - Drop the `sed` block that rewrote tar/minimatch version ranges in npm's bundled package.json files. The override loop above already swaps the vendored directories on disk; npm doesn't re-resolve at runtime, so the sed was cosmetic.	2026-04-24 00:36:59 +00:00
user	d60734392b	chore(packaging): declare MIT license in litellm-proxy-extras metadata litellm-proxy-extras ships a LICENSE file with MIT terms but did not declare a `license` SPDX expression in its pyproject.toml, so tools that read the metadata (PyPI, Nexus IQ, pip-licenses) reported License-None for every published version. Add the explicit expression so downstream scanners resolve the declared license.	2026-04-23 23:57:22 +00:00
yuneng-jiang	6a25866f51	Merge pull request #26295 from BerriAI/yj_bump_apr22 [Infra] bump versions	2026-04-22 18:33:03 -07:00
Yuneng Jiang	3ddb3cbdf6	bump: version 0.4.67 → 0.4.68	2026-04-22 18:20:21 -07:00
ryan-crabbe-berri	c4c1861389	Merge pull request #26195 from BerriAI/litellm_team_member_total_spend Track per-member total spend on team memberships	2026-04-22 18:20:16 -07:00
yuneng-jiang	24aec61e4b	Merge pull request #26049 from BerriAI/litellm_adaptive_routing Litellm adaptive routing	2026-04-22 08:52:51 -07:00
Krrish Dholakia	f1da202d9e	fix(adaptive_router): P1 flusher hot-reload + P2 hook accumulation + CI P1: start the adaptive-router flusher loop unconditionally at proxy boot instead of gating on 'adaptive_routers is non-empty'. Adaptive routers added via /config/reload after boot now have their queues drained. State is lazy-loaded per router on first flush tick (new _state_loaded flag on AdaptiveRouter) so hot-reloaded routers still get their persisted priors. P2: _finalize_adaptive_router_if_configured now prunes stale AdaptiveRouterPostCallHook callbacks from every litellm callback list before registering new ones. Without this, every Router replacement left the old hooks wired up in litellm.callbacks and double-fired signal recording for every request. Uses logging_callback_manager.remove_callbacks_by_type (same pattern as the semantic tool filter). CI fixes: - black --check failure: reformatted litellm/router.py - schema migration diff: aligned @@index with the explicit index name ('idx_adaptive_router_session_activity') from the original migration by adding 'map:' to all three schema.prisma copies. No new migration needed. Tests: 1 new covering the prune-on-hot-reload path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:49:38 -07:00
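The P2 fix removes every stale post-call hook before registering new ones. Without that step, replacing the Router would leave duplicate hooks that double-record signals. The sketch below shows this prune-then-register pattern in generic Python. `PostCallHook` and `register_hook` are hypothetical stand-ins and are not litellm's `logging_callback_manager` API. Registering the new hooks on the first list only is also an assumption.

```python
class PostCallHook:
    """Hypothetical stand-in for a per-router post-call hook."""

    def __init__(self, router_name: str):
        self.router_name = router_name


def register_hook(callback_lists: list, new_hooks: list) -> None:
    """Prune hooks of this type from every callback list, then register the new ones."""
    for callbacks in callback_lists:
        # Slice assignment mutates each list in place, so every holder of the list sees the prune.
        callbacks[:] = [cb for cb in callbacks if not isinstance(cb, PostCallHook)]
    # Assumption: fresh hooks go on the first (primary) list only.
    callback_lists[0].extend(new_hooks)
```

If `register_hook` runs twice, as it would during a hot reload, exactly one hook remains, and unrelated callbacks are left untouched.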
yuneng-jiang	5dc2926a1e	Merge pull request #26194 from BerriAI/litellm_fix_migration_thrashing [Feature] Proxy: opt-in v2 migration resolver	2026-04-21 16:55:54 -07:00
Krrish Dholakia	ecd9a83e61	fix(adaptive_router): P2 review items — @updatedAt + snapshot samples - Mark last_updated_at (AdaptiveRouterState) and last_activity_at (AdaptiveRouterSession) with @updatedAt so Prisma refreshes the timestamps on every write. Without this the fields stayed frozen at INSERT time and the last_activity_at index was misleading for any future TTL/eviction logic. Applied to all three schema.prisma copies; no migration SQL change needed (Prisma @updatedAt is a client-side annotation that doesn't touch DDL). - get_state_snapshot: report cell.total_samples instead of alpha+beta for the 'samples' field. The previous value inflated every cell by the COLD_START_MASS prior (e.g. showed 10.0 before any real traffic arrived), which confused operators reading /adaptive_router/.../state. Updated docs + the snapshot test to match. Also fixes two pre-existing merge-break syntax errors in router.py (missing ')' on the AdaptiveRouter TYPE_CHECKING import; truncated async_pre_routing_hook dispatch call for the adaptive router branch) that were masking the rest of the file from the interpreter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 16:27:01 -07:00
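The snapshot fix reports observed samples instead of `alpha + beta`, because the Beta posterior's parameters already include the cold-start prior. The Python sketch below makes the difference concrete. `BetaCell` is hypothetical. `COLD_START_MASS` is assumed to be 10.0, matching the "10.0 before any real traffic" example in the commit. The prior is split evenly between alpha and beta, which is also an assumption.

```python
from dataclasses import dataclass

COLD_START_MASS = 10.0  # assumed value, matching the commit's "10.0 before any traffic" example


@dataclass
class BetaCell:
    # Prior pseudo-counts, split evenly here (an assumption) so alpha + beta == COLD_START_MASS.
    alpha: float = COLD_START_MASS / 2
    beta: float = COLD_START_MASS / 2
    total_samples: int = 0

    def record(self, success: bool) -> None:
        # Each real observation adds one unit of evidence to exactly one parameter.
        if success:
            self.alpha += 1
        else:
            self.beta += 1
        self.total_samples += 1

    def mean(self) -> float:
        # Posterior mean success rate, still shrunk toward the prior.
        return self.alpha / (self.alpha + self.beta)
```

For a fresh cell, the old formula reports 10.0 samples and the new one reports 0. After three observations, `alpha + beta` is 13.0 while `total_samples` is 3.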
Krrish Dholakia	c7342bdc4f	Merge branch 'litellm_internal_staging' into litellm_adaptive_routing	2026-04-21 16:22:38 -07:00
Yuneng Jiang	2b8b9502d9	[Fix] v2 resolver: swallow non-connection DB errors; wrap resolve failures Addresses two further Greptile findings: - `_warn_if_db_ahead_of_head` only caught `psycopg.OperationalError`. Non-connection DB errors (e.g. `InsufficientPrivilege` / 42501 if the runtime DB user lacks SELECT on `_prisma_migrations`) would propagate uncaught and crash startup — contradicting the docstring's "informational only, never blocks" guarantee. Widen the catch to `psycopg.DatabaseError` so all DB-layer errors are swallowed. - In the P3009 and P3018 idempotent-recovery paths, the call to `_resolve_specific_migration(name)` was not wrapped in its own try/except. Being inside an active `except CalledProcessError` handler, a new `CalledProcessError` from the resolve call would NOT re-enter the same handler — it would propagate out as `CalledProcessError`, past `proxy_cli.py`'s `except RuntimeError`, crashing startup with an unhandled traceback instead of the intended clean `sys.exit(2)`. Wrap both call sites to convert to RuntimeError. Adds unit tests for both behaviors.	2026-04-21 15:53:07 -07:00
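The second finding relies on a Python rule: an exception raised inside an `except` block is not caught by that same `except` clause. It propagates to the caller instead. The sketch below reproduces the problem and the wrap-to-`RuntimeError` fix using `subprocess.CalledProcessError`. `failing`, `recover_unwrapped`, and `recover_wrapped` are hypothetical stand-ins for the migrate and resolve calls.

```python
import subprocess


def failing(cmd: str) -> None:
    """Hypothetical stand-in for a prisma subprocess call that exits non-zero."""
    raise subprocess.CalledProcessError(returncode=1, cmd=cmd)


def recover_unwrapped() -> None:
    try:
        failing("migrate deploy")
    except subprocess.CalledProcessError:
        # A new CalledProcessError raised here escapes; this handler is not re-entered.
        failing("migrate resolve")


def recover_wrapped() -> None:
    try:
        failing("migrate deploy")
    except subprocess.CalledProcessError:
        try:
            failing("migrate resolve")
        except subprocess.CalledProcessError as e:
            # Convert to the exception type the CLI entry point already handles.
            raise RuntimeError(f"failed to resolve migration: {e}") from e
```

Consider a caller that catches only `RuntimeError`, like the `except RuntimeError` in `proxy_cli.py` described above. It handles `recover_wrapped()` cleanly but gets an unhandled `CalledProcessError` from `recover_unwrapped()`.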
Yuneng Jiang	9049f37864	[Fix] v2 migration resolver: address Greptile review findings - Open the psycopg connection in `_warn_if_db_ahead_of_head` with autocommit=True. Without it, psycopg3's `with conn` calls COMMIT on clean exit, which fails after the `UndefinedTable` (fresh-DB) branch left the transaction in an aborted state — crashing first-run startups. - Wrap the v2 `prisma db push` path in try/except and raise RuntimeError on CalledProcessError/TimeoutExpired. Otherwise these propagate past proxy_cli.py's `except RuntimeError` as unhandled tracebacks. - Reword the loop-exhaustion error to cover the non-timeout exit path (repeated P3005/P3009/P3018 idempotent-recovery `continue`s), not just persistent timeouts. Adds a unit test for the db_push error wrapping.	2026-04-21 15:34:24 -07:00
Yuneng Jiang	a16c00e22c	[Feature] Proxy: opt-in v2 migration resolver (--use_v2_migration_resolver) Default behavior (v1) is unchanged. Users who have seen schema thrashing during rolling deploys can opt into the v2 resolver with `--use_v2_migration_resolver`. Why v2 is safer: - Runs `prisma migrate deploy` only. - Recovers from P3005 (baseline) and idempotent P3009/P3018 errors, same as v1. - Never calls `_resolve_all_migrations`, which generates a schema diff between the live DB and the shipped schema.prisma and applies it via `prisma db execute`. That path bypassed every migration's SQL and was the root cause of thrashing when two LiteLLM versions contended for the same DB. - Logs a non-blocking warning when the DB has migrations applied that are newer than anything this build ships (ahead-of-HEAD). It does not refuse to start — many users have unusual ledger state from past thrashing, and blocking startup would be a breaking change. Also prints a message on startup when the default (v1) resolver is in use, pointing operators at the opt-in flag. Adds unit tests covering the v2 fail-fast paths, the stripping of Prisma-specific query params from DATABASE_URL (needed for psycopg), the timestamp helpers, and pins the default: v1 still invokes `_resolve_all_migrations`, v2 must not.	2026-04-21 14:20:35 -07:00
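The v2 resolver opens its own psycopg connection, and libpq rejects query parameters it does not recognize. Prisma-only options therefore have to be removed from `DATABASE_URL` first. The sketch below does this with `urllib.parse`. The parameter set is an assumption: a few options from Prisma's PostgreSQL connection-string documentation, not the exact list LiteLLM strips. `strip_prisma_params` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of Prisma-only connection options; libpq does not recognize these.
PRISMA_ONLY_PARAMS = {"schema", "connection_limit", "pool_timeout", "pgbouncer"}


def strip_prisma_params(database_url: str) -> str:
    parts = urlsplit(database_url)
    # Keep every parameter libpq understands (e.g. sslmode) and drop the Prisma-only ones.
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in PRISMA_ONLY_PARAMS
    ]
    # urlunsplit omits the "?" entirely when the remaining query string is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Only the query string is rebuilt. The credentials in the netloc, including any percent-encoded password, pass through unchanged.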
Ryan Crabbe	e5f3e15969	Track per-member total spend on team memberships Adds total_spend column to LiteLLM_TeamMembership that accumulates continuously and is not zeroed by the budget cycle reset job. This enables UI surfaces to distinguish current-cycle spend (the existing spend column, which resets) from lifetime spend per team member. Also exposes budget_reset_at on LiteLLM_BudgetTable so /team/info callers can see when a member's budget window next resets. The field was already stored in the DB but stripped by the response Pydantic model. Includes regression tests that: - Guard the reset job against ever writing total_spend: 0 - Verify the spend writer increments both spend and total_spend in one UPDATE statement.	2026-04-21 13:56:44 -07:00
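The regression tests encode two invariants. First, the spend writer bumps `spend` and `total_spend` in the same statement. Second, the reset job never writes `total_spend`. The self-contained sketch below uses Python's `sqlite3`. The `team_membership` table and the `record_spend` / `reset_budget_cycle` helpers are simplified hypothetical stand-ins, not LiteLLM's Prisma schema or writer.

```python
import sqlite3

# Simplified stand-in for a team-membership table (not LiteLLM's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_membership ("
    "user_id TEXT, team_id TEXT, "
    "spend REAL NOT NULL DEFAULT 0, "       # current budget cycle, reset periodically
    "total_spend REAL NOT NULL DEFAULT 0)"  # lifetime spend, never reset
)
conn.execute("INSERT INTO team_membership (user_id, team_id) VALUES ('u1', 't1')")


def record_spend(user_id: str, team_id: str, cost: float) -> None:
    # A single UPDATE increments both columns, so they cannot drift apart.
    conn.execute(
        "UPDATE team_membership "
        "SET spend = spend + ?, total_spend = total_spend + ? "
        "WHERE user_id = ? AND team_id = ?",
        (cost, cost, user_id, team_id),
    )


def reset_budget_cycle() -> None:
    # The reset job zeroes only the current-cycle column; total_spend is never written.
    conn.execute("UPDATE team_membership SET spend = 0")
```

Example: spend 1.5 and 2.5, reset the cycle, then spend 0.25. The row ends with `spend = 0.25` and `total_spend = 4.25`.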
Yuneng Jiang	b39f210a6c	[Infra] Add freshness and destructive guards to migration workflow Generating a migration from a stale branch could silently emit DROP COLUMN for columns the stale branch did not know about, and the script would write that SQL to a new migration file with no warning. Adds two guards to ci_cd/run_migration.py: - Branch freshness check: fetches origin/<base-branch> and exits 3 if HEAD is behind. Default base is litellm_internal_staging. New flags: --base-branch, --skip-freshness-check. - Destructive guard: refuses (exit 2) if the generated diff contains DROP COLUMN / DROP TABLE / DROP INDEX, unless --allow-destructive is passed. Refusal banners include guidance and an explicit callout instructing AI agents not to auto-bypass the flags. Also treats Prisma's "-- This is an empty migration." output as a no-op rather than writing an empty file. Updates litellm-proxy-extras/migration_runbook.md with the new workflow, flag documentation, and agent warnings.	2026-04-21 12:00:23 -07:00
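The sketch below is a minimal Python version of the destructive-statement guard and the empty-migration check. `classify_migration` is a hypothetical helper, not the function in `ci_cd/run_migration.py`. A plain regex like this will also flag `DROP` text that appears inside SQL comments.

```python
import re

# Statement kinds the commit treats as destructive.
DESTRUCTIVE = re.compile(r"\bDROP\s+(COLUMN|TABLE|INDEX)\b", re.IGNORECASE)
EMPTY_MARKER = "-- This is an empty migration."


def classify_migration(sql: str, allow_destructive: bool = False) -> str:
    """Return "noop" (write nothing), "refuse" (the script exits 2), or "write"."""
    body = sql.strip()
    if not body or body == EMPTY_MARKER:
        # Prisma's empty-diff output is treated as nothing to write.
        return "noop"
    if DESTRUCTIVE.search(body) and not allow_destructive:
        # Refuse unless the operator explicitly opts in (--allow-destructive).
        return "refuse"
    return "write"
```

Because the guard is opt-out rather than opt-in, a stale branch that emits `DROP COLUMN` fails loudly instead of silently writing the migration file.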
Krrish Dholakia	b6fc75b3ce	Merge branch 'litellm_internal_staging' into litellm_adaptive_routing	2026-04-20 15:28:08 -07:00
Krrish Dholakia	fba736ca3c	fix(adaptive_router): 3 P1 review defects - Use 'auto_router/adaptive_router' prefix in example yaml, docs, and README — the old 'adaptive_router/...' and 'openai/gpt-4o-mini' values silently skipped adaptive-router init because detection requires the 'auto_router/adaptive_router' prefix. - Read x-litellm-min-quality-tier from request headers (and the 'min_quality_tier' metadata key as fallback) in async_pre_routing_hook. Previously the documented header was defined but never extracted, so the quality-floor feature was inert. - Evict expired entries from _session_states. The cache grew without bound — added a parallel expiry map (same TTL as _owner_cache) and an opportunistic bulk sweep when the cache crosses a size threshold. - Align adaptive-router migration SQL with Prisma schema: all count columns and the 'clean_credit_awarded' / 'last_processed_turn' fields are NOT NULL in the data model, so the migration now declares them NOT NULL. Fixes test_aaaasschema_migration_check. Tests: 8 new covering header/metadata/precedence/invalid-value paths for min_quality_tier and TTL-based eviction of _session_states. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 15:22:18 -07:00
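The eviction fix keeps a parallel expiry map beside the session-state cache and sweeps it in bulk once the cache crosses a size threshold. The sketch below shows that structure in Python. `SessionStateCache`, `ttl_s`, and `sweep_threshold` are hypothetical names. The clock is injectable only so that expiry can be tested.

```python
import time
from typing import Any, Callable, Dict, Optional


class SessionStateCache:
    """Values plus a parallel expiry map, swept in bulk past a size threshold."""

    def __init__(self, ttl_s: float, sweep_threshold: int,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.sweep_threshold = sweep_threshold
        self.clock = clock
        self._states: Dict[str, Any] = {}
        self._expires_at: Dict[str, float] = {}

    def set(self, key: str, value: Any) -> None:
        self._states[key] = value
        self._expires_at[key] = self.clock() + self.ttl_s
        # Opportunistic sweep: pay for the O(n) scan only once the cache has grown.
        if len(self._states) > self.sweep_threshold:
            self._sweep()

    def get(self, key: str) -> Optional[Any]:
        expires = self._expires_at.get(key)
        if expires is None:
            return None
        if expires <= self.clock():
            # Drop an expired entry lazily on read.
            self._states.pop(key, None)
            self._expires_at.pop(key, None)
            return None
        return self._states[key]

    def _sweep(self) -> None:
        now = self.clock()
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            self._states.pop(key, None)
            self._expires_at.pop(key, None)
```

Expired entries are removed in two ways: lazily when a read touches them, and eagerly in bulk on the next write that pushes the cache past `sweep_threshold`. Either way, memory stays bounded even for sessions that are never read again.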
ishaan-berri	2f22a1293e	bump litellm-proxy-extras to 0.4.67 (#26043 ) * bump litellm-proxy-extras version to 0.4.67 * bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject * regenerate uv.lock for litellm-proxy-extras 0.4.67 * bump litellm-enterprise version to 0.1.38 * bump litellm-enterprise pin to 0.1.38 in litellm pyproject * regenerate uv.lock for litellm-enterprise 0.1.38	2026-04-18 19:03:56 -07:00
Krrish Dholakia	dd4a1d2be2	feat: add adaptive routing to litellm allow model routing to improve based on conversation signals ensures router is picking best model for task	2026-04-18 16:35:17 -07:00
Ishaan Jaffer	e6a20af646	fix(proxy-extras): skip post-deploy sanity check when no migrations pending When prisma migrate deploy reports 'No pending migrations to apply' the DB already matches schema — running _resolve_all_migrations (migrate diff + prisma db execute) adds 25+ seconds unnecessarily, causing the proxy to miss the 90-second startup timeout in test_litellm_proxy_server_config_no_general_settings.	2026-04-17 15:59:41 -07:00
Ishaan Jaffer	33175a8ee7	fix(proxy-extras): fall back to prisma db execute when migrate diff fails on pooler URL When DIRECT_URL is not set and DATABASE_URL is a Neon pooler URL, prisma migrate diff fails (pooler doesn't support extended query protocol for schema introspection). Previously _resolve_all_migrations returned early without applying any migrations, leaving the budget_limits column missing and causing test_auth_callback_new_user to fail. Now falls back to running each migration SQL file via prisma db execute --file, which works with pooler URLs and is safe to re-run due to IF NOT EXISTS guards.	2026-04-17 15:38:48 -07:00
Ishaan Jaffer	33a2cee4af	fix(proxy-extras): use DIRECT_URL for prisma migrate diff, tempfile for diff dir	2026-04-17 15:17:15 -07:00
Ishaan Jaffer	7c47bbd226	fix(migration): run schema sanity check after P3009/P3018 idempotent migration recovery	2026-04-17 15:01:10 -07:00
Ishaan Jaffer	9281147a1a	fix(schema): add budget_limits Json? to LiteLLM_TeamTable and LiteLLM_VerificationToken	2026-04-17 14:47:18 -07:00
Ishaan Jaffer	e8461b5b97	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
Ishaan Jaffer	f31d4faa87	Merge origin/main into litellm_ishaan_april6	2026-04-17 12:36:51 -07:00
Yuneng Jiang	073685136d	bump: version 0.4.65 → 0.4.66	2026-04-16 09:54:56 -07:00
Ishaan Jaffer	def9c4ec47	chore: merge litellm_internal_staging, resolve uv.lock conflict	2026-04-15 18:51:19 -07:00
harish876	5f99e52fbc	Added concurrent index creation. Added necessary disclaimers to index creation. Index creation is scoped to a single statement. Validated index creation in local env.	2026-04-15 22:52:47 +00:00

417 Commits