litellm

Author	SHA1	Message	Date
Mateo Wang	48d7e15b83	chore(admin-ui): regenerate static export with trailingSlash: true (#28112) * chore(admin-ui): regenerate static export with trailingSlash: true Rebuilds litellm/proxy/_experimental/out/ from ui/litellm-dashboard with `trailingSlash: true` enabled in next.config.mjs. Next.js now emits every route as <dir>/index.html (e.g. mcp/oauth/callback/index.html) instead of <dir>.html with a sibling metadata-only directory, which fixes the 404 on extensionless URLs served through FastAPI's StaticFiles(html=True) mount. This is the build artifact half of the fix; the config change, Dockerfile cleanup, and regression test live in the follow-up source PR that stacks on top of this branch. * fix(admin-ui): emit nested routes as <dir>/index.html (#28106) Linear and other OAuth providers redirect the user back to /ui/mcp/oauth/callback?code=...&state=... after the consent step. The packaged Next.js static export only produced /ui/mcp/oauth/callback.html, so FastAPI's StaticFiles served a 404 on the extensionless URL and the OAuth handshake never completed. The Dockerfile.non_root build step tried to paper over this at image-build time with `for html_file in *.html; do ...`, but that shell glob does not recurse, so nested routes like mcp/oauth/callback.html were left stranded next to an empty mcp/oauth/callback/ directory containing only Next.js metadata. The runtime restructure step in proxy_server.py was then skipped because the .litellm_ui_ready marker had already been dropped. Set trailingSlash: true in the dashboard's Next.js config so the export emits every nested route as <dir>/index.html natively. The Dockerfile loop is now a no-op for the bundled UI and has been removed; the .litellm_ui_ready marker is still written so the proxy keeps skipping the redundant Python restructure step at startup. Stacks on top of the static export regeneration in the parent branch. chore: restore origin/litellm_internal_staging out files	2026-05-25 21:06:50 -07:00
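The commit blames the missed nested routes on a shell glob that does not recurse. The sketch below reproduces that distinction in Python, using a toy export tree that mirrors the example paths. `Path.glob("*.html")` sees only top-level files, while `Path.rglob("*.html")` descends into subdirectories.

`restructure_to_index_html` is a hypothetical helper. It shows what a recursive `<dir>.html` to `<dir>/index.html` move would look like. The commit did not take this route: it set `trailingSlash: true` so Next.js emits that layout directly.

```python
import tempfile
from pathlib import Path
from typing import List


def list_route_files(root: Path, recursive: bool) -> List[str]:
    """Return exported .html files relative to root, sorted for stable output."""
    matches = root.rglob("*.html") if recursive else root.glob("*.html")
    return sorted(p.relative_to(root).as_posix() for p in matches)


def restructure_to_index_html(root: Path) -> None:
    """Hypothetical helper: move every <dir>.html to <dir>/index.html at any depth."""
    for html_file in list(root.rglob("*.html")):  # snapshot before renaming
        if html_file.name == "index.html":
            continue  # already in the target layout
        target_dir = html_file.with_suffix("")  # mcp/oauth/callback.html -> mcp/oauth/callback
        target_dir.mkdir(parents=True, exist_ok=True)  # may already hold Next.js metadata
        html_file.rename(target_dir / "index.html")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mcp" / "oauth").mkdir(parents=True)
    (root / "models.html").write_text("top-level route")
    (root / "mcp" / "oauth" / "callback.html").write_text("nested route")

    shallow = list_route_files(root, recursive=False)  # what a non-recursive glob sees
    deep = list_route_files(root, recursive=True)
    restructure_to_index_html(root)
    after = list_route_files(root, recursive=True)

print(shallow)  # ['models.html']
print(deep)     # ['mcp/oauth/callback.html', 'models.html']
print(after)    # ['mcp/oauth/callback/index.html', 'models/index.html']
```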
milan-berri	0fb710400f	fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854) * fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed Symptom ------- Customers on multi-pod deployments see team `spend` jump to ~2x (or N x the pod count) shortly after a Redis cache miss / TTL expiry, triggering spurious "Budget Crossed" alerts and blocked requests until the value is manually reset. Root cause ---------- `SpendCounterReseed.coalesced` warmed the primary spend counter by calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`, which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent. The per-counter `asyncio.Lock` only coalesces seeders inside one process. With N pods sharing one Redis, on a cold key (cold start, TTL expiry, manual delete) every pod independently passes its lock + Redis re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`. Final value: N x db_spend. Fix --- Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed. SET NX is atomic across pods: exactly one writer initializes the key; losers read the winner's value via `async_get_cache`. This is the same idiom already used by `coalesced_window` in the same file, so the two seed paths are now consistent. Per-request deltas continue to use `INCRBYFLOAT` (correct - additive behaviour is what we want for increments, not for initial seed). Verification ------------ Live two-process repro against the same Postgres + Redis (DB spend = 506): Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend) Patched: 12/12 runs -> Redis counter = ~506 Unit tests (`test_proxy_server.py`): - New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed` patches `_get_lock` to return a fresh lock per caller (otherwise the per-process lock masks the race), races two `coalesced` calls, and asserts final = 506 with exactly one of two SET NX attempts winning. 
- 4 existing tests updated for the new seed contract (SET NX for the seed, INCRBYFLOAT only for the per-request delta). - Full `spend_counter or reseed or budget` slice: 22 passed. Co-authored-by: Cursor <cursoragent@cursor.com> * test(spend_counter): make SET NX mock atomic so loser branch is exercised Greptile flagged that `redis_set_cache` in test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed placed `await asyncio.sleep(0)` AFTER the NX membership check. Both concurrent tasks observed an empty `redis_store`, passed the guard, and both returned True - so the loser branch (else: read back winner's value) was never exercised. Fix the mock to model real atomic Redis SET NX: - Yield BEFORE the membership check so two concurrent callers interleave the way real SET NX does (first to resume runs check + write atomically and wins; second resumes after the key exists and loses). - Track set_cache return values; assert sorted([loser, winner]) so we know exactly one task wins and one loses. - Track async_get_cache calls that happen AFTER at least one SET NX has completed; assert at least one such read - that is the loser-path fallback (`current_value = float(cached)` when seeded is False). Verified by temporarily reverting the mock to the old order: the test now fails with `expected exactly one SET NX winner and one loser, got [True, True]`, exactly the failure mode Greptile described. No production code change. Co-authored-by: Cursor <cursoragent@cursor.com> * test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test `test_concurrent_read_and_write_paths_share_one_db_query` mocks `async_increment` to populate the in-memory `redis_store`, but did not mock `async_set_cache`. After the SET-NX seed change in `coalesced()`, the seed step writes via `async_set_cache(nx=True)` (default AsyncMock, no `redis_store` write), so the simulated Redis stays empty after the first reseed. 
The second `get_current_spend` then sees a clean Redis miss, re-enters the DB read path, and the test fails with `expected 1 DB query, got 2`. Fix: add a `redis_set_cache` side_effect that updates `redis_store` on `nx=True` (and rejects when the key already exists), matching the pattern used by the four sibling tests fixed in this branch's first commit. Pre-existing assertions are unchanged. Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 10:57:08 -07:00
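The seed race in this commit can be reproduced with a toy model. The sketch below assumes an in-memory `FakeRedis` class, which is hypothetical and not the litellm cache API. Its methods yield to the event loop before touching state, the way a network round trip would.

Two concurrent seeders that use an additive increment end at 2 x db_spend. Seeders that use set-if-absent semantics leave exactly one writer, and the loser reads back the winner's value.

```python
import asyncio
from typing import Dict, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in for Redis (not the litellm cache API)."""

    def __init__(self) -> None:
        self.store: Dict[str, float] = {}

    async def get(self, key: str) -> Optional[float]:
        return self.store.get(key)

    async def incrbyfloat(self, key: str, amount: float) -> float:
        await asyncio.sleep(0)  # yield, so concurrent seeders interleave
        self.store[key] = self.store.get(key, 0.0) + amount
        return self.store[key]

    async def set_nx(self, key: str, value: float) -> bool:
        await asyncio.sleep(0)  # yield BEFORE the check; check + write below is atomic
        if key in self.store:
            return False
        self.store[key] = value
        return True


async def seed_additive(redis: FakeRedis, key: str, db_spend: float) -> None:
    # Old seed: every pod that misses the key adds db_spend on top.
    if await redis.get(key) is None:
        await redis.incrbyfloat(key, db_spend)


async def seed_set_nx(redis: FakeRedis, key: str, db_spend: float) -> float:
    # New seed: exactly one pod initialises the key; losers read the winner's value.
    if await redis.get(key) is None and await redis.set_nx(key, db_spend):
        return db_spend
    return float(await redis.get(key))


async def race(seeder, pods: int = 2) -> float:
    redis = FakeRedis()
    await asyncio.gather(*(seeder(redis, "team:spend", 506.0) for _ in range(pods)))
    return redis.store["team:spend"]


additive_total = asyncio.run(race(seed_additive))
set_nx_total = asyncio.run(race(seed_set_nx))
print(additive_total, set_nx_total)  # 1012.0 506.0
```

Additive increments remain the right tool for per-request deltas. Only the initial seed needs set-if-absent semantics.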
Sameer Kankute	0290c7bc00	fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent (#27444) (#28213) * fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent (#27444) * fix(proxy): address Greptile review on Google-native SSE bytes path Remove unreachable try/except around SSE pass-through yield and add a unit test covering pre-formatted SSE bytes, terminator padding, and non-SSE byte fallback wrapping. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 22:46:11 -07:00
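The message names only the three cases the new unit test covers: pre-formatted SSE bytes, terminator padding, and non-SSE byte fallback wrapping. The sketch below is a hedged normaliser for those three cases. `to_sse_chunk` is a hypothetical name, and the proxy's exact framing rules may differ.

```python
import json
from typing import Any, Dict, Union


def to_sse_chunk(chunk: Union[bytes, str, Dict[str, Any]]) -> str:
    """Hypothetical normaliser: turn one upstream streaming chunk into an SSE frame."""
    if isinstance(chunk, bytes):
        chunk = chunk.decode("utf-8")  # decode bytes before framing them
    if isinstance(chunk, str):
        if chunk.startswith("data:"):
            # Already SSE-formatted: pass through, padding the blank-line terminator if missing.
            return chunk if chunk.endswith("\n\n") else chunk.rstrip("\n") + "\n\n"
        # Plain (non-SSE) payload: wrap it as a single SSE data frame.
        return f"data: {chunk}\n\n"
    return f"data: {json.dumps(chunk)}\n\n"  # structured chunk: serialise, then frame


frames = [
    to_sse_chunk(b'data: {"a": 1}\n\n'),  # pre-formatted: unchanged
    to_sse_chunk(b'data: {"a": 1}\n'),    # missing terminator: padded
    to_sse_chunk(b'{"a": 1}'),            # non-SSE bytes: wrapped
]
```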
Shivam Rawat	fbe0ee81f1	fix(proxy): sort BYOK models by their displayed name in /v2/model/info (#28079) * fix(proxy): sort BYOK models by team_public_model_name in /v2/model/info Team BYOK rows persist an internal `model_name` like `model_name_{team_id}_{uuid}` and expose the user-facing name via `model_info.team_public_model_name`. The UI's `getDisplayModelName` and the search filter already fall back to that field, but `_sort_models` was keying off the raw `model_name` — so BYOK rows ranked by their opaque IDs and clumped at the end of the alphabetized list instead of interleaving with non-BYOK rows. Match the UI/search behavior: prefer `team_public_model_name` when present, fall back to `model_name` otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): case-insensitive DB-side search for BYOK models `_apply_search_filter_to_models` used Prisma's JSON path `string_contains` to match the BYOK `team_public_model_name` field, but that operator is case-sensitive in Postgres (no `mode: insensitive` flag like column-level string filters have). So a search for "claude" missed a stored "Claude Sonnet" via the DB branch even though the router-side path matched it case-insensitively. Widen the JSON branch to "row has a team_public_model_name set" and filter case-insensitively in Python so DB-only BYOK rows match the same terms users see in the UI. This also drops the now-unused DB-level page-size optimization and `sort_by` knob — the in-Python filter is the source of truth for `db_models_total_count` now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): scope BYOK search results to caller's accessible teams `_apply_search_filter_to_models` was widened to fetch every row with a `team_public_model_name` set so case-insensitive search could match mixed-case stored names. 
`/v2/model/info` is reachable by non-admin keys though, and the helper ran before `include_team_models` / `teamId` filtering — so a non-admin caller could search a common substring like "claude" and see BYOK rows belonging to teams they're not a member of. Resolve the caller's team membership once (admin → no scoping, else their `user_row.teams`) and drop BYOK rows (those with `model_info.team_id` set) outside that scope on both the router-side matches and the over-broad DB query, before display-name matching. Non-team rows are unaffected and remain gated by the existing `include_team_models` / `direct_access` paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): search by team_public_model_name and scope teamId queries - /v2/model/info search now matches both `model_name` and `model_info.team_public_model_name`, so team BYOK rows (which persist an internal `model_name_{team_id}_{uuid}`) are findable by the public name shown in the UI. DB query OR-includes a JSON-path match on `team_public_model_name` for rows that exist only in the DB. - `_filter_models_by_team_id` no longer short-circuits on the viewer's `direct_access` flag — that describes the admin viewer's own permissions and would leak every public model into a team-scoped view. Models are kept only when they belong to the team (own BYOK, in access_via_team_ids, or reachable via team.models / access groups). - Added `_authorize_team_id_query`: the untrusted `teamId` query parameter now requires the caller to be a proxy admin or a member of the requested team, otherwise returns 403. Without this, any authenticated user could enumerate another team's BYOK metadata by guessing the team id. - `_get_caller_byok_team_scope` now treats `PROXY_ADMIN_VIEW_ONLY` the same as `PROXY_ADMIN` (both are admin roles); previously VIEW_ONLY admins fell through to a user-id team lookup and saw only their own teams' BYOK rows. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): bound BYOK search DB fetch in /v2/model/info Previously the DB-side search OR'd a JSON-path predicate `{model_info: {path: [team_public_model_name], string_contains: ""}}` to compensate for Prisma's case-sensitive JSON `string_contains` on Postgres. That predicate matches every row that has any `team_public_model_name` set, so any authenticated caller could force a full BYOK-table read with `/v2/model/info?search=x` regardless of page size. Drop the JSON-path branch. The DB query now does a bounded `model_name contains <search>` lookup. BYOK rows that are loaded into the router are still searchable by their `team_public_model_name` via the router-side filter; only the rare edge case of a BYOK row that exists only in the DB (router sync failed) loses display-name search, which is an acceptable trade-off given the DoS surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): bound DB find_many in /v2/model/info search The previous bounding patch dropped the page-aware `take=N` on `find_many`, so a broad `?search=model` would load and decrypt every matching DB row on each request even though the response only returns one page. Restore bounded fetches in `_apply_search_filter_to_models`: * Unsorted searches use `take = max(0, page * size - router_count)`, i.e. exactly one page worth of remaining DB rows. * Sorted searches need ordering across the full match set, so they cap at `_SORTED_SEARCH_DB_FETCH_CAP = 500` instead of fetching everything. * Total count comes from a cheap `count(...)` query so pagination stays accurate without materializing every row. Wired `page`, `size`, and `sortBy` through from the endpoint and added a regression test covering both `take` values. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(proxy): extract DB-fetch helper to satisfy PLR0915 _apply_search_filter_to_models tripped Ruff's "too many statements" (51 > 50) after the bounded-fetch fix. Move the DB-side block into `_fetch_db_models_for_search`, which keeps the same behavior: * Bounded `take` via page math (unsorted) or `_SORTED_SEARCH_DB_FETCH_CAP` (sorted) * Cheap `count(...)` for accurate pagination totals * Caller-team scope applied to fetched rows before decrypt Pure refactor; no behavior change. All 8 BYOK/team tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: apply black formatting to _fetch_db_models_for_search CI's "Check Black formatting" step flagged one line in the helper added in d55eecf6af. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 15:46:21 -07:00
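The sketch below covers three rules this commit describes:

- Sort and search on the displayed name.
- Match case-insensitively against either name.
- Bound the DB fetch with `max(0, page * size - router_count)` for unsorted searches and a fixed cap for sorted ones.

The helper names are hypothetical. The case-insensitive ordering in the sort key is an assumption. The cap value of 500 is the one quoted in the message.

```python
from typing import Any, Dict, List

SORTED_SEARCH_DB_FETCH_CAP = 500  # value quoted in the commit message


def display_name(model: Dict[str, Any]) -> str:
    """Prefer team_public_model_name (BYOK rows); fall back to the raw model_name."""
    info = model.get("model_info") or {}
    return info.get("team_public_model_name") or model["model_name"]


def matches_search(model: Dict[str, Any], search: str) -> bool:
    # Case-insensitive match on either the internal name or the displayed name.
    needle = search.lower()
    return needle in model["model_name"].lower() or needle in display_name(model).lower()


def db_take(page: int, size: int, router_count: int, sorted_search: bool) -> int:
    """Number of DB rows to fetch so one page can be filled without a full-table read."""
    if sorted_search:
        return SORTED_SEARCH_DB_FETCH_CAP  # ordering spans the match set, but stays bounded
    return max(0, page * size - router_count)


models: List[Dict[str, Any]] = [
    {"model_name": "model_name_team1_abc", "model_info": {"team_public_model_name": "Claude Sonnet"}},
    {"model_name": "gpt-4o", "model_info": {}},
    {"model_name": "azure-gpt", "model_info": None},
]
ordered = [display_name(m) for m in sorted(models, key=lambda m: display_name(m).lower())]
hits = [m["model_name"] for m in models if matches_search(m, "claude")]
print(ordered)  # ['azure-gpt', 'Claude Sonnet', 'gpt-4o']
print(hits)     # ['model_name_team1_abc']
print(db_take(page=2, size=25, router_count=30, sorted_search=False))  # 20
```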
Yuneng Jiang	a4a1726d99	test(proxy): add endpoint-level regression for /config/update double-encryption Adds test_update_config_env_var_round_trip_not_double_encrypted, which drives the real /config/update handler: first write plaintext, then re-POST the stored ciphertext (the Admin UI round-trip) and assert the value is not stacked with a second encryption layer and untouched keys stay byte-identical. Verified to fail against the pre-fix handler and pass after. Also tightens the unit test to exactly three ciphertext re-feeds.	2026-05-15 15:29:08 -07:00
Yuneng Jiang	0d8c9137fb	fix(proxy): make /config/update env-var encryption idempotent A single decrypt-then-encrypt chokepoint (_encrypt_env_variables_for_db) now backs both update_config and save_config. Re-submitting a value the Admin UI read back from /get/config/callbacks as ciphertext no longer stacks a second encryption layer, which previously decrypted to garbage and silently broke the callback. The chokepoint decrypts with the pure _decrypt_db_variables (no os.environ mutation on the write path) and encrypts exactly once; update_config merges only the sent keys so untouched env vars keep their stored ciphertext byte-for-byte.	2026-05-15 15:14:18 -07:00
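The sketch below shows the decrypt-then-encrypt chokepoint described in these two commits. It assumes a toy reversible `enc:` + base64 encoding in place of the proxy's real cipher. That encoding is not encryption; it only makes layering visible.

The function names echo the commit message, but they are hypothetical re-implementations, not the proxy's code.

```python
import base64
import binascii
from typing import Dict, Optional

PREFIX = "enc:"  # toy marker: base64 is NOT encryption, it only makes layering visible


def toy_encrypt(plaintext: str) -> str:
    return PREFIX + base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def toy_decrypt(value: str) -> Optional[str]:
    """Pure decrypt: plaintext if value is our ciphertext, else None (no env mutation)."""
    if not value.startswith(PREFIX):
        return None
    try:
        return base64.b64decode(value[len(PREFIX):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def encrypt_env_variables_for_db(env_vars: Dict[str, str]) -> Dict[str, str]:
    """Single chokepoint: decrypt anything that is already ciphertext, then encrypt once."""
    encrypted: Dict[str, str] = {}
    for key, value in env_vars.items():
        plaintext = toy_decrypt(value)
        encrypted[key] = toy_encrypt(value if plaintext is None else plaintext)
    return encrypted


def merge_sent_env_vars(stored: Dict[str, str], sent: Dict[str, str]) -> Dict[str, str]:
    """Re-encrypt only the keys the caller sent; untouched ciphertext is carried over as-is."""
    return {**stored, **encrypt_env_variables_for_db(sent)}


stored = encrypt_env_variables_for_db({"CALLBACK_SECRET": "sk-123", "OTHER_VAR": "keep-me"})
# Admin UI round-trip: it re-POSTs the ciphertext it read back.
updated = merge_sent_env_vars(stored, {"CALLBACK_SECRET": stored["CALLBACK_SECRET"]})
print(toy_decrypt(updated["CALLBACK_SECRET"]))      # sk-123
print(updated["OTHER_VAR"] == stored["OTHER_VAR"])  # True
```

With a real cipher that uses random nonces, re-encrypting a key produces different ciphertext bytes, but it still decrypts to the same plaintext. Byte-identical output is only guaranteed for keys that were not sent.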
Yassin Kortam	a6494e6fe3	perf: eliminate per-request callback scanning on proxy hot path (#27858) - Introduce `_CallbackCapabilities` dataclass and `ProxyLogging._callback_capabilities()` static method that inspects `litellm.callbacks` once and caches capability flags keyed on (list length, member ids); invalidates automatically when the callback list mutates without per-request iteration overhead - Replace O(n) `litellm.callbacks` walks in `async_pre_call_hook`, `during_call_hook`, `async_post_call_streaming_iterator_hook`, `async_post_call_streaming_hook`, and `post_call_response_headers_hook` with fast-path exits when no relevant callbacks are registered - Add `needs_iterator_wrap()` and `needs_per_chunk_streaming_hook()` instance methods to decouple iterator-level wrapping from per-chunk hook execution; avoids `get_response_string` materialization per chunk when no guardrail or chunk-hook callback is active - Introduce `_fast_serialize_simple_model_response_stream()` using `orjson` for common single-choice text streaming chunks, bypassing the full Pydantic serializer; falls back to `model_dump_json` for tool calls, logprobs, usage, and provider-specific fields - Add early-return in `_restamp_streaming_chunk_model` when downstream model already matches the requested model, avoiding unnecessary string comparisons on every chunk - Fix stale zero-cost cache bug in `_is_model_cost_zero`: move the per-router `_zero_cost_cache` dict onto the `Router` instance and clear it in `_invalidate_model_group_info_cache` so in-place pricing updates via `upsert_deployment` immediately resume budget enforcement - Add `scripts/benchmark_chat_completions_perf.py`: standalone async benchmarking tool with a mock OpenAI provider, LiteLLM proxy process management, non-streaming RPS, streaming TTFT, and full-stream latency measurements with repeat/median run support - Add comprehensive unit tests covering capability detection, cache invalidation, fast-path correctness, zero-cost cache 
regression, and the no-callback streaming fast path Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-14 09:28:31 -07:00
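The capability cache can be sketched as follows. The sketch uses two illustrative flags. `CallbackCapabilities` and `callback_capabilities` are hypothetical stand-ins for `_CallbackCapabilities` and `ProxyLogging._callback_capabilities()`, whose real fields the message does not list.

The fingerprint is the (list length, member ids) pair the commit describes. In this sketch, the `hasattr` scans run only when that fingerprint changes.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class CallbackCapabilities:
    """Illustrative flags; the real _CallbackCapabilities fields are not listed in the commit."""

    has_pre_call_hook: bool
    has_streaming_chunk_hook: bool


_cached_key: Optional[Tuple[Any, ...]] = None
_cached_caps: Optional[CallbackCapabilities] = None


def callback_capabilities(callbacks: List[Any]) -> CallbackCapabilities:
    """Rescan callbacks only when the (length, member ids) fingerprint changes."""
    global _cached_key, _cached_caps
    key = (len(callbacks), tuple(id(cb) for cb in callbacks))
    if _cached_caps is None or key != _cached_key:
        _cached_caps = CallbackCapabilities(
            has_pre_call_hook=any(hasattr(cb, "async_pre_call_hook") for cb in callbacks),
            has_streaming_chunk_hook=any(
                hasattr(cb, "async_post_call_streaming_hook") for cb in callbacks
            ),
        )
        _cached_key = key
    return _cached_caps


class DemoGuardrail:
    async def async_pre_call_hook(self, *args: Any, **kwargs: Any) -> None:
        return None


callbacks: List[Any] = []
first = callback_capabilities(callbacks)   # scan: nothing registered
second = callback_capabilities(callbacks)  # cache hit: same object
callbacks.append(DemoGuardrail())
third = callback_capabilities(callbacks)   # list mutated, so it is rescanned
print(first.has_pre_call_hook, second is first, third.has_pre_call_hook)  # False True True
```

Building the fingerprint still walks the list once, but it reads only object ids. One caveat applies to any id-based key: Python may reuse the id of a freed object. A callback replaced in place by a new object could therefore, in principle, alias an old fingerprint.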
Krrish Dholakia	8bbc61e03c	fix: harden /key/update authorization checks (#27878) * fix: patch Host-header auth bypass in get_request_route Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication. Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic. Affected variants confirmed fixed: Host: localhost/?x=1 Host: localhost:4000/?x=1 Host: localhost/#test Host: localhost:4000/#test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * style: reduce comments in route fix Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block credential fields in RAG ingest vector_store options Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side. Previously any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block /key/update self-escalation by assigned users Non-admin users who were assigned a key (created_by != caller) could update any non-budget field — models, rpm_limit, guardrails, etc. — without admin authorization, allowing privilege self-escalation. 
Gate: only the key creator (created_by == caller) may edit their own key without admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block user-controlled api_base in RAG ingest vector_store options A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint. - Gate the endpoint on PROXY_ADMIN role (403 for all other roles) - Call is_request_body_safe() to reject banned params even for admins - Convert ValueError from safety check to HTTP 400 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply banned-param check to /utils/transform_request Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users. 
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request — including the provider API key — to an attacker-controlled host. Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Update litellm/proxy/rag_endpoints/endpoints.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * fix: coerce nested config strings before banned-param check _NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: replace substring match with prefix match in is_llm_api_route mapped_pass_through_routes used `_llm_passthrough_route in route` (substring) so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential). 
Fix: use exact match or startswith(prefix + "/") — the same pattern used everywhere else in RouteChecks — so only routes that actually start with a passthrough prefix are allowed through. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize PR #27878 test failures - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized. - test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id). - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation. - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(lint): add explicit str() cast in get_request_route for MyPy scope.get() returns Any | None which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: guard bare-/ root_path strip + make total_spend migration idempotent auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/". 
migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: require creator still owns key for personal-key bypass in /key/update caller_is_creator now requires both created_by == caller AND user_id == caller. Previously checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. Adds regression test: creator whose key was reassigned is blocked (403). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: extract auth checks to fix PLR0915 + broaden max_budget assertion internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into _check_user_update_authz helper to bring statement count under the limit. test_validate_max_budget: assert "negative" (substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>	2026-05-14 04:16:04 +00:00
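Two of the fixes in this commit reduce to small predicates. The sketch below uses an illustrative subset of passthrough prefixes and hypothetical helper names:

- **Route matching.** The substring check misclassifies `/credentials/by_name/openai`; the exact-or-`prefix + "/"` check does not.
- **Key-update gate.** The caller must be both creator and current owner, and the admin check is never skipped for budget changes.

```python
from typing import Tuple

PASSTHROUGH_PREFIXES: Tuple[str, ...] = ("/openai", "/anthropic", "/azure", "/bedrock")  # illustrative


def is_passthrough_route_substring(route: str) -> bool:
    """Old, buggy check: any route that merely contains a prefix matches."""
    return any(prefix in route for prefix in PASSTHROUGH_PREFIXES)


def is_passthrough_route_prefix(route: str) -> bool:
    """Fixed check: exact match, or the prefix followed by a path separator."""
    return any(route == p or route.startswith(p + "/") for p in PASSTHROUGH_PREFIXES)


def can_skip_key_admin_check(
    *, created_by: str, user_id: str, caller: str, changes_budget: bool, is_team_key: bool
) -> bool:
    """Hypothetical /key/update gate following the rules stated in the commit message."""
    if changes_budget:
        return False  # budget edits always go through the admin check
    caller_is_creator = created_by == caller and user_id == caller  # creator AND current owner
    # Team keys: assumes team membership + /key/update permission were verified upstream.
    return caller_is_creator or is_team_key


print(is_passthrough_route_substring("/credentials/by_name/openai"))  # True (the bug)
print(is_passthrough_route_prefix("/credentials/by_name/openai"))     # False
print(is_passthrough_route_prefix("/openai/v1/chat/completions"))     # True
```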
Yuneng Jiang	83f26d17c1	Strip SERVER_ROOT_PATH before lazy-feature prefix match LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. /policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2.	2026-05-12 20:43:08 -07:00
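The sketch below shows root-path stripping with a component boundary. The helper names `strip_root_path` and `matches_feature_prefix` are hypothetical, not LazyFeatureMiddleware's actual code.

Trailing slashes are normalised, and a bare `/` root path counts as nothing to strip. That second rule matches the guard described in the /key/update hardening commit.

```python
def strip_root_path(path: str, root_path: str) -> str:
    """Remove the configured root path only when it ends at a path-component boundary."""
    root = root_path.rstrip("/")
    if not root:  # "" or bare "/": nothing to strip
        return path
    if path == root:
        return "/"
    if path.startswith(root + "/"):  # "/api" must not strip "/apiv2/..."
        return path[len(root):]
    return path


def matches_feature_prefix(path: str, prefix: str, root_path: str = "") -> bool:
    """Compare the root-relative path against a registered feature prefix such as /policies."""
    stripped = strip_root_path(path, root_path)
    prefix = prefix.rstrip("/")
    return stripped == prefix or stripped.startswith(prefix + "/")


print(matches_feature_prefix("/api/v1/policies/list", "/policies", "/api/v1/"))  # True
print(matches_feature_prefix("/apiv2/policies", "/policies", "/api"))            # False
```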
Michael-RZ-Berri	b888177ea6	fix: reset proxy budget when initial reset duration is null then updated (#27488) Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>	2026-05-09 18:33:36 -04:00
Michael Riad Zaky	5d7b7e7e37	fix(realtime): register /openai/v1/realtime as websocket route	2026-05-06 13:13:58 -07:00
yuneng-jiang	c2cea58567	Merge branch 'litellm_yj_may1' into codex/budget-race-enforcement	2026-05-01 14:32:18 -07:00
Yuneng Jiang	650821b538	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-config-update-targeted-upserts # Conflicts: # tests/test_litellm/proxy/test_proxy_server.py	2026-05-01 10:38:34 -07:00
user	66c0fe23da	handle bad reservation counters after spend write	2026-04-30 22:55:26 -07:00
user	0b1ea9eb8f	harden budget reservation edge cases	2026-04-30 21:49:31 -07:00
user	b53adf7cff	address budget reservation review edges	2026-04-30 21:21:26 -07:00
user	4f8769943b	skip invalid budget window counter increments	2026-04-30 20:18:07 -07:00
user	c28e093f41	finalize budget reservations after counter updates	2026-04-30 19:50:36 -07:00
user	15d845c321	avoid stale local spend counters after redis misses	2026-04-30 19:33:55 -07:00
user	64fadc3b8e	Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/budget-race-enforcement-greptile-fix # Conflicts: # litellm/proxy/db/spend_counter_reseed.py # litellm/proxy/proxy_server.py	2026-04-30 18:25:48 -07:00
Michael Riad Zaky	9f08db91f9	Refresh Redis TTL on counter writes and skip stale in-memory on Redis miss	2026-04-30 17:50:58 -07:00
user	694fadd175	fix budget reservation review findings	2026-04-30 17:38:18 -07:00
yuneng-jiang	15b7386859	Merge pull request #26815 from stuxf/fix/get-image-lfi-ssrf chore(proxy): contain UI_LOGO_PATH / LITELLM_FAVICON_URL on unauthenticated asset endpoints	2026-04-30 17:10:15 -07:00
Michael Riad Zaky	47b2832d6f	test: replace subprocess startup-import diff with static source scan	2026-04-30 16:15:46 -07:00
user	46183e6dc9	Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/budget-race-enforcement-snapshot	2026-04-30 13:56:16 -07:00
user	ca50868b75	harden end-user and tag budget reservations	2026-04-30 13:37:10 -07:00
user	b8a141cefd	fix(static-assets): stop serving stale logo cache	2026-04-30 11:34:25 -07:00
user	215f538d4f	fix(static-assets): browser-load remote branding assets	2026-04-30 11:30:57 -07:00
user	926de696a1	tighten budget counter cache recovery	2026-04-29 20:51:07 -07:00
Yuneng Jiang	be3d27a0b8	[Test] Proxy: Add /config/update critical-path tests to test_proxy_server.py Companion to the previous commit which deleted the symbol-named tests/test_litellm/proxy/management_endpoints/test_update_config_endpoint.py. Adds the 5 critical-path tests in their proper home — the test file that mirrors the source file (proxy_server.py). The two commits are one logical change; they were split because git add aborted on a stale path argument.	2026-04-29 19:18:31 -07:00
Michael Riad Zaky	0f8dd28542	lazy-load optional feature routers on first request	2026-04-29 17:20:55 -07:00
user	55d393d77d	fix(static-assets): unblock CI — pass headers explicitly + harden + update legacy tests Three CI failures from the previous push, all addressed: * ``lint`` (mypy): ``async_client.get(url, *request_kwargs)`` confused mypy because ``AsyncHTTPHandler.get``'s second positional arg is typed ``bool | None``. Switched to an explicit branch: ``await async_client.get(rewritten_url, headers={"host": host_header})`` for the HTTP-rewritten case, plain ``get(rewritten_url)`` otherwise. * ``proxy-infra`` / ``test_get_image_custom_local_logo_bypasses_cache``: the existing test set ``UI_LOGO_PATH=/app/custom_logo.jpg`` with no ``LITELLM_ASSETS_PATH``, asserting the path was served verbatim. That was the LFI behaviour the new path-containment guard closes. Updated the test to set ``LITELLM_ASSETS_PATH=/app`` so the path is inside an allowed root, and patched the helper's ``realpath`` / ``isfile`` to go along with the mocked filesystem. Test intent (bypass cache when ``UI_LOGO_PATH`` is local) is preserved. * ``auth-and-jwt`` / ``test_get_image_cache_logic``: existing test built a ``Mock`` response without ``headers``, so the new Content-Type check tripped on ``Mock().split(";")[0]``. Two fixes: 1. Set ``mock_response.headers = {"content-type": "image/jpeg"}`` on the test (matches the real upstream contract — a logo CDN always sets a Content-Type). 2. Make ``fetch_validated_image_bytes`` defensive: if the Content-Type header is missing or non-string, treat as non-image and fall back to default. Closes a subtle hole — pre-fix, an upstream that omits Content-Type entirely would have served arbitrary bytes under the ``image/jpeg`` wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 21:47:41 +00:00
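The two guards this commit relies on can be sketched with the standard library. The helper names are hypothetical:

- A Content-Type check that treats a missing or non-string header as non-image.
- A `realpath`-based containment check against an allowed root, the role `LITELLM_ASSETS_PATH` plays in the updated test.

Headers are modelled here as a plain dict with a lowercase key, which is an assumption; real HTTP header lookups are case-insensitive.

```python
import os
import tempfile
from typing import Any, Mapping


def is_image_content_type(headers: Mapping[str, Any]) -> bool:
    """Missing or non-string Content-Type counts as non-image (caller falls back to default)."""
    content_type = headers.get("content-type")
    if not isinstance(content_type, str):
        return False
    return content_type.split(";")[0].strip().lower().startswith("image/")


def is_within_allowed_root(path: str, allowed_root: str) -> bool:
    """Resolve symlinks and '..' first, then require the path to sit under the root."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(allowed_root)
    return os.path.commonpath([real_path, real_root]) == real_root


with tempfile.TemporaryDirectory() as allowed_root:
    inside = is_within_allowed_root(os.path.join(allowed_root, "logo.jpg"), allowed_root)
    escaped = is_within_allowed_root(os.path.join(allowed_root, "..", "secret.json"), allowed_root)

print(is_image_content_type({"content-type": "image/jpeg"}), is_image_content_type({}))  # True False
print(inside, escaped)  # True False
```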
user	9ef8572d67	fix(proxy): /get_logo_url no longer discloses local UI_LOGO_PATH The unauthenticated ``/get_logo_url`` endpoint returned the ``UI_LOGO_PATH`` env var verbatim. For HTTP(S) URLs this is intended — the dashboard loads the logo directly from a public/internal CDN. For local filesystem paths it was an information disclosure: any caller could fetch ``/get_logo_url`` and read admin-only filesystem details like ``UI_LOGO_PATH=/etc/litellm/secret-config.json``. Now the endpoint returns the URL only when it begins with ``http://`` or ``https://``. For local paths (or unset) it returns an empty string — the dashboard falls back to ``/get_image`` which serves the file via the path-containment guard added in the previous commit. Tests parametrize the disclosure-blocked cases (``/etc/...``, ``/proc/self/environ``, relative paths) and confirm HTTP / HTTPS URLs still pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 21:15:21 +00:00
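The rule in this commit is a prefix check on the configured value. The sketch below uses a hypothetical `get_logo_url` function that reads from an injectable mapping, defaulting to `os.environ`, so it can be exercised without touching the process environment.

```python
import os
from typing import Mapping, Optional


def get_logo_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return UI_LOGO_PATH only if it is an http(s) URL; never echo a local path."""
    source = os.environ if env is None else env
    value = source.get("UI_LOGO_PATH", "")
    if value.startswith(("http://", "https://")):
        return value  # CDN URL: the dashboard loads it directly
    return ""  # local path or unset: the dashboard falls back to /get_image


print(repr(get_logo_url({"UI_LOGO_PATH": "/etc/litellm/secret-config.json"})))  # ''
```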
Krrish Dholakia	fd32f29e39	Revert "lazy-load optional feature routers on first request (#26534)" (#26727) This reverts commit `21ed38971d`.	2026-04-29 00:21:41 +00:00
Michael-RZ-Berri	21ed38971d	lazy-load optional feature routers on first request (#26534) Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>	2026-04-28 17:04:40 -07:00
Michael Riad Zaky	6052ce1017	cache LiteLLM_Config param reads in DualCache + batch scheduler-tick fetch	2026-04-28 16:29:50 -07:00
Michael-RZ-Berri	3ef16098f2	Reseed enforcement read path from DB on counter miss (#26459) Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local>	2026-04-25 15:14:02 -07:00
Michael Riad Zaky	0bd49ecb8b	Fix bug that bypasses per-team member budget limit	2026-04-22 10:41:13 -07:00
Ishaan Jaffer	e8461b5b97	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
Sameer Kankute	ffb87dcac9	Fix failing test and code qa + lint	2026-04-14 20:53:17 +05:30
Ryan Crabbe	8d9bbc6eb2	[Fix] Include access group models in UI model listing Models associated with a team only through access groups (not directly in team.models) were not appearing on the /ui/?page=models page. The API authorization path already resolved access groups correctly, but the /v2/model/info listing endpoint only checked team.models. Add _add_access_group_models_to_team_models() which batch-fetches all distinct access groups in a single find_many query, then resolves each team's access group models into deployments and merges them into the team_models dict.	2026-03-28 12:32:12 -07:00
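The batching idea in this fix (collect every distinct access group across all teams, resolve them with one lookup, then merge each team's group models into its model list) is sketched below. It is a simplification under assumptions: models are plain names rather than deployments, `fetch_groups` stands in for the single `find_many` query, and `add_access_group_models` is a hypothetical name, not the `_add_access_group_models_to_team_models()` signature.

```python
from typing import Callable, Dict, List, Sequence

# Stand-in for one batched DB query: group names in, {group: [model, ...]} out.
GroupFetcher = Callable[[Sequence[str]], Dict[str, List[str]]]


def add_access_group_models(
    team_models: Dict[str, List[str]],
    team_access_groups: Dict[str, List[str]],
    fetch_groups: GroupFetcher,
) -> Dict[str, List[str]]:
    # Gather every distinct access group so a single query covers all teams.
    distinct_groups = sorted({g for groups in team_access_groups.values() for g in groups})
    group_to_models = fetch_groups(distinct_groups) if distinct_groups else {}

    merged = {team: list(models) for team, models in team_models.items()}
    for team, groups in team_access_groups.items():
        bucket = merged.setdefault(team, [])
        for group in groups:
            for model in group_to_models.get(group, []):
                if model not in bucket:  # skip models the team already has directly
                    bucket.append(model)
    return merged
```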
yuneng-jiang	846e4b44b6	Merge pull request #24682 from michelligabriele/fix/budget-spend-counters fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters	2026-03-27 16:59:23 -07:00
michelligabriele	d533b432fd	fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters Budget checks on API keys, teams, and team members were not enforced in multi-pod deployments because user_api_key_cache is intentionally in-memory-only. Each pod tracked spend independently, so with N pods the effective budget was N × max_budget. Introduces a separate spend_counter_cache (DualCache wired to redis_usage_cache) with atomic increment/read helpers: - increment_spend_counters(): awaited in cost callback (not create_task) to update both in-memory and Redis before the next auth check - get_current_spend(): reads Redis first (cross-pod authoritative), falls back to in-memory, then to cached object .spend from DB Budget check functions (_virtual_key_max_budget_check, _team_max_budget_check, _check_team_member_budget) now read spend via get_current_spend() instead of cached object .spend fields. When Redis is not configured, falls back to in-memory-only counters (same as current single-instance behavior). Fixes #23714	2026-03-27 20:39:52 +01:00
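The read order described for the spend counters (shared Redis value first, then the pod's in-memory counter, then the cached `.spend` from the DB) and the dual write on each cost callback can be sketched with plain dicts. This is an assumption-laden illustration: the dicts stand in for Redis and the in-memory cache (real code would use an atomic increment such as Redis `INCRBYFLOAT`), and `record_spend` / `read_spend` are hypothetical names, not the `increment_spend_counters()` / `get_current_spend()` signatures.

```python
from typing import Dict, Optional


def record_spend(
    key: str, amount: float, local: Dict[str, float], shared: Optional[Dict[str, float]]
) -> None:
    # Update this pod's counter and, when configured, the cross-pod counter.
    local[key] = local.get(key, 0.0) + amount
    if shared is not None:
        shared[key] = shared.get(key, 0.0) + amount


def read_spend(
    key: str,
    local: Dict[str, float],
    shared: Optional[Dict[str, float]],
    cached_spend: float,
) -> float:
    # 1. Shared counter is authoritative across pods.
    if shared is not None and key in shared:
        return shared[key]
    # 2. Fall back to this pod's in-memory counter.
    if key in local:
        return local[key]
    # 3. Finally fall back to the spend loaded from the DB.
    return cached_spend
```

With two pods each recording 6.0 against a 10.0 budget, the shared read returns 12.0 and the budget check trips; a per-pod read would see only 6.0, which is the N × max_budget problem the commit describes.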
Sameer Kankute	92a07e2d6e	fix(proxy): address Greptile review feedback - Remove HTTP_PROXY/HTTPS_PROXY from blocklist (legitimately used in corporate envs) - Add NO_PROXY/no_proxy to blocklist (prevents bypassing proxy monitoring) - Remove dead code in _is_valid_user_id (space exception was unreachable) - Update tests accordingly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 20:38:36 +05:30
Sameer Kankute	8112fbf274	fix(proxy): sanitize user_id input and block dangerous env var keys Add input validation to get_user_id_from_request (length limit, control char rejection) and a blocklist of dangerous environment variable keys in _load_environment_variables to prevent PATH/LD_PRELOAD/PYTHONPATH override via config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 20:38:36 +05:30
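The two hardening steps here (a length limit plus control-character rejection for `user_id`, and dropping blocklisted keys before loading environment variables from config) are sketched below, using the blocklist as amended by the follow-up commit above (`NO_PROXY`/`no_proxy` added, `HTTP_PROXY`/`HTTPS_PROXY` allowed). Assumptions: the 256-character limit is hypothetical because the commit does not state the value, the blocklist shown is only the keys these commits name rather than the full list, empty IDs are treated as invalid in this sketch, and both function names are hypothetical.

```python
from typing import Dict

MAX_USER_ID_LENGTH = 256  # hypothetical; the commit does not state the actual limit

# Only the keys named in these commits; the real blocklist may contain more.
BLOCKED_ENV_KEYS = frozenset({"PATH", "LD_PRELOAD", "PYTHONPATH", "NO_PROXY", "no_proxy"})


def is_acceptable_user_id(user_id: str) -> bool:
    if not user_id or len(user_id) > MAX_USER_ID_LENGTH:
        return False
    # Reject ASCII control characters (0x00-0x1F and DEL).
    return not any(ord(ch) < 32 or ord(ch) == 127 for ch in user_id)


def drop_blocked_env_vars(env: Dict[str, str]) -> Dict[str, str]:
    # Exact-match comparison, which is why both NO_PROXY and no_proxy are listed.
    return {key: value for key, value in env.items() if key not in BLOCKED_ENV_KEYS}
```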
Ryan Crabbe	ad43a35d76	feat: add control plane for multi-proxy worker management Adds a control plane capability that enables a central admin instance to manage multiple regional worker proxies from a single UI. Backend: - Worker registry loaded from YAML config (worker_id, name, url) - /.well-known/litellm-ui-config exposes is_control_plane and workers list - /v3/login + /v3/login/exchange: opaque code exchange for cross-origin username/password auth (JWT never in URL/logs, single-use 60s TTL) - SSO cookie handoff with return_to → opaque code → exchange - _validate_return_to: full origin validation (scheme+hostname+port) - Startup warning when control_plane_url set without Redis - Both /v3 endpoints gated behind control_plane_url config Frontend: - Worker selector dropdown on login page (gated behind is_control_plane) - Cross-origin SSO code exchange handling on callback - switchToWorkerUrl: localStorage-persisted worker URL for API calls - useWorker hook: shared worker state management - WorkerDropdown in navbar for switching workers - Logout/switch clears worker state from localStorage Tests: - 7 tests for /v3/login + /v3/login/exchange - 10 tests for _validate_return_to - 2 tests for control plane discovery endpoint	2026-03-19 22:50:19 -07:00
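"Full origin validation (scheme+hostname+port)" for `return_to` means comparing the whole origin tuple, not just a hostname suffix. A minimal sketch, assuming the allowed origins come from the worker registry URLs and that default ports are 80/443; `origin_of` and `is_allowed_return_to` are hypothetical names, not the `_validate_return_to` implementation.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> Optional[Tuple[str, str, int]]:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    try:
        port = parts.port
    except ValueError:  # non-numeric or out-of-range port
        return None
    if port is None:
        port = DEFAULT_PORTS.get(scheme)
    if not scheme or not host or port is None:
        return None
    return (scheme, host, port)


def is_allowed_return_to(return_to: str, allowed_urls: Iterable[str]) -> bool:
    origin = origin_of(return_to)
    if origin is None:
        return False
    return any(origin == origin_of(url) for url in allowed_urls)
```

Comparing complete origins rejects look-alike hosts such as `worker.example.com.evil.com`, scheme downgrades, and port changes, all of which a naive `startswith` check could let through.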
Sameer Kankute	7e2f2a8ffa	Fix inflight mypy	2026-03-02 19:41:32 +05:30
Darien Kindlund	dc96ade956	fix: preserve interval_hours in model cost map reload config (#22200) The upsert update branches for model_cost_map_reload_config were overwriting param_value with only the force_reload flag, dropping interval_hours. This caused scheduled reloads to self-destruct after their first execution. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-02 19:21:11 +05:30
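The bug pattern here is an update that replaces the stored value instead of merging into it. A minimal sketch of the merge, assuming `param_value` is a JSON-like dict; `build_reload_config_update` is a hypothetical name, not the upsert code in the commit.

```python
from typing import Any, Dict, Optional


def build_reload_config_update(
    existing: Optional[Dict[str, Any]], force_reload: bool
) -> Dict[str, Any]:
    # Start from the stored value so unrelated keys such as interval_hours survive.
    updated = dict(existing or {})
    updated["force_reload"] = force_reload
    return updated
```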
Ryan Crabbe	5bcaeabfd8	Merge origin/main into litellm_fix_streaming_connection_pool_leak Resolve conflict in test_proxy_server.py: keep both async_data_generator cleanup tests and store_model_in_db DB config override tests.	2026-02-21 12:44:50 -08:00
yuneng-jiang	6bfab8acd4	address greptile review feedback (greploop iteration 2) Reset logo_path to default_logo when custom UI_LOGO_PATH file doesn't exist, so the else branch at the bottom of get_image serves the default logo instead of the non-existent custom path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 20:25:00 -08:00


139 Commits