
139 Commits

Author SHA1 Message Date
Mateo Wang
48d7e15b83
chore(admin-ui): regenerate static export with trailingSlash: true (#28112)
* chore(admin-ui): regenerate static export with trailingSlash: true

Rebuilds litellm/proxy/_experimental/out/ from ui/litellm-dashboard with
`trailingSlash: true` enabled in next.config.mjs. Next.js now emits every
route as <dir>/index.html (e.g. mcp/oauth/callback/index.html) instead of
<dir>.html with a sibling metadata-only directory, which fixes the 404 on
extensionless URLs served through FastAPI's StaticFiles(html=True) mount.
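The sketch below shows roughly why the layout matters. It models an `html=True` static mount with three rules: serve an exact file match, otherwise serve `<dir>/index.html` for a directory, and never try `<path>.html`. This simplifies Starlette's `StaticFiles` (it ignores the trailing-slash redirect and custom 404 pages). The `resolve` helper and the metadata file name are hypothetical.

```python
from typing import Optional, Set


def resolve(files: Set[str], dirs: Set[str], url_path: str) -> Optional[str]:
    """Return the file a html=True static mount would serve, or None for a 404.

    Simplified model: an exact file match first, then <dir>/index.html when
    the path is a directory. No "<path>.html" fallback is attempted.
    """
    path = url_path.strip("/")
    if path in files:
        return path
    if path in dirs and f"{path}/index.html" in files:
        return f"{path}/index.html"
    return None


# Old export: callback.html beside a callback/ directory holding only metadata.
old_files = {"mcp/oauth/callback.html", "mcp/oauth/callback/metadata.txt"}
# trailingSlash: true export: the route lives at callback/index.html.
new_files = {"mcp/oauth/callback/index.html"}
dirs = {"mcp", "mcp/oauth", "mcp/oauth/callback"}

print(resolve(old_files, dirs, "/mcp/oauth/callback"))  # None (404)
print(resolve(new_files, dirs, "/mcp/oauth/callback"))  # mcp/oauth/callback/index.html
```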

This is the build artifact half of the fix; the config change, Dockerfile
cleanup, and regression test live in the follow-up source PR that stacks
on top of this branch.

* fix(admin-ui): emit nested routes as <dir>/index.html (#28106)

Linear and other OAuth providers redirect the user back to
/ui/mcp/oauth/callback?code=...&state=... after the consent step. The
packaged Next.js static export only produced /ui/mcp/oauth/callback.html,
so FastAPI's StaticFiles served a 404 on the extensionless URL and the
OAuth handshake never completed.

The Dockerfile.non_root build step tried to paper over this at image-build
time with `for html_file in *.html; do ...`, but that shell glob does not
recurse, so nested routes like mcp/oauth/callback.html were left stranded
next to an empty mcp/oauth/callback/ directory containing only Next.js
metadata. The runtime restructure step in proxy_server.py was then skipped
because the .litellm_ui_ready marker had already been dropped.
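Python's `pathlib` reproduces the non-recursive glob behavior. `Path.glob("*.html")` matches only the top level, like the shell loop, and `Path.rglob("*.html")` also finds nested routes. The directory layout below is illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mcp" / "oauth" / "callback").mkdir(parents=True)  # metadata-only dir
    (root / "index.html").write_text("<html></html>")
    (root / "mcp" / "oauth" / "callback.html").write_text("<html></html>")

    # Like `for html_file in *.html`: top level only.
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.html"))
    # A recursive walk also reaches the nested route.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.html"))

print(top_level)  # ['index.html']
print(recursive)  # ['index.html', 'mcp/oauth/callback.html']
```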

Set trailingSlash: true in the dashboard's Next.js config so the export
emits every nested route as <dir>/index.html natively. The Dockerfile loop
is now a no-op for the bundled UI and has been removed; the
.litellm_ui_ready marker is still written so the proxy keeps skipping the
redundant Python restructure step at startup. Stacks on top of the static
export regeneration in the parent branch.

* chore: restore origin/litellm_internal_staging out files
2026-05-25 21:06:50 -07:00
milan-berri
0fb710400f
fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed (#27854)
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed

Symptom
-------
Customers on multi-pod deployments see team `spend` jump to ~2x (or Nx,
where N is the pod count) shortly after a Redis cache miss / TTL expiry, triggering
spurious "Budget Crossed" alerts and blocked requests until the value is
manually reset.

Root cause
----------
`SpendCounterReseed.coalesced` warmed the primary spend counter by
calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`,
which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent.

The per-counter `asyncio.Lock` only coalesces seeders inside one
process. With N pods sharing one Redis, on a cold key (cold start, TTL
expiry, manual delete) every pod independently passes its lock + Redis
re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`.
Final value: N x db_spend.
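Below is a minimal in-memory reproduction of the race. It assumes two "pods" that each hold their own `asyncio.Lock` and share one store. `FakeRedis`, `buggy_seed`, and the key name are hypothetical stand-ins, not LiteLLM's API.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in for Redis; each method body runs atomically on the loop."""

    def __init__(self):
        self.store = {}

    async def get(self, key):
        return self.store.get(key)

    async def incrbyfloat(self, key, amount):
        # Additive like INCRBYFLOAT: a missing key counts as 0.
        self.store[key] = self.store.get(key, 0.0) + amount
        return self.store[key]


async def buggy_seed(redis, lock, key, db_spend):
    async with lock:  # per-process lock: each pod has its own
        if await redis.get(key) is None:  # re-check passes on a cold key
            await asyncio.sleep(0)  # simulate the DB read yielding control
            await redis.incrbyfloat(key, db_spend)


async def main():
    redis = FakeRedis()
    # Two pods -> two independent locks sharing one Redis.
    await asyncio.gather(
        buggy_seed(redis, asyncio.Lock(), "spend:team", 506.0),
        buggy_seed(redis, asyncio.Lock(), "spend:team", 506.0),
    )
    return redis.store["spend:team"]


result = asyncio.run(main())
print(result)  # 1012.0, i.e. 2 x db_spend
```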

Fix
---
Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed.
SET NX is atomic across pods: exactly one writer initializes the key;
losers read the winner's value via `async_get_cache`. This is the same
idiom already used by `coalesced_window` in the same file, so the two
seed paths are now consistent.

Per-request deltas continue to use `INCRBYFLOAT` (correct - additive
behaviour is what we want for increments, not for initial seed).
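The same toy model can express the SET NX seed. One seeder wins. The loser reads back the winner's value. Only the per-request delta stays additive. `FakeRedis.set_nx` and `seed` are hypothetical stand-ins for `async_set_cache(..., nx=True)` and `coalesced`, not their real signatures.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in for Redis; each method body runs atomically on the loop."""

    def __init__(self):
        self.store = {}

    async def get(self, key):
        return self.store.get(key)

    async def set_nx(self, key, value):
        # SET key value NX: write only if absent and report whether we won.
        if key in self.store:
            return False
        self.store[key] = value
        return True

    async def incrbyfloat(self, key, amount):
        self.store[key] = self.store.get(key, 0.0) + amount
        return self.store[key]


async def seed(redis, key, db_spend):
    await asyncio.sleep(0)  # simulate the DB read yielding control
    if await redis.set_nx(key, db_spend):  # exactly one pod initializes the key
        return db_spend
    return float(await redis.get(key))  # losers read the winner's value


async def main():
    redis = FakeRedis()
    seeded = await asyncio.gather(
        seed(redis, "spend:team", 506.0),
        seed(redis, "spend:team", 506.0),
    )
    await redis.incrbyfloat("spend:team", 1.5)  # per-request delta stays additive
    return seeded, redis.store["spend:team"]


seeded, final = asyncio.run(main())
print(seeded, final)  # [506.0, 506.0] 507.5
```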

Verification
------------
Live two-process repro against the same Postgres + Redis (DB
spend = 506):

  Unpatched: 4/4 runs -> Redis counter = ~1012  (~2 x db_spend)
  Patched:  12/12 runs -> Redis counter = ~506

Unit tests (`test_proxy_server.py`):

- New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed`
  patches `_get_lock` to return a fresh lock per caller (otherwise the
  per-process lock masks the race), races two `coalesced` calls, and
  asserts final = 506 with exactly one of two SET NX attempts winning.
- 4 existing tests updated for the new seed contract (SET NX for the
  seed, INCRBYFLOAT only for the per-request delta).
- Full `spend_counter or reseed or budget` slice: 22 passed.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(spend_counter): make SET NX mock atomic so loser branch is exercised

Greptile flagged that `redis_set_cache` in
test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed
placed `await asyncio.sleep(0)` AFTER the NX membership check. Both
concurrent tasks observed an empty `redis_store`, passed the guard, and
both returned True - so the loser branch (else: read back winner's value)
was never exercised.

Fix the mock to model real atomic Redis SET NX:

- Yield BEFORE the membership check so two concurrent callers interleave
  the way real SET NX does (first to resume runs check + write atomically
  and wins; second resumes after the key exists and loses).
- Track set_cache return values; assert sorted([loser, winner]) so we
  know exactly one task wins and one loses.
- Track async_get_cache calls that happen AFTER at least one SET NX has
  completed; assert at least one such read - that is the loser-path
  fallback (`current_value = float(cached)` when seeded is False).
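The mock-ordering bug can be reproduced in isolation. In this sketch, `run_race` and its inner `fake_set_cache` are hypothetical. When the `await` comes before the membership check, the check and the write run with no suspension point between them, so only one of two concurrent callers wins.

```python
import asyncio


async def run_race(yield_before_check: bool):
    store = {}
    results = []

    async def fake_set_cache(key, value, nx=False):
        if yield_before_check:
            await asyncio.sleep(0)  # interleave first; check + write stay atomic
        if nx and key in store:
            results.append(False)
            return False
        if not yield_before_check:
            await asyncio.sleep(0)  # buggy: yield between the check and the write
        store[key] = value
        results.append(True)
        return True

    await asyncio.gather(
        fake_set_cache("spend:team", 506.0, nx=True),
        fake_set_cache("spend:team", 506.0, nx=True),
    )
    return sorted(results)


print(asyncio.run(run_race(yield_before_check=False)))  # [True, True]
print(asyncio.run(run_race(yield_before_check=True)))  # [False, True]
```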

Verified by temporarily reverting the mock to the old order: the test
now fails with `expected exactly one SET NX winner and one loser, got
[True, True]`, exactly the failure mode Greptile described.

No production code change.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test

`test_concurrent_read_and_write_paths_share_one_db_query` mocks
`async_increment` to populate the in-memory `redis_store`, but did not
mock `async_set_cache`. After the SET-NX seed change in `coalesced()`,
the seed step writes via `async_set_cache(nx=True)` (default AsyncMock,
no `redis_store` write), so the simulated Redis stays empty after the
first reseed. The second `get_current_spend` then sees a clean Redis
miss, re-enters the DB read path, and the test fails with
`expected 1 DB query, got 2`.

Fix: add a `redis_set_cache` side_effect that updates `redis_store` on
`nx=True` (and rejects when the key already exists), matching the
pattern used by the four sibling tests fixed in this branch's first
commit. Pre-existing assertions are unchanged.
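Here is a sketch of that kind of side effect, built with `unittest.mock.AsyncMock`. An async `side_effect` is awaited, and its return value becomes the mock's result. The `redis_cache` object and the key name are hypothetical test fixtures.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

redis_store = {}


async def redis_set_cache(key, value, nx=False, **kwargs):
    # Mirror SET NX: refuse to overwrite an existing key.
    if nx and key in redis_store:
        return False
    redis_store[key] = value
    return True


redis_cache = MagicMock()
redis_cache.async_set_cache = AsyncMock(side_effect=redis_set_cache)


async def main():
    first = await redis_cache.async_set_cache("spend:team", 506.0, nx=True)
    second = await redis_cache.async_set_cache("spend:team", 999.0, nx=True)
    return first, second


first, second = asyncio.run(main())
print(first, second, redis_store)  # True False {'spend:team': 506.0}
```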

Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 10:57:08 -07:00
Sameer Kankute
0290c7bc00
fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent (#27444) (#28213)
* fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent (#27444)

* fix(proxy): address Greptile review on Google-native SSE bytes path

Remove unreachable try/except around SSE pass-through yield and add a
unit test covering pre-formatted SSE bytes, terminator padding, and
non-SSE byte fallback wrapping.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-18 22:46:11 -07:00
Shivam Rawat
fbe0ee81f1
fix(proxy): sort BYOK models by their displayed name in /v2/model/info (#28079)
* fix(proxy): sort BYOK models by team_public_model_name in /v2/model/info

Team BYOK rows persist an internal `model_name` like
`model_name_{team_id}_{uuid}` and expose the user-facing name via
`model_info.team_public_model_name`. The UI's `getDisplayModelName`
and the search filter already fall back to that field, but
`_sort_models` was keying off the raw `model_name` — so BYOK rows
ranked by their opaque IDs and clumped at the end of the alphabetized
list instead of interleaving with non-BYOK rows.

Match the UI/search behavior: prefer `team_public_model_name` when
present, fall back to `model_name` otherwise.
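The sort-key rule can be sketched as follows, assuming models are plain dicts shaped like the `/v2/model/info` rows described above. `display_name` and `sort_models` are hypothetical helpers, not `_sort_models` itself.

```python
from typing import Any, Dict, List


def display_name(model: Dict[str, Any]) -> str:
    # Prefer the user-facing BYOK name; fall back to the stored model_name.
    info = model.get("model_info") or {}
    return info.get("team_public_model_name") or model["model_name"]


def sort_models(models: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return sorted(models, key=display_name)


models = [
    {"model_name": "gpt-4o", "model_info": {}},
    {"model_name": "model_name_team1_3f2a", "model_info": {"team_public_model_name": "claude-sonnet"}},
    {"model_name": "azure-gpt", "model_info": None},
]
print([display_name(m) for m in sort_models(models)])
# ['azure-gpt', 'claude-sonnet', 'gpt-4o']
```

Sorting by the raw `model_name` would put the BYOK row last, because `model_name_...` sorts after `gpt-4o`.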

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): case-insensitive DB-side search for BYOK models

`_apply_search_filter_to_models` used Prisma's JSON path
`string_contains` to match the BYOK `team_public_model_name` field, but
that operator is case-sensitive in Postgres (no `mode: insensitive`
flag like column-level string filters have). So a search for "claude"
missed a stored "Claude Sonnet" via the DB branch even though the
router-side path matched it case-insensitively.

Widen the JSON branch to "row has a team_public_model_name set" and
filter case-insensitively in Python so DB-only BYOK rows match the
same terms users see in the UI. This also drops the now-unused
DB-level page-size optimization and `sort_by` knob — the in-Python
filter is the source of truth for `db_models_total_count` now.
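A sketch of the in-Python, case-insensitive match follows. It uses `str.casefold()` on both the internal and the public name. The `matches_search` helper and the row shapes are hypothetical.

```python
def matches_search(model: dict, search: str) -> bool:
    term = search.casefold()
    info = model.get("model_info") or {}
    candidates = [model.get("model_name"), info.get("team_public_model_name")]
    # None candidates short-circuit to a falsy value and are skipped.
    return any(c and term in c.casefold() for c in candidates)


rows = [
    {"model_name": "model_name_team1_3f2a", "model_info": {"team_public_model_name": "Claude Sonnet"}},
    {"model_name": "gpt-4o", "model_info": {}},
]
print([r["model_name"] for r in rows if matches_search(r, "claude")])
# ['model_name_team1_3f2a']
```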

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): scope BYOK search results to caller's accessible teams

`_apply_search_filter_to_models` was widened to fetch every row with a
`team_public_model_name` set so case-insensitive search could match
mixed-case stored names. `/v2/model/info` is reachable by non-admin
keys though, and the helper ran before `include_team_models` / `teamId`
filtering — so a non-admin caller could search a common substring like
"claude" and see BYOK rows belonging to teams they're not a member of.

Resolve the caller's team membership once (admin → no scoping, else
their `user_row.teams`) and drop BYOK rows (those with
`model_info.team_id` set) outside that scope on both the router-side
matches and the over-broad DB query, before display-name matching.
Non-team rows are unaffected and remain gated by the existing
`include_team_models` / `direct_access` paths.
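The scoping step might look like this, assuming the caller's scope is `None` for admins and a set of team IDs otherwise. `scope_byok_rows` is a hypothetical helper.

```python
from typing import Optional, Set


def scope_byok_rows(models, allowed_team_ids: Optional[Set[str]]):
    """Drop BYOK rows outside the caller's teams; None means admin (no scoping)."""
    if allowed_team_ids is None:
        return list(models)
    kept = []
    for m in models:
        team_id = (m.get("model_info") or {}).get("team_id")
        # Non-team rows pass through to the existing access gates.
        if team_id is None or team_id in allowed_team_ids:
            kept.append(m)
    return kept


models = [
    {"model_name": "gpt-4o", "model_info": {}},
    {"model_name": "byok_a", "model_info": {"team_id": "team-a"}},
    {"model_name": "byok_b", "model_info": {"team_id": "team-b"}},
]
print([m["model_name"] for m in scope_byok_rows(models, {"team-a"})])  # ['gpt-4o', 'byok_a']
print(len(scope_byok_rows(models, None)))  # 3
```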

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): search by team_public_model_name and scope teamId queries

- /v2/model/info search now matches both `model_name` and
  `model_info.team_public_model_name`, so team BYOK rows (which persist
  an internal `model_name_{team_id}_{uuid}`) are findable by the public
  name shown in the UI. DB query OR-includes a JSON-path match on
  `team_public_model_name` for rows that exist only in the DB.
- `_filter_models_by_team_id` no longer short-circuits on the viewer's
  `direct_access` flag — that describes the admin viewer's own
  permissions and would leak every public model into a team-scoped view.
  Models are kept only when they belong to the team (own BYOK, in
  access_via_team_ids, or reachable via team.models / access groups).
- Added `_authorize_team_id_query`: the untrusted `teamId` query
  parameter now requires the caller to be a proxy admin or a member of
  the requested team, otherwise returns 403. Without this, any
  authenticated user could enumerate another team's BYOK metadata by
  guessing the team id.
- `_get_caller_byok_team_scope` now treats `PROXY_ADMIN_VIEW_ONLY` the
  same as `PROXY_ADMIN` (both are admin roles); previously VIEW_ONLY
  admins fell through to a user-id team lookup and saw only their own
  teams' BYOK rows.
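Below is a sketch of the `teamId` authorization rule as described: admins pass, members of the requested team pass, and everyone else is refused. `Forbidden` stands in for the HTTP 403. The helper name and the role string are illustrative; this is not the real `_authorize_team_id_query` signature.

```python
from typing import Iterable, Optional


class Forbidden(Exception):
    """Stands in for an HTTP 403 response."""


def authorize_team_id_query(role: str, caller_teams: Iterable[str], team_id: Optional[str]) -> None:
    if team_id is None or role == "PROXY_ADMIN":
        return  # no teamId filter requested, or a proxy admin caller
    if team_id not in set(caller_teams):
        raise Forbidden(f"caller is not a member of team {team_id!r}")


authorize_team_id_query("PROXY_ADMIN", [], "team-b")  # allowed
authorize_team_id_query("INTERNAL_USER", ["team-a"], "team-a")  # allowed
try:
    authorize_team_id_query("INTERNAL_USER", ["team-a"], "team-b")
except Forbidden as exc:
    print(exc)  # caller is not a member of team 'team-b'
```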

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): bound BYOK search DB fetch in /v2/model/info

Previously the DB-side search OR'd a JSON-path predicate
`{model_info: {path: [team_public_model_name], string_contains: ""}}`
to compensate for Prisma's case-sensitive JSON `string_contains` on
Postgres. That predicate matches every row that has any
`team_public_model_name` set, so any authenticated caller could force a
full BYOK-table read with `/v2/model/info?search=x` regardless of page
size.

Drop the JSON-path branch. The DB query now does a bounded
`model_name contains <search>` lookup. BYOK rows that are loaded into
the router are still searchable by their `team_public_model_name` via
the router-side filter; only the rare edge case of a BYOK row that
exists only in the DB (router sync failed) loses display-name search,
which is an acceptable trade-off given the DoS surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): bound DB find_many in /v2/model/info search

The previous bounding patch dropped the page-aware `take=N` on
`find_many`, so a broad `?search=model` would load and decrypt every
matching DB row on each request even though the response only returns
one page.

Restore bounded fetches in `_apply_search_filter_to_models`:

* Unsorted searches use `take = max(0, page * size - router_count)`,
  i.e. exactly one page worth of remaining DB rows.
* Sorted searches need ordering across the full match set, so they cap
  at `_SORTED_SEARCH_DB_FETCH_CAP = 500` instead of fetching everything.
* Total count comes from a cheap `count(...)` query so pagination stays
  accurate without materializing every row.
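The two `take` rules reduce to a small function. `_SORTED_SEARCH_DB_FETCH_CAP` comes from the commit; the `db_fetch_take` name and its parameters are hypothetical.

```python
from typing import Optional

_SORTED_SEARCH_DB_FETCH_CAP = 500


def db_fetch_take(page: int, size: int, router_count: int, sort_by: Optional[str] = None) -> int:
    if sort_by:
        # Sorting needs ordering across the full match set: cap instead of paging.
        return _SORTED_SEARCH_DB_FETCH_CAP
    # Fetch only the DB rows still needed after the router-side matches.
    return max(0, page * size - router_count)


print(db_fetch_take(page=2, size=10, router_count=15))  # 5
print(db_fetch_take(page=1, size=10, router_count=30))  # 0
print(db_fetch_take(page=1, size=10, router_count=0, sort_by="model_name"))  # 500
```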

Wired `page`, `size`, and `sortBy` through from the endpoint and added
a regression test covering both `take` values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(proxy): extract DB-fetch helper to satisfy PLR0915

_apply_search_filter_to_models tripped Ruff's "too many statements"
(51 > 50) after the bounded-fetch fix. Move the DB-side block into
`_fetch_db_models_for_search`, which keeps the same behavior:

* Bounded `take` via page math (unsorted) or `_SORTED_SEARCH_DB_FETCH_CAP`
  (sorted)
* Cheap `count(...)` for accurate pagination totals
* Caller-team scope applied to fetched rows before decrypt

Pure refactor; no behavior change. All 8 BYOK/team tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: apply black formatting to _fetch_db_models_for_search

CI's "Check Black formatting" step flagged one line in the helper added
in d55eecf6af. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:46:21 -07:00
Yuneng Jiang
a4a1726d99
test(proxy): add endpoint-level regression for /config/update double-encryption
Adds test_update_config_env_var_round_trip_not_double_encrypted, which
drives the real /config/update handler: first write plaintext, then
re-POST the stored ciphertext (the Admin UI round-trip) and assert the
value is not stacked with a second encryption layer and untouched keys
stay byte-identical. Verified to fail against the pre-fix handler and
pass after. Also tightens the unit test to exactly three ciphertext
re-feeds.
2026-05-15 15:29:08 -07:00
Yuneng Jiang
0d8c9137fb
fix(proxy): make /config/update env-var encryption idempotent
A single decrypt-then-encrypt chokepoint (_encrypt_env_variables_for_db)
now backs both update_config and save_config. Re-submitting a value the
Admin UI read back from /get/config/callbacks as ciphertext no longer
stacks a second encryption layer, which previously decrypted to garbage
and silently broke the callback. The chokepoint decrypts with the pure
_decrypt_db_variables (no os.environ mutation on the write path) and
encrypts exactly once; update_config merges only the sent keys so
untouched env vars keep their stored ciphertext byte-for-byte.
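The decrypt-then-encrypt idea can be shown with a toy cipher. The `enc:` prefix plus base64 is NOT encryption; it is a stand-in chosen so the round trip is easy to see. `encrypt_env_var_for_db` and `merge_env_vars` are hypothetical, not `_encrypt_env_variables_for_db`.

```python
import base64

PREFIX = "enc:"  # toy marker only; real deployments use actual encryption


def toy_encrypt(plaintext: str) -> str:
    return PREFIX + base64.b64encode(plaintext.encode()).decode()


def toy_decrypt(value: str) -> str:
    if not value.startswith(PREFIX):
        raise ValueError("not ciphertext")
    # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
    return base64.b64decode(value[len(PREFIX):]).decode()


def encrypt_env_var_for_db(value: str) -> str:
    # Decrypt first so a round-tripped ciphertext is not wrapped twice.
    try:
        plaintext = toy_decrypt(value)
    except ValueError:
        plaintext = value
    return toy_encrypt(plaintext)


def merge_env_vars(stored: dict, sent: dict) -> dict:
    # Re-encrypt only the keys the client sent; untouched keys keep stored bytes.
    merged = dict(stored)
    for key, value in sent.items():
        merged[key] = encrypt_env_var_for_db(value)
    return merged


stored = {"SLACK_WEBHOOK_URL": toy_encrypt("https://hooks.example/abc"), "OTHER": toy_encrypt("keep-me")}
# The Admin UI re-posts the ciphertext it read back.
updated = merge_env_vars(stored, {"SLACK_WEBHOOK_URL": stored["SLACK_WEBHOOK_URL"]})
print(toy_decrypt(updated["SLACK_WEBHOOK_URL"]))  # https://hooks.example/abc
print(updated["OTHER"] == stored["OTHER"])  # True
```

Real encryption is usually non-deterministic, so merging only the sent keys is what keeps untouched values byte-identical. Deterministic re-encryption would not be enough on its own.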
2026-05-15 15:14:18 -07:00
Yassin Kortam
a6494e6fe3
perf: eliminate per-request callback scanning on proxy hot path (#27858)
- Introduce `_CallbackCapabilities` dataclass and `ProxyLogging._callback_capabilities()` static method that inspects `litellm.callbacks` once and caches capability flags keyed on (list length, member ids); the cache invalidates automatically when the callback list mutates, without per-request iteration overhead
- Replace O(n) `litellm.callbacks` walks in `async_pre_call_hook`, `during_call_hook`, `async_post_call_streaming_iterator_hook`, `async_post_call_streaming_hook`, and `post_call_response_headers_hook` with fast-path exits when no relevant callbacks are registered
- Add `needs_iterator_wrap()` and `needs_per_chunk_streaming_hook()` instance methods to decouple iterator-level wrapping from per-chunk hook execution; avoids `get_response_string` materialization per chunk when no guardrail or chunk-hook callback is active
- Introduce `_fast_serialize_simple_model_response_stream()` using `orjson` for common single-choice text streaming chunks, bypassing the full Pydantic serializer; falls back to `model_dump_json` for tool calls, logprobs, usage, and provider-specific fields
- Add an early return in `_restamp_streaming_chunk_model` when the downstream model already matches the requested model, avoiding unnecessary string comparisons on every chunk
- Fix stale zero-cost cache bug in `_is_model_cost_zero`: move the per-router `_zero_cost_cache` dict onto the `Router` instance and clear it in `_invalidate_model_group_info_cache` so in-place pricing updates via `upsert_deployment` immediately resume budget enforcement
- Add `scripts/benchmark_chat_completions_perf.py`: standalone async benchmarking tool with a mock OpenAI provider, LiteLLM proxy process management, non-streaming RPS, streaming TTFT, and full-stream latency measurements with repeat/median run support
- Add comprehensive unit tests covering capability detection, cache invalidation, fast-path correctness, zero-cost cache regression, and the no-callback streaming fast path
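A rough sketch of the capability cache keyed on (list length, member ids) follows. The dataclass fields and the `hasattr`-based detection rule are assumptions for illustration, not LiteLLM's actual `_CallbackCapabilities`. This sketch still builds the key on every call, so it shows the invalidation rule rather than the exact cost profile.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class CallbackCapabilities:
    has_pre_call_hook: bool
    has_streaming_hook: bool


_cache_key: Optional[Tuple[int, Tuple[int, ...]]] = None
_cache_value: Optional[CallbackCapabilities] = None


def callback_capabilities(callbacks: List[Any]) -> CallbackCapabilities:
    """Recompute capability flags only when the callback list's shape changes."""
    global _cache_key, _cache_value
    key = (len(callbacks), tuple(id(cb) for cb in callbacks))
    if key != _cache_key or _cache_value is None:
        _cache_value = CallbackCapabilities(
            has_pre_call_hook=any(hasattr(cb, "async_pre_call_hook") for cb in callbacks),
            has_streaming_hook=any(hasattr(cb, "async_post_call_streaming_hook") for cb in callbacks),
        )
        _cache_key = key
    return _cache_value


class Guardrail:
    async def async_pre_call_hook(self, data):
        return data


callbacks: List[Any] = []
print(callback_capabilities(callbacks).has_pre_call_hook)  # False: fast-path exit
callbacks.append(Guardrail())
print(callback_capabilities(callbacks).has_pre_call_hook)  # True: cache invalidated
```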

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-14 09:28:31 -07:00
Krrish Dholakia
8bbc61e03c
fix: harden /key/update authorization checks (#27878)
* fix: patch Host-header auth bypass in get_request_route

Starlette reconstructs request.url from the Host header. A malformed
Host like `localhost/?x=1` causes Starlette to build the full URL as
`http://localhost/?x=1/health`, which url-parses to path="/". Since "/"
is in LiteLLMRoutes.public_routes, all protected routes became reachable
without authentication.

Fix: read scope["path"] (set by uvicorn from the HTTP request line,
not derivable from headers) instead of request.url.path. Sub-path
deployments are handled via scope["app_root_path"] / scope["root_path"],
mirroring Starlette's own base_url construction logic.

Affected variants confirmed fixed:
  Host: localhost/?x=1
  Host: localhost:4000/?x=1
  Host: localhost/#test
  Host: localhost:4000/#test

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* style: reduce comments in route fix

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block credential fields in RAG ingest vector_store options

Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.)
in ingest_options.vector_store are now rejected at the API boundary with
a 400 error. Credentials must be configured server-side.

Previously any authenticated user could supply a vertex_credentials dict
with type=external_account pointing credential_source.file at an
arbitrary path (e.g. /proc/1/environ) and token_url at an
attacker-controlled server. google-auth's identity_pool.Credentials
refresh() would read the file and POST its contents to the attacker.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block /key/update self-escalation by assigned users

Non-admin users who were assigned a key (created_by != caller) could
update any non-budget field — models, rpm_limit, guardrails, etc. —
without admin authorization, allowing privilege self-escalation.

Gate: only the key creator (created_by == caller) may edit their own
key without admin check; budget changes always require admin regardless
of creator status. All other callers must pass _check_key_admin_access.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block user-controlled api_base in RAG ingest vector_store options

A user-supplied api_base in ingest_options.vector_store caused the server
to forward its configured provider credentials (Gemini, OpenAI) to an
attacker-controlled endpoint via SSRF.

Add api_base to the blocked credential params set alongside api_key and
the existing credential fields.
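A boundary check of this kind can be sketched as a set intersection. The field list holds only the names this commit mentions, and the real blocklist may be longer. `check_vector_store_options` is hypothetical, and a caller would map its `ValueError` to HTTP 400.

```python
# Illustrative subset named in the commit; the real blocklist may differ.
BLOCKED_VECTOR_STORE_PARAMS = frozenset(
    {"vertex_credentials", "aws_access_key_id", "api_key", "api_base"}
)


def check_vector_store_options(options: dict) -> None:
    blocked = sorted(BLOCKED_VECTOR_STORE_PARAMS.intersection(options))
    if blocked:
        raise ValueError(f"credential fields must be configured server-side: {blocked}")


check_vector_store_options({"vector_store_id": "vs_123"})  # accepted
try:
    check_vector_store_options({"vector_store_id": "vs_123", "api_base": "http://attacker.example"})
except ValueError as exc:
    print(exc)  # credential fields must be configured server-side: ['api_base']
```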

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check

Any authenticated internal_user could POST arbitrary provider config
(aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have
the server forward its credentials to an attacker-controlled endpoint.

- Gate the endpoint on PROXY_ADMIN role (403 for all other roles)
- Call is_request_body_safe() to reject banned params even for admins
- Convert ValueError from safety check to HTTP 400

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: apply banned-param check to /utils/transform_request

Without is_request_body_safe(), any authenticated user could pass
aws_sts_endpoint, api_base, or aws_web_identity_token to
/utils/transform_request and have the server forward its configured
provider credentials to an attacker-controlled endpoint during SDK
credential resolution.

Applies the same banned-param blocklist already used by LLM endpoints.
Endpoint remains accessible to all authenticated users.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter

Any frontmatter key not in ["model","input","output"] flowed into
optional_params and was merged into the LLM call data dict, bypassing
is_request_body_safe. An attacker with any bearer key could set
api_base in YAML to redirect the outbound LLM request — including the
provider API key — to an attacker-controlled host.

Fix: call is_request_body_safe on the constructed data dict after
optional_params are merged, before invoking ProxyBaseLLMRequestProcessing.
ValueError from the banned-param check is surfaced as HTTP 400.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* Update litellm/proxy/rag_endpoints/endpoints.py

Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>

* fix: coerce nested config strings before banned-param check

_NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently
skipped litellm_embedding_config when delivered as a JSON string via
multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.)
nested inside the stringified value were invisible to is_request_body_safe.

_NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses
JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS.
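Here is a sketch of coercing a nested JSON string before the banned-param check. The banned set and nested key list are illustrative subsets, and `coerce_to_dict` and `is_body_safe` are hypothetical, not `is_request_body_safe`.

```python
import json

BANNED_PARAMS = {"api_base", "aws_sts_endpoint"}  # illustrative subset
NESTED_CONFIG_KEYS = ("litellm_embedding_config",)


def coerce_to_dict(value):
    # multipart/form-data delivers nested objects as JSON strings.
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return None
    return value if isinstance(value, dict) else None


def is_body_safe(body: dict) -> bool:
    if BANNED_PARAMS.intersection(body):
        return False
    for key in NESTED_CONFIG_KEYS:
        nested = coerce_to_dict(body.get(key))
        if nested and BANNED_PARAMS.intersection(nested):
            return False
    return True


form_body = {"litellm_embedding_config": json.dumps({"api_base": "http://attacker.example"})}
print(is_body_safe(form_body))  # False: the stringified config is parsed, not skipped
```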

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: replace substring match with prefix match in is_llm_api_route

mapped_pass_through_routes used `_llm_passthrough_route in route` (substring)
so any admin-only path whose URL contained a provider name (openai, anthropic,
azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the
admin gate in non_proxy_admin_allowed_routes_check.

Confirmed live: non-admin key could GET /credentials/by_name/openai (read
masked provider API key) and DELETE /credentials/openai (delete credential).

Fix: use exact match or startswith(prefix + "/") — the same pattern used
everywhere else in RouteChecks — so only routes that actually start with a
passthrough prefix are allowed through.
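The difference between the two checks is easy to see side by side. The prefix tuple is illustrative, and both helpers are hypothetical.

```python
PASSTHROUGH_PREFIXES = ("/openai", "/anthropic", "/bedrock")  # illustrative


def substring_match(route: str) -> bool:
    # Old behavior: a provider prefix anywhere in the path counts.
    return any(p in route for p in PASSTHROUGH_PREFIXES)


def prefix_match(route: str) -> bool:
    # Fixed behavior: exact match or a true path-component prefix.
    return any(route == p or route.startswith(p + "/") for p in PASSTHROUGH_PREFIXES)


for route in ["/credentials/by_name/openai", "/credentials/openai", "/openai/v1/chat/completions"]:
    print(route, substring_match(route), prefix_match(route))
# /credentials/by_name/openai True False
# /credentials/openai True False
# /openai/v1/chat/completions True True
```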

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: stabilize PR #27878 test failures

- key_management_endpoints: extend can_skip_admin_check to team keys so
  team members with /key/update permission can update non-budget fields.
  can_team_member_execute_key_management_endpoint already validates team
  membership + permission and raises if unauthorized; reaching the admin
  check on a team key means the caller was authorized.

- test: set created_by on mock key in
  test_update_key_non_budget_fields_allowed_for_internal_user so
  caller_is_creator resolves correctly (MagicMock default ≠ user_id).

- auth_utils.get_request_route: guard against non-dict request.scope
  (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into
  UserAPIKeyAuth.request_route and failing Pydantic validation.

- ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard
  in test-unit-proxy-db.yml to satisfy the shard-coverage check.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(lint): add explicit str() cast in get_request_route for MyPy

scope.get() returns Any|None which MyPy cannot coerce to str implicitly.
Wrap both scope.get() calls in str() to satisfy the type checker.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: guard bare-/ root_path strip + make total_spend migration idempotent

auth_utils.get_request_route: when Starlette sets scope["app_root_path"]
to "/" (e.g. behind some middleware), the old stripping logic would
remove the leading slash from every path ("/team/new" → "team/new"),
breaking route matching and causing auth to misclassify protected routes.
Skip stripping when root_path is bare "/".
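The bare-"/" guard in isolation looks like this (both helpers are hypothetical). Without the guard, stripping a root path of "/" drops the leading slash.

```python
def naive_strip(path: str, root_path: str) -> str:
    return path[len(root_path):] if root_path and path.startswith(root_path) else path


def strip_root_path(path: str, root_path: str) -> str:
    # A bare "/" root would strip the leading slash from every route.
    if not root_path or root_path == "/":
        return path
    if path.startswith(root_path):
        return path[len(root_path):] or "/"
    return path


print(naive_strip("/team/new", "/"))  # team/new (breaks route matching)
print(strip_root_path("/team/new", "/"))  # /team/new
print(strip_root_path("/api/v1/team/new", "/api/v1"))  # /team/new
```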

migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration
is safe to replay when a prior partial run already created the column.
Without this guard, prisma migrate deploy fails on CI DBs that were
partially migrated, causing all subsequent DB operations (including
/team/new) to 500.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: require creator still owns key for personal-key bypass in /key/update

caller_is_creator now requires both created_by == caller AND user_id ==
caller. Previously checking only created_by let a demoted admin who
originally created a key for another user continue editing non-budget
fields on it after reassignment, bypassing _check_key_admin_access.

Adds regression test: creator whose key was reassigned is blocked (403).
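The personal-key bypass rule, as described in this PR, can be sketched as a pure function. Budget edits always need admin, and the creator must still be the key's owner. `KeyRow`, `can_skip_admin_check`, and the budget field set are hypothetical, and team keys are handled by a separate path not shown here.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical field names; the real budget field set may differ.
BUDGET_FIELDS = {"max_budget", "budget_duration"}


@dataclass
class KeyRow:
    created_by: Optional[str]
    user_id: Optional[str]


def can_skip_admin_check(caller: str, key: KeyRow, updated_fields: Set[str]) -> bool:
    """True when a personal-key edit may skip the admin check."""
    if updated_fields & BUDGET_FIELDS:
        return False  # budget changes always require admin
    # The creator must still own the key; a reassigned key loses the bypass.
    return key.created_by == caller and key.user_id == caller


own_key = KeyRow(created_by="alice", user_id="alice")
reassigned = KeyRow(created_by="alice", user_id="bob")
print(can_skip_admin_check("alice", own_key, {"models"}))  # True
print(can_skip_admin_check("alice", reassigned, {"models"}))  # False
print(can_skip_admin_check("alice", own_key, {"max_budget"}))  # False
```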

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: extract auth checks to fix PLR0915 + broaden max_budget assertion

internal_user_endpoints._update_single_user_helper exceeded 50 statements
(PLR0915). Extract authorization checks into _check_user_update_authz helper
to bring statement count under the limit.

test_validate_max_budget: assert "negative" (substring of both the local
"cannot be negative" and the CI "non-negative finite number" messages) so
the test is stable regardless of which exact wording the function uses.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
2026-05-14 04:16:04 +00:00
Yuneng Jiang
83f26d17c1
Strip SERVER_ROOT_PATH before lazy-feature prefix match
LazyFeatureMiddleware compared the raw scope path against registered
prefixes (e.g. /policies), so requests under a server root path like
/api/v1/policies/... never matched, the feature never loaded, and the
endpoint returned 404. Strip the configured root path before matching,
normalizing trailing slashes and enforcing a component boundary so
/api does not falsely match /apiv2.
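Below is a sketch of root-path stripping with trailing-slash normalization and a component boundary, followed by the feature-prefix match. Both helper names are hypothetical.

```python
def strip_server_root(path: str, root_path: str) -> str:
    root = root_path.rstrip("/")  # "/api/v1/" and "/api/v1" behave the same
    # Component boundary: root "/api" must not strip "/apiv2/...".
    if root and (path == root or path.startswith(root + "/")):
        return path[len(root):] or "/"
    return path


def matches_feature_prefix(path: str, root_path: str, prefix: str) -> bool:
    route = strip_server_root(path, root_path)
    return route == prefix or route.startswith(prefix + "/")


print(matches_feature_prefix("/api/v1/policies/list", "/api/v1/", "/policies"))  # True
print(strip_server_root("/apiv2/policies", "/api"))  # /apiv2/policies
```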
2026-05-12 20:43:08 -07:00
Michael-RZ-Berri
b888177ea6
fix: reset proxy budget when initial reset duration is null then updated (#27488)
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
2026-05-09 18:33:36 -04:00
Michael Riad Zaky
5d7b7e7e37 fix(realtime): register /openai/v1/realtime as websocket route 2026-05-06 13:13:58 -07:00
yuneng-jiang
c2cea58567
Merge branch 'litellm_yj_may1' into codex/budget-race-enforcement 2026-05-01 14:32:18 -07:00
Yuneng Jiang
650821b538
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-config-update-targeted-upserts
# Conflicts:
#	tests/test_litellm/proxy/test_proxy_server.py
2026-05-01 10:38:34 -07:00
user
66c0fe23da handle bad reservation counters after spend write 2026-04-30 22:55:26 -07:00
user
0b1ea9eb8f harden budget reservation edge cases 2026-04-30 21:49:31 -07:00
user
b53adf7cff address budget reservation review edges 2026-04-30 21:21:26 -07:00
user
4f8769943b skip invalid budget window counter increments 2026-04-30 20:18:07 -07:00
user
c28e093f41 finalize budget reservations after counter updates 2026-04-30 19:50:36 -07:00
user
15d845c321 avoid stale local spend counters after redis misses 2026-04-30 19:33:55 -07:00
user
64fadc3b8e Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/budget-race-enforcement-greptile-fix
# Conflicts:
#	litellm/proxy/db/spend_counter_reseed.py
#	litellm/proxy/proxy_server.py
2026-04-30 18:25:48 -07:00
Michael Riad Zaky
9f08db91f9 Refresh Redis TTL on counter writes and skip stale in-memory on Redis miss 2026-04-30 17:50:58 -07:00
user
694fadd175 fix budget reservation review findings 2026-04-30 17:38:18 -07:00
yuneng-jiang
15b7386859
Merge pull request #26815 from stuxf/fix/get-image-lfi-ssrf
chore(proxy): contain UI_LOGO_PATH / LITELLM_FAVICON_URL on unauthenticated asset endpoints
2026-04-30 17:10:15 -07:00
Michael Riad Zaky
47b2832d6f test: replace subprocess startup-import diff with static source scan 2026-04-30 16:15:46 -07:00
user
46183e6dc9 Merge remote-tracking branch 'origin/litellm_internal_staging' into codex/budget-race-enforcement-snapshot 2026-04-30 13:56:16 -07:00
user
ca50868b75 harden end-user and tag budget reservations 2026-04-30 13:37:10 -07:00
user
b8a141cefd fix(static-assets): stop serving stale logo cache 2026-04-30 11:34:25 -07:00
user
215f538d4f fix(static-assets): browser-load remote branding assets 2026-04-30 11:30:57 -07:00
user
926de696a1 tighten budget counter cache recovery 2026-04-29 20:51:07 -07:00
Yuneng Jiang
be3d27a0b8
[Test] Proxy: Add /config/update critical-path tests to test_proxy_server.py
Companion to the previous commit which deleted the symbol-named
tests/test_litellm/proxy/management_endpoints/test_update_config_endpoint.py.
Adds the 5 critical-path tests in their proper home — the test file
that mirrors the source file (proxy_server.py).

The two commits are one logical change; they were split because git
add aborted on a stale path argument.
2026-04-29 19:18:31 -07:00
Michael Riad Zaky
0f8dd28542 lazy-load optional feature routers on first request 2026-04-29 17:20:55 -07:00
user
55d393d77d
fix(static-assets): unblock CI — pass headers explicitly + harden + update legacy tests
Three CI failures from the previous push, all addressed:

* ``lint`` (mypy): ``async_client.get(url, **request_kwargs)`` confused
  mypy because ``AsyncHTTPHandler.get``'s second positional arg is typed
  ``bool | None``. Switched to an explicit branch:
  ``await async_client.get(rewritten_url, headers={"host": host_header})``
  for the HTTP-rewritten case, plain ``get(rewritten_url)`` otherwise.

* ``proxy-infra`` /
  ``test_get_image_custom_local_logo_bypasses_cache``: the existing
  test set ``UI_LOGO_PATH=/app/custom_logo.jpg`` with no
  ``LITELLM_ASSETS_PATH``, asserting the path was served verbatim. That
  was the LFI behaviour the new path-containment guard closes. Updated
  the test to set ``LITELLM_ASSETS_PATH=/app`` so the path is inside an
  allowed root, and patched the helper's ``realpath`` / ``isfile`` to
  go along with the mocked filesystem. Test intent (bypass cache when
  ``UI_LOGO_PATH`` is local) is preserved.

* ``auth-and-jwt`` / ``test_get_image_cache_logic``: existing test
  built a ``Mock`` response without ``headers``, so the new
  Content-Type check tripped on ``Mock().split(";")[0]``. Two fixes:

    1. Set ``mock_response.headers = {"content-type": "image/jpeg"}``
       on the test (matches the real upstream contract — a logo CDN
       always sets a Content-Type).
    2. Make ``fetch_validated_image_bytes`` defensive: if the
       Content-Type header is missing or non-string, treat as non-image
       and fall back to default. Closes a subtle hole — pre-fix, an
       upstream that omits Content-Type entirely would have served
       arbitrary bytes under the ``image/jpeg`` wrapper.
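The two guards discussed above can be sketched as follows. One is a defensive Content-Type check; the other is a path-containment test that resolves symlinks and `..` before comparing whole path components. Both helpers are hypothetical and are not `fetch_validated_image_bytes` or the real guard. The example paths assume POSIX.

```python
import os
from typing import Any, Mapping


def is_image_response(headers: Mapping[str, Any]) -> bool:
    content_type = headers.get("content-type")
    if not isinstance(content_type, str):
        return False  # missing or non-string header: fall back to the default
    return content_type.split(";")[0].strip().lower().startswith("image/")


def is_within_root(path: str, root: str) -> bool:
    # Resolve symlinks and ".." first, then compare whole path components.
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


print(is_image_response({"content-type": "image/jpeg; charset=binary"}))  # True
print(is_image_response({}))  # False
print(is_within_root("/srv/assets/logo.jpg", "/srv/assets"))  # True
print(is_within_root("/srv/assets/../secrets/x.json", "/srv/assets"))  # False
```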

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:47:41 +00:00
user
9ef8572d67
fix(proxy): /get_logo_url no longer discloses local UI_LOGO_PATH
The unauthenticated ``/get_logo_url`` endpoint returned the
``UI_LOGO_PATH`` env var verbatim. For HTTP(S) URLs this is intended —
the dashboard loads the logo directly from a public/internal CDN. For
local filesystem paths it was an information disclosure: any caller
could fetch ``/get_logo_url`` and read admin-only filesystem details
like ``UI_LOGO_PATH=/etc/litellm/secret-config.json``.

Now the endpoint returns the URL only when it begins with
``http://`` or ``https://``. For local paths (or unset) it returns an
empty string — the dashboard falls back to ``/get_image`` which
serves the file via the path-containment guard added in the previous
commit.
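A sketch of the new response rule, assuming it is a plain scheme-prefix test. The helper name ``public_logo_url`` is hypothetical:

```python
from typing import Optional


def public_logo_url(ui_logo_path: Optional[str]) -> str:
    """Value returned by /get_logo_url in this sketch: HTTP(S) URLs pass
    through unchanged; local paths and unset values become ""."""
    if ui_logo_path and ui_logo_path.startswith(("http://", "https://")):
        return ui_logo_path
    # Never echo a filesystem path back to an unauthenticated caller.
    return ""
```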

Tests parametrize the disclosure-blocked cases (``/etc/...``,
``/proc/self/environ``, relative paths) and confirm HTTP / HTTPS URLs
still pass through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:15:21 +00:00
Krrish Dholakia
fd32f29e39
Revert "lazy-load optional feature routers on first request (#26534)" (#26727)
This reverts commit 21ed38971d.
2026-04-29 00:21:41 +00:00
Michael-RZ-Berri
21ed38971d
lazy-load optional feature routers on first request (#26534)
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
2026-04-28 17:04:40 -07:00
Michael Riad Zaky
6052ce1017 cache LiteLLM_Config param reads in DualCache + batch scheduler-tick fetch 2026-04-28 16:29:50 -07:00
Michael-RZ-Berri
3ef16098f2
Reseed enforcement read path from DB on counter miss (#26459)
Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local>
2026-04-25 15:14:02 -07:00
Michael Riad Zaky
0bd49ecb8b Fix bug that bypasses per-team member budget limit 2026-04-22 10:41:13 -07:00
Ishaan Jaffer
e8461b5b97
style: run black formatter on files from main merge 2026-04-17 13:02:59 -07:00
Sameer Kankute
ffb87dcac9
Fix failing test, code QA, and lint 2026-04-14 20:53:17 +05:30
Ryan Crabbe
8d9bbc6eb2
[Fix] Include access group models in UI model listing
Models associated with a team only through access groups (not directly in
team.models) were not appearing on the /ui/?page=models page. The API
authorization path already resolved access groups correctly, but the
/v2/model/info listing endpoint only checked team.models.

Add _add_access_group_models_to_team_models() which batch-fetches all
distinct access groups in a single find_many query, then resolves each
team's access group models into deployments and merges them into the
team_models dict.
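A minimal sketch of the batch-then-merge shape described above. Plain dicts stand in for database rows and model names stand in for deployments. ``merge_access_group_models`` and the ``fetch_groups`` callable (a stand-in for the single ``find_many`` query) are hypothetical:

```python
from typing import Callable, Dict, List, Set

FetchGroups = Callable[[Set[str]], Dict[str, List[str]]]


def merge_access_group_models(
    team_models: Dict[str, Set[str]],
    team_access_groups: Dict[str, List[str]],
    fetch_groups: FetchGroups,
) -> Dict[str, Set[str]]:
    # 1. Collect every distinct access group across all teams.
    distinct_groups: Set[str] = set()
    for groups in team_access_groups.values():
        distinct_groups.update(groups)
    if not distinct_groups:
        return team_models
    # 2. One batched lookup instead of one query per team.
    group_to_models = fetch_groups(distinct_groups)
    # 3. Merge each team's access-group models into its model set.
    for team_id, groups in team_access_groups.items():
        merged = team_models.setdefault(team_id, set())
        for group in groups:
            merged.update(group_to_models.get(group, []))
    return team_models
```

Collecting distinct groups first keeps the lookup count at one regardless of how many teams share a group.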
2026-03-28 12:32:12 -07:00
yuneng-jiang
846e4b44b6
Merge pull request #24682 from michelligabriele/fix/budget-spend-counters
fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters
2026-03-27 16:59:23 -07:00
michelligabriele
d533b432fd
fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters
Budget checks on API keys, teams, and team members were not enforced in
multi-pod deployments because user_api_key_cache is intentionally
in-memory-only. Each pod tracked spend independently, so with N pods
the effective budget was N × max_budget.
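The N × max_budget effect can be reproduced with a toy simulation. This is an illustrative model, not LiteLLM code: each pod admits a request while its own local counter is below ``max_budget``, and requests are spread round-robin.

```python
def total_spend_with_local_counters(num_pods: int, max_budget: float,
                                    cost_per_request: float) -> float:
    """Each pod checks only its own counter; return total spend once every
    pod has started rejecting requests."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    local_spend = [0.0] * num_pods
    while True:
        served_any = False
        for pod in range(num_pods):
            # Pre-request check against this pod's local view only.
            if local_spend[pod] < max_budget:
                local_spend[pod] += cost_per_request
                served_any = True
        if not served_any:
            return sum(local_spend)
```

With 3 pods, ``max_budget=10.0`` and a cost of 1.0 per request, this returns 30.0, i.e. 3 × max_budget.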

Introduces a separate spend_counter_cache (DualCache wired to
redis_usage_cache) with atomic increment/read helpers:
- increment_spend_counters(): awaited in cost callback (not create_task)
  to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative),
  falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check,
_team_max_budget_check, _check_team_member_budget) now read spend via
get_current_spend() instead of cached object .spend fields.

When Redis is not configured, falls back to in-memory-only counters
(same as current single-instance behavior).
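A minimal sketch of the two-tier write and read path, assuming a plain dict shared between "pods" as a stand-in for Redis. Real Redis offers atomic increments such as ``INCRBYFLOAT``, which a dict does not model. ``SpendCounters`` and ``within_budget`` are hypothetical names; passing ``shared=None`` models the no-Redis fallback.

```python
from typing import Dict, Optional


class SpendCounters:
    """Shared store (stand-in for Redis) plus a per-pod in-memory dict."""

    def __init__(self, shared: Optional[Dict[str, float]] = None) -> None:
        self.shared = shared  # None models "Redis not configured"
        self.local: Dict[str, float] = {}

    def increment(self, key: str, cost: float) -> None:
        # Update both tiers; the caller awaits this before the next check.
        self.local[key] = self.local.get(key, 0.0) + cost
        if self.shared is not None:
            self.shared[key] = self.shared.get(key, 0.0) + cost

    def get_current_spend(self, key: str, db_spend: float) -> float:
        # Shared store first (cross-pod view), then local, then the DB value.
        if self.shared is not None and key in self.shared:
            return self.shared[key]
        if key in self.local:
            return self.local[key]
        return db_spend


def within_budget(counters: SpendCounters, key: str,
                  max_budget: float, db_spend: float) -> bool:
    return counters.get_current_spend(key, db_spend) < max_budget
```

Two pods sharing one store see each other's spend, so a key that spent 6.0 on one pod and 5.0 on another is over a 10.0 budget on both.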

Fixes #23714
2026-03-27 20:39:52 +01:00
Sameer Kankute
92a07e2d6e
fix(proxy): address Greptile review feedback
- Remove HTTP_PROXY/HTTPS_PROXY from blocklist (legitimately used in corporate envs)
- Add NO_PROXY/no_proxy to blocklist (prevents bypassing proxy monitoring)
- Remove dead code in _is_valid_user_id (space exception was unreachable)
- Update tests accordingly
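A sketch of how such a blocklist might be applied, assuming an exact-match check (listing both ``NO_PROXY`` and ``no_proxy`` suggests case-sensitive matching). The set below holds only keys named in these two commits and is illustrative; ``filter_config_env_vars`` is hypothetical.

```python
from typing import Dict

# Illustrative subset: keys named in these commits, not LiteLLM's full list.
BLOCKED_ENV_KEYS = frozenset(
    {"PATH", "LD_PRELOAD", "PYTHONPATH", "NO_PROXY", "no_proxy"}
)


def filter_config_env_vars(env_vars: Dict[str, str]) -> Dict[str, str]:
    """Drop blocklisted keys before config values are exported to os.environ."""
    return {k: v for k, v in env_vars.items() if k not in BLOCKED_ENV_KEYS}
```

``HTTP_PROXY`` / ``HTTPS_PROXY`` pass through, matching the decision to keep them usable in corporate environments.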

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 20:38:36 +05:30
Sameer Kankute
8112fbf274
fix(proxy): sanitize user_id input and block dangerous env var keys
Add input validation to get_user_id_from_request (length limit, control char rejection) and a blocklist of dangerous environment variable keys in _load_environment_variables to prevent PATH/LD_PRELOAD/PYTHONPATH override via config.
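A sketch of the user_id checks, assuming a 256-character limit (the commit does not state the real value) and treating Unicode category ``Cc`` as "control character". ``sanitize_user_id`` is hypothetical and returns ``None`` rather than raising, which may differ from LiteLLM's behavior.

```python
import unicodedata
from typing import Optional

MAX_USER_ID_LENGTH = 256  # assumed limit, not taken from the commit


def sanitize_user_id(raw: Optional[str]) -> Optional[str]:
    """Return ``raw`` if it is a usable user id, else None."""
    if raw is None or len(raw) > MAX_USER_ID_LENGTH:
        return None
    # Category "Cc" covers control chars such as "\n", "\r" and "\x00".
    if any(unicodedata.category(ch) == "Cc" for ch in raw):
        return None
    return raw
```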

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 20:38:36 +05:30
Ryan Crabbe
ad43a35d76 feat: add control plane for multi-proxy worker management
Adds a control plane capability that enables a central admin instance
to manage multiple regional worker proxies from a single UI.

Backend:
- Worker registry loaded from YAML config (worker_id, name, url)
- /.well-known/litellm-ui-config exposes is_control_plane and workers list
- /v3/login + /v3/login/exchange: opaque code exchange for cross-origin
  username/password auth (JWT never in URL/logs, single-use 60s TTL)
- SSO cookie handoff with return_to → opaque code → exchange
- _validate_return_to: full origin validation (scheme+hostname+port)
- Startup warning when control_plane_url set without Redis
- Both /v3 endpoints gated behind control_plane_url config
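A minimal in-memory sketch of a single-use opaque code with a 60-second TTL, assuming the JWT stays server-side and only a random code travels through the cross-origin redirect. ``OneTimeCodeStore`` is hypothetical; the startup warning about Redis above suggests a real deployment needs a shared store rather than a per-process dict.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class OneTimeCodeStore:
    """Maps opaque codes to tokens; each code expires after ttl_seconds and
    can be redeemed at most once."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._codes: Dict[str, Tuple[str, float]] = {}

    def issue(self, token: str) -> str:
        # The random code, never the token, is what appears in the URL.
        code = secrets.token_urlsafe(32)
        self._codes[code] = (token, self.clock() + self.ttl_seconds)
        return code

    def redeem(self, code: str) -> Optional[str]:
        # pop() removes the code, so a second redemption always fails.
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        token, expires_at = entry
        return token if self.clock() < expires_at else None
```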

Frontend:
- Worker selector dropdown on login page (gated behind is_control_plane)
- Cross-origin SSO code exchange handling on callback
- switchToWorkerUrl: localStorage-persisted worker URL for API calls
- useWorker hook: shared worker state management
- WorkerDropdown in navbar for switching workers
- Logout/switch clears worker state from localStorage

Tests:
- 7 tests for /v3/login + /v3/login/exchange
- 10 tests for _validate_return_to
- 2 tests for control plane discovery endpoint
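A sketch of origin validation in the spirit of ``_validate_return_to`` (scheme + hostname + port), assuming ``return_to`` must share its origin with one configured worker URL. ``is_allowed_return_to`` is a hypothetical name; default ports are filled in so ``https://host`` and ``https://host:443`` compare equal.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Optional[Tuple[str, str, int]]:
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None
    try:
        port = parts.port
    except ValueError:  # malformed or out-of-range port
        return None
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]
    # .hostname is already lowercased and excludes any "user@" prefix.
    return parts.scheme, parts.hostname, port


def is_allowed_return_to(return_to: str, allowed_urls: Iterable[str]) -> bool:
    target = _origin(return_to)
    if target is None:
        return False
    return any(_origin(u) == target for u in allowed_urls)
```

Comparing parsed hostnames rather than string prefixes blocks tricks such as ``https://worker.example.com@evil.com/``.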
2026-03-19 22:50:19 -07:00
Sameer Kankute
7e2f2a8ffa Fix inflight mypy 2026-03-02 19:41:32 +05:30
Darien Kindlund
dc96ade956 fix: preserve interval_hours in model cost map reload config (#22200)
The upsert update branches for model_cost_map_reload_config were
overwriting param_value with only the force_reload flag, dropping
interval_hours. This caused scheduled reloads to self-destruct
after their first execution.
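A sketch of the fix's shape, assuming the stored ``param_value`` is a dict: merge the new flag into the existing value instead of replacing it. ``merge_reload_config`` is hypothetical.

```python
from typing import Any, Dict


def merge_reload_config(existing: Dict[str, Any],
                        force_reload: bool) -> Dict[str, Any]:
    """Update only force_reload; keep interval_hours and any other field.
    (The buggy shape replaced the whole value with {"force_reload": ...}.)"""
    return {**existing, "force_reload": force_reload}
```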

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 19:21:11 +05:30
Ryan Crabbe
5bcaeabfd8 Merge origin/main into litellm_fix_streaming_connection_pool_leak
Resolve conflict in test_proxy_server.py: keep both async_data_generator
cleanup tests and store_model_in_db DB config override tests.
2026-02-21 12:44:50 -08:00
yuneng-jiang
6bfab8acd4 address greptile review feedback (greploop iteration 2)
Reset logo_path to default_logo when custom UI_LOGO_PATH file doesn't
exist, so the else branch at the bottom of get_image serves the default
logo instead of the non-existent custom path.
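A sketch of the fallback, with ``resolve_logo_path`` as a hypothetical helper:

```python
import os


def resolve_logo_path(custom_logo_path: str, default_logo: str) -> str:
    """Use the custom local logo only if the file exists; otherwise reset to
    the bundled default so the serving branch never points at a missing file."""
    if custom_logo_path and os.path.isfile(custom_logo_path):
        return custom_logo_path
    return default_logo
```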

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-19 20:25:00 -08:00