* chore(admin-ui): regenerate static export with trailingSlash: true
Rebuilds litellm/proxy/_experimental/out/ from ui/litellm-dashboard with
`trailingSlash: true` enabled in next.config.mjs. Next.js now emits every
route as <dir>/index.html (e.g. mcp/oauth/callback/index.html) instead of
<dir>.html with a sibling metadata-only directory, which fixes the 404 on
extensionless URLs served through FastAPI's StaticFiles(html=True) mount.
This is the build artifact half of the fix; the config change, Dockerfile
cleanup, and regression test live in the follow-up source PR that stacks
on top of this branch.
* fix(admin-ui): emit nested routes as <dir>/index.html (#28106)
Linear and other OAuth providers redirect the user back to
/ui/mcp/oauth/callback?code=...&state=... after the consent step. The
packaged Next.js static export only produced /ui/mcp/oauth/callback.html,
so FastAPI's StaticFiles served a 404 on the extensionless URL and the
OAuth handshake never completed.
The Dockerfile.non_root build step tried to paper over this at image-build
time with `for html_file in *.html; do ...`, but that shell glob does not
recurse, so nested routes like mcp/oauth/callback.html were left stranded
next to an empty mcp/oauth/callback/ directory containing only Next.js
metadata. The runtime restructure step in proxy_server.py was then skipped
because the .litellm_ui_ready marker had already been dropped.
Set trailingSlash: true in the dashboard's Next.js config so the export
emits every nested route as <dir>/index.html natively. The Dockerfile loop
is now a no-op for the bundled UI and has been removed; the
.litellm_ui_ready marker is still written so the proxy keeps skipping the
redundant Python restructure step at startup. Stacks on top of the static
export regeneration in the parent branch.
* chore: restore origin/litellm_internal_staging out files
* fix(spend_counter): seed Redis counter via SET NX to prevent cross-pod double-seed
Symptom
-------
Customers on multi-pod deployments see team `spend` jump to ~2x (N x
with N pods) shortly after a Redis cache miss / TTL expiry, triggering
spurious "Budget Crossed" alerts and blocked requests until the value is
manually reset.
Root cause
----------
`SpendCounterReseed.coalesced` warmed the primary spend counter by
calling `redis.async_increment(key, value=db_spend, refresh_ttl=True)`,
which lowers to Redis `INCRBYFLOAT`. That is additive, not idempotent.
The per-counter `asyncio.Lock` only coalesces seeders inside one
process. With N pods sharing one Redis, on a cold key (cold start, TTL
expiry, manual delete) every pod independently passes its lock + Redis
re-check, reads the same `db_spend`, and issues `INCRBYFLOAT db_spend`.
Final value: N x db_spend.
Fix
---
Use `redis.async_set_cache(key, value=db_spend, nx=True)` for the seed.
SET NX is atomic across pods: exactly one writer initializes the key;
losers read the winner's value via `async_get_cache`. This is the same
idiom already used by `coalesced_window` in the same file, so the two
seed paths are now consistent.
Per-request deltas continue to use `INCRBYFLOAT` (correct - additive
behaviour is what we want for increments, not for initial seed).
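A minimal sketch of the race and the fix, assuming an in-memory `FakeRedis` that stands in for one Redis shared by several pods. `FakeRedis`, `seed_with_incr`, `seed_with_set_nx` and `race` are hypothetical names. `asyncio.sleep(0)` models the DB read between the cold-key check and the write, which is where a second pod interleaves.

```python
import asyncio
from typing import Dict, Optional


class FakeRedis:
    """In-memory stand-in for one Redis shared by several pods (hypothetical)."""

    def __init__(self) -> None:
        self.store: Dict[str, float] = {}

    async def get(self, key: str) -> Optional[float]:
        return self.store.get(key)

    async def incrbyfloat(self, key: str, value: float) -> float:
        # Additive: every caller adds its value, so N seeders add N times.
        self.store[key] = self.store.get(key, 0.0) + value
        return self.store[key]

    async def set_nx(self, key: str, value: float) -> bool:
        # Check + write with no await in between, so it is atomic here,
        # like Redis SET NX: only the first writer initializes the key.
        if key in self.store:
            return False
        self.store[key] = value
        return True


async def seed_with_incr(redis: FakeRedis, key: str, db_spend: float) -> float:
    if await redis.get(key) is None:
        await asyncio.sleep(0)  # stand-in for the DB read of db_spend
        return await redis.incrbyfloat(key, db_spend)
    return float(await redis.get(key))


async def seed_with_set_nx(redis: FakeRedis, key: str, db_spend: float) -> float:
    if await redis.get(key) is None:
        await asyncio.sleep(0)  # stand-in for the DB read of db_spend
        if await redis.set_nx(key, db_spend):
            return db_spend
    # Loser path (or warm key): read back the winner's value.
    return float(await redis.get(key))


async def race(seeder, pods: int, db_spend: float = 506.0) -> float:
    redis = FakeRedis()
    key = "spend:team:example"
    await asyncio.gather(*(seeder(redis, key, db_spend) for _ in range(pods)))
    return redis.store[key]


if __name__ == "__main__":
    print(asyncio.run(race(seed_with_incr, pods=2)))    # 1012.0
    print(asyncio.run(race(seed_with_set_nx, pods=2)))  # 506.0
```

Each pod's own `asyncio.Lock` would not change this outcome, because the pods in the race are separate processes.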
Verification
------------
Live two-process repro against the same Postgres + Redis (DB
spend = 506):
Unpatched: 4/4 runs -> Redis counter = ~1012 (~2 x db_spend)
Patched: 12/12 runs -> Redis counter = ~506
Unit tests (`test_proxy_server.py`):
- New `test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed`
patches `_get_lock` to return a fresh lock per caller (otherwise the
per-process lock masks the race), races two `coalesced` calls, and
asserts final = 506 with exactly one of two SET NX attempts winning.
- 4 existing tests updated for the new seed contract (SET NX for the
seed, INCRBYFLOAT only for the per-request delta).
- Full `spend_counter or reseed or budget` slice: 22 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): make SET NX mock atomic so loser branch is exercised
Greptile flagged that `redis_set_cache` in
test_primary_spend_counter_redis_concurrent_seed_does_not_double_seed
placed `await asyncio.sleep(0)` AFTER the NX membership check. Both
concurrent tasks observed an empty `redis_store`, passed the guard, and
both returned True - so the loser branch (else: read back winner's value)
was never exercised.
Fix the mock to model real atomic Redis SET NX:
- Yield BEFORE the membership check so two concurrent callers interleave
the way real SET NX does (first to resume runs check + write atomically
and wins; second resumes after the key exists and loses).
- Track set_cache return values and assert that, sorted, they equal
[False, True], so exactly one task wins and one loses.
- Track async_get_cache calls that happen AFTER at least one SET NX has
completed; assert at least one such read - that is the loser-path
fallback (`current_value = float(cached)` when seeded is False).
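The mock ordering described above can be sketched as follows. `make_set_cache` and `two_concurrent_seeds` are hypothetical helpers, and the sketch assumes the fake is wired in as the `side_effect` of an `AsyncMock` standing in for `async_set_cache`. `yield_first=False` reproduces the old, racy order.

```python
import asyncio
from typing import Dict, List
from unittest.mock import AsyncMock


def make_set_cache(store: Dict[str, float], results: List[bool], yield_first: bool = True):
    """Build a fake async_set_cache(key, value, nx=...) for AsyncMock.side_effect."""

    async def redis_set_cache(key: str, value: float, nx: bool = False, **kwargs) -> bool:
        if yield_first:
            # Yield BEFORE the check: once resumed, check + write run with no
            # await in between, which models an atomic SET NX.
            await asyncio.sleep(0)
        exists = nx and key in store
        if not yield_first:
            # Old order: both callers pass the check before either one writes.
            await asyncio.sleep(0)
        if exists:
            results.append(False)
            return False
        store[key] = value
        results.append(True)
        return True

    return redis_set_cache


async def two_concurrent_seeds(yield_first: bool) -> List[bool]:
    store: Dict[str, float] = {}
    results: List[bool] = []
    set_cache = AsyncMock(side_effect=make_set_cache(store, results, yield_first))
    await asyncio.gather(
        set_cache("spend:key", 506.0, nx=True),
        set_cache("spend:key", 506.0, nx=True),
    )
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(two_concurrent_seeds(yield_first=True)))   # [False, True]
    print(asyncio.run(two_concurrent_seeds(yield_first=False)))  # [True, True]
```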
Verified by temporarily reverting the mock to the old order: the test
now fails with `expected exactly one SET NX winner and one loser, got
[True, True]`, exactly the failure mode Greptile described.
No production code change.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(spend_counter): mock async_set_cache to populate redis_store in concurrent read+write test
`test_concurrent_read_and_write_paths_share_one_db_query` mocks
`async_increment` to populate the in-memory `redis_store`, but did not
mock `async_set_cache`. After the SET-NX seed change in `coalesced()`,
the seed step writes via `async_set_cache(nx=True)` (default AsyncMock,
no `redis_store` write), so the simulated Redis stays empty after the
first reseed. The second `get_current_spend` then sees a clean Redis
miss, re-enters the DB read path, and the test fails with
`expected 1 DB query, got 2`.
Fix: add a `redis_set_cache` side_effect that updates `redis_store` on
`nx=True` (and rejects when the key already exists), matching the
pattern used by the four sibling tests fixed in this branch's first
commit. Pre-existing assertions are unchanged.
Full `tests/test_litellm/proxy/test_proxy_server.py`: 158 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent (#27444)
* fix(proxy): address Greptile review on Google-native SSE bytes path
Remove unreachable try/except around SSE pass-through yield and add a
unit test covering pre-formatted SSE bytes, terminator padding, and
non-SSE byte fallback wrapping.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): sort BYOK models by team_public_model_name in /v2/model/info
Team BYOK rows persist an internal `model_name` like
`model_name_{team_id}_{uuid}` and expose the user-facing name via
`model_info.team_public_model_name`. The UI's `getDisplayModelName`
and the search filter already fall back to that field, but
`_sort_models` was keying off the raw `model_name` — so BYOK rows
ranked by their opaque IDs and clumped at the end of the alphabetized
list instead of interleaving with non-BYOK rows.
Match the UI/search behavior: prefer `team_public_model_name` when
present, fall back to `model_name` otherwise.
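A sketch of the sort key, assuming each row is a dict with `model_name` and an optional `model_info` dict. `sort_display_name` and `sort_models` are hypothetical stand-ins for the logic in `_sort_models`.

```python
from typing import Any, Dict, List


def sort_display_name(model: Dict[str, Any]) -> str:
    # Prefer the public BYOK name; fall back to the raw model_name.
    info = model.get("model_info") or {}
    return info.get("team_public_model_name") or model.get("model_name", "")


def sort_models(models: List[Dict[str, Any]], descending: bool = False) -> List[Dict[str, Any]]:
    return sorted(models, key=sort_display_name, reverse=descending)


models = [
    {"model_name": "gpt-4o", "model_info": {}},
    {"model_name": "model_name_team1_1234", "model_info": {"team_public_model_name": "claude-sonnet"}},
    {"model_name": "llama-3", "model_info": None},
]
print([sort_display_name(m) for m in sort_models(models)])
# ['claude-sonnet', 'gpt-4o', 'llama-3']
```

Keying off the raw `model_name` would instead place the BYOK row last, after `llama-3`.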
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy): case-insensitive DB-side search for BYOK models
`_apply_search_filter_to_models` used Prisma's JSON path
`string_contains` to match the BYOK `team_public_model_name` field, but
that operator is case-sensitive in Postgres (no `mode: insensitive`
flag like column-level string filters have). So a search for "claude"
missed a stored "Claude Sonnet" via the DB branch even though the
router-side path matched it case-insensitively.
Widen the JSON branch to "row has a team_public_model_name set" and
filter case-insensitively in Python so DB-only BYOK rows match the
same terms users see in the UI. This also drops the now-unused
DB-level page-size optimization and `sort_by` knob — the in-Python
filter is the source of truth for `db_models_total_count` now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy): scope BYOK search results to caller's accessible teams
`_apply_search_filter_to_models` was widened to fetch every row with a
`team_public_model_name` set so case-insensitive search could match
mixed-case stored names. `/v2/model/info` is reachable by non-admin
keys though, and the helper ran before `include_team_models` / `teamId`
filtering — so a non-admin caller could search a common substring like
"claude" and see BYOK rows belonging to teams they're not a member of.
Resolve the caller's team membership once (admin → no scoping, else
their `user_row.teams`) and drop BYOK rows (those with
`model_info.team_id` set) outside that scope on both the router-side
matches and the over-broad DB query, before display-name matching.
Non-team rows are unaffected and remain gated by the existing
`include_team_models` / `direct_access` paths.
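A sketch of the scoping step, assuming rows are dicts and the caller's teams are available as a list of ids. `caller_byok_team_scope` and `drop_out_of_scope_byok` are hypothetical helpers, not the real function names.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def caller_byok_team_scope(is_admin: bool, user_teams: Iterable[str]) -> Optional[Set[str]]:
    """None means 'no scoping' (admin); otherwise the caller's team ids."""
    return None if is_admin else set(user_teams)


def drop_out_of_scope_byok(
    models: List[Dict[str, Any]], scope: Optional[Set[str]]
) -> List[Dict[str, Any]]:
    if scope is None:
        return list(models)
    kept = []
    for model in models:
        team_id = (model.get("model_info") or {}).get("team_id")
        # Non-team rows pass through; BYOK rows must belong to a scoped team.
        if team_id is None or team_id in scope:
            kept.append(model)
    return kept
```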
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy): search by team_public_model_name and scope teamId queries
- /v2/model/info search now matches both `model_name` and
`model_info.team_public_model_name`, so team BYOK rows (which persist
an internal `model_name_{team_id}_{uuid}`) are findable by the public
name shown in the UI. DB query OR-includes a JSON-path match on
`team_public_model_name` for rows that exist only in the DB.
- `_filter_models_by_team_id` no longer short-circuits on the viewer's
`direct_access` flag — that describes the admin viewer's own
permissions and would leak every public model into a team-scoped view.
Models are kept only when they belong to the team (own BYOK, in
access_via_team_ids, or reachable via team.models / access groups).
- Added `_authorize_team_id_query`: the untrusted `teamId` query
parameter now requires the caller to be a proxy admin or a member of
the requested team, otherwise returns 403. Without this, any
authenticated user could enumerate another team's BYOK metadata by
guessing the team id.
- `_get_caller_byok_team_scope` now treats `PROXY_ADMIN_VIEW_ONLY` the
same as `PROXY_ADMIN` (both are admin roles); previously VIEW_ONLY
admins fell through to a user-id team lookup and saw only their own
teams' BYOK rows.
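A sketch of the `teamId` authorization rule. `authorize_team_id_query` and `ForbiddenError` are hypothetical, and the role strings are placeholders that mirror the enum member names. Treating view-only admins as admins for this particular check is an assumption of the sketch; the commit only states that for `_get_caller_byok_team_scope`.

```python
from typing import Iterable, Optional

# Placeholder role strings; view-only admin here is an assumption.
ADMIN_ROLES = frozenset({"PROXY_ADMIN", "PROXY_ADMIN_VIEW_ONLY"})


class ForbiddenError(Exception):
    status_code = 403


def authorize_team_id_query(team_id: Optional[str], user_role: str, user_teams: Iterable[str]) -> None:
    if team_id is None:
        return  # no teamId filter requested
    if user_role in ADMIN_ROLES:
        return
    if team_id not in set(user_teams):
        # The untrusted query parameter names a team the caller is not in.
        raise ForbiddenError(f"not a member of team {team_id!r}")
```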
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy): bound BYOK search DB fetch in /v2/model/info
Previously the DB-side search OR'd a JSON-path predicate
`{model_info: {path: [team_public_model_name], string_contains: ""}}`
to compensate for Prisma's case-sensitive JSON `string_contains` on
Postgres. That predicate matches every row that has any
`team_public_model_name` set, so any authenticated caller could force a
full BYOK-table read with `/v2/model/info?search=x` regardless of page
size.
Drop the JSON-path branch. The DB query now does a bounded
`model_name contains <search>` lookup. BYOK rows that are loaded into
the router are still searchable by their `team_public_model_name` via
the router-side filter; only the rare edge case of a BYOK row that
exists only in the DB (router sync failed) loses display-name search,
which is an acceptable trade-off given the DoS surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy): bound DB find_many in /v2/model/info search
The previous bounding patch dropped the page-aware `take=N` on
`find_many`, so a broad `?search=model` would load and decrypt every
matching DB row on each request even though the response only returns
one page.
Restore bounded fetches in `_apply_search_filter_to_models`:
* Unsorted searches use `take = max(0, page * size - router_count)`,
i.e. only as many DB rows as are needed, after the router-side matches,
to fill every page up to and including the requested one.
* Sorted searches need ordering across the full match set, so they cap
at `_SORTED_SEARCH_DB_FETCH_CAP = 500` instead of fetching everything.
* Total count comes from a cheap `count(...)` query so pagination stays
accurate without materializing every row.
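The `take` arithmetic above, as a small worked sketch assuming 1-indexed pages. `db_fetch_take` is a hypothetical helper; only the formula and the 500 cap come from the commit.

```python
SORTED_SEARCH_DB_FETCH_CAP = 500


def db_fetch_take(page: int, size: int, router_count: int, sorted_search: bool) -> int:
    if sorted_search:
        # Ordering spans the whole match set, so cap instead of paging.
        return SORTED_SEARCH_DB_FETCH_CAP
    # DB rows needed to fill pages 1..page once router matches are placed first.
    return max(0, page * size - router_count)


print(db_fetch_take(page=1, size=50, router_count=12, sorted_search=False))  # 38
print(db_fetch_take(page=1, size=50, router_count=80, sorted_search=False))  # 0
print(db_fetch_take(page=3, size=20, router_count=10, sorted_search=False))  # 50
```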
Wired `page`, `size`, and `sortBy` through from the endpoint and added
a regression test covering both `take` values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(proxy): extract DB-fetch helper to satisfy PLR0915
_apply_search_filter_to_models tripped Ruff's "too many statements"
(51 > 50) after the bounded-fetch fix. Move the DB-side block into
`_fetch_db_models_for_search`, which keeps the same behavior:
* Bounded `take` via page math (unsorted) or `_SORTED_SEARCH_DB_FETCH_CAP`
(sorted)
* Cheap `count(...)` for accurate pagination totals
* Caller-team scope applied to fetched rows before decrypt
Pure refactor; no behavior change. All 8 BYOK/team tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* style: apply black formatting to _fetch_db_models_for_search
CI's "Check Black formatting" step flagged one line in the helper added
in d55eecf6af. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds test_update_config_env_var_round_trip_not_double_encrypted, which
drives the real /config/update handler: first write plaintext, then
re-POST the stored ciphertext (the Admin UI round-trip) and assert the
value is not stacked with a second encryption layer and untouched keys
stay byte-identical. Verified to fail against the pre-fix handler and
pass after. Also tightens the unit test to exactly three ciphertext
re-feeds.
A single decrypt-then-encrypt chokepoint (_encrypt_env_variables_for_db)
now backs both update_config and save_config. Re-submitting a value the
Admin UI read back from /get/config/callbacks as ciphertext no longer
stacks a second encryption layer, which previously decrypted to garbage
and silently broke the callback. The chokepoint decrypts with the pure
_decrypt_db_variables (no os.environ mutation on the write path) and
encrypts exactly once; update_config merges only the sent keys so
untouched env vars keep their stored ciphertext byte-for-byte.
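A sketch of the decrypt-then-encrypt chokepoint and the merge-only-sent-keys rule. It uses a toy reversible encoder (`enc:` prefix plus base64) in place of the real cipher, so it illustrates idempotence only and is not encryption. All names here are hypothetical analogues of `_encrypt_env_variables_for_db` and `update_config`.

```python
import base64
import binascii
from typing import Dict, Optional

PREFIX = "enc:"  # toy marker standing in for real ciphertext


def toy_encrypt(plain: str) -> str:
    return PREFIX + base64.b64encode(plain.encode()).decode()


def toy_try_decrypt(value: str) -> Optional[str]:
    if not value.startswith(PREFIX):
        return None
    try:
        return base64.b64decode(value[len(PREFIX):], validate=True).decode()
    except (binascii.Error, UnicodeDecodeError):
        return None


def encrypt_env_variables_for_db(env_vars: Dict[str, str]) -> Dict[str, str]:
    out = {}
    for key, value in env_vars.items():
        # Decrypt first: a value read back as ciphertext becomes plaintext again.
        plain = toy_try_decrypt(value)
        out[key] = toy_encrypt(value if plain is None else plain)  # encrypt exactly once
    return out


def merge_sent_env_vars(stored: Dict[str, str], sent: Dict[str, str]) -> Dict[str, str]:
    merged = dict(stored)  # untouched keys keep their stored ciphertext
    merged.update(encrypt_env_variables_for_db(sent))
    return merged
```

With a real authenticated cipher, a plaintext that merely looks like ciphertext would fail to decrypt; the toy encoder cannot make that distinction.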
- Introduce `_CallbackCapabilities` dataclass and `ProxyLogging._callback_capabilities()` static method that inspects `litellm.callbacks` once and caches capability flags keyed on (list length, member ids); the cache invalidates automatically when the callback list mutates, without per-request iteration overhead
- Replace O(n) `litellm.callbacks` walks in `async_pre_call_hook`, `during_call_hook`, `async_post_call_streaming_iterator_hook`, `async_post_call_streaming_hook`, and `post_call_response_headers_hook` with fast-path exits when no relevant callbacks are registered
- Add `needs_iterator_wrap()` and `needs_per_chunk_streaming_hook()` instance methods to decouple iterator-level wrapping from per-chunk hook execution; avoids `get_response_string` materialization per chunk when no guardrail or chunk-hook callback is active
- Introduce `_fast_serialize_simple_model_response_stream()` using `orjson` for common single-choice text streaming chunks, bypassing the full Pydantic serializer; falls back to `model_dump_json` for tool calls, logprobs, usage, and provider-specific fields
- Add early-return in `_restamp_streaming_chunk_model` when downstream model already matches the requested model, avoiding unnecessary string comparisons on every chunk
- Fix stale zero-cost cache bug in `_is_model_cost_zero`: move the per-router `_zero_cost_cache` dict onto the `Router` instance and clear it in `_invalidate_model_group_info_cache` so in-place pricing updates via `upsert_deployment` immediately resume budget enforcement
- Add `scripts/benchmark_chat_completions_perf.py`: standalone async benchmarking tool with a mock OpenAI provider, LiteLLM proxy process management, non-streaming RPS, streaming TTFT, and full-stream latency measurements with repeat/median run support
- Add comprehensive unit tests covering capability detection, cache invalidation, fast-path correctness, zero-cost cache regression, and the no-callback streaming fast path
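A sketch of the capability cache keyed on (list length, member ids). `BaseCallback`, `CapabilityCache` and the override-detection rule are hypothetical. The sketch assumes a hook "counts" only when a subclass overrides the base no-op. An id-based key can in principle go stale if an object is freed and a new one reuses its id at the same position.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


class BaseCallback:
    """Hypothetical base class whose hooks are no-ops unless overridden."""

    async def async_pre_call_hook(self, data: dict) -> dict:
        return data

    async def async_post_call_streaming_hook(self, chunk: Any) -> Any:
        return chunk


def _overrides(cb: Any, name: str) -> bool:
    return isinstance(cb, BaseCallback) and getattr(type(cb), name) is not getattr(BaseCallback, name)


@dataclass(frozen=True)
class CallbackCapabilities:
    has_pre_call_hook: bool
    has_streaming_hook: bool


class CapabilityCache:
    def __init__(self) -> None:
        self._key: Optional[Tuple[int, Tuple[int, ...]]] = None
        self._value: Optional[CallbackCapabilities] = None
        self.computations = 0

    def get(self, callbacks: List[Any]) -> CallbackCapabilities:
        key = (len(callbacks), tuple(id(cb) for cb in callbacks))
        if self._value is None or key != self._key:
            # Recompute only when the list's length or membership changed.
            self._value = CallbackCapabilities(
                has_pre_call_hook=any(_overrides(cb, "async_pre_call_hook") for cb in callbacks),
                has_streaming_hook=any(_overrides(cb, "async_post_call_streaming_hook") for cb in callbacks),
            )
            self._key = key
            self.computations += 1
        return self._value
```

A hook site would then exit early with something like `if not cache.get(callbacks).has_pre_call_hook: return data`.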
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
* fix: patch Host-header auth bypass in get_request_route
Starlette reconstructs request.url from the Host header. A malformed
Host like `localhost/?x=1` causes Starlette to build the full URL as
`http://localhost/?x=1/health`, which url-parses to path="/". Since "/"
is in LiteLLMRoutes.public_routes, all protected routes became reachable
without authentication.
Fix: read scope["path"] (set by uvicorn from the HTTP request line,
not derivable from headers) instead of request.url.path. Sub-path
deployments are handled via scope["app_root_path"] / scope["root_path"],
mirroring Starlette's own base_url construction logic.
Affected variants confirmed fixed:
Host: localhost/?x=1
Host: localhost:4000/?x=1
Host: localhost/#test
Host: localhost:4000/#test
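The URL confusion can be reproduced with the standard library alone. `path_from_untrusted_host` mimics building `http://{host}{path}` from the Host header. `route_from_scope` is a hypothetical sketch of reading `scope["path"]` instead, assuming `scope["path"]` may carry the deployment's root-path prefix. It skips stripping when the root path is a bare "/".

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def path_from_untrusted_host(host: str, request_path: str) -> str:
    # Mimics reconstructing the URL from the Host header, then re-parsing it.
    return urlsplit(f"http://{host}{request_path}").path


def route_from_scope(scope: Dict[str, Any]) -> str:
    path = str(scope.get("path") or "")
    root_path = str(scope.get("app_root_path") or scope.get("root_path") or "")
    # Strip a sub-path prefix on a component boundary; never strip a bare "/".
    if root_path and root_path != "/" and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path


for host in ["localhost/?x=1", "localhost:4000/?x=1", "localhost/#test", "localhost:4000/#test"]:
    print(host, "->", path_from_untrusted_host(host, "/health"))  # path is "/" each time
```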
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* style: reduce comments in route fix
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block credential fields in RAG ingest vector_store options
Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.)
in ingest_options.vector_store are now rejected at the API boundary with
a 400 error. Credentials must be configured server-side.
Previously any authenticated user could supply a vertex_credentials dict
with type=external_account pointing credential_source.file at an
arbitrary path (e.g. /proc/1/environ) and token_url at an
attacker-controlled server. google-auth's identity_pool.Credentials
refresh() would read the file and POST its contents to the attacker.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block /key/update self-escalation by assigned users
Non-admin users who were assigned a key (created_by != caller) could
update any non-budget field — models, rpm_limit, guardrails, etc. —
without admin authorization, allowing privilege self-escalation.
Gate: only the key creator (created_by == caller) may edit their own
key without admin check; budget changes always require admin regardless
of creator status. All other callers must pass _check_key_admin_access.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block user-controlled api_base in RAG ingest vector_store options
A user-supplied api_base in ingest_options.vector_store caused the server
to forward its configured provider credentials (Gemini, OpenAI) to an
attacker-controlled endpoint via SSRF.
Add api_base to the blocked credential params set alongside api_key and
the existing credential fields.
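A sketch of the boundary check for `ingest_options.vector_store`. `BLOCKED_VECTOR_STORE_PARAMS` lists only the fields named in these commits (the real set is larger), and `validate_vector_store_options` / `BadRequestError` are hypothetical.

```python
from typing import Any, Dict

# Subset named in the commit messages; the real blocklist contains more fields.
BLOCKED_VECTOR_STORE_PARAMS = frozenset(
    {"vertex_credentials", "aws_access_key_id", "api_key", "api_base"}
)


class BadRequestError(Exception):
    status_code = 400


def validate_vector_store_options(options: Dict[str, Any]) -> None:
    found = sorted(BLOCKED_VECTOR_STORE_PARAMS.intersection(options))
    if found:
        # Credentials and endpoints must come from server-side config only.
        raise BadRequestError(f"credential fields are not allowed in vector_store: {found}")
```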
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check
Any authenticated internal_user could POST arbitrary provider config
(aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have
the server forward its credentials to an attacker-controlled endpoint.
- Gate the endpoint on PROXY_ADMIN role (403 for all other roles)
- Call is_request_body_safe() to reject banned params even for admins
- Convert ValueError from safety check to HTTP 400
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: apply banned-param check to /utils/transform_request
Without is_request_body_safe(), any authenticated user could pass
aws_sts_endpoint, api_base, or aws_web_identity_token to
/utils/transform_request and have the server forward its configured
provider credentials to an attacker-controlled endpoint during SDK
credential resolution.
Applies the same banned-param blocklist already used by LLM endpoints.
Endpoint remains accessible to all authenticated users.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter
Any frontmatter key not in ["model","input","output"] flowed into
optional_params and was merged into the LLM call data dict, bypassing
is_request_body_safe. An attacker with any bearer key could set
api_base in YAML to redirect the outbound LLM request — including the
provider API key — to an attacker-controlled host.
Fix: call is_request_body_safe on the constructed data dict after
optional_params are merged, before invoking ProxyBaseLLMRequestProcessing.
ValueError from the banned-param check is surfaced as HTTP 400.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Update litellm/proxy/rag_endpoints/endpoints.py
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
* fix: coerce nested config strings before banned-param check
_NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently
skipped litellm_embedding_config when delivered as a JSON string via
multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.)
nested inside the stringified value were invisible to is_request_body_safe.
_NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses
JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS.
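A sketch of coercing a stringified nested config before the banned-param scan. `BANNED_PARAMS` holds only the two names cited here, and `coerce_to_dict` / `find_banned_params` are hypothetical. Invalid JSON is simply skipped in this sketch; the real handler's behaviour for that case is not described.

```python
import json
from typing import Any, Dict, List, Optional

BANNED_PARAMS = frozenset({"api_base", "aws_sts_endpoint"})  # subset for illustration
NESTED_CONFIG_KEYS = ("litellm_embedding_config",)


def coerce_to_dict(value: Any) -> Optional[Dict[str, Any]]:
    if isinstance(value, dict):
        return value
    if isinstance(value, str):  # e.g. a multipart/form-data field
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


def find_banned_params(body: Dict[str, Any]) -> List[str]:
    hits = [key for key in body if key in BANNED_PARAMS]
    for config_key in NESTED_CONFIG_KEYS:
        nested = coerce_to_dict(body.get(config_key))
        if nested:
            hits.extend(f"{config_key}.{key}" for key in nested if key in BANNED_PARAMS)
    return hits
```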
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: replace substring match with prefix match in is_llm_api_route
mapped_pass_through_routes used `_llm_passthrough_route in route` (substring)
so any admin-only path whose URL contained a provider name (openai, anthropic,
azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the
admin gate in non_proxy_admin_allowed_routes_check.
Confirmed live: non-admin key could GET /credentials/by_name/openai (read
masked provider API key) and DELETE /credentials/openai (delete credential).
Fix: use exact match or startswith(prefix + "/") — the same pattern used
everywhere else in RouteChecks — so only routes that actually start with a
passthrough prefix are allowed through.
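The difference between the two checks, sketched with an illustrative prefix list (the real `mapped_pass_through_routes` values may differ).

```python
from typing import Tuple

PASSTHROUGH_PREFIXES: Tuple[str, ...] = ("/openai", "/anthropic", "/azure", "/bedrock")  # illustrative


def is_passthrough_substring(route: str) -> bool:
    # Old behaviour: any route merely containing a provider name matches.
    return any(prefix in route for prefix in PASSTHROUGH_PREFIXES)


def is_passthrough_prefix(route: str) -> bool:
    # Fixed behaviour: exact match or a real path-component prefix.
    return any(route == p or route.startswith(p + "/") for p in PASSTHROUGH_PREFIXES)


print(is_passthrough_substring("/credentials/by_name/openai"))  # True (misclassified)
print(is_passthrough_prefix("/credentials/by_name/openai"))     # False
```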
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize PR #27878 test failures
- key_management_endpoints: extend can_skip_admin_check to team keys so
team members with /key/update permission can update non-budget fields.
can_team_member_execute_key_management_endpoint already validates team
membership + permission and raises if unauthorized; reaching the admin
check on a team key means the caller was authorized.
- test: set created_by on mock key in
test_update_key_non_budget_fields_allowed_for_internal_user so
caller_is_creator resolves correctly (MagicMock default ≠ user_id).
- auth_utils.get_request_route: guard against non-dict request.scope
(e.g. MagicMock in unit tests) to prevent a MagicMock leaking into
UserAPIKeyAuth.request_route and failing Pydantic validation.
- ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard
in test-unit-proxy-db.yml to satisfy the shard-coverage check.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(lint): add explicit str() cast in get_request_route for MyPy
scope.get() returns Any|None which MyPy cannot coerce to str implicitly.
Wrap both scope.get() calls in str() to satisfy the type checker.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: guard bare-/ root_path strip + make total_spend migration idempotent
auth_utils.get_request_route: when Starlette sets scope["app_root_path"]
to "/" (e.g. behind some middleware), the old stripping logic would
remove the leading slash from every path ("/team/new" → "team/new"),
breaking route matching and causing auth to misclassify protected routes.
Skip stripping when root_path is bare "/".
migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration
is safe to replay when a prior partial run already created the column.
Without this guard, prisma migrate deploy fails on CI DBs that were
partially migrated, causing all subsequent DB operations (including
/team/new) to 500.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: require creator still owns key for personal-key bypass in /key/update
caller_is_creator now requires both created_by == caller AND user_id ==
caller. Previously checking only created_by let a demoted admin who
originally created a key for another user continue editing non-budget
fields on it after reassignment, bypassing _check_key_admin_access.
Adds regression test: creator whose key was reassigned is blocked (403).
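A sketch of the resulting `/key/update` gate, assuming the key row exposes `created_by` and `user_id`. `needs_key_admin_check` is hypothetical, and the team-key path from the earlier commit is not modeled.

```python
from typing import Any, Dict


def needs_key_admin_check(key_row: Dict[str, Any], caller_id: str, updates_budget: bool) -> bool:
    if updates_budget:
        return True  # budget changes always require admin
    # Creator bypass only while the creator still owns the key.
    caller_is_creator = key_row.get("created_by") == caller_id and key_row.get("user_id") == caller_id
    return not caller_is_creator
```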
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: extract auth checks to fix PLR0915 + broaden max_budget assertion
internal_user_endpoints._update_single_user_helper exceeded 50 statements
(PLR0915). Extract authorization checks into _check_user_update_authz helper
to bring statement count under the limit.
test_validate_max_budget: assert "negative" (substring of both the local
"cannot be negative" and the CI "non-negative finite number" messages) so
the test is stable regardless of which exact wording the function uses.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>
LazyFeatureMiddleware compared the raw scope path against registered
prefixes (e.g. /policies), so requests under a server root path like
/api/v1/policies/... never matched, the feature never loaded, and the
endpoint returned 404. Strip the configured root path before matching,
normalizing trailing slashes and enforcing a component boundary so
/api does not falsely match /apiv2.
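A sketch of root-path-aware prefix matching for a lazily loaded feature. `strip_root_path`, `matches_feature_prefix` and `feature_matches` are hypothetical helpers. Trailing slashes are normalized and matches must end on a "/" component boundary.

```python
def strip_root_path(path: str, root_path: str) -> str:
    root = root_path.rstrip("/")  # "/api/v1/" -> "/api/v1", "/" -> ""
    if root and (path == root or path.startswith(root + "/")):
        return path[len(root):] or "/"
    return path


def matches_feature_prefix(path: str, prefix: str) -> bool:
    prefix = prefix.rstrip("/") or "/"
    return path == prefix or path.startswith(prefix + "/")


def feature_matches(scope_path: str, root_path: str, prefix: str) -> bool:
    return matches_feature_prefix(strip_root_path(scope_path, root_path), prefix)
```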
Companion to the previous commit which deleted the symbol-named
tests/test_litellm/proxy/management_endpoints/test_update_config_endpoint.py.
Adds the 5 critical-path tests in their proper home — the test file
that mirrors the source file (proxy_server.py).
The two commits are one logical change; they were split because git
add aborted on a stale path argument.
Three CI failures from the previous push, all addressed:
* ``lint`` (mypy): ``async_client.get(url, **request_kwargs)`` confused
mypy because ``AsyncHTTPHandler.get``'s second positional arg is typed
``bool | None``. Switched to an explicit branch:
``await async_client.get(rewritten_url, headers={"host": host_header})``
for the HTTP-rewritten case, plain ``get(rewritten_url)`` otherwise.
* ``proxy-infra`` /
``test_get_image_custom_local_logo_bypasses_cache``: the existing
test set ``UI_LOGO_PATH=/app/custom_logo.jpg`` with no
``LITELLM_ASSETS_PATH``, asserting the path was served verbatim. That
was the LFI behaviour the new path-containment guard closes. Updated
the test to set ``LITELLM_ASSETS_PATH=/app`` so the path is inside an
allowed root, and patched the helper's ``realpath`` / ``isfile`` to
go along with the mocked filesystem. Test intent (bypass cache when
``UI_LOGO_PATH`` is local) is preserved.
* ``auth-and-jwt`` / ``test_get_image_cache_logic``: existing test
built a ``Mock`` response without ``headers``, so the new
Content-Type check tripped on ``Mock().split(";")[0]``. Two fixes:
1. Set ``mock_response.headers = {"content-type": "image/jpeg"}``
on the test (matches the real upstream contract — a logo CDN
always sets a Content-Type).
2. Make ``fetch_validated_image_bytes`` defensive: if the
Content-Type header is missing or non-string, treat as non-image
and fall back to default. Closes a subtle hole — pre-fix, an
upstream that omits Content-Type entirely would have served
arbitrary bytes under the ``image/jpeg`` wrapper.
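A sketch of the defensive Content-Type check, assuming the upstream response exposes `headers` (a mapping) and `content`. `choose_logo_bytes` is hypothetical. A bare `Mock()` falls back to the default because its `headers` is not a mapping.

```python
from collections.abc import Mapping
from typing import Any
from unittest.mock import Mock


def choose_logo_bytes(response: Any, default_bytes: bytes) -> bytes:
    headers = getattr(response, "headers", None)
    content_type = headers.get("content-type") if isinstance(headers, Mapping) else None
    if not isinstance(content_type, str):
        return default_bytes  # missing or non-string header: treat as non-image
    if not content_type.split(";")[0].strip().lower().startswith("image/"):
        return default_bytes
    return response.content


print(choose_logo_bytes(Mock(), b"default"))  # b'default'
print(choose_logo_bytes(Mock(headers={"content-type": "image/jpeg"}, content=b"jpg"), b"default"))  # b'jpg'
```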
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unauthenticated ``/get_logo_url`` endpoint returned the
``UI_LOGO_PATH`` env var verbatim. For HTTP(S) URLs this is intended —
the dashboard loads the logo directly from a public/internal CDN. For
local filesystem paths it was an information disclosure: any caller
could fetch ``/get_logo_url`` and read admin-only filesystem details
like ``UI_LOGO_PATH=/etc/litellm/secret-config.json``.
Now the endpoint returns the URL only when it begins with
``http://`` or ``https://``. For local paths (or unset) it returns an
empty string — the dashboard falls back to ``/get_image`` which
serves the file via the path-containment guard added in the previous
commit.
Tests parametrize the disclosure-blocked cases (``/etc/...``,
``/proc/self/environ``, relative paths) and confirm HTTP / HTTPS URLs
still pass through unchanged.
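A sketch of the `/get_logo_url` rule. `public_logo_url` is a hypothetical helper.

```python
from typing import Optional


def public_logo_url(ui_logo_path: Optional[str]) -> str:
    # Only remote URLs are disclosed; local paths and unset values become "".
    if ui_logo_path and ui_logo_path.startswith(("http://", "https://")):
        return ui_logo_path
    return ""
```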
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Models associated with a team only through access groups (not directly in
team.models) were not appearing on the /ui/?page=models page. The API
authorization path already resolved access groups correctly, but the
/v2/model/info listing endpoint only checked team.models.
Add _add_access_group_models_to_team_models() which batch-fetches all
distinct access groups in a single find_many query, then resolves each
team's access group models into deployments and merges them into the
team_models dict.
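A sketch of the batched merge, assuming `team_models` maps team id to a list of model names and that a single hypothetical `fetch_access_groups` call (standing in for the one `find_many`) returns group id to model names. Resolving names into full deployments is left out.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

FetchGroups = Callable[[List[str]], Awaitable[Dict[str, List[str]]]]


async def add_access_group_models_to_team_models(
    team_models: Dict[str, List[str]], teams: List[Dict[str, Any]], fetch_access_groups: FetchGroups
) -> Dict[str, List[str]]:
    distinct = sorted({g for team in teams for g in team.get("access_groups", [])})
    if not distinct:
        return team_models
    group_models = await fetch_access_groups(distinct)  # one batched lookup for all teams
    for team in teams:
        models = team_models.setdefault(team["team_id"], [])
        for group in team.get("access_groups", []):
            for model in group_models.get(group, []):
                if model not in models:
                    models.append(model)
    return team_models


if __name__ == "__main__":
    async def fake_fetch(groups: List[str]) -> Dict[str, List[str]]:
        return {"ga": ["claude-sonnet"]}

    print(asyncio.run(add_access_group_models_to_team_models({}, [{"team_id": "t1", "access_groups": ["ga"]}], fake_fetch)))
```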
Budget checks on API keys, teams, and team members were not enforced in
multi-pod deployments because user_api_key_cache is intentionally
in-memory-only. Each pod tracked spend independently, so with N pods
the effective budget was N × max_budget.
Introduces a separate spend_counter_cache (DualCache wired to
redis_usage_cache) with atomic increment/read helpers:
- increment_spend_counters(): awaited in cost callback (not create_task)
to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative),
falls back to in-memory, then to cached object .spend from DB
Budget check functions (_virtual_key_max_budget_check,
_team_max_budget_check, _check_team_member_budget) now read spend via
get_current_spend() instead of cached object .spend fields.
When Redis is not configured, falls back to in-memory-only counters
(same as current single-instance behavior).
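A sketch of the read order in `get_current_spend()`: Redis first, then the in-memory counter, then the cached object's `.spend`. The signature here is hypothetical: `redis_get` is `None` when Redis is not configured, and `in_memory` is a plain dict standing in for the in-memory cache.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

RedisGet = Callable[[str], Awaitable[Optional[float]]]


async def get_current_spend(
    counter_key: str,
    redis_get: Optional[RedisGet],
    in_memory: Dict[str, float],
    cached_object_spend: Optional[float],
) -> float:
    if redis_get is not None:
        value = await redis_get(counter_key)  # cross-pod authoritative
        if value is not None:
            return float(value)
    value = in_memory.get(counter_key)  # this pod's counter
    if value is not None:
        return float(value)
    return float(cached_object_spend or 0.0)  # last resort: spend loaded from DB


if __name__ == "__main__":
    async def redis_hit(key: str) -> Optional[float]:
        return 120.5

    print(asyncio.run(get_current_spend("spend:key:abc", redis_hit, {}, 1.0)))  # 120.5
```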
Fixes #23714
- Remove HTTP_PROXY/HTTPS_PROXY from blocklist (legitimately used in corporate envs)
- Add NO_PROXY/no_proxy to blocklist (prevents bypassing proxy monitoring)
- Remove dead code in _is_valid_user_id (space exception was unreachable)
- Update tests accordingly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add input validation to get_user_id_from_request (length limit, control char rejection) and a blocklist of dangerous environment variable keys in _load_environment_variables to prevent PATH/LD_PRELOAD/PYTHONPATH override via config.
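A sketch of both checks. `MAX_USER_ID_LENGTH` is a hypothetical limit, since the actual value is not stated. The blocklist holds only the keys named in these commits, and matching is exact-case, which is why both `NO_PROXY` and `no_proxy` appear. The function names are hypothetical.

```python
from typing import Dict, Optional

MAX_USER_ID_LENGTH = 256  # hypothetical limit
BLOCKED_ENV_KEYS = frozenset({"PATH", "LD_PRELOAD", "PYTHONPATH", "NO_PROXY", "no_proxy"})


def is_valid_user_id(user_id: Optional[str]) -> bool:
    if not user_id or len(user_id) > MAX_USER_ID_LENGTH:
        return False
    # Reject ASCII control characters (0x00-0x1F and DEL).
    return not any(ord(ch) < 32 or ord(ch) == 127 for ch in user_id)


def filter_config_environment_variables(env: Dict[str, str]) -> Dict[str, str]:
    # HTTP_PROXY / HTTPS_PROXY stay allowed for corporate networks.
    return {key: value for key, value in env.items() if key not in BLOCKED_ENV_KEYS}
```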
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a control plane capability that enables a central admin instance
to manage multiple regional worker proxies from a single UI.
Backend:
- Worker registry loaded from YAML config (worker_id, name, url)
- /.well-known/litellm-ui-config exposes is_control_plane and workers list
- /v3/login + /v3/login/exchange: opaque code exchange for cross-origin
username/password auth (JWT never in URL/logs, single-use 60s TTL)
- SSO cookie handoff with return_to → opaque code → exchange
- _validate_return_to: full origin validation (scheme+hostname+port)
- Startup warning when control_plane_url set without Redis
- Both /v3 endpoints gated behind control_plane_url config
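A sketch of origin validation in the spirit of `_validate_return_to`, comparing scheme, hostname and effective port. The allow-list would presumably be built from the configured worker URLs. That, and the helper names, are assumptions.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Optional[Tuple[str, str, int]]:
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        return None
    return (parts.scheme, parts.hostname, port if port is not None else _DEFAULT_PORTS[parts.scheme])


def validate_return_to(return_to: str, allowed_origins: Iterable[str]) -> bool:
    target = _origin(return_to)
    return target is not None and any(_origin(o) == target for o in allowed_origins)
```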
Frontend:
- Worker selector dropdown on login page (gated behind is_control_plane)
- Cross-origin SSO code exchange handling on callback
- switchToWorkerUrl: localStorage-persisted worker URL for API calls
- useWorker hook: shared worker state management
- WorkerDropdown in navbar for switching workers
- Logout/switch clears worker state from localStorage
Tests:
- 7 tests for /v3/login + /v3/login/exchange
- 10 tests for _validate_return_to
- 2 tests for control plane discovery endpoint
The upsert update branches for model_cost_map_reload_config were
overwriting param_value with only the force_reload flag, dropping
interval_hours. This caused scheduled reloads to self-destruct
after their first execution.
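A sketch of the difference between overwriting and merging `param_value`. Both helpers are hypothetical, and only `force_reload` and `interval_hours` come from the commit.

```python
from typing import Any, Dict, Optional


def overwrite_param_value(existing: Optional[Dict[str, Any]], force_reload: bool) -> Dict[str, Any]:
    return {"force_reload": force_reload}  # old behaviour: interval_hours is lost


def merge_param_value(existing: Optional[Dict[str, Any]], force_reload: bool) -> Dict[str, Any]:
    merged = dict(existing or {})  # keep interval_hours and any other keys
    merged["force_reload"] = force_reload
    return merged
```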
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset logo_path to default_logo when custom UI_LOGO_PATH file doesn't
exist, so the else branch at the bottom of get_image serves the default
logo instead of the non-existent custom path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>