Commit Graph

39663 Commits

Author SHA1 Message Date
Haitao Pan
7eaa4aeb3e fix: return failed status dict instead of raising Exception when wildcard model provider is unknown in ahealth_check_wildcard_models 2026-06-16 17:46:49 +08:00
Haitao Pan
b02df7735a feat: implement unified Auth Token SSO callback in /sso/key/generate using master_key verification 2026-06-16 17:16:45 +08:00
Haitao Pan
1bc132132e ci: publish litellm runtime releases 2026-06-15 21:58:56 +08:00
ryan-crabbe-berri
3ad385a8a4
feat(ui): migrate budgets, workflows, and guardrails-monitor to path routes (#30236)
* feat(ui): cut budgets, workflows, and guardrails-monitor over to path routes

Continues the page-by-page App Router migration (#30185, #30226). All
three legacy switch arms passed only accessToken, so each route wrapper
is a thin useAuthorized() + render. MIGRATED_PAGES routes the sidebar
and redirects the legacy ?page= URLs; the e2e fixture picks all three
up in the migration smoke and sidebar specs automatically.

* refactor(ui): colocate budgets, workflows, and guardrails-monitor components

budgets and workflow_runs were imported only by the legacy switch, so
they move wholesale into their route folders; the budgetItem type
hoists into the shared useBudgets hook, which owns the API response
shape, so the hooks layer no longer imports from a page folder.
GuardrailsMonitor keeps LogViewer, mockData, and MetricCard at the
shared src/components home because ToolDetail and ToolPolicies import
them; the rest moves. eslint suppressions are re-keyed accordingly.

* fix(ui): restore MetricCard test-utils path and merge duplicate import

MetricCard.test.tsx got the moved-tree depth rewrite before being moved
back to src/components/GuardrailsMonitor, leaving a five-level path
that escapes the project root; the suite failed at import. Also merge
the two imports from useBudgets in budget_panel.tsx. Both flagged by
Greptile.
2026-06-11 14:27:40 -07:00
Yassin Kortam
1828a7c6f0
fix(passthrough): resolve costing model when body model is unknown (#30160) 2026-06-11 14:26:55 -07:00
michelligabriele
8e12d42ea7
fix(proxy): coalesce NULL rollup metrics in aggregated daily-activity (#30151) 2026-06-11 22:32:08 +02:00
ryan-crabbe-berri
a2c916fb45
feat(ui): migrate projects and access-groups to path routes (#30226)
* feat(ui): cut projects and access-groups over to path routes

Same recipe as playground (#30185): MIGRATED_PAGES entries route the
sidebar and redirect the legacy ?page= URLs, the switch arms are
deleted, and the e2e fixture grows two entries. Both components were
already zero-prop and self-fetching via React Query hooks, so the
route wrappers are trivial.

* refactor(ui): move Projects and AccessGroups components into their route folders

Both folders were imported only by the legacy switch, so they colocate
wholesale under (dashboard)/{projects,access-groups}/components. Their
React Query hooks stay in the shared (dashboard)/hooks layer. eslint
suppressions are re-keyed to the new paths.

* test(ui): enable enable_projects_ui in e2e global setup

The projects migration smoke clicks the Projects sidebar link, which
only renders when the enterprise-gated enable_projects_ui setting is
on; the seeded e2e database starts with it off, so the locator timed
out in both e2e_ui_testing jobs. CI already launches the proxy with
LITELLM_LICENSE for premium UI coverage, so flip the setting in
globalSetup via the same /update/ui_settings call the admin UI toggle
makes, failing loudly if the PATCH is rejected.

* test(ui): use Playwright request context instead of raw fetch in global setup

The frontend lint bans raw fetch() outside src/lib/http/; the e2e
convention for proxy API calls is Playwright's APIRequestContext, as
in routerSettings.spec.ts.
2026-06-11 13:20:21 -07:00
ryan-crabbe-berri
530c0b2326
feat(ui): migrate playground to path routing and colocate its files (#30185)
* feat(ui): cut playground over to the /ui/playground path route

Follows the api-reference recipe: the sidebar and deep links route
llm-playground to the path route, ?page=llm-playground redirects, and
the legacy switch arm is deleted. The route's page.tsx was already the
real implementation, so no view extraction was needed.

* refactor(ui): move playground-owned files into its route folder

Per the (dashboard) README convention, page-owned code lives in the
page's folder: chat_ui/compareUI/complianceUI components, the chat
hooks, and the playground-only llm_calls helpers move under
(dashboard)/playground/. Modules with non-playground consumers (chat
message primitives; fetch_models, chat_completion, responses_api) stay
at their lowest common ancestor in src/components/{chat_ui,llm_calls}
because legacy pages still import them. eslint-suppressions entries are
re-keyed to the new paths so the grandfathered baseline still applies.

* test(ui): teach sidebar e2e spec about migrated path routes

The sidebar spec asserted ?page=<key> for every item, which the
playground cutover correctly broke: the sidebar now links to
/ui/playground and the legacy URL redirects there. Drive the expected
URL from the migration fixture (now a page-id -> segment map) so
future cutovers only add a fixture entry. Also wrap one import line
in AgentBuilderView.tsx that the move left unformatted; the changed-
files prettier check flagged it.
2026-06-11 12:07:17 -07:00
Yassin Kortam
a992ed18df
feat(spend_logs): opt-in native Postgres partitioning for SpendLogs retention (#29466)
High-volume deployments see LiteLLM_SpendLogs grow unbounded because
retention via DELETE leaves dead tuples that autovacuum cannot reclaim
fast enough. With a range-partitioned table, retention drops whole
partitions instead: an instant metadata operation that returns disk to
the OS immediately.

The feature is gated behind general_settings.use_spend_logs_partitioning
(default false). With the flag off, the cleanup job never queries the
catalog and behaves exactly as today. With it on, the job verifies the
table is partitioned, pre-creates upcoming partitions, and drops expired
ones; expired rows the drops cannot reach (DEFAULT partition, partitions
spanning the cutoff) are still deleted row-wise so retention is never
bypassed. If the table is not partitioned it falls back to batched
DELETE only.

Converting an existing table is a manual, documented operation in
db_scripts/partition_spend_logs.sql; db_scripts/unpartition_spend_logs.sql
rolls it back. Both scripts rename the old table's indexes aside before
recreating them, since a table rename keeps the schema-unique index names
and would otherwise silently skip the CREATE INDEX IF NOT EXISTS block.

Granularity and pre-create lookahead are tunable via
SPEND_LOG_PARTITION_INTERVAL (day/week/month, invalid values fall back to
day) and SPEND_LOG_PARTITION_PRECREATE_AHEAD.
2026-06-11 11:02:42 -07:00
Yassin Kortam
012d9f6c0a
feat(rate-limiter): allow opting out of v3 TPM reservation and Redis circuit breaker (#30211) 2026-06-11 10:34:26 -07:00
ryan-crabbe-berri
0d120de785
chore(hooks): enforce Conventional Commits and Conventional Branches (#30174)
* chore(hooks): enforce Conventional Commits and Conventional Branches

Adds opt-in local git hooks plus a CI PR-title check:

- .githooks/commit-msg validates commit subjects against Conventional
  Commits 1.0.0 (feat|fix|docs|style|refactor|perf|test|build|ci|
  chore|revert)(scope)!: subject. Merge/revert/fixup!/squash!/amend!
  messages pass through; --no-verify still works.
- .githooks/pre-push validates branch names against Conventional
  Branches (feature|bugfix|hotfix|release|chore)/desc. Bypasses
  main, litellm_internal_staging, dependabot/*, gh-readonly-queue/*.
  Tag pushes and deletions are skipped.
- scripts/install_git_hooks.sh sets core.hooksPath=.githooks and is
  wired up as 'make install-hooks'. Opt-in — not chained into
  install-dev.
- .github/workflows/conventional-commits.yml validates PR titles via
  amannn/action-semantic-pull-request pinned to v6.1.1's SHA. This is
  the actual gate since squash-merge uses the PR title as the commit
  subject.
- tests/test_litellm/test_git_hooks.py exercises both hooks via
  subprocess for accept / reject / bypass / git-generated-message
  cases.
- CONTRIBUTING.md documents the conventions, the install step, the
  bypass list, and the --no-verify escape hatch.

Resolves LIT-3306

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(hooks): address Greptile review on PR #28703

Resolves two findings from the automated code review:

1. CONTRIBUTING.md: shrink the new Conventional Commits / Branches
   section to a 2-line pointer at docs.litellm.ai. Per the team
   convention, the full documentation lives in the litellm-docs
   repo — see BerriAI/litellm-docs#208 for the companion change that
   adds the section to docs/extras/contributing_code.md.

2. .githooks/commit-msg: tighten the subject regex to also reject an
   uppercase first letter in the description. CI's subjectPattern is
   ^(?![A-Z]).+$ so the previous local hook would accept 'feat: Add
   thing' which would then fail the PR-title check. The local hook is
   now the strictly tighter of the two gates. Test cases extended to
   cover both the new rejection and the digit/symbol-start cases that
   remain allowed.

Resolves LIT-3306

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: trigger ci after branch rename

* fix(ci): rerun pr title check when bypass label changes

amannn/action-semantic-pull-request only honors ignoreLabels if the
workflow retriggers on labeled/unlabeled events; without them a red
check stays red after a maintainer applies the bypass label.

Also point the CONTRIBUTING.md workflow comments at the conventions
section, which now sits above the Development Workflow section.

---------

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MBP.localdomain>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 10:00:23 -07:00
Mateo Wang
49ca04d8c3
feat(bedrock): aws_bedrock_project_id for bedrock-mantle project / workspace association (#30163)
* feat(bedrock): support aws_bedrock_project_id for bedrock-mantle project association

Adds a litellm_params field to associate bedrock-mantle requests with an
Amazon Bedrock project, sent as the OpenAI-Project header on the
OpenAI-compatible chat and responses paths and as the anthropic-workspace
header on the Anthropic messages paths. This lets a single model entry opt
into a project-scoped data retention mode (e.g. provider_data_share for
Claude Fable 5) while the account-wide setting stays on default.

The param is carried via litellm_params only and is explicitly excluded
from optional_params so it can never leak into a request body.

Fixes #30070

* chore(ui): regenerate schema.d.ts for aws_bedrock_project_id

Generated with npm run gen:api after adding the field to LiteLLM_Params

* fix(proxy): ban client-supplied aws_bedrock_project_id in request bodies

The deployment pins aws_bedrock_project_id so the project's data
retention policy applies to its requests. Without this guard an
authenticated caller could supply the field in the request body and,
since client kwargs win the router merge, run requests under any
project reachable with the deployment's shared AWS credentials.

Adds the field to _BANNED_REQUEST_BODY_PARAMS so it is rejected at the
auth boundary by default while remaining available through the existing
admin opt-ins (allow_client_side_credentials proxy-wide or
configurable_clientside_auth_params per deployment).
2026-06-11 10:01:08 +05:30
Mateo Wang
7a96b3490d
[internal copy of #30137] perf(realtime): eliminate redundant per-frame JSON work on OpenAI realtime relay (#30142)
* perf(realtime): eliminate redundant per-frame JSON work on OpenAI realtime relay

The GA realtime support added in #27110 made backend_to_client_send_messages
parse every backend frame up to three times for beta clients (OpenAI-Beta:
realtime=v1), build a discarded Pydantic object per frame for logging, and
re-serialize even frames that need no translation. For high-frequency
response.output_audio.delta frames carrying multi-KB base64 payloads, that
serialized CPU work on the hottest relay path drove the latency regression
between v1.83.14 and v1.88.1 for gpt-realtime-1.5 and gpt-realtime-2.

This parses each frame once via _parse_backend_event and threads the dict into
_handle_raw_backend_message, store_message, and _translate_event_to_beta;
short-circuits store_message before the Pydantic build for events not in the
logged set; returns the original event unchanged from _translate_event_to_beta
when no rename applies so the raw frame is forwarded without re-serialization;
and only json.dumps when the type is actually renamed.

* fix(realtime): widen store_message type hint to accept plain dict

The parse-once refactor passes the dict produced by _parse_backend_event into
store_message, but the parameter was typed as str | bytes | OpenAIRealtimeEvents
(a union of TypedDicts), which mypy does not consider compatible with a plain
dict. Add dict to the accepted union; the body already handles it.

---------

Co-authored-by: Miguel Armenta <maarmenta92@gmail.com>
2026-06-11 09:56:35 +05:30
ishaan-berri
4a3860df1f
fix: completion_cost AttributeError on streaming Anthropic web_search responses (#26153) (#27346)
* fix: coerce server_tool_use dict to ServerToolUse in Usage.__init__ (#26153)

* fix: coerce server_tool_use to ServerToolUse in stream_chunk_builder (#26153)

* fix: dict/pydantic-tolerant access in tool_call_cost_tracking (#26153)

* fix: dict/pydantic-tolerant access in anthropic cost_calculation (#26153)

* test: assert ServerToolUse type in existing stream_chunk_builder anthropic web search test

* test: regression test for #26153 (stream_chunk_builder server_tool_use type)

* test: dict/pydantic safety for tool_call_cost_tracking helper

* test: dict/pydantic safety for anthropic web_search cost

* refactor: consolidate _get_web_search_requests into shared cost-calc utils

* test(realtime): use gpt-realtime; openai retired gpt-4o-realtime-preview

OpenAI shut down the gpt-4o-realtime-preview family (incl. the undated
alias) on 2026-05-07, causing the live realtime test to fail with a
4000 invalid_request_error.invalid_model close. gpt-realtime is the GA
successor; switch the live-call tests to it, matching the base branch.

* refactor(types): drop redundant server_tool_use coercion in Usage.__init__

---------

Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-06-10 21:20:11 -07:00
Sameer Kankute
6068bb7781
fix(proxy): align /v1/model/info with router deployments (#30025)
* fix(proxy): align /v1/model/info with router deployments

Return router model_list entries (including team-scoped models) with team
access metadata instead of wildcard-expanded names from get_complete_model_list.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): gate v1 team filter and honor key allowlists

Only apply get_all_team_and_direct_access_models for admin or user-bound
keys, then intersect with key/team model restrictions to avoid empty lists
for service tokens and metadata leaks for restricted keys.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): skip v1 team filter when user row is missing

Require a DB-backed user before applying team-access filtering on
/v1/model/info, and skip the trailing filter in get_all_team_and_direct_access_models
when user context cannot be resolved.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Revert "fix(proxy): skip v1 team filter when user row is missing"

This reverts commit 74e1fbd77a981103cd9a4ed1cbdd662f5cbcf209.

* fix(proxy): restore legacy v1 model access filtering

Keep /v1/model/info on key/team allowlists instead of DB team-membership
filtering, while still listing router deployments for team-scoped models.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): drop A2A agent entries from public /v1/model/info list

* fix(proxy): scope team BYOK rows on /v1/model/info to caller's teams

Listing the full router model_list let any authenticated key without
explicit model restrictions enumerate other teams' BYOK deployments
(public name, team_id, api_base) via /v1/model/info. Reuse the existing
_get_caller_byok_team_scope check so non-admin callers only see global
deployments plus their own team's BYOK rows; admins keep the full view.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-06-10 19:38:21 -07:00
ryan-crabbe-berri
4def6916da
refactor(ui): consolidate dashboard to one shell in the (dashboard) layout (#30166)
* refactor(ui): consolidate dashboard to one shell in the (dashboard) layout

Moves the legacy ?page= switch page into the (dashboard) route group and
hoists Navbar, sidebar, ThemeProvider, and DebugWarningBanner into the
shared layout with real props, deleting the degraded duplicate shell that
wrapped migrated routes. The active page key now derives from the URL at
render time, so navigating between legacy and migrated pages no longer
remounts the shell.

useProxySettings becomes a React Query hook taking accessToken, shared by
the navbar, the AdminPanel arm, and migrated pages; this replaces the
lifted proxySettings state and the Navbar setProxySettings prop drilling.
The invitation onboarding flow (?invitation_id=) keeps rendering without
chrome via a layout escape hatch. Dead dark mode state and the no-op antd
ConfigProvider are removed.

* fix(ui): include accessToken in useProxySettings query key

The queryFn closes over accessToken, so the key must include it for the
cache to be honest about its inputs. Settings are instance-global today,
which made the omission harmless, but a token change while mounted would
have served the cached entry without refetching.

* test(ui): point CreateKeyPage test at the moved page

The page moved into the (dashboard) route group and no longer renders
the navbar (the layout owns chrome now), so the valid-token test asserts
the default page content (UserDashboard stub) instead.
2026-06-10 18:37:44 -07:00
ryan-crabbe-berri
496f5b9859
fix(ui): dev server 404s on migrated-page links because uiBase hardcodes /ui (#30169)
* fix(ui): serve migrated-page links unprefixed on the dev server

migratedHref and legacyPageHref always prepended /ui, which is where the
proxy mounts the static export but not where next dev serves the app
(basePath is empty; the app lives at the root on localhost:3000). Every
sidebar link to a migrated page and every ?page= bookmark redirect
therefore 404'd in dev, and would do so for each page cut over in the
App Router migration.

uiBase now returns the bare root under NODE_ENV=development. The check
is inlined at build time, so production output is unchanged for both
the default /ui mount and server_root_path deployments.

* test(ui): pin NODE_ENV in production-mode migratedPages tests

The production-mode describes relied on vitest defaulting NODE_ENV to
test; a developer with NODE_ENV=development exported in their shell
would see them fail. Stub it explicitly so the suite is deterministic
regardless of ambient environment.
2026-06-11 00:16:36 +00:00
Yassin Kortam
da9d64b4de
fix(proxy): return 5xx on DB infra errors during auth; reserve 401 for genuine auth failures (#29986) 2026-06-10 23:48:11 +00:00
Mateo Wang
ba72ccf52c
feat: add conventional commits and coding guidelines (#30159)
* feat: add guideline for conventional commits

* feat: add functional programming coding conventions
2026-06-10 16:34:08 -07:00
yuneng-jiang
b301d306c2
fix(release): stop backport releases from overwriting the latest badge (#30005)
create-release published every release with GitHub's default make_latest,
which is true, so any newly published stable release claimed the repo
"Latest" badge regardless of version. That let a backport like 1.84.6
overwrite a newer line like 1.88.1 as latest.

Compute make_latest explicitly: a stable release only claims latest when
its version is >= the current latest (via getLatestRelease), backports to
an older line publish with make_latest false, and prereleases never claim
latest. Version comparison accounts for the maintenance suffix (.postN and
legacy -stable.patch.N) so within-line ordering stays correct
2026-06-10 16:33:48 -07:00
Yassin Kortam
dff25fef44
feat(proxy): add option to disable server-side prepared statements for DB lookups (#29984) 2026-06-10 16:06:32 -07:00
Yassin Kortam
3bd3951e37
fix(proxy): recover from cached-plan errors by reconnecting the Prisma client (#29983) 2026-06-10 16:06:01 -07:00
tin-berri
1436ee9092
fix(mcp): drop orphaned per-user credential rows when an MCP server is deleted (#30141) 2026-06-10 15:56:58 -07:00
yuneng-jiang
7899463c6a
fix(callbacks): forward callback_settings to callback initializers and guard consumers against non-dict values (#30161)
* fix(datadog): pass callback_specific_params so DatadogCostManagementLogger receives cost_tag_keys (#29590)

* fix(datadog): pass callback_specific_params so DatadogCostManagementLogger receives cost_tag_keys

* test(proxy): regression test that load_config forwards callback_specific_params

* fix(proxy): guard lakera_prompt_injection callback_specific_params against non-dict

Addresses review feedback: forwarding callback_settings as callback_specific_params
(so DatadogCostManagementLogger receives cost_tag_keys) exposed the
lakera_prompt_injection branch, which did lakeraAI_Moderation(**callback_specific_params
["lakera_prompt_injection"]) with no type guard. A config like
`callback_settings: {lakera_prompt_injection: "any-string"}` then hit `**"any-string"`
-> TypeError: argument after ** must be a mapping, not str.

Guard the lakera branch with isinstance(dict), matching the existing presidio and
datadog_cost_management branches (non-dict values fall back to {}). Add a regression
test asserting initialize_callbacks_on_proxy ignores a non-dict value instead of crashing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: inject fake lakera_ai module to avoid importing the real one

CI fix for the lakera regression test: it stubbed litellm.proxy.proxy_server with
a SimpleNamespace and then monkeypatch.setattr'd the real lakera_ai module, which
forces importing it — and lakera_ai does `from litellm.proxy.proxy_server import
LiteLLM_TeamTable`, absent on the stub -> ImportError under proxy-infra tests.

Inject a fake lakera_ai module into sys.modules instead, so the callbacks branch's
`from ...lakera_ai import lakeraAI_Moderation` resolves to the stub without loading
the real module. The guard under test (isinstance(dict) in the lakera branch) is
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(callbacks): guard compression/websearch interceptors against non-dict callback_settings (#30153)

#29590 forwards the full callback_settings dict into initialize_callbacks_on_proxy, which activates the compression_interception and websearch_interception consumers. Their initialize_from_proxy_config read the callback_settings subkey without an isinstance(dict) guard, so a non-dict value such as `compression_interception: true` reached from_config_yaml(...).get(...) and aborted proxy startup with AttributeError. #29590 added that guard for lakera_prompt_injection but not for these two

Mirror the isinstance(dict) guard already used by the lakera, presidio, and datadog branches so a non-dict value is ignored and the callback initializes with defaults. A parametrized test feeds every callback_settings consumer a non-dict value through initialize_callbacks_on_proxy to catch a future consumer that forgets the guard

* fix(callbacks): normalize non-dict callback_specific_params to empty dict

A blank callback_settings: key in YAML loads as None, and
config.get('callback_settings', {}) returns None because dict.get only
falls back to the default when the key is absent. Forwarding that value
verbatim to initialize_callbacks_on_proxy made the first
'<name>' in callback_specific_params membership test raise
TypeError: argument of type 'NoneType' is not iterable, aborting proxy
startup. Same failure for any non-dict root such as callback_settings: true.

Normalize the value at the function boundary so both callsites (and any
future ones) initialize callbacks with their defaults instead of crashing.

---------

Co-authored-by: Hedi Daoud <150018939+hdaoud23@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:22:00 -07:00
Mateo Wang
20e453f698
feat(cli): per-agent lite claude / codex / opencode commands that wrap coding agents through the proxy (#29850)
* feat(cli): add `litellm-proxy run -- <agent>` to wrap coding agents through the proxy

Wraps Claude Code, Codex, OpenCode, and any other coding agent so all of its
LLM traffic routes through a LiteLLM proxy, with the agent-vault style of "just
works" DX: one `run -- <agent>` command, auto SSO login when interactive,
env-key "agent mode" for containers/CI, and a fail-fast key check against the
proxy so bad credentials error immediately instead of deep inside the agent.

The wrapped binary is detected by name to pick the right variables. Claude Code
gets ANTHROPIC_BASE_URL (the bare proxy root, so it appends /v1/messages) and
ANTHROPIC_AUTH_TOKEN, with any stray ANTHROPIC_API_KEY cleared so the proxy
token wins. Codex and OpenCode get OPENAI_BASE_URL (proxy + /v1) and
OPENAI_API_KEY. Unrecognized commands get both sets so they work either way.
`litellm-proxy claude-code` remains as a shortcut for `run -- claude`.

The core logic is split into dependency-injected helpers (agent_profile,
build_agent_env, verify_proxy_key, run_agent) so env wiring, the preflight, and
the launch handoff are unit-tested without monkeypatching, alongside CliRunner
tests for auth resolution, agent mode, and auto-login. Mutation-tested the env
profiles, preflight, and agent-mode branch to confirm the tests fail when the
behavior is broken.

https://claude.ai/code/session_0154VpLXW7mMvk5wfbgPRJa6

* Make each coding agent its own litellm-proxy command

Replace the `run -- <agent>` interface and the `claude-code` shortcut with
top-level commands generated per known agent, so launching is just
`litellm-proxy claude`, `litellm-proxy codex`, or `litellm-proxy opencode`,
with everything after the agent name forwarded straight to it. This drops the
ceremony of `run --` and cuts typing.

The `--model`/`--small-fast-model` wrapper flags are gone; pass the agent's
own model flag instead, or export the model env vars (the wrapper preserves
what you already have set), which keeps the surface minimal and avoids
intercepting flags the agent owns. Rename the module to agents.py to match.

* fix(cli): route `litellm-proxy codex` through the proxy via a custom provider

Codex ignores OPENAI_BASE_URL (it always dials api.openai.com over the
Responses WebSocket transport), so the OpenAI env profile alone left
`litellm-proxy codex` talking to OpenAI directly instead of the proxy. Point
Codex at the proxy with a custom provider passed as `-c` config overrides, and
force the HTTP/SSE Responses transport with supports_websockets=false since the
proxy does not speak the Responses WebSocket protocol. The provider reads its
key from OPENAI_API_KEY, which the agent env already exports.

The overrides are injected ahead of the user's args so they precede Codex's
subcommand. Claude Code and OpenCode are unaffected; they honor the exported
env vars. Adds regression tests for the per-agent launch args and the
injection ordering.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Rename litellm-proxy CLI command to lite

The proxy management CLI was invoked as litellm-proxy, which is a lot to
type for an everyday command. Rename the console script entry point to
lite and update the in-CLI usage examples, help text, error messages and
docs to match.

* fix(sso): stop CLI auth success page from hanging on "Closing..."

The CLI opens the SSO success page with webbrowser.open, so the tab is
not script-opened and the browser refuses window.close(). The countdown
would end on "Closing..." and the tab would sit there forever.

Drop the countdown and just show "You can now close this window and
return to your terminal." from the start, while still attempting
window.close() once so the tab auto-closes in the rare case the browser
allows it. Add a regression test asserting the manual-close instruction
is always present and the misleading countdown/"Closing..." text is gone.

* fix(cli): reattach controlling terminal after SSO login, keep litellm-proxy alias

When the first `lite claude` has to log in via browser SSO, completing the login could
leave stdin detached from the terminal, so a TUI agent like Claude Code would start in
non-interactive mode and exit with "Input must be provided". The wrapper now reopens the
controlling terminal onto stdin just before handoff when the session started interactively;
piped or redirected input is detected up front and left alone, so agent-mode and
non-interactive use are unchanged.

Also keep the `litellm-proxy` console script as an alias for `lite` so existing scripts and
CI that invoke `litellm-proxy` keep working; both names map to the same CLI.

* feat(install): make the curl installer need only curl, not a pre-existing Python

The installer now lets uv provision a managed Python 3.13 when no suitable
interpreter is found, instead of aborting. The minimum is also bumped from
3.9 to 3.10 to match the package's requires-python (>=3.10), so a system
Python 3.9 is no longer selected only for uv tool install to reject it.

* feat(cli): add thin litellm[cli] install path (install-cli.sh + brew) for the lite CLI

On a developer laptop the `lite` CLI only needs `lite login` and running coding
agents through a proxy, but the sole install path was `litellm[proxy]`, which
drags in the whole server tree (fastapi, uvicorn, boto3, polars, cryptography,
litellm-enterprise). The CLI's heavy imports are all guarded, so it runs on the
base SDK plus just rich, pyyaml and requests.

Add a `cli` extra carrying exactly those three, a `scripts/install-cli.sh` curl
one-liner that installs `litellm[cli]`, and a `BerriAI/homebrew-litellm` tap
formula with a release runbook under `packaging/homebrew/`. The installer passes
no `--python`, so uv honours litellm's requires-python and provisions a managed
interpreter, skipping a too-old (3.9) or too-new (3.14+) system Python instead
of failing to resolve.

A pyproject thin-contract test asserts the `cli` extra keeps the deps the CLI
imports and never leaks a server-only dependency from `proxy`, so the laptop
install cannot silently re-bloat

* fix(install): let uv pick the Python via --python-preference system

Both installers detected a system Python with a floor-only check and forced it
with `uv tool install --python <interp>`. On a host whose only Python is outside
litellm's requires-python (a too-old 3.9 or, increasingly, a too-new 3.14) that
forced an incompatible interpreter and the resolve failed. Drop the detection and
pass `--python-preference system`: uv reuses a compatible system Python when
present and downloads a managed one otherwise, always honouring requires-python

* test(router): filter aiohttp unclosed-session gc noise in test_async_fallbacks

test_async_fallbacks asserts the last three captured log records are the
router's fallback messages. Under the litellm_router_testing job (pytest -k
router -n 4) many router tests share the module-level in_memory_llm_clients_cache
(max 200, ttl 3600s). Older cached OpenAI/Azure clients get evicted while their
aiohttp ClientSession is still open, and when the gc reclaims them aiohttp emits
"Unclosed client session"/"Unclosed connector" through the asyncio logger.
Those records land in caplog mid-test and push the expected router logs out of
the last-three window, so the assertion flips to failing non-deterministically.

These warnings are async cleanup noise, not router debug logs, so filter them
out exactly like the existing leaked-task warnings before asserting order. The
assertion on the three router fallback messages is unchanged.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-06-10 13:52:26 -07:00
Mateo Wang
a4a3348801
[internal copy of #28007] Fix/gcp model garden streaming (#28363)
* fix(vertex): stream Model Garden Gemma/Qwen responses correctly through /v1/messages

* test(vertex): cover _CombinedChunkSplitter defensive branches

* test(databricks): rename test file to avoid duplicate basename collision

* fix(databricks,anthropic): defensive token defaults; document single-mode splitter

Address greptile P2 concerns:
- databricks: default usage token fields to 0 when constructing
  ChatCompletionUsageBlock from a partially populated usage block — matches
  the defensive pattern used in ollama/vertex_ai/cohere/bedrock.
- _CombinedChunkSplitter: clarify in the docstring that an instance is
  single-mode (sync or async, not both), since the two iteration paths hold
  independent upstream iterator references.

Co-authored-by: Claude <claude@anthropic.com>

---------

Co-authored-by: Steven Kessler <9701252+stvnksslr@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
2026-06-10 12:31:00 -07:00
Yassin Kortam
410b892f77
fix(register_model): preserve built-in cache pricing when registering custom overrides under unmapped keys (#30044)
* fix(spend-tracking): fall back to direct spend-counter increment when reservation reconcile fails

When the reservation-reconcile path in `_reconcile_budget_reservation_for_counter_update`
hits a Redis error, it now correctly returns an empty set so that
`increment_spend_counters` re-runs the direct increment for the affected counters.
Previously, the function logged the failure, invalidated the reserved counters, and
still returned the reserved counter keys, which caused the caller to skip the direct
increment. With the increment skipped and the counter deleted, the next request
reseeded the counter from `LiteLLM_VerificationToken.spend`, a column the batched
flusher only updates every few seconds, so the enforced cross-pod spend value
collapsed to a stale snapshot and budget gating stopped firing for affected keys.

Adds a regression test that exercises the failure path with a flaky redis backend
and asserts the actual response cost lands in the shared counter.

* fix(register_model): preserve built-in cache pricing when registering custom overrides under unmapped keys

When a custom-priced model is registered under a key shape that
get_model_info cannot resolve (e.g. litellm_params.model set to
bedrock/bedrock/us.anthropic.claude-sonnet-4-6 or another non-canonical
alias), register_model previously fell back to an empty existing_model.
The merged entry then carried only the fields the user set explicitly
(input/output cost, provider) and dropped cache pricing. Downstream the
cost calculator defaulted cache_creation_input_token_cost and
cache_read_input_token_cost to 0, silently dropping the bulk of the bill
for cache-heavy Anthropic traffic.

register_model now attempts to resolve a canonical built-in entry by
stripping provider prefixes, region prefixes, and provider-specific
suffixes before giving up. When a variant resolves, its defaults
(notably cache pricing) are inherited while the user's explicit overrides
still win. When nothing resolves and the user supplied no cache pricing,
it logs a warning instead of silently under-billing.

* fix(router): inherit built-in cache pricing on deployments with partial custom pricing

A deployment configured with only input_cost_per_token and output_cost_per_token
under model_info was being registered under its model_info.id with no cache cost
fields. The cost calculator then defaulted cache_creation_input_token_cost and
cache_read_input_token_cost to 0, silently billing cache_read and cache_creation
tokens at zero. For cache-heavy Anthropic traffic this drops the bulk of the bill.

When the deployment's litellm_params.model resolves to a built-in cost-map entry,
pull the cache pricing fields from there before registering. User-specified
cache fields still win on merge; only missing fields are inherited.

Pairs with the register_model fallback added earlier in this branch: that
handles unmapped key shapes like bedrock/bedrock/x, this handles deploy-id
keys whose backend model is mapped.

* fix(register_model): inherit only cache pricing on unmapped-key fallback, not provider

The unmapped-key fallback in register_model copied the entire resolved
built-in entry, so registering openai/command-r-plus inherited the cohere
built-in's litellm_provider and get_model_info(custom_llm_provider=openai)
could no longer resolve it. Restrict the fallback to the cache-pricing
fields, matching the router-side _inherit_builtin_cache_pricing, so the
cache-cost dropout stays fixed without clobbering the registered provider.

Add a direct unit test for Router._inherit_builtin_cache_pricing so the
router coverage check sees it, and pin the fixed spend-counter contract:
when reservation reconcile fails the counter must hold the directly
incremented cost rather than being left at None.
2026-06-10 12:11:03 -07:00
ryan-crabbe-berri
a75ed0079c
chore(ui): make knip recognize .mjs scripts and openapi-typescript (#30052)
The knip entry/project globs only matched scripts/**/*.ts, so the two
.mjs scripts went unanalyzed and produced "no matches" config hints.
openapi-typescript was also reported as unused because gen-api-types.mjs
invokes its binary through a dynamic execFileSync path that knip cannot
trace statically; ignoreDependencies records that it is genuinely used.
2026-06-10 11:44:24 -07:00
michelligabriele
f9293d40c4
fix(proxy): self-heal startup/reload prisma reads on engine disconnect (#28803) 2026-06-10 20:16:58 +02:00
Sameer Kankute
3b40ac987f
Litellm oss 090626 (#30021)
* fix(mcp): report scoped server name during initialize (#29865)

* fix mcp scoped server name

* Update litellm/proxy/_experimental/mcp_server/mcp_context.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* test(mcp): cover scoped server name in the SSE initialize handler

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(ui): show all session logs in the drawer, not just the first 50 (#29795)

* fix(ui): show newest session logs first

* test(ui): keep session log pagination coverage

* fix(ui): show all session logs in the drawer, not just the first page

The session detail drawer fetched session logs via sessionSpendLogsCall
without page/page_size, so it only ever received the backend default of one
page (50 rows). Sessions with more than 50 calls had the rest unreachable in
the UI (#29153).

sessionSpendLogsCall now takes page/page_size, and the drawer fetches the
first page, reads total_pages, then fetches the remaining pages and
accumulates them before the existing client-side sort. This keeps the single
continuous list (and the selected-log lookup and keyboard navigation, which
all assume the full session) correct. Fetching is bounded by a page cap, and
the sidebar shows a "showing most recent N" note if a session exceeds it.

The rows are lightweight metadata (the endpoint excludes messages/response),
so the full set is small; request/response bodies are still loaded per log on
demand.

* fix(ui): default session drawer to most recent log, newest first

Open a session with its most recent log selected, and order the sidebar
newest-first to match the all-sessions logs overview. MCP calls stay
grouped last. The latest log by time is computed explicitly, since the
MCP grouping means it is not always the first row.

* Apply fetching pages in batches suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(ui): derive session total from accumulated rows when backend omits it

Compute the session total after all pages are fetched, falling back to the
accumulated row count rather than the first page's. Guards the truncation
note against a backend response that omits total but spans multiple pages.

---------

Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(proxy): handle Mistral multipart passthrough (#29927)

* fix(proxy): handle Mistral multipart passthrough

* chore: satisfy passthrough ci formatting

* test(proxy): cover Mistral passthrough in CI shard

* fix(vertex_ai): use REP host for context caching on eu/us multi-region endpoints (#29573)

Context caching built the cachedContents URL as
https://{location}-aiplatform.googleapis.com, which is an invalid host for the
eu/us multi-region endpoints and returns 404. The inference path already
resolves these to the REP host (https://aiplatform.{geo}.rep.googleapis.com)
via get_vertex_base_url(); reuse that helper in
_get_token_and_url_context_caching so caching uses the same host as inference.

Adds tests covering the eu/us multi-region cachedContents URLs (v1 and
v1beta1).

Fixes #29571

* Support per-model encrypted content affinity config (#29760)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: propagate upstream status code in proxy API exception handler (#29402)

* fix: propagate upstream status code in proxy API exception handler

When Google GenAI / Vertex returns a 404 for deprecated or missing
models via streamGenerateContent, the exception was falling through to
a generic handler that defaulted to 500. Now provider exceptions
carrying a valid HTTP status_code correctly propagate it through to
the ProxyException.

* fix: apply black formatting to common_request_processing.py

* fix: tighten status code range to 400-599 and deduplicate ProxyException raise

* fix(tests): use valid vertex_location in context caching tests

Replace "test_location" (contains underscore) with "us-central1" so tests
pass the regex validation added in get_vertex_base_url().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sdk): add xAI OAuth provider (#29866)

* Add xAI OAuth provider

* Update oauth.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix xAI OAuth CI failures

* Add xAI OAuth coverage tests

* Move xAI OAuth coverage tests to core utils

* Address xAI OAuth review comments

* Prevent xAI OAuth api_base token exfiltration

* Treat blank xAI OAuth api keys as absent

* Wrap invalid xAI OAuth JSON responses

* Use xAI OAuth behind explicit flag

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(proxy) #27734 allow clearing budget_duration and team_member fields by sending null on /key/update and /team/update (#27751)

* fix(proxy): allow clearing budget_duration and team_member fields by sending null on /key/update and /team/update

Fixes #27734

Sending null for budget_duration, team_member_budget,
team_member_budget_duration, team_member_rpm_limit, or
team_member_tpm_limit via /key/update or /team/update returned 200 OK
but silently ignored the null value. The fields remained unchanged in
the database.

Root causes:
- /key/update: prepare_key_update_data() popped budget_duration from the
  update dict but never re-added it (or budget_reset_at) when the value
  was None.
- /team/update: _set_budget_reset_at() only acted when budget_duration
  was non-None, leaving a stale budget_reset_at in the DB.
- /team/update: team_member_* null values bypassed the budget table
  update entirely because should_create_budget() requires at least one
  non-None field.

* test(proxy): cover no-budget-row path in clear_team_member_budget_fields

* fix(presidio): unmask PII tokens in Anthropic native SSE streaming bytes (#30028)

* fix(presidio): unmask PII tokens in Anthropic native SSE streaming bytes

When output_parse_pii=true on the Anthropic native path (anthropic/claude-*),
response chunks arrive as raw bytes in SSE format. _stream_pii_unmasking was
yielding those bytes unchanged, so <PERSON_1> tokens were never replaced with
the original values before reaching the caller.

Add _unmask_sse_bytes_chunk to parse each data: line, find content_block_delta
/ text_delta events, and apply _unmask_pii_text before re-encoding. Wire it
into _stream_pii_unmasking so bytes chunks are unmasked when pii_tokens exist.

* fix(presidio): handle CRLF line endings and non-ASCII PII in SSE unmask

Strip trailing \r before the [DONE] guard so CRLF-terminated SSE chunks
don't bypass it and silently swallow a JSONDecodeError. Add
ensure_ascii=False to json.dumps so non-ASCII replacement values like
accented names are preserved as UTF-8 on the wire rather than being
\uXXXX-escaped. Add regression tests for both cases.

* feat(bedrock_mantle): path-aware Responses routing (/v1/responses vs /openai/v1/responses) (#29925)

* feat(bedrock_mantle): path-aware Responses routing (/v1/responses vs /openai/v1/responses)

Bedrock Mantle serves the Responses API on two upstream paths:
  - gpt frontier models (gpt-5.5 / gpt-5.4) on /openai/v1/responses
  - every other Responses-capable model (e.g. gpt-oss) on the standard /v1/responses

BedrockMantleResponsesAPIConfig gains a `use_openai_path` flag; the provider gate in
utils.py picks the path per model: openai.gpt-* (non gpt-oss) -> /openai/v1/responses;
any model declared mode=responses (price-map entry or user model_info) -> /v1/responses;
everything else returns None and keeps the existing chat-completions emulation.

Adds gpt-5.5 / gpt-5.4 price-map entries, registry wiring, and the routing-matrix tests.

* feat(bedrock_mantle): data-driven frontier routing via use_openai_responses_path

Addresses the Greptile review point that frontier detection should be a
price-map field rather than a hardcoded name match. The gate now routes a
model to /openai/v1/responses when its price-map entry declares
use_openai_responses_path, so a frontier model whose name does not follow the
openai.gpt- convention can be onboarded by JSON alone. The name-convention
check is kept as a fallback that needs no price-map entry, which preserves
zero-change routing for a future gpt-6 before its entry loads. gpt-5.5 / gpt-5.4
get the flag in both price maps. Adds tests for the data-driven flag path and
for the flag presence on the gpt-5.x entries; both branches are mutation-tested.

* test(model_prices): allow use_openai_responses_path in price-map schema

The model_prices_and_context_window.json schema validator
(test_aaamodel_prices_and_context_window_json_is_valid) enforces
additionalProperties: false, so the new use_openai_responses_path flag on the
gpt-5.5 / gpt-5.4 entries failed validation. Add it to the schema as a boolean,
alongside the other supports_* / capability flags.

* Add Tensormesh serverless models to the model cost map (#30037)

* Add Tensormesh serverless models to the model cost map

* Flag reasoning support on the Tensormesh models that expose thinking mode

* fix(proxy): invalidate stale key spend counter after budget reset or manual spend update (#30001)

* fix(proxy): reconcile stale key spend counter after budget reset

* fix(proxy): invalidate stale key spend counter after budget reset or manual spend update

* fix(proxy): remove read-time stale counter reconciliation to prevent budget bypass

* revert: undo unrelated formatting changes in enterprise directory

* test(proxy): add unit test for key spend update invalidating counter

* test(proxy): fix mocked update_data and hash token expectations in unit test

* fix(proxy): use Responses-API transformer in pass-through cost tracking (#29728)

The `elif is_responses:` branch of `openai_passthrough_handler` was
calling the chat-completions `transform_response` on a Responses API
payload. The chat-completions transformer expects `choices: [...]`
in the raw response; the Responses API uses `output: [...]` and
`usage.input_tokens` / `usage.output_tokens` (not
`prompt_tokens` / `completion_tokens`). The result was a
KeyError 'choices' deep inside `convert_to_model_response_object`,
swallowed by the surrounding `except Exception` in the handler, and
the SpendLogs row was written by the fallback path with zeroed-out
tokens, spend, and model.

This bug silently undercounts cost for every successful pass-through
call to either OpenAI's `/v1/responses` or Azure's
`/openai/v1/responses` (deployments configured for the Responses
API). Reproduced 2026-06-04 against a real Azure OpenAI Responses
API deployment proxied through LiteLLM v1.88.0.

Fix: use the dedicated
`OpenAIResponsesAPIConfig.transform_response_api_response` for the
Responses branch. This transformer already exists in LiteLLM
(`litellm/llms/openai/responses/transformation.py`) and knows the
Responses-API on-the-wire shape. `litellm.completion_cost` already
handles `ResponsesAPIResponse` natively with `call_type="responses"`,
so no downstream changes are needed.

Tests:

  test_responses_api_uses_responses_transformer_not_chat_completions
    NEW. Real regression test — exercises the openai_passthrough_handler
    with a real-shaped Responses payload (no `choices`, has `output`
    and Responses-API `usage` keys) and NO mocked `get_provider_config`.
    Pre-fix: raises KeyError 'choices' inside the chat-completions
    transformer (the bug). Post-fix: returns a ResponsesAPIResponse,
    completion_cost is called with call_type="responses" and a
    ResponsesAPIResponse instance (asserted).
    Verified to fail on un-fixed handler + pass on fixed handler
    before commit.

  test_responses_api_cost_tracking
    UPDATED. Old test mocked `get_provider_config` (no longer called
    in the responses branch post-fix). Now mocks the Responses
    transformer directly (`OpenAIResponsesAPIConfig.transform_response_api_response`)
    to test the downstream cost-calc contract.

Out of scope for this PR (separate followup):
  - Recognizing *.cognitiveservices.azure.com (the newer Azure
    OpenAI hostname) in the is_openai_*_route checks. Separate PR.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(skills): execute DB skills by matching the litellm_skill_ tool name prefix (#30116)

Skill IDs are generated as litellm_skill_<uuid> and the model-facing
tool name is the sanitized skill ID, but the post-call execution gates
in SkillsInjectionHook only ran tools whose name starts with "skill_",
so DB skills were silently returned to the client as raw tool calls.

Fixes #28122.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(anthropic): synthesize content_block_start when Responses stream omits output_item.added (#30115)

* fix(team): reserve team budget raises for proxy admins on /team/update (#30030)

The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a
team's spend ceiling has nothing to do with the admin's own key budget. That
comparison was an unintended side effect of reusing _check_user_team_limits()
(which exists for the /team/new path) and broke the UI, which re-sends the
unchanged budget on every save.

New behavior on /team/update for standalone teams:
- A team admin (already authorized via _verify_team_access) may freely KEEP or
  LOWER the team budget, and change models/tpm/rpm, without being gated by their
  personal limits.
- GROWING a team's spend ceiling is a budget-authority action reserved for proxy
  admins -> 403 for team admins. "Growing" covers both raising max_budget above
  the team's current finite value and removing the cap entirely (max_budget=null,
  detected via model_fields_set so an explicit null is distinguished from an
  omitted field). For a team that currently has no cap, setting a finite value is
  a restriction and is allowed.
- Org-scoped teams remain governed by _check_org_team_limits() (capped by the
  org budget).

Also reverts the #29525 existing_team_max_budget workaround in
_check_user_team_limits() back to the create-only form; /team/new still enforces
the creator's personal caps.

docs(access_control): resolve the contradiction in the team-admin section —
team admins can keep/lower the budget and manage rate limits/models, but cannot
raise the team budget (proxy-admin only).

tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team
admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed,
keep/lower/resend allowed, and unchanged create-path guards.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974)

* test(ui): add a data-driven App Router migration E2E smoke

Add a growing Playwright smoke for migrated pages: for each segment it deep-links
to the path route, asserts the URL and that the dashboard shell rendered, then
clicks off to a legacy page and asserts navigation still works. Driven by
e2e_tests/fixtures/migratedPages.ts, so adding a page is one line.

Runs in two situations against the same proxy: the default mount (npm run
e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root).
globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage
state is valid under a prefix. Seeded with api-reference; append the rest as their
migrations merge.

* test(ui): support headed slow-motion + watch pauses in the migration smoke

Honor SLOWMO in the server-root-path config (the default config already did),
and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state.
Both are no-ops by default, so CI behavior is unchanged.

* test(ui): make the migration smoke a sidebar-click user journey

Rework the smoke from deep-linking to a real navigation journey: start at the
landing page, click the migrated page in the sidebar (expanding submenus for
nested items), assert the path route rendered, reload it (the check a wrong
server_root_path breaks), bounce to a legacy page and back, and — once two pages
are migrated — navigate directly between two migrated pages. Verifies via URL +
shell render, driven by the same fixture list.

* test(ui): address review on the migration smoke

Escape ROOT and segment before interpolating them into RegExp URL matchers so a
future segment containing regex metacharacters can't silently widen the match.
Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead
of silently re-running the default mount and passing without exercising the prefix.

* test(ui): drop unused watch helper and fix stale smoke README

* test(ui): run the migration smoke under a server root path in CI

* test(ui): harden + instrument the server-root-path proxy reboot in CI

* test(ui): run the server-root-path migration smoke as its own CI job

Replace the in-place proxy reboot in e2e_ui_testing with a dedicated
e2e_ui_testing_server_root_path job that boots the proxy once with
SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the
config gets its own job rather than killing and relaunching the live proxy.

The reboot was failing deterministically: after pkill -9 and relaunch the
prefixed proxy never came back up on :4000 (connection refused), so the smoke
never ran. The readiness step that was supposed to surface the cause could
never reach its boot-log tail because CircleCI runs steps under bash -eo
pipefail and the preceding `curl -sv ... | tail` aborted the step with curl's
exit 7. Booting the proxy as the job's own background step lets any boot crash
land in that step's log instead of being swallowed.

The default e2e_ui_testing job is unchanged aside from dropping the reboot,
prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at
the root mount there via the default Playwright config.

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232)

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through

* test: mock post_call_response_headers_hook in audio speech route tests

* chore(ui): remove dead App Router route stubs under (dashboard) (#30045)

models-and-endpoints, organizations, and virtual-keys each had a page.tsx
route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and
deep links never resolve to it and the route is unreachable. Each was a thin
wrapper that handed the shared view empty or no-op props (empty modelData with
a no-op setModelData, hardcoded empty organizations, no-op
setUserRole/setUserEmail), so reaching one would render a degraded page in any
case. The real wrapper belongs in the PR that flips each page into
MIGRATED_PAGES, written with eyes on it and a test

This continues the dead-scaffolding cleanup from #28891. The shared components
these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay,
since the legacy ?page= switch in app/page.tsx and src/components still import
them

* fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000)

* fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session

* fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss

* fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041)

* fix(mcp): honor team access-group grants in OAuth authorize/token access check

* test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation

* docs(security): require a reproduction video for vulnerability reports (#30048) (#30063)

With AI models capable of automated vulnerability discovery now publicly
available, we expect a large increase in report volume, much of it
unverified. Requiring a video of the exploit running against a live
instance raises the bar for submissions and keeps triage focused on
reproducible issues. Reports without a video will be closed and reopened
if one is added later.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>

* feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796)

* feat(ui): add admin flag to disable in-product UI nudges for everyone

Admins can now suppress the survey and Claude Code feedback popups for
all users via a single disable_ui_nudges UI setting, instead of relying
on each user dismissing them individually.

* fix(ui): suppress nudges while ui settings are loading

Gate nudgesDisabled on the ui-settings loading state so an admin with
disable_ui_nudges on doesn't see the survey prompt flash, and the
getInProductNudgesCall fetch doesn't fire, on a cold page load before
the flag resolves. Falls back to showing nudges if the fetch errors.

* test(ui): wrap CreateKeyPage test in QueryClientProvider

page.tsx now calls useUISettings (react-query), which needs a
QueryClient that layout.tsx supplies in production but the test did
not. Add the provider and mock getUiSettings so the query resolves.

* chore(ui): remove dead dashboard files and unused dependencies (#30047)

* chore(ui): remove dead dashboard files and unused dependencies

knip flagged seven orphaned source/config files with no importers and
five declared dependencies that nothing in the tree uses. Removing them
shrinks the dashboard bundle's source surface and keeps the manifest
honest; vite stays installed transitively via vitest, so test tooling is
unaffected.

* fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow

The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec
(tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml
workflow step still depend on it, so the redirect e2e job failed to load a
config that no longer existed.

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009)

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593)

After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)

Restores the reverse-lookup for the JSONL body.model fallback path so that
legacy/pre-target_model_names managed files still map stripped provider IDs
back to proxy aliases before auth. Also cleans up redundant `or None`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)"

This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064)

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI

Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context,
128K output, adaptive thinking only) on the Anthropic API, Bedrock
converse (base, global, and us/eu geo inference profiles at the 10%
regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which
serves Fable 5 with the full 1M context window unlike Opus 4.8).

Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the
model in the setup wizard, and extends the reasoning effort e2e grid.
The Bedrock, Vertex, and Azure grid cells carry fail_reason markers
until the CI accounts are provisioned: Bedrock needs the provider data
sharing opt-in Fable 5 requires, and the Foundry resource needs a
claude-fable-5 deployment.

The first-party entry carries provider_specific_entry {us: 1.1} for the
inference_geo premium and deliberately no fast multiplier since Fable 5
has no fast mode.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drop removed sampling params for Claude 4.7+ when drop_params is set

Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects
top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was
forwarding them even with drop_params enabled because the Anthropic and
Bedrock converse transformations passed temperature/top_p through
unconditionally.

Mirror the GPT-5/o-series handling: temperature=1 still passes through,
other values and any top_p are dropped when drop_params is set, and
without drop_params a clean client-side UnsupportedParamsError tells the
caller how to opt in, instead of surfacing the raw provider error.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drive sampling param gating from the cost map and cover top_k

Greptile review follow-ups on the sampling param fix: the restriction for
Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false
on every affected cost map entry (perplexity excluded; that route is
OpenAI-compatible and maps sampling params upstream) and read back through
a tri-state map lookup, keeping the name check only as a fallback for
provider-routed ids whose hosted map entries predate the flag, the same
layering supports_adaptive_thinking uses. top_k bypasses map_openai_params
as a provider-specific kwarg, so it is gated at the shared
AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex,
Azure) and in the Bedrock converse _handle_top_k_value path, with
drop_params threaded through the converse transform helpers.

Also updates the reasoning effort grid cell count assertion for the four
Fable 5 rows added on this branch (29 x 11 cells).

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Declare supports_sampling_params in the cost map schema

The model map validation schema uses additionalProperties: false, so the
new flag must be declared for the 28 entries that carry it; this was the
one failing job (misc / Run tests) on the previous commit.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* fix(bedrock): gate top_k=0 on converse to match Anthropic boundary

Truthiness check let top_k=0 silently disappear on models that removed
sampling params, while AnthropicConfig.transform_request treats 0 as
present and raises UnsupportedParamsError (or drops when drop_params is
set). Switch to 'is not None' so converse, direct Anthropic, invoke,
Vertex, and Azure all behave the same for top_k=0.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix(anthropic): avoid index -1 content_block_delta in messages stream

When a /v1/messages request is routed through the Responses API
adapter, AnthropicResponsesStreamWrapper only emits content_block_start
on response.output_item.added. Some upstreams (LMStudio for example)
never send that event, so the text delta handler fell back to
_current_block_index, which starts at -1, and clients received
content_block_delta events with index -1 and no preceding
content_block_start. Anthropic SDKs then fail with "text part -1 not
found"

The text delta handler now synthesizes a content_block_start with a
fresh block index whenever the delta references an unregistered item_id
or no block is open yet, and registers the item_id so follow-up deltas
reuse the same index

Addresses the /v1/messages defect in #27442

* Make test sys.path shim resolve relative to the file, not the CWD

os.path.abspath("../../../../../../..") depends on where pytest is
invoked from; anchoring on os.path.dirname(__file__) makes the import
work from any working directory. Also corrects the depth: the repo root
is six levels above this file, not seven.

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: tin-berri <tin@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>

* fix: enable compact-2026-01-12 beta header for vertex_ai provider (#30114)

* fix(team): reserve team budget raises for proxy admins on /team/update (#30030)

The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a
team's spend ceiling has nothing to do with the admin's own key budget. That
comparison was an unintended side effect of reusing _check_user_team_limits()
(which exists for the /team/new path) and broke the UI, which re-sends the
unchanged budget on every save.

New behavior on /team/update for standalone teams:
- A team admin (already authorized via _verify_team_access) may freely KEEP or
  LOWER the team budget, and change models/tpm/rpm, without being gated by their
  personal limits.
- GROWING a team's spend ceiling is a budget-authority action reserved for proxy
  admins -> 403 for team admins. "Growing" covers both raising max_budget above
  the team's current finite value and removing the cap entirely (max_budget=null,
  detected via model_fields_set so an explicit null is distinguished from an
  omitted field). For a team that currently has no cap, setting a finite value is
  a restriction and is allowed.
- Org-scoped teams remain governed by _check_org_team_limits() (capped by the
  org budget).

Also reverts the #29525 existing_team_max_budget workaround in
_check_user_team_limits() back to the create-only form; /team/new still enforces
the creator's personal caps.

docs(access_control): resolve the contradiction in the team-admin section —
team admins can keep/lower the budget and manage rate limits/models, but cannot
raise the team budget (proxy-admin only).

tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team
admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed,
keep/lower/resend allowed, and unchanged create-path guards.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974)

* test(ui): add a data-driven App Router migration E2E smoke

Add a growing Playwright smoke for migrated pages: for each segment it deep-links
to the path route, asserts the URL and that the dashboard shell rendered, then
clicks off to a legacy page and asserts navigation still works. Driven by
e2e_tests/fixtures/migratedPages.ts, so adding a page is one line.

Runs in two situations against the same proxy: the default mount (npm run
e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root).
globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage
state is valid under a prefix. Seeded with api-reference; append the rest as their
migrations merge.

* test(ui): support headed slow-motion + watch pauses in the migration smoke

Honor SLOWMO in the server-root-path config (the default config already did),
and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state.
Both are no-ops by default, so CI behavior is unchanged.

* test(ui): make the migration smoke a sidebar-click user journey

Rework the smoke from deep-linking to a real navigation journey: start at the
landing page, click the migrated page in the sidebar (expanding submenus for
nested items), assert the path route rendered, reload it (the check a wrong
server_root_path breaks), bounce to a legacy page and back, and — once two pages
are migrated — navigate directly between two migrated pages. Verifies via URL +
shell render, driven by the same fixture list.

* test(ui): address review on the migration smoke

Escape ROOT and segment before interpolating them into RegExp URL matchers so a
future segment containing regex metacharacters can't silently widen the match.
Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead
of silently re-running the default mount and passing without exercising the prefix.

* test(ui): drop unused watch helper and fix stale smoke README

* test(ui): run the migration smoke under a server root path in CI

* test(ui): harden + instrument the server-root-path proxy reboot in CI

* test(ui): run the server-root-path migration smoke as its own CI job

Replace the in-place proxy reboot in e2e_ui_testing with a dedicated
e2e_ui_testing_server_root_path job that boots the proxy once with
SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the
config gets its own job rather than killing and relaunching the live proxy.

The reboot was failing deterministically: after pkill -9 and relaunch the
prefixed proxy never came back up on :4000 (connection refused), so the smoke
never ran. The readiness step that was supposed to surface the cause could
never reach its boot-log tail because CircleCI runs steps under bash -eo
pipefail and the preceding `curl -sv ... | tail` aborted the step with curl's
exit 7. Booting the proxy as the job's own background step lets any boot crash
land in that step's log instead of being swallowed.

The default e2e_ui_testing job is unchanged aside from dropping the reboot,
prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at
the root mount there via the default Playwright config.

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232)

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through

* test: mock post_call_response_headers_hook in audio speech route tests

* chore(ui): remove dead App Router route stubs under (dashboard) (#30045)

models-and-endpoints, organizations, and virtual-keys each had a page.tsx
route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and
deep links never resolve to it and the route is unreachable. Each was a thin
wrapper that handed the shared view empty or no-op props (empty modelData with
a no-op setModelData, hardcoded empty organizations, no-op
setUserRole/setUserEmail), so reaching one would render a degraded page in any
case. The real wrapper belongs in the PR that flips each page into
MIGRATED_PAGES, written with eyes on it and a test

This continues the dead-scaffolding cleanup from #28891. The shared components
these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay,
since the legacy ?page= switch in app/page.tsx and src/components still import
them

* fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000)

* fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session

* fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss

* fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041)

* fix(mcp): honor team access-group grants in OAuth authorize/token access check

* test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation

* docs(security): require a reproduction video for vulnerability reports (#30048) (#30063)

With AI models capable of automated vulnerability discovery now publicly
available, we expect a large increase in report volume, much of it
unverified. Requiring a video of the exploit running against a live
instance raises the bar for submissions and keeps triage focused on
reproducible issues. Reports without a video will be closed and reopened
if one is added later.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>

* feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796)

* feat(ui): add admin flag to disable in-product UI nudges for everyone

Admins can now suppress the survey and Claude Code feedback popups for
all users via a single disable_ui_nudges UI setting, instead of relying
on each user dismissing them individually.

* fix(ui): suppress nudges while ui settings are loading

Gate nudgesDisabled on the ui-settings loading state so an admin with
disable_ui_nudges on doesn't see the survey prompt flash, and the
getInProductNudgesCall fetch doesn't fire, on a cold page load before
the flag resolves. Falls back to showing nudges if the fetch errors.

* test(ui): wrap CreateKeyPage test in QueryClientProvider

page.tsx now calls useUISettings (react-query), which needs a
QueryClient that layout.tsx supplies in production but the test did
not. Add the provider and mock getUiSettings so the query resolves.

* chore(ui): remove dead dashboard files and unused dependencies (#30047)

* chore(ui): remove dead dashboard files and unused dependencies

knip flagged seven orphaned source/config files with no importers and
five declared dependencies that nothing in the tree uses. Removing them
shrinks the dashboard bundle's source surface and keeps the manifest
honest; vite stays installed transitively via vitest, so test tooling is
unaffected.

* fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow

The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec
(tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml
workflow step still depend on it, so the redirect e2e job failed to load a
config that no longer existed.

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009)

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593)

After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)

Restores the reverse-lookup for the JSONL body.model fallback path so that
legacy/pre-target_model_names managed files still map stripped provider IDs
back to proxy aliases before auth. Also cleans up redundant `or None`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)"

This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064)

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI

Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context,
128K output, adaptive thinking only) on the Anthropic API, Bedrock
converse (base, global, and us/eu geo inference profiles at the 10%
regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which
serves Fable 5 with the full 1M context window unlike Opus 4.8).

Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the
model in the setup wizard, and extends the reasoning effort e2e grid.
The Bedrock, Vertex, and Azure grid cells carry fail_reason markers
until the CI accounts are provisioned: Bedrock needs the provider data
sharing opt-in Fable 5 requires, and the Foundry resource needs a
claude-fable-5 deployment.

The first-party entry carries provider_specific_entry {us: 1.1} for the
inference_geo premium and deliberately no fast multiplier since Fable 5
has no fast mode.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drop removed sampling params for Claude 4.7+ when drop_params is set

Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects
top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was
forwarding them even with drop_params enabled because the Anthropic and
Bedrock converse transformations passed temperature/top_p through
unconditionally.

Mirror the GPT-5/o-series handling: temperature=1 still passes through,
other values and any top_p are dropped when drop_params is set, and
without drop_params a clean client-side UnsupportedParamsError tells the
caller how to opt in, instead of surfacing the raw provider error.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drive sampling param gating from the cost map and cover top_k

Greptile review follow-ups on the sampling param fix: the restriction for
Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false
on every affected cost map entry (perplexity excluded; that route is
OpenAI-compatible and maps sampling params upstream) and read back through
a tri-state map lookup, keeping the name check only as a fallback for
provider-routed ids whose hosted map entries predate the flag, the same
layering supports_adaptive_thinking uses. top_k bypasses map_openai_params
as a provider-specific kwarg, so it is gated at the shared
AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex,
Azure) and in the Bedrock converse _handle_top_k_value path, with
drop_params threaded through the converse transform helpers.

Also updates the reasoning effort grid cell count assertion for the four
Fable 5 rows added on this branch (29 x 11 cells).

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Declare supports_sampling_params in the cost map schema

The model map validation schema uses additionalProperties: false, so the
new flag must be declared for the 28 entries that carry it; this was the
one failing job (misc / Run tests) on the previous commit.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* fix(bedrock): gate top_k=0 on converse to match Anthropic boundary

Truthiness check let top_k=0 silently disappear on models that removed
sampling params, while AnthropicConfig.transform_request treats 0 as
present and raises UnsupportedParamsError (or drops when drop_params is
set). Switch to 'is not None' so converse, direct Anthropic, invoke,
Vertex, and Azure all behave the same for top_k=0.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix: enable compact-2026-01-12 beta header for vertex_ai provider

The vertex_ai block in anthropic_beta_headers_config.json mapped
compact-2026-01-12 to null, so update_headers_with_filtered_beta
stripped the header before the request reached Vertex while the
compact_20260112 context edit stayed in the body, and Vertex rejected
the request with HTTP 400. Vertex rawPredict accepts the header, and
the bedrock and databricks blocks already forward it. Mirrors #21867,
which enabled context-1m-2025-08-07 for vertex_ai the same way.

Fixes #27290.

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: tin-berri <tin@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>

* fix(proxy): coerce litellm_settings.max_budget env var to float (#30113)

* fix(team): reserve team budget raises for proxy admins on /team/update (#30030)

The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a
team's spend ceiling has nothing to do with the admin's own key budget. That
comparison was an unintended side effect of reusing _check_user_team_limits()
(which exists for the /team/new path) and broke the UI, which re-sends the
unchanged budget on every save.

New behavior on /team/update for standalone teams:
- A team admin (already authorized via _verify_team_access) may freely KEEP or
  LOWER the team budget, and change models/tpm/rpm, without being gated by their
  personal limits.
- GROWING a team's spend ceiling is a budget-authority action reserved for proxy
  admins -> 403 for team admins. "Growing" covers both raising max_budget above
  the team's current finite value and removing the cap entirely (max_budget=null,
  detected via model_fields_set so an explicit null is distinguished from an
  omitted field). For a team that currently has no cap, setting a finite value is
  a restriction and is allowed.
- Org-scoped teams remain governed by _check_org_team_limits() (capped by the
  org budget).

Also reverts the #29525 existing_team_max_budget workaround in
_check_user_team_limits() back to the create-only form; /team/new still enforces
the creator's personal caps.

docs(access_control): resolve the contradiction in the team-admin section —
team admins can keep/lower the budget and manage rate limits/models, but cannot
raise the team budget (proxy-admin only).

tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team
admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed,
keep/lower/resend allowed, and unchanged create-path guards.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974)

* test(ui): add a data-driven App Router migration E2E smoke

Add a growing Playwright smoke for migrated pages: for each segment it deep-links
to the path route, asserts the URL and that the dashboard shell rendered, then
clicks off to a legacy page and asserts navigation still works. Driven by
e2e_tests/fixtures/migratedPages.ts, so adding a page is one line.

Runs in two situations against the same proxy: the default mount (npm run
e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root).
globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage
state is valid under a prefix. Seeded with api-reference; append the rest as their
migrations merge.

* test(ui): support headed slow-motion + watch pauses in the migration smoke

Honor SLOWMO in the server-root-path config (the default config already did),
and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state.
Both are no-ops by default, so CI behavior is unchanged.

* test(ui): make the migration smoke a sidebar-click user journey

Rework the smoke from deep-linking to a real navigation journey: start at the
landing page, click the migrated page in the sidebar (expanding submenus for
nested items), assert the path route rendered, reload it (the check a wrong
server_root_path breaks), bounce to a legacy page and back, and — once two pages
are migrated — navigate directly between two migrated pages. Verifies via URL +
shell render, driven by the same fixture list.

* test(ui): address review on the migration smoke

Escape ROOT and segment before interpolating them into RegExp URL matchers so a
future segment containing regex metacharacters can't silently widen the match.
Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead
of silently re-running the default mount and passing without exercising the prefix.

* test(ui): drop unused watch helper and fix stale smoke README

* test(ui): run the migration smoke under a server root path in CI

* test(ui): harden + instrument the server-root-path proxy reboot in CI

* test(ui): run the server-root-path migration smoke as its own CI job

Replace the in-place proxy reboot in e2e_ui_testing with a dedicated
e2e_ui_testing_server_root_path job that boots the proxy once with
SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the
config gets its own job rather than killing and relaunching the live proxy.

The reboot was failing deterministically: after pkill -9 and relaunch the
prefixed proxy never came back up on :4000 (connection refused), so the smoke
never ran. The readiness step that was supposed to surface the cause could
never reach its boot-log tail because CircleCI runs steps under bash -eo
pipefail and the preceding `curl -sv ... | tail` aborted the step with curl's
exit 7. Booting the proxy as the job's own background step lets any boot crash
land in that step's log instead of being swallowed.

The default e2e_ui_testing job is unchanged aside from dropping the reboot,
prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at
the root mount there via the default Playwright config.

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232)

* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through

* test: mock post_call_response_headers_hook in audio speech route tests

* chore(ui): remove dead App Router route stubs under (dashboard) (#30045)

models-and-endpoints, organizations, and virtual-keys each had a page.tsx
route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and
deep links never resolve to it and the route is unreachable. Each was a thin
wrapper that handed the shared view empty or no-op props (empty modelData with
a no-op setModelData, hardcoded empty organizations, no-op
setUserRole/setUserEmail), so reaching one would render a degraded page in any
case. The real wrapper belongs in the PR that flips each page into
MIGRATED_PAGES, written with eyes on it and a test

This continues the dead-scaffolding cleanup from #28891. The shared components
these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay,
since the legacy ?page= switch in app/page.tsx and src/components still import
them

* fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000)

* fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session

* fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss

* fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041)

* fix(mcp): honor team access-group grants in OAuth authorize/token access check

* test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation

* docs(security): require a reproduction video for vulnerability reports (#30048) (#30063)

With AI models capable of automated vulnerability discovery now publicly
available, we expect a large increase in report volume, much of it
unverified. Requiring a video of the exploit running against a live
instance raises the bar for submissions and keeps triage focused on
reproducible issues. Reports without a video will be closed and reopened
if one is added later.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>

* feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796)

* feat(ui): add admin flag to disable in-product UI nudges for everyone

Admins can now suppress the survey and Claude Code feedback popups for
all users via a single disable_ui_nudges UI setting, instead of relying
on each user dismissing them individually.

* fix(ui): suppress nudges while ui settings are loading

Gate nudgesDisabled on the ui-settings loading state so an admin with
disable_ui_nudges on doesn't see the survey prompt flash, and the
getInProductNudgesCall fetch doesn't fire, on a cold page load before
the flag resolves. Falls back to showing nudges if the fetch errors.

* test(ui): wrap CreateKeyPage test in QueryClientProvider

page.tsx now calls useUISettings (react-query), which needs a
QueryClient that layout.tsx supplies in production but the test did
not. Add the provider and mock getUiSettings so the query resolves.

* chore(ui): remove dead dashboard files and unused dependencies (#30047)

* chore(ui): remove dead dashboard files and unused dependencies

knip flagged seven orphaned source/config files with no importers and
five declared dependencies that nothing in the tree uses. Removing them
shrinks the dashboard bundle's source surface and keeps the manifest
honest; vite stays installed transitively via vitest, so test tooling is
unaffected.

* fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow

The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec
(tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml
workflow step still depend on it, so the redirect e2e job failed to load a
config that no longer existed.

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009)

* fix(proxy): authorize batch files using upload target_model_names (LIT-3593)

After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)

Restores the reverse-lookup for the JSONL body.model fallback path so that
legacy/pre-target_model_names managed files still map stripped provider IDs
back to proxy aliases before auth. Also cleans up redundant `or None`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)"

This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064)

* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI

Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context,
128K output, adaptive thinking only) on the Anthropic API, Bedrock
converse (base, global, and us/eu geo inference profiles at the 10%
regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which
serves Fable 5 with the full 1M context window unlike Opus 4.8).

Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the
model in the setup wizard, and extends the reasoning effort e2e grid.
The Bedrock, Vertex, and Azure grid cells carry fail_reason markers
until the CI accounts are provisioned: Bedrock needs the provider data
sharing opt-in Fable 5 requires, and the Foundry resource needs a
claude-fable-5 deployment.

The first-party entry carries provider_specific_entry {us: 1.1} for the
inference_geo premium and deliberately no fast multiplier since Fable 5
has no fast mode.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drop removed sampling params for Claude 4.7+ when drop_params is set

Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects
top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was
forwarding them even with drop_params enabled because the Anthropic and
Bedrock converse transformations passed temperature/top_p through
unconditionally.

Mirror the GPT-5/o-series handling: temperature=1 still passes through,
other values and any top_p are dropped when drop_params is set, and
without drop_params a clean client-side UnsupportedParamsError tells the
caller how to opt in, instead of surfacing the raw provider error.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drive sampling param gating from the cost map and cover top_k

Greptile review follow-ups on the sampling param fix: the restriction for
Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false
on every affected cost map entry (perplexity excluded; that route is
OpenAI-compatible and maps sampling params upstream) and read back through
a tri-state map lookup, keeping the name check only as a fallback for
provider-routed ids whose hosted map entries predate the flag, the same
layering supports_adaptive_thinking uses. top_k bypasses map_openai_params
as a provider-specific kwarg, so it is gated at the shared
AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex,
Azure) and in the Bedrock converse _handle_top_k_value path, with
drop_params threaded through the converse transform helpers.

Also updates the reasoning effort grid cell count assertion for the four
Fable 5 rows added on this branch (29 x 11 cells).

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Declare supports_sampling_params in the cost map schema

The model map validation schema uses additionalProperties: false, so the
new flag must be declared for the 28 entries that carry it; this was the
one failing job (misc / Run tests) on the previous commit.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* fix(bedrock): gate top_k=0 on converse to match Anthropic boundary

Truthiness check let top_k=0 silently disappear on models that removed
sampling params, while AnthropicConfig.transform_request treats 0 as
present and raises UnsupportedParamsError (or drops when drop_params is
set). Switch to 'is not None' so converse, direct Anthropic, invoke,
Vertex, and Azure all behave the same for top_k=0.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix(proxy): coerce litellm_settings.max_budget env var to float

When max_budget is set in litellm_settings via os.environ/MAX_BUDGET,
the env var resolves to a string and the generic setattr branch in
ProxyConfig.load_config stored it as-is, so the startup check
litellm.max_budget > 0 raised TypeError. The earlier fix (#23855) only
covered the CLI initialize() path. Coerce the value to float in the
settings loop, matching the existing max_internal_user_budget handling.

Fixes #26696.

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: tin-berri <tin@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>

* fix(router): don't drop bedrock pass-through deployments using IAM credentials (#30111)

* Fix Bedrock passthrough deployment dropped when using IAM credentials

Bedrock deployments with use_in_pass_through enabled and IAM/OIDC auth
(aws_role_name, no api_key) hit the generic pass-through branch in
Router._initialize_deployment_for_pass_through, which calls
set_pass_through_credentials and raises "api_key is required". The
exception drops the deployment from the router entirely, breaking both
passthrough and normal routing for that model.

Skip the credential store write when no api_key is set; the bedrock
passthrough route resolves AWS credentials at request time via
BedrockConverseLLM.get_credentials(), not the passthrough credential
store, so there is nothing to register here.

Fixes #27728.

* Reset passthrough credentials singleton before api_key credential test

The test reads the module-level passthrough_endpoint_router singleton,
so a stale "openai" entry written by an earlier test in the same
process could make the assertion pass without exercising the code path.
Clearing the credentials dict up front makes the test order-independent.

* fix(sdk): stop mirroring reasoning_content in provider_specific_fields (#30110)

The dict-to-response conversion path mirrored reasoning_content into
provider_specific_fields, while live provider transforms (Anthropic's
_build_provider_specific_fields) only set it top-level on the Message.
Cache-replayed messages therefore serialized differently from live
ones, breaking disk cache key stability for multi-turn conversations
with extended thinking.

The mirror was added for DeepSeek before Message.reasoning_content
existed as a top-level attribute. The top-level field is still set by
the converter, so DeepSeek's request-side promotion is unaffected.

Fixes #27337.

* fix(mcp): coerce mcp_server_cost_info values to float at ingest (#30109)

* fix(mcp): coerce mcp_server_cost_info values to float at ingest

YAML 1.1 parses scientific notation without a decimal point
(e.g. 7e-05) as a string, and MCPServerCostInfo is a TypedDict with no
runtime validation, so a string-typed default_cost_per_query from
config.yaml flowed through the proxy untouched and crashed the MCP
server settings page with '.toFixed is not a function'. Normalize
mcp_server_cost_info on both the config and DB load paths, dropping
non-numeric values with a warning instead of failing the server load.

Fixes #27097.

* fix(mcp): drop non-numeric default_cost_per_query instead of nulling it

Keeping the key with a None value still exposes a null to the UI,
which can crash .toFixed formatting when the consumer checks key
existence rather than truthiness. Delete the key on coercion failure,
matching how non-numeric per-tool cost entries are already omitted.

* fix(proxy): count embedding and text completion tokens toward TPM limits (#30105)

* fix(proxy): count embedding and text completion tokens toward TPM limits

The parallel request limiters only read token usage off ModelResponse,
so EmbeddingResponse and TextCompletionResponse objects left
total_tokens at 0 and the per key, user, team, and end user TPM
counters never incremented. Requests to /v1/embeddings and
/v1/completions were effectively free against any tpm_limit. In the v3
limiter this was worse: the post-call reconciliation computed actual
usage as 0 and refunded the pre-call reservation made at request time.

Broaden the isinstance checks to accept EmbeddingResponse and
TextCompletionResponse, which both expose a Usage object, at the four
per-scope sites in parallel_request_limiter.py and at the usage
extraction in parallel_request_limiter_v3.py. ResponsesAPIResponse was
already covered in v3 via BaseLiteLLMOpenAIResponseObject.

Fixes #27738.

* test(proxy): cover v1 limiter TPM counting for embedding and text completion responses

Exercise the broadened isinstance sites in parallel_request_limiter.py
by asserting that async_log_success_event adds total_tokens to the per
key, user, team, and end user TPM counters for EmbeddingResponse and
TextCompletionResponse objects. The counters are pre-seeded at zero so
the assertion is exactly the increment; on the pre-fix code these
responses left total_tokens at 0 and the test fails.

* fix(openai): forward client headers on the text completion path (#30103)

* fix(openai): forward client headers on the text completion path

litellm.completion() merges caller headers with extra_headers, but the
text-completion-openai branch never passed the merged dict to
openai_text_completions.completion(), and the handler only used its
headers argument for logging. Pass the merged headers through the call
site and set them as extra_headers on the outgoing request, mirroring
the chat completion handler, so x-* client headers forwarded by the
proxy reach the provider on /v1/completions.

Fixes #27410.

* Drop redundant extra_headers assignment and fix test module collision

completion() merges extra_headers into headers before the
text-completion-openai branch, and the handler now sets the merged
headers as extra_headers on the request, so the branch-local
optional_params["extra_headers"] assignment was a dead duplicate.
Removing it keeps the assignment in one place while both entry paths
(litellm.text_completion and direct handler callers) still forward
headers; a new regression test pins the extra_headers kwarg path.

Also rename the test module to test_completion_handler.py since its
basename collided with tests/test_litellm/llms/bedrock/batches/
test_handler.py and broke pytest collection.

* fix(bedrock): route Anthropic-shape count_tokens to InvokeModel and base64-encode the body (#30102)

* fix(bedrock): route Anthropic-shape count_tokens to InvokeModel

POST /v1/messages/count_tokens with Anthropic content blocks
({"type": "text"|"tool_use"|...}) was routed to the Converse input of
the Bedrock CountTokens API. The Converse transform copies list content
through verbatim, so Bedrock rejected the request with a 400 and the
caller silently fell back to the local tokenizer, returning counts that
can be off by ~50% on tool-heavy payloads.

_detect_input_type now routes messages whose content blocks carry a
"type" key (Anthropic shape) to the invokeModel input, which forwards
the body verbatim. The invokeModel body is now base64-encoded as the
CountTokens API requires (InvokeModelTokensRequest.body is a
base64-encoded blob), and Anthropic Messages bodies get the
anthropic_version and max_tokens fields Bedrock validates against.

Fixes #27632.

* refactor(bedrock): name the CountTokens max_tokens placeholder

Replace the magic 1024 with a module-level
DEFAULT_ANTHROPIC_INVOKE_MODEL_MAX_TOKENS constant so the intent is
explicit and there is a single place to update if Bedrock's InvokeModel
schema ever changes. Module-local rather than litellm/constants.py
because the value is only a schema-validation placeholder for token
counting, not a user-tunable generation default.

* Add above-512k pricing tier for MiniMax-M3 and correct its base rates (#30095)

* Add above-512k pricing tier support for MiniMax-M3

MiniMax-M3 doubles its per-token rates once a prompt exceeds 512k
input tokens. The tiered cost parser already handles arbitrary
thresholds, but get_model_info only copies whitelisted keys from
ModelInfoBase, which had no 512k variants, so above_512k keys were
silently dropped and long-context requests were priced at the flat
rate.

Add the input, output, and cache-read above_512k_tokens fields to
ModelInfoBase and pass them through in get_model_info. Update the
minimax/MiniMax-M3 entry with the tiered rates and correct the base
rates, which matched the above-512k tier instead of the published
base tier (https://platform.minimax.io/docs/guides/pricing-paygo).

Fixes #29663.

* Add above-512k keys to pricing schema, set MiniMax-M3 context to 1M

Register the three new above_512k_tokens cost keys in the INTENDED_SCHEMA
of test_aaamodel_prices_and_context_window_json_is_valid, declared the same
way as the existing above_200k/above_272k tier keys, so the schema check
accepts the MiniMax-M3 tiered pricing entry.

Also raise MiniMax-M3 max_input_tokens from 512000 to 1000000 in both
pricing JSONs. The MiniMax API docs
(https://platform.minimax.io/docs/guides/text-generation) state the model
supports a 1,000,000-token context window, and the pay-as-you-go pricing
page (https://platform.minimax.io/docs/guides/pricing-paygo) prices input
above 512k tokens, which only makes sense if inputs beyond 512k are
accepted. This makes the above-512k pricing tier reachable.

* fix(bedrock): make document names unique across conversation turns (#30093)

* fix(bedrock): make document names unique across conversation turns

PR #16275 derived Bedrock document names purely from a content hash so
that names stay deterministic for prompt caching. When the same PDF or
document appears in more than one conversation turn, every occurrence
gets the identical name and Bedrock rejects the request with "Messages
can not contain duplicate document names".

Add _rename_duplicate_bedrock_document_names, a post-pass over the
assembled message blocks that keeps the first occurrence's hash-based
name and appends a positional suffix (_2, _3, ...) to later
occurrences. Apply it in both _bedrock_converse_messages_pt and
_bedrock_converse_messages_pt_async. Names remain deterministic across
requests and the first occurrence is unchanged, so prompt cache
prefixes stay stable.

Fixes #29418.

* fix(bedrock): avoid suffix collisions with organic document names

A renamed duplicate could collide with a document whose hash-derived
name already ends in the same positional suffix (e.g. an organic
report_2 next to two documents named report). Collect every document
name up front and bump the suffix until the candidate is unused, so
renames can collide neither with organic names nor with each other.

* fix(_types): remove ResponsesAPIResponse from PassThroughEndpointLoggingResultValues

The import of ResponsesAPIResponse was removed from the file but a usage
was left in the Union type, causing a NameError on import and breaking
all CI tests. Remove the stale reference to match the cleanup intent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(_types): restore ResponsesAPIResponse import and add use_xai_oauth to filter list

Two related fixes:
1. Re-add ResponsesAPIResponse import in _types.py — it was removed but still
   needed in PassThroughEndpointLoggingResultValues (used in
   openai_passthrough_logging_handler.py).
2. Add use_xai_oauth to all_litellm_params so it is filtered before forwarding
   kwargs to providers like OpenAI that do not recognize it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hari <kancharla.ha@northeastern.edu>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Ceder Dens <ceder.dens@uantwerpen.be>
Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com>
Co-authored-by: victoruce <161634297+victoruce@users.noreply.github.com>
Co-authored-by: kejunleng <33445544+silencedoctor@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Tyson Cung <45380903+tysoncung@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Jeremy Chapeau <113923302+jychp@users.noreply.github.com>
Co-authored-by: Daan <255322319+daanhendrio@users.noreply.github.com>
Co-authored-by: Avani Prajapati <143805019+Avani-prajapati@users.noreply.github.com>
Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com>
Co-authored-by: daitran-tensormesh <dai@tensormesh.ai>
Co-authored-by: Dimitris Spachos <dspachos@gmail.com>
Co-authored-by: Liam Scott <liam@uilliam.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: tin-berri <tin@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
2026-06-10 10:34:07 -07:00
michelligabriele
2fe9feda71
fix(caching): restore stored prompt_tokens on embedding cache hits instead of recomputing (#30046) 2026-06-10 15:49:20 +05:30
Mateo Wang
e15b37a18e
Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064)
* Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI

Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context,
128K output, adaptive thinking only) on the Anthropic API, Bedrock
converse (base, global, and us/eu geo inference profiles at the 10%
regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which
serves Fable 5 with the full 1M context window unlike Opus 4.8).

Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the
model in the setup wizard, and extends the reasoning effort e2e grid.
The Bedrock, Vertex, and Azure grid cells carry fail_reason markers
until the CI accounts are provisioned: Bedrock needs the provider data
sharing opt-in Fable 5 requires, and the Foundry resource needs a
claude-fable-5 deployment.

The first-party entry carries provider_specific_entry {us: 1.1} for the
inference_geo premium and deliberately no fast multiplier since Fable 5
has no fast mode.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drop removed sampling params for Claude 4.7+ when drop_params is set

Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects
top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was
forwarding them even with drop_params enabled because the Anthropic and
Bedrock converse transformations passed temperature/top_p through
unconditionally.

Mirror the GPT-5/o-series handling: temperature=1 still passes through,
other values and any top_p are dropped when drop_params is set, and
without drop_params a clean client-side UnsupportedParamsError tells the
caller how to opt in, instead of surfacing the raw provider error.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Drive sampling param gating from the cost map and cover top_k

Greptile review follow-ups on the sampling param fix: the restriction for
Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false
on every affected cost map entry (perplexity excluded; that route is
OpenAI-compatible and maps sampling params upstream) and read back through
a tri-state map lookup, keeping the name check only as a fallback for
provider-routed ids whose hosted map entries predate the flag, the same
layering supports_adaptive_thinking uses. top_k bypasses map_openai_params
as a provider-specific kwarg, so it is gated at the shared
AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex,
Azure) and in the Bedrock converse _handle_top_k_value path, with
drop_params threaded through the converse transform helpers.

Also updates the reasoning effort grid cell count assertion for the four
Fable 5 rows added on this branch (29 x 11 cells).

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* Declare supports_sampling_params in the cost map schema

The model map validation schema uses additionalProperties: false, so the
new flag must be declared for the 28 entries that carry it; this was the
one failing job (misc / Run tests) on the previous commit.

https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm

* fix(bedrock): gate top_k=0 on converse to match Anthropic boundary

Truthiness check let top_k=0 silently disappear on models that removed
sampling params, while AnthropicConfig.transform_request treats 0 as
present and raises UnsupportedParamsError (or drops when drop_params is
set). Switch to 'is not None' so converse, direct Anthropic, invoke,
Vertex, and Azure all behave the same for top_k=0.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-06-10 08:50:15 +05:30
Sameer Kankute
2cd7e87485
fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009)
* fix(proxy): authorize batch files using upload target_model_names (LIT-3593)

After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)

Restores the reverse-lookup for the JSONL body.model fallback path so that
legacy/pre-target_model_names managed files still map stripped provider IDs
back to proxy aliases before auth. Also cleans up redundant `or None`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)"

This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 08:22:15 +05:30
ryan-crabbe-berri
9e0d92c129
chore(ui): remove dead dashboard files and unused dependencies (#30047)
* chore(ui): remove dead dashboard files and unused dependencies

knip flagged seven orphaned source/config files with no importers and
five declared dependencies that nothing in the tree uses. Removing them
shrinks the dashboard bundle's source surface and keeps the manifest
honest; vite stays installed transitively via vitest, so test tooling is
unaffected.

* fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow

The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec
(tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml
workflow step still depend on it, so the redirect e2e job failed to load a
config that no longer existed.
2026-06-09 17:54:38 -07:00
ryan-crabbe-berri
248176112e
feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796)
* feat(ui): add admin flag to disable in-product UI nudges for everyone

Admins can now suppress the survey and Claude Code feedback popups for
all users via a single disable_ui_nudges UI setting, instead of relying
on each user dismissing them individually.

* fix(ui): suppress nudges while ui settings are loading

Gate nudgesDisabled on the ui-settings loading state so an admin with
disable_ui_nudges on doesn't see the survey prompt flash, and the
getInProductNudgesCall fetch doesn't fire, on a cold page load before
the flag resolves. Falls back to showing nudges if the fetch errors.

* test(ui): wrap CreateKeyPage test in QueryClientProvider

page.tsx now calls useUISettings (react-query), which needs a
QueryClient that layout.tsx supplies in production but the test did
not. Add the provider and mock getUiSettings so the query resolves.
2026-06-09 17:45:42 -07:00
yuneng-jiang
50522157dc
docs(security): require a reproduction video for vulnerability reports (#30048) (#30063)
With AI models capable of automated vulnerability discovery now publicly
available, we expect a large increase in report volume, much of it
unverified. Requiring a video of the exploit running against a live
instance raises the bar for submissions and keeps triage focused on
reproducible issues. Reports without a video will be closed and reopened
if one is added later.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
2026-06-09 14:59:50 -07:00
tin-berri
5b7063d194
fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041)
* fix(mcp): honor team access-group grants in OAuth authorize/token access check

* test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation
2026-06-09 14:19:11 -07:00
tin-berri
d8fe091938
fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000)
* fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session

* fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss
2026-06-09 14:18:28 -07:00
ryan-crabbe-berri
38edf241a4
chore(ui): remove dead App Router route stubs under (dashboard) (#30045)
models-and-endpoints, organizations, and virtual-keys each had a page.tsx
route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and
deep links never resolve to it and the route is unreachable. Each was a thin
wrapper that handed the shared view empty or no-op props (empty modelData with
a no-op setModelData, hardcoded empty organizations, no-op
setUserRole/setUserEmail), so reaching one would render a degraded page in any
case. The real wrapper belongs in the PR that flips each page into
MIGRATED_PAGES, written with eyes on it and a test

This continues the dead-scaffolding cleanup from #28891. The shared components
these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay,
since the legacy ?page= switch in app/page.tsx and src/components still import
them
2026-06-09 14:05:09 -07:00
michelligabriele
fe60f9d0f1
fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232)
* fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through

* test: mock post_call_response_headers_hook in audio speech route tests
2026-06-09 22:10:23 +02:00
ryan-crabbe-berri
6ae8a509f0
test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974)
* test(ui): add a data-driven App Router migration E2E smoke

Add a growing Playwright smoke for migrated pages: for each segment it deep-links
to the path route, asserts the URL and that the dashboard shell rendered, then
clicks off to a legacy page and asserts navigation still works. Driven by
e2e_tests/fixtures/migratedPages.ts, so adding a page is one line.

Runs in two situations against the same proxy: the default mount (npm run
e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root).
globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage
state is valid under a prefix. Seeded with api-reference; append the rest as their
migrations merge.

* test(ui): support headed slow-motion + watch pauses in the migration smoke

Honor SLOWMO in the server-root-path config (the default config already did),
and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state.
Both are no-ops by default, so CI behavior is unchanged.

* test(ui): make the migration smoke a sidebar-click user journey

Rework the smoke from deep-linking to a real navigation journey: start at the
landing page, click the migrated page in the sidebar (expanding submenus for
nested items), assert the path route rendered, reload it (the check a wrong
server_root_path breaks), bounce to a legacy page and back, and — once two pages
are migrated — navigate directly between two migrated pages. Verifies via URL +
shell render, driven by the same fixture list.

* test(ui): address review on the migration smoke

Escape ROOT and segment before interpolating them into RegExp URL matchers so a
future segment containing regex metacharacters can't silently widen the match.
Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead
of silently re-running the default mount and passing without exercising the prefix.

* test(ui): drop unused watch helper and fix stale smoke README

* test(ui): run the migration smoke under a server root path in CI

* test(ui): harden + instrument the server-root-path proxy reboot in CI

* test(ui): run the server-root-path migration smoke as its own CI job

Replace the in-place proxy reboot in e2e_ui_testing with a dedicated
e2e_ui_testing_server_root_path job that boots the proxy once with
SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the
config gets its own job rather than killing and relaunching the live proxy.

The reboot was failing deterministically: after pkill -9 and relaunch the
prefixed proxy never came back up on :4000 (connection refused), so the smoke
never ran. The readiness step that was supposed to surface the cause could
never reach its boot-log tail because CircleCI runs steps under bash -eo
pipefail and the preceding `curl -sv ... | tail` aborted the step with curl's
exit 7. Booting the proxy as the job's own background step lets any boot crash
land in that step's log instead of being swallowed.

The default e2e_ui_testing job is unchanged aside from dropping the reboot,
prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at
the root mount there via the default Playwright config.
2026-06-09 10:40:01 -07:00
milan-berri
d84499e0f2
fix(team): reserve team budget raises for proxy admins on /team/update (#30030)
The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a
team's spend ceiling has nothing to do with the admin's own key budget. That
comparison was an unintended side effect of reusing _check_user_team_limits()
(which exists for the /team/new path) and broke the UI, which re-sends the
unchanged budget on every save.

New behavior on /team/update for standalone teams:
- A team admin (already authorized via _verify_team_access) may freely KEEP or
  LOWER the team budget, and change models/tpm/rpm, without being gated by their
  personal limits.
- GROWING a team's spend ceiling is a budget-authority action reserved for proxy
  admins -> 403 for team admins. "Growing" covers both raising max_budget above
  the team's current finite value and removing the cap entirely (max_budget=null,
  detected via model_fields_set so an explicit null is distinguished from an
  omitted field). For a team that currently has no cap, setting a finite value is
  a restriction and is allowed.
- Org-scoped teams remain governed by _check_org_team_limits() (capped by the
  org budget).

Also reverts the #29525 existing_team_max_budget workaround in
_check_user_team_limits() back to the create-only form; /team/new still enforces
the creator's personal caps.

docs(access_control): resolve the contradiction in the team-admin section —
team admins can keep/lower the budget and manage rate limits/models, but cannot
raise the team budget (proxy-admin only).

tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team
admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed,
keep/lower/resend allowed, and unchanged create-path guards.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-09 09:19:15 -07:00
tin-berri
51ba6e39cd
fix(mcp): load MCP tool configuration tools via the OBO/passthrough-aware GET path (#29960)
* fix(ui): load MCP tool configuration tools via the OBO/passthrough-aware GET path

* fix(mcp): admin-only include_disabled_tools so the settings UI shows toggled-off tools

* fix(ui): repopulate MCP server edit form when server data loads after mount (OAuth return)

* fix(ui): persist MCP OAuth token on save and return to the Settings tab after authorize

* fix(ui): scope MCP OAuth callback to the initiating form so create and edit flows don't cross-talk

* fix(ui): derive OAuth-return Settings tab via lazy state init instead of setState-in-effect

* Fix MCP OAuth edit token handling

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-06-08 19:58:51 -07:00
Sameer Kankute
424db6a980
feat(azure_ai): add MAI-Image-2.5 image generation support (#29688)
* feat(azure_ai): add MAI-Image-2.5 image generation support

Route azure_ai MAI models to /mai/v1/images/generations and map OpenAI size to width/height for the serverless API.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure_ai): address MAI image generation review feedback

Validate unsupported size values, default width/height independently, add MAI-Image-2.5 pricing, and expand test coverage.

@greptileai

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(azure_ai): add MAI image edit and expand model cost map

Add MAI image edit support with usage normalization for Azure response format,
and register MAI-Image-2.5-Flash and MAI-Image-2e pricing in the model map.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure_ai): validate MAI edit size by consuming map iterator

Greptile: lazy map() never evaluated int() so values like 1024xabc passed through.
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure_ai): normalize MAI usage in generation response handler

Apply normalize_mai_image_usage before building ImageResponse so token-based
cost calculation works when Azure returns num_output_tokens fields.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(azure_ai): narrow MAI edit size param type for mypy

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix Azure MAI image response handling

* Fix MAI image generation base model routing

* fix(azure_ai): preserve zero num_output_tokens in MAI usage normalization

* fix(azure_ai): wrap MAI generation response JSON parsing in error handling

* fix(azure_ai): build MAI image edit URL correctly for /mai/ root bases

* fix(azure_ai): build MAI image generation URL correctly for /mai/ root bases

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-06-08 18:27:04 -07:00
tin-berri
92817cb65b
changing expires_in default to use actual slack return details (#29951) 2026-06-08 18:13:06 -07:00
yuneng-jiang
1bbaf1c39d
fix(guardrails): read CrowdStrike AIDR identity from both metadata bags (#29991)
Capture user_id and extra_info from metadata or litellm_metadata. The single-bag read dropped identity whenever a request carried a present litellm_metadata field (null or a user-supplied dict), since /chat/completions routes the authenticated identity into metadata while the guardrail read litellm_metadata first
2026-06-08 17:46:28 -07:00
milan-berri
411bd3da5b
feat(vantage): include organization metadata in FOCUS Tags export (#28184)
* feat(vantage): include organization metadata in FOCUS Tags export

Join LiteLLM_OrganizationTable when building Vantage/FOCUS export rows so
organization_id and organization_alias appear in Tags for org-level filtering.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(focus): include api_requests in organization Tags tests

FocusTransformer now requires api_requests after staging merge; add the
column to test fixtures so integrations CI can run the Tags assertions.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-09 02:59:21 +03:00
yuneng-jiang
c24a3603d9
fix(team-management): delete a team's BYOK models when the team is deleted (#29977)
A team's BYOK models (rows in LiteLLM_ProxyModelTable with model_info.team_id set)
were left orphaned when the team was deleted; they lingered in the database and kept
showing on the Models + Endpoints page. delete_team now removes them via a new
delete_team_models helper that deletes the rows in one transaction and syncs the
in-memory router only after that transaction commits, run before the team rows are
deleted so a mid-flight failure never leaves the team gone with its models orphaned
2026-06-08 16:55:35 -07:00
yuneng-jiang
bac2590b39
build(deps): bump pyjwt to 2.13.0 and ws override to 8.20.1 (#29982)
Raise the PyJWT floor in pyproject (>=2.13.0,<3.0) and re-resolve uv.lock so
the proxy installs 2.13.0 instead of 2.12.0. Bump the ws transitive-version
override in the dashboard from 8.19.0 to 8.20.1 and regenerate package-lock;
jsdom and openai both dedupe onto the single 8.20.1 copy.

Both are routine dependency maintenance bumps to keep pinned versions current.
2026-06-08 16:39:21 -07:00
milan-berri
f59e4ebc9e
fix(ui): show team projects to internal users (#28855)
Allow internal users to fetch their backend-scoped project list so the key creation project dropdown can populate for selected teams.
2026-06-08 16:27:35 -07:00