litellm

History

Sameer Kankute 3b40ac987f Litellm oss 090626 (#30021 ) * fix(mcp): report scoped server name during initialize (#29865) * fix mcp scoped server name * Update litellm/proxy/_experimental/mcp_server/mcp_context.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * test(mcp): cover scoped server name in the SSE initialize handler --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(ui): show all session logs in the drawer, not just the first 50 (#29795) * fix(ui): show newest session logs first * test(ui): keep session log pagination coverage * fix(ui): show all session logs in the drawer, not just the first page The session detail drawer fetched session logs via sessionSpendLogsCall without page/page_size, so it only ever received the backend default of one page (50 rows). Sessions with more than 50 calls had the rest unreachable in the UI (#29153). sessionSpendLogsCall now takes page/page_size, and the drawer fetches the first page, reads total_pages, then fetches the remaining pages and accumulates them before the existing client-side sort. This keeps the single continuous list (and the selected-log lookup and keyboard navigation, which all assume the full session) correct. Fetching is bounded by a page cap, and the sidebar shows a "showing most recent N" note if a session exceeds it. The rows are lightweight metadata (the endpoint excludes messages/response), so the full set is small; request/response bodies are still loaded per log on demand. * fix(ui): default session drawer to most recent log, newest first Open a session with its most recent log selected, and order the sidebar newest-first to match the all-sessions logs overview. MCP calls stay grouped last. The latest log by time is computed explicitly, since the MCP grouping means it is not always the first row. * Apply fetching pages in batches suggestion from @greptile-apps[bot] Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(ui): derive session total from accumulated rows when backend omits it Compute the session total after all pages are fetched, falling back to the accumulated row count rather than the first page's. Guards the truncation note against a backend response that omits total but spans multiple pages. --------- Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(proxy): handle Mistral multipart passthrough (#29927) * fix(proxy): handle Mistral multipart passthrough * chore: satisfy passthrough ci formatting * test(proxy): cover Mistral passthrough in CI shard * fix(vertex_ai): use REP host for context caching on eu/us multi-region endpoints (#29573) Context caching built the cachedContents URL as https://{location}-aiplatform.googleapis.com, which is an invalid host for the eu/us multi-region endpoints and returns 404. The inference path already resolves these to the REP host (https://aiplatform.{geo}.rep.googleapis.com) via get_vertex_base_url(); reuse that helper in _get_token_and_url_context_caching so caching uses the same host as inference. Adds tests covering the eu/us multi-region cachedContents URLs (v1 and v1beta1). Fixes #29571 * Support per-model encrypted content affinity config (#29760) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix: propagate upstream status code in proxy API exception handler (#29402) * fix: propagate upstream status code in proxy API exception handler When Google GenAI / Vertex returns a 404 for deprecated or missing models via streamGenerateContent, the exception was falling through to a generic handler that defaulted to 500. Now provider exceptions carrying a valid HTTP status_code correctly propagate it through to the ProxyException. * fix: apply black formatting to common_request_processing.py * fix: tighten status code range to 400-599 and deduplicate ProxyException raise * fix(tests): use valid vertex_location in context caching tests Replace "test_location" (contains underscore) with "us-central1" so tests pass the regex validation added in get_vertex_base_url(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(sdk): add xAI OAuth provider (#29866) * Add xAI OAuth provider * Update oauth.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Fix xAI OAuth CI failures * Add xAI OAuth coverage tests * Move xAI OAuth coverage tests to core utils * Address xAI OAuth review comments * Prevent xAI OAuth api_base token exfiltration * Treat blank xAI OAuth api keys as absent * Wrap invalid xAI OAuth JSON responses * Use xAI OAuth behind explicit flag --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(proxy) #27734 allow clearing budget_duration and team_member fields by sending null on /key/update and /team/update (#27751) * fix(proxy): allow clearing budget_duration and team_member fields by sending null on /key/update and /team/update Fixes #27734 Sending null for budget_duration, team_member_budget, team_member_budget_duration, team_member_rpm_limit, or team_member_tpm_limit via /key/update or /team/update returned 200 OK but silently ignored the null value. The fields remained unchanged in the database. Root causes: - /key/update: prepare_key_update_data() popped budget_duration from the update dict but never re-added it (or budget_reset_at) when the value was None. - /team/update: _set_budget_reset_at() only acted when budget_duration was non-None, leaving a stale budget_reset_at in the DB. - /team/update: team_member_* null values bypassed the budget table update entirely because should_create_budget() requires at least one non-None field. * test(proxy): cover no-budget-row path in clear_team_member_budget_fields * fix(presidio): unmask PII tokens in Anthropic native SSE streaming bytes (#30028) * fix(presidio): unmask PII tokens in Anthropic native SSE streaming bytes When output_parse_pii=true on the Anthropic native path (anthropic/claude-), response chunks arrive as raw bytes in SSE format. _stream_pii_unmasking was yielding those bytes unchanged, so <PERSON_1> tokens were never replaced with the original values before reaching the caller. Add _unmask_sse_bytes_chunk to parse each data: line, find content_block_delta / text_delta events, and apply _unmask_pii_text before re-encoding. Wire it into _stream_pii_unmasking so bytes chunks are unmasked when pii_tokens exist. fix(presidio): handle CRLF line endings and non-ASCII PII in SSE unmask Strip trailing \r before the [DONE] guard so CRLF-terminated SSE chunks don't bypass it and silently swallow a JSONDecodeError. Add ensure_ascii=False to json.dumps so non-ASCII replacement values like accented names are preserved as UTF-8 on the wire rather than being \uXXXX-escaped. Add regression tests for both cases. * feat(bedrock_mantle): path-aware Responses routing (/v1/responses vs /openai/v1/responses) (#29925) * feat(bedrock_mantle): path-aware Responses routing (/v1/responses vs /openai/v1/responses) Bedrock Mantle serves the Responses API on two upstream paths: - gpt frontier models (gpt-5.5 / gpt-5.4) on /openai/v1/responses - every other Responses-capable model (e.g. gpt-oss) on the standard /v1/responses BedrockMantleResponsesAPIConfig gains a `use_openai_path` flag; the provider gate in utils.py picks the path per model: openai.gpt-* (non gpt-oss) -> /openai/v1/responses; any model declared mode=responses (price-map entry or user model_info) -> /v1/responses; everything else returns None and keeps the existing chat-completions emulation. Adds gpt-5.5 / gpt-5.4 price-map entries, registry wiring, and the routing-matrix tests. * feat(bedrock_mantle): data-driven frontier routing via use_openai_responses_path Addresses the Greptile review point that frontier detection should be a price-map field rather than a hardcoded name match. The gate now routes a model to /openai/v1/responses when its price-map entry declares use_openai_responses_path, so a frontier model whose name does not follow the openai.gpt- convention can be onboarded by JSON alone. The name-convention check is kept as a fallback that needs no price-map entry, which preserves zero-change routing for a future gpt-6 before its entry loads. gpt-5.5 / gpt-5.4 get the flag in both price maps. Adds tests for the data-driven flag path and for the flag presence on the gpt-5.x entries; both branches are mutation-tested. * test(model_prices): allow use_openai_responses_path in price-map schema The model_prices_and_context_window.json schema validator (test_aaamodel_prices_and_context_window_json_is_valid) enforces additionalProperties: false, so the new use_openai_responses_path flag on the gpt-5.5 / gpt-5.4 entries failed validation. Add it to the schema as a boolean, alongside the other supports_* / capability flags. * Add Tensormesh serverless models to the model cost map (#30037) * Add Tensormesh serverless models to the model cost map * Flag reasoning support on the Tensormesh models that expose thinking mode * fix(proxy): invalidate stale key spend counter after budget reset or manual spend update (#30001) * fix(proxy): reconcile stale key spend counter after budget reset * fix(proxy): invalidate stale key spend counter after budget reset or manual spend update * fix(proxy): remove read-time stale counter reconciliation to prevent budget bypass * revert: undo unrelated formatting changes in enterprise directory * test(proxy): add unit test for key spend update invalidating counter * test(proxy): fix mocked update_data and hash token expectations in unit test * fix(proxy): use Responses-API transformer in pass-through cost tracking (#29728) The `elif is_responses:` branch of `openai_passthrough_handler` was calling the chat-completions `transform_response` on a Responses API payload. The chat-completions transformer expects `choices: [...]` in the raw response; the Responses API uses `output: [...]` and `usage.input_tokens` / `usage.output_tokens` (not `prompt_tokens` / `completion_tokens`). The result was a KeyError 'choices' deep inside `convert_to_model_response_object`, swallowed by the surrounding `except Exception` in the handler, and the SpendLogs row was written by the fallback path with zeroed-out tokens, spend, and model. This bug silently undercounts cost for every successful pass-through call to either OpenAI's `/v1/responses` or Azure's `/openai/v1/responses` (deployments configured for the Responses API). Reproduced 2026-06-04 against a real Azure OpenAI Responses API deployment proxied through LiteLLM v1.88.0. Fix: use the dedicated `OpenAIResponsesAPIConfig.transform_response_api_response` for the Responses branch. This transformer already exists in LiteLLM (`litellm/llms/openai/responses/transformation.py`) and knows the Responses-API on-the-wire shape. `litellm.completion_cost` already handles `ResponsesAPIResponse` natively with `call_type="responses"`, so no downstream changes are needed. Tests: test_responses_api_uses_responses_transformer_not_chat_completions NEW. Real regression test — exercises the openai_passthrough_handler with a real-shaped Responses payload (no `choices`, has `output` and Responses-API `usage` keys) and NO mocked `get_provider_config`. Pre-fix: raises KeyError 'choices' inside the chat-completions transformer (the bug). Post-fix: returns a ResponsesAPIResponse, completion_cost is called with call_type="responses" and a ResponsesAPIResponse instance (asserted). Verified to fail on un-fixed handler + pass on fixed handler before commit. test_responses_api_cost_tracking UPDATED. Old test mocked `get_provider_config` (no longer called in the responses branch post-fix). Now mocks the Responses transformer directly (`OpenAIResponsesAPIConfig.transform_response_api_response`) to test the downstream cost-calc contract. Out of scope for this PR (separate followup): - Recognizing .cognitiveservices.azure.com (the newer Azure OpenAI hostname) in the is_openai__route checks. Separate PR. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(skills): execute DB skills by matching the litellm_skill_ tool name prefix (#30116) Skill IDs are generated as litellm_skill_<uuid> and the model-facing tool name is the sanitized skill ID, but the post-call execution gates in SkillsInjectionHook only ran tools whose name starts with "skill_", so DB skills were silently returned to the client as raw tool calls. Fixes #28122. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(anthropic): synthesize content_block_start when Responses stream omits output_item.added (#30115) * fix(team): reserve team budget raises for proxy admins on /team/update (#30030) The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a team's spend ceiling has nothing to do with the admin's own key budget. That comparison was an unintended side effect of reusing _check_user_team_limits() (which exists for the /team/new path) and broke the UI, which re-sends the unchanged budget on every save. New behavior on /team/update for standalone teams: - A team admin (already authorized via _verify_team_access) may freely KEEP or LOWER the team budget, and change models/tpm/rpm, without being gated by their personal limits. - GROWING a team's spend ceiling is a budget-authority action reserved for proxy admins -> 403 for team admins. "Growing" covers both raising max_budget above the team's current finite value and removing the cap entirely (max_budget=null, detected via model_fields_set so an explicit null is distinguished from an omitted field). For a team that currently has no cap, setting a finite value is a restriction and is allowed. - Org-scoped teams remain governed by _check_org_team_limits() (capped by the org budget). Also reverts the #29525 existing_team_max_budget workaround in _check_user_team_limits() back to the create-only form; /team/new still enforces the creator's personal caps. docs(access_control): resolve the contradiction in the team-admin section — team admins can keep/lower the budget and manage rate limits/models, but cannot raise the team budget (proxy-admin only). tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed, keep/lower/resend allowed, and unchanged create-path guards. Co-authored-by: Cursor <cursoragent@cursor.com> * test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974) * test(ui): add a data-driven App Router migration E2E smoke Add a growing Playwright smoke for migrated pages: for each segment it deep-links to the path route, asserts the URL and that the dashboard shell rendered, then clicks off to a legacy page and asserts navigation still works. Driven by e2e_tests/fixtures/migratedPages.ts, so adding a page is one line. Runs in two situations against the same proxy: the default mount (npm run e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root). globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage state is valid under a prefix. Seeded with api-reference; append the rest as their migrations merge. * test(ui): support headed slow-motion + watch pauses in the migration smoke Honor SLOWMO in the server-root-path config (the default config already did), and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state. Both are no-ops by default, so CI behavior is unchanged. * test(ui): make the migration smoke a sidebar-click user journey Rework the smoke from deep-linking to a real navigation journey: start at the landing page, click the migrated page in the sidebar (expanding submenus for nested items), assert the path route rendered, reload it (the check a wrong server_root_path breaks), bounce to a legacy page and back, and — once two pages are migrated — navigate directly between two migrated pages. Verifies via URL + shell render, driven by the same fixture list. * test(ui): address review on the migration smoke Escape ROOT and segment before interpolating them into RegExp URL matchers so a future segment containing regex metacharacters can't silently widen the match. Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead of silently re-running the default mount and passing without exercising the prefix. * test(ui): drop unused watch helper and fix stale smoke README * test(ui): run the migration smoke under a server root path in CI * test(ui): harden + instrument the server-root-path proxy reboot in CI * test(ui): run the server-root-path migration smoke as its own CI job Replace the in-place proxy reboot in e2e_ui_testing with a dedicated e2e_ui_testing_server_root_path job that boots the proxy once with SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the config gets its own job rather than killing and relaunching the live proxy. The reboot was failing deterministically: after pkill -9 and relaunch the prefixed proxy never came back up on :4000 (connection refused), so the smoke never ran. The readiness step that was supposed to surface the cause could never reach its boot-log tail because CircleCI runs steps under bash -eo pipefail and the preceding `curl -sv ... \| tail` aborted the step with curl's exit 7. Booting the proxy as the job's own background step lets any boot crash land in that step's log instead of being swallowed. The default e2e_ui_testing job is unchanged aside from dropping the reboot, prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at the root mount there via the default Playwright config. * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232) * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through * test: mock post_call_response_headers_hook in audio speech route tests * chore(ui): remove dead App Router route stubs under (dashboard) (#30045) models-and-endpoints, organizations, and virtual-keys each had a page.tsx route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and deep links never resolve to it and the route is unreachable. Each was a thin wrapper that handed the shared view empty or no-op props (empty modelData with a no-op setModelData, hardcoded empty organizations, no-op setUserRole/setUserEmail), so reaching one would render a degraded page in any case. The real wrapper belongs in the PR that flips each page into MIGRATED_PAGES, written with eyes on it and a test This continues the dead-scaffolding cleanup from #28891. The shared components these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay, since the legacy ?page= switch in app/page.tsx and src/components still import them * fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000) * fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session * fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss * fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041) * fix(mcp): honor team access-group grants in OAuth authorize/token access check * test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation * docs(security): require a reproduction video for vulnerability reports (#30048) (#30063) With AI models capable of automated vulnerability discovery now publicly available, we expect a large increase in report volume, much of it unverified. Requiring a video of the exploit running against a live instance raises the bar for submissions and keeps triage focused on reproducible issues. Reports without a video will be closed and reopened if one is added later. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796) * feat(ui): add admin flag to disable in-product UI nudges for everyone Admins can now suppress the survey and Claude Code feedback popups for all users via a single disable_ui_nudges UI setting, instead of relying on each user dismissing them individually. * fix(ui): suppress nudges while ui settings are loading Gate nudgesDisabled on the ui-settings loading state so an admin with disable_ui_nudges on doesn't see the survey prompt flash, and the getInProductNudgesCall fetch doesn't fire, on a cold page load before the flag resolves. Falls back to showing nudges if the fetch errors. * test(ui): wrap CreateKeyPage test in QueryClientProvider page.tsx now calls useUISettings (react-query), which needs a QueryClient that layout.tsx supplies in production but the test did not. Add the provider and mock getUiSettings so the query resolves. * chore(ui): remove dead dashboard files and unused dependencies (#30047) * chore(ui): remove dead dashboard files and unused dependencies knip flagged seven orphaned source/config files with no importers and five declared dependencies that nothing in the tree uses. Removing them shrinks the dashboard bundle's source surface and keeps the manifest honest; vite stays installed transitively via vitest, so test tooling is unaffected. * fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec (tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml workflow step still depend on it, so the redirect e2e job failed to load a config that no longer existed. * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009) * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593) Restores the reverse-lookup for the JSONL body.model fallback path so that legacy/pre-target_model_names managed files still map stripped provider IDs back to proxy aliases before auth. Also cleans up redundant `or None`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)" This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064) * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context, 128K output, adaptive thinking only) on the Anthropic API, Bedrock converse (base, global, and us/eu geo inference profiles at the 10% regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which serves Fable 5 with the full 1M context window unlike Opus 4.8). Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the model in the setup wizard, and extends the reasoning effort e2e grid. The Bedrock, Vertex, and Azure grid cells carry fail_reason markers until the CI accounts are provisioned: Bedrock needs the provider data sharing opt-in Fable 5 requires, and the Foundry resource needs a claude-fable-5 deployment. The first-party entry carries provider_specific_entry {us: 1.1} for the inference_geo premium and deliberately no fast multiplier since Fable 5 has no fast mode. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drop removed sampling params for Claude 4.7+ when drop_params is set Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was forwarding them even with drop_params enabled because the Anthropic and Bedrock converse transformations passed temperature/top_p through unconditionally. Mirror the GPT-5/o-series handling: temperature=1 still passes through, other values and any top_p are dropped when drop_params is set, and without drop_params a clean client-side UnsupportedParamsError tells the caller how to opt in, instead of surfacing the raw provider error. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drive sampling param gating from the cost map and cover top_k Greptile review follow-ups on the sampling param fix: the restriction for Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false on every affected cost map entry (perplexity excluded; that route is OpenAI-compatible and maps sampling params upstream) and read back through a tri-state map lookup, keeping the name check only as a fallback for provider-routed ids whose hosted map entries predate the flag, the same layering supports_adaptive_thinking uses. top_k bypasses map_openai_params as a provider-specific kwarg, so it is gated at the shared AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex, Azure) and in the Bedrock converse _handle_top_k_value path, with drop_params threaded through the converse transform helpers. Also updates the reasoning effort grid cell count assertion for the four Fable 5 rows added on this branch (29 x 11 cells). https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Declare supports_sampling_params in the cost map schema The model map validation schema uses additionalProperties: false, so the new flag must be declared for the 28 entries that carry it; this was the one failing job (misc / Run tests) on the previous commit. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * fix(bedrock): gate top_k=0 on converse to match Anthropic boundary Truthiness check let top_k=0 silently disappear on models that removed sampling params, while AnthropicConfig.transform_request treats 0 as present and raises UnsupportedParamsError (or drops when drop_params is set). Switch to 'is not None' so converse, direct Anthropic, invoke, Vertex, and Azure all behave the same for top_k=0. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> * fix(anthropic): avoid index -1 content_block_delta in messages stream When a /v1/messages request is routed through the Responses API adapter, AnthropicResponsesStreamWrapper only emits content_block_start on response.output_item.added. Some upstreams (LMStudio for example) never send that event, so the text delta handler fell back to _current_block_index, which starts at -1, and clients received content_block_delta events with index -1 and no preceding content_block_start. Anthropic SDKs then fail with "text part -1 not found" The text delta handler now synthesizes a content_block_start with a fresh block index whenever the delta references an unregistered item_id or no block is open yet, and registers the item_id so follow-up deltas reuse the same index Addresses the /v1/messages defect in #27442 * Make test sys.path shim resolve relative to the file, not the CWD os.path.abspath("../../../../../../..") depends on where pytest is invoked from; anchoring on os.path.dirname(__file__) makes the import work from any working directory. Also corrects the depth: the repo root is six levels above this file, not seven. --------- Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: tin-berri <tin@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> * fix: enable compact-2026-01-12 beta header for vertex_ai provider (#30114) * fix(team): reserve team budget raises for proxy admins on /team/update (#30030) The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a team's spend ceiling has nothing to do with the admin's own key budget. That comparison was an unintended side effect of reusing _check_user_team_limits() (which exists for the /team/new path) and broke the UI, which re-sends the unchanged budget on every save. New behavior on /team/update for standalone teams: - A team admin (already authorized via _verify_team_access) may freely KEEP or LOWER the team budget, and change models/tpm/rpm, without being gated by their personal limits. - GROWING a team's spend ceiling is a budget-authority action reserved for proxy admins -> 403 for team admins. "Growing" covers both raising max_budget above the team's current finite value and removing the cap entirely (max_budget=null, detected via model_fields_set so an explicit null is distinguished from an omitted field). For a team that currently has no cap, setting a finite value is a restriction and is allowed. - Org-scoped teams remain governed by _check_org_team_limits() (capped by the org budget). Also reverts the #29525 existing_team_max_budget workaround in _check_user_team_limits() back to the create-only form; /team/new still enforces the creator's personal caps. docs(access_control): resolve the contradiction in the team-admin section — team admins can keep/lower the budget and manage rate limits/models, but cannot raise the team budget (proxy-admin only). tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed, keep/lower/resend allowed, and unchanged create-path guards. Co-authored-by: Cursor <cursoragent@cursor.com> * test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974) * test(ui): add a data-driven App Router migration E2E smoke Add a growing Playwright smoke for migrated pages: for each segment it deep-links to the path route, asserts the URL and that the dashboard shell rendered, then clicks off to a legacy page and asserts navigation still works. Driven by e2e_tests/fixtures/migratedPages.ts, so adding a page is one line. Runs in two situations against the same proxy: the default mount (npm run e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root). globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage state is valid under a prefix. Seeded with api-reference; append the rest as their migrations merge. * test(ui): support headed slow-motion + watch pauses in the migration smoke Honor SLOWMO in the server-root-path config (the default config already did), and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state. Both are no-ops by default, so CI behavior is unchanged. * test(ui): make the migration smoke a sidebar-click user journey Rework the smoke from deep-linking to a real navigation journey: start at the landing page, click the migrated page in the sidebar (expanding submenus for nested items), assert the path route rendered, reload it (the check a wrong server_root_path breaks), bounce to a legacy page and back, and — once two pages are migrated — navigate directly between two migrated pages. Verifies via URL + shell render, driven by the same fixture list. * test(ui): address review on the migration smoke Escape ROOT and segment before interpolating them into RegExp URL matchers so a future segment containing regex metacharacters can't silently widen the match. Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead of silently re-running the default mount and passing without exercising the prefix. * test(ui): drop unused watch helper and fix stale smoke README * test(ui): run the migration smoke under a server root path in CI * test(ui): harden + instrument the server-root-path proxy reboot in CI * test(ui): run the server-root-path migration smoke as its own CI job Replace the in-place proxy reboot in e2e_ui_testing with a dedicated e2e_ui_testing_server_root_path job that boots the proxy once with SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the config gets its own job rather than killing and relaunching the live proxy. The reboot was failing deterministically: after pkill -9 and relaunch the prefixed proxy never came back up on :4000 (connection refused), so the smoke never ran. The readiness step that was supposed to surface the cause could never reach its boot-log tail because CircleCI runs steps under bash -eo pipefail and the preceding `curl -sv ... \| tail` aborted the step with curl's exit 7. Booting the proxy as the job's own background step lets any boot crash land in that step's log instead of being swallowed. The default e2e_ui_testing job is unchanged aside from dropping the reboot, prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at the root mount there via the default Playwright config. * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232) * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through * test: mock post_call_response_headers_hook in audio speech route tests * chore(ui): remove dead App Router route stubs under (dashboard) (#30045) models-and-endpoints, organizations, and virtual-keys each had a page.tsx route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and deep links never resolve to it and the route is unreachable. Each was a thin wrapper that handed the shared view empty or no-op props (empty modelData with a no-op setModelData, hardcoded empty organizations, no-op setUserRole/setUserEmail), so reaching one would render a degraded page in any case. The real wrapper belongs in the PR that flips each page into MIGRATED_PAGES, written with eyes on it and a test This continues the dead-scaffolding cleanup from #28891. The shared components these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay, since the legacy ?page= switch in app/page.tsx and src/components still import them * fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000) * fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session * fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss * fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041) * fix(mcp): honor team access-group grants in OAuth authorize/token access check * test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation * docs(security): require a reproduction video for vulnerability reports (#30048) (#30063) With AI models capable of automated vulnerability discovery now publicly available, we expect a large increase in report volume, much of it unverified. Requiring a video of the exploit running against a live instance raises the bar for submissions and keeps triage focused on reproducible issues. Reports without a video will be closed and reopened if one is added later. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796) * feat(ui): add admin flag to disable in-product UI nudges for everyone Admins can now suppress the survey and Claude Code feedback popups for all users via a single disable_ui_nudges UI setting, instead of relying on each user dismissing them individually. * fix(ui): suppress nudges while ui settings are loading Gate nudgesDisabled on the ui-settings loading state so an admin with disable_ui_nudges on doesn't see the survey prompt flash, and the getInProductNudgesCall fetch doesn't fire, on a cold page load before the flag resolves. Falls back to showing nudges if the fetch errors. * test(ui): wrap CreateKeyPage test in QueryClientProvider page.tsx now calls useUISettings (react-query), which needs a QueryClient that layout.tsx supplies in production but the test did not. Add the provider and mock getUiSettings so the query resolves. * chore(ui): remove dead dashboard files and unused dependencies (#30047) * chore(ui): remove dead dashboard files and unused dependencies knip flagged seven orphaned source/config files with no importers and five declared dependencies that nothing in the tree uses. Removing them shrinks the dashboard bundle's source surface and keeps the manifest honest; vite stays installed transitively via vitest, so test tooling is unaffected. * fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec (tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml workflow step still depend on it, so the redirect e2e job failed to load a config that no longer existed. * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009) * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593) Restores the reverse-lookup for the JSONL body.model fallback path so that legacy/pre-target_model_names managed files still map stripped provider IDs back to proxy aliases before auth. Also cleans up redundant `or None`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)" This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064) * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context, 128K output, adaptive thinking only) on the Anthropic API, Bedrock converse (base, global, and us/eu geo inference profiles at the 10% regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which serves Fable 5 with the full 1M context window unlike Opus 4.8). Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the model in the setup wizard, and extends the reasoning effort e2e grid. The Bedrock, Vertex, and Azure grid cells carry fail_reason markers until the CI accounts are provisioned: Bedrock needs the provider data sharing opt-in Fable 5 requires, and the Foundry resource needs a claude-fable-5 deployment. The first-party entry carries provider_specific_entry {us: 1.1} for the inference_geo premium and deliberately no fast multiplier since Fable 5 has no fast mode. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drop removed sampling params for Claude 4.7+ when drop_params is set Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was forwarding them even with drop_params enabled because the Anthropic and Bedrock converse transformations passed temperature/top_p through unconditionally. Mirror the GPT-5/o-series handling: temperature=1 still passes through, other values and any top_p are dropped when drop_params is set, and without drop_params a clean client-side UnsupportedParamsError tells the caller how to opt in, instead of surfacing the raw provider error. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drive sampling param gating from the cost map and cover top_k Greptile review follow-ups on the sampling param fix: the restriction for Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false on every affected cost map entry (perplexity excluded; that route is OpenAI-compatible and maps sampling params upstream) and read back through a tri-state map lookup, keeping the name check only as a fallback for provider-routed ids whose hosted map entries predate the flag, the same layering supports_adaptive_thinking uses. top_k bypasses map_openai_params as a provider-specific kwarg, so it is gated at the shared AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex, Azure) and in the Bedrock converse _handle_top_k_value path, with drop_params threaded through the converse transform helpers. Also updates the reasoning effort grid cell count assertion for the four Fable 5 rows added on this branch (29 x 11 cells). https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Declare supports_sampling_params in the cost map schema The model map validation schema uses additionalProperties: false, so the new flag must be declared for the 28 entries that carry it; this was the one failing job (misc / Run tests) on the previous commit. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * fix(bedrock): gate top_k=0 on converse to match Anthropic boundary Truthiness check let top_k=0 silently disappear on models that removed sampling params, while AnthropicConfig.transform_request treats 0 as present and raises UnsupportedParamsError (or drops when drop_params is set). Switch to 'is not None' so converse, direct Anthropic, invoke, Vertex, and Azure all behave the same for top_k=0. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> * fix: enable compact-2026-01-12 beta header for vertex_ai provider The vertex_ai block in anthropic_beta_headers_config.json mapped compact-2026-01-12 to null, so update_headers_with_filtered_beta stripped the header before the request reached Vertex while the compact_20260112 context edit stayed in the body, and Vertex rejected the request with HTTP 400. Vertex rawPredict accepts the header, and the bedrock and databricks blocks already forward it. Mirrors #21867, which enabled context-1m-2025-08-07 for vertex_ai the same way. Fixes #27290. --------- Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: tin-berri <tin@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> * fix(proxy): coerce litellm_settings.max_budget env var to float (#30113) * fix(team): reserve team budget raises for proxy admins on /team/update (#30030) The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a team's spend ceiling has nothing to do with the admin's own key budget. That comparison was an unintended side effect of reusing _check_user_team_limits() (which exists for the /team/new path) and broke the UI, which re-sends the unchanged budget on every save. New behavior on /team/update for standalone teams: - A team admin (already authorized via _verify_team_access) may freely KEEP or LOWER the team budget, and change models/tpm/rpm, without being gated by their personal limits. - GROWING a team's spend ceiling is a budget-authority action reserved for proxy admins -> 403 for team admins. "Growing" covers both raising max_budget above the team's current finite value and removing the cap entirely (max_budget=null, detected via model_fields_set so an explicit null is distinguished from an omitted field). For a team that currently has no cap, setting a finite value is a restriction and is allowed. - Org-scoped teams remain governed by _check_org_team_limits() (capped by the org budget). Also reverts the #29525 existing_team_max_budget workaround in _check_user_team_limits() back to the create-only form; /team/new still enforces the creator's personal caps. docs(access_control): resolve the contradiction in the team-admin section — team admins can keep/lower the budget and manage rate limits/models, but cannot raise the team budget (proxy-admin only). tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed, keep/lower/resend allowed, and unchanged create-path guards. Co-authored-by: Cursor <cursoragent@cursor.com> * test(ui): data-driven App Router migration E2E smoke (default + server-root-path) (#29974) * test(ui): add a data-driven App Router migration E2E smoke Add a growing Playwright smoke for migrated pages: for each segment it deep-links to the path route, asserts the URL and that the dashboard shell rendered, then clicks off to a legacy page and asserts navigation still works. Driven by e2e_tests/fixtures/migratedPages.ts, so adding a page is one line. Runs in two situations against the same proxy: the default mount (npm run e2e:migration) and a non-root SERVER_ROOT_PATH mount (npm run e2e:migration:root). globalSetup now logs in at `${SERVER_ROOT_PATH}/ui/login` so the admin storage state is valid under a prefix. Seeded with api-reference; append the rest as their migrations merge. * test(ui): support headed slow-motion + watch pauses in the migration smoke Honor SLOWMO in the server-root-path config (the default config already did), and add an env-gated E2E_WATCH_MS pause so a headed run lingers on each state. Both are no-ops by default, so CI behavior is unchanged. * test(ui): make the migration smoke a sidebar-click user journey Rework the smoke from deep-linking to a real navigation journey: start at the landing page, click the migrated page in the sidebar (expanding submenus for nested items), assert the path route rendered, reload it (the check a wrong server_root_path breaks), bounce to a legacy page and back, and — once two pages are migrated — navigate directly between two migrated pages. Verifies via URL + shell render, driven by the same fixture list. * test(ui): address review on the migration smoke Escape ROOT and segment before interpolating them into RegExp URL matchers so a future segment containing regex metacharacters can't silently widen the match. Make the server-root-path config fail fast when SERVER_ROOT_PATH is unset instead of silently re-running the default mount and passing without exercising the prefix. * test(ui): drop unused watch helper and fix stale smoke README * test(ui): run the migration smoke under a server root path in CI * test(ui): harden + instrument the server-root-path proxy reboot in CI * test(ui): run the server-root-path migration smoke as its own CI job Replace the in-place proxy reboot in e2e_ui_testing with a dedicated e2e_ui_testing_server_root_path job that boots the proxy once with SERVER_ROOT_PATH=/litellm, matching how every other proxy variant in the config gets its own job rather than killing and relaunching the live proxy. The reboot was failing deterministically: after pkill -9 and relaunch the prefixed proxy never came back up on :4000 (connection refused), so the smoke never ran. The readiness step that was supposed to surface the cause could never reach its boot-log tail because CircleCI runs steps under bash -eo pipefail and the preceding `curl -sv ... \| tail` aborted the step with curl's exit 7. Booting the proxy as the job's own background step lets any boot crash land in that step's log instead of being swallowed. The default e2e_ui_testing job is unchanged aside from dropping the reboot, prefixed-readiness, and prefixed-smoke steps; the migration smoke still runs at the root mount there via the default Playwright config. * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through (#24232) * fix(proxy): extend response headers hook to streaming, TTS, image gen, and pass-through * test: mock post_call_response_headers_hook in audio speech route tests * chore(ui): remove dead App Router route stubs under (dashboard) (#30045) models-and-endpoints, organizations, and virtual-keys each had a page.tsx route under (dashboard)/ that is not in MIGRATED_PAGES, so the sidebar and deep links never resolve to it and the route is unreachable. Each was a thin wrapper that handed the shared view empty or no-op props (empty modelData with a no-op setModelData, hardcoded empty organizations, no-op setUserRole/setUserEmail), so reaching one would render a degraded page in any case. The real wrapper belongs in the PR that flips each page into MIGRATED_PAGES, written with eyes on it and a test This continues the dead-scaffolding cleanup from #28891. The shared components these wrappers rendered (ModelsAndEndpointsView, OrganizationFilters) stay, since the legacy ?page= switch in app/page.tsx and src/components still import them * fix(ui/mcp): reset OAuth state on create-server modal close so a prior server's token no longer leaks into the next add-server session (#30000) * fix(ui/mcp): reset OAuth hook state on modal close so a prior server's token no longer leaks into the next add-server session * fix(ui/mcp): clear in-flight OAuth guard on reset and reset form/tools on modal close so nothing leaks on a parent-driven dismiss * fix(mcp): allow team access-group grants in OAuth authorize/token access check (#30041) * fix(mcp): honor team access-group grants in OAuth authorize/token access check * test(mcp): mock build_effective_auth_contexts in non-admin authorize tests for isolation * docs(security): require a reproduction video for vulnerability reports (#30048) (#30063) With AI models capable of automated vulnerability discovery now publicly available, we expect a large increase in report volume, much of it unverified. Requiring a video of the exploit running against a live instance raises the bar for submissions and keeps triage focused on reproducible issues. Reports without a video will be closed and reopened if one is added later. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * feat(ui): add admin flag to disable in-product UI nudges for everyone (#29796) * feat(ui): add admin flag to disable in-product UI nudges for everyone Admins can now suppress the survey and Claude Code feedback popups for all users via a single disable_ui_nudges UI setting, instead of relying on each user dismissing them individually. * fix(ui): suppress nudges while ui settings are loading Gate nudgesDisabled on the ui-settings loading state so an admin with disable_ui_nudges on doesn't see the survey prompt flash, and the getInProductNudgesCall fetch doesn't fire, on a cold page load before the flag resolves. Falls back to showing nudges if the fetch errors. * test(ui): wrap CreateKeyPage test in QueryClientProvider page.tsx now calls useUISettings (react-query), which needs a QueryClient that layout.tsx supplies in production but the test did not. Add the provider and mock getUiSettings so the query resolves. * chore(ui): remove dead dashboard files and unused dependencies (#30047) * chore(ui): remove dead dashboard files and unused dependencies knip flagged seven orphaned source/config files with no importers and five declared dependencies that nothing in the tree uses. Removing them shrinks the dashboard bundle's source surface and keeps the manifest honest; vite stays installed transitively via vitest, so test tooling is unaffected. * fix(ci): restore serverRootPath.config.ts referenced by SERVER_ROOT_PATH workflow The dead-code sweep removed e2e_tests/serverRootPath.config.ts, but its spec (tests/login/serverRootPathRedirect.spec.ts) and the test_server_root_path.yml workflow step still depend on it, so the redirect e2e job failed to load a config that no longer existed. * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) (#30009) * fix(proxy): authorize batch files using upload target_model_names (LIT-3593) After replace_model_in_jsonl, body.model is a stripped provider id. Reverse-mapping it via resolve_model_name_from_model_id is first-match on model_list and caused false 403s when multiple deployments share the same stripped name. Use target_model_names from the unified file id instead. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593) Restores the reverse-lookup for the JSONL body.model fallback path so that legacy/pre-target_model_names managed files still map stripped provider IDs back to proxy aliases before auth. Also cleans up redundant `or None`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "fix(proxy): restore resolve_model_name_from_model_id for JSONL fallback path (LIT-3593)" This reverts commit 30d2e96f77ef521ccaaf2193fe554980380eb669. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI (#30064) * Add Claude Fable 5 across Anthropic, Bedrock, Vertex AI, and Azure AI Adds cost map entries for claude-fable-5 ($10/$50 per MTok, 1M context, 128K output, adaptive thinking only) on the Anthropic API, Bedrock converse (base, global, and us/eu geo inference profiles at the 10% regional premium), Vertex AI, and Azure AI (Microsoft Foundry, which serves Fable 5 with the full 1M context window unlike Opus 4.8). Registers anthropic.claude-fable-5 in BEDROCK_CONVERSE_MODELS, lists the model in the setup wizard, and extends the reasoning effort e2e grid. The Bedrock, Vertex, and Azure grid cells carry fail_reason markers until the CI accounts are provisioned: Bedrock needs the provider data sharing opt-in Fable 5 requires, and the Foundry resource needs a claude-fable-5 deployment. The first-party entry carries provider_specific_entry {us: 1.1} for the inference_geo premium and deliberately no fast multiplier since Fable 5 has no fast mode. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drop removed sampling params for Claude 4.7+ when drop_params is set Fable 5, Opus 4.7, and Opus 4.8 removed sampling params: the API rejects top_p, top_k, and any temperature other than 1 with a 400. LiteLLM was forwarding them even with drop_params enabled because the Anthropic and Bedrock converse transformations passed temperature/top_p through unconditionally. Mirror the GPT-5/o-series handling: temperature=1 still passes through, other values and any top_p are dropped when drop_params is set, and without drop_params a clean client-side UnsupportedParamsError tells the caller how to opt in, instead of surfacing the raw provider error. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Drive sampling param gating from the cost map and cover top_k Greptile review follow-ups on the sampling param fix: the restriction for Fable 5 / Opus 4.7 / 4.8 is now declared as supports_sampling_params: false on every affected cost map entry (perplexity excluded; that route is OpenAI-compatible and maps sampling params upstream) and read back through a tri-state map lookup, keeping the name check only as a fallback for provider-routed ids whose hosted map entries predate the flag, the same layering supports_adaptive_thinking uses. top_k bypasses map_openai_params as a provider-specific kwarg, so it is gated at the shared AnthropicConfig.transform_request boundary (direct, Bedrock invoke, Vertex, Azure) and in the Bedrock converse _handle_top_k_value path, with drop_params threaded through the converse transform helpers. Also updates the reasoning effort grid cell count assertion for the four Fable 5 rows added on this branch (29 x 11 cells). https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * Declare supports_sampling_params in the cost map schema The model map validation schema uses additionalProperties: false, so the new flag must be declared for the 28 entries that carry it; this was the one failing job (misc / Run tests) on the previous commit. https://claude.ai/code/session_01MZarYYT3aS7DxaNjoax6Gm * fix(bedrock): gate top_k=0 on converse to match Anthropic boundary Truthiness check let top_k=0 silently disappear on models that removed sampling params, while AnthropicConfig.transform_request treats 0 as present and raises UnsupportedParamsError (or drops when drop_params is set). Switch to 'is not None' so converse, direct Anthropic, invoke, Vertex, and Azure all behave the same for top_k=0. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> * fix(proxy): coerce litellm_settings.max_budget env var to float When max_budget is set in litellm_settings via os.environ/MAX_BUDGET, the env var resolves to a string and the generic setattr branch in ProxyConfig.load_config stored it as-is, so the startup check litellm.max_budget > 0 raised TypeError. The earlier fix (#23855) only covered the CLI initialize() path. Coerce the value to float in the settings loop, matching the existing max_internal_user_budget handling. Fixes #26696. --------- Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: tin-berri <tin@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> * fix(router): don't drop bedrock pass-through deployments using IAM credentials (#30111) * Fix Bedrock passthrough deployment dropped when using IAM credentials Bedrock deployments with use_in_pass_through enabled and IAM/OIDC auth (aws_role_name, no api_key) hit the generic pass-through branch in Router._initialize_deployment_for_pass_through, which calls set_pass_through_credentials and raises "api_key is required". The exception drops the deployment from the router entirely, breaking both passthrough and normal routing for that model. Skip the credential store write when no api_key is set; the bedrock passthrough route resolves AWS credentials at request time via BedrockConverseLLM.get_credentials(), not the passthrough credential store, so there is nothing to register here. Fixes #27728. * Reset passthrough credentials singleton before api_key credential test The test reads the module-level passthrough_endpoint_router singleton, so a stale "openai" entry written by an earlier test in the same process could make the assertion pass without exercising the code path. Clearing the credentials dict up front makes the test order-independent. * fix(sdk): stop mirroring reasoning_content in provider_specific_fields (#30110) The dict-to-response conversion path mirrored reasoning_content into provider_specific_fields, while live provider transforms (Anthropic's _build_provider_specific_fields) only set it top-level on the Message. Cache-replayed messages therefore serialized differently from live ones, breaking disk cache key stability for multi-turn conversations with extended thinking. The mirror was added for DeepSeek before Message.reasoning_content existed as a top-level attribute. The top-level field is still set by the converter, so DeepSeek's request-side promotion is unaffected. Fixes #27337. * fix(mcp): coerce mcp_server_cost_info values to float at ingest (#30109) * fix(mcp): coerce mcp_server_cost_info values to float at ingest YAML 1.1 parses scientific notation without a decimal point (e.g. 7e-05) as a string, and MCPServerCostInfo is a TypedDict with no runtime validation, so a string-typed default_cost_per_query from config.yaml flowed through the proxy untouched and crashed the MCP server settings page with '.toFixed is not a function'. Normalize mcp_server_cost_info on both the config and DB load paths, dropping non-numeric values with a warning instead of failing the server load. Fixes #27097. * fix(mcp): drop non-numeric default_cost_per_query instead of nulling it Keeping the key with a None value still exposes a null to the UI, which can crash .toFixed formatting when the consumer checks key existence rather than truthiness. Delete the key on coercion failure, matching how non-numeric per-tool cost entries are already omitted. * fix(proxy): count embedding and text completion tokens toward TPM limits (#30105) * fix(proxy): count embedding and text completion tokens toward TPM limits The parallel request limiters only read token usage off ModelResponse, so EmbeddingResponse and TextCompletionResponse objects left total_tokens at 0 and the per key, user, team, and end user TPM counters never incremented. Requests to /v1/embeddings and /v1/completions were effectively free against any tpm_limit. In the v3 limiter this was worse: the post-call reconciliation computed actual usage as 0 and refunded the pre-call reservation made at request time. Broaden the isinstance checks to accept EmbeddingResponse and TextCompletionResponse, which both expose a Usage object, at the four per-scope sites in parallel_request_limiter.py and at the usage extraction in parallel_request_limiter_v3.py. ResponsesAPIResponse was already covered in v3 via BaseLiteLLMOpenAIResponseObject. Fixes #27738. * test(proxy): cover v1 limiter TPM counting for embedding and text completion responses Exercise the broadened isinstance sites in parallel_request_limiter.py by asserting that async_log_success_event adds total_tokens to the per key, user, team, and end user TPM counters for EmbeddingResponse and TextCompletionResponse objects. The counters are pre-seeded at zero so the assertion is exactly the increment; on the pre-fix code these responses left total_tokens at 0 and the test fails. * fix(openai): forward client headers on the text completion path (#30103) * fix(openai): forward client headers on the text completion path litellm.completion() merges caller headers with extra_headers, but the text-completion-openai branch never passed the merged dict to openai_text_completions.completion(), and the handler only used its headers argument for logging. Pass the merged headers through the call site and set them as extra_headers on the outgoing request, mirroring the chat completion handler, so x-* client headers forwarded by the proxy reach the provider on /v1/completions. Fixes #27410. * Drop redundant extra_headers assignment and fix test module collision completion() merges extra_headers into headers before the text-completion-openai branch, and the handler now sets the merged headers as extra_headers on the request, so the branch-local optional_params["extra_headers"] assignment was a dead duplicate. Removing it keeps the assignment in one place while both entry paths (litellm.text_completion and direct handler callers) still forward headers; a new regression test pins the extra_headers kwarg path. Also rename the test module to test_completion_handler.py since its basename collided with tests/test_litellm/llms/bedrock/batches/ test_handler.py and broke pytest collection. * fix(bedrock): route Anthropic-shape count_tokens to InvokeModel and base64-encode the body (#30102) * fix(bedrock): route Anthropic-shape count_tokens to InvokeModel POST /v1/messages/count_tokens with Anthropic content blocks ({"type": "text"\|"tool_use"\|...}) was routed to the Converse input of the Bedrock CountTokens API. The Converse transform copies list content through verbatim, so Bedrock rejected the request with a 400 and the caller silently fell back to the local tokenizer, returning counts that can be off by ~50% on tool-heavy payloads. _detect_input_type now routes messages whose content blocks carry a "type" key (Anthropic shape) to the invokeModel input, which forwards the body verbatim. The invokeModel body is now base64-encoded as the CountTokens API requires (InvokeModelTokensRequest.body is a base64-encoded blob), and Anthropic Messages bodies get the anthropic_version and max_tokens fields Bedrock validates against. Fixes #27632. * refactor(bedrock): name the CountTokens max_tokens placeholder Replace the magic 1024 with a module-level DEFAULT_ANTHROPIC_INVOKE_MODEL_MAX_TOKENS constant so the intent is explicit and there is a single place to update if Bedrock's InvokeModel schema ever changes. Module-local rather than litellm/constants.py because the value is only a schema-validation placeholder for token counting, not a user-tunable generation default. * Add above-512k pricing tier for MiniMax-M3 and correct its base rates (#30095) * Add above-512k pricing tier support for MiniMax-M3 MiniMax-M3 doubles its per-token rates once a prompt exceeds 512k input tokens. The tiered cost parser already handles arbitrary thresholds, but get_model_info only copies whitelisted keys from ModelInfoBase, which had no 512k variants, so above_512k keys were silently dropped and long-context requests were priced at the flat rate. Add the input, output, and cache-read above_512k_tokens fields to ModelInfoBase and pass them through in get_model_info. Update the minimax/MiniMax-M3 entry with the tiered rates and correct the base rates, which matched the above-512k tier instead of the published base tier (https://platform.minimax.io/docs/guides/pricing-paygo). Fixes #29663. * Add above-512k keys to pricing schema, set MiniMax-M3 context to 1M Register the three new above_512k_tokens cost keys in the INTENDED_SCHEMA of test_aaamodel_prices_and_context_window_json_is_valid, declared the same way as the existing above_200k/above_272k tier keys, so the schema check accepts the MiniMax-M3 tiered pricing entry. Also raise MiniMax-M3 max_input_tokens from 512000 to 1000000 in both pricing JSONs. The MiniMax API docs (https://platform.minimax.io/docs/guides/text-generation) state the model supports a 1,000,000-token context window, and the pay-as-you-go pricing page (https://platform.minimax.io/docs/guides/pricing-paygo) prices input above 512k tokens, which only makes sense if inputs beyond 512k are accepted. This makes the above-512k pricing tier reachable. * fix(bedrock): make document names unique across conversation turns (#30093) * fix(bedrock): make document names unique across conversation turns PR #16275 derived Bedrock document names purely from a content hash so that names stay deterministic for prompt caching. When the same PDF or document appears in more than one conversation turn, every occurrence gets the identical name and Bedrock rejects the request with "Messages can not contain duplicate document names". Add _rename_duplicate_bedrock_document_names, a post-pass over the assembled message blocks that keeps the first occurrence's hash-based name and appends a positional suffix (_2, _3, ...) to later occurrences. Apply it in both _bedrock_converse_messages_pt and _bedrock_converse_messages_pt_async. Names remain deterministic across requests and the first occurrence is unchanged, so prompt cache prefixes stay stable. Fixes #29418. * fix(bedrock): avoid suffix collisions with organic document names A renamed duplicate could collide with a document whose hash-derived name already ends in the same positional suffix (e.g. an organic report_2 next to two documents named report). Collect every document name up front and bump the suffix until the candidate is unused, so renames can collide neither with organic names nor with each other. * fix(_types): remove ResponsesAPIResponse from PassThroughEndpointLoggingResultValues The import of ResponsesAPIResponse was removed from the file but a usage was left in the Union type, causing a NameError on import and breaking all CI tests. Remove the stale reference to match the cleanup intent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(_types): restore ResponsesAPIResponse import and add use_xai_oauth to filter list Two related fixes: 1. Re-add ResponsesAPIResponse import in _types.py — it was removed but still needed in PassThroughEndpointLoggingResultValues (used in openai_passthrough_logging_handler.py). 2. Add use_xai_oauth to all_litellm_params so it is filtered before forwarding kwargs to providers like OpenAI that do not recognize it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Hari <kancharla.ha@northeastern.edu> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Ceder Dens <ceder.dens@uantwerpen.be> Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com> Co-authored-by: victoruce <161634297+victoruce@users.noreply.github.com> Co-authored-by: kejunleng <33445544+silencedoctor@users.noreply.github.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Tyson Cung <45380903+tysoncung@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Jeremy Chapeau <113923302+jychp@users.noreply.github.com> Co-authored-by: Daan <255322319+daanhendrio@users.noreply.github.com> Co-authored-by: Avani Prajapati <143805019+Avani-prajapati@users.noreply.github.com> Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com> Co-authored-by: daitran-tensormesh <dai@tensormesh.ai> Co-authored-by: Dimitris Spachos <dspachos@gmail.com> Co-authored-by: Liam Scott <liam@uilliam.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: tin-berri <tin@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>		2026-06-10 10:34:07 -07:00
..
agent_tests	Revert "chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728 )" (#29326 )	2026-05-30 11:26:24 -07:00
audio_tests	fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend (#28110 )	2026-05-18 09:15:39 -07:00
basic_proxy_startup_tests
batches_tests	test: stabilize batch VCR coverage and stop live upload/network leaks (#29477 )	2026-06-02 16:11:52 -07:00
benchmarks
code_coverage_tests	Litellm oss staging (#29492 )	2026-06-02 08:48:10 -07:00
documentation_tests
enterprise	feat: standardize rate limit errors with category, rate_limit_type, model, and llm_provider fields (#27687 )	2026-06-06 17:50:29 -07:00
guardrails_tests	Revert "chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728 )" (#29326 )	2026-05-30 11:26:24 -07:00
image_gen_tests	feat(fal_ai): add Nano Banana / Gemini 2.5 Flash Image generation support (#29798 )	2026-06-06 11:16:44 -07:00
integration	CI: copy of #25177 (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) (#28223 )	2026-05-23 12:15:41 -07:00
litellm	Title: Fix managed batch cancel credential resolution (#29734 )	2026-06-06 12:35:18 -07:00
litellm_core_utils
litellm_utils_tests	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
litellm-proxy-extras
llm_responses_api_testing	test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview (#29433 )	2026-06-01 09:54:30 -07:00
llm_translation	Litellm oss 090626 (#30021 )	2026-06-10 10:34:07 -07:00
load_tests
local_testing	Litellm oss staging 040626 (#29671 )	2026-06-04 11:07:20 -07:00
logging_callback_tests	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
mcp_tests	[internal copy of #28008 ] Support MCP OAuth passthrough and issuer-scoped JWT auth (#28356 )	2026-06-02 12:22:04 -07:00
multi_instance_e2e_tests
ocr_tests	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
old_proxy_tests/tests
openai_endpoints_tests	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
otel_tests	feat(prometheus): add user_email and user_alias to user budget metrics (#28155 )	2026-05-18 16:28:14 -07:00
pass_through_tests	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
pass_through_unit_tests	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
proxy_admin_ui_tests	fix(guardrails): persist disable_global_guardrails on keys (#29233 )	2026-05-28 21:19:04 -07:00
proxy_behavior	fix(team): reserve team budget raises for proxy admins on /team/update (#30030 )	2026-06-09 09:19:15 -07:00
proxy_e2e_anthropic_messages_tests	Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847 )	2026-06-06 14:33:42 -07:00
proxy_migration_tests	test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700 )	2026-06-04 14:56:02 -07:00
proxy_security_tests	test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700 )	2026-06-04 14:56:02 -07:00
proxy_unit_tests	Litellm jwt mapping virtualkeys (#28510 )	2026-06-04 19:00:36 -07:00
router_unit_tests	Litellm oss 090626 (#30021 )	2026-06-10 10:34:07 -07:00
scim_tests
search_tests	fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend (#28110 )	2026-05-18 09:15:39 -07:00
spend_tracking_tests	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
store_model_in_db_tests
test_litellm	Litellm oss 090626 (#30021 )	2026-06-10 10:34:07 -07:00
unified_google_tests	test(google): add google-genai SDK proxy integration tests (#29781 )	2026-06-05 21:05:32 +00:00
vector_store_tests	Revert "chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728 )" (#29326 )	2026-05-30 11:26:24 -07:00
windows_tests	ci: reproduce default-Windows wheel install to guard MAX_PATH (#29597 )	2026-06-03 11:28:08 -07:00
__init__.py
_flush_vcr_cache.py	tests(vcr): isolate cassette redis to CASSETTE_REDIS_URL	2026-05-01 12:32:59 -07:00
_live_test_helpers.py	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
_openai_record_replay_proxy.py	Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847 )	2026-06-06 14:33:42 -07:00
_vcr_conftest_common.py	test(vcr): close out the remaining VCR live-call leaks (#29603 )	2026-06-03 13:46:43 -07:00
_vcr_redis_persister.py	test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h (#29784 )	2026-06-05 10:22:41 -07:00
eval_swe_bench.py
gettysburg.wav
large_text.py
openai_batch_completions.jsonl
README.MD
test_budget_management.py
test_callbacks_on_proxy.py	test(callbacks): harden flaky proxy callback-leak detector (#28195 )	2026-05-18 16:39:02 -07:00
test_config.py
test_debug_warning.py
test_default_encoding_non_root.py
test_end_users.py	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
test_entrypoint.py
test_fallbacks.py
test_gpt5_azure_temperature_support.py
test_health.py	fix(tests): swap dall-e to gpt-image-1 after openai deprecation	2026-05-12 16:55:18 -07:00
test_keys.py	Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847 )	2026-06-06 14:33:42 -07:00
test_litellm_proxy_responses_config.py	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
test_logging.conf
test_models.py
test_new_vector_store_endpoints.py
test_openai_endpoints.py	Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847 )	2026-06-06 14:33:42 -07:00
test_organizations.py
test_otel_thread_leak.py
test_passthrough_endpoints.py
test_presidio_latency.py
test_proxy_server_non_root.py
test_ratelimit.py	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
test_resource_cleanup.py
test_service_logger_otel.py
test_spend_logs.py	Litellm oss staging 04 21 2026 2 (#26569 )	2026-05-20 21:25:19 -07:00
test_team_logging.py
test_team_members.py	Litellm oss staging 04 21 2026 2 (#26569 )	2026-05-20 21:25:19 -07:00
test_team.py
test_users.py	Fix: tag budget reset must drop stale management-cache entry (#27568 )	2026-05-10 00:18:55 +00:00

README.MD

In total litellm runs 1000+ tests

[02/20/2025] Update:

To make it easier to contribute and map what behavior is tested,

we've started mapping the litellm directory in tests/test_litellm

This folder can only run mock tests.