5e2db7eee4
39598 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
533eab4dbd
|
fix(tests/vcr): make Redis cassette cache replay deterministically (zero VCR misses on consecutive runs) (#28826)
* test(vcr): make Redis-backed cassettes replay deterministically across runs
- Pin LITELLM_LOCAL_MODEL_COST_MAP=True in the shared VCR harness so the per-test importlib.reload(litellm) no longer fetches the model cost map from raw.githubusercontent.com. That live fetch was being recorded into cassettes; for tests that subsequently skip it was the only recorded episode, so the persister refused to save it (skipped tests don't persist) and the test re-recorded it live every run (MISS:NOT_PERSISTED).
- Compare-time symmetric matcher tolerance for Google OAuth (ya29.*) tokens, observability/telemetry payloads, credential-exchange bodies, and volatile UUID/timestamp tokens, so existing cassettes select a recorded episode instead of growing past the 50-episode cap and re-recording live.
- Don't record fire-and-forget telemetry (langfuse/arize/otel/...) into non-telemetry tests' cassettes. Several modules set litellm.success_callback at import time, so observability logging is globally enabled and an async flush from the background logging worker lands in an unrelated test's VCR window, saved as a spurious MISS:RECORDED (observed: a Langfuse batch from another completion landing on test_lowest_latency_routing_buffer). Such a request now passes through live (telemetry hosts aren't real-spend hosts); tests that actually assert on telemetry keep recording it.
- Dedupe + cap the VCR diagnostic dump so the classification summary survives CircleCI's ~400KB step-output truncation.
- Stabilize a non-deterministic rate-limit test body; mark AWS Secrets Manager lifecycle tests VCR-incompatible (uniquely-named secrets can't be replayed).
- Mark test_router_text_completion_client VCR-incompatible: it fires 300 identical requests to verify async-client reuse, but vcrpy patches the HTTP transport so replay never exercises the real connection pool the test validates, and recording 300 near-identical episodes overflows the 50-episode cap (MISS:OVERFLOW every run). It hits a free mock endpoint.
- Mark the Vertex AI MaaS Mistral OCR tests (vertex_ai/mistral-ocr-2505) VCR-incompatible: the MaaS model is not provisioned in the CI GCP project, so the live :rawPredict call fails and the test skips every run, leaving no cassette to record (MISS:NOT_PERSISTED every run). Sibling direct-Mistral and Azure OCR tests are unaffected and still replay from cache.
* fix(tests/vcr): refresh cassette TTL on read so replayed cassettes don't expire
The Redis VCR persister loaded cassettes with a plain GET, which does not touch the key's TTL. A cassette that is only ever replayed (HIT/NOOP, never re-recorded) therefore expired exactly 24h after its last *write*, no matter how often it was read. Whichever CI run happened to cross that boundary re-recorded the cassette live and surfaced a spurious VCR MISS on otherwise deterministic cassettes: the residual per-run flakiness floor (a different random subset of read-only cassettes expiring each run).
Slide the expiry forward on every successful load (best-effort EXPIRE), so any cassette used at least once per TTL window stays alive indefinitely and the 2nd/3rd run of a day replays cleanly.
* fix(tests/vcr): recover from spurious GET-None for existing cassette keys
Under concurrent CI load, the persister's load GET was observed returning None for a cassette key that demonstrably existed on the (single, non-clustered) Redis master: an external monitor saw the key present with a healthy TTL at the same instant the in-process client read None. Because None is a valid GET result (not a RedisError), the retry-on-error client config never engaged, so the cassette re-recorded live (a phantom MISS:RECORDED); for flaky/networked tests the failed live call then triggered a pytest rerun, which is why a rotating subset of otherwise deterministic tests missed each run.
On a None result, re-check EXISTS and re-read once. If the key really exists, use the recovered value and log [vcr-transient-miss-recovered] (also counted in cassette_cache_health). A genuinely absent key (a new cassette) still falls through to CassetteNotFoundError.
* chore(tests/vcr): TEMP diagnostic for persistent-miss cassette load path
Logs GET/EXISTS at load time for the three cassettes that re-record every run despite being present in Redis, to capture what the in-process client sees. To be reverted before merge.
* chore(tests/vcr): write load diagnostic to Redis (truncation-proof)
CI stdout truncates to the last ~400KB, dropping the early loaddbg lines for the alphabetically-first failing test. Push the load probe to a Redis list instead so it survives. To be reverted before merge.
* fix(tests/vcr): don't drop stored telemetry episodes during cassette load
Root cause of the residual per-run misses on present cassettes: vcrpy's Cassette._load() replays each *stored* interaction through Cassette.append(), which runs before_record_request on it, and a None return there silently drops that episode. The telemetry-leak suppressor (_should_drop_telemetry_record) returns None for telemetry requests, so when a non-telemetry-named test (or the alphabetically-first test in a worker, whose _current_test_nodeid is still empty) loaded a cassette containing a Langfuse ingestion episode, the episode was dropped on read, forcing an endless live re-record (a phantom MISS:RECORDED on a cassette that was demonstrably present in Redis). Verified by reproducing Cassette._load() against the real cassette: empty/non-telemetry nodeid -> 0 episodes survive; with the guard -> 1 survives.
Fix: guard the suppressor with a thread-local set around Cassette._load (via a small idempotent monkeypatch), so the drop only ever stops *new* incidental telemetry from being recorded and never filters the existing cassette on read.
Also drops the speculative GET-None recovery + its diagnostics from the previous commits: the load diagnostic showed GET returns the cassette bytes fine (get=1440B), so the persister never returned a spurious None; the loss happened later in vcrpy's append. The proven TTL-refresh-on-read fix is retained.
* fix(tests/vcr): drop incidental telemetry export POSTs to stop rotating async-flush misses
litellm's observability loggers flush on a background thread, so a Langfuse ingestion POST scheduled by one telemetry test can fire mid-way through a *later* telemetry-named test (after that test's own httpx mock has exited) and be recorded by VCR as a phantom episode: a non-deterministic MISS:RECORDED / PARTIAL that rotates onto a different telemetry test from run to run.
Telemetry export POSTs are fire-and-forget; no test asserts on a *recorded* export response except the pass-through proxy test (which forwards a client POST to Langfuse ingestion and replays its 207). So _should_drop_telemetry_record now drops incidental export POSTs for every test except that one. Dropping returns None (live fire-and-forget, never stored), so it can only turn a phantom miss into a harmless live call, never the reverse; recorded read-back GETs that telemetry tests assert on are matched by method and left untouched.
* fix(tests/vcr): restore assertion in test_banner_silent_when_vcr_disabled
The assertion that the banner is suppressed when VCR is disabled was inadvertently moved into test_diagnostic_log_silent_when_no_dir when the diagnostic-log tests were added, leaving the disabled-VCR test verifying nothing.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
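The TTL-refresh-on-read idea from the commit above can be sketched roughly as follows. This is only an illustration, not the persister's actual code: `load_cassette`, `CASSETTE_TTL_SECONDS`, and the `FakeRedis` stub are hypothetical names, and the stub merely mimics the `get`/`expire` calls of a Redis client so the sketch runs without a server.

```python
import time
from typing import Dict, Optional, Tuple

CASSETTE_TTL_SECONDS = 24 * 60 * 60  # assumed 24h window, as described in the commit


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only set/get/expire."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def set(self, key: str, value: bytes, ex: int) -> None:
        self._data[key] = (value, time.monotonic() + ex)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None
        return entry[0]

    def expire(self, key: str, seconds: int) -> bool:
        entry = self._data.get(key)
        if entry is None:
            return False
        self._data[key] = (entry[0], time.monotonic() + seconds)
        return True

    def ttl_remaining(self, key: str) -> float:
        return self._data[key][1] - time.monotonic()


def load_cassette(client, key: str, ttl: int = CASSETTE_TTL_SECONDS) -> Optional[bytes]:
    """Read a cassette and slide its expiry forward on a successful load."""
    payload = client.get(key)  # a plain GET leaves the key's TTL untouched
    if payload is None:
        return None  # absent key: the caller treats this as "cassette not found"
    try:
        client.expire(key, ttl)  # best-effort: a failed refresh must not fail the load
    except Exception:
        pass
    return payload
```

With this shape, a cassette that is read at least once per TTL window never ages out, while a key that is truly missing still reports as missing.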
|
|
f75a7c6b22
|
fix(model-edit): allow clearing custom pricing on wildcard models (#28719)
* fix(model-edit): allow clearing custom input/output cost on wildcard deployments
A user-set pricing override on a `/model/*` wildcard deployment could not be removed: clearing the Input/Output Cost fields in the UI succeeded visually, but the next read still showed the old values because both `litellm_params` and `model_info` (mirrored via `SPECIAL_MODEL_INFO_PARAMS`) retained the original rates.
UI: when the pricing field is touched but left empty, send `null` instead of dropping it from the payload so the backend sees the clear intent. The cache-read-cost fallback now guards against `null` as well as `undefined` so a cleared input cost cannot silently wipe the cache-read override.
Backend: `update_db_model` honors explicit-null clears, but ONLY for `SPECIAL_MODEL_INFO_PARAMS` (the 4 pricing fields). Restricting the null-clear path prevents a team-scoped caller from using this codepath to null out privileged fields like `team_id` or access groups.
Tests cover both clear paths (`litellm_params` and `model_info`), the SPECIAL_MODEL_INFO_PARAMS mirror, PATCH semantics for omitted fields, and the security guard that non-pricing nulls don't reach the merged dict.
Resolves LIT-3250
* fix(model-edit): run null-clears after both merges, not interleaved
The previous version cleared `model_info` from inside the litellm_params merge block, but the subsequent `model_info.update(...)` re-injected the old pricing because the UI's PATCH carries the full model_info blob with the stale values still in it. Move the explicit-null clear pass to after both merges so a model_info passthrough cannot resurrect cleared fields.
Adds a regression test for the realistic UI submit shape (both blobs in the patch, model_info still holding the old pricing).
* test(e2e): clear-custom-pricing flow with create/delete cleanup
Covers the dashboard model edit form's pricing-clear flow end-to-end: seeds a deployment with custom input/output pricing, drives the UI to clear both fields, asserts the outgoing PATCH sends explicit nulls, and confirms via /v2/model/info that the override is gone from both litellm_params and model_info.
The dashboard DB persists across this suite, so beforeEach creates a uniquely-named deployment and afterEach POSTs /model/delete to leave the DB clean regardless of test outcome.
* fix(model-edit): extend pricing clear to cache_read and cache_write costs
Pre-existing parallel of the wildcard input/output cost bug: cleared cache_read_input_token_cost and cache_creation_input_token_cost overrides silently persisted because the UI omitted the key (delete or fallback) and the backend null-clear allowlist did not cover them.
- types/router.py: add cache_read_input_token_cost and cache_creation_input_token_cost to SPECIAL_MODEL_INFO_PARAMS, so they are mirrored between litellm_params and model_info by Deployment.__init__ and honoured by the null-clear loop in update_db_model.
- model_info_view.tsx: emit explicit null for touched-but-empty cache_read and cache_write fields. Preserve the input_cost->cache_read mirror only when cache_read itself was not touched.
- model_management_endpoints.py: update the allowlist comment.
- Tests: three new unit tests for cache clear paths and a preserve check; the e2e spec now seeds, clears, and asserts null PATCH + key-absence for all four pricing fields.
|
||
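The allowlisted null-clear described in the commit above might look roughly like the sketch below. It is an assumption-laden illustration, not the code in `update_db_model`: `PRICING_CLEAR_ALLOWLIST` and `merge_with_null_clears` are hypothetical names, and the input/output field names in the tuple are assumed (the commit only names the two cache-cost fields explicitly).

```python
from typing import Any, Dict

# Hypothetical allowlist; the real one is SPECIAL_MODEL_INFO_PARAMS.
PRICING_CLEAR_ALLOWLIST = (
    "input_cost_per_token",   # assumed field name
    "output_cost_per_token",  # assumed field name
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost",
)


def merge_with_null_clears(existing: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Apply PATCH semantics, then clear only allowlisted keys sent as explicit None."""
    merged = dict(existing)
    # Omitted keys keep their value; None values are never merged in directly.
    merged.update({k: v for k, v in patch.items() if v is not None})
    # The clear pass runs after the merge so stale values cannot be re-injected.
    for key in PRICING_CLEAR_ALLOWLIST:
        if key in patch and patch[key] is None:
            merged.pop(key, None)
    return merged
```

A `None` for a non-pricing key such as `team_id` is simply ignored here, which mirrors the guard the commit describes.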
|
|
a8263cbc88
|
fix(ui): route API Reference back to query-param page (#28726)
* fix(ui): route API Reference back to query-param page
The path-based /ui/api-reference route was broken in practice: the page-local useProxySettings hook didn't match what the root page passes down. Remove api_ref from the migration maps (LEGACY_REDIRECTS in app/page.tsx, MIGRATED_PAGES in leftnav.tsx and (dashboard)/layout.tsx), point the leftnav item back at page="api_ref", and restore the api_ref render branch in the root page.
The path-based page.tsx and the useProxySettings hook stay in place unchanged; only api_ref is moved back to query-param routing while the migration infrastructure is preserved for future page moves.
* fix(ui): alias ?page=api-reference to api_ref branch
Handles bookmarks of the hyphen-form query param that was live during the brief path-based migration window, so they render the working APIReferenceView instead of falling through to the default page.
|
||
|
|
96a2e8b16d
|
fix(azure): preserve AD token refresh in v1 OpenAI client path (#28627)
* fix(azure): preserve AD token refresh in v1 OpenAI client path
The /openai/v1/ code path (api_version in {"v1", "latest", "preview"})
constructs a plain OpenAI/AsyncOpenAI client, but only forwarded
`api_key` from `azure_client_params`. When `enable_azure_ad_token_refresh`
is set (or any AD-only auth), `api_key` is None and the client
constructor raised "The api_key client option must be set...", breaking
every Azure call with a v1 api_version.
The OpenAI SDK (>=2.20.0) accepts a callable for `api_key` and re-invokes
it on every request via `_refresh_api_key`, so we now forward
`azure_ad_token_provider` directly — preserving the per-request token
refresh behavior of the regular AzureOpenAI client and avoiding the
expiry hole that resolving the token once at client-creation time would
introduce. Static `azure_ad_token` strings fall through to `api_key`.
For the async path we wrap the sync provider returned by azure-identity
in an async function since AsyncOpenAI expects `Callable[[], Awaitable[str]]`.
Fixes #27945
https://claude.ai/code/session_01UnzrDSFUUgp5T2wRoPMxq5
* fix(azure): offload sync token provider to thread in v1 async wrapper
* fix(azure): include AD credential identity in v1 client cache key
---------
Co-authored-by: Claude <noreply@anthropic.com>
|
||
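The async wrapper mentioned in the commit above (wrapping a sync token provider and offloading it to a thread) can be sketched as below. This is a minimal illustration under assumptions: `make_async_token_provider` and `fake_sync_provider` are hypothetical names, and the fake provider stands in for whatever azure-identity returns.

```python
import asyncio
from typing import Awaitable, Callable


def make_async_token_provider(
    sync_provider: Callable[[], str],
) -> Callable[[], Awaitable[str]]:
    """Wrap a blocking token provider so it can be awaited on every request."""

    async def _provider() -> str:
        # Run the blocking call in a worker thread so the event loop is not stalled.
        return await asyncio.to_thread(sync_provider)

    return _provider


def fake_sync_provider() -> str:
    # Hypothetical stand-in for a sync provider built with azure-identity.
    return "token-123"


async def main() -> str:
    provider = make_async_token_provider(fake_sync_provider)
    return await provider()
```

Because the wrapper calls the underlying provider each time it is awaited, token refresh stays per-request rather than being resolved once at client creation.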
|
|
48d7e15b83
|
chore(admin-ui): regenerate static export with trailingSlash: true (#28112)
* chore(admin-ui): regenerate static export with trailingSlash: true
Rebuilds litellm/proxy/_experimental/out/ from ui/litellm-dashboard with `trailingSlash: true` enabled in next.config.mjs. Next.js now emits every route as <dir>/index.html (e.g. mcp/oauth/callback/index.html) instead of <dir>.html with a sibling metadata-only directory, which fixes the 404 on extensionless URLs served through FastAPI's StaticFiles(html=True) mount.
This is the build artifact half of the fix; the config change, Dockerfile cleanup, and regression test live in the follow-up source PR that stacks on top of this branch.
* fix(admin-ui): emit nested routes as <dir>/index.html (#28106)
Linear and other OAuth providers redirect the user back to /ui/mcp/oauth/callback?code=...&state=... after the consent step. The packaged Next.js static export only produced /ui/mcp/oauth/callback.html, so FastAPI's StaticFiles served a 404 on the extensionless URL and the OAuth handshake never completed.
The Dockerfile.non_root build step tried to paper over this at image-build time with `for html_file in *.html; do ...`, but that shell glob does not recurse, so nested routes like mcp/oauth/callback.html were left stranded next to an empty mcp/oauth/callback/ directory containing only Next.js metadata. The runtime restructure step in proxy_server.py was then skipped because the .litellm_ui_ready marker had already been dropped.
Set trailingSlash: true in the dashboard's Next.js config so the export emits every nested route as <dir>/index.html natively. The Dockerfile loop is now a no-op for the bundled UI and has been removed; the .litellm_ui_ready marker is still written so the proxy keeps skipping the redundant Python restructure step at startup.
Stacks on top of the static export regeneration in the parent branch.
* chore: restore origin/litellm_internal_staging out files
|
||
|
|
c23b19f09c
|
feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)
* feat(openai): apply regional-processing cost uplift for EU/US data residency
OpenAI charges a 10% uplift on the latest GPT models when requests are served from a regionalized hostname (eu./us.api.openai.com). Infer the region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`, and multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field.
https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW
* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema
* fix(cost): tighten data_residency inference and restore model_cost in tests
- Only infer OpenAI data_residency when custom_llm_provider == "openai"; drop the implicit None fallback so non-OpenAI callers can't accidentally pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak state across the session.
* refactor(openai): move data_residency helper under llms/openai
* fix: thread data_residency through realtime stream cost calculation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(cost): thread data_residency through batch_cost_calculator
Apply the OpenAI regional-processing uplift multiplier to retrieve_batch cost paths so Batch API requests served via eu./us.api.openai.com are priced at the same uplifted token rates as completions/transcriptions.
* refactor(openai): encapsulate provider check inside infer_openai_data_residency
Move the custom_llm_provider == "openai" guard from get_litellm_params into the helper itself so the core utility no longer carries provider-specific dispatch logic. Callers pass through the provider unconditionally; the helper returns None for any non-OpenAI provider.
* fix(responses): thread data_residency through Responses logging params
The Responses API paths build their logging litellm_params dict after provider resolution but did not include data_residency, so cost calc saw None even when the effective api_base was a regional OpenAI host.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
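As a rough sketch of the regional uplift described in the commit above: the helper name `infer_openai_data_residency` and the `regional_processing_uplift_multiplier_<region>` field come from the commit message, but the signature, the exact host matching, `apply_regional_uplift`, and the 1.1 multiplier used in the usage note are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def infer_openai_data_residency(
    api_base: Optional[str], custom_llm_provider: Optional[str]
) -> Optional[str]:
    """Return "eu" or "us" for a regional OpenAI host, otherwise None."""
    # The provider guard lives inside the helper, as the refactor commit describes.
    if custom_llm_provider != "openai" or not api_base:
        return None
    host = (urlparse(api_base).hostname or "").lower()
    if host == "eu.api.openai.com":
        return "eu"
    if host == "us.api.openai.com":
        return "us"
    return None


def apply_regional_uplift(cost: float, model_info: dict, region: Optional[str]) -> float:
    """Multiply the computed cost by the per-model regional multiplier, if any."""
    if region is None:
        return cost
    # Hypothetical default of 1.0 when the model defines no multiplier.
    multiplier = model_info.get(f"regional_processing_uplift_multiplier_{region}", 1.0)
    return cost * multiplier
```

For example, with a model entry carrying `regional_processing_uplift_multiplier_eu: 1.1` (an assumed value matching the 10% figure), a request to `https://eu.api.openai.com/v1` would be priced at 1.1 times the base cost, while the same hostname under a non-OpenAI provider gets no regional tag.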
|
|
f38c16c71e
|
test(proxy): add harness for proxy_server.py behavior-pinning (#28827)
* test(proxy): add harness for proxy_server.py behavior-pinning
Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as, mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a baseline, self-selects target by inspecting which placeholder files have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture + both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas against new-tests-only and aren't entangled with the broader scattered test suite
Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml so this directory's runtime + coverage are tracked independently.
Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc
* ci(proxy-endpoints): allow workflow_dispatch
Lets the workflow be triggered manually on a branch via `gh workflow run`, which is needed for the verify-first flow on workflow changes before opening a PR.
* test(proxy): address review feedback on proxy_server harness
- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4]) instead of CWD-relative os.path.abspath("../../../../") which resolved to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as the floor (line_min = max(target, baseline)). Closes the gap between the PR description's "delta semantics" and what the script was doing. With baseline=0.0 today this is a no-op; future PRs that update the baseline cause regressions (test deletions etc.) to trip the gate even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard (test_*.py glob never yields underscore-prefixed names) and read each test file once instead of twice.
|
||
|
|
48dd71b818
|
ci: add daily oss-agent-shin branch creation workflow (#28829)
Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC. Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> |
||
|
|
1b6788e0c4
|
fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)
* fix(proxy): hydrate wildcard discovery credentials
* fix(proxy): constrain wildcard credential hydration
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
|
||
|
|
7cd98508e7
|
fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)
* fix(team): keep team_alias cache in sync on _cache_team_object writes
_cache_team_object wrote only to the team_id:<id> cache key, but the JWT auth path that uses team_alias_jwt_field reads from a separate team_alias:<alias> key (get_team_object_by_alias caches under both keys on miss, but reads only the alias-keyed one). After any team-mutation endpoint (team_model_add, team_model_delete, update_team, the two access-group writes) the team_id cache was refreshed but the team_alias cache stayed stale until TTL; JWT callers using team_alias_jwt_field kept seeing the pre-mutation team for the full cache window.
Mirror the write under the alias key inside _cache_team_object so every existing caller stays in sync without further changes. Skip the alias write when team_alias is None/empty so we don't collide across alias-less teams.
Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the LIT-3244 fix correctly invalidated the team_id cache but the customer's JWT used team_alias_jwt_field, so they kept hitting the stale alias-keyed entry.
* fix(team): delete (not overwrite) team_alias cache on _cache_team_object
The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias> from _cache_team_object. team_alias is NOT unique in the schema (no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises). Writing the alias-keyed cache from the generic refresh path bypassed that check: a team admin renaming their team to collide with another team's alias could silently overwrite the cached team for JWT-by-alias auth, swapping the resolved team under that alias for the cache window.
Switch the alias-keyed operation from a write to a delete (mirroring the dual-cache delete pattern in _delete_cache_key_object). After every team write, the next JWT-by-alias reader cache-misses and falls through to get_team_object_by_alias, which (a) re-fetches the fresh team from DB, closing the LIT-3244 staleness gap that motivated this PR, and (b) enforces alias uniqueness before populating either cache key. team_id:<id> writes are unchanged: team_id is the table PK and is guaranteed unique.
Surfaced in veria-ai review on #28739.
* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id
extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)` which substring-matches the `model_id,` inside the file-ID encoding's `llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id then fed that deployment UUID back into the auth path as a model candidate via _extract_models_from_managed_resource_id, and every team-BYOK file attach 403'd with:
team not allowed to access model. This team can only access models=['openai/*']. Tried to access <deployment-uuid>
The team's models list correctly contains the public name (`openai/*`) that target_model_names matches, but the bogus UUID candidate fails the wildcard check first.
Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it matches the legitimate top-level `model_id,<value>` field on vector_store unified IDs and skips substring matches inside other fields. File-IDs (which have no top-level `model_id` field) now return None and contribute no spurious UUID candidate.
Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's exact flow: team with openai/* BYOK deployment, JWT-scoped user, POST /v1/vector_stores/{id}/files attaching a file uploaded with target_model_names=openai/gpt-4o.
|
||
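The effect of anchoring the `model_id` regex, as described in the managed-files commit above, can be seen in a small sketch. The two patterns are the ones quoted in the commit message; the sample ID strings and the `extract` helper are hypothetical and much simpler than real unified IDs.

```python
import re
from typing import Optional

UNANCHORED = re.compile(r"model_id,([^;]+)")
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")


def extract(pattern: "re.Pattern[str]", unified_id: str) -> Optional[str]:
    """Return the captured model_id value, or None when the pattern does not match."""
    match = pattern.search(unified_id)
    return match.group(1) if match else None


# Hypothetical decoded IDs, made up for illustration only.
file_id = "litellm_proxy:application/json;unified_id,abc;llm_output_file_model_id,dep-uuid-1"
vector_store_id = "litellm_proxy_vector_store;model_id,my-deployment;resource,vs_1"
```

Here the unanchored pattern picks `dep-uuid-1` out of `llm_output_file_model_id,...`, while the anchored one returns `None` for the file ID and still finds `my-deployment` in the vector-store ID.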
|
|
0d5040fc06
|
chore(ci): merge dev branch (#28807)
* chore(proxy): route path-dependent call sites through get_request_route
Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)``, the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler.
Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py
Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"].
* chore(proxy): make get_request_route imports lazy at call sites
Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files.
Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit: the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call.
Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke).
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
|
||
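The commit above describes `get_request_route` as returning the ASGI `scope["path"]` with `root_path` stripped. A minimal sketch of that behavior on a plain scope dict follows; `get_request_route_sketch` is a hypothetical name, and the real helper takes a request object and may handle edge cases differently.

```python
from typing import Any, Dict


def get_request_route_sketch(scope: Dict[str, Any]) -> str:
    """Return scope["path"] with a leading root_path removed, if present."""
    path = scope.get("path", "")
    root_path = scope.get("root_path", "")
    # Strip only on a segment boundary so "/api" does not eat into "/apiv2".
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path
```

Since the value is read from the scope rather than rebuilt from request headers, route decisions made on it line up with the path the framework dispatches on.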
|
|
66f10bceea
|
feat(proxy): allow llm_api_routes virtual keys to list MCP servers (#28442)
* feat(proxy): allow llm_api_routes virtual keys to list MCP servers
Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.
The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.
The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
* refactor(proxy): make MCP discovery carve-out method-aware
Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
|
||
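The method-aware carve-out described in the refactor commit above could be sketched like this. The two allowed routes come from the commit message; `is_mcp_discovery_request` and the regular expression are hypothetical, and the real check lives inside `is_virtual_key_allowed_to_call_route`.

```python
import re

# Matches /v1/mcp/server and /v1/mcp/server/{server_id}, but no deeper sub-paths.
_MCP_DISCOVERY_PATH = re.compile(r"^/v1/mcp/server(?:/[^/]+)?$")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """Allow only GET on the list and single-server discovery routes."""
    return method.upper() == "GET" and _MCP_DISCOVERY_PATH.match(route) is not None
```

Anything that is not a GET, or that targets a multi-segment sub-path, returns `False` and would fall through to the existing 403 in the scenario the commit describes.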
|
|
c127968dfb
|
fix(ui): show 2-decimal precision for max_budget on key overview (#28809)
The Key Info Overview tab's Spend card truncated sub-dollar budgets to "$0" because formatNumberWithCommas defaults to 0 decimals. The Settings tab passes 2; align the overview so a $0.10 budget renders as "$0.10". Resolves LIT-2845 |
||
|
|
d98ada8c3f
|
chore(ci): merge dev branch (#28657)
* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)
* feat(dashboard): refine navbar zones and Agent Platform notice
Restructure the admin navbar for production users: clear product vs community vs personal columns with vertical dividers, icon-only Slack/GitHub in a shared chip, and Docs/Blog typography aligned on an 8px rhythm. Add a notifications bell with popover linking to the LiteLLM Agent Platform repo and optional mark-as-read persistence. Promote the account control with initials avatar, single-line display name, and navDisplayName mapping for placeholder user ids (e.g. default_user_id).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(dashboard): address PR review: AntD buttons, public page guard, dedupe regex
- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(dashboard): drop dead useHealthReadiness import in navbar
The module was removed in #27896 (replaced by useHealthReadinessDetails), but the import survived the rebase. The symbol is unused: only useHealthReadinessDetails is consumed in the file. Removing the dead import unblocks the UI TypeScript build.
* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels
The component was refactored to an icon-only chip with aria-label='LiteLLM on GitHub' (squash #27543), but the test still asserted /star us on github/i. Update the query to match the rendered accessible name.
* refactor(dashboard): drop unused props from NavbarProps
The navbar refactor moved user identity + dark-mode state to internal hooks (useAuthorized, useWorker), but the NavbarProps interface still declared userID, userEmail, userRole, premiumUser, isDarkMode, and toggleDarkMode as required, forcing every caller to thread them through. Drop them from the interface and all four call sites (page.tsx, (dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also shrinks the destructure in layout.tsx so the now-unused locals stop being pulled out of useAuthorized().
* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag
Reads/writes of the litellmHideAgentPlatformBanner key were done directly inside NotificationsBell via a useEffect + useState pair. Every other localStorage-backed flag in the dashboard (DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge, DisableUsageIndicator, DisableBlogPosts) is wrapped in a useSyncExternalStore hook over localStorageUtils so all mounted components stay in sync. Extract useHideAgentPlatformBanner to follow the same shape, swap NotificationsBell to consume it, and add a regression test that two sibling bells stay in sync without a remount when one is dismissed.
* refactor: mask credential fields in proxy settings GET responses (#28682)
* refactor: mask credential fields in proxy settings GET responses
Brings SSO settings, cache settings, and the email/Slack alerting view in /get/config/callbacks in line with the HashiCorp Vault config-override pattern, so persisted credentials are not transported back to the UI in plaintext.
* refactor: harden short-value masking and hoist alerting var constant
Closes two review observations:
- mask_sensitive_keys now replaces short values (below the visible prefix+suffix length) with an all-mask string instead of returning them unchanged, so a 1-7 character credential is no longer round-tripped verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level constant, matching the analogous _SSO_SENSITIVE_FIELDS and _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
---------
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
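Note on the short-value masking described in #28682: the sketch below is an illustration only, not the project's `mask_sensitive_keys`. The helper name `mask_value` and the 4-character visible prefix and suffix are assumptions, chosen so that a 1-7 character value falls below the visible prefix+suffix length.

```python
def mask_value(value: str, visible_prefix: int = 4, visible_suffix: int = 4,
               mask_char: str = "*") -> str:
    """Hypothetical helper: show a short prefix/suffix, mask everything else."""
    # Values too short to leave anything hidden are masked entirely.
    # Using <= (a choice of this sketch) also covers a value that is exactly
    # prefix+suffix long, which would otherwise be echoed back in full.
    if len(value) <= visible_prefix + visible_suffix:
        return mask_char * len(value)
    hidden = len(value) - visible_prefix - visible_suffix
    suffix = value[len(value) - visible_suffix:]
    return value[:visible_prefix] + mask_char * hidden + suffix


print(mask_value("abc"))
print(mask_value("sk-1234567890abcd"))
```

A short credential comes back as mask characters only, while a long one keeps just enough of its ends to be recognizable in the UI.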
|
|
5f75be5c1c
|
chore(ci): merge dev branch (#28801)
* chore(proxy): route path-dependent call sites through get_request_route
Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)``, the helper already
added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with
``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host
header; ``scope["path"]`` is uvicorn's parse of the request line and matches
what FastAPI dispatches on, so it's the authoritative route for any decision
that should agree with the actual handler.
Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py
Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that
construct a Request with scope["path"] set to a benign route and the Host
header crafted so url.path would resolve differently; each site's decision
is asserted against scope["path"].
* chore(proxy): make get_request_route imports lazy at call sites
Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use them.
The module-level form participates in a long-standing import cycle through
``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy
form matches the pattern the proxy already uses for ``user_api_key_auth`` and
related helpers elsewhere in these files.
Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit: the
delegation pulled ``RouteChecks`` into the same cycle, and the call site
reuses the resolved route for its other branches, so inlining the substring
check is both cycle-free and avoids a redundant second ``get_request_route``
call.
Comment in test_proxy_routes.py acknowledges that the two MCP table entries
exercise ``get_request_route`` directly rather than the full production
handler (which needs ASGI scope + MCP state to invoke).
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
|
||
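Note on #28801: the idea of taking the route from the ASGI scope with `root_path` stripped can be sketched as below. This is not the actual `get_request_route` implementation; the function name `route_from_scope` and its handling of edge cases are assumptions made for illustration, and it takes a plain scope dict rather than a Request object.

```python
def route_from_scope(scope: dict) -> str:
    """Hypothetical sketch: derive the dispatch route from an ASGI scope."""
    path = scope.get("path", "")
    root_path = scope.get("root_path", "").rstrip("/")
    # Strip root_path only on a whole-segment match, so "/proxyfoo/x"
    # is not mistaken for a path mounted under "/proxy".
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):]
    return path or "/"


print(route_from_scope({"path": "/proxy/key/info", "root_path": "/proxy"}))
```

Because the function never consults request headers, a crafted Host header cannot change its result.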
|
|
30551de371
|
fix(otel): export SERVER span on management-endpoint success without http_request (#28794)
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> |
||
|
|
f9407bc036
|
chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)
* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214
The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).
Changes:
- Replace 26 hardcoded references to 888602223428 with 941277531214 across
8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
ARNs, batch execution role ARN, and example proxy config).
- The provisioned-model and imported-model ARNs are referenced only from
mocked unit tests — no AWS resources to recreate.
- The batch execution IAM role has been recreated in the new account with
the same name and equivalent permissions.
- The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
under the same names — see tools/agentcore-deploy/ in a follow-up.
CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.
Smoke-tested locally against the new account:
aws bedrock-runtime converse --region us-west-2 \
--model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--messages '[{"role":"user","content":[{"text":"ping"}]}]'
→ 200, model returned 'pong'
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes
The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).
Deployed runtimes:
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy
Both runtimes are status=READY and pass a smoke invoke:
$ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
→ 200, {"result": "echo: ping"}
The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): point Bedrock batch tests at new-account S3 bucket
The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.
Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): point live S3 logging test at new-account bucket
Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.
Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails
The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
- wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
with explicit inputAction=ANONYMIZE so masking applies to INPUT,
which is the source litellm's moderation hook sends)
- ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
to the exact string the tests assert on)
Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): migrate legacy models to current inference profiles
The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
- anthropic.claude-3-sonnet-20240229 -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
- anthropic.claude-3-haiku-20240307 -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).
cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources
These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
- SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
-> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
- Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)
claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.
Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): swap/skip legacy-gated models unavailable on new CI account
The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:
- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
active us.anthropic.claude-sonnet-4-5 inference profile.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account
- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
is not authorized on account 941277531214) and migrate the missed
s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
output e2e test.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)
Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
instead of skipping, so the missing entitlement stays visible in CI; they
still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
transform + cost-tracking path stays under test without live model access
https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT
Co-authored-by: Claude <noreply@anthropic.com>
* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells
Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
|
|
f45909cb81
|
fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)
* Fix Bedrock KB pass-through SigV4 headers and signed body
Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4. Send
prepped.body as raw content so the outbound payload matches the signature
after logging hooks mutate the parsed dict.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Simplify pass-through raw body handling
Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument through
three functions. Helper methods are restored to their original signatures,
and the new branch lives in one place at each httpx call site.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Harden pass-through raw body read from request.state
Guard missing request.state (test fixtures) and ignore non-bytes/str values
so MagicMock does not trigger the SigV4 raw-body path.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Test pass_through_request state_raw_body uses httpx content=
Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
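Note on #27526: dropping forwarded headers that collide case-insensitively with signed headers might look roughly like this. The function name `merge_forwarded_headers` and the sample header values are hypothetical; the real pass-through code works with botocore and httpx objects rather than plain dicts.

```python
def merge_forwarded_headers(client_headers: dict, signed_headers: dict) -> dict:
    """Hypothetical sketch: signed headers win over same-named client headers."""
    signed_names = {name.lower() for name in signed_headers}
    # Keep only client headers whose lowercase name is not already signed.
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_names
    }
    merged.update(dict(signed_headers))
    return merged


client = {"authorization": "Bearer sk-client", "X-Trace-Id": "abc"}
signed = {"Authorization": "AWS4-HMAC-SHA256 Credential=...", "X-Amz-Date": "20260101T000000Z"}
print(merge_forwarded_headers(client, signed))
```

Comparing on lowercase names is what stops a lowercase `authorization` from the client from surviving next to the signed `Authorization`.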
|
|
4148667671
|
Fix spend logs v2 route permissions (#28705)
Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> |
||
|
|
92d4bba58f
|
fix(ui/add-model): stop vertex_ai-anthropic_models from leaking under Anthropic (#28723)
`getProviderModels()` matched a model into a provider's dropdown when the model's `litellm_provider` string *contained* the provider key as a substring. The intent was to admit suffix variants (e.g. `anthropic_text`, `bedrock_converse`), but the substring check is too loose: it also pulls in unrelated providers whose name happens to contain the key, most visibly `vertex_ai-anthropic_models` matching `anthropic` and `vertex_ai-openai_models` matching `openai`. Replace `.includes()` with separator-anchored prefix matching (`startsWith(provider + "_")` / `startsWith(provider + "-")`). All legitimate variants in `model_prices_and_context_window.json` still match (`anthropic_text`, `azure_text`, `azure_ai`, `bedrock_converse`, `bedrock_mantle`, `cohere_chat`, `fireworks_ai-embedding-models`, `vertex_ai-*`, `vertex_ai_beta`), and the cross-provider leak is closed. Tests: update one assertion that pinned the buggy substring behavior (`custom_openai_endpoint` matching `openai` — not a real provider value); add 6 new tests covering the leak regressions and the variant-preservation contract for vertex_ai/bedrock/fireworks. |
||
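Note on #28723: the change from substring matching to separator-anchored prefix matching can be illustrated as follows. The real `getProviderModels()` lives in the TypeScript UI; this Python sketch uses a hypothetical `provider_matches` helper, and also accepting an exact match is an assumption of the sketch.

```python
def provider_matches(litellm_provider: str, provider: str) -> bool:
    """Hypothetical sketch of separator-anchored prefix matching."""
    return (
        litellm_provider == provider  # exact match (assumed)
        or litellm_provider.startswith(provider + "_")
        or litellm_provider.startswith(provider + "-")
    )


def provider_matches_substring(litellm_provider: str, provider: str) -> bool:
    """The looser behavior described in the commit message, for contrast."""
    return provider in litellm_provider


for value in ("anthropic_text", "vertex_ai-anthropic_models"):
    print(value, provider_matches_substring(value, "anthropic"), provider_matches(value, "anthropic"))
```

The suffix variant `anthropic_text` matches under both rules, while `vertex_ai-anthropic_models` matches `anthropic` only under the substring rule.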
|
|
5f73ad4fe7
|
fix(team): refresh team cache on team_model_add/delete (LIT-3244) (#28683)
* fix(team): refresh team cache on team_model_add/delete (LIT-3244)
team_model_add and team_model_delete wrote to the DB but did not
invalidate the in-memory LiteLLM_TeamTableCachedObj used by
common_checks. After the v1.83.14 common_checks centralization made
team.models authoritative on /v1/files and /v1/vector_stores/*,
adding a Team-BYOK model silently failed to grant the new public
model name to team members until the cache TTL expired (and a
removed model kept working until then on the symmetric path).
Extract the cache-refresh snippet from update_team into a small
helper and apply it consistently at all three team-write sites.
* test: also assert updated models in team-cache-refresh pin
Strengthens the LIT-3244 regression test to also assert
`call_kwargs["team_table"].models` matches the updated row,
not just `team_id`. Both `existing_team` and `updated_team`
share `team_id` in the test setup, so the previous assertion
would have passed even if the implementation accidentally cached
the pre-mutation row.
Greptile review feedback.
* fix(team): hydrate object_permission on cache-refreshing team updates
The Prisma update calls in update_team, team_model_add, and
team_model_delete returned a team row with object_permission_id set
but object_permission=None (the relation was not requested via
include=). _refresh_cached_team then wrote that to the in-memory
LiteLLM_TeamTableCachedObj, and the cache-hit path in get_team_object
returns the cached object without re-hydrating. Downstream consumers
(validate_key_search_tools_against_team, the MCP/agent authz paths)
treat a missing object_permission as no team-level restriction, so
a team-write op silently dropped object-permission enforcement until
the cache TTL expired or a DB-fetch path re-hydrated it.
Add include={"object_permission": True} to all three updates so the
refresh writes a complete cached team. Extend the LIT-3244 regression
test to pin both the cached object_permission and the include shape
on the Prisma call.
Surfaced in PR review of LIT-3244.
|
||
|
|
3bcfe41f05
|
test(model_prices): allow audio_transcription_config in schema (#28708)
The schema in test_aaamodel_prices_and_context_window_json_is_valid uses additionalProperties: false. The azure/speech/azure-stt entry added in #27482 introduced an audio_transcription_config field that the schema did not whitelist, so the test fails on every branch built on top of staging. Add the field as a string property. |
||
|
|
7c667b8797
|
fix(helm): drop main- prefix from default image tag (#28710)
* fix(helm): drop main- prefix from default image tag
The default image tag in the deployment + migrations-job templates was
`main-{{ .Chart.AppVersion }}`. The current release pipeline publishes
content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`,
`v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag
that does not exist on GHCR or DockerHub and installs fail with
ImagePullBackOff.
- templates/deployment.yaml, templates/migrations-job.yaml: render
`.Chart.AppVersion` directly instead of `main-<AppVersion>`.
- Chart.yaml: bump stale `appVersion: v1.80.12` (not on either
registry) to `v1.85.1` so local-checkout installs also resolve.
- values.yaml: update the commented tag-override hint to match.
* fix(helm): use :latest in tag override example, not pinned version
Per review: ghcr.io/berriai/litellm-database:latest is a floating
alias for the most recent stable (same digest as :main-stable),
maintained by the release pipeline's UPDATE_LATEST advance step.
Better example than a pinned version that goes stale.
|
||
|
|
8513d7fc0c
|
chore: update Next.js build artifacts (2026-05-23 19:21 UTC, node v20.20.2) (#28707) | ||
|
|
886e91b85e
|
fix(otel): stamp http.response.status_code on all error responses (#28405)
* fix(otel): stamp http.response.status_code on all error responses
httpx.HTTPStatusError exposes status under .response.status_code, not as a
top-level attr, so unified-endpoint 5xx failures left the SERVER span without
a status. The admin hooks only wrote a child span and never stamped or ended
the parent at all, so admin 4xx/5xx (and success) responses were invisible
to dashboards. Adds a fallback to .response.status_code in get_error_information,
and ends the parent SERVER span in async_management_endpoint_{success,failure}_hook
with the same _record_exception_on_span helper the unified path uses.
Resolves LIT-3193
* test(otel): exercise httpx.HTTPStatusError through admin path
Pins the contract that get_error_information's response.status_code fallback
is reachable from any entry point — without this, a future refactor that
bypasses _record_exception_on_span in the admin hooks could regress for
httpx-wrapped exceptions while the unified suite still passes.
* chore(otel): trim verbose comments in LIT-3193 changes
Tighten docstrings and remove redundant section dividers/inline narration.
Behavior is unchanged.
* fix(otel): set span.status on management hook parent SERVER span
Mirror the unified failure path: stamp StatusCode.ERROR on the parent
SERVER span before recording the exception, and StatusCode.OK before
ending it on success. Without this, OTEL backends filtering on span
status (the idiomatic primitive) miss admin-endpoint failures even
though the http.response.status_code attribute is correct.
Extend assert_server_span_attrs to assert span.status.status_code
matches the expected outcome so the gap can't regress.
* fix(otel): close SERVER span on body-validation and unhandled errors
Stash the SERVER span on request.state in auth so FastAPI exception
handlers can finish it for failures that occur after auth but before
the route handler (e.g. /model/new TypeError, /key/generate
RequestValidationError). Without this, those requests left dangling
spans missing http.response.status_code.
Resolves LIT-3193
* fix(otel): generic 500 body, log exception details server-side
Don't leak str(exc) and type(exc).__name__ to clients on uncaught
exceptions. The full traceback is logged via verbose_proxy_logger and
the SERVER span still gets http.response.status_code=500.
Resolves LIT-3193
* fix(otel): stamp http.response.status_code on every SERVER span path
Closes three remaining gaps where the proxy SERVER span ended without
the http.response.status_code attribute:
1. ProxyException raised from _read_request_body (e.g. invalid JSON
body) bubbled out of user_api_key_auth before the SERVER span was
created, so the FastAPI handler had nothing to close and the trace
never reached the backend. Hoist the span creation to a new
idempotent _ensure_parent_otel_span_on_request_state helper called
at the top of user_api_key_auth; wire openai_exception_handler to
close the dangling span. Covers /v1/chat/completions, /v1/messages,
/v1/responses (shared handler).
2. /v1/responses success — _handle_success ends the proxy span before
async_post_call_success_hook fires on this path, so the hook's
set_response_status_code_attribute(200) silently no-op'd against an
ended span. Stamp 200 + set OK status at the close site in
_handle_success / _end_proxy_span_from_kwargs via a shared
_close_proxy_span_ok helper, so the attribute lands regardless of
which success hook runs first.
3. Failure path for exceptions without code/status_code (e.g. a bare
TypeError surfacing through _handle_llm_api_exception) — empty
error_information.error_code → _record_exception_on_span skips the
stamp → the hook ends the span. Default to 500 in
async_post_call_failure_hook so the attribute is always set.
Resolves LIT-3193
|
||
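Note on #28405: the `.response.status_code` fallback and the default of 500 for exceptions that carry no status can be sketched as below. This is not the project's `get_error_information`; the helper `status_code_for_span` and the stand-in classes are hypothetical, used so the example runs without httpx installed.

```python
class FakeResponse:
    """Stand-in for an HTTP response object (hypothetical)."""
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


class FakeHTTPStatusError(Exception):
    """Stand-in that, like httpx.HTTPStatusError, keeps status on .response."""
    def __init__(self, response: FakeResponse) -> None:
        super().__init__("upstream error")
        self.response = response


def status_code_for_span(exc: BaseException, default: int = 500) -> int:
    # Prefer a top-level status_code, then fall back to exc.response.status_code.
    code = getattr(exc, "status_code", None)
    if code is None:
        response = getattr(exc, "response", None)
        code = getattr(response, "status_code", None)
    try:
        return int(code)
    except (TypeError, ValueError):
        # No usable status anywhere (e.g. a bare TypeError): use the default.
        return default


print(status_code_for_span(FakeHTTPStatusError(FakeResponse(503))))
print(status_code_for_span(TypeError("boom")))
```

With a fallback of this shape the span attribute is always populated, which is the property the commit series is after.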
|
|
14c0a2b3e2
|
feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372) (#28378)
* feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372)
Adds five sparse counter metrics that break out the token detail fields
providers already report in `usage.prompt_tokens_details` and
`usage.completion_tokens_details`:
- litellm_input_cached_tokens_metric (provider prompt-cache reads)
- litellm_input_cache_creation_tokens_metric (Anthropic prompt-cache writes)
- litellm_input_audio_tokens_metric (audio input tokens)
- litellm_output_reasoning_tokens_metric (reasoning tokens)
- litellm_output_audio_tokens_metric (audio output tokens)
These are additive: existing input/output/total counters are unchanged, so
no dashboards break. Each new counter is only incremented when the
underlying detail is populated and > 0, keeping scrape output sparse for
providers that don't report a given field.
Data is read from the canonical Usage dict that
`get_standard_logging_object_payload` already attaches at
`standard_logging_payload["metadata"]["usage_object"]`, so no new plumbing
through the logging pipeline is required.
Tests: 10 new unit tests covering registration, label-set parity, all-types
increment, zero/None/negative skip behaviour, and the
no-metadata/no-usage_object no-op paths.
Closes LIT-3220
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
* chore: remove proof folder image
---------
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
|
||
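Note on #28378: the "only increment when populated and > 0" rule can be sketched with a plain `collections.Counter` standing in for the Prometheus counters. The metric names come from the commit message. The field paths inside the usage dict (including `cache_creation_tokens`) and the helper name `increment_detail_counters` are assumptions of this sketch, not confirmed details of the implementation.

```python
from collections import Counter

# metric name -> (usage section, field); the field paths are assumed.
DETAIL_FIELDS = {
    "litellm_input_cached_tokens_metric": ("prompt_tokens_details", "cached_tokens"),
    "litellm_input_cache_creation_tokens_metric": ("prompt_tokens_details", "cache_creation_tokens"),
    "litellm_input_audio_tokens_metric": ("prompt_tokens_details", "audio_tokens"),
    "litellm_output_reasoning_tokens_metric": ("completion_tokens_details", "reasoning_tokens"),
    "litellm_output_audio_tokens_metric": ("completion_tokens_details", "audio_tokens"),
}


def increment_detail_counters(counters: Counter, usage: dict) -> None:
    """Hypothetical sketch: bump a counter only for populated, positive details."""
    for metric, (section, field) in DETAIL_FIELDS.items():
        value = (usage.get(section) or {}).get(field)
        # Skip missing, None, zero, and negative values so output stays sparse.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            counters[metric] += value


counters = Counter()
usage = {
    "prompt_tokens_details": {"cached_tokens": 120, "audio_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 45, "audio_tokens": None},
}
increment_detail_counters(counters, usage)
print(dict(counters))
```

Only the two metrics with positive values are created here; the zero and `None` details never produce a series.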
|
|
5e16f20962
|
test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints (#28681)
* test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints Extends the Phase 1–3 behavior-pin suite at tests/proxy_behavior/management/ with a second axis: payload-shape pinning. Phase 1–3 held payload minimal and pinned (actor, target) → status across 37 routes; Phase 4 holds the caller fixed at an authorized actor, varies the payload shape, and asserts the observable DB effect (on accept) or the named guard / row-unchanged (on reject). Faithfulness contract from Phase 1–3 is unchanged. Six families + one gap-closer (59 new scenarios, 620 → 679 total): * F1 — key budget / rate-limit (test_key_budget_limits.py, 18) * F2 — key↔team reassignment (test_key_team_change.py, 6) * F3 — team budget / rate-limit (test_team_budget_limits.py, 15) * F4 — member-info validation (test_team_member_info_validation.py, 5) * F5 — permission batching (test_team_permissions_bulk_update.py, 6) * F6 — org-scoped team access (+2 detail-string pins in existing files) * F7 — coverage gap-closer (test_f7_coverage_closeout.py, 7) Harness extensions in conftest.py (additive only): * create_scratch_org() seeder with its own scratch-prefixed budget row * budget / limit fields on create_scratch_team() * scratch teardown also sweeps litellm_organizationtable Coverage telemetry (behavior-suite-only): * key_management_endpoints.py 60 % → 65 % (+82 lines) * team_endpoints.py 62 % → 72 % (+137 lines, crosses 70 % stretch) Key lands under 70 % per plan §7 escape hatch — the gap is dominated by routes outside F1–F6 scope (key list/info v2 internals) and structurally dead org-budget guards (call sites at lines 889 + 2310 + 985 + 1751 load the org without include_budget_table=True, so org.litellm_budget_table is None at guard time and the aggregate guard no-ops). Pinned as observed no-op behavior so a future fix that flips the flag turns these into reds. 
Zero source-code changes; pyproject.toml diff is empty; test_route_coverage.py stays green untouched; G3 grep guards still green; local wall-time 14 s for the full suite (no coverage), 22 s with coverage. G4 regression-replay protocol executed against three representative fix-PR parents ( |
||
|
|
203b529c9d
|
feat(azure): add speech transcription config support (#27482)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> |
||
|
|
2eab9ee2c0
|
perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths (#28289)
* perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths
- Introduce pure-text fast-path in `_build_complete_streaming_response` that collapses O(N) `content_block_delta` events into a single equivalent SSE event before conversion, eliminating per-output-token Pydantic `ModelResponseStream` construction; non-text streams (tool_use, thinking, citations) fall back to the unchanged legacy path
- Skip agentic streaming wrapper entirely when no callback overrides `async_should_run_agentic_loop`; the wrapper buffered every chunk and rebuilt the SSE response only to call hooks that all return `(False, {})` — a pure no-op for the default config
- Serialize request body once (`json.dumps`) for both the pre-call log input and the wire, instead of twice; avoids a full O(payload) scan per request, significant for long-context Claude Code histories
- Add fast path in `async_streaming_data_generator` that bypasses the per-chunk `async_post_call_streaming_hook` coroutine await, response-string materialization, and cost-injection call when no callback/guardrail/cost-injection is active (the default config)
- Resolve `_DD_STREAMING_TRACE_ENABLED` once at import time; eliminate per-chunk `NullSpan` context manager allocation when Datadog tracing is disabled (the default)
- Memoize `get_type_hints(AnthropicMessagesRequestOptionalParams)` with `@lru_cache(maxsize=1)` — resolves once per process instead of once per `/v1/messages` request (~80µs each)
- Hoist `cost_injection_active` out of the per-chunk loop in `chunk_processor`; eliminates repeated `getattr` + endpoint-type checks on every streamed byte chunk
- Extract `_build_passthrough_logging_result` from `_route_streaming_logging_to_handler` as a standalone static method to facilitate future off-loop dispatch
- Convert `async_sse_data_generator` from an `async for: yield` trampoline to a direct return of the underlying generator, removing one async-generator layer per streamed chunk
- Skip redundant `strip_empty_text_blocks_from_anthropic_messages` scan in `anthropic_messages_handler` when the async wrapper already sanitized (signalled via `_litellm_messages_presanitized` sentinel, popped before reaching provider params)
- Gate debug log `f-string` evaluation behind `isEnabledFor(DEBUG)` in both the streaming generator and the transformation layer to avoid serializing entire message payloads on every request at non-debug log levels
- Add benchmark script (`scripts/benchmark_anthropic_messages_perf.py`) with a local mock Anthropic SSE provider for reproducible TTFT and TPM measurement across commits/branches
- Add parity tests asserting fast-path and legacy-path produce byte-identical logged/billed payloads, plus unit tests for agentic hook detection, pre-serialized body reuse, and memoized key resolution
* perf: address greptile review for anthropic streaming hot path
- Bail to legacy in `_collapse_pure_text_chunks` when content_block_delta
events from different block indexes are observed without an intervening
flush. Anthropic sends blocks strictly sequentially, but defensive bail
prevents silent text-merging if the protocol ever interleaves.
- Replace leaf-class `__dict__` check for `async_post_call_streaming_hook`
in `_callback_capabilities` with a function-identity comparison that
walks the MRO. A vendor base class can carry the override and the
registered class can add nothing else; before this PR the hook was
unconditionally invoked, so an inherited-override miss would silently
drop the hook on the streaming path.
- Add unit tests for both behaviors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(mypy): narrow model_name to str in cost-injection branch
The hoisted cost_injection_active flag in chunk_processor encodes the
`bool(model_name)` requirement but mypy can't track that invariant
through the local, so the per-chunk `_process_chunk_with_cost_injection(
chunk, model_name)` calls flagged Optional[str] vs str. Pin a typed
non-None local inside the cost-injection branch so mypy narrows
correctly without changing runtime behavior.
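A minimal sketch of the narrowing pattern, assuming `typing.cast` is an acceptable way to pin the local (the commit does not say which mechanism it uses). The function bodies are simplified stand-ins.

```python
from typing import List, Optional, cast


def _process_chunk_with_cost_injection(chunk: str, model_name: str) -> str:
    # Stand-in for the real helper; it requires a non-optional model name.
    return f"{chunk} [cost for {model_name}]"


def chunk_processor(
    chunks: List[str], model_name: Optional[str], inject_cost: bool
) -> List[str]:
    # The hoisted flag already implies model_name is a non-empty str,
    # but a type checker cannot follow that through a separate boolean.
    cost_injection_active = inject_cost and bool(model_name)
    if not cost_injection_active:
        return list(chunks)
    # Pin a typed local inside the branch; cast() has no runtime effect.
    narrowed_name: str = cast(str, model_name)
    return [_process_chunk_with_cost_injection(c, narrowed_name) for c in chunks]
```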
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3b2ce201d8
|
encrypt callback_vars in key/team metadata at rest (#27141)
Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local> Co-authored-by: Yuneng Jiang <yuneng@berri.ai> |
||
|
|
492891cad8
|
CI: copy of #25177 (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) (#28223)
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)
Squash-merged by litellm-agent from Anai-Guo's PR.
* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)
Squash-merged by litellm-agent from yimao's PR.
* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)
Squash-merged by litellm-agent from krisxia0506's PR.
* Fix Gemini MIME detection for extensionless GCS URIs (#27278)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)
Squash-merged by litellm-agent from voidborne-d's PR.
* feat(chart): add support for autoscaling behavior in HPA (#27990)
Squash-merged by litellm-agent from FabrizioCafolla's PR.
* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix: pass socket timeouts to Redis cluster clients (#27920)
Squash-merged by litellm-agent from tomdee's PR.
* Fix/cache token (#28009)
Squash-merged by litellm-agent from escon1004's PR.
* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)
Squash-merged by litellm-agent from Divyansh8321's PR.
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)
* fix: reset org and tag budgets (#27326)
* reset org budgets
* reset tag budgets
---------
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)
* fix(ui): omit allowed_routes from key edit save when unchanged
When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).
Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.
* fix(ui): order-insensitive allowed_routes diff + cover null-original case
Address Greptile review:
- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
a server-side reorder of the array doesn't register as a user edit and
re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
is untouched — patch should strip the field; (2) server returned routes in
a different order than the user originally entered — patch should still
recognize the value as unchanged.
* chore(ui): strip ticket refs and tighten comments in key edit fix
- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case
* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc
* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests
GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute. When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.
Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
so downstream loggers record 'guardrail_intervened' instead of
'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion
Fixes #24348
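The `getattr` fallback described above behaves roughly like this sketch. `GuardrailBlocked` and `resolve_status` are hypothetical names used for illustration; they are not the real exception classes or the proxy handler.

```python
class GuardrailBlocked(Exception):
    """Hypothetical stand-in for a guardrail exception."""

    def __init__(self, message: str, *, status_code: int = 400):
        super().__init__(message)
        # Keyword-only, defaulting to 400 so an intentional block is
        # reported as a client error rather than a server error.
        self.status_code = status_code


def resolve_status(exc: Exception) -> int:
    # Same shape as the handler pattern quoted in the commit message:
    # exceptions without a status_code attribute fall back to 500.
    return getattr(exc, "status_code", 500)
```

An exception that lacks the attribute still maps to 500, which is why the attribute had to be added to the guardrail exceptions themselves.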
---------
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161
- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
when a specifically-addressed deployment is administratively blocked; 429 misleads
retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address bug detection findings (cache token order, mutable defaults)
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests
- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix code qa
* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType
GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
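A minimal sketch of the parameter stripping, assuming a simple split on the first `;` is sufficient. `strip_mime_parameters` is a hypothetical helper name, not the function changed in this commit.

```python
def strip_mime_parameters(content_type: str) -> str:
    """Return the bare MIME type, dropping any '; key=value' parameters."""
    return content_type.split(";", 1)[0].strip()
```

For example, `strip_mime_parameters("text/html; charset=utf-8")` yields `text/html`, and a value without parameters passes through unchanged.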
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(vertex_ai/gemini): clarify mime-type error message string concatenation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* feat(oci): add embeddings, fix streaming/reasoning, expand model catalog
- Add OCIEmbedConfig with full Cohere embed support (7 models, batch up to 96)
- Fix sync streaming: split SSE events on \n\n before JSON parsing
- Fix reasoning models (Gemini 2.5, xAI Grok): make completionTokens and message
optional in OCIResponseChoice to handle max_tokens exhausted on reasoning
- Fix compartment_id resolution in chat transform to use resolve_oci_credentials
- Fix tool call id: make OCIToolCall.id optional, generate UUID fallback for
providers (Google via OCI) that omit it
- Add OCI_KEY env var support for inline PEM keys
- Fix datetime.utcnow() deprecation in request signing
- Expand model catalog: 29 OCI models including Llama 4, Gemini 2.5, xAI Grok,
Cohere Command A, and all Cohere embed variants
- Add 37 live integration tests: sync/async completions for Meta/Google/xAI/Cohere,
sync/async embeddings, tool use across all vendors, streaming, env var auth
- Add 23 embed unit tests covering all transform and validation paths
* fix(oci): remove dead OCI elif branch in utils.py, align async split_chunks with sync version
* test(oci): add unit tests for split_chunks fix and no-duplicate-OCI-branch guard
* fix(oci): address remaining bugs from issue #25082 — streaming signed body, Cohere stop sequences, hardcoded defaults
- Bug 1: sync and async streaming paths now use signed_json_body when provided
instead of re-serializing data with json.dumps() — the OCI RSA-SHA256 signature
covers the exact request body bytes, so re-serializing produces an invalid sig
- Bug 3: Cohere stop sequences now map to 'stopSequences' (was incorrectly 'stop')
- Bug 4: removed hardcoded Cohere defaults (maxTokens=600, temperature=1, topK=0,
topP=0.75, frequencyPenalty=0) that silently overrode user intent on every call
- Added 6 unit tests covering all three fixes
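Bug 1 can be illustrated with a small sketch: a body digest computed over one serialization does not match a second serialization of the same data. The RSA signing step itself is omitted, and `body_to_send` is a hypothetical helper, not the function from this commit.

```python
import hashlib
import json
from typing import Optional

payload = {"b": 1, "a": [1, 2]}

# Bytes that were hashed and signed (compact separators, as an example).
signed_json_body = json.dumps(payload, separators=(",", ":")).encode()
# Re-serializing with default separators inserts spaces: different bytes.
reserialized = json.dumps(payload).encode()


def body_digest(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def body_to_send(data: dict, signed_body: Optional[bytes]) -> bytes:
    # Prefer the exact bytes the signature was computed over.
    if signed_body:
        return signed_body
    return json.dumps(data).encode()
```

Because the two digests differ, sending the re-serialized bytes with a signature computed over `signed_json_body` would fail verification.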
* fix(oci): comprehensive code quality pass — bugs, tests, schema accuracy
- Fix Cohere tool call IDs (was always call_0; now UUID per call)
- Fix TOOL_CALL finish reason mapping in both sync and streaming paths
- Fix Cohere stop parameter mapping (stop → stopSequences)
- Remove hardcoded Cohere defaults (maxTokens/topK/topP/frequencyPenalty)
- Fix content[0] safety guard against empty content arrays
- Fix streaming signed body used consistently (not re-serialized)
- Raise OCIError (not bare Exception/ValueError) throughout
- Centralize OCI_API_VERSION constant; import uuid at module level
- Fix embed get_complete_url to strip trailing slashes from api_base
- Fix OCIEmbedResponse schema: add inputTextTokenCounts (actual OCI field)
- Fix embed usage computed from inputTextTokenCounts (sum of per-input counts)
- Fix Cohere toolCallId included in tool result messages
- Add OCIToolCall.id as Optional (absent in Google/xAI streaming chunks)
- Update tests to reflect correct behavior (no hardcoded defaults, UUID ids,
deferred credential validation, OCIError vs ValueError, real response schema)
* test(oci): move integration tests to tests/llm_translation/
Addresses greptile P1: tests/test_litellm/ is for mock-only unit tests
(make test-unit target). Real-network OCI tests now live in the correct
location alongside other provider integration tests.
* fix(oci): align types and transformation with official OCI SDK
- Remove OCIVendors.GEMINI — apiFormat="GEMINI" is invalid; all non-Cohere
models use apiFormat="GENERIC"
- Add toolChoice, logitBias, logProbs to OCIChatRequestPayload so params
present in the mapping are no longer silently dropped by Pydantic
- Exclude n→numGenerations from Cohere param map (not a Cohere API field)
- Fix CohereToolResult: change callId/result to call/outputs matching
the OCI SDK's CohereToolResult structure
- Fix CohereToolMessage: replace non-existent toolCallId with toolResults
list; update adapt_messages_to_cohere_standard to build proper tool-result
history entries by resolving tool call name+params from preceding assistant
messages
- Map generic-model stream finish reasons to OpenAI convention
(COMPLETE→stop, MAX_TOKENS→length, TOOL_CALLS→tool_calls), consistent
with the existing Cohere streaming path
- Add optional id field to OCIEmbedResponse so valid API responses
carrying an id are not rejected by the Pydantic model
* fix(oci): use 'output' key in Cohere tool result outputs (matches reference impl)
* fix(oci): port schema/type utilities from langchain-oracle reference impl
- Add resolve_oci_schema_refs: inline $ref/$defs — OCI rejects JSON Schema refs
- Add resolve_oci_schema_anyof: flatten Optional[T] anyOf (Pydantic v2 emits these)
- Add sanitize_oci_schema: strip title, normalise null types, ensure array items
- Add OCI_JSON_TO_PYTHON_TYPES: Cohere expects Python type names (str/int/float),
not JSON Schema names (string/integer/number)
- Add enrich_cohere_param_description: embed enum/format/range/pattern constraints
into description since CohereParameterDefinition has no dedicated fields
- Apply all of the above in adapt_tool_definitions_to_cohere_standard and
adapt_tool_definition_to_oci_standard
- Fix toolChoice conversion: map OpenAI string ('auto','none','required') to OCI
dict form ({"type":"AUTO"} etc.) — the API rejects plain strings
- Update unit test expectations to match correct Python type names and enriched
descriptions
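As an illustration of the `anyOf` flattening idea only, the sketch below collapses the `Optional[T]` shape into the single non-null variant. It is not the ported `resolve_oci_schema_anyof`; the function name and the exact merge behavior are assumptions.

```python
from typing import Any


def flatten_optional_anyof(schema: Any) -> Any:
    """Collapse anyOf=[T, null] into T, recursing through the schema."""
    if isinstance(schema, list):
        return [flatten_optional_anyof(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out = {key: flatten_optional_anyof(value) for key, value in schema.items()}
    variants = out.get("anyOf")
    if isinstance(variants, list):
        non_null = [
            v for v in variants
            if not (isinstance(v, dict) and v.get("type") == "null")
        ]
        # Flatten only the Optional[T] shape: one real variant plus null.
        if (
            len(non_null) == 1
            and len(non_null) < len(variants)
            and isinstance(non_null[0], dict)
        ):
            del out["anyOf"]
            out = {**out, **non_null[0]}
    return out
```

Genuine unions such as `anyOf: [string, integer]` are left untouched in this sketch.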
* refactor(oci): split transformation.py into cohere.py and generic.py
transformation.py was 1 243 lines doing too many jobs. Split along the
same boundaries as the langchain-oracle reference (providers/cohere.py,
providers/generic.py):
chat/cohere.py — Cohere message/tool building, response + stream parsing
chat/generic.py — Generic message/tool building, response + stream parsing
transformation.py — thin OCIChatConfig orchestrator + OCIStreamWrapper
Public symbols (OCIChatConfig, OCIStreamWrapper, adapt_messages_to_*,
OCIRequestWrapper, version, …) remain importable from transformation.py
for backward compatibility. OCIStreamWrapper gains delegating shims for
_handle_cohere_stream_chunk and _handle_generic_stream_chunk so existing
test call sites keep working unchanged.
transformation.py: 1 243 → 620 lines
* refactor(oci): principal-level code quality pass
- Remove _extract_text_content duplication — single definition in cohere.py,
imported where needed; instance method on OCIChatConfig eliminated
- Move cryptography imports to module level with _CRYPTOGRAPHY_AVAILABLE flag
and _require_cryptography() guard; no more re-import on every signing call
- Move litellm version import to module level via litellm._version; remove
inline import inside validate_oci_environment
- sign_with_manual_credentials now returns Tuple[dict, bytes] matching
sign_with_oci_signer — asymmetry eliminated, Optional[bytes] guards removed
throughout stream wrappers (signed_json_body: bytes = b"")
- Rename _openai_to_oci_cohere_param_map → openai_to_oci_cohere_param_map
for consistency with openai_to_oci_generic_param_map
- Remove double-key bug in map_openai_params where responseFormat was stored
under both OCI and OpenAI key names simultaneously
- Remove delegating shims (adapt_messages_to_cohere_standard,
adapt_tool_definitions_to_cohere_standard, _handle_generic_stream_chunk)
from OCIChatConfig/OCIStreamWrapper; tests now import directly from
cohere.py and generic.py where symbols live
- Trim __all__ to 7 genuine public symbols; remove the 13-symbol list that
existed only to support test imports
- Collapse per-model integration test classes into pytest.mark.parametrize;
CHAT_MODELS list is the single source of truth for model-specific config
- Black + Ruff clean across all OCI files
* fix(oci): address PR review findings
- types/llms/oci.py: add "TOOL_CALL" to CohereChatResponse.finishReason
Literal so Pydantic does not raise ValidationError on non-streaming
Cohere tool-use calls (Greptile P1)
- test_oci_cohere_tool_calls.py: add test covering TOOL_CALL finish reason
- model_prices_and_context_window.json: remove 6 duplicate oci/cohere.embed-*
keys that were silently overridden by the more complete entries already
present in the file (Greptile P1)
- common_utils.py: move OCI_API_VERSION here from chat/transformation.py
so embed/transformation.py does not need to import chat/transformation;
change Protocol stub body from ... to pass (CodeQL "statement no effect");
add comment to sha256_base64 clarifying it implements OCI HTTP signing
spec, not password hashing (CodeQL false positive)
- chat/transformation.py: import CustomStreamWrapper from
litellm_core_utils.streaming_handler instead of litellm.utils to reduce
import cycle depth (CodeQL cyclic import)
- chat/cohere.py, chat/generic.py: import Usage and
ChatCompletionMessageToolCall from litellm.types.utils instead of
litellm.utils for the same reason
- embed/transformation.py: import OCI_API_VERSION from common_utils
instead of chat/transformation (removes the embed→chat import edge)
* test(oci): add unit tests to improve patch coverage
- test_oci_common_utils.py (new): covers sha256_base64, build_signature_string,
OCIRequestWrapper.path_url, resolve_oci_credentials, get_oci_base_url,
validate_oci_environment, sign_with_oci_signer error paths, sign_oci_request
routing, load_private_key_from_file error paths, resolve_oci_schema_refs
(including circular ref and external $ref), resolve_oci_schema_anyof,
sanitize_oci_schema (all branches), enrich_cohere_param_description
- test_oci_generic_chat.py (new): covers content-message error paths (non-dict
item, unsupported type, non-string text, invalid image_url), tool-call
validation error paths, adapt_messages_to_generic_oci_standard error paths,
handle_generic_response (None message, text content, tool calls),
handle_generic_stream_chunk (finish reasons, streaming tool calls),
OCIStreamWrapper non-string chunk error
- test_oci_chat_transformation.py: add error paths for validate_environment
(empty messages), transform_request (missing compartment_id, Cohere without
user messages), transform_response (error key), map_openai_params
(unsupported param with and without drop_params), tool_choice string mapping
- test_oci_cohere_tool_calls.py: add edge cases for stream chunk finish
reasons (TOOL_CALL, MAX_TOKENS, unknown), _extract_text_content with
non-dict list items and non-string input,
adapt_messages_to_cohere_standard with malformed JSON tool arguments
* fix(oci): rename supports_streaming to supports_native_streaming in model prices
The JSON schema for model_prices_and_context_window.json uses
`supports_native_streaming` (not `supports_streaming`) and has
`additionalProperties: false`. Rename the field across all OCI
entries to pass the schema validation test.
* test(oci): add 67 tests targeting uncovered happy paths for coverage
Boost patch coverage on the four lowest-coverage OCI files:
- common_utils.py: sign_with_manual_credentials (oci_key / oci_key_file
paths), sign_oci_request routing, _require_cryptography
- generic.py: adapt_messages_to_generic_oci_standard (all roles),
adapt_tool_definition_to_oci_standard, adapt_tools_to_openai_standard,
handle_generic_stream_chunk text/finish-reason paths
- cohere.py: _extract_text_content, adapt_messages_to_cohere_standard
(all roles including tool results), handle_cohere_response /
handle_cohere_stream_chunk all finish-reason branches
- transformation.py: get_vendor_from_model, OCIChatConfig._get_optional_params
(toolChoice string→dict, responseFormat, tools for both vendors),
transform_request for GENERIC model, get_sync/async_custom_stream_wrapper
with mocked HTTP, OCIStreamWrapper.chunk_creator happy paths
* fix(oci): suppress CodeQL false positive on sha256_base64 (OCI HTTP signing, not password hashing)
* fix(oci): remove 6 duplicate model price entries and reconcile conflicting values
Six OCI chat model keys appeared twice in model_prices_and_context_window.json
with conflicting pricing/context data (JSON parsers typically keep the last occurrence and silently discard the first).
Remove the first-occurrence entries and update the surviving entries:
- meta.llama-4-maverick / llama-4-scout: keep updated entries (free preview
pricing, larger context windows, vision support)
- meta.llama-3.1-70b: keep original pricing, restore supports_native_streaming
- google.gemini-2.5-{flash,pro,flash-lite}: keep OCI pricing page values,
restore supports_native_streaming
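Duplicate keys like these are easy to miss because Python's `json` module keeps the last occurrence without complaint. A small sketch of one way to detect them, using the standard `object_pairs_hook` parameter; `find_duplicate_keys` is a hypothetical helper, not part of this commit.

```python
import json
from typing import List


def find_duplicate_keys(text: str) -> List[str]:
    """Return keys that appear more than once within the same JSON object."""
    duplicates: List[str] = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    # object_pairs_hook receives every object's key/value pairs in order,
    # before they are collapsed into a dict.
    json.loads(text, object_pairs_hook=hook)
    return duplicates
```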
* fix(oci): route GPT-5 family to maxCompletionTokens
GPT-5 / GPT-5-mini / GPT-5-nano / GPT-5.5 on OCI reject "maxTokens"
with HTTP 400:
Invalid 'maxTokens': Unsupported parameter: 'maxTokens' is not
supported with this model. Use 'maxCompletionTokens' instead.
(Same convention as OpenAI's reasoning-API contract.)
Add a model-aware rename in OCIChatConfig._get_optional_params so the
request payload uses maxCompletionTokens when the model id starts with
openai.gpt-5. Regular Llama / Cohere / Gemini / GPT-4.x continue to use
maxTokens unchanged.
Also widen OCIChatRequestPayload to carry the new optional field so it
survives Pydantic serialization.
Verified live against OCI us-chicago-1:
- openai.gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.5 all return 200
- Full feature sweep on gpt-5.5 (basic, system, multi-turn, streaming,
tools, usage) all green
- meta.llama-3.3-70b-instruct still uses maxTokens (no regression)
4 new unit tests cover the helper, the routing in both pre- and
post-translation states, and Pydantic serialization.
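The model-aware rename can be sketched as below. This is a simplified illustration under the prefix rule stated above; `route_max_tokens` is a hypothetical helper and not the code in `OCIChatConfig._get_optional_params`.

```python
def route_max_tokens(model: str, params: dict) -> dict:
    """Rename maxTokens to maxCompletionTokens for the GPT-5 family."""
    out = dict(params)  # avoid mutating the caller's dict
    if model.startswith("openai.gpt-5") and "maxTokens" in out:
        out["maxCompletionTokens"] = out.pop("maxTokens")
    return out
```

Other model ids fall through unchanged, so existing `maxTokens` callers are unaffected.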
* ci(oci): fix CI failures — black formatting + recursive_detector ignore
- Run black on litellm/llms/oci/common_utils.py + 3 OCI test files
that drifted out of black-compliance during the rebase.
- Add the three bounded recursive functions in oci/common_utils.py
(`_resolve`, `resolve_oci_schema_anyof`, `sanitize_oci_schema`) to
the recursive_detector IGNORE_FUNCTIONS list. All three are bounded:
`_resolve` uses a `resolving_stack` cycle guard; the other two are
bounded by JSON-schema tree depth (no cycles in well-formed input),
matching the pattern of the existing OCI/Vertex schema walkers
already on the list.
* fix(oci): silence MyPy errors in cohere.py — typed-dict access
Two errors flagged by `lint` CI:
llms/oci/chat/cohere.py:73: "object" has no attribute "__iter__"
llms/oci/chat/cohere.py:119: No overload variant of "get" of "dict"
matches argument types "object", "CohereToolCall"
Both stem from `msg.get("tool_calls")` / `msg.get("tool_call_id")`
returning `object` per the AllMessageValues TypedDict union. Bind to
`Any` locally for the iteration and coerce the lookup key with `str()`,
removing the now-unused `# type: ignore` on those lines.
No behaviour change — pure type-narrowing for the type checker.
* fix(oci): silence CodeQL py/weak-sensitive-data-hashing on sha256_base64
CodeQL's taint analysis traces request bodies back to environment-loaded
secrets and flags `hashlib.sha256(body).digest()` as
`py/weak-sensitive-data-hashing` — even though SHA-256 is the algorithm
mandated by the OCI HTTP request signing spec for the
`x-content-sha256` header (not a password/secret hash).
The previous suppression used legacy `# lgtm[...]` syntax which the
modern CodeQL action ignores. Switch to Python's standard
`hashlib.sha256(..., usedforsecurity=False)` (Python 3.9+) which CodeQL
honours as a non-security declaration. Behaviour unchanged.
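A minimal sketch of a content-hash helper using the `usedforsecurity=False` keyword (available in Python 3.9+). The signature shown is an assumption for illustration; the real `sha256_base64` may differ.

```python
import base64
import hashlib


def sha256_base64(body: bytes) -> str:
    """Base64-encoded SHA-256 of the request body, for a content-integrity header."""
    # usedforsecurity=False declares this is not password or secret hashing;
    # the digest value itself is unchanged.
    digest = hashlib.sha256(body, usedforsecurity=False).digest()
    return base64.b64encode(digest).decode("ascii")
```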
* feat(oci): add reasoning_effort passthrough — only true missing primitive
OCI's GenericChatRequest exposes a reasoningEffort field
(NONE/MINIMAL/LOW/MEDIUM/HIGH) that's the single biggest cost knob for
reasoning-capable models on the service:
- GPT-5 family
- Gemini 2.5
- Grok reasoning variants (3-mini, 4-fast, 4.20)
- Cohere Command-A-Reasoning
Setting reasoning_effort=LOW typically cuts reasoning-token spend 5-10×
vs the default. Without exposing this, litellm users had no way to tune
cost-vs-quality on these models.
The other GenericChatRequest fields (verbosity, parallel_tool_calls,
logit_bias, n, metadata, web_search_options, prediction) are not
exposed because they are not missing primitives — they either duplicate
prompt-engineering, framework-level controls, or are too niche to
justify the maintenance surface. We only ship what users genuinely
can't accomplish another way.
Excluded from the Cohere v1 param map: CohereChatRequest has no
reasoningEffort field, and Cohere reasoning models
(cohere.command-a-reasoning) use COHEREV2 which is a separate request
type not covered by this PR.
Verified live: GPT-5.5 + reasoning_effort="HIGH" sends
{"reasoningEffort": "HIGH"} on the wire and OCI accepts the request.
* feat(oci): reasoning_effort + reasoning_tokens for OCI GenAI
Three small additions for OCI reasoning models, requested by users
testing the PR in production fork builds:
1. **reasoning_effort param mapping (GENERIC vendors).** OCI expects
uppercase levels ("LOW"/"MEDIUM"/"HIGH"/"NONE") on `reasoningEffort`,
but OpenAI-compatible clients send lowercase. Mapped + uppercased in
`_get_optional_params`. Marked unsupported on Cohere V1/V2 since OCI
Cohere has no reasoning models (avoids Pydantic validation failure
on CohereChatRequest).
2. **"disable" → "NONE" mapping.** OpenAI uses "disable" to turn off
reasoning; OCI uses "NONE". Without this, callers get a 400.
3. **reasoning_tokens propagated to Usage.** OCI returns
`completionTokensDetails.reasoningTokens` but it wasn't being passed
to LiteLLM's Usage object. Now flows through to
`Usage.completion_tokens_details.reasoning_tokens` so callers can
track reasoning token consumption for cost/observability.
Tests: 7 new unit tests in TestOCIReasoningEffort covering upper/lower
case, "disable"→"NONE", Cohere drop/raise paths, and reasoning_tokens
extraction (with and without completionTokensDetails). 5 new live
integration tests against xai.grok-3-mini in us-chicago-1 verifying the
full request/response loop end-to-end. Existing
test_transform_response_simple_text assertion that
completion_tokens_details was None has been updated to assert
reasoning_tokens flows through.
Verified live on xai.grok-3-mini: reasoning_effort=low → OCI accepts
"LOW", returns reasoningTokens=316 in usage. reasoning_effort=disable
→ OCI accepts "NONE". Full suite: 370/370 unit + 51/51 integration.
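Points 1 and 2 amount to a small normalization step, sketched here. `to_oci_reasoning_effort` is a hypothetical helper name, and the real mapping in `_get_optional_params` may handle more cases.

```python
def to_oci_reasoning_effort(value: str) -> str:
    """Map an OpenAI-style reasoning_effort value to the uppercase OCI form."""
    if value.lower() == "disable":
        # "disable" has no direct OCI counterpart; the commit maps it to NONE.
        return "NONE"
    return value.upper()
```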
* fix(codeql): re-scope py/weak-sensitive-data-hashing exclusion to OCI signing file
CodeQL's taint analysis re-fires the `py/weak-sensitive-data-hashing`
alert at `litellm/llms/oci/common_utils.py:103` whenever upstream code
paths into the OCI signing module change (touching `transformation.py`
opens new flow paths that CodeQL re-evaluates from scratch). The
`hashlib.sha256(..., usedforsecurity=False)` declaration silences the
direct-call form of the query but not the taint-flow form.
SHA-256 here is mandated by the OCI HTTP signing specification for the
x-content-sha256 content-integrity header — not for password storage:
https://docs.oracle.com/en-us/iaas/Content/API/Concepts/signingrequests.htm
CodeQL has no per-query path filter and GitHub Code Scanning ignores
inline lgtm/codeql comments, so path-ignoring this single ~560-line
signing utility file is the narrowest available suppression. All other
files retain full coverage of py/weak-sensitive-data-hashing — including
litellm/proxy/utils.py where the rule legitimately applies.
This restores the NEUTRAL CodeQL state the PR had on prior commits
(see `2111c98af7` for the same approach on the previous branch
evolution that the cherry-pick was rebased onto a different baseline).
* fix(oci): drop duplicate text on Cohere streaming terminal chunk
OCI Cohere's terminal SSE event re-sends the full assembled response in
`text` alongside a populated `chatHistory`. Emitting that text as another
delta concatenates the entire response onto the already-streamed output
(e.g. "How can I help?How can I help?").
Use `chatHistory is not None` as the discriminator for the consolidated
terminal event — `finishReason` is a weaker signal that could in principle
appear on a non-consolidated chunk. The two coincide today; this preserves
correctness if OCI ever ships finishReason on an incremental chunk.
Adds a live-OCI integration regression test that compares streamed vs
non-streamed length and asserts the response prefix appears only once.
Verified to fail under the previous code with the exact reported
reproduction: 'Hello! How can I help you today?Hello! How can I help you today?'.
Reported by @gotsysdba on PR #25177.
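The discriminator can be sketched as follows, with chunks modelled as plain dicts. This is an illustration of the rule only; `delta_text` is a hypothetical helper and the real handler works on typed chunk objects.

```python
def delta_text(chunk: dict) -> str:
    """Return the text to emit for one streamed Cohere chunk."""
    # A populated chatHistory marks the consolidated terminal event, whose
    # text repeats the whole response and must not be emitted again.
    if chunk.get("chatHistory") is not None:
        return ""
    return chunk.get("text") or ""


stream = [
    {"text": "How can "},
    {"text": "I help?"},
    {"text": "How can I help?", "chatHistory": [], "finishReason": "COMPLETE"},
]
assembled = "".join(delta_text(c) for c in stream)
```

With the terminal chunk suppressed, `assembled` contains the response once rather than twice.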
* fix(oci): buffer SSE stream across HTTP read boundaries
The old split_chunks helper split each individual HTTP read on "\n\n",
which assumed SSE event boundaries always aligned with read boundaries.
In practice the OCI streaming endpoint delivers events that may:
- straddle two reads (chunk_creator gets a truncated JSON and crashes)
- arrive separated by a single "\n" instead of "\n\n"
- share a read with multiple complete events
Replace the inline split with module-level helpers _iter_sse_events
(sync) / _aiter_sse_events (async) that maintain a buffer across reads,
split on any newline, and yield only complete "data:" lines.
Add 25 regression tests covering event-split-across-reads, tiny-chunk
reads, single-newline separators, keepalive/comment lines, trailing
partial events flushed at EOF, "\r\n" line endings, and an end-to-end
smoke test that feeds an awkwardly-chopped payload through the splitter
into OCIStreamWrapper.chunk_creator.
Reported by John Lathouwers.
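The buffering approach can be sketched like this. It is a simplified synchronous illustration, not the `_iter_sse_events` helper from this commit; `iter_sse_data` and its exact behavior are assumptions.

```python
from typing import Iterable, Iterator


def iter_sse_data(reads: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each complete 'data:' line across HTTP reads."""
    buffer = ""
    for read in reads:
        buffer += read
        lines = buffer.split("\n")
        # The last element is an incomplete line (or ""); keep it buffered.
        buffer = lines.pop()
        for line in lines:
            line = line.rstrip("\r")  # tolerate "\r\n" line endings
            if line.startswith("data:"):
                yield line[len("data:"):].strip()
            # Blank lines and ":" comment/keepalive lines are skipped.
    # Flush a trailing partial event at EOF.
    tail = buffer.rstrip("\r")
    if tail.startswith("data:"):
        yield tail[len("data:"):].strip()


reads = [
    'data: {"a"',
    ': 1}\n\ndata: {"b": 2}\n',
    ': keepalive\r\ndata: {"c"',
    ": 3}",
]
events = list(iter_sse_data(reads))
```

The sample input covers an event straddling two reads, a single-newline separator, a keepalive comment, and a partial event flushed at EOF.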
* test(oci): repoint TestOCIKeyNormalization to sign_with_manual_credentials
The signing helper moved from OCIChatConfig._sign_with_manual_credentials
to a module-level sign_with_manual_credentials in common_utils.py. Four
tests in TestOCIKeyNormalization still called the old method:
- 2 failed outright with AttributeError
- 2 passed by accident because they used pytest.raises(Exception),
which happily caught the AttributeError instead of exercising the
intended OCIError path
Repoint all four to the new module-level function so they exercise the
actual oci_key type-validation branch.
* fix(oci): validate oci_region before URL interpolation to prevent SSRF
Anchor oci_region to ^[a-z][a-z0-9-]{0,30}[a-z0-9]$ inside get_oci_base_url
so user-supplied regions that would redirect the signed request to an
attacker-controlled host (e.g. 'evil.com/#') fail with HTTP 400 before
the URL or signature is built. Empty string still falls back to the
us-ashburn-1 default, so existing callers are unaffected.
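A sketch of the validation using the pattern quoted above. The host template and the `ValueError` are assumptions made for illustration; the real `get_oci_base_url` may build the URL and report the error differently.

```python
import re

_REGION_RE = re.compile(r"[a-z][a-z0-9-]{0,30}[a-z0-9]")


def get_base_url(region: str) -> str:
    """Validate the region, then interpolate it into the service host."""
    region = region or "us-ashburn-1"  # empty string falls back to the default
    # fullmatch anchors both ends, so a trailing newline cannot slip
    # through the way it can with "$" and re.match.
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"invalid OCI region: {region!r}")
    # Assumed host template, for illustration only.
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"
```

A value such as `evil.com/#` is rejected before any URL or signature is built.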
* test(audio): skip when gpt-4o-audio-preview is unavailable upstream
OpenAI retired `gpt-4o-audio-preview` (404 model_not_found in CI as of
2026-05-19), and the existing try/except in these tests only re-raised
on 'openai-internal' errors. Other exceptions were silently swallowed,
so the next line ran with an unbound `response`/`completion` and
failed with an unrelated UnboundLocalError that masked the real cause.
Extend the skip condition to also cover model_not_found / 'does not exist'
so the suite reports the upstream outage cleanly, matching the pattern
used in
|
||
|
|
7270f723de
|
fix(mcp): forward upstream initialize instructions on cold gateway init (#28231)
Prefetch upstream InitializeResult.instructions before merging gateway initialize options when YAML/DB do not set instructions, so clients receive upstream server text on the first MCP initialize without list_tools. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f35e7eb2f6
|
feat(guardrails): add Microsoft Purview DLP guardrail (#24966)
* feat(guardrails): add Microsoft Purview DLP guardrail
* fix(guardrails/purview): raise_for_status on HTTP errors, cap scope cache, reuse executor
* fix(guardrails/purview): propagate litellm_call_id as correlation_id to Purview
* chore: fixes
* refactor(guardrails): delegate get_user_prompt to get_last_user_message
PurviewGuardrailBase duplicated AzureGuardrailBase (and OpenAIGuardrailBase)
user-prompt extraction. The same logic already lived in
common_utils.get_last_user_message; wire guardrail bases to that helper,
fix the helper docstring, and drop its redundant self-import of
convert_content_list_to_str.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): make protection scope cache true LRU on hits
OrderedDict.get() does not update insertion order; call move_to_end on
TTL-valid cache hits so popitem(last=False) evicts least-recently-used
users instead of FIFO by first insert.
Add a regression test with a small max cache size.
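A minimal sketch of the LRU behavior with `OrderedDict`, leaving out the TTL handling. `LRUCache` is a hypothetical class for illustration, not the Purview scope cache itself.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    def __init__(self, max_size: int) -> None:
        self._data: "OrderedDict[str, Any]" = OrderedDict()
        self._max_size = max_size

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        # Lookups do not reorder an OrderedDict, so refresh recency explicitly.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        # Assigning to an existing key keeps its old position; move it too.
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Without the `move_to_end` call in `get`, eviction order would follow first insertion (FIFO) rather than recency.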
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* Fix mypy
* fix(guardrails/purview): harden user-id resolution and broaden DLP text
Prefer API key and proxy-injected metadata over client metadata for Entra
identity. Scan full message transcript pre-call and all completion choices
post-call. Align logging-only hook with the same user-id rules.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(guardrails/purview): scan /v1/completions prompt and TextChoices
Normalize text-completion prompts (string or list of strings); skip token-id-only
prompts. Run post-call DLP on TextCompletionResponse choices. Extend logging_only
hook for text_completion. Add tests and completion_prompt_to_str helper.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview-dlp): return data after DLP pass; per-call executor; dedupe text extraction
async_pre_call_hook now returns the request dict after a successful check so
callers match skip-path behavior. logging_hook uses a fresh ThreadPoolExecutor
per invocation like Presidio to avoid single-worker starvation. Response text
extraction is centralized in _completion_response_text_parts.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): fix LRU cache refresh position and add Responses API scanning
Two fixes to the Microsoft Purview DLP guardrail:
1. LRU cache bug (base.py): When a stale scope cache entry was re-fetched,
the assignment updated the value but
Python's OrderedDict.__setitem__ preserves the original insertion order for
existing keys. This left the refreshed entry near the front of the dict,
making it the first candidate for LRU eviction via popitem(last=False).
Fix: call move_to_end(user_id) after every write to an existing key.
2. Responses API coverage gap (purview_dlp.py): Requests to /v1/responses use
an 'input' field instead of 'messages' or 'prompt', so the pre-call hook
returned without scanning the content. Similarly, post-call hook did not
handle ResponsesAPIResponse.output. Fix: add _responses_api_input_to_str()
helper and handle 'responses'/'aresponses' call types in async_pre_call_hook,
async_post_call_success_hook (via _completion_response_text_parts), and
async_logging_hook.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): message separator, non-blocking logging_hook, TextChoices type error
Three bugs fixed in the Microsoft Purview DLP guardrail:
1. get_prompt_text_for_dlp message separator (base.py)
- Previously called get_str_from_messages() which concatenated all message
texts with NO separator, so 'end of msg1' + 'start of msg2' became
'end of msg1start of msg2'.
- Now joins per-message text with '\n\n' via convert_content_list_to_str(),
preserving DLP pattern detection accuracy across message boundaries.
2. logging_hook blocking the event loop thread (purview_dlp.py)
- Previously called future.result() which blocked the calling thread
(often the event loop thread) for the entire round-trip of two sequential
Microsoft Graph API calls (_compute_protection_scopes + _process_content).
- Now fires and forgets: when called inside a running loop, schedules the
coroutine with loop.create_task(); otherwise spawns a daemon thread.
Returns (kwargs, result) immediately in both cases.
- Removes unused concurrent.futures.ThreadPoolExecutor import; adds threading.
3. Incompatible assignment type error (purview_dlp.py:180)
- mypy inferred 'choice' as TextChoices from the first loop body, then
flagged the assignment in the second loop as incompatible with Choices.
- Fixed by using distinct loop variable names: text_choice (TextChoices) and
chat_choice (Choices).
Tests: 7 new tests added covering the separator fix (TestGetPromptTextForDlp)
and the non-blocking logging_hook (TestLoggingHookNonBlocking).
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
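To illustrate the separator fix in item 1, here is a small sketch under simplifying assumptions: messages are plain dicts with string `content`, and the two helper names and the 8-digit pattern are hypothetical, chosen only to show how concatenation without a separator can create a match that spans a message boundary.

```python
import re


def join_without_separator(messages):
    # Old behaviour as described in the commit: texts are glued together.
    return "".join(m["content"] for m in messages)


def join_with_separator(messages):
    # New behaviour as described: per-message text joined with a blank line.
    return "\n\n".join(m["content"] for m in messages)


messages = [
    {"role": "user", "content": "ticket 1234"},
    {"role": "user", "content": "5678 units"},
]

eight_digits = re.compile(r"\b\d{8}\b")  # hypothetical DLP-style pattern

glued = join_without_separator(messages)
separated = join_with_separator(messages)
false_match = eight_digits.search(glued) is not None
clean_match = eight_digits.search(separated) is not None
```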
* fix(purview): suppress API errors in logging-only mode and scan tool-call arguments
Three issues fixed:
1. _check_content except block re-raised unconditionally even when
block_on_violation=False. The docstring promised 'log only - do not
raise' but network/API errors always propagated. Fixed by checking
block_on_violation before re-raising; when False, log a warning and
continue.
2. async_logging_hook used a single try/except wrapping both the prompt
and response audit calls. When the first _check_content (uploadText)
raised due to an API error the second call (downloadText) was silently
skipped. Fixed by giving each audit call its own try/except so both
always run independently.
3. convert_content_list_to_str() only reads message.content, so
tool_calls[].function.arguments and function_call.arguments were
invisible to the Purview pre-call and post-call scans. An authenticated
caller could embed sensitive text in tool-call arguments and bypass DLP.
Fixed by:
- Adding PurviewGuardrailBase._extract_tool_call_args_from_message()
which handles both dict and object-style messages, covering both
tool_calls[] arrays and the legacy function_call field.
- Updating get_prompt_text_for_dlp() to include those arguments
alongside message content (request/prompt path).
- Changing _completion_response_text_parts() from @staticmethod to an
instance method and adding tool-call argument extraction for
ModelResponse choices (response path).
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* chore(ui): restructure pre-built Next.js output to directory-based routing
Flat page files (e.g. guardrails.html) replaced by directory-based
index.html equivalents (e.g. guardrails/index.html) matching the
Next.js App Router output format.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): comprehensive security hardening — identity spoofing, streaming bypass, token-id gap
Four security issues addressed:
1. end_user_id kwargs fallback missing in _resolve_user_id_from_logging_kwargs
user_id already fell back to kwargs.get("user_api_key_user_id") when absent
from metadata, but end_user_id only checked md.get("user_api_key_end_user_id")
with no kwargs-level fallback. Added or kwargs.get("user_api_key_end_user_id").
2. Streaming responses bypassed post_call blocking
async_post_call_success_hook only runs on assembled non-streaming responses.
For streaming requests the proxy already delivered all content before the
hook ran, so raising HTTPException there had no effect. Added
async_post_call_streaming_iterator_hook which buffers the entire stream,
assembles it via stream_chunk_builder, runs the Purview DLP check, and only
then re-yields chunks via MockResponseIterator. If a violation is detected the
exception is raised before any bytes reach the client. The proxy automatically
skips async_post_call_success_hook for guardrails that define this method,
preventing duplicate scans.
3. Caller-controlled Purview user identity in blocking modes
When a LiteLLM API key has no bound user_id the guardrail fell back to
metadata[user_id_field], which is supplied by the caller. A caller could set
this to any Entra object ID whose Purview policies are more permissive and
bypass DLP. Added _resolve_trusted_user_id() that only returns identities
from the proxy auth system (user_api_key_dict.user_id, end_user_id, or
proxy-injected metadata["user_api_key_user_id"]). Added
_resolve_user_id_for_blocking() used by all blocking-mode hooks: tries
trusted sources first; if only caller-supplied is available, logs a
SECURITY WARNING and still proceeds (backward compat); if nothing resolves,
skips with a warning.
4. Token-id prompt DLP bypass
When /v1/completions received a pure token-id array prompt,
completion_prompt_to_str() returned None and the pre_call hook silently
skipped the Purview scan. An authenticated caller could tokenize blocked
text and send it without DLP evaluation. The hook now detects this case
(raw_prompt present but prompt_text None) and logs a WARNING while letting
the request pass through — token-id payloads are opaque at the text layer
and cannot be scanned. This makes the gap explicit rather than silent.
Tests: 94 total, all passing.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* Revert "chore(ui): restructure pre-built Next.js output to directory-based routing"
This reverts commit c70c4303b735bb3885732bd4a0e01997e9571f56.
* fix(purview): fail closed on identity spoofing, token prompts, and path encoding
Encode Entra user IDs in Graph paths, guard caches with asyncio.Lock, scan
Responses API instructions with string input, reject caller-only metadata and
token-id completion prompts in blocking mode, and revert unrelated UI HTML
restructure from the PR branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview): use threading.Lock and getattr for LitellmParams
- Replace asyncio.Lock with threading.Lock in PurviewGuardrailBase.
The cache lock is acquired both from the proxy's main event loop and
from short-lived event loops created by the logging_hook thread
fallback. In Python 3.10+ an asyncio.Lock is bound to the first event
loop that acquires it, so the second loop would silently break audit
logging with RuntimeError. All critical sections are in-memory dict
ops with no awaits, so a synchronous lock is safe.
- Use getattr() on LitellmParams in initialize_guardrail() instead of
.get(), which does not exist on Pydantic BaseModel instances and
would raise AttributeError at runtime. Tests updated to construct
Mock objects with spec= so they reflect the real interface.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
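The lock choice can be sketched as below. This is an assumption-laden illustration rather than the guardrail's code: `LockedCacheSketch` and `worker` are hypothetical names. It shows a `threading.Lock` guarding in-memory dict operations that are reached from two different event loops, one in the main thread and one short-lived loop in a worker thread.

```python
import asyncio
import threading


class LockedCacheSketch:
    """Hypothetical cache whose critical sections contain no awaits."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    async def set(self, key, value):
        # Pure in-memory work inside the lock, so holding a synchronous
        # lock here does not suspend while the lock is held.
        with self._lock:
            self._data[key] = value

    async def get(self, key):
        with self._lock:
            return self._data.get(key)


cache = LockedCacheSketch()


def worker():
    # A separate thread with its own short-lived event loop.
    asyncio.run(cache.set("from_thread", 1))


async def main():
    await cache.set("from_main", 0)
    await asyncio.to_thread(worker)  # run the second loop off the main thread
    return await cache.get("from_thread"), await cache.get("from_main")


result = asyncio.run(main())
```

Because the lock is not tied to any event loop, both loops can acquire it in turn.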
* refactor(purview): dedupe trust-level user resolution and drop dead code
- _resolve_user_id now delegates levels 1-3 to _resolve_trusted_user_id
so blocking and non-blocking paths share a single source of truth.
- Drop redundant event_hook override in MicrosoftPurviewDLPGuardrail.__init__
(initialize_guardrail already forwards event_hook=litellm_params.mode).
- Drop unused self._logging_only attribute; blocking is controlled by the
block_on_violation argument passed to _check_content.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): fail-closed on responses API transform error; avoid duplicate audit calls
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): fail-closed blocking DLP; revert directory-based UI HTML
Blocking hooks now require UserAPIKeyAuth user_id/end_user_id only (no
spoofable metadata), re-raise Responses API transform errors, scan streamed
text completions, and reject requests with no bound identity. Reverts the
accidental directory-based Next.js output from cc47081 (c70c4303b7).
Co-authored-by: Cursor <cursoragent@cursor.com>
* Remove dead code in purview_dlp: _resolve_user_id_for_blocking never returns falsy
The method either returns a non-empty trusted user id or raises HTTPException,
so the 'if not user_id' guards in async_pre_call_hook and async_post_call_success_hook
were unreachable. Tighten the return type to str and drop the dead checks to
make the fail-closed behavior explicit.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): exclude caller-controlled end_user_id from blocking DLP
Blocking Purview checks now use only API-key/JWT-bound user_id, not
end_user_id populated from request user/metadata/safety_identifier.
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(purview): apply Black formatting to base.py
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview): use post-await timestamp for cache TTL
Capture the timestamp after the network call completes when storing it
as the cache freshness marker, so the effective TTL reflects when the
response was actually received rather than when the request started.
Under high network latency the previous behavior shortened the
effective cache lifetime.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
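A minimal sketch of the timestamp change, assuming a cache that stores `(timestamp, value)` pairs. `fetch_protection_scopes` is a hypothetical stand-in for the network call and simply sleeps to simulate latency.

```python
import asyncio
import time


async def fetch_protection_scopes(user_id):
    # Hypothetical stand-in for the real network round-trip.
    await asyncio.sleep(0.01)
    return {"user_id": user_id, "scopes": []}


async def refresh_scope_cache(cache, user_id):
    started_at = time.monotonic()
    value = await fetch_protection_scopes(user_id)
    # Capture the freshness marker after the await, so the TTL counts from
    # when the response arrived rather than from when the request began.
    received_at = time.monotonic()
    cache[user_id] = (received_at, value)
    return started_at, received_at


cache = {}
started_at, received_at = asyncio.run(refresh_scope_cache(cache, "user-1"))
```

Storing `started_at` instead would shorten the effective lifetime of the entry by the duration of the call.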
* fix(purview_dlp): fail closed when stream_chunk_builder returns None
stream_chunk_builder can return None (e.g., when ChunkProcessor filters
all chunks), causing both isinstance checks to fail and the buffered
chunks to be released without DLP scanning. Explicitly fail closed in
that case by raising an HTTPException so the streaming DLP guardrail
does not bypass policy enforcement.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview_dlp): resolve user_id before buffering stream
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* merge main (#28629)
* test(vcr): classify cache verdicts, detect live calls, surface cost leaks
Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS /
PARTIAL' tag into a classified outcome that distinguishes the cases that
silently bill the live API on every CI run from the ones that don't:
HIT pure replay
PARTIAL mixed replay + new recordings
MISS:RECORDED new cassette saved to Redis (cached next run)
MISS:OVERFLOW cassette > MAX_EPISODES_PER_CASSETTE; persister
refused to save; re-bills every run
MISS:NOT_PERSISTED test failed; save_cassette skipped; re-bills
NOOP VCR-marked but no HTTP traffic (mocked elsewhere)
UNMARKED:LIVE_CALL test bypassed VCR AND opened a TCP connection
to a known LLM provider host -> wasted spend
UNMARKED:NO_TRAFFIC test bypassed VCR but didn't call out
The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits
live' into 'this test connected to api.openai.com'. We install a
socket.connect / socket.create_connection wrapper for the duration of
each non-VCR-marked test and record any outbound TCP to a known LLM
provider hostname. The probe sits below the httpx layer so vcrpy and
respx (which both patch above the socket) are unaffected.
Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the
llm_translation and local_testing conftests with per-item respx
detection in apply_vcr_auto_marker_to_items. A test now skips VCR when
it actually carries @pytest.mark.respx or has respx_mock in its fixture
chain - not just because some other test in the same file imports
MockRouter. Items skipped by skip_files are split into respx_conflict
(real conflict, the module wires up respx) vs file_opt_out (dead skip-
list entry whose module never touches respx) so the session summary
makes pruning obvious.
Stabilize the AWS SigV4 fingerprint: the Authorization header on
Bedrock requests rotates its Credential date and Signature on every
call, which previously pushed every Bedrock test past the 50-episode
overflow threshold. Extract the access-key id only
('aws-sigv4:AKIA...') so two requests with the same identity match.
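The fingerprint idea above might look roughly like this. The function name and regex are assumptions for illustration, not the harness's actual matcher; the header layout follows the standard SigV4 `Authorization` format, and the access key is the placeholder used in AWS documentation.

```python
import re

# Credential=<access-key-id>/<date>/<region>/<service>/aws4_request
_CREDENTIAL_RE = re.compile(r"Credential=([^/,\s]+)/")


def sigv4_fingerprint_sketch(authorization):
    """Reduce a SigV4 Authorization header to a stable identity string."""
    if not authorization.startswith("AWS4-HMAC-SHA256"):
        return authorization  # not SigV4: leave untouched
    match = _CREDENTIAL_RE.search(authorization)
    if match is None:
        return "aws-sigv4:unknown"
    return f"aws-sigv4:{match.group(1)}"


header_day_one = (
    "AWS4-HMAC-SHA256 "
    "Credential=AKIAIOSFODNN7EXAMPLE/20240101/us-east-1/bedrock/aws4_request, "
    "SignedHeaders=host;x-amz-date, Signature=aaaa1111"
)
header_day_two = (
    "AWS4-HMAC-SHA256 "
    "Credential=AKIAIOSFODNN7EXAMPLE/20240102/us-east-1/bedrock/aws4_request, "
    "SignedHeaders=host;x-amz-date, Signature=bbbb2222"
)

fp_one = sigv4_fingerprint_sketch(header_day_one)
fp_two = sigv4_fingerprint_sketch(header_day_two)
```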
Always emit verdict logging when VCR is active (set
LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a
session-end classification summary that lists overflow tests, unmarked
live-call tests, and the skip-reason breakdown.
Wire the live-call probe + summary hook into every test directory that
already uses the Redis-backed VCR cache (audio_tests, guardrails_tests,
image_gen_tests, litellm_utils_tests, llm_responses_api_testing,
llm_translation, local_testing, logging_callback_tests, ocr_tests,
pass_through_unit_tests, router_unit_tests, search_tests,
unified_google_tests).
Add tests/llm_translation/test_vcr_classification.py covering the
verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability,
live-host classification, and session summary rendering.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* test(vcr): drop dead 'from respx import MockRouter' imports
These seven test files were on _RESPX_CONFLICTING_FILES, which made the
auto-marker skip them entirely. Inspecting the source shows the only
respx artifact is a top-level 'from respx import MockRouter' that no
test ever uses - no @pytest.mark.respx, no respx_mock fixture, no
respx.mock context manager. The import is dead code left over from a
previous mocking pattern.
Now that apply_vcr_auto_marker_to_items detects respx per-item via the
marker / fixture chain (
|
||
|
|
574ee7526d
|
test(streaming): tolerate Vertex 429 wrapped in MidStreamFallbackError (#28669)
Streaming 429s are wrapped in MidStreamFallbackError so the Router can fall back; the existing 'except litellm.RateLimitError: pass' in test_vertex_ai_stream no longer matches, causing the generic pytest.fail branch to fire when upstream Vertex returns 429. Add a sibling except for MidStreamFallbackError that only swallows it when e.original_exception is a RateLimitError, so unrelated streaming failures still fail the test. |
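The exception handling described above can be sketched with stand-in classes. The two exception classes below are local placeholders with assumed constructors, not the real `litellm` types; only the `original_exception` attribute and the "swallow only if it wraps a rate limit" rule come from the commit message.

```python
class RateLimitError(Exception):
    """Stand-in for litellm.RateLimitError (assumed shape)."""


class MidStreamFallbackError(Exception):
    """Stand-in for the wrapper error; constructor is hypothetical."""

    def __init__(self, message, original_exception=None):
        super().__init__(message)
        self.original_exception = original_exception


def run_stream_check(stream_fn):
    try:
        stream_fn()
    except RateLimitError:
        return "tolerated"
    except MidStreamFallbackError as e:
        # Swallow only when the wrapped cause is a rate limit.
        if isinstance(e.original_exception, RateLimitError):
            return "tolerated"
        raise  # unrelated streaming failures still fail the test
    return "ok"


def raises_wrapped_429():
    raise MidStreamFallbackError("stream failed", RateLimitError("429"))


def raises_wrapped_other():
    raise MidStreamFallbackError("stream failed", ValueError("boom"))
```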
||
|
|
1b141bc588
|
fix(bedrock): decouple STS region from Bedrock aws_region_name (#28245)
* fix(bedrock): decouple STS region from Bedrock aws_region_name
STS AssumeRole now resolves signing region from aws_sts_endpoint (parsed host)
or AWS_REGION/AWS_DEFAULT_REGION instead of aws_region_name, fixing air-gapped
cross-region Bedrock setups and endpoint/signature mismatches.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(bedrock): add regression coverage for _build_sts_client_kwargs
Parametrize _resolve_sts_region and _build_sts_client_kwargs matrix cases, and
assert IRSA/web-identity paths use aligned STS endpoint and region_name.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(bedrock): tighten STS region helpers and drop redundant web-identity endpoint synthesis
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(bedrock): cover FIPS, GovCloud, and China STS endpoints
Addresses greptile P2: regex sts(?:-fips)? supported sts-fips hosts but was
not exercised by the parametrized parse test.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
a3c953ed4e
|
style: apply black formatting to fix lint CI (LIT-3274) (#28639) (#28641)
* fix(bedrock): strip bedrock/ prefix and URL-encode ARNs in get_bedrock_model_id for invoke path
The invoke path (used by /v1/messages → Anthropic SDK / Claude Code) called
get_bedrock_model_id() which, when falling back to the raw model string, did
not strip the 'bedrock/' routing prefix and did not URL-encode ARNs.
For a model like:
bedrock/arn:aws:bedrock:us-east-1:<ACCOUNT>:inference-profile/global.anthropic...
the URL built was:
/model/bedrock/arn:aws:bedrock:…/invoke-with-response-stream ❌
Bedrock returned a JSON error body. LiteLLM's AWSEventStreamDecoder passed
those bytes into botocore's EventStreamBuffer which expects binary event-stream
framing. Checksum validation failed on the JSON prelude (0x223a7b22 == ':{"')
producing a misleading botocore.eventstream.ChecksumMismatch instead of the
actual Bedrock error.
Fix: strip 'bedrock/' (and 'invoke/') routing prefix from model string, then
URL-encode if the result is an ARN — matching what the converse path already
does in converse_handler.py.
Fixes: LIT-3274
* fix(bedrock): use strip_bedrock_routing_prefix to handle compound prefixes
Address greptile review: the original fix used a loop with break, so
bedrock/invoke/arn:... only stripped bedrock/ leaving invoke/arn:...
which is not an ARN → fell through to .replace('invoke/','',1) →
bare unencoded ARN → same malformed-URL bug.
strip_bedrock_routing_prefix() iterates without break, correctly
stripping bedrock/ then invoke/ in sequence. Also adds test case
for the compound-prefix scenario.
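A rough sketch of the compound-prefix handling follows. The helper names and the prefix tuple are assumptions for illustration (the real helper is `strip_bedrock_routing_prefix`, whose implementation is not shown here), and the ARN uses a placeholder account id.

```python
from urllib.parse import quote

# Assumed routing prefixes, checked in order without breaking early.
ROUTING_PREFIXES = ("bedrock/", "invoke/")


def strip_routing_prefixes_sketch(model):
    for prefix in ROUTING_PREFIXES:  # no break: each prefix gets its turn
        if model.startswith(prefix):
            model = model[len(prefix):]
    return model


def model_id_for_url_sketch(model):
    model = strip_routing_prefixes_sketch(model)
    if model.startswith("arn:"):
        # Encode ':' and '/' so the ARN is a single URL path segment.
        return quote(model, safe="")
    return model


compound = "bedrock/invoke/arn:aws:bedrock:us-east-1:123456789012:inference-profile/example"
encoded = model_id_for_url_sketch(compound)
plain = model_id_for_url_sketch("bedrock/anthropic.claude-example")
```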
* style: apply black formatting to fix lint CI (LIT-3274)
---------
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: LiteLLM Bot <bot@berri.ai>
|
||
|
|
9600fda2cc
|
fix(sagemaker): send native Cohere embed payload to Cohere SageMaker endpoints (#28613)
* fix(sagemaker): use Cohere embed payload for Marketplace endpoints
SageMaker embedding only special-cased Voyage; every other endpoint received
HuggingFace TGI `{"inputs": [...]}`. AWS Marketplace Cohere containers expect
the native Cohere embed payload (`texts`, `input_type`) and reject the HF
shape with `422 EmbedReqV2.inputs is of type string but should be of type
Object`.
Add `SagemakerCohereEmbeddingConfig` that reuses Bedrock/Cohere request and
response transforms, and route SageMaker endpoint names containing `cohere`
or a Cohere embed model fragment (`embed-multilingual`, `embed-english`,
`embed-v3`, `embed-v4`) to it. Supports `input_type`, `dimensions`, and
`encoding_format`. Voyage and HuggingFace SageMaker endpoints are unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(sagemaker): simplify cohere detection and align with file conventions
- Detect Cohere SageMaker endpoints with a single `"cohere" in model.lower()`
check, mirroring the existing Voyage branch instead of a separate helper
function and marker constant.
- Drop instance caches of sub-configs; instantiate `BedrockCohereEmbeddingConfig`
/ `CohereEmbeddingConfig` per call to match the existing pattern in
`BedrockCohereEmbeddingConfig._transform_request`.
- Match `SagemakerEmbeddingConfig`'s signatures, defaults, and `Any` typing for
`logging_obj`; collapse the input-normalization helper inline.
- Inline `transform_embedding_response` input lookup; no behavior change.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(sagemaker): restore provider-supported embedding params after map
Cohere input_type is advertised in get_supported_openai_params but was
filtered out of non_default_params by OPENAI_EMBEDDING_PARAMS before
map_openai_params ran. Merge supported params from passed_params after
map (same path Greptile flagged). Handle input_type explicitly in
SagemakerCohereEmbeddingConfig.map_openai_params and add an integration
test through get_optional_params_embeddings.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(embeddings): only restore non-OpenAI supported params after map
The post-map restore loop must skip OPENAI_EMBEDDING_PARAMS so mapped
fields (e.g. dimensions -> output_dimension) are not duplicated under
their OpenAI names. Align SageMaker embedding import order with sibling
files and add a regression test for dimensions mapping.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(sagemaker): avoid double post_call on Cohere embedding response
Greptile review on #28613 caught that `CohereEmbeddingConfig._transform_response`
calls `logging_obj.post_call` internally. The SageMaker embedding handler
already calls `post_call` once before invoking the transform, so the Cohere
SageMaker path fired callbacks, cost calculators, and log handlers twice
per request.
Extract the parsing body of `_transform_response` into
`_populate_embedding_response` (pure extract-method, no behavior change
for existing Cohere direct or Bedrock Cohere paths, which keep calling
`_transform_response`). Have `SagemakerCohereEmbeddingConfig` call the
new helper directly so it parses the response without re-logging.
Add a regression test asserting `logging_obj.post_call` is not invoked
by the SageMaker Cohere transform.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
643989989f
|
chore(test): remove dead old Playwright e2e suite (#28632)
The Playwright suite under tests/proxy_admin_ui_tests/e2e_ui_tests/ is no longer wired into CI (only test_*.py is globbed) and every active spec is duplicated by ui/litellm-dashboard/e2e_tests/tests/ (login, auth redirect, search users, internal user list). team_admin.spec.ts was entirely commented out. Removing the directory plus its only-used-here playwright config, package.json/lock, and utils/login.ts keeps the canonical suite under ui/litellm-dashboard/e2e_tests/ as the single source of truth. |
||
|
|
f62ae93e13
|
test(proxy): behavior-pinning matrix for tier-2/3 key + team management endpoints (#28620)
* test(proxy): add create_scratch_actor harness helper
Adds create_scratch_actor() to the management behavior-suite conftest and
extends create_scratch_team() with team_member_permissions / models kwargs,
needed by the PR3 team-key-permission and team-model matrices. The new
helper mints a scratch-prefixed user + verification token (+ org
memberships), all reclaimed by the existing scratch-prefix teardown.
* test(proxy): pin /key block, unblock, health, aliases behavior
Adds behavior-pinning matrices for POST /key/block, POST /key/unblock,
POST /key/health, and GET /key/aliases. Pins that the management-route gate
401s ORG_ADMIN-role callers before _check_key_admin_access runs, the
block/unblock round-trip on the blocked column, missing-key 404, and the
_apply_non_admin_alias_scope visibility rules for /key/aliases.
* test(proxy): pin /key/bulk_update + /team/key/bulk_update behavior
Adds behavior-pinning matrices for POST /key/bulk_update (PROXY_ADMIN-only;
ORG_ADMIN stopped 401 at the route gate, INTERNAL_USER-role 403 at the
handler) and POST /team/key/bulk_update (team-member-permission gate keyed
on KEY_UPDATE). Pins batch semantics: empty/over-cap 400, per-key failure
isolation into failed_updates, all_keys_in_team broadcast, and no-keys 404.
Adds an optional key_alias arg to create_scratch_key for multi-key scenarios.
* test(proxy): pin /key SA-generate, v2-info, reset-spend behavior
Adds behavior-pinning matrices for POST /key/service-account/generate
(team-membership + team-member-permission gating; SA keys carry no user_id),
POST /v2/key/info (per-key _can_user_query_key_info silently drops invisible
keys), and POST /key/{key}/reset_spend (PROXY_ADMIN or team admin only;
missing key 404, reset-value 400). Pins that ORG_ADMIN-role callers are
stopped 401 at the management-route gate on the two non-info routes.
* test(proxy): close PR1/PR2 key-side deferred coverage gaps
Closes the four key-side gaps deferred from PR1/PR2:
- 404 on missing key for /key/update and /key/delete (not 401/403)
- denied /key/update leaves max_budget/tpm_limit/rpm_limit untouched
- /key/regenerate enforces litellm.upperbound_key_generate_params (#26340)
- /key/list key_alias substring vs exact (admin-only) + team_id filter,
and a non-admin filtering a foreign team is 403
* test(proxy): pin /team block, unblock, available, filter/ui, members/me
Adds behavior-pinning matrices for POST /team/block + /team/unblock
(management-route gate fronts _verify_team_access; reachable only by
PROXY_ADMIN and an org admin of the team's own org), GET /team/available
(default empty path), GET /team/filter/ui (route-gated PROXY_ADMIN-only
despite the handler having no gate), and GET /team/{team_id}/members/me
(caller resolves its own membership; non-member 404, no-user_id key 400).
* test(proxy): pin /team model add/delete + permissions endpoints
Adds behavior-pinning matrices for POST /team/model/add + /team/model/delete
(route-gated PROXY_ADMIN-only; missing team 404), GET /team/permissions_list +
POST /team/permissions_update (self-managed; proxy/team/org admin pass), and
POST /team/permissions_bulk_update (PROXY_ADMIN-only). Pins the deliberate
divergence that the available-team self-join grants read access via
permissions_list but never write access via permissions_update.
* test(proxy): pin /team delete, bulk_member_add, v2/list, daily/activity
Adds behavior-pinning matrices for POST /team/delete (per-team
_verify_team_access; batch aborts whole on a missing id), POST
/team/bulk_member_add (route-gated PROXY_ADMIN-only; empty/over-cap 400),
GET /v2/team/list (_enforce_list_team_v2_access — bare query 401s regular
users, org-scoped for org admins) and GET /team/daily/activity (non-member
team_ids filter 404, the VERIA-43 fix).
* test(proxy): add route-coverage gate + close team org-relocation gap
Adds test_route_coverage.py (PR3.M1): parses every @router route literal
from the two management-endpoint source files and asserts each is exercised
by >=1 behavior-suite scenario — a permanent regression guard for future
routes. Closes the last PR1/PR2 deferred gap: the /team/update org-relocation
allowed branch, exercised by a dual-org-admin minted via create_scratch_actor.
test_team_model uses literal route URLs so the coverage parser resolves them.
* test(proxy): bound plain route params to one path segment in coverage gate
Plain path params ({team_id}) now compile to [^/?]+ instead of [^?]+, so a
parameter cannot span '/'. Starlette ':path' params still match across '/'.
Keeps the route-coverage guard from falsely reporting a future multi-segment
route as covered. All 37 routes remain covered.
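The path-parameter rule can be illustrated like this. `route_to_regex_sketch` is a hypothetical helper, not the coverage gate's real parser; only the two character classes (`[^/?]+` for plain params, `[^?]+` for `:path` params) are taken from the commit message.

```python
import re


def route_to_regex_sketch(route):
    """Compile a route template such as '/team/{team_id}' into a regex."""
    pieces = []
    # The capturing group keeps the '{...}' placeholders in the split output.
    for part in re.split(r"(\{[^}]+\})", route):
        if part.startswith("{") and part.endswith("}"):
            name = part[1:-1]
            if name.endswith(":path"):
                pieces.append(r"[^?]+")   # Starlette ':path' may span '/'
            else:
                pieces.append(r"[^/?]+")  # plain param: one path segment only
        else:
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


members_me = route_to_regex_sketch("/team/{team_id}/members/me")
files = route_to_regex_sketch("/files/{file_path:path}")

single_segment = members_me.match("/team/abc/members/me") is not None
multi_segment = members_me.match("/team/abc/extra/members/me") is not None
path_param = files.match("/files/a/b/c") is not None
```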
|
||
|
|
985574b6be
|
fix(check_licenses): read PEP 639 license-expression metadata (#28529)
The dependency license checker only read the legacy free-text `info.license` field from PyPI. Packages that adopt PEP 639 publish their license as an SPDX expression in `info.license_expression` and leave the legacy field null, so the checker reported "Unknown license" and failed CI for every newly-bumped PEP 639 dependency. `get_package_license_from_pypi` now resolves the license in order: `license_expression`, then legacy `license`, then the `License :: OSI Approved :: ...` trove classifiers. `is_license_acceptable` splits compound SPDX expressions on the uppercase OR/AND operators (case-sensitive, so the lowercase `-or-later` inside an identifier is not mistaken for an operator) and strips `WITH <exception>` suffixes, requiring every component to be acceptable. Free-text license blobs are detected and fall back to the original whole-string matching. The `black` and `pydantic-settings` entries in liccheck.ini that existed solely to work around this now resolve correctly on their own and have been removed. |
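The SPDX handling described above could be sketched as follows. The function names and the `ACCEPTABLE` set are hypothetical and do not reflect the checker's real allowlist; the sketch covers only the operator split and `WITH` stripping, not the free-text fallback.

```python
import re

ACCEPTABLE = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # hypothetical allowlist


def spdx_components_sketch(expression):
    """Split a compound SPDX expression into bare license identifiers."""
    components = []
    # Case-sensitive split: lowercase '-or-later' inside an id is not an operator.
    for part in re.split(r"\s+(?:OR|AND)\s+", expression):
        part = part.strip("() ")
        part = re.split(r"\s+WITH\s+", part)[0]  # drop 'WITH <exception>'
        components.append(part)
    return components


def is_acceptable_sketch(expression):
    # Every component must be acceptable, as the commit message describes.
    return all(c in ACCEPTABLE for c in spdx_components_sketch(expression))


parts = spdx_components_sketch(
    "GPL-2.0-or-later WITH Classpath-exception-2.0 OR MIT"
)
both_ok = is_acceptable_sketch("(MIT OR Apache-2.0) AND BSD-3-Clause")
one_bad = is_acceptable_sketch("MIT AND GPL-2.0-or-later")
```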
||
|
|
b0b25ae4b9
|
Include team alias in CLI JWT token (#28621) | ||
|
|
e9f0eddbd1
|
Litellm oss staging 2 (#28582)
* fix(anthropic): handle empty streaming tool calls (#28549)
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* [Feature][Bug Fix] Decouple Azure OpenAI Deployment ID from model name via base_model to fix gpt5 model routing (#28490)
* feat(azure): decouple deployment ID from model name via base_model
Azure OpenAI deployments have arbitrary names (deployment IDs) that may not
match the underlying model. Previously, model-type detection (o-series, gpt-5,
etc.) relied on substring matching against the deployment name, causing
misrouted configs and rejected params when deployment names were non-standard
(e.g. 'my-deployment-id' for gpt-5.2).
This change extends the existing base_model field to drive model-type
detection, config selection, supported param resolution, and param mapping
throughout the Azure call path:
- _get_azure_config() uses base_model for is_o_series/is_gpt_5 checks
- get_provider_chat_config() threads base_model for Azure
- get_supported_openai_params() accepts and uses base_model
- get_optional_params() accepts base_model and passes it to all Azure config
method calls (get_supported_openai_params, map_openai_params)
- azure.py completion handler uses base_model for GPT-5 detection
- Config internal methods (e.g. is_model_gpt_5_2_model) now receive base_model
so features like logprobs are correctly enabled
Fully backward compatible - when base_model is unset, behavior is identical.
Existing o_series/ and gpt5_series/ prefix workarounds continue to work.
Usage in proxy config:
model_list:
  - model_name: my-gpt5
    litellm_params:
      model: azure/my-deployment-id
    model_info:
      base_model: azure/gpt-5.2
Fixes: non-standard deployment names like 'prefix-gpt-5.2' rejecting
logprobs/top_logprobs despite the underlying model supporting them.
* Addressing Greptile comments.
* gemini-3.1-flash-lite pricing (#27933)
* feat(model_prices): add gemini-3.1-flash-lite pricing with standard/batch/flex/priority tiers
* fix pricing
* add service tier
---------
Co-authored-by: shin-berri <shin-laptop@berri.ai>
* fix(openai-responses): strip Anthropic cache_control from Responses API requests (#28431)
Squash-merged by litellm-agent from cwang-otto's PR.
* Treat None litellm_provider as wildcard in _check_provider_match (#28523)
Squash-merged by litellm-agent from adityasingh2400's PR.
* fix greptile
* fix: use _azure_detection_model in default Azure branch of get_supported_openai_params
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(openai-responses): strip cache_control on compact endpoint as well
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Felipe Garé <90070734+FelipeRodriguesGare@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: withomasmicrosoft <withomas@microsoft.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
|
|
21a21e01f7
|
fix(responses): use OpenAI SSEDecoder for Responses API streaming (#28566)
* fix(responses): use OpenAI SSEDecoder for Responses API streaming
httpx aiter_lines() uses str.splitlines(), which splits on U+2028 inside JSON
payloads and silently drops response.completed (no spend log). Use
openai._streaming.SSEDecoder (bytes.splitlines before decode) instead.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(responses): drop redundant SSE prefix strip after SSEDecoder switch
SSEDecoder already strips the 'data:' field prefix from each event, so the
extra call to _strip_sse_data_from_chunk on sse.data was redundant and could
incorrectly mangle payloads whose actual content starts with 'data:'.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
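The line-splitting difference behind this fix can be reproduced with the standard library alone. The snippet below does not use httpx or the OpenAI decoder; it only demonstrates that `str.splitlines()` treats U+2028 as a line boundary while `bytes.splitlines()` does not. The payload is a made-up SSE-style line.

```python
import json

# One SSE-style event whose JSON string contains U+2028 (LINE SEPARATOR).
payload = 'data: {"text": "line one\u2028line two"}\n\n'

# Splitting the decoded text breaks the JSON in the middle of the string.
text_lines = payload.splitlines()

# Splitting the raw bytes first only breaks on ASCII line endings.
byte_lines = [
    line.decode("utf-8") for line in payload.encode("utf-8").splitlines()
]

# The bytes-first line is still valid JSON after removing the field prefix.
parsed = json.loads(byte_lines[0][len("data: "):])
```

Here `text_lines` holds three fragments, none of which parses as the original event, whereas `byte_lines[0]` carries the complete event.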
||
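The U+2028 behaviour described in the commit above can be reproduced with the standard library alone. This is a minimal sketch, not the code from the PR: the `frame` payload and the `try_parse` helper are made up for illustration, and neither httpx nor the OpenAI SDK is involved. It only shows why splitting decoded text differs from splitting raw bytes.

```python
import json

# Hypothetical SSE frame whose JSON text contains U+2028 (LINE SEPARATOR).
# ensure_ascii=False keeps the character unescaped, as a server might send it.
payload = json.dumps({"type": "response.completed", "text": "a\u2028b"}, ensure_ascii=False)
frame = f"data: {payload}\n\n"

# Text-level splitting: str.splitlines() treats U+2028 as a line boundary,
# so the single event is cut into two fragments.
text_lines = [line for line in frame.splitlines() if line]

# Byte-level splitting: bytes.splitlines() only breaks on \n, \r and \r\n,
# so the event stays on one line and can be decoded afterwards.
byte_lines = [raw.decode("utf-8") for raw in frame.encode("utf-8").splitlines() if raw]


def try_parse(line: str):
    """Return the parsed JSON event, or None if the line is not valid JSON (illustrative helper)."""
    try:
        return json.loads(line.removeprefix("data: "))
    except json.JSONDecodeError:
        return None


broken = [try_parse(line) for line in text_lines]
intact = [try_parse(line) for line in byte_lines]
```

With text-level splitting neither fragment parses, so a consumer that skips unparseable lines never sees the `response.completed` event. Splitting the bytes first keeps the event whole.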
|
|
50a3f10a92
|
feat(proxy): persist allowlisted OIDC claims in CLI SSO poll (#28463)
* feat(proxy): persist allowlisted OIDC claims in CLI SSO poll
Map CLI_SSO_CLAIM_MAP sources into user metadata and return scalar
attribution_metadata from /sso/cli/poll. Build SSOUserDefinedValues in
cli_sso_callback so first-time CLI logins can upsert users. Add mock OIDC
scripts and tests for claim extraction and poll exposure.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(proxy): document CLI SSO attribution_metadata in client README
Co-authored-by: Cursor <cursoragent@cursor.com>
* Delete scripts/mock_oidc_server_for_cli_sso.py
* Delete scripts/test_cli_sso_claims_e2e.py
* fix(ui_sso): preserve claim types and avoid metadata. prefix stripping
- Replace _update_dictionary with a local recursive merge so string
OIDC claim values that happen to look numeric are not silently coerced
to int/float when persisting CLI SSO attribution metadata.
- Use a local dot-path resolver in _extract_sso_claim_value so that
source claim paths beginning with 'metadata.' are not silently stripped
by get_nested_value (which is designed for LiteLLM JWT metadata, not
arbitrary OIDC claims).
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* Remove redundant metadata. prefix strip in _set_nested_metadata_value
The _parse_cli_sso_claim_map already strips the metadata. prefix from
dest keys before reaching the setter. The duplicate strip in
_set_nested_metadata_value was a no-op in normal flow but could
mis-place values for dest keys like metadata.metadata.foo.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* Fix greptile
* Fix ruff
* Move CLI SSO user defined values build inside try/except for consistent error handling
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(proxy): enforce restricted SSO group on CLI SSO callback
Apply verify_user_in_restricted_sso_group before CLI session completion
and user upsert, matching the UI SSO path. Re-raise ProxyException so
restricted-group denials return 403 instead of 500.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(proxy): replace recursive CLI SSO metadata helpers with iterative merge
Use stack-based flatten/merge to satisfy recursive_detector CI. Fix mypy
types for UserApiKeyCache and user_id on CLI SSO session completion.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: resolve nested CustomOpenID extra_fields in CLI SSO claim extraction
When GENERIC_USER_EXTRA_ATTRIBUTES captures a parent object (e.g. org_info),
extra_fields stores it as {"org_info": {"department": "..."}}. A CLI claim
map entry using a dotted path like org_info.department would silently fail
because the lookup only checked the exact flat key. Fall back to dotted-path
resolution on extra_fields before model_dump().
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(sso): update CLI SSO test for new received_response kwarg and remove redundant 'token' secret fragment
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
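The dotted-path lookup and the iterative, type-preserving merge described in the commit above might look roughly like the sketch below. The function names `resolve_dotted_path` and `merge_nested` and the sample data are hypothetical and do not come from the PR; the sketch only illustrates the two techniques (exact flat key first, then a dotted walk; an explicit stack instead of recursion).

```python
from typing import Any


def resolve_dotted_path(data: dict, path: str, default: Any = None) -> Any:
    """Look up `path` in `data`: exact flat key first, then a dotted walk."""
    if path in data:
        return data[path]
    current: Any = data
    for part in path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def merge_nested(dest: dict, src: dict) -> dict:
    """Deep-merge `src` into `dest` with an explicit stack (no recursion).

    Leaf values are assigned as-is, so a string such as "007" stays a string.
    """
    stack = [(dest, src)]
    while stack:
        target, source = stack.pop()
        for key, value in source.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                # Both sides are dicts: descend later instead of overwriting.
                stack.append((target[key], value))
            else:
                target[key] = value
    return dest


# Hypothetical extra_fields captured from an OIDC response.
extra_fields = {"org_info": {"department": "platform"}, "metadata.team": "core"}
department = resolve_dotted_path(extra_fields, "org_info.department")
merged = merge_nested({"sso": {"dept": "old"}}, {"sso": {"employee_id": "007"}})
```

Trying the flat key before walking the path means a claim literally named `metadata.team` is still found. No prefix is stripped along the way.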
|
|
ef36e89638
|
feat(mcp): Add tool call and tool list support via UI for Oauth mcps (#28454)
* feat(mcp): cache OAuth token client-side so Tools tab loads without re-auth
After a user creates an OAuth MCP server and completes the authorization
flow, the resulting access token is now stored in sessionStorage keyed by
server_id. The MCP Tools tab reads this cached token and includes it as
an MCP auth header when listing and invoking tools, so the user never sees
an empty tool list. When the session ends (tab close / new browser) an
Authorize button re-triggers the flow without leaving the Tools screen.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(ui/mcp): surface listMCPTools 401 errors so auth gate reappears
listMCPTools previously swallowed all errors (including HTTP 401) by
returning a synthetic { tools: [], error: 'network_error', ... } payload.
That made the useQuery retry-on-401 guard and mcpToolsError dead code,
so expired OAuth tokens never re-triggered the auth gate.
- Throw an enhanced Error with .status attached on non-2xx responses
(still preserves the legacy shape for true network failures so the
caller can render a generic message without crashing).
- Clear the cached OAuth session token when the tools query fails with
401, mirroring callMCPTool's onError handler so the Authorize button
is shown again.
- Surface mcpToolsError in the existing error banner.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp-tools): stable onSuccess + reuse parsed flow state
- Pass stable setOauthToken setter directly as onSuccess to avoid
recreating useToolsOAuthFlow's resumeOAuthFlow on every render.
- Reuse the already-parsed FLOW_STATE_KEY value (peeked) instead of
re-reading and re-parsing sessionStorage in resumeOAuthFlow.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui/mcp): restore listMCPTools never-throws contract
The previous fix made listMCPTools throw on HTTP errors while still
returning a synthetic object on network errors. This inconsistent
contract broke existing callers (MCPToolPermissions, MCPAppsPanel,
MCPConnectPicker) which inspect result.error / result.message and
expect the function to never throw.
- Return a normalized { tools: [], error, message, status, ... }
object on HTTP errors (instead of throwing) so all callers see a
consistent shape and the user-visible error text from
result.message is preserved.
- Convert the returned error object into a thrown Error inside the
one caller that needs it — the useQuery in mcp_tools.tsx — so the
401 retry/onError handlers still trigger and clear the cached
OAuth token.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix greptile
* fix(mcp): align OAuth header alias lookup with dashboard sanitization
Backend auth header resolution now matches x-mcp-{alias} keys produced by
the dashboard sanitizer, and the Tools tab re-syncs OAuth tokens when
serverId changes.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp): widen auth header lookup types for list_tools
Accept legacy str | dict server auth maps and annotate list_tools
server_auth_header as Union[str, dict] for mypy.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(ui): extract shared buildCallbackUrl/clearStorage for MCP OAuth hooks
Hoist the duplicate buildCallbackUrl and clearStorage helpers out of
useToolsOAuthFlow and useUserMcpOAuthFlow into a new shared module
src/hooks/mcpOAuthUtils.ts so the two hooks cannot drift if the URL
construction or storage cleanup logic needs to change.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui): don't gate M2M OAuth MCP servers behind interactive authorize
M2M (client_credentials) OAuth servers share auth_type="oauth2" with
interactive PKCE servers, but the backend fetches their token internally
and they typically lack a user authorization endpoint. Gating tool
listing on them rendered an Authorize button that would fail or redirect
incorrectly. Detect M2M via the presence of token_url (matching the
existing heuristic in mcp_server_edit.tsx) and skip the auth gate.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(ui/mcp): return error shape when listMCPTools JSON parse fails
Restore the never-throws contract when response.json() fails on a 2xx
body so callers do not receive null and crash on result.tools.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
|
|
7a93cceb9f
|
Add error_description and hint for oauth flows (#28471)
* Add error_description and hint for oauth flows
* Fix tests
* fix(mcp-oauth): improve redirect_uri errors without leaking internal config
Use NoReturn on _oauth_invalid_request, structured errors for BYOK loopback validation, and refactor validate_trusted_redirect_uri to satisfy PLR0915. Keep PROXY_BASE_URL and raw proxy_base_url in server logs only, not in the HTTP 400 body returned to unauthenticated callers.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(mcp-oauth): stop leaking internal proxy origin in redirect_uri 400 body
The trusted-redirect-uri rejection helper included the proxy's resolved scheme/host/port (e.g. http://litellm-internal:4000) in both the error_description and as a top-level proxy_origin field. Since the OAuth /authorize endpoint is unauthenticated, any caller could probe with a crafted redirect_uri and enumerate the internal network topology behind a reverse proxy. Keep full diagnostic detail in the server-side warning log (including the computed proxy base) but omit proxy-side values from the HTTP 400 body. Also drop the duplicated origin computation in _raise_trusted_redirect_uri_rejected now that those values are no longer needed by the response.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(mcp-oauth): remove dead userinfo check in redirect_uri validation
The first check combined missing netloc with userinfo presence, making the second userinfo-only check unreachable. Split into two distinct checks so each error message reflects the actual failure mode.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
|
||
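The split between the missing-host check and the userinfo check from the commit above can be sketched with `urllib.parse`. Everything named here (`InvalidRedirectURI`, `check_redirect_uri`, the message wording and the exact body shape) is an assumption for illustration rather than the proxy's actual helper. The sketch keeps the two failure modes separate and puts no server-side values in the response body.

```python
from urllib.parse import urlsplit


class InvalidRedirectURI(ValueError):
    """Carries an OAuth-style error body (hypothetical shape) for an HTTP 400."""

    def __init__(self, error_description: str, hint: str) -> None:
        super().__init__(error_description)
        # Only caller-supplied facts go here; no proxy origin or internal config.
        self.body = {
            "error": "invalid_request",
            "error_description": error_description,
            "hint": hint,
        }


def check_redirect_uri(redirect_uri: str) -> None:
    """Raise InvalidRedirectURI with a message specific to the failure mode."""
    parts = urlsplit(redirect_uri)
    # Check 1: the URI must be absolute and name a host.
    if not parts.netloc:
        raise InvalidRedirectURI(
            "redirect_uri must be an absolute URL with a host",
            "Use a full URL such as https://app.example.com/callback",
        )
    # Check 2 (independent of check 1): credentials in the URI are rejected.
    if parts.username is not None or parts.password is not None:
        raise InvalidRedirectURI(
            "redirect_uri must not contain userinfo",
            "Remove the 'user:password@' part of the URL",
        )
```

Because each condition is tested on its own, a URI with a host and embedded credentials reaches the second check instead of being masked by the first.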
|
|
d96e26064f
|
Fix conflicts and UI (#28477) | ||
|
|
d04373f4ce
|
Add granian as an ASGI-compliant web server. Provides better throughput stability (#26027)
* Add granian as an ASGI-compliant web server. Provides better stability, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR
* Update granian version in license_cache.json and pyproject.toml to 2.5.7
* Enhance proxy CLI tests by adding SSL initialization checks for Granian server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization.
* update uv lock to fix granian import error
|
||
|
|
07bcd2c19e
|
test(e2e): forward LITELLM_LICENSE to UI e2e proxy (#28398)
* test(e2e): forward LITELLM_LICENSE to UI e2e proxy
The UI e2e job ran without LITELLM_LICENSE, so premium_user was always false in the issued login JWT and premium-gated UI surfaces (Team-BYOK Model switch, etc.) couldn't be driven through the UI. Forward the env var from run_e2e.sh and the CircleCI e2e_ui_testing job, and add a sanity test that decodes the admin storage state token and asserts premium_user=true so the wiring fails loudly if it ever regresses.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Update ui/litellm-dashboard/e2e_tests/tests/proxy-admin/license.spec.ts
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
||
|
|
9fcd424318
|
chore(deps): bump deps (#28528)
* build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665)
Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v16.2.4...v16.2.6)
---
updated-dependencies:
- dependency-name: next
dependency-version: 16.2.6
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump protobufjs in /tests/pass_through_tests (#28296)
Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0.
- [Release notes](https://github.com/protobufjs/protobuf.js/releases)
- [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md)
- [Commits](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.5.6...protobufjs-v7.6.0)
---
updated-dependencies:
- dependency-name: protobufjs
dependency-version: 7.6.0
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump ws from 8.20.0 to 8.20.1 in /tests/pass_through_tests (#28303)
Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/8.20.0...8.20.1)
---
updated-dependencies:
- dependency-name: ws
dependency-version: 8.20.1
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|